CN109308280A

CN109308280A - Data processing method and relevant device

Info

Publication number: CN109308280A
Application number: CN201710617841.0A
Authority: CN
Inventors: 唐贵金; 李贤�; 岳李勇
Original assignee: Hangzhou Huawei Digital Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2017-07-26
Filing date: 2017-07-26
Publication date: 2019-02-05
Anticipated expiration: 2037-07-26
Also published as: CN109308280B

Abstract

The embodiment of the invention discloses a kind of data processing method, accelerate computing unit, central processing unit and heterogeneous system, for improving service process performance.The data processing method of the embodiment of the present invention is applied to accelerate computing unit, and accelerating computing unit includes accelerating engine and caching management module.This method comprises: accelerating engine, which obtains the business that central processor CPU is sent, accelerates processing request, accelerating engine accelerates processing request to handle business, to obtain processing result, accelerating engine is to caching management module application memory address, obtain target memory address, the memory headroom that processing result write-in target memory address is directed toward by accelerating engine, accelerating engine send target memory address to CPU.For accelerating engine to the acquisition of memory address by realizing to caching management module application, which is to be pre-stored in the memory address accelerated on computing unit.To, accelerating engine can quick obtaining to memory address, realize the raising of service process performance.

Description

Data processing method and relevant device

Technical field

The present embodiments relate to data processing field more particularly to a kind of data processing method, accelerate computing unit, in Central processor and heterogeneous system.

Background technique

On heterogeneous system, processing stream is usually responsible for by central processing unit (Central Processing Unit, CPU) The control of journey, and specific processing (such as compressed and decompressed, encrypting and decrypting etc.) is executed by dedicated acceleration computing unit.Accelerate Computing unit can be for example field programmable gate array (Field Programmable Gate Array, FPGA), figure Shape processing unit/graphics processor (Graphics Processing Unit, GPU), digital signal processor (Digital Signal Processor, DSP), specific integrated circuit (Application Specific Integrated Circuit, ASIC) etc..Specifically, for CPU to accelerating computing unit to send message, which includes that business accelerates processing request and memory Then address accelerates to calculate the acceleration processing request of the cell processing business, obtains processing result, and should by processing result write-in Then the memory headroom that memory address is directed toward accelerates computing unit that the memory address is sent to CPU.

In general, computing unit is accelerated to accelerate processing request to carry out the corresponding processing result for handling and obtaining business Length is uncertain.If the memory headroom for accelerating computing unit to obtain is not enough to store processing result, accelerate computing unit It needs to another memory address of CPU application.Detailed process are as follows: CPU to accelerate computing unit send business accelerate processing request and First memory address obtains processing result so that accelerating the accelerating engine of computing unit to handle the business accelerates processing request, and The processing result is written to the memory headroom of first memory address direction.If the memory headroom that the first memory address is directed toward is write Man Hou, the business processing request still untreated completion, then the accelerating engine of computing unit is accelerated to trigger an interrupt notification CPU, Indicate that result memory headroom is insufficient.The interruption that the interruption processing module processing of CPU accelerates computing unit to report, from the memory of CPU Module application obtains the second memory address.Length of the interruption processing module of CPU the second memory address and the second memory address Notice accelerates computing unit.Accelerate the accelerating engine of computing unit that second memory address is written in the result after continuing with to refer to To memory headroom.If after the memory headroom that the second memory address is directed toward is write completely, business accelerates processing to request still untreated completion, Computing unit is then accelerated to continue to CPU application memory address.

The implementation of above-mentioned application memory address, the problem of bringing, are: accelerating computing unit at processing business acceleration When reason request, if when the Insufficient memory of the memory address obtained, accelerating computing unit by way of interruption to CPU application Memory address, at this time, it may be necessary to waiting for CPU continuation application memory headroom and obtain CPU transmission new memory address.At this section Between time for waiting, accelerate computing unit that cannot continue to handle source data, cause process performance not high.

Summary of the invention

The embodiment of the invention provides a kind of data processing method, accelerate computing unit, central processing unit and heterogeneous system, For improving service process performance.

The first aspect of the embodiment of the present invention provides a kind of data processing method, and this method is applied to accelerate computing unit, Accelerating computing unit includes accelerating engine and caching management module, and caching management module is used for managing internal memory address.This method packet It includes:

Accelerating engine obtains the business that central processor CPU is sent and accelerates processing request.The business accelerates processing request packet Information relevant to source data is included, for example including source data address and source data length.The business accelerates processing request for referring to Show that accelerating engine handles source data.Then, accelerating engine accelerates processing request to handle business, to obtain processing result. In order to cache the processing result, accelerating engine to caching management module application memory address, caching management module be accelerating engine Target memory address is distributed, so that accelerating engine obtains target memory address.Wherein, computing unit is accelerated to prestore at least one Memory address, target memory address belong to the memory address for accelerating to prestore on computing unit.To which accelerating engine can tie processing The memory headroom that target memory address is directed toward is written in fruit.Then, accelerating engine sends target memory address to CPU, so that CPU is logical It crosses the target memory address and gets the processing result.

In this way, accelerating engine handles to obtain processing result, accelerating engine can be obtained to caching management module application memory address To the target memory address accelerated on computing unit is pre-stored in, then, which is written into the target memory address and is directed toward Memory headroom on, accelerating engine to CPU send target memory address after, CPU can be read by the target memory address The processing result.Because accelerating engine to the acquisition of memory address by being realized to caching management module application, the accelerating engine Be located on same acceleration computing unit with caching management module, accelerating engine application to memory address be pre-stored in accelerometer Calculate unit on memory address, in this way, accelerating engine can quick obtaining arrive memory address, with store processing business acceleration handle ask The processing result asked, thus, reduce the waiting time that accelerating engine obtains memory address, it can quick obtaining memory Location realizes the raising of service process performance to use the memory address to store processing result.

In conjunction with the embodiment of the present invention in a first aspect, the first implementation of the first aspect in the embodiment of the present invention In, accelerating engine obtains target memory address and accelerating engine for processing result to caching management module application memory address The memory headroom that target memory address is directed toward is written, comprising:

Accelerating engine processing business accelerates processing request, and for the processing result that caching process obtains, accelerating engine is to slow Management module application memory address is deposited, the first memory address is obtained, wherein the first memory address is to accelerate to prestore on computing unit One of memory address.Then, the memory sky that the first memory address is directed toward is written in the first processing result by accelerating engine Between, the first processing result belongs to processing result.That is accelerating engine processing business accelerates the obtained processing result of processing request to include First processing result, the processing result further include second processing result.At this point, when the memory headroom quilt of the first memory address direction Write full, and when business accelerates processing to request untreated completion, accelerating engine obtains the to caching management module application memory address Two memory address, the second memory address are one of the memory address for accelerating to prestore on computing unit.Then, accelerating engine The memory headroom that the second memory address is directed toward is written into second processing result, second processing result belongs to processing result.

Accelerating engine needs the memory address of the memory headroom caching process result of direction being sent to CPU, thus, accelerate Engine sends target memory address to CPU, comprising: accelerating engine sends the first memory address and the second memory address to CPU.

In this way, accelerating engine can when accelerating engine needs caching process business that processing is accelerated to request obtained processing result Can be by caching management module application memory address, obtaining being stored in the memory address accelerated on computing unit Apply for memory address when processing business accelerates processing request, is also possible to the memory being directed toward in the memory address applied When space is write completely and processing is accelerated to request still untreated completion for business, then to caching management module application memory address, thus nothing It need to be to CPU application memory address.

In conjunction with the embodiment of the present invention in a first aspect, second of implementation of the first aspect in the embodiment of the present invention In, before accelerating engine obtains the business acceleration processing request that CPU is sent, the method for this implementation further include: cache management Module obtains the target memory address that CPU is sent, and then, target memory address is stored in acceleration and calculates list by caching management module In member.To, it can be achieved that accelerate computing unit on stored memory address, for accelerating engine use.

In conjunction with the first and second of realization of the first aspect of the embodiment of the present invention, the first aspect of the embodiment of the present invention Any one in mode, in the third implementation of the first aspect of the embodiment of the present invention, accelerating engine is sent to CPU After target memory address, the method for this implementation further include: caching management module is with obtaining the target memory that CPU is sent Location, the processing result on memory headroom that target memory address is directed toward are read by CPU.That is, CPU gets target memory address Afterwards, the processing result cached on the memory headroom that target memory address is directed toward is read, then, CPU sends out the target memory address The caching management module for accelerating computing unit is given, thus, target memory address is stored in acceleration and calculated by caching management module On unit, for subsequent use.

In this way, it can be used repeatedly for memory address, it is ensured that accelerate computing unit to have workable memory address, improve System performance, and avoid and divide new memory from memory headroom repeatedly for accelerating computing unit to use, reduce system Expense.

In conjunction with the first and second of realization of the first aspect of the embodiment of the present invention, the first aspect of the embodiment of the present invention Any one in mode accelerates pre- on computing unit in the 4th kind of implementation of the first aspect of the embodiment of the present invention The memory address deposited is the memory address according to bit bit compression of alignment.To which, accelerating engine is to caching management module application Memory address obtains target memory address, comprising: accelerating engine sends memory address request, the memory to caching management module Address requests are used to request memory address to caching management module.Under the triggering of memory address request, caching management module From determining target memory address in the memory address prestored on computing unit is accelerated, caching management module is right according to being aligned bit Target memory address is decompressed, the target memory address decompressed.Then, accelerating engine obtains caching management module hair The target memory address for the decompression sent.

By compressing to memory address, memory address the space occupied size can be reduced, to be conducive to accelerating On computing unit, in a certain size memory space, more memory address are stored, facilitate lifting system capacity.

In conjunction with the first and second of realization of the first aspect of the embodiment of the present invention, the first aspect of the embodiment of the present invention Any one in mode accelerates pre- on computing unit in the 5th kind of implementation of the first aspect of the embodiment of the present invention The memory address deposited is preset with check value, which can be on memory address to be stored in acceleration computing unit Before, it is calculated by caching management module, then, be stored in corresponding with the check value of memory address is accelerated on computing unit. To which accelerating engine obtains target memory address to caching management module application memory address, comprising: accelerating engine is to caching Management module sends memory address request, and memory address request is for requesting memory address to caching management module.In the memory Under the triggering of Address requests, caching management module is from accelerating with determining target memory on computing unit in the memory address that prestores The check value of target memory address is calculated in location, caching management module.When the check value and target memory address being calculated Preset check value matching when, expressions target memory address be correct memory address, mistake does not occur in storing process Change, thus caching management module to accelerating engine send target memory address so that accelerating engine is with obtaining target memory Location.

By the default of check value, before accelerating engine is distributed in target memory address, caching management module is first calculated The check value of target memory address out, check value and preset check value that logical matching primitives obtain ensure target memory address The reliable of system operation is improved so that correct memory address can be used in accelerating engine for correct memory address Property.

In conjunction with the first and second of realization of the first aspect of the embodiment of the present invention, the first aspect of the embodiment of the present invention Any one in mode, in the 6th kind of implementation of the first aspect of the embodiment of the present invention, target memory address configuration There is internal storage state word, the value of internal storage state word includes accelerating computing unit to occupy to occupy with CPU, and the value of internal storage state word is by accelerating Computing unit is arranged to obtain.

Wherein, the acceleration computing unit of target memory address occupies for indicating that target memory address has been stored in acceleration On computing unit, target memory address is used by acceleration computing unit.The CPU of target memory address occupies for indicating in target The processing result for depositing the memory headroom caching of address direction can be read by CPU, and target memory address is used by CPU.

Internal storage state word is set, memory address can be managed by internal storage state word.Specifically, accelerating engine to After CPU sends target memory address, the method for this implementation further include: caching management module obtains the target that CPU is sent Memory address, the value of the internal storage state word of target memory address are continuously time that CPU occupies and are greater than preset time.I.e. in system When detecting that the value of the internal storage state word of target memory address is continuously time that CPU occupies and is greater than preset time, for example, passing through CPU carries out the detection operation, which can be sent to again caching management module by CPU, so that cache management mould Target memory address is stored in by block to be accelerated on computing unit.

CPU reads the processing cached on the memory headroom that target memory address is directed toward after getting target memory address As a result, still, it is understood that there may be the problems such as failure, target memory address is caused to be lost, calculated so that memory address is no longer accelerated Unit and CPU are used, if detect the value of the internal storage state word of target memory address be continuously time that CPU occupies be greater than it is pre- If the time, then it represents that the target memory address is in lost condition, so that CPU recycles the target memory address, and by the target Memory address is sent to the caching management module for accelerating computing unit, allows and computing unit is accelerated to reuse in the target Address is deposited, memory source, the reliability of lifting system can be made full use of in this way.

In conjunction with the 6th kind of implementation of the first aspect of the embodiment of the present invention, the of the first aspect of the embodiment of the present invention In seven kinds of implementations, target memory address is also configured with verification status word, verifies status word and the state synchronized time is corresponding, shape Time when being used to state synchronization time indicate to synchronize the value for verifying status word as the value of internal storage state word.Target memory address The value for verifying status word is obtained by the value of the internal storage state word of target memory address synchronous under synchronous condition, and synchronous condition is mesh The value for marking the value of the verification status word of memory address and the internal storage state word of target memory address is not identical.For example, CPU is detected When the value of the internal storage state word of the value and target memory address of the verification status word of target memory address is not identical, CPU is by target The value of the verification status word of memory address is synchronous be target memory address internal storage state word value, and when updating state synchronized Between.

The verification status word obtained in this way and state synchronized time, the case where can be used for detecting internal storage state word, i.e. mesh The value for marking the internal storage state word of memory address is continuously time that CPU occupies and is greater than preset time specifically: target memory address The value of internal storage state word and the value of verification status word of target memory address be all that CPU occupies, and the shape of target memory address Difference between state synchronization time and current time is greater than preset time, and current time is the memory for detecting target memory address The value of the verification status word of the value and target memory address of status word is all the time that CPU occupies.

In this way, the value of internal storage state word to be synchronized to the value for verifying status word, may make verification status word when being worth different Value the case where reflecting the value of internal storage state word, then the state synchronized time is detected, internal storage state word can be obtained The temporal information of value.It verifies status word to be arranged by CPU, CPU by detection verifies status word and state synchronized time to reach pair The detection of internal storage state word, can be improved execution efficiency in this way.

The second aspect of the embodiment of the present invention provides a kind of data processing method, and this method is applied on CPU, and CPU includes Service Processing Module and memory modules.This method comprises:

Because the memory modules on CPU are used for managing internal memory address, so that Service Processing Module is to memory modules application memory Address obtains target memory address.Then, Service Processing Module is into the caching management module transmission target for accelerating computing unit Address is deposited, is accelerated on computing unit so that target memory address is stored in by caching management module.It is calculated in this way, realizing acceleration Stored memory address on unit.Service Processing Module obtains business and accelerates after handling request, and Service Processing Module is calculated to acceleration The accelerating engine of unit sends business and accelerates processing request, so that accelerating engine accelerates processing request to handle business, with Processing result is obtained, accelerating engine is to caching management module application memory address, and after obtaining target memory address, accelerating engine will The memory headroom that target memory address is directed toward is written in processing result.That is CPU first sends memory address to acceleration computing unit, for Accelerate computing unit storage, in this way, when accelerating calculating cell processing business that processing request is accelerated to obtain processing result, in order to cache The processing result accelerates the accelerating engine of computing unit directly can obtain the target memory prestored from acceleration computing unit Address, without to CPU apply acquisition memory address, accelerating engine can quick obtaining to memory address, added with storing processing business The processing result that speed processing request obtains, thus, reduce the waiting time that accelerating engine obtains memory address, it can quick obtaining Memory address realizes the raising of service process performance to use the memory address to store processing result.

In conjunction with the second aspect of the embodiment of the present invention, in the first implementation of the second aspect of the embodiment of the present invention In, Service Processing Module sends business to the accelerating engine of acceleration computing unit and accelerates after handling request, this implementation Method further include: Service Processing Module obtains the target memory address that accelerating engine is sent, which is directed toward interior Depositing spatial cache has processing result, thus the memory headroom reading process knot that Service Processing Module is directed toward from target memory address Fruit.Then, Service Processing Module sends target memory address to the caching management module of the acceleration computing unit, so that caching is managed Target memory address is stored in by reason module to be accelerated on computing unit.

In conjunction with the second aspect of the embodiment of the present invention, in second of implementation of the second aspect of the embodiment of the present invention In, CPU further includes verifying module and target memory address configuration to have internal storage state word, and the value of internal storage state word includes accelerating Computing unit occupies to be occupied with CPU, and the value of the internal storage state word is arranged to obtain by acceleration computing unit.

Internal storage state word is set, memory address can be managed by internal storage state word.Specifically, business processing mould Block sends business to the accelerating engine of acceleration computing unit and accelerates after handling request, the method for this implementation further include: core The value for the internal storage state word for looking into module detection target memory address is continuously whether the time that CPU occupies is greater than preset time.If The value of the internal storage state word of target memory address is continuously time that CPU occupies and is greater than preset time, then Service Processing Module to Caching management module sends target memory address, so that target memory address is stored in acceleration computing unit by caching management module On.

CPU reads the processing cached on the memory headroom that target memory address is directed toward after getting target memory address As a result, still, it is understood that there may be the problems such as failure, target memory address is caused to be lost, calculated so that memory address is no longer accelerated Unit and CPU are used, if verify that module detects that the value of the internal storage state word of target memory address is continuously that CPU occupies when Between be greater than preset time, then it represents that the target memory address is in lost condition, so that CPU recycles the target memory address, and The target memory address is sent to the caching management module for accelerating computing unit, allows and computing unit is accelerated to reuse The target memory address can make full use of memory source, the reliability of lifting system in this way.

In conjunction with second of implementation of the second aspect of the embodiment of the present invention, the of the second aspect of the embodiment of the present invention In three kinds of implementations, target memory address is also configured with verification status word.Wherein, status word and state synchronized time pair are verified It answers, the time when state synchronized time is used to indicate to synchronize the value for verifying status word as the value of internal storage state word.

Verify module detection target memory address internal storage state word value be continuously the time that CPU occupies whether be greater than it is pre- If the time, comprising:

Verify module every prefixed time interval, with judging value and the target memory of the verification status word of target memory address Whether the value of the internal storage state word of location is identical；

If the value of the internal storage state word of the value and target memory address of the verification status word of target memory address is not identical, The value that the value of the verification status word of target memory address is synchronized the internal storage state word for target memory address by module is verified, and more The state synchronized time of fresh target memory address；

If the value of the verification status word of target memory address is identical with the value of the internal storage state word of target memory address, when Verify module detect the value of the verifications status word of target memory address occupy for CPU, the state synchronized of target memory address when Between difference between current time be greater than preset time and within a preset time interval the internal storage state word of target memory address Value when having not been changed, Service Processing Module executes the step of sending target memory address to caching management module, and current time is It verifies module and detects that the value of the verification status word of target memory address is the time that CPU occupies.

In this way, the value of internal storage state word to be synchronized to the value for verifying status word, may make verification status word when being worth different Value the case where reflecting the value of internal storage state word, then the state synchronized time is detected, internal storage state word can be obtained The temporal information of value.It verifies status word to be arranged by CPU, CPU by detection verifies status word and state synchronized time to reach pair The detection of internal storage state word, state synchronized time and verification status word are that CPU is arranged to obtain, and facilitate CPU to the state synchronized Time and the use for verifying status word, can be improved execution efficiency in this way.

The another aspect of the application provides a kind of computer readable storage medium, in the computer readable storage medium It is stored with instruction, when run on a computer, so that computer executes method described in above-mentioned various aspects.

The another aspect of the application provides a kind of computer program product comprising instruction, when it runs on computers When, so that computer executes method described in above-mentioned various aspects.

As can be seen from the above technical solutions, the embodiment of the present invention has the advantage that

Accelerating computing unit includes accelerating engine and caching management module, and caching management module is used for managing internal memory address, On accelerating computing unit, accelerating engine obtains the business that CPU is sent and accelerates processing request, and then, accelerating engine adds business Speed processing request is handled, to obtain processing result.Accelerating engine obtains target to caching management module application memory address Memory address, wherein accelerate computing unit to prestore at least one memory address, target memory address belongs to acceleration computing unit On the memory address that prestores.To, the memory headroom that processing result write-in target memory address is directed toward by accelerating engine, Yi Jijia Fast engine sends target memory address to CPU.

In this way, setting is used for the caching management module of managing internal memory address on accelerating computing unit, accelerate computing unit Accelerating engine business that CPU is sent accelerate processing request to handle, to obtain processing result.Accelerating engine needs to store The processing result, for this purpose, accelerating engine can obtain being pre-stored on acceleration computing unit to caching management module application memory address Target memory address, then, by the processing result be written the target memory address direction memory headroom on, accelerating engine to After CPU sends target memory address, CPU can read the processing result by the target memory address.Because of accelerating engine To the acquisition of memory address by realizing to caching management module application, the accelerating engine and caching management module are located at same Accelerate computing unit on, accelerating engine application to memory address be pre-stored in accelerate computing unit on memory address, in this way, Accelerating engine can quick obtaining to memory address, to store the processing result that processing business accelerates processing request to obtain.With acceleration Engine is compared to the scheme of CPU application memory address, and the scheme of the embodiment of the present invention is without waiting for CPU to the sound of memory address It answers, reduces the waiting time that accelerating engine obtains memory address, so as to quick obtaining memory address, with using the memory Location stores processing result, realizes the raising of service process performance.

Detailed description of the invention

Fig. 1 is a kind of hardware system structure schematic diagram of heterogeneous system provided in an embodiment of the present invention；

Fig. 2 shows a kind of existing step schematic diagrames of data processing method；

Fig. 3 is the memory address stored on acceleration computing unit provided in an embodiment of the present invention, accelerates computing unit and CPU Relation schematic diagram；

Fig. 4 is a kind of method flow diagram of data processing method provided in an embodiment of the present invention；

Fig. 5 is the flogic system block diagram of the data processing method of the embodiment of the present invention；

Fig. 6 is the method flow diagram of embodiment illustrated in fig. 5；

Fig. 7 is the division mode schematic diagram in the physical memory space of embodiment illustrated in fig. 5；

Fig. 8 is the memory address schematic diagram of embodiment illustrated in fig. 5；

Fig. 9 is the specific implementation process schematic that embodiment illustrated in fig. 6 is related to；

Figure 10 is a kind of concrete structure schematic diagram for accelerating computing unit provided in an embodiment of the present invention；

Figure 11 is a kind of flow chart of data processing method provided in an embodiment of the present invention；

Figure 12 is that the memory headroom that embodiment illustrated in fig. 11 is related to manages schematic diagram；

Figure 13 is that the memory address that embodiment illustrated in fig. 11 is related to compresses schematic diagram；

Figure 14 is that the memory address that embodiment illustrated in fig. 11 is related to compresses another schematic diagram；

Figure 15 is the memory address mappings relational graph that embodiment illustrated in fig. 11 is related to；

Figure 16 is the internal storage state word that embodiment illustrated in fig. 11 is related to and verification status word relational graph；

Figure 17 is a kind of structural schematic diagram for accelerating computing unit provided in an embodiment of the present invention；

Figure 18 is a kind of structural schematic diagram of central processing unit provided in an embodiment of the present invention.

Specific embodiment

Fig. 1 is a kind of hardware system structure schematic diagram of heterogeneous system provided in an embodiment of the present invention.Refering to fig. 1, this is different Construction system includes CPU and acceleration computing unit.Wherein, the external memory of CPU, and pass through PCIe interface and acceleration computing unit phase Even.CPU is responsible for the control of process flow, and executes specific processing, such as compressed and decompressed and encryption by acceleration computing unit Decryption etc..

The scene of heterogeneous system execution business processing are as follows: CPU includes Service Processing Module, which passes through Message accelerates business to be processed to handle request and be sent to that computing unit is accelerated to carry out acceleration processing, accelerates computing unit to industry After the completion of the processing that business accelerates processing request to carry out, i.e., the memory headroom being directed toward obtained processing result write-in memory address, Accelerate computing unit that the information such as the memory address and processing result length are passed through message buffer descriptor (Buffer Descriptor, BD) it is sent to CPU, the Service Processing Module of CPU receives message BD from receiving queue, to obtain processing knot Fruit, and be further processed.

It is appreciated that the acceleration computing unit can for FPGA, GPU, DSP or ASIC etc., the embodiment of the present invention to this not Make specific limit.

The existing acceleration treatment process of heterogeneous system are as follows: CPU to accelerate computing unit send message, the message include to The source data address of processing and source data length, the purpose memory address and purpose memory address length of storing processing result, with And other processing parameters, the processing parameter are used to refer to accelerate how computing unit is handled source data.Accelerate to calculate single After member gets these information, computing unit is accelerated to read source data from memory according to source data address and source data length, with And the processing type to be executed is obtained according to processing parameter, corresponding processing is then carried out to source data according to processing type, is obtained To processing as a result, and writing processing result in the memory that purpose memory address is directed toward.

In general, the length for the processing result being calculated is uncertain.Such as accelerate computing unit be FPGA The business specifically executed be decompression scene under, the result length after being decompressed to source data be it is uncertain, depend on The algorithm and data format of compression.At this time, the memory headroom size of FPGA storage processing result is sent to regard to bad determination.Example Such as, one section there are the text file of more duplicate contents, and size is 50MB, there was only 10MB after compression, to this compressed file into When row decompression, accelerate computing unit that the result length after decompression can not be determined in advance, CPU can only apply for one section of mesh based on experience value Memory headroom, the size of the purpose memory headroom is, for example, 30MB.Then, CPU sends the address of the purpose memory headroom To FPGA.In decompression procedure the purpose memory headroom can be constantly written in decompression result by FPGA, if the discovery space 30MB is inadequate, Then FPGA is needed to the more memory headrooms of CPU application, to save decompression result.In the scene having, the picture of a 12MB File, file size has 10MB after compression.CPU applies for the memory headroom of 30MB based on experience value, and by the ground of the memory headroom Location is sent to FPGA, for FPGA use, and the space of 12MB can be only used after FPGA decompression, it is seen then that at this moment exist in 18MB Deposit the waste in space.

Accelerate computing unit when handling source data, if the Insufficient memory caching process for the memory address that discovery obtains As a result, existing processing mode is: in the memory headroom deficiency that discovery uses, computing unit being accelerated to notify by interrupt mode CPU, then, CPU apply for memory again, and the memory is supplemented to computing unit is accelerated, for accelerating computing unit to use.Under Face will be explained by taking Fig. 2 as an example.

As shown in Fig. 2, Fig. 2 shows a kind of existing step schematic diagram of data processing method, wherein accelerating to calculate single Member is by taking FPGA as an example.The data processing method includes:

The Service Processing Module executed on step 201:CPU obtains the first memory address to memory modules application memory.

Step 202: Service Processing Module constructs the message BD for being sent to FPGA, and sends message BD to and send team Column.

Wherein message BD includes source data address and source data length to be processed, first for storing processing result Memory address and the first memory address length and other data processing parameters.

Step 203:FPGA reads message BD from transmit queue, message BD is sent to accelerating engine, to be handled.

Accelerating engine parses message BD, obtains source data address, source data length and the first memory address, then accelerates Engine reads to obtain source data according to the source data address and source data length, handles the source data, obtains processing knot Fruit, and the processing result is written in the memory headroom that first memory address is directed toward.

If step is then jumped in the also untreated completion of source data after the memory headroom that the first memory address is directed toward is fully written 204, otherwise jump to step 207.

Step 204: accelerating engine triggers an interrupt notification CPU, and the memory headroom to indicate memory address is insufficient.

The interruption that the interruption processing module processing FPGA of step 205:CPU is reported, obtains in second from memory modules application Deposit address.

The interruption processing module of step 206:CPU notifies the second memory address and the second memory address length to FPGA.

After FPGA obtains the second memory address and the second memory address length, the accelerating engine of FPGA is to source data continuation The memory headroom that the second memory address is directed toward is written in the processing result obtained after processing.If the memory that the second memory address is directed toward After space is write completely, the still untreated completion of source data then jumps to step 204, otherwise jumps to step 207.

Step 207:FPGA has handled source data, constructs message BD, and receiving queue, message BD is written in message BD It include result address list and processing result length.Wherein, which includes that memory headroom is written with processing As a result memory address, the first memory address and the second memory address as escribed above.

The Service Processing Module of step 208:CPU reads message BD from receiving queue, then, parses message BD, gets Result address list and processing result length, to be carried out according to the result address list and processing result length to processing result Further processing.

Such processing mode, bring influence is:

1) FPGA is in low memory, needs waiting for CPU continuation application memory headroom and sends new memory address and interior Address size is deposited, in this period, FPGA cannot continue to handle source data, lead to the data processing performance of FPGA not It is high.

2) FPGA exists with CPU and frequently interacts.

3) in order to reduce the interaction of CPU and FPGA, the memory headroom for the memory address that usual CPU is issued is bigger, this meeting Biggish memory headroom is caused to waste.

Wherein, accelerate computing unit when the memory headroom of the memory address of acquisition is not enough to store processing result, need Waiting for CPU supplements memory headroom, in this period, accelerates calculating is single cannot continue with data, causes to accelerate to calculate single number It is not high according to process performance.The problem of this is especially needed concern accelerates to calculate if can solve the technical problem to improve The data processing performance of unit, will save the time of user, and make the operational effect of heterogeneous system more excellent.

For this purpose, the embodiment of the invention provides a kind of data processing method, accelerating computing unit, central processing unit and isomery System, to solve above-mentioned technical problem.

The present embodiments relate to acceleration computing unit include accelerating engine and caching management module, the accelerating engine For accelerating processing request specifically to be handled business, which is increased module, caching management module The cache list for accelerating computing unit to use is managed, the memory address of cache list storage is provided by CPU, i.e., CPU can Memory address is notified to computing unit is accelerated, so that computing unit is accelerated to flush to the memory address in cache list.Add When fast computing unit is needed using memory, memory address directly can be taken out from the cache list.In this way, accelerating computing unit not It needs to notify CPU, the response of passive waiting for CPU, to improve system process performance by interrupt mode.

Wherein, which can be realized by calculating cell encoding to acceleration.

About the memory address stored on acceleration computing unit, the relationship of computing unit and CPU is accelerated to can refer to shown in Fig. 3 Schematic diagram.In Fig. 3, accelerate computing unit by taking FPGA as an example, FPGA random access memory (random in piece Access memory, RAM) storage memory address, the space that memory address is directed toward is the ram space on CPU.

It is appreciated that accelerating the memory address stored on computing unit that can not also be delayed in the form of cache list It deposits.

In the following, i.e. to data processing method provided in an embodiment of the present invention, acceleration computing unit, central processing unit and isomery System is described in detail, and the content of Examples below can refer to the content of above-mentioned Fig. 1 to embodiment illustrated in fig. 3.

Firstly, summarizing data processing method provided in an embodiment of the present invention.

Fig. 4 is a kind of method flow diagram of data processing method provided in an embodiment of the present invention.This method is applied to accelerate Computing unit, the acceleration computing unit include accelerating engine and caching management module, and caching management module is for managing internal memory Location.Specific usage scenario sees the specific descriptions of Fig. 1 and embodiment shown in Fig. 3 and above-mentioned other embodiments.

Refering to Fig. 4, which includes:

Step 401: accelerating engine obtains the business that CPU is sent and accelerates processing request.

Service Processing Module on CPU obtains business and accelerates after handling request, and the Service Processing Module is single to accelerating to calculate The accelerating engine of member sends the business and accelerates processing request, so that accelerating engine gets the business and accelerates processing request, and right It is handled.

Wherein, it includes information relevant to source data to be processed that business, which accelerates processing request, for example including source data Location, alternatively, including source data address and the information such as source data length and processing parameter.

Computing unit is accelerated to be specifically as follows FPGA, GPU, DSP or ASIC etc..

It can be to be obtained by way of message BD that accelerating engine obtains business to accelerate the concrete mode of processing request from CPU It takes, i.e., the Service Processing Module of CPU constructs to obtain message BD, and message BD includes that business accelerates processing request, and accelerating engine is The business, which is got, by message BD accelerates processing request.

Step 402: accelerating engine accelerates processing request to handle business, to obtain processing result.

Accelerating engine gets business and accelerates after handling request, accelerates processing request to handle the business, after processing Obtain processing result.

CPU is responsible for the control of process flow, and computing unit is accelerated to execute specific processing.

Specifically, accelerate the accelerating engine of computing unit to get business and accelerate processing request, business acceleration processing is asked It asks including source data address, source data length and processing parameter.The accelerating engine according to source data address and source data length from Memory reads source data.Then, accelerating engine carries out the specified acceleration calculating of processing parameter to the source data.

Step 403: accelerating engine obtains target memory address to caching management module application memory address.

Wherein, computing unit is accelerated to prestore at least one memory address, target memory address belongs to acceleration computing unit On the memory address that prestores.

The acceleration computing unit includes accelerating engine and caching management module, and the caching management module is for managing internal memory Location.When accelerating engine needs memory address, when the processing result obtained with storing step 402, accelerating engine is to cache management mould Block application memory address.Because the acceleration computing unit prestores at least one memory address, caching management module gets acceleration After the application of engine, accelerating engine is distributed into the target memory address in the memory address prestored.

Wherein, the application in step 403, is specifically as follows: accelerating engine sends a memory address to caching management module Request, memory address request is for requesting memory address to caching management module.Caching management module is with getting the memory After the request of location, target memory address is determined from the memory address for accelerating computing unit to prestore, and the target memory address is sent out Give accelerating engine.

It is appreciated that target memory address can be one or more memory address, accelerating engine can execute it is primary or Multiple step 403, the present invention is not especially limit this.

Accelerate computing unit to prestore at least one memory address, and accelerates computing unit with obtaining the memory for being used for caching The mode of location can be realized by following step, wherein be illustrated for obtaining target memory address.

Memory modules application memory address from the Service Processing Module of step A1:CPU to the CPU, with obtaining target memory Location.

CPU includes Service Processing Module and memory modules, and memory modules accelerate computing unit to use for managing Physical memory space.The memory headroom that Service Processing Module can be used to memory modules application for acceleration computing unit.

Specifically, in order to which to computing unit storage allocation address is accelerated, Service Processing Module is into memory modules application Address is deposited, i.e. Service Processing Module sends memory address request to memory modules, to trigger memory modules from the memory prestored Target memory address is determined in location, and the target memory address is returned into Service Processing Module.

Step A2: Service Processing Module sends target memory address to the caching management module of acceleration computing unit.

The Service Processing Module notice executed on CPU accelerates computing unit to refresh available memory address, so that caching is managed Target memory address is stored in by reason module to be accelerated on computing unit.

Specifically, CPU construction includes the message BD of memory address, and message BD is sent to and accelerates computing unit Caching management module.

Step A3: the caching management module of computing unit is accelerated to refresh available memory address to cache list.

After caching management module gets the target memory address of Service Processing Module transmission, caching management module is by target Memory address, which is stored in, to be accelerated on computing unit.

Specifically, caching management module reads memory address from message BD, is then buffered in the memory address and adds In the cache list of fast computing unit, refresh available memory address to cache list to realize.

It is appreciated that step 402 and step 403 may be performed simultaneously, can also be executed after step 402 with step 403. Processing request is accelerated to handle business in accelerating engine, also untreated complete, when having to part processing result, acceleration Engine is i.e. to caching management module application target memory address；Alternatively, accelerating engine accelerates processing request processing to complete business After obtaining whole processing results, then to caching management module application memory address.

It is appreciated that accelerating engine needs memory there are many actual conditions of triggering acceleration computing unit execution step 403 When space is to store processing result, 403 are thened follow the steps.Specifically, it can be and add when acceleration computing unit gets business Speed processing request, when handling to obtain processing result to business acceleration request, accelerating engine obtains mesh to caching management module application Mark memory address.Alternatively, the memory headroom write-in processing result for accelerating computing unit to be directed toward to memory address, in the memory headroom After writing completely, business accelerates processing to request still untreated completion, at this time in order to store remaining processing result, accelerates computing unit It need to be to caching management module application memory address.

In the embodiment that the present invention has, step 401 can be that accelerating engine obtains the business acceleration processing that CPU is sent Request and memory address write what the full CPU was sent when accelerating engine handles the processing result that the business accelerates processing request to obtain The memory headroom that memory address is directed toward, and the business accelerates processing request still untreated complete, then accelerating engine executes step 403.

Step 404: the memory headroom that processing result write-in target memory address is directed toward by accelerating engine.

Accelerating engine accelerates processing request to handle business, obtains all or part of processing result, accelerating engine to Caching management module application memory address, obtains target memory address, then, all or part that accelerating engine can obtain this The memory headroom that target memory address is directed toward is written in processing result.

Step 405: accelerating engine sends target memory address to CPU.

After accelerating engine, which has handled business, accelerates processing request, accelerating engine is completed processing result in target memory Write-in in location, accelerating engine can send the target memory address to CPU, so that CPU reading is buffered in the target memory address Processing result on the memory headroom of direction carries out the processing of next step.

Specifically, accelerate computing unit after the memory headroom for being directed toward processing result write-in target memory address, add Fast computing unit handles the message BD completed according to target memory address architecture and notifies CPU, and then, CPU is received from receiving queue Message BD parses BD, obtains target memory address, obtains processing result according to the target memory address.

It is appreciated that accelerating engine can send target memory address to CPU, it can also be to CPU with sending target memory Location and other information, other information are, for example, processing result length, so that CPU is long according to the target memory address and processing result Degree obtains processing result, and wherein processing result length can accelerate processing request to obtain when handling business by accelerating engine It arrives.When accelerating engine sends target memory address to CPU, CPU can be by reading the acquisition of information on target memory address To processing result length, to obtain processing result.

In conclusion the data processing method of the embodiment of the present invention, because being arranged in for managing on accelerating computing unit The caching management module for depositing address accelerates the accelerating engine of computing unit to accelerate at processing request the business that CPU is sent Reason, to obtain processing result.Accelerating engine needs to store the processing result, for this purpose, accelerating engine can be to caching management module Shen Please memory address, obtain being pre-stored in and accelerate the target memory address on computing unit that the target then is written in the processing result On the memory headroom that memory address is directed toward, after accelerating engine sends target memory address to CPU, CPU can be by the target It deposits address and reads the processing result.Because accelerating engine passes through the acquisition of memory address real to caching management module application Existing, the accelerating engine and caching management module are located on same acceleration computing unit, the memory address that accelerating engine application is arrived For be pre-stored in accelerate computing unit on memory address, in this way, accelerating engine can quick obtaining arrive memory address, with store handle The processing result that business accelerates processing request to obtain.Compared with accelerating engine is to the scheme of CPU application memory address, the present invention is real Response of the scheme of example without waiting for CPU to memory address is applied, the waiting time that accelerating engine obtains memory address is reduced, from And can quick obtaining memory address, with use the memory address store processing result, realize the raising of service process performance.

Data processing method provided in an embodiment of the present invention is described in detail below.

Refering to Fig. 5 and Fig. 6, Fig. 5 is the flogic system block diagram of the data processing method of the embodiment of the present invention, and Fig. 6 is Fig. 5 institute Show the method flow diagram of embodiment.With reference to the content and other embodiments above of Fig. 1 and embodiment illustrated in fig. 4, the present invention The data processing method of embodiment is applied on heterogeneous system as shown in Figure 1, which includes CPU and accelerate to calculate single Member, the CPU include Service Processing Module and memory modules, which includes accelerating engine and caching management module, The caching management module is used for managing internal memory address.Wherein, accelerating computing unit may include one or more accelerating engines, no Same accelerating engine can handle different business simultaneously and accelerate processing request.

It is below FPGA as example to accelerate computing unit to more intuitively describe method shown in the embodiment of the present invention It is illustrated, it will be understood that the acceleration computing unit of the embodiment of the present invention can also be other types of acceleration device.

Refering to Fig. 5 and Fig. 6, and with reference to the content of the various embodiments described above, the data processing method packet of the embodiment of the present invention It includes:

Step 601: Service Processing Module obtains target memory address to memory modules application memory address.

On CPU, Service Processing Module is to memory modules application target memory address.

Wherein, the physical memory space that memory modules management uses for FPGA, the physical memory space can be from behaviour Make system (Operating System, OS) startup stage is just reserved or system run-time memory management module to pass through The interface application of OS obtains.Specifically, memory modules can manage the memory sky of memory address direction by memory address Between.

In order to FPGA storage allocation address, Service Processing Module on CPU is to memory modules application memory address, example Such as, Service Processing Module calls memory modules to apply for memory address by code interface, and memory modules are from the memory prestored Target memory address is obtained in location, and the target memory address is sent to Service Processing Module, thus Service Processing Module Shen It please obtain target memory address.

In the embodiment that the present invention has, when memory modules initialize, according to business demand, physical memory space is divided For fixed block size.

For example, the division mode in physical memory space is as shown in Figure 7.Assuming that when system initialization, memory modules are from the Shen OS Please 100MB memory headroom, initial address 0x6400000, end address 0xC800000, then this section of space according to 2MB size is divided into 50 pieces.When Service Processing Module application memory address, with returning to the memory of one of free block every time Location.

Optionally, in the embodiment for increasing memory address state instruction field, memory modules are set when initializing in corresponding The internal storage state word of counterfoil is the free time.When Service Processing Module is to memory modules application memory address, memory modules return to one Internal storage state word is the initial address of idle memory block.It is seen about the embodiment for increasing memory address state instruction field Following detailed description.

Step 602: Service Processing Module sends target memory address to the caching management module of FPGA.

After Service Processing Module gets target memory address, which is sent to the cache management of FPGA Module, so that caching management module gets target memory address.The target memory address is memory address workable for FPGA.

Specifically, Service Processing Module is at least wrapped according to the target memory address architecture notification message, the notification message The target memory address applied is included.Then Service Processing Module sends the notification message to the caching management module of FPGA BD, notice FPGA refresh available memory address.

In the embodiment that the present invention has, which further comprises configuration parameter, which is used to indicate interior Deposit address flush to cache list method.Cache list is specific storage mode of the memory address on FPGA.

It is appreciated that CPU notice FPGA refreshes available memory address, can there are many concrete implementation modes, it is as follows For two examples:

Mode one, CPU, which construct message BD according to memory address and send message BD, discharges queue to memory, wherein the message BD includes the information such as type of message, memory address, configuration parameter.Caching management module is obtained by reading memory release queue Message BD is got, so that parsing message BD obtains memory address.

Mode two, CPU directly configure the register of FPGA.The information such as cache address, configuration parameter on the register.

Step 603: target memory address is stored on FPGA by caching management module.

Caching management module is used for managing internal memory address, and caching management module gets the target memory address of CPU transmission Afterwards, target memory address is stored on FPGA by caching management module.In this way, can be realized on FPGA to the pre- of memory address It deposits.

Wherein, the different modes of target memory address are sent corresponding to CPU, caching management module obtains target memory address Mode also there are many.For example, corresponding to above-mentioned mode one and mode two, caching management module receives target memory address Concrete mode can be as follows:

Mode one: FPGA reads memory and discharges queue, parses message BD, and memory address, configuration ginseng are obtained from message BD The information such as number, and memory address is put into cache list.

Mode two: FPGA obtains memory address, configuration parameter by register, and memory address is put into cache list.

In embodiments of the present invention, memory address is stored on FPGA in a manner of list.In CPU to caching management module In the embodiment for sending configuration parameter, because configuration parameter is used to indicate the method that memory address is put into cache list, cache management Module can specifically cache the target memory address got according to the configuration parameter.For example, by target memory Location is put into before cache list, and similar last in, first out (Last In Fast Out, LIFO), in this way, FPGA need using When memory address, the memory address can be preferentially used.Alternatively, configuration parameter, which can also indicate that, is put into caching for target memory address List backmost, in this way, to realize all memory blocks are used in turn.

It is appreciated that in an embodiment of the present invention, being not limited to the caching of memory address with cache list on FPGA Mode realize.

About the cache address on FPGA, also there are many specific modes.It is as follows:

1) compression memory address is cached.

In the embodiment that the present invention has, on FPGA, cached again after being compressed to memory address, to reduce memory Address the space occupied.For example, as described above, the memory address prestored on FPGA is according to bit bit compression of alignment When memory address, caching management module can be aligned bit according to memory address, realize the compression of memory address.

For example, memory address schematic diagram as shown in Figure 7, since memory address is 2MB alignment, low 20 bit all It is 0, so as to 0 leave out this low bit, realizes the compression of memory address, then cache compressed memory address.Such as figure Shown in 8, the memory address that cache list is written is 0x64,0x66,0x68,0x6a, and the initial address for respectively indicating memory block is 0x6400000、0x6600000、0x6800000、0x6a00000。

It 2) is memory address configuration check value.

In the embodiment that the present invention has, on FPGA, check value can be set for memory address, i.e., so that accelerometer calculates list The memory address prestored in member is preset with check value, so that FPGA can be according to the verification if mistake occurs for the memory address of caching Value detects the mistake of memory address, to guarantee the memory address that accelerating engine uses for correct memory address, to guarantee The normal operation of the embodiment of the present invention.

Wherein, which can be parity values.

Concrete operations can be, after caching management module gets target memory address, according to the tool of target memory address Parity values are calculated in body value, target memory address are being stored when accelerating on computing unit, and parity values are attached It is added in the end of the target memory address, so that being embodied as the target memory address presets check value.Further cache management module When obtaining the target memory address, which is carried out that parity values are calculated, is then obtained after use Parity values and the parity values of caching compare, to realize the verifying to the target memory address.

For example, memory address schematic diagram as shown in Figure 7, configuration are for the memory address of the FPGA memory headroom used 0x6400000~0xC800000, only 28 bit are effectively, then optional bit30, bit31 storage check value, and bit0~ Bit27 stores memory address.

It is appreciated that one of which can be used in above-mentioned two ways, can also be together in concrete implementation mode It uses.

Optionally, in the embodiment for increasing memory address state instruction field, memory address is put into caching column by FPGA When table, i.e., memory address is stored when on FPGA, the internal storage state word for setting the memory address occupies for FPGA, for the core of CPU It looks into module and verifies use.

The value of internal storage state word corresponds to the current state of memory block.The internal storage state word can be configured with the letter such as timestamp Breath.Wherein, the current state of memory block includes following three kinds of states, i.e. the value of internal storage state word includes following three kinds:

1) Idle state: memory block is init state, and the memory address of the memory block flushes to the caching of FPGA not yet In list.

2) FPGA occupies: the memory address of memory block has flushed in the cache list of FPGA, this can be used in FPGA Memory block.

3) CPU occupies: memory block can be used by CPU, and CPU can read the data of memory block caching.

Following detailed description is seen about the embodiment for increasing memory address state instruction field.

Through the above steps 601 to step 603 execution, FPGA has got memory address, and the memory address is delayed Exist in the cache list on FPGA, realize and memory address is prestored, to be counted for FPGA using memory address According to storage prepare.

It is appreciated that the description of above-mentioned steps is illustrated by taking target memory address as an example, the side of the embodiment of the present invention Method can also repeat above-mentioned steps to multiple and different memory address, so that caching multiple memory address on FPGA.

The memory address how FPGA is prestored using this is described below, to realize accelerating engine to the fast of memory address Speed obtains, to improve the business processing efficiency of the heterogeneous system.

Step 604: Service Processing Module sends business to the accelerating engine of FPGA and accelerates processing request.

Service Processing Module obtains business and accelerates after handling request, and Service Processing Module is sent to the accelerating engine of FPGA should Business accelerates processing request, so that the business that accelerating engine gets CPU transmission accelerates processing request.Then, FPGA processing should Business accelerates processing request, and specific treatment process is as shown in the description of subsequent step.

Wherein, business processing request includes business to be processed.For example, the business accelerate processing request may include to The information such as source data address, source data length and the processing parameter of processing.

Specifically, the Service Processing Module construction of CPU is sent to the message BD of FPGA, and the content of message BD includes The information such as source data address, source data length and processing parameter to be processed.Then it sends message BD in transmit queue, So that the accelerating engine of FPGA reads message BD from the transmit queue, to get source number after parsing message BD According to information such as address, source data length and processing parameters.

Step 605: accelerating engine accelerates processing request to handle business, to obtain processing result.

Accelerating engine gets business and accelerates after handling request, accelerates processing request to handle the business, to obtain Processing result.

Specifically, when business processing request includes the information such as source data address, source data length and processing parameter, add Fast engine gets the business and accelerates after handling request, and accelerating engine is obtained according to source data address and source data length from memory Then source data to be processed carries out the specified acceleration of processing parameter to the source data and calculates.

About the specific implementation of step 605, Fig. 9 is seen, Fig. 9 is the specific implementation that embodiment illustrated in fig. 6 is related to Process schematic.As shown in figure 9, step 901 is the specific implementation process of step 605, in step 901, accelerate processing submodule The source data that block requests business processing is handled after obtaining processing result, accelerates processing submodule that the processing result is written The ram in slice space of FPGA, wherein the ram in slice plays the role of caching, meeting Reusability, with specific reference to retouching for step 905 It states.

Step 606: accelerating engine obtains target memory address to caching management module application memory address.

Execution through the above steps prestores at least one memory address, wherein target in the cache list of FPGA Memory address belongs to the memory address for accelerating to prestore on computing unit.

For the processing result stored, accelerating engine is needed to caching management module application memory address.Specifically Application process may is that accelerating engine is sent to caching management module for the memory to caching management module request memory address Address requests are determined from the memory address of the cache list prestored after caching management module gets memory address request Then accelerating engine is distributed in the target memory address by target memory address out.

It is appreciated that the quantity of target memory address can be one or more.

For example, as shown in figure 9, step 902 to step 904 is the specific implementation process of step 606.Wherein, accelerating engine Including accelerating processing submodule, result to write back submodule and ram in slice.

In step 902, accelerates processing submodule to write back submodule to result and send message, the message is for triggering result Submodule work is write back, wherein the message can be triggers to obtain by electric signal.

In step 903, when accelerating engine needs storage of the memory headroom to carry out processing result, submodule is as a result write back Block sends message to caching management module, which is memory address request, and memory address request is for cache management Module application memory address.Specifically, submodule is as a result write back to judge whether to need from caching management module application memory, such as Fruit needs to apply, then sends memory address request message to caching management module.Wherein, the message of memory address request can lead to Electric signal is crossed to trigger to obtain.Specific deterministic process is that do not have memory address when result writes back submodule；Or result writes back son When the memory headroom that the memory address that module has obtained is directed toward has write full, then result writes back submodule and sends memory address request Message.

In step 904, it as a result writes back submodule and obtains the target memory address that caching management module is sent.

After caching management module gets memory address request, caching management module is from first in first out (Fist In Fist Out, FIFO) first memory address cached, i.e. target memory address are taken out in queue, and the target memory address is sent to As a result submodule is write back.In an embodiment of the present invention, after caching management module takes out target memory address, caching management module The owner pointer value of mobile management fifo queue is needed, then gets the memory address request that result writes back submodule transmission next time When, caching management module returns to next memory address.Wherein, caching management module is realized by encoding to FPGA.It should Fifo queue is cache list.

In the embodiment that the present invention has, corresponding to different cache way of the memory address on FPGA, cache management Module to accelerating engine return target memory address mode also there are many, for example, correspond to memory address compressed cache and Check value is configured, step 606 there are following two kinds of specific implementation procedures.

1) memory address prestored on FPGA is the memory address according to bit bit compression of alignment.

Step B1: accelerating engine sends memory address request to caching management module.

Wherein, memory address request is for requesting memory address to caching management module.

The specific implementation of step B1 can refer to the detailed description of the step 903 of Fig. 9.

Step B2: caching management module decompresses target memory address for bit according to alignment, the mesh decompressed Mark memory address.

Caching management module determines target memory address after getting memory address request from cache list.Because slow It deposits the memory address prestored in list to store using compression method, then cache module needs to decompress memory address, at this In inventive embodiments, caching management module decompresses target memory address for bit according to alignment, the target decompressed Memory address.

The concrete mode of cache address can refer to the content of caching compression memory address above on FPGA.

For example, as shown in fig. 7, the memory address that FPGA is obtained is that 2MB is aligned, will be interior by mode as shown in Figure 8 Deposit low bit of address 0 is left out, and after the compression for realizing memory address, FPGA stores the memory address of the compression.Cache management Module determines the target memory address 0x64 of compression after getting memory address request, then that the address value of 0x64 is extensive It is again 0x6400000, the target memory address decompressed.

Step B3: accelerating engine obtains the target memory address for the decompression that caching management module is sent.

After the completion of memory address decompression, the target memory address of the decompression is sent to acceleration and drawn by caching management module It holds up, for accelerating engine use.

The specific implementation of step B3 can refer to the detailed description of the step 904 of Fig. 9.

By compressing memory address, so that low bit of address value does not store in cache list, it is possible to reduce memory The position bit of address, the size of cache list needed for reducing are equivalent to improve power system capacity.Because being put into the memory of cache list Address is typically all to be aligned, for example, 64 byte-aligneds, then 6 bit minimum address values are 0, to be stored in caching The address value of list can remove this 6 bit.When caching management module storage allocation address is to accelerating engine, it is supplemented deleting Low bit removed.Remove it is bit low by way of carry out address compression, bit needed for reducing cache address, The size for reducing required cache list facilitates lifting system capacity under same cache list size.

2) memory address prestored on computing unit is accelerated to be preset with check value.

Step C1: accelerating engine sends memory address request to caching management module.

The specific implementation of step C1 can refer to the detailed description of the step 903 of Fig. 9.

Step C2: the check value of target memory address is calculated in caching management module.

Caching management module takes out target memory address after getting memory address request from cache list, because slow The memory address prestored in list is deposited configured with check value, then caching management module is according to the prewired check value to target memory Address is verified, to carry out address legitimacy verifies.Checking procedure are as follows: target memory address is calculated in caching management module Check value, obtain calculated check value, then matched using the calculated check value and the check value that prestores, if It is identical for matching, and verification passes through, and otherwise, verification does not pass through.When verification passes through, caching management module is just returned to accelerating engine Target memory address.

Step C3: when the matching of the preset check value for the check value and target memory address being calculated, cache management Module sends target memory address to accelerating engine, so that accelerating engine obtains target memory address.

If the check value of the check value and target memory address that are calculated in the step C2 of target memory address prestored Matching, when as identical, then it represents that the target memory address check passes through.The memory address that passes through is verified in storing process not Mistake occurs, when caching management module determines the memory address that the verification passes through from cache list, which is positive True memory address.To which caching management module can return to the target memory address to accelerating engine, so that accelerating engine makes With.

For example, check value be parity values when, caching management module by target memory address caching on FPGA Cache list on when, calculate the parity values of target memory address, and by the target memory address and the even-odd check Value correspondence is stored in cache list.For example, the parity values are added at the end of target memory address, and by the two one Play caching.The surprise of the target memory address is calculated when obtaining target memory address from cache list in caching management module Then even parity value is compared using the parity values cached in the latter parity values and cache list being calculated Compared with if two parity values are identical, caching management module sends the target memory address to accelerating engine.

By making the memory address for accelerating to prestore on computing unit preset check value, it is ensured that the memory address used is just True property, so as to promote heterogeneous system reliability.

Because memory address is on one section of ram space being buffered in inside FPGA, for example, memory address is buffered in caching In list, cache list is one section of ram space inside FPGA, this is arranged there may be failing so as to cause from caching The memory address that table is read is incorrect.When memory address is stored into cache list, the check value of the memory address is calculated, is deposited Enter idle bit of the memory address.Caching management module storage allocation address is to accelerating engine in use, with can calculating memory Whether the check value of location and the bit value of check bit are identical, pass through the check value, it can be ensured that the memory that accelerating engine application is arrived Location is correctly, to increase the reliability of heterogeneous system.

Step 607: the memory headroom that processing result write-in target memory address is directed toward by accelerating engine.

Processing business can be accelerated processing to request obtained processing result by accelerating engine after getting target memory address The memory headroom that target memory address is directed toward is written.

For example, in step 905, as a result writing back submodule refering to Fig. 9 and processing result being requested to write back step by PCIE The memory headroom that the target memory address of 904 applications is directed toward.Wherein, as a result writing back submodule and writing back the condition of memory headroom is: Processing result reaches a certain amount of, such as 512Byte；Or step 901 accelerates processing request processing to complete business.

It is appreciated that accelerating engine can be one or many to caching management module application memory address.

For example, accelerating engine accelerates processing request to handle business, to obtain processing result, at this point, accelerating engine Memory headroom is needed to store the processing result, for this purpose, accelerating engine obtains a memory address from caching management module application.So Afterwards, which is written the memory headroom of memory address direction by accelerating engine, if the memory that the memory address is directed toward is empty Between can cache whole processing results, then accelerating engine can not have to again to caching management module application memory address.If the memory After address is write completely, which accelerates processing to request still untreated completion, then accelerating engine is applied another again to caching management module Then processing result is continued to write to the memory headroom that another memory address is directed toward by memory address.

For example, in the embodiment that has of the present invention, step 606 and step 607 are implemented as follows:

Step D1: accelerating engine obtains the first memory address to caching management module application memory address.

Wherein, the first memory address is one of the memory address prestored on FPGA.

Accelerating engine processing business accelerates processing request, processing result is obtained, at this point, accelerating engine needs memory headroom Cache the processing result.For this purpose, accelerating engine to caching management module application memory address, i.e. accelerating engine executes step 606.

Step D2: the memory headroom that the first memory address is directed toward is written in the first processing result by accelerating engine.

First processing result belongs to the processing result in step 605.Accelerating engine processing business accelerates processing request, may After obtaining the processing result of part, need for the processing result of the part to be stored on the memory headroom that memory address is directed toward. First processing result is the processing result of part.

Step D3: when the first memory address be directed toward memory headroom be fully written, and business accelerate processing request it is untreated complete Cheng Shi, accelerating engine obtain the second memory address to caching management module application memory address.

Second memory address is one of the memory address prestored on FPGA.

If the memory headroom that the first memory address is directed toward is fully written, and the still untreated complete business acceleration processing of accelerating engine is asked It asks, accelerating engine needs memory headroom also to continue to store processing business and processing is accelerated to request obtained subsequent processing result.For This, accelerating engine applies for memory address to caching management module again, obtains the second memory address.

Step D4: the memory headroom that the second memory address is directed toward is written in second processing result by accelerating engine.

Second processing result belongs to the processing result in step 605.

Accelerating engine continues with business and accelerates processing request, continues to obtain processing result, the processing knot continued Fruit is second processing as a result, the memory headroom that the second memory address is directed toward can be written in the second processing result by accelerating engine.

If after the memory headroom of the second memory address is write completely, business accelerates processing to request still untreated completion, then accelerate to draw It holds up and repeats step D3, D4 until business accelerates processing request to be executed.

If the memory headroom that the first memory address and the second memory address are directed toward can cache processing business acceleration processing Obtained processing result is requested, then in step 608, accelerating engine sends the first memory address and the second memory address to CPU.

Step 608: accelerating engine sends target memory address to CPU.

The accelerating engine of FPGA will handle obtained place after the business that processing completion CPU is sent accelerates processing request After managing the memory headroom that result write-in target memory address is directed toward, which is sent to CPU by accelerating engine, so that The Service Processing Module of CPU obtains the processing result according to the target memory address.

For example, accelerating engine constructs message BD, message BD includes the letter such as target memory address, processing result length Breath.Wherein, which can be recorded in the form of result address list.Then, accelerating engine is by the message BD is sent to CPU.

It is appreciated that target memory address is the memory address of memory headroom caching process result, target memory address Number of addresses can be one or more, for example, target memory address includes in first in the example of above-mentioned step D4 Address and the second memory address are deposited, so that first memory address and the second memory address are sent to the industry of CPU by accelerating engine Business processing module.

It is appreciated that accelerating engine can send target memory address to CPU, in the implementation having in the embodiment having In example, accelerating engine can also send target memory address to CPU and other information, the other information include that processing result is long Degree.

Optionally, FPGA is when executing step 608, or executes step 608 front and back, can be by the memory of target memory address Status word is set as CPU and occupies.Specifically, the accelerating engine of FPGA will handle obtained processing result write-in target memory After the memory headroom that address is directed toward, the internal storage state word of target memory address is set CPU by accelerating engine to be occupied.For CPU Verification module verify use.

For example, the concrete structure schematic diagram of FPGA as shown in Figure 9, which includes that result writes back submodule, the result Writing back submodule can be also used for setting internal storage state word.As a result it writes back submodule and meets the eye on every side the corresponding memory of mark memory address writing When block, or receive acceleration request processing complete when, setting target memory address internal storage state word occupy for CPU, specifically may be used Think, the internal storage state word for setting the target memory address in result address list occupies for CPU.Wherein, result address list by The target memory address of write-in processing result obtains.It is seen hereinafter about the embodiment for increasing memory address state instruction field Detailed description.

Step 609: the memory headroom reading process result that Service Processing Module is directed toward from target memory address.

After Service Processing Module obtains the target memory address that accelerating engine is sent, the interior of target memory address direction is read Deposit processing result spatially.

For example, the Service Processing Module executed on CPU reads receiving queue, message BD is obtained, message BD is then parsed, To obtaining result address list and processing result length, then according to processing result length from the target in result address list Reading process result on the memory headroom that memory address is directed toward.

In this way, the business that Service Processing Module gets step 604 accelerates processing to request corresponding processing result.

Step 610: Service Processing Module sends target memory address to caching management module.

After the processing result of the memory headroom caching of target memory address is obtained by Service Processing Module, FPGA can be again sharp With the target memory address, for this purpose, Service Processing Module sends target memory address to caching management module, so that cache management Target memory address is stored on FPGA by module.To realize the recycling of the Service Processing Module on CPU target memory address, And FPGA is notified to refresh available memory address.

The specific implementation of step 610 can refer to the detailed description of step 602.

Step 611: target memory address is stored on FPGA by caching management module.

After the processing result on memory headroom that target memory address is directed toward is read by CPU, caching management module is obtained The target memory address that CPU is sent, and target memory address is stored on FPGA.It the target memory address can be again by FPGA It utilizes.

The specific implementation of step 611 can refer to the detailed description of step 603.

The execution of step 610 and step 611, so that the processing result that CPU handles FPGA is taken away from memory headroom Afterwards, the memory address of the memory headroom can be reused.To which by the execution of step 610, it is slow that CPU notifies that FPGA refreshes List is deposited, so that target memory address is reentered into cache list.By the flush mechanism, memory address can make repeatedly With, it is ensured that FPGA has available memory address, can promote the process performance of heterogeneous system in this way.

It is appreciated that step 610 and step 611 can not execute in the embodiment that the present invention has, for example, business processing After module reads the processing result on the memory headroom that target memory address is directed toward, which can be released to memory Module.

In the embodiment that the present invention has, the Service Processing Module of CPU is in addition to that can send industry to the accelerating engine of FPGA Business accelerates processing request, can also send memory address to the accelerating engine, the acquisition of the memory address can refer to step 601. At this point, the accelerating engine, which handles the business, accelerates processing request, processing result is obtained, and the CPU is written into processing result and is sent The memory headroom that is directed toward of memory address on, write the full and business when the memory headroom and processing accelerated to request still untreated completion When, then accelerating engine executes step 606, to obtain target memory address from caching management module application, then by processing business The processing result for accelerating processing request to obtain continues to write to the target memory address, i.e. execution step 607.In this way, accelerating engine After the processing completion business accelerates processing request, because of the memory headroom of the CPU memory address sent and target memory address direction It is written with processing result, so that accelerating engine sends the memory address and target memory that the CPU is sent to Service Processing Module Location, so that CPU gets processing result according to these memory address.

It is previously mentioned the invention also includes the embodiment for increasing memory address state instruction field, is now said about the embodiment It is bright as follows:

In the embodiment that the present invention has, it also is configured with state instruction field for memory address, passes through the case pointer Section can be recycled abnormal memory address, to improve the reliability of the method for the embodiment of the present invention.

It specifically, may be because of certain originals because when CPU takes the processing result that FPGA is handled from memory headroom away Because leading to application exception, FPGA flush buffers list is not notified, i.e. step 610 and step 611 is not carried out, to lead The memory address of the memory headroom is caused to lose.And the state instruction field of memory address contains the shape of memory address the last time The information such as state information and timestamp, the verification module of CPU pass through the information of state instruction field, can CPU processing is abnormal and The memory address of loss flushes to cache list again, with lifting system reliability.

Use about state instruction field is now illustrated below:

In the embodiment that the present invention has, state instruction field includes internal storage state word.In the following, being with target memory address Example is described.

Target memory address configuration has internal storage state word, and the value of internal storage state word is occupied including acceleration computing unit and CPU Occupy, wherein the acceleration computing unit of target memory address occupies for indicating that target memory address has been stored in accelerometer It calculates on unit, target memory address is used by acceleration computing unit.The CPU of target memory address occupies for indicating target memory The processing result for the memory headroom caching that address is directed toward can be read by CPU, and target memory address is used by CPU.

In the embodiment that the present invention has, acceleration computing unit is FPGA, at this time the acceleration computing unit of internal storage state word Occupy specially FPGA to occupy.

It is appreciated that the value of internal storage state word can also include Idle state, target memory in the embodiment that the present invention has The Idle state of the internal storage state word of address is used to indicate that the memory headroom of target memory address to be init state, target memory Location is stored not yet onto acceleration computing unit, such as is not flushed to also in the cache list of FPGA.

The value of internal storage state word is arranged to obtain by acceleration computing unit, such as can be write back by the result of acceleration computing unit Submodule executes.

About accelerating computing unit to operate the setting of memory status word, description above is seen.

For example, in step 603, caching management module stores target memory address when accelerating on computing unit, delay Management module is deposited to set the internal storage state word of target memory address to computing unit is accelerated to occupy.In step 608 or step 608 front and backs, the accelerating engine of FPGA after the memory headroom that will handle obtained processing result write-in target memory address direction, Caching management module sets CPU for the value of the internal storage state word of target memory address and occupies, so that target memory address The value of internal storage state word occupied by acceleration computing unit and switch to CPU and occupy.And in step 611, caching management module is by mesh Mark memory address, which is stored in, to be accelerated on computing unit, and the internal storage state word of target memory address is set as adding by caching management module Fast computing unit occupies, and switchs to accelerate computing unit so that the value of the internal storage state word of target memory address is occupied by CPU Occupy.

CPU further includes verifying module, and the verification module is for carrying out verification recycling to memory address.

The specific process for verifying recycling memory address are as follows:

After step 608, the method for the embodiment of the present invention further include:

Step E1: the value for the internal storage state word for verifying module detection target memory address, which is continuously the time that CPU occupies, is It is no to be greater than preset time；If the value of the internal storage state word of target memory address is continuously the time that CPU occupies and is greater than preset time, Then follow the steps E2.

If target memory address is not lost, by the setting procedure of the value of above-mentioned internal storage state word, with target memory Alternately accelerated computing unit and CPU are used for address, and the value of the target memory address can be accounted for accelerating computing unit to occupy with CPU Alternately change between having.So the value of the internal storage state word of target memory address is continuously time that CPU occupies when being greater than default Between, it indicates that target memory address be the memory address of loss, that is, is likely to processing result that CPU handles FPGA from memory When space is taken away, target memory address belonging to the memory headroom is lost by CPU, is no longer participate in CPU and is accelerated computing unit Use process.Accordingly it is desirable to be recycled to the target memory address.If the value of the internal storage state word of target memory address continues It is not more than preset time for the time that CPU occupies, then it represents that the target memory address is not set to the memory address of loss, thus nothing Target memory address need to be recycled.

Wherein, the setting of preset time can rule of thumb or after experiment statistics obtains, and is preset by user.

Step E2: Service Processing Module sends target memory address to caching management module.

If the value of the internal storage state word of target memory address is continuously the time that CPU occupies and is greater than preset time, CPU's Service Processing Module is recycled the target memory address, and Service Processing Module sends the target to caching management module Memory address, so that FPGA reuses the target memory address.

Step E3: target memory address is stored in by caching management module to be accelerated on computing unit.

It is greater than preset time, caching pipe because the value of the internal storage state word of target memory address is continuously the time that CPU occupies It manages module and obtains the target memory address that CPU is sent.Then, target memory address is stored in acceleration and calculated by caching management module On unit, to realize the re-using to the target memory address, the reliability of system is improved, and avoid memory The decline of memory usage caused by loss.

Whether the value about the internal storage state word for verifying module detection target memory address is continuously the time that CPU occupies big In preset time specific practising way there are many, such as:

Example one:

Caching management module is the memory shape when the value of the internal storage state word to target memory address is configured State word configuration setting time, the setting time are used to record the time of the value setting of internal storage state word.

To which step E1 is specifically included:

When verification module detects that the value of the internal storage state word of target memory address is occupied for CPU, the internal storage state is judged Whether the difference between setting time and the current time of word is greater than preset time, if the setting time of the internal storage state word and working as Difference between the preceding time is greater than preset time, then it represents that the value of the internal storage state word of target memory address is continuously CPU and occupies Time be greater than preset time, then follow the steps E2.

Wherein, it verifies module and detects that the value of the internal storage state word of target memory address can be every prefixed time interval It is detected.The current time is to verify module to detect that the value of the internal storage state word of target memory address is when CPU occupies Time.

Example two:

In the embodiment that the present invention has, state instruction field includes internal storage state word, verifies status word and state synchronized The information such as time.

Target memory address is also configured with verification status word.

It verifies status word and the state synchronized time is corresponding, the state synchronized time is used to indicate that the value for verifying status word is synchronous For internal storage state word value when time；

The value of the verification status word of target memory address by target memory address synchronous under synchronous condition internal storage state The value of word obtains, and synchronous condition is the value of the verification status word of target memory address and the internal storage state word of target memory address It is worth not identical；

The value of the internal storage state word of target memory address is continuously time that CPU occupies and is greater than preset time specifically: mesh The value for marking the value of the internal storage state word of memory address and the verification status word of target memory address is all that CPU occupies, and in target It deposits the difference between the state synchronized time of address and current time and is greater than preset time, current time is to detect target memory The value of the verification status word of the value and target memory address of the internal storage state word of address is all the time that CPU occupies.

It is executed in operation specific, the value of the internal storage state word of the verification module detection target memory address of step E1 is held Continue whether the time occupied for CPU is greater than preset time, specifically include:

If the value of the verification status word of target memory address is identical with the value of the internal storage state word of target memory address, when Verify module detect the value of the verifications status word of target memory address occupy for CPU, the state synchronized of target memory address when Between difference between current time be greater than preset time and within a preset time interval the internal storage state word of target memory address Value when having not been changed, indicate that the value of the internal storage state word of target memory address is continuously the time that CPU occupies and is greater than preset time, To which Service Processing Module executes the step of sending target memory address to caching management module.

Wherein, current time is to verify module to detect that the value of the verification status word of target memory address is what CPU occupied Time.Because the value of the verification status word of target memory address is identical with the value of the internal storage state word of target memory address, thus it is current Time is also that the value of the value for detecting the internal storage state word of target memory address and the verification status word of target memory address is all The time that CPU occupies.Difference between the state synchronized time of target memory address and current time is greater than preset time, indicates The state synchronized time has timed, out.

Wherein, the method that whether value of the detection internal storage state word of target memory address within a preset time interval changes has It is a variety of, for example, in FPGA when the value of the internal storage state word to target memory address is configured, while being the internal storage state word Setup time stamp, which is used to update the time value of state indicating bit, according to the internal clocking monotonic increase of FPGA.I.e. Timestamp can record the time of the value setting of internal storage state word.

It is the state instruction in 3 seconds according to preset time (such as 3 seconds) regular check internal storage state word due to verifying module Position might have multiple variation, such as occupy from CPU → FPGA occupies → CPU occupies, by the timestamp, verification module be can determine that It is variation on earth that the CPU of state indicating bit, which occupies, again without variation.

Alternatively, being simultaneously when the value of the internal storage state word to target memory address is configured in caching management module The internal storage state word configuration status sequence number, the every variation of the internal storage state word of memory address is primary, then caching management module is incremented by The sequence number is primary, which can be cyclic sequence number.By the status switch number, verifying module can determine that state It is variation on earth that the CPU of indicating bit, which occupies, again without variation.

In order to be judged based on the timestamp or status switch number memory status word, the verification of the embodiment of the present invention Status word, which is also configured with, verifies timestamp or check sequence number, synchronizes by the value of internal storage state word as the value of verification status word When, verification timestamp synchronization be internal storage state word timestamp or the check sequence number synchronize be internal storage state word state Sequence number.It is subsequent, pass through the comparison of verification timestamp and the timestamp of internal storage state word or check sequence number and status switch Number comparison, that is, can determine the internal storage state word of the internal storage state word of memory address target memory address within a preset time interval Value whether change.If verifying timestamp identical with the timestamp of internal storage state word or check sequence number and status switch number It is identical, then it represents that internal storage state word has not been changed, and is otherwise change.

In the example of the example two, using verify carry out status word and synchronization time CPU occupy whether Chao Shi judgement, because It verifies status word and is arranged to obtain by CPU synchronization time, so that use of the CPU to status word and synchronization time is verified is facilitated, To improve the treatment effeciency of CPU.For example, if directly judging the CPU of internal storage state word using the timestamp of FPGA setting Occupy whether time-out, need CPU to know the clock stamp clocking method of FPGA, and be converted to the time (such as the second) of CPU, this is unfavorable for The raising for the treatment of effeciency.Also, in order to save logical resource, the bit wide of timestamp only has 30bit.Because time-out time compares Long, for example, 10 minutes, timestamp may be overturn in 10 minutes, and verify the time and for example there was only 3 seconds, use the state of verification Carry out word and synchronization time CPU occupy whether Chao Shi judgement, it is ensured that timestamp will not be overturn in 3 seconds.

Example three:

The value of the internal storage state word of target memory address is set CPU by caching management module when occupying, for the memory shape The setting countdown of state word starts to identify, and countdown starts mark and starts for triggering between carrying out countdown default countdown；

When caching management module is set as the value of the internal storage state word of target memory address that computing unit to be accelerated to occupy, it is The internal storage state word is arranged countdown and cancels mark, and countdown starts mark and cancels falling between default countdown for triggering Timing.

To which step E1 is specifically included:

When the countdown of internal storage state word for verifying module and detecting target memory address starts mark, start to default Countdown is carried out between countdown, if being zero between the countdown result default countdown, then it represents that the memory of target memory address The value of status word is continuously time that CPU occupies and is greater than preset time, thereby executing step E2, if between falling default countdown Before timing result is zero, verifies module and detect that mark is cancelled in the countdown of the internal storage state word of target memory address, then take Disappear between carrying out countdown default countdown.

There are internal storage state word and target memory address to be also configured with verification status word about target memory address configuration, indicates Target memory address and internal storage state word have corresponding relationship and target memory address and verification status word to have corresponding relationship, from And it can identify the internal storage state word of target memory address and verify status word.

For example, realizing the correspondence of memory address and internal storage state word by initial address+offset.As memory block size is 4MB shares 256 pieces, initial address 0x10000.Internal storage state word equally applies for 256 pieces, and each internal storage state word is 4B.From And first internal storage state word saves the state of first memory block, second status word saves the state of second memory block. Known block address memory A can learn which block the block address memory is, needs which refreshes by (A-0x10000)/4MB A internal storage state word.The situation for verifying status word is identical with this.

It is appreciated that in the embodiment that the present invention has, before step 601, the data processing side of the embodiment of the present invention Method can also include the following steps:

Step F1: the number of memory address range and maximum memory block workable for configuration FPGA, each memory block it is big Small and alignment thereof.

Step F2: whether configuration FPGA enables address compression.

Step F3: whether configuration FPGA enables address check.

Whether step F4: configuration FPGA enable internal storage state instruction more new function, if enabled, it is also necessary to configure memory shape The initial position of state word.Internal storage state instruction more new function, which refers to, carries out returning for memory address using information such as internal storage state words It receives.

Step F5: the memory address refresh scheme of configuration FPGA default.For example, determining that caching management module is obtained from CPU To after memory address, which is put into the above or below of cache list.

It is appreciated that above-mentioned step F1-F5 can execute a step therein or multi-step.

In conclusion FPGA includes accelerating engine and caching management module, caching management module is used for managing internal memory address, On FPGA, accelerating engine obtains the business that CPU is sent and accelerates processing request, and then, accelerating engine asks business acceleration processing It asks and is handled, to obtain processing result.Accelerating engine is to caching management module application memory address, with obtaining target memory Location, wherein accelerate computing unit to prestore at least one memory address, target memory address, which belongs to, to be accelerated to prestore on computing unit Memory address.To which FPGA is during processing business accelerates processing request, without by interrupting to CPU application memory Location, but memory address is directly obtained from FPGA, improve process performance.Then, mesh is written in processing result by accelerating engine It marks memory headroom and accelerating engine that memory address is directed toward and sends target memory address to CPU.

In this way, setting is used for the caching management module of managing internal memory address on accelerating computing unit, accelerate computing unit Accelerating engine business that CPU is sent accelerate processing request to handle, to obtain processing result.Accelerating engine needs to store The processing result, for this purpose, accelerating engine can obtain being pre-stored on acceleration computing unit to caching management module application memory address Target memory address, then, by the processing result be written the target memory address direction memory headroom on, accelerating engine to After CPU sends target memory address, CPU can read the processing result by the target memory address.Because of accelerating engine To the acquisition of memory address by realizing to caching management module application, the accelerating engine and caching management module are located at same Accelerate computing unit on, accelerating engine application to memory address be pre-stored in accelerate computing unit on memory address, FPGA Accelerating engine when need, directly from caching management module application memory address, interrupt so not having to send to CPU application Memory.In this way, accelerating engine can quick obtaining to memory address, to store the processing knot that processing business accelerates processing request to obtain Fruit.Compared with accelerating engine is to the scheme of CPU application memory address, the scheme of the embodiment of the present invention is without waiting for CPU to memory The response of address.Since caching management module is to accelerate computing unit internal module, it is that computing unit programming is accelerated to realize, accelerates The accelerating engine of computing unit only needs several hardware beats from cache module application memory address, compares from CPU application tens The accelerating engine of several hundred microseconds, the embodiment of the present invention can be ignored from the time of cache module application memory address.To, The method of the embodiment of the present invention reduces the waiting time that accelerating engine obtains memory address, so as to quick obtaining memory Location realizes the raising of service process performance to use the memory address to store processing result.

In order to which the data processing method to Fig. 5 and embodiment shown in fig. 6 is understood more intuitively, the data are provided below One concrete scene example of processing method, accelerating computing unit as shown in Figure 10, in the concrete scene example is FPGA, should FPGA is specifically as follows compressing card.The FPGA includes two accelerating engines.

With reference to Figure 10 and above-mentioned each embodiment, refering to fig. 11, the data processing method of the embodiment of the present invention includes:

Step 1101: the initial configuration stage.

The memory modules of CPU to the continuous 1GB physical address space of OS application, such as pass through big page in initialization Reserved mode.The initial address in the physical memory space is 0x200000000, and size 1GB, the entire space 1GB divides For 256 memory blocks, each memory block is 4MB.

In addition, memory modules have also applied for the space of 256 internal storage state words, the corresponding memory block of an internal storage state word. The start physical address in the space of internal storage state word is 0x400000000, size 1KB.In the initial configuration stage, memory mould The value of 256 internal storage state words is arranged as the free time in block.

As shown in figure 12, the driver of FPGA memory address range workable for configuration FPGA in initialization is 0x200000000~0x240000000, memory block size are 4MB, and block address memory is aligned with 4MB, the brush of allocating cache list New algorithm is first in first out (First In first Out, FIFO).It configures FPGA and enables address compression function, enable address school Function is tested, the initial position for enabling internal storage state and indicating more new function, and configure internal storage state word is 0x400000000.Caching It is sky that management module, which initializes cache list, and size 1KB can store 256 memory address.

The Service Processing Module executed on step 1102:CPU is from memory modules application memory.Memory modules are to business processing It is idle memory address, such as 0x200000000,0x200400000,0x200800000 ... that module, which returns to internal storage state word,.

Step 1103: Service Processing Module sends memory address to FPGA.

Specifically: the Service Processing Module construction message BD executed on CPU, message BD includes memory address.By this Message BD notifies available memory address in the list of FPGA flush buffers.Wherein, message format includes type of message and memory Location.As shown in table 1 below, type of message 1, memory address 0x200000000.

Table 1

1	0x 200000000

Step 1103 is repeated, CPU notifies the refresh message of 256 memory address to FPGA.

Step 1104: caching management module stores memory address onto FPGA.The caching management module of FPGA receives CPU Refresh message after, check the legitimacy of message.After validity checking passes through, parsing obtains memory address from message BD.So Afterwards, according to the management method of configuration, memory address is compressed, 0x200000000 is such as compressed into 0x2000, and calculate The check value of memory address.Check value calculation method is as follows:

The parity value of address value after bit30=compression, the number if the bit of memory address is 1 is odd number, then odd Even value is 1, is otherwise 0.

Bit31=bit30 is negated.

For example, being 0x2000 after memory address 0x200000000 compression, the number that the bit of 0x2000 is 1 is odd number It is a, therefore its parity value is 1, then the 30bit of memory address stores bit 1, and 31bit negate storage 0.Finally, memory Location 0x200000000 is stored in the value of cache list for 0x40002000, such as Figure 13 institute after increasing address check by address compression Show.

For another example, memory address 0x200400000 is 0x2004, that 0x2004 bit is 1 after memory address compression Number is even number, and parity value 0, then 30bit stores bit 0, and 31bit negate storage 1.Last memory address 0x200400000 is stored in the value of cache list for 0x80002004, such as Figure 14 institute after increasing address check by address compression Show.

The transformational relation of other memory address is similar to shown in above-mentioned example.

According to refresh scheme the address value (such as 0x40002000) after original memory address (such as 0x200000000) conversion It is put into cache list, the internal storage state word of the original memory address of juxtaposition occupies for FPGA.The structure of internal storage state word such as 2 institute of table Show.Wherein, Idle state only has initial phase just to exist.Internal storage state word is FPGA module update, and FPGA refers in more new state When showing, the internal clocking value of current FPGA is written on bit29~bit0.

Table 2:

Step 1104 is repeated, the refresh message of step 1103 is handled and is completed.After the completion of processing, obtain as shown in figure 15 Mapping relations figure.

Step 1105: the processing decompression request of accelerating engine 1, and application obtain memory address.

Accelerating engine 1 has received the decompression request that CPU is sent, and stores decompression knot to caching management module application memory Fruit, caching management module take out cache list free address, supplement low level alignment memory address value, and remove bit30 and Address 0x4002000 is finally reduced to 0x200000000, and the address is returned to accelerating engine 1 by the check bit of bit31.

Step 1106: the processing decompression request of accelerating engine 2, and application obtain memory address.

Accelerating engine 2 has received the decompression request that CPU is sent, and stores decompression knot to caching management module application memory Fruit, the processing of similar step 1105, caching management module return to memory address 0x200400000 to accelerating engine 2.

Step 1107: accelerating engine 1 applies for memory address again.

Accelerating engine 1 in decompression procedure, write completely, not yet by the space 4MB of discovery memory address 0x200000000 Decompression is completed, and continues to store decompression to caching management module application memory as a result, caching management module returns to 0x200800000's Memory address is to accelerating engine 1.

Step 1108: accelerating engine 2 sends memory address to CPU.

The decompression of accelerating engine 2 is completed, and construction message BD notifies CPU, and message includes the address list of storage decompression result 0x200400000 decompresses length 3.2MB.Then, FPGA sets the corresponding internal storage state word of memory address 0x200400000 The value of 0x400000004 is occupied for CPU.

Step 1109:CPU obtains the memory address that accelerating engine 2 is sent.

The decompression that CPU has received step 1108 completes message, parses BD, obtains the memory address and solution of caching decompression result Result length is pressed, calls mmap function that physical address 0x200400000 is mapped as the accessible virtual address of software, goes forward side by side The processing of row next step.

Step 1110:CPU notifies the list of FPGA flush buffers.

CPU has read the decompression data of step 1109, and memory address 0x200400000 can be reused, and construction refreshes Message informing FPGA flush buffers list.

The treatment process of refresh message notice FPGA flush buffers list is constructed with step 1103.

Step 1111: caching management module stores memory address onto FPGA.

FPGA has received the refresh message of the step 1110 of CPU, and caching management module stores memory address onto FPGA, The treatment process is the same as step 1104.

Step 1112: accelerating engine 1 sends memory address to CPU.

The decompression of accelerating engine 1 is completed, and construction message BD notifies that CPU, message BD include the address of storage decompression result List 0x200000000,0x200800000 decompress length 7MB.Then, FPGA set memory address 0x200000000, The value of corresponding internal storage state word 0x400000000, the 0x400000008 of 0x200800000 is occupied for CPU.

Step 1113:CPU obtains the memory address that accelerating engine 1 is sent.

CPU has received the message BD that the decompression of step 1112 is completed, and parses message BD, obtains decompression result address and length Degree, calling the total space 8MB mmap Function Mapping physical address 0x200000000,0x200800000 is the accessible company of software Continuous virtual address, and carry out the processing of next step.

Step 1114:CPU notifies the list of FPGA flush buffers.

CPU has read the decompression data in step 1113, and memory address 0x200000000,0x200800000 can be again It uses, the list of FPGA flush buffers is notified to two address architecture refresh messages respectively.

Step 1115: caching management module stores memory address onto FPGA.

FPGA has received the refresh message of CPU, and the treatment process is the same as step 1104.

Wherein, the Service Processing Module executed on CPU may be exited extremely for some reason, be led when reading data It causes the memory address of memory block FPGA not to be notified to flush to cache list, an EMS memory checking module is thus provided, by CPU It executes, the verification that memory block can be carried out according to internal storage state word is recycled.It is as follows to verify recovery method:

As shown in figure 16, when EMS memory checking module initialization, application is used to protect with an equal amount of space of internal storage state word Verification status word is deposited, and isochronous memory status word is to status word is verified, when recording the corresponding state synchronized of each verification status word Between.

Verify module according to preset time (such as 3 seconds) timing traversal full memory status word, and with verify status word ratio Compared with.

If internal storage state word and verification status word are different, the value for verifying the value of status word as internal storage state word is set, And update the verification status word corresponding state synchronized time.

If internal storage state word and verify status word be it is the same, check verify status word state indicating bit whether be FPGA occupies, and occupies if it is FPGA, without processing；Occupy if it is CPU, compares the state synchronized of the verification status word Whether the difference of time and current time is greater than preset time-out time (such as 10 minutes), and if it is greater than the time, then recycling should Memory address is flushed to cache list by refresh message by the corresponding memory address of status word again.

The specific example of a memory address state instruction field is provided below.

Assuming that it is 3 seconds that the time is verified in preset timing, time-out time is 10 minutes.There are two internal storage state words, and value is such as Shown in table 3, wherein 100,98 be the timestamp for updating state indicating bit.

Table 3:

	First memory address	Second memory address
			Internal storage state word	CPU occupies+100	FPGA occupies+98

It verifies module and carries out the 1st verification.Looking into status word due to protokaryon is initialization value, verifies status word and internal storage state Word is different, verifies module and updates the value for verifying that status word is internal storage state word, as shown in table 4.Wherein, the state synchronized time is It is the time of CPU, different by FPGA write-in from the timestamp of internal storage state word.

Table 4:

It verifies module and carries out the 2nd verification, detect that internal storage state word is as shown in table 5.

Table 5:

	First memory address	Second memory address
			Internal storage state word	CPU occupies+105	CPU occupies+106

At this point, different due to verifying status word and internal storage state word, verifying module to update verification status word is memory shape The value of state word, as shown in table 6.

Table 6:

In subsequent implementation procedure, the Service Processing Module of CPU is abnormal, accelerates computing unit by the second memory address It is sent to Service Processing Module, so that Service Processing Module reads the processing result for the memory headroom that the second memory address is directed toward Afterwards, the second memory address is not released to FPGA.FPGA does not operate the second memory address, therefore the second memory address The value of internal storage state word update never again in the follow-up process.

It verifies module and carries out the 3rd verification, detect that internal storage state word is as shown in table 7.

Table 7:

	First memory address	Second memory address
			Internal storage state word	FPGA occupies+108	CPU occupies+106

Since the verification status word of the first memory address and the value of internal storage state word are different, to first memory address, The value for verifying that status word is internal storage state word is updated, and updates the state synchronized time of the verification status word of the first memory address. And the verification status word and its state synchronized time-preserving of the second memory address, it is obtaining as a result, as shown in table 8.

Table 8:

	First memory address	Second memory address
			Verify status word	FPGA occupies+108	CPU occupies+106
The state synchronized time	1000006	1000003

At this point, the verification status word due to the second memory address remains unchanged, and state indicating bit occupies for CPU, checks The last renewal time (i.e. state synchronized time) of the verification status word is 1000003 seconds, is 3 seconds with current time difference, also It is not timed-out, verifies module and be not processed.

It verifies module and carries out the 4th verification to the 201st verification.It is subsequent because the second memory address is in lost condition, FPGA does not operate the second memory address, so, the value of the internal storage state word of the second memory address is in the follow-up process It updates never again.To which in the 4th is verified to the 201st time and verified, the first memory address is normal because using, thus its memory Status word can generate variation, and the verification status word of the first memory address also changes therewith, and the second memory address is because in loss State, internal storage state word, which always remains as CPU, to be occupied, preset super because being divided into embodiments of the present invention, between verification 3 seconds When the time be 10 minutes, so, difference between the state synchronized time of the second memory address and current time is verified in the 4th Time-out time is both less than into the 201st verification, that is, the internal storage state word of the second memory address is continuously the time that CPU occupies Less than time-out time, CPU is not recycled the second memory address.In this way, the second memory address is verified in the 4th to the 201st In secondary verification, internal storage state word occupies for CPU, and its verification status word also remains CPU and occupies, and verifies the shape of status word State synchronization time is 1000003 seconds.

It verifies module and carries out the 202nd verification, detect that internal storage state word is as shown in table 9.

Table 9:

	First memory address	Second memory address
			Internal storage state word	CPU occupies+1100	CPU occupies+106

Since the verification status word and internal storage state word of the first memory address are different, the verification of the first memory address is updated Status word is the value of internal storage state word, and updates the state synchronized time of the verification status word of the first memory address.And in second Deposit the verification status word and its state synchronized time-preserving of address, obtained implementing result, as shown in table 10.

Table 10:

	First memory address	Second memory address
			Verify status word	CPU occupies+1100	CPU occupies+106
The state synchronized time	1000606	1000003

At this point, the verification status word due to the second memory address remains unchanged, and state indicating bit occupies for CPU, checks The last renewal time of the verification status word of second memory address is 1000003 seconds, is 603 seconds with current time difference, Greater than time-out time 10 minutes of setting, CPU recycled the second memory address.After recycling the second memory address, FPGA update this The internal storage state word of two memory address, so that the internal storage state word of the second memory address restores normal again.

It verifies module and carries out the 203rd verification, detect that internal storage state word is as shown in table 11.

Table 11:

At this point, for the two memory address, it is different due to verifying status word and internal storage state word, update verification state Word is the value of internal storage state word, and obtained result is as shown in table 12.

Table 12:

In conclusion increasing caching management module inside FPGA, the caching management module is for managing in cache list Memory address, these memory address be CPU be supplied to the memory address that FPGA is used.To which FPGA is in processing decompression request etc. During business accelerates processing request, interrupt without sending to CPU application memory address, but directly from caching management module Apply for memory address, directly memory address is obtained from cache list, to improve process performance.

Figure 17 is a kind of structural schematic diagram for accelerating computing unit provided in an embodiment of the present invention.The acceleration computing unit can Method for executing the acceleration computing unit in above-mentioned each embodiment.

As shown in figure 17, which includes accelerating engine 1701 and caching management module 1702, caching Management module 1702 is used for managing internal memory address；

Accelerating engine 1701, the business for obtaining central processor CPU transmission accelerate processing request；

Accelerating engine 1701 is also used to accelerate processing request to handle business, to obtain processing result；

Accelerating engine 1701 is also used to apply for memory address to caching management module 1702, obtains target memory address, add Fast computing unit prestores at least one memory address, and target memory address is with belonging to the memory for accelerating to prestore on computing unit Location；

Accelerating engine 1701 is also used to the processing result memory headroom that target memory address is directed toward is written；

Accelerating engine 1701 is also used to send target memory address to CPU.

Optionally,

Accelerating engine 1701, is also used to caching management module application memory address, obtains the first memory address, in first Depositing address is one of the memory address for accelerating to prestore on computing unit；

Accelerating engine 1701, be also used to by the first processing result be written the first memory address be directed toward memory headroom, first Processing result belongs to processing result；

Accelerating engine 1701 is also used to be fully written when the memory headroom that the first memory address is directed toward, and business acceleration is handled When requesting untreated completion, to caching management module application memory address, the second memory address is obtained, the second memory address is to add One of the memory address prestored on fast computing unit；

Accelerating engine 1701, be also used to by second processing result be written the second memory address be directed toward memory headroom, second Processing result belongs to processing result；

Accelerating engine 1701 is also used to send the first memory address and the second memory address to CPU.

Optionally,

Before accelerating engine 1701 obtains the business acceleration processing request that CPU is sent, caching management module 1702, for obtaining The target memory address for taking CPU to send；

Caching management module 1702 is also used to for being stored in target memory address on acceleration computing unit.

Optionally,

After accelerating engine 1701 sends target memory address to CPU, caching management module 1702 is also used to obtain CPU The target memory address of transmission, the processing result on memory headroom that target memory address is directed toward are read by CPU；

Optionally,

Accelerating the memory address prestored on computing unit is the memory address according to bit bit compression of alignment；

Accelerating engine 1701, is also used to send memory address request to caching management module 1702, and memory address request is used In to caching management module request memory address；

Caching management module 1702, be also used to according to alignment bit target memory address is decompressed, decompressed Target memory address；

Accelerating engine 1701 is also used to obtain the target memory address of the decompression of caching management module transmission.

Optionally,

The memory address prestored on computing unit is accelerated to be preset with check value,

Accelerating engine 1701, be also used to caching management module send memory address request, memory address request for Caching management module requests memory address；

Caching management module 1702 is also used to be calculated the check value of target memory address；

Caching management module 1702 is also used to the preset check value when the check value and target memory address being calculated When matching, target memory address is sent to accelerating engine, so that accelerating engine obtains target memory address.

Optionally,

Target memory address configuration has internal storage state word, and the value of internal storage state word is occupied including acceleration computing unit and CPU Occupy, the value of internal storage state word is arranged to obtain by acceleration computing unit；

The acceleration computing unit of target memory address occupies to be calculated for indicating that target memory address has been stored in acceleration On unit, target memory address is used by acceleration computing unit；

The CPU of target memory address occupies the processing result of the memory headroom caching for indicating to be directed toward target memory address It can be read by CPU, target memory address is used by CPU；

After accelerating engine 1701 sends target memory address to CPU, caching management module 1702 is also used to obtain CPU When the target memory address of transmission, the value of the internal storage state word of target memory address are continuously time that CPU occupies and are greater than default Between；

Optionally,

Target memory address is also configured with verification status word,

In conclusion the acceleration computing unit 1700 includes accelerating engine 1701 and caching management module 1702, caching pipe It manages module 1702 and is used for managing internal memory address；Accelerating engine 1701 obtains the business acceleration processing that central processor CPU is sent and asks It asks；Then, accelerating engine 1701 accelerates processing request to handle business, to obtain processing result.In order to store the processing As a result, accelerating engine 1701 applies for memory address to caching management module 1702, target memory address is obtained, wherein accelerometer It calculates unit and prestores at least one memory address, target memory address belongs to the memory address for accelerating to prestore on computing unit.So Afterwards, the memory headroom and accelerating engine 1701 that processing result write-in target memory address is directed toward by accelerating engine 1701 are to CPU Send target memory address.In this way, setting is used for the caching management module of managing internal memory address on accelerating computing unit, accelerate The accelerating engine of computing unit accelerates processing request to handle the business that CPU is sent, to obtain processing result.Accelerating engine Need to store the processing result, for this purpose, accelerating engine can obtain being pre-stored in accelerometer to caching management module application memory address The target memory address on unit is calculated, then, which is written on the memory headroom of target memory address direction, is added After fast engine sends target memory address to CPU, CPU can read the processing result by the target memory address.Because Accelerating engine to the acquisition of memory address by being realized to caching management module application, the accelerating engine and caching management module position In on same acceleration computing unit, accelerating engine application to memory address be with being pre-stored in the memory accelerated on computing unit Location, in this way, accelerating engine can quick obtaining to memory address, to store the processing knot that processing business accelerates processing request to obtain Fruit.Compared with accelerating engine is to the scheme of CPU application memory address, the scheme of the embodiment of the present invention is without waiting for CPU to memory The response of address reduces the waiting time that accelerating engine obtains memory address, so as to quick obtaining memory address, to use The memory address stores processing result, realizes the raising of service process performance.

Figure 18 is a kind of structural schematic diagram of central processing unit provided in an embodiment of the present invention.The central processing unit can be used for The method for executing the central processing unit in above-mentioned each embodiment.

Refering to fig. 18, the central processing unit 1800 of the embodiment of the present invention includes Service Processing Module 1801 and memory modules 1802；

Service Processing Module 1801 obtains target memory address for applying for memory address to memory modules 1802；

Service Processing Module 1801 is also used to the caching management module transmission target memory address for accelerating computing unit, Accelerate on computing unit so that target memory address is stored in by caching management module；

Service Processing Module 1801 is also used to obtain business and accelerates processing request；

Service Processing Module 1801 is also used to the accelerating engine transmission business acceleration processing request for accelerating computing unit, So that accelerating engine accelerates processing request to handle business, to obtain processing result, accelerating engine is to caching management module Apply for memory address, after obtaining target memory address, the memory that target memory address is directed toward is written in processing result by accelerating engine Space.

Optionally,

Service Processing Module 1801 sends business to the accelerating engine of acceleration computing unit and accelerates after handling request, business Processing module 1801 is also used to obtain the target memory address of accelerating engine transmission；

Service Processing Module 1801 is also used to the memory headroom reading process result being directed toward from target memory address；

Service Processing Module 1801 is also used to send target memory address to caching management module, so that cache management mould Target memory address is stored in by block to be accelerated on computing unit.

Optionally,

CPU further includes verifying module 1803,

Service Processing Module verifies module to accelerating the accelerating engine transmission business of computing unit to accelerate after handling request 1803, it is continuously whether the time that CPU occupies is greater than preset time for detecting the value of internal storage state word of target memory address；

Service Processing Module 1801, if the value for being also used to the internal storage state word of target memory address is continuously what CPU occupied Time is greater than preset time, then sends target memory address to caching management module, so that caching management module is by target memory Address, which is stored in, to be accelerated on computing unit.

Optionally,

Target memory address is also configured with verification status word；

Module 1803 is verified, is also used to judge every prefixed time interval the value of the verification status word of target memory address It is whether identical with the value of the internal storage state word of target memory address；

Module 1803 is verified, if being also used to the value of the verification status word of target memory address and the memory of target memory address The value of status word is not identical, then synchronizes the value of the verification status word of target memory address for the internal storage state of target memory address The value of word, and update the state synchronized time of target memory address；

Service Processing Module 1801, if being also used to value and the target memory address of the verification status word of target memory address The value of internal storage state word is identical, then when verify module detect target memory address verifications status word value occupy for CPU, mesh It marks the difference between the state synchronized time and current time of memory address and is greater than preset time and within a preset time interval mesh When the value of the internal storage state word of mark memory address has not been changed, the step of sending target memory address to caching management module is executed, Current time is to verify module to detect that the value of the verification status word of target memory address is the time that CPU occupies.

In conclusion central processing unit 1800 includes Service Processing Module 1801 and memory modules 1802.Business processing mould Block 1801 applies for memory address to memory modules 1802, obtains target memory address, and then, Service Processing Module 1801 is to acceleration The caching management module of computing unit sends target memory address, adds so that target memory address is stored in by caching management module On fast computing unit.Service Processing Module 1801 obtains after business accelerates processing request, and Service Processing Module 1801 is to accelerometer The accelerating engine for calculating unit sends business and accelerates processing request, so that accelerating engine accelerates processing request to handle business, To obtain processing result, accelerating engine is to caching management module application memory address, after obtaining target memory address, accelerating engine The memory headroom that processing result write-in target memory address is directed toward.

The embodiment of the present invention also provides a kind of heterogeneous system, and the structure of the heterogeneous system sees Fig. 1.

The heterogeneous system includes accelerating computing unit and central processing unit.

Wherein, accelerating computing unit is the acceleration computing unit of embodiment as shown in figure 17, is detailed in above-mentioned each exemplary reality Example is applied, details are not described herein again.

Central processing unit is the central processing unit of embodiment as shown in figure 18, is detailed in above-mentioned each exemplary embodiment, herein It repeats no more.

In the above-described embodiments, can come wholly or partly by software, hardware, firmware or any combination thereof real It is existing.When implemented in software, it can entirely or partly realize in the form of a computer program product.

The computer program product includes one or more computer instructions.Load and execute on computers the meter When calculation machine program instruction, entirely or partly generate according to process or function described in the embodiment of the present invention.The computer can To be general purpose computer, special purpose computer, computer network or other programmable devices.The computer instruction can be deposited Storage in a computer-readable storage medium, or from a computer readable storage medium to another computer readable storage medium Transmission, for example, the computer instruction can pass through wired (example from a web-site, computer, server or data center Such as coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (such as infrared, wireless, microwave) mode to another website Website, computer, server or data center are transmitted.The computer readable storage medium can be computer and can deposit Any usable medium of storage either includes that the data storages such as one or more usable mediums integrated server, data center are set It is standby.The usable medium can be magnetic medium, (for example, floppy disk, hard disk, tape), optical medium (for example, DVD) or partly lead Body medium (such as solid state hard disk Solid State Disk (SSD)) etc..

It is apparent to those skilled in the art that for convenience and simplicity of description, the system of foregoing description, The specific work process of device and unit, can refer to corresponding processes in the foregoing method embodiment, and details are not described herein.

In several embodiments provided herein, it should be understood that disclosed system, device and method can be with It realizes by another way.For example, the apparatus embodiments described above are merely exemplary, for example, the unit It divides, only a kind of logical function partition, there may be another division manner in actual implementation, such as multiple units or components It can be combined or can be integrated into another system, or some features can be ignored or not executed.Another point, it is shown or The mutual coupling, direct-coupling or communication connection discussed can be through some interfaces, the indirect coupling of device or unit It closes or communicates to connect, can be electrical property, mechanical or other forms.

The unit as illustrated by the separation member may or may not be physically separated, aobvious as unit The component shown may or may not be physical unit, it can and it is in one place, or may be distributed over multiple In network unit.It can select some or all of unit therein according to the actual needs to realize the mesh of this embodiment scheme 's.

It, can also be in addition, the functional units in various embodiments of the present invention may be integrated into one processing unit It is that each unit physically exists alone, can also be integrated in one unit with two or more units.Above-mentioned integrated list Member both can take the form of hardware realization, can also realize in the form of software functional units.

If the integrated unit is realized in the form of SFU software functional unit and sells or use as independent product When, it can store in a computer readable storage medium.Based on this understanding, the technical solution of the embodiment of the present invention Substantially all or part of the part that contributes to existing technology or the technical solution can be with software product in other words Form embody, which is stored in a storage medium, including some instructions use so that one Computer equipment (can be personal computer, server or the network equipment etc.) executes side described in each embodiment of the present invention The all or part of the steps of method.And storage medium above-mentioned includes: USB flash disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic or disk etc. is various to deposit Store up the medium of program code.

Claims

1. a kind of data processing method, which is characterized in that

The method is applied to accelerate computing unit, and the acceleration computing unit includes accelerating engine and caching management module, institute Caching management module is stated for managing internal memory address；

The described method includes:

The accelerating engine obtains the business that central processor CPU is sent and accelerates processing request；

The accelerating engine accelerates processing request to handle the business, to obtain processing result；

The accelerating engine obtains target memory address to the caching management module application memory address, and the acceleration calculates Unit prestores at least one memory address, and the target memory address is with belonging to the memory prestored on the acceleration computing unit Location；

The memory headroom that the target memory address is directed toward is written in the processing result by the accelerating engine；

The accelerating engine sends the target memory address to the CPU.

2. the method according to claim 1, wherein

The accelerating engine obtains target memory address and the acceleration to the caching management module application memory address The memory headroom that the target memory address is directed toward is written in the processing result by engine, comprising:

The accelerating engine obtains the first memory address, first memory to the caching management module application memory address Address is one of the memory address prestored on the acceleration computing unit；

The memory headroom that first memory address is directed toward, first processing is written in first processing result by the accelerating engine As a result belong to the processing result；

When the memory headroom that first memory address is directed toward is fully written, and the business accelerates processing to request untreated completion When, the accelerating engine obtains the second memory address to the caching management module application memory address, second memory Location is one of the memory address prestored on the acceleration computing unit；

The memory headroom that second memory address is directed toward, the second processing is written in second processing result by the accelerating engine As a result belong to the processing result；

The accelerating engine sends the target memory address to the CPU, comprising:

The accelerating engine sends first memory address and second memory address to the CPU.

3. the method according to claim 1, wherein

Before the accelerating engine obtains the business acceleration processing request that CPU is sent, the method also includes:

The caching management module obtains the target memory address that the CPU is sent；

The target memory address is stored on the acceleration computing unit by the caching management module.

4. method according to any one of claims 1 to 3, which is characterized in that

After the accelerating engine sends the target memory address to the CPU, the method also includes:

The caching management module obtains the target memory address that the CPU is sent, what the target memory address was directed toward Processing result on memory headroom is read by the CPU；

5. method according to any one of claims 1 to 3, which is characterized in that

The memory address prestored on the acceleration computing unit is the memory address according to bit bit compression of alignment；

The accelerating engine obtains target memory address to the caching management module application memory address, comprising:

The accelerating engine sends memory address request to the caching management module, and memory address request is for described Caching management module requests memory address；

The caching management module decompresses target memory address for bit according to alignment, the target memory decompressed Address；

The accelerating engine obtains the target memory address for the decompression that the caching management module is sent.

6. method according to any one of claims 1 to 3, which is characterized in that

The memory address prestored on the acceleration computing unit is preset with check value,

The check value of target memory address is calculated in the caching management module；

When the check value being calculated and the preset check value of the target memory address match, the cache management Module sends the target memory address to the accelerating engine, so that the accelerating engine obtains the target memory address.

7. method according to any one of claims 1 to 3, which is characterized in that

The target memory address configuration has an internal storage state word, the value of the internal storage state word include accelerate computing unit occupy and CPU occupies, and the value of the internal storage state word is arranged to obtain by the acceleration computing unit；

The acceleration computing unit of the target memory address occupies for indicating that the target memory address has been stored in On the acceleration computing unit, the target memory address is used by the acceleration computing unit；

The CPU of the target memory address occupies the memory headroom caching for indicating to be directed toward the target memory address Processing result can be read by CPU, and the target memory address is used by the CPU；

The caching management module obtains the target memory address that the CPU is sent, the memory of the target memory address The value of status word is continuously time that CPU occupies and is greater than preset time；

8. the method according to the description of claim 7 is characterized in that

The target memory address is also configured with verification status word；

The verification status word and state synchronized time are corresponding, and the state synchronized time is for indicating the verification status word Value synchronous time when being the value of the internal storage state word；

The value of the verification status word of the target memory address under synchronous condition by synchronizing the memory of the target memory address The value of status word obtains, and the synchronous condition is for the value of the verification status word of the target memory address and the target memory The value of the internal storage state word of location is not identical；

The value of the internal storage state word of the target memory address is continuously time that CPU occupies and is greater than preset time specifically: institute The value for stating the value of the internal storage state word of target memory address and the verification status word of the target memory address is all that CPU occupies, And the difference between the state synchronized time and current time of the target memory address is greater than preset time, the current time For detect the target memory address internal storage state word value and the target memory address verification status word value it is same The time occupied for CPU.

9. a kind of data processing method, which is characterized in that

The method is applied on CPU, and the CPU includes Service Processing Module and memory modules；

The described method includes:

The Service Processing Module obtains target memory address to the memory modules application memory address；

The Service Processing Module is to accelerating the caching management module of computing unit to send the target memory address, so that described The target memory address is stored on the acceleration computing unit by caching management module；

The Service Processing Module obtains business and accelerates processing request；

The Service Processing Module sends the business to the accelerating engine for accelerating computing unit and accelerates processing request, so that The accelerating engine accelerates processing request to handle the business, and to obtain processing result, the accelerating engine is to described Caching management module application memory address, after obtaining the target memory address, the accelerating engine writes the processing result Enter the memory headroom that the target memory address is directed toward.

10. according to the method described in claim 9, it is characterized in that,

The Service Processing Module sends the business to the accelerating engine for accelerating computing unit and accelerates after handling request, The method also includes:

The Service Processing Module obtains the target memory address that the accelerating engine is sent；

The memory headroom that the Service Processing Module is directed toward from the target memory address reads the processing result；

The Service Processing Module sends the target memory address to the caching management module, so that the cache management mould The target memory address is stored on the acceleration computing unit by block.

11. according to the method described in claim 9, it is characterized in that,

The CPU further includes verifying module,

Whether the value for verifying the internal storage state word that module detects the target memory address is continuously the time that CPU occupies big In preset time；

If the value of the internal storage state word of the target memory address is continuously the time that CPU occupies and is greater than preset time, described Service Processing Module sends the target memory address to the caching management module, so that the caching management module will be described Target memory address is stored on the acceleration computing unit.

12. according to the method for claim 11, which is characterized in that

The target memory address is also configured with verification status word；

Whether the value for verifying the internal storage state word that module detects the target memory address is continuously the time that CPU occupies big In preset time, comprising:

The verification module every prefixed time interval, judge the verification status word of the target memory address value and the mesh Whether the value for marking the internal storage state word of memory address is identical；

If the value of the internal storage state word of the value of the verification status word of the target memory address and the target memory address not phase Together, then the verification module synchronizes the value of the verification status word of the target memory address in the target memory address The value of status word is deposited, and updates the state synchronized time of the target memory address；

If the value of the verification status word of the target memory address is identical with the value of the internal storage state word of the target memory address, Then when the verification module detect the verifications status word of the target memory address value occupy for CPU, the target memory Difference between the state synchronized time of address and current time is greater than preset time and described in the prefixed time interval When the value of the internal storage state word of target memory address has not been changed, the Service Processing Module is executed to be sent out to the caching management module The step of sending the target memory address, the current time are the core that the verification module detects the target memory address Look into the time that the value of status word is occupied for CPU.

13. a kind of acceleration computing unit, which is characterized in that

The acceleration computing unit includes accelerating engine and caching management module, and the caching management module is for managing internal memory Location；

The accelerating engine, the business for obtaining central processor CPU transmission accelerate processing request；

The accelerating engine is also used to accelerate processing request to handle the business, to obtain processing result；

The accelerating engine is also used to obtain target memory address to the caching management module application memory address, described to add Fast computing unit prestores at least one memory address, and the target memory address, which belongs to, to be prestored on the acceleration computing unit Memory address；

The accelerating engine is also used to be written the processing result memory headroom that the target memory address is directed toward；

The accelerating engine is also used to send the target memory address to the CPU.

14. acceleration computing unit according to claim 13, which is characterized in that

The accelerating engine, is also used to the caching management module application memory address, obtains the first memory address, and described One memory address is one of the memory address prestored on the acceleration computing unit；

The accelerating engine is also used to be written the first processing result the memory headroom that first memory address is directed toward, described First processing result belongs to the processing result；

The accelerating engine is also used to be fully written when the memory headroom that first memory address is directed toward, and the business accelerates When untreated completion is requested in processing, to the caching management module application memory address, the second memory address is obtained, described second Memory address is one of the memory address prestored on the acceleration computing unit；

The accelerating engine is also used to be written second processing result the memory headroom that second memory address is directed toward, described Second processing result belongs to the processing result；

The accelerating engine is also used to send first memory address and second memory address to the CPU.

15. acceleration computing unit according to claim 13, which is characterized in that

Before the accelerating engine obtains the business acceleration processing request that CPU is sent, the caching management module, for obtaining State the target memory address of CPU transmission；

The caching management module is also used to for being stored on the acceleration computing unit target memory address.

16. 3 to 15 described in any item acceleration computing units according to claim 1, which is characterized in that

After the accelerating engine sends the target memory address to the CPU, the caching management module is also used to obtain The target memory address that the CPU is sent, the processing result on memory headroom that the target memory address is directed toward by The CPU is read；

17. 3 to 15 described in any item acceleration computing units according to claim 1, which is characterized in that

The accelerating engine, is also used to send memory address request to the caching management module, and the memory address request is used In to the caching management module request memory address；

The caching management module, be also used to according to alignment bit target memory address is decompressed, the mesh decompressed Mark memory address；

The accelerating engine is also used to obtain the target memory address for the decompression that the caching management module is sent.

18. 3 to 15 described in any item acceleration computing units according to claim 1, which is characterized in that

The caching management module is also used to be calculated the check value of target memory address；

The caching management module is also used to when the preset school of the check value and the target memory address being calculated When testing value matching, the target memory address is sent to the accelerating engine, so that the accelerating engine obtains in the target Deposit address.

19. 3 to 15 described in any item acceleration computing units according to claim 1, which is characterized in that

After the accelerating engine sends the target memory address to the CPU, the caching management module is also used to obtain The target memory address that the CPU is sent, the value of the internal storage state word of the target memory address are continuously what CPU occupied Time is greater than preset time；

20. acceleration computing unit according to claim 19, which is characterized in that

The target memory address is also configured with verification status word；

21. a kind of central processing unit, which is characterized in that

The central processing unit includes Service Processing Module and memory modules；

The Service Processing Module, for obtaining target memory address to the memory modules application memory address；

The Service Processing Module is also used to the caching management module transmission target memory address for accelerating computing unit, So that the target memory address is stored on the acceleration computing unit by the caching management module；

The Service Processing Module is also used to obtain business and accelerates processing request；

The Service Processing Module is also used to send the business acceleration processing to the accelerating engine for accelerating computing unit and ask It asks, so that the accelerating engine accelerates processing request to handle the business, to obtain processing result, the accelerating engine To the caching management module application memory address, after obtaining the target memory address, the accelerating engine is by the processing As a result the memory headroom that the target memory address is directed toward is written.

22. central processing unit according to claim 21, which is characterized in that

The Service Processing Module sends the business to the accelerating engine for accelerating computing unit and accelerates after handling request, The Service Processing Module is also used to obtain the target memory address that the accelerating engine is sent；

The Service Processing Module, the memory headroom for being also used to be directed toward from the target memory address read the processing result；

The Service Processing Module is also used to send the target memory address to the caching management module, so that described slow It deposits management module the target memory address is stored on the acceleration computing unit.

23. central processing unit according to claim 21, which is characterized in that

The CPU further includes verifying module,

The Service Processing Module sends the business to the accelerating engine for accelerating computing unit and accelerates after handling request, The verification module, for detecting whether the value of internal storage state word of the target memory address is continuously time that CPU occupies Greater than preset time；

The Service Processing Module, if the value for being also used to the internal storage state word of the target memory address is continuously what CPU occupied Time is greater than preset time, then the target memory address is sent to the caching management module, so that the cache management mould The target memory address is stored on the acceleration computing unit by block.

24. central processing unit according to claim 23, which is characterized in that

The target memory address is also configured with verification status word；

The verification module is also used to judge every prefixed time interval the value of the verification status word of the target memory address It is whether identical with the value of the internal storage state word of the target memory address；

The verification module, if be also used to the verification status word of the target memory address value and the target memory address The value of internal storage state word is not identical, then synchronizes the value of the verification status word of the target memory address for the target memory The value of the internal storage state word of location, and update the state synchronized time of the target memory address；

The Service Processing Module, if be also used to the verification status word of the target memory address value and the target memory The value of the internal storage state word of location is identical, then when the verification module detects the value of the verification status word of the target memory address Occupy for CPU, difference between the state synchronized time and current time of the target memory address is greater than preset time and When the value of the internal storage state word of the target memory address has not been changed in the prefixed time interval, execute to the cache management Module send the target memory address the step of, the current time be the verification module with detecting the target memory The value of the verification status word of location is the time that CPU occupies.

25. a kind of heterogeneous system, which is characterized in that

The heterogeneous system includes accelerating computing unit and central processing unit,

Wherein, the acceleration computing unit is such as the described in any item acceleration computing units of claim 13 to 20；

The central processing unit is such as the described in any item central processing units of claim 21 to 24.