CN109739558A - Data deduplication method and system, FPGA, and data processing system - Google Patents

Publication number
CN109739558A
Authority
CN
China
Prior art keywords
data
fpga
deduplication
host
cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910013012.0A
Other languages
Chinese (zh)
Inventor
徐翔宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-01-07
Filing date
2019-01-07
Publication date
2019-05-10
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201910013012.0A
Publication of CN109739558A
Legal status: Withdrawn

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a data deduplication method and system, an FPGA, a computer-readable storage medium, and a data processing system. The method comprises: upon receiving a data processing command sent by a host, reading the pending data in a cache by a target number of bytes; performing data deduplication on the pending data using a deduplication formula to obtain deduplicated data; and, after all pending data in the cache has been processed, returning the deduplicated data to the host. In the data deduplication method provided by this application, a general-purpose CPU serves as the host side and runs the main program, while the FPGA serves as a coprocessor, acting as the hardware acceleration unit for the data deduplication algorithm. Parallel pipelining raises the algorithm's degree of parallelism, which can effectively improve the execution performance of the data deduplication algorithm and its throughput.

Description

Data deduplication method and system, FPGA, and data processing system
Technical field
This application relates to the field of computer technology, and more specifically to a data deduplication method and system, an FPGA, a computer-readable storage medium, and a data processing system.
Background
In the era of cloud computing, large volumes of data transmission require data deduplication operations, which occupy a large amount of CPU (Central Processing Unit) or other computing resources and affect the computing performance of cloud data centers. Moreover, data deduplication in the prior art is inefficient.
Therefore, how to improve the efficiency of data deduplication is a technical problem that those skilled in the art need to solve.
Summary of the invention
The purpose of this application is to provide a data deduplication method and system, an FPGA, a computer-readable storage medium, and a data processing system that improve the efficiency of data deduplication.
To achieve the above object, this application provides a data deduplication method, applied to an FPGA, comprising:
upon receiving a data processing command sent by a host, reading the pending data in a cache by a target number of bytes;
performing data deduplication on the pending data using a deduplication formula to obtain deduplicated data;
after all pending data in the cache has been processed, returning the deduplicated data to the host.
The method further comprises:
upon receiving a create-cache command sent by the host, creating the cache locally.
To achieve the above object, this application provides a data deduplication system, applied to an FPGA, comprising:
a read module, for reading the pending data in a cache by a target number of bytes upon receiving a data processing command sent by a host;
a processing module, for performing data deduplication on the pending data using a deduplication formula to obtain deduplicated data;
a return module, for returning the deduplicated data to the host after all pending data in the cache has been processed.
The system further comprises:
a creation module, for creating the cache locally upon receiving a create-cache command sent by the host.
To achieve the above object, this application provides an FPGA, comprising:
a memory, for storing an AOCX file;
a processor, for implementing the steps of the above data deduplication method when executing the AOCX file.
To achieve the above object, this application provides a computer-readable storage medium on which an AOCX file is stored; when the AOCX file is executed by a processor, the steps of the above data deduplication method are implemented.
To achieve the above object, this application provides a data processing system, comprising:
the FPGA described above;
a host connected to the FPGA, for transferring pending data into the cache in the FPGA, sending a data processing command to the FPGA, and receiving the deduplicated data returned by the FPGA.
The host and the FPGA are connected through a PCIE interface.
The system further comprises:
a hash generator, for generating the host-side program of the host and the Kernel program of the FPGA;
a first compiler connected to the hash generator, for compiling the host-side program to generate an executable file;
a second compiler connected to the hash generator, for compiling the Kernel program to generate an AOCX file.
The first compiler is specifically a GCC compiler, and the second compiler is specifically AOC.
As the above scheme shows, the data deduplication method provided by this application comprises: upon receiving a data processing command sent by a host, reading the pending data in a cache by a target number of bytes; performing data deduplication on the pending data using a deduplication formula to obtain deduplicated data; and, after all pending data in the cache has been processed, returning the deduplicated data to the host.
In the data deduplication method provided by this application, a general-purpose CPU serves as the host side and runs the main program, while an FPGA (Field-Programmable Gate Array) serves as a coprocessor, acting as the hardware acceleration unit for the data deduplication algorithm. Parallel pipelining raises the algorithm's degree of parallelism, which can effectively improve the execution performance of the data deduplication algorithm and its throughput. This implementation can also effectively shorten the time needed to deploy the data deduplication algorithm to an FPGA-based heterogeneous acceleration platform, shortening the product development cycle. This application also discloses a data deduplication system, an FPGA, a computer-readable storage medium, and a data processing system, which achieve the same technical effects.
It should be understood that the above general description and the following detailed description are merely exemplary and do not limit this application.
Brief description of the drawings
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort. The drawings provide a further understanding of the disclosure and form part of the specification. Together with the detailed description below they serve to explain the disclosure, but they do not limit it. In the drawings:
Fig. 1 is a flowchart of a data deduplication method according to an exemplary embodiment;
Fig. 2 is a structural diagram of a data deduplication system according to an exemplary embodiment;
Fig. 3 is a structural diagram of an FPGA according to an exemplary embodiment;
Fig. 4 is a structural diagram of a data processing system according to an exemplary embodiment;
Fig. 5 is a structural diagram of another data processing system according to an exemplary embodiment.
Detailed description of the embodiments
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in those embodiments. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.
An embodiment of this application discloses a data deduplication method that improves the efficiency of data deduplication.
Referring to Fig. 1, a flowchart of a data deduplication method according to an exemplary embodiment, the method includes:
S101: upon receiving a data processing command sent by a host, reading the pending data in a cache by a target number of bytes;
The executing subject of this embodiment may be an FPGA, which is connected to a host (for example a general-purpose CPU) through a PCIE interface for data communication. The PCX data deduplication algorithm on the host side and on the FPGA may be described in the OpenCL (Open Computing Language) high-level language. Compared with implementations in a conventional RTL hardware description language, generating the FPGA hardware bitstream from OpenCL can effectively improve implementation efficiency and shorten the development cycle of the algorithm.
First, the host-side program that runs on the general-purpose CPU and the Kernel program targeting the FPGA platform are generated separately. Then the host-side program is compiled with the GCC compiler to produce an executable file that runs on the general-purpose CPU, and the Kernel program file is compiled and synthesized with the Altera SDK for OpenCL (AOC) high-level synthesis tool to produce an AOCX file that runs on the FPGA. Finally, the host-side program is run on the general-purpose CPU and calls the hardware circuit of the PCX data deduplication core algorithm on the FPGA for hardware acceleration.
The host-side program first sends the pending data to the cache of the FPGA. It will be understood that, before this step, the method may further include a step of creating the cache locally upon receiving a create-cache command sent by the host. For example, the DDR3 memory on the FPGA may be used as the data cache.
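The create-cache and data-transfer steps can be sketched in plain C as follows. This is only an illustration under stated assumptions: the patent does not specify how the commands are encoded or how the cache is laid out, so `struct fpga_cache`, `fpga_create_cache`, and `fpga_write_cache` are hypothetical names, and ordinary heap memory stands in for the DDR3 memory on the FPGA board.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Plain-C stand-in for the FPGA-side cache (for example a region of on-board DDR3).
 * All names are hypothetical. The struct must start out zero-initialized. */
struct fpga_cache {
    unsigned char *buf;  /* NULL until the cache has been created */
    size_t capacity;     /* bytes reserved by the create-cache command */
    size_t used;         /* bytes of pending data written by the host */
};

/* Handle a create-cache command: allocate the cache locally. Returns 0 on success. */
int fpga_create_cache(struct fpga_cache *c, size_t capacity)
{
    assert(c != NULL);
    if (capacity == 0)
        return -1;
    unsigned char *buf = malloc(capacity);
    if (buf == NULL)
        return -1;
    free(c->buf);        /* release any earlier cache */
    c->buf = buf;
    c->capacity = capacity;
    c->used = 0;
    return 0;
}

/* Host-to-FPGA transfer: copy pending data into the cache. Returns 0 on success,
 * -1 if the cache does not exist yet or is too small for the data. */
int fpga_write_cache(struct fpga_cache *c, const void *data, size_t len)
{
    assert(c != NULL && data != NULL);
    if (c->buf == NULL || len > c->capacity)
        return -1;
    memcpy(c->buf, data, len);
    c->used = len;
    return 0;
}
```

In this sketch a write fails until the cache has been created, which mirrors the ordering described above: the create-cache command comes first, then the host transfers the pending data.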
In this step, the host sends a data processing command to the FPGA to start the data deduplication algorithm on the FPGA side. After the FPGA receives the data processing command from the host, the Kernel program reads the pending data in the cache by the target number of bytes. The target number of bytes read each time varies with the algorithm.
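Reading the cache by a target number of bytes can be sketched as a simple cursor. The names `struct cache_reader` and `next_chunk` are hypothetical, and the handling of a short final chunk is an assumption, since the patent does not say what happens when the pending data is not a multiple of the target byte count.

```c
#include <assert.h>
#include <stddef.h>

/* Cursor over the pending data held in the cache (hypothetical layout). */
struct cache_reader {
    const unsigned char *data;  /* start of the pending data */
    size_t len;                 /* total bytes of pending data */
    size_t offset;              /* bytes already handed out */
};

/* Return a pointer to the next chunk and store its size in *chunk_len.
 * A chunk holds target_bytes bytes, or fewer at the tail of the data.
 * Returns NULL (and sets *chunk_len to 0) once everything has been read. */
const unsigned char *next_chunk(struct cache_reader *r, size_t target_bytes,
                                size_t *chunk_len)
{
    assert(r != NULL && chunk_len != NULL && target_bytes > 0);
    if (r->offset >= r->len) {
        *chunk_len = 0;
        return NULL;
    }
    size_t remaining = r->len - r->offset;
    *chunk_len = remaining < target_bytes ? remaining : target_bytes;
    const unsigned char *chunk = r->data + r->offset;
    r->offset += *chunk_len;
    return chunk;
}
```

Because `target_bytes` is a parameter of each call rather than a constant, different algorithms can read different amounts per step, as the text describes.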
S102: performing data deduplication on the pending data using a deduplication formula to obtain deduplicated data;
After the FPGA reads the pending data in the cache by the target number of bytes, it performs data deduplication on the data read, using the deduplication formula, to obtain deduplicated data. This continues until all pending data in the cache has been processed, at which point a completion signal is returned to the host side.
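The patent does not disclose the deduplication formula itself, so the sketch below substitutes a common technique purely for illustration: each fixed-size chunk gets a 64-bit FNV-1a fingerprint, and a chunk is dropped when an earlier chunk has the same fingerprint, length, and bytes. The function names, the `MAX_CHUNKS` limit, and the choice of FNV-1a are all assumptions, not the patent's method.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CHUNKS 1024  /* illustrative limit on distinct chunks */

/* 64-bit FNV-1a, used here only as a stand-in fingerprint. */
static uint64_t fnv1a64(const unsigned char *p, size_t n)
{
    uint64_t h = 14695981039346656037ULL;  /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}

/* Copy each distinct chunk of `in` to `out` once, in first-seen order.
 * `out` must hold at least `len` bytes and must not overlap `in`.
 * Returns the bytes written, or (size_t)-1 if MAX_CHUNKS is exceeded. */
size_t dedup_chunks(const unsigned char *in, size_t len, size_t target_bytes,
                    unsigned char *out)
{
    uint64_t seen_hash[MAX_CHUNKS];
    size_t seen_off[MAX_CHUNKS];  /* offset of each kept chunk inside out */
    size_t seen_len[MAX_CHUNKS];
    size_t seen = 0, out_len = 0;

    assert(in != NULL && out != NULL && target_bytes > 0);
    for (size_t off = 0; off < len; off += target_bytes) {
        size_t n = len - off < target_bytes ? len - off : target_bytes;
        uint64_t h = fnv1a64(in + off, n);
        int dup = 0;
        /* Confirm a fingerprint match byte by byte to rule out collisions. */
        for (size_t i = 0; i < seen && !dup; i++)
            dup = seen_hash[i] == h && seen_len[i] == n &&
                  memcmp(out + seen_off[i], in + off, n) == 0;
        if (dup)
            continue;
        if (seen == MAX_CHUNKS)
            return (size_t)-1;
        memcpy(out + out_len, in + off, n);
        seen_hash[seen] = h;
        seen_off[seen] = out_len;
        seen_len[seen] = n;
        seen++;
        out_len += n;
    }
    return out_len;
}
```

For example, with 4-byte chunks the input `AAAABBBBAAAACC` yields `AAAABBBBCC`: the second `AAAA` chunk is recognized as a duplicate and omitted. The linear scan over earlier chunks keeps the sketch short; a hardware pipeline would organize the lookup differently.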
S103: after all pending data in the cache has been processed, returning the deduplicated data to the host.
In this step, after all pending data in the cache has been processed, the deduplicated data is returned to the host. The return step may be triggered by the FPGA or by the host. When it is triggered by the FPGA, the FPGA returns the deduplicated data directly to the host once all pending data in the cache has been processed. When it is triggered by the host, the host, after receiving the completion signal from the FPGA side, reads the deduplicated data from the cache in the DDR memory on the FPGA side to the host side.
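The two ways of triggering the return can be contrasted in a small plain-C model. Everything here is hypothetical: `struct fpga_state`, the `done` flag standing in for the completion signal, and the fixed 64-byte result buffer are assumptions made to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum return_mode { RETURN_FPGA_TRIGGERED, RETURN_HOST_TRIGGERED };

/* Hypothetical model of the FPGA side after deduplication. */
struct fpga_state {
    unsigned char result[64];  /* deduplicated data held in the FPGA-side cache */
    size_t result_len;
    int done;                  /* completion signal visible to the host */
};

/* Copy the result into a host buffer. Returns bytes copied, 0 if it does not fit. */
static size_t copy_result(const struct fpga_state *s, unsigned char *host_buf,
                          size_t cap)
{
    if (s->result_len > cap)
        return 0;
    memcpy(host_buf, s->result, s->result_len);
    return s->result_len;
}

/* Host-triggered return: the host reads only after seeing the completion signal. */
size_t host_read_result(const struct fpga_state *s, unsigned char *host_buf,
                        size_t cap)
{
    assert(s != NULL && host_buf != NULL);
    return s->done ? copy_result(s, host_buf, cap) : 0;
}

/* Called when all pending data has been processed. In FPGA-triggered mode the
 * result is pushed to the host at once; otherwise only the signal is raised. */
size_t fpga_finish(struct fpga_state *s, enum return_mode mode,
                   unsigned char *host_buf, size_t cap)
{
    assert(s != NULL && host_buf != NULL);
    s->done = 1;
    return mode == RETURN_FPGA_TRIGGERED ? copy_result(s, host_buf, cap) : 0;
}
```

In host-triggered mode `fpga_finish` copies nothing, and the data moves only when the host later calls `host_read_result`, which matches the description of the host reading from the FPGA-side DDR memory after the completion signal arrives.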
In the data deduplication method provided by the embodiments of this application, a general-purpose CPU serves as the host side and runs the main program, while the FPGA serves as a coprocessor, acting as the hardware acceleration unit for the data deduplication algorithm. Parallel pipelining raises the algorithm's degree of parallelism, which can effectively improve the execution performance of the data deduplication algorithm and its throughput. This implementation can also effectively shorten the time needed to deploy the data deduplication algorithm to an FPGA-based heterogeneous acceleration platform, shortening the product development cycle.
A data deduplication system provided by the embodiments of this application is introduced below. The data deduplication system described below and the data deduplication method described above may be read with reference to each other.
Referring to Fig. 2, a structural diagram of a data deduplication system according to an exemplary embodiment, the system includes:
a read module 201, for reading the pending data in a cache by a target number of bytes upon receiving a data processing command sent by a host;
a processing module 202, for performing data deduplication on the pending data using a deduplication formula to obtain deduplicated data;
a return module 203, for returning the deduplicated data to the host after all pending data in the cache has been processed.
On the basis of the above embodiment, as a preferred implementation, the system further includes:
a creation module, for creating the cache locally upon receiving a create-cache command sent by the host.
In the data deduplication system provided by the embodiments of this application, a general-purpose CPU serves as the host side and runs the main program, while the FPGA serves as a coprocessor, acting as the hardware acceleration unit for the data deduplication algorithm. Parallel pipelining raises the algorithm's degree of parallelism, which can effectively improve the execution performance of the data deduplication algorithm and its throughput. This implementation can also effectively shorten the time needed to deploy the data deduplication algorithm to an FPGA-based heterogeneous acceleration platform, shortening the product development cycle.
For the system in the above embodiment, the specific way in which each module performs its operation has been described in detail in the embodiment of the method and is not explained again here.
This application also provides an FPGA. Referring to Fig. 3, a structural diagram of an FPGA provided by the embodiments of this application, the FPGA may include a processor 11 and a memory 12. The FPGA may also include one or more of a multimedia component 13, an input/output (I/O) interface 14, and a communication component 15.
The processor 11 controls the overall operation of the FPGA so as to complete all or some of the steps of the above data deduplication method. The memory 12 stores various types of data to support operation of the FPGA. Such data may include, for example, instructions for any application or method operating on the FPGA, as well as application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory 12 may be implemented by any type of volatile or non-volatile storage device or a combination of them, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The multimedia component 13 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component outputs and/or inputs audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may be further stored in the memory 12 or sent through the communication component 15. The audio component also includes at least one loudspeaker for outputting audio signals. The I/O interface 14 provides an interface between the processor 11 and other interface modules, which may be a keyboard, a mouse, buttons, and so on. The buttons may be virtual or physical. The communication component 15 provides wired or wireless communication between the FPGA and other devices. Wireless communication may be, for example, Wi-Fi, Bluetooth, near-field communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 15 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the FPGA may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above data deduplication method.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided; when the AOCX file is executed by a processor, the steps of the above data deduplication method are implemented. For example, the computer-readable storage medium may be the above memory 12 containing the AOCX file, and the AOCX file may be executed by the processor 11 of the FPGA to complete the above data deduplication method.
This application also provides a data processing system which, as shown in Fig. 4, includes:
the FPGA 200 of the above embodiment;
a host 100 connected to the FPGA 200, for transferring pending data into the cache in the FPGA 200, sending a data processing command to the FPGA 200, and receiving the deduplicated data returned by the FPGA 200.
On the basis of the above embodiment, as a preferred implementation, the host 100 and the FPGA 200 are connected through a PCIE interface.
On the basis of the above embodiment, as a preferred implementation, as shown in Fig. 5, the system further includes:
a hash generator 300, for generating the host-side program of the host and the Kernel program of the FPGA;
a first compiler 400 connected to the hash generator 300, for compiling the host-side program to generate an executable file;
a second compiler 500 connected to the hash generator 300, for compiling the Kernel program to generate an AOCX file.
On the basis of the above embodiment, as a preferred implementation, the first compiler is specifically a GCC compiler and the second compiler is specifically AOC.
In the data deduplication system provided by the embodiments of this application, a general-purpose CPU serves as the host side and runs the main program, while the FPGA serves as a coprocessor, acting as the hardware acceleration unit for the data deduplication algorithm. Parallel pipelining raises the algorithm's degree of parallelism, which can effectively improve the execution performance of the data deduplication algorithm and its throughput. This implementation can also effectively shorten the time needed to deploy the data deduplication algorithm to an FPGA-based heterogeneous acceleration platform, shortening the product development cycle.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be read with reference to each other. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, see the description of the method. It should be pointed out that those of ordinary skill in the art may make improvements and modifications to this application without departing from its principles, and these improvements and modifications also fall within the scope of protection of the claims of this application.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants of them are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.

Claims (10)

1. A data deduplication method, characterized in that it is applied to an FPGA and comprises:
upon receiving a data processing command sent by a host, reading the pending data in a cache by a target number of bytes;
performing data deduplication on the pending data using a deduplication formula to obtain deduplicated data;
after all pending data in the cache has been processed, returning the deduplicated data to the host.
2. The data deduplication method according to claim 1, characterized in that it further comprises:
upon receiving a create-cache command sent by the host, creating the cache locally.
3. A data deduplication system, characterized in that it is applied to an FPGA and comprises:
a read module, for reading the pending data in a cache by a target number of bytes upon receiving a data processing command sent by a host;
a processing module, for performing data deduplication on the pending data using a deduplication formula to obtain deduplicated data;
a return module, for returning the deduplicated data to the host after all pending data in the cache has been processed.
4. The data deduplication system according to claim 3, characterized in that it further comprises:
a creation module, for creating the cache locally upon receiving a create-cache command sent by the host.
5. An FPGA, characterized in that it comprises:
a memory, for storing an AOCX file;
a processor, for implementing the steps of the data deduplication method according to claim 1 or 2 when executing the AOCX file.
6. A computer-readable storage medium, characterized in that an AOCX file is stored on the computer-readable storage medium, and the steps of the data deduplication method according to claim 1 or 2 are implemented when the AOCX file is executed by a processor.
7. A data processing system, characterized in that it comprises:
the FPGA according to claim 5;
a host connected to the FPGA, for transferring pending data into the cache in the FPGA, sending a data processing command to the FPGA, and receiving the deduplicated data returned by the FPGA.
8. The data processing system according to claim 7, characterized in that the host and the FPGA are connected through a PCIE interface.
9. The data processing system according to claim 7, characterized in that it further comprises:
a hash generator, for generating the host-side program of the host and the Kernel program of the FPGA;
a first compiler connected to the hash generator, for compiling the host-side program to generate an executable file;
a second compiler connected to the hash generator, for compiling the Kernel program to generate an AOCX file.
10. The data processing system according to claim 9, characterized in that the first compiler is specifically a GCC compiler and the second compiler is specifically AOC.
CN201910013012.0A 2019-01-07 2019-01-07 Data deduplication method and system, FPGA, and data processing system Withdrawn CN109739558A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910013012.0A CN109739558A (en) 2019-01-07 2019-01-07 Data deduplication method and system, FPGA, and data processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910013012.0A CN109739558A (en) 2019-01-07 2019-01-07 Data deduplication method and system, FPGA, and data processing system

Publications (1)

Publication Number Publication Date
CN109739558A true CN109739558A (en) 2019-05-10

Family

ID=66363622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910013012.0A Withdrawn CN109739558A (en) 2019-01-07 2019-01-07 Data deduplication method and system, FPGA, and data processing system

Country Status (1)

Country Link
CN (1) CN109739558A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143272A (en) * 2019-12-28 2020-05-12 浪潮(北京)电子信息产业有限公司 Data processing method and device for heterogeneous computing platform and readable storage medium
CN113076066A (en) * 2021-04-14 2021-07-06 湖南兴天电子科技有限公司 High-capacity high-speed storage device and operation method thereof
CN113076066B (en) * 2021-04-14 2023-12-08 湖南兴天电子科技股份有限公司 High-capacity high-speed storage device and operation method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190510