CN108268216B - Data processing method, device and server - Google Patents

Publication number
CN108268216B
Authority
CN
China
Prior art keywords
data
corresponding relationship
object data
duplication
engine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810009925.0A
Other languages
Chinese (zh)
Other versions
CN108268216A (en)
Inventor
陈钊
冯宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Technologies Co Ltd
Original Assignee
New H3C Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Technologies Co Ltd
Priority to CN201810009925.0A
Publication of CN108268216A
Application granted
Publication of CN108268216B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/0607 Improving or facilitating administration, e.g. storage management by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the present application provides a data processing method, device and server, relating to the technical field of object storage. In the method, a data de-duplication engine receives object data sent by a client and performs de-duplication processing on it to judge whether the received object data is duplicate data within the object storage device cluster. When the received object data is not duplicate data, the engine sends it to the corresponding object storage device in the object storage device cluster according to a first preset rule; when the object data is duplicate data, the engine does not write it. This guarantees that object data with identical content is written to the object storage device cluster only once, which achieves global de-duplication and effectively reduces the storage space used by the object storage system.

Description

Data processing method, device and server
Technical field
The present application relates to the technical field of object storage, and in particular to a data processing method, device and server.
Background
In an existing object storage system, during data processing the client sends a request for object data directly to one of the object storage devices in an object storage device (Object-based Storage Device, OSD) cluster. That object storage device directly manages the space of the storage medium, for example the logical block address (LBA) space of a hard disk, and performs the access to the object data.
If an object storage device stores a large amount of duplicate data, for example documents, pictures and videos with identical content that different users store on a cloud disk, storage space is greatly wasted, so the duplicate data needs to be deleted. In the existing solution, after receiving the object data sent by the client, each object storage device performs de-duplication processing only on the object data it has itself received. As a result, the object storage device cluster as a whole still stores a large amount of duplicate data, which wastes storage space.
Summary of the invention
The purpose of the embodiments of the present application is to provide a data processing method, device and server that improve the utilization of storage space in an object storage system.
To achieve this purpose, the embodiments of the present application adopt the following technical solutions:
In a first aspect, an embodiment of the present application provides a data processing method applied to a data de-duplication engine in an object storage system. The object storage system includes a server on which the data de-duplication engine is installed and an object storage device cluster composed of at least one object storage device, and the data de-duplication engine is communicatively connected to the at least one object storage device. The data processing method includes: receiving object data sent by a client; performing de-duplication processing on the received object data to judge whether the received object data is duplicate data within the object storage device cluster; and, when the received object data is not duplicate data, sending the received object data to the corresponding object storage device in the object storage device cluster according to a first preset rule.
In a second aspect, an embodiment of the present application provides a data processing device applied to an object storage system. The object storage system includes a server on which a data de-duplication engine is installed and an object storage device cluster composed of at least one object storage device, and the data de-duplication engine is communicatively connected to the at least one object storage device. The data processing device includes the data de-duplication engine, which in turn includes a receiving module, a data write processing module and a sending module. The receiving module receives the object data sent by the client. The data write processing module performs de-duplication processing on the received object data to judge whether the received object data is duplicate data within the object storage device cluster. The sending module, when the received object data is not duplicate data, sends the received object data to the corresponding object storage device in the object storage device cluster according to the first preset rule.
In a third aspect, an embodiment of the present application provides a server applied to an object storage system, the object storage system including an object storage device cluster composed of at least one object storage device. The server includes: a memory for storing one or more programs; and a processor. When the one or more programs are executed by the processor, the method described above is implemented.
Compared with the prior art, in the embodiments of the present application the object storage system includes a server on which a data de-duplication engine is installed and an object storage device cluster composed of at least one object storage device. When object data is stored, the client does not send it directly to an object storage device; it first sends it to the data de-duplication engine. Only after the data de-duplication engine has performed de-duplication processing on the received object data is the data stored in the corresponding object storage device. This achieves global de-duplication and effectively reduces the storage space used by the object storage system.
To make the above purposes, features and advantages of the present application clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope. A person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 shows a schematic diagram of the composition of the object storage system provided by an embodiment of the present application.
Fig. 2 shows a functional block diagram of the server provided by an embodiment of the present application.
Fig. 3 shows a schematic diagram of the first correspondence and the second correspondence maintained by the data de-duplication engine provided by an embodiment of the present application.
Fig. 4 shows a schematic diagram of the case in which there are multiple data de-duplication engines provided by an embodiment of the present application.
Fig. 5 shows a flow diagram of the data processing method provided by an embodiment of the present application.
Fig. 6 shows a schematic diagram of the second correspondence maintained by the data de-duplication engine when the granularity of the object data is large.
Fig. 7 shows a flow diagram of the data processing method provided by another embodiment of the present application.
Fig. 8 shows a flow diagram of writing non-duplicate data.
Fig. 9 shows a flow diagram of writing duplicate data.
Fig. 10 shows a flow diagram of writing colliding data.
Fig. 11 shows a flow diagram of the data processing method provided by another embodiment of the present application.
Fig. 12 shows a flow diagram of updating object data.
Fig. 13 shows a flow diagram of the data processing method provided by another embodiment of the present application.
Fig. 14 shows a flow diagram of obtaining object data.
Fig. 15 shows a functional block diagram of the data processing device provided by another embodiment of the present application.
Reference numerals: 10 - object storage system; 100 - server; 200 - object storage device cluster; 300 - client; 400 - data processing device; 110 - memory; 120 - processor; 130 - communication interface; 410 - receiving module; 420 - data write processing module; 430 - sending module; 440 - correspondence maintenance module; 450 - data read processing module.
Detailed description of the embodiments
In the course of implementing the technical solutions of the embodiments of the present application, the inventors found the following:
Existing de-duplication processing is performed on each object storage device. When a client writes data, the object data is sent directly to an object storage device, and the object storage device implements de-duplication through the recorded mappings between object data and fingerprints and between fingerprints and logical block addresses. For example, for object data with identical content (such as object data a and object data d), only one copy of the data is stored, at logical block address LBA1.
Based on the above, the inventors found through extensive investigation that de-duplication processing in the prior art is not global: duplicate data still exists across the object storage device cluster as a whole. The reason is that multiple pieces of object data with identical content may be sent to several different object storage devices, each of which performs de-duplication and then stores the data. Because an object storage device in the prior art de-duplicates only the object data that it has itself received, it can guarantee only that no object data with identical content exists within that one device. Two or more object storage devices may still hold object data with identical content, so global de-duplication cannot be achieved.
The defects of the above prior-art solutions are results obtained by the inventors through practice and careful study. Therefore, both the process of discovering the above problem and the solutions proposed below in the embodiments of the present application should be regarded as contributions made by the inventors in the course of the present invention.
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present application, not all of them. The components of the embodiments, as generally described and illustrated in the drawings, can be arranged and designed in a variety of different configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed application, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the protection scope of the present application.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings, so once an item has been defined in one drawing it does not need to be defined or explained again in subsequent drawings. In the description of the present application, the terms "first", "second" and so on are used only to distinguish between descriptions and are not to be understood as indicating or implying relative importance.
Fig. 1 is a schematic diagram of the composition of the object storage system 10 provided by an embodiment of the present application. As shown in Fig. 1, the object storage system 10 includes a server 100, an object storage device cluster 200 composed of at least one object storage device (OSD.1, OSD.2, OSD.3, ..., OSD.xx as shown in Fig. 1), and a client 300. A data de-duplication engine is installed in the server 100; it processes data requests from the client 300 and performs the access to object data. The data de-duplication engine is communicatively connected to the at least one object storage device. In practice, the data de-duplication engine may be a process that executes de-duplication operations. The object storage device and the data de-duplication engine are two independent processes that communicate over the network. An object storage device may run on the same server 100 as the data de-duplication engine, or on a separate device independent of the server 100.
In this embodiment, the client 300 may be, but is not limited to, a smartphone, a personal computer (PC), a tablet computer, a personal digital assistant (PDA), a mobile Internet device (MID), and so on.
Fig. 2 is a functional block diagram of the server 100 provided by this embodiment. The server 100 may include a memory 110, a processor 120 and a communication interface 130. These elements are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, they may be electrically connected through one or more communication buses or signal lines. The processor 120 executes executable modules stored in the memory 110, such as computer programs.
The memory 110 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and so on. The memory 110 can be used to store software programs and modules. The one or more data de-duplication engines include at least one software function module that can be stored in the memory 110 in the form of software or firmware, or built into the operating system (OS) of the server 100. After receiving an execution instruction, the processor 120 executes the one or more programs to implement the data processing method disclosed in the embodiments of the present application. The communication interface 130 can be used for signaling or data communication with other node devices.
The processor 120 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method can be completed by integrated logic circuits of hardware in the processor 120 or by instructions in the form of software. The processor 120 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and so on; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When executed by the processor 120, the computer program implements the data processing method disclosed in the embodiments of the present application.
As shown in Fig. 3, in this embodiment the data de-duplication engine maintains a first correspondence (Hash Table). The first correspondence contains, for the object data maintained by the data de-duplication engine, the mapping between the unique identification information of the object data (for example, h1, h2, h3) and the data name under which the object data is stored in an object storage device (for example, o1, o2, o3). This embodiment uses the hash value of the object data as an example of the unique identification information, but other values can be used in practice as long as they uniquely identify the object data. Just as a human fingerprint uniquely identifies a person, the unique identification information can also be figuratively called the fingerprint of the object data.
The data name under which object data is stored in an object storage device can be derived from the unique identification information of that object data. For example, adding a magic number (or prefix) to the unique identification information h1 yields the data name o1 under which the object data is stored. The data name under which the received object data is stored in the corresponding object storage device therefore contains identification information (the magic number above) indicating that the received object data has undergone de-duplication processing. When the data name of object data stored in an object storage device contains this identification information, the object data was stored in that object storage device after the data de-duplication engine completed de-duplication processing. It should be noted that, in other embodiments, the unique identification information of the object data can also be used directly as the data name under which it is stored in the corresponding object storage device.
Optionally, the first correspondence may also include a reference count for the unique identification information of each piece of object data. Each time a piece of object data is judged to be duplicate data, the reference count of its unique identification information is incremented by one. For example, in Fig. 3 the reference count of h1 is 2 (a and d), the reference count of h2 is 1 (b), and the reference count of h3 is 1 (c).
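The first correspondence described above can be sketched as a small in-memory table. This is only an illustration under stated assumptions: the SHA-256 fingerprint, the prefix string `dedup_` and the names `HashEntry`, `stored_name_for` and `record` are hypothetical and do not come from the patent text.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict

# Hypothetical magic prefix marking objects that went through de-duplication.
DEDUP_PREFIX = "dedup_"


@dataclass
class HashEntry:
    stored_name: str  # data name under which the object is stored on an OSD
    ref_count: int    # number of client-side objects sharing this content


def fingerprint(content: bytes) -> str:
    """Unique identification information; SHA-256 is an assumption."""
    return hashlib.sha256(content).hexdigest()


def stored_name_for(fp: str) -> str:
    """Derive the OSD-side data name by prepending the magic prefix."""
    return DEDUP_PREFIX + fp


def record(hash_table: Dict[str, HashEntry], content: bytes) -> HashEntry:
    """Add a new fingerprint, or increment the count of an existing one."""
    fp = fingerprint(content)
    entry = hash_table.get(fp)
    if entry is None:
        entry = HashEntry(stored_name_for(fp), ref_count=1)
        hash_table[fp] = entry
    else:
        entry.ref_count += 1
    return entry


hash_table: Dict[str, HashEntry] = {}
record(hash_table, b"same content")            # object a
record(hash_table, b"other content")           # object b
entry_d = record(hash_table, b"same content")  # object d, same content as a
```

After these three calls the table holds two fingerprints, and the entry shared by a and d has a reference count of 2, mirroring the h1 row in Fig. 3.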
Optionally, in this embodiment, to make it easy to look up the record linking the name of object data to its unique identification information, and thereby judge whether the object data has already been stored in the object storage device cluster 200, the data de-duplication engine also maintains a second correspondence (Mapping Table). The second correspondence contains the mapping between the name of the object data received from the client 300 (for example, a, b, c) and the unique identification information of the object data (for example, h1, h2, h3). After receiving object data, the data de-duplication engine can look up the corresponding unique identification information in the second correspondence directly by the name of the object data. De-duplication processing is a data reduction technique in which data with identical content is not stored repeatedly. In this embodiment, object data with different names each map to their own unique identification information. For example, the unique identification information (such as the hash value) of the object data named a is h1, and that of the object data named d is also h1. So that both a and d can be looked up later, the mapping of a to h1 and the mapping of d to h1 are both inserted into the second correspondence. In this embodiment, the unique identification information (the fingerprint of the object data) can be obtained by hashing the content of the object data, so object data with identical content has identical unique identification information. For example, object data a and object data d in Fig. 3 both have the unique identification information h1.
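A minimal sketch of the second correspondence follows, assuming a plain dictionary keyed by the client-side object name and a SHA-256 fingerprint. The helper names `remember` and `lookup` are hypothetical.

```python
import hashlib
from typing import Dict, Optional


def fingerprint(content: bytes) -> str:
    """Fingerprint from object content; SHA-256 is an assumption."""
    return hashlib.sha256(content).hexdigest()


# Second correspondence: client-side object name -> fingerprint.
mapping_table: Dict[str, str] = {}


def remember(name: str, content: bytes) -> str:
    """Record the name-to-fingerprint mapping for a received object."""
    fp = fingerprint(content)
    mapping_table[name] = fp
    return fp


def lookup(name: str) -> Optional[str]:
    """Return the fingerprint recorded for a name, or None if unknown."""
    return mapping_table.get(name)


remember("a", b"quarterly report")
remember("b", b"holiday photo")
remember("d", b"quarterly report")  # different name, same content as a
```

Names a and d resolve to the same fingerprint while b resolves to a different one, so each name stays individually searchable even though only one copy of the shared content needs to be stored.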
It should be understood that the name of object data is the name of the object data as sent by the client 300, whereas the data name under which object data is stored in an object storage device is the name under which the object data is stored after the data de-duplication engine has processed the object data sent by the client 300. The two names may be the same or different.
Fig. 4 shows the case in which there are multiple data de-duplication engines (for example, Eng.1, Eng.2, Eng.3, ...). Deploying multiple data de-duplication engines improves the data processing performance of the object storage system 10. Each data de-duplication engine maintains a first correspondence built from the object data for which it has itself completed de-duplication processing, and a second correspondence built from the object data it has itself received from the client 300. For example, the first and second correspondences maintained by Eng.1 are Hash Table 1 and Mapping Table 1, those maintained by Eng.2 are Hash Table 2 and Mapping Table 2, and those maintained by Eng.3 are Hash Table 3 and Mapping Table 3.
In this embodiment, when there are multiple data de-duplication engines, the engine that executes the de-duplication operation is determined by the same rule for all object data with identical unique identification information, so such object data is necessarily processed by the same data de-duplication engine. Object data with identical content is therefore written to the object storage device cluster 200 only once, and de-duplication is global. If each of the multiple engines instead maintained first and second correspondences covering all stored object data, the overhead on every engine would be high. In this embodiment, each data de-duplication engine maintains only the first correspondence built from the object data it has itself de-duplicated and the second correspondence built from the object data it has itself received from the client 300, which effectively reduces the storage overhead and query cost of each data de-duplication engine.
In this embodiment, the first and second correspondences maintained by each data de-duplication engine are backed up on at least one other data de-duplication engine, which ensures the high reliability of the correspondences maintained on each engine. For example, in Fig. 4, Hash Table 1 has the copy Hash Table 1', Hash Table 2 has the copy Hash Table 2', and Hash Table 3 has the copy Hash Table 3'; likewise, Mapping Table 1 has the copy Mapping Table 1', Mapping Table 2 has the copy Mapping Table 2', and Mapping Table 3 has the copy Mapping Table 3'. It should be noted that when any data de-duplication engine updates a first or second correspondence that it maintains, it also notifies the other data de-duplication engines that hold a copy of that correspondence to update their copies synchronously.
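The backup-and-notify behaviour can be sketched as follows. The `Engine` class, its attribute names and the direct in-process update are hypothetical simplifications: in a real deployment the notification would travel over the network, and the patent text does not specify the mechanism.

```python
from typing import Dict, List


class Engine:
    """Hypothetical de-duplication engine that replicates its Hash Table."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Primary first correspondence: fingerprint -> stored data name.
        self.hash_table: Dict[str, str] = {}
        # Copies kept on behalf of other engines: owner name -> table copy.
        self.replica_tables: Dict[str, Dict[str, str]] = {}
        # Engines that hold a copy of this engine's table.
        self.backups: List["Engine"] = []

    def add_backup(self, other: "Engine") -> None:
        """Register another engine as a backup and seed its copy."""
        self.backups.append(other)
        other.replica_tables[self.name] = dict(self.hash_table)

    def put(self, fp: str, stored_name: str) -> None:
        """Update the primary table, then update every copy in step."""
        self.hash_table[fp] = stored_name
        for backup in self.backups:
            backup.replica_tables[self.name][fp] = stored_name


eng1 = Engine("Eng.1")
eng2 = Engine("Eng.2")
eng1.add_backup(eng2)   # Eng.2 now holds the copy Hash Table 1'
eng1.put("h1", "o1")
eng1.put("h2", "o2")
```

After the two updates, the copy held by Eng.2 matches the primary table on Eng.1, which is the property the synchronous update is meant to preserve.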
In this embodiment, to guarantee the high reliability of the object data itself, when the data de-duplication engine stores a piece of object data in the corresponding object storage device, the object data can be backed up on other object storage devices inside the object storage device cluster 200. It should be understood that the backup copies held by those other object storage devices are not stored as a result of processing by the data de-duplication engine. The data de-duplication engine writes the received object data to an object storage device only when it determines that the received object data is not duplicate data; whether that object data is then backed up inside the object storage device cluster 200 is of no concern to the data de-duplication engine.
Fig. 5 is a flow diagram of the data processing method provided by an embodiment of the present application. The data processing method can be applied to the server 100 in the object storage system 10 provided by the above embodiment, and specifically to the data de-duplication engine in the server 100. It should be noted that the data processing method of the embodiments of the present application is not limited by Fig. 5 or by the specific order described below. In other embodiments, some steps of the data processing method can be reordered as needed, and some steps can be omitted or deleted. The flow shown in Fig. 5 is described in detail below.
Step S101: receive the object data sent by the client 300.
In this embodiment, each piece of object data has a name (id); for example, object data a has the name a. The client 300 sends the object data directly to the corresponding data de-duplication engine according to a third preset rule. The third preset rule may be: the client 300 calculates a hash value from the name of the object data and determines the corresponding data de-duplication engine from that hash value. Once the rule by which the client 300 selects a data de-duplication engine is fixed, object data with the same name is always sent to the same data de-duplication engine.
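One possible form of the third preset rule is sketched below, assuming that the client hashes the object name with SHA-256 and takes the result modulo the number of engines. The text only says that a hash value of the name determines the engine, so the hash function, the modulo step and the function name `pick_engine` are all assumptions.

```python
import hashlib
from typing import List


def pick_engine(object_name: str, engines: List[str]) -> str:
    """Choose an engine deterministically from the object name."""
    if not engines:
        raise ValueError("at least one engine is required")
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce it
    # to an index into the engine list.
    index = int.from_bytes(digest[:8], "big") % len(engines)
    return engines[index]


engines = ["Eng.1", "Eng.2", "Eng.3"]
first_choice = pick_engine("a", engines)
second_choice = pick_engine("a", engines)
```

A cryptographic digest is used here instead of Python's built-in `hash`, because `hash` on strings is randomized per process and would not give every client the same answer. A plain modulo also remaps many names when the number of engines changes; a scheme such as consistent hashing would limit that, but nothing in the text indicates which scheme is used.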
Step S102: perform data de-duplication processing on the received object data to judge whether the received object data is repeated data in the object storage device cluster 200.
The data de-duplication engine maintains a first corresponding relationship, which comprises the correspondence between the unique identification information of each object data maintained by the data de-duplication engine and the data name under which that object data is stored in the object storage device. On receiving object data, the data de-duplication engine calculates the unique identification information of the object data from its content, and queries the maintained first corresponding relationship for a record of the calculated unique identification information.
When no record of the unique identification information is found in the first corresponding relationship, the received object data is non-repeated data. When a record of the unique identification information is found in the first corresponding relationship, the received object data is repeated data.
Of course, in practice, to make the judgment of whether object data is repeated more accurate, the corresponding object data can also be read from the object storage device cluster 200 according to the data name corresponding to the unique identification information, and the content of the object data read out can be compared with the content of the received object data, for example byte by byte. If the two are completely consistent, the received object data is repeated data; if they are inconsistent, a data collision has occurred.
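The judgment in step S102, including the optional content comparison, can be sketched as follows. This is a minimal illustration under stated assumptions: the names `Verdict`, `fingerprint` and `classify` are hypothetical, SHA-256 stands in for whatever unique identification information the engine computes, and `read_object` is an assumed callback that fetches stored content by data name.

```python
import hashlib
from enum import Enum


class Verdict(Enum):
    NEW = "non-repeated"
    DUPLICATE = "repeated"
    COLLISION = "collision"


def fingerprint(content: bytes) -> str:
    """Illustrative unique identification information: SHA-256 of the content."""
    return hashlib.sha256(content).hexdigest()


def classify(content, first_map, read_object, fp_func=fingerprint):
    """Classify received content against the first corresponding relationship.

    first_map maps unique identification information to a stored data name.
    read_object(data_name) returns the bytes stored under that name.
    """
    data_name = first_map.get(fp_func(content))
    if data_name is None:
        return Verdict.NEW  # no record: not repeated data
    # Record found: compare contents to rule out a collision.
    stored = read_object(data_name)
    return Verdict.DUPLICATE if stored == content else Verdict.COLLISION
```

The `fp_func` parameter exists only so that a deliberately weak fingerprint can be substituted to exercise the collision branch.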
It should be noted that, because data de-duplication processing is carried out at object granularity, object data of larger granularity (for example, greater than 4M) is relatively difficult for the data de-duplication engine to handle. Therefore, when the granularity of the object data is large, the data de-duplication process can be skipped and the client 300 writes the object data directly into the object storage device cluster 200. As another implementation, when the received object data is larger than a preset size, the data de-duplication engine can divide it into multiple sub-object data that each satisfy the preset size (for example, less than or equal to 4M) and perform data de-duplication processing on each sub-object data.
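The division of a large object into sub-object data can be sketched as below. The function name `split_object` is hypothetical, and reading "4M" as 4 MiB is an assumption; the source gives only "4M" as an example preset size.

```python
PRESET_SIZE = 4 * 1024 * 1024  # assumed reading of the "4M" example


def split_object(content: bytes, preset_size: int = PRESET_SIZE):
    """Return (start_offset, sub_object) pairs, each at most preset_size bytes."""
    if len(content) <= preset_size:
        # Objects that already satisfy the preset size stay whole at offset 0.
        return [(0, content)]
    return [
        (offset, content[offset:offset + preset_size])
        for offset in range(0, len(content), preset_size)
    ]
```

Each sub-object can then go through the same de-duplication judgment as a whole object, and the start offsets allow the original object to be reassembled on read.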
Step S103: when the received object data is not repeated data, send the received object data to the corresponding object storage device in the object storage device cluster 200 according to first preset rules.
In this embodiment, when the received object data is not repeated data, the data de-duplication engine stores the received object data into the object storage device cluster 200 according to the first preset rules. To distinguish whether object data stored in an object storage device has passed through data de-duplication processing, the data name under which the object data is stored in the object storage device cluster 200 can be obtained from the unique identification information of the object data, and this data name contains identification information indicating that the received object data has undergone data de-duplication processing.
In this embodiment, the first preset rules may be: perform a hash operation on the obtained data name of the object data to determine in which object storage device the received object data should be stored. For example, for object data a whose unique identification information is h1, the data name o1 under which object data a is stored in the object storage device cluster 200 is obtained from h1; if the hash value calculated from data name o1 is 1, object data a should be stored in object storage device OSD.1, and its name as stored in OSD.1 is changed from a to o1. Of course, the present application places no limitation on which rule is used to choose the object storage device.
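One possible form of the first preset rules is sketched below. The prefix `dedup_`, the helper names and the use of SHA-256 are assumptions for illustration; the source states only that the data name is derived from the unique identification information, carries a marker of de-duplication, and is hashed to choose the object storage device.

```python
import hashlib

DEDUP_PREFIX = "dedup_"  # hypothetical marker that the object was de-duplicated


def data_name_for(unique_id: str) -> str:
    """Derive the stored data name from the unique identification information."""
    return DEDUP_PREFIX + unique_id


def is_deduplicated(stored_name: str) -> bool:
    """Tell de-duplicated objects apart from objects stored under their own name."""
    return stored_name.startswith(DEDUP_PREFIX)


def pick_osd(data_name: str, num_osds: int) -> int:
    """Hash the data name to choose an object storage device index."""
    digest = hashlib.sha256(data_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_osds
```

Because the data name depends only on the content-derived identifier, identical content always resolves to the same stored name and the same object storage device.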
It should be noted that in the present embodiment, data de-duplication engine is set by object data write-in object storage When standby, only corresponding object storage device need to be sent by object data according to the data name of object data, be deposited by object Storage equipment oneself management storage medium space determines the storage location of object data, such as space logical block (LBA) of hard disk, and By object data storage to corresponding position.For example, object storage device can be determined according to the data name of object data The logical block address that should be stored, and the data name of record object data is deposited with object data after the storage for completing object data (for example, the corresponding logical block address of data name o1 is LBA1, data name o8 is corresponding for the corresponding relationship for the logical block address put Logical block address be LBA8).
In the present embodiment, the data that data de-duplication engine direct reception client end 300 is sent carry out repeated data and delete After processing, be then forwarded to object storage device and stored, when object storage device be it is multiple when, the mode of the present embodiment and Data de-duplication only is carried out by each object storage device oneself in the prior art to compare, and be can be realized the overall situation and is deleted again, improves Efficiency is deleted again.
Optionally, when the received object data is not repeated data, the data de-duplication engine needs to update the first corresponding relationship it maintains to support subsequent repeated-data queries. Therefore, the data processing method may further include:
Step S104: when the received object data is not repeated data, insert into the first corresponding relationship the correspondence between the unique identification information of the received object data and the data name under which the received object data is stored in the corresponding object storage device.
It should be noted that, in this embodiment, when the received object data is repeated data, the data de-duplication engine does not write the object data into the object storage device cluster 200. It only needs to increase by one, in the maintained first corresponding relationship, the reference count corresponding to the unique identification information of the received object data, indicating that a new reference points to the object data stored in the object storage device for that unique identification information. When a data collision occurs, the data de-duplication engine writes the object data directly into the object storage device cluster 200 without changing the name of the object data in the storage device and without updating the first corresponding relationship. For example, when a data collision occurs for object data k, the name under which object data k is stored in the object storage device is still k.
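The three outcomes of a write in a single engine (steps S103 and S104 plus the reference-count and collision handling just described) can be sketched as follows. The class and field names are hypothetical, a dictionary stands in for the object storage device cluster, and SHA-256 with a `dedup_` prefix is an illustrative choice.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Entry:
    data_name: str  # name under which the content is stored
    ref_count: int  # number of client objects referencing the content


class Engine:
    def __init__(self):
        self.first_map = {}  # unique identification information -> Entry
        self.storage = {}    # stand-in for the cluster: stored name -> bytes

    def write(self, name: str, content: bytes) -> str:
        unique_id = hashlib.sha256(content).hexdigest()
        entry = self.first_map.get(unique_id)
        if entry is None:
            # Non-repeated data: store once and record it with reference count 1.
            data_name = "dedup_" + unique_id
            self.storage[data_name] = content
            self.first_map[unique_id] = Entry(data_name, 1)
            return "non-repeated"
        if self.storage[entry.data_name] == content:
            # Repeated data: nothing is written, only the count grows.
            entry.ref_count += 1
            return "repeated"
        # Collision: store under the original name, first_map is left untouched.
        self.storage[name] = content
        return "collision"
```

Writing the same content under names a and d leaves one stored object with a reference count of 2.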
Optionally, to query whether object data has already been stored in the object storage device cluster 200, the data de-duplication engine also maintains a second corresponding relationship, which comprises the correspondence between the name of each object data received from the client 300 and the unique identification information of that object data. The data processing method further includes:
Step S105: if the correspondence between the name of the received object data and the unique identification information of the received object data is not recorded in the second corresponding relationship, insert that correspondence into the second corresponding relationship.
In this embodiment, when the correspondence between the name of the received object data and its unique identification information is not recorded in the second corresponding relationship, the object data is new data. After data de-duplication processing has been performed on the object data, the correspondence between its name and its unique identification information must be inserted into the second corresponding relationship, regardless of whether the object data is repeated data or non-repeated data.
It should be noted that, in this embodiment, when a data collision is judged to have occurred, the data de-duplication engine writes the colliding data directly into the object storage device cluster 200 under its original name and inserts the correspondence between the name of the colliding data and its special identification information into the second corresponding relationship. The name of the colliding data is the name of the object data sent by the client 300, and the special identification information can be a special value, for example "empty". When the data de-duplication engine finds in the second corresponding relationship that the unique identification information of an object data is the special value, it can judge that the object data is colliding data.
After receiving a read/write request for object data from the client 300, the data de-duplication engine queries the second corresponding relationship it maintains for a record of the unique identification information corresponding to the name of the object data, and can thereby judge whether the object data has been stored in the object storage device cluster 200. If there is a record in the second corresponding relationship, the object data has been stored in the object storage device cluster 200.
In this embodiment, if the data de-duplication engine divides the received object data into multiple sub-object data satisfying the preset size for data de-duplication processing, the second corresponding relationship maintained by the data de-duplication engine changes accordingly. As shown in Fig. 6, the correspondence between the name of the object data and the unique identification information of the object data becomes the correspondence between the pair (name of the object data, Client Object; start offset of the sub-object data within the object data, offset) and the unique identification information of the sub-object data, Hash. In Fig. 6, assuming that object data a is divided into two sub-object data, offset1 denotes the start offset of one sub-object data within the object data and offset2 denotes the start offset of the other. For example, if object data a of size 8M is divided into two 4M sub-object data, offset1 is 0M and offset2 is 4M. For object data that already satisfies the preset size (such as object data c and object data d), the corresponding start offset is offset1 (that is, 0M). When the client 300 needs to read object data a, the data de-duplication engine reads all sub-object data of object data a from the object storage device, combines the two sub-object data according to their start offsets within object data a (offset1 and offset2) to obtain object data a, and returns it to the client 300.
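Reading back a divided object through the offset-keyed second corresponding relationship can be sketched as follows. The table contents, the names `second_map`, `first_map`, `storage` and `read_object`, and the small byte strings standing in for 4M sub-objects are all illustrative assumptions.

```python
# (object name, start offset) -> unique identification information of the sub-object
second_map = {("a", 0): "h1", ("a", 4): "h2", ("c", 0): "h3"}
# unique identification information -> stored data name
first_map = {"h1": "o1", "h2": "o2", "h3": "o3"}
# stand-in for the object storage device cluster
storage = {"o1": b"AAAA", "o2": b"BBBB", "o3": b"CC"}


def read_object(name, second_map, first_map, storage):
    """Reassemble an object from its sub-objects in start-offset order."""
    parts = sorted(
        (offset, unique_id)
        for (obj_name, offset), unique_id in second_map.items()
        if obj_name == name
    )
    return b"".join(storage[first_map[unique_id]] for _, unique_id in parts)


object_a = read_object("a", second_map, first_map, storage)
```

An object that was never divided, such as c here, has a single entry at offset 0 and is returned unchanged.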
Please refer to Fig. 7, which is a flow diagram of the data processing method provided by another embodiment of the present application. In this embodiment, there are at least two data de-duplication engines. Each data de-duplication engine maintains a first corresponding relationship established from the object data for which it has itself completed data de-duplication processing, and a second corresponding relationship established from the object data that it has itself received from the client 300. The object data sent by the client 300 that each data de-duplication engine receives is the object data sent to that data de-duplication engine according to the second preset rules. The at least two data de-duplication engines include a first data de-duplication engine and a second data de-duplication engine. The method includes:
Step S201: the first data de-duplication engine receives the object data sent by the client 300.
In this embodiment, the client 300 sends the object data to the first data de-duplication engine according to the third preset rules.
Step S202: when the first data de-duplication engine receives the object data sent by the client 300, it determines, by the second preset rules, the second data de-duplication engine that is to execute the data de-duplication processing operation, and sends the received object data to the second data de-duplication engine.
In this embodiment, after the first data de-duplication engine receives the object data, it determines by the second preset rules the second data de-duplication engine that is to execute the data de-duplication processing operation. The second preset rules may be: calculate a hash value from the unique identification information of the received object data and determine from it the data de-duplication engine that executes the data de-duplication processing operation. That is, the first data de-duplication engine calculates the unique identification information of the object data from the content of the received object data, calculates a hash value from the unique identification information, and thereby determines the second data de-duplication engine to which the object data is sent for data de-duplication processing. It should be noted that the second data de-duplication engine determined by the first data de-duplication engine through the second preset rules may also be the first data de-duplication engine itself, in which case the first data de-duplication engine executes the data de-duplication processing operation itself.
Step S203: the second data de-duplication engine performs data de-duplication processing on the received object data based on the first corresponding relationship maintained by the second data de-duplication engine.
In this embodiment, the second data de-duplication engine queries the first corresponding relationship it maintains for a record of the unique identification information of the received object data, to judge whether the received object data is repeated data, non-repeated data or colliding data. For details, refer to the description of step S102 above.
In this embodiment, there are multiple data de-duplication engines, and which engine executes the data de-duplication operation can be set according to an established rule. This embodiment does not change the rule by which the client 300 sends data (such as the third preset rules), so as to leave the system's original rules unchanged and to avoid requiring major changes to the client 300 when this scheme is adopted. In theory, the client 300 could also directly determine the engine that performs the data de-duplication operation, without the data de-duplication engine making that determination.
Therefore, when there are multiple data de-duplication engines, as long as the rule for selecting the data de-duplication engine is the same, object data with identical content is always sent to the same data de-duplication engine for de-duplication processing. If the object data is repeated data, it can be found in that data de-duplication engine; if it is not found there, it cannot be found on any other engine either. The engine carrying out the data de-duplication processing operation therefore only needs to perform de-duplication based on the first corresponding relationship it maintains itself in order to achieve global de-duplication.
If the object data is determined not to be repeated data, the embodiment of the present application may further include step S204: when the received object data is not repeated data, the second data de-duplication engine inserts the correspondence between the unique identification information of the received object data and the data name under which the received object data is stored in the corresponding object storage device into the first corresponding relationship maintained by the second data de-duplication engine.
In this embodiment, similarly to the previous embodiment, when the received object data is not repeated data, the second data de-duplication engine sends the received object data to the corresponding object storage device in the object storage device cluster 200 according to the first preset rules, and inserts the correspondence between the unique identification information of the received object data and the data name under which it is stored in the corresponding object storage device into the first corresponding relationship maintained by the second data de-duplication engine. At this point, the reference count corresponding to the unique identification information of the received object data is 1.
After the second data de-duplication engine completes processing, it feeds back a processing success to the first data de-duplication engine, and the first data de-duplication engine inserts the correspondence between the name of the received object data and the unique identification information of the received object data into the second corresponding relationship maintained by the first data de-duplication engine.
When the received object data is repeated data, the second data de-duplication engine only needs to increase by one, in the first corresponding relationship it maintains, the reference count of the unique identification information of the received object data, without writing the received object data into an object storage device. The first data de-duplication engine must still insert the correspondence between the name of the received object data and its unique identification information into the second corresponding relationship that it maintains. Repeated data refers to object data whose content is identical but whose name is different. Thus, although repeated data is not stored again, when the client 300 needs to read the object data, the data de-duplication engine can look up the record of the name and unique identification information of the object data in the second corresponding relationship it maintains, determine from the unique identification information and the first corresponding relationship the name under which the object data is stored in the object storage device, and then read the data from the object storage device and return it to the client 300. For this reason, the correspondence between the name of repeated data and its unique identification information also needs to be recorded in the second corresponding relationship, so that the client 300 can read the data.
When the received object data is colliding data, the second data de-duplication engine feeds back a data collision prompt to the first data de-duplication engine. After receiving the data collision prompt, the first data de-duplication engine writes the object data directly into the object storage device cluster 200 under its original name, and records the name of the object data and the corresponding special identification information in the second corresponding relationship.
In this embodiment, where there are multiple data de-duplication engines, each data de-duplication engine maintains a first corresponding relationship established from the object data for which it has itself completed data de-duplication processing, and a second corresponding relationship established from the object data that it has itself received from the client 300. In this way, the corresponding relationships of the whole system are divided into multiple parts that are stored in different data de-duplication engines. Because the data de-duplication engine is selected according to an established rule, the corresponding relationship of identical object data is maintained only in one determined data de-duplication engine, which saves query overhead and storage overhead while still achieving global de-duplication.
Below, the write processes for non-repeated data, repeated data and colliding data are each described in detail in connection with a practical application scenario. The at least two data de-duplication engines are Eng.1, Eng.2 and Eng.3.
Fig. 8 is a flow diagram of writing non-repeated data. The process may include the following steps.
After performing a hash calculation on the name of object data a (that is, according to the third preset rules), the client 300 sends object data a directly to Eng.2, as shown in step 1 of Fig. 8.
Eng.2 calculates the unique identification information h1 of object data a from the content of the received object data a, calculates a hash value from h1 (that is, according to the second preset rules), and determines that object data a and the corresponding unique identification information h1 are to be forwarded to Eng.1 for processing, as shown in step 2 of Fig. 8.
After receiving object data a and the corresponding unique identification information h1, Eng.1 finds no record of h1 in the first corresponding relationship Hash Table 1 that it maintains, and therefore determines that object data a is non-repeated data. Eng.1 adds a Magic number (or prefix) to h1 to obtain o1, the data name under which object data a is stored in the corresponding object storage device, and inserts the correspondence between h1 and o1 into Hash Table 1. The reference count of h1 is now 1, as shown in step 3 of Fig. 8.
Eng.1 performs a hash calculation on data name o1 (that is, according to the first preset rules) to determine the object storage device OSD.1 that is to store object data a, and stores object data a in OSD.1 with its name changed from a to o1, as shown in step 4 of Fig. 8.
Eng.1 returns a processing success to Eng.2, as shown in step 5 of Fig. 8.
Eng.2 inserts the correspondence between the name a of the object data and the unique identification information h1 into the second corresponding relationship Mapping Table 2 that it maintains, as shown in step 6 of Fig. 8.
Eng.2 returns to the client 300 that object data a was written successfully, as shown in step 7 of Fig. 8.
Fig. 9 is a flow diagram of writing repeated data. The process may include the following steps.
After performing a hash calculation on the name of object data d (that is, according to the third preset rules), the client 300 sends object data d directly to Eng.3, as shown in step 1 of Fig. 9.
Eng.3 calculates the unique identification information h1 of object data d from the content of the received object data d, calculates a hash value from h1 (that is, according to the second preset rules), and determines that object data d and the corresponding unique identification information h1 are to be forwarded to Eng.1 for processing, as shown in step 2 of Fig. 9.
After receiving object data d and the corresponding unique identification information h1, Eng.1 finds a record of h1 in the first corresponding relationship Hash Table 1 that it maintains. According to the data name o1 corresponding to h1, Eng.1 reads the object data named o1 from the corresponding object storage device and compares its content with the content of object data d. The two are completely consistent, so Eng.1 confirms that object data d is repeated data and does not write object data d into an object storage device, as shown in step 3 of Fig. 9.
Eng.1 increases by one the reference count corresponding to h1 in the first corresponding relationship Hash Table 1 that it maintains. The reference count is now 2, indicating that the object data o1 stored in the object storage device is referenced twice, as shown in step 4 of Fig. 9.
Eng.1 returns a processing success to Eng.3, as shown in step 5 of Fig. 9.
Eng.3 inserts the correspondence between the name d of the object data and the unique identification information h1 into the second corresponding relationship Mapping Table 3, as shown in step 6 of Fig. 9.
Eng.3 returns to the client 300 that object data d was written successfully, as shown in step 7 of Fig. 9.
Fig. 10 is a flow diagram of writing colliding data. The process may include the following steps.
After performing a hash calculation on the name of object data k (that is, according to the third preset rules), the client 300 sends object data k directly to Eng.3, as shown in step 1 of Fig. 10.
Eng.3 calculates the unique identification information h1 of object data k from the content of the received object data k, calculates a hash value from h1 (that is, according to the second preset rules), and determines that object data k and the corresponding unique identification information h1 are to be forwarded to Eng.1 for processing, as shown in step 2 of Fig. 10.
After receiving object data k and the corresponding unique identification information h1, Eng.1 finds a record of h1 in the first corresponding relationship Hash Table 1 that it maintains. According to the data name o1 corresponding to h1, Eng.1 reads the object data named o1 from the corresponding object storage device and compares its content with the content of object data k. The two are inconsistent, so Eng.1 determines that object data k is colliding data, as shown in step 3 of Fig. 10.
Eng.1 returns a data collision to Eng.3, as shown in step 4 of Fig. 10.
Eng.3 directly calculates a hash value from the name k of object data k and determines from the hash value the object storage device in which object data k should be stored, as shown in step 5 of Fig. 10. The rule for determining the object storage device can be set according to the specific situation, for example determined directly from the hash value of k; this embodiment places no limitation on it.
Eng.3 inserts the correspondence between the name k of the object data and the special identification information into the second corresponding relationship Mapping Table 3; for example, the unique identification information is the special value "empty", as shown in step 6 of Fig. 10. When data is read, if the unique identification information found in the second corresponding relationship for the name of the object data being looked up is empty, the data being looked up is known to be colliding data, and the storage device to search is determined by the rule used to decide where colliding data is stored.
Eng.3 returns to the client 300 that object data k was written successfully, as shown in step 7 of Fig. 10.
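The three write flows of Figs. 8 to 10 can be traced with a small simulation. This is a sketch under stated assumptions, not the patented implementation: the class and function names are hypothetical, `None` stands in for the "empty" special value, a dictionary stands in for the object storage device cluster, and the fingerprint function is injectable so that a deliberately weak one can force a collision.

```python
import hashlib

COLLISION = None  # stand-in for the "empty" special value


def bucket(text: str, n: int) -> int:
    """Hash a string to an index in [0, n); used for both routing rules."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big") % n


class Engine:
    def __init__(self, engines, cluster, fp_func):
        self.engines, self.cluster, self.fp = engines, cluster, fp_func
        self.first_map = {}   # unique id -> [data name, reference count]
        self.second_map = {}  # client object name -> unique id or COLLISION

    def receive(self, name, content):
        """Role of the first engine: route by content, then record the name."""
        unique_id = self.fp(content)
        owner = self.engines[bucket(unique_id, len(self.engines))]
        verdict = owner.dedup(unique_id, content)
        if verdict == "collision":
            self.cluster[name] = content  # stored under its original name
            self.second_map[name] = COLLISION
        else:
            self.second_map[name] = unique_id
        return verdict

    def dedup(self, unique_id, content):
        """Role of the second engine: consult only its own first_map."""
        entry = self.first_map.get(unique_id)
        if entry is None:
            data_name = "dedup_" + unique_id
            self.cluster[data_name] = content
            self.first_map[unique_id] = [data_name, 1]
            return "non-repeated"
        if self.cluster[entry[0]] == content:
            entry[1] += 1
            return "repeated"
        return "collision"


def client_write(engines, name, content):
    """Client side: pick the receiving engine from the object name."""
    return engines[bucket(name, len(engines))].receive(name, content)
```

With a fingerprint that looks only at the first byte, writing a and d with equal content and k with different content sharing that first byte yields the non-repeated, repeated and collision outcomes in turn.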
In the above embodiments, when the first data de-duplication engine receives object data sent by the client 300, if it is assumed that all received object data is being written for the first time, the first data de-duplication engine can directly calculate the unique identification information from the content of the received object data and then determine, according to the unique identification information, the second data de-duplication engine that carries out the data de-duplication operation. In practical applications, however, the object data received by a data de-duplication engine may not be written for the first time, and for object data that is not being written for the first time the data de-duplication engine needs to execute an update operation. Please refer to Fig. 11, which is a flow diagram of the data processing method provided by another embodiment of the present application. The data processing method in this embodiment can be used to update object data, and is applied to a data de-duplication engine or to a server 100 equipped with a data de-duplication engine. The flow shown in Fig. 11 is described in detail below.
Step S301: receive the object data sent by the client 300.
In this embodiment, the client 300 sends the object data to the corresponding data de-duplication engine according to the third preset rules.
Step S302: if the correspondence between the name of the received object data and the unique identification information of the received object data already has a record in the second corresponding relationship, decrease by one, in the first corresponding relationship, the reference count corresponding to the recorded unique identification information of the received object data.
In this embodiment, when the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are the same engine, the data de-duplication engine, on receiving the object data sent by the client 300, queries the second corresponding relationship it maintains for a record of the name of the received object data and its unique identification information. If there is a record in the second corresponding relationship, the object data is not being written for the first time and an update operation needs to be executed. Because the object data is being updated, it will no longer reference the object data corresponding to the unique identification information recorded in the first corresponding relationship. The data de-duplication engine therefore needs to decrease by one, in the first corresponding relationship it maintains, the reference count corresponding to the recorded unique identification information of the received object data.
In this embodiment, when the data de-duplication engine that receives the object data (for example, a third data de-duplication engine) and the data de-duplication engine that executes the update (for example, a fourth data de-duplication engine) are not the same engine, the third data de-duplication engine receives the object data sent by the client 300 according to the third preset rules. If the correspondence between the name of the received object data and the unique identification information of the received object data already has a record in the second corresponding relationship maintained by the third data de-duplication engine, the fourth data de-duplication engine is determined according to the recorded unique identification information of the received object data, and in the first corresponding relationship maintained by the fourth data de-duplication engine the reference count corresponding to that recorded unique identification information is decreased by one.
Step S303, when the corresponding reference meter of the unique identification information for having received object data described in record When number is 0, sent out to the object storage device for the unique identification information corresponding objects data for being stored with the received object data Send the instruction for deleting the corresponding objects data.
In the present embodiment, when the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are the same engine, a reference count that has been reduced to 0 indicates that no object data of the client 300 still references the object data stored in the object storage device for that unique identification information in the first corresponding relationship. That object data must therefore be deleted from the object storage device cluster 200: an instruction to delete the corresponding object data is sent to the object storage device that stores the object data corresponding to the unique identification information of the received object data, and the object storage device deletes the corresponding object data and releases the storage space.
In the present embodiment, when there are at least two data de-duplication engines, in one optional implementation the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are the same engine; see the description in the preceding paragraph. In another optional implementation, when the data de-duplication engine that receives the object data (for example, the third data de-duplication engine) and the data de-duplication engine that executes the update (for example, the fourth data de-duplication engine) are not the same engine, if the reference count of the unique identification information of the received object data in the first corresponding relationship maintained by the fourth data de-duplication engine is 0, an instruction to delete the corresponding object data is sent to the object storage device that stores the object data corresponding to the unique identification information of the received object data.
Step S304: if the corresponding relationship between the title of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship, recalculate the unique identification information of the received object data, and in the second corresponding relationship update the unique identification information corresponding to the title of the received object data to the recalculated unique identification information of the received object data.
In the present embodiment, after the reference count corresponding to the recorded unique identification information of the received object data has been decremented by one, the unique identification information of the object data must be recalculated, because the object data is being updated. Data de-duplication processing is then performed on the object data according to the recalculated unique identification information in order to write the data. That is, the engine queries the first corresponding relationship it maintains for a record of the recalculated unique identification information, and thereby judges whether the received object data is repeated data, non-repeated data or colliding data. After the data de-duplication processing is completed, the unique identification information corresponding to the title of the received object data in the second corresponding relationship is updated to the recalculated unique identification information of the received object data. If the data de-duplication processing finds that the object data is colliding data, the unique identification information corresponding to the title of the received object data in the second corresponding relationship is instead updated to the specific identification information.
In the present embodiment, as described previously, when there are at least two data de-duplication engines, in one optional implementation the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are the same engine; see the description in the preceding paragraph. In another optional implementation, when the data de-duplication engine that receives the object data (for example, the third data de-duplication engine) and the data de-duplication engine that executes the update (for example, the fourth data de-duplication engine) are not the same engine, the third data de-duplication engine recalculates the unique identification information of the object data and, according to the recalculated unique identification information, determines that the object data is to be sent to the fourth data de-duplication engine (or another data de-duplication engine), which performs data de-duplication processing on the object data in order to write it. After the fourth data de-duplication engine completes the data de-duplication processing, the third data de-duplication engine updates, in the second corresponding relationship it maintains, the unique identification information corresponding to the title of the received object data to the recalculated unique identification information of the received object data. If, however, the fourth data de-duplication engine judges during data de-duplication processing that the object data is colliding data, the third data de-duplication engine updates, in the second corresponding relationship it maintains, the unique identification information corresponding to the title of the received object data to the specific identification information.
It should be noted that if the data de-duplication engine finds the title of the received object data in the second corresponding relationship it maintains, and the entry corresponding to that title is the specific identification information, it can be determined that the object data is already stored in an object storage device and was written as colliding data. To update colliding data, the corresponding object data is first deleted from the object storage device cluster 200 based on the title of the received object data; the unique identification information of the received object data is then recalculated, and the received object data is written according to the recalculated unique identification information.
It is readily understood that, when the data de-duplication engine does not find the unique identification information of the received object data in the second corresponding relationship it maintains, the object data is being written for the first time, and the processing flow may refer to the corresponding content of the data processing method described in the preceding embodiments.
In the present embodiment, when the data de-duplication engine receives object data sent by the client 300, it can determine whether the received object data is already stored in the object storage device cluster 200 by querying the second corresponding relationship it maintains for a record of the unique identification information of the received object data. If the unique identification information of the received object data is found, the object data is already stored in the object storage device cluster 200 and an update operation on the object data must be performed. If the unique identification information of the received object data is not found, the engine calculates the unique identification information of the object data and performs data de-duplication processing according to the calculated unique identification information in order to write the object data; the processing flow may refer to the corresponding content of the data processing method described in the preceding embodiments.
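The single-engine update path described above can be sketched as follows. This is only an illustrative model under stated assumptions: the class `DedupEngine`, its field names, the use of SHA-256 as the unique identification information, the `o-` data-name scheme, and the removal of a record once its reference count reaches 0 are hypothetical and are not taken from the embodiment. Handling of colliding data is omitted.

```python
import hashlib


class DedupEngine:
    """Hypothetical model of one data de-duplication engine."""

    def __init__(self):
        # First corresponding relationship:
        # unique identification information -> [data name, reference count]
        self.hash_table = {}
        # Second corresponding relationship:
        # object title -> unique identification information
        self.mapping_table = {}
        # Stand-in for the object storage device cluster (data name -> bytes)
        self.store = {}

    @staticmethod
    def fingerprint(data: bytes) -> str:
        # Assumption: SHA-256 plays the role of the unique identification information.
        return hashlib.sha256(data).hexdigest()

    def _release(self, fp: str) -> None:
        # Decrement the reference count; at 0, delete the stored object (step S303).
        entry = self.hash_table[fp]
        entry[1] -= 1
        if entry[1] == 0:
            del self.store[entry[0]]
            del self.hash_table[fp]

    def write(self, title: str, data: bytes) -> None:
        old_fp = self.mapping_table.get(title)
        if old_fp is not None:
            # The title is already recorded, so this write is an update.
            self._release(old_fp)
        # Recalculate the identifier and run de-duplication (step S304).
        fp = self.fingerprint(data)
        entry = self.hash_table.get(fp)
        if entry is None:
            data_name = "o-" + fp[:8]  # hypothetical data-name scheme
            self.store[data_name] = data
            self.hash_table[fp] = [data_name, 1]
        else:
            entry[1] += 1  # repeated data: only add a reference
        self.mapping_table[title] = fp
```

In this sketch a first write and an update share one entry point; the only difference is whether the title is already present in the second corresponding relationship.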
In the present embodiment, to make the process of updating object data clearer, the process is described in detail below with reference to a practical application scenario. Referring to Figure 12, the process may include the following steps.
The client 300 performs a hash calculation on the title of object data a (that is, according to the third preset rules) and then sends object data a directly to Eng.2, as shown in step one of Figure 12.
Eng.2 finds the unique identification information h1 of object data a in the second corresponding relationship Mapping Table 2 that it maintains, which indicates that this is an update operation. According to the unique identification information h1 found, Eng.2 sends a request containing h1 to Eng.1 for processing, as shown in step two of Figure 12.
Eng.1 finds the record of unique identification information h1 in the first corresponding relationship Hash Table 1 that it maintains and decrements the reference count of h1 by one, so that the reference count is reduced from 2 to 1. Originally, object data a and object data d both corresponded to h1; now that object data a is being updated, it may no longer correspond to h1, but object data d still does, so the object data o1 stored in the object storage device is not deleted at this point. It should be noted that if the reference count corresponding to h1 were 0, no object data would correspond to h1 and object data o1 would no longer need to be stored; the object data o1 corresponding to h1 would then have to be deleted from the object storage device cluster 200, as shown in step three of Figure 12.
Eng.1 returns a processing-complete response to Eng.2, as shown in step four of Figure 12.
Eng.2 recalculates the unique identification information of object data a and obtains h8. From the unique identification information h8 (that is, according to the second preset rules), it determines that object data a should be forwarded to Eng.3 for processing, as shown in step five of Figure 12.
After Eng.3 receives object data a, it finds no record of unique identification information h8 in the first corresponding relationship Hash Table 3 that it maintains, and therefore determines that object data a is not repeated data. According to h8, it determines that the data name under which object data a is stored in the corresponding object storage device is o8, and inserts the corresponding relationship between h8 and data name o8 into the first corresponding relationship Hash Table 3; the reference count of h8 is now 1, as shown in step six of Figure 12.
Eng.3 stores object data a in the corresponding object storage device according to the first preset rules; the data name of object data a in that object storage device is o8, as shown in step seven of Figure 12.
Eng.3 returns a processing-success response to Eng.2, as shown in step eight of Figure 12.
Eng.2 updates the corresponding relationship between the title a of the object data and unique identification information h1 to the corresponding relationship between the title a of the object data and unique identification information h8, as shown in step nine of Figure 12.
Eng.2 returns to the client 300 a response indicating that object data a was written successfully, as shown in step ten of Figure 12.
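The ten steps of the Figure 12 scenario can be sketched for several cooperating engines as follows. The sketch rests on assumptions that are not part of the embodiment: `route` is a hypothetical stand-in for both the second and third preset rules (a SHA-256 hash reduced modulo the engine count), SHA-256 also serves as the unique identification information, engines call each other directly instead of exchanging messages, and colliding data is not handled.

```python
import hashlib


def route(key: str, engine_count: int) -> int:
    """Hypothetical preset rule: hash a key to an engine index."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % engine_count


class Engine:
    def __init__(self, cluster: list, store: dict):
        self.cluster = cluster   # shared list of all engines
        self.store = store       # stand-in for object storage device cluster 200
        self.hash_table = {}     # first corresponding relationship
        self.mapping_table = {}  # second corresponding relationship

    def release(self, fp: str) -> None:
        """Steps 2-4: decrement the count; delete the object when it reaches 0."""
        entry = self.hash_table[fp]
        entry[1] -= 1
        if entry[1] == 0:
            del self.store[entry[0]]
            del self.hash_table[fp]

    def dedup_write(self, fp: str, data: bytes) -> None:
        """Steps 5-8: store non-repeated data, or add a reference to repeated data."""
        entry = self.hash_table.get(fp)
        if entry is None:
            data_name = "o-" + fp[:8]  # hypothetical data-name scheme
            self.store[data_name] = data
            self.hash_table[fp] = [data_name, 1]
        else:
            entry[1] += 1

    def handle_client_write(self, title: str, data: bytes) -> None:
        n = len(self.cluster)
        old_fp = self.mapping_table.get(title)
        if old_fp is not None:
            # Update: the engine that owns the old identifier releases one reference.
            self.cluster[route(old_fp, n)].release(old_fp)
        new_fp = hashlib.sha256(data).hexdigest()
        self.cluster[route(new_fp, n)].dedup_write(new_fp, data)
        self.mapping_table[title] = new_fp  # step 9


def make_cluster(engine_count: int):
    store, cluster = {}, []
    for _ in range(engine_count):
        cluster.append(Engine(cluster, store))
    return cluster, store


def client_write(cluster: list, title: str, data: bytes) -> None:
    # Step 1: the client picks the receiving engine from the object title.
    cluster[route(title, len(cluster))].handle_client_write(title, data)
```

Because the receiving engine is chosen by title and the owning engine by identifier, the second corresponding relationship for a given title and the first corresponding relationship for a given identifier each live on exactly one engine.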
In the present embodiment, when the client 300 needs to read object data, the data de-duplication engine can obtain the object data from the object storage device cluster 200 according to the first corresponding relationship and the second corresponding relationship that it maintains, and feed the object data back to the client 300. Figure 13 is a schematic flowchart of the data processing method provided by another embodiment of the present application; the method is applied to a data de-duplication engine or to the server 100 equipped with a data de-duplication engine. The data processing method in the present embodiment is used to obtain (that is, read) object data, and the process shown in Figure 13 is described in detail below.
Step S401: receive the read request for target object data sent by the client 300.
In the present embodiment, when the client 300 needs to read target object data, it sends the read request for the target object data to the corresponding data de-duplication engine according to the third preset rules, that is, according to the title of the target object data. The client 300 thus uses the third preset rules both when writing, to send object data to the corresponding data de-duplication engine, and when reading, to send the read request for the target object data to the corresponding data de-duplication engine. This ensures that the client 300 can always obtain the target object data through that data de-duplication engine.
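A minimal sketch of why using one rule for both writes and reads is sufficient: any deterministic function of the object title sends both kinds of request to the same engine. The function `pick_engine`, the choice of SHA-256 and the modulo reduction are assumptions made for illustration; the embodiment states only that the engine is selected by a hash calculation on the title.

```python
import hashlib


def pick_engine(object_title: str, engine_count: int) -> int:
    """Hypothetical third preset rule: map an object title to an engine index."""
    digest = hashlib.sha256(object_title.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to an index.
    return int.from_bytes(digest[:8], "big") % engine_count


# The write path and the read path call the same function with the same title,
# so they always reach the engine that holds the title's record.
write_target = pick_engine("d", 3)
read_target = pick_engine("d", 3)
```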
Step S402: query the second corresponding relationship according to the name of the target object data.
In the present embodiment, when the data de-duplication engine receives the read request for the target object data sent by the client 300, it queries the second corresponding relationship it maintains according to the name of the target object data.
When there are at least two data de-duplication engines, including a fifth data de-duplication engine and a sixth data de-duplication engine, the fifth data de-duplication engine, on receiving the read request for the target object data, queries the second corresponding relationship maintained by the fifth data de-duplication engine. In the present embodiment, it is assumed that the client 300 sends the read request for the target object data to the fifth data de-duplication engine according to the third preset rules.
Step S403: when the unique identification information of the target object data is found in the second corresponding relationship, query the first corresponding relationship for the data name under which the target object data corresponding to that unique identification information is stored in the object storage device.
When the data de-duplication engine that receives the read request and the data de-duplication engine that executes the read request are the same engine, that engine finds the unique identification information of the target object data in the second corresponding relationship and then, according to the first corresponding relationship, finds the data name under which the target object data corresponding to that unique identification information is stored in the object storage device.
When the data de-duplication engine that receives the read request (for example, the fifth data de-duplication engine) and the data de-duplication engine that executes the read request (for example, the sixth data de-duplication engine) are not the same engine, and the fifth data de-duplication engine finds the unique identification information of the target object data in the second corresponding relationship it maintains, the sixth data de-duplication engine is determined according to the second preset rules and the read request for the target object data is sent to the sixth data de-duplication engine. In the first corresponding relationship maintained by the sixth data de-duplication engine, the sixth data de-duplication engine determines, from the unique identification information of the target object data, the data name under which the target object data is stored in the object storage device.
Step S404: read the target object data according to the data name under which it is stored in the object storage device, and feed it back to the client 300.
When the data de-duplication engine that receives the read request and the data de-duplication engine that executes the read request are the same engine, that engine reads the target object data according to the data name under which it is stored in the object storage device and feeds it back to the client 300.
When the data de-duplication engine that receives the read request (for example, the fifth data de-duplication engine) and the data de-duplication engine that executes the read request (for example, the sixth data de-duplication engine) are not the same engine, the sixth data de-duplication engine reads the target object data from the object storage device according to the data name under which it is stored there and feeds the target object data back to the fifth data de-duplication engine, which in turn feeds it back to the client 300.
In the present embodiment, whether the client 300 is reading or writing object data, it uses the same rule to send the request to a data de-duplication engine for processing, so requests for object data with the same title are sent to the same data de-duplication engine. The data de-duplication engine can query the second corresponding relationship it maintains for the unique identification information corresponding to the title of the target object data, then query the first corresponding relationship, according to that unique identification information, for the data name under which the target object data is stored in the object storage device, and finally read the target object data and feed it back to the client 300, thereby obtaining the object data.
It should be noted that when the unique identification information that the data de-duplication engine finds in the second corresponding relationship it maintains, according to the title of the target object data, is the specific identification information, it can be determined that the target object data is colliding data. The data de-duplication engine can then read the target object data directly from the object storage device cluster 200 according to the title of the target object data and feed it back to the client 300.
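The read path on a single engine, including the colliding-data case, can be sketched as follows. The function `read_object`, the sentinel value `COLLISION_MARK` (standing in for the specific identification information) and the use of plain dictionaries for the two corresponding relationships and for the object storage device cluster are all hypothetical; the sketch also assumes, as the paragraph above suggests, that colliding data is stored under the object title itself.

```python
# Hypothetical sentinel standing in for the specific identification information.
COLLISION_MARK = "__collision__"


def read_object(title, mapping_table, hash_table, store):
    """Resolve title -> identifier -> data name -> stored bytes."""
    # Second corresponding relationship: object title -> identifier.
    fp = mapping_table.get(title)
    if fp is None:
        raise KeyError(f"object {title!r} has never been written")
    if fp == COLLISION_MARK:
        # Colliding data bypasses the first corresponding relationship and is
        # read directly by title (assumption about how it was stored).
        return store[title]
    # First corresponding relationship: identifier -> (data name, reference count).
    data_name, _ref_count = hash_table[fp]
    return store[data_name]


# Usage with values modelled on the figures: d maps to h1, which is stored as o1.
mapping_table = {"d": "h1", "c": COLLISION_MARK}
hash_table = {"h1": ("o1", 1)}
store = {"o1": b"payload of o1", "c": b"raw colliding payload"}
```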
In the present embodiment, to make the process of obtaining object data clearer, the process is described in detail below with reference to a practical application scenario. As shown in Figure 14, the process of obtaining object data may include the following steps.
The client 300 performs a hash calculation on the title of target object data d (that is, according to the third preset rules) and then sends the read request for target object data d directly to Eng.3, as shown in step one of Figure 14.
Eng.3 finds the unique identification information h1 of target object data d in the second corresponding relationship Mapping Table 3 that it maintains, calculates a hash value from h1, determines from it that the read request is to be handled by Eng.1, and forwards the read request to Eng.1, as shown in step two of Figure 14.
After Eng.1 receives the read request for target object data d, it finds the record of unique identification information h1 in the first corresponding relationship Hash Table 1 that it maintains and reads the target object data whose data name is o1 from the corresponding object storage device, as shown in step three of Figure 14.
Eng.1 returns the target object data it has read to Eng.3, as shown in step four of Figure 14.
Eng.3 returns the target object data to the client 300, as shown in step five of Figure 14.
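The five steps of the Figure 14 scenario can be sketched as below. The names `ReadEngine` and `forwarded_read`, the injected routing callables (stand-ins for the third and second preset rules) and the `trace` list are hypothetical; direct function calls replace the requests exchanged between engines.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ReadEngine:
    name: str
    # Second corresponding relationship: object title -> identifier.
    mapping_table: Dict[str, str] = field(default_factory=dict)
    # First corresponding relationship: identifier -> (data name, reference count).
    hash_table: Dict[str, Tuple[str, int]] = field(default_factory=dict)


def forwarded_read(
    engines: Dict[str, ReadEngine],
    store: Dict[str, bytes],
    route_by_title: Callable[[str], str],
    route_by_identifier: Callable[[str], str],
    title: str,
    trace: List[str],
) -> bytes:
    # Step 1: the client selects the receiving engine from the object title.
    receiver = engines[route_by_title(title)]
    trace.append(receiver.name)
    # Step 2: the receiver resolves the identifier and selects the owning engine.
    identifier = receiver.mapping_table[title]
    owner = engines[route_by_identifier(identifier)]
    trace.append(owner.name)
    # Step 3: the owner resolves the data name and reads from object storage.
    data_name, _ref_count = owner.hash_table[identifier]
    data = store[data_name]
    # Steps 4-5: the data travels back through the receiver to the client.
    return data


# Values modelled on Figure 14: d -> h1 on Eng.3, h1 -> o1 on Eng.1.
engines = {
    "Eng.1": ReadEngine("Eng.1", hash_table={"h1": ("o1", 1)}),
    "Eng.3": ReadEngine("Eng.3", mapping_table={"d": "h1"}),
}
store = {"o1": b"payload of o1"}
trace: List[str] = []
result = forwarded_read(
    engines, store, {"d": "Eng.3"}.__getitem__, {"h1": "Eng.1"}.__getitem__, "d", trace
)
```

The fixed lookup tables used for routing here only reproduce the figure; in a real deployment both routes would be computed by the preset rules.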
Figure 15 is a functional block diagram of the data processing device 400 provided by another embodiment of the present application. The data processing device 400 includes one data de-duplication engine or at least two data de-duplication engines, which are stored in the memory 110 and executed by the processor 120. It should be noted that the basic principle and technical effects of the data processing device 400 provided by the present embodiment are the same as those of the data processing method described in the above embodiments; for brevity, for anything not mentioned in the present embodiment, reference may be made to the corresponding content of the data processing method provided by the above embodiments.
In the present embodiment, the data de-duplication engine may include a receiving module 410, a data write-in processing module 420, a sending module 430 and a corresponding relationship maintenance module 440. It should be noted that the data processing device 400 in the present embodiment covers both the case of one data de-duplication engine and the case of at least two data de-duplication engines, and that each data de-duplication engine includes the same functional modules or units. Therefore, in the present embodiment, the reading and writing of object data are described only from the perspective of a single data de-duplication engine.
When the client 300 needs to write object data, if the data de-duplication engine that receives the object data and the data de-duplication engine that executes the data de-duplication processing are the same engine, the receiving module 410, data write-in processing module 420, sending module 430 and corresponding relationship maintenance module 440 that carry out the write are modules of the same data de-duplication engine.
The receiving module 410 is configured to receive the object data sent by the client 300.
The data write-in processing module 420 is configured to perform data de-duplication processing on the received object data in order to judge whether the received object data is repeated data in the object storage device cluster 200.
After the data de-duplication processing is completed, the sending module 430 is configured to send the received object data, when it is not repeated data, to the corresponding object storage device in the object storage device cluster 200 according to the first preset rules.
The corresponding relationship maintenance module 440 is configured to maintain the first corresponding relationship and, when the received object data is not repeated data, to insert into the first corresponding relationship the corresponding relationship between the unique identification information of the received object data and the data name under which the received object data is stored in the corresponding object storage device.
Optionally, the corresponding relationship maintenance module 440 is further configured to maintain the second corresponding relationship, which comprises the corresponding relationship, maintained by the data de-duplication engine, between the title of object data received from the client 300 and the unique identification information of that object data. If the corresponding relationship between the title of the received object data and the unique identification information of the received object data is not recorded in the second corresponding relationship, the corresponding relationship maintenance module 440 inserts it into the second corresponding relationship.
In the present embodiment, when there are at least two data de-duplication engines, the corresponding relationship maintenance module 440 of each data de-duplication engine is configured to maintain the first corresponding relationship established for the object data on which the current data de-duplication engine has completed data de-duplication processing, the current data de-duplication engine being the engine in which that corresponding relationship maintenance module 440 is located. If the data de-duplication engine that receives the object data and the data de-duplication engine that executes the data de-duplication processing are not the same engine, the receiving module 410, data write-in processing module 420, sending module 430 and corresponding relationship maintenance module 440 that carry out the write are modules of different data de-duplication engines.
When the receiving module 410 receives the object data sent by the client 300, the data write-in processing module 420 of the data de-duplication engine in which that receiving module 410 is located is further configured to analyze the received object data according to the second preset rules in order to determine the target data de-duplication engine that is to execute the data de-duplication processing, and to forward the received object data to the target data de-duplication engine.
The data write-in processing module 420 of the target data de-duplication engine is configured to perform data de-duplication processing on the received object data, based on the first corresponding relationship maintained by the target data de-duplication engine, in order to judge whether the object data is repeated data.
The sending module 430 of the target data de-duplication engine is configured to send the received object data, when it is not repeated data, to the corresponding object storage device in the object storage device cluster 200 according to the first preset rules.
The corresponding relationship maintenance module 440 of the target data de-duplication engine is configured to insert, when the received object data is judged not to be repeated data, the corresponding relationship between the unique identification information of the received object data and the data name under which the received object data is stored in the corresponding object storage device into the first corresponding relationship maintained by the corresponding relationship maintenance module 440 of the target data de-duplication engine.
If the corresponding relationship between the title of the received object data and the unique identification information of the received object data is not recorded in the second corresponding relationship, the corresponding relationship maintenance module 440 of the data de-duplication engine that receives the object data inserts it into the second corresponding relationship.
In the present embodiment, whether an update operation is to be executed on the object data can be determined by judging whether the received object data is already recorded in the second corresponding relationship. When the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are the same engine, the corresponding relationship maintenance module 440 is configured to decrement by one, in the first corresponding relationship, the reference count corresponding to the recorded unique identification information of the received object data if the corresponding relationship between the title of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship.
In the present embodiment, when the data processing device 400 has at least two data de-duplication engines and the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are not the same engine, the data write-in processing module 420 of the data de-duplication engine that receives the object data is configured as follows: if the corresponding relationship between the title of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship maintained by the current data de-duplication engine in which that data write-in processing module 420 is located, it determines the target data de-duplication engine according to the second preset rules and sends the recorded unique identification information of the received object data to the target data de-duplication engine.
The corresponding relationship maintenance module 440 of the target data de-duplication engine is configured to decrement by one, in the first corresponding relationship maintained by the target data de-duplication engine, the reference count corresponding to the recorded unique identification information of the received object data.
The sending module 430 is further configured to send, when the reference count corresponding to the recorded unique identification information of the received object data is 0, an instruction to delete the corresponding object data to the object storage device that stores the object data corresponding to the unique identification information of the received object data.
In the present embodiment, when the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are the same engine, the sending module 430 of the data de-duplication engine that receives the object data is configured to send the instruction to delete the corresponding object data to the object storage device that stores the object data corresponding to the unique identification information of the received object data.
When the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are not the same engine, the sending module 430 of the target data de-duplication engine is configured to send, if the reference count of the unique identification information of the received object data in the first corresponding relationship maintained by the target data de-duplication engine is 0, the instruction to delete the corresponding object data to the object storage device that stores the object data corresponding to the unique identification information of the received object data.
In the present embodiment, the data write-in processing module 420 is further configured to recalculate the unique identification information of the received object data, if the corresponding relationship between the title of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship, and to perform data de-duplication processing according to the recalculated unique identification information.
The corresponding relationship maintenance module 440 is further configured to update, in the second corresponding relationship, the unique identification information corresponding to the title of the received object data to the recalculated unique identification information of the received object data. In the present embodiment, when the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are the same engine, the data write-in processing module 420 of the data de-duplication engine that receives the object data is configured to recalculate the unique identification information of the received object data and to perform data de-duplication processing according to the recalculated unique identification information; the corresponding relationship maintenance module 440 of the same engine is configured to update, in the second corresponding relationship, the unique identification information corresponding to the title of the received object data to the recalculated unique identification information of the received object data.
When the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are not the same engine, the data write-in processing module 420 of the data de-duplication engine that receives the object data is configured to recalculate the unique identification information of the received object data and, according to the recalculated unique identification information, to determine the target data de-duplication engine and send the object data to it for data de-duplication processing, which judges whether the object data is repeated data. After the target data de-duplication engine completes the data de-duplication processing, the corresponding relationship maintenance module 440 of the data de-duplication engine that receives the object data updates, in the second corresponding relationship, the unique identification information corresponding to the title of the received object data to the recalculated unique identification information of the received object data.
Optionally, in this embodiment, the receiving module 410 is further configured to receive a read request for target object data sent by the client 300.
So that the client 300 can obtain object data, the data de-duplication engine further includes a reading data processing module 450. The reading data processing module 450 is configured to query the second corresponding relationship by the name of the target object data; when the unique identification information of the target object data is found in the second corresponding relationship, to query, according to the first corresponding relationship, the data name under which the target object data corresponding to that unique identification information is stored in the object storage device; and to read the target object data by that data name and feed the target object data back to the client 300.
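The two-step lookup performed by the reading data processing module 450 can be sketched as follows. This is a minimal single-engine sketch under stated assumptions: the class `SingleEngineReader` is hypothetical, and a plain dictionary stands in for the object storage device.

```python
from typing import Dict, Optional


class SingleEngineReader:
    """Hypothetical read path of one data de-duplication engine."""

    def __init__(self, store: Dict[str, bytes]) -> None:
        self.store = store                          # stand-in for the object storage device
        self.name_to_fp: Dict[str, str] = {}        # second corresponding relationship
        self.fp_to_data_name: Dict[str, str] = {}   # first corresponding relationship

    def read(self, object_name: str) -> Optional[bytes]:
        # Step 1: object name -> unique identification information.
        fp = self.name_to_fp.get(object_name)
        if fp is None:
            return None
        # Step 2: unique identification information -> data name in storage.
        data_name = self.fp_to_data_name.get(fp)
        if data_name is None:
            return None
        # Step 3: read the object data by its data name.
        return self.store.get(data_name)
```

Because two object names may map to the same unique identification information, both names resolve to the same stored data in this sketch.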
In this embodiment, based on the foregoing case in which there are at least two data de-duplication engines: in one optional implementation, when the data de-duplication engine that receives the read request and the data de-duplication engine that executes the read request are the same engine, refer to the description in the preceding paragraph. In another optional implementation, when the data de-duplication engine that receives the read request and the data de-duplication engine that executes the read request are not the same engine, the reading data processing module 450 is specifically configured to query the second corresponding relationship maintained by the current data de-duplication engine in which the reading data processing module 450 resides; when the unique identification information of the target object data is found, it determines the target data de-duplication engine according to the second preset rule and sends the read request for the target object data to the target data de-duplication engine.
The reading data processing module 450 is further configured to receive the target object data fed back by the target data de-duplication engine, and to feed the target object data back to the client 300.
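The forwarding of a read request between engines can be sketched as follows. This is only an illustration: the class `Engine`, the function `pick_engine`, and the choice of "hexadecimal identification information modulo the number of engines" as a stand-in for the second preset rule are assumptions, since this passage does not specify the rule.

```python
from typing import Dict, List, Optional


def pick_engine(fp: str, engine_count: int) -> int:
    """Stand-in for the second preset rule (assumption): route by the
    hexadecimal unique identification information modulo the engine count."""
    return int(fp, 16) % engine_count


class Engine:
    """Hypothetical data de-duplication engine with both corresponding relationships."""

    def __init__(self, store: Dict[str, bytes]) -> None:
        self.store = store                          # shared stand-in for object storage
        self.peers: List["Engine"] = []             # all engines, including this one
        self.name_to_fp: Dict[str, str] = {}        # second corresponding relationship
        self.fp_to_data_name: Dict[str, str] = {}   # first corresponding relationship

    def read_by_fingerprint(self, fp: str) -> Optional[bytes]:
        # Executed on the target engine: identification information -> data name -> data.
        data_name = self.fp_to_data_name.get(fp)
        return None if data_name is None else self.store.get(data_name)

    def handle_read(self, object_name: str) -> Optional[bytes]:
        # Executed on the receiving engine: query its own second corresponding relationship.
        fp = self.name_to_fp.get(object_name)
        if fp is None:
            return None
        # Determine the target engine and forward the read request to it.
        target = self.peers[pick_engine(fp, len(self.peers))]
        # The data fed back by the target engine is returned to the client.
        return target.read_by_fingerprint(fp)
```

When `pick_engine` selects the receiving engine itself, the same code path covers the single-engine case described in the preceding paragraph.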
In summary, in the data processing method, device and server provided by the embodiments of the present application, the client sends object data directly to a data de-duplication engine, and the data de-duplication engine performs data de-duplication processing on the received object data according to the first corresponding relationship and the second corresponding relationship that it maintains. Identical object data is always addressed to the same target data de-duplication engine for de-duplication processing, which guarantees that object data with identical content is written to the object storage device cluster only once. This achieves global de-duplication and effectively reduces the storage space used by the object storage system.
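The global de-duplication effect summarized above can be sketched as follows. This is a simplified model, not the implementation of the embodiment: the class `Cluster`, SHA-256 as the unique identification information, the modulo routing rule, and the `obj-` data name prefix are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List


class Cluster:
    """Hypothetical model of several data de-duplication engines sharing one storage cluster."""

    def __init__(self, engine_count: int) -> None:
        self.store: Dict[str, bytes] = {}   # stand-in for the object storage device cluster
        # Per-engine first corresponding relationship: identification information -> data name.
        self.first: List[Dict[str, str]] = [dict() for _ in range(engine_count)]
        # Per-engine second corresponding relationship: object name -> identification information.
        self.second: List[Dict[str, str]] = [dict() for _ in range(engine_count)]
        self.storage_writes = 0

    def _target(self, fp: str) -> int:
        # Assumed routing rule: identical content always maps to the same engine.
        return int(fp, 16) % len(self.first)

    def write(self, receiving_engine: int, name: str, data: bytes) -> bool:
        """Write an object through the given receiving engine; return True if it was a duplicate."""
        fp = hashlib.sha256(data).hexdigest()
        mapping = self.first[self._target(fp)]
        duplicate = fp in mapping
        if not duplicate:
            data_name = "obj-" + fp          # data name derived from the identification information
            self.store[data_name] = data
            self.storage_writes += 1
            mapping[fp] = data_name
        # The receiving engine records the name in its own second corresponding relationship.
        self.second[receiving_engine][name] = fp
        return duplicate
```

Writing the same content through two different receiving engines under two different names results in a single write to storage, because both requests are routed to the same target engine.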
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The foregoing is merely a preferred embodiment of the present application and is not intended to limit the present application. For those skilled in the art, various modifications and changes may be made to the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall fall within the scope of protection of the present application. It should also be noted that similar reference numerals and letters denote similar items in the drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.

Claims (16)

1. A data processing method, characterized in that it is applied to a data de-duplication engine in an object storage system, the object storage system comprising a server on which the data de-duplication engine is installed and an object storage device cluster composed of at least one object storage device, the data de-duplication engine being communicatively connected to the at least one object storage device, the data de-duplication engine maintaining a first corresponding relationship, the first corresponding relationship comprising the corresponding relationship, maintained by the data de-duplication engine, between the unique identification information of object data and the data name under which the object data is stored in an object storage device, the method comprising:
receiving object data sent by a client;
performing data de-duplication processing on the received object data according to the first corresponding relationship, to judge whether the received object data is repeated data in the object storage device cluster;
when the received object data is not repeated data, obtaining, according to the unique identification information of the received object data, the data name under which the received object data is to be stored in the object storage device cluster, sending the received object data to the corresponding object storage device in the object storage device cluster according to the obtained data name, and inserting into the first corresponding relationship the corresponding relationship between the unique identification information of the received object data and the data name under which the received object data is stored in the corresponding object storage device.
2. The data processing method according to claim 1, characterized in that, when there are at least two data de-duplication engines, each data de-duplication engine maintains a first corresponding relationship established on the basis of the object data for which that engine itself has completed data de-duplication processing;
the performing data de-duplication processing on the received object data comprises:
when a first data de-duplication engine receives the object data sent by the client, determining, by a second preset rule, a second data de-duplication engine that is to execute the data de-duplication processing operation, and sending the received object data to the second data de-duplication engine;
the second data de-duplication engine performing data de-duplication processing on the received object data on the basis of the first corresponding relationship maintained by the second data de-duplication engine.
3. The data processing method according to claim 2, characterized in that the inserting, when the received object data is not repeated data, into the first corresponding relationship the corresponding relationship between the unique identification information of the received object data and the data name under which the received object data is stored in the corresponding object storage device comprises:
when the received object data is not repeated data, the second data de-duplication engine inserting, into the first corresponding relationship maintained by the second data de-duplication engine, the corresponding relationship between the unique identification information of the received object data and the data name under which the received object data is stored in the corresponding object storage device.
4. The data processing method according to claim 1, characterized in that the data de-duplication engine maintains a second corresponding relationship, the second corresponding relationship comprising the corresponding relationship, maintained by the data de-duplication engine, between the name of object data received from the client and the unique identification information of that object data, the method further comprising:
if the corresponding relationship between the name of the received object data and the unique identification information of the received object data is not recorded in the second corresponding relationship, inserting the corresponding relationship between the name of the received object data and the unique identification information of the received object data into the second corresponding relationship.
5. The data processing method according to claim 4, characterized in that the method further comprises:
if the corresponding relationship between the name of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship, recalculating the unique identification information of the received object data, and updating, in the second corresponding relationship, the unique identification information associated with the name of the received object data to the recalculated unique identification information of the received object data.
6. The data processing method according to claim 4, characterized in that the first corresponding relationship further comprises a reference count corresponding to the unique identification information of object data, the reference count being incremented by one for the unique identification information of an object data each time that object data is judged to be repeated data, the method further comprising:
if the corresponding relationship between the name of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship, decrementing by one, in the first corresponding relationship, the reference count corresponding to the already recorded unique identification information of the received object data;
when the reference count corresponding to the already recorded unique identification information of the received object data is 0, sending, to the object storage device that stores the object data corresponding to that unique identification information, an instruction to delete the corresponding object data.
7. The data processing method according to claim 6, characterized in that, when there are at least two data de-duplication engines, each data de-duplication engine maintains a first corresponding relationship established on the basis of the object data for which that engine itself has completed data de-duplication processing, and maintains a second corresponding relationship established on the basis of the object data that the engine itself has received from the client;
the decrementing by one, in the first corresponding relationship, the reference count corresponding to the already recorded unique identification information of the received object data, if the corresponding relationship between the name of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship, comprises:
a third data de-duplication engine receiving the object data sent by the client according to a third preset rule, and, if the corresponding relationship between the name of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship maintained by the third data de-duplication engine, determining a fourth data de-duplication engine according to the already recorded unique identification information of the received object data;
decrementing by one, in the first corresponding relationship maintained by the fourth data de-duplication engine, the reference count corresponding to the already recorded unique identification information of the received object data.
8. The data processing method according to claim 4, characterized in that the method further comprises:
when a read request for target object data sent by the client is received, querying the second corresponding relationship according to the name of the target object data;
when the unique identification information of the target object data is found in the second corresponding relationship, querying, according to the first corresponding relationship, the data name under which the target object data corresponding to the unique identification information of the target object data is stored in the object storage device;
reading the target object data according to the data name under which the target object data is stored in the object storage device, and feeding the target object data back to the client.
9. The data processing method according to claim 8, characterized in that, when there are at least two data de-duplication engines, each data de-duplication engine maintains a first corresponding relationship established on the basis of the object data for which that engine itself has completed data de-duplication processing, and maintains a second corresponding relationship established on the basis of the object data that the engine itself has received from the client;
the querying the second corresponding relationship according to the name of the target object data when the read request for the target object data sent by the client is received comprises:
a fifth data de-duplication engine, upon receiving the read request for the target object data, querying the second corresponding relationship maintained by the fifth data de-duplication engine, wherein the client sends the read request for the target object data to the fifth data de-duplication engine according to a third preset rule;
the querying, according to the first corresponding relationship, the data name under which the target object data corresponding to the unique identification information of the target object data is stored in the object storage device, when the unique identification information of the target object data is found in the second corresponding relationship, comprises:
when the fifth data de-duplication engine finds the unique identification information of the target object data in the second corresponding relationship maintained by the fifth data de-duplication engine, determining a sixth data de-duplication engine according to the second preset rule, and sending the read request for the target object data to the sixth data de-duplication engine; the sixth data de-duplication engine determining, in the first corresponding relationship maintained by the sixth data de-duplication engine and according to the unique identification information of the target object data, the data name under which the target object data is stored in the object storage device;
the reading the target object data according to the data name under which the target object data is stored in the object storage device and feeding the target object data back to the client comprises:
the sixth data de-duplication engine reading the target object data from the object storage device according to the data name under which the target object data is stored in the object storage device, and feeding the target object data back to the fifth data de-duplication engine, and the fifth data de-duplication engine feeding the target object data back to the client.
10. The data processing method according to claim 1, characterized in that the performing data de-duplication processing on the received object data comprises:
when the received object data is larger than a preset size, the data de-duplication engine dividing the received object data into a plurality of sub-object data that meet the preset size, and performing data de-duplication processing on each sub-object data.
11. A data processing device, characterized in that it is applied to an object storage system, the object storage system comprising a server on which a data de-duplication engine is installed and an object storage device cluster composed of at least one object storage device, the data de-duplication engine being communicatively connected to the at least one object storage device, the data processing device comprising the data de-duplication engine, the data de-duplication engine comprising:
a corresponding relationship maintenance module, configured to maintain a first corresponding relationship, the first corresponding relationship comprising the corresponding relationship, maintained by the data de-duplication engine, between the unique identification information of object data and the data name under which the object data is stored in an object storage device;
a receiving module, configured to receive object data sent by a client;
a data write-in processing module, configured to perform data de-duplication processing on the received object data according to the first corresponding relationship, to judge whether the received object data is repeated data in the object storage device cluster;
a sending module, configured to, when the received object data is not repeated data, obtain, according to the unique identification information of the received object data, the data name under which the received object data is to be stored in the object storage device cluster, and send the received object data to the corresponding object storage device in the object storage device cluster according to the obtained data name;
the corresponding relationship maintenance module being further configured to insert into the first corresponding relationship the corresponding relationship between the unique identification information of the received object data and the data name under which the received object data is stored in the corresponding object storage device.
12. The data processing device according to claim 11, characterized in that, when the data processing device comprises at least two data de-duplication engines, the corresponding relationship maintenance module of each data de-duplication engine is configured to maintain a first corresponding relationship established on the basis of the object data for which the current data de-duplication engine has completed data de-duplication processing, the current data de-duplication engine being the data de-duplication engine in which the corresponding relationship maintenance module resides;
the data write-in processing module is further configured to analyze, by a second preset rule, the received object data sent by the client, to determine a target data de-duplication engine that is to execute the data de-duplication processing operation, and to forward the received object data to the target data de-duplication engine;
the data write-in processing module of the target data de-duplication engine is configured to perform data de-duplication processing on the received object data on the basis of the first corresponding relationship maintained by the target data de-duplication engine.
13. The data processing device according to claim 11, characterized in that the corresponding relationship maintenance module of the data de-duplication engine is further configured to maintain a second corresponding relationship, the second corresponding relationship comprising the corresponding relationship, maintained by the data de-duplication engine, between the name of object data received from the client and the unique identification information of that object data;
the corresponding relationship maintenance module is further configured to, if the corresponding relationship between the name of the received object data and the unique identification information of the received object data is not recorded in the second corresponding relationship, insert the corresponding relationship between the name of the received object data and the unique identification information of the received object data into the second corresponding relationship.
14. The data processing device according to claim 13, characterized in that the receiving module is further configured to receive a read request for target object data sent by the client;
the data de-duplication engine further comprises a reading data processing module, the reading data processing module being configured to query the second corresponding relationship according to the name of the target object data; when the unique identification information of the target object data is found in the second corresponding relationship, to query, according to the first corresponding relationship, the data name under which the target object data corresponding to the unique identification information of the target object data is stored in the object storage device; and to read the target object data according to that data name and feed the target object data back to the client.
15. The data processing device according to claim 14, characterized in that, when the data processing device comprises at least two data de-duplication engines, the corresponding relationship maintenance module of each data de-duplication engine is configured to maintain a first corresponding relationship established on the basis of the object data for which the current data de-duplication engine has completed data de-duplication processing, and to maintain a second corresponding relationship established on the basis of the object data that the current data de-duplication engine has received from the client, the current data de-duplication engine being the data de-duplication engine in which the corresponding relationship maintenance module resides;
the reading data processing module is specifically configured to query the second corresponding relationship maintained by the current data de-duplication engine in which the reading data processing module resides, and, when the unique identification information of the target object data is found, to determine a target data de-duplication engine according to the second preset rule and send the read request for the target object data to the target data de-duplication engine;
the reading data processing module is further configured to receive the target object data fed back by the target data de-duplication engine, and to feed the target object data back to the client.
16. A server, characterized in that it is applied to an object storage system, the object storage system comprising an object storage device cluster composed of at least one object storage device, the server comprising:
a memory, configured to store one or more programs;
a processor;
wherein, when the one or more programs are executed by the processor, the method according to any one of claims 1-10 is implemented.
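By way of illustration only, and forming no part of the claims, the reference count handling recited in claim 6 can be sketched as follows. The class `RefCountedIndex`, the `obj-` data name prefix, and the choice to start the reference count at 1 on the first write are assumptions introduced here; the claim itself only states that the count is increased for repeated data and decreased when a recorded name is written again.

```python
from typing import Dict, List


class RefCountedIndex:
    """Hypothetical single-engine model of both corresponding relationships
    with a reference count per unique identification information."""

    def __init__(self) -> None:
        # First corresponding relationship: identification information -> [data name, reference count].
        self.first: Dict[str, list] = {}
        # Second corresponding relationship: object name -> identification information.
        self.second: Dict[str, str] = {}
        # Data names for which a delete instruction would be sent to storage.
        self.deleted: List[str] = []

    def write(self, name: str, fp: str) -> None:
        old_fp = self.second.get(name)
        # Count the new reference first, so that rewriting a name with
        # unchanged content never drops the count to 0 in between.
        entry = self.first.get(fp)
        if entry is None:
            self.first[fp] = ["obj-" + fp, 1]   # assumption: first write counts as 1
        else:
            entry[1] += 1                       # repeated data: increase the count
        self.second[name] = fp
        if old_fp is not None:
            self._release(old_fp)               # the name was already recorded

    def _release(self, fp: str) -> None:
        entry = self.first[fp]
        entry[1] -= 1
        if entry[1] == 0:
            # No name refers to this data any more: issue the delete instruction.
            self.deleted.append(entry[0])
            del self.first[fp]
```

In a multi-engine deployment the decrement would be carried out by the engine selected from the recorded unique identification information, as recited in claim 7; this sketch keeps everything in one engine for brevity.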
CN201810009925.0A 2018-01-05 2018-01-05 Data processing method, device and server Active CN108268216B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810009925.0A CN108268216B (en) 2018-01-05 2018-01-05 Data processing method, device and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810009925.0A CN108268216B (en) 2018-01-05 2018-01-05 Data processing method, device and server

Publications (2)

Publication Number Publication Date
CN108268216A CN108268216A (en) 2018-07-10
CN108268216B true CN108268216B (en) 2019-11-12

Family

ID=62773412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810009925.0A Active CN108268216B (en) 2018-01-05 2018-01-05 Data processing method, device and server

Country Status (1)

Country Link
CN (1) CN108268216B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408761A (en) * 2018-10-16 2019-03-01 翟红鹰 A kind of filter method of repetitive requests, system, equipment and storage medium
CN110427347A (en) * 2019-07-08 2019-11-08 新华三技术有限公司成都分公司 Method, apparatus, memory node and the storage medium of data de-duplication
CN111510497A (en) * 2020-04-17 2020-08-07 上海七牛信息技术有限公司 Processing method and system for edge storage
CN114265551B (en) * 2021-12-02 2023-10-20 阿里巴巴(中国)有限公司 Data processing method in storage cluster, storage node and equipment
CN117131036B (en) * 2023-10-26 2023-12-22 环球数科集团有限公司 Data maintenance system based on big data and artificial intelligence

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8281021B1 (en) * 2009-02-12 2012-10-02 Sprint Communications Company L.P. Multiple cookie handling
CN103279502A (en) * 2013-05-06 2013-09-04 北京赛思信安技术有限公司 Framework and method of repeated data deleting file system combined with parallel file system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9665591B2 (en) * 2013-01-11 2017-05-30 Commvault Systems, Inc. High availability distributed deduplicated storage system

Also Published As

Publication number Publication date
CN108268216A (en) 2018-07-10

Similar Documents

Publication Publication Date Title
CN108268216B (en) Data processing method, device and server
CN108769111A (en) A kind of server connection method, computer readable storage medium and terminal device
CN107704202B (en) Method and device for quickly reading and writing data
CN109213604A (en) A kind of management method and device of data source
CN110737682A (en) cache operation method, device, storage medium and electronic equipment
CN106294352A (en) A kind of document handling method, device and file system
CN111177143B (en) Key value data storage method and device, storage medium and electronic equipment
CN107038092B (en) Data copying method and device
CN111475105A (en) Monitoring data storage method, device, server and storage medium
CN110851474A (en) Data query method, database middleware, data query device and storage medium
CN112579595A (en) Data processing method and device, electronic equipment and readable storage medium
CN105653209A (en) Object storage data transmitting method and device
CN109947729A (en) A kind of real-time data analysis method and device
CN112860953A (en) Data importing method, device, equipment and storage medium of graph database
CN113779286B (en) Method and device for managing graph data
WO2021016050A1 (en) Multi-record index structure for key-value stores
CN104956340A (en) Scalable data deduplication
CN111046106A (en) Cache data synchronization method, device, equipment and medium
CN108154024A (en) A kind of data retrieval method, device and electronic equipment
CN112650692A (en) Heap memory allocation method, device and storage medium
CN105939402A (en) MAC table entry obtaining method and device
CN109947842A (en) Date storage method, apparatus and system in distributed memory system
CN109408496A (en) A kind of method and device reducing data redundancy
US20150134671A1 (en) Method and apparatus for data distribution and concurrence
CN111131197B (en) Filtering strategy management system and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant