CN108268216B - Data processing method, device and server - Google Patents
- Publication number
- CN108268216B CN201810009925.0A
- Authority
- CN
- China
- Prior art keywords
- data
- corresponding relationship
- object data
- duplication
- engine
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
- G06F3/0607—Improving or facilitating administration, e.g. storage management by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
- G06F3/0641—De-duplication techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the present application provide a data processing method, device, and server, relating to the field of object storage technology. In the method, a deduplication engine receives object data sent by a client and performs deduplication processing on it to determine whether the received object data is duplicate data within the object storage device cluster. When the received object data is not duplicate data, the engine sends it to the corresponding object storage device in the cluster according to a first preset rule; when the object data is duplicate data, the engine does not write it. This ensures that object data with identical content is written to the object storage device cluster only once, which achieves global deduplication and effectively reduces the storage space used by the object storage system.
Description
Technical field
The present application relates to the field of object storage technology, and in particular to a data processing method, device, and server.
Background
In an existing object storage system, the client sends a request for object data directly to one of the object storage devices in an object storage device (Object-based Storage Device, OSD) cluster. That object storage device directly manages the storage medium space, such as the logical block address (LBA) space of a hard disk, and handles the storage and retrieval of object data.
If an object storage device stores a large amount of duplicate data, for example documents, pictures, or videos with identical content that different users store on a cloud disk, a great deal of storage space is wasted, so the duplicate data needs to be deleted. In the existing solution, after an object storage device receives object data sent by a client, that device itself performs deduplication processing on the data it received. As a result, the object storage device cluster as a whole still stores a large amount of duplicate data, which wastes storage space.
Summary of the invention
The purpose of the embodiments of the present application is to provide a data processing method, device, and server that improve storage space utilization in an object storage system.
To achieve this purpose, the embodiments of the present application adopt the following technical solutions:
In a first aspect, an embodiment of the present application provides a data processing method applied to a deduplication engine in an object storage system. The object storage system includes a server on which the deduplication engine is installed and an object storage device cluster composed of at least one object storage device; the deduplication engine is communicatively connected to the at least one object storage device. The data processing method includes: receiving object data sent by a client; performing deduplication processing on the received object data to determine whether it is duplicate data within the object storage device cluster; and, when the received object data is not duplicate data, sending it to the corresponding object storage device in the object storage device cluster according to a first preset rule.
In a second aspect, an embodiment of the present application further provides a data processing device applied to an object storage system. The object storage system includes a server on which a deduplication engine is installed and an object storage device cluster composed of at least one object storage device; the deduplication engine is communicatively connected to the at least one object storage device. The data processing device includes the deduplication engine, which in turn includes a receiving module, a data write processing module, and a sending module. The receiving module receives object data sent by a client. The data write processing module performs deduplication processing on the received object data to determine whether it is duplicate data within the object storage device cluster. The sending module, when the received object data is not duplicate data, sends it to the corresponding object storage device in the object storage device cluster according to the first preset rule.
In a third aspect, an embodiment of the present application further provides a server applied to an object storage system. The object storage system includes an object storage device cluster composed of at least one object storage device. The server includes: a memory for storing one or more programs; and a processor. When the one or more programs are executed by the processor, the method described above is implemented.
Compared with the prior art, the object storage system in the embodiments of the present application includes a server on which a deduplication engine is installed and an object storage device cluster composed of at least one object storage device. When object data is stored, the client does not send it directly to an object storage device. It first sends the object data to the deduplication engine, which performs deduplication processing on the received object data and only then stores it in the corresponding object storage device. This achieves global deduplication and effectively reduces the storage space used by the object storage system.
To make the above objects, features, and advantages of the application clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed for the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the application and should therefore not be regarded as limiting its scope. Those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 shows a schematic diagram of the composition of the object storage system provided by an embodiment of the present application.
Fig. 2 shows a functional block diagram of the server provided by an embodiment of the present application.
Fig. 3 shows a schematic diagram of the first correspondence and the second correspondence maintained by the deduplication engine provided by an embodiment of the present application.
Fig. 4 shows a schematic diagram of the case in which there are multiple deduplication engines provided by an embodiment of the present application.
Fig. 5 shows a flowchart of the data processing method provided by an embodiment of the present application.
Fig. 6 shows a schematic diagram of the second correspondence maintained by the deduplication engine when the granularity of the object data is large.
Fig. 7 shows a flowchart of the data processing method provided by another embodiment of the present application.
Fig. 8 shows a flowchart of writing non-duplicate data.
Fig. 9 shows a flowchart of writing duplicate data.
Fig. 10 shows a flowchart of writing colliding data.
Fig. 11 shows a flowchart of the data processing method provided by another embodiment of the present application.
Fig. 12 shows a flowchart of updating object data.
Fig. 13 shows a flowchart of the data processing method provided by another embodiment of the present application.
Fig. 14 shows a flowchart of obtaining object data.
Fig. 15 shows a functional block diagram of the data processing device provided by another embodiment of the present application.
Reference numerals: 10 - object storage system; 100 - server; 200 - object storage device cluster; 300 - client; 400 - data processing device; 110 - memory; 120 - processor; 130 - communication interface; 410 - receiving module; 420 - data write processing module; 430 - sending module; 440 - correspondence maintenance module; 450 - data read processing module.
Detailed description
In the course of implementing the technical solutions of the embodiments of the present application, the inventors found the following.
Existing deduplication processing is performed on each object storage device. When a client writes data, the object data is sent directly to an object storage device, which implements deduplication using the mapping it records between object data and fingerprints and between fingerprints and logical block addresses. For example, for object data with identical content (such as object data a and object data d), only one copy of the data is stored, at logical block address LBA1.
Based on this study and further investigation, the inventors found that deduplication in the prior art is not global: duplicate data still exists in the object storage device cluster as a whole. The reason is that multiple object data with identical content may be sent to different object storage devices, each of which deduplicates and then stores what it received. Because an object storage device in the prior art deduplicates only the object data that it receives itself, it can guarantee only that no two objects with identical content exist within that one device. Object data with identical content may still exist across two or more object storage devices, so global deduplication cannot be achieved.
The defects of the prior-art solutions described above are results the inventors obtained through practice and careful study. Therefore, both the discovery of the above problem and the solutions that the embodiments of the present application propose below for that problem should be regarded as contributions the inventors made in the course of the present invention.
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present application, not all of them. The components of the embodiments that are described and illustrated in the drawings can generally be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the application; it merely represents selected embodiments. All other embodiments that those skilled in the art obtain from these embodiments without creative effort fall within the protection scope of this application.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings. Once an item is defined in one drawing, it does not need to be defined or explained again in subsequent drawings. In the description of this application, terms such as "first" and "second" are used only to distinguish between items and are not to be understood as indicating or implying relative importance.
Fig. 1 is a schematic diagram of the composition of the object storage system 10 provided by an embodiment of the present application. As shown in Fig. 1, the object storage system 10 includes a server 100, an object storage device cluster 200 composed of at least one object storage device (OSD.1, OSD.2, OSD.3, ..., OSD.xx in Fig. 1), and a client 300. A deduplication engine is installed on the server 100; it handles data requests from the client 300 and carries out the storage and retrieval of object data. The deduplication engine is communicatively connected to the at least one object storage device. In practice, the deduplication engine may be a process that performs deduplication operations. The object storage device and the deduplication engine are two independent processes that communicate over a network. An object storage device may run on the same server 100 as the deduplication engine, or on a separate device independent of the server 100.
In this embodiment, the client 300 may be, but is not limited to, a smartphone, a personal computer (PC), a tablet computer, a personal digital assistant (PDA), a mobile Internet device (MID), and so on.
Fig. 2 is a functional block diagram of the server 100 provided by this embodiment. The server 100 may include a memory 110, a processor 120, and a communication interface 130. These elements are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, they may be electrically connected through one or more communication buses or signal lines. The processor 120 executes the executable modules stored in the memory 110, such as computer programs.
The memory 110 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and so on. The memory 110 can store software programs and modules. The one or more deduplication engines include at least one software function module that can be stored in the memory 110 in the form of software or firmware, or built into the operating system (OS) of the server 100. After receiving an execution instruction, the processor 120 executes the one or more programs to implement the data processing method disclosed in the embodiments of the present application. The communication interface 130 can be used to exchange signaling or data with other node devices.
The processor 120 may be an integrated circuit chip with signal processing capability. In implementation, each step of the above method can be completed by integrated hardware logic circuits in the processor 120 or by instructions in the form of software. The processor 120 may be a general-purpose processor, including a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When executed by the processor 120, the computer program implements the data processing method disclosed in the embodiments of the present application.
As shown in Fig. 3, in this embodiment the deduplication engine maintains a first correspondence (Hash Table). The first correspondence maps the unique identification information of each object data that the deduplication engine maintains (for example, h1, h2, h3) to the data name under which that object data is stored in the object storage device (for example, o1, o2, o3). This embodiment uses the hash value of the object data as the unique identification information, but in practice other values can be used as long as they uniquely identify the object data. Much as a human fingerprint uniquely identifies a person, the unique identification information can also be thought of as the fingerprint of the object data.
The data name under which the object data is stored in the object storage device can be derived from the unique identification information of the object data. For example, adding a magic number (or prefix) to the unique identification information h1 yields the data name o1 under which the object data is stored. The data name under which the received object data is stored in the corresponding object storage device thus contains identification information (the magic number above) indicating that the received object data has undergone deduplication processing. When the data name of object data stored in an object storage device contains this identification information, the object data was stored in that device after the deduplication engine completed deduplication processing on it. It should be noted that, in other embodiments, the unique identification information of the object data can also be used directly as the data name under which it is stored in the corresponding object storage device.
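The derivation of a stored data name from a fingerprint can be sketched as follows. This is a minimal illustration only: the choice of SHA-256, the prefix string `dedup_`, and the function names are assumptions introduced here and do not come from the patent text.

```python
import hashlib

# Hypothetical prefix ("magic number") marking objects that were written
# through the deduplication engine. The actual value is not given in the text.
DEDUP_PREFIX = "dedup_"


def fingerprint(content: bytes) -> str:
    """Compute a content-derived identifier (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content).hexdigest()


def stored_data_name(fp: str) -> str:
    """Derive the name used inside the object storage device from a fingerprint."""
    return DEDUP_PREFIX + fp


def was_deduplicated(data_name: str) -> bool:
    """Check whether a stored name carries the deduplication marker."""
    return data_name.startswith(DEDUP_PREFIX)
```

Because the name is a pure function of the content fingerprint, two objects with identical content always map to the same stored name under this sketch.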
Optionally, the first correspondence may also include a reference count for the unique identification information of each object data. Each time an object data is judged to be duplicate data, the reference count for its unique identification information is incremented by one. For example, in Fig. 3 the reference count for unique identification information h1 is 2 (a and d), the reference count for h2 is 1 (b), and the reference count for h3 is 1 (c).
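The first correspondence with its optional reference count can be sketched as a small in-memory table. The class and method names below are hypothetical, and a real engine would persist and replicate this state; the sketch only reproduces the counts given for Fig. 3.

```python
from dataclasses import dataclass


@dataclass
class HashEntry:
    data_name: str      # name of the object inside the object storage device
    ref_count: int = 1  # number of client objects sharing this content


class HashTable:
    """Sketch of the first correspondence: fingerprint -> (data name, count)."""

    def __init__(self):
        self.entries = {}

    def add_reference(self, fp: str, data_name: str) -> bool:
        """Record one more object with fingerprint fp.

        Returns True if fp was already present (the object is a duplicate).
        """
        entry = self.entries.get(fp)
        if entry is None:
            self.entries[fp] = HashEntry(data_name)
            return False
        entry.ref_count += 1
        return True


# Objects a, b, c, d from Fig. 3 arrive with fingerprints h1, h2, h3, h1.
table = HashTable()
for fp, name in [("h1", "o1"), ("h2", "o2"), ("h3", "o3"), ("h1", "o1")]:
    table.add_reference(fp, name)
```

After the loop, the entry for `h1` has a count of 2 while `h2` and `h3` each have a count of 1, matching the example in the text.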
Optionally, in this embodiment, to make it easy to look up the record linking an object data name to its unique identification information, and thereby determine whether the object data is already stored in the object storage device cluster 200, the deduplication engine also maintains a second correspondence (Mapping Table). The second correspondence maps the name of each object data that the engine receives from the client 300 (for example, a, b, c) to the unique identification information of that object data (for example, h1, h2, h3). After receiving object data, the deduplication engine can look up the corresponding unique identification information in the second correspondence directly by the name of the object data. Deduplication is a data reduction technique in which data with identical content is not stored repeatedly. In this embodiment, object data with different names each map to their own unique identification information. For example, the object data named a has unique identification information (such as a hash value) h1, and the object data named d also has unique identification information h1. So that both a and d can be looked up later, the correspondence between a and h1 and the correspondence between d and h1 are each inserted into the second correspondence. In this embodiment, the unique identification information of an object data (its fingerprint) can be obtained by hashing the content of the object data, so object data with identical content have the same unique identification information. For example, object data a and object data d in Fig. 3 both have unique identification information h1.
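The second correspondence can be sketched as a plain dictionary from client-visible object names to content fingerprints. SHA-256 and the helper names are assumptions made for illustration; the text only requires that identical content yields an identical identifier.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Content hash used as the unique identifier (SHA-256 is assumed)."""
    return hashlib.sha256(content).hexdigest()


# Sketch of the second correspondence: object name -> fingerprint.
mapping_table = {}


def record_object(name: str, content: bytes) -> str:
    """Insert the name -> fingerprint record and return the fingerprint."""
    fp = fingerprint(content)
    mapping_table[name] = fp
    return fp


# Objects a and d have identical content, b differs.
record_object("a", b"same bytes")
record_object("b", b"other bytes")
record_object("d", b"same bytes")
```

Both `a` and `d` get their own record, so either name can be looked up later even though only one copy of the content needs to be stored.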
It should be understood that the name of the object data is the name of the object data as sent by the client 300. The data name under which the object data is stored in the object storage device is the name given to the object data in the object storage device after the deduplication engine has processed the object data sent by the client 300. The two names may or may not be the same.
Fig. 4 is a schematic diagram of the case in which there are multiple deduplication engines (for example, Eng.1, Eng.2, Eng.3, ...). Deploying multiple deduplication engines can improve the data processing performance of the object storage system 10. Each deduplication engine maintains a first correspondence built from the object data on which it has itself completed deduplication processing, and a second correspondence built from the object data it has itself received from the client 300. For example, the first and second correspondences maintained by Eng.1 are Hash Table 1 and Mapping Table 1, those maintained by Eng.2 are Hash Table 2 and Mapping Table 2, and those maintained by Eng.3 are Hash Table 3 and Mapping Table 3.
In this embodiment, when there are multiple deduplication engines, the engine that performs the deduplication operation for multiple object data with the same unique identification information is determined by the same rule, so those object data are necessarily processed by the same deduplication engine. Object data with identical content can therefore be written to the object storage device cluster 200 only once, which makes the deduplication global. With multiple deduplication engines, each engine could instead maintain first and second correspondences covering all stored object data. In the embodiments of the present application, however, each deduplication engine maintains only the first correspondence built from the object data on which it has itself completed deduplication processing and the second correspondence built from the object data it has itself received from the client 300. This effectively reduces the storage overhead and query cost of each deduplication engine.
In this embodiment, the first and second correspondences maintained by each deduplication engine are backed up on at least one other deduplication engine, which ensures the high reliability of the correspondences maintained on every engine. Examples in Fig. 4 are Hash Table 1 and its copy Hash Table 1', Hash Table 2 and its copy Hash Table 2', Hash Table 3 and its copy Hash Table 3', and likewise Mapping Table 1 and its copy Mapping Table 1', Mapping Table 2 and its copy Mapping Table 2', and Mapping Table 3 and its copy Mapping Table 3'. It should be noted that when any deduplication engine updates a first or second correspondence that it maintains, it also notifies the other deduplication engines that hold a copy of that correspondence so that they update their copies in step.
In this embodiment, to ensure the high reliability of the object data itself, when the deduplication engine stores a copy of object data to the corresponding object storage device, the object data can be backed up on other object storage devices within the object storage device cluster 200. It should be understood that the backups held by those other object storage devices are not stored there as a result of processing by the deduplication engine. The deduplication engine writes received object data to an object storage device only when it determines that the data is not duplicate data; whether the object data is then backed up within the object storage device cluster 200 is not the engine's concern.
Fig. 5 is a flowchart of the data processing method provided by an embodiment of the present application. The data processing method can be applied to the server 100 in the object storage system 10 provided by the above embodiments, and specifically to the deduplication engine on the server 100. It should be noted that the data processing method described in the embodiments of the present application is not limited to the specific order shown in Fig. 5 and described below. In other embodiments, the order of some steps of the method can be exchanged as needed, and some steps can be omitted or deleted. The flow shown in Fig. 5 is described in detail below.
Step S101: receive the object data sent by the client 300.
In this embodiment, each object data has a name (id); for example, the name of object data a is a. The client 300 sends the object data directly to the corresponding deduplication engine according to a third preset rule. The third preset rule may be as follows: the client 300 computes a hash value from the name of the object data and determines the corresponding deduplication engine from that hash value. Once the rule by which the client 300 selects a deduplication engine has been fixed, object data with the same name is always sent to the same deduplication engine.
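One way the third preset rule might look is a hash of the object name reduced modulo the number of engines. The text does not specify the hash function or how the hash maps to an engine, so the SHA-256 digest, the modulo step, and the engine list below are assumptions for illustration.

```python
import hashlib

# Hypothetical engine identifiers, named after the labels used in Fig. 4.
ENGINES = ["Eng.1", "Eng.2", "Eng.3"]


def pick_engine(object_name: str, engines=ENGINES) -> str:
    """Choose a deduplication engine from the hash of the object name."""
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to an index.
    index = int.from_bytes(digest[:8], "big") % len(engines)
    return engines[index]
```

A stable hash such as one from `hashlib` is used rather than Python's built-in `hash()`, because string hashing with `hash()` is randomized per process by default and different clients would then disagree on the target engine.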
Step S102: perform deduplication processing on the received object data to determine whether the received object data is duplicate data within the object storage device cluster 200.
The deduplication engine maintains the first correspondence, which maps the unique identification information of each object data the engine maintains to the data name under which that object data is stored in the object storage device. On receiving object data, the deduplication engine computes the unique identification information of the object data from its content and then looks up a record of that unique identification information in the first correspondence it maintains.
If no record of the unique identification information is found in the first correspondence, the received object data is not duplicate data. If a record of the unique identification information is found in the first correspondence, the received object data is duplicate data.
Of course, to make the judgment of whether object data is repeated more accurate, in practice the data de-duplication engine may also read the corresponding object data from the object storage device cluster 200 according to the data name corresponding to the unique identification information, and compare the content of the object data read out with the content of the received object data, for example byte by byte. If the two are completely consistent, the received object data is repeated data; if they are inconsistent, a data collision has occurred.
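The lookup and the optional byte comparison described above can be sketched as follows. This is an illustration under stated assumptions: the unique identification information is taken to be a SHA-256 digest of the content, the first corresponding relationship is a plain dictionary, and the object storage device cluster is modelled as a dictionary from data name to bytes. None of these choices is prescribed by the embodiment.

```python
import hashlib
from enum import Enum

class Verdict(Enum):
    NOT_REPEATED = "not repeated"
    REPEATED = "repeated"
    COLLISION = "collision"

def fingerprint(content: bytes) -> str:
    """Assumed form of the unique identification information."""
    return hashlib.sha256(content).hexdigest()

def classify(content, first_mapping, cluster, fp=fingerprint):
    """Classify received content against the first corresponding relationship."""
    uid = fp(content)
    record = first_mapping.get(uid)
    if record is None:
        return Verdict.NOT_REPEATED          # no record of this identifier
    stored = cluster[record["data_name"]]    # read back by data name
    if stored == content:                    # byte-by-byte comparison
        return Verdict.REPEATED
    return Verdict.COLLISION                 # same identifier, different content

cluster = {"o1": b"hello"}
first_mapping = {fingerprint(b"hello"): {"data_name": "o1", "refs": 1}}
```

The `fp` parameter exists only so that a deliberately weak identifier function can be passed in to exercise the collision branch.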
It should be noted that, because data de-duplication processing is carried out at object granularity, it is relatively difficult for the data de-duplication engine to handle object data of large granularity (for example, larger than 4M). Therefore, when the granularity of the object data is large, the data de-duplication process can be skipped and client 300 writes the object data directly into the object storage device cluster 200. In another embodiment, when the received object data is larger than a preset size, the data de-duplication engine may divide the received object data into multiple pieces of sub-object data that meet the preset size (for example, less than or equal to 4M) and perform data de-duplication processing on each piece of sub-object data.
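As an illustration of the splitting variant, the sketch below cuts the content into pieces no larger than a preset size and keeps each piece's start offset. The function name `split_object` and the default of 4 * 1024 * 1024 bytes are assumptions made for the example.

```python
def split_object(content: bytes, max_size: int = 4 * 1024 * 1024):
    """Return (start_offset, sub_object) pairs, each at most max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return [
        (offset, content[offset:offset + max_size])
        for offset in range(0, len(content), max_size)
    ]

# With a preset size of 4 bytes, an 8-byte object yields two sub-objects.
parts = split_object(b"abcdefgh", max_size=4)
```

Each sub-object can then be passed through the same de-duplication check as a whole object.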
Step S103: when the received object data is not repeated data, send the received object data to the corresponding object storage device in the object storage device cluster 200 according to a first preset rule.
In this embodiment, when the received object data is not repeated data, the data de-duplication engine stores the received object data into the object storage device cluster 200 according to the first preset rule. To distinguish whether object data stored in an object storage device has gone through data de-duplication processing, the data name under which the object data is stored in the object storage device cluster 200 can be obtained from the unique identification information of the object data; this data name contains identification information indicating that the received object data has undergone data de-duplication processing.
In this embodiment, the first preset rule may be as follows: a hash operation is performed on the obtained data name of the object data to determine which object storage device the received object data should be stored in. For example, for object data a whose unique identification information is h1, the data name o1 under which object data a is stored in the object storage device cluster 200 is obtained from the unique identification information h1. If the hash value calculated from data name o1 is 1, object data a should be stored in object storage device OSD.1, and its title in object storage device OSD.1 is changed from a to o1. Of course, the present application does not limit which rule is used to choose the object storage device.
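One possible reading of the data name derivation and the first preset rule is sketched below. The prefix `dedup.`, the helper names, SHA-256, and the modulo placement are all assumptions for illustration; the application leaves the concrete rule open.

```python
import hashlib

DEDUP_PREFIX = "dedup."  # hypothetical marker meaning "has been de-duplicated"

def data_name_for(uid: str) -> str:
    """Derive the stored data name from the unique identification information."""
    return DEDUP_PREFIX + uid

def is_deduplicated(data_name: str) -> bool:
    """Tell de-duplicated objects apart from objects stored under their own title."""
    return data_name.startswith(DEDUP_PREFIX)

def pick_osd(data_name: str, osd_count: int) -> int:
    """Assumed first preset rule: stable hash of the data name, modulo OSD count."""
    digest = hashlib.sha256(data_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % osd_count

o1 = data_name_for("h1")
target_osd = pick_osd(o1, 4)
```

In a real system the marker would have to be chosen so that client titles cannot be mistaken for de-duplicated data names; this sketch does not address that.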
It should be noted that, in this embodiment, when the data de-duplication engine writes object data into an object storage device, it only needs to send the object data to the corresponding object storage device according to the data name of the object data. The object storage device manages its own storage medium space, determines the storage location of the object data, such as a logical block address (LBA) of a hard disk, and stores the object data at that location. For example, the object storage device can determine the logical block address at which the object data should be stored according to its data name and, after the object data has been stored, record the correspondence between the data name of the object data and the logical block address where the object data is stored (for example, the logical block address corresponding to data name o1 is LBA1, and the logical block address corresponding to data name o8 is LBA8).
In this embodiment, the data de-duplication engine directly receives the data sent by client 300, performs data de-duplication processing on it, and then sends it to an object storage device for storage. When there are multiple object storage devices, compared with the prior art, in which each object storage device performs data de-duplication only by itself, the approach of this embodiment achieves global de-duplication and improves de-duplication efficiency.
Optionally, when the received object data is not repeated data, the data de-duplication engine needs to update the first corresponding relationship it maintains so that later repeated data can be looked up. Therefore, the data processing method may further include:
Step S104: when the received object data is not repeated data, insert into the first corresponding relationship the correspondence between the unique identification information of the received object data and the data name under which the received object data is stored in the corresponding object storage device.
It should be noted that, in this embodiment, when the received object data is repeated data, the data de-duplication engine does not write the object data into the object storage device cluster 200. It only needs to increase by one, in the first corresponding relationship it maintains, the reference count corresponding to the unique identification information of the received object data, indicating that one more reference points to the object data stored in the object storage device for that unique identification information. When a data collision occurs, the data de-duplication engine writes the object data directly into the object storage device cluster 200 without changing the title of the object data in the storage device and without updating the first corresponding relationship. For example, when a data collision occurs for object data k, the title under which object data k is stored in the object storage device is still k.
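The three outcomes described so far (not repeated, repeated, collision) lead to three different updates, which the sketch below makes explicit. The function `apply_write`, the string verdicts, the `dedup.` prefix, and the dictionary layout are hypothetical and chosen only for the example.

```python
def apply_write(title, content, uid, verdict, first_mapping, cluster):
    """Update the first corresponding relationship and the cluster for one write."""
    if verdict == "not repeated":
        data_name = "dedup." + uid                  # assumed naming scheme
        cluster[data_name] = content                # store once under the data name
        first_mapping[uid] = {"data_name": data_name, "refs": 1}
    elif verdict == "repeated":
        first_mapping[uid]["refs"] += 1             # count the new reference only
    elif verdict == "collision":
        cluster[title] = content                    # keep the original title
    else:
        raise ValueError("unknown verdict: " + verdict)

first_mapping, cluster = {}, {}
apply_write("a", b"same", "h1", "not repeated", first_mapping, cluster)
apply_write("d", b"same", "h1", "repeated", first_mapping, cluster)
apply_write("k", b"other", "h1", "collision", first_mapping, cluster)
```

Note that the collision branch leaves the first corresponding relationship untouched, as the paragraph above requires.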
Optionally, in order to query whether object data has been stored in the object storage device cluster 200, the data de-duplication engine also maintains a second corresponding relationship. The second corresponding relationship records the correspondence between the title of each piece of object data that the data de-duplication engine has received from client 300 and the unique identification information of that object data. The data processing method further includes:
Step S105: if the correspondence between the title of the received object data and the unique identification information of the received object data is not recorded in the second corresponding relationship, insert that correspondence into the second corresponding relationship.
In this embodiment, if the correspondence between the title of the received object data and its unique identification information is not recorded in the second corresponding relationship, the object data is new data. After the data de-duplication processing operation has been performed on the object data, the correspondence between the title of the received object data and its unique identification information must be inserted into the second corresponding relationship, whether the object data is repeated data or non-repeated data.
It should be noted that, in this embodiment, when a data collision is determined to have occurred, the data de-duplication engine writes the colliding data directly into the object storage device cluster 200 under its original title and inserts the correspondence between the title of the colliding data and the specific identification information of the colliding data into the second corresponding relationship. Here the title of the colliding data is the title of the object data sent by client 300, and the specific identification information of the colliding data can be replaced with a special value, for example "empty". When the data de-duplication engine finds in the second corresponding relationship that the unique identification information of a piece of object data is the special value, it can determine that the object data is colliding data.
After receiving a read/write request for object data sent by client 300, the data de-duplication engine looks up, by the title of the object data, the record of the corresponding unique identification information in the second corresponding relationship it maintains, and can thereby determine whether the object data has been stored in the object storage device cluster 200. If there is a record in the second corresponding relationship, the object data has been stored in the object storage device cluster 200.
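A small sketch of the second corresponding relationship, including the special value for colliding data. Using Python's `None` as the "empty" value, and the names `record_title` and `lookup`, are assumptions made for illustration.

```python
COLLISION_MARK = None  # assumed stand-in for the special "empty" value

def record_title(second_mapping, title, uid):
    """Step S105 as a sketch: insert only if the title is not yet recorded."""
    second_mapping.setdefault(title, uid)

def lookup(second_mapping, title):
    """Report whether, and in which form, an object is stored."""
    if title not in second_mapping:
        return "not stored"
    if second_mapping[title] is COLLISION_MARK:
        return "stored as colliding data"
    return "stored as de-duplicated data"

second_mapping = {}
record_title(second_mapping, "a", "h1")
record_title(second_mapping, "k", COLLISION_MARK)
```

The membership test (`title not in second_mapping`) matters here: a colliding object has a record whose value is empty, which must not be confused with having no record at all.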
In this embodiment, if the data de-duplication engine divides the received object data into multiple pieces of sub-object data that meet the preset size for data de-duplication processing, the second corresponding relationship maintained by the data de-duplication engine changes accordingly. As shown in Fig. 6, the correspondence between the title of the object data and the unique identification information of the object data becomes a correspondence between (the title Client Object of the object data, the start offset offset of the sub-object data within the object data) and the unique identification information Hash of the sub-object data. In Fig. 6, assuming that object data a is divided into two pieces of sub-object data, offset1 is the start offset of one piece of sub-object data within the object data and offset2 is the start offset of the other. For example, if object data a of size 8M is divided into two pieces of sub-object data of 4M each, offset1 is 0M and offset2 is 4M. For object data that meets the preset size (such as object data c and object data d), the corresponding start offset is offset1 (i.e. 0M). When client 300 needs to read object data a, the data de-duplication engine reads all the sub-object data of object data a from the object storage device, combines the two pieces of sub-object data according to the start offset of each within object data a (i.e. offset1 and offset2) to obtain object data a, and returns it to client 300.
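The (title, start offset) keyed form of the second corresponding relationship and the reassembly on read can be sketched as follows. The helper names, the SHA-256 identifier, and the single dictionary `store` standing in for the object storage devices are assumptions for the example.

```python
import hashlib

def write_split(title, content, chunk_size, second_mapping, store):
    """Record (title, start offset) -> identifier for every sub-object."""
    for offset in range(0, len(content), chunk_size):
        piece = content[offset:offset + chunk_size]
        uid = hashlib.sha256(piece).hexdigest()
        store.setdefault(uid, piece)            # identical pieces are stored once
        second_mapping[(title, offset)] = uid

def read_object(title, second_mapping, store):
    """Reassemble an object from its sub-objects in start-offset order."""
    offsets = sorted(off for (name, off) in second_mapping if name == title)
    return b"".join(store[second_mapping[(title, off)]] for off in offsets)

second_mapping, store = {}, {}
write_split("a", b"AAAABBBB", 4, second_mapping, store)   # two sub-objects
write_split("c", b"AAAA", 4, second_mapping, store)       # fits in one piece
restored = read_object("a", second_mapping, store)
```

Sorting by start offset is what allows the sub-objects to be read back in any order and still be combined correctly.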
Refer to Fig. 7, which is a flow diagram of a data processing method provided by another embodiment of the present application. In this embodiment, there are at least two data de-duplication engines. Each data de-duplication engine maintains a first corresponding relationship established from the object data for which it has itself completed data de-duplication processing, and a second corresponding relationship established from the object data it has itself received from client 300. The object data that each data de-duplication engine receives from client 300 is object data sent to that engine according to the third preset rule. The at least two data de-duplication engines include a first data de-duplication engine and a second data de-duplication engine. The method includes:
Step S201: the first data de-duplication engine receives the object data sent by client 300.
In this embodiment, client 300 sends the object data to the first data de-duplication engine according to the third preset rule.
Step S202: when the first data de-duplication engine receives the object data sent by client 300, it determines, by a second preset rule, the second data de-duplication engine that is to perform the data de-duplication processing operation, and sends the received object data to the second data de-duplication engine.
In this embodiment, after the first data de-duplication engine receives the object data, it determines, by the second preset rule, the second data de-duplication engine that is to perform the data de-duplication processing operation. The second preset rule may be as follows: a hash value is calculated from the unique identification information of the received object data to determine the data de-duplication engine that performs the data de-duplication processing operation. That is, the first data de-duplication engine calculates the unique identification information of the object data from the content of the received object data, calculates a hash value from the unique identification information, and thereby determines that the object data is to be sent to the second data de-duplication engine for data de-duplication processing. It should be noted that the second data de-duplication engine determined by the first data de-duplication engine through the second preset rule may be the first data de-duplication engine itself, in which case the first data de-duplication engine performs the data de-duplication processing operation itself.
Step S203: the second data de-duplication engine performs data de-duplication processing on the received object data based on the first corresponding relationship maintained by the second data de-duplication engine.
In this embodiment, the second data de-duplication engine looks up the record of the unique identification information of the received object data in the first corresponding relationship it maintains, to determine whether the received object data is repeated data, non-repeated data, or colliding data. For details, refer to the corresponding content of step S102 above.
In this embodiment, there are multiple data de-duplication engines, and which data de-duplication engine performs the data de-duplication operation can be set according to an established rule. This embodiment does not change the rule by which client 300 sends data (such as the third preset rule), so as not to change the original rules of the system and to avoid requiring major changes to client 300 when this scheme is adopted. In theory, client 300 could also directly determine the engine that performs the data de-duplication operation, without the data de-duplication engine making that determination.
Therefore, when there are multiple data de-duplication engines, as long as the rule for selecting the data de-duplication engine is the same, object data with identical content is sent to the same data de-duplication engine for de-duplication processing. If it is repeated data, it can be found in that data de-duplication engine; if it is not found in that data de-duplication engine, it cannot be found on any other engine either. Therefore, the engine performing the data de-duplication processing operation only needs to perform de-duplication processing based on the first corresponding relationship it maintains in order to achieve global de-duplication.
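The routing argument above rests on the engine being a function of content only. A minimal sketch, assuming SHA-256 as the unique identification information and a second hash plus modulo as the second preset rule (both are illustrative choices, as are the function names):

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Assumed unique identification information: SHA-256 of the content."""
    return hashlib.sha256(content).hexdigest()

def pick_dedup_engine(uid: str, engine_count: int) -> int:
    """Assumed second preset rule: hash the identifier, then take the modulo."""
    digest = hashlib.sha256(uid.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % engine_count

# Two objects with different titles but identical content reach the same engine,
# because the title plays no part in the choice.
objects = {"a": b"same bytes", "d": b"same bytes"}
engines = {title: pick_dedup_engine(fingerprint(body), 3)
           for title, body in objects.items()}
```

Because every engine applies the same rule, a duplicate can only ever be recorded on one engine, which is why a local lookup there is sufficient.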
If it is determined that the object data is not repeated data, the embodiment of the present application may further include step S204: when the received object data is not repeated data, the second data de-duplication engine inserts, into the first corresponding relationship maintained by the second data de-duplication engine, the correspondence between the unique identification information of the received object data and the data name under which the received object data is stored in the corresponding object storage device.
In this embodiment, similarly to the previous embodiment, when the received object data is not repeated data, the second data de-duplication engine sends the received object data to the corresponding object storage device in the object storage device cluster 200 according to the first preset rule, and inserts the correspondence between the unique identification information of the received object data and the data name under which the received object data is stored in the corresponding object storage device into the first corresponding relationship maintained by the second data de-duplication engine. At this point the reference count corresponding to the unique identification information of the received object data is 1.
After the second data de-duplication engine has finished processing, it reports success to the first data de-duplication engine, and the first data de-duplication engine inserts the correspondence between the title of the received object data and the unique identification information of the received object data into the second corresponding relationship maintained by the first data de-duplication engine.
When the received object data is repeated data, the second data de-duplication engine only needs to increase by one, in the first corresponding relationship it maintains, the reference count of the unique identification information of the received object data, without writing the received object data into an object storage device. The first data de-duplication engine, for its part, needs to insert the correspondence between the title of the received object data and its unique identification information into the second corresponding relationship it maintains. Repeated data refers to object data whose content is identical but whose title is different. Therefore, although repeated data is not stored again, when client 300 needs to read it, the data de-duplication engine can look up the correspondence between the title of the object data and its unique identification information in the second corresponding relationship it maintains, determine from the unique identification information and the first corresponding relationship the name under which the object data is stored in the object storage device, and then read the data from the object storage device and return it to client 300. For this reason the second corresponding relationship must also record the correspondence between the title of repeated data and its unique identification information, so that client 300 can read the data.
When the received object data is colliding data, the second data de-duplication engine returns a data collision notice to the first data de-duplication engine. After receiving the notice, the first data de-duplication engine writes the object data directly into the object storage device cluster 200 under its original title, and records the title of the object data and the corresponding specific identification information in the second corresponding relationship.
In this embodiment, where there are multiple data de-duplication engines, each data de-duplication engine maintains a first corresponding relationship established from the object data for which it has itself completed data de-duplication processing, and a second corresponding relationship established from the object data it has itself received from client 300. In this way, the corresponding relationships of the whole system are divided into multiple parts stored in different data de-duplication engines. Because the data de-duplication engine is selected according to an established rule, the corresponding relationship of identical object data is maintained only in one determined data de-duplication engine, which saves query overhead and storage overhead while still achieving global de-duplication.
Below, the write processes for non-repeated data, repeated data, and colliding data are described in detail in connection with practical application scenarios. The at least two data de-duplication engines are Eng.1, Eng.2, and Eng.3.
Fig. 8 is a flow diagram of writing non-repeated data. The process may include the following steps.
After performing a hash calculation on the title of object data a (i.e. according to the third preset rule), client 300 sends object data a directly to Eng.2 (step 1 in Fig. 8).
Eng.2 calculates the unique identification information h1 of object data a from the content of the received object data a, calculates a hash value from unique identification information h1 (i.e. according to the second preset rule), and determines that object data a and the corresponding unique identification information h1 are to be forwarded to Eng.1 for processing (step 2 in Fig. 8).
After receiving object data a and the corresponding unique identification information h1, Eng.1 finds no record of unique identification information h1 in the first corresponding relationship Hash Table 1 it maintains and determines that object data a is non-repeated data. Eng.1 adds a Magic number (or prefix) to unique identification information h1 to obtain the data name o1 under which object data a is stored in the corresponding object storage device, and inserts the correspondence between unique identification information h1 and data name o1 into the first corresponding relationship Hash Table 1. At this point the reference count of unique identification information h1 is 1 (step 3 in Fig. 8).
Eng.1 performs a hash calculation on data name o1 (i.e. according to the first preset rule) to determine the object storage device OSD.1 that is to store object data a, and stores object data a in object storage device OSD.1 with its title changed from a to o1 (step 4 in Fig. 8).
Eng.1 returns processing success to Eng.2 (step 5 in Fig. 8).
Eng.2 inserts the correspondence between the title a of the object data and unique identification information h1 into the second corresponding relationship Mapping Table 2 it maintains (step 6 in Fig. 8).
Eng.2 returns to client 300 that object data a was written successfully (step 7 in Fig. 8).
Fig. 9 is a flow diagram of writing repeated data. The process may include the following steps.
After performing a hash calculation on the title of object data d (i.e. according to the third preset rule), client 300 sends object data d directly to Eng.3 (step 1 in Fig. 9).
Eng.3 calculates the unique identification information h1 of object data d from the content of the received object data d, calculates a hash value from unique identification information h1 (i.e. according to the second preset rule), and determines that object data d and the corresponding unique identification information h1 are to be forwarded to Eng.1 for processing (step 2 in Fig. 9).
After receiving object data d and the corresponding unique identification information h1, Eng.1 finds a record of unique identification information h1 in the first corresponding relationship Hash Table 1 it maintains. According to the data name o1 corresponding to unique identification information h1, Eng.1 reads the object data whose data name is o1 from the corresponding object storage device and compares its content with the content of object data d. The two are completely consistent, so Eng.1 confirms that object data d is repeated data and does not write object data d into the object storage device (step 3 in Fig. 9).
Eng.1 increases by one the reference count corresponding to unique identification information h1 in the first corresponding relationship Hash Table 1 it maintains. The reference count is now 2, indicating that the object data o1 stored in the object storage device has been referenced twice (step 4 in Fig. 9).
Eng.1 returns processing success to Eng.3 (step 5 in Fig. 9).
Eng.3 inserts the correspondence between the title d of the object data and unique identification information h1 into the second corresponding relationship Mapping Table 3 (step 6 in Fig. 9).
Eng.3 returns to client 300 that object data d was written successfully (step 7 in Fig. 9).
Fig. 10 is a flow diagram of writing colliding data. The process may include the following steps.
After performing a hash calculation on the title of object data k (i.e. according to the third preset rule), client 300 sends object data k directly to Eng.3 (step 1 in Fig. 10).
Eng.3 calculates the unique identification information h1 of object data k from the content of the received object data k, calculates a hash value from unique identification information h1 (i.e. according to the second preset rule), and determines that object data k and the corresponding unique identification information h1 are to be forwarded to Eng.1 for processing (step 2 in Fig. 10).
After receiving object data k and the corresponding unique identification information h1, Eng.1 finds a record of unique identification information h1 in the first corresponding relationship Hash Table 1 it maintains. According to the data name o1 corresponding to unique identification information h1, Eng.1 reads the object data whose data name is o1 from the corresponding object storage device and compares its content with the content of object data k. The two are inconsistent, so Eng.1 determines that object data k is colliding data (step 3 in Fig. 10).
Eng.1 returns a data collision to Eng.3 (step 4 in Fig. 10).
Eng.3 calculates a hash value directly from the title k of object data k and determines from the hash value the object storage device in which object data k should be stored (step 5 in Fig. 10). The rule for determining the object storage device can be set according to the specific situation, for example determined directly from the hash value of k; this embodiment does not limit it.
Eng.3 inserts the correspondence between the title k of the object data and the specific identification information into the second corresponding relationship Mapping Table 3; for example, the unique identification information is the special value "empty" (step 6 in Fig. 10). When data is read, if the unique identification information found in the second corresponding relationship for the title of the object data being looked up is empty, the data being looked up is known to be colliding data, and the storage device to search is determined by the same rule that decides which storage device colliding data is stored in.
Eng.3 returns to client 300 that object data k was written successfully (step 7 in Fig. 10).
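The three write flows of Fig. 8 to Fig. 10 can be replayed in a few lines. This is a simplified simulation, not the patented implementation: the receiving and de-duplicating engines are passed in explicitly instead of being chosen by hashing, a single dictionary stands in for the object storage device cluster, the data name is formed with a hypothetical `o-` prefix, and the identifier function is forced to return `h1` so that the collision case can be reproduced.

```python
class Engine:
    def __init__(self, label):
        self.label = label
        self.hash_table = {}     # first corresponding relationship
        self.mapping_table = {}  # second corresponding relationship

def write(title, content, receiver, deduper, osd, fingerprint):
    """Replay one client write through a receiving and a de-duplicating engine."""
    uid = fingerprint(content)
    record = deduper.hash_table.get(uid)
    if record is None:                               # Fig. 8: non-repeated data
        data_name = "o-" + uid
        osd[data_name] = content
        deduper.hash_table[uid] = {"data_name": data_name, "refs": 1}
        receiver.mapping_table[title] = uid
        return "not repeated"
    if osd[record["data_name"]] == content:          # Fig. 9: repeated data
        record["refs"] += 1
        receiver.mapping_table[title] = uid
        return "repeated"
    osd[title] = content                             # Fig. 10: colliding data
    receiver.mapping_table[title] = None             # special "empty" value
    return "collision"

eng1, eng2, eng3 = Engine("Eng.1"), Engine("Eng.2"), Engine("Eng.3")
osd = {}
always_h1 = lambda content: "h1"   # deliberately weak, to force a collision
results = [
    write("a", b"payload", eng2, eng1, osd, always_h1),
    write("d", b"payload", eng3, eng1, osd, always_h1),
    write("k", b"different", eng3, eng1, osd, always_h1),
]
```

After the three writes, only Eng.1 holds a first corresponding relationship entry for h1, while the titles a, d, and k are recorded on the engines that received them.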
In the above embodiment, if all received object data is assumed to be written for the first time, the first data de-duplication engine, on receiving object data sent by client 300, can directly calculate the unique identification information from the content of the received object data and then determine, from the unique identification information, the second data de-duplication engine that performs the data de-duplication operation. In practical applications, however, the object data received by a data de-duplication engine may not be written for the first time, and for such object data the data de-duplication engine needs to perform an update operation. Refer to Fig. 11, which is a flow diagram of a data processing method provided by another embodiment of the present application. The data processing method in this embodiment can be used to update object data, and is applied to a data de-duplication engine or to a server 100 provided with a data de-duplication engine. The flow shown in Fig. 11 is described in detail below.
Step S301: receive the object data sent by client 300.
In this embodiment, client 300 sends the object data to the corresponding data de-duplication engine according to the third preset rule.
Step S302: if the correspondence between the title of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship, reduce by one, in the first corresponding relationship, the reference count corresponding to the recorded unique identification information of the received object data.
In this embodiment, when the data de-duplication engine that receives the object data and the data de-duplication engine that performs the update are the same engine, the data de-duplication engine, on receiving object data sent by client 300, looks up in the second corresponding relationship it maintains a record of the title of the received object data and its unique identification information. If there is a record in the second corresponding relationship, the object data is not being written for the first time and an update operation must be performed on it. Because the object data is being updated, it will no longer reference the object data corresponding to the unique identification information already recorded in the first corresponding relationship. The data de-duplication engine therefore needs to reduce by one, in the first corresponding relationship it maintains, the reference count corresponding to the recorded unique identification information of the received object data.
In this embodiment, when the data de-duplication engine that receives the object data (for example, a third data de-duplication engine) and the data de-duplication engine that performs the update (for example, a fourth data de-duplication engine) are not the same engine, the third data de-duplication engine receives the object data sent by client 300 according to the third preset rule. If the correspondence between the title of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship maintained by the third data de-duplication engine, the fourth data de-duplication engine is determined from the recorded unique identification information of the received object data. In the first corresponding relationship maintained by the fourth data de-duplication engine, the reference count corresponding to the recorded unique identification information of the received object data is reduced by one.
Step S303: when the reference count corresponding to the recorded unique identification information of the received object data is 0, send an instruction to delete the corresponding object data to the object storage device that stores the object data corresponding to the unique identification information of the received object data.
In this embodiment, when the data de-duplication engine that receives the object data and the data de-duplication engine that performs the update are the same engine, and the reference count of the unique identification information is reduced to 0, client 300 no longer has any object data that references the object data stored in the object storage device for that unique identification information in the first corresponding relationship. The object data must then be deleted from the object storage device cluster 200: an instruction to delete the corresponding object data is sent to the object storage device that stores the object data corresponding to the unique identification information of the received object data, and the object storage device deletes the corresponding object data and releases the storage space.
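The reference-count release in steps S302 and S303 can be sketched as below. The function name `release_reference` and the dictionary layout are hypothetical. Removing the record from the first corresponding relationship once the count reaches 0 is an assumption of this sketch; the embodiment only states that the stored object data is deleted.

```python
def release_reference(uid, first_mapping, cluster):
    """Drop one reference; delete the stored object when none remain."""
    record = first_mapping[uid]
    record["refs"] -= 1                       # step S302: reduce by one
    if record["refs"] == 0:                   # step S303: nothing references it
        del cluster[record["data_name"]]      # stands in for the delete instruction
        del first_mapping[uid]                # assumption: drop the stale record too
        return True
    return False

first_mapping = {"h1": {"data_name": "o1", "refs": 2}}
cluster = {"o1": b"payload"}
deleted_first = release_reference("h1", first_mapping, cluster)
still_stored = "o1" in cluster
deleted_second = release_reference("h1", first_mapping, cluster)
```

With two references, the first update leaves the object in place and only the second one frees the storage space.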
In the present embodiment, when there are at least two data de-duplication engines, in one optional implementation the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are the same engine; see the description in the preceding paragraph. In another optional implementation, the data de-duplication engine that receives the object data (such as the third data de-duplication engine) and the data de-duplication engine that executes the update (such as the fourth data de-duplication engine) are not the same engine. In that case, if the reference count of the unique identification information of the received object data in the first corresponding relationship maintained by the fourth data de-duplication engine is 0, an instruction to delete the corresponding object data is sent to the object storage device that stores the object data corresponding to that unique identification information.
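The reference-count decrement and step S303 can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the implementation of the embodiment: the names `HashEntry`, `DedupEngine`, `release` and `delete_instructions` (a list standing in for instructions sent to object storage devices) are hypothetical, and dropping the entry from the first corresponding relationship once its count reaches 0 is an assumption that the text does not state.

```python
from dataclasses import dataclass, field


@dataclass
class HashEntry:
    data_name: str   # name under which the object is stored on the object storage device
    ref_count: int   # how many object names currently map to this unique identification


@dataclass
class DedupEngine:
    # First corresponding relationship: unique identification -> HashEntry.
    hash_table: dict = field(default_factory=dict)
    # Hypothetical stand-in for delete instructions sent to object storage devices.
    delete_instructions: list = field(default_factory=list)

    def release(self, unique_id: str) -> int:
        """Reduce the reference count by one; request deletion when it reaches 0."""
        entry = self.hash_table[unique_id]
        entry.ref_count -= 1
        if entry.ref_count == 0:
            self.delete_instructions.append(entry.data_name)
            # Assumption: the now-unused entry is also dropped from the table.
            del self.hash_table[unique_id]
        return entry.ref_count


engine = DedupEngine(hash_table={"h1": HashEntry("o1", 2)})
engine.release("h1")   # one name still references o1, so nothing is deleted
engine.release("h1")   # count reaches 0, so a delete instruction for o1 is recorded
```

In the two-engine case the same `release` call would run on the engine that owns the unique identification information (the fourth data de-duplication engine in the text), not on the engine that received the object data.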
Step S304, if the corresponding relationship between the name of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship, the unique identification information of the received object data is recalculated, and in the second corresponding relationship the unique identification information corresponding to the name of the received object data is updated to the recalculated unique identification information.
In the present embodiment, after the reference count corresponding to the recorded unique identification information of the received object data has been reduced by one, the unique identification information of the object data needs to be recalculated, because the object data is being updated. Data de-duplication processing is then performed on the object data according to the recalculated unique identification information to write the data: the engine queries whether the recalculated unique identification information is recorded in the first corresponding relationship it maintains, and thereby judges whether the received object data is repeated data, non-repeated data or colliding data. After the data de-duplication processing is completed, the unique identification information corresponding to the name of the received object data in the second corresponding relationship is updated to the recalculated unique identification information. If the data de-duplication processing finds that the object data is colliding data, the unique identification information corresponding to the name of the received object data in the second corresponding relationship is instead updated to the specific identification information.
In the present embodiment, as described above, when there are at least two data de-duplication engines, in one optional implementation the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are the same engine; see the description in the preceding paragraph. In another optional implementation, the data de-duplication engine that receives the object data (such as the third data de-duplication engine) and the data de-duplication engine that executes the update (such as the fourth data de-duplication engine) are not the same engine. In that case, the third data de-duplication engine recalculates the unique identification information of the object data and, according to the recalculated unique identification information, determines that the object data is to be sent to the fourth data de-duplication engine (or another data de-duplication engine), which performs data de-duplication processing on the object data to write it. After the fourth data de-duplication engine completes the data de-duplication processing, the third data de-duplication engine updates, in the second corresponding relationship it maintains, the unique identification information corresponding to the name of the received object data to the recalculated unique identification information. If the fourth data de-duplication engine judges during data de-duplication processing that the object data is colliding data, the third data de-duplication engine instead updates that unique identification information to the specific identification information.
It should be noted that if a data de-duplication engine finds the name of the received object data in the second corresponding relationship it maintains, and the entry corresponding to that name is the specific identification information, it can determine that the object data has already been stored in the object storage device and was written as colliding data. To update colliding data, the corresponding object data is first deleted from the object storage device cluster 200 according to the name of the received object data; the unique identification information of the received object data is then recalculated, and the received object data is written according to the recalculated unique identification information.
It is readily understood that when a data de-duplication engine does not find the unique identification information of the received object data in the second corresponding relationship it maintains, the object data is being written for the first time, and the processing flow can refer to the corresponding content of the data processing method described in the foregoing embodiments.
In the present embodiment, when a data de-duplication engine receives object data sent by the client 300, it queries the second corresponding relationship it maintains for a record of the unique identification information of the received object data, and can thereby judge whether the received object data is already stored in the object storage device cluster 200. If the unique identification information is found, the object data has already been stored in the object storage device cluster 200 and an update operation needs to be performed. If it is not found, the engine calculates the unique identification information of the object data itself and performs data de-duplication processing according to the calculated unique identification information to write the object data; the processing flow can refer to the corresponding content of the data processing method described in the foregoing embodiments.
In the present embodiment, to make the process of updating object data clearer, it is described in detail below with reference to a practical application scenario. Referring to Figure 12, the process may include the following steps.
After performing a hash calculation on the name of object data a (i.e., according to the third preset rule), the client 300 sends object data a directly to Eng.2, as shown in step one of Figure 12.
Eng.2 finds the unique identification information h1 of object data a in the second corresponding relationship Mapping Table 2 that it maintains, which indicates that this is a data update operation. According to the unique identification information h1 found, Eng.2 sends a request to Eng.1 for processing; the request includes the unique identification information h1, as shown in step two of Figure 12.
Eng.1 finds the record of unique identification information h1 in the first corresponding relationship Hash Table 1 that it maintains and reduces the reference count of h1 by one, so that the reference count goes from 2 to 1. Originally both object data a and object data d corresponded to h1. Now that object data a is being updated, it may no longer correspond to h1, but object data d still does, so the object data o1 stored in the object storage device is not deleted at this point. It should be noted that if the reference count corresponding to unique identification information h1 were 0, no object data would correspond to h1 and object data o1 would no longer need to be stored; the object data o1 corresponding to h1 would then need to be deleted from the object storage device cluster 200, as shown in step three of Figure 12.
Eng.1 returns processing complete to Eng.2, as shown in step four of Figure 12.
Eng.2 recalculates the unique identification information of object data a (i.e., according to the second preset rule) and obtains h8. According to unique identification information h8, Eng.2 determines that object data a should be forwarded to Eng.3 for processing, as shown in step five of Figure 12.
After receiving object data a, Eng.3 does not find a record of unique identification information h8 in the first corresponding relationship Hash Table 3 that it maintains, and therefore determines that object data a is not repeated data. According to unique identification information h8, Eng.3 determines that the data name under which object data a is stored in the corresponding object storage device is o8, and inserts the corresponding relationship between unique identification information h8 and data name o8 into the first corresponding relationship Hash Table 3. The reference count of unique identification information h8 is now 1, as shown in step six of Figure 12.
Eng.3 stores object data a in the corresponding object storage device according to the first preset rule; the data name of object data a in that object storage device is o8, as shown in step seven of Figure 12.
Eng.3 returns processing success to Eng.2, as shown in step eight of Figure 12.
Eng.2 updates the corresponding relationship between the name a of the object data and unique identification information h1 to the corresponding relationship between the name a and unique identification information h8, as shown in step nine of Figure 12.
Eng.2 returns to the client 300 that object data a was written successfully, as shown in step ten of Figure 12.
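The ten steps of the update scenario can be approximated by the following sketch. It is written in Python under several assumptions that are not taken from the embodiment: SHA-256 stands in for the unique identification information, a modulo over a stable hash stands in for both the second and the third preset rule, a shared dictionary `store` stands in for the object storage device cluster 200, and all class and method names are hypothetical. Colliding data is not handled, and which engine index a given name or unique identification lands on will not match the Eng.1, Eng.2 and Eng.3 assignment of Figure 12.

```python
import hashlib


def stable_index(text: str, n: int) -> int:
    """Map a string to an engine index with a hash that is stable across runs."""
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16) % n


class Engine:
    def __init__(self, cluster):
        self.cluster = cluster
        self.mapping_table = {}  # second corresponding relationship: name -> unique id
        self.hash_table = {}     # first corresponding relationship: unique id -> [data_name, ref_count]

    def put(self, name: str, data: bytes) -> str:
        """Role of the engine that receives the object data from the client."""
        old_id = self.mapping_table.get(name)
        if old_id is not None:  # name already recorded, so this is an update
            self.cluster.by_unique_id(old_id).release(old_id)
        new_id = hashlib.sha256(data).hexdigest()  # recalculate the unique id
        self.cluster.by_unique_id(new_id).dedup_write(new_id, data)
        self.mapping_table[name] = new_id
        return "updated" if old_id is not None else "written"

    def release(self, unique_id: str) -> None:
        """Role of the engine that owns the old unique id: decrement, delete at 0."""
        entry = self.hash_table[unique_id]
        entry[1] -= 1
        if entry[1] == 0:
            del self.cluster.store[entry[0]]
            del self.hash_table[unique_id]

    def dedup_write(self, unique_id: str, data: bytes) -> None:
        """Role of the engine that owns the new unique id: store only non-repeated data."""
        if unique_id in self.hash_table:
            self.hash_table[unique_id][1] += 1
        else:
            data_name = "o-" + unique_id[:12]  # hypothetical naming scheme
            self.cluster.store[data_name] = data
            self.hash_table[unique_id] = [data_name, 1]


class Cluster:
    def __init__(self, engine_count: int = 3):
        self.store = {}  # stand-in for the object storage device cluster
        self.engines = [Engine(self) for _ in range(engine_count)]

    def by_name(self, name: str) -> Engine:
        return self.engines[stable_index(name, len(self.engines))]

    def by_unique_id(self, unique_id: str) -> Engine:
        return self.engines[stable_index(unique_id, len(self.engines))]

    def put(self, name: str, data: bytes) -> str:
        return self.by_name(name).put(name, data)


cluster = Cluster(3)
cluster.put("a", b"shared content")  # first write of a
cluster.put("d", b"shared content")  # d is repeated data, so the count becomes 2
cluster.put("a", b"new content")     # update of a: old count drops to 1, new entry is created
```

After the third call the stored copy shared by a and d is still present, because d still references it, which mirrors the situation in which o1 is kept in step three.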
In the present embodiment, when the client 300 needs to read object data, the data de-duplication engine can obtain the object data from the object storage device cluster 200 according to the first corresponding relationship and the second corresponding relationship that it maintains, and feed it back to the client 300. Referring to Figure 13, which is a schematic flowchart of a data processing method provided by another embodiment of the present application, the method is applied to a data de-duplication engine or to a server 100 equipped with a data de-duplication engine. The data processing method in the present embodiment is used to obtain (or read) object data; the flow shown in Figure 13 is described in detail below.
Step S401, a read request for target object data sent by the client 300 is received.
In the present embodiment, when the client 300 needs to read target object data, it sends the read request for the target object data to the corresponding data de-duplication engine according to the third preset rule, that is, according to the name of the target object data. The client 300 thus uses the third preset rule both when writing data, to send the object data to the corresponding data de-duplication engine, and when reading data, to send the read request for the target object data to the corresponding data de-duplication engine. This ensures that the client 300 can always obtain the target object data through that data de-duplication engine.
Step S402, the second corresponding relationship is queried according to the name of the target object data.
In the present embodiment, when the data de-duplication engine receives the read request for the target object data sent by the client 300, it queries the second corresponding relationship it maintains according to the name of the target object data.
When there are at least two data de-duplication engines, they include a fifth data de-duplication engine and a sixth data de-duplication engine; on receiving the read request for the target object data, the fifth data de-duplication engine queries the second corresponding relationship that it maintains. In the present embodiment, it is assumed that the client 300 sends the read request for the target object data to the fifth data de-duplication engine according to the third preset rule.
Step S403, when the unique identification information of the target object data is found in the second corresponding relationship, the data name under which the target object data corresponding to that unique identification information is stored in the object storage device is queried according to the first corresponding relationship.
When the data de-duplication engine that receives the read request and the data de-duplication engine that executes the read request are the same engine, that engine can find the unique identification information of the target object data according to the second corresponding relationship and then, according to the first corresponding relationship, find the data name under which the target object data corresponding to that unique identification information is stored in the object storage device.
When the data de-duplication engine that receives the read request (such as the fifth data de-duplication engine) and the data de-duplication engine that executes the read request (such as the sixth data de-duplication engine) are not the same engine, and the fifth data de-duplication engine finds the unique identification information of the target object data in the second corresponding relationship it maintains, it determines the sixth data de-duplication engine according to the second preset rule and sends the read request for the target object data to the sixth data de-duplication engine. The sixth data de-duplication engine then determines, in the first corresponding relationship it maintains and according to the unique identification information of the target object data, the data name under which the target object data is stored in the object storage device.
Step S404, the target object data is read according to the data name under which it is stored in the object storage device, and is fed back to the client 300.
When the data de-duplication engine that receives the read request and the data de-duplication engine that executes the read request are the same engine, that engine can read the target object data according to the data name under which it is stored in the object storage device and feed it back to the client 300.
When the data de-duplication engine that receives the read request (such as the fifth data de-duplication engine) and the data de-duplication engine that executes the read request (such as the sixth data de-duplication engine) are not the same engine, the sixth data de-duplication engine reads the target object data from the object storage device according to the data name under which it is stored and feeds it back to the fifth data de-duplication engine, which feeds the target object data back to the client 300.
In the present embodiment, whether the client 300 is reading or writing object data, it uses the same rule to send the request to a data de-duplication engine for processing, so requests with the same object data name are sent to the same data de-duplication engine. The data de-duplication engine can find the corresponding unique identification information in the second corresponding relationship it maintains according to the name of the target object data, then find in the first corresponding relationship, according to the unique identification information, the data name under which the target object data is stored in the object storage device, and read the target object data and feed it back to the client 300, thereby obtaining the object data.
It should be noted that when the unique identification information that a data de-duplication engine finds in the second corresponding relationship it maintains according to the name of the target object data is the specific identification information, the target object data can be determined to be colliding data. The data de-duplication engine can then read the target object data directly from the object storage device cluster 200 according to the name of the target object data and feed it back to the client 300.
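The two-level lookup of steps S402 to S404, including the colliding-data shortcut, can be sketched for the single-engine case as follows. The sentinel `COLLISION_MARK` is a hypothetical stand-in for the specific identification information, the three dictionaries stand in for the second corresponding relationship, the first corresponding relationship and the object storage device cluster, and returning `None` for an unknown name is an assumption, since the text does not describe that case.

```python
COLLISION_MARK = "__collision__"  # hypothetical stand-in for the specific identification information


def read_object(name, mapping_table, hash_table, store):
    """Resolve name -> unique id -> data name -> stored bytes."""
    unique_id = mapping_table.get(name)       # second corresponding relationship
    if unique_id is None:
        return None                           # assumption: unknown names yield nothing
    if unique_id == COLLISION_MARK:
        return store[name]                    # colliding data is read by its own name
    data_name, _ref_count = hash_table[unique_id]  # first corresponding relationship
    return store[data_name]


mapping_table = {"a": "h1", "d": "h1", "c": COLLISION_MARK}
hash_table = {"h1": ("o1", 2)}
store = {"o1": b"shared content", "c": b"colliding content"}

read_object("d", mapping_table, hash_table, store)  # resolved through h1 and o1
read_object("c", mapping_table, hash_table, store)  # read directly by name
```

In the multi-engine case the first lookup runs on the engine that received the read request and the second on the engine selected from the unique identification information, with the data relayed back through the first engine.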
In the present embodiment, to make the process of obtaining object data clearer, it is described in detail below with reference to a practical application scenario. As shown in Figure 14, the process of obtaining object data may include the following steps.
After performing a hash calculation on the name of target object data d (i.e., according to the third preset rule), the client 300 sends the read request for target object data d directly to Eng.3, as shown in step one of Figure 14.
Eng.3 finds the unique identification information h1 of target object data d in the second corresponding relationship Mapping Table 3 that it maintains, calculates a hash value of unique identification information h1, and determines that the read request is to be forwarded to Eng.1 for processing, as shown in step two of Figure 14.
After receiving the read request for target object data d, Eng.1 finds the record of unique identification information h1 in the first corresponding relationship Hash Table 1 that it maintains and reads the target object data whose data name is o1 from the corresponding object storage device, as shown in step three of Figure 14.
Eng.1 returns the target object data that it read to Eng.3, as shown in step four of Figure 14.
Eng.3 returns the target object data to the client 300, as shown in step five of Figure 14.
Referring to Figure 15, which is a functional block diagram of a data processing device 400 provided by another embodiment of the present application. The data processing device 400 includes one or at least two data de-duplication engines, which are stored in the memory 110 and executed by the processor 120. It should be noted that the basic principle and technical effects of the data processing device 400 provided by the present embodiment are the same as those of the data processing method described in the foregoing embodiments. For brevity, where the present embodiment does not mention something, reference can be made to the corresponding content of the data processing method provided by the foregoing embodiments.
In the present embodiment, the data de-duplication engine may include a receiving module 410, a data write-in processing module 420, a sending module 430 and a corresponding relationship maintenance module 440. It should be noted that the data processing device 400 in the present embodiment covers both the case of one data de-duplication engine and the case of at least two data de-duplication engines, and each data de-duplication engine includes the same functional modules or units. Therefore, in the present embodiment, the reading and writing of object data is described only from the perspective of one data de-duplication engine.
When the client 300 needs to write object data, if the data de-duplication engine that receives the object data and the data de-duplication engine that executes the data de-duplication processing are the same engine, then the receiving module 410, data write-in processing module 420, sending module 430 and corresponding relationship maintenance module 440 that implement the writing of the object data are modules of the same data de-duplication engine.
The receiving module 410 is used to receive the object data sent by the client 300.
The data write-in processing module 420 is used to perform data de-duplication processing on the received object data to judge whether the received object data is repeated data in the object storage device cluster 200.
After the data de-duplication processing is completed, the sending module 430 is used to send the received object data, when it is not repeated data, to the corresponding object storage device in the object storage device cluster 200 according to the first preset rule.
The corresponding relationship maintenance module 440 is used to maintain the first corresponding relationship: when the received object data is not repeated data, it inserts into the first corresponding relationship the corresponding relationship between the unique identification information of the received object data and the data name under which the received object data is stored in the corresponding object storage device.
Optionally, the corresponding relationship maintenance module 440 is also used to maintain the second corresponding relationship, which includes the corresponding relationship, maintained by the data de-duplication engine, between the name of object data received from the client 300 and the unique identification information of that object data. If the corresponding relationship between the name of the received object data and the unique identification information of the received object data is not recorded in the second corresponding relationship, the corresponding relationship maintenance module 440 inserts it into the second corresponding relationship.
In the present embodiment, when there are at least two data de-duplication engines, the corresponding relationship maintenance module 440 of each data de-duplication engine is used to maintain the first corresponding relationship established from the object data on which the current data de-duplication engine has completed data de-duplication processing, the current data de-duplication engine being the data de-duplication engine in which that corresponding relationship maintenance module 440 is located. If the data de-duplication engine that receives the object data and the data de-duplication engine that executes the data de-duplication processing are not the same engine, the receiving module 410, data write-in processing module 420, sending module 430 and corresponding relationship maintenance module 440 that implement the writing of the object data are modules on different data de-duplication engines.
When the receiving module 410 receives the object data sent by the client 300, the data write-in processing module 420 of the data de-duplication engine in which the receiving module 410 is located is also used to analyze the received object data according to the second preset rule, to determine the target data de-duplication engine that is to execute the data de-duplication processing operation, and to forward the received object data to the target data de-duplication engine.
The data write-in processing module 420 of the target data de-duplication engine is used to perform data de-duplication processing on the received object data, based on the first corresponding relationship maintained by the target data de-duplication engine, to judge whether the object data is repeated data.
The sending module 430 of the target data de-duplication engine is used to send the received object data, when it is not repeated data, to the corresponding object storage device in the object storage device cluster 200 according to the first preset rule.
The corresponding relationship maintenance module 440 of the target data de-duplication engine is used, when the received object data is judged not to be repeated data, to insert the corresponding relationship between the unique identification information of the received object data and the data name under which the received object data is stored in the corresponding object storage device into the first corresponding relationship maintained by that corresponding relationship maintenance module 440.
If the corresponding relationship between the name of the received object data and the unique identification information of the received object data is not recorded in the second corresponding relationship, the corresponding relationship maintenance module 440 of the data de-duplication engine that received the object data inserts that corresponding relationship into the second corresponding relationship.
In the present embodiment, whether an update operation is to be performed on the object data can be determined by judging whether the received object data is recorded in the second corresponding relationship. When the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are the same engine, the corresponding relationship maintenance module 440 is used, if the corresponding relationship between the name of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship, to reduce by one, in the first corresponding relationship, the reference count corresponding to the recorded unique identification information of the received object data.
In the present embodiment, when the data processing device 400 has at least two data de-duplication engines and the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are not the same engine, the data write-in processing module 420 of the data de-duplication engine that receives the object data is used, if the corresponding relationship between the name of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship maintained by the current data de-duplication engine in which that data write-in processing module 420 is located, to determine the target data de-duplication engine according to the second preset rule and to send the recorded unique identification information of the received object data to the target data de-duplication engine.
The corresponding relationship maintenance module 440 of the target data de-duplication engine is used to reduce by one, in the first corresponding relationship maintained by the target data de-duplication engine, the reference count corresponding to the recorded unique identification information of the received object data.
The sending module 430 is also used, when the reference count corresponding to the recorded unique identification information of the received object data is 0, to send an instruction to delete the corresponding object data to the object storage device that stores the object data corresponding to that unique identification information.
In the present embodiment, when the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are the same engine, the sending module 430 of the data de-duplication engine that receives the object data is used to send the instruction to delete the corresponding object data to the object storage device that stores the object data corresponding to the unique identification information of the received object data.
When the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are not the same engine, the sending module 430 of the target data de-duplication engine is used, if the reference count of the unique identification information of the received object data in the first corresponding relationship maintained by the target data de-duplication engine is 0, to send the instruction to delete the corresponding object data to the object storage device that stores the object data corresponding to that unique identification information.
In the present embodiment, the data write-in processing module 420 is also used, if the corresponding relationship between the name of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship, to recalculate the unique identification information of the received object data and to perform data de-duplication processing according to the recalculated unique identification information.
The corresponding relationship maintenance module 440 is also used to update, in the second corresponding relationship, the unique identification information corresponding to the name of the received object data to the recalculated unique identification information of the received object data. In the present embodiment, when the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are the same engine, the data write-in processing module 420 in the data de-duplication engine that receives the object data is used to recalculate the unique identification information of the received object data and to perform data de-duplication processing according to the recalculated unique identification information; the corresponding relationship maintenance module 440 in the same engine is used to update, in the second corresponding relationship, the unique identification information corresponding to the name of the received object data to the recalculated unique identification information.
When the data de-duplication engine that receives the object data and the data de-duplication engine that executes the update are not the same engine, the data write-in processing module 420 in the data de-duplication engine that receives the object data is used to recalculate the unique identification information of the received object data and, according to the recalculated unique identification information, to determine the target data de-duplication engine and send the object data to it for de-duplication processing, which judges whether the object data is repeated data. After the target data de-duplication engine completes the de-duplication processing, the corresponding relationship maintenance module 440 of the data de-duplication engine that receives the object data updates, in the second corresponding relationship, the unique identification information corresponding to the name of the received object data to the recalculated unique identification information of the received object data.
Optionally, in the present embodiment, the receiving module 410 is also used to receive the target object of the transmission of client 300
The read request of data.
So that the client 300 can obtain object data, the data de-duplication engine further includes a reading data processing module 450. The reading data processing module 450 is used to query the second corresponding relationship according to the name of the target object data; when the unique identification information of the target object data is found in the second corresponding relationship, to query, according to the first corresponding relationship, the data name under which the target object data corresponding to that unique identification information is stored in the object storage device; and to read the target object data according to that data name and feed the target object data back to the client 300.
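The two-step lookup performed on a read can be sketched as below. The sketch assumes that both corresponding relationships and the object storage device are plain dictionaries, and the function name `read_object` is hypothetical. Returning `None` when a lookup fails is also an assumption, since this passage only describes the case in which the unique identification information is found.

```python
from typing import Dict, Optional


def read_object(
    name: str,
    second: Dict[str, str],      # object name -> unique id (second relationship)
    first: Dict[str, str],       # unique id -> data name (first relationship)
    storage: Dict[str, bytes],   # data name -> stored bytes (stand-in for the device)
) -> Optional[bytes]:
    """Resolve an object name to its stored bytes via the two relationships."""
    uid = second.get(name)
    if uid is None:
        return None              # assumed behaviour: unknown object name
    data_name = first.get(uid)
    if data_name is None:
        return None              # assumed behaviour: id not recorded
    return storage.get(data_name)
```

Because two different object names may map to the same unique identification information, both resolve to the same stored data name and therefore to a single stored copy.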
In the present embodiment, based on the foregoing case in which there are at least two data de-duplication engines, in one optional implementation, when the data de-duplication engine that receives the read request and the data de-duplication engine that executes the read request are the same engine, the description in the preceding paragraph applies. In another optional implementation, when the engine that receives the read request and the engine that executes it are not the same engine, the reading data processing module 450 is specifically used to query the second corresponding relationship maintained by the current data de-duplication engine in which the reading data processing module 450 is located; when the unique identification information of the target object data is found, it determines the target data de-duplication engine according to the second preset rule and sends the read request for the target object data to the target data de-duplication engine.
The reading data processing module 450 is also used to receive the target object data fed back by the target data de-duplication engine and to feed the target object data back to the client 300.
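The forwarded read described above can be sketched as follows. This is a sketch under stated assumptions: the class `Engine`, its method names, and the modulo routing standing in for the second preset rule are hypothetical, and the forwarding between engines is modelled as a direct method call rather than a network request.

```python
class Engine:
    """Hypothetical engine with its own relationships and access to shared storage."""

    def __init__(self, storage):
        self.first = {}         # unique id -> data name (first relationship)
        self.second = {}        # object name -> unique id (second relationship)
        self.storage = storage  # data name -> bytes (stand-in for the device cluster)

    def serve_forwarded_read(self, uid):
        """Executed on the target engine: resolve the id and read the data."""
        data_name = self.first[uid]
        return self.storage[data_name]


def route(uid, engines):
    # Assumption: the "second preset rule" is modelled as id modulo engine count.
    return engines[int(uid, 16) % len(engines)]


def handle_read(receiver, engines, name):
    """Executed on the engine that received the client's read request."""
    uid = receiver.second.get(name)   # query the receiver's own second relationship
    if uid is None:
        return None                   # assumed behaviour for an unknown name
    target = route(uid, engines)      # may be the receiver itself
    data = target.serve_forwarded_read(uid)
    return data                       # the receiver feeds the data back to the client
```

When the routing rule selects the receiving engine itself, the same code path degenerates to the single-engine read of the preceding paragraph.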
In conclusion data processing method, device provided by the embodiment of the present application and server, client directly will be right
Image data is sent to data de-duplication engine, by data de-duplication engine according to the first corresponding relationship safeguarded thereon and
Two corresponding relationships carry out data de-duplication processing to the received object data, and identical object data can be all addressed to together
One target data de-duplication engine is deleted processing again, to guarantee the identical object data of content in object storage device
It can be only written into once in cluster, realize the overall situation and delete again, effectively reduce the memory space usage amount of object storage system.
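The global de-duplication property summarized above can be illustrated with a small simulation. It is only a sketch: the engine count, the table names, the SHA-256 digest used as the unique identification information, the modulo routing and the `blob-` naming scheme are all assumptions chosen for the example.

```python
import hashlib

NUM_ENGINES = 3  # assumed number of engines for the example

# One first relationship (unique id -> data name) and one second relationship
# (object name -> unique id) per engine, plus a single shared storage cluster.
first_tables = [dict() for _ in range(NUM_ENGINES)]
second_tables = [dict() for _ in range(NUM_ENGINES)]
cluster = {}  # data name -> bytes


def put(receiver, name, data):
    """Write `data` under `name` via engine index `receiver`; return the target index."""
    uid = hashlib.sha256(data).hexdigest()
    target = int(uid, 16) % NUM_ENGINES       # identical content -> identical target
    if uid not in first_tables[target]:       # not repeated data: store it once
        data_name = "blob-" + uid[:12]
        cluster[data_name] = data
        first_tables[target][uid] = data_name
    second_tables[receiver][name] = uid       # the receiver records name -> id
    return target
```

Whichever engine a client happens to contact, identical content reaches the same target engine, so the cluster holds one copy while each receiving engine keeps its own name records.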
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article or device. In the absence of further restrictions, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The foregoing is merely the preferred embodiments of the present application and is not intended to limit the application; for those skilled in the art, various modifications and changes may be made to this application. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of this application shall be included within its scope of protection. It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined and explained in subsequent drawings.
Claims (16)
1. A data processing method, characterized in that it is applied to a data de-duplication engine in an object storage system, the object storage system comprising a server on which the data de-duplication engine is installed and an object storage device cluster composed of at least one object storage device, the data de-duplication engine being communicatively connected to the at least one object storage device, a first corresponding relationship being maintained in the data de-duplication engine, the first corresponding relationship comprising the correspondence between the unique identification information of object data maintained by the data de-duplication engine and the data name under which the object data is stored in the object storage device, the method comprising:
receiving object data sent by a client;
carrying out data de-duplication processing on the received object data according to the first corresponding relationship, to judge whether the received object data is repeated data in the object storage device cluster;
when the received object data is not repeated data, obtaining, according to the unique identification information of the received object data, the data name under which the received object data is to be stored in the object storage device cluster, sending the received object data to the corresponding object storage device in the object storage device cluster according to the obtained data name, and inserting into the first corresponding relationship the correspondence between the unique identification information of the received object data and the data name under which the received object data is stored in the corresponding object storage device.
2. The data processing method as claimed in claim 1, characterized in that, when there are at least two data de-duplication engines, each data de-duplication engine maintains a first corresponding relationship established from the object data for which that engine itself has completed data de-duplication processing;
the carrying out of data de-duplication processing on the received object data comprises:
when a first data de-duplication engine receives the object data sent by the client, determining, by a second preset rule, a second data de-duplication engine to execute the data de-duplication processing operation, and sending the received object data to the second data de-duplication engine;
the second data de-duplication engine carrying out data de-duplication processing on the received object data based on the first corresponding relationship maintained by the second data de-duplication engine.
3. The data processing method as claimed in claim 2, characterized in that the inserting into the first corresponding relationship, when the received object data is not repeated data, of the correspondence between the unique identification information of the received object data and the data name under which the received object data is stored in the corresponding object storage device comprises:
when the received object data is not repeated data, the second data de-duplication engine inserting the correspondence between the unique identification information of the received object data and the data name under which the received object data is stored in the corresponding object storage device into the first corresponding relationship maintained by the second data de-duplication engine.
4. The data processing method as claimed in claim 1, characterized in that a second corresponding relationship is maintained in the data de-duplication engine, the second corresponding relationship comprising the correspondence, maintained by the data de-duplication engine, between the name of object data received from the client and the unique identification information of that object data, the method further comprising:
if the correspondence between the name of the received object data and the unique identification information of the received object data is not recorded in the second corresponding relationship, inserting the correspondence between the name of the received object data and the unique identification information of the received object data into the second corresponding relationship.
5. The data processing method as claimed in claim 4, characterized in that the method further comprises:
if the correspondence between the name of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship, recalculating the unique identification information of the received object data, and updating, in the second corresponding relationship, the unique identification information corresponding to the name of the received object data to the recalculated unique identification information of the received object data.
6. The data processing method as claimed in claim 4, characterized in that the first corresponding relationship further comprises a reference count corresponding to the unique identification information of object data, the reference count being increased by one for the unique identification information of an object data each time that object data is judged to be repeated data, the method further comprising:
if the correspondence between the name of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship, decreasing by one, in the first corresponding relationship, the reference count corresponding to the recorded unique identification information of the received object data;
when the reference count corresponding to the recorded unique identification information of the received object data is 0, sending, to the object storage device that stores the object data corresponding to the unique identification information of the received object data, an instruction to delete the corresponding object data.
7. The data processing method as claimed in claim 6, characterized in that, when there are at least two data de-duplication engines, each data de-duplication engine maintains a first corresponding relationship established from the object data for which that engine itself has completed data de-duplication processing, and maintains a second corresponding relationship established from the object data that the engine itself has received from the client;
the decreasing by one, in the first corresponding relationship, of the reference count corresponding to the recorded unique identification information of the received object data, if the correspondence between the name of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship, comprises:
a third data de-duplication engine receiving the object data sent by the client according to a third preset rule, and, if the correspondence between the name of the received object data and the unique identification information of the received object data is already recorded in the second corresponding relationship maintained by the third data de-duplication engine, determining a fourth data de-duplication engine according to the recorded unique identification information of the received object data;
decreasing by one, in the first corresponding relationship maintained by the fourth data de-duplication engine, the reference count corresponding to the recorded unique identification information of the received object data.
8. The data processing method as claimed in claim 4, characterized in that the method further comprises:
when a read request for target object data sent by the client is received, querying the second corresponding relationship according to the name of the target object data;
when the unique identification information of the target object data is found in the second corresponding relationship, querying, according to the first corresponding relationship, the data name under which the target object data corresponding to the unique identification information of the target object data is stored in the object storage device;
reading the target object data according to the data name under which the target object data is stored in the object storage device, and feeding the target object data back to the client.
9. The data processing method as claimed in claim 8, characterized in that, when there are at least two data de-duplication engines, each data de-duplication engine maintains a first corresponding relationship established from the object data for which that engine itself has completed data de-duplication processing, and maintains a second corresponding relationship established from the object data that the engine itself has received from the client;
the querying of the second corresponding relationship according to the name of the target object data, when a read request for target object data sent by the client is received, comprises:
a fifth data de-duplication engine, on receiving the read request for the target object data, querying the second corresponding relationship maintained by the fifth data de-duplication engine, wherein the client sends the read request for the target object data to the fifth data de-duplication engine according to a third preset rule;
the querying, according to the first corresponding relationship, of the data name under which the target object data corresponding to the unique identification information of the target object data is stored in the object storage device, when the unique identification information of the target object data is found in the second corresponding relationship, comprises:
when the fifth data de-duplication engine finds the unique identification information of the target object data in the second corresponding relationship maintained by the fifth data de-duplication engine, determining a sixth data de-duplication engine according to the second preset rule and sending the read request for the target object data to the sixth data de-duplication engine; the sixth data de-duplication engine determining, in the first corresponding relationship maintained by the sixth data de-duplication engine and according to the unique identification information of the target object data, the data name under which the target object data is stored in the object storage device;
the reading of the target object data according to the data name under which the target object data is stored in the object storage device and the feeding back of the target object data to the client comprises:
the sixth data de-duplication engine reading the target object data from the object storage device according to the data name under which the target object data is stored in the object storage device, and feeding the target object data back to the fifth data de-duplication engine, and the fifth data de-duplication engine feeding the target object data back to the client.
10. The data processing method as claimed in claim 1, characterized in that the carrying out of data de-duplication processing on the received object data comprises:
when the received object data is larger than a preset size, the data de-duplication engine dividing the received object data into a plurality of sub-object data that conform to the preset size, and carrying out data de-duplication processing on each sub-object data.
11. A data processing device, characterized in that it is applied to an object storage system, the object storage system comprising a server on which a data de-duplication engine is installed and an object storage device cluster composed of at least one object storage device, the data de-duplication engine being communicatively connected to the at least one object storage device, the data processing device comprising the data de-duplication engine, and the data de-duplication engine comprising:
a corresponding relationship maintenance module, used to maintain a first corresponding relationship, the first corresponding relationship comprising the correspondence between the unique identification information of object data maintained by the data de-duplication engine and the data name under which the object data is stored in the object storage device;
a receiving module, used to receive object data sent by a client;
a data write-in processing module, used to carry out data de-duplication processing on the received object data according to the first corresponding relationship, to judge whether the received object data is repeated data in the object storage device cluster;
a sending module, used to, when the received object data is not repeated data, obtain, according to the unique identification information of the received object data, the data name under which the received object data is to be stored in the object storage device cluster, and send the received object data to the corresponding object storage device in the object storage device cluster according to the obtained data name;
the corresponding relationship maintenance module being further used to insert into the first corresponding relationship the correspondence between the unique identification information of the received object data and the data name under which the received object data is stored in the corresponding object storage device.
12. The data processing device as claimed in claim 11, characterized in that, when the data processing device comprises at least two data de-duplication engines, the corresponding relationship maintenance module of each data de-duplication engine is used to maintain a first corresponding relationship established from the object data for which the current data de-duplication engine has completed data de-duplication processing, the current data de-duplication engine being the data de-duplication engine in which the corresponding relationship maintenance module is located;
the data write-in processing module is further used to analyse, by a second preset rule, the received object data sent by the client, so as to determine the target data de-duplication engine to execute the data de-duplication processing operation, and to forward the received object data to the target data de-duplication engine;
the data write-in processing module of the target data de-duplication engine is used to carry out data de-duplication processing on the received object data based on the first corresponding relationship maintained by the target data de-duplication engine.
13. The data processing device as claimed in claim 11, characterized in that the corresponding relationship maintenance module of the data de-duplication engine is further used to maintain a second corresponding relationship, the second corresponding relationship comprising the correspondence, maintained by the data de-duplication engine, between the name of object data received from the client and the unique identification information of that object data;
the corresponding relationship maintenance module is further used to, if the correspondence between the name of the received object data and the unique identification information of the received object data is not recorded in the second corresponding relationship, insert the correspondence between the name of the received object data and the unique identification information of the received object data into the second corresponding relationship.
14. The data processing device as claimed in claim 13, characterized in that the receiving module is further used to receive a read request for target object data sent by the client;
the data de-duplication engine further comprises a reading data processing module, the reading data processing module being used to query the second corresponding relationship according to the name of the target object data; when the unique identification information of the target object data is found in the second corresponding relationship, to query, according to the first corresponding relationship, the data name under which the target object data corresponding to the unique identification information of the target object data is stored in the object storage device; and to read the target object data according to the data name under which the target object data is stored in the object storage device and feed the target object data back to the client.
15. The data processing device as claimed in claim 14, characterized in that, when the data processing device comprises at least two data de-duplication engines, the corresponding relationship maintenance module of each data de-duplication engine is used to maintain a first corresponding relationship established from the object data for which the current data de-duplication engine has completed data de-duplication processing, and to maintain a second corresponding relationship established from the object data that the current data de-duplication engine has received from the client, the current data de-duplication engine being the data de-duplication engine in which the corresponding relationship maintenance module is located;
the reading data processing module is specifically used to query the second corresponding relationship maintained by the current data de-duplication engine in which the reading data processing module is located, and, when the unique identification information of the target object data is found, to determine the target data de-duplication engine according to the second preset rule and send the read request for the target object data to the target data de-duplication engine;
the reading data processing module is further used to receive the target object data fed back by the target data de-duplication engine, and to feed the target object data back to the client.
16. A server, characterized in that it is applied to an object storage system, the object storage system comprising an object storage device cluster composed of at least one object storage device, the server comprising:
a memory, used to store one or more programs;
a processor;
wherein, when the one or more programs are executed by the processor, the method as claimed in any one of claims 1-10 is implemented.
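For illustration only, the reference counting of claim 6 and the size-based splitting of claim 10 might be sketched as follows. The sketch is not part of the claims: the class `RefCountedStore`, the function `split_object`, the SHA-256 digest used as the unique identification information, the initial count of 1 on first write, and the handling of a final sub-object shorter than the preset size are all assumptions, and a single engine with in-memory dictionaries stands in for the engines and the object storage device.

```python
import hashlib


class RefCountedStore:
    """Hypothetical single-engine model with reference counts in the first relationship."""

    def __init__(self):
        self.first = {}    # unique id -> [data name, reference count]
        self.second = {}   # object name -> unique id
        self.storage = {}  # data name -> bytes (stand-in for the storage device)

    def put(self, name, data):
        uid = hashlib.sha256(data).hexdigest()
        old_uid = self.second.get(name)
        if uid in self.first:
            self.first[uid][1] += 1            # repeated data: count goes up by one
        else:
            data_name = "blob-" + uid[:12]     # assumed naming scheme
            self.storage[data_name] = data
            self.first[uid] = [data_name, 1]   # assumption: first write counts as 1
        if old_uid is not None:
            self._release(old_uid)             # the name was already recorded
        self.second[name] = uid

    def _release(self, uid):
        entry = self.first[uid]
        entry[1] -= 1
        if entry[1] == 0:                      # no name refers to this data any more
            del self.storage[entry[0]]         # stands in for the delete instruction
            del self.first[uid]


def split_object(data, preset_size):
    """Split data larger than preset_size into sub-objects of at most preset_size bytes."""
    if len(data) <= preset_size:
        return [data]
    return [data[i:i + preset_size] for i in range(0, len(data), preset_size)]
```

In this sketch the count for new content is raised before the count for the old content is lowered, so rewriting a name with unchanged content leaves the stored data in place.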
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810009925.0A CN108268216B (en) | 2018-01-05 | 2018-01-05 | Data processing method, device and server |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108268216A CN108268216A (en) | 2018-07-10 |
CN108268216B true CN108268216B (en) | 2019-11-12 |
Family
ID=62773412
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810009925.0A Active CN108268216B (en) | 2018-01-05 | 2018-01-05 | Data processing method, device and server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108268216B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109408761A (en) * | 2018-10-16 | 2019-03-01 | 翟红鹰 | A kind of filter method of repetitive requests, system, equipment and storage medium |
CN110427347A (en) * | 2019-07-08 | 2019-11-08 | 新华三技术有限公司成都分公司 | Method, apparatus, memory node and the storage medium of data de-duplication |
CN111510497A (en) * | 2020-04-17 | 2020-08-07 | 上海七牛信息技术有限公司 | Processing method and system for edge storage |
CN114265551B (en) * | 2021-12-02 | 2023-10-20 | 阿里巴巴(中国)有限公司 | Data processing method in storage cluster, storage node and equipment |
CN117131036B (en) * | 2023-10-26 | 2023-12-22 | 环球数科集团有限公司 | Data maintenance system based on big data and artificial intelligence |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8281021B1 (en) * | 2009-02-12 | 2012-10-02 | Sprint Communications Company L.P. | Multiple cookie handling |
CN103279502A (en) * | 2013-05-06 | 2013-09-04 | 北京赛思信安技术有限公司 | Framework and method of repeated data deleting file system combined with parallel file system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9665591B2 (en) * | 2013-01-11 | 2017-05-30 | Commvault Systems, Inc. | High availability distributed deduplicated storage system |
Also Published As
Publication number | Publication date |
---|---|
CN108268216A (en) | 2018-07-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108268216B (en) | Data processing method, device and server | |
CN108769111A (en) | A kind of server connection method, computer readable storage medium and terminal device | |
CN107704202B (en) | Method and device for quickly reading and writing data | |
CN109213604A (en) | A kind of management method and device of data source | |
CN110737682A (en) | cache operation method, device, storage medium and electronic equipment | |
CN106294352A (en) | A kind of document handling method, device and file system | |
CN111177143B (en) | Key value data storage method and device, storage medium and electronic equipment | |
CN107038092B (en) | Data copying method and device | |
CN111475105A (en) | Monitoring data storage method, device, server and storage medium | |
CN110851474A (en) | Data query method, database middleware, data query device and storage medium | |
CN112579595A (en) | Data processing method and device, electronic equipment and readable storage medium | |
CN105653209A (en) | Object storage data transmitting method and device | |
CN109947729A (en) | A kind of real-time data analysis method and device | |
CN112860953A (en) | Data importing method, device, equipment and storage medium of graph database | |
CN113779286B (en) | Method and device for managing graph data | |
WO2021016050A1 (en) | Multi-record index structure for key-value stores | |
CN104956340A (en) | Scalable data deduplication | |
CN111046106A (en) | Cache data synchronization method, device, equipment and medium | |
CN108154024A (en) | A kind of data retrieval method, device and electronic equipment | |
CN112650692A (en) | Heap memory allocation method, device and storage medium | |
CN105939402A (en) | MAC table entry obtaining method and device | |
CN109947842A (en) | Date storage method, apparatus and system in distributed memory system | |
CN109408496A (en) | A kind of method and device reducing data redundancy | |
US20150134671A1 (en) | Method and apparatus for data distribution and concurrence | |
CN111131197B (en) | Filtering strategy management system and method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||