CN104063377B - Information processing method and electronic device using the same
- Publication number: CN104063377B
- Application number: CN201310086344.4A
- Authority: CN (China)
- Prior art keywords: data, data block, block, characteristic, file
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
- G06F40/12—Use of codes for handling textual entities
- G06F40/131—Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces
Abstract
An information processing method and an electronic device using the method are provided. The electronic device has a first file corresponding to first data, and the first data is divided into at least two first data blocks according to a fixed-length division rule. The information processing method includes: obtaining, from the first data, characteristic data that characterizes the at least two first data blocks; after the first file is changed to form a second file corresponding to second data, determining a second data block by comparing the first data with the second data based on the characteristic data, the second data block being a data block in the second data that has changed relative to the first data; and updating the corresponding first data block among the at least two first data blocks with the second data block. With the technical solution of the embodiments of the present invention, every data block of a file whose content has changed can be accurately determined, thereby saving the resources required for information processing.
Description
Technical field
The present invention relates to the field of information technology, and more particularly to an information processing method and an electronic device using the information processing method.
Background art
With the development of information technology, various types of files are generated, such as text files, image files, audio and video files, and program files produced while programs run. These files are usually modified during use, which produces different versions of the files. Files of different types and different versions generally occupy a large amount of storage space. To provide more storage resources, besides increasing the storage space of terminals such as smartphones, tablet computers, or computers, cloud storage services for storing files, such as Dropbox, have gradually been developed. These services effectively solve the problem of insufficient terminal storage space and make it easy to share files between different terminal devices.
To retain historical versions of a file while saving storage space, the file is usually divided into data blocks for storage. Specifically, the file is divided into data blocks of equal length. When the file changes, for example because it is modified, the data blocks of the file before modification are compared block by block, in order, with the data blocks of the modified file, and the first changed data block and all data blocks after it are stored. This file processing scheme finds only the first data block whose content has changed. Consequently, when an earlier data block of the file changes, the later data blocks must also be updated or otherwise processed even if they have not changed, which needlessly wastes resources such as storage space and transmission capacity.
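The limitation of this existing scheme can be illustrated with a minimal Python sketch. The function names `first_changed_index` and `blocks_to_store` are hypothetical and chosen only for illustration; the sketch assumes the blocks are already available as byte strings.

```python
def first_changed_index(old_blocks, new_blocks):
    """Return the index of the first block that differs, or None if all blocks match."""
    for i, (old, new) in enumerate(zip(old_blocks, new_blocks)):
        if old != new:
            return i
    if len(old_blocks) != len(new_blocks):
        # One version has extra trailing blocks.
        return min(len(old_blocks), len(new_blocks))
    return None


def blocks_to_store(old_blocks, new_blocks):
    """Existing scheme: keep the first changed block and every block after it."""
    i = first_changed_index(old_blocks, new_blocks)
    return [] if i is None else new_blocks[i:]


old = [b"aa", b"bb", b"cc", b"dd"]
new = [b"aa", b"bX", b"cc", b"dd"]
# Only one block changed, yet three blocks are stored.
print(len(blocks_to_store(old, new)))
```

In this example a single modified block causes the two unchanged blocks after it to be stored again, which is the waste described above.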
Therefore, an information processing mechanism is desired that can determine every data block of a file whose content has changed.
Summary of the invention
Embodiments of the present invention provide an information processing method and an electronic device using the information processing method, which can accurately determine every data block of a file whose content has changed, thereby saving the resources required for information processing.
In one aspect, an information processing method is provided, applied to an electronic device. The electronic device has a first file, the first file corresponds to first data, and the first data is divided into at least two first data blocks according to a fixed-length division rule. The method includes: obtaining characteristic data from the first data, the characteristic data being used to characterize the at least two first data blocks; after the first file is changed to form a second file, the second file corresponding to second data, determining a second data block by comparing the first data with the second data based on the characteristic data, the second data block being a data block in the second data that has changed relative to the first data; and updating the corresponding first data block among the at least two first data blocks with the second data block.
In the information processing method, the step of obtaining characteristic data from the first data may include: obtaining adjacent data of adjacent data blocks among the at least two first data blocks as the characteristic data.
In the information processing method, the step of determining the second data block by comparing the first data with the second data based on the characteristic data may include: finding the characteristic data in the second data; dividing the second data into data blocks based on the characteristic data found; and determining the second data block by comparing the first data blocks with the corresponding data blocks of the second data.
In the information processing method, the step of finding the characteristic data in the second data may include: finding the characteristic data in the second data using a first hash algorithm. The step of determining the second data block by comparing the first data blocks with the corresponding data blocks of the second data may include: determining the second data block by comparing the first data blocks with the corresponding data blocks of the second data using a second hash algorithm.
In the information processing method, the adjacent data has a predetermined length, and the step of finding the characteristic data in the second data using the first hash algorithm may include: performing a hash calculation on each piece of characteristic data in the first data using the first hash algorithm to obtain a first group of hash values; performing hash calculations on the second data successively, in units of the predetermined length, using the first hash algorithm to obtain a second group of hash values; and comparing the second group of hash values one by one with the first group of hash values, and determining as characteristic data any data of the predetermined length in the second data whose hash value is identical to a hash value in the first group.
The step of determining the second data block by comparing the first data blocks with the corresponding data blocks of the second data using the second hash algorithm may include: performing a hash calculation on the data blocks of the second file using the second hash algorithm to obtain a third group of hash values; performing a hash calculation, using the second hash algorithm, on the data blocks in the first data that correspond to the respective data blocks of the second file to obtain a fourth group of hash values; and comparing the third group of hash values with the fourth group correspondingly, and determining as the second data block any data block in the second data whose hash value differs from the hash value of the corresponding first data block.
In the information processing method, the step of updating the corresponding first data block among the at least two first data blocks with the second data block may include: sending the second data block via a communication network to a remote device communicatively connected to the electronic device; and storing the second data block in the remote device.
The information processing method may further include: when the second file is needed, obtaining the second data block and the remaining blocks, that is, the first data blocks other than the one corresponding to the second data block; and assembling the second data block and the remaining blocks to obtain the data of the second file.
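The assembly step can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the function name `assemble` and the representation of the changed blocks as a mapping from block index to new content are hypothetical and do not appear in the source.

```python
def assemble(first_blocks: list[bytes], changed: dict[int, bytes]) -> bytes:
    """Rebuild the second file's data from unchanged first blocks plus changed blocks.

    `changed` maps the index of a first data block to the second data block
    that replaces it (an assumed representation).
    """
    return b"".join(changed.get(i, block) for i, block in enumerate(first_blocks))


first_blocks = [b"today is ", b"Tuesday, ", b"cloudy"]
second_data = assemble(first_blocks, {1: b"Thursday, "})
print(second_data)
```

Only the changed block has to be fetched separately; every other block is reused from the first data.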
In another aspect, an electronic device for information processing is provided. The electronic device has a first file, and the first file corresponds to first data. The electronic device includes: a division unit configured to divide the first data into at least two first data blocks according to a fixed-length division rule; an acquiring unit configured to obtain characteristic data from the first data, the characteristic data being used to characterize the at least two first data blocks; a determining unit configured to, after the first file is changed to form a second file corresponding to second data, determine a second data block by comparing the first data with the second data based on the characteristic data, the second data block being a data block in the second data that has changed relative to the first data; and an updating unit configured to update the corresponding first data block among the at least two first data blocks with the second data block.
In the electronic device, the acquiring unit can obtain adjacent data of adjacent data blocks among the at least two first data blocks as the characteristic data.
In the electronic device, the determining unit may include: a search component configured to find the characteristic data in the second data and supply the characteristic data found to the division unit, so that the second data is divided into data blocks based on the characteristic data found; and a comparison unit configured to determine the second data block by comparing the first data blocks with the corresponding data blocks of the second data.
In the electronic device, the search component can find the characteristic data in the second data using a first hash algorithm, and the comparison unit can determine the second data block by comparing the first data blocks with the corresponding data blocks of the second data using a second hash algorithm.
In the electronic device, the adjacent data can have a predetermined length, and the search component can find the characteristic data through the following operations: performing a hash calculation on each piece of characteristic data in the first data using the first hash algorithm to obtain a first group of hash values; performing hash calculations on the second data successively, in units of the predetermined length, using the first hash algorithm to obtain a second group of hash values; and comparing the second group of hash values one by one with the first group of hash values, and determining as characteristic data any data of the predetermined length in the second data whose hash value is identical to a hash value in the first group.
In the electronic device, the comparison unit can determine the second data block through the following operations: performing a hash calculation on the data blocks of the second file using the second hash algorithm to obtain a third group of hash values; performing a hash calculation, using the second hash algorithm, on the data blocks in the first data that correspond to the respective data blocks of the second file to obtain a fourth group of hash values; and comparing the third group of hash values with the fourth group correspondingly, and determining as the second data block any data block in the second data whose hash value differs from the hash value of the corresponding first data block.
In the electronic device, the updating unit can be connected to a remote device via a communication network, and the updating unit can perform the updating operation as follows: sending the second data block to the remote device via the communication network so that the second data block is stored in the remote device.
The electronic device may further include: an assembly unit configured to, when the second file is needed, obtain the second data block and the remaining blocks, that is, the first data blocks other than the one corresponding to the second data block, and assemble the second data block and the remaining blocks to obtain the data of the second file.
In the technical solution of the information processing method and the electronic device according to the embodiments of the present invention, characteristic data of each data block is obtained, and the data blocks before the change are compared with the data blocks after the change based on this characteristic data. Every data block of the file whose content has changed can thus be accurately determined, saving the resources required for information processing.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart illustrating an information processing method according to an embodiment of the present invention;
Fig. 2 is a flowchart illustrating the determining step in the information processing method according to an embodiment of the present invention;
Fig. 3 is a flowchart illustrating the determining step implemented with hash algorithms in the information processing method according to an embodiment of the present invention;
Fig. 4 schematically illustrates the storage and assembly of data obtained using the information processing method according to an embodiment of the present invention;
Fig. 5 is a block diagram schematically illustrating an electronic device according to an embodiment of the present invention;
Fig. 6 is a block diagram schematically illustrating the determining unit in the electronic device according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, rather than all, of the embodiments of the present invention.
Fig. 1 is a flowchart illustrating an information processing method 100 according to an embodiment of the present invention.
The information processing method according to the embodiments of the present invention can be applied to various electronic devices, including but not limited to smartphones, computers, personal digital assistants, web servers, bibliographic databases, and data storage servers. The particular type of electronic device does not limit the present invention.
The electronic device has a first file, which can be any type of file, for example a text file, an image file, an audio or video file, or a program file; the file type does not limit the present invention either. The first file is stored as data in a storage device. The storage device may be a cloud storage device or, of course, a local storage device. For example, the first file corresponds to first data, and the first data is stored in a cloud storage device such as Dropbox.
When the first data is stored, it is divided into at least two first data blocks according to a fixed-length division rule so that it can be stored. The fixed-length division rule is a rule that divides the data based on its size. It can be an equal-length division rule: for example, 6 MB of first data can be divided into six first data blocks of 1 MB each. It can also be a division rule that follows a predetermined pattern of lengths: for example, 6 MB of first data can be divided into four first data blocks of 1 MB, 2 MB, 1 MB, and 2 MB, respectively. Besides fixed-length division rules, any other division method can be used to divide data such as the first data, for example division according to the content characteristics of the first data. The specific division rule does not limit the present invention, as long as the first data is divided into at least two data blocks.
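The two fixed-length division rules described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation; the function names `split_fixed` and `split_by_pattern` are hypothetical, and the example uses a few bytes instead of megabytes.

```python
def split_fixed(data: bytes, block_size: int) -> list[bytes]:
    """Equal-length rule: consecutive blocks of block_size bytes (the last may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def split_by_pattern(data: bytes, sizes: list[int]) -> list[bytes]:
    """Patterned rule: block lengths repeat a predetermined pattern, e.g. 1, 2, 1, 2."""
    if not sizes or any(s <= 0 for s in sizes):
        raise ValueError("sizes must be a non-empty list of positive lengths")
    blocks, pos, k = [], 0, 0
    while pos < len(data):
        size = sizes[k % len(sizes)]
        blocks.append(data[pos:pos + size])
        pos += size
        k += 1
    return blocks


data = b"abcdef"
print(split_fixed(data, 1))            # six blocks of one byte each
print(split_by_pattern(data, [1, 2]))  # blocks of length 1, 2, 1, 2
```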
The information processing method 100 may include: obtaining characteristic data from the first data, the characteristic data being used to characterize the at least two first data blocks (S110); after the first file is changed to form a second file, the second file corresponding to second data, determining a second data block by comparing the first data with the second data based on the characteristic data, the second data block being a data block in the second data that has changed relative to the first data (S120); and updating the corresponding first data block among the at least two first data blocks with the second data block (S130).
In S110, characteristic data is obtained from the first data. The characteristic data is used to characterize the at least two first data blocks, so that each of the first data blocks can be distinguished based on the characteristic data.
As one example of obtaining the characteristic data, adjacent data of adjacent data blocks among the at least two first data blocks can be used as the characteristic data. Taking first data of 6 MB divided in order into six equal first data blocks of 1 MB as an example, 64 adjacent bytes spanning the adjacent first data blocks 1 and 2 can be used as one piece of characteristic data, and likewise the adjacent data between first data blocks 2 and 3, ..., and between first data blocks 5 and 6 can each be used as characteristic data. The 64 bytes of adjacent data may, for example, consist of the last 32 bytes of first data block 1 and the first 32 bytes of first data block 2; alternatively, the last 50 bytes of first data block 1 and the first 14 bytes of first data block 2; alternatively, the last 64 bytes of first data block 1; or alternatively, the first 64 bytes of first data block 2. The 64-byte length is only illustrative, and other lengths can be used as needed. In addition, different pieces of characteristic data may have different lengths, and the length of each piece can be determined, for example, from the features of the first data block that it characterizes.
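Taking adjacent data as characteristic data can be sketched in Python as follows. The function name `boundary_features` and its `before`/`after` parameters are assumptions introduced for illustration; they express how many bytes are taken from the end of one block and from the start of the next.

```python
def boundary_features(blocks: list[bytes], before: int = 32, after: int = 32) -> list[bytes]:
    """Return one piece of characteristic data per pair of adjacent blocks.

    Each piece joins the last `before` bytes of the left block with the
    first `after` bytes of the right block.
    """
    features = []
    for left, right in zip(blocks, blocks[1:]):
        tail = left[max(0, len(left) - before):]  # handles before == 0 correctly
        features.append(tail + right[:after])
    return features


blocks = [b"aaaa", b"bbbb", b"cccc"]
print(boundary_features(blocks, before=2, after=2))  # split evenly across the boundary
print(boundary_features(blocks, before=0, after=3))  # taken only from the right block
```

With `before=32, after=32` on 1 MB blocks this yields the 64-byte case described above; `before=50, after=14`, `before=64, after=0`, and `before=0, after=64` give the other variants.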
As another example of obtaining the characteristic data, the characteristic data can be obtained based on the content of the first data. Taking first data whose content is "today is March 5, 2013, Tuesday, the weather is cloudy" as an example, content that changes little (for example, "is", "2013", and "weather") can be set as the characteristic data, so that the first data block characterized by each piece of characteristic data can be distinguished easily. In addition, the characteristic data for characterizing each first data block can also be set based on other properties of the content of the first data (for example, frequency of occurrence).
As described above, different ways can be used to obtain the characteristic data that characterizes each first data block, and the specific way of obtaining it does not limit the present invention. Nor do the length of each piece of characteristic data, its position in the first data, and so on limit the present invention.
In S120, after the first file is changed to form a second file, the second file corresponding to second data, a second data block is determined by comparing the first data with the second data based on the characteristic data. The second data block is a data block in the second data that has changed relative to the first data.
When the first file is changed, for example through use or editing, to form the second file, the data corresponding to the file changes accordingly: the first data corresponding to the first file becomes the second data corresponding to the second file. At this point, the first data and the second data can be compared based on the characteristic data obtained in S110 to determine the data block in the second data that has changed relative to the first data (that is, the second data block).
As for how the second data block is determined, different approaches can be taken depending on the kind of characteristic data obtained. Exemplary descriptions are given below with reference to Fig. 2 and Fig. 3 to disclose the present invention more fully.
The steps shown in Fig. 2 can be used to determine the data block in the second data that has changed relative to the first data, that is, the second data block.
Fig. 2 is a flowchart illustrating the determining step in the information processing method according to an embodiment of the present invention. The determining step shown in Fig. 2 includes: finding the characteristic data in the second data (S121); dividing the second data into data blocks based on the characteristic data found (S122); and determining the second data block by comparing the first data blocks with the corresponding data blocks of the second data (S123).
In S121, a first hash algorithm can be used to find the characteristic data in the second data. This step finds the characteristic data of the first data blocks in the second data of the changed second file, so that each first data block of the first file can be mapped to a data block of the second file in order to judge whether each data block of the second file has changed relative to the first data blocks.
A hash algorithm is a compressing map that transforms input data of arbitrary length into an output of a specific length, namely a hash value. The space of hash values is usually much smaller than the space of inputs, so the hash value can be processed in place of the input data, improving processing efficiency while preserving accuracy. There are many types of hash algorithm, including but not limited to the MD5 (Message Digest Algorithm 5) hash algorithm and the SHA1 (Secure Hash Algorithm 1) hash algorithm. The first hash algorithm can therefore be any of the MD5 hash algorithm, the SHA1 hash algorithm, and so on. In addition, any algorithm other than a hash algorithm can also be used to find the characteristic data in the second data; the type of algorithm used does not limit the embodiments of the present invention.
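The fixed output length of a hash algorithm can be seen in a short Python sketch that uses the standard `hashlib` module. The helper name `digest` is hypothetical; choosing MD5 or SHA1 here simply follows the examples named in the text.

```python
import hashlib


def digest(data: bytes, algorithm: str = "md5") -> bytes:
    """Return the hash value of data using the named algorithm."""
    return hashlib.new(algorithm, data).digest()


small = b"abc"
large = b"x" * 1_000_000

# Inputs of very different length map to outputs of the same fixed length.
print(len(digest(small)), len(digest(large)))                  # MD5: 16 bytes each
print(len(digest(small, "sha1")), len(digest(large, "sha1")))  # SHA1: 20 bytes each
```

Comparing two 16-byte or 20-byte values is much cheaper than comparing two 1 MB blocks, which is why hash values stand in for the data during comparison.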
In S122, the second data is divided into data blocks based on the characteristic data found. Because the characteristic data found is the data that characterizes the at least two first data blocks of the first data, the same characteristic data characterizes both the first data blocks of the first file and the data blocks of the second file. Therefore, dividing the second data based on the characteristic data found divides the second data of the second file into data blocks in a manner that corresponds to the first data blocks of the first file.
In S123, the second data block is determined by comparing the first data blocks with the corresponding data blocks of the second data. Because the second data of the second file was divided in S122 into data blocks in a manner that corresponds to the first data blocks of the first file, the first data blocks of the first file can be compared block by block in S123 with the corresponding data blocks of the second file, thereby determining the data block in the second data that has changed relative to the first data, that is, the second data block. A hash algorithm can also be used in this comparison to improve its efficiency; for example, the second data block can be determined by comparing the first data blocks with the corresponding data blocks of the second data using a second hash algorithm. The second hash algorithm can be the same as or different from the first hash algorithm, and can even be an algorithm other than a hash algorithm.
Fig. 3 is a flowchart illustrating the determining step implemented with hash algorithms in the information processing method according to an embodiment of the present invention. In the example of Fig. 3, adjacent data of adjacent data blocks among the at least two first data blocks is used as the characteristic data, and the adjacent data has a predetermined length.
The determining step implemented with hash algorithms includes: performing a hash calculation on each piece of characteristic data in the first data using the first hash algorithm to obtain a first group of hash values (S321); performing hash calculations on the second data successively, in units of the predetermined length, using the first hash algorithm to obtain a second group of hash values (S322); comparing the second group of hash values one by one with the first group of hash values, and determining as characteristic data any data of the predetermined length in the second data whose hash value is identical to a hash value in the first group (S323); dividing the second data into data blocks based on the characteristic data found (S324); performing a hash calculation on the data blocks of the second file using the second hash algorithm to obtain a third group of hash values (S325); performing a hash calculation, using the second hash algorithm, on the data blocks in the first data that correspond to the respective data blocks of the second file to obtain a fourth group of hash values (S326); and comparing the third group of hash values with the fourth group correspondingly, and determining as the second data block any data block in the second data whose hash value differs from the hash value of the corresponding first data block (S327).
Steps S321, S322, and S323 in Fig. 3 implement step S121 in Fig. 2. The following explanation takes as an example first data that includes six first data blocks of 1 MB each, with the 64 bytes of adjacent data of adjacent data blocks used as the characteristic data (that is, characteristic data 1-5).
In S321, a hash calculation is performed, for example using the MD5 hash algorithm, on the five 64-byte pieces of characteristic data between first data blocks 1 and 2, ..., and between first data blocks 5 and 6, to obtain a first group of hash values comprising five hash values. In S322, the first hash algorithm is used to perform hash calculations on bytes 1-64, bytes 2-65, bytes 3-66, ... of the second file, until the hash calculation has been performed over all of the second data, to obtain a second group of hash values. In S323, the second group of hash values is compared one by one with the first group, and the 64-byte data in the second data whose hash value is identical to a hash value in the first group is determined to be characteristic data. In this way all or some of the five pieces of characteristic data of the first data may be found in the second data, or none of the five may be found.
S324 in Fig. 3 is the same as S122 in Fig. 2. If all five pieces of characteristic data are found in S323, the second data is divided in S324 into six data blocks that correspond exactly to the six first data blocks. If characteristic data 1-4 is found in S323 but characteristic data 5 is not, the second data is divided in S324 into five data blocks 1-5, and data block 5 of the second data corresponds to the two first data blocks 5 and 6. If none of characteristic data 1-5 is found in S323, it can be judged that the second data has changed as a whole, and it is not subdivided into blocks.
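Steps S321 to S324 can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not in the source: all pieces of characteristic data have the same length and distinct contents, only the first match of each piece is kept, and the `before` parameter (hypothetical) states how many bytes of each piece lie before the block boundary. The function names are also hypothetical.

```python
import hashlib


def find_boundaries(second: bytes, features: list[bytes], before: int) -> dict[int, int]:
    """Return {feature index: cut offset in `second`} for every feature found."""
    if not features:
        return {}
    width = len(features[0])  # the predetermined length
    # S321: first group of hash values, one per piece of characteristic data.
    first_group = {hashlib.md5(f).digest(): i for i, f in enumerate(features)}
    cuts = {}
    # S322: hash every window of `width` bytes (bytes 1-64, 2-65, ... in the text).
    for pos in range(len(second) - width + 1):
        idx = first_group.get(hashlib.md5(second[pos:pos + width]).digest())
        # S323: a matching hash value marks characteristic data in the second data.
        if idx is not None and idx not in cuts:
            cuts[idx] = pos + before
    return cuts


def split_at(second: bytes, cuts: dict[int, int]) -> list[bytes]:
    """S324: divide the second data at the boundaries that were found."""
    offsets = [0] + sorted(cuts.values()) + [len(second)]
    return [second[a:b] for a, b in zip(offsets, offsets[1:])]


features = [b"AABB", b"BBCC"]  # boundaries of the first data AAAAAAAA|BBBBBBBB|CCCCCCCC
second = b"A" * 8 + b"BBBB" + b"y" + b"BBBB" + b"C" * 8  # one byte inserted in block 2
cuts = find_boundaries(second, features, before=2)
print(split_at(second, cuts))
```

Although the inserted byte shifts everything after it, the boundaries are still found, so the second data is divided into blocks that line up with the first data blocks. Hashing every window from scratch is slow for large data; a rolling hash would be a common optimization, but that is a swapped-in technique and not something the source describes.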
Steps S325, S326, and S327 in Fig. 3 implement step S123 in Fig. 2. After the second data has been divided in S324 into data blocks corresponding to the first data blocks, the data blocks of the second data can be compared block by block with the corresponding first data blocks. Taking the case in which characteristic data 1-4 is found but characteristic data 5 is not, a hash calculation is performed on data blocks 1-5 of the second data using the second hash algorithm to obtain the third group of hash values (S325); a hash calculation is performed on first data blocks 1, 2, 3, 4, and 5-6 using the second hash algorithm to obtain the fourth group of hash values (S326); and the hash value of data block 1 of the second data is compared with that of first data block 1, ..., the hash value of data block 4 of the second data with that of first data block 4, and the hash value of data block 5 of the second data with that of first data blocks 5-6, and any data block in the second data whose hash value differs from that of the corresponding first data block is determined to be the second data block (S327).
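Steps S325 to S327 can be sketched in Python as follows. The sketch assumes, as one reading of "first data blocks 5-6", that first data blocks separated by a piece of characteristic data that was not found are concatenated and hashed as one unit. The function names and the `found` list of zero-based feature indices are hypothetical, and SHA1 is used only as an example of a second hash algorithm.

```python
import hashlib


def group_first_blocks(first_blocks: list[bytes], found: list[int]) -> list[bytes]:
    """Merge first data blocks whose separating characteristic data was not found.

    Feature i (zero-based) separates first_blocks[i] and first_blocks[i + 1].
    """
    groups, start = [], 0
    for i in sorted(found):
        groups.append(b"".join(first_blocks[start:i + 1]))
        start = i + 1
    groups.append(b"".join(first_blocks[start:]))
    return groups


def changed_blocks(first_blocks: list[bytes], second_blocks: list[bytes],
                   found: list[int]) -> list[int]:
    """Return the indices of the data blocks of the second data that changed."""
    groups = group_first_blocks(first_blocks, found)
    if len(groups) != len(second_blocks):
        raise ValueError("second data was not divided consistently with the features found")
    third = [hashlib.sha1(b).digest() for b in second_blocks]  # S325
    fourth = [hashlib.sha1(g).digest() for g in groups]        # S326
    return [k for k, (h2, h1) in enumerate(zip(third, fourth)) if h2 != h1]  # S327


first_blocks = [b"a", b"b", b"c", b"d", b"e", b"f"]
second_blocks = [b"a", b"B", b"c", b"d", b"ef"]  # characteristic data 5 was not found
print(changed_blocks(first_blocks, second_blocks, found=[0, 1, 2, 3]))
```

Here only the second data block of the second data differs, so only that block needs to be updated; the merged block corresponding to first data blocks 5-6 compares equal.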
As mentioned above, when the characteristic data is obtained in a different way, the second data block generally has to be determined in a correspondingly different way. The determination of the second data block is further explained below for the case in which content that changes little in the file's content is set as the characteristic data.
Take as an example first data whose content is "today is March 5, 2013, Tuesday, the weather is cloudy", with the data corresponding to "is", "2013", and "weather" set as three pieces of characteristic data. The first data is divided into three first data blocks 1-3 corresponding to "today is", "March 5, 2013, Tuesday", and "the weather is cloudy", which are characterized by the three pieces of characteristic data respectively. If the first file is modified into a second file whose content is "today is March 7, 2013, Thursday, the weather is cloudy", the data block in the second data that has changed relative to the first data, that is, the second data block, can be determined by comparing the first data with the second data based on the characteristic data. Specifically, the first data and the second data are compared block by block based on the three pieces of characteristic data, so it is easy to find that the data block in the second data whose content is "March 7, 2013, Thursday" has changed, and the second data block is thereby determined.
In S130, the corresponding first data block among the at least two first data blocks is updated with the second data block. When the data blocks are stored in local storage of the electronic device, the updating step in S130 may store the second data block in correspondence with the matching block among the first data blocks, or may replace the first data block in the first data that corresponds to the second data block with the second data block. When the data blocks are stored in a remote device communicatively coupled to the electronic device, the updating step in S130 may include: sending the second data block via a communication network to the remote device communicatively connected to the electronic device; and storing the second data block in the remote device.
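The remote variant of the updating step can be sketched as follows. This is a hedged illustration only: the `BlockStore` class is a hypothetical in-memory stand-in for the remote device, and the byte counter merely models the traffic that a real communication network would carry.

```python
class BlockStore:
    """Hypothetical in-memory stand-in for local storage or a remote device."""

    def __init__(self):
        self.blocks = {}          # block index -> block bytes
        self.bytes_received = 0   # models network traffic to this store

    def put(self, index, data):
        self.blocks[index] = data
        self.bytes_received += len(data)


def update(store, changed):
    """Send only the changed blocks (index -> new bytes) to the store."""
    for index, data in changed.items():
        store.put(index, data)


remote = BlockStore()
for i, block in enumerate([b"AAAA", b"BBBB", b"CCCC"]):
    remote.put(i, block)              # initial upload of the first data blocks
baseline = remote.bytes_received

update(remote, {1: b"bbbb"})          # only block 1 changed
sent = remote.bytes_received - baseline
print(sent)
```

In this sketch the update transfers 4 bytes instead of re-sending all 12, which is the traffic saving the method aims at.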
If both the first file and the second file (i.e. files of different versions) are to be retained, then in S130 the corresponding first data block is also retained when the second data block is used to update the corresponding first data block among the at least two first data blocks. This is described with reference to Fig. 4.
Fig. 4 schematically illustrates the storage and assembly of data obtained with the information processing method according to an embodiment of the present invention. In Fig. 4, the first data of the first file is divided into first data block 1, first data block 2, ..., first data block N, which are stored block by block in the electronic device or a remote device. When the first file is modified into a second file, S110 and S120 of the above information processing method determine that first data block 2 has become the second data block, and in the updating step of S130 the second data block is additionally stored in correspondence with first data block 2 (see the block storage on the right side of Fig. 4); the data blocks of the second file then comprise first data block 1, the second data block, first data block 3, ..., first data block N. When the second file is modified into a third file, S110 and S120 determine that first data block 3 in the second file has become a third data block, and in the updating step of S130 the third data block is additionally stored in correspondence with first data block 3 (see the block storage on the right side of Fig. 4); the data blocks of the third file then comprise first data block 1, the second data block, the third data block, ..., first data block N. In practice, when a file has been modified into several different versions, the data blocks of every version can thus be stored in as little storage space as possible. Variations may be made on the basis of the data block storage described with reference to Fig. 4.
In the technical solution of the information processing method according to an embodiment of the present invention, the feature data of each data block is obtained, and the data blocks before and after the change are compared based on the feature data, so that each data block whose content has changed can be accurately determined among the data blocks of the file. This saves the storage space required in subsequent information processing, and also saves network traffic when the data blocks are stored remotely.
In addition, the information processing method shown in Fig. 1 may further include obtaining the data of the modified file (i.e. the second file). That is, when the second file is needed, the second data block and the remaining blocks of the first data blocks other than the data block corresponding to the second data block are obtained, and the second data block and the remaining blocks are assembled to obtain the data of the second file. In the example of Fig. 4, when the second file is needed, first data block 1, the second data block, first data block 3, ..., first data block N are obtained from the block storage unit and assembled in order to obtain the data of the second file; when the third file is needed, first data block 1, the second data block, the third data block, first data block 4, ..., first data block N are obtained from the block storage unit and assembled in order to obtain the data of the third file.
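The versioned block storage of Fig. 4 and the assembly just described can be sketched together in Python. The names `block_store`, `versions`, `save_version` and `assemble`, and the block identifier scheme, are hypothetical choices made for this sketch; the source only specifies that changed blocks are stored alongside the retained originals and that the blocks of a version are assembled in order.

```python
block_store = {}   # block id -> bytes; every distinct block is stored once
versions = {}      # version name -> ordered list of block ids


def save_version(name, blocks, base=None):
    """Store a version, keeping only the blocks that differ from `base`."""
    base_ids = versions[base] if base is not None else []
    ids = []
    for pos, data in enumerate(blocks):
        if pos < len(base_ids) and block_store[base_ids[pos]] == data:
            ids.append(base_ids[pos])          # unchanged: reuse stored block
        else:
            block_id = f"{name}:{pos}"         # changed: store a new block
            block_store[block_id] = data
            ids.append(block_id)
    versions[name] = ids


def assemble(name):
    """Fetch the blocks of a version in order and join them."""
    return b"".join(block_store[block_id] for block_id in versions[name])


save_version("file1", [b"one ", b"two ", b"three ", b"four"])
save_version("file2", [b"one ", b"TWO ", b"three ", b"four"], base="file1")
save_version("file3", [b"one ", b"TWO ", b"THREE ", b"four"], base="file2")

print(len(block_store))      # blocks physically stored for three versions
print(assemble("file3"))
```

Three four-block versions occupy six stored blocks rather than twelve, and each version can still be reassembled on demand.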
Fig. 5 is a block diagram schematically illustrating an electronic device 500 according to an embodiment of the present invention.
The electronic device 500 may be any type of electronic device, including but not limited to a smartphone, a computer, a personal digital assistant, a website server, a document database or a data storage server; the particular type of electronic device does not limit the present invention. The electronic device 500 has a first file, and the first file corresponds to first data. The first file may be a file of any type, and the file type likewise does not limit the present invention. The first data corresponding to the first file is stored in local storage of the electronic device 500 or in a remote device communicatively coupled to it.
The electronic device 500 includes: a division unit 510 for dividing the first data into at least two first data blocks according to a fixed-length division rule; an acquiring unit 520 for obtaining feature data from the first data, the feature data being used to characterize the at least two first data blocks; a determining unit 530 for, after the first file has been modified to form a second file corresponding to second data, determining a second data block by comparing the first data and the second data based on the feature data, the second data block being a data block in the second data that has changed relative to the first data; and an updating unit 540 for updating the corresponding first data block among the at least two first data blocks with the second data block.
When storing the first data, the division unit 510 divides the first data into at least two first data blocks according to the fixed-length division rule, so that they can be stored. The fixed-length division rule is a rule that divides data based on its size; it may be an equal-length division rule or a division rule with a predetermined interval pattern. Besides the fixed-length division rule, the division unit 510 may also divide data such as the first data in any other manner, for example according to content characteristics of the first data. The specific division rule does not limit the present invention, as long as the first data is divided into at least two data blocks.
The acquiring unit 520 obtains feature data from the first data. The feature data is used to characterize the at least two first data blocks, so that each of the at least two first data blocks can be distinguished based on the feature data.
The acquiring unit 520 may obtain the feature data in different ways. For example, it may obtain the adjacent data of adjacent data blocks among the at least two first data blocks as the feature data, or obtain the feature data based on the content of the first data. The acquisition manner used by the acquiring unit 520 does not limit the embodiments of the present invention, nor do the length of each piece of feature data, its position in the first data, and so on.
As an example of obtaining the adjacent data of adjacent data blocks as the feature data, the acquiring unit 520 may use 64 adjacent bytes of any two adjacent data blocks among the sequentially arranged first data block 1, first data block 2, ..., first data block N as feature data. The 64 bytes of adjacent data may, for example, consist of the last 32 bytes of the preceding data block and the first 32 bytes of the following data block; alternatively, the last 50 bytes of the preceding data block and the first 14 bytes of the following data block; alternatively, the last 64 bytes of the preceding data block; or alternatively, the first 64 bytes of the following data block. The 64 bytes here are merely illustrative, and other lengths may be used as needed. Moreover, the lengths of different pieces of feature data may vary, and the length of each piece of feature data may, for example, be determined based on the characteristics of the first data block that it characterizes.
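Taking the adjacent data of adjacent blocks as feature data can be sketched as below. The function name `boundary_features` and its `before`/`after` parameters are hypothetical; they correspond to the number of bytes taken from the end of the preceding block and from the start of the following block, and the tiny block sizes are chosen only to keep the example readable.

```python
def boundary_features(blocks, before=32, after=32):
    """Return one feature per pair of adjacent blocks.

    Each feature is the last `before` bytes of the preceding block
    followed by the first `after` bytes of the following block.
    """
    features = []
    for prev, nxt in zip(blocks, blocks[1:]):
        tail = prev[max(0, len(prev) - before):]   # safe when before == 0
        features.append(tail + nxt[:after])
    return features


blocks = [b"abcdefgh", b"ijklmnop", b"qrstuvwx"]
both_sides = boundary_features(blocks, before=2, after=2)
tail_only = boundary_features(blocks, before=4, after=0)
print(both_sides, tail_only)
```

With `before=32, after=32`, `before=50, after=14`, `before=64, after=0` or `before=0, after=64`, the same function covers the four 64-byte layouts listed in the text.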
In the case where the feature data is obtained based on the content of the first data, taking first data whose content is "Today is March 5, 2013, Tuesday, weather cloudy" as an example, content that changes little (e.g. "is", "weather") may be set as the feature data, so that the first data block characterized by each piece of feature data can be easily identified. The feature data characterizing each first data block may also be set based on other properties of the content of the first data (such as frequency of occurrence).
After the first file has been modified to form the second file, the data corresponding to the file changes accordingly; for example, the first data corresponding to the first file changes into the second data corresponding to the second file. The determining unit 530 determines the second data block by comparing the first data and the second data based on the feature data, the second data block being a data block in the second data that has changed relative to the first data. The determining unit 530 may determine the second data block in different ways depending on the feature data that was obtained. An exemplary description is given below with reference to Fig. 6 to disclose the present invention more fully.
Fig. 6 is a block diagram schematically illustrating the determining unit in the electronic device according to an embodiment of the present invention. As shown in Fig. 6, the determining unit 530 includes: a search component 531 for finding the feature data in the second data and supplying the found feature data to the division unit 510, so that the second data is divided into data blocks based on the found feature data; and a comparison component 532 for determining the second data block by correspondingly comparing the first data blocks with the data blocks of the second data.
The search component 531 may use a first hash algorithm to find the feature data in the second data. That is, the feature data of the first data blocks is located in the second data of the modified second file, so that each first data block of the first file can be mapped to a data block of the second file in order to judge whether each data block of the second file has changed relative to the first data blocks. As mentioned above, a hash algorithm is a compressive mapping that transforms input data of arbitrary length into output of a fixed length (i.e. a hash value), and in practice may be of different types, such as the MD5 hash algorithm or the SHA1 hash algorithm. The first hash algorithm may be any hash algorithm, and may also be replaced by any other algorithm; the type of algorithm used does not limit the embodiments of the present invention.
The search component 531 supplies the found feature data to the division unit 510, which then divides the second data into data blocks based on the found feature data. Because the found feature data characterizes the at least two first data blocks of the first data, dividing the second data based on the found feature data divides the second data of the second file into data blocks in a manner corresponding to the first data blocks of the first file.
In the case where the acquiring unit 520 obtains the adjacent data of adjacent data blocks as the feature data, the search component 531 may find the feature data through the following operations: performing a hash calculation on each piece of feature data in the first data using the first hash algorithm to obtain a first group of hash values; performing hash calculations on the second data step by step, in units of the predetermined length of the adjacent data, using the first hash algorithm to obtain a second group of hash values; and comparing the second group of hash values one by one with the first group of hash values, and determining as feature data the data of the predetermined length in the second data whose hash value equals a hash value in the first group. For the specific operations of the search component 531, refer to the description of steps S321, S322 and S323 given with reference to Fig. 3.
Because the division unit 510 divides the second data of the second file into data blocks in a manner corresponding to the first data blocks of the first file, the comparison component 532 can compare, block by block, each first data block of the first file with the corresponding data block of the second file, thereby determining the data block in the second data that has changed relative to the first data, i.e. the second data block. To improve comparison efficiency, the comparison component 532 may determine the second data block by using a second hash algorithm to correspondingly compare the first data blocks with the data blocks of the second data. The second hash algorithm may be the same as or different from the first hash algorithm, and may even be replaced by an algorithm other than a hash algorithm.
As an example, the comparison component 532 may determine the second data block through the following operations: performing a hash calculation on the data blocks of the second file using the second hash algorithm to obtain a third group of hash values; performing a hash calculation, using the second hash algorithm, on the data blocks in the first data that correspond to the data blocks of the second file to obtain a fourth group of hash values; and correspondingly comparing the third group of hash values with the fourth group of hash values, and determining as the second data block any data block in the second data whose hash value differs from that of the corresponding first data block. For the specific operations of the comparison component 532, refer to the description of steps S325, S326 and S327 given with reference to Fig. 3.
In the case where the acquiring unit 520 obtains the feature data based on the content of the first data, the determining unit 530 may determine the second data block by comparing the first data and the second data based on the feature data as follows. Suppose the content of the first data is "Today is March 5, 2013, Tuesday, weather cloudy", and the data corresponding to "is", "2013" and "weather" is set as three pieces of feature data. The three pieces of feature data respectively characterize the three first data blocks corresponding to "Today is", "March 5, 2013, Tuesday" and "weather cloudy". If the first file is modified into a second file whose content is "Today is March 7, 2013, Thursday, weather cloudy", the first data and the second data can be compared based on the feature data to determine the data block in the second data that has changed relative to the first data, i.e. the data block corresponding to "March 7, 2013, Thursday", and the second data block is thereby determined.
The updating unit 540 updates the corresponding first data block among the at least two first data blocks with the second data block. As an example, in the case where the electronic device 500 is connected to a remote device via a communication network and the data blocks are stored in the remote device, the updating unit 540 may perform the update as follows: sending the second data block to the remote device via the communication network, so that the second data block is stored in the remote device. When the data blocks are stored in local storage of the electronic device, the updating unit 540 may perform the update by storing the second data block in correspondence with the matching block among the first data blocks, or by replacing the first data block in the first data that corresponds to the second data block with the second data block.
If both the first file and the second file (i.e. files of different versions) are to be retained, the updating unit 540 also retains the corresponding first data block when it updates the corresponding first data block among the at least two first data blocks with the second data block. For the manner of storing both the first data block and the second data block, refer to the description given with reference to Fig. 4.
Optionally, the electronic device 500 may further include an assembly unit 550 (shown by the dashed box in Fig. 5). The assembly unit 550 is used, when the second file is needed, to obtain the second data block and the remaining blocks of the at least two first data blocks other than the data block corresponding to the second data block, and to assemble the second data block and the remaining blocks to obtain the data of the second file. In the case where the second file is modified into a third file, the data of the third file can be obtained in a similar manner, so that files of different versions can be stored in as little storage space as possible. For an example of obtaining the data of a file, refer to the description given with reference to Fig. 4.
In the technical solution of the electronic device according to an embodiment of the present invention, the acquiring unit obtains the feature data of each data block, and the determining unit compares the data blocks before and after the change based on the feature data, so that each data block whose content has changed can be accurately determined among the data blocks of the file. This saves the storage space required in subsequent information processing, and also saves network traffic when the data blocks are stored remotely.
In addition, those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices and units described above may be understood with reference to the corresponding processes in the foregoing method embodiments, and are not repeated here.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the particular application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory, random access memory, a magnetic disk or an optical disc.
The above are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the protection scope of the claims.
Claims (12)
1. An information processing method applied to an electronic device, the electronic device having a first file, the first file corresponding to first data, the first data being divided into at least two first data blocks according to a fixed-length division rule, the method comprising:
obtaining feature data from the first data, the feature data being used to characterize the at least two first data blocks;
after the first file has been modified to form a second file, the second file corresponding to second data, determining a second data block by comparing the first data and the second data based on the feature data, the second data block being a data block in the second data that has changed relative to the first data;
updating the corresponding first data block among the at least two first data blocks with the second data block;
wherein the step of determining the second data block by comparing the first data and the second data based on the feature data comprises:
finding the feature data in the second data;
dividing the second data into data blocks based on the found feature data;
determining the second data block by correspondingly comparing the first data blocks with the data blocks of the second data.
2. The method according to claim 1, wherein the step of obtaining feature data from the first data comprises: obtaining the adjacent data of adjacent data blocks among the at least two first data blocks as the feature data.
3. The method according to claim 1, wherein the step of finding the feature data in the second data comprises: finding the feature data in the second data using a first hash algorithm,
and the step of determining the second data block by correspondingly comparing the first data blocks with the data blocks of the second data comprises: determining the second data block by correspondingly comparing the first data blocks with the data blocks of the second data using a second hash algorithm.
4. The method according to claim 2, wherein the adjacent data has a predetermined length,
the step of finding the feature data in the second data using the first hash algorithm comprises:
performing a hash calculation on each piece of feature data in the first data using the first hash algorithm to obtain a first group of hash values;
performing hash calculations on the second data step by step, in units of the predetermined length, using the first hash algorithm to obtain a second group of hash values;
comparing the second group of hash values one by one with the first group of hash values, and determining as feature data the data of the predetermined length in the second data whose hash value equals a hash value in the first group of hash values,
and the step of determining the second data block by correspondingly comparing the first data blocks with the data blocks of the second data using the second hash algorithm comprises:
performing a hash calculation on the data blocks of the second file using the second hash algorithm to obtain a third group of hash values;
performing a hash calculation, using the second hash algorithm, on the data blocks in the first data that correspond to the data blocks of the second file to obtain a fourth group of hash values;
correspondingly comparing the third group of hash values with the fourth group of hash values, and determining as the second data block a data block in the second data whose hash value differs from that of the corresponding first data block.
5. The method according to claim 1, wherein the step of updating the corresponding first data block among the at least two first data blocks with the second data block comprises:
sending the second data block via a communication network to a remote device communicatively connected to the electronic device;
storing the second data block in the remote device.
6. The method according to claim 1, further comprising:
when the second file is needed, obtaining the second data block and the remaining blocks of the first data blocks other than the data block corresponding to the second data block;
assembling the second data block and the remaining blocks to obtain the data of the second file.
7. An electronic device for information processing, the electronic device having a first file, the first file corresponding to first data, the electronic device comprising:
a division unit for dividing the first data into at least two first data blocks according to a fixed-length division rule;
an acquiring unit for obtaining feature data from the first data, the feature data being used to characterize the at least two first data blocks;
a determining unit for, after the first file has been modified to form a second file, the second file corresponding to second data, determining a second data block by comparing the first data and the second data based on the feature data, the second data block being a data block in the second data that has changed relative to the first data;
an updating unit for updating the corresponding first data block among the at least two first data blocks with the second data block;
wherein the determining unit comprises: a search component for finding the feature data in the second data and supplying the found feature data to the division unit, so that the second data is divided into data blocks based on the found feature data; and a comparison component for determining the second data block by correspondingly comparing the first data blocks with the data blocks of the second data.
8. The electronic device according to claim 7, wherein the acquiring unit obtains the adjacent data of adjacent data blocks among the at least two first data blocks as the feature data.
9. The electronic device according to claim 7, wherein the search component finds the feature data in the second data using a first hash algorithm,
and the comparison component determines the second data block by correspondingly comparing the first data blocks with the data blocks of the second data using a second hash algorithm.
10. The electronic device according to claim 8, wherein the adjacent data has a predetermined length,
the search component finds the feature data through the following operations: performing a hash calculation on each piece of feature data in the first data using the first hash algorithm to obtain a first group of hash values; performing hash calculations on the second data step by step, in units of the predetermined length, using the first hash algorithm to obtain a second group of hash values; comparing the second group of hash values one by one with the first group of hash values, and determining as feature data the data of the predetermined length in the second data whose hash value equals a hash value in the first group of hash values,
and the comparison component determines the second data block through the following operations: performing a hash calculation on the data blocks of the second file using the second hash algorithm to obtain a third group of hash values; performing a hash calculation, using the second hash algorithm, on the data blocks in the first data that correspond to the data blocks of the second file to obtain a fourth group of hash values; correspondingly comparing the third group of hash values with the fourth group of hash values, and determining as the second data block a data block in the second data whose hash value differs from that of the corresponding first data block.
11. The electronic device according to claim 7, wherein the updating unit is connected to a remote device via a communication network, and the updating unit performs the update as follows: sending the second data block to the remote device via the communication network, so that the second data block is stored in the remote device.
12. The electronic device according to claim 7, further comprising: an assembly unit for, when the second file is needed, obtaining the second data block and the remaining blocks of the first data blocks other than the data block corresponding to the second data block, and assembling the second data block and the remaining blocks to obtain the data of the second file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310086344.4A CN104063377B (en) | 2013-03-18 | 2013-03-18 | Information processing method and use its electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310086344.4A CN104063377B (en) | 2013-03-18 | 2013-03-18 | Information processing method and use its electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104063377A CN104063377A (en) | 2014-09-24 |
CN104063377B true CN104063377B (en) | 2017-06-27 |
Family
ID=51551093
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310086344.4A Active CN104063377B (en) | 2013-03-18 | 2013-03-18 | Information processing method and use its electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104063377B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105095367B (en) * | 2015-06-26 | 2018-12-28 | 北京奇虎科技有限公司 | A kind of acquisition method and device of client data |
CN107239226B (en) * | 2016-03-29 | 2020-05-26 | 联想(北京)有限公司 | Data deduplication method, terminal and server |
CN107291672B (en) * | 2016-03-31 | 2020-11-20 | 阿里巴巴集团控股有限公司 | Data table processing method and device |
CN106209974B (en) * | 2016-06-21 | 2019-03-12 | 浪潮电子信息产业股份有限公司 | A kind of method of data synchronization, equipment and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101788976A (en) * | 2010-02-10 | 2010-07-28 | 北京播思软件技术有限公司 | File splitting method based on contents |
CN101840363A (en) * | 2009-11-10 | 2010-09-22 | 创新科存储技术有限公司 | Method and device for comparing file blocks |
CN102065098A (en) * | 2010-12-31 | 2011-05-18 | 网宿科技股份有限公司 | Method and system for synchronizing data among network nodes |
CN102202098A (en) * | 2011-05-25 | 2011-09-28 | 成都市华为赛门铁克科技有限公司 | Data processing method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8612392B2 (en) * | 2011-05-09 | 2013-12-17 | International Business Machines Corporation | Identifying modified chunks in a data set for storage |
Non-Patent Citations (1)
Title |
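---|
Xu Dan, "Research on Remote File Synchronization Technology in Low-Bandwidth Environments", China Master's Theses Full-text Database, Information Science and Technology, 2011-09-15 (No. 09), thesis pp. 15-16, Section 3.2 * |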
Also Published As
Publication number | Publication date |
---|---|
CN104063377A (en) | 2014-09-24 |
Similar Documents
Publication | Title |
---|---|
CN1684464B (en) | Method and system for updating object between local device and remote device |
US9400800B2 (en) | Data transport by named content synchronization |
US8886689B2 (en) | Efficient storage of data allowing for multiple level granularity retrieval |
EP2344959B1 (en) | Index compression in databases |
US20170031948A1 (en) | File synchronization method, server, and terminal |
CN103593440B (en) | The reading/writing method and device of journal file |
CN108769111A (en) | A kind of server connection method, computer readable storage medium and terminal device |
US9088403B1 (en) | Identification codewords for a rate-adapted version of a data stream |
CN101673289B (en) | Method and device for constructing distributed file storage framework |
CN104063377B (en) | Information processing method and use its electronic equipment |
WO2022048511A1 (en) | Differential upgrade method for intelligent gas meter firmware |
CN110209348B (en) | Data storage method and device, electronic equipment and storage medium |
CN110347651A (en) | Method of data synchronization, device, equipment and storage medium based on cloud storage |
JP5873925B2 (en) | Compression match enumeration |
WO2021027331A1 (en) | Graph data-based full relationship calculation method and apparatus, device, and storage medium |
CN110945496A (en) | System and method for state object data store |
CN103177092A (en) | Data updating method and system of knowledge base and knowledge base |
CN112925954A (en) | Method and apparatus for querying data in a graph database |
CN111538865B (en) | Multiparty set synchronization method and device and electronic equipment |
CN105095283A (en) | Quasi-friend recommending method in social networking system and quasi-friend recommending system in social networking system |
CN108763577A (en) | node processing method and device, storage medium and electronic equipment |
CN109033271B (en) | Data insertion method and device based on column storage, server and storage medium |
CN106155841A (en) | The method and system of data backup |
CN106156169B (en) | Discrete data processing method and device |
CN109002446A (en) | A kind of intelligent sorting method, terminal and computer readable storage medium |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |