CN104063377B - Information processing method and electronic device using the same - Google Patents


Publication number
CN104063377B
Authority
CN
China
Prior art keywords
data
data block
block
characteristic
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310086344.4A
Other languages
Chinese (zh)
Other versions
CN104063377A (en)
Inventor
邹为星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN201310086344.4A
Publication of CN104063377A
Application granted
Publication of CN104063377B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/194Calculation of difference between files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/131Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces

Abstract

An information processing method and an electronic device using the method are provided. The electronic device has a first file corresponding to first data, and the first data is divided into at least two first data blocks according to a fixed-length segmentation rule. The information processing method includes: obtaining, from the first data, characteristic data that characterizes the at least two first data blocks; after the first file is changed to form a second file corresponding to second data, determining a second data block by comparing the first data and the second data based on the characteristic data, the second data block being a data block in the second data that has changed relative to the first data; and updating the corresponding first data block among the at least two first data blocks with the second data block. With the technical solution of the embodiments of the present invention, every data block of a file whose content has changed can be accurately determined, thereby saving the resources required for information processing.

Description

Information processing method and electronic device using the same
Technical field
The present invention relates to the field of information technology and, more particularly, to an information processing method and an electronic device using the information processing method.
Background
With the development of information technology, files of many types are generated, such as text files, image files, audio/video files, and program files produced while programs run. These files are usually modified during use, which produces different versions of each file. The different types of files and their different versions generally occupy a large amount of storage space. To provide more storage resources, besides increasing the storage space of terminals such as smartphones, tablets, and computers, cloud storage services for storing files, such as Dropbox, have gradually been developed. These services effectively address insufficient storage space on terminals and make it easy to share files between different terminal devices.
To retain old versions of a file while saving storage space, files are usually divided into data blocks for storage. Specifically, a file is divided into data blocks of equal length. When the file changes, for example because it is edited, the data blocks of the file before modification are compared block by block, in order, with the data blocks of the modified file, and the first changed data block and all data blocks after it are stored. This scheme finds only the first data block whose content has changed. As a result, when an early data block of the file changes, the later data blocks must also be updated even if they have not changed, which needlessly wastes resources such as storage space and transmission capacity.
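The waste described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `split_fixed` and `blocks_to_store_prior_scheme` and the 4-byte block size are assumptions chosen only for illustration.

```python
def split_fixed(data: bytes, block_size: int) -> list[bytes]:
    """Divide data into consecutive blocks of equal length (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def blocks_to_store_prior_scheme(old: bytes, new: bytes, block_size: int) -> list[bytes]:
    """Return the first changed block of `new` and every block after it."""
    old_blocks = split_fixed(old, block_size)
    new_blocks = split_fixed(new, block_size)
    for i, new_block in enumerate(new_blocks):
        if i >= len(old_blocks) or new_block != old_blocks[i]:
            return new_blocks[i:]
    return []


old = b"aaaabbbbccccdddd"
new = b"aXaabbbbccccdddd"  # only the first block differs
stored = blocks_to_store_prior_scheme(old, new, 4)
really_changed = [n for o, n in zip(split_fixed(old, 4), split_fixed(new, 4)) if o != n]
```

Here `stored` holds all four blocks even though `really_changed` holds only one, which is the overhead the invention aims to avoid.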
Therefore, an information processing mechanism is desired that can determine every data block of a file whose content has changed.
Summary of the invention
Embodiments of the present invention provide an information processing method and an electronic device using the method, which can accurately determine every data block of a file whose content has changed, thereby saving the resources required for information processing.
In one aspect, an information processing method applied to an electronic device is provided. The electronic device has a first file corresponding to first data, and the first data is divided into at least two first data blocks according to a fixed-length segmentation rule. The method includes: obtaining characteristic data from the first data, the characteristic data being used to characterize the at least two first data blocks; after the first file is changed to form a second file, the second file corresponding to second data, determining a second data block by comparing the first data and the second data based on the characteristic data, the second data block being a data block in the second data that has changed relative to the first data; and updating the corresponding first data block among the at least two first data blocks with the second data block.
In the information processing method, the step of obtaining characteristic data from the first data may include: obtaining the adjacent data of adjacent data blocks among the at least two first data blocks as the characteristic data.
In the information processing method, the step of determining the second data block by comparing the first data and the second data based on the characteristic data may include: finding the characteristic data in the second data; dividing the second data into data blocks based on the characteristic data found; and determining the second data block by correspondingly comparing the first data blocks with the data blocks of the second data.
In the information processing method, the step of finding the characteristic data in the second data may include: finding the characteristic data in the second data using a first hash algorithm. The step of determining the second data block by correspondingly comparing the first data blocks with the data blocks of the second data may include: determining the second data block by correspondingly comparing the first data blocks with the data blocks of the second data using a second hash algorithm.
In the information processing method, the adjacent data has a predetermined length, and the step of finding the characteristic data in the second data using the first hash algorithm may include: performing hash calculation on each characteristic data item in the first data using the first hash algorithm to obtain a first group of hash values; performing hash calculation on the second data successively, in units of the predetermined length, using the first hash algorithm to obtain a second group of hash values; and comparing the second group of hash values one by one with the first group of hash values, and determining as characteristic data any data of the predetermined length in the second data whose hash value equals a hash value in the first group.
The step of determining the second data block by correspondingly comparing the first data blocks with the data blocks of the second data using the second hash algorithm may include: performing hash calculation on the data blocks of the second file using the second hash algorithm to obtain a third group of hash values; performing hash calculation, using the second hash algorithm, on the data blocks in the first data that correspond to the respective data blocks of the second file to obtain a fourth group of hash values; and correspondingly comparing the third group of hash values with the fourth group of hash values, and determining as the second data block any data block in the second data whose hash value differs from the hash value of the corresponding first data block.
In the information processing method, the step of updating the corresponding first data block among the at least two first data blocks with the second data block may include: sending the second data block via a communication network to a remote device communicatively connected to the electronic device; and storing the second data block in the remote device.
The information processing method may further include: when the second file is needed, obtaining the second data block and the remaining blocks of the first data blocks other than the data block corresponding to the second data block; and assembling the second data block and the remaining blocks to obtain the data of the second file.
In another aspect, an electronic device for information processing is provided. The electronic device has a first file corresponding to first data, and includes: a division unit for dividing the first data into at least two first data blocks according to a fixed-length segmentation rule; an acquiring unit for obtaining characteristic data from the first data, the characteristic data being used to characterize the at least two first data blocks; a determining unit for, after the first file is changed to form a second file corresponding to second data, determining a second data block by comparing the first data and the second data based on the characteristic data, the second data block being a data block in the second data that has changed relative to the first data; and an updating unit for updating the corresponding first data block among the at least two first data blocks with the second data block.
In the electronic device, the acquiring unit may obtain the adjacent data of adjacent data blocks among the at least two first data blocks as the characteristic data.
In the electronic device, the determining unit may include: a search component for finding the characteristic data in the second data and supplying the characteristic data found to the division unit, so that the second data is divided into data blocks based on the characteristic data found; and a comparing unit for determining the second data block by correspondingly comparing the first data blocks with the data blocks of the second data.
In the electronic device, the search component may find the characteristic data in the second data using a first hash algorithm, and the comparing unit may determine the second data block by correspondingly comparing the first data blocks with the data blocks of the second data using a second hash algorithm.
In the electronic device, the adjacent data may have a predetermined length, and the search component may find the characteristic data through the following operations: performing hash calculation on each characteristic data item in the first data using the first hash algorithm to obtain a first group of hash values; performing hash calculation on the second data successively, in units of the predetermined length, using the first hash algorithm to obtain a second group of hash values; and comparing the second group of hash values one by one with the first group of hash values, and determining as characteristic data any data of the predetermined length in the second data whose hash value equals a hash value in the first group.
In the electronic device, the comparing unit may determine the second data block through the following operations: performing hash calculation on the data blocks of the second file using the second hash algorithm to obtain a third group of hash values; performing hash calculation, using the second hash algorithm, on the data blocks in the first data that correspond to the respective data blocks of the second file to obtain a fourth group of hash values; and correspondingly comparing the third group of hash values with the fourth group of hash values, and determining as the second data block any data block in the second data whose hash value differs from the hash value of the corresponding first data block.
In the electronic device, the updating unit may be connected to a remote device via a communication network, and may perform the update operation as follows: sending the second data block to the remote device via the communication network, so that the second data block is stored in the remote device.
The electronic device may further include: an assembling unit for, when the second file is needed, obtaining the second data block and the remaining blocks of the first data blocks other than the data block corresponding to the second data block, and assembling the second data block and the remaining blocks to obtain the data of the second file.
In the technical solution of the information processing method and the electronic device according to the embodiments of the present invention, the characteristic data of each data block is obtained, and the data blocks before the change are compared with the data blocks after the change based on this characteristic data. Every data block of the file whose content has changed can thus be accurately determined, saving the resources required for information processing.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart illustrating an information processing method according to an embodiment of the present invention;
Fig. 2 is a flow chart illustrating the determination step in the information processing method according to an embodiment of the present invention;
Fig. 3 is a flow chart illustrating the determination step implemented with hash algorithms in the information processing method according to an embodiment of the present invention;
Fig. 4 schematically illustrates the storage and assembly of data obtained using the information processing method according to an embodiment of the present invention;
Fig. 5 is a block diagram schematically illustrating an electronic device according to an embodiment of the present invention;
Fig. 6 is a block diagram schematically illustrating the determining unit in the electronic device according to an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention.
Fig. 1 is a flow chart illustrating an information processing method 100 according to an embodiment of the present invention.
The information processing method according to the embodiments of the present invention can be applied to various electronic devices, including but not limited to smartphones, computers, personal digital assistants, website servers, bibliographic databases, and data storage servers. The particular type of electronic device does not limit the invention.
The electronic device has a first file, which may be a file of any type, for example a text file, an image file, an audio/video file, or a program file; the type of file does not limit the invention either. The first file is stored as data in a storage device. The storage device may be a cloud storage device or a local storage device. For example, the first file corresponds to first data, and the first data is stored in a cloud storage device such as Dropbox.
When the first data is stored, it is divided into at least two first data blocks according to a fixed-length segmentation rule. The fixed-length segmentation rule divides data based on its size. It may be an equal-length rule: for example, 6 MB of first data may be divided into six first data blocks of 1 MB each. It may also be a rule that follows a predetermined pattern of sizes: for example, 6 MB of first data may be divided into four first data blocks of 1 MB, 2 MB, 1 MB, and 2 MB. Besides the fixed-length segmentation rule, data such as the first data may be divided in any other way, for example according to the content characteristics of the first data. The specific division rule does not limit the invention, as long as the first data is divided into at least two data blocks.
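The two fixed-length rules in this example can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `split_fixed` and `split_by_pattern` are hypothetical, and 1 MB is taken to mean 1024 * 1024 bytes.

```python
def split_fixed(data: bytes, block_size: int) -> list[bytes]:
    """Equal-length rule: consecutive blocks of block_size bytes (the last may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def split_by_pattern(data: bytes, sizes: list[int]) -> list[bytes]:
    """Predetermined-pattern rule: block sizes cycle through `sizes`, e.g. 1 MB, 2 MB, 1 MB, 2 MB."""
    if not sizes or any(s <= 0 for s in sizes):
        raise ValueError("sizes must be a non-empty list of positive integers")
    blocks, pos, k = [], 0, 0
    while pos < len(data):
        size = sizes[k % len(sizes)]
        blocks.append(data[pos:pos + size])
        pos += size
        k += 1
    return blocks


MB = 1024 * 1024                      # assumed meaning of "1 MB"
first_data = bytes(6 * MB)            # 6 MB of placeholder first data
equal_blocks = split_fixed(first_data, MB)
pattern_blocks = split_by_pattern(first_data, [MB, 2 * MB])
```

With these inputs, `equal_blocks` contains six blocks and `pattern_blocks` contains four blocks of 1 MB, 2 MB, 1 MB, and 2 MB.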
The information processing method 100 may include: obtaining characteristic data from the first data, the characteristic data being used to characterize the at least two first data blocks (S110); after the first file is changed to form a second file, the second file corresponding to second data, determining a second data block by comparing the first data and the second data based on the characteristic data, the second data block being a data block in the second data that has changed relative to the first data (S120); and updating the corresponding first data block among the at least two first data blocks with the second data block (S130).
In S110, characteristic data is obtained from the first data. The characteristic data characterizes the at least two first data blocks, so that each of the at least two first data blocks can be distinguished based on the characteristic data.
As one example of obtaining the characteristic data, the adjacent data of adjacent data blocks among the at least two first data blocks may be used as the characteristic data. Take 6 MB of first data divided in order into six equal first data blocks of 1 MB each. The 64 bytes of adjacent data at the boundary between adjacent first data blocks 1 and 2 may be used as one characteristic data item, and likewise the adjacent data between first data blocks 2 and 3, ..., and between first data blocks 5 and 6. The 64 bytes of adjacent data may, for example, consist of the last 32 bytes of first data block 1 and the first 32 bytes of first data block 2; alternatively, the last 50 bytes of first data block 1 and the first 14 bytes of first data block 2; alternatively, the last 64 bytes of first data block 1; or alternatively, the first 64 bytes of first data block 2. The length of 64 bytes is only illustrative, and other lengths may be used as needed. In addition, different characteristic data items may have different lengths, and the length of each item may be determined, for example, based on the features of the first data block it characterizes.
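A minimal Python sketch of extracting this adjacent data is given below. The function name `boundary_characteristics` and its `before`/`after` parameters (bytes taken from the end of the left block and from the start of the right block) are assumptions for illustration, not terms from the patent; the 100-byte blocks stand in for the 1 MB blocks of the example.

```python
def boundary_characteristics(blocks: list[bytes], before: int = 32, after: int = 32) -> list[bytes]:
    """For each pair of adjacent blocks, join the last `before` bytes of the
    left block with the first `after` bytes of the right block."""
    items = []
    for left, right in zip(blocks, blocks[1:]):
        tail = left[-before:] if before > 0 else b""  # left[-0:] would be the whole block
        head = right[:after]
        items.append(tail + head)
    return items


# Six small stand-in blocks; block k is filled with the byte value k.
first_blocks = [bytes([k]) * 100 for k in range(1, 7)]

items_32_32 = boundary_characteristics(first_blocks)          # 32 + 32 bytes
items_50_14 = boundary_characteristics(first_blocks, 50, 14)  # 50 + 14 bytes
items_64_0 = boundary_characteristics(first_blocks, 64, 0)    # last 64 bytes of the left block
items_0_64 = boundary_characteristics(first_blocks, 0, 64)    # first 64 bytes of the right block
```

Six blocks yield five characteristic data items, one per boundary, and each variant above is 64 bytes long.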
As another example of obtaining the characteristic data, the characteristic data may be obtained based on the content of the first data. Taking first data whose content is "Today is March 5, 2013, Tuesday; the weather is cloudy" as an example, content that rarely changes (for example, "is", "2013", "weather") may be set as the characteristic data, so that the first data block characterized by each characteristic data item can be easily distinguished. Other properties of the content of the first data (for example, frequency of occurrence) may also be used to set the characteristic data that characterizes each first data block.
As described above, the characteristic data that characterizes each first data block may be obtained in different ways, and the specific way of obtaining it does not limit the invention. Neither do the length of each characteristic data item or its position in the first data.
In S120, after the first file is changed to form a second file corresponding to second data, a second data block is determined by comparing the first data and the second data based on the characteristic data. The second data block is a data block in the second data that has changed relative to the first data.
When the first file is changed, for example through use or editing, and forms the second file, the data corresponding to the file changes accordingly: the first data corresponding to the first file becomes the second data corresponding to the second file. The first data and the second data can then be compared, based on the characteristic data obtained in S110, to determine the data block in the second data that has changed relative to the first data (that is, the second data block).
The way the second data block is determined may differ depending on the characteristic data obtained. An exemplary description is given below with reference to Fig. 2 and Fig. 3 to disclose the invention more fully.
The steps shown in Fig. 2 may be used to determine the data block in the second data that has changed relative to the first data, that is, the second data block.
Fig. 2 is a flow chart illustrating the determination step in the information processing method according to an embodiment of the present invention. The determination step shown in Fig. 2 includes: finding the characteristic data in the second data (S121); dividing the second data into data blocks based on the characteristic data found (S122); and determining the second data block by correspondingly comparing the first data blocks with the data blocks of the second data (S123).
In S121, a first hash algorithm may be used to find the characteristic data in the second data. This step finds the characteristic data of the first data blocks in the second data of the changed second file, so that each first data block of the first file can be mapped to a data block of the second file and it can be judged whether each data block of the second file has changed relative to the corresponding first data block.
A hash algorithm is a compressive mapping that transforms input data of arbitrary length into output of a specific length, namely a hash value (also called a digest). The space of hash values is usually much smaller than the space of inputs, so the hash value can be processed in place of the input data, improving processing efficiency while preserving accuracy. There are many types of hash algorithms, including but not limited to the MD5 (Message Digest 5) hash algorithm and the SHA1 (Secure Hash Algorithm 1) hash algorithm. The first hash algorithm may therefore be any of the MD5 hash algorithm, the SHA1 hash algorithm, and so on. Any algorithm other than a hash algorithm may also be used to find the characteristic data in the second data; the type of algorithm used does not limit the embodiments of the present invention.
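As a brief illustration of this fixed-length property, the Python standard library's `hashlib` module provides both algorithms. The inputs below are arbitrary examples, and the variable names are chosen here only for illustration.

```python
import hashlib

short_input = b"characteristic data"
long_input = bytes(1024 * 1024)  # 1 MiB of zero bytes

# MD5 always yields a 16-byte (128-bit) digest, whatever the input length.
md5_short = hashlib.md5(short_input).digest()
md5_long = hashlib.md5(long_input).digest()

# SHA1 always yields a 20-byte (160-bit) digest.
sha1_short = hashlib.sha1(short_input).digest()
sha1_long = hashlib.sha1(long_input).digest()

# Equal inputs give equal digests, so digests can stand in for the data when comparing.
same_again = hashlib.md5(short_input).digest() == md5_short
```

Comparing two 16-byte or 20-byte digests is much cheaper than comparing two large blocks directly, which is the efficiency gain the description refers to.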
In S122, the second data is divided into data blocks based on the characteristic data found. Because the characteristic data found is the data that characterizes the at least two first data blocks of the first data, the same characteristic data characterizes both the first data blocks of the first file and the data blocks of the second file. Dividing the second data based on the characteristic data found therefore divides the second data of the second file into data blocks that correspond to the first data blocks of the first file.
In S123, the second data block is determined by correspondingly comparing the first data blocks with the data blocks of the second data. Because the second data of the second file was divided in S122 in a manner corresponding to the first data blocks of the first file, each first data block of the first file can be compared in S123, block by block, with the corresponding data block of the second file, thereby determining the data block in the second data that has changed relative to the first data, that is, the second data block. A hash algorithm may also be used in this comparison to improve its efficiency: for example, the second data block may be determined by correspondingly comparing the first data blocks with the data blocks of the second data using a second hash algorithm. The second hash algorithm may be the same as or different from the first hash algorithm, and may even be an algorithm other than a hash algorithm.
Fig. 3 is a flow chart illustrating the determination step implemented with hash algorithms in the information processing method according to an embodiment of the present invention. In the example of Fig. 3, the adjacent data of adjacent data blocks among the at least two first data blocks is used as the characteristic data, and the adjacent data has a predetermined length.
The determination step implemented with hash algorithms includes: performing hash calculation on each characteristic data item in the first data using the first hash algorithm to obtain a first group of hash values (S321); performing hash calculation on the second data successively, in units of the predetermined length, using the first hash algorithm to obtain a second group of hash values (S322); comparing the second group of hash values one by one with the first group of hash values, and determining as characteristic data any data of the predetermined length in the second data whose hash value equals a hash value in the first group (S323); dividing the second data into data blocks based on the characteristic data found (S324); performing hash calculation on the data blocks of the second file using the second hash algorithm to obtain a third group of hash values (S325); performing hash calculation, using the second hash algorithm, on the data blocks in the first data that correspond to the respective data blocks of the second file to obtain a fourth group of hash values (S326); and correspondingly comparing the third group of hash values with the fourth group of hash values, and determining as the second data block any data block in the second data whose hash value differs from the hash value of the corresponding first data block (S327).
Steps S321, S322, and S323 in Fig. 3 implement step S121 in Fig. 2. They are explained below using the example in which the first data includes six first data blocks of 1 MB each and the 64 bytes of adjacent data of adjacent data blocks are used as the characteristic data (that is, characteristic data 1-5).
In S321, for example, the MD5 hash algorithm is applied to the five 64-byte characteristic data items between first data blocks 1 and 2, ..., and between first data blocks 5 and 6, to obtain a first group of hash values containing five hash values. In S322, the first hash algorithm is applied to bytes 1-64, bytes 2-65, bytes 3-66, ... of the second file, until hash calculation has been performed over all of the second data, to obtain a second group of hash values. In S323, the second group of hash values is compared one by one with the first group of hash values, and any 64 bytes of data in the second data whose hash value equals a hash value in the first group is determined to be characteristic data. In this way all or some of the five characteristic data items of the first data are found in the second data, or possibly none of them is found.
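Steps S321 to S323 can be sketched in Python as below. This is a naive illustration under stated assumptions, not the patented implementation: the function name `find_characteristics` is hypothetical, MD5 is used as the first hash algorithm, offsets and indices are 0-based (the description counts from 1), and every window is hashed from scratch, whereas a practical implementation would more likely use a rolling hash. The 128-byte blocks stand in for 1 MB blocks.

```python
import hashlib


def find_characteristics(characteristics: list[bytes], second_data: bytes, length: int = 64) -> dict[int, int]:
    """Return {characteristic index: offset in second_data} for every item found."""
    # S321: first group of hash values, one per characteristic data item.
    first_group: dict[bytes, int] = {}
    for index, item in enumerate(characteristics):
        first_group.setdefault(hashlib.md5(item).digest(), index)

    found: dict[int, int] = {}
    # S322: hash every window of `length` bytes (bytes 1-64, 2-65, ... in the text).
    for offset in range(len(second_data) - length + 1):
        digest = hashlib.md5(second_data[offset:offset + length]).digest()
        # S323: a window whose hash is in the first group is characteristic data.
        index = first_group.get(digest)
        if index is not None and index not in found:
            found[index] = offset
    return found


# Stand-in first data: 768 deterministic bytes split into six 128-byte blocks.
BLOCK = 128
first_data = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(24))
characteristics = [first_data[b - 32:b + 32] for b in range(BLOCK, len(first_data), BLOCK)]

# Second data: five bytes inserted inside block 1, shifting everything after it.
second_data = first_data[:10] + b"HELLO" + first_data[10:]
found = find_characteristics(characteristics, second_data)
```

Although the insertion shifts every later byte by five positions, all five characteristic data items are still located, which is what lets the later blocks be recognized as unchanged.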
S324 in Fig. 3 is the same as S122 in Fig. 2. If all five characteristic data items are found in S323, the second data is divided in S324 into six data blocks that correspond one to one with the six first data blocks. If characteristic data 1-4 are found in S323 but characteristic data 5 is not, the second data is divided in S324 into five data blocks 1-5, where data block 5 of the second data corresponds to the two first data blocks 5 and 6. If none of characteristic data 1-5 is found in S323, it can be judged that the second data has changed as a whole, and it is not divided further into blocks.
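The three cases of S324 can be sketched in Python as follows. The sketch rests on assumptions not stated in the patent: the function name `split_by_characteristics` is hypothetical, each characteristic data item is assumed to contain `before` bytes from the end of the left block (so the block boundary lies at offset + `before`), the found offsets are assumed to increase with the characteristic index, and indices are 0-based.

```python
def split_by_characteristics(second_data: bytes, found: dict[int, int],
                             n_first_blocks: int, before: int = 32):
    """Divide second_data at the boundaries implied by the found characteristic data.

    `found` maps characteristic index i (boundary between first blocks i and i+1)
    to its offset in second_data. Returns a list of ((lo, hi), segment) pairs,
    where lo..hi is the inclusive range of first blocks the segment corresponds to.
    """
    segments = []
    start = 0
    first_index = 0
    for i in sorted(found):
        boundary = found[i] + before
        segments.append(((first_index, i), second_data[start:boundary]))
        start = boundary
        first_index = i + 1
    # Whatever remains corresponds to all remaining first blocks.
    segments.append(((first_index, n_first_blocks - 1), second_data[start:]))
    return segments


# Six 10-byte stand-in blocks; each characteristic item holds 2 bytes from the left block.
second_data = b"".join(bytes([c]) * 10 for c in b"ABCDEF")
all_found = {0: 8, 1: 18, 2: 28, 3: 38, 4: 48}
four_found = {0: 8, 1: 18, 2: 28, 3: 38}   # characteristic 5 (index 4) missing

six_segments = split_by_characteristics(second_data, all_found, 6, before=2)
five_segments = split_by_characteristics(second_data, four_found, 6, before=2)
one_segment = split_by_characteristics(second_data, {}, 6, before=2)
```

With all items found there are six segments; with the last item missing there are five, the last covering first blocks 5 and 6 (indices 4 and 5); with none found the whole second data is a single segment.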
Steps S325, S326, and S327 in Fig. 3 implement step S123 in Fig. 2. After the second data has been divided in S324 into data blocks corresponding to the first data blocks, each data block of the second data can be compared with the corresponding first data block, block by block. Taking the case in which characteristic data 1-4 are found but characteristic data 5 is not: hash calculation is performed on data blocks 1-5 of the second data using the second hash algorithm to obtain a third group of hash values (S325); hash calculation is performed on first data blocks 1, 2, 3, 4, and 5-6 using the second hash algorithm to obtain a fourth group of hash values (S326); the hash value of data block 1 of the second data is compared with that of first data block 1, ..., the hash value of data block 4 of the second data with that of first data block 4, and the hash value of data block 5 of the second data with that of first data blocks 5-6, and any data block in the second data whose hash value differs from the hash value of the corresponding first data block is determined to be a second data block (S327).
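A minimal Python sketch of S325 to S327 follows. The function name `changed_segments`, the use of SHA1 as the second hash algorithm, and the `((lo, hi), segment)` representation of a second-data block together with the inclusive 0-based range of first data blocks it corresponds to are all assumptions made for illustration.

```python
import hashlib


def changed_segments(first_blocks: list[bytes], segments):
    """Return the segments of the second data whose hash differs from the hash of
    the corresponding first data block(s)."""
    changed = []
    for (lo, hi), segment in segments:
        third = hashlib.sha1(segment).digest()                              # S325
        fourth = hashlib.sha1(b"".join(first_blocks[lo:hi + 1])).digest()   # S326
        if third != fourth:                                                 # S327
            changed.append(((lo, hi), segment))
    return changed


first_blocks = [b"aaaa", b"bbbb", b"cccc", b"dddd", b"eeee", b"ffff"]
# Second data already divided as in S324: one byte inserted in block 2, and the
# last segment covering first blocks 5 and 6 because characteristic 5 was not found.
segments = [
    ((0, 0), b"aaaa"),
    ((1, 1), b"bXbbb"),
    ((2, 2), b"cccc"),
    ((3, 3), b"dddd"),
    ((4, 5), b"eeeeffZf"),
]
second_blocks = changed_segments(first_blocks, segments)
```

Only the segments for block 2 and for the merged blocks 5-6 are reported as second data blocks; the unchanged segments need not be stored or transmitted again.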
As noted above, when the characteristic data is obtained in a different way, the second data block is usually determined in a correspondingly different way. The determination of the second data block is further explained below for the case in which rarely changing content in the first data is set as the characteristic data.
Take first data whose content is "Today is March 5, 2013, Tuesday; the weather is cloudy", with the data corresponding to "is", "2013", and "weather" set as three characteristic data items. The first data is divided into three first data blocks 1-3 corresponding to "Today is", "March 5, 2013, Tuesday", and "the weather is cloudy", each characterized by one of the three characteristic data items. If the first file is modified into a second file whose content is "Today is March 7, 2013, Thursday; the weather is cloudy", the data block in the second data that has changed relative to the first data, that is, the second data block, can be determined by comparing the first data and the second data based on the characteristic data. Specifically, the first data and the second data are compared block by block based on the three characteristic data items, which readily shows that the data block whose content is "March 7, 2013, Thursday" in the second data has changed, and this block is determined to be the second data block.
In S130, the corresponding first data block among the at least two first data blocks is updated with the second data block. When the data blocks are stored in a local memory of the electronic device, the update step in S130 may store the second data block in correspondence with the relevant block among the first data blocks, or may replace the first data block in the first data that corresponds to the second data block with the second data block. When the data blocks are stored in a remote device communicatively coupled to the electronic device, the update step in S130 may include: sending the second data block via a communication network to the remote device communicatively connected to the electronic device; and storing the second data block in the remote device.
If both the first file and the second file (i.e. files of different versions) are to be retained, then in S130 the corresponding first data block is also retained while the second data block replaces it among the at least two first data blocks. This is described below with reference to Fig. 4.
Fig. 4 schematically illustrates the storage and assembly of data obtained with the information processing method according to an embodiment of the present invention. In Fig. 4, the first data of the first file is divided into first data block 1, first data block 2, ..., first data block N, which are stored block by block in the electronic device or the remote device. When the first file is modified into the second file, S110 and S120 of the above information processing method determine that first data block 2 has changed into the second data block, and in the update step of S130 the second data block is additionally stored in correspondence with first data block 2 (see the block storage on the right side of Fig. 4); the data blocks of the second file then comprise first data block 1, the second data block, first data block 3, ..., first data block N. When the second file is modified into a third file, S110 and S120 determine that first data block 3 in the second file has changed into a third data block, and in the update step of S130 the third data block is additionally stored in correspondence with first data block 3 (see the block storage on the right side of Fig. 4); the data blocks of the third file then comprise first data block 1, the second data block, the third data block, first data block 4, ..., first data block N. In practice, when a file is modified into several different versions, the data blocks of every version can thus be stored with as little storage space as possible. Variations may be made on the basis of the data block storage described with reference to Fig. 4.
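The storage scheme of Fig. 4 can be sketched as a small block store in which every file version is a list of block identifiers and unchanged blocks are shared between versions. The class and method names are hypothetical, and the manifest-per-version layout is one possible reading of the figure, not a structure stated in the text.

```python
class BlockStore:
    """Toy block store: changed blocks are added, old blocks are kept."""

    def __init__(self):
        self.blocks = {}     # block id -> block content
        self.versions = []   # one manifest (list of block ids) per version

    def _add(self, data):
        block_id = len(self.blocks)
        self.blocks[block_id] = data
        return block_id

    def put_first_version(self, blocks):
        self.versions.append([self._add(b) for b in blocks])

    def put_next_version(self, changes):
        """changes maps a block position to its new content,
        relative to the most recent version."""
        manifest = list(self.versions[-1])
        for position, data in changes.items():
            # The old block stays in self.blocks for the older versions.
            manifest[position] = self._add(data)
        self.versions.append(manifest)


store = BlockStore()
store.put_first_version([b"block1", b"block2", b"block3", b"block4"])
store.put_next_version({1: b"second block"})   # first -> second file
store.put_next_version({2: b"third block"})    # second -> third file
print(store.versions)
print(len(store.blocks))
```

Three versions of a four-block file occupy six stored blocks here instead of twelve, which is the storage saving the passage describes.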
In the technical solution of the information processing method according to the embodiment of the present invention, the characteristic data of each data block is obtained, and the data blocks before the change are compared with the data blocks after the change on the basis of this characteristic data. The data blocks whose content has changed can thus be accurately determined among the data blocks of the file, which saves storage space in subsequent information processing and also saves network traffic when the data blocks are stored remotely.
In addition, the information processing method shown in Fig. 1 may also obtain the data of the modified file (i.e. the second file): when the second file is needed, the second data block and the remaining blocks among the first data blocks, other than the data block corresponding to the second data block, are obtained; the second data block and the remaining blocks are then assembled to obtain the data of the second file. In the example of Fig. 4, when the second file is needed, first data block 1, the second data block, first data block 3, ..., first data block N are obtained from the block storage unit, and the obtained data blocks are assembled in order to obtain the data of the second file; when the third file is needed, first data block 1, the second data block, the third data block, first data block 4, ..., first data block N are obtained from the block storage unit, and the obtained data blocks are assembled in order to obtain the data of the third file.
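Assembling a file version from stored blocks amounts to fetching the listed blocks and concatenating them in order. The sketch below assumes a dictionary as the block storage unit; the block names and contents are invented for illustration.

```python
def assemble(manifest, block_store):
    """Fetch the listed blocks and concatenate them in order."""
    return b"".join(block_store[block_id] for block_id in manifest)


# Hypothetical block storage holding blocks of all three file versions.
block_store = {
    "first_1": b"Today is",
    "first_2": b" March 5, ",
    "first_3": b"sunny",
    "second": b" March 7, ",   # replaces first_2 in the second file
    "third": b"cloudy",        # replaces first_3 in the third file
}

second_file = ["first_1", "second", "first_3"]
third_file = ["first_1", "second", "third"]

print(assemble(second_file, block_store))
print(assemble(third_file, block_store))
```

The first file remains recoverable as well, since its original blocks are never overwritten.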
Fig. 5 is a block diagram schematically illustrating an electronic device 500 according to an embodiment of the present invention.
The electronic device 500 may be any type of electronic device, including but not limited to a smartphone, a computer, a personal digital assistant, a web server, a bibliographic database, or a data storage server; the particular type of electronic device does not limit the present invention. The electronic device 500 has a first file, and the first file corresponds to first data. The first file may be any type of file, and the type of file does not limit the present invention either. The first data corresponding to the first file is stored in a local memory of the electronic device 500 or in a remote device communicatively coupled to it.
The electronic device 500 includes: a division unit 510 for dividing the first data into at least two first data blocks according to a fixed-length division rule; an acquisition unit 520 for obtaining characteristic data from the first data, the characteristic data being used to characterize the at least two first data blocks; a determination unit 530 for determining, after the first file has been modified into a second file that corresponds to second data, a second data block by comparing the first data and the second data on the basis of the characteristic data, the second data block being a data block in the second data that has changed relative to the first data; and an update unit 540 for updating the corresponding first data block among the at least two first data blocks with the second data block.
When the first data is stored, the division unit 510 divides the first data into at least two first data blocks according to the fixed-length division rule so that it can be stored. The fixed-length division rule divides the data on the basis of its size; it may be an equal-length division rule, or a division rule that follows a predetermined spacing pattern. Besides the fixed-length division rule, the division unit 510 may also divide data such as the first data in any other way, for example according to content features of the first data. The specific division rule does not limit the present invention, as long as the first data is divided into at least two data blocks.
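An equal-length division rule can be sketched in a few lines. The function name and the block size are illustrative assumptions; the text does not prescribe a particular size.

```python
def split_fixed(data: bytes, block_size: int):
    """Divide data into consecutive blocks of block_size bytes;
    the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


blocks = split_fixed(b"abcdefghij", 4)
print(blocks)
```

Joining the blocks in order restores the original data, so the division itself loses nothing.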
The acquisition unit 520 obtains characteristic data from the first data. The characteristic data is used to characterize the at least two first data blocks, so that each of the at least two first data blocks can be distinguished on the basis of the characteristic data.
The acquisition unit 520 may obtain the characteristic data in different ways; for example, it may take the adjacent data of adjacent data blocks among the at least two first data blocks as the characteristic data, or obtain the characteristic data on the basis of the content of the first data. The way of obtaining adopted by the acquisition unit 520 does not limit the embodiments of the present invention, nor do the length of each piece of characteristic data, its position in the first data, and the like.
As an example of taking the adjacent data of adjacent data blocks as the characteristic data, the acquisition unit 520 may take 64 adjacent bytes of any two adjacent data blocks among the sequentially arranged first data block 1, first data block 2, ..., first data block N as characteristic data. The 64 bytes of adjacent data may, for example, consist of the last 32 bytes of the earlier data block and the first 32 bytes of the later data block; alternatively of the last 50 bytes of the earlier data block and the first 14 bytes of the later data block; alternatively of the last 64 bytes of the earlier data block; or alternatively of the first 64 bytes of the later data block. The 64 bytes here are only illustrative, and other lengths may be used as needed. In addition, the lengths of different pieces of characteristic data may vary, and the length of each piece may, for example, be determined on the basis of features of the first data block that it characterizes.
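Taking adjacent data at block boundaries as characteristic data can be sketched as follows. The function and parameter names are hypothetical; the defaults reproduce the 32 + 32 byte split mentioned in the text, and the other splits are obtained by changing the two parameters.

```python
def boundary_features(blocks, before=32, after=32):
    """For every pair of adjacent blocks, return the last `before` bytes
    of the earlier block followed by the first `after` bytes of the later one."""
    features = []
    for earlier, later in zip(blocks, blocks[1:]):
        # max(...) keeps the slice correct when before is 0 or exceeds the block.
        tail = earlier[max(len(earlier) - before, 0):]
        features.append(tail + later[:after])
    return features


blocks = [b"A" * 100, b"B" * 100, b"C" * 100]
features = boundary_features(blocks)
print(len(features), len(features[0]))
```

N blocks yield N - 1 pieces of characteristic data, one for each boundary between neighbouring blocks.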
Where the characteristic data is obtained on the basis of the content of the first data, take as an example first data whose content is "Today is March 5, 2013, Tuesday, weather cloudy": content that changes little (for example "is" and "weather") may be set as the characteristic data, so that the first data block characterized by each piece of characteristic data can be readily distinguished. The characteristic data used to characterize each first data block may also be set on the basis of other properties of the content of the first data, such as frequency of occurrence.
After the first file has been modified into the second file, the data corresponding to the file changes accordingly; that is, the first data corresponding to the first file changes into the second data corresponding to the second file. The determination unit 530 determines the second data block by comparing the first data and the second data on the basis of the characteristic data, the second data block being a data block in the second data that has changed relative to the first data. Depending on how the characteristic data was obtained, the determination unit 530 may determine the second data block in different ways. An exemplary description is given below with reference to Fig. 6 in order to disclose the present invention more fully.
Fig. 6 is a block diagram schematically illustrating the determination unit in the electronic device according to an embodiment of the present invention. As shown in Fig. 6, the determination unit 530 includes: a search component 531 for finding the characteristic data in the second data and supplying the found characteristic data to the division unit 510, so that the second data is divided into data blocks on the basis of the found characteristic data; and a comparison unit 532 for determining the second data block by correspondingly comparing the first data blocks with the data blocks of the second data.
The search component 531 may use a first hash algorithm to find the characteristic data in the second data, i.e. to find the characteristic data of the first data blocks in the second data of the modified second file, so that each first data block of the first file can be mapped to a data block of the second file and it can be judged whether each data block of the second file has changed relative to the first data blocks. As noted above, a hash algorithm performs a compression mapping that transforms input data of arbitrary length into an output of a specific length (the hash value), and in practice may be of different types, such as the MD5 hash algorithm or the SHA1 hash algorithm. The first hash algorithm may be any hash algorithm and may also be replaced by any other algorithm; the type of algorithm used does not limit the embodiments of the present invention.
The search component 531 supplies the found characteristic data to the division unit 510, which then divides the second data into data blocks on the basis of the found characteristic data. Because the found characteristic data characterizes the at least two first data blocks of the first data, dividing the second data on the basis of the found characteristic data divides the second data of the second file into data blocks in a manner that corresponds to the first data blocks of the first file.
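Dividing the second data once the characteristic data has been located can be sketched as follows. It is assumed here that each found piece of characteristic data has already been converted into a boundary offset in the second data, and that a piece that was not found is represented by None, which merges the two neighbouring first data blocks into one group. The function name, this representation and the sample data are illustrative assumptions.

```python
def divide_by_boundaries(data, boundaries):
    """Split data at the given boundary offsets.

    boundaries[i] is the offset located through characteristic data i,
    or None if that characteristic data was not found.
    Returns the blocks of the second data and, for each block, the
    zero-based indices of the first data blocks it corresponds to.
    """
    blocks, groups = [], []
    start, group = 0, [0]
    for i, cut in enumerate(boundaries):
        if cut is None:
            group.append(i + 1)   # boundary lost: merge the next first block
            continue
        blocks.append(data[start:cut])
        groups.append(group)
        start, group = cut, [i + 1]
    blocks.append(data[start:])
    groups.append(group)
    return blocks, groups


# Six first data blocks give five boundaries; the fifth was not found.
second_data = b"aaaabbbbccccddddeeeeZZffff"
blocks, groups = divide_by_boundaries(second_data, [4, 8, 12, 16, None])
print(blocks)
print(groups)
```

The last group, [4, 5], plays the role of "first data blocks 5-6" in the comparison example that accompanies Fig. 3.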
Where the acquisition unit 520 takes the adjacent data of adjacent data blocks as the characteristic data, the search component 531 may find the characteristic data through the following operations: using the first hash algorithm to perform a hash calculation on each piece of characteristic data in the first data to obtain a first group of hash values; using the first hash algorithm to perform hash calculations successively on the second data in units of the predetermined length to obtain a second group of hash values; and comparing the second group of hash values one by one with the first group of hash values, and determining as characteristic data the data of the predetermined length in the second data whose hash value is identical to a hash value in the first group. For the specific operations of the search component 531, see the description given in connection with steps S321, S322 and S323 in Fig. 3.
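The search operations above can be sketched with a sliding window over the second data. Two points are assumptions: the window advances one byte at a time (the text only says the hash is computed successively in units of the predetermined length), and SHA-1 stands in for the unspecified first hash algorithm. The function name and sample data are invented.

```python
import hashlib


def find_features(features, second_data, length):
    """Return {index of characteristic data: offset in second_data}."""
    # First group of hash values, one per piece of characteristic data.
    first_group = {hashlib.sha1(f).digest(): i for i, f in enumerate(features)}
    found = {}
    # Second group: hash every window of `length` bytes of the second data.
    for offset in range(len(second_data) - length + 1):
        digest = hashlib.sha1(second_data[offset:offset + length]).digest()
        index = first_group.get(digest)
        if index is not None and index not in found:
            found[index] = offset   # keep the first match only
    return found


# Characteristic data: 2 bytes on each side of the two block boundaries of
# b"0123456789" | b"abcdefghij" | b"ABCDEFGHIJ".
features = [b"89ab", b"ijAB"]
# Two bytes were inserted into the middle block of the second data.
second_data = b"0123456789" + b"abcXXdefghij" + b"ABCDEFGHIJ"
print(find_features(features, second_data, 4))
```

Both boundaries are still located even though the insertion shifted the second one, which is what allows the second data to be divided in correspondence with the first data blocks. Recomputing a full hash per window is simple but slow; a rolling hash would be a natural substitute, though the text does not call for one.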
Because the division unit 510 divides the second data of the second file into data blocks in a manner that corresponds to the first data blocks of the first file, the comparison unit 532 can compare, block by block, the first data blocks of the first file with the corresponding data blocks of the second file, and thereby determine the data block in the second data that has changed relative to the first data, i.e. the second data block. To improve comparison efficiency, the comparison unit 532 may determine the second data block by using a second hash algorithm to correspondingly compare the first data blocks with the data blocks of the second data. The second hash algorithm may be the same as or different from the first hash algorithm, and may even be replaced by an algorithm other than a hash algorithm.
As an example, the comparison unit 532 may determine the second data block through the following operations: using the second hash algorithm to perform a hash calculation on the data blocks of the second file to obtain a third group of hash values; using the second hash algorithm to perform a hash calculation on the data blocks in the first data that correspond to the data blocks of the second file to obtain a fourth group of hash values; and correspondingly comparing the third group of hash values with the fourth group of hash values, and determining as a second data block each data block in the second data whose hash value differs from the hash value of the corresponding first data block. For the specific operations of the comparison unit 532, see the description given in connection with steps S325, S326 and S327 in Fig. 3.
Where the acquisition unit 520 obtains the characteristic data on the basis of the content of the first data, the determination unit 530 may determine the second data block by comparing the first data and the second data on the basis of the characteristic data as follows. Suppose the content of the first data is "Today is March 5, 2013, Tuesday, weather cloudy", and the data corresponding to "is", "2013" and "weather" is set as 3 pieces of characteristic data. The 3 pieces of characteristic data respectively characterize three first data blocks corresponding to "Today is", "March 5, 2013, Tuesday" and "weather cloudy". If the first file is modified into a second file whose content is "Today is March 7, 2013, Thursday, weather cloudy", the first data and the second data can be compared on the basis of the characteristic data to determine the data block in the second data that has changed relative to the first data, i.e. the data block corresponding to "March 7, 2013, Thursday", and the second data block is thereby determined.
The update unit 540 updates the corresponding first data block among the at least two first data blocks with the second data block. As an example, where the electronic device 500 is connected to a remote device via a communication network and the data blocks are stored in the remote device, the update unit 540 may perform the update operation as follows: sending the second data block to the remote device via the communication network, so that the second data block is stored in the remote device. When the data blocks are stored in a local memory of the electronic device, the update unit 540 may perform the following update operation: storing the second data block in correspondence with the relevant block among the first data blocks, or replacing the first data block in the first data that corresponds to the second data block with the second data block.
If both the first file and the second file (i.e. files of different versions) are to be retained, the update unit 540 also retains the corresponding first data block while replacing it among the at least two first data blocks with the second data block. For the manner of storing both the first data block and the second data block, see the description given with reference to Fig. 4.
Optionally, the electronic device 500 may further include an assembly unit 550 (shown by the dashed box in Fig. 5). When the second file is needed, the assembly unit 550 obtains the second data block and the remaining blocks among the at least two first data blocks, other than the data block corresponding to the second data block, and assembles the second data block and the remaining blocks to obtain the data of the second file. Where the second file is modified into a third file, the data of the third file can be obtained in a similar way, so that files of different versions can be stored with as little storage space as possible. For an example of obtaining the data of a file, see the description given with reference to Fig. 4.
In the technical solution of the above electronic device according to the embodiment of the present invention, the acquisition unit obtains the characteristic data of each data block, and the determination unit compares the data blocks before the change with the data blocks after the change on the basis of this characteristic data. The data blocks whose content has changed can thus be accurately determined among the data blocks of the file, which saves storage space in subsequent information processing and also saves network traffic when the data blocks are stored remotely.
In addition, it will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the device and units described above may be understood by reference to the corresponding processes in the foregoing method embodiments, and are not repeated here.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the protection scope of the claims.

Claims (12)

1. An information processing method applied to an electronic device, the electronic device having a first file, the first file corresponding to first data, the first data being divided into at least two first data blocks according to a fixed-length division rule, the method comprising:
obtaining characteristic data from the first data, the characteristic data being used to characterize the at least two first data blocks;
after the first file has been modified into a second file, the second file corresponding to second data, determining a second data block by comparing the first data and the second data on the basis of the characteristic data, the second data block being a data block in the second data that has changed relative to the first data;
updating the corresponding first data block among the at least two first data blocks with the second data block;
wherein the step of determining the second data block by comparing the first data and the second data on the basis of the characteristic data comprises:
finding the characteristic data in the second data;
dividing the second data into data blocks on the basis of the found characteristic data;
determining the second data block by correspondingly comparing the first data blocks with the data blocks of the second data.
2. The method according to claim 1, wherein the step of obtaining characteristic data from the first data comprises: obtaining the adjacent data of adjacent data blocks among the at least two first data blocks as the characteristic data.
3. The method according to claim 1, wherein the step of finding the characteristic data in the second data comprises: finding the characteristic data in the second data by using a first hash algorithm,
and the step of determining the second data block by correspondingly comparing the first data blocks with the data blocks of the second data comprises: determining the second data block by using a second hash algorithm to correspondingly compare the first data blocks with the data blocks of the second data.
4. The method according to claim 2, wherein the adjacent data has a predetermined length,
the step of finding the characteristic data in the second data by using the first hash algorithm comprises:
using the first hash algorithm to perform a hash calculation on each piece of characteristic data in the first data to obtain a first group of hash values;
using the first hash algorithm to perform hash calculations successively on the second data in units of the predetermined length to obtain a second group of hash values;
comparing the second group of hash values one by one with the first group of hash values, and determining as characteristic data the data of the predetermined length in the second data whose hash value is identical to a hash value in the first group of hash values,
and the step of determining the second data block by using the second hash algorithm to correspondingly compare the first data blocks with the data blocks of the second data comprises:
using the second hash algorithm to perform a hash calculation on the data blocks of the second file to obtain a third group of hash values;
using the second hash algorithm to perform a hash calculation on the data blocks in the first data that correspond to the data blocks of the second file to obtain a fourth group of hash values;
correspondingly comparing the third group of hash values with the fourth group of hash values, and determining as the second data block the data block in the second data whose hash value differs from the hash value of the corresponding first data block.
5. The method according to claim 1, wherein the step of updating the corresponding first data block among the at least two first data blocks with the second data block comprises:
sending the second data block via a communication network to a remote device communicatively connected to the electronic device;
storing the second data block in the remote device.
6. The method according to claim 1, further comprising:
when the second file is needed, obtaining the second data block and the remaining blocks among the first data blocks other than the data block corresponding to the second data block;
assembling the second data block and the remaining blocks to obtain the data of the second file.
7. An electronic device for information processing, the electronic device having a first file, the first file corresponding to first data, the electronic device comprising:
a division unit for dividing the first data into at least two first data blocks according to a fixed-length division rule;
an acquisition unit for obtaining characteristic data from the first data, the characteristic data being used to characterize the at least two first data blocks;
a determination unit for determining, after the first file has been modified into a second file, the second file corresponding to second data, a second data block by comparing the first data and the second data on the basis of the characteristic data, the second data block being a data block in the second data that has changed relative to the first data;
an update unit for updating the corresponding first data block among the at least two first data blocks with the second data block;
wherein the determination unit comprises: a search component for finding the characteristic data in the second data and supplying the found characteristic data to the division unit, so that the second data is divided into data blocks on the basis of the found characteristic data; and a comparison unit for determining the second data block by correspondingly comparing the first data blocks with the data blocks of the second data.
8. The electronic device according to claim 7, wherein the acquisition unit obtains the adjacent data of adjacent data blocks among the at least two first data blocks as the characteristic data.
9. The electronic device according to claim 7, wherein the search component finds the characteristic data in the second data by using a first hash algorithm,
and the comparison unit determines the second data block by using a second hash algorithm to correspondingly compare the first data blocks with the data blocks of the second data.
10. The electronic device according to claim 8, wherein the adjacent data has a predetermined length,
the search component finds the characteristic data through the following operations: using the first hash algorithm to perform a hash calculation on each piece of characteristic data in the first data to obtain a first group of hash values; using the first hash algorithm to perform hash calculations successively on the second data in units of the predetermined length to obtain a second group of hash values; comparing the second group of hash values one by one with the first group of hash values, and determining as characteristic data the data of the predetermined length in the second data whose hash value is identical to a hash value in the first group of hash values,
and the comparison unit determines the second data block through the following operations: using the second hash algorithm to perform a hash calculation on the data blocks of the second file to obtain a third group of hash values; using the second hash algorithm to perform a hash calculation on the data blocks in the first data that correspond to the data blocks of the second file to obtain a fourth group of hash values; correspondingly comparing the third group of hash values with the fourth group of hash values, and determining as the second data block the data block in the second data whose hash value differs from the hash value of the corresponding first data block.
11. The electronic device according to claim 7, wherein the update unit is connected to a remote device via a communication network, and the update unit performs the update operation as follows: sending the second data block to the remote device via the communication network, so that the second data block is stored in the remote device.
12. The electronic device according to claim 7, further comprising: an assembly unit for obtaining, when the second file is needed, the second data block and the remaining blocks among the first data blocks other than the data block corresponding to the second data block, and assembling the second data block and the remaining blocks to obtain the data of the second file.
CN201310086344.4A 2013-03-18 2013-03-18 Information processing method and use its electronic equipment Active CN104063377B (en)

Publications (2)

Publication Number Publication Date
CN104063377A CN104063377A (en) 2014-09-24
CN104063377B true CN104063377B (en) 2017-06-27

CN106156169B (en) Discrete data processing method and device
CN109002446A (en) A kind of intelligent sorting method, terminal and computer readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant