CN115168110B - Incremental data identification method, device, equipment and storage medium - Google Patents
- Publication number
- CN115168110B (application number CN202211075926.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- path
- incremental
- increment
- current candidate
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
- G06F16/1824—Distributed file systems implemented using Network-attached Storage [NAS] architecture
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an incremental data identification method, apparatus, device, and storage medium. The method comprises: screening a plurality of candidate paths from the data paths in a storage device to be backed up based on the change time of each data path; selecting from the screened candidate paths and taking the selected candidate path as the current candidate path; acquiring the path name and the naming strategy of the current candidate path; determining the path to be identified in the current candidate path based on the naming strategy and the path name; and performing incremental identification on the path to be identified. Screening the data paths by change time narrows the range of path identification. Determining the path to be identified from the naming strategy and the path name, and then performing incremental identification on it, realizes incremental prediction of the data to be incrementally processed in the storage device to be backed up and effectively improves incremental processing efficiency.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for incremental data identification.
Background
With the development of science and technology, storage devices have become widespread. When the files or data in a storage device are backed up, a full backup of all files or data is generally performed first, and newly added or modified files or data are then backed up incrementally. However, current incremental approaches cannot accurately predict which files or data in the storage device require an incremental operation, so the storage device cannot be incrementally backed up effectively.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide an incremental data identification method, an incremental data identification device, incremental data identification equipment and a storage medium, and aims to solve the technical problem that incremental backup cannot be effectively performed on storage equipment due to the fact that incremental data identification cannot be accurately performed on the storage equipment in the prior art.
In order to achieve the above object, the present invention provides an incremental data identification method, including the steps of:
screening a plurality of candidate paths from each data path based on the change time of each data path in the storage device to be backed up;
selecting a plurality of screened candidate paths, and taking the selected candidate paths as current candidate paths;
acquiring the path name of the current candidate path and the naming strategy of the current candidate path;
determining a path to be identified in the current candidate paths based on the naming strategy and the path name;
and performing incremental identification on the path to be identified.
Optionally, the obtaining the path name of the current candidate path and the naming policy of the current candidate path include:
acquiring the path name of the current candidate path;
performing character string sequencing on each data in the current candidate path according to the path name;
time sequencing is carried out on all data according to the change time;
and determining the naming strategy of the current candidate path according to the character string sequencing result and the time sequencing result.
Optionally, the determining a path to be identified in the current candidate paths based on the naming policy and the pathname includes:
when the naming strategy is a time sequence naming strategy, performing probability sorting on the data in the current candidate path based on the change time;
determining increment probability corresponding to each data in the current candidate path based on the time sequence naming strategy and the probability sorting result;
and performing increment sorting on each data in the current candidate path according to the increment probability, and determining a path to be identified in the current candidate path based on an increment sorting result.
Optionally, the determining, based on the naming policy and the pathname, a path to be identified in the current candidate paths includes:
when the naming strategy is a random naming strategy, determining the character repetition probability corresponding to each data in the current candidate path according to the path name;
sorting the data in the current candidate path by repetition based on the character repetition probability, and determining the increment probability corresponding to each data in the current candidate path according to the random naming strategy and the repetition sorting result;
and performing increment sorting on each data in the current candidate path according to the increment probability, and determining a path to be identified in the current candidate path based on an increment sorting result.
Optionally, when the naming policy is a random naming policy, determining, according to the path name, a character repetition probability corresponding to each data in the current candidate path includes:
when the naming strategy is a random naming strategy, acquiring byte composition information of the path name of the current candidate path, wherein the byte composition information comprises byte number, byte type and byte sequence;
and determining the character repetition probability corresponding to each data in the current candidate path according to the byte composition information.
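As a minimal sketch of the byte composition information named above (byte number, byte type and byte sequence), the following Python collects those three items for a path name. The class name `ByteComposition`, the function names, and the three byte categories (digit, letter, other) are assumptions made for illustration. The source does not specify how the character repetition probability is computed from this information, so the sketch stops at collecting it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteComposition:
    """Hypothetical container for the byte composition of one path name."""
    byte_count: int      # number of bytes in the encoded path name
    byte_types: tuple    # distinct byte categories present, sorted
    byte_order: tuple    # the bytes themselves, in their original order


def classify_byte(value):
    """Assumed categories: ASCII digit, ASCII letter, anything else."""
    if 48 <= value <= 57:
        return "digit"
    if 65 <= value <= 90 or 97 <= value <= 122:
        return "letter"
    return "other"


def byte_composition(path_name):
    """Collect byte number, byte types and byte sequence of a path name."""
    raw = path_name.encode("utf-8")
    kinds = {classify_byte(b) for b in raw}
    return ByteComposition(len(raw), tuple(sorted(kinds)), tuple(raw))


composition = byte_composition("56786148.jpg")
print(composition.byte_count, composition.byte_types)
```

The path name `56786148.jpg` is the random-naming example given in the description; any other name could be substituted.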
Optionally, after performing incremental identification on the path to be identified, the method further includes:
performing increment enumeration on the path to be identified according to an increment identification result to obtain data to be incremented in the path to be identified;
and acquiring an increment strategy corresponding to the data to be incremented, and performing increment operation on the data to be incremented according to the increment strategy.
Optionally, after obtaining the increment policy corresponding to the data to be incremented and performing increment operation on the data to be incremented according to the increment policy, the method further includes:
acquiring the quantity of current incremental data and the capacity of the current incremental data;
stopping incremental enumeration when the current incremental data quantity is not less than the non-incremental data quantity in the storage device to be backed up;
or/and stopping performing increment enumeration when the current increment data capacity is not lower than the residual capacity of the storage device to be backed up.
In addition, in order to achieve the above object, the present invention further provides an incremental data identification apparatus, including:
the path screening module is used for screening a plurality of candidate paths from each data path based on the change time of each data path in the storage equipment to be backed up;
the path selection module is used for selecting the screened candidate paths and taking the selected candidate paths as current candidate paths;
a naming obtaining module, configured to obtain a path name of the current candidate path and a naming policy of the current candidate path;
a path identification module, configured to determine a path to be identified in the current candidate paths based on the naming policy and the path name;
and the increment identification module is used for carrying out increment identification on the path to be identified.
In addition, to achieve the above object, the present invention further provides an incremental data identification device, including: a memory, a processor and an incremental data identification program stored on the memory and executable on the processor, the incremental data identification program configured to implement the steps of the incremental data identification method as described above.
In addition, to achieve the above object, the present invention further provides a storage medium having an incremental data identification program stored thereon, which when executed by a processor implements the steps of the incremental data identification method as described above.
In the present invention, a plurality of candidate paths are screened from the data paths in the storage device to be backed up based on the change time of each data path; the screened candidate paths are selected and the selected candidate path is taken as the current candidate path; the path name and the naming strategy of the current candidate path are acquired; the path to be identified in the current candidate path is determined based on the naming strategy and the path name; and incremental identification is performed on the path to be identified. Screening the data paths by change time narrows the range of path identification. Determining the path to be identified from the naming strategy and the path name improves path identification efficiency. Performing incremental identification on the path to be identified realizes incremental prediction of the data to be incrementally processed in the storage device to be backed up, which effectively improves incremental processing efficiency.
Drawings
FIG. 1 is a schematic structural diagram of an incremental data identification device of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a first embodiment of the incremental data identification method of the present invention;
FIG. 3 is a schematic flowchart of a second embodiment of the incremental data identification method of the present invention;
FIG. 4 is a schematic flowchart of a third embodiment of the incremental data identification method of the present invention;
FIG. 5 is a block diagram of a first embodiment of the incremental data identification apparatus of the present invention.
The implementation, functional features and advantages of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an incremental data identification device of a hardware operating environment according to an embodiment of the present invention.
As shown in FIG. 1, the incremental data identification device may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in FIG. 1 does not limit the incremental data identification device, which may include more or fewer components than shown, combine some components, or arrange the components differently.
As shown in fig. 1, a memory 1005, which is a storage medium, may include therein an operating system, a network communication module, a user interface module, and an incremental data recognition program.
In the incremental data identification device shown in FIG. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The processor 1001 and the memory 1005 may be provided in the incremental data identification device, which calls the incremental data identification program stored in the memory 1005 through the processor 1001 and executes the incremental data identification method provided by the embodiment of the present invention.
An embodiment of the present invention provides an incremental data identification method, and referring to fig. 2, fig. 2 is a schematic flowchart of a first embodiment of an incremental data identification method according to the present invention.
In this embodiment, the incremental data identification method includes the following steps:
step S10: screening a plurality of candidate paths from each data path based on the change time of each data path in the storage device to be backed up;
step S20: selecting a plurality of screened candidate paths, and taking the selected candidate paths as current candidate paths;
step S30: obtaining the path name of the current candidate path and the naming strategy of the current candidate path;
step S40: determining a path to be identified in the current candidate paths based on the naming strategy and the path name;
step S50: and performing incremental identification on the path to be identified.
It should be noted that this embodiment applies to a scenario in which the storage device to be backed up requires incremental backup. The naming policy of the storage device to be backed up is acquired; the candidate incremental data in the storage device and the incremental probability of each piece of candidate incremental data are determined based on the naming policy; and incremental prediction is performed on the candidate incremental data in order of incremental probability. This realizes incremental prediction of the data to be backed up in the storage device and improves the efficiency of incremental backup.
It should be understood that the execution subject of the method of this embodiment may be an incremental data identification device with data processing, network communication and program running functions, such as a computer, or other apparatuses or devices capable of implementing the same or similar functions, and this is described here by taking the above incremental data identification device (hereinafter referred to as an incremental prediction device) as an example.
It should be noted that the storage device to be backed up may be a readable and writable storage device that requires file backup or data backup, such as a Network Attached Storage (NAS) device. The storage device to be backed up may be a file-level (as opposed to block-level) computer data storage server that provides file sharing services to computers in a network running operating systems such as Windows, Linux, or Mac OS. The NAS storage may be an NFS server on UNIX or Linux, or a CIFS server on Windows.
The data path may be a file path or data path that currently exists in the storage device to be backed up. The candidate path may be a data path in which a file change or data change has recently occurred; the incremental data identification device determines from the change time which data paths have recently changed. "Recently" may mean within 1 month, within 3 months, and so on, which is not limited in this embodiment; the incremental data identification device may set the recent time interval in advance.
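The screening by change time can be sketched as follows. This is a minimal illustration, assuming the change times are already available as a mapping from path to timestamp; the function name `screen_candidate_paths`, the example paths, and the 30-day default window are hypothetical and not taken from the source.

```python
from datetime import datetime, timedelta


def screen_candidate_paths(change_times, now, window=timedelta(days=30)):
    """Keep the data paths changed within `window` before `now`,
    most recently changed first."""
    cutoff = now - window
    recent = [p for p, changed in change_times.items() if cutoff <= changed <= now]
    return sorted(recent, key=lambda p: change_times[p], reverse=True)


# Hypothetical data paths and change times, for illustration only.
change_times = {
    "/share/photos": datetime(2022, 2, 22, 14, 31, 12),
    "/share/docs": datetime(2022, 2, 18, 12, 13, 52),
    "/share/archive": datetime(2020, 9, 1, 8, 0, 0),
}
now = datetime(2022, 3, 1)

candidates = screen_candidate_paths(change_times, now)
print(candidates)  # ['/share/photos', '/share/docs']
```

Passing a different `window`, such as `timedelta(days=90)`, corresponds to presetting a different recent time interval.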
The current candidate path may be a path in the candidate paths that meets a preset requirement, where the preset requirement may be a traversal requirement preset by the incremental data identification device for traversing the candidate paths, and for example, the preset requirement may be a format requirement, a data size requirement, or a readable type requirement.
The naming policy may be a rule for naming when the storage device to be backed up stores files or data, for example, the naming policy may be a rule for naming based on a date, a rule for naming based on a path, or a rule for naming at random.
It should be understood that, to determine the modification order or generation order of the data in the storage device to be backed up, the incremental prediction device in this embodiment acquires the path name and change date of each data, sorts the data by path name, sorts the data by change date, compares the two orderings, and determines the naming policy of the data in the storage device to be backed up according to the comparison result.
For example, the incremental prediction device obtains the path names of the data in the storage device to be backed up as 202104622.jpg, 202104623.jpg, 2021041335.jpg and 2021041356.jpg, and the change dates of the data as 16:22, 16:23, 16:35 and 16:56 on April 13, 2021, respectively. Because the order of the path names matches the order of the change dates, the naming policy of the storage device to be backed up is determined to be a rule of naming in date order.
It should be noted that the candidate incremental data may be data that has been changed recently in the storage device to be backed up, and the candidate incremental data may be a file or data, or may be other types of storage data. When the naming strategy is a time sequence naming strategy, the increment prediction device can determine candidate increment data in each data according to the change date of each data, wherein the change date can be the date of the latest change of the data, and the change form can be generation or modification; when the naming policy is a random naming policy, the incremental prediction device may determine candidate incremental data in each data according to the path name of each data.
It should be understood that, in order to perform incremental backup accurately and avoid both data loss caused by missed backups and efficiency reduction caused by repeated backups, the incremental prediction device determines the recently changed data in the storage device to be backed up based on the naming strategy, so as to determine the incremental probability of the candidate incremental data and screen out the data that actually needs incremental backup.
In the specific implementation, the incremental prediction device determines the naming mode of each data in the storage device to be backed up based on a naming strategy, when the naming strategy is a time sequence naming strategy, the data are sequenced according to the change time of the data, and candidate incremental data in the data are determined according to the sequencing result; and when the naming strategy is a random naming strategy, determining the character repetition probability of each data according to the path name of each data, and determining candidate incremental data in each data according to the character repetition probability.
For example, the incremental prediction device determines, based on the naming strategy, that the data in the storage device to be backed up are named according to a time sequence naming strategy, and acquires the change date of each data: data A was changed in February 2022, data B at 12:13:52 on February 18, 2022, data C at 14:31:12 on February 22, 2022, data D in September 2020, and data E in 2019. Sorting the data by change date from most recent to least recent gives data C, data B, data A, data D and data E, so the candidate incremental data are determined to be data C, data B and data A.
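The time ordering in this example can be sketched as below. The timestamps for data B and data C follow the example; those for data A, data D and data E are placeholders chosen only to preserve the stated ordering. Taking the three most recent items as candidates is an assumption for illustration, since the source does not state the cutoff rule, and the function names are hypothetical.

```python
from datetime import datetime


def rank_by_change_time(change_times):
    """Order data names from most recently changed to least recently changed."""
    return sorted(change_times, key=change_times.get, reverse=True)


def recent_candidates(change_times, top_k):
    """Assumed selection rule: keep the top_k most recently changed items."""
    return rank_by_change_time(change_times)[:top_k]


change_times = {
    "A": datetime(2022, 2, 10, 18, 0, 32),   # placeholder timestamp
    "B": datetime(2022, 2, 18, 12, 13, 52),
    "C": datetime(2022, 2, 22, 14, 31, 12),
    "D": datetime(2020, 9, 22, 1, 44, 32),   # placeholder timestamp
    "E": datetime(2019, 1, 20, 11, 29, 11),  # placeholder timestamp
}

ranking = rank_by_change_time(change_times)
candidates = recent_candidates(change_times, top_k=3)
print(ranking, candidates)
```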
It should be noted that the increment probability may be a probability that the candidate increment data needs to be incrementally backed up, and the increment prediction device may determine the increment probability of the candidate increment data according to a time sequence or a pathname sequence of the candidate increment data.
It should be understood that, in order to reduce the prediction duration and improve the prediction efficiency, the incremental prediction device in this embodiment first determines candidate incremental data in each data in the storage device to be backed up, and then determines the incremental probability of the candidate incremental data, so as to avoid the problem that the incremental probability of all data in the storage device to be backed up needs to be calculated, improve the incremental prediction efficiency, and then sequentially perform incremental prediction on the candidate incremental data according to the incremental probability corresponding to the candidate incremental data.
In the specific implementation, the increment prediction device determines the increment probability corresponding to the candidate increment data, sequences the candidate increment data according to the increment probability, and performs increment prediction on the candidate increment data according to the sequencing result, so that the candidate increment data with higher increment probability is predicted preferentially, and the efficiency of increment prediction is improved.
For example, the incremental prediction device determines candidate incremental data in each data in the storage device to be backed up as data G, data H, data I, data J and data K, wherein the incremental probability of the data G is 50%, the incremental probability of the data H is 10%, the incremental probability of the data I is 15%, the incremental probability of the data J is 5%, and the incremental probability of the data K is 20%, sorts the candidate incremental data according to the incremental probabilities to obtain an incremental prediction sequence of the candidate incremental data as data G, data K, data I, data H, and data J, and performs incremental prediction on the candidate incremental data according to the incremental prediction sequence to determine the data to be incremental in the candidate incremental data.
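The ordering in this example amounts to a descending sort on incremental probability. A minimal sketch, using the probabilities given above and a hypothetical function name:

```python
def incremental_prediction_order(increment_probabilities):
    """Return data names ordered from highest to lowest incremental probability."""
    return sorted(
        increment_probabilities,
        key=increment_probabilities.get,
        reverse=True,
    )


probabilities = {"G": 0.50, "H": 0.10, "I": 0.15, "J": 0.05, "K": 0.20}
order = incremental_prediction_order(probabilities)
print(order)  # ['G', 'K', 'I', 'H', 'J']
```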
In a specific implementation, the incremental data identification device screens 5 candidate paths from the data paths based on the change time of each data path in the storage device to be backed up, selects the 5 screened candidate paths as current candidate paths, and acquires the path names of the current candidate paths, which include 2022\01\14 and 2022\01\20, together with the naming strategy of the current candidate paths, which is a time sequence naming strategy. The device then determines the paths to be identified among the current candidate paths based on the naming strategy and the path names, and performs incremental identification on the paths to be identified.
Further, in order to effectively perform the incremental operation on the storage device to be backed up, after step S50, the method may include:
performing increment enumeration on the path to be identified according to an increment identification result to obtain data to be incremented in the path to be identified;
and acquiring an increment strategy corresponding to the data to be incremented, and performing increment operation on the data to be incremented according to the increment strategy.
It should be noted that the incremental operation may include operations such as incremental backup, incremental synchronization, or incremental migration, and the incremental policy may be an operation rule preset by the incremental prediction device. The data to be incremented is data which needs to be subjected to increment operation in the storage device to be backed up.
Further, in order to avoid backup loss caused by insufficient space of the storage device, after obtaining the increment policy corresponding to the data to be incremented and performing the increment operation on the data to be incremented according to the increment policy, the method may include:
acquiring the quantity of current incremental data and the capacity of the current incremental data;
stopping performing incremental enumeration when the current incremental data quantity is not less than the non-incremental data quantity in the storage device to be backed up;
or/and stopping performing increment enumeration when the current increment data capacity is not lower than the residual capacity of the storage device to be backed up.
It should be noted that the current incremental data quantity may be a data quantity that has been subjected to an incremental operation currently, and the non-incremental data quantity may be a data quantity that has not been subjected to an incremental operation in the storage device to be backed up. The current incremental data capacity may be a total capacity occupied by data currently having been subjected to incremental operation, and the remaining capacity may be a remaining space capacity in the storage device to be backed up, where the remaining space capacity may be subjected to incremental operation.
It should be understood that, after the incremental prediction device successfully performs incremental backup, incremental synchronization, or incremental migration on the data to be incremented, it calculates the total number, total size, or total occupied space of the data already incremented. If the total reaches the corresponding limit of the storage device to be backed up (the non-incremental data quantity or the remaining capacity described above), enumeration is stopped, so that the storage device to be backed up does not lose data because of insufficient storage space.
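The two stopping conditions can be sketched as a single check. This is an illustration under assumptions: the class `IncrementProgress`, the function name, and the use of bytes as the capacity unit are hypothetical, and the "or/and" of the source is read as "stop when either condition holds".

```python
from dataclasses import dataclass


@dataclass
class IncrementProgress:
    """Hypothetical running totals for data already incremented."""
    incremented_count: int   # current incremental data quantity
    incremented_bytes: int   # current incremental data capacity


def should_stop_enumeration(progress, non_incremental_count, remaining_capacity_bytes):
    """Stop when the quantity is not less than the non-incremental quantity,
    or the capacity is not lower than the remaining capacity."""
    count_reached = progress.incremented_count >= non_incremental_count
    capacity_reached = progress.incremented_bytes >= remaining_capacity_bytes
    return count_reached or capacity_reached


progress = IncrementProgress(incremented_count=120, incremented_bytes=4_000_000)
print(should_stop_enumeration(progress, non_incremental_count=500,
                              remaining_capacity_bytes=3_500_000))
```

In this usage the count limit has not been reached but the capacity limit has, so enumeration would stop.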
In this embodiment, a plurality of candidate paths are screened from the data paths in the storage device to be backed up based on the change time of each data path; the screened candidate paths are selected and the selected candidate path is taken as the current candidate path; the path name and the naming strategy of the current candidate path are acquired; the path to be identified in the current candidate path is determined based on the naming strategy and the path name; and incremental identification is performed on the path to be identified. Screening the data paths by change time narrows the range of path identification. Determining the path to be identified from the naming strategy and the path name improves path identification efficiency. Performing incremental identification on the path to be identified realizes incremental prediction of the data to be incrementally processed in the storage device to be backed up, which effectively improves incremental processing efficiency.
Referring to fig. 3, fig. 3 is a flowchart illustrating an incremental data identification method according to a second embodiment of the present invention.
Based on the foregoing first embodiment, in this embodiment, the step S30 includes:
step S301: acquiring the path name of the current candidate path;
step S302: performing character string sequencing on each data in the current candidate path according to the path name;
step S303: time sequencing is carried out on each data according to the change time;
step S304: and determining the naming strategy of the current candidate path according to the character string sequencing result and the time sequencing result.
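Steps S301 to S304 can be sketched as a comparison of two orderings. This is a minimal illustration, not the patented decision rule: it assumes that an exact match (forward or reversed) between the string ordering and the time ordering indicates time sequence naming, and treats everything else as random naming. The function name, the constants, the fixed-width time-based file names, and the change times assigned to the randomly named files are hypothetical.

```python
from datetime import datetime

TIME_SEQUENCE = "time_sequence"
RANDOM = "random"


def determine_naming_policy(items):
    """items is a list of (path_name, change_time) pairs.

    Compares the string-sorted order of the path names with their
    change-time order; a forward or reversed match is read as time
    sequence naming (assumed rule)."""
    by_name = [name for name, _ in sorted(items, key=lambda item: item[0])]
    by_time = [name for name, _ in sorted(items, key=lambda item: item[1])]
    if by_name == by_time or by_name == by_time[::-1]:
        return TIME_SEQUENCE
    return RANDOM


# Hypothetical fixed-width names that encode the change time.
time_named = [
    ("202104131635.jpg", datetime(2021, 4, 13, 16, 35)),
    ("202104131622.jpg", datetime(2021, 4, 13, 16, 22)),
    ("202104131656.jpg", datetime(2021, 4, 13, 16, 56)),
    ("202104131623.jpg", datetime(2021, 4, 13, 16, 23)),
]

# Random-looking names with hypothetical change times.
random_named = [
    ("88519246.jpg", datetime(2022, 1, 1)),
    ("33199586.jpg", datetime(2022, 1, 3)),
    ("11063592.jpg", datetime(2022, 1, 2)),
    ("10020227.jpg", datetime(2022, 1, 4)),
]

print(determine_naming_policy(time_named), determine_naming_policy(random_named))
```

A practical implementation would probably tolerate partial agreement between the two orderings, since a small set of random names can match the time order by chance.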
It should be noted that each data in the storage device to be backed up may be data or a file stored in that device, in any format, for example jpg, txt, exe, or stp. The path name may be the name of the data or file stored in the storage device to be backed up. The path name may be generated by time sequence naming, such as 2022\06\02\14\23\42.jpg, or by random naming, such as 56786148.jpg.
In a specific implementation, when the naming strategy is a random naming strategy, the path name is obtained by calculating a random number, that is, all characters of the path name are random. When the naming policy is a time sequence naming policy, the path name is derived from the change time of the data; in this case, the path name may be composed of the year, month, day, hour, second, and millisecond, or may include a time-sequence ID or a reverse-sequence ID, for example, 2022\03\12\789789.jpg.
The character string sorting may sort the data by the character order of their path names, in either ascending or descending order. For example, sorted in ascending character order, the path names are 6148.jpg, 6149.jpg, 6158.jpg, and 7132.jpg.
It should be understood that, in order to determine the naming policy of the storage device to be backed up accurately, the incremental prediction device in this embodiment obtains the character composition type of each path name to determine the number of characters, the character types, and so on, and performs character string sorting on the data according to the character composition type and the path name.
For example, the incremental prediction device obtains the path name and the change time of each piece of data in the storage device to be backed up, where data A is 88519246.jpg, data B is 33199586.jpg, data C is 11063592.jpg, and data D is 10020227.jpg. Sorting the path names in ascending order gives 10020227.jpg, 11063592.jpg, 33199586.jpg, and 88519246.jpg; that is, the data order is data D, data C, data B, and data A.
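The ascending character string sort in this example can be reproduced with a short sketch. The dictionary layout and variable names are illustrative assumptions; the path names are the ones given in the example.

```python
# Path names of data A to D, taken from the example in the text.
names = {
    "A": "88519246.jpg",
    "B": "33199586.jpg",
    "C": "11063592.jpg",
    "D": "10020227.jpg",
}

# Sort the data labels by the character order of their path names.
string_order = sorted(names, key=names.get)
print(string_order)
```

Python compares strings character by character, which matches the character-order sorting described here rather than a numeric comparison.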
The change time may be the time at which each piece of data was last changed; it may be the time at which the data was created or the time at which it was modified. For example, data S was last modified at 20:32:20 on January 23, 2022.
The time sorting result may be the result of sorting the data by change time. For example, when the change time of data A is 23:12, that of data B is 21:23, and that of data C is 22:59, the time sorting result is data B, data C, and data A.
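The time sort in this example can be sketched in the same way. Representing the change times with `datetime.time` is an assumption made for brevity; a real change time would also carry a date.

```python
from datetime import time

# Change times of data A to C, taken from the example in the text.
change_times = {
    "A": time(23, 12),
    "B": time(21, 23),
    "C": time(22, 59),
}

# Sort the data labels from earliest to latest change time.
time_order = sorted(change_times, key=change_times.get)
print(time_order)
```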
It should be understood that, in order to predict the naming policy of the storage device to be backed up more accurately, the incremental prediction device in this embodiment obtains the change time of each data in the storage device to be backed up, sorts the data by change time, and compares the order of the time sorting result with the order of the character string sorting result. If the two results list the data in the same order, the naming policy of the storage device to be backed up is determined to be the time sequence naming policy; if the orders differ, the naming policy is determined to be the random naming policy.
In a specific implementation, the incremental prediction device takes the data in order of change time and checks the character string sorting result against that order: if the character string sorting result is consistent with the change-time sorting result, the naming policy is the time sequence naming policy; if it is inconsistent, the naming policy is the random naming policy.
For example, the incremental prediction device obtains the change time of each data and sorts the data by change time, giving data A, data B, data C, and data D. The path name of data A is 21751615.jpg, that of data B is 16068523.jpg, that of data C is 97585704.jpg, and that of data D is 561699277.jpg, so the character string sorting result is data B, data A, data D, and data C. Because the character string sorting result does not match the change-time sorting result, the naming policy of the storage device to be backed up is determined to be the random naming policy.
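The comparison of the two sorting results can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `detect_naming_policy`, the use of integers as stand-in change times, and the forward-slash time-series names are hypothetical, and only the ascending case is handled (a reverse-order ID scheme would need the descending comparison as well).

```python
def detect_naming_policy(entries):
    """entries: list of (path_name, change_time) pairs.

    Returns "time_series" when sorting by path name and sorting by change
    time give the same order, and "random" otherwise.
    """
    by_name = [name for name, _ in sorted(entries, key=lambda e: e[0])]
    by_time = [name for name, _ in sorted(entries, key=lambda e: e[1])]
    return "time_series" if by_name == by_time else "random"

# Data A to D from the example; change times are stand-in ranks 1..4.
random_example = [
    ("21751615.jpg", 1),
    ("16068523.jpg", 2),
    ("97585704.jpg", 3),
    ("561699277.jpg", 4),
]
# Hypothetical time-series names whose character order follows change time.
time_example = [
    ("2022/06/02/14/23/41.jpg", 1),
    ("2022/06/02/14/23/42.jpg", 2),
    ("2022/06/02/14/24/03.jpg", 3),
]
print(detect_naming_policy(random_example))
print(detect_naming_policy(time_example))
```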
In this embodiment, the path name of the current candidate path is obtained, the data in the current candidate path are sorted as character strings according to the path name and sorted by time according to the change time, and the naming policy of the current candidate path is determined from the character string sorting result and the time sorting result. Comparing the two sorting results in this way improves the efficiency and accuracy of incremental prediction.
Referring to fig. 4, fig. 4 is a flowchart illustrating an incremental data identification method according to a third embodiment of the present invention.
Based on the foregoing first embodiment, in this embodiment, the step S40 includes:
step S401: when the naming strategy is a random naming strategy, determining the character repetition probability corresponding to each data in the current candidate path according to the path name;
step S402: performing repetition sorting on each data in the current candidate path based on the character repetition probability, and determining the increment probability corresponding to each data in the current candidate path according to the random naming strategy and the repetition sorting result;
step S403: and performing increment sorting on each data in the current candidate path according to the increment probability, and determining a path to be identified in the current candidate path based on an increment sorting result.
It should be noted that the random naming policy may be a rule that names data with random characters, which may be digits or other types of characters, such as 723589.txt, 083589.txt, or 1098578.txt. The character repetition probability may be the probability that characters repeat across the path names of the data; it may be the difference probability of the first character in a path name or of other characters. For example, among 2331.jpg, 2212.jpg, 2521.jpg, and 9123.jpg, the first character 2 has the highest character repetition probability and the first character 9 has the lowest.
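One way to make the first-character example concrete is to treat the character repetition probability as a relative frequency. That definition is an assumption (the text does not give a formula), as is the function name `first_char_repetition`.

```python
from collections import Counter

def first_char_repetition(names):
    """Map each path name to the share of names starting with the same character.

    Using relative frequency as the repetition probability is an assumption.
    """
    counts = Counter(name[0] for name in names)
    return {name: counts[name[0]] / len(names) for name in names}

# Path names from the example in the text.
names = ["2331.jpg", "2212.jpg", "2521.jpg", "9123.jpg"]
probs = first_char_repetition(names)
print(probs)
```

Under this assumed definition the three names starting with 2 share the highest value and the name starting with 9 has the lowest, which agrees with the example.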
The increment probability may be a probability that the candidate increment data needs to be subjected to increment backup, and the increment prediction device may determine the increment probability of the candidate increment data according to time ordering or pathname ordering of the candidate increment data.
It should be understood that the incremental prediction device in this embodiment performs repetition sorting on the data according to the character repetition probability and derives the increment probability of each data inversely from the repetition sorting result. For example, if the repetition probability of data A is 80%, that of data B is 15%, and that of data C is 5%, repetition sorting in ascending order gives data C, data B, and data A; incremental sorting based on that result gives data A, data B, and data C, and the increment probability of each data is determined from the incremental sorting result.
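The two orderings in this example can be sketched as below. The sketch only reproduces the orderings; it does not assign numeric increment probabilities, because the text gives no formula for them. Variable names are illustrative.

```python
# Repetition probabilities of data A to C, taken from the example in the text.
repetition = {"A": 0.80, "B": 0.15, "C": 0.05}

# Repetition sorting in ascending order of repetition probability.
repeat_order = sorted(repetition, key=repetition.get)

# Incremental sorting is the reverse of the repetition sorting result.
increment_order = list(reversed(repeat_order))

print(repeat_order)
print(increment_order)
```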
It should be noted that the increment sorting result may be an increment order sorted according to the increment probability of each data, where the increment sorting may be an ascending sorting or a descending sorting.
In a specific implementation, the increment prediction device performs increment sorting on each data according to the increment probability of each data, and determines candidate increment data in each data according to the increment sorting result and a preset screening requirement, where the preset screening requirement may be that data with an increment probability exceeding a preset threshold is used as the candidate increment data, for example, the preset threshold may be 50% or 40%.
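The threshold-based screening requirement can be sketched as follows. The function name `select_candidates`, the labels `p1` to `p4`, and their probabilities are hypothetical values chosen only to show the mechanics; the 50% and 40% thresholds are the ones mentioned in the text.

```python
def select_candidates(increment_probability, threshold=0.5):
    """Return data whose increment probability exceeds threshold, highest first."""
    ranked = sorted(increment_probability, key=increment_probability.get, reverse=True)
    return [name for name in ranked if increment_probability[name] > threshold]

# Hypothetical increment probabilities.
probs = {"p1": 0.9, "p2": 0.3, "p3": 0.45, "p4": 0.6}
print(select_candidates(probs))        # threshold of 50%
print(select_candidates(probs, 0.4))   # threshold of 40%
```

Lowering the threshold admits more candidates, so the choice trades prediction coverage against the amount of data that must be examined.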
For example, the incremental prediction device obtains the increment probability of each data, where data A is 80%, data B is 20%, data C is 40%, data D is 10%, and data E is 15%. Sorting the data in ascending order of increment probability gives data D, data E, data B, data C, and data A, and the candidate incremental data are determined to be data D and data E according to the incremental sorting result.
Further, in order to accurately obtain the character repetition probability of each data, the step S401 may include:
step S4011: when the naming strategy is a random naming strategy, acquiring byte composition information of the path name of the current candidate path, wherein the byte composition information comprises byte number, byte type and byte sequence;
step S4012: and determining the character repetition probability corresponding to each data in the current candidate path according to the byte composition information.
It should be noted that the byte number may be the number of bytes in the path name; for example, the byte number of 12590.jpg is 5 bytes. The byte type may be numeric, English, or Chinese. The byte order may be the order in which the bytes are arranged in the path name.
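The byte composition information can be gathered with a small helper. This is a sketch under assumptions: the names `classify` and `byte_composition`, the dictionary keys, the use of UTF-8 for counting bytes, and the CJK range check for Chinese characters are all illustrative choices. The extension is excluded from the count, consistent with the 5-byte example above.

```python
from pathlib import PurePosixPath

def classify(ch):
    """Classify one character as numeric, english, chinese, or other."""
    if ch.isascii() and ch.isdigit():
        return "numeric"
    if ch.isascii() and ch.isalpha():
        return "english"
    if "\u4e00" <= ch <= "\u9fff":  # common CJK ideographs (assumed range)
        return "chinese"
    return "other"

def byte_composition(path_name):
    """Return byte number, byte types, and byte order of the name without extension."""
    stem = PurePosixPath(path_name).stem
    return {
        "byte_number": len(stem.encode("utf-8")),
        "byte_types": sorted({classify(ch) for ch in stem}),
        "byte_order": list(stem),
    }

info = byte_composition("12590.jpg")
print(info)
```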
Further, in order to accurately determine the candidate incremental data when the naming policy is the time sequence naming policy, the step S40 may include:
when the naming strategy is a time sequence naming strategy, probability sequencing is carried out on all data in the current candidate path based on the change time;
determining increment probability corresponding to each data in the current candidate path based on the time sequence naming strategy and the probability sorting result;
and performing increment sorting on each data in the current candidate path according to the increment probability, and determining a path to be identified in the current candidate path based on an increment sorting result.
It should be understood that, when the naming policy is the time sequence naming policy, the order in which the data in the storage device to be backed up changed is determined from the time sorting result, so as to find the data whose change time is closest to the current time. The increment probability of each data is determined on that basis, the data are incrementally sorted by increment probability, and the candidate incremental data are determined from the incremental sorting result.
In a specific implementation, for example, when the naming policy is the time sequence naming policy, the incremental prediction device determines from the time sorting result that the change time of data A in the storage device to be backed up is closest to the current time, followed by data B, then data C, and finally data D. The ascending time sorting result is therefore data D, data C, data B, and data A, and the ascending incremental sorting result by increment probability is likewise data D, data C, data B, and data A.
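The ordering in this example can be sketched as below. The timestamps and the reference time `now` are hypothetical values chosen so that data A is the most recently changed and data D the least; only the resulting order comes from the text.

```python
from datetime import datetime

# Hypothetical change times; A is closest to the current time, D is farthest.
change_times = {
    "A": datetime(2022, 9, 5, 11, 50),
    "B": datetime(2022, 9, 5, 10, 30),
    "C": datetime(2022, 9, 4, 18, 0),
    "D": datetime(2022, 9, 1, 9, 15),
}
now = datetime(2022, 9, 5, 12, 0)

# Ascending time sort: oldest change first.
ascending = sorted(change_times, key=change_times.get)

# Data ranked by how close the change time is to the current time.
closest_first = sorted(change_times, key=lambda k: now - change_times[k])

print(ascending)
print(closest_first)
```

Under the time sequence naming policy the ascending incremental order coincides with the ascending time order, so no separate probability calculation is needed to obtain the ranking.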
In this embodiment, when the naming policy is the random naming policy, the character repetition probability corresponding to each data in the current candidate path is determined according to the path name; repetition sorting is performed on the data based on the character repetition probability; the increment probability corresponding to each data is determined according to the random naming policy and the repetition sorting result; the data are incrementally sorted according to the increment probability; and the path to be identified in the current candidate path is determined based on the incremental sorting result. Repetition sorting by character repetition probability differentiates randomly named data, and selecting candidate incremental data from the incremental sorting result reduces the amount of data involved in incremental prediction and improves its efficiency.
In addition, an embodiment of the present invention further provides a storage medium, where the storage medium stores an incremental data identification program, and the incremental data identification program, when executed by a processor, implements the steps of the incremental data identification method described above.
Since the storage medium adopts all the technical solutions of all the embodiments, at least all the beneficial effects brought by the technical solutions of the embodiments are provided, and are not described in detail herein.
Referring to fig. 5, fig. 5 is a block diagram illustrating the structure of an incremental data identification apparatus according to a first embodiment of the present invention.
As shown in fig. 5, an incremental data identification apparatus according to an embodiment of the present invention includes:
the path screening module 10 is configured to screen a plurality of candidate paths from each data path based on change time of each data path in the storage device to be backed up;
a path selection module 20, configured to select multiple candidate paths that are screened out, and use the selected candidate paths as current candidate paths;
a naming obtaining module 30, configured to obtain a path name of the current candidate path and a naming policy of the current candidate path;
a path identifying module 40, configured to determine a path to be identified in the current candidate paths based on the naming policy and the path name;
and the increment identification module 50 is configured to perform increment identification on the path to be identified.
Further, the naming obtaining module 30 is further configured to obtain the path name of the current candidate path; perform character string sorting on each data in the current candidate path according to the path name; perform time sorting on each data according to the change time; and determine the naming policy of the current candidate path according to the character string sorting result and the time sorting result.
Further, the path identifying module 40 is further configured to, when the naming policy is a time sequence naming policy, perform probability ordering on each data in the current candidate path based on the change time; determining increment probability corresponding to each data in the current candidate path based on the time sequence naming strategy and the probability sorting result; and performing increment sorting on each data in the current candidate path according to the increment probability, and determining a path to be identified in the current candidate path based on an increment sorting result.
Further, the path identifying module 40 is further configured to, when the naming policy is a random naming policy, determine, according to the path name, a character repetition probability corresponding to each data in the current candidate path; perform repetition sorting on each data in the current candidate path based on the character repetition probability, and determine the increment probability corresponding to each data in the current candidate path according to the random naming policy and the repetition sorting result; and perform increment sorting on each data in the current candidate path according to the increment probability, and determine a path to be identified in the current candidate path based on the increment sorting result.
Further, the path identifying module 40 is further configured to, when the naming policy is a random naming policy, obtain byte composition information of the path name of the current candidate path, where the byte composition information includes a byte number, a byte type, and a byte order; and determining the character repetition probability corresponding to each data in the current candidate path according to the byte composition information.
Further, the increment identifying module 50 is further configured to perform increment enumeration on the path to be identified according to an increment identifying result, so as to obtain data to be incremented in the path to be identified; and acquiring an increment strategy corresponding to the data to be incremented, and performing increment operation on the data to be incremented according to the increment strategy.
Further, the increment identification module 50 is further configured to obtain a current increment data quantity and a current increment data capacity; stopping performing incremental enumeration when the current incremental data quantity is not less than the non-incremental data quantity in the storage device to be backed up; or/and stopping performing increment enumeration when the current increment data capacity is not lower than the residual capacity of the storage device to be backed up.
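The two stop conditions for incremental enumeration can be sketched as a loop. The function name `enumerate_increments`, the `(name, size)` item layout, and the sample sizes are assumptions; the conditions themselves (count not less than the non-incremental data quantity, or capacity not lower than the remaining capacity) follow the text.

```python
def enumerate_increments(items, non_incremental_count, remaining_capacity):
    """Enumerate (name, size_bytes) items until either stop condition holds.

    Stops when the number of enumerated items is not less than
    non_incremental_count, or when their total size is not lower than
    remaining_capacity.
    """
    selected, total = [], 0
    for name, size in items:
        if len(selected) >= non_incremental_count or total >= remaining_capacity:
            break
        selected.append(name)
        total += size
    return selected, total

# Hypothetical data to be incremented, with sizes in bytes.
items = [("a", 40), ("b", 50), ("c", 30), ("d", 10)]
by_capacity = enumerate_increments(items, non_incremental_count=10, remaining_capacity=80)
by_count = enumerate_increments(items, non_incremental_count=1, remaining_capacity=1000)
print(by_capacity)
print(by_count)
```

Because the conditions are checked against the state left by the previous increment, the item that crosses the capacity limit is still included; a stricter variant could check before adding each item.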
In this embodiment, a plurality of candidate paths are screened from the data paths based on the change time of each data path in the storage device to be backed up; a selection is made among the screened candidate paths, and the selected candidate path is used as the current candidate path; the path name and the naming policy of the current candidate path are obtained; a path to be identified in the current candidate path is determined based on the naming policy and the path name; and incremental identification is performed on the path to be identified. Screening the data paths by change time narrows the range of paths that must be identified, and determining the path to be identified from the naming policy and the path name improves path identification efficiency. Performing incremental identification on the path to be identified realizes incremental prediction of the data to be incrementally processed in the storage device to be backed up, which effectively improves incremental processing efficiency.
It should be understood that the above is only an example, and the technical solution of the present invention is not limited in any way, and in a specific application, a person skilled in the art may set the technical solution as needed, and the present invention is not limited thereto.
It should be noted that the above-described work flows are only exemplary, and do not limit the scope of the present invention, and in practical applications, a person skilled in the art may select some or all of them to achieve the purpose of the solution of the embodiment according to actual needs, and the present invention is not limited herein.
In addition, the technical details that are not described in detail in this embodiment may refer to the incremental data identification method provided in any embodiment of the present invention, and are not described herein again.
Further, it is to be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g. Read Only Memory (ROM)/RAM, magnetic disk, optical disk), and includes several instructions for enabling a terminal device (e.g. a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all equivalent structures or equivalent processes performed by the present invention or directly or indirectly applied to other related technical fields are also included in the scope of the present invention.
Claims (8)
1. An incremental data identification method, characterized in that the incremental data identification method comprises:
screening a plurality of candidate paths from each data path based on the change time of each data path in the storage equipment to be backed up;
selecting a plurality of screened candidate paths, and taking the selected candidate paths as current candidate paths;
obtaining the path name of the current candidate path and the naming strategy of the current candidate path;
determining a path to be identified in the current candidate paths based on the naming strategy and the path name;
performing incremental identification on the path to be identified;
the determining a path to be identified in the current candidate paths based on the naming policy and the path name includes:
when the naming strategy is a time sequence naming strategy, probability sequencing is carried out on all data in the current candidate path based on the change time;
determining increment probability corresponding to each data in the current candidate path based on the time sequence naming strategy and the probability sorting result;
performing increment sorting on each data in the current candidate path according to the increment probability, and determining a path to be identified in the current candidate path based on an increment sorting result;
the determining a path to be identified in the current candidate paths based on the naming policy and the path name includes:
when the naming strategy is a random naming strategy, determining the character repetition probability corresponding to each data in the current candidate path according to the path name;
repeatedly sorting each data in the current candidate path based on the character repetition probability, and determining the increment probability corresponding to each data in the current candidate path according to the random naming strategy and the repeated sorting result;
and performing increment sorting on each data in the current candidate path according to the increment probability, and determining a path to be identified in the current candidate path based on an increment sorting result.
2. The incremental data identification method of claim 1 wherein said obtaining the path name of the current candidate path and the naming policy of the current candidate path comprises:
acquiring the path name of the current candidate path;
performing character string sequencing on each data in the current candidate path according to the path name;
time sequencing is carried out on all data according to the change time;
and determining the naming strategy of the current candidate path according to the character string sequencing result and the time sequencing result.
3. The incremental data recognition method of claim 1, wherein when the naming policy is a random naming policy, determining a character repetition probability corresponding to each data in the current candidate path according to the path name comprises:
when the naming strategy is a random naming strategy, acquiring byte composition information of the path name of the current candidate path, wherein the byte composition information comprises byte number, byte type and byte sequencing;
and determining the character repetition probability corresponding to each data in the current candidate path according to the byte composition information.
4. The incremental data identification method according to claim 1, wherein after the incremental identification of the path to be identified, the method further comprises:
performing increment enumeration on the path to be identified according to an increment identification result to obtain data to be incremented in the path to be identified;
and acquiring an increment strategy corresponding to the data to be incremented, and performing increment operation on the data to be incremented according to the increment strategy.
5. The incremental data identification method of claim 4, wherein after the obtaining of the incremental policy corresponding to the data to be incremented and the performing of the incremental operation on the data to be incremented according to the incremental policy, the method further comprises:
acquiring the quantity of current incremental data and the capacity of the current incremental data;
stopping incremental enumeration when the current incremental data quantity is not less than the non-incremental data quantity in the storage device to be backed up;
or/and stopping performing increment enumeration when the current increment data capacity is not lower than the residual capacity of the storage device to be backed up.
6. An incremental data recognition apparatus, characterized in that the incremental data recognition apparatus comprises:
the path screening module is used for screening a plurality of candidate paths from each data path based on the change time of each data path in the storage equipment to be backed up;
the path selection module is used for selecting the screened candidate paths and taking the selected candidate paths as current candidate paths;
a naming obtaining module, configured to obtain a path name of the current candidate path and a naming policy of the current candidate path;
a path identification module, configured to determine a path to be identified in the current candidate paths based on the naming policy and the pathname;
the increment identification module is used for carrying out increment identification on the path to be identified;
the path identification module is further configured to perform probability sorting on each data in the current candidate path based on the change time when the naming policy is a time sequence naming policy; determining increment probability corresponding to each data in the current candidate path based on the time sequence naming strategy and the probability sorting result; performing increment sorting on each data in the current candidate path according to the increment probability, and determining a path to be identified in the current candidate path based on an increment sorting result;
the path identification module is further configured to determine, according to the path name, a character repetition probability corresponding to each data in the current candidate path when the naming policy is a random naming policy; repeatedly sorting each data in the current candidate path based on the character repetition probability, and determining the increment probability corresponding to each data in the current candidate path according to the random naming strategy and the repeated sorting result; and performing increment sorting on each data in the current candidate path according to the increment probability, and determining a path to be identified in the current candidate path based on an increment sorting result.
7. An incremental data recognition apparatus, characterized in that the incremental data recognition apparatus comprises: a memory, a processor, and an incremental data identification program stored on the memory and executable on the processor, the incremental data identification program configured to implement the incremental data identification method of any one of claims 1 to 5.
8. A storage medium having stored thereon an incremental data recognition program which, when executed by a processor, implements an incremental data recognition method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211075926.8A CN115168110B (en) | 2022-09-05 | 2022-09-05 | Incremental data identification method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211075926.8A CN115168110B (en) | 2022-09-05 | 2022-09-05 | Incremental data identification method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115168110A (en) | 2022-10-11 |
CN115168110B (en) | 2022-11-29 |
Family
ID=83481649
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211075926.8A Active CN115168110B (en) | 2022-09-05 | 2022-09-05 | Incremental data identification method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115168110B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103389925A (en) * | 2012-05-09 | 2013-11-13 | 南京壹进制信息技术有限公司 | Real-time backup method based on process name identification |
CN112306758A (en) * | 2020-12-24 | 2021-02-02 | 深圳市科力锐科技有限公司 | Backup success rate prediction method, device, equipment and storage medium |
CN114003439A (en) * | 2021-12-30 | 2022-02-01 | 深圳市科力锐科技有限公司 | Data backup method, device, equipment and storage medium |
CN114253850A (en) * | 2021-12-20 | 2022-03-29 | 平安证券股份有限公司 | Code incremental coverage rate statistical method, device, equipment and storage medium |
CN114398333A (en) * | 2021-12-30 | 2022-04-26 | 中国电信股份有限公司 | Incremental data real-time synchronization method and device, electronic equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7219317B2 (en) * | 2004-04-19 | 2007-05-15 | Lsi Logic Corporation | Method and computer program for verifying an incremental change to an integrated circuit design |
Non-Patent Citations (1)
Title |
---|
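Technical challenges of big data disaster recovery backup and incremental backup solutions; Luo Shengmei et al.; Big Data (《大数据》); 2015-09-20; pp. 106-112 * |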
Also Published As
Publication number | Publication date |
---|---|
CN115168110A (en) | 2022-10-11 |
Similar Documents
Publication | Title |
---|---|
CN107908540B (en) | Test case creating method and device, computer equipment and medium |
CN108399072B (en) | Application page updating method and device |
JP6365195B2 (en) | Instruction history analysis program, instruction history analysis apparatus, and instruction history analysis method |
JP2001518664A (en) | Method and apparatus for analyzing data |
CN110231994B (en) | Memory analysis method, memory analysis device and computer readable storage medium |
WO2008023030A1 (en) | Signature based client automatic data backup system |
CN111095421A (en) | Context-aware incremental algorithm for gene files |
CN111400361A (en) | Data real-time storage method and device, computer equipment and storage medium |
CN111026647A (en) | Code coverage rate obtaining method and device, computer equipment and storage medium |
CN115729687A (en) | Task scheduling method and device, computer equipment and storage medium |
CN115190010A (en) | Distributed recommendation method and device based on software service dependency relationship |
CN111596945A (en) | Differential upgrading method for dynamic multi-partition firmware of embedded system |
CN115168110B (en) | Incremental data identification method, device, equipment and storage medium |
CN113312529A (en) | Data visualization method and device, computer equipment and storage medium |
CN113434122A (en) | Multi-role page creation method and device, server and readable storage medium |
CN110647452B (en) | Test method, test device, computer equipment and storage medium |
CN112069236A (en) | Associated file display method, device, equipment and storage medium |
CN105893614A (en) | Information recommendation method and device and electronic equipment |
KR20110023580A (en) | The method and system for recovering data |
US20110185167A1 (en) | Change impact research support device and change impact research support method |
CN110874612B (en) | Time interval prediction method and device, computer equipment and storage medium |
CN112699372A (en) | Vulnerability processing method and device and computer readable storage medium |
CN106528577B (en) | Method and device for setting file to be cleaned |
CN113778839B (en) | Regression testing method and device and electronic equipment |
CN114238344B (en) | Data release method and device |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |