CN115309740A - Data archiving method, system, electronic device and storage medium - Google Patents

Publication number
CN115309740A
Authority
CN
China
Prior art keywords
data
archived
information
archiving
result
Legal status
Pending
Application number
CN202210807709.7A
Other languages
Chinese (zh)
Inventor
顾伟涛
曹彩鹏
朱国庆
周游
刘培锴
陈斐
Current Assignee
Hangzhou Fuyun Network Technology Co ltd
Original Assignee
Hangzhou Fuyun Network Technology Co ltd
Application filed by Hangzhou Fuyun Network Technology Co ltd
Priority to CN202210807709.7A
Publication of CN115309740A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a data archiving method, system, electronic device and storage medium. The data archiving method comprises the following steps: acquiring data to be archived that has undergone database and table sharding, together with the fragment (shard) meta-information of the data to be archived; attaching a streaming service to the data to be archived, and performing streaming processing on the data to be archived according to at least the fragment meta-information and the unique identifier information corresponding to the streaming service to obtain first synchronization data; creating a distributed database according to the fragment meta-information, and synchronizing the first synchronization data into the distributed database for distributed storage to obtain second synchronization data; and taking the second synchronization data as the data archiving result. The method and device solve the problems that archived data cannot be queried quickly and that maintenance costs are high, and enable fast storage and querying of the data archiving result.

Description

Data archiving method, system, electronic device and storage medium
Technical Field
The present application relates to the field of data processing, and in particular, to a data archiving method, system, electronic device, and storage medium.
Background
At present, MySQL databases are generally used to store service data, and as the business scales up, massive amounts of data accumulate in MySQL. Historical data often needs to be archived, and because the data volume is large, archiving is generally performed on sharded databases and tables.
In the prior art, data archiving is generally performed by exporting historical data to an SQL file with MySQL backup tools, backing it up with other backup tools, extracting it into a data warehouse, or extracting it into another database (such as TiDB). However, after archiving by these methods, the data cannot be checked at any time or queried quickly, and the maintenance cost is high.
For the problems in the related art that archived data cannot be queried quickly and that maintenance costs are high, no effective solution has yet been proposed.
Disclosure of Invention
This embodiment provides a data archiving method, a data archiving system, an electronic device and a storage medium, to solve the problems in the related art that archived data cannot be queried quickly and that maintenance costs are high.
In a first aspect, in this embodiment, a data archiving method is provided, including:
acquiring data to be archived that has undergone database and table sharding, and the fragment meta-information of the data to be archived;
attaching a streaming service to the data to be archived, and performing streaming processing on the data to be archived according to at least the fragment meta-information and the unique identifier information corresponding to the streaming service to obtain first synchronization data;
creating a distributed database according to the fragment meta-information, and synchronizing the first synchronization data into the distributed database for distributed storage to obtain second synchronization data;
and taking the second synchronization data as the data archiving result.
In some embodiments, performing streaming processing on the data to be archived according to at least the fragment meta-information and the unique identifier information corresponding to the streaming service to obtain first synchronization data includes:
acquiring a time field of the data to be archived according to the fragment meta-information;
acquiring configuration parameters of the streaming service, and acquiring the unique identifier information according to the configuration parameters;
and obtaining the first synchronization data at least according to the fragment meta-information, the time field of the data to be archived and the unique identifier information.
In some embodiments, synchronizing the first synchronization data into the distributed database for distributed storage to obtain the second synchronization data includes:
synchronizing the first synchronization data into the distributed database at least according to the fragment meta-information and the field information of the distributed database to obtain the second synchronization data.
In some embodiments, after the second synchronization data is obtained and before it is taken as the data archiving result, the method further includes:
comparing the data to be archived with the second synchronization data to obtain a comparison result; when the comparison result indicates that the data to be archived and the second synchronization data differ, acquiring and executing a transmission status statement of the distributed database at least according to the field information of the distributed database to obtain a transmission status result; when the transmission status result indicates that the distributed database is synchronizing, acquiring a preset waiting duration and, after waiting for that duration, obtaining a re-comparison result;
or deleting the second synchronization data that has been synchronized, performing streaming processing on the data to be archived again to obtain third synchronization data, synchronizing the third synchronization data into the distributed database to obtain fourth synchronization data, and comparing the data to be archived with the fourth synchronization data to obtain a re-comparison result;
and deleting the data to be archived at least according to the fragment meta-information when the comparison result or the re-comparison result indicates that the data to be archived and the second synchronization data are the same.
In some embodiments, after taking the second synchronization data as the data archiving result, the method further includes:
acquiring, according to the fragment meta-information, a fragmentation-reclaim statement for the data archiving result, and executing the statement to reclaim the disk fragmentation left by the data to be archived.
In some embodiments, after taking the second synchronization data as the data archiving result, the method further includes:
acquiring distributed meta-information of the distributed database, and querying the data archiving result corresponding to the data to be archived at least according to the distributed meta-information.
In some embodiments, the distributed database is a StarRocks database, and/or the streaming service is a Maxwell service.
In a second aspect, this embodiment provides a data archiving system, including a terminal device, a transmission device, and a server device, where the terminal device is connected to the server device through the transmission device;
the server device is configured to execute the data archiving method according to the first aspect;
the transmission device is configured to transmit the data archiving result;
and the terminal device is configured to display the data archiving result.
In a third aspect, in this embodiment, there is provided an electronic apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the data archiving method according to the first aspect.
In a fourth aspect, in the present embodiment, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the data archiving method according to the first aspect.
Compared with the related art, the data archiving method, system, electronic device and storage medium provided in this embodiment acquire the data to be archived that has undergone database and table sharding, together with its fragment meta-information; attach a streaming service to the data to be archived and perform streaming processing on it according to at least the fragment meta-information and the unique identifier information corresponding to the streaming service to obtain first synchronization data; create a distributed database according to the fragment meta-information and synchronize the first synchronization data into it for distributed storage to obtain second synchronization data; and take the second synchronization data as the data archiving result. This solves the problems that archived data cannot be queried quickly and that maintenance costs are high, and enables fast storage and querying of the data archiving result.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below, so that the features, objects, and advantages of the application are easier to understand.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a diagram of an application scenario of a data archiving method in one embodiment;
FIG. 2 is a schematic flow chart diagram illustrating a data archiving method in one embodiment;
FIG. 3 is a schematic flow chart diagram illustrating a data archiving method in accordance with another embodiment;
FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
For a clearer understanding of the objects, aspects and advantages of the present application, reference is made to the following description and accompanying drawings.
Unless defined otherwise, technical or scientific terms used herein have the meaning commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a", "an", "the" and similar referents used in describing the invention (including the specification and claims) are to be construed to cover both the singular and the plural. The terms "comprises," "comprising," "has," "having," and any variations thereof, as used in this application, are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or modules, but may include other steps or modules (elements) not listed or inherent to such process, method, article, or apparatus. References in this application to "connected," "coupled," and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "A plurality" in this application means two or more. "And/or" describes an association between associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. The terms "first," "second," "third," and the like in this application are used to distinguish between similar items and do not necessarily describe a particular sequential or chronological order.
The data archiving method provided by the application can be applied in the application environment shown in FIG. 1, in which the terminal device 102 communicates with the server device 104 via a network. The server device 104 acquires the data to be archived that has undergone database and table sharding, together with its fragment meta-information; attaches a streaming service to the data to be archived and performs streaming processing on it according to at least the fragment meta-information and the unique identifier information corresponding to the streaming service to obtain first synchronization data; creates a distributed database according to the fragment meta-information and synchronizes the first synchronization data into it for distributed storage to obtain second synchronization data; and takes the second synchronization data as the data archiving result. The terminal device 102 displays the data archiving result. The terminal device 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device, and the server device 104 may be implemented as an independent server or as a server cluster formed by multiple servers.
In this embodiment, a data archiving method is provided, and fig. 2 is a flowchart of the data archiving method of this embodiment, as shown in fig. 2, the flowchart includes the following steps:
Step S202: acquire the data to be archived that has undergone database and table sharding, and acquire the fragment meta-information of the data to be archived. The data to be archived is massive and is split across sharded databases and tables into at least one shard. It is stored in a database such as MySQL or a NoSQL database, preferably MySQL. The fragment meta-information of the data to be archived includes the IP address, port, shard name (the sharded database and table names), time field, user name, user password, and user identifier of the data to be archived.
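To make the fragment meta-information concrete, the following is a minimal sketch that holds one shard's metadata as a Python dataclass. The field names and sample values are hypothetical; only the list of items comes from step S202. The `dsn()` helper renders the metadata in the Percona Toolkit DSN format (`h`, `P`, `D`, `t`, `u`, `p`), which is the form consumed by the pt-archiver and pt-online-schema-change commands used later in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardMeta:
    """Fragment (shard) meta-information for one sharded MySQL table.

    Field names are hypothetical; they mirror the items listed in step S202.
    """
    ip: str            # IP address of the MySQL instance holding the shard
    port: int          # MySQL port
    database: str      # sharded database name
    table: str         # sharded table name
    time_field: str    # column used to select the archiving time window
    user: str          # MySQL user name
    password: str      # MySQL password
    user_id_field: str = "userid"  # column holding the user identifier

    def dsn(self) -> str:
        """Render a Percona Toolkit DSN: h=host,P=port,D=db,t=table,u=user,p=password."""
        return (f"h={self.ip},P={self.port},D={self.database},t={self.table},"
                f"u={self.user},p={self.password}")


shard = ShardMeta("10.0.0.5", 3306, "order_db_03", "t_order_17",
                  "create_time", "archiver", "secret")
print(shard.dsn())  # h=10.0.0.5,P=3306,D=order_db_03,t=t_order_17,u=archiver,p=secret
```

Making the dataclass frozen keeps each shard's metadata hashable, so it can also serve as a dictionary key when the later steps track per-shard state.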
Step S204: attach the streaming service to the data to be archived, and perform streaming processing on the data to be archived according to at least the fragment meta-information and the unique identifier information corresponding to the streaming service to obtain first synchronization data. The streaming service converts the data to be archived into streaming data; it operates periodically or in real time and can stream at least one shard of data to be archived on a schedule or continuously. The streaming service may be Canal, Maxwell, mysql_streamer, or the like. There is at least one piece of unique identifier information, in one-to-one correspondence with the data to be archived. The first synchronization data is stored in a topic of a streaming database, which may be Kafka, Redis, Kinesis, or the like; preferably it is stored in Kafka. Specifically, in this embodiment, a streaming service is attached to each database/table shard of the data to be archived in the relational database, with one streaming service per shard. The streaming service can be attached through an SQL statement, for example the bootstrap statement of the Maxwell service.
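Because each shard gets its own streaming service and its own unique identifier, the assignment must be one-to-one. A minimal sketch follows, assuming a hypothetical `maxwell_<database>_<table>` naming scheme for the identifier (Maxwell's `client_id` setting is one place such a value could live). It fails loudly when two distinct shards would collapse onto the same identifier, which can happen with underscores in names.

```python
def assign_client_ids(shards, prefix="maxwell"):
    """Give every (database, table) shard its own streaming-service client id.

    Raises ValueError if two shards would end up with the same id, which would
    break the one-to-one mapping the description requires.
    """
    ids = {}
    seen = set()
    for database, table in shards:
        client_id = f"{prefix}_{database}_{table}"  # hypothetical naming scheme
        if client_id in seen:
            raise ValueError(f"duplicate client id: {client_id}")
        seen.add(client_id)
        ids[(database, table)] = client_id
    return ids


ids = assign_client_ids([("order_db_00", "t_order_00"), ("order_db_00", "t_order_01")])
print(ids[("order_db_00", "t_order_01")])  # maxwell_order_db_00_t_order_01
```

For example, the shards `("a_b", "c")` and `("a", "b_c")` both map to `maxwell_a_b_c`, so the function raises instead of silently sharing one streaming service between them.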
Step S206: create a distributed database according to the fragment meta-information, and synchronize the first synchronization data into the distributed database for distributed storage to obtain second synchronization data. The distributed database may be, for example, TiDB, Spanner, or StarRocks.
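When StarRocks is the distributed database, a later passage states that the target is partitioned by the time field. The following is a minimal sketch that renders such a `CREATE TABLE` with one range partition per month. The monthly granularity, the duplicate-key model, the bucket count, and all table and column names are assumptions made for illustration, not the patent's actual schema.

```python
from datetime import date


def monthly_bounds(start: date, end: date):
    """Return [lower, upper) month boundaries covering start (inclusive) to end (exclusive)."""
    cur = date(start.year, start.month, 1)
    bounds = []
    while cur < end:
        # First day of the following month; December rolls over to January.
        nxt = date(cur.year + cur.month // 12, cur.month % 12 + 1, 1)
        bounds.append((cur, nxt))
        cur = nxt
    return bounds


def starrocks_ddl(db, table, columns, key, time_field, start, end, buckets=8):
    """Render a StarRocks CREATE TABLE range-partitioned by month on time_field."""
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    parts = ",\n  ".join(
        f"PARTITION p{lo:%Y%m} VALUES [('{lo} 00:00:00'), ('{hi} 00:00:00'))"
        for lo, hi in monthly_bounds(start, end)
    )
    return (f"CREATE TABLE IF NOT EXISTS {db}.{table} (\n  {cols}\n)\n"
            f"DUPLICATE KEY({key})\n"
            f"PARTITION BY RANGE({time_field}) (\n  {parts}\n)\n"
            f"DISTRIBUTED BY HASH({key}) BUCKETS {buckets};")


ddl = starrocks_ddl("archive_db", "t_order",
                    [("id", "BIGINT"), ("userid", "BIGINT"), ("create_time", "DATETIME")],
                    "id", "create_time", date(2022, 1, 15), date(2022, 3, 1))
print(ddl)
```

Partitioning by month along the same time field that drives archiving means each archiving window maps onto a small set of partitions, which keeps both loading and later range queries local.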
Step S208: take the second synchronization data as the data archiving result. The data archiving method in this embodiment can be run automatically through a shell script.
Through the above steps, the data to be archived is synchronized periodically or in real time into first synchronization data in a streaming database, and the first synchronization data is then synchronized into second synchronization data in a distributed database, which serves as the data archiving result. Because the archiving result is stored in a distributed manner, massive data storage is easily supported, access performance is high, and a large amount of disk space is saved. The archiving result can therefore be stored and queried quickly, which solves the problems that archived data cannot be queried quickly and that maintenance costs are high.
In some embodiments, performing streaming processing on the data to be archived according to at least the fragment meta-information and the unique identifier information corresponding to the streaming service to obtain first synchronization data includes:
acquiring a time field of the data to be archived according to the fragment meta-information;
acquiring configuration parameters of the streaming service, and acquiring the unique identifier information according to the configuration parameters;
and obtaining the first synchronization data at least according to the fragment meta-information, the time field of the data to be archived, and the unique identifier information.
The configuration parameters of the streaming service further include a filter parameter, which performs basic filtering on the data to be archived while the streaming service runs. For example, the filter parameter may be userid > 0, where userid is the identifier of the user who generated the data to be archived.
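The semantics of the example filter `userid > 0` can be pinned down with a small predicate over row dictionaries. This sketch illustrates only the condition itself; it is not Maxwell's filter configuration syntax. It assumes (as a design choice) that rows with a missing or null `userid` are treated as invalid and dropped.

```python
def passes_filter(row: dict, user_field: str = "userid") -> bool:
    """Keep a row only if its user id is present and greater than zero."""
    value = row.get(user_field)
    return value is not None and value > 0


rows = [{"userid": 7}, {"userid": 0}, {"userid": None}, {"id": 1}]
kept = [row for row in rows if passes_filter(row)]
print(kept)  # [{'userid': 7}]
```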
Specifically, in this embodiment, a start time and an end time are obtained. These are the time conditions used to screen the time field of the data to be archived. The time field of the data to be archived is acquired according to the fragment meta-information. The configuration parameters of the streaming service are acquired, and the unique identifier information and the filter parameter are obtained from them. The data to be archived is then extracted and stored in the streaming database according to the start time, the end time, the time field, the unique identifier information, and the database name, table name and user identifier in the fragment meta-information, yielding the first synchronization data. Taking Maxwell as the streaming service and Kafka as the streaming database as an example: Kafka is started first and a topic is created to store the first synchronization data. Next, a corresponding Maxwell service is created for the data to be archived in each database/table shard, and all Maxwell services are registered in and managed through a Maxwell management library. For the data to be archived in each shard, the following SQL is executed:
insert into bootstrap (database_name, table_name, where_clause, client_id)
values ('database name', 'table name', 'where condition', 'Maxwell unique identifier');
Here, the database name and table name are those in the fragment meta-information of the data to be archived; the where condition restricts the time field to the range given by the start time and end time; and the Maxwell unique identifier is the unique identifier information obtained from the configuration parameters of the Maxwell service.
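The bootstrap statement above can be generated per shard. This minimal sketch returns the SQL and its parameters separately, so that a DB-API driver such as PyMySQL can bind the values with `cursor.execute(sql, params)`. Two assumptions are built in. The table is addressed as `maxwell.bootstrap` because `maxwell` is Maxwell's default schema name. The time window is treated as half-open, `[start, end)`, so that consecutive archiving runs never pick up the same row twice. Note that `where_clause` is itself an SQL fragment that Maxwell will execute, so `time_field` must come from trusted metadata.

```python
from datetime import datetime


def bootstrap_insert(database, table, time_field, start: datetime, end: datetime, client_id):
    """Return (sql, params) that queue a Maxwell bootstrap of one shard's time window."""
    if not start < end:
        raise ValueError("start time must be earlier than end time")
    # Half-open window [start, end): consecutive runs never archive a row twice.
    where_clause = (f"{time_field} >= '{start:%Y-%m-%d %H:%M:%S}' "
                    f"AND {time_field} < '{end:%Y-%m-%d %H:%M:%S}'")
    # %s placeholders are bound by the driver (e.g. PyMySQL's cursor.execute).
    sql = ("INSERT INTO maxwell.bootstrap "
           "(database_name, table_name, where_clause, client_id) "
           "VALUES (%s, %s, %s, %s)")
    return sql, (database, table, where_clause, client_id)


sql, params = bootstrap_insert("order_db_00", "t_order_00", "create_time",
                               datetime(2022, 1, 1), datetime(2022, 2, 1),
                               "maxwell_order_db_00_t_order_00")
print(params[2])
```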
After the SQL statement is executed and the Maxwell service is started, check that the Kafka topic is empty and that the service is running normally. Maxwell then automatically extracts the historical data into the corresponding Kafka topic, generating the first synchronization data.
Through the above steps, historical data within the time range specified by the start time and end time can be extracted into the streaming database periodically or in real time. This makes it easy to extract massive amounts of data to be archived and supports rapid extraction from hundreds of sharded tables. Setting the filter parameter removes invalid data and improves the efficiency and quality of extraction, which solves the problems that archived data cannot be queried quickly and that maintenance costs are high.
In some embodiments, synchronizing the first synchronization data into the distributed database for distributed storage to obtain the second synchronization data includes:
synchronizing the first synchronization data into the distributed database at least according to the fragment meta-information and the field information of the distributed database to obtain the second synchronization data.
The field information of the distributed database includes its database name, table name and field names, and corresponds to the fragment meta-information of the data to be archived.
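Matching the field information to the fragment meta-information is what lets the loader tell valid messages from noise on the topic. The following is a minimal sketch of that matching for Maxwell's JSON output. It assumes that only `bootstrap-insert` messages carry archive rows, so the `bootstrap-start` and `bootstrap-complete` markers and any live binlog events are skipped. The function name and column list are hypothetical.

```python
import json


def to_row(message: str, database: str, table: str, columns):
    """Map one Maxwell JSON message to a row tuple for the target table.

    Returns None for messages that are not bootstrap rows of this shard,
    e.g. the bootstrap-start / bootstrap-complete markers.
    """
    msg = json.loads(message)
    if msg.get("type") != "bootstrap-insert":
        return None
    if msg.get("database") != database or msg.get("table") != table:
        return None
    data = msg.get("data") or {}
    # Columns missing from the message become None rather than raising.
    return tuple(data.get(column) for column in columns)


raw = ('{"database": "order_db_00", "table": "t_order_00", "type": "bootstrap-insert", '
       '"ts": 1656900000, "data": {"id": 1, "userid": 7, "create_time": "2022-01-03 10:00:00"}}')
print(to_row(raw, "order_db_00", "t_order_00", ["id", "userid", "create_time"]))
```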
Specifically, the first synchronization data of the data to be archived is acquired from the streaming database according to the fragment meta-information. It is then synchronized into the distributed database according to the IP, port, user name and password of the distributed database and its field information, yielding the second synchronization data. Taking Kafka as the streaming database and StarRocks as the distributed database as an example, the data archiving method in this embodiment first acquires the IP, port, user name and password of the StarRocks deployment and logs in to the StarRocks service. It then creates a StarRocks database according to the fragment meta-information and the StarRocks syntax requirements, with the StarRocks data partitioned by the time field. Finally, according to the Kafka topic format and the StarRocks field information, the following SQL is executed to create the corresponding Routine Load task:
[Figure BDA0003738845500000071: SQL statement creating the Routine Load task]
In this statement, sr_db and sr_tab are the StarRocks database and table names; rl_name is the name of the specific Routine Load task; the where condition is the filter on columns c1, c2, c3, c4, c5, etc. (for example, if c1 is the partition field, the where condition may be c1 > 0); the Kafka cluster address and Kafka topic name are the specific Kafka address and topic; and if the topic has N partitions, the topic partition list is 0,1,2,...,N-1.
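Since the original statement survives only as a figure reference, the sketch below is a hypothetical reconstruction using the placeholders named in the preceding paragraph (`sr_db`, `sr_tab`, `rl_name`, `c1`...`c5`, where condition, broker list, topic, N partitions). It assumes JSON messages whose columns sit under Maxwell's `data` object and extracts them via StarRocks `jsonpaths`. It also shows how the partition list `0,1,...,N-1` is derived from the partition count.

```python
import json


def routine_load_sql(sr_db, sr_tab, rl_name, columns, where, brokers, topic, partitions):
    """Render a StarRocks CREATE ROUTINE LOAD for JSON messages on a Kafka topic."""
    if partitions < 1:
        raise ValueError("a Kafka topic has at least one partition")
    # Pick each column out of Maxwell's "data" object (assumed message layout).
    paths = json.dumps([f"$.data.{c}" for c in columns], separators=(",", ":"))
    paths = paths.replace('"', '\\"')  # escape quotes for the double-quoted property value
    part_list = ",".join(str(i) for i in range(partitions))  # 0,1,...,N-1
    return (f"CREATE ROUTINE LOAD {sr_db}.{rl_name} ON {sr_tab}\n"
            f"COLUMNS({', '.join(columns)}),\n"
            f"WHERE {where}\n"
            f'PROPERTIES ("format" = "json", "jsonpaths" = "{paths}")\n'
            f'FROM KAFKA ("kafka_broker_list" = "{brokers}", '
            f'"kafka_topic" = "{topic}", "kafka_partitions" = "{part_list}");')


sql = routine_load_sql("sr_db", "sr_tab", "rl_name", ["c1", "c2", "c3"], "c1 > 0",
                       "kafka1:9092,kafka2:9092", "archive_topic", 3)
print(sql)
```

Listing the partitions explicitly pins the job to the topic's current layout, so if partitions are later added to the topic, the task would need to be recreated or altered to consume them.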
Through the above steps, the first synchronization data is synchronized into the distributed database as second synchronization data. Valid messages can be identified accurately according to the field information of the distributed database, massive data storage is easily supported, and a large amount of disk space can be saved. The data archiving result can therefore be stored and queried quickly, which solves the problems that archived data cannot be queried quickly and that maintenance costs are high.
In some embodiments, after the second synchronization data is obtained and before it is taken as the data archiving result, the method further includes:
comparing the data to be archived with the second synchronization data to obtain a comparison result; when the comparison result indicates that the data to be archived and the second synchronization data differ, acquiring and executing a transmission status statement of the distributed database at least according to the field information of the distributed database to obtain a transmission status result; when the transmission status result indicates that the distributed database is synchronizing, acquiring a preset waiting duration and, after waiting for that duration, obtaining a re-comparison result;
or deleting the second synchronization data that has been synchronized, performing streaming processing on the data to be archived again to obtain third synchronization data, synchronizing the third synchronization data into the distributed database to obtain fourth synchronization data, and comparing the data to be archived with the fourth synchronization data to obtain a re-comparison result;
and deleting the data to be archived at least according to the fragment meta-information when the comparison result or the re-comparison result indicates that the data to be archived and the second synchronization data are the same.
The comparison result may be obtained by comparing the data volume or the data similarity of the data to be archived and the second synchronization data over the same time range. A data-volume comparison yields one of two results: the volumes differ or they are equal. When the data volumes differ, the method either waits for the distributed database to finish synchronizing the second synchronization data, or re-executes the streaming and distributed synchronization processes to obtain third and fourth synchronization data respectively; it then recomputes the data volume to obtain the re-comparison result. When the data volumes are equal, the data to be archived is deleted and its disk space is released.
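The text offers two remedies for a volume mismatch (wait, or re-sync) without fixing when each applies. This sketch encodes one plausible reading as a pure decision function. It waits only when the loader reports the StarRocks `State` value `RUNNING` with an empty `ReasonOfStateChanged`, which are the two fields the following paragraph inspects. It re-syncs otherwise, and it also re-syncs when the archive holds more rows than the source, since waiting cannot remove duplicates. The enum and function names are hypothetical, and a real driver would also cap the number of retries.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    DELETE_SOURCE = "delete the archived window from MySQL"
    WAIT_AND_RECOMPARE = "wait the preset duration, then compare again"
    RESYNC = "drop the synced rows, re-stream and re-sync"


def next_action(source_count: int, archived_count: int,
                load_state: str, reason: Optional[str]) -> Action:
    """Decide what to do after comparing row counts for one time window."""
    if source_count == archived_count:
        return Action.DELETE_SOURCE
    if archived_count > source_count:
        # More rows than the source (e.g. duplicates): waiting cannot fix this.
        return Action.RESYNC
    if load_state == "RUNNING" and not reason:
        # Loader healthy but behind: give it time to catch up.
        return Action.WAIT_AND_RECOMPARE
    return Action.RESYNC


print(next_action(1000, 980, "RUNNING", ""))  # Action.WAIT_AND_RECOMPARE
```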
Specifically, take as an example Maxwell as the streaming service, Kafka as the streaming database, StarRocks as the distributed database, and a data-volume comparison as the comparison result. In this embodiment, the data volumes of the data to be archived and the second synchronization data are compared to obtain a data-volume comparison result. When that result indicates that the data volumes differ, an SQL statement is executed according to the field information of the StarRocks database to obtain the StarRocks synchronization status, for example: SHOW ROUTINE LOAD FOR rl_name\G; if the output shows that the StarRocks State field is RUNNING and the ReasonOfStateChanged field is empty, StarRocks is consuming the Kafka messages normally. A preset waiting duration is then acquired, and after waiting for that duration, a re-comparison result is obtained;
or the synchronized second synchronization data is deleted, Maxwell streaming is re-executed on the data to be archived and the data is sent to Kafka to obtain third synchronization data, and the third synchronization data is synchronized into the StarRocks database to obtain fourth synchronization data; the data volume of the data to be archived is then compared with that of the fourth synchronization data to obtain the re-comparison result;
and when the comparison result or the re-comparison result indicates that the data to be archived and the second synchronization data are the same, the data to be archived is deleted according to the fragment meta-information, the start time and the end time, specifically by executing the following statement:
pt-archiver --source h=fragment IP,P=fragment port,D=database name,t=table name,u=mysql username,p=mysql password --where "where condition" --purge --limit=1000 --no-check-charset --txn-size=1000 --progress=1000 --max-lag=3600 --why-quit
Here, the fragment IP, fragment port, database name, table name, MySQL user name and MySQL password are the IP address, port, sharded database name, sharded table name, user name and password in the fragment meta-information; the where condition screens the time field according to the start time and end time.
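When the pt-archiver call is automated for hundreds of shards, building it as an argument list (rather than one shell string) avoids quoting bugs in the where condition, which contains spaces and quotes. This minimal sketch mirrors the options of the command above; the function name and sample DSN are hypothetical. The list can be passed to `subprocess.run(argv, check=True)`. Because the DSN carries the password, it is visible in the process list, so pt-archiver's `--ask-pass` may be preferable in shared environments.

```python
import shlex


def pt_archiver_argv(dsn: str, where: str, batch: int = 1000):
    """Build the pt-archiver argument list that purges one archived window."""
    return [
        "pt-archiver",
        "--source", dsn,          # h=...,P=...,D=...,t=...,u=...,p=...
        "--where", where,         # time-window condition on the time field
        "--purge",                # delete rows instead of copying them elsewhere
        f"--limit={batch}",
        "--no-check-charset",
        f"--txn-size={batch}",
        f"--progress={batch}",
        "--max-lag=3600",
        "--why-quit",
    ]


argv = pt_archiver_argv("h=10.0.0.5,P=3306,D=order_db_00,t=t_order_00,u=archiver,p=secret",
                        "create_time >= '2022-01-01 00:00:00' AND create_time < '2022-02-01 00:00:00'")
print(shlex.join(argv))  # shell-quoted form, e.g. for logging or a shell script
```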
Through these steps, the comparison result is obtained by comparing the second synchronization data with the data to be archived, which allows the data volumes of large sharded tables and StarRocks tables to be compared and ensures that archiving is accurate. The data to be archived within a given time range is deleted only after it has been backed up as the second synchronization data, so the database service is not affected, a large amount of disk space is freed, and storage cost is reduced. This improves the accuracy and efficiency of data archiving and addresses the problems that archived data cannot be queried quickly and is costly to maintain.
In some embodiments, after the second synchronization data is taken as the data archiving result, the method further comprises:
acquiring, according to the fragment meta-information, a fragment reclamation statement for the data archiving result, and executing the fragment reclamation statement to reclaim the disk fragmentation left by the data to be archived.
Specifically, the fragment reclamation statement is the following command:
perl pt-online-schema-change h=<fragment IP>,P=<fragment port>,D=<database name>,t=<table name>,u=<mysql username>,p=<mysql password> --alter "<reclaim fragment sql>" --execute --recursion-method="hosts" --charset=utf8mb4 --critical-load Threads_running=500 --no-check-alter --no-version-check
Wherein, the format of the recovery fragment sql in the recovery fragment statement is: the entry table is divided into segments db.
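A sketch of assembling the fragment reclamation command above as an argument list, under stated assumptions. The helper name `pt_osc_reclaim_argv` is hypothetical, and because the exact reclaim-fragment SQL in the original text is unclear, the `alter` value is left to the caller. The usage line passes `ENGINE=InnoDB`, which the pt-online-schema-change documentation uses as an example of rebuilding an existing InnoDB table in a non-blocking, OPTIMIZE TABLE-like way; that value is an assumption, not something the source states.

```python
from typing import List


def pt_osc_reclaim_argv(host: str, port: int, database: str, table: str,
                        user: str, password: str, alter: str) -> List[str]:
    """Build the pt-online-schema-change command that rebuilds one shard table."""
    # Percona DSN syntax: uppercase P is the port, lowercase p the password.
    dsn = f"h={host},P={port},D={database},t={table},u={user},p={password}"
    return [
        "perl", "pt-online-schema-change", dsn,
        "--alter", alter,                          # the reclaim-fragment SQL
        "--execute",                               # actually perform the change
        "--recursion-method=hosts",                # find replicas via SHOW SLAVE HOSTS
        "--charset=utf8mb4",
        "--critical-load", "Threads_running=500",  # abort if the server gets this busy
        "--no-check-alter",
        "--no-version-check",
    ]


# Usage (assumed values): rebuild one shard table to reclaim free space.
argv = pt_osc_reclaim_argv("10.0.0.1", 3306, "db_0", "orders_0",
                           "archiver", "secret", alter="ENGINE=InnoDB")
```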
Through these steps, the disk fragmentation left by the data to be archived, which has already been backed up, is reclaimed. This frees a large amount of disk space, reduces storage cost, and addresses the problems that archived data cannot be queried quickly and is costly to maintain.
In some embodiments, after the disk fragmentation of the data to be archived is reclaimed, the method further includes:
acquiring distributed meta-information of the distributed database, and querying the data archiving result corresponding to the data to be archived at least according to the distributed meta-information.
The distributed meta-information refers to the IP, port, user name, user password, database name and table name of the distributed database.
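Because StarRocks accepts connections over the MySQL protocol, querying the archived result with the distributed meta-information might look like the following sketch. The class and function names are hypothetical, the time field and half-open range are assumed to mirror the where condition used when archiving, and port 9030 appears in the usage only because it is the usual default query port of a StarRocks frontend; an actual deployment may differ.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DistributedMeta:
    """IP, port, user name, user password, database name and table name."""
    host: str
    port: int
    user: str
    password: str
    database: str
    table: str


def connect_kwargs(meta: DistributedMeta) -> Dict[str, object]:
    # Keyword names accepted by common MySQL drivers such as PyMySQL.
    return {"host": meta.host, "port": meta.port, "user": meta.user,
            "password": meta.password, "database": meta.database}


def archive_query(meta: DistributedMeta, time_field: str,
                  start: str, end: str) -> Tuple[str, Tuple[str, str]]:
    # Identifiers are backtick-quoted; the time bounds are passed as
    # driver-side parameters (%s placeholders), not spliced into the SQL.
    sql = (f"SELECT * FROM `{meta.database}`.`{meta.table}` "
           f"WHERE `{time_field}` >= %s AND `{time_field}` < %s")
    return sql, (start, end)


# Usage (assumed values):
meta = DistributedMeta("10.0.0.2", 9030, "reader", "secret", "archive", "orders")
sql, params = archive_query(meta, "created_at", "2022-01-01", "2022-02-01")
```

With PyMySQL installed, `pymysql.connect(**connect_kwargs(meta))` followed by `cursor.execute(sql, params)` would run the query; the sketch itself avoids a driver dependency.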
Through these steps, the data archiving result can be queried, so that it can be stored and retrieved quickly, which addresses the problems that archived data cannot be queried quickly and is costly to maintain.
In some of these embodiments, the distributed database is a StarRocks database, and/or the streaming service is a maxwell service.
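When the streaming service is maxwell, one plausible way to push a bounded time range of one sharded table into kafka is Maxwell's `maxwell-bootstrap` utility, which accepts `--database`, `--table`, `--where` and `--client_id` options. The source does not say that bootstrap is used, so treat the following as an assumption-laden sketch: it reads `client_id` from the text of a Maxwell `config.properties` file as the "unique identifier information", falls back to `maxwell` (Maxwell's default `client_id`) when none is set, assumes a half-open time range, and uses hypothetical helper names.

```python
from typing import List


def client_id_from_config(text: str, default: str = "maxwell") -> str:
    """Return the client_id value from Maxwell config.properties content."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "client_id":
            return value.strip()
    return default


def maxwell_bootstrap_argv(host: str, port: int, user: str, password: str,
                           database: str, table: str, time_field: str,
                           start: str, end: str, client_id: str) -> List[str]:
    """Build a maxwell-bootstrap command for one shard table and time range."""
    # Assumed half-open range [start, end); values must come from trusted config.
    where = f"{time_field} >= '{start}' AND {time_field} < '{end}'"
    return ["bin/maxwell-bootstrap",          # path relative to the Maxwell install
            "--host", host, "--port", str(port),
            "--user", user, "--password", password,
            "--database", database, "--table", table,
            "--where", where,
            "--client_id", client_id]         # which Maxwell instance runs it
```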
This embodiment also provides a data archiving method. Fig. 3 is a flowchart of another data archiving method according to this embodiment; as shown in Fig. 3, the flow includes the following steps:
Step S302: acquire the data to be archived. The data to be archived after database and table sharding, together with its fragment meta-information, is acquired.
Step S304: the streaming service synchronizes the data to be archived to obtain first synchronization data. The streaming service is inserted into the data to be archived, and the time field of the data to be archived is obtained according to the fragment meta-information; configuration parameters of the streaming service are acquired, and the unique identifier information is obtained from them; the data to be archived is then extracted and stored in the streaming database according to the start time, the end time, the time field of the data to be archived, the unique identifier information, and the sub-database name, sub-table name and user identifier in the fragment meta-information, yielding the first synchronization data.
Step S306: store the first synchronization data in a distributed manner to obtain second synchronization data. A distributed database is created according to the fragment meta-information, and the first synchronization data is synchronized into it according to the IP, port, user name and user password of the distributed database and the field information of the distributed database, yielding the second synchronization data.
Step S308: verify the second synchronization data. Determine whether the second synchronization data is the same as the data to be archived; if so, go to step S310; if not, return to step S304.
Step S310: delete the data to be archived once archiving is complete. The fully archived data is deleted according to the fragment meta-information, the start time and the end time, and the second synchronization data is taken as the data archiving result.
Step S312: reclaim the MySQL disk fragmentation. A fragment reclamation statement for the data archiving result is acquired according to the fragment meta-information and executed to reclaim the disk fragmentation of the data to be archived.
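The control flow of steps S304 to S312 can be summarized by the sketch below. It is a hypothetical skeleton rather than the applicant's code: each step is passed in as a callable so the loop can be exercised without MySQL, Maxwell, Kafka or StarRocks; the S308 check is reduced to the data volume comparison described earlier; a mismatch discards the copy before re-syncing, following the re-synchronization branch described earlier (the wait-and-recompare branch is omitted for brevity); and the `max_attempts` bound is an assumption, since the flowchart simply loops back to S304.

```python
from typing import Callable


def archive_shard(sync: Callable[[], None],
                  count_source: Callable[[], int],
                  count_archive: Callable[[], int],
                  drop_archive: Callable[[], None],
                  purge_source: Callable[[], None],
                  reclaim: Callable[[], None],
                  max_attempts: int = 3) -> bool:
    """Run steps S304-S312 for one shard; return True once it is archived."""
    for _ in range(max_attempts):
        sync()                                 # S304 + S306: source -> kafka -> StarRocks
        if count_source() == count_archive():  # S308: verify the second synchronization data
            purge_source()                     # S310: delete the archived source rows
            reclaim()                          # S312: reclaim disk fragmentation
            return True
        drop_archive()                         # mismatch: discard the copy, back to S304
    return False                               # give up after max_attempts (assumed bound)
```

Injecting the steps keeps the retry policy separate from the tooling; `sync` could wrap the streaming and Routine Load steps and `purge_source` the pt-archiver command, but that wiring is left open here.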
Through these steps, the data to be archived is synchronized, periodically or in real time, into first synchronization data in the streaming database, and the first synchronization data is then synchronized into second synchronization data in the distributed database, which serves as the data archiving result. Because the data archiving result is stored in a distributed manner, mass data can be stored easily with high access performance while a large amount of disk space is saved. The data archiving result can therefore be stored and queried quickly, which addresses the problems that archived data cannot be queried quickly and is costly to maintain.
It should be understood that although the steps in the flowcharts of Figs. 2-3 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order in which these steps are performed is not strictly restricted, and they may be performed in other orders. Moreover, at least some of the steps in Figs. 2-3 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially, as they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In this embodiment, a data archiving system is further provided, which includes: a terminal device 102, a transmission device, and a server device 104; wherein, the terminal device 102 is connected to the server device 104 through the transmission device;
the server device 104 is configured to perform the steps of any of the method embodiments described above;
the transmission device is configured to transmit the data archiving result;
the terminal device 102 is configured to display the data archiving result.
There is also provided in this embodiment an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, both of which are connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1: acquire the data to be archived after database and table sharding, together with the fragment meta-information of the data to be archived.
S2: insert the streaming service into the data to be archived, and perform streaming processing on the data to be archived at least according to the fragment meta-information and the unique identifier information corresponding to the streaming service, to obtain first synchronization data.
S3: create a distributed database according to the fragment meta-information, and synchronize the first synchronization data into the distributed database for distributed storage, to obtain second synchronization data.
S4: take the second synchronization data as the data archiving result.
It should be noted that, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementations, and details are not described again in this embodiment.
In addition, in combination with the data archiving method provided in the foregoing embodiment, a storage medium may also be provided to implement this embodiment. The storage medium having stored thereon a computer program; the computer program, when executed by a processor, implements any of the data archiving methods of the above embodiments.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data archiving result data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data archiving method.
Those skilled in the art will appreciate that the architecture shown in Fig. 4 is merely a block diagram of part of the structure related to the disclosed solution and does not limit the computer devices to which the disclosed solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or arrange components differently.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database or other media used in the embodiments provided herein can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to be limiting. All other embodiments, which can be derived by a person skilled in the art from the examples provided herein without any inventive step, shall fall within the scope of protection of the present application.
It is obvious that the drawings are only examples or embodiments of the present application, and it is obvious to those skilled in the art that the present application can be applied to other similar cases according to the drawings without creative efforts. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
The term "embodiment" is used herein to mean that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are they separate or alternative embodiments that are mutually exclusive of other embodiments. Those of ordinary skill in the art understand, both explicitly and implicitly, that the embodiments described in this application may be combined with other embodiments where no conflict arises.
The above embodiments express only several implementations of the present application; although they are described specifically and in detail, they are not to be construed as limiting the scope of patent protection. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A method for archiving data, comprising:
acquiring data to be archived after database and table sharding, and fragment meta-information of the data to be archived;
inserting a streaming service into the data to be archived, and performing streaming processing on the data to be archived at least according to the fragment meta-information and the unique identifier information corresponding to the streaming service to obtain first synchronization data;
creating a distributed database according to the fragment meta-information, and synchronizing the first synchronization data into the distributed database for distributed storage to obtain second synchronization data;
and taking the second synchronization data as a data archiving result.
2. The data archiving method according to claim 1, wherein said streaming processing the data to be archived according to at least the fragment meta-information and the unique identifier information corresponding to the streaming service to obtain the first synchronization data comprises:
acquiring a time field of the data to be archived according to the fragment meta-information;
acquiring configuration parameters of the streaming service, and acquiring the unique identifier information according to the configuration parameters;
and obtaining the first synchronization data at least according to the fragment meta-information, the time field of the data to be archived and the unique identifier information.
3. The data archiving method according to claim 1, wherein the synchronizing the first synchronization data into the distributed database for distributed storage to obtain second synchronization data comprises:
synchronizing the first synchronization data into the distributed database at least according to the fragment meta-information and the field information of the distributed database to obtain the second synchronization data.
4. The data archiving method according to claim 1, wherein after said obtaining the second synchronization data, before said archiving the second synchronization data as a data archiving result, further comprising:
comparing the data to be archived with the second synchronization data to obtain a comparison result; when the comparison result indicates that the data to be archived and the second synchronization data differ, acquiring and executing a transmission state statement of the distributed database at least according to field information of the distributed database to obtain a transmission state result; when the transmission state result indicates that the distributed database is in a synchronizing state, acquiring a preset waiting time and, after waiting for the preset waiting time, obtaining a secondary comparison result;
or deleting the synchronized second synchronization data, performing streaming processing on the data to be archived again to obtain third synchronization data, synchronizing the third synchronization data into the distributed database to obtain fourth synchronization data, and comparing the data to be archived with the fourth synchronization data to obtain a re-comparison result;
and deleting the data to be archived at least according to the fragment meta-information when the comparison result or the re-comparison result indicates that the data to be archived and the second synchronization data are the same.
5. The data archiving method according to claim 1, further comprising, after taking the second synchronization data as the data archiving result:
acquiring, according to the fragment meta-information, a fragment reclamation statement for the data archiving result, executing the fragment reclamation statement, and reclaiming the disk fragmentation of the data to be archived.
6. The data archiving method according to any one of claims 1 to 5, further comprising, after taking the second synchronization data as the data archiving result:
acquiring distributed meta-information of the distributed database, and querying the data archiving result corresponding to the data to be archived at least according to the distributed meta-information.
7. The data archiving method according to claim 1, wherein the distributed database is a StarRocks database, and/or the streaming service is a maxwell service.
8. A data archiving system, comprising: a terminal device, a transmission device and a server device; the terminal device is connected to the server device through the transmission device;
the server device is configured to execute the data archiving method of any one of claims 1 to 7;
the transmission device is configured to transmit a data archiving result;
and the terminal device is configured to display the data archiving result.
9. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and the processor is configured to execute the computer program to perform the data archiving method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the data archiving method according to any one of claims 1 to 7.
Priority Applications (1)

CN202210807709.7A, priority date 2022-07-11, filing date 2022-07-11: Data archiving method, system, electronic device and storage medium (status: Pending; published as CN115309740A)

Publications (1)

CN115309740A, publication date 2022-11-08

Family

Family ID: 83856533; family application CN202210807709.7A; country status: CN (1)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination