CN114584574B - Data synchronization method and device, computer equipment and storage medium - Google Patents

Data synchronization method and device, computer equipment and storage medium

Info

Publication number
CN114584574B
Authority
CN
China
Prior art keywords
data
file
server
index
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210457043.7A
Other languages
Chinese (zh)
Other versions
CN114584574A (en
Inventor
陈立军
陈涛
魏军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Barda Technology Co ltd
Original Assignee
Wuhan Sitong Information Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Sitong Information Service Co ltd
Priority to CN202210457043.7A
Publication of CN114584574A
Application granted
Publication of CN114584574B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/178 Techniques for file synchronisation in file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2358 Change logging, detection, and notification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data synchronization method, a data synchronization device, computer equipment and a storage medium. The data synchronization method comprises: pulling a database log of an index database in real time through a log pulling component, and parsing index information of data to be synchronized from the database log; acquiring node state information of each of a plurality of server nodes, and determining, according to the node state information, a plurality of accommodated data types of each server node and an accommodated data amount for each of the accommodated data types; acquiring file attribute information corresponding to the index information, and extracting target file data from a file storage server according to the index information, the file attribute information, the accommodated data types and the accommodated data amounts; and synchronizing the target file data to a file index server. The method and the device can improve the real-time performance of data synchronization and avoid failures to synchronize important data caused by node states.

Description

Data synchronization method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data synchronization method and apparatus, a computer device, and a storage medium.
Background
Elasticsearch is a distributed, highly scalable, highly real-time search and data analysis engine. It makes it convenient to search, analyze and explore large amounts of data, and taking full advantage of its horizontal scalability can make data more valuable in a production environment. To use the search, analysis and exploration functions of Elasticsearch, the data in a database must be imported into an Elasticsearch server, that is, data synchronization between the database and the Elasticsearch server must be implemented. In the prior art, the data to be synchronized is fetched from the database at regular intervals and then synchronized to the Elasticsearch server, so the real-time performance of data synchronization is poor.
Disclosure of Invention
Embodiments of the present application provide a data synchronization method, apparatus, computer device, and storage medium, which can improve the real-time performance of data synchronization and avoid failures to synchronize important data caused by node states.
In one aspect, the present application provides a data synchronization method applied to a data synchronization device, where the data synchronization device is in communication connection with an index database, a file storage server, and a file index server, the file index server includes a plurality of server nodes, and the data synchronization method includes:
pulling a database log of the index database in real time through a log pulling component, and parsing index information of data to be synchronized from the database log;
acquiring node state information of each server node in the plurality of server nodes, and determining, according to the node state information, a plurality of accommodated data types of each server node and the accommodated data amount of each of the plurality of accommodated data types;
acquiring file attribute information corresponding to the index information, and extracting target file data from the file storage server according to the index information, the file attribute information, the plurality of accommodated data types and the accommodated data amount;
and synchronizing the target file data to the file index server.
In some embodiments of the present application, the file attribute information includes a file importance level, a file type, and a file generation time, and the extracting target file data from the file storage server according to the index information, the file attribute information, the plurality of accommodated data types and the accommodated data amount includes:
determining a target score corresponding to the index information according to the file importance level, the file type and the file generation time;
and extracting target file data from the file storage server according to the index information, the target score, the plurality of accommodated data types and the accommodated data amount.
In some embodiments of the present application, the determining a target score corresponding to the index information according to the file importance level, the file type, and the file generation time includes:
determining a first score, a second score and a third score corresponding to the index information according to the file importance level, the file type and the file generation time, respectively;
and determining the target score corresponding to the index information according to the first score, the second score and the third score.
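The scoring step can be illustrated with a minimal sketch. The source does not say how the three scores are computed or combined, so everything concrete here is an assumption made for illustration: the 1 to 5 importance scale, the per-type scores in `TYPE_SCORES`, the recency formula, and the weighted sum with `WEIGHTS`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileAttributes:
    importance_level: int   # assumed scale: 1 (low) to 5 (high)
    file_type: str          # e.g. "text", "picture", "video"
    generated_at: datetime  # file generation time


# Hypothetical per-type scores and combination weights (not from the source).
TYPE_SCORES = {"text": 1.0, "picture": 0.6, "video": 0.3}
WEIGHTS = (0.5, 0.2, 0.3)


def sub_scores(attrs: FileAttributes, now: datetime) -> tuple:
    """Return (first, second, third) scores, each in the range [0, 1]."""
    first = attrs.importance_level / 5.0
    second = TYPE_SCORES.get(attrs.file_type, 0.0)
    age_days = max((now - attrs.generated_at).total_seconds() / 86400.0, 0.0)
    third = 1.0 / (1.0 + age_days)  # newer files score higher
    return first, second, third


def target_score(attrs: FileAttributes, now: datetime) -> float:
    """Combine the three scores with a weighted sum (an assumed rule)."""
    return sum(w * s for w, s in zip(WEIGHTS, sub_scores(attrs, now)))


now = datetime(2022, 4, 28)
fresh = FileAttributes(5, "text", now)
stale = FileAttributes(5, "text", now - timedelta(days=9))
print(target_score(fresh, now), target_score(stale, now))
```

Any monotone combination of the three scores would fit the description equally well; the weighted sum is used only because it is the simplest to read.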
In some embodiments of the present application, the extracting target file data from the file storage server according to the index information, the target score, the plurality of accommodated data types, and the accommodated data amount includes:
screening target index information from the index information according to the target score, the plurality of accommodated data types and the accommodated data amount;
and extracting target file data from the file storage server according to the target index information.
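One plausible reading of the screening step is a greedy selection: candidates are visited in descending order of target score, and each is kept only if its data type is accommodated and enough capacity remains for that type. The source does not state the selection rule, so the greedy strategy, the `Candidate` fields and the byte-based capacity map below are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    index_id: str     # hypothetical identifier of one piece of index information
    file_type: str    # data type of the referenced file
    size_bytes: int   # size of the referenced file
    score: float      # target score computed beforehand


def screen_targets(candidates, capacity_by_type):
    """Greedily pick the highest-scoring candidates that still fit.

    capacity_by_type maps each accommodated data type to the amount of data
    (in bytes) that can still be accommodated; types missing from the map
    are treated as not accommodated.
    """
    remaining = dict(capacity_by_type)
    selected = []
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if remaining.get(cand.file_type, 0) >= cand.size_bytes:
            remaining[cand.file_type] -= cand.size_bytes
            selected.append(cand.index_id)
    return selected


candidates = [
    Candidate("A", "text", 60, 0.9),
    Candidate("B", "text", 50, 0.8),
    Candidate("C", "video", 10, 0.7),
    Candidate("D", "text", 40, 0.1),
]
print(screen_targets(candidates, {"text": 100}))
```

With a text capacity of 100 bytes and no video capacity, A is taken first, B no longer fits, C is rejected because video is not accommodated, and D fills the remaining space.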
In some embodiments of the present application, the node state information includes a total storage space, a used storage space, and node capability information, where the node capability information is used to characterize the data processing capability of the server node, and the determining, according to the node state information, a plurality of accommodated data types of each server node and the accommodated data amount of each of the plurality of accommodated data types includes:
determining the plurality of accommodated data types of each server node according to the node capability information;
determining the remaining storage space of each server node according to the total storage space and the used storage space;
and determining the accommodated data amount of each of the plurality of accommodated data types based on the plurality of accommodated data types of each server node and the remaining storage space.
In some embodiments of the present application, the determining, based on the plurality of accommodated data types of each server node and the remaining storage space, the accommodated data amount of each of the plurality of accommodated data types includes:
inputting the plurality of accommodated data types and the remaining storage space of each server node into a data volume distribution model, and outputting, through the data volume distribution model, the accommodated data amount of each of the plurality of accommodated data types.
In some embodiments of the present application, the determining, based on the plurality of accommodated data types of each server node and the remaining storage space, the accommodated data amount of each of the plurality of accommodated data types includes:
determining a data volume weight of each of the plurality of accommodated data types based on the plurality of accommodated data types of each server node;
and determining the accommodated data amount of each of the plurality of accommodated data types according to the data volume weight and the remaining storage space.
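The weight-based variant can be sketched as a proportional split of the remaining storage space. The source does not say how the weights are chosen or applied; the proportional rule, the default weight of 1.0 and the example weights below are assumptions.

```python
def allocate_by_weight(accommodated_types, remaining_bytes, type_weights):
    """Split remaining_bytes across the accommodated types in proportion
    to their data volume weights.

    type_weights is a hypothetical mapping from data type to weight; types
    without an entry get a default weight of 1.0.
    """
    weights = {t: type_weights.get(t, 1.0) for t in accommodated_types}
    total = sum(weights.values())
    if total <= 0:
        return {t: 0 for t in accommodated_types}
    # int() truncates, so the shares never add up to more than remaining_bytes.
    return {t: int(remaining_bytes * w / total) for t, w in weights.items()}


shares = allocate_by_weight(
    ["video", "picture", "text"],
    100,
    {"video": 5, "picture": 3, "text": 2},
)
print(shares)  # {'video': 50, 'picture': 30, 'text': 20}
```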
In another aspect, the present application provides a data synchronization apparatus, where the data synchronization apparatus is in communication connection with an index database, a file storage server, and a file index server, the file index server includes a plurality of server nodes, and the data synchronization apparatus includes:
an information acquisition unit, configured to pull the database log of the index database in real time through a log pulling component and parse index information of data to be synchronized from the database log;
a data determining unit, configured to acquire node state information of each server node in the plurality of server nodes, and determine, according to the node state information, a plurality of accommodated data types of each server node and the accommodated data amount of each of the plurality of accommodated data types;
a data extraction unit, configured to acquire file attribute information corresponding to the index information and extract target file data from the file storage server according to the index information, the file attribute information, the plurality of accommodated data types and the accommodated data amount;
and a data synchronization unit, configured to synchronize the target file data to the file index server.
In another aspect, the present application further provides a computer device, including:
one or more processors;
a memory; and
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the processor to implement the data synchronization method of any embodiment of the first aspect.
In a fourth aspect, the present application further provides a computer readable storage medium, on which a computer program is stored, the computer program being loaded by a processor to execute the steps of the data synchronization method according to any embodiment of the first aspect.
According to the data synchronization method and device, data synchronization is driven by the database log that the log pulling component pulls in real time, which improves the real-time performance of file data synchronization. Because the target file data is extracted from the file storage server according to the node state information and the file attribute information, file data can be selected for synchronization according to its importance, which avoids failures to synchronize important file data caused by node states.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic view of a data synchronization system provided in an embodiment of the present application;
FIG. 2 is a flow chart illustrating an embodiment of a data synchronization method provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an embodiment of a data synchronization apparatus provided in the embodiment of the present application;
FIG. 4 is a schematic structural diagram of an embodiment of a computer device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description of the present application, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience and simplicity of description. They do not indicate or imply that the referenced device or element must have a particular orientation, or be constructed and operated in a particular orientation, and thus should not be considered as limiting the present application. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first" or "second" may explicitly or implicitly include one or more of the described features. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
In this application, the word "exemplary" is used to mean "serving as an example, instance, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. The following description is presented to enable any person skilled in the art to make and use the application. In the following description, details are set forth for the purpose of explanation. It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known structures and processes are not set forth in detail in order to avoid obscuring the description of the present application with unnecessary detail. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
It should be noted that, because the method in the embodiments of the present application is executed in a computer device, the objects processed by the computer device all exist in the form of data or information; for example, a time is in essence time information. It is to be understood that where size, number, position and the like are mentioned in subsequent embodiments, corresponding data exist for the computer device to process, and details are not described herein.
Embodiments of the present application provide a data synchronization method, an apparatus, a computer device, and a storage medium, which are described in detail below.
Referring to fig. 1, fig. 1 is a schematic view of a data synchronization system according to an embodiment of the present disclosure. The data synchronization system may include a computer device 100, an index database, a file storage server and a file index server. Index information is stored in the index database, and the file data corresponding to the index information is stored in the file storage server. A data synchronization device is integrated in the computer device 100; it is in communication connection with the index database, the file storage server and the file index server, and is configured to synchronize file data in the file storage server to the file index server, so that the file data stored in the file storage server can be retrieved and analyzed through the file index server.
In the embodiment of the application, the computer device 100 is mainly used for: pulling the database log of the index database in real time through the log pulling component, and parsing the index information of the data to be synchronized from the database log; acquiring node state information of each server node in a plurality of server nodes, and determining, according to the node state information, a plurality of accommodated data types of each server node and the accommodated data amount of each of the accommodated data types; acquiring file attribute information corresponding to the index information, and extracting target file data from the file storage server according to the index information, the file attribute information, the plurality of accommodated data types and the accommodated data amount; and synchronizing the target file data to the file index server. In this way, the data in the file storage server can be synchronized to the file index server in real time, and because synchronization takes the node state information and the file attribute information into account, failures to synchronize important data caused by node states are avoided.
In this embodiment, the computer device 100 may be an independent server, or may be a server network or a server cluster composed of servers. For example, the computer device 100 described in this embodiment includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud server composed of a plurality of servers. The cloud server consists of a large number of computers or network servers based on cloud computing.
It will be appreciated that the computer device 100 used in the embodiments of the present application may be a device that includes both receiving and transmitting hardware, i.e., a device having receiving and transmitting hardware capable of performing two-way communications over a two-way communications link. Such a device may include: a cellular or other communication device having a single line display or a multi-line display or a cellular or other communication device without a multi-line display. The specific computer device 100 may specifically be a desktop terminal or a mobile terminal, and the computer device 100 may also specifically be one of a mobile phone, a tablet computer, a notebook computer, and the like.
Those skilled in the art will appreciate that the application environment shown in fig. 1 is only one application scenario related to the present application and does not limit the application scenarios of the present application. Other application environments may include more or fewer computer devices than those shown in fig. 1; for example, only one computer device is shown in fig. 1, and it is understood that the data synchronization system may further include one or more other services, which are not limited herein.
In addition, as shown in fig. 1, the data synchronization system may further include a memory 200 for storing data, such as node state information (for example, the total storage space, the used storage space and the node capability information of each server node) and file attribute information (for example, the file importance level, the file type and the file generation time).
It should be noted that the scenario diagram of the data synchronization system shown in fig. 1 is merely an example, and the data synchronization system and the scenario described in the embodiment of the present application are for more clearly illustrating the technical solution of the embodiment of the present application, and do not form a limitation to the technical solution provided in the embodiment of the present application.
First, an embodiment of the present application provides a data synchronization method. The execution subject of the data synchronization method is a data synchronization apparatus, which is applied to a computer device. The data synchronization method includes: pulling a database log of an index database in real time through a log pulling component, and parsing index information of data to be synchronized from the database log; acquiring node state information of each server node in a plurality of server nodes, and determining, according to the node state information, a plurality of accommodated data types of each server node and the accommodated data amount of each of the accommodated data types; acquiring file attribute information corresponding to the index information, and extracting target file data from the file storage server according to the index information, the file attribute information, the plurality of accommodated data types and the accommodated data amount; and synchronizing the target file data to the file index server.
Fig. 2 is a schematic flow chart of an embodiment of the data synchronization method in the embodiment of the present application. As shown in fig. 2, the data synchronization method includes:
S100, pulling the database log of the index database in real time through a log pulling component, and parsing index information of data to be synchronized from the database log.
Specifically, the index database stores the index information corresponding to the file data that needs to be retrieved and analyzed; the index database is, for example, a MySQL database. The database log is the incremental log or binary log of the index database and records the changes that occur in the index database, such as data modification and table creation and modification. For example, when the index database is a MySQL database, the database log is the binary log (binlog) of the MySQL database, which records in binary form the changes made to the database, such as data modification and table creation and modification. The log pulling component is the component in the data synchronization device that pulls and parses the database log of the index database. For example, the log pulling component may be a canal component; canal is an open-source product from Alibaba that provides incremental data subscription and consumption based on parsing the database incremental log.
In this embodiment, the log pulling component monitors the database log of the index database in real time. When it detects that the database log has changed, it obtains the database log and parses the index information of the data to be synchronized from it, so that file data can be synchronized based on the index information in subsequent steps. For example, if parsing the database log shows that the file data corresponding to index information A, index information B and index information C needs to be synchronized, the index information includes index information A, index information B and index information C.
In a specific implementation, when the index database is a MySQL database, the MySQL database comprises a master database and a slave database, and the master database records database change information in the database log whenever the MySQL database changes. The log pulling component is a canal component connected to the master database of the MySQL database. When the database log is pulled through the canal component, the canal component emulates the interaction protocol of a MySQL slave database, disguising itself as a slave database, and sends a dump request to the master database; on receiving the dump request, the master database starts to push the database log to the canal component.
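Once change events have been pulled and decoded, parsing the index information amounts to filtering the events and extracting the relevant columns. The sketch below deliberately avoids any real canal client API and works on plain dictionaries instead. The event shape (`table`, `type`, `rows`), the table name `file_index`, the column names `id` and `file_path`, and the decision to treat only inserts and updates as data to be synchronized are all assumptions made for illustration.

```python
INDEX_TABLE = "file_index"               # hypothetical index table name
SYNC_EVENT_TYPES = {"INSERT", "UPDATE"}  # assumed: only these trigger a sync


def parse_index_info(events):
    """Extract index information from already-decoded change events.

    Each event is assumed to be a dict such as
    {"table": "file_index", "type": "INSERT", "rows": [{"id": 1, "file_path": "/a"}]}.
    """
    index_info = []
    for event in events:
        if event.get("table") != INDEX_TABLE:
            continue  # change to an unrelated table
        if event.get("type") not in SYNC_EVENT_TYPES:
            continue  # e.g. DELETE or DDL, ignored in this sketch
        for row in event.get("rows", []):
            index_info.append(
                {"index_id": row["id"], "file_path": row["file_path"]}
            )
    return index_info


events = [
    {"table": "file_index", "type": "INSERT",
     "rows": [{"id": 1, "file_path": "/data/a.txt"},
              {"id": 2, "file_path": "/data/b.png"}]},
    {"table": "file_index", "type": "DELETE",
     "rows": [{"id": 3, "file_path": "/data/c.mp4"}]},
    {"table": "audit_log", "type": "INSERT",
     "rows": [{"id": 9, "file_path": "/data/x"}]},
]
print(parse_index_info(events))
```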
S200, obtaining node state information of each server node in the plurality of server nodes, and determining a plurality of data types contained in each server node and the data containing amount of each data type contained in the plurality of data types according to the node state information.
The server nodes are the server nodes in the file index server, and the node state information is information related to the state of a server node. The node state information includes, but is not limited to, the total storage space, the used storage space and the node capability information. The total storage space is the total size of the storage space of the server node, and the used storage space is the size of the storage space already used by the server node. The node capability information is used to characterize the data processing capability of the server node; for example, it may be the number of processor cores of the server node, such as 4 cores or 8 cores. An accommodated data type is a type of file data that the server node can store, including but not limited to pictures, videos and texts, and the accommodated data amount is the amount of data of each accommodated data type that the server node can currently still accommodate.
Because server nodes in different states in the file index server can accommodate different data types and different amounts of data, in this embodiment, after the index information of the data to be synchronized is parsed from the database log, the node state information of each server node in the plurality of server nodes is acquired, and the plurality of accommodated data types of each server node and the accommodated data amount of each accommodated data type are determined according to the node state information, so that file data can be synchronized according to the accommodated data types and accommodated data amounts in subsequent steps.
In one embodiment, step S200 includes:
S210, determining the plurality of accommodated data types of each server node according to the node capability information;
S220, determining the remaining storage space of each server node according to the total storage space and the used storage space;
S230, determining the accommodated data amount of each of the plurality of accommodated data types based on the plurality of accommodated data types of each server node and the remaining storage space.
The remaining storage space is the size of the storage space still available on the server node, determined from the node's total storage space and used storage space. For example, if the total storage space is 100G and the used storage space is 50G, the remaining storage space is 50G. As mentioned in the foregoing steps, the node capability information represents the data processing capability of a server node, and server nodes with different data processing capabilities may accommodate different data types. For example, a server node with strong data processing capability may process video, whereas a server node with weak data processing capability may process only pictures and text. The plurality of accommodated data types of each server node can therefore be determined according to its node capability information.
In this embodiment, when the plurality of accommodated data types and the accommodated data amount of each server node are determined, the accommodated data types are first determined according to the node capability information, and the remaining storage space is determined according to the total storage space and the used storage space of the server node. The accommodated data amount of each accommodated data type is then determined based on the accommodated data types and the remaining storage space. For example, the accommodated data types are determined to be video, pictures, and text according to the node capability information, the remaining storage space is determined to be 50G according to the total storage space and the used storage space, and the accommodated data amounts of video, pictures, and text are then determined based on these accommodated data types and the remaining storage space.
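Steps S210 and S220 can be illustrated with a minimal Python sketch. The capability labels "strong" and "weak", the mapping table, and the `NodeState` fields are assumptions made for illustration only; the embodiment states merely that a node with stronger processing capability can also handle video.

```python
from dataclasses import dataclass

# Hypothetical mapping from a capability label to accommodated data types.
# The labels and the table itself are assumptions, not part of the embodiment.
CAPABILITY_TO_TYPES = {
    "strong": ("video", "picture", "text"),
    "weak": ("picture", "text"),
}


@dataclass
class NodeState:
    total_gb: float   # total storage space
    used_gb: float    # used storage space
    capability: str   # node capability information (assumed label)


def accommodated_types(state: NodeState) -> tuple:
    """S210: derive the accommodated data types from the node capability."""
    return CAPABILITY_TO_TYPES[state.capability]


def remaining_space_gb(state: NodeState) -> float:
    """S220: remaining storage space = total space - used space."""
    return state.total_gb - state.used_gb


node = NodeState(total_gb=100, used_gb=50, capability="strong")
print(accommodated_types(node), remaining_space_gb(node))
```

With the figures from the example above (100G total, 50G used), the sketch yields a remaining storage space of 50G.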
In one embodiment, step S230 includes:
S231', inputting the plurality of accommodated data types of each server node and the remaining storage space into a data volume distribution model, and outputting the accommodated data amount of each of the plurality of accommodated data types through the data volume distribution model.
After the plurality of accommodated data types and the remaining storage space of each server node are determined, this embodiment inputs them into the data volume distribution model, which outputs the accommodated data amount of each accommodated data type of each server node. The data volume distribution model is obtained by training a preset network model on a preset training sample set. The preset network model may be a deep learning model or a machine learning model, such as a Convolutional Neural Network (CNN) or a Deconvolutional Network (DN). Determining the accommodated data amount of each data type with the data volume distribution model can improve the real-time performance of data synchronization.
When the preset network model is trained, the plurality of groups of file data types in the training sample set, together with the storage space corresponding to each group, are first input into the preset network model, which outputs a predicted data amount for each file data type in each group. A loss value is then determined according to the predicted data amount, the real data amount, and the loss function of the preset network model. While the loss value does not meet the preset condition, the model parameters of the preset network model are corrected according to a preset parameter learning rate, and the step of outputting the predicted data amount of each file data type in each group through the preset network model is executed again. Once the loss value meets the preset condition, the data volume distribution model is obtained. The loss value meets the preset condition when it is smaller than a preset first threshold, or when the difference between the loss values obtained in two successive iterations is smaller than a preset second threshold.
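The training procedure and its two stopping conditions can be sketched as follows. This is only an illustration: a one-parameter linear model fitted by gradient descent on a mean squared error loss stands in for the preset network model, and the sample pairs, learning rate, and thresholds are made-up values rather than anything specified by the embodiment.

```python
def train(samples, learning_rate=1e-4, first_threshold=1e-6,
          second_threshold=1e-12, max_steps=10_000):
    """Fit amount = w * space by gradient descent with two stop conditions."""
    w = 0.0
    prev_loss = None
    loss = float("inf")
    for _ in range(max_steps):
        # Forward pass: prediction error for every (space, amount) sample.
        errors = [w * x - y for x, y in samples]
        loss = sum(e * e for e in errors) / len(samples)
        # Preset condition 1: loss below the first threshold.
        if loss < first_threshold:
            break
        # Preset condition 2: loss change between two successive
        # iterations below the second threshold.
        if prev_loss is not None and abs(prev_loss - loss) < second_threshold:
            break
        prev_loss = loss
        # Otherwise correct the parameter according to the learning rate.
        grad = sum(2 * e * x for e, (x, _) in zip(errors, samples)) / len(samples)
        w -= learning_rate * grad
    return w, loss


# Made-up (storage space, real data amount) pairs for a single data type.
samples = [(50.0, 15.0), (100.0, 30.0), (20.0, 6.0)]
w, loss = train(samples)
print(w, loss)
```

A real implementation would replace the linear model with the chosen network and would output one amount per data type, but the loop structure (predict, compute loss, check the preset condition, update, repeat) is the same.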
In one embodiment, step S230 includes:
S231', determining the data amount weight of each of the plurality of accommodated data types based on the plurality of accommodated data types of each server node;
S232', determining the accommodated data amount of each of the plurality of accommodated data types according to the data amount weight and the remaining storage space.
The data amount weight is the weight used when the remaining storage space is allocated among the accommodated data types. In this embodiment, a correspondence between accommodated data types and data amount weights may be preset, for example 30% for video, 40% for pictures, and 30% for text. With these weights and a remaining storage space of 50G, the accommodated data amount is 15G for video, 20G for pictures, and 15G for text.
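The weight-based allocation of steps S231' and S232' amounts to a proportional split of the remaining storage space. A minimal Python sketch, assuming the weights are given as integer percentages (the function name and the normalisation step are illustrative choices):

```python
def allocate_by_weight(remaining_gb, weights):
    """Split the remaining storage space across data types by weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise by the weight total so the shares always add up to
    # the remaining space, even if the weights do not sum to 100.
    return {t: remaining_gb * w / total for t, w in weights.items()}


weights = {"video": 30, "picture": 40, "text": 30}  # percentages from the example
allocation = allocate_by_weight(50, weights)
print(allocation)  # {'video': 15.0, 'picture': 20.0, 'text': 15.0}
```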
S300, obtaining file attribute information corresponding to the index information, and extracting target file data from the file storage server according to the index information, the file attribute information, the plurality of accommodated data types, and the accommodated data amount.
The file storage server stores the file data corresponding to the index information in the index database. The file storage server includes, but is not limited to, a MinIO server. MinIO is a high-performance distributed object storage system that is compatible with the Amazon S3 cloud storage interface and is well suited to storing large volumes of unstructured data such as pictures, videos, log files, backup data, and container/virtual machine images. The file attribute information is the attribute information of the file data corresponding to the index information and includes, but is not limited to, a file importance level, a file type, and a file generation time. The file importance level represents the importance of the file data corresponding to the index information, the file type is the type of that file data, and the file generation time is the time at which that file data was generated. The target file data is the file data that needs to be synchronized, screened according to the index information, the node state information, and the file attribute information.
Because of limits imposed by the state of a server node, the data to be synchronized corresponding to the index information may not fit entirely on the file index server. For example, the data to be synchronized may total 60G while the remaining storage space of the server node is only 50G. To avoid failing to synchronize important file data because of the node state, in this embodiment, after the plurality of accommodated data types and the accommodated data amount of each accommodated data type are determined, the file attribute information corresponding to the index information is further obtained, and the target file data is extracted from the file storage server according to the index information, the file attribute information, the plurality of accommodated data types, and the accommodated data amount.
In a specific embodiment, the file attribute information includes a file importance level, a file type, and a file generation time, and the step in S300 of extracting target file data from the file storage server according to the index information, the file attribute information, the plurality of accommodated data types, and the accommodated data amount includes:
S310, determining a target score corresponding to the index information according to the file importance level, the file type, and the file generation time;
S320, extracting target file data from the file storage server according to the index information, the target score, the plurality of accommodated data types, and the accommodated data amount.
The target score is determined according to the file attribute information and is used for representing the importance degree of the file data corresponding to the index information, for example, when the file attribute information includes the file importance level, the file type and the file generation time, the target score is determined according to the file importance level, the file type and the file generation time, and generally, the larger the target score value is, the more important the file data is.
In a specific embodiment, the file attribute information includes a file importance level, a file type, and a file generation time, and when target file data is extracted according to the index information, the file attribute information, the plurality of accommodated data types, and the accommodated data amount, a target score corresponding to the index information is determined according to the file importance level, the file type, and the file generation time, and then the target file data is extracted from the file storage server according to the index information, the target score, the plurality of accommodated data types, and the accommodated data amount.
In one embodiment, step S310 includes:
S311, determining a first score, a second score, and a third score corresponding to the index information according to the file importance level, the file type, and the file generation time, respectively;
S312, determining a target score corresponding to the index information according to the first score, the second score, and the third score.
In this embodiment, a correspondence between the file importance level and the first score, a correspondence between the file type and the second score, and a correspondence between the file generation time and the third score may be preset. For example, file data with importance levels I, II, and III correspond to score values A1, A2, and A3 respectively, and video, image, and text correspond to score values B1, B2, and B3 respectively. After the file attribute information corresponding to the index information is obtained, the first score, the second score, and the third score corresponding to the index information can be determined according to the file importance level, the file type, the file generation time, and the preset correspondences, and the target score corresponding to the index information is then determined according to the three scores. The target score may be the sum of the first score, the second score, and the third score, their weighted sum, or their average, which is not limited in this application.
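Steps S311 and S312 can be sketched in Python as table lookups followed by a weighted sum. The concrete score values, the recency rule for the third score, and the default weights are all assumptions made for illustration; the embodiment leaves A1 to A3, B1 to B3, and the time correspondence unspecified.

```python
from datetime import datetime

# Illustrative score tables (assumed values, standing in for A1..A3, B1..B3).
LEVEL_SCORE = {"I": 3.0, "II": 2.0, "III": 1.0}          # first score
TYPE_SCORE = {"video": 3.0, "image": 2.0, "text": 1.0}   # second score


def recency_score(generated_at, now):
    """Third score: newer files score higher (an assumed rule)."""
    age_days = (now - generated_at).days
    if age_days <= 7:
        return 3.0
    if age_days <= 30:
        return 2.0
    return 1.0


def target_score(level, file_type, generated_at, now, weights=(1.0, 1.0, 1.0)):
    """S312: weighted sum of the three scores; equal weights give a plain sum."""
    scores = (LEVEL_SCORE[level], TYPE_SCORE[file_type],
              recency_score(generated_at, now))
    return sum(w * s for w, s in zip(weights, scores))


now = datetime(2022, 4, 28)
print(target_score("I", "video", datetime(2022, 4, 27), now))
```

Passing weights of one third each would give the averaged variant mentioned above.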
In one embodiment, step S320 includes:
S321, screening target index information from the index information according to the target score, the plurality of accommodated data types, and the accommodated data amount;
S322, extracting target file data from the file storage server according to the target index information.
The target index information is the index information of important file data, screened from the index information of the data to be synchronized. When the target file data is extracted according to the target score, the plurality of accommodated data types, and the accommodated data amount, the target index information is first screened from the index information according to those three inputs, and the target file data is then extracted from the file storage server according to the target index information. File data is thus extracted from the file storage server for synchronization in order of importance, which avoids the failure to synchronize important file data caused by the node state.
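One possible reading of step S321 is a greedy selection: entries are visited from the highest target score downward and kept while the accommodated data amount of their data type still has room. The embodiment does not prescribe this particular rule, and the `IndexEntry` fields and sample values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    file_id: str
    file_type: str
    size_gb: float
    score: float  # target score from step S310


def screen_target_index(entries, capacity_gb):
    """Greedy screening: highest score first, within each type's capacity."""
    remaining = dict(capacity_gb)  # accommodated data amount per data type
    selected = []
    for entry in sorted(entries, key=lambda e: e.score, reverse=True):
        # Skip data types the node does not accommodate, and files that
        # no longer fit into the amount left for their type.
        if remaining.get(entry.file_type, 0.0) >= entry.size_gb:
            remaining[entry.file_type] -= entry.size_gb
            selected.append(entry)
    return selected


entries = [
    IndexEntry("a", "video", 10, 9.0),
    IndexEntry("b", "video", 10, 5.0),
    IndexEntry("c", "text", 5, 7.0),
    IndexEntry("d", "audio", 1, 10.0),  # type not accommodated by the node
]
targets = screen_target_index(entries, {"video": 15, "text": 15})
print([e.file_id for e in targets])  # ['a', 'c']
```

Here the second video is dropped because only 5G of the video allowance remains after the higher-scoring video is selected.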
S400, synchronizing the target file data to the file index server.
After the target file data is extracted from the file storage server, it is synchronized to the file index server, which realizes file data synchronization between the file storage server and the file index server. Because the synchronization is driven by the database logs pulled in real time by the log pulling component, the real-time performance of file data synchronization is improved. Because the target file data is extracted from the file storage server according to the node state information and the file attribute information, file data is synchronized in order of importance, which avoids the failure to synchronize important file data caused by the node state.
In order to better implement the data synchronization method in the embodiment of the present application, based on the data synchronization method, an embodiment of the present application further provides a data synchronization apparatus, as shown in fig. 3, where the data synchronization apparatus 600 includes:
an information obtaining unit 601, configured to pull a database log of the index database in real time through a log pulling component, and analyze index information of data to be synchronized from the database log;
a data determining unit 602, configured to obtain node state information of each server node in the multiple server nodes, and determine, according to the node state information, a plurality of data types accommodated by each server node and an amount of data accommodated by each data type accommodated by the plurality of data types;
a data extracting unit 603, configured to obtain file attribute information corresponding to the index information, and extract target file data from the file storage server according to the index information, the file attribute information, the multiple types of data to be stored, and the data volume to be stored;
a data synchronization unit 604, configured to synchronize the target file data to the file index server.
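The division into units 601 to 604 can be pictured as four pluggable steps run in sequence. The following Python sketch is only an illustration of that composition; the class name, the callable signatures, and the stub values are assumptions and do not come from the embodiment.

```python
class DataSynchronizer:
    """Wires four pluggable steps together, mirroring units 601 to 604."""

    def __init__(self, pull_index, plan_capacity, extract, sync):
        self.pull_index = pull_index        # unit 601: log -> index information
        self.plan_capacity = plan_capacity  # unit 602: node state -> amount per type
        self.extract = extract              # unit 603: index + capacity -> target files
        self.sync = sync                    # unit 604: push files to the index server

    def run_once(self):
        index_info = self.pull_index()
        capacity = self.plan_capacity()
        files = self.extract(index_info, capacity)
        self.sync(files)
        return files


# Stub callables standing in for the real components.
synced = []
synchronizer = DataSynchronizer(
    pull_index=lambda: ["idx-1", "idx-2"],
    plan_capacity=lambda: {"text": 1},
    extract=lambda index_info, capacity: index_info[: capacity["text"]],
    sync=synced.extend,
)
print(synchronizer.run_once(), synced)
```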
In the embodiment of the application, the database logs pulled in real time by the log pulling component are used for data synchronization, so that the real-time performance of file data synchronization can be improved, the target file data is extracted from the file storage server according to the node state information and the file attribute information, the file data can be extracted according to the importance degree of the file data for data synchronization, and the problem of failure of important file data synchronization caused by the node state is solved.
In some embodiments of the present application, the data determining unit 602 is specifically configured to:
determining a plurality of data types contained by each server node according to the node capability information;
determining the residual storage space of each server node according to the total storage space and the used storage space;
and determining the data accommodating amount of each accommodating data type in the accommodating data types based on the accommodating data types of each server node and the residual storage space.
In some embodiments of the present application, the data determining unit 602 is further specifically configured to:
inputting the plurality of data types and the residual storage space of each server node into a data volume distribution model, and outputting the data volume of each of the plurality of data types through the data volume distribution model.
In some embodiments of the present application, the data determining unit 602 is further specifically configured to:
determining a data volume weight of each of the plurality of accommodated data types based on the plurality of accommodated data types of each server node;
and determining the data volume of each of the plurality of data types according to the data volume weight and the residual storage space.
In some embodiments of the present application, the data extraction unit 603 is specifically configured to:
determining a target score corresponding to the index information according to the file importance level, the file type and the file generation time;
and extracting target file data from the file storage server according to the index information, the target score, the plurality of types of the contained data and the contained data amount.
In some embodiments of the present application, the data extracting unit 603 is further specifically configured to perform:
respectively determining a first score, a second score and a third score corresponding to the index information according to the file importance level, the file type and the file generation time;
and determining a target score corresponding to the index information according to the first score, the second score and the third score.
In some embodiments of the present application, the data extracting unit 603 is further specifically configured to perform:
screening target index information from the index information according to the target score, the types of the plurality of contained data and the contained data amount;
and extracting target file data from the file storage server according to the target index information.
An embodiment of the present application further provides a computer device, which integrates any one of the data synchronization apparatuses provided in the embodiment of the present application, where the computer device includes:
one or more processors;
a memory; and
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the processor for performing the steps of the data synchronization method described in any of the above data synchronization method embodiments.
The embodiment of the present application further provides a computer device, which integrates any one of the data synchronization apparatuses provided in the embodiments of the present application. Fig. 4 is a schematic diagram showing a structure of a computer device according to an embodiment of the present application, specifically:
the computer device may include components such as a processor 701 of one or more processing cores, memory 702 of one or more computer-readable storage media, a power supply 703, and an input unit 704. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 4 does not constitute a limitation of computer devices, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. Wherein:
the processor 701 is the control center of the computer device. It connects the various parts of the entire computer device using various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 702 and calling data stored in the memory 702, thereby monitoring the computer device as a whole. Optionally, the processor 701 may include one or more processing cores; preferably, the processor 701 may integrate an application processor, which mainly handles the operating system, user interfaces, application programs, and the like, and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor need not be integrated into the processor 701.
The memory 702 may be used to store software programs and modules, and the processor 701 executes various functional applications and data processing by running the software programs and modules stored in the memory 702. The memory 702 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to use of the computer device, and the like. Further, the memory 702 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 702 may also include a memory controller to provide the processor 701 with access to the memory 702.
The computer device further includes a power supply 703 for supplying power to the various components, and preferably, the power supply 703 is logically connected to the processor 701 through a power management system, so that functions of managing charging, discharging, and power consumption are implemented through the power management system. The power supply 703 may also include any component including one or more of a dc or ac power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The computer device may also include an input unit 704, the input unit 704 being operable to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 701 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 702 according to the following instructions, and the processor 701 runs the application program stored in the memory 702, thereby implementing various functions as follows:
the database log of the index database is pulled in real time through a log pulling component, and index information of data to be synchronized is analyzed from the database log;
acquiring node state information of each server node in the plurality of server nodes, and determining a plurality of data types contained in each server node and the data containing amount of each data type contained in the plurality of data types according to the node state information;
acquiring file attribute information corresponding to the index information, and extracting target file data from the file storage server according to the index information, the file attribute information, the plurality of types of the accommodated data and the accommodated data volume;
and synchronizing the target file data to the file index server.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a computer-readable storage medium, which may include a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like. A computer program is stored on the storage medium and is loaded by a processor to execute the steps of any one of the data synchronization methods provided by the embodiments of the present application. For example, the computer program may be loaded by a processor to perform the steps of:
the database log of the index database is pulled in real time through a log pulling component, and index information of data to be synchronized is analyzed from the database log;
acquiring node state information of each server node in the plurality of server nodes, and determining a plurality of data types contained in each server node and the data containing amount of each data type contained in the plurality of data types according to the node state information;
acquiring file attribute information corresponding to the index information, and extracting target file data from the file storage server according to the index information, the file attribute information, the plurality of types of the accommodated data and the accommodated data volume;
and synchronizing the target file data to the file index server.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and parts that are not described in detail in a certain embodiment may refer to the above detailed descriptions of other embodiments, and are not described herein again.
In a specific implementation, each unit or structure may be implemented as an independent entity, or may be combined arbitrarily to be implemented as one or several entities, and the specific implementation of each unit or structure may refer to the foregoing method embodiment, which is not described herein again.
For the specific implementation of the above operations, reference may be made to the foregoing embodiments, and details are not described herein again.
The data synchronization method, apparatus, computer device and storage medium provided by the embodiments of the present application are introduced in detail above, and a specific example is applied in the present application to explain the principle and implementation manner of the present application, and the description of the above embodiments is only used to help understand the method and core ideas of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (7)

1. A data synchronization method is applied to a data synchronization device, the data synchronization device is respectively in communication connection with an index database, a file storage server and a file index server, the file index server comprises a plurality of server nodes, and the data synchronization method comprises the following steps:
the database log of the index database is pulled in real time through a log pulling component, and index information of data to be synchronized is analyzed from the database log;
acquiring node state information of each server node in the plurality of server nodes, and determining a plurality of data types contained in each server node and the data containing amount of each data type contained in the plurality of data types according to the node state information;
the node state information includes a total storage space, a used storage space, and node capability information, where the node capability information is used to characterize data processing capability of the server nodes, and the determining, according to the node state information, a plurality of data types to be accommodated by each server node and an amount of data to be accommodated by each data type to be accommodated in the plurality of data types includes:
determining a plurality of data types contained by each server node according to the node capability information;
determining the residual storage space of each server node according to the total storage space and the used storage space;
determining the data capacity of each accommodating data type in the accommodating data types based on the accommodating data types of each server node and the residual storage space;
the determining, based on the plurality of accommodated data types of each server node and the remaining storage space, an accommodated data amount of each of the plurality of accommodated data types includes:
inputting a plurality of data types and the residual storage space of each server node into a data volume distribution model, and outputting the data volume of each data type in the plurality of data types through the data volume distribution model;
acquiring file attribute information corresponding to the index information, and extracting target file data from the file storage server according to the index information, the file attribute information, the plurality of data types and the data volume;
and synchronizing the target file data to the file index server.
2. The data synchronization method of claim 1, wherein the file attribute information includes a file importance level, a file type, and a file generation time, and the extracting target file data from the file storage server according to the index information, the file attribute information, the data types and the data volume comprises:
determining a target score corresponding to the index information according to the file importance level, the file type and the file generation time;
and extracting target file data from the file storage server according to the index information, the target score, the plurality of types of the contained data and the contained data amount.
3. The data synchronization method according to claim 2, wherein the determining the target score corresponding to the index information according to the file importance level, the file type, and the file generation time includes:
respectively determining a first score, a second score and a third score corresponding to the index information according to the file importance level, the file type and the file generation time;
and determining a target score corresponding to the index information according to the first score, the second score and the third score.
4. The data synchronization method according to claim 2, wherein the extracting target file data from the file storage server according to the index information, the target score, the plurality of types of the accommodated data, and the amount of the accommodated data comprises:
screening target index information from the index information according to the target scores, the types of the plurality of data and the data capacity;
and extracting target file data from the file storage server according to the target index information.
5. A data synchronization apparatus for performing data synchronization by using the data synchronization method according to any one of claims 1 to 4, wherein the data synchronization apparatus is communicatively connected to an index database, a file storage server, and a file index server, respectively, the file index server includes a plurality of server nodes, and the data synchronization apparatus includes:
the information acquisition unit is used for pulling the database logs of the index database in real time through a log pulling component and analyzing the index information of the data to be synchronized from the database logs;
the data determining unit is used for acquiring node state information of each server node in the plurality of server nodes, and determining a plurality of data types contained in each server node and the contained data amount of each data type contained in the plurality of data types according to the node state information;
the data extraction unit is used for acquiring file attribute information corresponding to the index information and extracting target file data from the file storage server according to the index information, the file attribute information, the types of the plurality of contained data and the contained data amount;
and the data synchronization unit is used for synchronizing the target file data to the file index server.
6. A computer device, characterized in that the computer device comprises:
one or more processors;
a memory; and
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the processor to implement the data synchronization method of any of claims 1 to 4.
7. A computer-readable storage medium, having stored thereon a computer program which is loaded by a processor for performing the steps of the data synchronization method of any one of claims 1 to 4.
CN202210457043.7A 2022-04-28 2022-04-28 Data synchronization method and device, computer equipment and storage medium Active CN114584574B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210457043.7A CN114584574B (en) 2022-04-28 2022-04-28 Data synchronization method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114584574A CN114584574A (en) 2022-06-03
CN114584574B CN114584574B (en) 2022-08-02

Family

ID=81785319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210457043.7A Active CN114584574B (en) 2022-04-28 2022-04-28 Data synchronization method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114584574B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116095096B (en) * 2023-01-05 2024-05-03 中国联合网络通信集团有限公司 Data synchronization method, device and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN113806301A (en) * 2021-09-29 2021-12-17 中国平安人寿保险股份有限公司 Data synchronization method, device, server and storage medium

Family Cites Families (14)

Publication number Priority date Publication date Assignee Title
CN1756108A (en) * 2004-09-29 2006-04-05 华为技术有限公司 Master/backup system data synchronizing method
NO325864B1 (en) * 2006-11-07 2008-08-04 Fast Search & Transfer Asa Procedure for calculating summary information and a search engine to support and implement the procedure
US9607004B2 (en) * 2014-06-18 2017-03-28 International Business Machines Corporation Storage device data migration
CN105657444A (en) * 2014-11-25 2016-06-08 中兴通讯股份有限公司 Query and search method, device and system for business data
US10891302B2 (en) * 2018-01-08 2021-01-12 Accenture Global Solutions Limited Scalable synchronization with cache and index management
CN111046036A (en) * 2019-11-05 2020-04-21 深信服科技股份有限公司 Data synchronization method, device, system and storage medium
CN111858747A (en) * 2020-05-29 2020-10-30 大数金科网络技术有限公司 Method for synchronizing MySQL database to Elasticsearch
CN111782620A (en) * 2020-06-19 2020-10-16 多加网络科技(北京)有限公司 Credit link automatic tracking platform and method thereof
CN112100275A (en) * 2020-09-02 2020-12-18 上海微亿智造科技有限公司 Data synchronization method, system and electronic equipment
CN113239013B (en) * 2021-05-17 2024-04-09 北京青云科技股份有限公司 Distributed system and storage medium
CN113407634A (en) * 2021-07-05 2021-09-17 挂号网(杭州)科技有限公司 Data synchronization method, device, system, server and storage medium
CN113656503A (en) * 2021-08-20 2021-11-16 北京健康之家科技有限公司 Data synchronization method, device and system and computer readable storage medium
CN113934713A (en) * 2021-09-02 2022-01-14 广州伊的家网络科技有限公司 Order data indexing method, system, computer equipment and storage medium
CN114297292A (en) * 2021-12-20 2022-04-08 贵州电子商务云运营有限责任公司 Data synchronization system based on canal platform and execution method

Also Published As

Publication number Publication date
CN114584574A (en) 2022-06-03

Similar Documents

Publication Publication Date Title
US11853352B2 (en) Method and apparatus for establishing image set for image recognition, network device, and storage medium
US20150213042A1 (en) Search term obtaining method and server, and search term recommendation system
CN108763578A (en) Index file updating method and server
CN114584574B (en) Data synchronization method and device, computer equipment and storage medium
CN112417846A (en) Text automatic generation method and device, electronic equipment and storage medium
US20230069999A1 (en) Method and apparatus for updating recommendation model, computer device and storage medium
CN113190773A (en) Rendering method of display data, electronic equipment, mobile terminal and storage medium
CN111310072B (en) Keyword extraction method, keyword extraction device and computer-readable storage medium
CN117033082A (en) Virtual machine backup recovery method and device, computer equipment and storage medium
CN115185830A (en) Test unit-based case generation method, device, equipment and storage medium
CN113704299A (en) Model training method and device, storage medium and computer equipment
CN115221060A (en) Case generation method, device and equipment based on associated field and storage medium
CN115174890A (en) Flow playback test method and device, computer equipment and storage medium
CN111538859A (en) Method and device for dynamically updating video label and electronic equipment
CN115578096A (en) Block chain parallel transaction method, device, equipment and storage medium
CN114925078A (en) Data updating method, system, electronic device and storage medium
CN114615287B (en) File backup method and device, computer equipment and storage medium
CN115705320A (en) Index generation method and device, computer equipment and computer readable storage medium
CN113761293A (en) Graph data strong-connectivity component mining method, device, equipment and storage medium
CN112749297B (en) Video recommendation method, device, computer equipment and computer readable storage medium
CN115242685B (en) Playback testing method, device, equipment and storage medium based on incidence matrix
CN113890872B (en) Data set uploading method and device, electronic equipment and storage medium
JP2019144873A (en) Block diagram analyzer
CN114090506A (en) Log creating method and device and storage medium
CN114898888B (en) Medical data processing method and device, computer equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 430070 No. 1, 2 and 10, floors 1-3, building A7, Rongke Zhigu industrial project phase I, No. 555, Wenhua Avenue, Hongshan District, Wuhan City, Hubei Province

Patentee after: Wuhan Barda Technology Co.,Ltd.

Address before: 430070 No. 1, 2 and 10, floors 1-3, building A7, Rongke Zhigu industrial project phase I, No. 555, Wenhua Avenue, Hongshan District, Wuhan City, Hubei Province

Patentee before: Wuhan Sitong Information Service Co.,Ltd.