CN116628068A - Data handling method, system and readable storage medium based on dynamic window - Google Patents


Info

Publication number
CN116628068A
Authority
CN
China
Prior art keywords
data
calculating
server
continuity
handling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310915713.XA
Other languages
Chinese (zh)
Inventor
徐行
吴杰
严军荣
闵良志
范能科
朱王飞
杨幸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hengtai Technology Co ltd
Original Assignee
Hangzhou Hengtai Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hengtai Technology Co ltd
Priority to CN202310915713.XA
Publication of CN116628068A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data handling method, system and readable storage medium based on a dynamic window. The method comprises: acquiring data source information and target information; slicing the data according to the correlation characteristics of the data; configuring a dynamic window according to the performance indices of the server cluster; generating a parallel handling queue from the data fragments; and handling the data fragments in the parallel handling queue according to the configured dynamic window value. The invention addresses the long handling time and low handling efficiency encountered when moving large volumes of data in the related art.

Description

Data handling method, system and readable storage medium based on dynamic window
Technical Field
The invention belongs to the technical field of big data, and particularly relates to a data handling method and system based on a dynamic window and a readable storage medium.
Background
With the continuous development of big data applications, enterprises place ever higher demands on data storage and on data analysis and processing. A data warehouse or data mart integrates data across business lines and systems and provides unified data support for management analysis and business decisions, so data warehouses and data marts are widely used in enterprises. During their construction, data is organized into layers or zones, and data must be converted between different layers or zones. The process of extracting data from the source, transforming it and loading it into the destination is called ETL (Extract-Transform-Load).
As enterprise data grows, the volume of data that must be handled within a short time increases significantly. Existing data handling systems first extract the data and write all of it to disk, then transform the data stored on disk and load the result into the destination. For large data volumes this not only prolongs storage and loading times but also consumes a large amount of disk space, which can exhaust the disk and degrade performance, thereby reducing handling efficiency.
To reduce data handling latency and improve data handling efficiency, a data handling method, system and readable storage medium based on a dynamic window are provided.
Disclosure of Invention
The embodiments of the invention provide a data handling method, system and readable storage medium based on a dynamic window, which at least solve the problems of long handling time and low handling efficiency for large data volumes in the related art.
According to one embodiment of the present invention, a data handling method based on a dynamic window is provided, including:
acquiring data source information and target information;
slicing the data according to the correlation characteristics of the data, where the correlation characteristics comprise any one or more of data attributes, data continuity or data lineage relationships;
configuring a dynamic window according to the performance indices of the server cluster, where the performance indices comprise any one or more of the characteristics of the server cluster, the load of the server cluster or the historical transmission information of the server cluster;
generating a parallel handling queue from the data fragments;
and handling the data fragments in the parallel handling queue according to the configured dynamic window value.
In an exemplary embodiment, the data source information and the target information include any one or a combination of the type, address, port, user and password information of the data source and the target; the types include relational databases, big data platforms, file servers and message queues.
In an exemplary embodiment, slicing the data according to the correlation characteristics of the data includes the steps of:
calculating the attribute similarity between data from any one or a combination of the data length similarity, the data type similarity and the data time requirement similarity;
calculating the continuity between data from any one or a combination of the content continuity, the sequence number continuity and the time continuity of the data;
calculating the lineage similarity between data according to the similarity of their lineage relationships;
calculating a data relevance value from any one or a combination of the attribute similarity, the continuity and the lineage similarity between the data;
grouping data whose data relevance value exceeds a preset relevance threshold into the same data fragment, where the relevance threshold is calculated from the data volume and the queue bearing capacity.
In an exemplary embodiment, configuring the dynamic window according to the performance indices of the server cluster includes the steps of:
calculating a parallel service capability evaluation value from the number of server clusters and/or the network environment of the server clusters;
calculating a load capability evaluation value from any one or a combination of the server's CPU utilization, memory occupancy, total memory, remaining memory and disk read/write speed;
calculating a historical transmission efficiency evaluation value from any one or a combination of the server cluster's historical transmission speed, historical failure rate and historical memory utilization;
calculating a window indication value from the correlation between the window value and any one or a combination of the parallel service capability evaluation value, the load capability evaluation value and the historical transmission efficiency evaluation value;
configuring the dynamic window according to the calculated window indication value.
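As an illustration of how the three evaluation values could feed a window indication value, the sketch below assumes simple normalized formulas and combines them as a product, so that the window grows with each value as the text requires. The class `ClusterMetrics`, the reference constants (`ref_bandwidth`, `ref_disk`, `ref_speed`) and `base_window` are hypothetical; the actual correlation functions are not specified here, and total/remaining memory are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class ClusterMetrics:
    # Hypothetical inputs; rates and ratios are in [0, 1].
    server_count: int
    bandwidth_mbps: float      # network environment
    cpu_util: float            # CPU utilization rate
    mem_occupancy: float       # memory occupancy rate
    disk_mbps: float           # disk read/write speed
    hist_speed_mbps: float     # historical transmission speed
    hist_failure_rate: float   # historical failure rate


def clamp01(v: float) -> float:
    return max(0.0, min(1.0, v))


def window_indication(m: ClusterMetrics, base_window: int = 1000,
                      ref_bandwidth: float = 1000.0, ref_disk: float = 500.0,
                      ref_speed: float = 100.0) -> int:
    # Parallel service capability: more servers and more bandwidth -> larger value.
    parallel = m.server_count * clamp01(m.bandwidth_mbps / ref_bandwidth)
    # Load capability: idle CPU and memory and fast disks -> larger value.
    load = (1 - m.cpu_util) * (1 - m.mem_occupancy) * clamp01(m.disk_mbps / ref_disk)
    # Historical efficiency: fast, rarely failing transfers -> larger value.
    history = clamp01(m.hist_speed_mbps / ref_speed) * (1 - m.hist_failure_rate)
    # Assumed product form: positively correlated with all three values; never below 1.
    return max(1, round(base_window * parallel * load * history))
```

For example, four servers at half CPU and memory load with reference-level bandwidth, disk speed and history yield a window of 1000, while a fully busy CPU collapses the window to the minimum of 1.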
In one exemplary embodiment, the method further comprises filtering the data according to preset data quality inspection rules, including:
identifying erroneous data in the data fragments according to a preset data format correctness rule;
identifying duplicate data in the data fragments according to a preset data repeatability rule;
identifying incomplete data in the data fragments according to a preset data integrity rule;
and deleting the erroneous, duplicate and incomplete data from the data fragments.
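A minimal sketch of the three quality rules, assuming records are Python dicts with hypothetical fields `id`, `email` and `amount`. The concrete rules (required fields for integrity, an e-mail pattern for format correctness, ID-based duplicate detection) are illustrative assumptions, not the rules of the embodiment.

```python
import re
from typing import Iterable

# Hypothetical rules for records shaped like {"id": ..., "email": ..., "amount": ...}.
REQUIRED_FIELDS = ("id", "email", "amount")            # data integrity rule
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")    # data format correctness rule


def is_complete(rec: dict) -> bool:
    return all(rec.get(f) not in (None, "") for f in REQUIRED_FIELDS)


def is_well_formed(rec: dict) -> bool:
    return bool(EMAIL_RE.match(str(rec["email"]))) and isinstance(rec["amount"], (int, float))


def filter_fragment(fragment: Iterable[dict]) -> list[dict]:
    seen_ids = set()
    kept = []
    for rec in fragment:
        if not is_complete(rec):       # incomplete data is deleted
            continue
        if not is_well_formed(rec):    # erroneous data is deleted
            continue
        if rec["id"] in seen_ids:      # duplicate data (same id) is deleted
            continue
        seen_ids.add(rec["id"])
        kept.append(rec)
    return kept
```

The integrity check runs first so that the format check never reads a missing field.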
In one exemplary embodiment, the method further comprises screening incremental data according to a data increment judging rule: the target database is searched by the content ID in the data fragment to judge whether the content already exists there; if so, the content is deleted from the data fragment, otherwise it is retained.
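The increment check can be sketched as a single batched ID lookup against the target database. Here Python's built-in `sqlite3` stands in for the target database; the table layout (an `id` column) and the function names are hypothetical.

```python
import sqlite3


def existing_ids(conn: sqlite3.Connection, table: str, ids: list) -> set:
    # Look up which content IDs already exist in the target table.
    # The table name must come from trusted configuration, not user input;
    # very large ID lists should be chunked to respect SQLite's variable limit.
    if not ids:
        return set()
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(f"SELECT id FROM {table} WHERE id IN ({placeholders})", ids)
    return {row[0] for row in rows}


def screen_incremental(conn: sqlite3.Connection, table: str, fragment: list[dict]) -> list[dict]:
    present = existing_ids(conn, table, [rec["id"] for rec in fragment])
    # Delete content already in the target; retain the rest (the increment).
    return [rec for rec in fragment if rec["id"] not in present]
```

Usage: with rows 1 and 2 already in the target table, a fragment containing IDs 1 and 3 is reduced to the record with ID 3.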
In one exemplary embodiment, the method further comprises converting data types: the content types of the filtered and screened data fragments are converted according to the data type mapping between the data source and the target database.
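The type conversion step amounts to a lookup in a source-to-target type mapping followed by a per-value converter. The mapping below (a MySQL-style source feeding a Hive-style target) and the converter table are hypothetical examples.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical source-to-target type mapping.
TYPE_MAP = {
    "VARCHAR": "STRING",
    "INT": "INT",
    "DECIMAL": "DECIMAL",
    "DATETIME": "TIMESTAMP",
}

# Hypothetical value converters, one per target type.
CONVERTERS = {
    "STRING": str,
    "INT": int,
    "DECIMAL": lambda v: Decimal(str(v)),
    "TIMESTAMP": lambda v: v if isinstance(v, datetime) else datetime.fromisoformat(v),
}


def convert_record(rec: dict, source_schema: dict) -> dict:
    # source_schema maps each field name to its source column type.
    out = {}
    for field, value in rec.items():
        target_type = TYPE_MAP[source_schema[field]]
        out[field] = None if value is None else CONVERTERS[target_type](value)
    return out
```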
In an exemplary embodiment, generating the parallel handling queue from the data fragments includes the steps of:
calculating the queue priority of the different contents in each data fragment from any one or a combination of their time requirement, storage amount and retransmission identifier;
arranging the contents of each data fragment in queue-priority order to generate a data fragment queue;
and arranging the data fragment queues in parallel according to the number of server clusters to generate the parallel handling queues.
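One possible reading of the queue-generation steps, with a hypothetical priority score in which urgent, small and retransmitted contents rank higher, and with fragment queues distributed round-robin across servers. The scoring formula and the `Content` fields are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Content:
    content_id: str
    deadline_s: float   # time requirement: seconds until the data is needed
    size_mb: float      # storage amount
    retransmit: bool    # retransmission identifier


def priority(c: Content) -> float:
    # Hypothetical score: urgent, small and retransmitted contents go first.
    score = 1.0 / (1.0 + c.deadline_s) + 1.0 / (1.0 + c.size_mb)
    return score + (1.0 if c.retransmit else 0.0)


def fragment_queue(fragment: list[Content]) -> list[Content]:
    # Arrange the contents of one fragment in descending priority order.
    return sorted(fragment, key=priority, reverse=True)


def parallel_queues(fragments: list[list[Content]], server_count: int) -> list[list[Content]]:
    # Spread the fragment queues over the servers round-robin, one queue per server.
    queues: list[list[Content]] = [[] for _ in range(server_count)]
    for i, frag in enumerate(fragments):
        queues[i % server_count].extend(fragment_queue(frag))
    return queues
```

Round-robin is only the simplest placement; a load-aware placement could use the window values from step S03 instead.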
In one exemplary embodiment, after the parallel handling queue is generated from the data fragments, the method further comprises adjusting the concurrency according to the handling queue length and the dynamic window value, including:
calculating an idle degree from how the length of the handling queue assigned to a server matches its dynamic window value;
if the idle degree exceeds a preset resource idle threshold, calculating the concurrency of the server from the idle degree and increasing the number of parallel handling queues.
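The concurrency adjustment can be sketched by treating the idle degree as the fraction of a server's window left unused by its queue. The idle-degree formula, the growth rule and the `max_parallel` cap are assumptions made for illustration.

```python
import math


def idle_degree(queue_length: int, window_value: int) -> float:
    # Hypothetical: fraction of the window that the server's queue leaves unused.
    if window_value <= 0:
        return 0.0
    return max(0.0, 1.0 - queue_length / window_value)


def adjust_parallelism(queue_length: int, window_value: int, parallel: int,
                       idle_threshold: float = 0.5, max_parallel: int = 32) -> int:
    idle = idle_degree(queue_length, window_value)
    if idle > idle_threshold:
        # Concurrency grows with idleness, capped so the server is not overloaded.
        parallel = min(max_parallel, parallel + math.ceil(idle * parallel))
    return parallel
```

With a window of 1000 and only 200 queued items, the idle degree is 0.8 and four parallel queues grow to eight; with 900 queued items nothing changes.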
In an exemplary embodiment, handling the data fragments in the parallel handling queue according to the configured dynamic window value includes the steps of:
calculating the dynamic window value corresponding to each server in the server cluster;
and allocating handling queues to each server according to its dynamic window value and concurrency, and handling the data.
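A sketch of the allocation step, assuming each server receives its queue in batches no larger than its own dynamic window value and that a thread pool bounds the concurrency. `send_batch` is a hypothetical transfer callback, and the other function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator


def window_batches(queue: list, window_value: int) -> Iterator[list]:
    # Hand the queue to the server in chunks no larger than its dynamic window value.
    for start in range(0, len(queue), window_value):
        yield queue[start:start + window_value]


def plan_transfers(queues: dict[str, list], windows: dict[str, int]) -> dict[str, list[list]]:
    # One entry per server: the batches it will carry, sized by its own window.
    return {server: list(window_batches(q, windows[server])) for server, q in queues.items()}


def carry(queues: dict[str, list], windows: dict[str, int], concurrency: int,
          send_batch: Callable[[str, list], object]) -> list:
    plan = plan_transfers(queues, windows)
    # The pool size caps how many batches are in flight at once.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send_batch, s, b) for s, batches in plan.items() for b in batches]
        return [f.result() for f in futures]
```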
In an exemplary embodiment, the method further comprises resuming the transmission of failed data from the breakpoint, including:
when a data transfer fails, feeding back the ID or timestamp of the failed data to the server;
identifying the breakpoint position of the data in the handling queue from the continuity of the data IDs and/or the timestamps;
and retransmitting the data according to its ID and breakpoint position, placing the retransmitted data at the breakpoint position of the handling queue.
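A minimal sketch of breakpoint resumption, assuming the IDs in a handling queue are consecutive integers so the failed ID's position can be computed directly, with a linear search as fallback. The timestamp-based variant is not shown, and the function names are hypothetical.

```python
from typing import Callable


def find_breakpoint(queue_ids: list[int], failed_id: int) -> int:
    # If IDs are continuous, the offset from the first ID is the position.
    offset = failed_id - queue_ids[0] if queue_ids else -1
    if 0 <= offset < len(queue_ids) and queue_ids[offset] == failed_id:
        return offset
    # Otherwise search; raises ValueError if the ID is not in the queue.
    return queue_ids.index(failed_id)


def resume_from_breakpoint(queue: list[dict], failed_id: int,
                           send: Callable[[dict], object]) -> int:
    pos = find_breakpoint([rec["id"] for rec in queue], failed_id)
    # Retransmit from the breakpoint onwards instead of from the queue head.
    for rec in queue[pos:]:
        send(rec)
    return pos
```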
In an exemplary embodiment, the method further comprises recording transmission process data and determining data lineage relationships, including the steps of:
recording the speed and data flow of each data transmission;
recording the data source and target database of each handling operation;
and forming a data lineage chain from the transmission process data and the data sources and target databases of the handling operations.
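The lineage chain can be kept as a directed graph from source to target, with each transfer's speed and volume stored alongside. The `TransferRecord` and `LineageRecorder` types and the layer-style table names in the usage are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TransferRecord:
    source: str
    target: str
    rows: int        # data flow of this transfer
    seconds: float

    @property
    def speed(self) -> float:
        return self.rows / self.seconds if self.seconds else 0.0


class LineageRecorder:
    def __init__(self) -> None:
        self.records: list[TransferRecord] = []
        self.edges: dict[str, set] = defaultdict(set)   # source -> targets

    def record(self, rec: TransferRecord) -> None:
        self.records.append(rec)
        self.edges[rec.source].add(rec.target)

    def chain(self, start: str) -> list[str]:
        # Depth-first walk over source -> target edges listing every downstream node.
        seen, stack, order = set(), [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            order.append(node)
            stack.extend(sorted(self.edges.get(node, ()), reverse=True))
        return order
```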
A computer readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the above-described method.
According to another embodiment of the present invention, there is provided a dynamic window based data handling system, comprising:
A server cluster;
a processor;
a memory;
and
one or more programs, wherein the one or more programs are stored in a memory and configured to be executed by the processor, the programs causing a computer to perform the above-described method.
The data handling method, system and readable storage medium based on the dynamic window have the following advantages:
(1) The relevance of the data is calculated from data attributes and/or data continuity and/or data lineage, and the data is sliced according to that relevance; compared with conventional data slicing schemes, this makes the slicing more reasonable.
(2) The dynamic window is configured according to the performance indices of the server cluster; compared with conventional queue transmission schemes, this improves server memory utilization and avoids memory overload and blocking on the servers.
(3) A quality index value is calculated for each content from its repeatability and/or continuity and/or correctness within the data fragment, content whose quality index value is below a preset quality threshold is deleted, and the target database is searched by the content ID in the data fragment to judge whether the data already exists.
(4) The queue priorities of the different contents in each data fragment are calculated from their time requirements and/or storage amounts and/or retransmission identifiers, the contents are arranged in priority order to generate data fragment queues, and the fragment queues are arranged in parallel according to the number of server clusters to generate parallel handling queues; compared with conventional queue generation methods, this makes queue prioritization more effective and improves queue transmission efficiency.
(5) The data fragments in the parallel handling queue are handled according to the configured dynamic window value; compared with the conventional store-then-transmit approach, data can be transmitted in parallel and dynamically, which reduces the demand on data storage and improves the efficiency and quality of handling large data volumes.
Drawings
FIG. 1 is a flow chart of a dynamic window based data handling method according to an embodiment of the present invention;
FIG. 2 is a flow chart of sub-step S02 of an embodiment of the present invention;
FIG. 3 is a flow chart of sub-step S03 of an embodiment of the present invention;
FIG. 4 is a flow chart of an additional step S04' of an embodiment of the present invention;
FIG. 5 is a flow chart of sub-step S04 of an embodiment of the present invention;
FIG. 6 is a flow chart of an additional step S05' of an embodiment of the present invention;
FIG. 7 is a flow chart of sub-step S05 of an embodiment of the present invention;
FIG. 8 is a flow chart of an additional step S06 of an embodiment of the present invention;
FIG. 9 is a flow chart of an additional step S07 of an embodiment of the present invention;
FIG. 10 is a schematic diagram of a dynamic window based data handling system according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.
The data handling method based on the dynamic window in this embodiment, whose flow chart is shown in FIG. 1, comprises the following steps:
Step S01, acquiring data source information and target information;
Step S02, slicing the data according to the correlation characteristics of the data;
Step S03, configuring a dynamic window according to the performance indices of the server cluster;
Step S04, generating a parallel handling queue from the data fragments;
Step S05, handling the data fragments in the parallel handling queue according to the configured dynamic window value.
In an exemplary embodiment, the data source information and the target information include any one or a combination of the type, address, port, user and password information of the data source and the target; the types include relational databases, big data platforms, file servers and message queues. Relational and non-relational data sources mainly include MySQL, Oracle, SQL Server, PostgreSQL, Hive, HDFS, MongoDB, GBase, Kingbase and the like; for a relational database, the configured data source information mainly comprises the IP address, port, user name and password. Semi-structured data sources include TXT, CSV, Excel and Access files; for these, the configured information mainly comprises the communication protocol type (FTP/SFTP), the communication protocol information (IP, port, user, password, file path) and the data file information (such as the start row, start column and content separator). Acquisition through a user-defined script requires uploading a script file: a Java script must additionally be configured with a package name, class name and method name, while JavaScript and Python scripts must define an entry method within the script. Real-time data sources mainly include Kafka; this type of data source requires configuring the service address, the topic and consumer group ID, the key and value decoders, and so on.
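The per-type configuration described above can be captured as typed records. The class and field names below are hypothetical and simply mirror the items listed in the text; `validate` shows two illustrative checks only.

```python
from dataclasses import dataclass


@dataclass
class RelationalSource:          # MySQL, Oracle, PostgreSQL, Hive, ...
    db_type: str
    ip: str
    port: int
    user: str
    password: str


@dataclass
class FileSource:                # TXT, CSV, Excel, Access over FTP/SFTP
    protocol: str                # "ftp" or "sftp"
    ip: str
    port: int
    user: str
    password: str
    file_path: str
    start_row: int = 1
    start_column: int = 1
    separator: str = ","


@dataclass
class KafkaSource:
    service_address: str
    topic: str
    group_id: str                # consumer group ID
    key_decoder: str = "string"
    value_decoder: str = "string"


def validate(src) -> None:
    # Illustrative checks; raise ValueError on an obviously bad configuration.
    if isinstance(src, FileSource) and src.protocol not in ("ftp", "sftp"):
        raise ValueError(f"unsupported protocol: {src.protocol}")
    if hasattr(src, "port") and not 0 < src.port < 65536:
        raise ValueError(f"invalid port: {src.port}")
```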
In an exemplary embodiment, step S02 of slicing the data according to the correlation characteristics of the data, whose flow chart is shown in FIG. 2, includes:
step S021, calculating the attribute similarity between data from the similarity of data length and/or data type and/or data time requirement;
step S022, calculating the continuity between data from the content continuity and/or sequence number continuity and/or time continuity of the data;
step S023, calculating the lineage similarity between data from the similarity of their lineage relationships;
step S024, calculating a data relevance value from the attribute similarity and/or continuity and/or lineage similarity between the data;
step S025, grouping data whose data relevance value exceeds a preset relevance threshold into the same data fragment, where the relevance threshold is calculated from the data volume and the queue bearing capacity.
In this embodiment, the attribute similarity between data is positively correlated with whichever of the following component similarities are used, individually or in any combination: the data length similarity (evaluated from the ratio of the length difference to the data length), the data type similarity (evaluated from the consistency or overlap proportion of the data formats and/or data structures), and the data time requirement similarity (evaluated from the difference between the time requirements of the data; the smaller the difference, the higher the similarity). The attribute similarity between data is represented by the variable e.
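A sketch of the attribute similarity e, assuming concrete formulas that satisfy the stated relations: length similarity from the ratio of the length difference to the longer length, type similarity as the overlap (Jaccard index) of field-name sets, and an exponential decay in the time-requirement difference. The equal-weight mean, the `scale` parameter and the dict keys are assumptions.

```python
import math


def length_similarity(len_a: int, len_b: int) -> float:
    # 1 minus the ratio of the length difference to the longer length.
    longest = max(len_a, len_b)
    return 1.0 if longest == 0 else 1.0 - abs(len_a - len_b) / longest


def type_similarity(fields_a: set, fields_b: set) -> float:
    # Overlap proportion of the two data structures (Jaccard index of field names).
    union = fields_a | fields_b
    return 1.0 if not union else len(fields_a & fields_b) / len(union)


def time_similarity(deadline_a: float, deadline_b: float, scale: float = 60.0) -> float:
    # Smaller difference between time requirements -> higher similarity.
    return math.exp(-abs(deadline_a - deadline_b) / scale)


def attribute_similarity(a: dict, b: dict) -> float:
    # e: hypothetical equal-weight mean of the three component similarities.
    parts = [
        length_similarity(a["length"], b["length"]),
        type_similarity(a["fields"], b["fields"]),
        time_similarity(a["deadline"], b["deadline"]),
    ]
    return sum(parts) / len(parts)
```

Using only one or two of the components, as the text allows, means averaging over a shorter `parts` list.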
The continuity between data is likewise positively correlated with whichever of the following are used, individually or in any combination: the content continuity (evaluated from the semantic relevance and/or the continuity of semantic flow between the data), the sequence number continuity (evaluated from the difference between the sequence numbers and/or the consistency of the data labels), and the time continuity (evaluated from the order and/or the difference of the timestamps). The continuity between data is represented by the variable d.
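A sketch of the continuity d covering the sequence-number and time components; content (semantic) continuity would need a text model and is left out. The gap-based scores, the decay constant `tau` and the record fields (`seq`, `label`, `ts`) are assumptions.

```python
import math


def sequence_continuity(seq_a: int, seq_b: int) -> float:
    # Adjacent (or equal) sequence numbers score 1.0; larger gaps score lower.
    gap = abs(seq_a - seq_b)
    return 1.0 if gap <= 1 else 1.0 / gap


def label_consistency(label_a: str, label_b: str) -> float:
    return 1.0 if label_a == label_b else 0.0


def time_continuity(ts_a: float, ts_b: float, tau: float = 5.0) -> float:
    # Timestamps close together count as continuous; the score decays with the gap.
    return math.exp(-abs(ts_a - ts_b) / tau)


def continuity(a: dict, b: dict) -> float:
    # d: hypothetical mean of sequence-number continuity (gap times label
    # consistency) and time continuity.
    seq = sequence_continuity(a["seq"], b["seq"]) * label_consistency(a["label"], b["label"])
    return (seq + time_continuity(a["ts"], b["ts"])) / 2
```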
The lineage similarity between data is calculated from the similarity of their lineage relationships, which is evaluated from the proportion of identical nodes in the lineage chains of the data; the lineage similarity is positively correlated with the similarity of the lineage relationships and is represented by the variable y.
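Reading "the proportion of identical nodes" as the Jaccard index of the two chains' node sets gives a short sketch of y; other proportions (for example, relative to the shorter chain) would also fit the description. The function name is hypothetical.

```python
def lineage_similarity(chain_a: list[str], chain_b: list[str]) -> float:
    # y: share of nodes common to both lineage chains (Jaccard index of node sets).
    nodes_a, nodes_b = set(chain_a), set(chain_b)
    union = nodes_a | nodes_b
    return 0.0 if not union else len(nodes_a & nodes_b) / len(union)
```

Two chains that share their first two of three nodes each give y = 2 / 4 = 0.5.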
The data relevance value is calculated from the positive correlation between the data relevance value and the attribute similarity and/or continuity and/or lineage similarity between different data, and is represented by the variable x. In Table A, A1 to A7 denote different embodiments for calculating the data relevance value. For convenience, x in Table A denotes the data relevance value between two data, and the attribute similarity e, the continuity d and the lineage similarity y between the data are calculated by the method of any of the embodiments above.
Table A: Different embodiments of calculating the data relevance value
The relevance threshold is calculated in advance from the volume of data to be handled and the queue carrying capacity. The larger the data volume, the more data each data fragment holds, so the similarity requirement on the data is lower and the relevance threshold is smaller; that is, the relevance threshold is negatively correlated with the data volume. Likewise, the stronger the queue carrying capacity (evaluated from the load capacity and/or transmission capacity of the servers and/or the network environment), the more data each fragment holds, the lower the similarity requirement and the smaller the relevance threshold; that is, the relevance threshold is negatively correlated with the queue carrying capacity. In this embodiment, a functional relation between the data volume, the queue carrying capacity and the relevance threshold is obtained by training on a large amount of data under these correlations, and the current relevance threshold X = 0.8 is calculated from it. The data relevance value x of two data is calculated by any mode in Table A; if x > X, the two data are assigned to the same data fragment. The relevance value between every pair of data is calculated and compared with the threshold to complete the data slicing.
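A minimal sketch of the pairwise slicing step, assuming x is a weighted sum of e, d and y (the weights are hypothetical and Table A's actual formulas are not reproduced). Pairs with x > X are merged with a union-find structure, so each fragment is a connected component of the "x > X" graph; how non-transitive pairs should be resolved is not stated in the text, so this grouping rule is an assumption.

```python
from itertools import combinations


def relevance(e: float, d: float, y: float, weights=(0.4, 0.3, 0.3)) -> float:
    """Hypothetical weighted sum; x rises with each of e, d and y."""
    w_e, w_d, w_y = weights
    return w_e * e + w_d * d + w_y * y


def slice_data(items, pair_relevance, threshold=0.8):
    """Group items whose pairwise relevance x exceeds the threshold X."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Compare every pair once and merge the pair's groups when x > X
    for i, j in combinations(range(len(items)), 2):
        if pair_relevance(items[i], items[j]) > threshold:
            parent[find(i)] = find(j)

    # Collect items by their group root to form the data fragments
    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())
```

The pairwise loop is O(n²) comparisons, matching the "every two data" description; for large inputs a blocking or indexing step would normally be needed first.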
In an exemplary embodiment, step S03, configuring a dynamic window according to the performance indexes of the server cluster, is shown in the flowchart of Fig. 3 and includes:
step S031, calculating a parallel service capability evaluation value according to the number of server clusters and/or the network environment;
step S032, calculating a load capacity evaluation value according to CPU utilization rate and/or memory occupancy rate and/or disk read-write speed of the server;
step S033, calculating a historical transmission efficiency evaluation value according to the historical transmission speed and/or the historical failure rate and/or the historical memory utilization rate of the server cluster;
step S034, calculating a window indication value according to the correlation between the parallel service capability evaluation value and/or the load capacity evaluation value and/or the historical transmission efficiency evaluation value and the window indication value;
step S035, the dynamic window is configured according to the calculated window indication value.
In this embodiment, the parallel service capability evaluation value, represented by a variable m, is calculated from the number of server clusters and/or the network environment in any one of the following ways: from the positive correlation between the number of server clusters and m; from the positive correlation between the smoothness of the network environment (calculated from the network speed and/or network congestion time) and m; or from the positive correlation of both the number of server clusters and the network smoothness with m.
The load capacity evaluation value, represented by a variable n, is calculated from the CPU utilization and/or memory occupancy and/or disk read/write speed of the server: n is negatively correlated with the CPU utilization, negatively correlated with the memory occupancy and positively correlated with the disk read/write speed, and is calculated from any one of these relations or from the product or weighted sum of any two or all three.
The historical transmission efficiency evaluation value, represented by a variable p, is calculated from the historical transmission speed and/or historical failure rate and/or historical memory utilization of the server cluster: p is positively correlated with the historical transmission speed, negatively correlated with the historical failure rate and positively correlated with the historical memory utilization, and is calculated from any one of these relations or from the product or weighted sum of any two or all three.
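The three evaluation values can be sketched as below, using the weighted-sum option for n and p. The smoothness formula, the weights and the reference speeds used to normalise disk and transfer speed are all hypothetical; only the direction of each correlation comes from the text.

```python
def parallel_capability(num_servers: int, net_speed_mbps: float, congestion_s: float) -> float:
    """m: rises with the server count and with network smoothness.

    Smoothness = speed / (1 + congestion time) is an assumed stand-in.
    """
    smoothness = net_speed_mbps / (1.0 + congestion_s)
    return num_servers * smoothness


def load_capacity(cpu_util: float, mem_util: float, disk_mbps: float,
                  weights=(0.4, 0.4, 0.2), disk_ref_mbps: float = 500.0) -> float:
    """n: falls with CPU and memory use (fractions in [0, 1]), rises with disk speed."""
    w_cpu, w_mem, w_disk = weights
    disk_score = min(disk_mbps / disk_ref_mbps, 1.0)  # normalised to [0, 1]
    return w_cpu * (1 - cpu_util) + w_mem * (1 - mem_util) + w_disk * disk_score


def history_efficiency(speed_mbps: float, failure_rate: float, mem_util: float,
                       weights=(0.5, 0.3, 0.2), speed_ref_mbps: float = 100.0) -> float:
    """p: rises with historical speed, falls with historical failure rate and
    rises with historical memory utilization (the direction stated in the text)."""
    w_speed, w_fail, w_mem = weights
    speed_score = min(speed_mbps / speed_ref_mbps, 1.0)  # normalised to [0, 1]
    return w_speed * speed_score + w_fail * (1 - failure_rate) + w_mem * mem_util
```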
The window indication value, represented by a variable w, is calculated from the positive correlation between the parallel service capability evaluation value and/or the load capacity evaluation value and/or the historical transmission efficiency evaluation value and w.
In Table B, B1 to B7 represent different embodiments of calculating the window indication value. For convenience, w in Table B denotes the window indication value of a given server, and the parallel service capability evaluation value m, load capacity evaluation value n and historical transmission efficiency evaluation value p are calculated by the method of any of the embodiments above.
Table B: Different embodiments of calculating the window indication value
The window indication value w is calculated by any mode in Table B, the dynamic window value is obtained from a preset conversion relation between the window indication value and the dynamic window value, and the dynamic window of each server is configured accordingly (the window is the total resources the server occupies for the handling service).
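One possible reading of this step, sketched under assumptions: w is taken as the product m·n·p (one form consistent with the stated positive correlations), and the "preset conversion relation" is modelled as a scale factor followed by rounding down and clamping to hypothetical bounds.

```python
def window_indicator(m: float, n: float, p: float) -> float:
    """w as a product, so it rises with each of m, n and p (assumed form)."""
    return m * n * p


def to_dynamic_window(w: float, scale: float = 1.0,
                      min_window: int = 1, max_window: int = 64) -> int:
    """Hypothetical conversion: scale w, round down (w >= 0), clamp to [min_window, max_window]."""
    return max(min_window, min(max_window, int(w * scale)))
```

Clamping keeps a single very fast server from claiming an unbounded window and guarantees every server at least one slot.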
In an exemplary embodiment, step S03 is followed by step S04', filtering the data according to preset data quality inspection rules; the flowchart is shown in Fig. 4, and the step includes:
step S04'1, identifying erroneous data in the data fragments according to a preset data format correctness rule;
step S04'2, identifying repeated data in the data fragments according to a preset data repetition rule;
step S04'3, identifying incomplete data in the data fragments according to a preset data integrity rule;
step S04'4, deleting the erroneous data, repeated data and incomplete data from the data fragments.
In this embodiment, erroneous data in the data fragments are identified by the preset data format correctness rule, repeated data by the preset data repetition rule (only one copy of each group of repeated data is retained), and incomplete data by the preset data integrity rule; the erroneous, repeated and incomplete data are then deleted from the data fragments.
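The three quality rules can be sketched as a single pass over a fragment, assuming records are flat dictionaries with hashable values. The field names and validator functions are hypothetical placeholders for the "preset" rules; the first copy of each duplicate is kept, as the text specifies.

```python
from typing import Callable, Iterable


def quality_filter(records: Iterable[dict],
                   required: list[str],
                   validators: dict[str, Callable[[str], bool]]) -> list[dict]:
    kept, seen = [], set()
    for rec in records:
        # Integrity rule (S04'3): every required field present and non-empty
        if any(rec.get(field) in (None, "") for field in required):
            continue
        # Format-correctness rule (S04'1): each validated field must pass its check
        if not all(check(rec[field]) for field, check in validators.items() if field in rec):
            continue
        # Repetition rule (S04'2): keep only the first copy of identical records
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```

For example, with `required=["id", "email"]` and an email validator that checks for `"@"`, a duplicated valid record is kept once, while a record with a malformed email or an empty ID is dropped.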
In an exemplary embodiment, step S03 is followed by step S04'', screening incremental data according to a data increment determination rule: the target library is searched by the content ID in the data fragment to determine whether the content already exists there; if it does, the content is deleted from the data fragment, otherwise it is retained, thereby implementing incremental handling.
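A minimal sketch of the increment rule. The lookup is passed in as a callable so that it can stand for a query against the target library; here it is assumed, hypothetically, that each content carries an `"id"` key.

```python
from typing import Callable


def incremental_filter(fragment: list[dict],
                       exists_in_target: Callable[[str], bool]) -> list[dict]:
    """Drop contents whose ID already exists in the target library; keep the rest."""
    return [content for content in fragment if not exists_in_target(content["id"])]
```

In a test, `exists_in_target` can simply be `set.__contains__` of a set of known IDs; against a real database, looking up IDs in batches would avoid one round trip per content.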
In an exemplary embodiment, step S03 is further followed by step S04''', data type conversion: the content types of the filtered and screened data fragments are converted according to the data type mapping between the data source and the target library. The data type conversion may also be performed during handling.
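The type mapping can be sketched as a lookup table applied to column types. The table below is illustrative only (MySQL-style source names mapped to PostgreSQL-style target names); a real deployment would need the mapping for its own source and target libraries. Unmapped types are passed through unchanged, which assumes they are valid in both systems.

```python
# Illustrative source-to-target type mapping (MySQL-style -> PostgreSQL-style names)
TYPE_MAP = {
    "TINYINT(1)": "BOOLEAN",
    "DATETIME": "TIMESTAMP",
    "DOUBLE": "DOUBLE PRECISION",
    "LONGTEXT": "TEXT",
}


def convert_schema(columns: dict[str, str],
                   type_map: dict[str, str] = TYPE_MAP) -> dict[str, str]:
    """Map each column's source type to its target type; unmapped types are kept as-is."""
    return {name: type_map.get(src_type.upper(), src_type)
            for name, src_type in columns.items()}
```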
In an exemplary embodiment, step S04, generating parallel handling queues according to the data fragments, is shown in the flowchart of Fig. 5 and includes:
step S041, calculating the queue priority of the different contents in each data fragment according to their time requirements and/or storage amounts and/or retransmission flags;
step S042, arranging the contents of each data fragment in order of queue priority to generate a data fragment queue;
step S043, generating parallel handling queues from the data fragment queues according to the number of server clusters.
In this embodiment, the queue priority of the different contents in each data fragment is calculated from their time requirements and/or storage amounts and/or retransmission flags in any one of the following ways: from the positive correlation between the urgency of the content's time requirement (the nearer the required time, the higher the urgency) and the priority; from the positive correlation between the content's storage amount and the priority; from the content's retransmission flag value (1 for retransmitted content); or from the positive correlation of any two or all three of urgency, storage amount and retransmission flag with the priority. The queue priorities of the contents in each data fragment are calculated by any of these methods, and the contents are arranged in priority order to generate the data fragment queue.
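The three-factor option can be sketched as a weighted sum. The urgency formula 1/(1 + seconds to deadline) and the weights are hypothetical; the sketch only preserves the stated directions (more urgent, larger and retransmitted contents rank higher).

```python
from dataclasses import dataclass


@dataclass
class Content:
    name: str
    seconds_to_deadline: float  # time requirement: smaller means more urgent
    size_bytes: int             # storage amount
    retransmit: int = 0         # retransmission flag: 1 if the content is being re-sent


def queue_priority(c: Content, weights=(1.0, 0.001, 0.5)) -> float:
    """Weighted sum rising with urgency, storage amount and the retransmission flag."""
    w_urgency, w_size, w_retx = weights
    urgency = 1.0 / (1.0 + max(c.seconds_to_deadline, 0.0))
    return w_urgency * urgency + w_size * c.size_bytes + w_retx * c.retransmit


def fragment_queue(fragment: list[Content]) -> list[Content]:
    """Order one data fragment's contents by descending queue priority."""
    return sorted(fragment, key=queue_priority, reverse=True)
```

With these weights, a retransmitted content outranks a slightly more urgent first-time content of the same size; the weights decide such trade-offs and would need tuning.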
The parallel handling queues are generated from the data fragment queues according to the number of server clusters: the number of parallel queues is calculated from the number of server clusters and the concurrency degree, and with the initial concurrency f = 1 it equals the number of server clusters.
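A sketch of step S043 under two assumptions not stated in the text: the number of parallel queues is the server count multiplied by the concurrency f (which reduces to the server count when f = 1, as described), and fragment queues are spread over them round-robin.

```python
def parallel_queues(fragment_queues: list, num_servers: int, concurrency: int = 1) -> list[list]:
    """Spread fragment queues round-robin over num_servers * concurrency parallel queues."""
    lines = num_servers * concurrency
    result = [[] for _ in range(lines)]
    for i, fq in enumerate(fragment_queues):
        result[i % lines].append(fq)
    return result
```

Round-robin keeps the number of fragment queues per parallel queue balanced to within one; it ignores fragment size, which a size-aware assignment could take into account.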
In an exemplary embodiment, step S04 is further followed by step S05', adjusting the concurrency according to the handling queue length and the dynamic window value; the flowchart is shown in Fig. 6, and the step includes:
step S05'1, calculating the idleness from the match between the length of the handling queue assigned to a server and its dynamic window value;
step S05'2, if the idleness is greater than a preset resource idleness threshold, calculating the concurrency of the server from the idleness and increasing the number of parallel handling queues.
In this embodiment, the larger the gap between the dynamic window value and the handling queue length, the greater the idleness. An idleness function is trained in advance on this matching relationship, and the idleness q of the current server is calculated from it.
A resource idleness threshold is set in advance according to the load capacity and handling capacity of the server. If the idleness exceeds this threshold, the concurrency of the server is calculated from the idleness and the number of parallel handling queues is increased. The greater the idleness, the greater the concurrency: a function relating concurrency to idleness is obtained by training on data under this positive correlation, the concurrency f of the current server is calculated from the current idleness q, and f is rounded down to give the server's new concurrency.
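The text uses trained functions for both idleness and concurrency; the sketch below substitutes simple hypothetical stand-ins (the unused fraction of the window for q, and a linear gain for f) so that the control flow of step S05' can be seen. The threshold and gain values are assumptions.

```python
import math


def idleness(window: int, queue_len: int) -> float:
    """Stand-in for the trained idleness function: unused fraction of the window."""
    return max(window - queue_len, 0) / window


def adjust_concurrency(current_f: int, window: int, queue_len: int,
                       idle_threshold: float = 0.5, gain: float = 4.0) -> int:
    """Raise concurrency only when idleness q exceeds the threshold; result is rounded down."""
    q = idleness(window, queue_len)
    if q <= idle_threshold:
        return current_f
    # Stand-in for the trained concurrency function: increases with q
    return max(current_f, math.floor(current_f + gain * q))
```

For a window of 64 with 16 queued items, q = 0.75 and the concurrency rises from 1 to 4; with 48 queued items, q = 0.25 and it stays at 1.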
In an exemplary embodiment, step S05, handling the data fragments in the parallel handling queues according to the configured dynamic window values, is shown in the flowchart of Fig. 7 and includes:
step S051, calculating a dynamic window value corresponding to each server in the server cluster;
step S052, allocating handling queues to each server according to its dynamic window value and concurrency, and performing the data handling.
In this embodiment, the dynamic window value of each server in the server cluster is calculated by the method of the embodiment above, handling queues are allocated to each server according to its dynamic window value and concurrency (i.e., the current number of parallel servers), and the data handling is performed.
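A sketch of one possible allocation rule, assuming each server's capacity is its dynamic window value multiplied by its concurrency (an assumption; the text does not give the formula). Items are assigned in queue order to whichever server currently has the lowest load relative to its capacity, using a heap; server names are hypothetical, and window and concurrency values are assumed to be at least 1.

```python
import heapq


def allocate(items: list, servers: dict[str, tuple[int, int]]) -> dict[str, list]:
    """servers maps name -> (dynamic window value, concurrency); returns name -> items."""
    assignment = {name: [] for name in servers}
    heap = [(0.0, name) for name in servers]  # (load / capacity, server name)
    heapq.heapify(heap)
    for item in items:
        _, name = heapq.heappop(heap)  # least-loaded server relative to capacity
        assignment[name].append(item)
        window, concurrency = servers[name]
        capacity = window * concurrency
        heapq.heappush(heap, (len(assignment[name]) / capacity, name))
    return assignment
```

For example, with server A at window 2 and server B at window 1 (both at concurrency 1), three items split two to A and one to B.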
In an exemplary embodiment, the method further includes step S06, resuming the transmission of failed data from the breakpoint; the flowchart is shown in Fig. 8, and the step includes:
step S061, when data handling fails, feeding back the ID or time stamp of the failed data to the server;
step S062, identifying the breakpoint position of the data in the handling queue according to the continuity of the data IDs and/or time stamps;
step S063, retransmitting the data according to its ID and breakpoint position, and placing it back at the breakpoint position of the handling queue.
In this embodiment, when the data handling fails, an ID or a time stamp of the handling failure data is fed back to the server, the breakpoint position of the data in the handling queue is identified according to the continuity of the data ID and/or the time stamp, the data is retransmitted according to the data ID and the breakpoint position of the data in the handling queue, and the data is placed at the breakpoint position of the handling queue.
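A minimal sketch of breakpoint detection by ID continuity, assuming (hypothetically) that contents in a handling queue carry consecutive integer IDs and that the server keeps the set of acknowledged IDs. The breakpoint is the first ID, counting up from the queue's start, that has not been acknowledged; resumption re-sends from that point while skipping items that were already confirmed.

```python
def find_breakpoint(acked_ids: set[int], first_id: int) -> int:
    """First ID, counting up from first_id, that breaks the run of acknowledged IDs."""
    current = first_id
    while current in acked_ids:
        current += 1
    return current


def resume_from(queue: list[int], acked_ids: set[int], breakpoint_id: int) -> list[int]:
    """Items to re-send: everything from the breakpoint onward not already acknowledged."""
    start = queue.index(breakpoint_id)
    return [i for i in queue[start:] if i not in acked_ids]
```

If IDs 1, 2, 3 and 5 are acknowledged, the breakpoint is 4 and the items re-sent from a queue of IDs 1 to 6 are 4 and 6.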
In an exemplary embodiment, the method further includes step S07, recording data transmission process data and determining the data lineage; the flowchart is shown in Fig. 9, and the step includes:
step S071, recording the data transmission speed and data flow during transmission;
step S072, recording the data source and target library of each handling;
step S073, forming a data lineage chain from the transmission process data together with the data source and target library of each handling.
The method further comprises monitoring and recording the read/write speed and flow of the data and its lineage chain during handling, and, after handling is complete, generating a handling report from the amount of data handled, the handling time and the recorded lineage chain.
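The recording and reporting steps can be sketched with a small log class. Each transfer is stored as a hypothetical (source, target, bytes, seconds) tuple; the lineage chain is recovered by following source-to-target edges, assuming each node has at most one downstream target, and the report totals bytes and time and derives the average speed.

```python
from dataclasses import dataclass, field


@dataclass
class HandlingLog:
    # Each entry: (source, target, bytes transferred, seconds taken)
    transfers: list[tuple[str, str, int, float]] = field(default_factory=list)

    def record(self, source: str, target: str, nbytes: int, seconds: float) -> None:
        self.transfers.append((source, target, nbytes, seconds))

    def lineage_chain(self, start: str) -> list[str]:
        """Follow source -> target edges from start; stops at the end or on a cycle."""
        edges = {src: dst for src, dst, _, _ in self.transfers}
        chain, node = [start], start
        while node in edges and edges[node] not in chain:
            node = edges[node]
            chain.append(node)
        return chain

    def report(self) -> dict:
        total_bytes = sum(t[2] for t in self.transfers)
        total_seconds = sum(t[3] for t in self.transfers)
        return {
            "bytes": total_bytes,
            "seconds": total_seconds,
            "avg_speed_bps": total_bytes / total_seconds if total_seconds else 0.0,
        }
```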
A computer-readable storage medium of an embodiment of the present invention stores a computer program for electronic data exchange, wherein the computer program causes a computer to execute the above-described method.
A data handling system based on dynamic window according to an embodiment of the present invention, a schematic structural diagram is shown in FIG. 10, includes:
a server cluster;
a processor;
a memory;
and
one or more programs, wherein the one or more programs are stored in a memory and configured to be executed by the processor, the programs causing a computer to perform the above-described method.
Of course, those skilled in the art will recognize that the above embodiments are merely illustrative of the present invention and not intended to be limiting, and that changes and modifications of the above embodiments are within the scope of the present invention.

Claims (14)

1. A method of dynamic window based data handling, comprising:
acquiring data source information and target information;
performing data slicing according to the correlation characteristics of the data, the correlation characteristics of the data comprising any one or more of the data attributes, data continuity or data lineage relationships;
configuring a dynamic window according to the performance index of the server cluster; the performance index of the server cluster comprises any one or more of the characteristics of the server cluster or the load of the server cluster or the historical transmission information of the server cluster;
generating parallel handling queues according to the data fragments;
and handling the data fragments in the parallel handling queues according to the configured dynamic window values.
2. The dynamic window based data handling method of claim 1, wherein the data source information and destination information comprises any one or more of a combination of type information, address information, port information, user information, password information of the data source and destination; the type information comprises a relational database, a big data platform, a file server and a message queue.
3. The dynamic window based data handling method of claim 1, wherein the data slicing according to the correlation characteristic of the data comprises the steps of:
calculating attribute similarity between data according to the similarity of the data lengths, or according to the similarity of the data types, or according to the similarity of the data time requirements, or according to the similarity of the data lengths and the data types, or according to the similarity of the data lengths and the data time requirements, or according to the similarity of the data types and the data time requirements, or according to the similarity of the data lengths, the data types and the data time requirements;
Calculating continuity between data according to content continuity of data, or calculating continuity between data according to sequence number continuity of data, or calculating continuity between data according to content continuity of data and time continuity of data, or calculating continuity between data according to sequence number continuity of data and time continuity of data, or calculating continuity between data according to content continuity of data and sequence number continuity of data and time continuity of data;
calculating the lineage similarity between data according to the similarity of their data lineage relationships;
calculating a data relevance value according to the attribute similarity between the data, or according to the continuity between the data, or according to the lineage similarity between the data, or according to the attribute similarity and the continuity between the data, or according to the attribute similarity and the lineage similarity between the data, or according to the continuity and the lineage similarity between the data, or according to the attribute similarity, the continuity and the lineage similarity between the data;
Taking the data with the data relevance value larger than a preset relevance threshold value as a data fragment; the relevance threshold is calculated according to the data volume and the queue bearing capacity.
4. The dynamic window-based data handling method according to claim 1, wherein the configuring the dynamic window according to the performance index of the server cluster comprises the steps of:
calculating parallel service capability evaluation values according to the number of server clusters, or calculating parallel service capability evaluation values according to the network environments of the server clusters, or calculating parallel service capability evaluation values according to the number of server clusters and the network environments of the server clusters;
calculating a load capacity evaluation value according to the CPU utilization rate of the server, or calculating a load capacity evaluation value according to the memory occupancy rate of the server, or calculating a load capacity evaluation value according to the CPU utilization rate of the server and the disk read-write speed of the server, or calculating a load capacity evaluation value according to the memory occupancy rate of the server and the disk read-write speed of the server, or calculating a load capacity evaluation value according to the CPU utilization rate of the server and the memory occupancy rate of the server and the disk read-write speed of the server;
Calculating a historical transmission efficiency evaluation value according to the historical transmission speed of the server cluster, or calculating a historical transmission efficiency evaluation value according to the historical failure rate of the server cluster, or calculating a historical transmission efficiency evaluation value according to the historical transmission speed of the server cluster and the historical memory utilization rate of the server cluster, or calculating a historical transmission efficiency evaluation value according to the historical failure rate of the server cluster and the historical memory utilization rate of the server cluster, or calculating a historical transmission efficiency evaluation value according to the historical transmission speed of the server cluster and the historical failure rate of the server cluster and the historical memory utilization rate of the server cluster;
calculating a window indication value according to the correlation between the parallel service capability evaluation value and the window indication value, or according to the correlation between the load capacity evaluation value and the window indication value, or according to the correlation of the parallel service capability evaluation value and the historical transmission efficiency evaluation value with the window indication value, or according to the correlation of the load capacity evaluation value and the historical transmission efficiency evaluation value with the window indication value, or according to the correlation of the parallel service capability evaluation value, the load capacity evaluation value and the historical transmission efficiency evaluation value with the window indication value;
And configuring a dynamic window according to the calculated window indication value.
5. The dynamic window based data handling method of claim 1, wherein, after the step of configuring a dynamic window according to the performance index of the server cluster, the method further comprises the step of filtering data according to preset data quality inspection rules, including:
identifying error data in the data fragments according to a preset data format correctness rule;
identifying repeated data in the data fragments according to a preset data repetition rule;
identifying incomplete data in the data fragments according to a preset data integrity rule;
and deleting the error data, the repeated data and the incomplete data in the data fragments.
6. The dynamic window based data handling method of claim 1, wherein, after the step of configuring a dynamic window according to the performance index of the server cluster, the method further comprises the step of screening incremental data according to a data increment determination rule: searching the target library by the content ID in the data fragment to determine whether the content already exists in the target library; if so, deleting the content from the data fragment, otherwise retaining the content in the data fragment.
7. The dynamic window based data handling method of claim 1, wherein, after the step of configuring a dynamic window according to the performance index of the server cluster, the method further comprises the step of data type conversion: converting the content types of the filtered and screened data fragments according to the data type mapping between the data source and the target library.
8. The dynamic window based data handling method as claimed in claim 1, wherein the generating parallel handling queues according to the data slices comprises the steps of:
calculating the queue priority of different contents according to the time requirement of the different contents in each data fragment, or according to their storage amount, or according to their retransmission identification, or according to their time requirement and storage amount, or according to their time requirement and retransmission identification, or according to their storage amount and retransmission identification, or according to their time requirement, storage amount and retransmission identification;
Sequentially arranging the content according to the queue priority orders of different contents in each data fragment to generate a data fragment queue;
and generating parallel handling queues from the data fragment queues according to the number of server clusters.
9. The dynamic window based data handling method of claim 1, wherein, after the step of generating the parallel handling queues according to the data fragments, the method further comprises the step of adjusting the concurrency according to the handling queue length and the dynamic window value, including:
calculating the idleness from the match between the length of the handling queue assigned to a server and its dynamic window value;
if the idleness is greater than a preset resource idleness threshold, calculating the concurrency of the server from the idleness and increasing the number of parallel handling queues.
10. The dynamic window based data handling method as claimed in claim 9, wherein said handling data slices in parallel handling queues according to configured dynamic window values comprises the steps of:
calculating a dynamic window value corresponding to each server in the server cluster;
and allocating handling queues to each server according to its dynamic window value and concurrency, and performing the data handling.
11. The dynamic window based data handling method of claim 1, further comprising the step of resuming the transmission of failed data from the breakpoint, including:
when data handling fails, feeding back the ID or time stamp of the failed data to the server;
identifying a breakpoint position of the data in the handling queue according to the continuity of the data ID, or identifying a breakpoint position of the data in the handling queue according to the continuity of the time stamp, or identifying a breakpoint position of the data in the handling queue according to the continuity of the data ID and the time stamp;
and retransmitting the data according to the data ID and the breakpoint position of the data in the handling queue and placing the data at the breakpoint position of the handling queue.
12. The dynamic window based data handling method of claim 1, further comprising the step of recording data transmission process data and determining the data lineage relationships, including:
recording the speed and the data flow of data transmission in the data transmission process;
recording a data source and a target library of each data handling;
and forming a data lineage chain from the data transmission process data and the data source and target library of each handling.
13. A computer readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method of any one of claims 1-12.
14. A dynamic window based data handling system, comprising:
a server cluster;
a processor;
a memory;
and
one or more programs, wherein the one or more programs are stored in a memory and configured to be executed by the processor, the programs causing a computer to perform the method of any of claims 1-12.
CN202310915713.XA 2023-07-25 2023-07-25 Data handling method, system and readable storage medium based on dynamic window Pending CN116628068A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310915713.XA CN116628068A (en) 2023-07-25 2023-07-25 Data handling method, system and readable storage medium based on dynamic window

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310915713.XA CN116628068A (en) 2023-07-25 2023-07-25 Data handling method, system and readable storage medium based on dynamic window

Publications (1)

Publication Number Publication Date
CN116628068A true CN116628068A (en) 2023-08-22

Family

ID=87603132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310915713.XA Pending CN116628068A (en) 2023-07-25 2023-07-25 Data handling method, system and readable storage medium based on dynamic window

Country Status (1)

Country Link
CN (1) CN116628068A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117785332A (en) * 2024-02-28 2024-03-29 国维技术有限公司 Virtual three-dimensional space dynamic resource loading and releasing method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718507A (en) * 2016-01-06 2016-06-29 杭州数梦工场科技有限公司 Data migration method and device
CN109144731A (en) * 2018-08-31 2019-01-04 中国平安人寿保险股份有限公司 Data processing method, device, computer equipment and storage medium
CN111831625A (en) * 2020-07-14 2020-10-27 深圳力维智联技术有限公司 Data migration method, data migration device and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Hui et al.: "Database System Principles and MySQL Application Tutorial, 2nd Edition" (《数据库系统原理及MySQL应用教程第2版》), China Machine Press, pages: 310 - 314 *

Similar Documents

Publication Publication Date Title
US11061900B2 (en) Temporal optimization of data operations using distributed search and server management
US8554738B2 (en) Mitigation of obsolescence for archival services
CN112507029B (en) Data processing system and data real-time processing method
CN109669776B (en) Detection task processing method, device and system
CN116628068A (en) Data handling method, system and readable storage medium based on dynamic window
US11429566B2 (en) Approach for a controllable trade-off between cost and availability of indexed data in a cloud log aggregation solution such as splunk or sumo
CN101902505A (en) Distributed DNS inquiry log real-time statistic device and method thereof
US8291054B2 (en) Information processing system, method and program for classifying network nodes
CN106027595A (en) Access log processing method and system for CDN node
US7814165B2 (en) Message classification system and method
CN112019605A (en) Data distribution method and system of data stream
CN116319777A (en) Intelligent gateway service processing method based on edge calculation
CN113535677B (en) Data analysis query management method, device, computer equipment and storage medium
CN112925964A (en) Big data acquisition method based on cloud computing service and big data acquisition service system
CN105468502A (en) Log collection method, device and system
WO2023077815A1 (en) Method and device for processing sensitive data
CN112000657A (en) Data management method, device, server and storage medium
CN116132448A (en) Data distribution method based on artificial intelligence and related equipment
CN111078975B (en) Multi-node incremental data acquisition system and acquisition method
CN112631801B (en) Distributed parallel method for intelligent remote sensing image model
CN109634914B (en) Optimization method for whole storage, dispersion and bifurcation retrieval of talkback voice small files
CN113568966A (en) Data processing method and system used between ODS layer and DW layer
Racka: Apache NiFi As A Tool For Stream Processing Of Measurement Data
SG193013A1 (en) System and method for processing similar emails
CN111753518A (en) Autonomous file consistency checking method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination