CN117555487A - Data splitting method, device, computer equipment and storage medium - Google Patents

Info

Publication number
CN117555487A
Authority
CN
China
Prior art keywords
data
service
record
splitting
physical storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311731330.3A
Other languages
Chinese (zh)
Inventor
熊凯
王东昊
张杜璠
张安琪
王军杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Life Insurance Co ltd
Original Assignee
China Life Insurance Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Life Insurance Co ltd
Priority to CN202311731330.3A
Publication of CN117555487A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to a data splitting method, apparatus, computer device, storage medium and computer program product. The method comprises the following steps: acquiring each service record; for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data; determining a data splitting range based on a preset data splitting algorithm, service records, the physical storage size of each data record and the size of a file to be generated; and carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result. The method improves the data processing efficiency.

Description

Data splitting method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of database technologies, and in particular, to a data splitting method, an apparatus, a computer device, a storage medium, and a computer program product.
Background
In day-to-day database management, data to be exported from a database generally has to be split beforehand so that it satisfies the limits imposed on export files.
In a conventional data splitting method, an ordered data set in the database is split by a preset window function using a fixed splitting threshold to obtain the split data. A report file is then generated from the split data.
However, the distribution of service data is generally irregular, while the conventional method splits the data with a fixed splitting threshold. The conventional method is therefore inflexible, tends to produce too many batches, and results in low data processing efficiency.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data splitting method, apparatus, computer device, computer readable storage medium, and computer program product.
In a first aspect, the present application provides a data splitting method, including:
acquiring each service record;
for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data;
determining a data splitting range based on a preset data splitting algorithm, the service records, the physical storage size of each data record and the size of a file to be generated;
and carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
In one embodiment, the determining, for each service record, the physical storage size of each data record in the service data based on a mapping relationship among a data model related to the service record, service data corresponding to the data model, and a physical storage space occupied by the service data includes:
determining a data model related to each business record;
determining target business data corresponding to the data model in a database based on the mapping relation between the data model and the business data;
and determining the physical storage size of each data record based on the total physical storage size corresponding to the target service data and the number of data entries of the target service data.
In one embodiment, the determining the data splitting range based on the preset data splitting algorithm, the service record, the physical storage size of each data record, and the size of the file to be generated includes:
determining data records contained in each of the business records based on a data model involved in the business records;
determining the physical storage size of each business record based on the physical storage size of each data record;
and determining a data splitting range according to the size of the file to be generated and the physical storage size of the service record.
In one embodiment, the method further includes, after performing data splitting on the service data based on the data splitting range to obtain a data splitting result:
if the physical storage size of the data splitting result is smaller than a preset file storage threshold, determining the data splitting result as a target data splitting result;
and combining the multiple target data splitting results to obtain a combined data splitting result.
In one embodiment, before determining the data splitting range, the method further includes:
acquiring incremental business data in the database according to a preset time period;
updating the corresponding relation between each data model and the service data based on the incremental service data;
and determining the physical storage size of each data record in the service data according to each data model, the corresponding relation between the updated data model and the service data and the physical storage space occupied by the updated service data.
In one embodiment, the method further comprises:
and generating and exporting the report file according to a preset file format rule and a preset export batch based on the data splitting result.
In a second aspect, the present application further provides a data splitting apparatus, the apparatus comprising:
the acquisition module is used for acquiring each service record;
the first determining module is used for determining the physical storage size of each data record in the service data according to the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data;
the second determining module is used for determining a data splitting range based on a preset data splitting algorithm, the service records, the physical storage size of each data record and the size of a file to be generated;
and the splitting module is used for carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
In a third aspect, the present application also provides a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring each service record;
for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data;
determining a data splitting range based on a preset data splitting algorithm, the service records, the physical storage size of each data record and the size of a file to be generated;
and carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
In a fourth aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring each service record;
for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data;
determining a data splitting range based on a preset data splitting algorithm, the service records, the physical storage size of each data record and the size of a file to be generated;
and carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of:
acquiring each service record;
for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data;
determining a data splitting range based on a preset data splitting algorithm, the service records, the physical storage size of each data record and the size of a file to be generated;
and carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
The data splitting method, the data splitting device, the computer equipment, the storage medium and the computer program product acquire each business record; for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data; determining a data splitting range based on a preset data splitting algorithm, the service records, the physical storage size of each data record and the size of a file to be generated; and carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result. By adopting the method, the physical storage size of each data record is determined based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data, so that the data splitting range is dynamically planned based on the physical storage size of each data record, the flexibility of data splitting is improved, and further, the data splitting and reporting are carried out through reasonable data splitting size, and the data processing efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings that are required to be used in the embodiments or the related technical descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to the drawings without inventive effort for a person having ordinary skill in the art.
FIG. 1 is a flow diagram of a method of data splitting in one embodiment;
FIG. 2 is a flow diagram of a method of determining the physical storage size of each data record in one embodiment;
FIG. 3 is a flow chart of the steps for determining a range of data splitting in one embodiment;
FIG. 4 is a flowchart illustrating steps for merging target data split results in one embodiment;
FIG. 5 is a flowchart showing the steps for updating the physical storage size of a data record based on incremental data, in one embodiment;
FIG. 6 is a block diagram of the structure of a data splitting device in one embodiment;
FIG. 7 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, a data splitting method is provided, where this embodiment is applied to a server for illustration, and it is understood that the method may also be applied to a system including a terminal and a server, and implemented through interaction between the terminal and the server. In this embodiment, the method includes the steps of:
Step 102, acquiring each service record.
In an implementation, a database deployed on the server stores various types of business data. When a user needs to perform data backup, data migration, or data analysis, the data must be processed in the database beforehand because the size of an exported file is limited. For example, before business data of various types is exported, it is split in the database in advance, so that the data exported in each batch is not so large that the exported file overflows. In day-to-day business data management, the distribution of business data is irregular, so business records are usually used as the data management dimension for processing such as data splitting. The server therefore acquires each service record according to a preset period.
Step 104, for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data.
In an implementation, splitting the service data requires a data splitting range to be determined in advance. The data splitting range is dynamically optimized along three dimensions: the data models involved in the service records, the service data associated with each data model, and the physical storage space corresponding to the service data. The physical storage size of each data record in the service data currently corresponding to each type of data model is determined first. Specifically, a plurality of data models are stored in the server in advance, for example 49 data models, including a performance attribution model, a cost model, a profit analysis model, and the like. The number and kinds of the data models are not limited in the embodiments of the present application. The data models are used to perform data analysis on the business data of a business transaction. Therefore, for each service record, the server determines the physical storage size of each data record in the service data based on the mapping relation among the data model involved in the service record, the service data corresponding to the data model, and the physical storage space occupied by the service data.
Step 106, determining a data splitting range based on a preset data splitting algorithm, service records, the physical storage size of each data record and the size of a file to be generated.
In an implementation, the server determines the data splitting range based on the preset data splitting algorithm, the service records, the physical storage size of each data record, and the size of the file to be generated. Specifically, data splitting uses the service record as the data management dimension, that is, the splitting is expressed as how many service records each exported file contains. For each service record, the server determines the data models involved in the service record, and then determines the overall physical storage size of the service record from the physical storage size of each data record corresponding to each of those data models. The server then determines, from the size limit of the file to be generated (i.e., the export file), the number of service records that the file can contain, and uses that number as the data splitting range.
Step 108, carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
In an implementation, the server performs data splitting on the service data based on the data splitting range to obtain a data splitting result. For example, if the data splitting range is 10000 business records per file to be generated, the server splits the business data in the database in advance according to that range, obtaining batches of data splitting results that each contain 10000 business records. The data splitting result is used to generate a report file.
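As a minimal sketch of step 108, assume the service records are available as an in-memory list and the data splitting range is a record count per file. Both are illustrative assumptions, and `split_into_batches` is a hypothetical helper name rather than anything defined in the application.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(records: Sequence[T], split_range: int) -> List[Sequence[T]]:
    """Split records into consecutive batches of at most split_range items.

    split_range plays the role of the data splitting range, i.e. the number
    of service records allowed in one file to be generated (assumption).
    """
    if split_range < 1:
        raise ValueError("split_range must be at least 1")
    return [
        records[start:start + split_range]
        for start in range(0, len(records), split_range)
    ]


# Hypothetical input: 25000 service record identifiers, 10000 records per file.
batches = split_into_batches(list(range(25000)), 10000)
batch_sizes = [len(batch) for batch in batches]
```

The last batch may hold fewer records than the splitting range; such leftovers are the kind of small result that the merging embodiment later addresses.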
In the data splitting method, the physical storage size of each data record is determined based on the mapping relation among the data model related in the service records, the service data corresponding to the data model and the physical storage space occupied by the service data, so that the data splitting range is dynamically planned based on the physical storage size of each data record, the flexibility of data splitting is improved, and the data processing efficiency is further improved.
In one exemplary embodiment, as shown in FIG. 2, step 104 includes the following steps 202 through 206. Wherein:
Step 202, for each service record, determining a data model related to the service record.
In an implementation, a plurality of data models are pre-stored in the server. The data models may be stored separately as a data model library, or stored in a target storage area of the database; this is not limited in the embodiments of the present application. One or more data models may be applied in day-to-day business processing. Therefore, for each business record, the server determines, in the data model library, the data models involved in that business record. For example, based on the data model information recorded for a transaction, the server determines that the transaction involves a cost model and a profit analysis model.
Step 204, determining target business data corresponding to the data model in the database based on the mapping relation between the data model and the business data.
In an implementation, a server determines target business data in a database based on a mapping relationship of a data model and business data. Specifically, a mapping relationship between a data model and service data is preset in the database, so that target service data corresponding to each type of data model is determined for each data model related in the service record.
Step 206, determining the physical storage size of each data record based on the total physical storage size corresponding to the target service data and the number of data entries of the target service data.
In an implementation, the server determines the physical storage size of each data record based on the total physical storage size corresponding to the target service data and the number of data entries of the target service data. For example, if the total physical storage size corresponding to the target service data is 100000 GB and the target service data has 100000 data entries, the server determines from these two values that the average physical storage size of each data record is 1 GB.
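The averaging in step 206 can be sketched as follows. This is an illustration under stated assumptions: the `StorageStats` type, the `MODEL_STATS` mapping, and the model name `"cost_model"` are hypothetical stand-ins for the mapping relation among data model, service data, and physical storage space; sizes are held in bytes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

GB = 1024 ** 3  # bytes per gigabyte (binary convention, an assumption)


@dataclass(frozen=True)
class StorageStats:
    total_bytes: int  # physical storage occupied by the target service data
    entry_count: int  # number of data entries (data records)


def average_record_size(stats: StorageStats) -> float:
    """Average physical storage size of one data record, in bytes."""
    if stats.entry_count <= 0:
        raise ValueError("entry_count must be positive")
    return stats.total_bytes / stats.entry_count


# Hypothetical mapping: data model -> storage statistics of its service data.
MODEL_STATS: Dict[str, StorageStats] = {
    "cost_model": StorageStats(total_bytes=100_000 * GB, entry_count=100_000),
}


def record_sizes_for(models: Iterable[str],
                     model_stats: Dict[str, StorageStats]) -> Dict[str, float]:
    """Per-model average record size for the models a service record involves."""
    return {model: average_record_size(model_stats[model]) for model in models}


sizes = record_sizes_for(["cost_model"], MODEL_STATS)
```

With the figures from the example above (100000 GB over 100000 entries), the average comes out at 1 GB per data record.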
In this embodiment, the physical storage size of each data record is determined based on the mapping relationship between the data model and the service data and the mapping relationship between the service data and the physical storage size. The data splitting range is then dynamically optimized based on the physical storage size of each data record at the current moment, which improves the flexibility of data splitting and the data processing efficiency.
In an exemplary embodiment, as shown in FIG. 3, step 106 includes steps 302 through 306. Wherein:
Step 302, the data records contained in each business record are determined based on the data models involved in the business records.
In an implementation, the server determines the data records contained in each business record based on the data models involved in the business record. For example, if a service record involves 3 data models, and a mapping relationship is stored between the data models and the service data (i.e., a collection of data records), the server determines, based on the mapping relationship, that the service record contains 3 types of data records.
Step 304, determining the physical storage size of each business record based on the type of the data record and the physical storage size of each data record.
In an implementation, the server determines the physical storage size of each business record based on the physical storage size of each data record. For example, suppose a service record includes 3 types of data records whose average physical storage sizes are 0.5 GB for the first type, 1.5 GB for the second type, and 1 GB for the third type; the total physical storage size of the service record is then 0.5 GB + 1.5 GB + 1 GB = 3 GB. As another example, suppose a service record contains 2 types of data records, namely a fourth type and a fifth type, with average physical storage sizes of 3 GB and 2 GB respectively; the total physical storage size of that service record is then 5 GB.
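The summation in step 304 can be illustrated with a short sketch. The type labels (`"type_1"` and so on) and the function name are hypothetical; the sizes are the per-type averages from the two examples above, expressed in GB.

```python
from typing import Dict, Iterable


def business_record_size(record_types: Iterable[str],
                         avg_size_by_type: Dict[str, float]) -> float:
    """Total size of one business record: the sum of the average sizes
    of the data record types it contains (same unit as the inputs)."""
    return sum(avg_size_by_type[record_type] for record_type in record_types)


# Average physical storage size per data record type, in GB.
AVG_SIZE_GB = {
    "type_1": 0.5,
    "type_2": 1.5,
    "type_3": 1.0,
    "type_4": 3.0,
    "type_5": 2.0,
}

first_record_gb = business_record_size(["type_1", "type_2", "type_3"], AVG_SIZE_GB)
second_record_gb = business_record_size(["type_4", "type_5"], AVG_SIZE_GB)
```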
Step 306, determining the data splitting range according to the size of the file to be generated and the physical storage size of the service record.
In an implementation, the server determines the data splitting range according to the size of the file to be generated and the physical storage size of the service record. Specifically, if the size of a file to be generated is limited to 10 GB and the physical storage size of each service record related to that file is 3 GB, the file contains at most 3 service records (3 × 3 GB = 9 GB, whereas a fourth record would bring the total to 12 GB). The data splitting range of the service data corresponding to the file is then determined according to the physical storage size of each data record contained in each service record.
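The record count in step 306 amounts to a floor division of the file size limit by the per-record size. The sketch below assumes a uniform service record size, as in the example; the function name is hypothetical, and raising an error when a single record exceeds the limit is an assumption of this sketch, since the application does not describe that case.

```python
def records_per_file(file_size_limit: float, record_size: float) -> int:
    """Largest number of service records that fits into one file to be generated."""
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    count = int(file_size_limit // record_size)
    if count < 1:
        # Assumption: the source does not say how an oversized record is handled.
        raise ValueError("a single service record exceeds the file size limit")
    return count


# Figures from the example above: 10 GB file limit, 3 GB per service record.
split_range = records_per_file(10, 3)
```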
In this embodiment, the physical storage size of each data record is determined based on the mapping relationship among the data model, the service data and the physical storage size, and then the data splitting range is dynamically optimized based on the physical storage size of each data record at the current moment, so that the flexibility of data splitting and the data processing efficiency are improved.
In an exemplary embodiment, after the service data has been split according to the data splitting range determined at the current moment by the dynamic optimization policy of the foregoing embodiments, any small data splitting results may be further merged to save storage resources. Specifically, as shown in FIG. 4, after step 108, the method further includes:
Step 402, if the physical storage size of the data splitting result is smaller than the preset file storage threshold, determining the data splitting result as the target data splitting result.
In an implementation, a file storage threshold is pre-stored in the server, and the file storage threshold is smaller than the size of the file to be generated. During data splitting, if the physical storage size of a data splitting result is smaller than the preset file storage threshold, the server determines that data splitting result as a target data splitting result, that is, a data splitting result waiting to be merged. For example, suppose the file storage threshold is 4 GB. If only 1 service record (of size 2 GB) remains during data splitting, the resulting data splitting result is 2 GB, which is smaller than the 4 GB file storage threshold, so it is determined as a target data splitting result.
Step 404, merging the multiple target data splitting results to obtain a merged data splitting result.
In an implementation, the server merges a plurality of target data splitting results to obtain a merged data splitting result. Each round of data splitting may produce several target data splitting results, for example a first target data splitting result of 2 GB, a second of 3 GB, and a third of 3 GB. With the size of the file to be generated (10 GB) as the constraint, these target data splitting results are merged, and the merged data splitting result still satisfies the file size requirement: 2 GB + 3 GB + 3 GB = 8 GB < 10 GB. Merging reduces the number of files to be generated and saves storage resources.
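Steps 402 and 404 can be sketched with a simple in-order greedy merge. The application does not specify the merging strategy, so the greedy grouping here is an assumption chosen for illustration, and `merge_small_results` is a hypothetical name. Sizes are in GB.

```python
from typing import List, Tuple


def merge_small_results(result_sizes: List[float],
                        storage_threshold: float,
                        file_size_limit: float) -> Tuple[List[float], List[List[float]]]:
    """Return (kept, merged_groups).

    kept holds results at or above the file storage threshold, left unchanged.
    merged_groups holds groups of target (small) results; each group's total
    size does not exceed file_size_limit. Assumes storage_threshold is smaller
    than file_size_limit, as stated in the embodiment.
    """
    kept: List[float] = []
    merged_groups: List[List[float]] = []
    current: List[float] = []
    current_size = 0.0
    for size in result_sizes:
        if size >= storage_threshold:
            kept.append(size)  # not a target data splitting result
            continue
        if current and current_size + size > file_size_limit:
            merged_groups.append(current)  # close the group that is full
            current, current_size = [], 0.0
        current.append(size)
        current_size += size
    if current:
        merged_groups.append(current)
    return kept, merged_groups


# Example from the text: threshold 4 GB, file size limit 10 GB.
kept, groups = merge_small_results([9, 2, 3, 3], storage_threshold=4, file_size_limit=10)
```

In this example the 2 GB, 3 GB, and 3 GB results end up in one 8 GB group, so three small files become one.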
In this embodiment, for the data splitting result, based on the size relationship between the data splitting result and the preset file storage threshold, the data splitting result is further processed, and the data splitting result smaller than the file storage threshold is combined, so that the number of files to be generated is reduced, storage resources are saved, and storage efficiency is improved.
In an exemplary embodiment, a time period is preset in the server, and for the data to be exported in the database, as shown in fig. 5, before step 106, the method further includes:
Step 502, obtaining incremental service data in a database according to a preset time period.
In an implementation, new business data is generated as business is executed in real time; this new data is referred to as incremental business data in the database. The server acquires the incremental business data in the database so that the correspondence between the business data and the data models can be updated.
Step 504, based on the incremental business data, updating the corresponding relation between each data model and the business data.
In an implementation, the server updates the correspondence between each data model and the business data based on the incremental business data. For example, if data model A corresponds to business data comprising 1000 data records, and the incremental business data includes 200 data records related to data model A, the correspondence updated by the server is between data model A and business data comprising 1200 data records.
Step 506, determining the physical storage size of each data record in the service data according to each data model, the corresponding relation between the updated data model and the service data, and the physical storage space occupied by the updated service data.
In an implementation, the server determines the physical storage size of each data record in the service data according to each data model, the updated correspondence between the data model and the service data, and the physical storage space occupied by the updated service data. Specifically, because the distribution of service data is irregular, the size of each data record in the service data may vary, and once incremental service data exists, the physical storage size of each data record may change as well. For example, when the service data corresponding to data model A comprises 1000 data records occupying 1000 GB of physical storage space, the average physical storage size of each data record is 1 GB; when the service data corresponding to data model A comprises 1200 data records occupying 1800 GB, the average physical storage size of each data record is 1.5 GB. The data splitting range therefore needs to be dynamically updated based on the updated physical storage size of each data record.
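The recomputation in steps 504 and 506 can be sketched as a small running-total structure. `ModelStorage` and its members are hypothetical names; the 800 GB increment is simply the difference between the two totals in the example (1800 GB minus 1000 GB).

```python
from dataclasses import dataclass


@dataclass
class ModelStorage:
    """Running totals for the service data mapped to one data model."""
    record_count: int
    total_size_gb: float

    def apply_increment(self, added_records: int, added_size_gb: float) -> None:
        """Fold incremental service data into the totals for this data model."""
        if added_records < 0 or added_size_gb < 0:
            raise ValueError("increments must be non-negative")
        self.record_count += added_records
        self.total_size_gb += added_size_gb

    @property
    def average_record_size_gb(self) -> float:
        """Average physical storage size of one data record, in GB."""
        if self.record_count == 0:
            raise ValueError("no data records for this data model")
        return self.total_size_gb / self.record_count


# Data model A before and after the incremental service data arrives.
model_a = ModelStorage(record_count=1000, total_size_gb=1000.0)
average_before = model_a.average_record_size_gb
model_a.apply_increment(added_records=200, added_size_gb=800.0)
average_after = model_a.average_record_size_gb
```

Because the average rises from 1 GB to 1.5 GB, a splitting range computed before the update would overfill the files, which is why the range is recomputed after each period.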
In this embodiment, incremental service data in the database is acquired at a preset time period; the mapping relationship among the data model, the service data, and the physical storage space is then updated; and the updated physical storage size of each data record is determined from the updated mapping relationship. This enables dynamic adjustment of the data splitting range and improves the flexibility of data splitting.
In an exemplary embodiment, the method further comprises:
Step 110, based on the data splitting result, generating and exporting the report file according to the preset file format rule and the preset export batch.
In an implementation, a data export period is preset in the server. After the server splits the service data in the database and obtains a data splitting result, it generates and exports the report file according to the preset data export period, the preset file format rule, and the preset export batch.
In this embodiment, the report file is generated and exported according to the preset file format rule and the preset export batch based on the data splitting result. This improves data processing efficiency, ensures that the data is consistent with the standard and secure, and reduces manual operations and errors, thereby improving working efficiency and quality.
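One way the export step might look is sketched below. The application does not specify the file format rule or how export batches are formed, so everything here is an assumption made for illustration: the CSV format, the `export_reports` function, the file name pattern, and the reading of "export batch" as a fixed number of files per batch.

```python
import csv
import tempfile
from pathlib import Path
from typing import List, Sequence


def export_reports(split_results: Sequence[Sequence[dict]],
                   out_dir: Path,
                   name_pattern: str = "report_{batch:03d}_{index:03d}.csv",
                   batch_size: int = 2) -> List[Path]:
    """Write one CSV file per split result, numbering files by export batch.

    Each split result is assumed to be a non-empty list of row dictionaries.
    """
    written = []
    for i, rows in enumerate(split_results):
        # Files are grouped into batches of `batch_size` files each.
        batch, index = divmod(i, batch_size)
        path = out_dir / name_pattern.format(batch=batch, index=index)
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    chunks = [[{"id": 1}], [{"id": 2}], [{"id": 3}]]
    names = [p.name for p in export_reports(chunks, Path(tmp))]
print(names)
```

With three split results and two files per batch, the first two files fall into batch 0 and the third into batch 1.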
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially either, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiments of the present application also provide a data splitting device for implementing the data splitting method described above. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations in the one or more embodiments of the data splitting device provided below, reference may be made to the limitations of the data splitting method above, which are not repeated here.
In one exemplary embodiment, as shown in fig. 6, there is provided a data splitting apparatus 600 comprising: an acquisition module 601, a first determination module 602, a second determination module 603, and a splitting module 604, wherein:
an acquiring module 601, configured to acquire each service record;
a first determining module 602, configured to determine, for each service record, a physical storage size of each data record in the service data based on a mapping relationship among a data model related to the service record, service data corresponding to the data model, and a physical storage space occupied by the service data;
a second determining module 603, configured to determine a data splitting range based on a preset data splitting algorithm, a service record, a physical storage size of each data record, and a file size to be generated;
and the splitting module 604 is configured to perform data splitting on the service data based on the data splitting range, so as to obtain a data splitting result.
In an exemplary embodiment, the first determining module 602 is specifically configured to determine, for each service record, a data model related to the service record;
determining target business data corresponding to the data model in a database based on the mapping relation between the data model and the business data;
and determining the physical storage size of each data record based on the total physical storage size corresponding to the target service data and the number of data entries in the target service data.
In an exemplary embodiment, the second determining module 603 is specifically configured to determine the data record included in each service record based on the data model involved in the service record;
determining the physical storage size of each service record based on the physical storage size of each data record;
and determining a data splitting range according to the size of the file to be generated and the physical storage size of the service record.
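As a minimal sketch of how these three steps might fit together, the following assumes that the preset data splitting algorithm simply divides the size of the file to be generated by the physical storage size of one service record. That rule, the function names `business_record_size` and `split_ranges`, the model names, and all sizes (taken to be in MB) are assumptions made for illustration; the application does not fix them here.

```python
from typing import Dict, List, Tuple


def business_record_size(records_per_model: Dict[str, int],
                         record_size_by_model: Dict[str, float]) -> float:
    """Size of one service record = sum of the sizes of its data records."""
    return sum(count * record_size_by_model[model]
               for model, count in records_per_model.items())


def split_ranges(num_business_records: int,
                 size_per_business_record: float,
                 target_file_size: float) -> List[Tuple[int, int]]:
    """Return half-open [start, end) index ranges, one per file to generate."""
    if size_per_business_record <= 0:
        raise ValueError("service record size must be positive")
    # Keep at least one service record per file, even if it exceeds the target.
    per_file = max(1, int(target_file_size // size_per_business_record))
    return [(start, min(start + per_file, num_business_records))
            for start in range(0, num_business_records, per_file)]


# One service record holds 2 records of model "A" and 1 record of model "B".
size = business_record_size({"A": 2, "B": 1}, {"A": 1.5, "B": 2.0})
ranges = split_ranges(num_business_records=10,
                      size_per_business_record=size,
                      target_file_size=20.0)
print(size, ranges)  # 5.0 [(0, 4), (4, 8), (8, 10)]
```

The last range is shorter than the others, which is the kind of undersized splitting result that a later merging step can absorb.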
In an exemplary embodiment, the apparatus 600 further comprises:
the third determining module is used for determining the data splitting result as a target data splitting result if the physical storage size of the data splitting result is smaller than a preset file storage threshold;
and the merging module is used for merging the multiple target data splitting results to obtain a merged data splitting result.
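The threshold check and merge performed by these two modules could be sketched as follows. The `SplitResult` container, the function name `merge_small_results`, and the choice to merge all undersized results into a single result are assumptions; the application does not state how target data splitting results are grouped when merged.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SplitResult:
    """Hypothetical container for one data splitting result."""
    record_ids: List[int]
    size_mb: float


def merge_small_results(results: List[SplitResult],
                        threshold_mb: float) -> List[SplitResult]:
    # Results below the threshold are the "target" results to be merged.
    targets = [r for r in results if r.size_mb < threshold_mb]
    others = [r for r in results if r.size_mb >= threshold_mb]
    if len(targets) < 2:
        return list(results)  # nothing to merge
    merged = SplitResult(
        record_ids=[i for r in targets for i in r.record_ids],
        size_mb=sum(r.size_mb for r in targets),
    )
    return others + [merged]


results = [SplitResult([0, 1, 2, 3], 20.0),
           SplitResult([4], 5.0),
           SplitResult([5, 6], 10.0)]
merged = merge_small_results(results, threshold_mb=15.0)
print(len(merged), merged[-1])
```

Here the two results below the 15 MB threshold are combined into one result covering records 4 to 6, while the 20 MB result is left untouched.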
In an exemplary embodiment, the apparatus 600 further comprises:
the second acquisition module is used for acquiring incremental service data in the database according to a preset time period;
the updating module is used for updating the corresponding relation between each data model and the service data based on the incremental service data;
and the fourth determining module is used for determining the physical storage size of each data record in the service data according to each data model, the corresponding relation between the updated data model and the service data and the physical storage space occupied by the updated service data.
In an exemplary embodiment, the apparatus 600 further comprises:
the generation module is used for generating and exporting the report file according to a preset file format rule and a preset export batch based on the data splitting result.
Each of the modules in the above data splitting apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
In one exemplary embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing business data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data splitting method.
It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one exemplary embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
acquiring each service record;
for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data;
determining a data splitting range based on a preset data splitting algorithm, service records, the physical storage size of each data record and the size of a file to be generated;
and carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
In one embodiment, the processor when executing the computer program further performs the steps of:
for each service record, determining the data model involved in the service record;
determining target business data corresponding to the data model in a database based on the mapping relation between the data model and the business data;
and determining the physical storage size of each data record based on the total physical storage size corresponding to the target service data and the number of data entries in the target service data.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining a data record contained in each business record based on the data model involved in the business record;
determining the physical storage size of each service record based on the physical storage size of each data record;
and determining a data splitting range according to the size of the file to be generated and the physical storage size of the service record.
In one embodiment, the processor when executing the computer program further performs the steps of:
if the physical storage size of the data splitting result is smaller than a preset file storage threshold, determining the data splitting result as a target data splitting result;
and combining the multiple target data splitting results to obtain a combined data splitting result.
In one embodiment, the processor when executing the computer program further performs the steps of:
acquiring incremental business data in a database according to a preset time period;
updating the corresponding relation between each data model and the service data based on the incremental service data;
and determining the physical storage size of each data record in the service data according to each data model, the corresponding relation between the updated data model and the service data and the physical storage space occupied by the updated service data.
In one embodiment, the processor when executing the computer program further performs the steps of:
and generating and exporting the report file according to a preset file format rule and a preset export batch based on the data splitting result.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the computer program may include the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric random access memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory may include random access memory (Random Access Memory, RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum-computing-based data processing logic units, and the like, without being limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above examples represent only a few embodiments of the present application. They are described specifically and in detail, but are not thereby to be construed as limiting the scope of the present application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A method of data splitting, the method comprising:
acquiring each service record;
for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data;
determining a data splitting range based on a preset data splitting algorithm, the service records, the physical storage size of each data record and the size of a file to be generated;
and carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
2. The method according to claim 1, wherein the determining, for each service record, the physical storage size of each data record in the service data based on the mapping relationship among the data model involved in the service record, the service data corresponding to the data model, and the physical storage space occupied by the service data comprises:
determining a data model related to each business record;
determining target business data corresponding to the data model in a database based on the mapping relation between the data model and the business data;
and determining the physical storage size of each data record based on the total physical storage size corresponding to the target service data and the number of data entries in the target service data.
3. The method of claim 1, wherein determining the range of data splitting based on a preset data splitting algorithm, the business records, a physical storage size of each data record, and a file size to be generated comprises:
determining data records contained in each of the business records based on a data model involved in the business records;
determining the physical storage size of each business record based on the physical storage size of each data record;
and determining a data splitting range according to the size of the file to be generated and the physical storage size of the service record.
4. The method according to claim 1, wherein after the data splitting is performed on the service data based on the data splitting range, the method further comprises:
if the physical storage size of the data splitting result is smaller than a preset file storage threshold, determining the data splitting result as a target data splitting result;
and combining the multiple target data splitting results to obtain a combined data splitting result.
5. The method of claim 1, wherein before the determining a data splitting range based on a preset data splitting algorithm, the service records, the physical storage size of each data record, and the size of a file to be generated, the method further comprises:
acquiring incremental business data in the database according to a preset time period;
updating the corresponding relation between each data model and the service data based on the incremental service data;
and determining the physical storage size of each data record in the service data according to each data model, the corresponding relation between the updated data model and the service data and the physical storage space occupied by the updated service data.
6. The method according to claim 1, wherein the method further comprises:
and generating and exporting the report file according to a preset file format rule and a preset export batch based on the data splitting result.
7. A data splitting apparatus, the apparatus comprising:
the acquisition module is used for acquiring each service record;
the first determining module is used for determining the physical storage size of each data record in the service data according to the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data;
the second determining module is used for determining a data splitting range based on a preset data splitting algorithm, the service records, the physical storage size of each data record and the size of a file to be generated;
and the splitting module is used for carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202311731330.3A 2023-12-15 2023-12-15 Data splitting method, device, computer equipment and storage medium Pending CN117555487A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311731330.3A CN117555487A (en) 2023-12-15 2023-12-15 Data splitting method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311731330.3A CN117555487A (en) 2023-12-15 2023-12-15 Data splitting method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117555487A true CN117555487A (en) 2024-02-13

Family

ID=89823210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311731330.3A Pending CN117555487A (en) 2023-12-15 2023-12-15 Data splitting method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117555487A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination