CN117555487A - Data splitting method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN117555487A (CN202311731330.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- service
- record
- splitting
- physical storage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0643—Management of files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0644—Management of space entities, e.g. partitions, extents, pools
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present application relates to a data splitting method, apparatus, computer device, storage medium and computer program product. The method comprises the following steps: acquiring each service record; for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data; determining a data splitting range based on a preset data splitting algorithm, service records, the physical storage size of each data record and the size of a file to be generated; and carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result. The method improves the data processing efficiency.
Description
Technical Field
The present application relates to the field of database technologies, and in particular, to a data splitting method, an apparatus, a computer device, a storage medium, and a computer program product.
Background
In the daily management of databases, when data is exported from a database, the data to be exported usually needs to be split according to the size limits of the export file.
In a conventional data splitting method, an ordered data set in the database is split using a preset window function and a fixed splitting threshold to obtain split data. A report file is then generated based on the split data.
However, because the distribution of service data is generally irregular and the conventional method splits data with a fixed splitting threshold, the method is inflexible and tends to produce too many batches, resulting in low data processing efficiency.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data splitting method, apparatus, computer device, computer readable storage medium, and computer program product.
In a first aspect, the present application provides a data splitting method, including:
acquiring each service record;
for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data;
determining a data splitting range based on a preset data splitting algorithm, the service records, the physical storage size of each data record and the size of a file to be generated;
and carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
In one embodiment, the determining, for each service record, the physical storage size of each data record in the service data based on a mapping relationship among a data model related to the service record, service data corresponding to the data model, and a physical storage space occupied by the service data includes:
determining a data model related to each business record;
determining target business data corresponding to the data model in a database based on the mapping relation between the data model and the business data;
and determining the physical storage size of each data record based on the total physical storage size corresponding to the target service data and the data entry of the target service data.
In one embodiment, the determining the data splitting range based on the preset data splitting algorithm, the service record, the physical storage size of each data record, and the size of the file to be generated includes:
determining data records contained in each of the business records based on a data model involved in the business records;
determining the physical storage size of each business record based on the physical storage size of each data record;
and determining a data splitting range according to the size of the file to be generated and the physical storage size of the service record.
In one embodiment, the method further includes, after performing data splitting on the service data based on the data splitting range to obtain a data splitting result:
if the physical storage size of the data splitting result is smaller than a preset file storage threshold, determining the data splitting result as a target data splitting result;
and combining the multiple target data splitting results to obtain a combined data splitting result.
In one embodiment, before determining the data splitting range, the method further includes:
acquiring incremental business data in the database according to a preset time period;
updating the corresponding relation between each data model and the service data based on the incremental service data;
and determining the physical storage size of each data record in the service data according to each data model, the corresponding relation between the updated data model and the service data and the physical storage space occupied by the updated service data.
In one embodiment, the method further comprises:
and generating and exporting the report file according to a preset file format rule and a preset export batch based on the data splitting result.
In a second aspect, the present application further provides a data splitting apparatus, the apparatus comprising:
the acquisition module is used for acquiring each service record;
the first determining module is used for determining the physical storage size of each data record in the service data according to the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data;
the second determining module is used for determining a data splitting range based on a preset data splitting algorithm, the service records, the physical storage size of each data record and the size of a file to be generated;
and the splitting module is used for carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
In a third aspect, the present application also provides a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring each service record;
for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data;
determining a data splitting range based on a preset data splitting algorithm, the service records, the physical storage size of each data record and the size of a file to be generated;
and carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
In a fourth aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring each service record;
for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data;
determining a data splitting range based on a preset data splitting algorithm, the service records, the physical storage size of each data record and the size of a file to be generated;
and carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of:
acquiring each service record;
for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data;
determining a data splitting range based on a preset data splitting algorithm, the service records, the physical storage size of each data record and the size of a file to be generated;
and carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
The data splitting method, the data splitting device, the computer equipment, the storage medium and the computer program product acquire each business record; for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data; determining a data splitting range based on a preset data splitting algorithm, the service records, the physical storage size of each data record and the size of a file to be generated; and carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result. By adopting the method, the physical storage size of each data record is determined based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data, so that the data splitting range is dynamically planned based on the physical storage size of each data record, the flexibility of data splitting is improved, and further, the data splitting and reporting are carried out through reasonable data splitting size, and the data processing efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings that are required to be used in the embodiments or the related technical descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to the drawings without inventive effort for a person having ordinary skill in the art.
FIG. 1 is a flow diagram of a method of data splitting in one embodiment;
FIG. 2 is a flow diagram of a method of determining the physical storage size of each data record in one embodiment;
FIG. 3 is a flow chart of the steps for determining a range of data splitting in one embodiment;
FIG. 4 is a flowchart illustrating steps for merging target data split results in one embodiment;
FIG. 5 is a flowchart showing the steps for updating the physical storage size of a data record based on incremental data, in one embodiment;
FIG. 6 is a block diagram of the structure of a data splitting device in one embodiment;
FIG. 7 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, as shown in FIG. 1, a data splitting method is provided. This embodiment is described using the example of the method being applied to a server; it is understood that the method may also be applied to a system including a terminal and a server, and implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:
Step 102, acquiring each service record.
In implementation, a database deployed in the server stores various types of business data. When a user needs to perform data backup, data migration, or data analysis, the data must be processed in the database in advance because the exported file is limited in size. For example, before various types of service data are exported, the service data is split in the database in advance, so that the data exported in each batch is not so large that the exported file overflows. In daily business data management, because the distribution of business data is irregular, business records are usually used as the data management dimension for processes such as data splitting. Therefore, the server acquires each service record according to a preset period.
Step 104, for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data.
In implementation, because the service data needs to be split, a data splitting range needs to be determined in advance. The data splitting range is dynamically optimized along three dimensions: the data models related to the service records, the service data related to each data model, and the physical storage space corresponding to the service data. To this end, the physical storage size of each data record in the service data currently corresponding to each data type is determined first. Specifically, a plurality of data models are stored in the server in advance, for example, 49 data models such as a performance attribution model, a cost model, and a profit analysis model. The number and the kind of the data models are not limited in the embodiment of the application. The data models are used for carrying out data analysis processing on business data in one business transaction. Therefore, for each service record, the server determines the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data.
Step 106, determining a data splitting range based on a preset data splitting algorithm, service records, the physical storage size of each data record and the size of a file to be generated.
In an implementation, the server determines a data splitting range based on a preset data splitting algorithm, service records, a physical storage size of each data record, and a file size to be generated. Specifically, because data splitting uses the service record as the data management dimension, that is, the question is how many service records each exported file contains, the server determines, for each service record, the data models involved in the service record, and then determines the overall physical storage size of the service record from the physical storage size of each data record corresponding to each of those data models. Then, the server determines the number of business records that can be included in the file to be generated based on the size limit of the file to be generated (i.e. the export file), and uses that number of business records as the data splitting range.
Step 108, carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
In the implementation, the server performs data splitting on the service data based on the data splitting range to obtain a data splitting result. For example, if the data splitting range is 10000 business records per file to be generated, the server splits the business data in advance in the database according to the data splitting range to obtain data splitting results in which each batch includes 10000 business records. The data splitting result is used for generating a report file.
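As a minimal sketch of step 108, assuming the data splitting range is expressed as a number of service records per file and the service records are already held in an ordered sequence, the split could look like the following. The function name `split_by_record_count` and the in-memory list are hypothetical and are not taken from the application.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_by_record_count(records: Sequence[T], records_per_file: int) -> List[List[T]]:
    """Split records into consecutive batches of at most records_per_file items."""
    if records_per_file < 1:
        raise ValueError("records_per_file must be at least 1")
    return [
        list(records[start:start + records_per_file])
        for start in range(0, len(records), records_per_file)
    ]


# Hypothetical input: 25000 service records identified by integer IDs,
# with an assumed data splitting range of 10000 records per file.
batches = split_by_record_count(list(range(25000)), 10000)
print([len(batch) for batch in batches])
```

The last batch may hold fewer records than the data splitting range, which is the situation the merging embodiment addresses.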
In the data splitting method, the physical storage size of each data record is determined based on the mapping relation among the data model related in the service records, the service data corresponding to the data model and the physical storage space occupied by the service data, so that the data splitting range is dynamically planned based on the physical storage size of each data record, the flexibility of data splitting is improved, and the data processing efficiency is further improved.
In one exemplary embodiment, as shown in FIG. 2, step 104 includes steps 202 through 206. Wherein:
Step 202, for each service record, determining a data model related to the service record.
In implementation, a plurality of data models are pre-stored in the server. The data models may be stored separately as a data model library, or may be stored in a target storage area of the database, which is not limited in the embodiment of the present application. One or more data models may be applied during daily business processes. Thus, for each business record, the server determines, in the data model library, the data model to which the business record relates. For example, based on the data model information recorded for a transaction, the server determines that the transaction involves a cost model and a profit analysis model.
Step 204, determining target business data corresponding to the data model in the database based on the mapping relation between the data model and the business data.
In an implementation, a server determines target business data in a database based on a mapping relationship of a data model and business data. Specifically, a mapping relationship between a data model and service data is preset in the database, so that target service data corresponding to each type of data model is determined for each data model related in the service record.
Step 206, determining the physical storage size of each data record based on the total physical storage size corresponding to the target service data and the data entry of the target service data.
In an implementation, the server determines the physical storage size of each data record based on the total physical storage size corresponding to the target service data and the number of data entries of the target service data. For example, if the total physical storage size corresponding to the target service data is 100000GB and the target service data has 100000 data entries, the server determines that the average physical storage size of each data record is 1GB.
In this embodiment, the physical storage size of each data record is determined based on the mapping relationship between the data model and the service data and the mapping relationship between the service data and the physical storage size. The data splitting range is then dynamically optimized based on the physical storage size of each data record at the current moment, which improves the flexibility of data splitting and the data processing efficiency.
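The averaging in step 206 can be illustrated with a short sketch. It assumes sizes are tracked in GB as floating-point numbers; the function name `average_record_size_gb` is hypothetical.

```python
def average_record_size_gb(total_size_gb: float, entry_count: int) -> float:
    """Average physical storage size per data record, in GB."""
    if entry_count <= 0:
        raise ValueError("entry_count must be positive")
    return total_size_gb / entry_count


# Figures from the example above: 100000GB spread over 100000 data entries.
avg = average_record_size_gb(100000.0, 100000)
print(avg)
```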
In an exemplary embodiment, as shown in FIG. 3, step 106 includes steps 302 through 306. Wherein:
Step 302, determining the data records contained in each business record based on the data models involved in the business records.
In implementations, the server determines the data records contained in each business record based on the data model involved in the business record. For example, if a service record involves 3 data models, and a mapping relationship is stored between the data models and the service data (the items of service data are collectively referred to as data records), the server determines, based on the mapping relationship, that the service record contains 3 types of data records.
Step 304, determining the physical storage size of each business record based on the type of the data record and the physical storage size of each data record.
In implementations, the server determines the physical storage size of each business record based on the physical storage size of each data record. For example, a certain service record includes 3 types of data records: the average physical storage size of the first type of data record is 0.5GB, that of the second type is 1.5GB, and that of the third type is 1GB, so the total physical storage size of the service record including these data records is 3GB. For another example, a service record contains 2 types of data records, namely a fourth type and a fifth type. The average physical storage size of the fourth type of data record is 3GB and that of the fifth type is 2GB, so the total physical storage size of the service record including these data records is 5GB.
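The summation in step 304 can be sketched as follows, assuming each service record contributes one data record per involved data model (as in the examples above). The model names and the dictionary layout are hypothetical placeholders.

```python
from typing import Dict, Iterable


def service_record_size_gb(
    involved_models: Iterable[str],
    avg_size_by_model_gb: Dict[str, float],
) -> float:
    """Sum the average data-record size of every data model the service record involves."""
    return sum(avg_size_by_model_gb[model] for model in involved_models)


# Average sizes per data-record type, taken from the two examples above.
avg_size_by_model_gb = {
    "type_1": 0.5,
    "type_2": 1.5,
    "type_3": 1.0,
    "type_4": 3.0,
    "type_5": 2.0,
}

first_record_gb = service_record_size_gb(["type_1", "type_2", "type_3"], avg_size_by_model_gb)
second_record_gb = service_record_size_gb(["type_4", "type_5"], avg_size_by_model_gb)
print(first_record_gb, second_record_gb)
```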
Step 306, determining the data splitting range according to the size of the file to be generated and the physical storage size of the service record.
In an implementation, the server determines the data splitting range according to the size of the file to be generated and the physical storage size of the service record. Specifically, if the size of a certain file to be generated is limited to 10GB, and the physical storage size of each service record related to the file to be generated is 3GB, one file to be generated contains at most 3 service records, and further, the data splitting range of the service data corresponding to the file to be generated is determined according to the physical storage size of each data record contained in each service record.
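The application does not disclose the preset data splitting algorithm itself. As one possible illustration of step 306, the sketch below first computes how many equally sized service records fit in one file, and then plans ranges greedily for service records of differing sizes. Both function names and the greedy strategy are assumptions made here for illustration only.

```python
import math
from typing import List, Sequence


def max_records_per_file(file_limit_gb: float, record_size_gb: float) -> int:
    """Number of equally sized service records that fit in one file."""
    if record_size_gb <= 0:
        raise ValueError("record_size_gb must be positive")
    return math.floor(file_limit_gb / record_size_gb)


def plan_split_ranges(record_sizes_gb: Sequence[float], file_limit_gb: float) -> List[range]:
    """Greedily group consecutive service records so each group stays within the file limit."""
    ranges: List[range] = []
    start = 0
    current_gb = 0.0
    for index, size in enumerate(record_sizes_gb):
        if size > file_limit_gb:
            raise ValueError(f"record {index} alone exceeds the file size limit")
        if current_gb + size > file_limit_gb:
            # Close the current range before this record and start a new one.
            ranges.append(range(start, index))
            start = index
            current_gb = 0.0
        current_gb += size
    if start < len(record_sizes_gb):
        ranges.append(range(start, len(record_sizes_gb)))
    return ranges


# Example from the text: a 10GB file limit and 3GB service records.
print(max_records_per_file(10.0, 3.0))
# Hypothetical mix of 3GB, 5GB and 2GB service records.
print(plan_split_ranges([3.0, 3.0, 3.0, 3.0, 5.0, 2.0], 10.0))
```

Each returned `range` gives the indices of the service records that go into one file to be generated.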
In this embodiment, the physical storage size of each data record is determined based on the mapping relationship among the data model, the service data and the physical storage size, and then the data splitting range is dynamically optimized based on the physical storage size of each data record at the current moment, so that the flexibility of data splitting and the data processing efficiency are improved.
In an exemplary embodiment, after the service data is split according to the data splitting range determined at the current time by the dynamic optimization policy of the foregoing embodiments, any small data splitting results may be further merged to save storage resources. Specifically, as shown in FIG. 4, after step 108, the method further includes:
Step 402, if the physical storage size of the data splitting result is smaller than the preset file storage threshold, determining the data splitting result as the target data splitting result.
In an implementation, a file storage threshold is pre-stored in the server, the file storage threshold being smaller than a file size to be generated. In the data splitting process, if the physical storage size of the data splitting result is smaller than a preset file storage threshold, the server determines the data splitting result as a target data splitting result. The target data splitting result is the data splitting result waiting to be combined. For example, the file storage threshold is 4GB, in the data splitting process, since 1 current service record (the size of the service record is 2 GB) remains, the data splitting result is 2GB, which is smaller than the file storage threshold of 4GB, and the data splitting result is determined as the target data splitting result.
Step 404, merging the multiple target data splitting results to obtain a merged data splitting result.
In the implementation, the server merges a plurality of target data splitting results to obtain a merged data splitting result. After each round of data splitting, a plurality of target data splitting results may appear. For example, the first target data splitting result is 2GB, the second target data splitting result is 3GB, and the third target data splitting result is 3GB. With the size of the file to be generated (10GB) as a constraint, the target data splitting results are merged, and the merged data splitting result still meets the size requirement of the file to be generated: 2GB + 3GB + 3GB = 8GB, and 8GB < 10GB. Merging reduces the number of files to be generated and saves storage resources.
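Steps 402 and 404 can be sketched as a filter followed by a merge. The application does not specify the merge order, so the sequential greedy merge below is an assumption, as is the function name `merge_small_results`. It relies on the stated condition that the file storage threshold is smaller than the size of the file to be generated.

```python
from typing import List, Sequence, Tuple


def merge_small_results(
    result_sizes_gb: Sequence[float],
    threshold_gb: float,
    file_limit_gb: float,
) -> Tuple[List[float], List[float]]:
    """Return (results kept as-is, sizes of merged results).

    Results smaller than threshold_gb are treated as target data splitting
    results and merged in order without exceeding file_limit_gb.
    """
    if threshold_gb >= file_limit_gb:
        raise ValueError("threshold_gb must be smaller than file_limit_gb")
    kept = [size for size in result_sizes_gb if size >= threshold_gb]
    targets = [size for size in result_sizes_gb if size < threshold_gb]

    merged: List[float] = []
    current_gb = 0.0
    for size in targets:
        if current_gb > 0 and current_gb + size > file_limit_gb:
            merged.append(current_gb)
            current_gb = 0.0
        current_gb += size
    if current_gb > 0:
        merged.append(current_gb)
    return kept, merged


# A hypothetical 9GB result plus the 2GB, 3GB and 3GB targets from the example,
# with the 4GB file storage threshold and the 10GB file size limit.
kept, merged = merge_small_results([9.0, 2.0, 3.0, 3.0], 4.0, 10.0)
print(kept, merged)
```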
In this embodiment, for the data splitting result, based on the size relationship between the data splitting result and the preset file storage threshold, the data splitting result is further processed, and the data splitting result smaller than the file storage threshold is combined, so that the number of files to be generated is reduced, storage resources are saved, and storage efficiency is improved.
In an exemplary embodiment, a time period is preset in the server, and for the data to be exported in the database, as shown in fig. 5, before step 106, the method further includes:
Step 502, obtaining incremental service data in a database according to a preset time period.
In implementation, as the business is executed in real time, new business data, called incremental business data in the database, is generated, and the server acquires the incremental data in the database to update the correspondence between the incremental data and the data model.
Step 504, based on the incremental business data, updating the corresponding relation between each data model and the business data.
In an implementation, the server updates the correspondence between each data model and the business data based on the incremental business data. For example, if data model A corresponds to service data of 1000 data records, and the incremental service data includes 200 data records related to data model A, the server updates the correspondence so that data model A corresponds to service data of 1200 data records.
Step 506, determining the physical storage size of each data record in the service data according to each data model, the corresponding relation between the updated data model and the service data, and the physical storage space occupied by the updated service data.
In implementation, the server determines the physical storage size of each data record in the service data according to each data model, the updated correspondence between the data model and the service data, and the physical storage space occupied by the updated service data. Specifically, because service data is distributed irregularly, the size of each data record may vary, and the physical storage size of each data record may change once incremental service data is added. For example, when the service data corresponding to data model A comprises 1000 data records and occupies 1000 GB of physical storage space, the average physical storage size of each data record is 1 GB; when the service data corresponding to data model A comprises 1200 data records and occupies 1800 GB of physical storage space, the average physical storage size of each data record is 1.5 GB. The data splitting range therefore needs to be dynamically updated based on the updated physical storage size of each data record.
In this embodiment, incremental service data in the database is acquired according to a preset time period, and then, a mapping relationship among the data model, the service data and the physical storage space is updated, and based on the updated mapping relationship, the physical storage size of each updated data record is determined, so as to realize dynamic adjustment of the data splitting range, and improve the flexibility of data splitting.
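The update in steps 502 to 506 can be illustrated with the figures of the example above. The sketch below is a minimal illustration under stated assumptions: the class `ModelStats` and the function `apply_increment` are hypothetical names, and the 800 GB increment is simply the difference between the 1800 GB and 1000 GB totals given in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    """Record count and occupied storage for the data of one data model."""

    record_count: int
    total_size_gb: float

    @property
    def avg_record_size_gb(self) -> float:
        # Average physical storage size of one data record.
        if self.record_count == 0:
            return 0.0
        return self.total_size_gb / self.record_count


def apply_increment(stats: ModelStats, new_records: int, new_size_gb: float) -> ModelStats:
    """Return the statistics after incremental service data is added."""
    return ModelStats(
        record_count=stats.record_count + new_records,
        total_size_gb=stats.total_size_gb + new_size_gb,
    )


# Data model A: 1000 records in 1000 GB, then 200 more records (800 GB).
before = ModelStats(record_count=1000, total_size_gb=1000.0)
after = apply_increment(before, new_records=200, new_size_gb=800.0)
```

Here `before.avg_record_size_gb` is 1.0 and `after.avg_record_size_gb` is 1.5, matching the example, so a splitting range derived from the earlier average would no longer fit the file size and has to be recomputed.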
In an exemplary embodiment, the method further comprises:
Step 110, based on the data splitting result, generating and exporting the report file according to the preset file format rule and the preset export batch.
In the implementation, a data export period is preset in the server, and then after the server performs data splitting on the service data in the database to obtain a data splitting result, the report file can be generated and exported according to the preset data export period, a preset file format rule and a preset export batch.
In this embodiment, the report file is generated and exported according to the preset file format rule and the preset export batch based on the data splitting result, so that the data processing efficiency can be improved, the standard consistency and the safety of the data can be ensured, meanwhile, the manual operation and the error can be reduced, and the working efficiency and the quality can be improved.
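As a rough illustration of step 110, the sketch below writes each data splitting result to its own report file and processes the results batch by batch. The application does not specify the file format rule or the batch size in this passage, so the CSV format, the file naming pattern `report_bNNN_pNNN.csv`, and the names `batched` and `export_reports` are all assumptions made for the example.

```python
import csv
import tempfile
from pathlib import Path
from typing import Iterator, List, Sequence, Tuple

Row = Tuple[str, int]


def batched(items: Sequence[List[Row]], batch_size: int) -> Iterator[Sequence[List[Row]]]:
    """Yield consecutive slices of at most batch_size split results."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def export_reports(
    split_results: Sequence[List[Row]],
    out_dir: Path,
    batch_size: int,
    header: Sequence[str],
) -> List[Path]:
    """Write one CSV report per split result, one export batch at a time."""
    exported: List[Path] = []
    for batch_no, batch in enumerate(batched(split_results, batch_size), start=1):
        for part_no, rows in enumerate(batch, start=1):
            # Hypothetical format rule: batch number and part number in the name.
            path = out_dir / f"report_b{batch_no:03d}_p{part_no:03d}.csv"
            with path.open("w", newline="", encoding="utf-8") as fh:
                writer = csv.writer(fh)
                writer.writerow(header)
                writer.writerows(rows)
            exported.append(path)
    return exported


# Three split results exported in batches of two.
with tempfile.TemporaryDirectory() as tmp:
    files = export_reports(
        [[("r1", 1)], [("r2", 2)], [("r3", 3)]],
        Path(tmp),
        batch_size=2,
        header=("record_id", "value"),
    )
    names = [p.name for p in files]
```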
It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially either, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a data splitting device for realizing the above related data splitting method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in one or more embodiments of the data splitting device provided below may refer to the limitation of the data splitting method hereinabove, and will not be repeated herein.
In one exemplary embodiment, as shown in fig. 6, there is provided a data splitting apparatus 600 comprising: an acquisition module 601, a first determination module 602, a second determination module 603, and a splitting module 604, wherein:
an acquiring module 601, configured to acquire each service record;
a first determining module 602, configured to determine, for each service record, a physical storage size of each data record in the service data based on a mapping relationship among a data model related to the service record, service data corresponding to the data model, and a physical storage space occupied by the service data;
a second determining module 603, configured to determine a data splitting range based on a preset data splitting algorithm, a service record, a physical storage size of each data record, and a file size to be generated;
and the splitting module 604 is configured to perform data splitting on the service data based on the data splitting range, so as to obtain a data splitting result.
In an exemplary embodiment, the first determining module 602 is specifically configured to determine, for each service record, a data model related to the service record;
determining target business data corresponding to the data model in a database based on the mapping relation between the data model and the business data;
and determining the physical storage size of each data record based on the total physical storage size corresponding to the target service data and the data entry of the target service data.
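The computation performed by the first determining module can be sketched as a division of the total physical storage size of the target service data by its number of data entries. The table names, sizes, and the dictionaries `MODEL_TO_TABLE` and `TABLE_STATS` below are hypothetical stand-ins for the mapping relations held by the server.

```python
from typing import Dict, NamedTuple


class TableStats(NamedTuple):
    total_bytes: int  # physical storage space occupied by the service data
    row_count: int    # number of data entries (data records)


# Hypothetical mapping: data model -> table holding its service data.
MODEL_TO_TABLE: Dict[str, str] = {"order": "t_order", "customer": "t_customer"}

# Hypothetical storage statistics for each table.
TABLE_STATS: Dict[str, TableStats] = {
    "t_order": TableStats(total_bytes=4_000_000, row_count=2_000),
    "t_customer": TableStats(total_bytes=1_500_000, row_count=3_000),
}


def record_size_bytes(model: str) -> float:
    """Average physical storage size of one data record of a data model."""
    stats = TABLE_STATS[MODEL_TO_TABLE[model]]
    if stats.row_count == 0:
        return 0.0
    return stats.total_bytes / stats.row_count


order_size = record_size_bytes("order")
customer_size = record_size_bytes("customer")
```

In this sketch an "order" data record averages 2000 bytes and a "customer" data record averages 500 bytes.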
In an exemplary embodiment, the second determining module 603 is specifically configured to determine the data record included in each service record based on the data model involved in the service record;
determining the physical storage size of each service record based on the physical storage size of each data record;
and determining a data splitting range according to the size of the file to be generated and the physical storage size of the service record.
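One possible reading of the second determining module is sketched below: the physical storage size of a service record is the sum of the sizes of the data records it contains, and the data splitting range is the number of service records that fit into one file. The preset data splitting algorithm is not detailed in this passage, so the floor division, the function names, and all numeric values are assumptions.

```python
from typing import Dict


def service_record_size(
    records_per_model: Dict[str, int], record_size_bytes: Dict[str, float]
) -> float:
    """Sum the sizes of all data records contained in one service record."""
    return sum(
        count * record_size_bytes[model]
        for model, count in records_per_model.items()
    )


def split_range(file_size_bytes: int, service_record_bytes: float) -> int:
    """Number of service records per file, never less than one."""
    if service_record_bytes <= 0:
        raise ValueError("service record size must be positive")
    return max(1, int(file_size_bytes // service_record_bytes))


# Hypothetical service record: 1 'order' data record and 2 'customer' data records.
sizes = {"order": 2000.0, "customer": 500.0}
record_bytes = service_record_size({"order": 1, "customer": 2}, sizes)
per_file = split_range(10_000_000, record_bytes)
```

With a 3000-byte service record and a 10,000,000-byte file, 3333 service records are placed in each file.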
In an exemplary embodiment, the apparatus 600 further comprises:
the third determining module is used for determining the data splitting result as a target data splitting result if the physical storage size of the data splitting result is smaller than a preset file storage threshold;
and the merging module is used for merging the multiple target data splitting results to obtain a merged data splitting result.
In an exemplary embodiment, the apparatus 600 further comprises:
the second acquisition module is used for acquiring incremental service data in the database according to a preset time period;
the updating module is used for updating the corresponding relation between each data model and the service data based on the incremental service data;
and the fourth determining module is used for determining the physical storage size of each data record in the service data according to each data model, the corresponding relation between the updated data model and the service data and the physical storage space occupied by the updated service data.
In an exemplary embodiment, the apparatus 600 further comprises:
the generation module is used for generating and exporting the report file according to a preset file format rule and a preset export batch based on the data splitting result.
Each of the modules in the above-described data splitting apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one exemplary embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing business data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data splitting method.
It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one exemplary embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
acquiring each service record;
for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data;
determining a data splitting range based on a preset data splitting algorithm, service records, the physical storage size of each data record and the size of a file to be generated;
and carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
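The final step, splitting the service data based on the data splitting range, can be illustrated as partitioning the service records into consecutive half-open index ranges. The function name `split_ranges` and the representation of a range as a `(start, end)` tuple are assumptions made for this sketch.

```python
from typing import List, Tuple


def split_ranges(total_records: int, per_file: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges of at most per_file records."""
    if per_file <= 0:
        raise ValueError("per_file must be positive")
    return [
        (start, min(start + per_file, total_records))
        for start in range(0, total_records, per_file)
    ]


# Ten service records with a data splitting range of four records per file.
ranges = split_ranges(10, 4)
```

Each tuple corresponds to one data splitting result; the last range may be shorter than the others, which is the case handled by the merging embodiment.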
In one embodiment, the processor when executing the computer program further performs the steps of:
determining a data model related to the business records aiming at each business record;
determining target business data corresponding to the data model in a database based on the mapping relation between the data model and the business data;
and determining the physical storage size of each data record based on the total physical storage size corresponding to the target service data and the data entry of the target service data.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining a data record contained in each business record based on the data model involved in the business record;
determining the physical storage size of each service record based on the physical storage size of each data record;
and determining a data splitting range according to the size of the file to be generated and the physical storage size of the service record.
In one embodiment, the processor when executing the computer program further performs the steps of:
if the physical storage size of the data splitting result is smaller than a preset file storage threshold, determining the data splitting result as a target data splitting result;
and combining the multiple target data splitting results to obtain a combined data splitting result.
In one embodiment, the processor when executing the computer program further performs the steps of:
acquiring incremental business data in a database according to a preset time period;
updating the corresponding relation between each data model and the service data based on the incremental service data;
and determining the physical storage size of each data record in the service data according to each data model, the corresponding relation between the updated data model and the service data and the physical storage space occupied by the updated service data.
In one embodiment, the processor when executing the computer program further performs the steps of:
and generating and exporting the report file according to a preset file format rule and a preset export batch based on the data splitting result.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above examples only represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the present application. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.
Claims (10)
1. A method of data splitting, the method comprising:
acquiring each service record;
for each service record, determining the physical storage size of each data record in the service data based on the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data;
determining a data splitting range based on a preset data splitting algorithm, the service records, the physical storage size of each data record and the size of a file to be generated;
and carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
2. The method according to claim 1, wherein determining, for each of the service records, a physical storage size of each data record in the service data based on a mapping relationship among a data model involved in the service record, service data corresponding to the data model, and a physical storage space occupied by the service data, includes:
determining a data model related to each business record;
determining target business data corresponding to the data model in a database based on the mapping relation between the data model and the business data;
and determining the physical storage size of each data record based on the total physical storage size corresponding to the target service data and the data entry of the target service data.
3. The method of claim 1, wherein determining the range of data splitting based on a preset data splitting algorithm, the business records, a physical storage size of each data record, and a file size to be generated comprises:
determining data records contained in each of the business records based on a data model involved in the business records;
determining the physical storage size of each business record based on the physical storage size of each data record;
and determining a data splitting range according to the size of the file to be generated and the physical storage size of the service record.
4. The method according to claim 1, wherein after the data splitting is performed on the service data based on the data splitting range, the method further comprises:
if the physical storage size of the data splitting result is smaller than a preset file storage threshold, determining the data splitting result as a target data splitting result;
and combining the multiple target data splitting results to obtain a combined data splitting result.
5. The method of claim 1, wherein before the determining the data splitting range based on a preset data splitting algorithm, the business records, a physical storage size of each data record, and a file size to be generated, the method further comprises:
acquiring incremental business data in the database according to a preset time period;
updating the corresponding relation between each data model and the service data based on the incremental service data;
and determining the physical storage size of each data record in the service data according to each data model, the corresponding relation between the updated data model and the service data and the physical storage space occupied by the updated service data.
6. The method according to claim 1, wherein the method further comprises:
and generating and exporting the report file according to a preset file format rule and a preset export batch based on the data splitting result.
7. A data splitting apparatus, the apparatus comprising:
the acquisition module is used for acquiring each service record;
the first determining module is used for determining the physical storage size of each data record in the service data according to the mapping relation among the data model related in the service record, the service data corresponding to the data model and the physical storage space occupied by the service data;
the second determining module is used for determining a data splitting range based on a preset data splitting algorithm, the service records, the physical storage size of each data record and the size of a file to be generated;
and the splitting module is used for carrying out data splitting on the service data based on the data splitting range to obtain a data splitting result.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311731330.3A CN117555487A (en) | 2023-12-15 | 2023-12-15 | Data splitting method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117555487A true CN117555487A (en) | 2024-02-13 |
Family
ID=89823210
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311731330.3A Pending CN117555487A (en) | 2023-12-15 | 2023-12-15 | Data splitting method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117555487A (en) |
-
2023
- 2023-12-15 CN CN202311731330.3A patent/CN117555487A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||