CN114912990A - Data processing method and device - Google Patents
- Publication number
- CN114912990A CN202110123970.0A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/12—Accounting
- G06Q40/125—Finance or payroll
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
Abstract
This specification discloses a data processing method and device. A data fragmentation node first determines the data to be processed and then the task type corresponding to that data. According to the task type, the node determines resource occupation representation information, which represents the amount of resources occupied when the data to be processed is processed. According to this information, the node then determines each data fragment range in the data to be processed, so that, for each data fragment range, the data within it is processed by the working node corresponding to that range. The data fragmentation node can therefore determine each data fragment range according to the amount of resources occupied when the data is processed, which achieves reasonable data fragmentation and saves the computing resources of the system.
Description
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to a method and an apparatus for data processing.
Background
At present, financial services usually process accounting data by combining real-time account-out with scheduled account-in (posting). Posting requires the posting data to be processed in batches. As the number of users and the number of cards they hold keep rising, the volume of posting data to be processed grows rapidly. Fragmenting the posting data first and then having the working nodes of a computing cluster process the fragments in parallel can effectively improve the efficiency of batch processing.
In the prior art, data to be processed is usually fragmented based on attributes of the posting data, and the resulting data fragments are then randomly allocated to the working nodes for processing. As a result, the data volume of the fragment assigned to a working node may exceed the maximum data volume that the node can process, which overloads the node and degrades system performance. Conversely, when the data volume of the assigned fragment is smaller than the maximum data volume that the node can process, the computing resources of the working node are wasted.
Disclosure of Invention
The present specification provides a method and an apparatus for data processing, which partially solve the above problems in the prior art.
The technical scheme adopted by the specification is as follows:
the present specification provides a method of data processing, comprising:
determining data to be processed by the data fragmentation node;
determining a task type corresponding to the data to be processed according to the data to be processed;
determining resource occupation representation information corresponding to the data to be processed according to the task type corresponding to the data to be processed, wherein the resource occupation representation information is used for representing the resource amount occupied when the data to be processed is processed;
and determining each data fragment range in the data to be processed according to the resource occupation representation information, so as to perform data processing on the data in the data fragment range through the working node corresponding to the data fragment range aiming at each data fragment range.
Optionally, determining, according to the task type corresponding to the data to be processed, resource occupation representation information corresponding to the data to be processed, specifically including:
if the task type corresponding to the data to be processed is determined to be a first task type, determining the time period corresponding to the data to be processed in the current task cycle as a target time period, wherein the first task type is: the data fragmentation node processes data stored in a database in the system to which the data fragmentation node belongs;
and predicting, according to the historical data processing records corresponding to the target time period in each historical task cycle, the number of working nodes required for processing the data to be processed in the current task cycle, and using that number as the resource occupation representation information corresponding to the data to be processed.
Optionally, predicting, according to the historical data processing records corresponding to the target time period in each historical task cycle, the number of working nodes required for processing the data to be processed in the current task cycle specifically includes:
predicting, according to the historical data processing records corresponding to the target time period in each historical task cycle, the number of working nodes required for processing the data generated in the target time period, and taking that number as the number of base nodes;
determining a node coefficient corresponding to the current task cycle according to the data volume of the data generated in the current task cycle, wherein the larger the node coefficient is, the more working nodes are required for processing the data generated in the current task cycle;
and predicting the number of working nodes required for processing the data to be processed in the current task cycle according to the node coefficient and the number of base nodes.
Optionally, determining, according to the resource occupation representation information, each data fragment range in the data to be processed specifically includes:
and determining each data fragment range in the data to be processed according to the number of the working nodes processing the data to be processed.
Optionally, determining, according to the task type corresponding to the data to be processed, resource occupation representation information corresponding to the data to be processed specifically includes:
if it is determined that the task type corresponding to the data to be processed is a second task type, determining the data volume to be processed by each working node that processes the data to be processed, as the resource occupation representation information corresponding to the data to be processed, wherein the second task type is: the data fragmentation node processes data sent by systems other than the system to which the data fragmentation node belongs.
Optionally, determining, according to the resource occupation representation information, each data fragment range in the data to be processed specifically includes:
for each working node that processes the data to be processed, determining the initial fragment position corresponding to the working node in the data to be processed;
determining a pre-division fragment position in the data to be processed according to the initial fragment position and the data volume to be processed by the working node;
and if a preset identifier is read from the pre-division fragment position, determining the data fragment range corresponding to the working node in the data to be processed according to the initial fragment position and the read preset identifier, wherein the data fragment ranges corresponding to different working nodes in the data to be processed do not overlap.
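The three steps above can be illustrated with a minimal sketch. It assumes the data to be processed is a byte string, that the preset identifier is a newline that ends each record, and that the node scans forward from the pre-division position until it reads the identifier; the text only states the case in which the identifier is read, so the forward scan and all names here are hypothetical.

```python
def fragment_ranges(data: bytes, quota: int, marker: bytes = b"\n"):
    """Split `data` into non-overlapping (start, end) byte ranges.

    Each range covers at least `quota` bytes where possible and always ends
    just after a `marker`, so no record is cut in half. Scanning forward to
    the next marker is an assumption of this sketch, and it expects a
    single-byte marker.
    """
    if quota <= 0:
        raise ValueError("quota must be positive")
    ranges = []
    start = 0  # initial fragment position of the current working node
    while start < len(data):
        # Pre-division fragment position: start plus the node's data volume.
        pre = min(start + quota, len(data))
        # Read the preset identifier at or after the last byte of the quota.
        hit = data.find(marker, pre - 1)
        end = len(data) if hit == -1 else hit + len(marker)
        ranges.append((start, end))
        start = end  # the next node starts where this one ended
    return ranges
```

Because each range begins exactly where the previous one ends, the ranges assigned to different working nodes cannot overlap.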
Optionally, for each data fragmentation range, performing data processing on data in the data fragmentation range through a working node corresponding to the data fragmentation range, specifically including:
for each working node that processes the data to be processed, sending the data to be processed and the task information corresponding to the working node to the working node, so that the working node determines, according to the task information, its corresponding data fragment range in the data to be processed as a target fragment range, and processes the data located in the target fragment range.
The present specification provides an apparatus for data processing, comprising:
the to-be-processed data determining module is used for determining to-be-processed data;
the task type determining module is used for determining a task type corresponding to the data to be processed according to the data to be processed;
the characterization information determining module is used for determining resource occupation characterization information corresponding to the data to be processed according to the task type corresponding to the data to be processed, wherein the resource occupation characterization information is used for characterizing the resource amount occupied when the data to be processed is processed;
and the data fragment range determining module is used for determining each data fragment range in the data to be processed according to the resource occupation representation information so as to process the data in the data fragment range through the working node corresponding to the data fragment range aiming at each data fragment range.
The present specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described data processing method.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above-mentioned data processing method when executing the program.
The technical scheme adopted by the specification can achieve the following beneficial effects:
In the data processing method provided in this specification, a data fragmentation node first determines the data to be processed and then the task type corresponding to that data. According to the task type, the node determines the resource occupation representation information corresponding to the data to be processed, which represents the amount of resources occupied when the data to be processed is processed. According to this information, the node determines each data fragment range in the data to be processed, so that, for each data fragment range, the data within it is processed by the working node corresponding to that range. The data fragmentation node can therefore determine each data fragment range according to the amount of resources occupied when the data is processed, which achieves reasonable data fragmentation and saves the computing resources of the system.
Drawings
The accompanying drawings are included to provide a further understanding of the specification and constitute a part of it. They illustrate embodiments of the specification and, together with the description, serve to explain its principles without limiting it. In the drawings:
FIG. 1 is a flow chart illustrating a method of data processing according to the present disclosure;
FIG. 2 is a schematic diagram of a data processing apparatus provided herein;
FIG. 3 is a schematic diagram of an electronic device corresponding to FIG. 1 provided in the present specification.
Detailed Description
To make the objects, technical solutions, and advantages of the present specification clearer, the technical solutions are described clearly and completely below with reference to specific embodiments and the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the present specification. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present specification.
The technical solutions provided by the embodiments of the present description are described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic flow chart of a data processing method in this specification, which specifically includes the following steps:
Step S100, the data fragmentation node determines the data to be processed.
At present, many service scenarios require the data to be processed to be fragmented and the resulting data fragments to be distributed to working nodes, so that the working nodes process the fragments in parallel, which improves data processing efficiency and reduces load pressure. For example, the internet financial industry processes accounting data by combining real-time account-out with scheduled account-in (posting). For an internet financial institution, there are two main kinds of account clearing data. One is the clearing file provided by another institution when the internet financial institution clears accounts with that institution; the other is the data stored in the institution's own database when it clears accounts internally. Both kinds can serve as the data to be processed in this specification. At present, when data to be processed is processed in batches, it is mostly fragmented based on its attributes to obtain a plurality of data fragments, and the data corresponding to each data fragment is then processed.
In view of the above problems, the present specification provides a data processing system comprising a data fragmentation node and a plurality of working nodes. The data fragmentation node executes the data processing method and determines the data fragment range of each data fragment of the data to be processed; it may be a server or a terminal device (such as a desktop computer) on which the corresponding data processing program is installed. Each working node processes the data in the data fragments for which it is responsible.
It should be noted that the maximum data amount of data that can be processed by each working node may be the same or different, and may be specifically set according to actual service requirements. For example, to facilitate management and increase or decrease of computing power of the system, each working node in the system may be configured as a terminal device with consistent performance.
Step S102, determining a task type corresponding to the data to be processed according to the data to be processed.
In this specification, the data fragmentation node may determine a task type corresponding to the data to be processed according to a source of the data to be processed, where the task type mentioned herein may include a first task type and a second task type. The first task type is to process data stored in a database in a system to which the data fragment node belongs, for example, when the internet financial institution itself performs internal clearing, the data processing object is data stored in a database of the internet financial institution itself. The second task type is to process data sent by systems other than the system to which the data fragmentation node belongs, for example, when account clearing is performed between an internet financial institution and other institutions, the object of data processing is an account clearing file provided by other institutions.
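As a minimal sketch of this source-based decision, the snippet below assumes that each batch of data to be processed carries the name of the system it came from; the enum, the function, and the system names are hypothetical and only illustrate the two task types.

```python
from enum import Enum


class TaskType(Enum):
    FIRST = "internal_database"  # data stored in the node's own system database
    SECOND = "external_system"   # data sent by a system other than the node's own


def determine_task_type(source_system: str, own_system: str) -> TaskType:
    """Classify the data to be processed by where it came from."""
    if source_system == own_system:
        return TaskType.FIRST
    return TaskType.SECOND
```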
Step S104, determining resource occupation representation information corresponding to the data to be processed according to the task type corresponding to the data to be processed.
The resource occupation representation information represents the amount of resources occupied when the data to be processed is processed. The data to be processed can therefore be fragmented according to this information, so that the data in the fragment assigned to each working node better matches the maximum data volume that the node can process, and system resources are used more reasonably.
Step S106, according to the resource occupation representation information, determining each data fragment range in the data to be processed, and performing data processing on the data in the data fragment range through the working node corresponding to the data fragment range aiming at each data fragment range.
A data fragment range is the range to which the data in a data fragment belongs after the data to be processed is fragmented. For a given data fragment, the working node can use the corresponding data fragment range to determine, from the data to be processed, the data included in that fragment, and can then process the data in each data fragment.
In this method, when the data fragmentation node determines the data fragment ranges in the data to be processed, it can do so according to the amount of resources occupied when the data is processed, so that the data in the fragment assigned to each working node better matches the maximum data volume that the node can process, and the computing resources of the system are used more effectively and reasonably.
In this specification, the fragmentation manner differs according to the task type corresponding to the data to be processed. To better explain the technical solution in this specification, the two task types corresponding to the data to be processed are described below in turn.
The first task type: the data fragmentation node processes data stored in a database in the system to which the data fragmentation node belongs.
Step one: the data fragmentation node determines the data to be processed.
The data fragmentation node can receive a data processing request triggered by a user or by a scheduled task, and then determine the data to be processed according to the identification information of the data to be processed carried in the request. Of course, the data fragmentation node may also determine the data to be processed directly from a storage address of the data provided by the user.
The data to be processed may be structured data stored in a database, and each record in the data to be processed records its corresponding generation time. Therefore, in this specification, data fragmentation may be performed on the data to be processed based on the dimension of the generation time of each record in the data to be processed.
Step two: the data fragmentation node determines the time period corresponding to the data to be processed in the current task cycle as a target time period, predicts, according to the historical data processing records corresponding to the target time period in each historical task cycle, the number of working nodes required for processing the data to be processed in the current task cycle, and uses that number as the resource occupation representation information corresponding to the data to be processed.
The current task cycle is the time cycle that contains the generation time of the data to be processed, and the time period corresponding to the data to be processed in the current task cycle is the time period in which the data to be processed was generated. For example, if the current task cycle runs from 23:00 yesterday to 23:00 today and the data to be processed A was generated between 11:00 and 12:00 today, the target time period corresponding to the data to be processed A is 11:00 to 12:00 today.
After determining the target time period, the data fragmentation node further determines the resource occupation representation information corresponding to the data to be processed.
Specifically, the data fragmentation node may first predict, according to the historical data processing records corresponding to the target time period in each historical task cycle, the number of working nodes required for processing the data generated in the target time period, and take that number as the number of base nodes. It then determines the node coefficient corresponding to the current task cycle according to the data volume of the data generated in the current task cycle, and finally predicts, according to the node coefficient and the number of base nodes, the number of working nodes required for processing the data to be processed in the current task cycle.
When predicting the number of base nodes corresponding to the data to be processed, the data fragmentation node may take the numbers of working nodes used in the historical data processing records corresponding to the target time period in each historical task cycle, compute their average, and use that average as the number of base nodes.
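The averaging approach might look like the following sketch. Rounding the average up to a whole node is an assumption, since the text does not say how a fractional average is handled, and the function name is hypothetical.

```python
import math
from statistics import mean


def base_nodes_from_history(node_counts):
    """Average the working-node counts used for the target time period
    in past task cycles. Rounding up is an assumption of this sketch."""
    if not node_counts:
        raise ValueError("at least one historical record is required")
    return math.ceil(mean(node_counts))
```

For instance, historical counts of 5 and 6 nodes average to 5.5, which this sketch rounds up to 6 base nodes.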
The data fragmentation node may also predict, according to the data volume corresponding to the target time period in each historical task cycle, the data volume of the data to be processed for the target time period in the current task cycle, as a predicted data volume. It then determines, according to the data volume that the working nodes in the data processing system can process and the predicted data volume, how many working nodes the data of the target time period in the current task cycle needs to be allocated to, and uses that number as the number of base nodes.
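A rough sketch of this volume-based alternative follows. It assumes a plain average of the historical volumes as the predictor and a single per-node capacity shared by all working nodes; both are illustrative choices, not something the text prescribes.

```python
import math


def base_nodes_from_volume(historical_volumes, per_node_capacity):
    """Predict the data volume for the target time period, then divide it
    by what one working node can process.

    The simple-average predictor and the uniform `per_node_capacity`
    are assumptions of this sketch.
    """
    if not historical_volumes or per_node_capacity <= 0:
        raise ValueError("need history and a positive capacity")
    predicted_volume = sum(historical_volumes) / len(historical_volumes)
    return math.ceil(predicted_volume / per_node_capacity)
```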
The historical data processing record corresponding to each historical task cycle in the target time period may include: the number of nodes processing the data to be processed in the target time period, the data fragments processed by each working node and the data volume of the corresponding data fragments, the data volume of the data to be processed in the target time period, and the like.
Further, the data fragmentation node also determines the node coefficient corresponding to the current task cycle. The larger the node coefficient, the more working nodes are required for processing the data generated in the current task cycle.
In a specific implementation, the data fragmentation node may predict the data volume of the data generated in the current task cycle according to the historical data processing records of each historical task cycle, and then compute the ratio of that data volume to the total data volume processed by the system in the previous task cycle. It may use this ratio as the node coefficient corresponding to the current task cycle, or use the product of this ratio and the node coefficient of the previous task cycle. Alternatively, after predicting the data volume of the data generated in the current task cycle, the data fragmentation node may compare it with the data volume of each historical task cycle found in the historical data processing records, identify the historical task cycle whose data volume is closest, and use that cycle's node coefficient as the node coefficient corresponding to the current task cycle. The data fragmentation node may also determine the node coefficient according to the data volume that the working nodes processing the data to be processed can handle. For example, if the node coefficient of the previous task cycle is 1 and each working node can process twice as much data in the current task cycle as in the previous one, the node coefficient corresponding to the current task cycle may be determined to be 0.5. Other ways are not described in detail here.
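The options above can be sketched as three small functions. All names are hypothetical, and the capacity-based rule generalizes the single worked example (doubling capacity halves the coefficient) into an inverse proportion, which is an assumption.

```python
def coefficient_from_ratio(predicted_volume, previous_volume,
                           previous_coefficient=None):
    """Ratio of the predicted volume to the previous cycle's total volume,
    optionally scaled by the previous cycle's coefficient."""
    ratio = predicted_volume / previous_volume
    if previous_coefficient is None:
        return ratio
    return ratio * previous_coefficient


def coefficient_from_nearest_cycle(predicted_volume, history):
    """`history` is a list of (data_volume, node_coefficient) pairs, one per
    historical task cycle; reuse the coefficient of the closest volume."""
    _, coefficient = min(history, key=lambda rec: abs(rec[0] - predicted_volume))
    return coefficient


def coefficient_from_capacity(previous_coefficient, previous_capacity,
                              current_capacity):
    """Assumed inverse proportion: more capacity per node, smaller coefficient."""
    return previous_coefficient * previous_capacity / current_capacity
```

With a previous coefficient of 1 and per-node capacity doubled from 100 to 200 units, `coefficient_from_capacity(1, 100, 200)` gives 0.5, matching the example in the text.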
Finally, the data fragmentation node can predict the number of working nodes required for processing the data to be processed in the current task cycle according to the node coefficient and the number of base nodes. For example, the data fragmentation node may multiply the number of base nodes by the node coefficient and use the product as the number of working nodes required for processing the data to be processed in the current task cycle.
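The multiplication can be written as a one-line helper. Rounding the product up and never returning fewer than one node are assumptions added so that the result is a usable node count; the function name is hypothetical.

```python
import math


def required_working_nodes(base_nodes, node_coefficient):
    """Number of base nodes times the node coefficient, rounded up to a
    whole node and never below 1 (both rules are assumptions)."""
    return max(1, math.ceil(base_nodes * node_coefficient))
```

For example, 4 base nodes with a node coefficient of 1.5 give 6 working nodes, and the same 4 base nodes with a coefficient of 0.5 give 2.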
Step four: the data fragmentation node determines each data fragment range in the data to be processed according to the number of working nodes that process the data to be processed.
The data fragmentation node may fragment the data to be processed in the time dimension according to the number of working nodes that process it, so that the time intervals covered by the data fragments have the same length. For example, if the data to be processed A was generated between 11:00 and 12:00 today and the number of working nodes for processing it is determined to be 6, the data generated between 11:00 and 11:10 may be taken as the first data fragment, the data generated between 11:10 and 11:20 as the second data fragment, and so on, so that the data generated between 11:00 and 12:00 today is divided into six data fragment ranges.
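The equal-interval fragmentation in this example can be sketched as follows. The function name is hypothetical, and the calendar date used in the usage lines is arbitrary.

```python
from datetime import datetime


def equal_time_fragments(start, end, node_count):
    """Divide [start, end) into `node_count` time ranges of equal length."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    step = (end - start) / node_count
    return [(start + i * step, start + (i + 1) * step)
            for i in range(node_count)]


# Usage: one hour of data shared among six working nodes (arbitrary date).
fragments = equal_time_fragments(datetime(2021, 1, 29, 11, 0),
                                 datetime(2021, 1, 29, 12, 0), 6)
```

Here the first fragment covers 11:00 to 11:10 and the second 11:10 to 11:20, as in the example above.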
Of course, the data fragmentation node may also divide the predicted data amount of the data to be processed evenly among the working nodes processing the data to be processed, so that each data fragment corresponds to the same data amount. When fragmenting by equal data amount, each data fragment range may still be expressed as fragmentation positions in the time dimension, chosen so that the data amount in every data fragment is equal.
For each determined data fragment range, the data fragment range may be indicated by the generation time corresponding to the first record in the data fragment range in the data to be processed and the generation time corresponding to the last record in the data fragment range in the data to be processed.
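A minimal sketch of fragmenting by equal data amount, with each range indicated by the generation times of its first and last records, might look as follows. It assumes the generation times are already sorted and that record counts stand in for data amount; the function name is hypothetical.

```python
def slice_by_volume(generation_times, worker_count):
    """Divide sorted generation times into worker_count groups of (nearly)
    equal size and describe each group by the generation time of its
    first and last record."""
    if worker_count <= 0:
        raise ValueError("worker_count must be positive")
    base, extra = divmod(len(generation_times), worker_count)
    ranges = []
    start = 0
    for index in range(worker_count):
        # The first `extra` groups take one additional record each.
        size = base + (1 if index < extra else 0)
        if size == 0:
            continue  # fewer records than working nodes
        chunk = generation_times[start:start + size]
        ranges.append((chunk[0], chunk[-1]))
        start += size
    return ranges


# Ten records with integer timestamps, three working nodes.
volume_ranges = slice_by_volume([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3)
```

If several records share a generation time at a fragment boundary, a time-only range becomes ambiguous; a practical implementation would need an additional tie-breaker, which this sketch does not address.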
Step five: for each working node that processes the data to be processed, the data fragmentation node sends the data to be processed and the task information corresponding to that working node, so that the working node determines, according to the task information, its corresponding data fragment range in the data to be processed as a target fragment range, and processes the data within the target fragment range in the data to be processed.
The task information corresponding to the work node may include: the number of each data fragment that the work node needs to process, and the data fragment range information corresponding to each data fragment that the work node needs to process. For example, after data fragmentation is performed on the data a to be processed, 4 data fragments (respectively, data fragment 1, data fragment 2, data fragment 3, and data fragment 4) are obtained, and if the data fragment 3 is handled by the working node 3, the data fragment node needs to send a number corresponding to the data fragment 3, data fragment range information corresponding to the data fragment 3 (for example, a generation time corresponding to a first record and a generation time corresponding to a last record in data in the data fragment 3), and the data a to be processed as task information to the working node 3. Then, the working node 3 determines, according to the number corresponding to the data fragment 3 and the data fragment range information corresponding to the data fragment 3, that the working node 3 needs to process data within the data fragment range corresponding to the data fragment 3 in the to-be-processed data a, and performs data processing on the data.
Of course, the data fragmentation node may also send, to each work node that processes the to-be-processed data, identification information of the to-be-processed data that needs to be processed by the work node (for example, a storage location where the to-be-processed data is stored in the database) and task information corresponding to the work node, then the work node determines, according to the identification information of the to-be-processed data, a storage location where the to-be-processed data is located in the database, and then, according to a data fragmentation range corresponding to the to-be-processed data carried in the task information, obtains, from the database that stores the to-be-processed data, data located in the data fragmentation range, and performs data processing on the data.
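The task information and the way a working node uses it can be sketched as below. The `TaskInfo` class, its field names, and the in-memory record list are hypothetical; a real working node would query the database that stores the data to be processed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskInfo:
    """Hypothetical task information for one data fragment."""
    fragment_number: int
    first_generation_time: int  # generation time of the first record
    last_generation_time: int   # generation time of the last record


def select_target_records(records, task):
    """Return the records whose generation time falls inside the target
    fragment range. Each record is a (generation_time, payload) pair."""
    return [
        record for record in records
        if task.first_generation_time <= record[0] <= task.last_generation_time
    ]


# Working node 3 receives the range of data fragment 3 and selects its data.
records = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
task_for_node_3 = TaskInfo(fragment_number=3,
                           first_generation_time=3,
                           last_generation_time=4)
target = select_target_records(records, task_for_node_3)
```

Because only the range is transmitted, the data fragmentation node does not need to cut the data into physical pieces before dispatching it.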
Compared with the prior art that data fragmentation is performed according to the attribute of the data to be processed, the data processing method in the specification considers the amount of resources occupied by the data to be processed, and can perform data fragmentation on the data to be processed more reasonably according to the amount of resources occupied by the data to be processed, so that the computing resources of the system are saved.
The second task type: and the data fragmentation node processes data sent by other systems except the system to which the data fragmentation node belongs.
Step one: the data fragmentation node determines the data to be processed.
In this specification, when the task type corresponding to the to-be-processed data is the second task type, the to-be-processed data is data that originates from outside the data processing system and is stored in the disk in a file form. And the data fragmentation node can directly read out the data volume corresponding to the data to be processed from the disk in which the data to be processed is stored.
Step three: the data fragmentation node determines the amount of data that each working node processing the data to be processed needs to process, as the resource occupation representation information corresponding to the data to be processed.
The data amount of the data to be processed, which needs to be processed by each working node processing the data to be processed, may be the data amount of the data to be processed. The data fragmentation node can directly read out the data volume of the data to be processed from the storage area of the data to be processed.
Step four: the method comprises the steps that a data fragmentation node determines an initial fragmentation position corresponding to a working node in data to be processed aiming at each working node for processing the data to be processed, then a pre-division fragmentation position is determined in the data to be processed according to the initial fragmentation position and the data quantity of the working node to be processed aiming at the data to be processed, if a preset identifier is determined to be read from the pre-division fragmentation position, a corresponding data fragmentation range of the working node in the data to be processed is determined according to the initial fragmentation position and the read preset identifier. And the corresponding data fragment ranges of different working nodes in the data to be processed are not overlapped.
In this specification, a data fragment node may determine an initial fragment position corresponding to the working node in the to-be-processed data, and then determine a termination fragment position corresponding to the working node in the to-be-processed data, so as to determine a data fragment range corresponding to the working node in the to-be-processed data.
For example, the data fragmentation node may order the working nodes and take the initial position of the data to be processed as the initial fragment position of the first working node. The data fragmentation node can then determine the pre-division fragment position corresponding to that working node in the data to be processed according to the data amount corresponding to the working node. Further, the data fragmentation node reads the data to be processed onward from the pre-division fragment position (that is, toward the end of the data), and when a preset identifier is read, the position at which the preset identifier is read is used as the termination fragment position corresponding to the working node, which determines the data fragment range corresponding to the working node in the data to be processed. The data fragmentation node may then use the position immediately after this termination fragment position as the initial fragment position of the second working node, determine the data fragment range of the second working node in the data to be processed in the same manner, and so on.
Of course, in this specification, the data fragment node may also start from each pre-division fragment position after determining the pre-division fragment position corresponding to each working node, read backwards in parallel, determine the corresponding termination fragment position of each working node in the to-be-processed data when reading each preset identifier, and finally determine the corresponding data fragment range of each working node in the to-be-processed data.
Wherein, the aforementioned preset identifier may be a carriage return line feed identifier.
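The sequential procedure described above can be sketched on an in-memory byte string as follows. The function name and the half-open (start, end) convention are assumptions; with an exclusive end, the next initial fragment position is simply the previous end.

```python
def split_at_identifier(data, per_worker_bytes, identifier=b"\r\n"):
    """Return non-overlapping (start, end) byte ranges, end exclusive.
    Each range ends just after the first preset identifier found at or
    after the pre-division position, or at the end of the data."""
    if per_worker_bytes <= 0:
        raise ValueError("per_worker_bytes must be positive")
    ranges = []
    start = 0
    while start < len(data):
        # Pre-division position: initial position plus the per-node amount.
        pre_division = min(start + per_worker_bytes, len(data))
        # Read onward until the preset identifier is found.
        hit = data.find(identifier, pre_division)
        end = len(data) if hit == -1 else hit + len(identifier)
        ranges.append((start, end))
        start = end  # next initial fragment position
    return ranges


data = b"aaaa\r\nbbbbbb\r\ncc\r\ndddd\r\n"
byte_ranges = split_at_identifier(data, per_worker_bytes=8)
```

Because every range ends on a record boundary, no record is cut in half between two working nodes, at the cost of fragments that may be somewhat larger than the nominal per-node amount.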
Step five: for each working node that processes the data to be processed, the data fragmentation node sends the data to be processed and the task information corresponding to that working node, so that the working node determines, according to the task information, its corresponding data fragment range in the data to be processed as a target fragment range, and processes the data within the target fragment range in the data to be processed.
The task information corresponding to the work node may include: the number of each data fragment that the work node needs to process, and data fragment range information corresponding to each data fragment that the work node needs to process. It should be noted that the information for determining the data fragment range corresponding to each data fragment may be a physical storage address of a fragment position corresponding to the data fragment, or may be an offset of the fragment position of the data fragment with respect to an initial position of the data to be processed.
Certainly, the data slicing node may also send, to each working node that processes the to-be-processed data, identification information of the to-be-processed data that needs to be processed by the working node and task information corresponding to the working node, then the working node determines a storage area of the to-be-processed data according to the identification information of the to-be-processed data, and then, according to a data slicing range corresponding to the working node, reads data located in the data slicing range from the storage area of the to-be-processed data, and performs data processing on the data.
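When the data fragment range is expressed as offsets relative to the initial position of the data to be processed, a working node could read only its own range, as in this sketch. The function name is hypothetical, and `io.BytesIO` stands in for the file on disk.

```python
import io


def read_fragment(stream, start_offset, end_offset):
    """Read the bytes in [start_offset, end_offset) from a seekable
    binary stream and return them as a list of records (lines)."""
    if end_offset < start_offset:
        raise ValueError("end_offset must not precede start_offset")
    stream.seek(start_offset)
    chunk = stream.read(end_offset - start_offset)
    return chunk.splitlines()


# Stand-in for the stored file; offsets 6..12 cover the second record.
stored = io.BytesIO(b"aaaa\r\nbbbb\r\ncccc\r\n")
fragment_records = read_fragment(stored, 6, 12)
```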
In addition, in this specification, the data fragmentation node may allocate, to each working node that processes the data to be processed, a plurality of data fragments to be processed by that working node. When the number of data fragments processed by each working node is greater than a set number threshold, a new idle working node can be started in the data processing system, which increases the computing capacity of the system and the speed at which the system processes the data to be processed.
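One possible reading of the threshold rule is sketched below. The round-robin allocation, the function name, and the policy of starting idle working nodes one at a time until the per-node load falls to the threshold are all assumptions; this specification does not fix an allocation strategy.

```python
import math


def assign_fragments(fragment_numbers, workers, idle_workers, max_per_worker):
    """Allocate data fragments to working nodes round-robin, starting
    idle working nodes while the per-node load exceeds the threshold."""
    if not workers:
        raise ValueError("at least one working node is required")
    active = list(workers)
    idle = list(idle_workers)
    while idle and math.ceil(len(fragment_numbers) / len(active)) > max_per_worker:
        active.append(idle.pop(0))  # start a new idle working node
    assignment = {worker: [] for worker in active}
    for index, number in enumerate(fragment_numbers):
        assignment[active[index % len(active)]].append(number)
    return assignment


assignment = assign_fragments(
    fragment_numbers=list(range(1, 11)),
    workers=["w1", "w2"],
    idle_workers=["w3", "w4", "w5"],
    max_per_worker=3,
)
```

In this example two working nodes would each receive five data fragments, which exceeds the threshold of three, so two idle working nodes are started and the fifth stays idle.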
It can be seen from the above method that, when the data slicing nodes perform data processing, the range of each data slice can be determined according to the amount of resources occupied when the data to be processed is processed, so that the data in the data slices that each working node needs to process can be more matched with the maximum amount of data that each working node can process, and further, the computing resources of the system can be more effectively and reasonably utilized. In addition, in an actual implementation process, the data fragmentation node is not required to actually divide each data fragment in the data processing method provided by the present specification, but each working node can determine the data to be processed from the data to be processed based on the data fragmentation range determined by the data fragmentation node, so that the data transmission pressure between the data fragmentation node and the working node in the data processing process can be further reduced, the stable operation of the whole data processing system is ensured, and the efficiency of data processing is ensured.
The above is the data processing method provided by one or more embodiments of this specification. Based on the same idea, this specification further provides a corresponding data processing apparatus, as shown in fig. 2.
Fig. 2 is a schematic diagram of a data processing apparatus provided in this specification, which specifically includes:
a data to be processed determining module 200, configured to determine data to be processed;
a task type determining module 201, configured to determine, according to the data to be processed, a task type corresponding to the data to be processed;
a representation information determining module 202, configured to determine, according to a task type corresponding to the to-be-processed data, resource occupation representation information corresponding to the to-be-processed data, where the resource occupation representation information is used to represent a resource amount occupied when data processing is performed on the to-be-processed data;
a data fragmentation range determining module 203, configured to determine, according to the resource occupation representation information, each data fragmentation range in the to-be-processed data, so as to perform data processing on the data in the data fragmentation range through a working node corresponding to the data fragmentation range for each data fragmentation range.
Optionally, the characterization information determining module 202 is specifically configured to: if it is determined that the task type corresponding to the to-be-processed data is a first task type, determine the time period corresponding to the current task cycle of the to-be-processed data as a target time period, where the first task type is: the data fragmentation node processes data stored in a database in the system to which the data fragmentation node belongs; and predict, according to the historical data processing records corresponding to each historical task cycle in the target time period, the number of working nodes required for processing the to-be-processed data in the current task cycle, and use that number as the resource occupation representation information corresponding to the to-be-processed data.
Optionally, the characterization information determining module 202 is specifically configured to predict, according to a historical data processing record corresponding to each historical task cycle in the target time period, the number of working nodes required when data processing is performed on data generated in the target time period, and use the number as a basic node number; determining a node coefficient corresponding to the current task period according to the data volume of the data generated in the current task period, wherein the larger the node coefficient is, the larger the number of working nodes required for data processing of the data generated in the current task period is; and predicting the number of the working nodes required for processing the data to be processed in the current task period according to the node coefficients and the number of the basic nodes.
Optionally, the data fragmentation range determining module 203 is specifically configured to determine each data fragmentation range in the data to be processed according to the number of working nodes that process the data to be processed.
Optionally, the characterizing information determining module 202 is specifically configured to, if it is determined that the task type corresponding to the data to be processed is a second task type, determine a data amount of data that needs to be processed by each working node for processing the data to be processed, as the characterizing information of resource occupation corresponding to the data to be processed, where the second task type is: and the data fragmentation node processes data sent by other systems except the system to which the data fragmentation node belongs.
Optionally, the data fragmentation range determining module 203 is specifically configured to determine, for each working node that processes the to-be-processed data, a corresponding initial fragmentation position of the working node in the to-be-processed data; determining pre-division fragment positions in the data to be processed according to the initial fragment positions and the data quantity of the working node to be processed aiming at the data to be processed; and if the preset identifier is read from the pre-division fragment position, determining a corresponding data fragment range of the working node in the data to be processed according to the initial fragment position and the read preset identifier, wherein the corresponding data fragment ranges of different working nodes in the data to be processed are not overlapped.
Optionally, the data fragmentation range determining module 203 is specifically configured to send, for each working node that processes the to-be-processed data, the to-be-processed data and task information corresponding to the working node, so that the working node determines, according to the task information, a data fragmentation range corresponding to the working node in the to-be-processed data, as a target fragmentation range, and performs data processing on data located in the target fragmentation range in the to-be-processed data.
The present specification also provides a computer-readable storage medium storing a computer program operable to perform the method of data processing provided in fig. 1 above.
This specification also provides a schematic block diagram of the electronic device shown in fig. 3. As shown in fig. 3, at the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, and may also include hardware required for other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs the computer program, so as to implement the data processing method described in fig. 1 above. Of course, besides the software implementation, the present specification does not exclude other implementations, such as logic devices or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may be hardware or logic devices.
In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, as technology has advanced, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by a physical hardware module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer programs a digital system onto a single PLD, without needing a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Furthermore, nowadays, instead of manually fabricating integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that hardware circuitry that implements the logical method flows can be readily obtained by merely slightly programming the method flows into an integrated circuit using the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, the same functionality can be implemented by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be regarded as a hardware component, and the means included in it for performing the various functions may also be regarded as structures within the hardware component. The means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, respectively. Of course, the functions of the various elements may be implemented in the same one or more software and/or hardware implementations of the present description.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present disclosure, and is not intended to limit the present disclosure. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.
Claims (10)
1. A method of data processing, comprising:
determining data to be processed by the data fragmentation node;
determining a task type corresponding to the data to be processed according to the data to be processed;
determining resource occupation representation information corresponding to the data to be processed according to the task type corresponding to the data to be processed, wherein the resource occupation representation information is used for representing the resource amount occupied when the data to be processed is processed;
and determining each data fragment range in the data to be processed according to the resource occupation representation information, so as to perform data processing on the data in the data fragment range through the working node corresponding to the data fragment range aiming at each data fragment range.
2. The method according to claim 1, wherein determining, according to the task type corresponding to the data to be processed, the resource occupation representation information corresponding to the data to be processed specifically includes:
if the task type corresponding to the data to be processed is determined to be a first task type, determining a time period corresponding to the current task cycle of the data to be processed as a target time period, wherein the first task type is as follows: the data slicing node processes data stored in a database in a system to which the data slicing node belongs;
and predicting, according to the historical data processing records corresponding to the historical task periods in the target time period, the number of working nodes required for processing the data to be processed in the current task period, and using that number as the resource occupation representation information corresponding to the data to be processed.
3. The method according to claim 2, wherein predicting the number of work nodes required for data processing on the data to be processed in the current task cycle according to the historical data processing record corresponding to each historical task cycle in the target time period specifically includes:
predicting the number of working nodes required for data processing of data generated in the target time period according to historical data processing records corresponding to the target time period in each historical task period, and taking the number of the working nodes as the number of basic nodes;
determining a node coefficient corresponding to the current task period according to the data volume of the data generated in the current task period, wherein the larger the node coefficient is, the more the number of working nodes required for data processing of the data generated in the current task period is;
and predicting the number of working nodes required for performing data processing on the data to be processed in the current task period according to the node coefficient and the number of the basic nodes.
4. The method according to claim 2, wherein determining each data fragment range in the data to be processed according to the resource occupation representation information specifically includes:
determining each data fragment range in the data to be processed according to the number of working nodes that process the data to be processed.
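One way to derive fragment ranges from a node count, as claim 4 describes, is sketched below. The claim does not fix a splitting rule; this example assumes contiguous, near-equal, half-open ranges over a record count, and `split_into_ranges` is a hypothetical helper.

```python
def split_into_ranges(total_records: int, node_count: int) -> list[tuple[int, int]]:
    """Return one half-open (start, end) range per working node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Assumption: spread the remainder over the first `extra` nodes so that
    # range sizes differ by at most one record.
    base, extra = divmod(total_records, node_count)
    ranges = []
    start = 0
    for i in range(node_count):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

For example, `split_into_ranges(10, 3)` yields `[(0, 4), (4, 7), (7, 10)]`: the ranges cover every record exactly once and do not overlap.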
5. The method according to claim 1, wherein determining, according to the task type corresponding to the data to be processed, the resource occupation representation information corresponding to the data to be processed specifically includes:
if the task type corresponding to the data to be processed is determined to be a second task type, determining the data volume to be processed by each working node that processes the data to be processed, and taking that data volume as the resource occupation representation information corresponding to the data to be processed, wherein the second task type is a task type in which the data slicing node processes data sent by systems other than the system to which the data slicing node belongs.
6. The method according to claim 5, wherein determining each data fragment range in the data to be processed according to the resource occupation representation information specifically includes:
for each working node that processes the data to be processed, determining the initial fragment position corresponding to the working node in the data to be processed;
determining a pre-division fragment position in the data to be processed according to the initial fragment position and the data volume that the working node is to process;
and if a preset identifier is read from the pre-division fragment position, determining the data fragment range corresponding to the working node in the data to be processed according to the initial fragment position and the read preset identifier, wherein the data fragment ranges corresponding to different working nodes in the data to be processed do not overlap.
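The boundary adjustment in claim 6 can be sketched as follows. This is an illustrative reading, not the patented implementation: it assumes the data is a byte string, that the preset identifier is a record delimiter (a newline by default), and that the reader scans forward from the pre-division position until the identifier is found. The function name `fragment_ranges` and its parameters are hypothetical.

```python
def fragment_ranges(
    data: bytes, volume_per_node: int, delimiter: bytes = b"\n"
) -> list[tuple[int, int]]:
    """Return non-overlapping half-open (start, end) ranges that end on a delimiter."""
    if volume_per_node < 1:
        raise ValueError("volume_per_node must be at least 1")
    ranges = []
    start = 0  # initial fragment position of the first working node
    while start < len(data):
        # Pre-division position: initial position plus the node's data volume.
        pre = min(start + volume_per_node, len(data))
        # Assumption: look for the preset identifier at or after the pre-division
        # position, also accepting one that ends exactly at that position.
        hit = data.find(delimiter, max(start, pre - len(delimiter)))
        end = len(data) if hit == -1 else hit + len(delimiter)
        ranges.append((start, end))
        start = end  # the next node starts where this one ends, so ranges never overlap
    return ranges
```

Because each range ends just after a delimiter, no record is split between two working nodes, at the cost of fragments that can be somewhat larger than the nominal data volume.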
7. The method according to claim 1, wherein, for each data fragment range, performing data processing on the data in the data fragment range through the working node corresponding to the data fragment range specifically includes:
for each working node that processes the data to be processed, sending the data to be processed and the task information corresponding to the working node to the working node, so that the working node determines, according to the task information, its corresponding data fragment range in the data to be processed as a target fragment range, and performs data processing on the data within the target fragment range.
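Claim 7 leaves the range computation to the working node itself. A minimal sketch of that worker-side step follows, assuming the task information carries the worker's index and the total worker count. The claim does not specify the contents of the task information, so `TaskInfo`, `target_range`, and `process_fragment` are hypothetical, and the upper-casing stands in for the real processing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskInfo:
    """Hypothetical task information sent alongside the full data set."""
    worker_index: int  # zero-based position of this working node
    worker_count: int  # total number of working nodes


def target_range(total_records: int, task: TaskInfo) -> tuple[int, int]:
    # Each worker derives its own half-open range from the task information,
    # so the dispatcher does not need to pre-slice the data.
    base, extra = divmod(total_records, task.worker_count)
    start = task.worker_index * base + min(task.worker_index, extra)
    size = base + (1 if task.worker_index < extra else 0)
    return start, start + size


def process_fragment(records: list[str], task: TaskInfo) -> list[str]:
    start, end = target_range(len(records), task)
    # Placeholder processing: only records inside the target range are touched.
    return [record.upper() for record in records[start:end]]
```

Since every worker applies the same deterministic rule to the same input, the target ranges of different workers are disjoint and together cover all records.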
8. An apparatus for data processing, comprising:
the to-be-processed data determining module is used for determining to-be-processed data;
the task type determining module is used for determining a task type corresponding to the data to be processed according to the data to be processed;
the representation information determining module is used for determining resource occupation representation information corresponding to the data to be processed according to the task type corresponding to the data to be processed, wherein the resource occupation representation information indicates the amount of resources occupied when the data to be processed is processed;
and the data fragment range determining module is used for determining each data fragment range in the data to be processed according to the resource occupation representation information, so that, for each data fragment range, the data in the data fragment range is processed through the working node corresponding to the data fragment range.
9. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 7 when executing the program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110123970.0A CN114912990A (en) | 2021-01-29 | 2021-01-29 | Data processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110123970.0A CN114912990A (en) | 2021-01-29 | 2021-01-29 | Data processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114912990A (en) | 2022-08-16 |
Family
ID=82760941
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110123970.0A Pending CN114912990A (en) | 2021-01-29 | 2021-01-29 | Data processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114912990A (en) |
2021-01-29: CN CN202110123970.0A, patent/CN114912990A/en, active, Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107450979B (en) | A blockchain consensus method and device | |
CN107038041B (en) | Data processing method, error code dynamic compatibility method, device and system | |
CN108243032B (en) | Method, device and equipment for acquiring service level information | |
CN115190078B (en) | Access flow control method, device, equipment and storage medium | |
CN107578338B (en) | Service publishing method, device and equipment | |
CN112163150A (en) | Information pushing method and device | |
CN107528885B (en) | Service request processing method and device | |
CN110046187B (en) | Data processing system, method and device | |
CN115617799A (en) | Data storage method, device, equipment and storage medium | |
CN107026897B (en) | Data processing method, device and system | |
CN114691309A (en) | Batch business processing system, method and device | |
CN112579292B (en) | A method and device for resource allocation | |
CN111290850B (en) | Data storage method, device and equipment | |
CN118606051A (en) | Resource allocation method, device, equipment and storage medium | |
CN114912990A (en) | Data processing method and device | |
CN108881367B (en) | Service request processing method, device and equipment | |
WO2024099274A1 (en) | Data processing method, device, and storage medium | |
CN117041980A (en) | Network element management method and device, storage medium and electronic equipment | |
CN113342270B (en) | Volume unloading method, device and electronic device | |
CN112346849B (en) | A method and device for configuring CPU | |
CN118466863B (en) | Data storage method and device, storage medium and electronic equipment | |
CN115599564A (en) | Task execution method and device and task execution system | |
CN107645541B (en) | Data storage method and device and server | |
CN112583733A (en) | Interface traffic shaping method and device, storage medium and electronic equipment | |
CN117348999B (en) | A business execution system and business execution method | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||