CN116661873A - Data parallel processing method, device, computer equipment and storage medium - Google Patents

Data parallel processing method, device, computer equipment and storage medium

Info

Publication number
CN116661873A
Authority
CN
China
Prior art keywords
file
data
processed
processors
thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310654602.8A
Other languages
Chinese (zh)
Inventor
贾俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Bank Co Ltd filed Critical Ping An Bank Co Ltd
Priority to CN202310654602.8A priority Critical patent/CN116661873A/en
Publication of CN116661873A publication Critical patent/CN116661873A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F 15/163 Interprocessor communication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10 File systems; File servers
    • G06F 16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical fields of data processing and financial technology, and discloses a data parallel processing method, apparatus, computer device, and storage medium. The method comprises: acquiring a file to be processed; splitting the file to be processed into subfiles according to a preset data size threshold to obtain a plurality of target subfiles; distributing the plurality of target subfiles to a plurality of processors; executing, by each processor in parallel, processing on the target subfiles distributed to it, to obtain an execution result corresponding to each of the plurality of processors; and generating a data processing result for the file to be processed according to the execution results. The scheme provided by the invention makes full use of the processors' computing resources and greatly improves the efficiency of processing large files with large data volumes.

Description

Data parallel processing method, device, computer equipment and storage medium
Technical Field
The present invention relates to the field of data processing and financial technology, and in particular, to a data parallel processing method, apparatus, computer device, and storage medium.
Background
With the development of computer technology, more and more technologies (such as big data, cloud computing, and blockchain) are applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology. In big data scenarios, large financial files with large data volumes must often be processed. For example, in a bank's transaction service system, when a large file of transaction data or payment data needs to be processed, the common approach is to read all of the file's data into a single processor of a device for processing; however, that processor has limited running memory while the data volume is large. Processing a large file serially in this way therefore leads to low efficiency.
Disclosure of Invention
The invention provides a data parallel processing method, a data parallel processing device, computer equipment, and a storage medium, which are used to solve the technical problem of low efficiency when processing large files with large data volumes.
In a first aspect, a data parallel processing method is provided, including:
acquiring a file to be processed;
splitting the file to be processed into subfiles according to a preset data size threshold to obtain a plurality of target subfiles;
distributing the plurality of target subfiles to a plurality of processors;
executing, by each of the processors in parallel, processing on the target subfiles distributed to it, to obtain an execution result corresponding to each processor;
and generating a data processing result for the file to be processed according to each execution result.
In a second aspect, there is provided a data parallel processing apparatus comprising:
the acquisition module is used for acquiring the file to be processed;
the splitting module is used for splitting the file to be processed into subfiles according to a preset data size threshold to obtain a plurality of target subfiles;
the distribution module is used for distributing the plurality of target subfiles to a plurality of processors;
the execution module is used for executing, by each of the processors in parallel, processing on the target subfiles distributed to it, to obtain an execution result corresponding to each processor;
and the generation module is used for generating a data processing result for the file to be processed according to each execution result.
In a third aspect, a computer device is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the data parallel processing method described above when executing the computer program.
In a fourth aspect, a computer readable storage medium is provided, the computer readable storage medium storing a computer program, which when executed by a processor, implements the steps of the data parallel processing method described above.
In the scheme realized by the data parallel processing method, device, computer equipment, and storage medium, the file to be processed is split into subfiles according to a preset data size threshold to obtain a plurality of target subfiles; the target subfiles are distributed to a plurality of processors, and each processor, executing in parallel with the others, processes the target subfiles distributed to it, yielding an execution result per processor; a data processing result for the file to be processed is then generated from the execution results. By splitting the file into target subfiles and distributing them to multiple processors for parallel processing, the computing resources of the processors are fully utilized, and the efficiency of processing large files with large data volumes is greatly improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an application environment of a data parallel processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a data parallel processing method according to an embodiment of the invention;
FIG. 3 is a flowchart illustrating the step S20 in FIG. 2;
FIG. 4 is a flowchart illustrating the step S30 in FIG. 2;
FIG. 5 is a flowchart illustrating the step S40 in FIG. 2;
FIG. 6 is a flowchart illustrating the step S42 in FIG. 5;
FIG. 7 is a schematic diagram of a data parallel processing apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a computer device according to an embodiment of the invention;
FIG. 9 is a schematic diagram of another configuration of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The data parallel processing method provided by the embodiment of the invention can be applied in an application environment as shown in fig. 1, in which a client communicates with a server through a network. The server receives the file to be processed from the client and splits it into subfiles according to a preset data size threshold, obtaining a plurality of target subfiles; the target subfiles are distributed to a plurality of processors, and each processor, executing in parallel, processes the target subfiles distributed to it, yielding an execution result per processor; the server then generates a data processing result from the execution results and feeds it back to the client. When the server processes a large file in this way, the target subfiles obtained by splitting are distributed to multiple processors for parallel processing, the computing resources of the processors are fully utilized, and the efficiency of processing large files with large data volumes is greatly improved.
In this embodiment, the server may be a server of a bank's stock trading service system, and the client may be a mobile or fixed client of that system. For example, during a trading session, the client sends the transaction data file generated within a predetermined time interval, which typically has a large data size, to the server as the file to be processed for calculation.
Among other things, clients may include, but are not limited to, various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices. The server may be implemented by a stand-alone server or a server cluster formed by a plurality of servers. The present invention will be described in detail with reference to specific examples.
Referring to fig. 2, fig. 2 is a schematic flow chart of a data parallel processing method according to an embodiment of the invention, including the following steps S10-S50:
s10: and obtaining the file to be processed.
In the invention, the file to be processed is acquired by the server from any client; it is a file sent by the client to the server to request execution of some business service. It will be appreciated that in this embodiment the file to be processed typically has a large data size. In particular, in a bank's stock trading system, transaction data files with large data sizes need to be processed quickly to ensure the timeliness of data processing in the system.
In one embodiment, the file read type with which the server obtains the file to be processed may be any of several types, such as reading by bytes, reading by lines, or reading by blocks. The choice may depend on the size of the file: a file with a smaller data size may be read by bytes, while a file with a larger data size may be read by lines or by blocks. Of course, any file to be processed may be obtained through byte reading, line-by-line reading, or block-by-block reading, which is not specifically limited here.
S20: and splitting the subfiles of the files to be processed according to a preset data size threshold value to obtain a plurality of target subfiles.
In the invention, after the server side obtains the file to be processed, the file to be processed needs to be split in order to facilitate the processing of the file to be processed by a plurality of processors. In splitting, the file to be processed can be split into a plurality of target subfiles with smaller data sizes according to a preset data size threshold. For example, for a transaction data file, a server corresponding to a stock exchange system may split the transaction data file generated within a predetermined time interval into a plurality of transaction data subfiles of smaller data size.
It is understood that the data size threshold may be a predetermined data value or a predetermined range of data values, which is not limited herein.
As shown in fig. 3, a scheme for splitting the file to be processed is provided; S20 specifically includes the following steps S21-S23:
S21: determining the data size threshold for data splitting based on the file read type and the file data size of the file to be processed.
In the invention, when splitting the file to be processed, a data size threshold for the split may be determined from the file read type and the file data size of the file, where the threshold ensures that each target subfile obtained by splitting meets a certain data size.
In one embodiment of the present invention, the data size threshold for splitting the file to be processed may be determined from a pre-established correspondence between the two factors, i.e. the file read type and the file data size, and the data size threshold.
In the scheme of this embodiment, the data size threshold can be determined quickly from a mapping table between the two factors (file read type and file data size) and the threshold, which improves the efficiency of splitting the file to be processed.
In one embodiment of the present invention, when determining the data size threshold for performing data splitting based on the file read type and the file data size of the file to be processed, it is also possible to implement the following steps.
determining a maximum data splitting threshold and a minimum data splitting threshold based on the file read type of the file to be processed; and determining, based on the file data size, a data size threshold for data splitting within the data size range formed by the minimum and maximum data splitting thresholds.
In this embodiment, when splitting the file to be processed, its file read type may be obtained first. The file read type may be any of byte reading, line reading, or block reading, and different read types determine the maximum and minimum data sizes into which the file may be split while preserving the integrity of the data.
In this embodiment, the maximum and minimum splittable data sizes for each file read type may be determined from the processing of historical files, yielding a maximum data splitting threshold and a minimum data splitting threshold for each read type. A correspondence between each file read type and its maximum and minimum data splitting thresholds is established accordingly. Then, once the file read type of a file to be processed is known, its maximum and minimum data splitting thresholds can be looked up directly from this correspondence.
Further, after the maximum and minimum data splitting thresholds are determined, the data size threshold may be chosen within the range they form, according to the file data size of the file to be processed. It will be appreciated that the data size threshold is a value lying between the minimum and maximum splitting thresholds: it may be set relatively larger when the file is large, and relatively smaller when the file is small. Of course, the average of the minimum and maximum splitting thresholds may also be used directly as the data size threshold.
In the scheme of the invention, the maximum and minimum data splitting thresholds are determined from the file read type, and the data size threshold is then determined from the file data size within the range they form. This ensures the integrity of the split data, keeps subsequent processing convenient, allows the file to be split as fully as possible, and improves the effect of data splitting.
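The threshold selection described in S21 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `SPLIT_RANGES` table, its byte values, and the size cutoffs are all hypothetical stand-ins for the correspondence learned from historical files.

```python
# Illustrative sketch of step S21 (hypothetical threshold values, in bytes):
# each file read type maps to a (min, max) splitting range, and the final
# threshold is chosen inside that range based on the file's total size.

SPLIT_RANGES = {  # file read type -> (min split, max split); assumed values
    "byte":  (64 * 1024, 1 * 1024 * 1024),
    "line":  (256 * 1024, 4 * 1024 * 1024),
    "block": (1 * 1024 * 1024, 16 * 1024 * 1024),
}

def data_size_threshold(read_type: str, file_size: int) -> int:
    """Pick a split threshold within [min, max] for the given read type.

    Larger files get a threshold near the maximum, smaller files near the
    minimum; the average of the bounds is used as a simple fallback,
    mirroring the averaging option mentioned in the text.
    """
    lo, hi = SPLIT_RANGES[read_type]
    if file_size >= 100 * hi:   # very large file: split coarsely
        return hi
    if file_size <= 10 * lo:    # small file: split finely
        return lo
    return (lo + hi) // 2       # otherwise use the average of the bounds
```

The `100 * hi` / `10 * lo` cutoffs merely demonstrate "larger file, larger threshold"; any monotone policy within the range satisfies the text.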
S22: splitting the file to be processed into a plurality of data blocks, each with a data block size smaller than or equal to the data size threshold.
In the invention, after the data size threshold is determined, the file to be processed may be split step by step into data blocks whose sizes do not exceed the threshold. For example, if the file to be processed has a data size of 1000.5 MB and the data size threshold is 1 MB, the file can be split into one thousand 1 MB data blocks plus one 0.5 MB data block.
S23: obtaining a plurality of target subfiles based on the plurality of data blocks obtained by splitting.
When the split data blocks are obtained, they can be used directly as the target subfiles resulting from splitting the file to be processed.
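Steps S22-S23 amount to a fixed-size chunking of the file contents; a minimal sketch (the function name `split_into_blocks` is ours, and in-memory `bytes` stands in for a streamed read):

```python
def split_into_blocks(data: bytes, threshold: int) -> list[bytes]:
    """Split a file's contents into blocks of at most `threshold` bytes.

    Every block except possibly the last has exactly `threshold` bytes; the
    final block carries the remainder, matching the 1000.5 MB ->
    1000 x 1 MB + 0.5 MB example in the text.
    """
    return [data[i:i + threshold] for i in range(0, len(data), threshold)]
```

For line- or block-read types, the same loop would cut on the nearest record boundary below the threshold instead of at an exact byte offset.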
S30: and distributing the target subfiles to the processors.
In this embodiment, in order to process the file to be processed more quickly, after it is split into multiple target subfiles, the subfiles are distributed to multiple processors, so that the file is processed faster on the principle that the processors execute in parallel.
As shown in fig. 4, a scheme of specifically distributing a plurality of target subfiles to a plurality of processors is provided, and S30 specifically includes the following steps S31-S33:
step S31, determining the file types corresponding to the target subfiles respectively.
Step S32, taking the target subfiles of the same file type as a file set to obtain a plurality of file sets.
Step S33, distributing the plurality of file sets to different processors respectively.
In this embodiment, in order to process the target subfiles faster, the file type of each target subfile may be determined. Target subfiles of the same file type contain similar kinds of data, and assigning them to the same processor improves that processor's data processing efficiency. Specifically, the file type of each target subfile may be determined from its file attribute information.
After the file types are determined, target subfiles of the same type are grouped into a file set, yielding multiple file sets. Then, as many processors as there are file sets can be selected, and each file set is assigned to one processor. For example, if the target subfiles cover 3 file types, 3 file sets are obtained and 3 processors are selected, with each of the 3 file sets assigned to one of the 3 processors.
In the scheme of this embodiment, target subfiles of the same file type are grouped into a file set and distributed to one processor, so that subfiles of the same type are processed by the same processor, which improves the processor's data processing efficiency.
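Steps S31-S33 can be sketched as a group-by followed by a one-set-per-processor assignment. This is an illustrative reading: the file extension stands in for the "file attribute information" of the text, and the function names and processor labels are ours.

```python
from collections import defaultdict

def group_by_type(subfiles: list[str]) -> dict[str, list[str]]:
    """S31-S32: bucket target subfiles into file sets keyed by file type.

    The extension serves as the type tag here; any tag derived from file
    attribute information would work the same way.
    """
    sets = defaultdict(list)
    for name in subfiles:
        sets[name.rsplit(".", 1)[-1]].append(name)
    return dict(sets)

def assign_sets(file_sets: dict[str, list[str]],
                processors: list[str]) -> dict[str, list[str]]:
    """S33: give each file set to one processor (one set per processor)."""
    return {cpu: files
            for (ftype, files), cpu in zip(file_sets.items(), processors)}
```

With 3 file types and 3 processors this reproduces the example in the text; the integer-multiple variant below would instead fan each set out to several processors.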
In one embodiment, after the file sets are obtained, a number of processors that is an integer multiple of the number of file sets may be selected, and each file set assigned to several processors. For example, if the target subfiles cover 3 file types, 3 file sets are obtained and 12 processors are selected from all available processors, with every 4 processors handling the target subfiles of one file set.
By the method, the target subfiles of the same file type can be distributed to more processors for processing, and therefore the efficiency of data processing of the processors is further improved.
In one embodiment, when selecting as many processors as there are file sets, the remaining memory space of every available processor may first be calculated. One file set is then chosen and assigned to the processor with the largest remaining memory space; after each assignment, the remaining memory space of every available processor is recalculated, and the next unassigned file set is given to the processor that now has the largest remaining memory space. This repeats until all file sets are assigned. The remaining memory space refers to the running memory the processor can use for data processing; it will be appreciated that the larger it is, the stronger the processor's available processing capacity. In this way, file sets are preferentially assigned to relatively idle processors with stronger capacity, fully exploiting processor performance and improving data processing efficiency.
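The memory-aware variant is a greedy loop that can be sketched with a max-heap. All names and the per-set memory cost model are hypothetical; the point is only the repeated "assign to the processor with the most free memory, then recompute" step.

```python
import heapq

def allocate_by_free_memory(file_sets: list[list[str]],
                            free_mem: dict[str, int]) -> dict[str, list[list[str]]]:
    """Greedy allocation sketch: each unassigned file set goes to the
    processor with the most remaining memory, which is then reduced by an
    assumed per-set cost before the next pick.
    """
    # max-heap over (free memory, processor id), emulated with negated keys
    heap = [(-mem, cpu) for cpu, mem in free_mem.items()]
    heapq.heapify(heap)
    assignment = {cpu: [] for cpu in free_mem}
    for fset in file_sets:
        mem, cpu = heapq.heappop(heap)           # most free memory right now
        assignment[cpu].append(fset)
        cost = sum(len(f) for f in fset)         # stand-in for the set's memory cost
        heapq.heappush(heap, (mem + cost, cpu))  # mem is negative; adding cost shrinks it
    return assignment
```

Recomputing after every assignment is what lets one well-resourced processor absorb several small sets while heavily loaded processors are skipped.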
S40: and executing processing on the target subfiles distributed to the processors by the processors based on the principle that the processors execute in parallel, so as to obtain execution results corresponding to the processors.
In this embodiment, after the target subfiles are assigned to multiple processors, each processor, executing in parallel with the others, processes the target subfiles assigned to it, producing an execution result for each processor; the server thereby completes the processing of the file to be processed. Taking transaction data subfiles as an example, when the processors in the server of the stock trading service system each process the transaction data subfiles assigned to them, each processor produces a transaction calculation result.
As shown in fig. 5, a scheme is provided for executing, by the processors in parallel, processing on the target subfiles assigned to them; S40 specifically includes the following steps S41-S43:
S41: for any one of a plurality of processors, creating a thread by the processor according to the allocated target subfile, and creating a task queue in the thread according to the allocated target subfile.
S42: based on the parallel execution principle of each processor, executing task processing on the corresponding task queue through each thread to obtain a task processing result corresponding to each thread.
S43: and obtaining execution results corresponding to the processors respectively based on the task processing results.
In S41 to S43, when the target subfiles are processed by multiple processors, each processor creates a thread for task processing according to the target subfiles assigned to it, and within that thread creates a task queue: one or more tasks are created from each target subfile and placed in the queue of the processor's thread. The thread runs as a program in the processor's memory and processes the tasks in its own task queue.
After each processor has created its thread and task queue, the threads process their task queues independently of one another, on the principle that the processors execute in parallel. Because the threads are mutually independent, no extra thread is needed to handle interaction between them; this reduces the number of threads as well as the overhead of data transfer and synchronization between threads, improving data processing efficiency.
After the task processing result of each thread is obtained, the execution results of the multiple processors are derived from those task processing results.
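The per-processor thread-and-queue structure of S41-S43 can be sketched as follows. This models a processor as one Python worker thread with its own `Queue` and tasks as callables; the function name and shape are ours, not the patented implementation.

```python
import threading
from queue import Queue

def run_partitioned(task_lists):
    """S41-S43 sketch: one worker thread per processor, each with its own
    task queue built from its assigned subfiles and no cross-thread
    interaction. Tasks are modeled as callables returning a value.
    """
    results = [[] for _ in task_lists]  # one result list per "processor"

    def worker(idx, tasks):
        q = Queue()
        for t in tasks:                  # task queue created from assigned subfiles
            q.put(t)
        while not q.empty():
            results[idx].append(q.get()())  # execute the task, record the result

    threads = [threading.Thread(target=worker, args=(i, ts))
               for i, ts in enumerate(task_lists)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results  # per-processor execution results
```

Because each worker touches only its own queue and result slot, no locks are needed between workers, matching the text's point about avoiding inter-thread synchronization overhead.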
As shown in fig. 6, step S42 of executing task processing on the corresponding task queue by each thread, based on the principle that the processors execute in parallel, specifically includes the following steps S421 to S422:
S421: executing, by the threads, task processing on their corresponding task queues based on the principle that the processors execute in parallel.
S422: if all the tasks in the task queues are processed by the thread, the thread acquires tasks from other threads of all the tasks in the task queues which are not processed and executes task processing until all the tasks in the task queues are processed, and the task processing result corresponding to each thread is obtained.
In this embodiment, while the threads of the processors execute their task queues in parallel, any thread that has finished all tasks in its own queue acquires tasks from threads whose queues are not yet empty and processes them.
Specifically, step S422 may include: if a thread has processed all tasks in its task queue, taking it as an idle thread, and taking any thread that has not yet processed all tasks in its queue as a target thread; then having the idle thread take the task at the tail of the target thread's queue and execute it, until all threads have processed all tasks in all queues, yielding a task processing result for each thread.
In this embodiment, when a thread has processed all tasks in the task queue, the thread that has processed all tasks in the task queue is taken as an idle thread, and any other thread that has not processed all tasks in the task queue is taken as a target thread for acquiring tasks. Specifically, tasks at the tail of the queue are generally obtained from task queues of the target threads and task processing is executed, so that the influence on the current task processing of other target threads is avoided, and faults are avoided.
Optionally, an idle thread that has processed all tasks in its task queue may preferentially acquire tasks from a target thread whose load is similar to its own and execute task processing, until every thread has processed all tasks in its task queue, so as to obtain the task processing result corresponding to each thread and, in turn, the execution result corresponding to each of the plurality of processors.
Optionally, an idle thread that has processed all tasks in its task queue may instead acquire tasks from the target thread with the largest load and execute task processing, until every thread has processed all tasks in its task queue, so as to obtain the task processing result corresponding to each thread and, in turn, the execution result corresponding to each of the plurality of processors.
The scheme of this embodiment ensures load balancing of tasks, prevents excessive tasks from accumulating in the task queues of particular threads, and thus effectively improves data processing efficiency.
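The idle-thread strategy described above amounts to work-stealing from the tail of a peer's queue. The following is a minimal single-process simulation of that policy, not the patented implementation: all names are hypothetical, and real worker threads are replaced by a round-robin loop for clarity.

```python
from collections import deque

def run_with_stealing(queues):
    """Each worker drains its own queue from the head; an idle worker
    steals one task from the tail of the most-loaded other queue."""
    results = [[] for _ in queues]
    while any(queues):
        for i, q in enumerate(queues):
            if q:
                # normal case: process own task taken from the head
                results[i].append(q.popleft()())
            else:
                # idle worker: steal from the tail of the busiest queue,
                # away from the head the victim is currently consuming
                victims = [v for v in queues if v]
                if not victims:
                    break
                victim = max(victims, key=len)
                results[i].append(victim.pop()())
    return results

# two workers: one overloaded, one idle after a single task
q0 = deque([lambda v=v: v * 2 for v in range(6)])
q1 = deque([lambda: 100])
out = run_with_stealing([q0, q1])
```

Taking from the tail while the owner consumes from the head is the same head/tail separation that real work-stealing deques (for example in Java's ForkJoinPool) use to minimise contention between owner and thief.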
S50: and generating a data processing result aiming at the file to be processed according to each execution result.
In this embodiment, after obtaining the execution results corresponding to each of the plurality of processors, the server merges these execution results to generate a data processing result for the file to be processed, which can then be sent to the client as feedback for the service the client initiated. Taking transaction data subfiles as an example of the target subfiles, a server of a stock transaction service system merges the transaction calculation results that the plurality of processors produced for their transaction data subfiles into a final transaction calculation result for the file to be processed, which can then be sent to the client for display or other subsequent processing.
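The merging step can be sketched as follows; the dict-of-totals result shape (for example, per-symbol trade volume) is an illustrative assumption, not something the patent specifies.

```python
def merge_results(per_processor_results):
    """Combine the execution results from each processor into one final
    data processing result for the original file. Hypothetical shape:
    each processor reports a dict of counter totals."""
    merged = {}
    for result in per_processor_results:
        for key, value in result.items():
            merged[key] = merged.get(key, 0) + value
    return merged

# e.g. three processors each counted trade volume per stock symbol
partials = [{"AAPL": 10, "PING": 5}, {"PING": 7}, {"AAPL": 1, "MSFT": 2}]
final = merge_results(partials)
```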
In the above scheme, the file to be processed is split into a plurality of target subfiles according to a preset data size threshold; the target subfiles are distributed to a plurality of processors, and the processors process the target subfiles allocated to them based on the principle that the processors execute in parallel, so as to obtain the execution result corresponding to each processor; and a data processing result for the file to be processed is generated from the execution results. By splitting the file into target subfiles and distributing them to a plurality of processors for parallel data processing, the computing resources of the processors are fully utilized, and the efficiency of processing large files with large data volumes is greatly improved.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of the present invention.
In an embodiment, a data parallel processing apparatus is provided, which corresponds one-to-one to the data parallel processing method in the above embodiment. As shown in fig. 7, the data parallel processing apparatus includes an acquisition module 101, a splitting module 102, an allocation module 103, a text execution module 104, and a generation module 105. The functional modules are described in detail as follows:
an obtaining module 101, configured to obtain a file to be processed;
the splitting module 102 is configured to split the subfiles of the file to be processed according to a preset data size threshold, so as to obtain a plurality of target subfiles;
an allocation module 103, configured to allocate a plurality of the target subfiles to a plurality of the processors;
a text execution module 104, configured to process, by the processors, the target subfiles allocated to them based on the principle that the processors execute in parallel, so as to obtain the execution result corresponding to each of the plurality of processors;
And the generating module 105 is used for generating a data processing result aiming at the file to be processed according to each execution result.
In one embodiment, the splitting module 102 is specifically configured to:
determining the data size threshold for splitting data based on the file reading type and the file data size of the file to be processed;
splitting the file to be processed into a plurality of data blocks with the data block size smaller than or equal to the data size threshold;
and obtaining a plurality of target subfiles based on the plurality of data blocks obtained by splitting.
In one embodiment, the splitting module 102 is specifically configured to:
determining a maximum data splitting threshold value and a minimum data splitting threshold value for splitting data based on the file reading type of the file to be processed;
and determining the data size threshold for data splitting within the data size range consisting of the minimum data splitting threshold and the maximum data splitting threshold based on the file data size.
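The min/max clamping described here might look as follows; the per-read-type bounds and the target number of parts are invented for illustration, since the patent gives no concrete values.

```python
# hypothetical (min, max) split bounds in bytes, keyed by file read type
SPLIT_BOUNDS = {
    "text": (1 * 1024 * 1024, 64 * 1024 * 1024),
    "binary": (4 * 1024 * 1024, 256 * 1024 * 1024),
}

def pick_split_threshold(read_type, file_size, target_parts=8):
    """Choose a data size threshold inside [min, max] for the file's read
    type, aiming for roughly target_parts subfiles."""
    lo, hi = SPLIT_BOUNDS[read_type]
    ideal = max(1, file_size // target_parts)
    return min(max(ideal, lo), hi)  # clamp the ideal size into [lo, hi]
```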
In one embodiment, the allocation module 103 is configured to:
determining the file types corresponding to the target subfiles respectively;
taking a target sub-file with the same file type as a file set to obtain a plurality of file sets;
and distributing the plurality of file sets to different ones of the plurality of processors respectively.
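Grouping target subfiles by file type and handing each complete group to a processor could be sketched as below; using the filename extension as the file type and round-robin assignment over processors are both assumptions for illustration.

```python
from collections import defaultdict
from itertools import cycle

def assign_by_type(subfiles, processor_ids):
    """Group target subfiles by file type, then hand each complete group
    to one processor, cycling over the processors round-robin."""
    groups = defaultdict(list)
    for name in subfiles:
        file_type = name.rsplit(".", 1)[-1]  # extension stands in for file type
        groups[file_type].append(name)
    assignment = {pid: [] for pid in processor_ids}
    for pid, group in zip(cycle(processor_ids), groups.values()):
        assignment[pid].extend(group)  # a whole file set goes to one processor
    return assignment

plan = assign_by_type(["a.csv", "b.csv", "c.txt", "d.bin"], ["cpu0", "cpu1"])
```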
In one embodiment, the text execution module 104 is specifically configured to:
creating, by the processor, a thread from the allocated target subfile for any one of the plurality of processors, and creating a task queue in the thread from the allocated target subfile;
based on the parallel execution principle of each processor, executing task processing on the corresponding task queue through each thread to obtain a task processing result corresponding to each thread;
and obtaining execution results corresponding to the processors based on the task processing results.
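The thread-and-queue setup above might be sketched as follows, with one worker thread standing in for each processor and `handle` as a hypothetical per-task work function. Note that CPython threads interleave on one interpreter, so a genuinely CPU-bound implementation would use processes instead.

```python
import queue
import threading

def process_subfiles(assigned_subfiles, handle):
    """One worker thread per group of assigned subfiles, each thread
    draining its own task queue and recording its results."""
    results = {}

    def worker(name, tasks):
        processed = []
        while True:
            try:
                task = tasks.get_nowait()
            except queue.Empty:
                break
            processed.append(handle(task))
        results[name] = processed  # distinct keys, so safe under the GIL

    threads = []
    for name, subfiles in assigned_subfiles.items():
        tasks = queue.Queue()
        for sub in subfiles:
            tasks.put(sub)  # one task per allocated target subfile
        t = threading.Thread(target=worker, args=(name, tasks))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return results

out = process_subfiles(
    {"cpu0": ["a.part0", "a.part1"], "cpu1": ["a.part2"]}, str.upper)
```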
In one embodiment, the text execution module 104 is specifically configured to:
executing task processing on the corresponding task queues through each thread based on the parallel execution principle of each processor;
if a thread has processed all tasks in its task queue, the thread acquires tasks from other threads that have not yet processed all tasks in their task queues and executes task processing, until all tasks in every task queue have been processed, thereby obtaining the task processing result corresponding to each thread.
In one embodiment, the text execution module 104 is specifically configured to:
if all the tasks in the task queue are processed by the thread, taking the thread which has processed all the tasks in the task queue as an idle thread, and taking any thread which has not processed all the tasks in the task queue as a target thread;
and acquiring tasks at the tail of the queue from the task queue of the target thread through the idle thread, and executing task processing until all the threads have processed all the tasks in the task queue, so as to obtain the task processing result corresponding to each thread.
The invention provides a data parallel processing apparatus, which splits a file to be processed into a plurality of target subfiles according to a preset data size threshold; distributes the target subfiles to a plurality of processors, which process the target subfiles allocated to them based on the principle that the processors execute in parallel, so as to obtain the execution result corresponding to each processor; and generates a data processing result for the file to be processed from the execution results. By splitting the file into target subfiles and distributing them to a plurality of processors for parallel data processing, the computing resources of the processors are fully utilized, and the efficiency of processing large files with large data volumes is greatly improved.
For specific limitations of the data parallel processing apparatus, reference may be made to the above limitations of the data parallel processing method, which are not repeated here. Each module in the above data parallel processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can invoke and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile and/or volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external client via a network connection. The computer program, when executed by a processor, implements the server-side functions or steps of the data parallel processing method.
In one embodiment, a computer device is provided, which may be a client, the internal structure of which may be as shown in fig. 9. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external server via a network connection. The computer program, when executed by a processor, implements the client-side functions or steps of the data parallel processing method.
In one embodiment, a computer device is provided comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of when executing the computer program:
acquiring a file to be processed;
splitting the files to be processed into subfiles according to a preset data size threshold value to obtain a plurality of target subfiles;
Distributing the target subfiles to the processors;
based on the principle that each processor executes in parallel, executing processing on the target subfiles distributed to the processors by the processors to obtain execution results corresponding to the processors;
and generating a data processing result aiming at the file to be processed according to each execution result.
The invention provides a computer device, which splits a file to be processed into a plurality of target subfiles according to a preset data size threshold; distributes the target subfiles to a plurality of processors, which process the target subfiles allocated to them based on the principle that the processors execute in parallel, so as to obtain the execution result corresponding to each processor; and generates a data processing result for the file to be processed from the execution results. By splitting the file into target subfiles and distributing them to a plurality of processors for parallel data processing, the computing resources of the processors are fully utilized, and the efficiency of processing large files with large data volumes is greatly improved.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
Acquiring a file to be processed;
splitting the files to be processed into subfiles according to a preset data size threshold value to obtain a plurality of target subfiles;
distributing the target subfiles to the processors;
based on the principle that each processor executes in parallel, executing processing on the target subfiles distributed to the processors by the processors to obtain execution results corresponding to the processors;
and generating a data processing result aiming at the file to be processed according to each execution result.
The invention provides a computer readable storage medium storing a computer program that splits a file to be processed into a plurality of target subfiles according to a preset data size threshold; distributes the target subfiles to a plurality of processors, which process the target subfiles allocated to them based on the principle that the processors execute in parallel, so as to obtain the execution result corresponding to each processor; and generates a data processing result for the file to be processed from the execution results. By splitting the file into target subfiles and distributing them to a plurality of processors for parallel data processing, the computing resources of the processors are fully utilized, and the efficiency of processing large files with large data volumes is greatly improved.
It should be noted that, the functions or steps implemented by the computer readable storage medium or the computer device may correspond to the relevant descriptions of the server side and the client side in the foregoing method embodiments, and are not described herein for avoiding repetition.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored on a non-transitory computer readable storage medium, which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. The volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (10)

1. A method for parallel processing of data, comprising:
acquiring a file to be processed;
splitting the files to be processed into subfiles according to a preset data size threshold value to obtain a plurality of target subfiles;
Distributing the target subfiles to the processors;
based on the principle that each processor executes in parallel, executing processing on the target subfiles distributed to the processors by the processors to obtain execution results corresponding to the processors;
and generating a data processing result aiming at the file to be processed according to each execution result.
2. The method for parallel processing of data according to claim 1, wherein the splitting the subfile of the file to be processed according to the preset data size threshold to obtain a plurality of target subfiles includes:
determining the data size threshold for splitting data based on the file reading type and the file data size of the file to be processed;
splitting the file to be processed into a plurality of data blocks with the data block size smaller than or equal to the data size threshold;
and obtaining a plurality of target subfiles based on the plurality of data blocks obtained by splitting.
3. The method according to claim 2, wherein determining the data size threshold for data splitting based on the file read type and file data size of the file to be processed comprises:
Determining a maximum data splitting threshold value and a minimum data splitting threshold value for splitting data based on the file reading type of the file to be processed;
and determining the data size threshold for data splitting within the data size range consisting of the minimum data splitting threshold and the maximum data splitting threshold based on the file data size.
4. The method of parallel processing of data according to claim 1, wherein said distributing the plurality of the target subfiles to the plurality of the processors comprises:
determining the file types corresponding to the target subfiles respectively;
taking a target sub-file with the same file type as a file set to obtain a plurality of file sets;
and distributing the plurality of file sets to different ones of the plurality of processors respectively.
5. The method according to claim 1, wherein the step of performing, by the processor, processing of the target subfiles allocated to the processor based on the principle of parallel execution of the respective processors to obtain execution results corresponding to the respective processors, includes:
creating, by the processor, a thread from the allocated target subfile for any one of the plurality of processors, and creating a task queue in the thread from the allocated target subfile;
Based on the parallel execution principle of each processor, executing task processing on the corresponding task queue through each thread to obtain a task processing result corresponding to each thread;
and obtaining execution results corresponding to the processors based on the task processing results.
6. The method of claim 5, wherein the executing task processing on the task queue corresponding to each thread by each thread based on the principle of parallel execution of each processor, to obtain a task processing result corresponding to each thread, includes:
executing task processing on the corresponding task queues through each thread based on the parallel execution principle of each processor;
if a thread has processed all tasks in its task queue, the thread acquires tasks from other threads that have not yet processed all tasks in their task queues and executes task processing, until all tasks in every task queue have been processed, thereby obtaining the task processing result corresponding to each thread.
7. The method of parallel processing of data according to claim 6, wherein, if there is a thread that has processed all tasks in its task queue, the thread acquires tasks from other threads that have not processed all tasks in their task queues and executes task processing until all tasks in each task queue have been processed, and the obtaining the task processing result corresponding to each thread includes:
If all the tasks in the task queue are processed by the thread, taking the thread which has processed all the tasks in the task queue as an idle thread, and taking any thread which has not processed all the tasks in the task queue as a target thread;
and acquiring tasks at the tail of the queue from the task queue of the target thread through the idle thread, and executing task processing until all the threads have processed all the tasks in the task queue, so as to obtain the task processing result corresponding to each thread.
8. A data parallel processing apparatus, comprising:
the acquisition module is used for acquiring the file to be processed;
the splitting module is used for splitting the files to be processed into sub-files according to a preset data size threshold value to obtain a plurality of target sub-files;
the distribution module is used for distributing the target subfiles to the processors;
the text execution module is used for executing processing on the target subfiles distributed to the processors by the processors based on the principle that the processors execute in parallel to obtain execution results corresponding to the processors;
And the generation module is used for generating a data processing result aiming at the file to be processed according to each execution result.
9. Computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the data parallel processing method according to any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the data parallel processing method according to any one of claims 1 to 7.
CN202310654602.8A 2023-06-02 2023-06-02 Data parallel processing method, device, computer equipment and storage medium Pending CN116661873A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310654602.8A CN116661873A (en) 2023-06-02 2023-06-02 Data parallel processing method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116661873A true CN116661873A (en) 2023-08-29

Family

ID=87722095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310654602.8A Pending CN116661873A (en) 2023-06-02 2023-06-02 Data parallel processing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116661873A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118132269A (en) * 2024-03-25 2024-06-04 重庆赛力斯凤凰智创科技有限公司 Method, device, equipment and medium for processing vehicle data


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination