CN117667316A - Asynchronous multithreading file processing method, device and storage medium - Google Patents

Info

Publication number
CN117667316A
Authority
CN
China
Prior art keywords
data
file
writing
semaphore
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311661885.5A
Other languages
Chinese (zh)
Other versions
CN117667316B (en)
Inventor
贾龙龙
严萍萍
张磊
俞敏
铁锦程
李虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pudong Development Bank Co Ltd
Original Assignee
Shanghai Pudong Development Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-12-05
Filing date
2023-12-05
Publication date
2024-03-08
Application filed by Shanghai Pudong Development Bank Co Ltd
Priority to CN202311661885.5A
Publication of CN117667316A
Application granted
Publication of CN117667316B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/466 Transaction processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an asynchronous multithreading file processing method, device and storage medium. The method comprises the following steps: segmenting the data of a large-data-volume file to be written into data pages, storing the data page by page, and obtaining a file set of the files to be written based on a preset single-file size; while the file set is not empty, configuring a preset semaphore to a first state, writing the files to target locations one by one in single-thread mode, configuring the semaphore to a second state after writing is completed, and repeating these steps until the file set is empty; and, for the reading process, performing multithreaded concurrent queries on the condition that the semaphore is configured in the second state. Compared with the prior art, the method offers high read-write efficiency for large-data-volume files, separates the read and write threads, and lets the write thread control the concurrency of the query threads.

Description

Asynchronous multithreading file processing method, device and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular to an asynchronous multithreading file processing method, device and storage medium.
Background
For industries with large data volumes and request volumes, such as banking, achieving efficient file processing remains a problem to be solved.
Chinese patent application publication No. CN 116226251A discloses a data export method, apparatus, electronic device and storage medium in the field of data transmission. It specifically discloses: storing the preposed data obtained from the database into memory; setting the number of pages according to the total amount of data to be exported, and sequentially acquiring the paged data from the database through the main thread according to the retrieval-associated data so as to store the paged data into memory; and acquiring the paged data in memory through at least one slave thread and processing it to obtain paged data files whose data processing is complete. That scheme exports all the data to be exported while occupying only the small amount of memory needed for the paged data, but the application only discloses a data export process and does not disclose how to coordinate read and write requests well.
In summary, some current batch file generation first queries a database and then writes the file, and for a file with a large data volume, all the data cannot be obtained at once. Paged queries are therefore required: one page of data is queried and written to the file, and only then is the next page queried and written. File generation is consequently inefficient, and a file processing method that solves or partially solves these problems is currently lacking.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an asynchronous multithreading file processing method, device and storage medium so as to improve the read-write efficiency of files with large data volume.
The aim of the invention can be achieved by the following technical scheme:
In one aspect of the present invention, an asynchronous multithreading file processing method is provided, comprising the following steps:
segmenting the data of a large-data-volume file to be written into data pages, storing the data page by page, and obtaining a file set of the files to be written based on a preset single-file size;
while the file set is not empty, configuring a preset semaphore to a first state, writing the files to target locations one by one in single-thread mode, configuring the semaphore to a second state after writing is completed, and repeating these steps until the file set is empty;
for the reading process, performing multithreaded concurrent queries on the condition that the semaphore is configured in the second state.
In a preferred technical solution, the process of segmenting the large-data-volume file into data pages and storing them page by page comprises the following step:
querying the maximum record and the minimum record of the large-data-volume file, segmenting the range between them to form data pages, and storing the data pages page by page.
In a preferred technical solution, the process of writing the files to the target locations one by one comprises:
detecting whether a temporary data set is empty; if it is not empty, obtaining the data page currently being processed from the data in the temporary data set; if it is empty, obtaining the data page currently being processed based on the file set; traversing the current data page and writing its data to the target location item by item; then taking the next data page as the current data page and repeating this step.
In a preferred technical solution, if during writing the written data reaches the preset single-file size, the data still to be written is placed in the temporary data set and the semaphore is configured to the second state.
In a preferred technical solution, after writing is completed, the method further comprises deleting the data page that has been written.
In a preferred technical solution, the process of concurrent querying based on a query request comprises the following step:
when the semaphore is configured in the second state, querying the data of each data page concurrently in a multithreaded manner and storing it in order.
In a preferred technical solution, when the semaphore is configured in the first state, the thread enters a blocked state until the semaphore is configured in the second state.
In a preferred technical solution, the data pages are stored page by page in a map container.
In another aspect of the present invention, there is provided an electronic apparatus including: one or more processors and a memory, the memory having stored therein one or more programs, the one or more programs comprising instructions for performing the asynchronous multithreading file processing method described above.
In another aspect of the invention, a computer readable storage medium is provided that includes one or more programs for execution by one or more processors of an electronic device, the one or more programs including instructions for performing the asynchronous multithreading file processing method described above.
Compared with the prior art, the invention has the following beneficial effects:
(1) High read-write efficiency for large-data-volume files: the invention achieves asynchronous reading and writing by combining multithreaded reading with single-threaded writing, so that reading and writing proceed at the same time. This addresses the long time otherwise spent reading data and allows large-data-volume files to be read and written in a short time.
(2) Separation of read and write threads, with the write thread controlling the concurrency of the query threads: the invention uses a semaphore to control the number of concurrent query threads, and the read thread and the write thread share the same semaphore. When the read thread acquires the semaphore, it starts a new thread to query the current page of data concurrently; when acquisition fails, the read thread blocks. The write thread releases the semaphore promptly after a page of data has been written to the file, after which the read thread promptly queries the next page of data. Controlling the semaphore in this way effectively bounds the maximum amount of data cached in memory.
Drawings
FIG. 1 is a flow chart of a method of asynchronous multithreading file processing in an embodiment;
FIG. 2 is a flow diagram of a main (write) thread in an embodiment;
FIG. 3 is a flow chart of a sub (read) thread in an embodiment.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
Example 1
To address the problem that an application system cannot generate batch files efficiently and in order when the data volume is large, this embodiment provides an asynchronous multithreading file processing method for generating large-data-volume batch files from a database.
Referring to fig. 1, the method comprises the steps of:
S1, segmenting the data of the large-data-volume file to be written into data pages, storing the data page by page, and obtaining a file set of the files to be written based on a preset single-file size.
Before a large-data-volume file is processed, its data is paged. The main thread first queries the distribution of the ids of the data to be processed in the database, including the maximum id, and segments the id range according to actual requirements to form data pages. The data is then split according to constraints such as the actual business data volume, the required number of files, and the single-file size, and the corresponding files are generated, thereby achieving data paging and multi-file control.
Referring to fig. 2, S1 specifically includes the following steps:
S11, querying the database for the maximum record (maxId) and the minimum record (minId) of the data to be generated, and dividing the data into a plurality of id segments (segmentIds).
S12, generating the set of file names to be generated (fileList) according to the actual data volume and the single-file size limit.
S13, starting an asynchronous thread from a thread pool to read the data and place it in the map page by page.
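Steps S11 and S12 can be sketched as follows. This is a minimal illustration, not the patented implementation: the helper names `split_id_range` and `plan_file_names`, the `page_size` parameter, the file-name pattern, and the use of a record count as a stand-in for the single-file size limit are all assumptions made for the example.

```python
from math import ceil


def split_id_range(min_id: int, max_id: int, page_size: int) -> list[tuple[int, int]]:
    """Split the inclusive id range [min_id, max_id] into consecutive id segments.

    Each segment plays the role of one data page (hypothetical helper).
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    segments = []
    start = min_id
    while start <= max_id:
        end = min(start + page_size - 1, max_id)
        segments.append((start, end))
        start = end + 1
    return segments


def plan_file_names(total_records: int, max_records_per_file: int,
                    prefix: str = "batch") -> list[str]:
    """Derive the list of output file names from the data volume and a per-file limit.

    Assumption: the single-file size limit is expressed as a record count.
    """
    count = max(1, ceil(total_records / max_records_per_file))
    return [f"{prefix}_{i:04d}.dat" for i in range(1, count + 1)]


# Example: ids 1..10 split into pages of 4 ids, at most 6 records per file.
segment_ids = split_id_range(min_id=1, max_id=10, page_size=4)
file_list = plan_file_names(total_records=10, max_records_per_file=6)
```

Note that segmenting by id range yields pages of equal id width, not necessarily equal row counts, if the ids in the database are sparse.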
S2, when the file set is not empty, configuring a preset semaphore to be in a first state, writing the files into target positions one by one in a single-thread mode, configuring the semaphore to be in a second state after writing is completed, and repeatedly executing the steps until the file set is empty.
To improve file-writing performance, the method separates reading from writing with asynchronous threads, so that the read and write threads run independently of each other.
The main thread is responsible for writing data to the file. It fetches each page of data from the Map in turn, writes the data to the data file sequentially, and fetches the next page once writing is complete. If the Map does not yet contain the current page, the main thread blocks until that page becomes available.
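The blocking behaviour described above, where the writer waits until the page it needs appears in the Map, can be sketched with a condition variable. The `PageStore` class and its method names are hypothetical. As a simplification, `take` removes the page as soon as it is fetched, whereas the embodiment deletes the page after it has been written.

```python
import threading


class PageStore:
    """A map of page number -> rows whose take() blocks until the page exists."""

    def __init__(self):
        self._pages = {}
        self._cond = threading.Condition()

    def put(self, page_no, rows):
        # Called by query threads; pages may arrive in any order.
        with self._cond:
            self._pages[page_no] = rows
            self._cond.notify_all()

    def take(self, page_no):
        # Called by the single writer; waits for exactly the page it needs.
        with self._cond:
            self._cond.wait_for(lambda: page_no in self._pages)
            return self._pages.pop(page_no)

    def __len__(self):
        with self._cond:
            return len(self._pages)


store = PageStore()
# Three producer threads deliver pages 2, 0 and 1, i.e. out of order.
producers = [threading.Thread(target=store.put, args=(n, [f"row-{n}"]))
             for n in (2, 0, 1)]
for p in producers:
    p.start()
# The consumer still receives the pages in page order.
ordered = [store.take(n) for n in range(3)]
for p in producers:
    p.join()
```

Keying the store by page number is what lets a single writer restore the original order even though the queries complete in arbitrary order.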
S2 specifically comprises the following steps:
S21, the main thread traverses fileList to obtain the name of the next file to be generated.
S22, the main thread preferentially takes data from the temporary data set TempList; if TempList is empty, the main thread fetches the current page of data from the map, in page order.
S23, judging whether the current page has been obtained; if not, entering the thread-blocked state and executing S22 again; if so, executing S24.
S24, traversing the current page data set and writing the data to the file item by item.
S25, when the written data reaches the upper limit of the file, placing the data in the data set that has not yet been written into the temporary data set TempList, deleting the current page data from the map, releasing the Semaphore, and returning to S21.
S26, after the data set has been processed, deleting the current page data from the map, releasing the Semaphore, setting the next page as the current page, and returning to S22 until all the files to be generated have been generated.
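The write loop of S21 to S26 can be sketched as below. To keep the example self-contained, several things are assumptions rather than details from the source: the function name `write_files`, a record-count limit `max_rows_per_file` in place of a byte-size limit, in-memory lists standing in for the output files, and pages that are already present in the map (so the blocking of S23 is not shown).

```python
import threading


def write_files(file_list, pages, page_count, max_rows_per_file, semaphore):
    """Single-threaded writer: fill each file in turn, carrying leftovers over.

    pages: dict of page number -> list of rows, consumed in page order.
    Returns a dict of file name -> rows written (stand-in for real files).
    """
    output = {}
    temp_list = []   # rows left over when a file fills up mid-page (TempList)
    page_no = 0
    for name in file_list:                          # S21: next file to generate
        rows_out = output.setdefault(name, [])
        while len(rows_out) < max_rows_per_file:
            if temp_list:                           # S22: leftovers come first
                current, temp_list = temp_list, []
                from_map = False
            elif page_no < page_count:
                current = pages[page_no]            # S22/S23: real code would block here
                from_map = True
            else:
                break                               # no data left at all
            room = max_rows_per_file - len(rows_out)
            rows_out.extend(current[:room])         # S24: write row by row
            if current[room:]:
                temp_list = current[room:]          # S25: keep the unwritten rows
            if from_map:
                del pages[page_no]                  # S25/S26: drop the page from the map
                semaphore.release()                 # let the reader query another page
                page_no += 1
    return output


pages = {0: [1, 2, 3], 1: [4, 5, 6]}
semaphore = threading.Semaphore(0)   # starts at 0 so the releases can be counted
result = write_files(["a.dat", "b.dat"], pages, page_count=2,
                     max_rows_per_file=4, semaphore=semaphore)
```

In this run the second page straddles the file boundary: row 4 completes `a.dat`, and rows 5 and 6 travel through `temp_list` into `b.dat`.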
S3, aiming at the reading process, carrying out multi-thread concurrent inquiry on the premise that the semaphore is configured to be in a second state.
A new thread is started from the main thread to act as the read thread, which controls the reading of data. The read thread uses multiple threads to query the pages concurrently and stores the data in the Map, with the page number of each page as the key and its data set as the value, which preserves data order.
S3 specifically comprises the following steps:
S31, the read thread traverses the segmentIds data segments.
S32, acquiring the Semaphore, blocking until it is available.
S33, the thread pool starts a new thread to query the data in a sub-thread.
S34, querying the data set according to the data segment id.
S35, storing the data set in the map, using the sequence number of the data segment id as the key to record the data page.
These steps are repeated until all data segment ids have been queried.
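A minimal sketch of the read loop S31 to S35 follows. The names `read_pages` and `fake_query`, the lock around the map, and the permit count of 2 are assumptions for illustration; `fake_query` stands in for the real database query. The demo deliberately provides fewer permits than pages to show the read thread blocking at S32 until a permit is released, which in the method is done by the write thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def read_pages(segment_ids, query, page_map, map_lock, semaphore, pool):
    """Dispatch one query task per id segment, gated by the semaphore."""
    for page_no, segment in enumerate(segment_ids):       # S31
        semaphore.acquire()                                # S32: blocks when no permit is free

        def load(page_no=page_no, segment=segment):
            rows = query(segment)                          # S34
            with map_lock:
                page_map[page_no] = rows                   # S35: key is the segment's sequence number

        pool.submit(load)                                  # S33


def fake_query(segment):
    # Stand-in for a database query over the inclusive id range.
    lo, hi = segment
    return list(range(lo, hi + 1))


segment_ids = [(1, 4), (5, 8), (9, 10)]
page_map, map_lock = {}, threading.Lock()
semaphore = threading.Semaphore(2)        # only two pages may be in flight

with ThreadPoolExecutor(max_workers=4) as pool:
    reader = threading.Thread(
        target=read_pages,
        args=(segment_ids, fake_query, page_map, map_lock, semaphore, pool),
    )
    reader.start()
    reader.join(timeout=0.2)
    blocked_on_third_page = reader.is_alive()   # still waiting at S32
    semaphore.release()                         # what the write thread would do
    reader.join()
```

Because the permit is taken before the task is submitted, the dispatch loop itself is throttled, so query tasks cannot pile up in the pool's queue.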
To keep the read thread from issuing concurrent queries too quickly, the method uses a preset Semaphore to control the read thread's concurrency. Before querying data, the read thread attempts to acquire the semaphore; if it succeeds, a new thread is started to query the data. If the semaphore cannot be acquired, the read thread blocks until the write thread releases the semaphore and the read thread acquires it. The semaphore thus caps the number of concurrent query threads in the system, avoids the creation of a large number of query tasks in a short time, and achieves concurrency control.
In addition, because the Semaphore controls the number of concurrent query threads, it also effectively limits how many data pages are written into the map concurrently. Moreover, after the main thread finishes writing each page of data, it promptly deletes that page from the Map, which bounds the number of data pages held in the Map, keeps memory usage stable, and achieves memory control.
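The memory bound described here can be checked with a small end-to-end sketch in which a read thread, a pool of query threads, and the main write thread share one semaphore. Everything specific in it is an assumption for illustration: the constant `MAX_PAGES_IN_FLIGHT`, the synthetic `query` function, the `written` list standing in for the output file, and the `stats` dictionary used to record the peak map size.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_PAGES_IN_FLIGHT = 2                       # hypothetical semaphore size
segments = [(i * 3 + 1, i * 3 + 3) for i in range(6)]   # six pages of three ids

cond = threading.Condition()                  # guards page_map and wakes the writer
page_map = {}
semaphore = threading.Semaphore(MAX_PAGES_IN_FLIGHT)
stats = {"peak": 0}                           # largest number of pages seen in the map


def query(segment):
    lo, hi = segment
    return list(range(lo, hi + 1))            # stand-in for a database query


def load(page_no, segment):
    rows = query(segment)
    with cond:
        page_map[page_no] = rows
        stats["peak"] = max(stats["peak"], len(page_map))
        cond.notify_all()


def reader(pool):
    for page_no, segment in enumerate(segments):
        semaphore.acquire()                   # one permit per page queried or cached
        pool.submit(load, page_no, segment)


written = []                                  # stand-in for the output file
with ThreadPoolExecutor(max_workers=4) as pool:
    read_thread = threading.Thread(target=reader, args=(pool,))
    read_thread.start()
    for page_no in range(len(segments)):      # main thread is the single writer
        with cond:
            cond.wait_for(lambda: page_no in page_map)
            rows = page_map[page_no]
        written.extend(rows)                  # write the page in order
        with cond:
            del page_map[page_no]             # delete the page once it is written
        semaphore.release()                   # only now may another page be queried
    read_thread.join()
```

A permit is held from the moment a page is dispatched until the writer has deleted it, so the map can never hold more than `MAX_PAGES_IN_FLIGHT` pages. Because permits are handed out in page order, the page the writer is waiting for is always among those in flight, so the writer cannot starve.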
The method has the following advantages:
(1) High timeliness. The method reads and writes the file asynchronously, with multithreaded reading and single-threaded writing proceeding at the same time. This addresses the long time otherwise spent reading data and allows large-data-volume files to be read and written in a short time.
(2) The read and write threads are separated, and the write thread controls the concurrency of the query threads. A semaphore controls the number of concurrent query threads, and the read thread and the write thread share the same semaphore. When the read thread acquires the semaphore, it starts a new thread to query the current page of data concurrently; when acquisition fails, the read thread blocks. The write thread releases the semaphore promptly after a page of data has been written to the file, after which the read thread promptly queries the next page of data. The semaphore thus carries information between the threads and lets the write thread control the concurrency of the read thread, which effectively bounds the maximum amount of data cached in memory.
(3) The data is ordered. The data file is written sequentially by a single thread, which guarantees the order of the data.
Example 2
The present embodiment provides an electronic device on the basis of embodiment 1, including: one or more processors and a memory, the memory having stored therein one or more programs, the one or more programs comprising instructions for performing the asynchronous multithreading file processing method described above.
Example 3
The present embodiment provides a computer-readable storage medium comprising one or more programs for execution by one or more processors of an electronic device, the one or more programs comprising instructions for performing the asynchronous multithreading file processing method of embodiment 1.
While the invention has been described with reference to certain preferred embodiments, those skilled in the art will readily recognize that various equivalent modifications and substitutions may be made without departing from the scope of the invention. The protection scope of the invention is therefore defined by the claims.

Claims (10)

1. An asynchronous multithreading file processing method, comprising the steps of:
segmenting the data of a large-data-volume file to be written into data pages, storing the data page by page, and obtaining a file set of the files to be written based on a preset single-file size;
while the file set is not empty, configuring a preset semaphore to a first state, writing the files to target locations one by one in single-thread mode, configuring the semaphore to a second state after writing is completed, and repeating these steps until the file set is empty;
for the reading process, performing multithreaded concurrent queries on the condition that the semaphore is configured in the second state.
2. The asynchronous multithreading file processing method of claim 1, wherein the process of segmenting the large-data-volume file into data pages and storing them page by page comprises the step of:
querying the maximum record and the minimum record of the large-data-volume file, segmenting the range between them to form data pages, and storing the data pages page by page.
3. The asynchronous multithreading file processing method of claim 1, wherein the process of writing the files to the target locations one by one comprises:
detecting whether a temporary data set is empty; if it is not empty, obtaining the data page currently being processed from the data in the temporary data set; if it is empty, obtaining the data page currently being processed based on the file set; traversing the current data page and writing its data to the target location item by item; then taking the next data page as the current data page and repeating this step.
4. The asynchronous multithreading file processing method of claim 3, wherein, if during writing the written data reaches the preset single-file size, the data still to be written is placed in the temporary data set and the semaphore is configured to the second state.
5. The asynchronous multithreading file processing method of claim 1, further comprising, after writing is completed, deleting the data page that has been written.
6. The asynchronous multithreading file processing method of claim 1, wherein the process of concurrent querying based on a query request comprises the step of:
when the semaphore is configured in the second state, querying the data of each data page concurrently in a multithreaded manner and storing it in order.
7. The asynchronous multithreading file processing method of claim 6, wherein, when the semaphore is configured in the first state, the thread enters a blocked state until the semaphore is configured in the second state.
8. The asynchronous multithreading file processing method of claim 1, wherein the data pages are stored page by page in a map container.
9. An electronic device, comprising: one or more processors and a memory, the memory having stored therein one or more programs, the one or more programs comprising instructions for performing the asynchronous multithreading file processing method of any of claims 1-8.
10. A computer readable storage medium comprising one or more programs for execution by one or more processors of an electronic device, the one or more programs comprising instructions for performing the asynchronous multithreading file processing method of any of claims 1-8.
CN202311661885.5A 2023-12-05 2023-12-05 Asynchronous multithreading file processing method, device and storage medium Active CN117667316B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311661885.5A CN117667316B (en) 2023-12-05 2023-12-05 Asynchronous multithreading file processing method, device and storage medium

Publications (2)

Publication Number Publication Date
CN117667316A (en) 2024-03-08
CN117667316B (en) 2024-06-18

Family

ID=90084078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311661885.5A Active CN117667316B (en) 2023-12-05 2023-12-05 Asynchronous multithreading file processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN117667316B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118210455A (en) * 2024-05-21 2024-06-18 航天宏图信息技术股份有限公司 Ultra-long field data read-write performance optimization method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799415A (en) * 2012-06-13 2012-11-28 天津大学 File reading and writing parallel processing method combining with semaphore
CN102999378A (en) * 2012-12-03 2013-03-27 中国科学院软件研究所 Read-write lock implement method
CN107045530A (en) * 2017-01-20 2017-08-15 华中科技大学 A kind of method that object storage system is embodied as to local file system
CN109144955A (en) * 2018-07-12 2019-01-04 武汉斗鱼网络科技有限公司 A kind of file reading and electronic equipment
CN112559210A (en) * 2020-12-16 2021-03-26 北京仿真中心 Shared resource read-write mutual exclusion method based on RTX real-time system
KR20210058613A (en) * 2019-11-13 2021-05-24 서강대학교산학협력단 Locking method for parallel i/o of a single file in non-volatiel memeroy file system and computing device implementing the same

Also Published As

Publication number Publication date
CN117667316B (en) 2024-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant