CN103577158B - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN103577158B
CN103577158B
Authority
CN
China
Prior art keywords
read
data
asynchronous
worker thread
sequential
Prior art date
Legal status
Active
Application number
CN201210250129.9A
Other languages
Chinese (zh)
Other versions
CN103577158A
Inventor
庄明强
Current Assignee
Zhejiang Tmall Technology Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201210250129.9A
Publication of CN103577158A
Application granted
Publication of CN103577158B
Legal status: Active


Abstract

This application provides a data processing method and device. The data processing method is used by a worker thread to process double-buffered data, where the worker thread has a separate context, and the worker thread calls a first interface function to monitor the asynchronous read-ahead state of its own double buffer. The application effectively meets the read-ahead needs of a system, particularly of a distributed system, and improves the system's I/O performance.

Description

Data processing method and device
Technical field
The present application relates to the technical field of data processing, and in particular to a data processing method and device.
Background art
As multiprocessors, multi-core chips, and multithreading have become the main means of improving computing performance, the focus of computer system design has shifted from the execution performance of a single thread to the parallel execution of many, even very many, threads, and developing applications that run efficiently on parallel processors whose core counts keep growing has become extremely important. At the same time, software parallelization also implies I/O (input/output) parallelization. In parallel I/O, read-ahead can effectively reduce the number of disk seeks and the I/O wait time of applications, and is one of the important optimizations for improving disk read I/O performance.
Generally, operating systems implement read-ahead for files in kernel space. Mainstream operating systems all follow one simple and effective principle: reads are divided into two broad classes, random reads and sequential reads, and only sequential reads are prefetched. The read-ahead algorithm must recognize the application's access pattern in order to predict which data pages will be accessed. Traditional read-ahead algorithms use pattern matching: they monitor the sequence of read requests the application issues against each file, maintain a certain amount of history, and match it against known access patterns one by one. If the history matches the characteristics of any non-random access pattern, a prediction can be made and data prefetched according to that pattern. Taking the Linux operating system as an example, its read-ahead flow is shown in Fig. 1 and includes: step S10, determine whether the read is sequential; if not, end the read-ahead; if so, go to step S20; step S20, compute the read-ahead size; step S30, perform pipelined read-ahead and return to step S10.
However, the above read-ahead approach suffers from high system overhead and cost and low asynchronous read efficiency; for distributed systems in particular, the asynchronous read efficiency is even lower and the system's I/O performance is poor.
Summary of the invention
This application provides a data processing method and device, to solve the problem that existing read-ahead schemes cannot meet the read-ahead needs of a system, particularly of a distributed system, and cannot effectively improve the system's I/O performance.
To solve the above problem, this application discloses a data processing method in which a worker thread processes double-buffered data, where the worker thread has a separate context, and the worker thread calls a first interface function to monitor the asynchronous read-ahead state of its own double buffer.
To solve the above problem, this application further discloses a data processing device used by a worker thread to process double-buffered data, where the worker thread has a separate context. The device includes a monitoring module configured to make the worker thread call a first interface function to monitor the asynchronous read-ahead state of its own double buffer.
Compared with the prior art, the application has the following advantages:
In the application, each worker thread that processes one or more data blocks to be read sequentially has an independent context and double buffer, and monitors the asynchronous read-ahead state of its own double buffer by calling the first interface function, so no extra monitoring thread is needed. Each worker thread has one or more independent double buffers and can prefetch data concurrently, quickly, and in a timely manner. Each worker thread has an independent context and can, by calling the first interface function, monitor the asynchronous read-ahead state of its own double buffer and determine how the asynchronous prefetch of its own double buffer is progressing, without using an extra monitoring thread; this avoids the unnecessary context switches caused by having one monitoring thread watch many worker threads, as well as the problem that the whole program stops working when the monitoring thread fails. Hence, the application effectively meets the read-ahead needs of a system, particularly of a distributed system, and improves the system's I/O performance.
Brief description of the drawings
Fig. 1 is a schematic flow chart of read-ahead in a Linux operating system kernel in the prior art;
Fig. 2 is a schematic diagram of an asynchronous read-ahead process;
Fig. 3 is a flow chart of the steps of a data processing method according to embodiment two of the present application;
Fig. 4 is a flow chart of the steps of a data processing method according to embodiment three of the present application;
Fig. 5 is a schematic diagram of asynchronous read-ahead by a worker thread in the embodiment shown in Fig. 4;
Fig. 6 is a flow chart of the steps of a data processing method according to embodiment four of the present application;
Fig. 7 is a processing flow chart of the advise function in the embodiment shown in Fig. 6;
Fig. 8 is a processing flow chart of the get_block function in the embodiment shown in Fig. 6;
Fig. 9 is a state transition diagram of a single buffer in a data processing method according to embodiment five of the present application;
Fig. 10 is a state transition diagram of a double buffer in a data processing method according to embodiment five of the present application;
Fig. 11 is a structural block diagram of a data processing device according to embodiment six of the present application.
Detailed description of the embodiments
To make the above objects, features, and advantages of the application clearer and easier to understand, the application is described in further detail below with reference to the accompanying drawings and specific embodiments.
First, to ease the understanding of the data reading scheme of the application, asynchronous read-ahead is briefly introduced.
Asynchronous read-ahead is a common and effective means of data prefetching: it loads the expected data into memory ahead of time via asynchronous I/O, hiding the I/O latency from the application. Fig. 2 is a schematic diagram of an asynchronous read-ahead process. As shown in Fig. 2, asynchronous read-ahead lets the CPU and the disk work at the same time, improving the utilization of the computer system. Without read-ahead, the disk is busy while the CPU waits when data is being loaded, and the CPU is busy while the disk sits idle when data is being processed; this alternation of idling and waiting wastes system resources. With asynchronous read-ahead, the operating system performs I/O in advance in the background, which cuts the CPU's waiting time and lets the CPU and the disk work concurrently, in pipeline fashion.
The data reading scheme of the application is based on the above asynchronous read-ahead and is described in detail below.
Embodiment one
The data processing method of this embodiment is used by a worker thread to process double-buffered data, where the worker thread has a separate context, and the worker thread calls a first interface function to monitor the asynchronous read-ahead state of its own double buffer.
A worker thread processes one or more data blocks to be read sequentially. In this embodiment, each worker thread has one separate context and its own double-buffer space. While processing its own double-buffered data, each worker thread monitors the asynchronous read-ahead state of its own double buffer by calling the first interface function, and can then perform subsequent read-ahead processing according to that state, without any extra monitoring thread watching the asynchronous read-ahead state.
With this embodiment, each worker thread that processes one or more data blocks to be read sequentially has an independent context and double buffer, and monitors the asynchronous read-ahead state of its own double buffer by calling the first interface function, so no extra monitoring thread is needed. Each worker thread has one or more independent double buffers and can prefetch data concurrently, quickly, and in a timely manner. Each worker thread has an independent context and can, by calling the first interface function, monitor the asynchronous read-ahead state of its own double buffer and determine how its own asynchronous prefetch is progressing, without an extra monitoring thread; this avoids the unnecessary context switches caused by having one monitoring thread watch many worker threads, as well as the problem that the whole program stops working when the monitoring thread fails. Hence, this embodiment effectively meets the read-ahead needs of a system, particularly of a distributed system, and improves the system's I/O performance.
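To make embodiment one concrete, here is a minimal sketch under Linux (an illustrative assumption: the patent prescribes no concrete types or names, and the "first interface function" role is played here by Libaio's io_getevents, as the later embodiments describe):

    #include <libaio.h>   /* Linux native AIO: io_setup, io_submit, io_getevents */
    #include <stddef.h>
    #include <time.h>

    /* One half of a double buffer. */
    struct half_buf {
        void  *mem;        /* O_DIRECT-aligned memory, e.g. 1 MB */
        size_t size;
    };

    /* A double buffer: one half is processed while the other prefetches. */
    struct double_buf {
        struct half_buf current;   /* data being processed */
        struct half_buf ahead;     /* data being prefetched */
    };

    /* Each worker thread owns a separate context and its own double buffers. */
    struct worker_thread {
        io_context_t       io_ctx;   /* private Libaio context, never shared */
        struct double_buf *bufs;     /* one per contiguous block range */
        int                nbufs;
    };

    /* "First interface function" role: the worker checks the read-ahead state
     * of ITS OWN double buffers -- no separate monitoring thread is involved. */
    static int check_own_readahead(struct worker_thread *w,
                                   struct io_event *ev, struct timespec *timeout)
    {
        /* Blocks until at least one of this thread's async reads completes. */
        return io_getevents(w->io_ctx, 1, 1, ev, timeout);
    }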
Embodiment two
Referring to Fig. 3, a flow chart of the steps of a data processing method according to embodiment two of the present application is shown.
The data processing method of this embodiment includes the following steps:
Step S102: allocate at least one double buffer in memory for each worker thread.
Each worker thread processes one or more data blocks to be read sequentially, and each worker thread has one separate context.
A worker thread is a thread that processes data. Taking a distributed database system as an example, each worker thread may handle one query request; from the query request it is known which contiguous data blocks need to be read, and the worker thread then reads those data blocks from external storage. If a query request needs to read multiple contiguous data blocks from multiple files, or multiple contiguous data blocks from the same file, the worker thread can allocate one double buffer for each contiguous data block to be read.
A thread pool is usually started at system startup; when a task needs to be processed, a thread is obtained from the pool to serve as a worker thread. Each worker thread can create its separate context the first time it reads a file, or the context can be created when the worker thread itself is created; the worker thread determines the contiguous data blocks to be read according to the needs of the task, and as many double buffers can be set up as there are contiguous data blocks to be read at the same time. After a task is finished, the worker thread is put back into the thread pool; preferably, the memory of its context and double buffers is not released but kept as idle double buffers to be reused when the next task is processed. When the next task needs a double buffer, one is first allocated from the worker thread's idle double buffers, and a new double buffer is created only when there is no idle one, as sketched below. Of course, in practice the number of double buffers may also differ from the number of contiguous data blocks to be read at the same time and can be set appropriately by those skilled in the art; the data processing scheme of this embodiment applies equally.
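Continuing the hypothetical types of the previous sketch, the buffer-reuse policy just described could look as follows (the free-list representation is an assumption; only the take-idle-first-else-create behavior comes from the text):

    #include <stdlib.h>

    struct double_buf_node {
        struct double_buf       buf;   /* type from the previous sketch */
        struct double_buf_node *next;
    };

    /* Take an idle double buffer first; create a new one only if none is idle. */
    static struct double_buf *acquire_double_buf(struct double_buf_node **idle_list)
    {
        if (*idle_list) {
            struct double_buf_node *n = *idle_list;
            *idle_list = n->next;             /* reuse, no allocation */
            return &n->buf;
        }
        struct double_buf_node *n = calloc(1, sizeof(*n));
        return n ? &n->buf : NULL;            /* caller attaches O_DIRECT memory */
    }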
Each worker thread has one separate context, and that context is handled inside the worker thread itself, which effectively avoids context switches between worker threads. Allocating double buffers to each worker thread makes it possible to use the two buffers alternately, prefetching data into one buffer while processing the data of the other, achieving true asynchronous read-ahead.
Step S104: each worker thread asynchronously prefetches the data of its corresponding one or more data blocks to be read sequentially directly from external storage into its own double buffer.
During read-ahead, each worker thread prefetches the data in external storage directly into its own user-space double buffer, avoiding the use of the page cache, so that data is transferred directly between the external storage, such as a disk, and the user buffer.
Preferably, during read-ahead, the worker thread calls a second interface function to instruct the asynchronous prefetch of the data of the one or more data blocks to be read sequentially, where the second interface function carries asynchronous read-ahead information that includes: the information of the data blocks to be read sequentially, and the handle of the database file to which those data blocks belong. According to the asynchronous read-ahead information, the worker thread asynchronously prefetches the data of the corresponding one or more data blocks directly from external storage into its own double buffer.
Preferably, when asynchronously prefetching the data of the corresponding one or more data blocks directly from external storage into its own double buffer according to the asynchronous read-ahead information, the worker thread may determine the number and order of the asynchronous prefetches according to the asynchronous read-ahead information and the size of its double buffer, and then, according to the determined number and order, call a third interface function to asynchronously prefetch the data of the corresponding one or more data blocks directly from external storage into its own double buffer.
Preferably, the asynchronous read-ahead information further includes a read-ahead mode and/or a caching mode, where the read-ahead mode indicates whether the data blocks to be read sequentially are read in forward sequential order or reverse sequential order, and the caching mode indicates whether the prefetched data should be cached.
Preferably, the input parameters of the third interface function include the information of the data blocks to be read sequentially in this prefetch. In addition, the input parameters of the third interface function may also include a read-ahead mode and/or a caching mode, where the read-ahead mode indicates whether this prefetch of the data blocks to be read sequentially is a forward sequential read or a reverse sequential read, and the caching mode indicates whether the data of this prefetch should be cached.
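As a hedged illustration of the parameters just listed (all names are hypothetical, and the ceiling division is an assumed way to derive the number of asynchronous prefetches from the buffer size), the asynchronous read-ahead information could be modeled as:

    #include <stddef.h>
    #include <sys/types.h>

    enum ra_mode  { RA_FORWARD, RA_REVERSE };   /* read-ahead mode */
    enum ra_cache { RA_NO_CACHE, RA_CACHE };    /* caching mode */

    /* One contiguous extent of a data block to be read sequentially. */
    struct block_extent {
        off_t  offset;   /* start address within the file */
        size_t length;   /* length in bytes */
    };

    /* Asynchronous read-ahead information carried by the interface call. */
    struct ra_info {
        int                  fd;        /* handle of the database file */
        struct block_extent *extents;   /* blocks to be read sequentially */
        int                  nextents;
        enum ra_mode         mode;      /* forward or reverse sequential read */
        enum ra_cache        cache;     /* whether prefetched data is cached */
    };

    /* Number of async reads = total bytes, rounded up to buffer granularity. */
    static size_t count_async_reads(const struct ra_info *ri, size_t buf_size)
    {
        size_t total = 0;
        for (int i = 0; i < ri->nextents; i++)
            total += ri->extents[i].length;
        return (total + buf_size - 1) / buf_size;
    }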
Step S106: each worker thread calls the first interface function to monitor the asynchronous read-ahead state of its own double buffer, determines the progress of the asynchronous prefetch from that state, and obtains data from its own double buffer for processing.
The system has each worker thread monitor the asynchronous read-ahead state of its own double buffer through the first interface function; the worker thread can then determine the progress of the asynchronous prefetch of its own data from that state, and decide to read data from its own double buffer for processing.
The processing of the buffered data can be any conventional means of processing, such as modification, storage, or transmission; the application places no restriction on this.
With this embodiment, each worker thread that processes one or more data blocks to be read sequentially has an independent context and double buffer, and each worker thread prefetches data asynchronously and directly from external storage, then obtains data for processing according to the progress of the prefetch. Prefetching directly from external storage avoids the use of the page cache. Each worker thread has one or more independent double buffers and can prefetch data quickly, concurrently, and in a timely manner; moreover, when the amount of data to prefetch is large, the two buffers can be used asynchronously and alternately to read the data in several passes, one buffer's worth at a time. While the worker thread processes the data of one buffer of the double buffer, such as the current buffer, data is read asynchronously and in parallel from external storage into the other buffer, such as the read-ahead buffer, so that the worker thread's data processing and the reading of data from external storage are executed in parallel, effectively preventing the situation where data is repeatedly loaded, evicted, and loaded again, and improving the system's I/O performance. In addition, each worker thread has an independent context, and each worker thread can determine the state of the asynchronous prefetch of its own double buffer from the prefetch progress, without using a monitoring thread; this avoids the unnecessary context switches caused by having one monitoring thread watch many worker threads, as well as the problem that the whole program stops working when the monitoring thread fails. Hence, this embodiment effectively meets the read-ahead needs of a system, particularly of a distributed system, and improves the system's I/O performance.
Embodiment three
Referring to Fig. 4, a flow chart of the steps of a data processing method according to embodiment three of the present application is shown.
The data processing method of this embodiment includes the following steps:
Step S202: allocate a worker thread according to a file access command, determine according to the file access command that a sequential read operation needs to be performed, and determine the data blocks to be read sequentially for the allocated worker thread.
The worker thread for the sequential read operation can be allocated from the thread pool when the system receives the file access command; for example, a worker thread is allocated from the thread pool according to a database query command, and no further worker thread is allocated when the sequential read takes place. According to the query command, the worker thread determines which files need to be accessed, and then whether contiguous data blocks of those files need to be read.
In this step, whether a read operation is a sequential read can be judged in the way the Linux kernel does, by verifying the following two conditions: (1) it is the first read after the file is opened, and it reads the file head; (2) the current read request and the previous read request are contiguous in file position. Once it is determined that sequential read-ahead should be performed, the data blocks to be prefetched are determined for this worker thread.
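A minimal sketch of that two-condition check (field names are assumptions; this mirrors the kernel-style heuristic described above, not code from the patent):

    #include <stdbool.h>
    #include <sys/types.h>

    struct read_state {
        bool  first_read_after_open;  /* no read has happened since open() */
        off_t prev_end;               /* file offset right after the previous read */
    };

    /* Condition (1): first read after open, starting at the file head.
     * Condition (2): this request is contiguous with the previous one. */
    static bool is_sequential(const struct read_state *s, off_t offset)
    {
        if (s->first_read_after_open && offset == 0)
            return true;
        return offset == s->prev_end;
    }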
Of course, this is not limiting; in practice, other ways of determining a sequential read from the file access command apply equally. For example, when a database is accessed, a worker thread is allocated according to the query command on the database file; if the query on the database file is determined to be a range query, it can be identified as a sequential read, and the data blocks to be read sequentially are then determined for this worker thread according to the range of the query. That is, when the database is accessed, a worker thread is allocated first; if the access is a range query, the query operation can be determined to be a sequential read operation, and the data blocks to be read sequentially are then determined for this worker thread.
Step S204: allocate at least one double buffer in memory for the allocated worker thread, where this worker thread has one separate context.
The worker thread both processes data and reads data. According to the file access command it handles, the worker thread determines which contiguous data blocks need to be read from which files, and allocates one double buffer for each contiguous data block to be read; the number of double buffers of a worker thread is not fixed but is determined by the file access command. Moreover, when the next file access command is processed, existing double buffers can be reused, and new double buffers are created for the worker thread only when it does not have enough.
That is, in general, each worker thread may be allocated one buffer in memory; however, when a worker thread needs to process multiple data blocks to be read sequentially, double buffers equal in number to those multiple data blocks are allocated in memory for this worker thread. In other words, the system can first judge whether each worker thread has multiple pending data blocks to be read sequentially; if so, it allocates in memory for this worker thread a number of double buffers identical to the number of data blocks to be read sequentially. When double buffers are allocated, the worker thread's idle double buffers are used first; if there are not enough idle double buffers, new double buffers are created for the worker thread. After the worker thread has finished its task, all its double buffers are added to the worker thread's idle double buffer list.
Preferably, each double buffer of each worker thread has a fixed size. Fixed-size double buffers yield a fixed read-ahead granularity, and a fixed read-ahead granularity eliminates the process of rapidly ramping up the read-ahead window, which is simpler and more efficient. The size of the double buffer can be set appropriately by those skilled in the art according to the actual situation; preferably, each double buffer is 1 MB or 2 MB. With a read-ahead granularity of 1 MB, the utilization of a single disk can in theory reach 60%, and with a read-ahead granularity of 2 MB, the utilization of a single disk can in theory exceed 75%. Of course, fixed-size double buffers are not limiting; buffers of non-fixed size apply equally in actual use.
In addition, the context, such as the io_context, and the double buffers in a worker thread can be reused, and are initialized when the first read-ahead request is processed.
Step S206: using Direct IO, the worker thread asynchronously prefetches the data of the corresponding one or more data blocks to be read sequentially directly from external storage into its own double buffer.
That each worker thread asynchronously prefetches the data of the corresponding one or more data blocks to be read sequentially directly from external storage into its own double buffer means that the worker thread bypasses the operating system, avoids the page cache, and operates the I/O (input/output) directly, i.e., Direct IO. Direct IO transfers file data directly between the disk and the user buffer and is functionally equivalent to raw device I/O.
In this embodiment, the double buffer of each worker thread is set to include a current buffer and a read-ahead buffer. For one double buffer of one worker thread, the worker thread can use its current buffer and read-ahead buffer alternately, asynchronously prefetching the data of the corresponding one or more data blocks to be read sequentially directly from external storage until all the data of those data blocks has been asynchronously prefetched. Preferably, when reading data, the worker thread can use the pipelined reading mode of the double buffer (sketched in code after this paragraph): after the data of the current buffer has been returned, if more data needs to be read, the asynchronous prefetch of the read-ahead buffer's data starts immediately; the data of the current buffer is then processed, and when processing completes, the current buffer and the read-ahead buffer are swapped. After the swap, if the current buffer's data has not finished reading, the thread waits for its asynchronous read to complete; once the current buffer's data has been read, if more data needs to be read, the prefetch of the read-ahead buffer's data starts again immediately, and the data of the current buffer is then processed. This cycle repeats until all the data has been read. As this process shows, as long as there is data to read, the two buffers are read alternately so that one buffer is always reading data asynchronously, and all the data read is needed; no unnecessary data is read.
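The pipelined double-buffer reading mode just described can be sketched as follows (submit_read, wait_ready, process, and swap_halves are assumed wrappers, e.g. around io_submit and io_getevents; the patent prescribes the flow, not these names, and the types come from the earlier sketch):

    #include <stddef.h>

    /* Assumed helpers (thin wrappers the patent does not name): */
    void submit_read(struct half_buf *b, size_t off, size_t len); /* io_submit */
    void wait_ready(struct half_buf *b);                  /* io_getevents */
    void process(struct half_buf *b);                     /* consume the data */
    void swap_halves(struct double_buf *db);              /* exchange the halves */

    /* Process `current` while `ahead` prefetches, swapping until done. */
    static void pipelined_read(struct double_buf *db, size_t total, size_t bufsz)
    {
        size_t fetched = 0;

        submit_read(&db->current, fetched, bufsz);   /* first asynchronous read */
        wait_ready(&db->current);
        fetched += bufsz;

        for (;;) {
            int more = fetched < total;
            if (more)                                /* start the next prefetch */
                submit_read(&db->ahead, fetched, bufsz);
            process(&db->current);                   /* overlaps with the prefetch */
            if (!more)
                break;                               /* all data has been read */
            wait_ready(&db->ahead);                  /* prefetch must complete */
            swap_halves(db);                         /* ahead becomes current */
            fetched += bufsz;
        }
    }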
During the above read-ahead, each worker thread calls the first interface function to independently monitor the asynchronous read-ahead state of its own double buffer. Preferably, when a worker thread needs to process data, it can call the interface function used for asynchronous read-ahead status checking, i.e., the first interface function, to monitor the asynchronous read-ahead state of its own double buffer, then determine the data to be processed according to the asynchronous read-ahead state, and obtain the determined data from its own double buffer for processing.
In this embodiment, the states of a buffer are set to include an idle state, a waiting state, and a ready state. For the current buffer of a double buffer, the states of the current buffer include the idle state, the waiting state, and the ready state: the idle state indicates that the current buffer is currently idle, the waiting state indicates that it is waiting for data to be read asynchronously into the current buffer, and the ready state indicates that the asynchronous read of data into the current buffer has completed. For the read-ahead buffer of a double buffer, the states likewise include the idle state, the waiting state, and the ready state: the idle state indicates that the read-ahead buffer is currently idle, the waiting state indicates that it is waiting for data to be read asynchronously into the read-ahead buffer, and the ready state indicates that the asynchronous read of data into the read-ahead buffer has completed.
Step S208: the worker thread obtains data from its own double buffer and processes it, according to the asynchronous read-ahead state of its own double buffer.
Specifically, this includes: when the worker thread determines that the current state of its current buffer or its read-ahead buffer is the ready state, it obtains data from that current buffer or read-ahead buffer and processes it.
An illustration: a worker thread mainly does the work of processing data, and the data is read from external storage such as a disk. Suppose a worker thread only needs to sequentially read the data of one contiguous data block of one file, the data block is 10 MB, and each buffer is 1 MB. The worker thread first reads data into the current buffer, then begins to process the data of the current buffer while issuing to the operating system an asynchronous read of the following data into the read-ahead buffer. The worker thread keeps processing the data of the current buffer; when processing completes, it calls the first interface function used for asynchronous read-ahead status checking, such as io_getevents, to wait for the read of the read-ahead buffer's data to complete, then swaps the current buffer and the read-ahead buffer, processes the data of the current buffer while issuing an asynchronous prefetch into the read-ahead buffer, and so on until the required data has been processed. It should be noted that the worker thread monitors the state of the double buffer while processing data: only when it needs data and the data is not yet ready does it check the state of the double buffer and wait for the read to complete; it generally does not poll the double-buffer state, and a worker thread is only responsible for checking the double buffers corresponding to the I/O it issued itself. This mode makes asynchronous prefetching faster and more efficient. Polling is of course also feasible, but the read-ahead efficiency would drop.
Below, this embodiment is explained with a concrete example of one worker thread based on Libaio and Direct IO under a Linux environment.
Referring to Fig. 5, a schematic diagram of asynchronous read-ahead by a worker thread based on Libaio and Direct IO under a Linux environment in this embodiment is shown.
Libaio is Linux native AIO, the native asynchronous I/O interface under Linux; it can only be used together with Direct IO. Since Direct IO is used, the operating system's page cache cannot be used, so a user-space cache is indispensable. Libaio provides four APIs (application programming interfaces): io_setup, io_submit, io_getevents, and io_destroy. io_setup builds the handle of an asynchronous I/O context; io_destroy destroys the handle of an asynchronous I/O context; io_submit submits an asynchronous I/O operation to a specified asynchronous I/O context handle, and its input parameter is a struct iocb structure that includes the handle of the file to be read or written, the offset, the size, the buffer, and a pointer of void* type; io_getevents blocks until a specified number of asynchronous I/O completion events arrive, and its input parameters are the longest blocking time, the minimum number of events to obtain, and the maximum number of events to obtain; in this embodiment it is used for asynchronous read-ahead status checking.
Using Direct IO to bypass the system's page cache, the disk is read directly; the memory and file cache management for read-ahead is managed independently by the application, making the memory management of the whole application controllable.
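A self-contained sketch of one asynchronous Direct IO read using these four APIs (a minimal illustration under stated assumptions: the file name, the 1 MB size, and the 4096-byte alignment are typical O_DIRECT choices, not values from the patent):

    #define _GNU_SOURCE           /* for O_DIRECT */
    #include <fcntl.h>
    #include <libaio.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        io_context_t ctx = 0;
        if (io_setup(8, &ctx) < 0)                   /* build the context handle */
            return 1;

        /* O_DIRECT bypasses the page cache; the buffer must be aligned. */
        int fd = open("data.file", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, 4096, 1 << 20))     /* one 1 MB buffer half */
            return 1;

        struct iocb cb, *cbs[1] = { &cb };
        io_prep_pread(&cb, fd, buf, 1 << 20, 0);     /* file, buffer, size, offset */
        if (io_submit(ctx, 1, cbs) != 1)             /* issue the async read */
            return 1;

        /* ... the worker thread could process its other buffer here ... */

        struct io_event ev;                          /* "first interface function": */
        if (io_getevents(ctx, 1, 1, &ev, NULL) == 1) /* wait for one completion */
            printf("read %ld bytes\n", (long)ev.res);

        io_destroy(ctx);
        return 0;
    }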
In this embodiment, each worker thread has an io_context (I/O context, the workspace of Libaio), and each worker thread calls io_getevents to monitor its own io_context. The worker threads are independent of one another, share no data, and do not interfere with one another, and no extra monitoring thread is needed to monitor the asynchronous read state.
On the one hand, when the whole process shares one io_context, a monitoring thread has to distribute the messages it collects from the io_context to multiple different worker threads; this process involves lock contention and context switches and hurts efficiency. In this example, each thread owns an io_context, so no lock protection is needed, no distribution is needed when messages are received, and there are no context switches.
On the other hand, if there is an extra monitoring thread, that monitoring thread has to call the io_getevents function repeatedly to check the asynchronous read state and, when an asynchronous read completes, wake up the corresponding blocked worker thread, which then processes the data that has been read. In this mode, io_getevents is woken up periodically, which adds unnecessary context switches; particularly under high concurrency, frequent context switches hurt the reading efficiency. Moreover, once the monitoring thread fails, the whole program stops working. This example instead uses the io_getevents function to monitor the asynchronous read-ahead state of the worker thread's own double buffer; no extra monitoring thread is needed to monitor the asynchronous read state, so unnecessary context switches are avoided, and the situation where the whole program stops working because the monitoring thread fails is also effectively avoided.
The asynchronous reading scheme of this example is shown in Fig. 5, which shows one worker thread. The worker thread includes multiple double buffers, numbered from 0 to N with N greater than or equal to 1; each double buffer includes a current buffer (Current Buffer) and a read-ahead buffer (Ahead Buffer). When the worker thread needs to read data, it calls the io_submit function to issue an asynchronous read request and then calls the io_getevents function to wait for the return. When the worker thread has no data to receive, it is idle and incurs no unnecessary context switches.
This embodiment solves the problem that the existing read-ahead scheme cannot meet the read-ahead needs of a system, particularly the read-ahead needs of a distributed system, and cannot effectively improve the system's I/O performance; it effectively meets the read-ahead needs of a system, particularly of a distributed system, and improves the system's I/O performance.
Embodiment four
Referring to Fig. 6, a flow chart of the steps of a data processing method according to embodiment four of the present application is shown.
This embodiment explains the data processing method of the application by taking a distributed database system as an example.
A distributed database system has multiple nodes, each node consisting of a disk array made up of several disks; performing parallel I/O across multiple nodes can raise the concurrent data service performance of the whole distributed database. Each node stores thousands of data files, each data file consists of thousands of blocks, and each block is about 64 KB. The access patterns to the distributed data fall broadly into two classes, point queries (get) and range queries (scan): a point query needs to randomly read several blocks, while a range query needs to sequentially read many blocks, possibly even traversing the data files of all the disks of a whole node, and range queries also include reverse range queries. A range query may need to sequentially read different parts of the same data file or to sequentially read multiple data files, and several random reads and several parallel sequential reads may act on the same data file at the same time. To prevent frequent opening and closing of data files, a distributed database node often keeps thousands of data files open at the same time and caches the file handles; each file may carry several sequential read-ahead streams, and having the operating system track the read-ahead stream states of so many files would consume a great deal of memory resources. For this reason, this embodiment provides a data reading method as a feasible solution to this problem.
The on-demand asynchronous double-buffer read-ahead of this embodiment is based on the native asynchronous reads of Libaio and Direct IO. As before, each worker thread has an io_context, and each worker thread calls io_getevents to monitor its own io_context; the worker threads are independent of one another, share no data, and do not interfere with one another, and no extra monitoring thread is needed to monitor the asynchronous read state; when a worker thread needs to read data, it calls the io_submit function to issue an asynchronous read request and then calls the io_getevents function to wait for the return. In addition, this embodiment uses the advise interface function to give explicit sequential read-ahead instructions and uses the get_block interface function to read data in order.
The data processing method of the present embodiment comprises the following steps:
Step S302: set the second interface function, the advise function, and the third interface function, the get_block function.
The advise function is used to instruct a sequential read, i.e., the asynchronous prefetch of the data of one or more data blocks to be read sequentially, giving a certain sequential stream (a data block to be read sequentially) an explicit sequential read-ahead instruction. The advise function must be passed the asynchronous read-ahead information; that is, the parameters passed to the advise function include the data file handle (in this embodiment, the handle of the database file to which the data blocks to be read sequentially belong) and the information of the contiguous block data to be read (the information of the data blocks to be read sequentially), i.e., the array of metadata of all the blocks to be read (each block's start address within the file and its length). This array may contain hundreds or thousands of block metadata entries, and these entries may form multiple groups, each group of blocks being contiguous in the physical file, which is equivalent to specifying multiple sequential read-ahead streams; one buffer can hold the data of more than one block. Preferably, the asynchronous read-ahead information passed in also includes a read-ahead mode and/or a caching mode, where the read-ahead mode indicates whether the data blocks to be read sequentially are read in forward or reverse sequential order, i.e., whether reverse read-ahead is needed, and the caching mode indicates whether the prefetched data should be cached, e.g., whether the block data needs to be added to the block cache (a block data cache implemented by the application). According to the above asynchronous read-ahead information, the worker thread can asynchronously prefetch the data of the corresponding one or more data blocks to be read sequentially directly from external storage into its own double buffer.
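The patent names advise and get_block but not their exact C signatures; as a hedged sketch (reusing the hypothetical block_extent type from the earlier embodiment-two sketch), they could be declared as:

    #include <sys/types.h>

    /* Explicit sequential read-ahead instruction for one sequential stream:
     * fd is the database file handle; metas is the array of block metadata
     * (start address in the file plus length); reverse and use_cache carry
     * the read-ahead mode and the caching mode. */
    int advise(int fd, const struct block_extent *metas, int nmetas,
               int reverse, int use_cache);

    /* Take out the next block(s) in the advised order. Sequential streams
     * are served from the double-buffer pipeline; a random read falls back
     * to one synchronous Direct IO read. Returns bytes copied or an error. */
    ssize_t get_block(int fd, const struct block_extent *metas, int nmetas,
                      void *out);

A caller performing a range query would thus invoke advise once per sequential stream and then call get_block repeatedly in the advised order.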
Because it is clearly known in a distributed data system which kinds of operations are random reads (point queries are defined as random reads) and which are sequential reads (range queries are defined as sequential reads), the advise function is provided so that the caller explicitly calls advise to indicate that sequential read-ahead is needed; the advise function makes judging whether to prefetch sequentially very simple. If a worker thread performs a range query, it calls the advise function to give an explicit read-ahead instruction for each sequential stream it needs to read, and during the asynchronous reads, sequential prefetching is performed according to that instruction. In a distributed database system, for each range query the data files involved, and the start and end position of each block within its data file, can be computed; when the advise function is called for each sequential stream, the information of the block data to be read is passed in, and the worker thread then reads data using the pipelined reading mode of the double buffer. After the data of the current buffer has been returned, if more data needs to be read, the asynchronous prefetch of the ahead buffer's (i.e., the read-ahead buffer's) data starts immediately; the data of the current buffer is then processed, and when processing completes, the current buffer and the ahead buffer are swapped. After the swap, if the current buffer's data has not finished reading, the thread waits for its asynchronous read to complete; once the current buffer's data has been read, if more data needs to be read, the prefetch of the ahead buffer's data starts again immediately, and the data of the current buffer is then processed. This cycle repeats until all the data has been read. As this process shows, as long as there is data to read, the two buffers are read alternately so that one buffer is always reading data asynchronously, and all the data read is needed; no unnecessary data is read.
The worker thread passes the set of blocks that one sequential stream needs to read to the advise function in a single call and gives an explicit read-ahead instruction, and then calls the get_block function to take out the required block data in order. get_block must read block data in order; if reverse read-ahead is needed, the order in which get_block reads the block data is likewise fixed and cannot be out of order or reversed. The upper layer specifies through get_block whether a random read or a sequential read is needed: a random read is performed synchronously, while a sequential read uses the on-demand double-buffer read-ahead of this embodiment, first reading data into the current buffer of the double buffer, returning the data required by the application layer, and issuing an asynchronous read of the immediately following data into the other buffer (the read-ahead buffer). The upper-layer application keeps calling the get_block function to obtain the data it needs; when the data of the current buffer has all been taken away by the upper layer, the read-ahead buffer becomes the current buffer, the old current buffer becomes the new read-ahead buffer, and an asynchronous prefetch is issued again. This process separates data processing from data reading, and prefetches the data to be processed while data is being processed.
The get_block function can return the data of one buffer at a time to the application layer, or the data of one or more blocks at a time; the parameters of get_block include the information of the data blocks of this read, namely the metadata of the one or more blocks read this time. If, for example, the read-ahead mode is reverse read-ahead, a given block's data is generally returned to the application layer only once; for each sequential stream, the block metadata given in the get_block function's parameters must also be reverse and contiguous, and successive calls to the get_block function must likewise pass block metadata that is reverse and contiguous. The application layer decides how to call get_block to obtain data according to its actual data needs; for example, it may call get_block on each sequential stream in turn to obtain the first block of each stream, then obtain the second block of each stream, and so on, until the application layer has read all the block data contained in the advise parameters or terminates the data reading early. The get_block function can cache blocks according to the caching mode specified by the advise function.
In a distributed database system, both point queries (get) and range queries (scan) read in units of blocks, so a get_block function needs to be provided for reading one block of data. A point query needs to randomly read several blocks, while a range query needs to sequentially read many blocks, possibly even traversing the data files of all the disks of a whole node, and range queries also include reverse range queries. To support these functions, the get_block function must be able to decide from its input parameters whether to read blocks randomly, to prefetch sequentially, or to prefetch in reverse sequential order. Because a range query may traverse the data files of all the disks of a whole node, the get_block function also needs to decide from its input parameters whether the blocks it reads should be added to the block cache. That is, after the number and order of the asynchronous prefetches are determined according to the information of the data blocks to be read sequentially and the size of each double buffer in the worker thread corresponding to those data blocks, the third interface function, i.e., the get_block function, is called according to the determined number and order to asynchronously prefetch the data of the corresponding one or more data blocks to be read sequentially directly from external storage into this worker thread's own double buffer.
Step S304: allocate a worker thread according to a query command on a database file, and determine according to the query command that the query on the database file is a range query.
Step S306: determine, according to the range of the range query, the data blocks to be read sequentially for the allocated worker thread, where this worker thread has one separate context.
The separate context that this worker thread has is created when the worker thread is created, or when the worker thread reads data for the first time, by creating one separate io_context for this worker thread using Libaio.
Step S308: the worker thread calls the advise function to process the sequential read instruction.
The processing flow of the advise function is shown in Fig. 7 and includes:
Step S3082: judge whether the data blocks to be read are contiguous; if so, go to step S3084; if not, end the sequential read-ahead flow.
This step checks whether the blocks in the passed-in block set are contiguous and belong to the same file.
Step S3084: allocate this worker thread's private double buffer for the data blocks that need to be read.
Determine whether a corresponding double buffer has already been allocated in the worker thread for this file's sequential stream; if the corresponding double buffer does not exist, allocate one from the worker thread's private idle double buffers, and if the worker thread has no idle double buffer, create a new double buffer to serve as the double buffer of this sequential stream.
Step S3086: save the information of the blocks to be read into the double buffer.
In a distributed database system, a range query needs to sequentially read different parts of the same data file or to sequentially read multiple files; that is, one range query may involve multiple sequential streams (data blocks to be read sequentially). Each worker thread has one or more double buffers, each double buffer holds the state of one read-ahead stream, and the number of double buffers in a worker thread equals the number of concurrent sequential streams involved in the range query this worker thread is processing. After a worker thread completes a range query, the buffers in the thread are reused by the next range query. In a distributed database system, several random reads and several parallel sequential reads may act on the same data file at the same time. When a worker thread calls the get_block function to randomly read one block, it uses an additional thread-private block buffer and synchronously reads the block using Direct IO, while sequential reads use the worker thread's private double buffer or buffers and read in the asynchronous manner; thus even when the same data file is read sequentially and randomly at the same time, the read-ahead of the sequential reads is not affected at all.
The buffer size for sequential reads is configurable and defaults to 1 MB; that is, according to the amount of data each read requires, data is read from the disk at a granularity of 1 MB whenever possible, and if the data actually needed is less than 1 MB, all of the data is read in a single disk I/O. With a reading granularity of 1 MB, the utilization of a single disk can in theory reach 60%; each node of a distributed database system often has a disk array made up of several disks, and keeping each disk at high utilization greatly increases the node's throughput. In addition, the buffer size can also be configured to 2 MB, in which case the utilization of a single disk can in theory exceed 75%.
In this step, information such as the block information to be read and the file handle is stored in the internal state of the double buffer structure.
Step S3088: mark the blocks that are in the block cache.
As described above, the data in the block cache can be reused; if a data block to be prefetched sequentially already exists in the block cache, it needs to be marked. The block set is traversed and an existence flag is set on the blocks present in the block cache, preventing the blocks existing in the block cache from being evicted while the file is being read.
Step S30810: compute the start position and size of each buffer read, and notify the get_block function.
The offset of the first block not in the block cache is taken as the start offset of the first read, and the number of contiguous blocks that need to be read is computed; reads are performed in units of blocks. The start offset and block count of the next prefetch are then computed.
Step S310: the worker thread calls the get_block function to perform the data reading.
The processing flow of the get_block function is shown in Fig. 8 and includes:
Step S3102: judge whether the block is in the block cache; if so, read it from the block cache and return; if not, go to step S3104.
If the block exists in the block cache, read it and return.
Step S3104: judge whether this is a sequential read; if so, go to step S3106; if not, synchronously read this block's data and go to step S3108.
If one block is read randomly, synchronously read one block of data using Direct IO and return, then go to step S3108; otherwise, go to step S3106.
Step S3106: read block data using the double-buffer pipeline.
For a sequential read, obtain the state of the double buffer and read block data according to the double buffer. If a timeout is returned, return the timeout error directly; otherwise, update the internal state of the double buffer, return the blocks read, and go to step S3108.
Step S3108: judge whether the block should be added to the block cache; if not, end this read-ahead flow; if so, go to step S31010.
Step S31010: add the block data to the block cache.
If this block needs to be copied into the block cache and the block is not in the block cache, copy this block into the block cache. Adding data to the block cache achieves the reuse of the data.
It can be seen that the advise and get_block functions implement explicitly instructed sequential data reading simply and effectively.
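A hedged sketch of the get_block dispatch of Fig. 8 (the struct fields and helper functions are assumptions standing in for steps S3102–S31010):

    #include <sys/types.h>

    struct block_req {
        int    fd;            /* file handle */
        off_t  offset;        /* block start address in the file */
        size_t length;        /* block length */
        int    sequential;    /* sequential read vs. random read */
        int    add_to_cache;  /* caching mode given by advise */
        void  *out;           /* destination for the block data */
    };

    /* Assumed helpers, not named by the patent: */
    void   *block_cache_lookup(const struct block_req *r);
    ssize_t copy_out(struct block_req *r, const void *cached);
    ssize_t double_buffer_read(struct block_req *r);   /* pipelined read-ahead */
    ssize_t direct_io_sync_read(struct block_req *r);  /* one synchronous block */
    void    block_cache_insert(const struct block_req *r);

    static ssize_t get_block_sketch(struct block_req *req)
    {
        const void *hit = block_cache_lookup(req);    /* step S3102 */
        if (hit)
            return copy_out(req, hit);

        ssize_t n = req->sequential
            ? double_buffer_read(req)                 /* step S3106 */
            : direct_io_sync_read(req);               /* step S3104 */
        if (n < 0)
            return n;                                 /* e.g. timeout error */

        if (req->add_to_cache)                        /* steps S3108 / S31010 */
            block_cache_insert(req);
        return n;
    }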
According to the needs of a distributed database system, this embodiment implements an on-demand asynchronous double-buffer read-ahead mechanism. This mechanism fully satisfies the read-ahead needs of distributed database nodes and achieves ordered, asynchronous, and parallel I/O, thereby improving I/O performance. In this embodiment, the application knows its data access patterns intimately, so implementing read-ahead in user space is more targeted than implementing it in the kernel and can be tailored to the application's demands. For example, in the distributed database system, point queries are random reads and range queries are sequential reads; this division is simple and direct, only range queries are prefetched, and recognizing sequential reads by pattern matching is eliminated. The read-ahead size does not need to be estimated: the block data required by each range query is determined in advance for the application, all the prefetched data is needed by the program, and since the total amount of data to be read is known, reverse read-ahead only needs to read from back to front, reading one buffer's worth of data at a time. The read-ahead granularity is set to 1 MB or 2 MB; with a read-ahead granularity of 1 MB the utilization of a single disk can in theory reach 60%, with a read-ahead granularity of 2 MB it can in theory exceed 75%, and the fixed read-ahead granularity eliminates the process of rapidly ramping up the read-ahead window, which is simpler and more efficient. Each worker thread has one or more double buffers, each double buffer holds the state of one read-ahead stream, the number of double buffers in a worker thread equals the number of concurrent sequential streams involved in the range query the thread is processing, and after a worker thread completes a range query, the buffers in the thread are reused by the next range query; adopting this approach satisfies the distributed database's need to read some parts of the same data file, or multiple files, sequentially at the same time. Point queries are not prefetched, so they do not touch a thread's read-ahead buffers, which isolates the effect of random reads on sequential reads. Data read from disk is normally added to the distributed database's block cache, but for large query requests, such as traversing all the data files of a whole node, none of the data is written into the block cache, which avoids polluting the block cache. Each double buffer comprises a current buffer and an ahead buffer; while the worker thread processes the data of the current buffer, the ahead buffer has already begun asynchronous read-ahead.
Embodiment five
The present embodiment is further optimized to the data processing method in example IV, increases state machine management work Make the state of the double buffering in thread.
The state transfer of the single buffer in the present embodiment and double buffering is respectively as shown in Figure 9 and Figure 10.
With reference to Fig. 9, it illustrates the state transfer schematic diagram of the single buffer in the data processing method of the present embodiment.
As shown in figure 9, there are three kinds of states single relief area:WAIT state(I.e. waiting state), READY state(Prepare State), and FREE state(I.e. idle condition).Wherein, WAIT state representation waits asynchronous reading data in relief area, reads Data not necessarily effective;READY state represents that the asynchronous reading of data in relief area completes, but data is not necessarily effective; FREE state representation relief area is idle, can read new data.
As the state transition diagram of Fig. 9 shows, after io_submit is called on a buffer in the FREE state, the buffer enters the WAIT state; once the asynchronous read has brought the required data into the buffer, its state is changed to READY; and after the last piece of data in the buffer has been processed, the state is set back to FREE, so that the buffer can continue to be used for reading new data. If the file the current thread is reading is the same as the file being read into a buffer that is expected to be in the READY state, this means that an earlier read timed out or the upper-layer application erred, and the application layer did not notify the buffer to stop reading data; such buffers remain in the WAIT state, and after the asynchronous read completes the invalid data is discarded and the buffer's state is changed to READY. There are two situations in which READY transitions to FREE: 1) the last piece of data has been taken from the buffer; 2) the data in the buffer is invalid.
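Under the same illustrative naming as the earlier sketch, the three single-buffer states and the transitions just described can be written down as follows; this is a sketch consistent with Fig. 9, not code from the patent.

```c
/* The three single-buffer states of Fig. 9; names are illustrative. */
enum buf_state {
    BUF_FREE,   /* idle: the buffer may be reused to read new data       */
    BUF_WAIT,   /* io_submit has been called; waiting for the async read */
    BUF_READY   /* the async read completed (data may still be invalid)  */
};

/* Transitions described above:
 *   FREE  -> WAIT  : io_submit is called on a free buffer
 *   WAIT  -> READY : the asynchronous read completes
 *   READY -> FREE  : the last block is consumed, or the data is invalid */
```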
Since a single buffer has three states, a double buffer has six state combinations: FREE+FREE, READY+READY, WAIT+WAIT, WAIT+READY, WAIT+FREE and READY+FREE. The state transition diagram of the double buffer is shown in Fig. 10.
The normal double-buffer reading flow is then as follows:
Step A: At the start, the double buffer is in state 1 and data is read using the current buffer: the current buffer calls the io_submit function to initiate an asynchronous read request, then calls the io_getevents function to wait for the asynchronous read to return. The current buffer changes to the WAIT state, and the double buffer moves to state 5.
Step B: The asynchronous read completes and returns; the current buffer changes to the READY state, and the double buffer moves to state 6.
Step C: Block data is taken in order from the current buffer, which is in the READY state, and returned to the application layer. When the first block of data is read from the current buffer, if the ahead buffer is idle and there is still data to be read, a read-ahead is initiated: the ahead buffer calls the io_submit function to issue the read-ahead request without waiting for the asynchronous read to return, so the asynchronous read-ahead proceeds in the background. The current buffer is in the READY state and the ahead buffer in the WAIT state, so the double buffer moves to state 4.
Step D: If the read-ahead is slower than the upper-layer processing, all the block data in the current buffer will have been taken away while the data of the ahead buffer is not yet ready: the current buffer changes to the FREE state while the ahead buffer is still in the WAIT state. The original ahead buffer becomes the new current buffer, the old current buffer becomes the new ahead buffer, and the double buffer moves to state 5 to wait.
Step E: If the read-ahead is faster than the upper-layer processing, the data of both buffers is ready and both buffers are in the READY state; the double buffer moves to state 2.
Step F: If the double buffer is in state 2 and all the block data of the current buffer has been taken away, the original ahead buffer becomes the new current buffer, the old current buffer becomes the new ahead buffer, and the double buffer moves to state 6.
Step G: Depending on the read-ahead speed and the amount of data read, the double buffer may switch repeatedly among states 2, 4, 5 and 6 until all the data has been read. At the end only one READY current buffer remains and the ahead buffer is in the FREE state, so the double buffer is in state 6; once the data of the last current buffer has been completely taken away, the double buffer moves to state 1 and waits for the next read to start.
With this embodiment, a state machine manages the states of the double buffer, enabling a worker thread to read data quickly and conveniently according to those states; a condensed code sketch of this flow follows.
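The sketch below condenses steps A through G into one loop, reusing the stream_buf, double_buf and buf_state names from the earlier sketches. io_submit, io_getevents, io_prep_pread and io_queue_init are real Libaio calls named or implied by the text; submit_read, wait_one, consume_blocks and range_query_read are hypothetical helpers introduced only for illustration.

```c
/* Condensed sketch of steps A-G for one worker thread and one double buffer.
 * The Libaio calls are real; the helper names and bookkeeping are assumed. */
#include <libaio.h>

void consume_blocks(struct stream_buf *b);   /* hypothetical: the upper layer
                                                takes blocks out of b->data  */

static void submit_read(io_context_t ctx, int fd,
                        struct stream_buf *b, long long off)
{
    struct iocb cb, *cbs[1] = { &cb };
    io_prep_pread(&cb, fd, b->data, READ_GRANULARITY, off);
    cb.data   = b;                 /* lets the completion find its buffer */
    b->offset = off;
    b->state  = BUF_WAIT;          /* FREE -> WAIT */
    io_submit(ctx, 1, cbs);        /* the kernel copies the iocb here */
}

static void wait_one(io_context_t ctx)
{
    struct io_event ev;
    if (io_getevents(ctx, 1, 1, &ev, NULL) == 1)           /* block for one */
        ((struct stream_buf *)ev.data)->state = BUF_READY; /* WAIT -> READY */
}

static void range_query_read(io_context_t ctx, int fd,
                             struct double_buf *db,
                             long long off, long long end)
{
    db->cur.state = db->ahead.state = BUF_FREE;

    submit_read(ctx, fd, &db->cur, off);   /* step A: state 1 -> state 5 */
    off += READ_GRANULARITY;
    wait_one(ctx);                         /* step B: cur READY, state 6 */

    while (db->cur.state == BUF_READY) {
        if (off < end && db->ahead.state == BUF_FREE) {
            submit_read(ctx, fd, &db->ahead, off);  /* step C: state 4 */
            off += READ_GRANULARITY;
        }

        consume_blocks(&db->cur);          /* upper layer drains cur */
        db->cur.state = BUF_FREE;          /* READY -> FREE */

        if (db->ahead.state == BUF_WAIT)   /* step D: pre-read was slower */
            wait_one(ctx);                 /* (step E: it may already be READY) */

        /* steps D/F: swap roles; the old ahead becomes the new current */
        struct stream_buf tmp = db->cur;
        db->cur   = db->ahead;
        db->ahead = tmp;
    }                                      /* step G: loop until data ends */
}
```

In such a sketch the worker thread would first create its own context once, e.g. `io_context_t ctx = 0; io_queue_init(8, &ctx);`, and open the data file with O_DIRECT so that reads bypass the page cache, matching the DIRECT IO mode described earlier.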
Embodiment six
Referring to Fig. 11, a structural block diagram of a data processing apparatus according to Embodiment six of the present application is shown.
The data processing apparatus of this embodiment is used by a worker thread to process double-buffer data, where the worker thread includes a separate context. The apparatus includes a monitoring module 402 for making the worker thread call a first interface function to monitor the asynchronous pre-read data state of its own double buffer.
The data processing apparatus of this embodiment further includes a read-ahead module 404 for, before the monitoring module 402 makes the worker thread call the first interface function to monitor the asynchronous pre-read data state of its own buffers, making the worker thread call a second interface function to instruct the asynchronous pre-reading of the data of one or more data blocks to be sequentially read, where the second interface function carries asynchronous pre-read information including the information of the data blocks to be sequentially read and the handle of the database file to which those data blocks belong; and for, according to the asynchronous pre-read information, asynchronously pre-reading the data of the corresponding one or more data blocks to be sequentially read directly from external memory into the worker thread's own double buffer.
Preferably, the asynchronous pre-read information further includes a pre-read mode and/or a caching mode, where the pre-read mode indicates whether the data blocks to be sequentially read are read in forward order or in reverse order, and the caching mode indicates whether the pre-read data is to be cached.
Preferably, when asynchronously pre-reading the data of the corresponding one or more data blocks to be sequentially read directly from external memory into the worker thread's own double buffer, the read-ahead module 404 determines the number and order of the asynchronous pre-reads according to the asynchronous pre-read information and the size of the double buffer in the worker thread, and then, according to the determined number and order, calls a third interface function to asynchronously pre-read the data of the corresponding one or more data blocks to be sequentially read directly from external memory into the worker thread's own double buffer.
Preferably, the input parameters of the third interface function include the information of the data blocks to be sequentially read in this pre-read.
Preferably, the data processing apparatus of this embodiment further includes a determining module 406 for, before the read-ahead module makes the worker thread call the second interface function to instruct the asynchronous pre-reading of the data of one or more data blocks to be sequentially read, making the system determine, according to a query command for querying a database file, that the query of the database file is a range query, and determine, according to the range covered by the range query, the data blocks to be sequentially read for the worker thread.
Preferably, the data processing apparatus of this embodiment further includes an allocating module 408 for, after the determining module 406 has determined the data blocks to be sequentially read for the worker thread according to the range covered by the range query, judging whether there are multiple determined data blocks to be sequentially read, and if so, allocating for the worker thread a number of double buffers equal to the number of the data blocks to be sequentially read.
Preferably, each double buffer includes a current buffer and a read-ahead buffer. For each double buffer of the worker thread, when asynchronously pre-reading the data of the corresponding one or more data blocks to be sequentially read directly from external memory into the worker thread's own double buffer, the read-ahead module 404 makes the worker thread use its current buffer and read-ahead buffer alternately to asynchronously pre-read the data of the corresponding one or more data blocks directly from the external memory, until all the data in those data blocks has been asynchronously pre-read.
Preferably, the size of each double buffer is fixed.
Preferably, the size of each double buffer is 1M or 2M.
Preferably, when making the worker thread asynchronously pre-read the data of the corresponding one or more data blocks to be sequentially read directly from external memory into its own double buffer, the read-ahead module 404 makes the worker thread use DIRECT IO to do so.
Preferably, the separate context of the worker thread is created as follows: when the worker thread is created, or when the worker thread reads data for the first time, the data processing apparatus uses Libaio to create a separate context io_context for the worker thread.
Preferably, the data processing apparatus of this embodiment further includes a data processing module 410 for, after the monitoring module 402 makes the worker thread call the first interface function to monitor the asynchronous pre-read data state of its own buffers, making the worker thread determine the data to be processed according to the state of the asynchronously pre-read data, obtain the determined data from its own double buffer, and process it.
Preferably, the states of the current buffer include an idle state, a waiting state and a ready state: the idle state indicates that the current buffer is currently idle, the waiting state indicates waiting for an asynchronous read to deliver data into the current buffer, and the ready state indicates that the asynchronous read of data into the current buffer has completed. Likewise, the states of the read-ahead buffer include an idle state, a waiting state and a ready state: the idle state indicates that the read-ahead buffer is currently idle, the waiting state indicates waiting for an asynchronous read to deliver data into the read-ahead buffer, and the ready state indicates that the asynchronous read of data into the read-ahead buffer has completed.
Preferably, the data processing module 410 makes the worker thread determine that the current state of its current buffer or read-ahead buffer is the ready state, determine that the data of the current buffer or read-ahead buffer in the ready state is the data to be processed, and obtain that data from the worker thread's own current buffer or read-ahead buffer for processing.
Preferably, the first interface function is the io_getevents function of Libaio, and/or the second interface function is the advise function, and/or the third interface function is the get_block function.
The data processing apparatus of this embodiment is used to implement the corresponding data processing methods of the foregoing method embodiments and has the beneficial effects of the corresponding method embodiments, which will not be repeated here. A hedged sketch of the three interface functions named above follows.
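The text names these three interface functions but gives no prototypes, so the declarations below are purely hypothetical assumptions about what such an interface could look like, written in C for consistency with the earlier sketches.

```c
/* Hypothetical sketch only: the patent names advise, get_block and Libaio's
 * io_getevents but defines no prototypes; every signature below is assumed. */
#include <libaio.h>

enum ra_mode { RA_FORWARD, RA_REVERSE };  /* optional pre-read direction */

struct advise_info {              /* the "asynchronous pre-read information" */
    int          db_fd;           /* handle of the database file             */
    long long    first_block;     /* data blocks to be sequentially read     */
    int          nblocks;
    enum ra_mode mode;            /* optional: forward or reverse pre-read   */
    int          use_block_cache; /* optional: cache the pre-read data?      */
};

/* Second interface: instruct the asynchronous pre-read of the blocks. */
int advise(const struct advise_info *info);

/* Third interface: issue one asynchronous read for this round's blocks;
 * its input parameters include the information of the blocks pre-read. */
int get_block(io_context_t ctx, const struct advise_info *info, void *buf);

/* First interface: the worker thread monitors its own pre-read state via
 * Libaio's io_getevents(ctx, min_nr, nr, events, timeout). */
```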
The present application provides a data processing scheme based on asynchronous read-ahead, realizing an on-demand asynchronous double-buffer read-ahead mechanism. When asynchronous reading is implemented with Libaio, no extra monitoring thread is used: each worker thread independently monitors its own asynchronous read state. Each worker thread allocates double buffers according to the number of sequential streams it needs, and interface functions are provided that let the user give explicit read-ahead instructions, achieving pipelined on-demand read-ahead that never reads unneeded data. Reverse read-ahead is supported; multiple random reads and sequential reads of the same file do not interfere with each other; the disk read granularity is increased, improving disk utilization; and an option is provided for choosing whether the read data is added to the block cache. The present application thereby reduces the number of disk seeks and the I/O waiting time of the application program, improves disk read I/O performance, and raises parallel I/O efficiency.
The embodiments of the present application take a distributed database system as an example, but the scheme is not limited to distributed database systems: any system that reads files by blocks in user space, such as a distributed file system, is suited to the scheme of this patent and may implement the data reading scheme of the present application with reference to its embodiments.
Each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the identical or similar parts the embodiments may be referred to one another. Since the apparatus embodiments are basically similar to the method embodiments, their description is relatively simple, and for relevant details reference may be made to the description of the method embodiments.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for realizing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that realizes the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
The data processing method and apparatus provided by the present application have been described above in detail. Specific examples are used herein to set forth the principles and implementations of the present application, and the description of the above embodiments is intended only to help understand the method of the present application and its core idea. Meanwhile, for those of ordinary skill in the art, both the specific implementation and the scope of application will vary according to the idea of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (26)

1. A data processing method, characterized in that the method is used by a worker thread to process double-buffer data, wherein the worker thread includes a separate context;
the worker thread calls a second interface function to instruct the asynchronous pre-reading of the data of one or more data blocks to be sequentially read, wherein the second interface function carries asynchronous pre-read information;
according to the asynchronous pre-read information, the data of the corresponding one or more data blocks to be sequentially read is asynchronously pre-read directly from external memory into the worker thread's own double buffer;
the worker thread calls a first interface function to monitor the asynchronous pre-read data state of its own double buffer.
2. The method according to claim 1, characterized in that, before the step of the worker thread calling the first interface function to monitor the asynchronous pre-read data state of its own buffers, the method further comprises:
the worker thread calling the second interface function to instruct the asynchronous pre-reading of the data of one or more data blocks to be sequentially read, wherein the second interface function carries asynchronous pre-read information, the asynchronous pre-read information including the information of the data blocks to be sequentially read and the handle of the database file to which the data blocks to be sequentially read belong;
according to the asynchronous pre-read information, asynchronously pre-reading the data of the corresponding one or more data blocks to be sequentially read directly from external memory into the worker thread's own double buffer.
3. The method according to claim 2, characterized in that the asynchronous pre-read information further includes a pre-read mode and/or a caching mode, wherein the pre-read mode indicates whether the data blocks to be sequentially read are read in forward order or in reverse order, and the caching mode indicates whether the pre-read data is to be cached.
4. The method according to claim 2, characterized in that the step of asynchronously pre-reading, according to the asynchronous pre-read information, the data of the corresponding one or more data blocks to be sequentially read directly from external memory into the worker thread's own double buffer comprises:
determining the number and order of the asynchronous pre-reads according to the asynchronous pre-read information and the size of the double buffer in the worker thread;
according to the determined number and order of the asynchronous pre-reads, calling a third interface function to asynchronously pre-read the data of the corresponding one or more data blocks to be sequentially read directly from external memory into the worker thread's own double buffer.
5. The method according to claim 4, characterized in that the input parameters of the third interface function include the information of the data blocks to be sequentially read in this pre-read.
6. The method according to claim 2, characterized in that, before the step of the worker thread calling the second interface function to instruct the asynchronous pre-reading of the data of one or more data blocks to be sequentially read, the method further comprises:
the system determining, according to a query command for querying a database file, that the query of the database file is a range query;
determining, according to the range covered by the range query, the data blocks to be sequentially read for the worker thread.
7. The method according to claim 6, characterized in that, after the step of determining the data blocks to be sequentially read for the worker thread according to the range covered by the range query, the method further comprises:
judging whether there are multiple determined data blocks to be sequentially read; if so, allocating for the worker thread a number of double buffers equal to the number of the data blocks to be sequentially read.
8. The method according to claim 7, characterized in that the double buffer includes a current buffer and a read-ahead buffer;
for each double buffer of the worker thread, the step of asynchronously pre-reading the data of the corresponding one or more data blocks to be sequentially read directly from external memory into the worker thread's own double buffer comprises: the worker thread using its current buffer and read-ahead buffer alternately to asynchronously pre-read the data of the corresponding one or more data blocks to be sequentially read directly from the external memory, until all the data in the corresponding one or more data blocks to be sequentially read has been asynchronously pre-read.
9. The method according to any one of claims 1 to 8, characterized in that the size of each double buffer is fixed.
10. The method according to claim 9, characterized in that the size of each double buffer is 1M or 2M.
11. The method according to claim 2, characterized in that the step of asynchronously pre-reading the data of the corresponding one or more data blocks to be sequentially read directly from external memory into the worker thread's own double buffer comprises:
the worker thread using DIRECT IO to asynchronously pre-read the data of the corresponding one or more data blocks to be sequentially read directly from the external memory into its own double buffer.
12. The method according to claim 1, characterized in that the separate context of the worker thread is created as follows:
when the worker thread is created, or when the worker thread reads data for the first time, using Libaio to create a separate context io_context for the worker thread.
13. The method according to claim 8, characterized in that, after the step of the worker thread calling the first interface function to monitor the asynchronous pre-read data state of its own buffers, the method further comprises:
the worker thread determining the data to be processed according to the state of the asynchronously pre-read data, and obtaining the determined data from its own double buffer for processing.
14. The method according to claim 13, characterized in that:
the states of the current buffer include an idle state, a waiting state and a ready state, wherein the idle state of the current buffer indicates that the current buffer is currently idle, the waiting state of the current buffer indicates waiting for an asynchronous read to deliver data into the current buffer, and the ready state of the current buffer indicates that the asynchronous read of data into the current buffer has completed;
the states of the read-ahead buffer include an idle state, a waiting state and a ready state, wherein the idle state of the read-ahead buffer indicates that the read-ahead buffer is currently idle, the waiting state of the read-ahead buffer indicates waiting for an asynchronous read to deliver data into the read-ahead buffer, and the ready state of the read-ahead buffer indicates that the asynchronous read of data into the read-ahead buffer has completed.
15. The method according to claim 14, characterized in that the step of determining the data to be processed according to the state of the asynchronously pre-read data, and obtaining the determined data from the worker thread's own buffers for processing, comprises:
the worker thread determining that the current state of its current buffer or read-ahead buffer is the ready state, determining that the data of the current buffer or read-ahead buffer in the ready state is the data to be processed, and obtaining the data from the worker thread's own current buffer or read-ahead buffer for processing.
16. The method according to claim 4, characterized in that the first interface function is the io_getevents function of Libaio, and/or the second interface function is the advise function, and/or the third interface function is the get_block function.
17. A data processing apparatus, characterized in that the apparatus is used by a worker thread to process double-buffer data, wherein the worker thread includes a separate context;
the apparatus includes a monitoring module for making the worker thread call a second interface function to instruct the asynchronous pre-reading of the data of one or more data blocks to be sequentially read, wherein the second interface function carries asynchronous pre-read information; asynchronously pre-reading, according to the asynchronous pre-read information, the data of the corresponding one or more data blocks to be sequentially read directly from external memory into the worker thread's own double buffer; and making the worker thread call a first interface function to monitor the asynchronous pre-read data state of its own double buffer.
18. The apparatus according to claim 17, characterized by further comprising:
a read-ahead module for, before the monitoring module makes the worker thread call the first interface function to monitor the asynchronous pre-read data state of its own buffers, making the worker thread call the second interface function to instruct the asynchronous pre-reading of the data of one or more data blocks to be sequentially read, wherein the second interface function carries asynchronous pre-read information, the asynchronous pre-read information including the information of the data blocks to be sequentially read and the handle of the database file to which the data blocks to be sequentially read belong; and asynchronously pre-reading, according to the asynchronous pre-read information, the data of the corresponding one or more data blocks to be sequentially read directly from external memory into the worker thread's own double buffer.
19. The apparatus according to claim 18, characterized in that the asynchronous pre-read information further includes a pre-read mode and/or a caching mode, wherein the pre-read mode indicates whether the data blocks to be sequentially read are read in forward order or in reverse order, and the caching mode indicates whether the pre-read data is to be cached.
20. The apparatus according to claim 18, characterized in that:
when asynchronously pre-reading, according to the asynchronous pre-read information, the data of the corresponding one or more data blocks to be sequentially read directly from external memory into the worker thread's own double buffer, the read-ahead module determines the number and order of the asynchronous pre-reads according to the asynchronous pre-read information and the size of the double buffer in the worker thread, and, according to the determined number and order of the asynchronous pre-reads, calls a third interface function to asynchronously pre-read the data of the corresponding one or more data blocks to be sequentially read directly from external memory into the worker thread's own double buffer.
21. The apparatus according to claim 20, characterized in that the input parameters of the third interface function include the information of the data blocks to be sequentially read in this pre-read.
22. The apparatus according to claim 18, characterized by further comprising:
a determining module for, before the read-ahead module makes the worker thread call the second interface function to instruct the asynchronous pre-reading of the data of one or more data blocks to be sequentially read, making the system determine, according to a query command for querying a database file, that the query of the database file is a range query, and determine, according to the range covered by the range query, the data blocks to be sequentially read for the worker thread.
23. The apparatus according to claim 22, characterized by further comprising:
an allocating module for, after the determining module has determined the data blocks to be sequentially read for the worker thread according to the range covered by the range query, judging whether there are multiple determined data blocks to be sequentially read, and if so, allocating for the worker thread a number of double buffers equal to the number of the data blocks to be sequentially read.
24. The apparatus according to claim 23, characterized in that the double buffer includes a current buffer and a read-ahead buffer;
for each double buffer of the worker thread, when asynchronously pre-reading the data of the corresponding one or more data blocks to be sequentially read directly from external memory into the worker thread's own double buffer, the read-ahead module makes the worker thread use its current buffer and read-ahead buffer alternately to asynchronously pre-read the data of the corresponding one or more data blocks to be sequentially read directly from the external memory, until all the data in the corresponding one or more data blocks to be sequentially read has been asynchronously pre-read.
25. The apparatus according to any one of claims 17 to 24, characterized in that the size of each double buffer is fixed.
26. The apparatus according to claim 25, characterized in that the size of each double buffer is 1M or 2M.
CN201210250129.9A 2012-07-18 2012-07-18 Data processing method and device Active CN103577158B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210250129.9A CN103577158B (en) 2012-07-18 2012-07-18 Data processing method and device

Publications (2)

Publication Number Publication Date
CN103577158A CN103577158A (en) 2014-02-12
CN103577158B true CN103577158B (en) 2017-03-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211117

Address after: Room 507, floor 5, building 3, No. 969, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province

Patentee after: ZHEJIANG TMALL TECHNOLOGY Co.,Ltd.

Address before: P.O. Box 847, Fourth Floor, Capital Place, Grand Cayman, Cayman Islands (British Overseas Territory)

Patentee before: ALIBABA GROUP HOLDING Ltd.