CN103577158B - Data processing method and device - Google Patents

- Publication number: CN103577158B
- Authority: CN (China)
- Legal status: Active
Abstract
This application provides a data processing method and device. The data processing method is used by a worker thread to process double-buffered data, where the worker thread has its own separate context; the worker thread calls a first interface function to monitor the asynchronous read-ahead state of its own double buffer. The application can effectively meet the read-ahead needs of a system, in particular a distributed system, and improve the I/O performance of the system.
Description
Technical field
The present application relates to the technical field of data processing, and in particular to a data processing method and device.
Background technology
As multiprocessor, multi-core, and multi-threaded hardware has become the main means of improving computing performance, the focus of computer system design has shifted from the execution performance of a single thread to the parallel execution of many, even very many, threads, and developing applications that run efficiently on processors whose parallelism keeps growing has become extremely important. At the same time, software parallelization also implies I/O (input/output) parallelization. Within I/O parallelization, read-ahead can effectively reduce the number of disk seeks and the I/O waiting time of an application, and is one of the important optimizations for improving disk read I/O performance.
Generally, an operating system implements the read-ahead function for files in kernel space. Mainstream operating systems all follow one simple and effective principle: reads are divided into two broad classes, random reads and sequential reads, and only sequential reads are prefetched. The read-ahead algorithm must recognize the access pattern of the application in order to predict which data pages will be accessed. Traditional read-ahead algorithms use pattern matching: they monitor the sequence of read requests issued against each file, maintain a certain amount of history, and match it against known access patterns one by one. If the history matches the features of any non-random access pattern, a prediction can be made according to that pattern and read-ahead issued. Taking the Linux operating system as an example, its read-ahead flow is shown in Fig. 1 and includes: step S10: determine whether the read is sequential; if not, terminate read-ahead; if so, proceed to step S20; step S20: compute the read-ahead size; step S30: perform pipelined read-ahead, then return to step S10.
However, the above read-ahead approach has high system overhead and cost, and its asynchronous read efficiency is low; for a distributed system in particular, asynchronous read efficiency is even lower and system I/O performance is poor.
Summary of the invention
The present application provides a data processing method and device, to solve the problem that existing read-ahead schemes cannot meet the read-ahead needs of a system, in particular the read-ahead needs of a distributed system, and cannot effectively improve the I/O performance of the system.
To solve the above problem, the present application discloses a data processing method in which a worker thread processes double-buffered data, where the worker thread has its own separate context; the worker thread calls a first interface function to monitor the asynchronous read-ahead state of its own double buffer.
To solve the above problem, the present application also discloses a data processing device used by a worker thread to process double-buffered data, where the worker thread has its own separate context; the device includes a monitoring module configured to make the worker thread call a first interface function to monitor the asynchronous read-ahead state of its own double buffer.
Compared with the prior art, the present application has the following advantages:
In the present application, each worker thread that processes one or more data blocks to be read sequentially has its own context and its own double buffer, and monitors the asynchronous read-ahead state of its own double buffer by calling the first interface function, so no extra monitoring thread is needed. Each worker thread has one or more private double buffers and can therefore prefetch data quickly, concurrently, and in time. Each worker thread has its own context, and each worker thread can determine the situation of its own double buffer's asynchronous read-ahead by calling the first interface function to monitor that state, without using an extra monitoring thread; this avoids the unnecessary context switches caused by having one monitoring thread watch many worker threads, as well as the problem that the entire program cannot work normally when the monitoring thread fails. It can be seen that the present application effectively meets the read-ahead needs of a system, in particular of a distributed system, and improves the I/O performance of the system.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the read-ahead of a Linux operating system kernel in the prior art;
Fig. 2 is a schematic diagram of an asynchronous read-ahead process;
Fig. 3 is a flow chart of the steps of a data processing method according to embodiment two of the present application;
Fig. 4 is a flow chart of the steps of a data processing method according to embodiment three of the present application;
Fig. 5 is a schematic diagram of a worker thread in the embodiment shown in Fig. 4 performing asynchronous read-ahead;
Fig. 6 is a flow chart of the steps of a data processing method according to embodiment four of the present application;
Fig. 7 is a processing flow chart of the advise function in the embodiment shown in Fig. 6;
Fig. 8 is a processing flow chart of the get_block function in the embodiment shown in Fig. 6;
Fig. 9 is a state transition diagram of a single buffer in a data processing method according to embodiment five of the present application;
Fig. 10 is a state transition diagram of a double buffer in a data processing method according to embodiment five of the present application;
Fig. 11 is a structural block diagram of a data processing device according to embodiment six of the present application.
Specific embodiments
To make the above objects, features, and advantages of the present application clearer and easier to understand, the present application is described below in further detail with reference to the accompanying drawings and specific embodiments.
Hereinafter, to ease understanding of the data reading scheme of the present application, asynchronous read-ahead is first briefly introduced.
Asynchronous read-ahead is a common and effective means of data prefetching: it loads the expected data into memory ahead of time through asynchronous I/O, thereby hiding I/O latency from the application. Fig. 2 is a schematic diagram of an asynchronous read-ahead process. As shown in Fig. 2, asynchronous read-ahead lets the CPU and the disk work at the same time, improving the utilization of the computer system. Without read-ahead, the disk is busy while the CPU waits during data loading, and the CPU is busy while the disk is idle during data processing. This alternating idling and waiting is a waste of system resources. With asynchronous read-ahead, the operating system performs I/O in advance in the background, which cuts the CPU's waiting time and lets the CPU work concurrently with the disk in a pipelined fashion.
The data reading scheme of the present application is based on the above asynchronous read-ahead mode and is described in detail below.
Embodiment one
The data processing method of this embodiment is used by a worker thread to process double-buffered data, where the worker thread has its own separate context; the worker thread calls a first interface function to monitor the asynchronous read-ahead state of its own double buffer.
A worker thread processes one or more data blocks to be read sequentially. In this embodiment, each worker thread has a separate context and its own double-buffer space. While processing its own double-buffered data, each worker thread monitors the asynchronous read-ahead state of its own double buffer by calling the first interface function, and can then perform subsequent read-ahead processing according to that state; no extra monitoring thread is needed to monitor the asynchronous read-ahead state.
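Embodiment three below identifies this status check with Libaio's io_getevents; as a minimal sketch under that reading (the helper name is illustrative), the worker thread itself blocks on its own private context until one of its prefetches completes, with no separate monitoring thread involved:

```c
#include <libaio.h>

/* "First interface function" usage sketch: wait on the thread's own
 * io_context for exactly one completed asynchronous read-ahead. */
static int wait_own_readahead(io_context_t own_ctx)
{
    struct io_event ev;
    int n = io_getevents(own_ctx, 1, 1, &ev, NULL);  /* min_nr = nr = 1 */
    return (n == 1 && (long)ev.res >= 0) ? 0 : -1;   /* ev.res: bytes read */
}
```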
With this embodiment, each worker thread that processes one or more data blocks to be read sequentially has its own context and double buffer, and monitors the asynchronous read-ahead state of its own double buffer by calling the first interface function, so no extra monitoring thread is needed. Each worker thread has one or more private double buffers and can prefetch data quickly, concurrently, and in time. Each worker thread has its own context and can determine the situation of its own double buffer's asynchronous read-ahead by calling the first interface function to monitor that state, without using an extra monitoring thread; this avoids the unnecessary context switches caused by having one monitoring thread watch many worker threads, as well as the problem that the entire program cannot work normally when the monitoring thread fails. It can be seen that this embodiment effectively meets the read-ahead needs of a system, in particular of a distributed system, and improves the I/O performance of the system.
Embodiment two
Referring to Fig. 3, a flow chart of the steps of a data processing method according to embodiment two of the present application is shown.
The data processing method of this embodiment includes the following steps:
Step S102: allocate at least one double buffer in memory for each worker thread.
Each worker thread processes one or more data blocks to be read sequentially, and each worker thread has its own separate context.
A worker thread is a thread that processes data. Taking a distributed database system as an example, each worker thread can handle one query request. From the query request it is known which consecutive data blocks need to be read, and the worker thread then reads those data blocks from external storage. If a query request needs to read multiple consecutive data blocks from multiple files, or multiple consecutive data blocks from the same file, the worker thread can allocate one double buffer for each consecutive block range to be read.
A thread pool is usually started at system startup. When a task needs processing, a thread can be taken from the pool to serve as a worker thread. Each worker thread can create its separate context the first time it reads a file, or the context can of course be created when the worker thread itself is created. The worker thread determines the consecutive data blocks to be read according to the needs of the task: if several consecutive block ranges need to be read at the same time, the corresponding number of double buffers can be set up. After a task completes, the worker thread is put back into the pool; preferably, the memory of its context and double buffers is not released but kept as idle double buffers to be reused when the next task is processed. During the next task, if a double buffer is needed, it is first allocated from the worker thread's idle double buffers; if there is no idle double buffer, a new double buffer is created. Of course, in practical applications the number of double buffers may also differ from the number of consecutive block ranges that need to be read at the same time, and can be set appropriately by those skilled in the art; the data processing scheme of this embodiment applies equally.
Each worker thread has a separate context and handles that context internally, which effectively avoids context switches between worker threads. Allocating a double buffer to each worker thread makes it possible to use the two halves alternately, prefetching data into one buffer while processing the data of the other, thus achieving true asynchronous read-ahead.
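As a minimal sketch of the structures this paragraph implies (all field names are illustrative assumptions, not taken from the patent):

```c
#include <libaio.h>
#include <stddef.h>
#include <sys/types.h>

#define BUF_HALF (1 << 20)             /* fixed 1 MB read-ahead granularity */

/* One double buffer: two fixed-size halves used alternately; while
 * half[cur] is being processed, half[1 - cur] is being prefetched. */
struct double_buf {
    void  *half[2];
    int    cur;                        /* index of the current buffer */
    off_t  next_off;                   /* file offset of the next prefetch */
};

/* A worker thread: one private context plus one double buffer per
 * consecutive block range it must read; nothing is shared between threads. */
struct worker_thread {
    io_context_t       ctx;
    struct double_buf *bufs;
    size_t             nbufs;
};
```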
Step S104: each worker thread asynchronously prefetches the data of its corresponding one or more data blocks to be read sequentially directly from external storage into its own double buffer.
During read-ahead, each worker thread prefetches the data in external storage directly into its own user-space double buffer, avoiding use of the page cache, so that data is transferred directly between external storage (the disk) and the user buffer.
Preferably, during read-ahead the worker thread calls a second interface function to instruct the asynchronous read-ahead of the one or more data blocks to be read sequentially. The second interface function carries asynchronous read-ahead information, which includes: the information of the data blocks to be read sequentially and the handle of the database file to which those blocks belong. According to the asynchronous read-ahead information, the worker thread asynchronously prefetches the data of the corresponding one or more blocks directly from external storage into its own double buffer.
Preferably, when asynchronously prefetching the data of the corresponding one or more data blocks to be read sequentially directly from external storage into its own double buffer according to the asynchronous read-ahead information, the worker thread may first determine the number and order of asynchronous read-ahead operations from the read-ahead information and the size of its double buffer, and then, according to the determined number and order, call a third interface function to asynchronously prefetch the data of the corresponding blocks directly from external storage into its own double buffer.
Preferably, the asynchronous read-ahead information further includes a read-ahead mode and/or a caching mode, where the read-ahead mode indicates whether the data blocks to be read sequentially are read in forward or reverse order, and the caching mode indicates whether the prefetched data should be cached.
Preferably, the input parameters of the third interface function include the information of the data blocks to be read sequentially in the current read-ahead. In addition, the input parameters of the third interface function may also include a read-ahead mode and/or a caching mode, where the read-ahead mode indicates whether the current read-ahead reads the blocks in forward or reverse order, and the caching mode indicates whether the data of the current read-ahead should be cached.
Step S106: each worker thread calls the first interface function to monitor the asynchronous read-ahead state of its own double buffer, determines the read-ahead progress according to that state, and fetches data from its own double buffer for processing.
The system has each worker thread monitor the asynchronous read-ahead state of its own double buffer through the first interface function; from that state the thread can determine the progress of its own asynchronous read-ahead and decide to read data from its own double buffer for processing.
The processing of buffer data may be any conventional processing, such as modification, storage, or transmission; the present application places no restriction on this.
With this embodiment, each worker thread that processes one or more data blocks to be read sequentially has its own context and double buffer, prefetches data asynchronously directly from external storage, and then fetches data for processing according to the read-ahead progress. Prefetching directly from external storage avoids using the page cache. Each worker thread has one or more private double buffers and can prefetch data quickly and concurrently in time. Moreover, when the amount of data to prefetch is large, the two halves of a double buffer can be used asynchronously and alternately to read the data in multiple passes, one buffer's worth of data at a time: while the worker thread processes the data of one half of the double buffer, i.e., the current buffer, data is read asynchronously and in parallel from external storage into the other half, i.e., the read-ahead buffer. The worker thread's data processing and the reading of data from external storage thus execute in parallel, which effectively prevents data from being repeatedly loaded, evicted, and reloaded, improving system I/O performance. In addition, each worker thread has its own context and can determine the asynchronous read-ahead situation of its own double buffer from the read-ahead progress without using a monitoring thread, which avoids the unnecessary context switches caused by having one monitoring thread watch many worker threads, as well as the problem that the entire program cannot work normally when the monitoring thread fails. It can be seen that this embodiment effectively meets the read-ahead needs of a system, in particular of a distributed system, and improves the I/O performance of the system.
Embodiment three
Referring to Fig. 4, a flow chart of the steps of a data processing method according to embodiment three of the present application is shown.
The data processing method of this embodiment includes the following steps:
Step S202: allocate a worker thread according to a file access command, determine from the file access command that a sequential read operation needs to be performed, and determine for the allocated worker thread the data blocks to be read sequentially.
A worker thread can be allocated from the thread pool for the sequential read operation when the system receives the file access command. For example, a worker thread is allocated from the thread pool according to a database query command, and no additional worker thread is allocated at the time of the sequential read itself; the worker thread then determines from the query command which files need to be accessed, and then determines whether consecutive data blocks of those files need to be accessed.
In the manner of the Linux kernel, this step can judge whether a read operation is a sequential read by verifying the following two conditions: (1) this is the first read after the file is opened, and the read starts at the file header; (2) the current read request is contiguous in file position with the previous read request. Once it is determined that sequential read-ahead should be performed, the data blocks to be prefetched are determined for this worker thread.
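A minimal sketch of this two-condition check (the per-file tracking fields are illustrative, not from the patent):

```c
#include <stdbool.h>
#include <sys/types.h>

/* Illustrative per-file read tracking state. */
struct read_tracker {
    bool  first_read_done;   /* has the file been read since open? */
    off_t next_offset;       /* end position of the previous read */
};

/* Linux-kernel-style sequential read detection: (1) first read after open
 * starting at the file head, or (2) contiguous with the previous read. */
static bool is_sequential(struct read_tracker *t, off_t offset, size_t len)
{
    bool seq = (!t->first_read_done && offset == 0) ||
               (t->first_read_done && offset == t->next_offset);
    t->first_read_done = true;
    t->next_offset = offset + (off_t)len;
    return seq;
}
```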
Of course, this is not limiting; in actual use, other ways of determining a sequential read from the file access command apply equally. For example, when accessing a database, a worker thread is allocated according to the query command against a database file; if the query against the database file is determined to be a range query, it can be identified as a sequential read, and the data blocks to be read sequentially are then determined for the worker thread according to the range of the query. That is, when accessing the database, a worker thread is allocated first; if the access is a range query, the query operation can be identified as a sequential read operation, and the data blocks to be read sequentially are then determined for the worker thread.
Step S204: allocate at least one double buffer in memory for the allocated worker thread, the worker thread having its own separate context.
A worker thread both processes data and reads data. According to the file access command it is handling, the worker thread determines which consecutive data blocks need to be read from which files and allocates one double buffer for each consecutive block range to be read; the number of double buffers in a worker thread is not fixed but determined by the file access command. Moreover, when the next file access command is processed, existing double buffers can be reused, and new double buffers are created for the worker thread only when it does not have enough.
That is, in the general case each worker thread may be allocated one buffer in memory; but when a worker thread needs to process multiple block ranges to be read sequentially, it is allocated in memory as many double buffers as there are block ranges. In other words, the system can first judge whether a worker thread has multiple pending block ranges to be read sequentially; if so, it allocates in memory for that worker thread a number of double buffers equal to the number of block ranges. When allocating a double buffer, allocation is made first from the worker thread's idle double buffers; if there are not enough idle double buffers, new double buffers are created for the worker thread. After the worker thread finishes a task, all of its double buffers are added to the thread's idle double buffer list.
Preferably, each double buffer of each worker thread has a fixed size. A fixed-size double buffer yields a fixed read-ahead granularity, and a fixed granularity eliminates the process of rapidly expanding the read-ahead window, making the implementation simpler and more efficient. The size of the double buffer can be set appropriately by those skilled in the art according to the actual situation; preferably, each buffer is 1 MB or 2 MB in size. With a 1 MB read-ahead granularity, the utilization of a single disk can theoretically reach 60%; with a 2 MB read-ahead granularity, the utilization of a single disk can theoretically exceed 75%. Of course, the scheme is not limited to fixed-size double buffers; in actual use, buffers of non-fixed size apply equally.
In addition, the context (such as the io_context) and the double buffers in a worker thread can be reused; they are initialized when the first read-ahead request is processed.
Step S206: the worker thread uses Direct IO to asynchronously prefetch the data of the corresponding one or more data blocks to be read sequentially directly from external storage into its own double buffer.
Each worker thread prefetching the data of its corresponding one or more blocks directly from external storage into its own double buffer means that the worker thread bypasses the operating system, avoids the page cache, and performs I/O (input/output) directly, i.e., Direct IO. Direct IO makes file data transfer directly between the disk and the user buffer and is functionally equivalent to raw device I/O.
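A minimal sketch of the Direct IO setup assumed here: on Linux, O_DIRECT requires the buffer address, transfer size, and file offset to be suitably aligned (the 4096-byte alignment below is an assumption about the device's logical block size), which is why each buffer half is allocated with posix_memalign:

```c
#define _GNU_SOURCE                    /* exposes O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define ALIGN    4096                  /* assumed logical block size */
#define BUF_HALF (1 << 20)             /* 1 MB buffer half */

/* Open a data file for Direct IO and allocate one aligned buffer half. */
static int open_direct(const char *path, void **buf)
{
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;
    if (posix_memalign(buf, ALIGN, BUF_HALF) != 0) {
        close(fd);
        return -1;
    }
    return fd;                         /* data now moves disk <-> user buffer */
}
```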
In this embodiment, the double buffer of each worker thread includes a current buffer and a read-ahead buffer. For a given double buffer of a given worker thread, the thread can use its current buffer and read-ahead buffer alternately, asynchronously prefetching the data of the corresponding one or more block ranges directly from external storage until all of the data of those block ranges has been prefetched. Preferably, the worker thread reads data in a pipelined double-buffer fashion: after the data of the current buffer is returned, if more data needs to be read, the asynchronous prefetch of the read-ahead buffer's data starts immediately; the data of the current buffer is then processed, and when processing completes, the current buffer and the read-ahead buffer are swapped. After the swap, if the current buffer's data has not finished reading, the thread waits for its asynchronous read to complete; once the current buffer's data has finished reading, if more data needs to be read, the prefetch of the read-ahead buffer's data starts again immediately, the data of the current buffer is then processed, and so on in a loop until all of the data has been read. As this process shows, as long as there is data to read, the two buffers are read alternately, at any time one buffer is being read asynchronously, and all of the data read is needed; no unnecessary data is read.
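The pipelined reading just described condenses to the following loop sketch, reusing the struct double_buf sketched earlier; submit_read, wait_read, process, and remaining are hypothetical helpers standing in for io_submit, io_getevents, the application's own processing, and the bookkeeping of how much of the block range is still unsubmitted:

```c
/* Hypothetical helper declarations (not from the patent); uses the
 * struct double_buf from the earlier sketch. */
void   submit_read(struct double_buf *db, int half);  /* async read into a half */
void   wait_read(struct double_buf *db, int half);    /* block until it is ready */
void   process(struct double_buf *db, int half);      /* consume a half's data */
size_t remaining(const struct double_buf *db);        /* bytes not yet submitted */

static void pipelined_read(struct double_buf *db)
{
    submit_read(db, db->cur);                 /* fill the current half first */
    wait_read(db, db->cur);
    for (;;) {
        int prefetching = remaining(db) > 0;
        if (prefetching)                      /* more data: start the prefetch */
            submit_read(db, 1 - db->cur);
        process(db, db->cur);                 /* CPU works while the disk reads */
        if (!prefetching)
            break;                            /* nothing in flight: all data read */
        db->cur = 1 - db->cur;                /* swap current and read-ahead */
        wait_read(db, db->cur);               /* waits only if not yet ready */
    }
}
```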
During the above prefetching, each worker thread calls the first interface function to independently monitor the asynchronous read-ahead state of its own double buffer. Preferably, when a worker thread needs to process data, it can call the interface function used for asynchronous read-ahead status checking, i.e., the first interface function, to monitor the asynchronous read-ahead state of its own double buffer; it then determines the data to be processed according to the asynchronous read-ahead state and fetches the determined data from its own double buffer for processing.
In this embodiment, the state of a buffer includes an idle state, a waiting state, and a ready state. For the current buffer of a double buffer, the current buffer's state includes the idle state, the waiting state, and the ready state: the idle state of the current buffer indicates that the current buffer is currently idle, the waiting state of the current buffer indicates waiting for data to be read asynchronously into the current buffer, and the ready state of the current buffer indicates that the asynchronous read of data into the current buffer has completed. For the read-ahead buffer of a double buffer, the read-ahead buffer's state likewise includes the idle state, the waiting state, and the ready state: the idle state of the read-ahead buffer indicates that the read-ahead buffer is currently idle, the waiting state of the read-ahead buffer indicates waiting for data to be read asynchronously into the read-ahead buffer, and the ready state of the read-ahead buffer indicates that the asynchronous read of data into the read-ahead buffer has completed.
Step S208: the worker thread fetches data from its own double buffer and processes it according to the asynchronous read-ahead state of its own double buffer.
Specifically, this includes: when the worker thread determines that the current state of its current buffer or of its read-ahead buffer is the ready state, it fetches data from that current buffer or read-ahead buffer and processes it.
To illustrate: a worker thread mainly does the work of processing data, and the data is read from external storage such as a disk. Suppose a worker thread only needs to sequentially read the data of one consecutive block range of one file, the range is 10 MB in size, and each buffer is 1 MB. The worker thread first reads data into the current buffer, then begins processing the current buffer's data while issuing to the operating system an asynchronous read of the next data into the read-ahead buffer. The worker thread keeps processing the current buffer's data; when processing completes, it calls the first interface function for asynchronous read-ahead status checking (such as io_getevents) to wait for the data read of the read-ahead buffer to complete, then swaps the current buffer and the read-ahead buffer, processes the current buffer's data while issuing an asynchronous prefetch into the read-ahead buffer, and so on until all required data has been processed. Note that the worker thread may monitor the state of its double buffer during data processing: only when it needs data and the data is not ready does it check the state of the double buffer and wait for the read to complete; it does not normally poll the double buffer's state, and each worker thread is only responsible for checking the double buffers of the I/O it issued itself. This makes asynchronous prefetching faster and more efficient. Polling is of course also feasible, but the read-ahead effect would degrade.
Below, a worker thread based on Libaio and Direct IO in a Linux environment is taken as a concrete example to explain this embodiment.
Referring to Fig. 5, a schematic diagram of asynchronous read-ahead performed by a worker thread based on Libaio and Direct IO under Linux in this embodiment is shown.
Libaio is Linux Native AIO, the native asynchronous I/O interface under Linux; it can only be used together with Direct IO. Because Direct IO is used, the operating system's page cache is unavailable, so a user-space cache is indispensable. Libaio provides four APIs (application programming interfaces): io_setup, io_submit, io_getevents, and io_destroy. io_setup builds the handle of an asynchronous IO context; io_destroy destroys the handle of an asynchronous IO context; io_submit submits an asynchronous I/O operation to a specified asynchronous IO context handle, taking as its parameter a struct iocb, which contains the file handle to read or write, the offset, the size, the buffer, and a void* pointer; io_getevents blocks until a specified number of asynchronous IO completion events arrive, taking the longest blocking time, the minimum number of events to obtain, and the maximum number of events to obtain, and in this embodiment it is used for asynchronous read-ahead status checking.
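Tying the four APIs together, a minimal self-contained read might look as follows (the file name is a placeholder and error handling is abbreviated; build with gcc ... -laio):

```c
#define _GNU_SOURCE
#include <libaio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    io_context_t ctx = 0;
    if (io_setup(8, &ctx) < 0)                       /* build the context handle */
        return 1;

    int fd = open("data.file", O_RDONLY | O_DIRECT); /* placeholder file name */
    if (fd < 0)
        return 1;

    void *buf;
    if (posix_memalign(&buf, 4096, 1 << 20) != 0)    /* O_DIRECT needs alignment */
        return 1;

    struct iocb cb, *cbs[1] = { &cb };
    io_prep_pread(&cb, fd, buf, 1 << 20, 0);         /* 1 MB read at offset 0 */
    if (io_submit(ctx, 1, cbs) != 1)                 /* issue the async read */
        return 1;

    /* ... the worker thread may process other data while the disk works ... */

    struct io_event ev;
    if (io_getevents(ctx, 1, 1, &ev, NULL) == 1)     /* the status check used here */
        printf("read %lld bytes\n", (long long)ev.res);

    io_destroy(ctx);                                 /* tear down the context */
    close(fd);
    free(buf);
    return 0;
}
```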
Using Direct IO bypasses the system's page cache and reads the disk directly; the memory and file cache management for read-ahead is managed independently by the application, making the memory management of the whole application controllable.
In this embodiment, each worker thread has an io_context (IO context, the workspace of Libaio), and each worker thread calls io_getevents to monitor its own io_context. Worker threads are independent of each other, share no data, and do not affect each other; no extra monitoring thread is needed to monitor the asynchronous read state.
On the one hand, when a whole process shares one io_context, a monitoring thread must distribute the messages it collects from the io_context to multiple different worker threads; this process involves lock contention and context switching and hurts efficiency. In this example, each thread owns an io_context, so no lock protection is needed, no distribution is needed when a message is received, and there is no context switching.
On the other hand, if there were an extra monitoring thread, it would need to call io_getevents repeatedly to check the asynchronous read state and, when an asynchronous read completed, wake the corresponding blocked worker thread, which would then process the data that had been read. This approach would wake up periodically because of io_getevents, adding unnecessary context switches; particularly under high concurrency, frequent context switching affects read efficiency. Moreover, once the monitoring thread failed, the whole program would be unable to work normally. This example instead uses the io_getevents function to monitor the asynchronous read-ahead state of the worker thread's own double buffer, so no extra monitoring thread is needed to monitor the asynchronous read state; unnecessary context switches are avoided, and so is the situation where a failure of the monitoring thread prevents the whole program from working normally.
The asynchronous reading scheme of this example is shown in Fig. 5, which shows one worker thread. The worker thread includes multiple double buffers, numbered 0 to N with N greater than or equal to 1; each double buffer includes a current buffer (Current Buffer) and a read-ahead buffer (Ahead Buffer). When the worker thread needs to read data, it calls the io_submit function to send an asynchronous read request and then calls the io_getevents function to wait for the return. When the worker thread does not need to receive data, the worker thread is idle, and no unnecessary context switches occur.
This embodiment solves the problem that existing read-ahead schemes cannot meet the read-ahead needs of a system, in particular the read-ahead needs of a distributed system, and cannot effectively improve system I/O performance; it effectively meets the read-ahead needs of a system, especially of a distributed system, and improves the I/O performance of the system.
Embodiment four
Referring to Fig. 6, a flow chart of the steps of a data processing method according to embodiment four of the present application is shown.
This embodiment takes a distributed database system as an example to explain the data processing method of the present application.
A distributed database system has multiple nodes, each node with a disk array made up of several disks; parallel I/O across multiple nodes can raise the concurrent data service performance of the whole distributed database. Each node stores thousands of data files, each data file consists of thousands of blocks, and each block is about 64 KB. Access patterns to the distributed data fall broadly into two classes: simple queries (get) and range queries (scan). A simple query needs to randomly read several blocks of data; a range query needs to sequentially read many blocks of data, possibly even traversing the data files of all disks of a whole node, and range queries also include reverse range queries. A range query may need to sequentially read different parts of the same data file or sequentially read multiple data files, and multiple random reads and multiple parallel sequential reads may act on the same data file at the same time. To prevent frequent opening and closing of data files, a distributed database node often keeps thousands of data files open at once and caches their file handles; each file may carry several sequential read-ahead streams, and if the operating system had to track the read-ahead stream states of so many files, it would consume a great deal of memory. For this reason, this embodiment provides a data reading method that offers a feasible solution to this problem.
The on-demand asynchronous double-buffer read-ahead of this embodiment is based on the native asynchronous reads of Libaio and Direct IO. As before, each worker thread has an io_context and calls io_getevents to monitor its own io_context; worker threads are independent of each other, share no data, and do not affect each other, and no extra monitoring thread is needed to monitor the asynchronous read state. When a worker thread needs to read data, it calls the io_submit function to send an asynchronous read request and then calls the io_getevents function to wait for the return. In addition, this embodiment uses the advise interface function to give an explicit sequential read-ahead instruction, and uses the get_block interface function to read data in order.
The data processing method of the present embodiment comprises the following steps:
Step S302: set up the second interface function, the advise function, and the third interface function, the get_block function.
The advise function is used to indicate a sequential read, namely the asynchronous read-ahead of one or more data blocks to be read sequentially: it gives a given sequential stream (a block range to be read sequentially) an explicit sequential read-ahead instruction. The advise function takes the asynchronous read-ahead information as input; that is, its input parameters include the data file handle (in this embodiment, the handle of the database file to which the blocks to be read sequentially belong) and the information of the consecutive blocks that need to be read (the information of the data blocks to be read sequentially), for example an array of metadata for all blocks to be read (each block's start address in the file and its length). This may contain hundreds or thousands of block metadata entries, which may form multiple groups; the blocks within each group are contiguous in the physical file, which is equivalent to specifying multiple sequential read-ahead streams, and one buffer can hold the data of more than one block. Preferably, the asynchronous read-ahead information passed in also includes a read-ahead mode and/or a caching mode, where the read-ahead mode indicates whether the blocks to be read sequentially are read in forward or reverse order, i.e., whether reverse read-ahead is needed, and the caching mode indicates whether the prefetched data should be cached, e.g., whether the block data should be added to the block cache (a block data cache implemented by the application). According to the above asynchronous read-ahead information, the worker thread can asynchronously prefetch the data of the corresponding one or more blocks to be read sequentially directly from external storage into its own double buffer.
Because in a distributed data system it is clearly known which operations are random reads (a simple query is defined as a random read) and which operations are sequential reads (a range query is defined as a sequential read), the advise function is provided so that the caller can explicitly call advise to indicate that sequential read-ahead is needed; the advise function makes judging whether to perform sequential read-ahead very simple. If a worker thread performs a range query, it calls advise for each sequential stream that needs to be read to give an explicit read-ahead instruction, and during asynchronous reading, sequential read-ahead is performed according to that instruction. In a distributed database system, each range query can compute the data files involved and each block's start and stop positions within its data file; when advise is called for each sequential stream, the information of the blocks that need to be read is passed in, and the worker thread then reads data in the pipelined double-buffer fashion. After the data of the current buffer is returned, if more data needs to be read, the asynchronous prefetch of the ahead buffer (the read-ahead buffer) starts immediately; the data of the current buffer is then processed, and when processing completes, the current buffer and the ahead buffer are swapped. After the swap, if the current buffer's data has not finished reading, the thread waits for its asynchronous read to complete; once the current buffer's data has finished reading, if more data needs to be read, the prefetch of the ahead buffer's data starts again immediately, the data of the current buffer is then processed, and so on in a loop until all of the data has been read. As this process shows, as long as there is data to read, the two buffers are read alternately, at any time one buffer is being read asynchronously, and all of the data read is needed; no unnecessary data is read.
The worker thread passes the set of blocks that one sequential stream needs to read to the advise function in a single call and gives an explicit read-ahead instruction, then calls the get_block function to take out the required block data in order. get_block must read block data in order; if reverse read-ahead is needed, the order in which get_block reads the block data is necessarily reverse as well and cannot be out of order. The upper layer specifies through get_block whether a random read or a sequential read is needed: a random read is performed synchronously, while a sequential read uses the on-demand double-buffer read-ahead of this embodiment. Data is first read into the current buffer of the double buffer, the data needed by the application layer is returned, and an asynchronous read of the immediately following data into the other buffer (the read-ahead buffer) is issued. The upper-layer application keeps calling get_block to obtain the data it needs; when all of the data in the current buffer has been taken by the upper layer, the read-ahead buffer becomes the current buffer, the old current buffer becomes the new read-ahead buffer, and another asynchronous prefetch is issued. This process separates data processing from data reading, prefetching the data to be processed while data is being processed.
The get_block function can return one buffer's worth of data to the application layer per call, or the data of one or more blocks per call; the parameters of get_block include the information of the blocks of the current read, namely the metadata of the one or more blocks being read. For example, when the read-ahead mode is reverse read-ahead, a block's data is generally returned to the application layer only once; for each sequential stream, the block metadata given in the get_block parameters must also be reverse and contiguous, and the block metadata parameters given across repeated get_block calls must likewise be guaranteed reverse and contiguous. The application layer decides how to call get_block to obtain data according to its actual data needs; for example, it may call get_block on each sequential stream in turn to obtain the first block of each stream, then obtain the second block of each sequential stream, and so on, until the application layer has read all of the block data included in the advise parameters or terminates the data read early. The get_block function can cache blocks according to the caching mode specified by the advise function.
In a distributed database system, both simple queries (get) and range queries (scan) read in units of blocks, so a get_block function for reading one block of data must be provided. A simple query needs to randomly read several blocks of data; a range query needs to sequentially read many blocks of data, possibly even traversing the data files of all disks of a whole node, and range queries also include reverse range queries. To support these functions, the get_block function must be able to decide from its input parameters whether to read block data randomly, to prefetch sequentially, or to prefetch in reverse order; and because a range query may traverse the data files of all disks of a whole node, the get_block function must also decide from its input parameters whether the block data it reads should be added to the block cache. That is, after the number and order of asynchronous read-ahead operations have been determined according to the information of the data blocks to be read sequentially and the size of each double buffer in the corresponding worker thread, the third interface function, i.e., the get_block function, is called according to the determined number and order to asynchronously prefetch the data of the corresponding one or more blocks directly from external storage into the worker thread's own double buffer.
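The patent specifies the behavior of advise and get_block but not their exact signatures; the following C declarations are one hypothetical rendering of the parameters described above, with every name and type an assumption made for illustration:

```c
#include <stddef.h>
#include <sys/types.h>

/* Metadata of one block: its start address in the file and its length. */
typedef struct {
    off_t  offset;
    size_t length;
} block_meta_t;

typedef enum { READ_FORWARD, READ_REVERSE } readahead_mode_t;  /* read-ahead mode */
typedef enum { CACHE_BLOCKS, BYPASS_CACHE } cache_mode_t;      /* caching mode */

/* Second interface function: announce one sequential stream (the whole set of
 * blocks to be read) and give the explicit sequential read-ahead instruction. */
int advise(int file_fd, const block_meta_t *blocks, size_t nblocks,
           readahead_mode_t mode, cache_mode_t cache);

/* Third interface function: take out the next block(s) of the stream, strictly
 * in (forward or reverse) order; sequential reads use the double buffer,
 * random reads fall back to a synchronous Direct IO read. */
int get_block(int file_fd, const block_meta_t *block, void **data,
              int sequential, cache_mode_t cache);
```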
Step S304: allocate a worker thread according to a query command against a database file, and determine from the query command that the query against the database file is a range query.
Step S306: determine for the allocated worker thread, according to the range of the range query, the data blocks to be read sequentially; the worker thread has its own separate context.
The separate context of the worker thread is created when the worker thread is created or when the worker thread reads data for the first time; at that point a separate io_context is created for the worker thread using Libaio.
Step S308: the worker thread calls the advise function to process the sequential read instruction.
The processing flow of the advise function is shown in Fig. 7 and includes:
Step S3082: judge whether the blocks to be read are contiguous; if so, proceed to step S3084; if not, terminate the sequential read-ahead flow.
This step checks whether the blocks in the passed-in block set are contiguous and belong to the same file.
Step S3084: allocate one of the worker thread's private double buffers for the blocks that need to be read.
Determine whether a corresponding double buffer has already been allocated in the worker thread for this file's sequential stream; if the corresponding double buffer does not exist, allocate one from the worker thread's private idle double buffers, and if the worker thread has no idle double buffer, create a new double buffer to serve as the double buffer of this sequential stream.
Step S3086: save the information of the blocks to be read into the double buffer.
In a distributed database system, a range query needs to sequentially read different parts of the same data file or sequentially read multiple files; that is, one range query may involve multiple sequential streams (block ranges to be read sequentially). Each worker thread has one or more double buffers, each double buffer holds the state of one read-ahead stream, and the number of double buffers in a worker thread equals the number of concurrent sequential streams involved in the range query the worker thread is processing. After a worker thread completes a range query, the buffers in the thread are reused by the next range query. In a distributed database system, multiple random reads and multiple parallel sequential reads may act on the same data file at the same time. When a worker thread calls get_block to randomly read one block of data, it uses an additional thread-private block buffer and synchronously reads the block using Direct IO, whereas sequential reads use the worker thread's private double buffer(s) and read asynchronously; thus even if the same data file is being read sequentially and randomly at the same time, the read-ahead of the sequential read is not affected at all.
The buffer size for sequential reads is configurable and defaults to 1 MB; that is, according to the amount of data each read requires, data is read from the disk at a granularity of 1 MB whenever possible, and if the data actually needed is less than 1 MB, all of the data is read in a single disk I/O. With a 1 MB read granularity, the utilization of a single disk can theoretically reach 60%; since each node of a distributed database system usually has a disk array made up of several disks, keeping every disk at high utilization greatly increases the node's throughput. The buffer size can also be configured to 2 MB, in which case the utilization of a single disk can theoretically exceed 75%.
In this step, information such as the block information to be read and the file handle is stored in the internal state of the double buffer structure.
Step S3088: mark the blocks that are in the block cache.
As stated above, the data in the block cache can be reused. If a block of this sequential read-ahead already exists in the block cache, it needs to be marked: the block set is traversed, and each block present in the block cache is given a presence flag, which prevents the blocks present in the block cache from being evicted while the file is being read.
Step S30810: compute the start position and size of each buffer read and notify the get_block function.
The offset address of the first block that is not in the block cache is used as the start offset of the first read, and the number of contiguous blocks that need to be read is computed; reads are performed in units of blocks. The start offset and block count of the next read-ahead are computed in the same way.
Step S310: the worker thread calls the get_block function to process the data read.
The processing flow of the get_block function is shown in Fig. 8 and includes:
Step S3102: judge whether the block is in the block cache; if so, read it from the block cache and return; if not, proceed to step S3104.
That is, if the block exists in the block cache, it is read and returned.
Step S3104: judge whether this is a sequential read; if so, proceed to step S3106; if not, read the block's data synchronously and proceed to step S3108.
That is, for a random read of one block, the block data is read synchronously using Direct IO and returned, and the flow proceeds to step S3108; otherwise, the flow proceeds to step S3106.
Step S3106: read the block data using the double-buffer pipeline.
For a sequential read, obtain the state of the double buffer and read the block data from the double buffer. If a timeout is returned, return the timeout error directly; otherwise, update the internal state of the double buffer, return the block that was read, and proceed to step S3108.
Step S3108: judge whether the block should be added to the block cache; if not, end this read-ahead flow; if so, proceed to step S31010.
Step S31010: add the block data to the block cache.
If this block needs to be copied into the block cache and the block is not already in the block cache, the block is copied into the block cache. Adding the data to the block cache achieves reuse of the data.
It can be seen that the advise and get_block functions simply and effectively achieve the instructed sequential reading of data.
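As a sketch of the get_block dispatch of Fig. 8, reusing the hypothetical block_meta_t and struct double_buf declared earlier; block_cache_get/put, sync_direct_read, and pipeline_read are illustrative stand-ins for the block cache lookup, the synchronous Direct IO read, and the double-buffer pipeline read described above:

```c
/* Hypothetical helpers mirroring the steps of Fig. 8 (not from the patent). */
void *block_cache_get(const block_meta_t *b);                      /* S3102 lookup  */
void  block_cache_put(const block_meta_t *b, void *data);          /* S31010 insert */
void *sync_direct_read(int fd, const block_meta_t *b);             /* random read   */
void *pipeline_read(struct double_buf *db, const block_meta_t *b); /* S3106         */

void *get_block_sketch(int fd, struct double_buf *db, const block_meta_t *b,
                       int sequential, int cacheable)
{
    void *data = block_cache_get(b);        /* S3102: cache hit returns at once */
    if (data != NULL)
        return data;

    data = sequential
         ? pipeline_read(db, b)             /* S3106: double-buffer pipeline */
         : sync_direct_read(fd, b);         /* S3104: synchronous single block */

    if (data != NULL && cacheable)          /* S3108/S31010: optional caching */
        block_cache_put(b, data);
    return data;
}
```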
According to the needs of a distributed database system, this embodiment implements an on-demand asynchronous double-buffer read-ahead mechanism. This mechanism fully satisfies the read-ahead needs of a distributed database node and achieves ordered, asynchronous, and parallel I/O, thereby improving I/O performance. In this embodiment, the application knows its own data access patterns intimately, so implementing read-ahead in user space is more targeted than implementing it in the kernel and can be tailored to the application's needs. For example, in a distributed database system, simple queries are random reads and range queries are sequential reads; this division is simple and direct, only range queries are prefetched, and the pattern matching used to recognize sequential reads is eliminated. The read-ahead size does not need to be estimated: the block data required by each range query is predetermined as far as the application is concerned, so all of the prefetched data is needed by the program; and because the total amount of data to be read is known, reverse read-ahead only needs to read from back to front, one buffer's worth of data at a time. The read-ahead granularity is set to 1 MB or 2 MB; with a 1 MB read-ahead granularity the utilization of a single disk can theoretically reach 60%, with a 2 MB read-ahead granularity it can theoretically exceed 75%, and a fixed read-ahead granularity eliminates the process of rapidly expanding the read-ahead window, making the implementation simpler and more efficient. Each worker thread has one or more double buffers, each double buffer holds the state of one read-ahead stream, the number of double buffers in a worker thread equals the number of concurrent sequential streams involved in the range query the thread is processing, and after a worker thread completes a range query its buffers are reused by the next range query; in this way the distributed database's need to read some parts of the same data file, or multiple files, sequentially and simultaneously is met. Simple queries are not prefetched and therefore do not touch the threads' read-ahead buffers, which isolates sequential reads from the impact of random reads. Data read from disk is normally added to the distributed database's block cache, but for large query requests, such as traversing all data files of a whole node, the data is not written into the block cache, avoiding pollution of the block cache. Each double buffer comprises a current buffer and an ahead buffer; while the worker thread processes the data of the current buffer, the ahead buffer has already begun its asynchronous prefetch.
Embodiment five
This embodiment further optimizes the data processing method of embodiment four by adding a state machine to manage the states of the double buffers in a worker thread.
The state transitions of a single buffer and of a double buffer in this embodiment are shown in Fig. 9 and Fig. 10, respectively.
Referring to Fig. 9, a state transition diagram of a single buffer in the data processing method of this embodiment is shown.
As shown in Fig. 9, a single buffer has three states: the WAIT state (the waiting state), the READY state (the ready state), and the FREE state (the idle state). The WAIT state indicates that the buffer is waiting for data to be read asynchronously, and the data read is not necessarily valid; the READY state indicates that the asynchronous read of the buffer's data has completed, but the data is not necessarily valid; the FREE state indicates that the buffer is idle and can be used to read new data.
As the state transition diagram of Figure 9 shows, after io_submit is called on a buffer in the FREE state, the buffer enters the WAIT state; once the asynchronous read has brought the required data into the buffer, its state is changed to READY; and after the last piece of data in the buffer has been consumed, the state is set back to FREE, so the buffer can continue to be used for reading new data. If the file the current thread is reading is not the same as the file a READY buffer was filled from, which means that an earlier read timed out or that the upper-layer application failed without notifying the buffer to stop reading data, the buffer states remain WAIT; when the asynchronous read then completes, the invalid data is discarded and the buffer's state is changed to READY. A buffer moves from READY to FREE in two possible situations: 1) the last piece of data in the buffer has been consumed; 2) the data in the buffer is invalid.
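These rules can be captured in a single transition helper; the following sketch uses assumed names and simply ignores illegal events:

    enum buf_state { BUF_FREE, BUF_WAIT, BUF_READY };
    enum buf_event { EV_SUBMITTED, EV_COMPLETED, EV_DRAINED, EV_INVALID };

    /* Single-buffer transitions described above: FREE -(io_submit)-> WAIT
       -(read completes)-> READY -(last piece consumed, or data invalid)-> FREE. */
    static enum buf_state next_state(enum buf_state s, enum buf_event e)
    {
        if (s == BUF_FREE  && e == EV_SUBMITTED) return BUF_WAIT;
        if (s == BUF_WAIT  && e == EV_COMPLETED) return BUF_READY;
        if (s == BUF_READY && (e == EV_DRAINED || e == EV_INVALID)) return BUF_FREE;
        return s;
    }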
Since a single buffer has three states, a double buffer has six combinations of states: FREE+FREE, READY+READY, WAIT+WAIT, WAIT+READY, WAIT+FREE and READY+FREE. The state transition diagram of the double buffer is shown in Figure 10.
The normal double-buffer read flow is then as follows (a concrete code sketch follows step G):
Step A: At the start, the double buffer is in state 1. Data is read through the current buffer: the current buffer calls the io_submit function to issue an asynchronous read request and then calls the io_getevents function to wait for the asynchronous read to return. The current buffer changes to the WAIT state, and the double buffer moves to state 5.
Step B: The asynchronous read completes and returns; the current buffer changes to the READY state, and the double buffer moves to state 6.
Step C: Block data is taken out in order from the READY current buffer and returned to the application layer. When the first block of data is read from the current buffer, if the ahead buffer is idle and there is still data to read, a pre-read is initiated: the ahead buffer calls the io_submit function to issue the pre-read request without waiting for the asynchronous read to return, and the asynchronous pre-read proceeds in the background. The current buffer is now READY and the ahead buffer WAIT, so the double buffer moves to state 4.
Step D: If the pre-read is slower than the upper-layer processing, the block data in the current buffer is all consumed while the data of the ahead buffer is not yet ready: the current buffer changes to the FREE state and the ahead buffer is still WAIT. The original ahead buffer becomes the new current buffer, the old current buffer becomes the new ahead buffer, and the double buffer moves to state 5 to wait.
Step E: If the pre-read is faster than the upper-layer processing, the data of both buffers is ready and both buffers are READY; the double buffer moves to state 2.
Step F: If the double buffer is in state 2 and all the block data of the current buffer has been taken away, the original ahead buffer becomes the new current buffer and the old current buffer becomes the new ahead buffer; the double buffer moves to state 6.
Step G: Depending on the pre-read speed and the amount of data read, the double buffer may switch repeatedly among states 2, 4, 5 and 6 until all the data has been read. At the end only one READY current buffer remains and the ahead buffer is in the FREE state, i.e. the double buffer is in state 6. Once the data of the last current buffer has also been completely removed, the double buffer returns to state 1 and waits for the next read to start.
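To make the flow concrete, the following is a minimal sketch of the alternating read loop built on Libaio. It assumes a zeroed per-thread io_context and a total size that is a multiple of the 2 MB granularity, and omits all error handling; every identifier is invented for illustration, and a real implementation would also track the WAIT/READY/FREE states and match completion events to their iocbs:

    #include <libaio.h>
    #include <stddef.h>

    #define BUF_SIZE (2 * 1024 * 1024)   /* fixed pre-read granularity */

    struct half { void *data; struct iocb cb; };

    /* FREE -> WAIT: issue an asynchronous read for one half (steps A and C). */
    static void submit_read(io_context_t ctx, struct half *h, int fd, long long off)
    {
        struct iocb *cbs[1] = { &h->cb };
        io_prep_pread(&h->cb, fd, h->data, BUF_SIZE, off);
        io_submit(ctx, 1, cbs);
    }

    /* WAIT -> READY: block until the one outstanding read completes (step B). */
    static void wait_read(io_context_t ctx)
    {
        struct io_event ev;
        io_getevents(ctx, 1, 1, &ev, NULL);
    }

    /* Placeholder for handing a buffer of blocks to the upper layer. */
    static void consume(const void *data, size_t n) { (void)data; (void)n; }

    /* Drain `total` bytes (a multiple of BUF_SIZE) by alternating the halves. */
    static void sequential_read(io_context_t ctx, struct half h[2], int fd,
                                long long total)
    {
        long long submitted = 0, drained = 0;
        int cur = 0;
        submit_read(ctx, &h[cur], fd, submitted);      /* step A */
        submitted += BUF_SIZE;
        while (drained < total) {
            wait_read(ctx);                            /* step B: current is READY  */
            if (submitted < total) {                   /* step C: start ahead read  */
                submit_read(ctx, &h[1 - cur], fd, submitted);
                submitted += BUF_SIZE;
            }
            consume(h[cur].data, BUF_SIZE);            /* overlaps the ahead read   */
            drained += BUF_SIZE;
            cur = 1 - cur;                             /* steps D/F: swap the halves */
        }
    }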
In this embodiment, a state machine manages the states of the double buffer, enabling a worker thread to read data quickly and conveniently according to the state of the double buffer.
Embodiment six
Referring to Figure 11, it shows a structural block diagram of a data processing apparatus according to Embodiment Six of the present application.
The data processing apparatus of this embodiment is used by a worker thread to process double-buffer data, where the worker thread includes a single context. The data processing apparatus includes a monitoring module 402, configured to make the worker thread call a first interface function to monitor the state of the asynchronously pre-read data in its own double buffer.
The data processing apparatus of this embodiment further includes a pre-read module 404, configured to, before the monitoring module 402 makes the worker thread call the first interface function to monitor the state of the asynchronously pre-read data in its own buffers, make the worker thread call a second interface function to direct an asynchronous pre-read of the data of one or more data blocks to be read sequentially, where the second interface function carries asynchronous pre-read information including: information on the data blocks to be read sequentially and the handle of the database file to which those data blocks belong; and, according to the asynchronous pre-read information, to asynchronously pre-read the data of the corresponding one or more data blocks to be read sequentially directly from external storage into the worker thread's own double buffer.
Preferably, the asynchronous pre-read information further includes a pre-read mode and/or a caching mode, where the pre-read mode indicates whether the data blocks to be read sequentially are read in forward or reverse sequential order, and the caching mode indicates whether the pre-read data is to be cached.
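For illustration, the asynchronous pre-read information carried by such a second interface function could be packaged as a small descriptor. The layout and all field names below are assumptions, not the patent's definition:

    #include <stdint.h>

    struct ra_advice {
        int       fd;       /* handle of the database file                       */
        uint64_t *offsets;  /* offsets of the blocks to be read, in order        */
        int       nblocks;  /* how many blocks the query will touch              */
        int       reverse;  /* 0: forward sequential read, 1: reverse read       */
        int       cache;    /* nonzero: add the pre-read data to the block cache */
    };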
Preferably, when asynchronously pre-reading the data of the corresponding one or more data blocks to be read sequentially directly from external storage into the worker thread's own double buffer according to the asynchronous pre-read information, the pre-read module 404 determines the number and order of the asynchronous pre-reads according to the asynchronous pre-read information and the size of the double buffer in the worker thread; then, according to the determined number and order, it calls a third interface function to asynchronously pre-read the data of the corresponding one or more data blocks to be read sequentially directly from external storage into the worker thread's own double buffer.
Preferably, the input parameters of the third interface function include the information of the data blocks to be read sequentially in the current pre-read.
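Determining the number and order of the asynchronous pre-reads is essentially a division of the data to be read by the fixed buffer size; a minimal sketch of that bookkeeping, with invented names, is:

    #include <stdint.h>

    /* Number of granularity-sized asynchronous pre-reads `total` bytes need. */
    static uint64_t readahead_count(uint64_t total, uint64_t buf_size)
    {
        return (total + buf_size - 1) / buf_size;   /* ceiling division */
    }

    /* Offset of the i-th pre-read (0 <= i < n): ascending for a forward
       sequential read, descending when the pre-read mode is reverse. */
    static uint64_t readahead_offset(uint64_t i, uint64_t n,
                                     uint64_t buf_size, int reverse)
    {
        return (reverse ? n - 1 - i : i) * buf_size;
    }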
Preferably, the data processing apparatus of this embodiment further includes a determining module 406, configured to, before the pre-read module makes the worker thread call the second interface function to direct the asynchronous pre-read of the data of one or more data blocks to be read sequentially, determine from the query command on a database file that the query on the database file is a range query, and, according to the range covered by the range query, determine for the worker thread the data blocks to be read sequentially.
Preferably, the data processing apparatus of this embodiment further includes an allocation module 408, configured to, after the determining module 406 has determined the data blocks to be read sequentially for the worker thread according to the range covered by the range query, judge whether more than one data block to be read sequentially has been determined; if so, it allocates for the worker thread as many double buffers as there are data blocks to be read sequentially.
Preferably, the double buffer includes a current buffer and a read-ahead buffer. For each double buffer of the worker thread, when asynchronously pre-reading the data of the corresponding one or more data blocks to be read sequentially directly from external storage into the worker thread's own double buffer, the pre-read module 404 makes the worker thread use its current buffer and read-ahead buffer alternately, asynchronously pre-reading the data of the corresponding one or more data blocks to be read sequentially directly from the external storage until all the data in those data blocks has been pre-read.
Preferably, the size of each double buffer is fixed.
Preferably, the size of each double buffer is 1 MB or 2 MB.
Preferably, when making the worker thread asynchronously pre-read the data of the corresponding one or more data blocks to be read sequentially directly from external storage into its own double buffer, the pre-read module 404 makes the worker thread use DIRECT IO to do so.
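On Linux, DIRECT IO is conventionally requested by opening the file with the O_DIRECT flag so that reads bypass the kernel page cache; a minimal sketch, with error handling omitted:

    #define _GNU_SOURCE   /* O_DIRECT is a GNU extension */
    #include <fcntl.h>

    /* Open a database file for DIRECT IO reads; buffers, offsets and
       lengths must then satisfy the device's alignment requirements. */
    static int open_for_direct_read(const char *path)
    {
        return open(path, O_RDONLY | O_DIRECT);
    }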
Preferably, the single context of the worker thread is created in the following manner: when the worker thread is created, or when the worker thread reads data for the first time, the data processing apparatus uses Libaio to create a single context io_context for the worker thread.
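With Libaio this comes down to one call when the thread starts or first reads; a minimal sketch follows, where the queue depth of 128 is an arbitrary assumption:

    #include <libaio.h>

    /* Create the worker thread's private io_context. Because every thread
       owns its own context, no shared monitoring thread or lock is needed. */
    static int create_worker_context(io_context_t *ctx)
    {
        *ctx = 0;                        /* libaio requires a zeroed context */
        return io_queue_init(128, ctx);  /* 0 on success, -errno on failure  */
    }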
Preferably, the data processing apparatus of this embodiment further includes a data processing module 410, configured to, after the monitoring module 402 makes the worker thread call the first interface function to monitor the state of the asynchronously pre-read data in its own buffers, make the worker thread determine the data to be processed according to the state of the asynchronously pre-read data, obtain the determined data from its own double buffer, and process it.
Preferably, the state of the current buffer includes an idle state, a waiting state and a ready state: the idle state indicates that the current buffer is currently idle, the waiting state indicates that an asynchronous read of data into the current buffer is being waited for, and the ready state indicates that the asynchronous read of data into the current buffer has completed. Likewise, the state of the read-ahead buffer includes an idle state, a waiting state and a ready state: the idle state indicates that the read-ahead buffer is currently idle, the waiting state indicates that an asynchronous read of data into the read-ahead buffer is being waited for, and the ready state indicates that the asynchronous read of data into the read-ahead buffer has completed.
Preferably, the data processing module 410 is configured to make the worker thread determine that the current state of its current buffer or read-ahead buffer is the ready state, determine that the data of the current buffer or read-ahead buffer in the ready state is the data to be processed, and obtain that data from the worker thread's own current buffer or read-ahead buffer for processing.
Preferably, the first interface function is the io_getevents function of Libaio, and/or the second interface function is an advise function, and/or the third interface function is a get_block function.
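The application fixes only the names io_getevents, advise and get_block, not the signatures of the latter two. Purely as an illustration, a get_block-style third interface might reduce to submitting one asynchronous read, with an advise-style descriptor such as the one sketched earlier supplying the file handle and block order; everything below, including the parameter list, is an assumption:

    #include <libaio.h>
    #include <stdint.h>

    /* Hypothetical shape of the third interface: asynchronously pre-read one
       piece of a block to be read sequentially into a double-buffer half. */
    static int get_block(io_context_t ctx, struct iocb *cb,
                         int fd, void *buf, uint64_t len, long long offset)
    {
        struct iocb *cbs[1] = { cb };
        io_prep_pread(cb, fd, buf, (size_t)len, offset);
        return io_submit(ctx, 1, cbs);   /* 1 on success, -errno on failure */
    }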
The data processing apparatus of this embodiment is used to implement the corresponding data processing methods of the foregoing method embodiments and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
This application provides a data processing scheme based on asynchronous pre-reading, realizing an on-demand asynchronous pre-read mechanism over double buffers. When asynchronous reads are implemented with Libaio, no extra monitoring thread is used: each worker thread independently monitors its own asynchronous read state. Each worker thread allocates double buffers according to the number of sequential streams it requires, and interface functions are provided so the user can give explicit pre-read instructions, achieving pipelined on-demand pre-reading that never reads unneeded data. Reverse pre-reading is supported; multiple random reads and sequential reads of the same file do not interfere with one another; the disk read granularity is increased, improving disk utilization; and an option is provided for whether read data is added to the block cache. Through this application, the number of disk seeks and the I/O wait time of the application program are reduced, disk read I/O performance is improved, and parallel I/O efficiency is raised.
The embodiments of this application take a distributed database system as an example, but the scheme is not limited to distributed database systems: any system to which the scheme of this patent is applicable, that is, any system that reads files by blocks in user space, such as a distributed file system, can implement the data reading scheme of this application with reference to its embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the identical or similar parts the embodiments may be referred to one another. As for the apparatus embodiment, since it is basically similar to the method embodiments, its description is relatively brief; for relevant details, refer to the corresponding parts of the method embodiments.
The application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that realizes the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operation steps is executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
A data processing method and apparatus provided by this application have been described in detail above. Specific examples have been used herein to set forth the principle and implementation of the application, and the description of the above embodiments is only intended to help understand the method of the application and its core idea. Meanwhile, for a person of ordinary skill in the art, changes may be made to the specific implementation and the scope of application according to the idea of the application. In summary, the contents of this specification should not be construed as limiting the application.
Claims (26)
1. A data processing method, characterized in that the method is used by a worker thread to process double-buffer data, wherein the worker thread includes a single context;
the worker thread calls a second interface function to direct an asynchronous pre-read of the data of one or more data blocks to be read sequentially, wherein the second interface function carries asynchronous pre-read information;
according to the asynchronous pre-read information, the data of the corresponding one or more data blocks to be read sequentially is asynchronously pre-read directly from external storage into the worker thread's own double buffer;
the worker thread calls a first interface function to monitor the state of the asynchronously pre-read data in its own double buffer.
2. The method according to claim 1, characterized in that, before the step of the worker thread calling the first interface function to monitor the state of the asynchronously pre-read data in its own buffers, the method further includes:
the worker thread calls the second interface function to direct an asynchronous pre-read of the data of one or more data blocks to be read sequentially, wherein the second interface function carries asynchronous pre-read information, and the asynchronous pre-read information includes: information on the data blocks to be read sequentially and the handle of the database file to which the data blocks to be read sequentially belong;
according to the asynchronous pre-read information, the data of the corresponding one or more data blocks to be read sequentially is asynchronously pre-read directly from external storage into the worker thread's own double buffer.
3. The method according to claim 2, characterized in that the asynchronous pre-read information further includes: a pre-read mode and/or a caching mode, wherein the pre-read mode indicates whether the data blocks to be read sequentially are read in forward sequential order or in reverse sequential order, and the caching mode indicates whether the pre-read data is to be cached.
4. The method according to claim 2, characterized in that the step of asynchronously pre-reading, according to the asynchronous pre-read information, the data of the corresponding one or more data blocks to be read sequentially directly from external storage into the worker thread's own double buffer includes:
determining the number and order of the asynchronous pre-reads according to the asynchronous pre-read information and the size of the double buffer in the worker thread;
according to the determined number and order of the asynchronous pre-reads, calling a third interface function to asynchronously pre-read the data of the corresponding one or more data blocks to be read sequentially directly from external storage into the worker thread's own double buffer.
5. The method according to claim 4, characterized in that the input parameters of the third interface function include the information of the data blocks to be read sequentially in the current pre-read.
6. The method according to claim 2, characterized in that, before the step of the worker thread calling the second interface function to direct the asynchronous pre-read of the data of one or more data blocks to be read sequentially, the method further includes:
the system determines, according to a query command for querying a database file, that the query on the database file is a range query;
according to the range covered by the range query, the data blocks to be read sequentially are determined for the worker thread.
7. The method according to claim 6, characterized in that, after the step of determining for the worker thread the data blocks to be read sequentially according to the range covered by the range query, the method further includes:
judging whether more than one data block to be read sequentially has been determined; if so, allocating for the worker thread as many double buffers as there are data blocks to be read sequentially.
8. The method according to claim 7, characterized in that the double buffer includes a current buffer and a read-ahead buffer;
for each double buffer of the worker thread, the step of asynchronously pre-reading the data of the corresponding one or more data blocks to be read sequentially directly from external storage into the worker thread's own double buffer includes: the worker thread uses its current buffer and read-ahead buffer alternately to asynchronously pre-read, directly from the external storage, the data of the corresponding one or more data blocks to be read sequentially, until all the data in the corresponding one or more data blocks to be read sequentially has been pre-read.
9. The method according to any one of claims 1 to 8, characterized in that the size of each double buffer is fixed.
10. The method according to claim 9, characterized in that the size of each double buffer is 1 MB or 2 MB.
11. The method according to claim 2, characterized in that the step of asynchronously pre-reading the data of the corresponding one or more data blocks to be read sequentially directly from external storage into the worker thread's own double buffer includes:
the worker thread uses DIRECT IO to asynchronously pre-read the data of the corresponding one or more data blocks to be read sequentially directly from the external storage into its own double buffer.
12. The method according to claim 1, characterized in that the single context of the worker thread is created in the following manner:
when the worker thread is created, or when the worker thread reads data for the first time, Libaio is used to create a single context io_context for the worker thread.
13. The method according to claim 8, characterized in that, after the step of the worker thread calling the first interface function to monitor the state of the asynchronously pre-read data in its own buffers, the method further includes:
the worker thread determines the data to be processed according to the state of the asynchronously pre-read data, obtains the determined data from its own double buffer, and processes it.
14. The method according to claim 13, characterized in that:
the state of the current buffer includes an idle state, a waiting state and a ready state, wherein the idle state indicates that the current buffer is currently idle, the waiting state indicates that an asynchronous read of data into the current buffer is being waited for, and the ready state indicates that the asynchronous read of data into the current buffer has completed;
the state of the read-ahead buffer includes an idle state, a waiting state and a ready state, wherein the idle state indicates that the read-ahead buffer is currently idle, the waiting state indicates that an asynchronous read of data into the read-ahead buffer is being waited for, and the ready state indicates that the asynchronous read of data into the read-ahead buffer has completed.
15. The method according to claim 14, characterized in that the step of determining the data to be processed according to the state of the asynchronously pre-read data and obtaining the determined data from the worker thread's own buffers for processing includes:
the worker thread determines that the current state of its current buffer or read-ahead buffer is the ready state, determines that the data of the current buffer or read-ahead buffer in the ready state is the data to be processed, and obtains the data from its own current buffer or read-ahead buffer for processing.
16. The method according to claim 4, characterized in that the first interface function is the io_getevents function of Libaio, and/or the second interface function is an advise function, and/or the third interface function is a get_block function.
17. A data processing apparatus, characterized in that the apparatus is used by a worker thread to process double-buffer data, wherein the worker thread includes a single context;
the apparatus includes a monitoring module, configured to make the worker thread call a second interface function to direct an asynchronous pre-read of the data of one or more data blocks to be read sequentially, wherein the second interface function carries asynchronous pre-read information; to asynchronously pre-read, according to the asynchronous pre-read information, the data of the corresponding one or more data blocks to be read sequentially directly from external storage into the worker thread's own double buffer; and to make the worker thread call a first interface function to monitor the state of the asynchronously pre-read data in its own double buffer.
18. The apparatus according to claim 17, characterized by further including:
a pre-read module, configured to, before the monitoring module makes the worker thread call the first interface function to monitor the state of the asynchronously pre-read data in its own buffers, make the worker thread call the second interface function to direct an asynchronous pre-read of the data of one or more data blocks to be read sequentially, wherein the second interface function carries asynchronous pre-read information including: information on the data blocks to be read sequentially and the handle of the database file to which the data blocks to be read sequentially belong; and to asynchronously pre-read, according to the asynchronous pre-read information, the data of the corresponding one or more data blocks to be read sequentially directly from external storage into the worker thread's own double buffer.
19. The apparatus according to claim 18, characterized in that the asynchronous pre-read information further includes: a pre-read mode and/or a caching mode, wherein the pre-read mode indicates whether the data blocks to be read sequentially are read in forward sequential order or in reverse sequential order, and the caching mode indicates whether the pre-read data is to be cached.
20. The apparatus according to claim 18, characterized in that, when asynchronously pre-reading the data of the corresponding one or more data blocks to be read sequentially directly from external storage into the worker thread's own double buffer according to the asynchronous pre-read information, the pre-read module determines the number and order of the asynchronous pre-reads according to the asynchronous pre-read information and the size of the double buffer in the worker thread, and, according to the determined number and order of the asynchronous pre-reads, calls a third interface function to asynchronously pre-read the data of the corresponding one or more data blocks to be read sequentially directly from external storage into the worker thread's own double buffer.
21. The apparatus according to claim 20, characterized in that the input parameters of the third interface function include the information of the data blocks to be read sequentially in the current pre-read.
22. The apparatus according to claim 18, characterized by further including:
a determining module, configured to, before the pre-read module makes the worker thread call the second interface function to direct the asynchronous pre-read of the data of one or more data blocks to be read sequentially, make the system determine, according to a query command for querying a database file, that the query on the database file is a range query, and determine for the worker thread, according to the range covered by the range query, the data blocks to be read sequentially.
23. The apparatus according to claim 22, characterized by further including:
an allocation module, configured to, after the determining module has determined for the worker thread the data blocks to be read sequentially according to the range covered by the range query, judge whether more than one data block to be read sequentially has been determined, and if so, allocate for the worker thread as many double buffers as there are data blocks to be read sequentially.
24. The apparatus according to claim 23, characterized in that the double buffer includes a current buffer and a read-ahead buffer;
for each double buffer of the worker thread, when asynchronously pre-reading the data of the corresponding one or more data blocks to be read sequentially directly from external storage into the worker thread's own double buffer, the pre-read module makes the worker thread use its current buffer and read-ahead buffer alternately to asynchronously pre-read, directly from the external storage, the data of the corresponding one or more data blocks to be read sequentially, until all the data in the corresponding one or more data blocks to be read sequentially has been pre-read.
25. The apparatus according to any one of claims 17 to 24, characterized in that the size of each double buffer is fixed.
26. The apparatus according to claim 25, characterized in that the size of each double buffer is 1 MB or 2 MB.
Effective date of registration: 20211117 Address after: Room 507, floor 5, building 3, No. 969, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province Patentee after: ZHEJIANG TMALL TECHNOLOGY Co.,Ltd. Address before: Box 847, four, Grand Cayman capital, Cayman Islands, UK Patentee before: ALIBABA GROUP HOLDING Ltd. |