CN107818012A - Data processing method, device and electronic equipment - Google Patents


Info

Publication number
CN107818012A
Authority
CN
China
Prior art keywords
task
data reading
thread
partition
timestamp
Prior art date
Legal status
Granted
Application number
CN201610818710.4A
Other languages
Chinese (zh)
Other versions
CN107818012B (en)
Inventor
Liu Feng (刘峰)
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610818710.4A
Publication of CN107818012A
Application granted
Publication of CN107818012B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5021 Priority

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a data processing method, device and electronic equipment. The data processing method includes: putting data reading tasks generated by partitions into a task queue; and, when the number of reading threads in a thread pool has not reached a predetermined upper limit, extracting a data reading task from the task queue, creating a reading thread according to the extracted task, and putting the reading thread into the thread pool, where the thread pool holds reading threads that take turns occupying processing resources. When there are many partitions and partition throughput differs greatly, the application can improve the efficiency of reading partition data.

Description

Data processing method, device and electronic equipment
Technical field
The present invention relates to the computer field, and in particular to a data processing method, device and electronic equipment.
Background
At present, the data sources (or data relays) commonly used by cloud-computing big data modules are usually implemented with Kafka or Kafka-like products such as MetaQ or LogHub. Kafka is a high-throughput distributed publish-subscribe messaging system; the party that publishes messages is called the producer, and the party that subscribes to messages is called the consumer. MetaQ is a high-performance, highly available, scalable distributed message middleware. LogHub is a log service product that provides functionality similar to Kafka.
Such data sources have two features: first, they are divided into multiple partitions; second, each partition can be consumed by only one thread, usually a reading thread. These two features allow higher throughput. A partition is the basic unit of distributed processing in Kafka and similar products and has the following characteristics: a partition preserves first-in-first-out ordering of its data; a partition can be consumed by only one thread; each record has an offset (Cursor/offset); and each read returns the data together with information about the current read position.
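The partition characteristics listed above can be illustrated with a minimal in-memory model. This is a sketch under assumptions, not Kafka's client API: the class `Partition` and its methods `append` and `read` are hypothetical names, and an offset here is simply a list index.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Partition:
    """Toy partition: an append-only log whose records are addressed by offset."""
    name: str
    records: List[str] = field(default_factory=list)

    def append(self, record: str) -> int:
        """Producer side: append a record and return its offset."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int) -> Tuple[List[str], int]:
        """Consumer side: return up to max_records records starting at offset,
        in first-in-first-out order, plus the offset to use for the next read."""
        batch = self.records[offset:offset + max_records]
        return batch, offset + len(batch)
```

The single consumer thread keeps the returned offset and passes it back on its next call. For example, after appending `"a"`, `"b"` and `"c"`, `read(0, 2)` returns `(["a", "b"], 2)` and `read(2, 2)` returns `(["c"], 3)`.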
In big data processing scenarios, there are many Kafka businesses and many partitions (Shard/Partition). Because different businesses generate different amounts of data, each partition produces data at a different frequency and volume. Under these conditions, processing massive amounts of data with reasonable resources requires fine-grained handling; otherwise substantial resources, including backend resources, will be wasted.
Each partition generates its own data reading tasks, each data reading task corresponds to one reading thread, and the reading thread, when executed by the CPU, reads the data of the corresponding partition. For reading data from multiple partitions, the related art includes the following three modes.
Mode one: multiple threads process independently without intervals.
As shown in Fig. 1, in this mode multiple reading threads compete for limited CPU resources, and each reading thread makes read attempts in an endless loop. This mode has the following three characteristics:
1. Each reading thread keeps attempting to read, regardless of whether there is data to read.
2. In the extreme case, if a reading thread can never read any data, it is equivalent to an infinite loop that consumes at least one CPU's worth of user time.
3. If a partition produces only a small amount of data on average, the corresponding reading thread reads only a little data each time, but reads very often.
All three characteristics keep both the servers responsible for data reading and the Kafka servers under constant high load, wasting processing resources while reducing data throughput. In this case, the user consumes far more data processing servers than are theoretically needed.
Mode two: loop reading with added Sleep.
As shown in Fig. 2, in this mode multithreaded reading is usually optimized into a queue. The main thread performs a fixed Sleep action to release the CPU. In this mode, all reading threads are executed in a loop at regular intervals, with a pause between two consecutive loop executions.
In this mode, for a partition with a small amount of data, if the corresponding reading thread is allocated CPU time in every loop execution, CPU resources are wasted because there is not that much data to read; meanwhile, a partition with a large amount of data cannot get enough CPU time, so its data cannot be read in time and accumulates.
Mode three: multiple threads with Sleep, releasing CPU resources.
This mode combines the two modes above: multiple reading threads compete for limited CPU resources, and each reading thread has a fixed Sleep period, as shown in Fig. 3.
This mode causes severe CPU contention when there are too many reading threads. In addition, switching the CPU to another thread requires a context switch, that is, saving the running environment of the current thread and restoring the running environment of the thread being switched to. Therefore, when there are too many reading threads, the CPU performs context switches very frequently, and a large amount of computing resources is consumed by context switching.
Summary
The application provides a data processing method, device and electronic equipment that can improve the efficiency of reading partition data when there are many partitions and partition throughput differs greatly.
The application adopts the following technical solutions.
A data processing method, comprising:
putting data reading tasks generated by partitions into a task queue;
when the number of reading threads in a thread pool has not reached a predetermined upper limit, extracting a data reading task from the task queue, creating a reading thread according to the extracted task, and putting the reading thread into the thread pool; wherein the thread pool is used to hold reading threads that take turns occupying processing resources.
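A minimal Python sketch of the two steps, assuming a task is represented by a partition name and that the caller invokes `dispatch()` whenever it wants to top up the pool (periodically or after a thread finishes). `ReadDispatcher`, `submit`, `dispatch` and `read_fn` are hypothetical names, and a plain FIFO `queue.Queue` stands in for the task queue.

```python
import queue
import threading
from typing import Callable, List


class ReadDispatcher:
    """Tasks wait in a queue; a reading thread is created for a task only
    while the pool holds fewer than `max_threads` live threads."""

    def __init__(self, max_threads: int, read_fn: Callable[[str], None]) -> None:
        self.max_threads = max_threads
        self.read_fn = read_fn  # hypothetical: reads the data of one partition
        self.tasks: "queue.Queue[str]" = queue.Queue()
        self.pool: List[threading.Thread] = []
        self.lock = threading.Lock()

    def submit(self, partition: str) -> None:
        """Put a data reading task generated by a partition into the task queue."""
        self.tasks.put(partition)

    def dispatch(self) -> int:
        """Fill the pool up to the limit; return how many threads were started."""
        started = 0
        with self.lock:
            # Drop finished threads so that their slots can be reused.
            self.pool = [t for t in self.pool if t.is_alive()]
            while len(self.pool) < self.max_threads:
                try:
                    partition = self.tasks.get_nowait()
                except queue.Empty:
                    break
                t = threading.Thread(target=self.read_fn, args=(partition,))
                self.pool.append(t)
                t.start()
                started += 1
        return started
```

Because `dispatch()` is the only place that starts threads and it first discards finished ones, the number of live reading threads never exceeds `max_threads`; tasks beyond the limit simply wait in the queue.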
Optionally, putting the data reading tasks generated by partitions into the task queue comprises:
generating a timestamp for a data reading task generated by a partition, the timestamp indicating the moment at which execution of the data reading task is to start; and putting the data reading task carrying the timestamp into the task queue;
extracting a data reading task from the task queue comprises:
extracting, from the task queue, a data reading task whose timestamp indicates a moment earlier than or equal to the extraction moment.
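The extraction condition can be sketched as a filter over an unsorted queue. `ReadTask` and `extract_due` are hypothetical names, timestamps are assumed to be seconds on a shared clock, and the sketch ignores the thread-pool limit, which in the method caps how many due tasks actually receive threads.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReadTask:
    partition: str
    timestamp: float  # moment at which the task may start, in seconds


def extract_due(task_queue: List[ReadTask], extraction_moment: float) -> List[ReadTask]:
    """Remove and return the tasks whose timestamp is earlier than or equal to
    the extraction moment; tasks that are not yet due stay in the queue."""
    due = [t for t in task_queue if t.timestamp <= extraction_moment]
    task_queue[:] = [t for t in task_queue if t.timestamp > extraction_moment]
    return due
```

This version inspects every queued task on each extraction, which is the cost that keeping the queue sorted by timestamp avoids.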
Optionally, generating a timestamp for a data reading task generated by a partition comprises:
for a data reading task generated by a partition, adding a determined delay length to the current moment to obtain an estimated execution moment, and using information representing the estimated execution moment as the timestamp of the data reading task; wherein the delay length is determined according to the data volume read by the previous data reading task of the partition, and the larger the data volume, the shorter the delay length.
Optionally, determining the delay length according to the data volume read by the previous data reading task of the partition comprises:
determining the delay length according to the interval to which the data volume read by the previous data reading task of the partition belongs, and a preset correspondence between delay lengths and data volume intervals.
Optionally, in the task queue, data reading tasks are sorted from earliest to latest according to the moments indicated by the timestamps they carry;
putting the data reading tasks generated by partitions into the task queue further comprises:
putting a data reading task into the corresponding position in the task queue according to the moment indicated by the timestamp it carries.
Optionally, the task queue is a priority queue, and the timestamp carried by a data reading task serves as the priority of that task; the earlier the moment indicated by the timestamp, the higher the priority.
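When the queue is kept ordered by timestamp, only its head needs to be checked. A sketch using Python's standard `heapq` module as the priority queue; `TimestampQueue`, `put` and `pop_due` are hypothetical names, and the sequence counter is an added assumption so that tasks with equal timestamps come out in insertion order.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class TimestampQueue:
    """Task queue ordered by timestamp: the earliest moment has the highest priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._seq = itertools.count()  # tie-breaker keeps equal timestamps FIFO

    def put(self, partition: str, timestamp: float) -> None:
        """Insert a task at the position given by its timestamp."""
        heapq.heappush(self._heap, (timestamp, next(self._seq), partition))

    def pop_due(self, now: float) -> Optional[str]:
        """Return the earliest task if it is due, otherwise None.
        Only the head of the heap is inspected."""
        if self._heap and self._heap[0][0] <= now:
            return heapq.heappop(self._heap)[2]
        return None
```

Because the earliest task is always at the head, `pop_due` returns `None` as soon as the head is not yet due, without scanning the rest of the queue.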
Optionally, the predetermined upper limit of the number of threads in the thread pool is twice the number of CPUs used to execute the reading threads in the thread pool.
Optionally, the data processing method further comprises:
executing the reading threads in the thread pool in turn.
A data processing device, comprising:
a queue management module, configured to put data reading tasks generated by partitions into a task queue;
an extraction module, configured to, when the number of reading threads in a thread pool has not reached a predetermined upper limit, extract a data reading task from the task queue, create a reading thread according to the extracted task, and put the reading thread into the thread pool; wherein the thread pool is used to hold reading threads that take turns occupying processing resources.
Optionally, the queue management module putting data reading tasks generated by partitions into the task queue comprises:
the queue management module generating a timestamp for a data reading task generated by a partition, the timestamp indicating the moment at which execution of the data reading task is to start, and putting the data reading task carrying the timestamp into the task queue;
the extraction module extracting a data reading task from the task queue comprises:
the extraction module extracting, from the task queue, a data reading task whose timestamp indicates a moment earlier than or equal to the extraction moment.
Optionally, the queue management module generating a timestamp for a data reading task generated by a partition comprises:
for a data reading task generated by a partition, the queue management module adding a determined delay length to the current moment to obtain an estimated execution moment, and using information representing the estimated execution moment as the timestamp of the data reading task; wherein the delay length is determined according to the data volume read by the previous data reading task of the partition, and the larger the data volume, the shorter the delay length.
Optionally, determining the delay length according to the data volume read by the previous data reading task of the partition comprises:
determining the delay length according to the interval to which the data volume read by the previous data reading task of the partition belongs, and a preset correspondence between delay lengths and data volume intervals.
Optionally, in the task queue, data reading tasks are sorted from earliest to latest according to the moments indicated by the timestamps they carry;
the queue management module putting data reading tasks generated by partitions into the task queue further comprises:
the queue management module putting a data reading task into the corresponding position in the task queue according to the moment indicated by the timestamp it carries.
Optionally, the task queue is a priority queue, and the timestamp carried by a data reading task serves as the priority of that task; the earlier the moment indicated by the timestamp, the higher the priority.
Optionally, the predetermined upper limit of the number of threads in the thread pool is twice the number of CPUs used to execute the reading threads in the thread pool.
Optionally, the data processing device further comprises:
a reading module, configured to execute the reading threads in the thread pool in turn.
Electronic equipment for data processing, comprising: a memory and a processor;
the memory is used to store a program for data processing; when read and executed by the processor, the program for data processing performs the following operations:
putting data reading tasks generated by partitions into a task queue;
when the number of reading threads in a thread pool has not reached a predetermined upper limit, extracting a data reading task from the task queue, creating a reading thread according to the extracted task, and putting the reading thread into the thread pool; wherein the thread pool is used to hold reading threads that take turns occupying processing resources.
The application has the following advantages:
In at least one embodiment of the application, when there are many partitions and partition throughput differs greatly, the data generated by the partitions can be read efficiently and CPU resources are used reasonably. On the one hand, the embodiment uses the task queue to control the number of threads contending for CPU resources, so not every partition's data reading task occupies CPU resources; a data reading task can occupy CPU resources only after a reading thread has been created from it and put into the thread pool. The number of threads in the thread pool is smaller than the number of partitions, so large-scale CPU contention does not occur and the frequency of CPU context switching is reduced, which reduces the resources the CPU spends on context switching and improves processing efficiency. On the other hand, because fewer threads take turns occupying the CPU, the CPU can give each thread more time, so more data can be read from a partition each time. In addition, replenishing the thread pool by extracting data reading tasks from the task queue ensures that CPU resources are used as fully as possible without waste.
In one implementation of the embodiments of the application, a timestamp is added to each data reading task in the task queue, and data reading tasks are extracted from the task queue according to their timestamps. By adjusting the timestamp of a data reading task, tasks can be made to use the thread pool according to the urgency of the business. Optionally, the timestamp of a data reading task generated by a partition is determined from the data volume last read from that partition, which enables learning from the consumed data (the data volume read) and adaptive adjustment of the attempt period, that is, the period at which a thread is created from the partition's data reading task and put into the thread pool. Optionally, data reading tasks in the task queue are sorted by the moments indicated by their timestamps, so that extracting data reading tasks from the task queue does not require checking the timestamps of all data reading tasks in the queue.
Of course, a product implementing the application does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
Fig. 1 is a schematic diagram of the mode in the related art in which multiple threads process independently without intervals;
Fig. 2 is a schematic diagram of the mode in the related art of loop reading with added Sleep;
Fig. 3 is a schematic diagram of the mode in the related art in which multiple threads Sleep to release CPU resources;
Fig. 4 is a flowchart of the data processing method of Embodiment 1;
Fig. 5 is a schematic diagram of an example implementation of Embodiment 1;
Fig. 6 is a schematic diagram of task extraction in the example of Embodiment 1;
Fig. 7 is a schematic diagram of adding a task to the queue in the example of Embodiment 1;
Fig. 8 is a schematic diagram of the data processing device of Embodiment 2.
Detailed description of the embodiments
The technical solutions of the application are described in detail below with reference to the drawings and embodiments.
It should be noted that, provided there is no conflict, the embodiments of the application and the features in the embodiments can be combined with each other, and all such combinations fall within the protection scope of the application. In addition, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one herein.
In one configuration, a computing device for data processing may include one or more processors (CPUs), input/output interfaces, network interfaces and memory.
The memory may include volatile memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. The memory may include module 1, module 2, ..., module N (N is an integer greater than 2).
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Embodiment 1: a data processing method, as shown in Fig. 4, comprising steps S110 to S120:
S110: put data reading tasks generated by partitions into a task queue;
S120: when the number of reading threads in a thread pool has not reached a predetermined upper limit, extract a data reading task from the task queue, create a reading thread according to the extracted task, and put it into the thread pool; wherein the thread pool is used to hold reading threads that take turns occupying processing resources.
In this embodiment, when there are many partitions and partition throughput differs greatly, the data generated by the partitions can be read efficiently and CPU resources are used reasonably. On the one hand, this embodiment uses the task queue to control the number of threads contending for CPU resources, so not every partition's data reading task occupies CPU resources; a data reading task can occupy CPU resources only after a reading thread has been created from it and put into the thread pool. In big data processing scenarios the number of partitions is very large, and the number of threads in the thread pool is necessarily smaller than the number of partitions, so large-scale CPU contention does not occur and the frequency of CPU context switching is reduced, which reduces the resources the CPU spends on context switching and improves processing efficiency. On the other hand, because fewer threads take turns occupying the CPU, the CPU can give each thread more time, so more data can be read from a partition each time. In addition, replenishing the thread pool by extracting data reading tasks from the task queue ensures that CPU resources are used as fully as possible without waste.
In this embodiment, a thread pool is a form of multithreaded processing, and the reading threads in the thread pool may all be background threads. Each reading thread may use the default stack size, run at the default priority, and be in the multithreaded apartment. If a reading thread is idle in managed code (for example, waiting for an event), the thread pool inserts another worker thread to keep all processors busy. If all reading threads in the thread pool remain busy but the queue contains pending work, the thread pool creates another worker thread after some time, but the number of reading threads never exceeds the predetermined upper limit. Reading threads beyond the predetermined upper limit are queued and start only after other reading threads in the thread pool have completed.
In this embodiment, the reading threads in the thread pool may take turns occupying processing resources by preemption, polling or similar means; for example, different reading threads may be allocated time slices, and a reading thread occupies the processing resources during its allocated time slice.
In this embodiment, the processing resources that the reading threads in the thread pool take turns occupying may refer to the CPU resources used to execute the reading threads in the thread pool.
In this embodiment, each partition generates its own data reading tasks. Because a partition can be consumed by only one thread, a partition does not generate a new data reading task until its current data reading task has completed. A data reading task generated by a partition is first put into the task queue; after it is extracted, a corresponding reading thread is created and put into the thread pool, and that reading thread reads data from the corresponding partition. For example, if a reading thread is created from the data reading task of partition A, it reads the data of partition A; after the reading thread finishes executing, partition A's data reading task is complete. If partition A needs to be read again, a new data reading task is generated and put into the task queue.
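The rule that a partition has at most one outstanding data reading task can be enforced with a small registry. This is a sketch: `PartitionTaskTracker`, `try_create_task` and `complete_task` are hypothetical names, and the lock is there because reading threads would report completion concurrently.

```python
import threading
from typing import Set


class PartitionTaskTracker:
    """Allows at most one outstanding data reading task per partition."""

    def __init__(self) -> None:
        self._outstanding: Set[str] = set()
        self._lock = threading.Lock()

    def try_create_task(self, partition: str) -> bool:
        """Return True and mark the partition busy if it has no pending task."""
        with self._lock:
            if partition in self._outstanding:
                return False  # previous task of this partition not finished yet
            self._outstanding.add(partition)
            return True

    def complete_task(self, partition: str) -> None:
        """Called when the reading thread for this partition finishes."""
        with self._lock:
            self._outstanding.discard(partition)
```

In the partition A example, a second task for A is refused until the reading thread for A calls `complete_task("A")`; after that, a new task for A can be created and put into the task queue.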
In one implementation, the above data processing method may be performed by the computing device (which may be, but is not limited to, a server) that reads data from the partitions. The computing device allocates part of its processing resources to perform steps S110 to S120; this part of the processing resources can be regarded as a central control system. All or part of the remaining processing resources are used to execute the reading threads in the thread pool.
In another implementation, besides the computing device that reads data from the partitions, there may be a separate computing device that performs steps S110 to S120 and maintains the thread pool for the computing device that reads data from the partitions; the reading threads in the thread pool are executed by the computing device that reads data from the partitions.
In one implementation, data reading tasks may be extracted from the task queue periodically; after extraction, when the number of reading threads in the thread pool is insufficient (has not reached the predetermined upper limit), an extracted data reading task is converted into a reading thread and put into the thread pool, that is, the created reading thread is started.
In another implementation, data reading tasks are extracted from the task queue only when the number of reading threads in the thread pool is insufficient, and reading threads are then created and put into the thread pool. Alternatively, tasks may be extracted periodically, and if the number of reading threads in the thread pool becomes insufficient before the next extraction cycle arrives, extraction is performed immediately.
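The combined trigger (periodic extraction, plus immediate extraction when a slot frees up before the period ends) maps naturally onto a condition variable with a timeout. A sketch under assumptions: `ExtractionTrigger`, `notify_slot_freed`, `stop` and the `extract` callback are hypothetical names, and `extract` stands for whatever step fills the pool from the task queue.

```python
import threading
from typing import Callable


class ExtractionTrigger:
    """Calls `extract` once per `period` seconds, and immediately whenever a
    reading thread reports that it has freed a slot in the thread pool."""

    def __init__(self, period: float, extract: Callable[[], None]) -> None:
        self.period = period
        self.extract = extract  # hypothetical: tops up the pool from the task queue
        self._cond = threading.Condition()
        self._slot_freed = False
        self._stopped = False

    def notify_slot_freed(self) -> None:
        """Called by a reading thread when it finishes."""
        with self._cond:
            self._slot_freed = True
            self._cond.notify()

    def stop(self) -> None:
        with self._cond:
            self._stopped = True
            self._cond.notify()

    def run(self) -> None:
        while True:
            with self._cond:
                # Wakes when a slot is freed or stop() is called, or after `period` seconds.
                self._cond.wait_for(lambda: self._slot_freed or self._stopped,
                                    timeout=self.period)
                if self._stopped:
                    return
                self._slot_freed = False
            # Extract outside the lock so that notifying threads are never blocked by it.
            self.extract()
```

Because `wait_for` checks its predicate before sleeping, a slot freed just before `run` starts waiting is not lost.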
In one implementation, the predetermined upper limit of the number of threads in the thread pool may be twice the number of CPUs used to execute the threads in the thread pool.
For example, if 4 CPUs are used to execute the reading threads in the thread pool, the predetermined upper limit of the number of threads in the thread pool is 8, that is, there can be at most 8 reading threads.
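The 2 x CPU rule, together with the queuing of work beyond the limit, can be approximated with the standard `concurrent.futures.ThreadPoolExecutor`, whose `max_workers` caps the number of threads. This is only an approximation: the executor's internal queue is FIFO rather than timestamp-ordered, and `pool_upper_limit` and `read_partitions` are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def pool_upper_limit(cpus: int) -> int:
    """Predetermined upper limit: twice the number of CPUs executing reading threads."""
    return 2 * cpus


def read_partitions(partitions: Iterable[str],
                    read_fn: Callable[[str], T],
                    cpus: Optional[int] = None) -> List[T]:
    """Run read_fn over every partition with at most 2 x CPUs threads alive."""
    # os.cpu_count() reports logical CPUs and may return None.
    n_cpus = cpus or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=pool_upper_limit(n_cpus)) as pool:
        # Work beyond max_workers waits in the executor's internal FIFO queue.
        return list(pool.map(read_fn, partitions))
```

`pool_upper_limit(4)` returns 8, matching the example above.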
In other implementations, the predetermined upper limit of the number of threads in the thread pool may be set to other values, determined from empirical values or experiments. Generally, the predetermined upper limit is smaller than the number of partitions from which data is to be read.
In one implementation, putting data reading tasks generated by partitions into the task queue may comprise:
generating a timestamp for a data reading task generated by a partition, the timestamp indicating the moment at which execution of the data reading task is to start; and putting the data reading task carrying the timestamp into the task queue;
extracting a data reading task from the task queue comprises:
extracting, from the task queue, a data reading task whose timestamp indicates a moment earlier than or equal to the extraction moment.
In this implementation, the extraction moment may refer to, but is not limited to, the moment at which extraction of data reading tasks from the task queue starts. A timestamp indicating a moment equal to the extraction moment means that the indicated moment is exactly the extraction moment; a timestamp indicating a moment earlier than the extraction moment means that the indicated moment is before the extraction moment. For example, if the extraction moment is 6:18:55 on a certain day and the timestamp indicates 6:18:55 on the same day, the indicated moment equals the extraction moment; if it indicates 6:18:54 or 6:17:12 on the same day, the indicated moment is earlier than the extraction moment.
In this implementation, adjusting the timestamp of a data reading task ensures that data reading tasks use the thread pool according to the urgency of the business. For example, a data reading task generated by a partition that serves an urgent business can be given a timestamp indicating an earlier moment.
In other implementations, the order in which data reading tasks are extracted may be determined in other ways. For example, data reading tasks may be extracted on a first-in-first-out basis; or a priority may be assigned to each data reading task before it is put into the task queue, determined by the urgency of the business corresponding to the partition (the more urgent, the higher the priority), and tasks are then extracted from the task queue in order of priority from high to low.
In an optional solution of this implementation, generating a timestamp for a data reading task generated by a partition may comprise:
for a data reading task generated by a partition, adding a determined delay length to the current moment to obtain an estimated execution moment, and using information representing the estimated execution moment as the timestamp of the data reading task; the delay length may be determined according to the data volume read by the previous data reading task of the partition (the total amount of data read during execution by the reading thread created from that data reading task); the larger the data volume, the shorter the delay length.
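A sketch of the timestamp rule. The source only requires the delay to shrink as the previously read data volume grows; the specific formula in `inverse_delay` (with its `max_delay` and `scale` parameters) is a hypothetical example, as is `make_timestamp`. Timestamps are assumed to be seconds since the epoch, as returned by `time.time()`.

```python
import time
from typing import Callable, Optional


def inverse_delay(volume: int, max_delay: float = 5.0, scale: int = 1000) -> float:
    """Hypothetical rule: the larger the previous read volume, the shorter the delay.
    Gives max_delay seconds for an empty read and approaches 0 for large reads."""
    return max_delay * scale / (scale + volume)


def make_timestamp(last_read_volume: int,
                   delay_for: Callable[[int], float] = inverse_delay,
                   now: Optional[float] = None) -> float:
    """Estimated execution moment = current moment + delay length, where the
    delay depends on the data volume read by the partition's previous task."""
    if now is None:
        now = time.time()
    return now + delay_for(last_read_volume)
```

With `now=100.0`, a partition whose previous task read nothing gets timestamp 105.0, while one whose previous task read 1000 records gets 102.5 and is therefore read again sooner.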
In this alternative, the current time may be, but is not limited to, the time at which the data read task is added to the task queue, the time at which the data read task is received, or the time at which the timestamp is generated for the data read task. The information representing the expected execution time may be a numerical value, a sequence of digits, or the time itself.
In this alternative, if, for example, the amount of data last read from partition A was small, then when partition A generates another data read task, the time indicated by that task's timestamp will be relatively late; that is, the system waits longer before reading from partition A again. If the amount of data last read from partition A was large, then when partition A generates another data read task, the time indicated by its timestamp will be earlier; that is, partition A is read again as soon as possible.
In this alternative, how early the time indicated by a partition's data read task timestamp is depends on the amount of data last read from that partition. This amounts to learning from data consumption (the amount of data read) to predict how long to wait next: based on the amount read last time, the next attempt period is lengthened or shortened as appropriate, that is, the period at which read threads are created from the partition's data read tasks and put into the thread pool.
In this alternative, a correspondence between intervals of the amount of data read and delay lengths can be established in advance. The interval into which the amount of data read by the previous task falls is then determined, and the delay length corresponding to that interval is used as the determined delay length.
Here, determining the delay length according to the amount of data read by the partition's previous data read task may include:
determining the delay length according to the interval to which the amount of data read by the partition's previous data read task belongs, and a preset correspondence between delay lengths and data-volume intervals.
In other alternatives, the timestamp can be determined in other ways, for example according to the priority of the data read task or its predicted data volume.
In this alternative, for the first task of a partition, information representing the time at which the task is queued can be used as the task's timestamp; alternatively, a predetermined delay length can be added to the time at which the task is queued to obtain an expected execution time, and information representing that time is used as the task's timestamp.
In one alternative of this implementation, data read tasks in the task queue may be ordered from earliest to latest according to the times indicated by the timestamps they carry; that is, the earlier the time indicated by a task's timestamp, the closer the task is to the front of the queue.
Putting a data read task generated by a partition into the task queue may then further include:
putting the data read task into the corresponding position in the task queue according to the time indicated by the timestamp it carries.
For example, if the timestamp of data read task T1 indicates 3:15:20 on a certain day and the timestamp of data read task T2 indicates 3:15:28 on the same day, then T2 is placed after T1 in the task queue. If the timestamp of a data read task T3 that is about to be added to the task queue indicates 3:15:23 on the same day, T3 is placed after T1 and before T2.
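The ordered insertion in the T1/T2/T3 example can be sketched with the standard `bisect` module. The date is arbitrary because the example only says "a certain day", and `put_task` is a hypothetical name. Each entry is a `(timestamp, sequence, name)` tuple, so tasks with equal timestamps keep their insertion order and task names are never compared.

```python
import bisect
import itertools
from datetime import datetime

_seq = itertools.count()


def put_task(queue, timestamp, name):
    """Insert a task so the queue stays sorted by timestamp, earliest first."""
    bisect.insort(queue, (timestamp, next(_seq), name))


queue = []
put_task(queue, datetime(2016, 9, 12, 3, 15, 20), "T1")
put_task(queue, datetime(2016, 9, 12, 3, 15, 28), "T2")
put_task(queue, datetime(2016, 9, 12, 3, 15, 23), "T3")
print([name for _, _, name in queue])  # ['T1', 'T3', 'T2']
```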
In this alternative, extracting data read tasks from the task queue means extracting the first one or more data read tasks in the queue, so the timestamps of all data read tasks in the queue need not be checked. It is enough to check timestamps in queue order, front to back: as soon as a data read task is found whose timestamp indicates a time later than the extraction time, the timestamps of the tasks after it need not be checked.
In this alternative, if several data read tasks in the task queue happen to have timestamps indicating the same time, these tasks can be ordered by the order in which they were added to the task queue, or by other conditions.
In this alternative, the queue can be regarded as a priority queue in which the timestamp carried by a data read task serves as the task's priority: the earlier the time indicated by the timestamp, the higher the priority. A priority queue releases its highest-priority element first, so in this priority queue the data read task with the highest priority is at the front and is extracted first.
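A minimal sketch of such a priority queue using Python's `heapq`, assuming timestamps are numbers or other comparable values; `ReadTaskQueue` is a hypothetical name. The earliest timestamp sits at the root of the heap, and a sequence counter breaks ties by insertion order, which is one of the tie-breaking options mentioned above.

```python
import heapq
import itertools


class ReadTaskQueue:
    """Priority queue of data read tasks keyed by timestamp (hypothetical).

    The earliest timestamp has the highest priority; tasks with equal
    timestamps leave in the order they were added.
    """

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def put(self, timestamp, task):
        heapq.heappush(self._heap, (timestamp, next(self._seq), task))

    def peek_timestamp(self):
        """Timestamp of the highest-priority task, or None if empty."""
        return self._heap[0][0] if self._heap else None

    def pop(self):
        """Remove and return (timestamp, task) for the highest-priority task."""
        timestamp, _, task = heapq.heappop(self._heap)
        return timestamp, task

    def __len__(self):
        return len(self._heap)
```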
An example of this embodiment is shown in Figure 5. It can be applied to complex big-data processing scenarios, for example where a single server must read data from more than 3,000 partitions and the data volume of each partition differs. In this example, steps S110 to S120 are performed by the central control system in the server.
In this example, the data read tasks of all partitions are placed in one priority queue, and each partition has at most one data read task at any given time. Each data read task carries a timestamp indicating the time at which execution of the task should start. In the priority queue, the timestamp carried by a data read task serves as the task's priority: the earlier the time indicated by the timestamp, the higher the priority of the task.
The central control system is responsible for checking the priority queue, extracting data read tasks from it in order, creating new read threads from the extracted tasks, and putting them into the thread pool; the CPU executes the read threads in the thread pool in turn. The number of read threads in the thread pool has an upper limit, and completed read threads are deleted from the pool; new read threads can be put in or started only while the number of threads in the pool is below the limit. For example, if the thread pool can currently accept one more thread, the central control system creates a new read thread from the data read task at the front of the priority queue and puts it into the thread pool.
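The bounded thread pool can be sketched with the standard `threading` module, assuming one operating-system thread per read thread; `ReadThreadPool`, `free_slots`, and `try_start` are hypothetical names. The default limit of twice the CPU count follows the implementation in this description in which the upper limit is twice the number of CPUs, and the operating system's scheduler provides the turn-taking on the CPU.

```python
import os
import threading


class ReadThreadPool:
    """Bounded pool of running read threads (hypothetical sketch)."""

    def __init__(self, limit=None):
        # Default upper limit: twice the number of CPUs.
        self.limit = limit if limit is not None else 2 * (os.cpu_count() or 1)
        self._threads = []

    def _prune(self):
        # Completed read threads are removed from the pool.
        self._threads = [t for t in self._threads if t.is_alive()]

    def free_slots(self):
        """Number of additional read threads the pool can accept now."""
        self._prune()
        return self.limit - len(self._threads)

    def try_start(self, read_fn, *args):
        """Start read_fn(*args) in a new read thread if below the limit."""
        if self.free_slots() <= 0:
            return False
        thread = threading.Thread(target=read_fn, args=args, daemon=True)
        thread.start()
        self._threads.append(thread)
        return True
```

With `limit=2`, two calls to `try_start` succeed and a third returns `False` until one of the running read threads finishes.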
The central control system can extract data read tasks from the priority queue periodically, extracting all data read tasks whose timestamps indicate a time earlier than or equal to the extraction time.
For periodic extraction, one extraction process of this example is shown in Figure 6. Assume each timestamp is a numerical value representing a time, and the earlier the time, the smaller the value. That is, if the value representing 8:30 a.m. on a certain day is X and the value representing 9:00 a.m. on the same day is Y, then Y > X; if the value representing 8:30 a.m. on the following day is Z, then Z > Y > X. Assume the six tasks in the priority queue have timestamps 10, 20, 30, 40, 50, and 60, and the value representing the extraction time is 32. The times indicated by timestamps 10, 20, and 30 are earlier than the extraction time, so the tasks with timestamps 10, 20, and 30 are extracted. For ease of understanding, Figure 6 uses simple values as timestamps; timestamps in practical applications are not limited to the example in Figure 6.
For periodic extraction, the central control system can first create read threads for the extracted data read tasks without putting them into the thread pool, that is, without starting them yet. The unstarted read threads are ordered according to the order of their data read tasks in the priority queue, and whenever the thread pool can accept a new read thread, the central control system puts the created read threads into the pool (starts them) one by one in that order. For example, suppose the central control system has extracted 3 data read tasks. When the thread pool can accept one new thread, the central control system puts in the read thread created from the task that was first in the queue at extraction time; later, when the pool can again accept a new thread, it puts in the read thread created from the task that was second in the queue at extraction time.
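The deferred start can be sketched by creating `threading.Thread` objects up front and calling `start()` only when a slot frees up. `prepare_read_threads` and `start_pending` are hypothetical names, and the caller is assumed to know how many slots are free (for example, from the pool's current thread count).

```python
import threading
from collections import deque


def prepare_read_threads(tasks, read_fn):
    """Create, but do not start, one read thread per extracted task.

    tasks must already be in priority-queue order (earliest timestamp first).
    """
    return deque(
        threading.Thread(target=read_fn, args=(task,), daemon=True) for task in tasks
    )


def start_pending(pending, free_slots):
    """Start up to free_slots pending read threads in order; return them."""
    started = []
    while pending and len(started) < free_slots:
        thread = pending.popleft()
        thread.start()
        started.append(thread)
    return started
```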
Alternatively, when the thread pool is not full, the central control system can extract from the priority queue, in order, as many data read tasks as there are read threads that can be put in, and create read threads for them. Even then, only data read tasks whose timestamps indicate a time earlier than or equal to the extraction time are extracted. In the situation shown in Figure 6, for example, only 3 data read tasks are extracted regardless of whether 3, 4, or more new threads could currently be created.
Because only tasks whose timestamps indicate a time earlier than or equal to the extraction time are extracted, and tasks whose timestamps indicate a later time are left in the queue, a sleep effect is achieved.
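Both extraction strategies can be sketched with one helper over a `heapq` of `(timestamp, task)` pairs; `extract_due` is a hypothetical name. With `max_count=None` it models periodic extraction, and with `max_count` set to the number of free slots it models extraction driven by free threads. Because the heap root is always the earliest timestamp, the loop stops at the first task that is not yet due without inspecting the rest of the queue.

```python
import heapq


def extract_due(heap, extraction_time, max_count=None):
    """Pop tasks whose timestamp is <= extraction_time, earliest first.

    max_count: optional cap, e.g. the number of read threads the pool can
    still accept; None means extract every due task.
    """
    taken = []
    while heap and heap[0][0] <= extraction_time:
        if max_count is not None and len(taken) >= max_count:
            break
        taken.append(heapq.heappop(heap))
    return taken


# The Figure 6 scenario: six tasks, extraction time 32 (copies keep `queue` intact).
queue = []
for ts in (10, 20, 30, 40, 50, 60):
    heapq.heappush(queue, (ts, f"task-{ts}"))
print([ts for ts, _ in extract_due(list(queue), 32)])               # [10, 20, 30]
print([ts for ts, _ in extract_due(list(queue), 32, max_count=4)])  # [10, 20, 30]
print([ts for ts, _ in extract_due(list(queue), 32, max_count=2)])  # [10, 20]
```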
When a partition produces a new data read task, the central control system generates a timestamp for it and uses the timestamp to put the task in the correct position in the priority queue. As shown in Figure 7, assume the six data read tasks in the priority queue have timestamps 10, 20, 30, 40, 50, and 60, and the new data read task has timestamp 35; the central control system then puts the new task after the task with timestamp 30 and before the task with timestamp 40. If several data read tasks happen to have identical timestamps, they can be ordered by the order in which they were added to the priority queue. For ease of understanding, Figure 7 uses simple values as timestamps; timestamps in practical applications are not limited to the example in Figure 7.
For the first data read task of a partition, the central control system can use the value representing the current time as the task's timestamp; alternatively, it can add a predetermined length of time to the current time to obtain an expected execution time and use the value representing that time as the task's timestamp. Here, the current time may be, but is not limited to, the time at which the data read task is added to the priority queue, the time at which it is received, or the time at which its timestamp is generated.
For any data read task of a partition other than its first, the central control system adds the determined delay length to the current time to obtain an expected execution time and uses the value representing that time as the task's timestamp. The delay length is determined by the amount of data read by the partition's previous data read task (that is, the amount of data read by the read thread created from that task); the larger the amount of data, the shorter the delay length. In this example, the delay length is determined using Table 1. Here, the current time may be, but is not limited to, the time at which the data read task is added to the priority queue, the time at which it is received, or the time at which its timestamp is generated.
Table 1: Mapping between the amount of data read and the delay length
Data read last time (bits) | Delay length | Remarks
0 | 5 seconds | There may be no data
Less than 1000 | 1 second
More than 1000 and less than 5000 | 500 milliseconds
More than 5000 and less than 10000 | 200 milliseconds
More than 10000 | 50 milliseconds | Highest priority
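Table 1 can be applied as an interval lookup with the standard `bisect` module. The table does not say which row a value exactly on a boundary (1000, 5000, or 10000) belongs to, so this sketch assumes it belongs to the higher-volume row. `delay_for_volume` is a hypothetical name, and delays are returned in seconds; another preset correspondence can be used by editing the two lists.

```python
import bisect

# Lower bounds of the volume intervals in Table 1 (assumption: a value exactly
# on a boundary, such as 1000, falls into the higher-volume row).
_VOLUME_BOUNDS = [1, 1000, 5000, 10000]
# Delay in seconds per interval: 0, <1000, 1000-5000, 5000-10000, >=10000.
_DELAYS = [5.0, 1.0, 0.5, 0.2, 0.05]


def delay_for_volume(volume):
    """Look up the delay for the amount of data read by the previous task."""
    return _DELAYS[bisect.bisect_right(_VOLUME_BOUNDS, volume)]
```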
Embodiment 2: a data processing apparatus, as shown in Figure 8, comprising:
a queue management module 81, configured to put data read tasks generated by partitions into a task queue;
an extraction module 82, configured to, when the number of read threads in a thread pool has not reached a predetermined upper limit, extract data read tasks from the task queue, create read threads from the extracted tasks, and put them into the thread pool, wherein the thread pool holds read threads that take turns occupying processing resources.
In this embodiment, the queue management module 81 is the part of the data processing apparatus responsible for adding data read tasks to the task queue; it may be software, hardware, or a combination of both.
In this embodiment, the extraction module 82 is the part of the data processing apparatus responsible for creating read threads from the data read tasks in the task queue; it may be software, hardware, or a combination of both.
In one implementation, the data processing apparatus is integrated into a computing device that reads partition data (for example, but not limited to, a server), and the data processing apparatus may further comprise:
a read module, configured to execute the read threads in the thread pool in turn.
The read module is the part of the data processing apparatus responsible for executing read threads to read data from partitions; it may be software, hardware, or a combination of both.
In other implementations, the data processing apparatus may be separate from the computing device that reads partition data, and the read threads in the thread pool are executed by that computing device.
In one implementation, the queue management module putting data read tasks generated by partitions into the task queue may include:
the queue management module generating a timestamp for a data read task generated by a partition, the timestamp indicating the time at which execution of the data read task should start, and putting the data read task carrying the timestamp into the task queue;
the extraction module extracting data read tasks from the task queue may include:
the extraction module extracting, from the task queue, data read tasks whose timestamps indicate a time earlier than or equal to the extraction time.
In one alternative of this implementation, the queue management module generating a timestamp for a data read task generated by a partition may include:
the queue management module, for a data read task generated by a partition, adding the determined delay length to the current time to obtain an expected execution time, and using information representing the expected execution time as the timestamp of the data read task; the delay length is determined according to the amount of data read by the partition's previous data read task, and the larger the amount of data, the shorter the delay length.
In this alternative, determining the delay length according to the amount of data read by the partition's previous data read task may include:
determining the delay length according to the interval to which the amount of data read by the partition's previous data read task belongs, and a preset correspondence between delay lengths and data-volume intervals.
In one alternative of this implementation, data read tasks in the task queue may be ordered from earliest to latest according to the times indicated by the timestamps they carry;
the queue management module putting data read tasks generated by partitions into the task queue may further include:
the queue management module putting each data read task into the corresponding position in the task queue according to the time indicated by the timestamp it carries.
In this alternative, the task queue may be a priority queue, and the timestamp carried by a data read task may serve as the task's priority: the earlier the time indicated by the timestamp, the higher the priority.
In one implementation, the predetermined upper limit on the number of threads in the thread pool may be twice the number of CPUs that execute the read threads in the thread pool.
The operations performed by the modules of the apparatus of this embodiment correspond to steps S110 to S120 of Embodiment 1; for other details of each module, see Embodiment 1.
Embodiment 3: an electronic device for data processing, comprising a memory and a processor;
the memory is configured to store a program for data processing, and the program, when read and executed by the processor, performs the following operations:
putting data read tasks generated by partitions into a task queue;
when the number of read threads in a thread pool has not reached a predetermined upper limit, extracting data read tasks from the task queue, creating read threads from the extracted tasks, and putting them into the thread pool, wherein the thread pool holds read threads that take turns occupying processing resources.
When the program for data processing in this embodiment is read and executed by the processor, the operations performed correspond to steps S110 to S120 of Embodiment 1; for other details of the operations performed by the program, see Embodiment 1.
Those of ordinary skill in the art will understand that all or some of the steps of the above method may be carried out by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Alternatively, all or some of the steps of the above embodiments may be implemented with one or more integrated circuits. Accordingly, each module or unit in the above embodiments may be implemented in the form of hardware or in the form of a software functional module. The present application is not limited to any particular combination of hardware and software.
Of course, the present application may have various other embodiments. Without departing from the spirit and essence of the present application, those skilled in the art can make various corresponding changes and variations according to the present application, and all such changes and variations shall fall within the scope of protection of the claims of the present application.

Claims (17)

1. A data processing method, comprising:
putting data read tasks generated by partitions into a task queue;
when the number of read threads in a thread pool has not reached a predetermined upper limit, extracting data read tasks from the task queue, creating read threads from the extracted tasks, and putting them into the thread pool, wherein the thread pool holds read threads that take turns occupying processing resources.
2. The data processing method according to claim 1, characterized in that putting data read tasks generated by partitions into a task queue comprises:
generating a timestamp for a data read task generated by a partition, the timestamp indicating the time at which execution of the data read task should start; and putting the data read task carrying the timestamp into the task queue;
extracting data read tasks from the task queue comprises:
extracting, from the task queue, data read tasks whose timestamps indicate a time earlier than or equal to the extraction time.
3. The data processing method according to claim 2, characterized in that generating a timestamp for a data read task generated by a partition comprises:
for a data read task generated by a partition, adding a determined delay length to the current time to obtain an expected execution time, and using information representing the expected execution time as the timestamp of the data read task; the delay length is determined according to the amount of data read by the partition's previous data read task, and the larger the amount of data, the shorter the delay length.
4. The data processing method according to claim 3, characterized in that determining the delay length according to the amount of data read by the partition's previous data read task comprises:
determining the delay length according to the interval to which the amount of data read by the partition's previous data read task belongs, and a preset correspondence between delay lengths and data-volume intervals.
5. The data processing method according to claim 2, characterized in that data read tasks in the task queue are ordered from earliest to latest according to the times indicated by the timestamps they carry;
putting data read tasks generated by partitions into the task queue further comprises:
putting each data read task into the corresponding position in the task queue according to the time indicated by the timestamp it carries.
6. The data processing method according to claim 5, characterized in that:
the task queue is a priority queue, and the timestamp carried by a data read task serves as the priority of the data read task; the earlier the time indicated by the timestamp, the higher the priority.
7. The data processing method according to claim 1, characterized in that:
the predetermined upper limit on the number of threads in the thread pool is twice the number of CPUs that execute the read threads in the thread pool.
8. The data processing method according to any one of claims 1 to 7, characterized by further comprising: executing the read threads in the thread pool in turn.
9. A data processing apparatus, characterized by comprising:
a queue management module, configured to put data read tasks generated by partitions into a task queue;
an extraction module, configured to, when the number of read threads in a thread pool has not reached a predetermined upper limit, extract data read tasks from the task queue, create read threads from the extracted tasks, and put them into the thread pool, wherein the thread pool holds read threads that take turns occupying processing resources.
10. The data processing apparatus according to claim 9, characterized in that the queue management module putting data read tasks generated by partitions into the task queue comprises:
the queue management module generating a timestamp for a data read task generated by a partition, the timestamp indicating the time at which execution of the data read task should start; and putting the data read task carrying the timestamp into the task queue;
the extraction module extracting data read tasks from the task queue comprises:
the extraction module extracting, from the task queue, data read tasks whose timestamps indicate a time earlier than or equal to the extraction time.
11. The data processing apparatus according to claim 10, characterized in that the queue management module generating a timestamp for a data read task generated by a partition comprises:
the queue management module, for a data read task generated by a partition, adding a determined delay length to the current time to obtain an expected execution time, and using information representing the expected execution time as the timestamp of the data read task; the delay length is determined according to the amount of data read by the partition's previous data read task, and the larger the amount of data, the shorter the delay length.
12. The data processing apparatus according to claim 11, characterized in that determining the delay length according to the amount of data read by the partition's previous data read task comprises:
determining the delay length according to the interval to which the amount of data read by the partition's previous data read task belongs, and a preset correspondence between delay lengths and data-volume intervals.
13. The data processing apparatus according to claim 10, characterized in that data read tasks in the task queue are ordered from earliest to latest according to the times indicated by the timestamps they carry;
the queue management module putting data read tasks generated by partitions into the task queue further comprises:
the queue management module putting each data read task into the corresponding position in the task queue according to the time indicated by the timestamp it carries.
14. The data processing apparatus according to claim 13, characterized in that:
the task queue is a priority queue, and the timestamp carried by a data read task serves as the priority of the data read task; the earlier the time indicated by the timestamp, the higher the priority.
15. The data processing apparatus according to claim 9, characterized in that:
the predetermined upper limit on the number of threads in the thread pool is twice the number of CPUs that execute the read threads in the thread pool.
16. The data processing apparatus according to any one of claims 9 to 15, characterized by further comprising:
a read module, configured to execute the read threads in the thread pool in turn.
17. An electronic device for data processing, comprising a memory and a processor;
characterized in that the memory is configured to store a program for data processing, and the program, when read and executed by the processor, performs the following operations:
putting data read tasks generated by partitions into a task queue;
when the number of read threads in a thread pool has not reached a predetermined upper limit, extracting data read tasks from the task queue, creating read threads from the extracted tasks, and putting them into the thread pool, wherein the thread pool holds read threads that take turns occupying processing resources.
CN201610818710.4A 2016-09-12 2016-09-12 Data processing method and device and electronic equipment Active CN107818012B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610818710.4A CN107818012B (en) 2016-09-12 2016-09-12 Data processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610818710.4A CN107818012B (en) 2016-09-12 2016-09-12 Data processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN107818012A true CN107818012A (en) 2018-03-20
CN107818012B CN107818012B (en) 2021-08-27

Family

ID=61601210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610818710.4A Active CN107818012B (en) 2016-09-12 2016-09-12 Data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN107818012B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102591721A (en) * 2011-12-30 2012-07-18 北京新媒传信科技有限公司 Method and system for distributing thread execution task
CN103324525A (en) * 2013-07-03 2013-09-25 东南大学 Task scheduling method in cloud computing environment
US20150058858A1 (en) * 2013-08-21 2015-02-26 Hasso-Platt ner-Institut fur Softwaresystemtechnik GmbH Dynamic task prioritization for in-memory databases
CN103955491A (en) * 2014-04-15 2014-07-30 南威软件股份有限公司 Method for synchronizing timing data increment

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086138A (en) * 2018-08-07 2018-12-25 北京京东金融科技控股有限公司 Data processing method and system
CN111367627A (en) * 2018-12-26 2020-07-03 北京奇虎科技有限公司 Processing method and device for disk reading and writing task
CN111367627B (en) * 2018-12-26 2024-02-13 三六零科技集团有限公司 Method and device for processing read-write disk task
CN109840149A (en) * 2019-02-14 2019-06-04 百度在线网络技术(北京)有限公司 Method for scheduling task, device, equipment and storage medium
CN109840149B (en) * 2019-02-14 2021-07-30 百度在线网络技术(北京)有限公司 Task scheduling method, device, equipment and storage medium
CN111259246A (en) * 2020-01-17 2020-06-09 北京达佳互联信息技术有限公司 Information pushing method and device, electronic equipment and storage medium
CN114519017A (en) * 2020-11-18 2022-05-20 舜宇光学(浙江)研究院有限公司 Data transmission method for event camera, system and electronic equipment thereof
CN114519017B (en) * 2020-11-18 2024-03-29 舜宇光学(浙江)研究院有限公司 Data transmission method for event camera, system and electronic equipment thereof
CN117082307A (en) * 2023-10-13 2023-11-17 天津幻彩科技有限公司 Three-dimensional scene stream data play control method and device based on fluency improvement
CN117082307B (en) * 2023-10-13 2023-12-29 天津幻彩科技有限公司 Three-dimensional scene stream data play control method and device based on fluency improvement

Also Published As

Publication number Publication date
CN107818012B (en) 2021-08-27

Similar Documents

Publication Publication Date Title
CN107818012A (en) A kind of data processing method, device and electronic equipment
CN101833489B (en) Method for file real-time monitoring and intelligent backup
CN109144699A (en) Distributed task dispatching method, apparatus and system
CN111506430B (en) Method and device for processing data under multitasking and electronic equipment
EP3129880A1 (en) Method and device for augmenting and releasing capacity of computing resources in real-time stream computing system
CN102004670A (en) Self-adaptive job scheduling method based on MapReduce
CN107908471B (en) Task parallel processing method and processing system
CN103106152A (en) Data scheduling method based on gradation storage medium
CN113138860B (en) Message queue management method and device
Xiao et al. A priority based scheduling strategy for virtual machine allocations in cloud computing environment
CN110278257A (en) Method for dynamically configuring node labels in a distributed cluster
CN109656684A (en) Kafka partitioning method, partitioning system and related apparatus
CN110618860A (en) Spark-based Kafka consumption concurrent processing method and device
CN112269632A (en) Scheduling method and system for optimizing cloud data center
CN109450803A (en) Traffic scheduling method, device and system
CN104182295B (en) Data backup method and device
CN104050193B (en) Method for generating messages and data processing system implementing the method
CN105049524B (en) HDFS-based method for loading large-scale datasets
CN103685492A (en) Scheduling method, scheduling device and application for a Hadoop cluster system
CN115291806A (en) Processing method, processing device, electronic equipment and storage medium
CN107038067A (en) The management method and device of process resource in distributed stream processing
CN110321204A (en) Computing system, hardware accelerator management method and device and storage medium
CN108647007A (en) Arithmetic system and chip
CN112860401A (en) Task scheduling method and device, electronic equipment and storage medium
CN115756143B (en) Energy-saving method and device for data packet processing, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant