WO2021159930A1 - User-level thread control system and method therefor - Google Patents

User-level thread control system and method therefor

Info

Publication number
WO2021159930A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
thread
level
kernel
message
Prior art date
Application number
PCT/CN2021/072790
Other languages
English (en)
French (fr)
Inventor
袁进辉
牛冲
柳俊丞
李新奇
Original Assignee
北京一流科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京一流科技有限公司
Publication of WO2021159930A1

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5018Thread allocation

Definitions

  • the present disclosure relates to a user-level thread control technology. More specifically, the present disclosure relates to a control system and method for classifying and controlling user-level threads.
  • a kernel thread manages multiple user-level threads constituting the task to complete the entire task.
  • a kernel thread corresponding to the task may contain multiple operation subtasks. Therefore, one part of a thread may be used for computation, another part may be used for reading and writing data, and yet another part may involve disk operations. A kernel thread therefore manages the task's multiple user-level threads according to the task's timing in order to complete the task.
  • the kernel thread may remain in a waiting state for a long time, causing the thread eventually to sleep until the disk operation completes and the thread is awakened.
  • the CPU core is likely to be idle for a period of time, which will lead to inefficient use of the CPU core.
  • how to control user-level threads so as both to prevent the CPU cores from carrying a large number of kernel threads and to eliminate the waiting state of kernel threads, thereby improving CPU utilization, is a technical problem that needs to be solved.
  • the present disclosure provides a user-level thread control system, including: a label presetting component, which classifies multiple tasks having the same position mark and operation type among all job tasks as tasks of the same kind and assigns them the same label; and a kernel thread creation component, which creates one kernel thread for each label and, at the same time, creates for every task bearing that label a user-level thread of the same kind carrying the same label, wherein the kernel thread includes a shared message bin common to the user-level threads on that kernel thread, used to trigger the corresponding user-level thread to perform a predetermined operation upon receiving any message whose destination ID is a user-level thread ID associated with the kernel thread.
  • the user-level thread control system further includes a kernel thread preparation component, which is used to count the number of labels assigned by the label presetting component and to prepare one kernel thread for each label.
  • the message bin has a message queue; the messages in the message queue are arranged in the order in which they are received and trigger the corresponding user-level threads to perform predetermined operations in a first-in, first-out manner.
  • the user-level thread includes a state machine and an operation unit
  • the predetermined operation includes changing the state of the user-level thread's state machine, sending messages through the shared message bin, and issuing operation instructions through the operation unit of the user-level thread.
  • the operation types include a computing operation type and a transport operation type.
  • the transport operation type includes transport from a host to a computing device, transport from a computing device to a host, transport from a first host to a second host, and transport for disk reads and writes.
  • the computing operation type includes a data computing operation type and a parameter update operation type.
  • a user-level thread control method, which includes: a label presetting step of classifying, by a label presetting component, multiple task nodes having the same position mark and operation type in a task node topology graph as task nodes of the same kind and assigning them the same label; and a kernel thread creation step of creating, by a kernel thread creation component, one kernel thread for each label and, at the same time, creating for every task node bearing that label a user-level thread of the same kind carrying the same label, wherein the kernel thread includes a shared message bin shared by the user-level threads on that kernel thread, used to trigger the corresponding user-level thread to perform a predetermined operation upon receiving any message whose destination ID is a user-level thread ID associated with the kernel thread.
  • the user-level thread control method further includes a kernel thread preparation step, which counts, through the kernel thread preparation component, the number of labels assigned by the label presetting component and prepares the same number of kernel threads for the task node topology graph, so that one kernel thread is prepared for each label.
  • the message bin has a message queue, and the messages in the message queue are arranged in the time sequence of message reception and trigger the corresponding user-level thread to perform a predetermined operation in a first-in-first-out manner.
  • the user-level thread includes a state machine and an operation unit
  • the predetermined operation includes changing the state of the user-level thread's state machine, sending messages through the shared message bin, and issuing operation instructions through the operation unit of the user-level thread.
  • the kernel thread controls the running of the user-level threads of the same kind within it in a message-driven manner, and when driven by a message a user-level thread merely changes state and sends to user space the instruction corresponding to the operation task; since the state change of the finite state machine and the issuing of the operation task instruction take an extremely short time, the kernel thread is essentially never in a waiting state during data processing, let alone put to sleep by an overlong wait, which also removes the operating system's need to wake kernel threads frequently.
  • adopting the user-level thread control system of the present disclosure enables kernel threads to use CPU resources efficiently, so that CPU resources do not sit idle, and thus go to waste, because a kernel thread is waiting or dormant.
  • centralized management by one kernel thread of user-level threads of the same kind and the same position eliminates the situation in which the different types of user-level threads on one task-processing path (which together complete one overall task) wait on one another because they proceed at different speeds. More importantly, it avoids the situation in which the CPU creates one kernel thread for every user-level thread, causing many kernel threads to wait for each other.
  • Figure 1 shows a schematic diagram of the principle of a user-level thread control system according to the present disclosure.
  • FIG. 2 shows a topology diagram 102 of a complete task node containing tags according to the present disclosure.
  • Figure 3 shows a schematic diagram of the principle by which a kernel thread controls user-level threads of the computing type that share the same position mark.
  • Figure 4 shows a schematic diagram of the principle by which a kernel thread controls user-level threads of the transport type that share the same position mark.
  • Figure 5 shows a schematic diagram of the principle by which a kernel thread controls user-level threads of another transport type that share the same position mark.
  • first, second, third, etc. may be used in this disclosure to describe various information, the information should not be limited to these terms. These terms are only used to distinguish the same type of information from each other.
  • one of two possible position marks may be referred to as the first position mark or the second position mark, and similarly the other of the two possible position marks may be referred to as the second position mark or the first position mark.
  • the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
  • FIG. 1 shows a schematic diagram of the principle of a user-level thread control system 10 according to the present disclosure.
  • the user-level thread control system 10 includes a label preset component 11 and a kernel thread creation component 13.
  • the label presetting component 11 classifies multiple task nodes having the same position mark and operation type in the complete task node topology graph 101 as task nodes of the same kind and assigns them the same label, thereby forming the labeled complete task node topology graph 102 shown in FIG. 2. As shown in FIG. 1, in a user-level thread control system adopting the present disclosure, the task nodes of the user-level threads to be generated are given a position mark and an operation type mark.
  • the complete task node topology graph 101 shown in FIG. 1 includes computing task nodes, marked ON, and transport task nodes, marked MN.
  • All classification marks used in the present disclosure are only for convenience of description, and any marks that can be distinguished from each other can be used in practical applications.
  • Each computing task node comes from the computing logic node in the computing logic topology diagram.
  • in the processing of deep learning job tasks, when a job task is to be sharded over a distributed architecture, the architecture usually includes one or more hosts, and each host is connected to multiple computing devices, such as GPUs, TPUs, and other computing devices dedicated to large-scale simple operations.
  • the large-scale data blocks that need to be processed are usually split into shards across multiple computing devices for parallel processing.
  • the model can usually be divided and distributed to different computing devices for processing.
  • each computing logic node is sharded based on the computing resources and the job task description contained in the user's task configuration data, so that each sharded tensor is deployed on a different computing device for shard processing.
  • when the available devices on a host (HOST) are two, for example GPU0 and GPU1, the data can be split into two parts along dimension 0 of the data and distributed onto GPU0 and GPU1. If the host is numbered H1, the computing task node obtained by sharding a computing logic node onto GPU0 of host H1 is given the position mark G0/H1, and similarly the computing task node obtained by sharding the computing logic node onto GPU1 of host H1 is given the position mark G1/H1.
  • the method of transforming the topology graph of computing logic nodes into the topology graph of computing task nodes is conventional technology in this field and is therefore not repeated here.
  • the computing task nodes E1 and E2 shown in Figure 1 are the computing task nodes formed by assigning computing logic node E to GPU0 and GPU1 of host H1, so their position marks are G0/H1 and G1/H1 respectively.
  • computing task nodes A1 and A2 are the computing task nodes formed by assigning the computing task of computing logic node A to GPU0 and GPU1 of host H1, so their position marks are G0/H1 and G1/H1 respectively; computing task nodes B1 and B2 are the computing task nodes formed by assigning the computing task of computing logic node B to GPU0 and GPU1 of host H1, so their position marks are likewise G0/H1 and G1/H1 respectively.
  • the computing logic nodes C, D, and F are all located on the two GPU compute cards of host H2; therefore, after processing by the computing task node deployment component, their respective computing task nodes C1 and C2, D1 and D2, and F1 and F2 carry the position marks G0/H2 and G1/H2 respectively. Thus, upon obtaining the computing logic node topology graph, the computing task node deployment component (not shown), based on the task configuration data in the task description entered by the user for the given computing resources, shards the task of any computing logic node in the computing logic node topology graph onto the designated computing resources, thereby generating one or more computing task nodes for each computing logic node and assigning each computing task node a position mark corresponding to the designated computing resource.
  • for a downstream computing device to obtain the data produced by computing task nodes on different upstream computing devices, the data must be migrated across computing devices, which creates a need for data transport.
  • the applicant of the present disclosure inserts a static transport task node between any pair of upstream and downstream computing task nodes located on two different computing devices.
  • the transport task node insertion component inserts one or more transport task nodes between two upstream and downstream computing task nodes that have different position marks, thereby obtaining a complete task node topology graph 101 containing transport task nodes.
  • the transport task nodes E1-H1 and H1-B2 are inserted between the computing task nodes E1 and B2, and the transport task nodes E2-H1 and H1-B1 are inserted between the computing task nodes E2 and B1.
  • the transport task nodes C1-H1, H1-H2, and H2-D2 need to be inserted between the computing task node C1 and the computing task node D2.
  • since the input data required by the computing task node D2 also needs to come from the computing task node C2, the transport task nodes C2-H1, H1-H2, and H2-D2 must likewise be inserted between the computing task node C2 and the computing task node D2.
  • where a direct access protocol exists between the host and its computing devices, data migration between the host and a computing device can dispense with inserting the transport task nodes mentioned in this disclosure. Therefore, only one transport task node H1-H2 needs to be inserted between the computing task node C1 or C2 and D1 or D2; that is, one transport task node H1-H2 can be shared among C1, C2, D1, and D2, and the four transport task nodes H1-H2 shown can in fact be a single transport task node.
  • when the transport task node insertion component inserts a transport task node, it also marks the position mark of the inserted transport task node, as well as the source address and destination address of the transported data, that is, the transport direction of the data.
  • the name of each transport node mentioned above is precisely the transport task node's source address, destination address, and transport direction.
  • the position of a transport node is marked as the position of the host receiving the data; for example, for the transport task node H1-H2, the position mark of the transport task node (H1-H2) is set to H2.
  • the computing task nodes with the same position mark G0/H1 and the computing operation type mark ON, such as E1, A1, B1, and C1, are all given the label KT0, and the computing task nodes with the same position mark G1/H1 and the computing operation type mark ON, such as E2, A2, B2, and C2, are all given the label KT1.
  • the transport task nodes with the same position mark G1/H1 and the transport operation type mark MN, such as (E2-H1), (A2-H1), and (C2-H1), are all given the label KT5.
  • the transport task nodes with the same position mark H2 and the transport operation type mark MN, for example the several (H1-H2) transport task nodes, are all given the label KT8.
  • the label "KT" is used to mark the kernel thread to be created to which each task node belongs. All tags used in the present disclosure are only for convenience of description, and any tags that can be distinguished from each other can be used in practical applications.
  • the kernel thread creation component 13 creates one kernel thread for each label and, at the same time, creates for each task node bearing that label a user-level thread of the same kind carrying the same label.
  • the computing task node E1 is created as the computing user-level thread E1UT and is associated with the kernel thread KT0; similarly, the computing task node E2 is created as the computing user-level thread E2UT and is associated with the kernel thread KT1.
  • each kernel thread manages user-level threads UT with the same operation type and the same computing resources.
  • the shared message bin of a kernel thread KT serving the user-level threads UT triggers, in the order in which messages arrive, state changes of the finite state machines of the user-level threads UT whose IDs are registered with that kernel thread; a triggered user-level thread on the data-processing path, based on the messages sent by its upstream or downstream user-level threads and provided its finite state machine satisfies a given condition, has its operation unit enqueue the corresponding operation tasks into the corresponding task stream.
  • successive computing tasks are inserted into the task stream of a computing device such as a GPU, after which the GPU executes these inserted computing tasks in the order of the task stream and stores the computation results in the output data buffer pre-allocated in the GPU.
  • successive transport tasks are inserted into the task stream of a computing device such as a network card, after which the network card executes these inserted transport tasks in the order of the task stream and stores the transported data in page-locked memory such as a pre-designated output data buffer.
  • Figure 3 shows a schematic diagram of the principle by which a kernel thread controls user-level threads of the computing type that share the same position mark.
  • the kernel thread KT0 is created on the basis of all computing user-level threads UT carrying the same label KT0. All these computing user-level threads share the message bin located in the kernel thread KT0, which is used to receive every message whose destination ID is the ID of a computing user-level thread carrying the label KT0.
  • the message bin of KT0 contains a message queue MSG00, MSG01, MSG02, ...
  • the message queue triggers, according to the first-in, first-out rule, the associated user-level thread UT that the corresponding message points to.
  • after the operation unit of the computing user-level thread E1UT inserts the predetermined task into the scheduled task stream of the computing device GPU0, it sends messages to the message bins in the kernel threads KT0 and KT1, where its downstream computing user-level threads B1UT and B2UT reside. If B1 has other downstream user-level threads, messages are also sent to the kernel threads where those user-level threads reside, and at the same time its finite state machine changes its state.
  • if the message queue is empty at this moment, the message is numbered MSG00 and placed in the first position of the message queue.
  • when the kernel thread receives the message MSG00, the finite state machine of the user-level thread B1UT, the downstream computing user-level thread of E1UT, is triggered and merely changes state (in the case where the kernel thread KT0 has not yet received the messages from E2UT, A1UT, and A2UT).
  • as the kernel thread KT0 receives, one after another, the message MSG01 sent from A1UT and the messages MSG01 and MSG02 sent by KT4 concerning H1-B1-UT1 and H1-B1-UT2, the finite state machine of the user-level thread B1UT comes to satisfy the predetermined condition, allowing the operation unit of the computing user-level thread B1UT to insert the computing task into the task stream of the GPU that its task points to.
  • if a direct access protocol exists among the multiple GPUs connected to the host H1, then H1-B1-UT1 and H1-B1-UT2 will not exist between E2UT or A2UT and B1UT.
  • in that case the messages MSG01 and MSG02 will come directly from the user-level threads E2UT and A2UT.
  • after the operation unit of the user-level thread B1UT inserts a computing task into the task stream of the GPU its task points to, its finite state machine likewise changes state.
  • based on that state machine change, KT0's message bin sends feedback messages MSG to the user-level threads upstream of B1UT and sends messages MSG to its downstream user-level threads.
  • for example, the messages MSG03 and MSG04 fed back to the user-level threads E1UT and A1UT are placed directly in KT0's message queue, while a message MSG sent to another kernel thread, such as KT11 (not shown), is placed in KT11's message queue.
  • FIG. 4 shows a schematic diagram of the principle by which a kernel thread controls user-level threads of the transport type that share the same position mark.
  • the position of these transport user-level threads is G1/H1, that is, data is transported from the GPU1 connected to the host H1 to the host H1.
  • when these co-located transport user-level threads, such as E2-H1-UT, receive in the message bin of their kernel thread KT5 a message (such as MSG00) sent by the producer of the data to be transported, such as E2UT, their finite state machine changes state and triggers their operation unit to issue a memory-access instruction directly to the memory-access unit.
  • the messages in the message queue then trigger the execution states of the corresponding user-level threads in sequence.
  • Figure 5 shows a schematic diagram of the principle by which a kernel thread controls user-level threads of another transport type that share the same position mark.
  • the kernel thread KT8 controls the transport user-level threads H1-H2-UT used to transport data from one host H1 to another host H2.
  • for this kind of host-to-host transport user-level thread, the position is marked as the host that receives the data, such as H2.
  • when the message bin of the kernel thread KT8, to which the transport user-level thread H1-H2-UT1 belongs, receives a message, such as MSG00, sent from the message bin of the kernel thread KT5, to which C2-H1-UT belongs, the message triggers the finite state machine of H1-H2-UT1 to change state and, through its operation unit, issue the data-transport task to a network connection element (not shown), for example by inserting the relevant data-transport request into the task stream of the corresponding network card or into a specially designed transport-request tool, such as a transport-request aggregation component (not shown) deployed on the host H2, which centrally processes the transport instructions of all transport user-level threads on H2 so as to transport the data to the host H2.
  • the kernel thread controls the running of the user-level threads of the same kind within it in a message-driven manner, and when driven by a message a user-level thread merely changes state and sends the corresponding operation task to user space.
  • since the state change of the finite state machine and the issuing of the operation task instruction take an extremely short time, the kernel thread is essentially never in a waiting state during data processing and will not enter the dormant state because of a long wait, which also removes the operating system's need to wake kernel threads frequently.
  • adopting the user-level thread control system of the present disclosure enables kernel threads to use CPU resources efficiently, so that CPU resources do not sit idle, and thus go to waste, because a kernel thread is waiting or dormant.
  • centralized management by one kernel thread of user-level threads of the same kind and the same position eliminates the situation in which the different types of user-level threads on one task-processing path (which together complete one overall task) wait on one another because they proceed at different speeds. More importantly, it avoids the situation in which the CPU creates one kernel thread for every user-level thread, causing many kernel threads to wait for each other.
  • the user-level thread control system 10 of the present disclosure further includes a kernel thread preparation component 12.
  • the kernel thread preparation component 12 counts the number of labels assigned by the label presetting component 11 and prepares the same number of kernel threads for the complete task node topology graph, thereby preparing one kernel thread for each label.
  • the kernel thread creation component 13 can thus know how many kernel threads need to be created for the complete task node topology.
  • the message queues of the message bins of each kernel thread are arranged in the time sequence of message reception and trigger corresponding user-level threads to perform predetermined operations in a first-in-first-out manner.
  • besides the computing and transport operation types described here, there are also disk operations (for example, reading and writing disks), network communication operations, parameter update operations, and so on.
  • these can all be achieved by having the operation unit of the user-level thread issue operation instructions or insert the operation task into the task stream managed by the corresponding user-space element. Precisely how these data computations, disk reads and writes, and parameter updates are implemented in user space is not a part that this disclosure needs to address, so it is not described in detail here.
  • the present disclosure includes a user-level thread control method, including: a label presetting step of classifying, by a label presetting component, multiple task nodes having the same position mark and operation type in the task node topology graph as task nodes of the same kind and assigning them the same label; and a kernel thread creation step of creating, by a kernel thread creation component, one kernel thread for each label and, at the same time, creating for every task node bearing that label a user-level thread of the same kind carrying the same label.
  • the kernel thread includes a shared message bin common to the user-level threads on that kernel thread, used to trigger the corresponding user-level thread to perform a predetermined operation upon receiving any message whose destination ID is a user-level thread ID associated with the kernel thread.
  • the user-level thread control method of the present disclosure further includes a kernel thread preparation step, which counts, through the kernel thread preparation component, the number of labels assigned by the label presetting component and prepares the same number of kernel threads for the task node topology graph, so as to prepare one kernel thread for each label.
  • the messages in the message queue of the message bin required to implement the user-level thread control method are arranged in the order in which they are received and trigger the corresponding user-level thread to perform a predetermined operation in a first-in, first-out manner. Through these messages, the state machine and operation unit of the user-level thread produce the predetermined operations, including changing the state of the user-level thread's state machine, sending messages through the shared message bin, and issuing operation instructions through the operation unit of the user-level thread.
  • the order described above is not intended to limit the control order, because these steps have no obvious precedence.
  • the processes of generating kernel threads and user-level threads are carried out almost at the same time.
  • since the user-level threads belonging to one kernel thread constitute part of that kernel thread, the process of creating the kernel thread is itself the process of creating each of its user-level threads, the two being created in association with each other through labels representing the same kind and the same position. Therefore, although the textual description necessarily has a sequence, the actual execution steps are not limited to the sequential relationship defined by the order of the textual description.
  • the purpose of the present disclosure can also be realized by running a program or a group of programs on any computing device.
  • the computing device may be a well-known general-purpose device. Therefore, the purpose of the present disclosure can also be achieved only by providing a program product containing program code for implementing the method or device. That is, such a program product also constitutes the present disclosure, and a storage medium storing such a program product also constitutes the present disclosure.
  • the storage medium may be any well-known storage medium or any storage medium developed in the future.
  • each component or each step can be decomposed and/or recombined.
  • These decomposition and/or recombination should be regarded as equivalent solutions of the present disclosure.
  • the steps of executing the above-mentioned series of processing can naturally be executed in chronological order in the order of description, but they do not necessarily need to be executed in chronological order. Some steps can be performed in parallel or independently of each other.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A user-level thread control system, comprising: a label presetting component (11), which classifies multiple tasks having the same position mark and operation type among all job tasks as tasks of the same kind and assigns the same label to the tasks of the same kind; and a kernel thread creation component (13), which creates one kernel thread for each said label and, at the same time, creates, for each task bearing that label, a user-level thread of the same kind carrying the same label, wherein the kernel thread comprises a shared message bin common to the user-level threads on the kernel thread, configured to trigger the corresponding user-level thread to perform a predetermined operation upon receiving any message whose destination ID is a user-level thread ID associated with the kernel thread.

Description

User-level thread control system and method therefor
Technical Field
The present disclosure relates to a user-level thread control technology. More specifically, the present disclosure relates to a control system and method for classifying and controlling user-level threads.
Background Art
In the course of processing data, a computer system usually uses threads to manage the processes being handled. For every request, the computer's operating system produces many instruction sequences and sets up a kernel thread for each instruction sequence. Kernel threads run on, or occupy, CPU cores under time-sharing by the operating system. The CPU usually sets up one kernel thread per task and processes these kernel threads in parallel. Thus, when for instance a server accepts all kinds of accesses, thousands of threads are set up at once; in deep learning or big-data processing in particular, even more kernel threads are produced. Each kernel thread actually occupies the CPU for only a small slice of its lifetime and is dormant or interrupted the rest of the time, so the operating system spends as much as a millisecond, or half a millisecond, constantly swapping kernel threads in (waking them) or out (putting them to sleep), which causes enormous overhead.
On the other hand, with the application of user-level threads, many operations no longer require kernel threads; for a specific task, a kernel thread can therefore manage the multiple user-level threads that make up that task in order to complete the whole task. However, while a task is being processed, the kernel thread corresponding to it may contain multiple operation subtasks: one part of the thread performs computation, another part may perform data reading and writing, and yet another part may involve disk operations. A kernel thread therefore manages the task's multiple user-level threads according to the task's timing so as to complete the task. When a user-level thread used for computation depends on the result of a user-level thread used for disk operations, the kernel thread may remain in a waiting state for a long time and eventually sleep until the disk operation finishes and the thread is awakened. When many kinds of waiting and sleeping states exist on the CPU, the CPU cores are likely to sit idle for stretches of time, which makes CPU core utilization inefficient.
Therefore, how to control user-level threads so as both to prevent the CPU cores from carrying a multitude of kernel threads and to eliminate the waiting state of kernel threads, thereby improving CPU utilization, is a technical problem that needs to be solved.
Technical Solution
The object of the present disclosure is to provide a technical solution that solves at least one of the above problems. Specifically, the present disclosure provides a user-level thread control system, comprising: a label presetting component, which classifies multiple tasks having the same position mark and operation type among all job tasks as tasks of the same kind and assigns the same label to the tasks of the same kind; and a kernel thread creation component, which creates one kernel thread for each said label and, at the same time, creates, for each task bearing that label, a user-level thread of the same kind carrying the same label, wherein the kernel thread comprises a shared message bin common to the user-level threads on the kernel thread, configured to trigger the corresponding user-level thread to perform a predetermined operation upon receiving any message whose destination ID is a user-level thread ID associated with the kernel thread.
The user-level thread control system according to the present disclosure further comprises: a kernel thread preparation component, configured to count the number of labels assigned by the label presetting component and to prepare one kernel thread for each label.
In the user-level thread control system according to the present disclosure, the message bin has a message queue, and the messages in the message queue are arranged in the order in which they are received and trigger the corresponding user-level threads to perform predetermined operations in a first-in, first-out manner.
In the user-level thread control system according to the present disclosure, the user-level thread comprises a state machine and an operation unit, and the predetermined operation comprises changing the state of the state machine of the user-level thread, sending messages through the shared message bin, and issuing operation instructions through the operation unit of the user-level thread.
In the user-level thread control system according to the present disclosure, the operation types comprise a computing operation type and a transport operation type.
In the user-level thread control system according to the present disclosure, the transport operation type comprises transport from a host to a computing device, transport from a computing device to a host, transport from a first host to a second host, and transport for disk reads and writes.
In the user-level thread control system according to the present disclosure, the computing operation type comprises a data computing operation type and a parameter update operation type.
According to another aspect of the present disclosure, a user-level thread control method is provided, comprising: a label presetting step of classifying, by a label presetting component, multiple task nodes having the same position mark and operation type in a task node topology graph as task nodes of the same kind and assigning the same label to the task nodes of the same kind; and a kernel thread creation step of creating, by a kernel thread creation component, one kernel thread for each said label and, at the same time, creating, for each task node bearing that label, a user-level thread of the same kind carrying the same label, wherein the kernel thread comprises a shared message bin common to the user-level threads on the kernel thread, configured to trigger the corresponding user-level thread to perform a predetermined operation upon receiving any message whose destination ID is a user-level thread ID associated with the kernel thread.
The user-level thread control method according to the present disclosure further comprises: a kernel thread preparation step of counting, by a kernel thread preparation component, the number of labels assigned by the label presetting component and preparing the same number of kernel threads for the task node topology graph, thereby preparing one kernel thread for each label.
In the user-level thread control method according to the present disclosure, the message bin has a message queue, and the messages in the message queue are arranged in the order in which they are received and trigger the corresponding user-level threads to perform predetermined operations in a first-in, first-out manner.
In the user-level thread control method according to the present disclosure, the user-level thread comprises a state machine and an operation unit, and the predetermined operation comprises changing the state of the state machine of the user-level thread, sending messages through the shared message bin, and issuing operation instructions through the operation unit of the user-level thread.
With the user-level thread control system and method according to the present disclosure, the kernel thread controls the running of the user-level threads of the same kind within it in a message-driven manner, and when driven by a message a user-level thread merely changes state and sends to user space the instruction for the corresponding operation task; since the state change of the finite state machine and the issuing of the operation task instruction take an extremely short time, the kernel thread is essentially never in a waiting state during data processing, let alone put to sleep by an overlong wait, which also removes the operating system's need to wake kernel threads frequently. Adopting the user-level thread control system of the present disclosure enables kernel threads to use CPU resources efficiently, so that CPU resources do not sit idle, and thus go to waste, because kernel threads are waiting or dormant. In addition, centralized management by one kernel thread of user-level threads of the same kind and the same position eliminates the situation in which the different types of user-level threads on one task-processing path (which together complete one overall task) wait on one another because each user-level thread proceeds at a different speed. More importantly, it avoids the situation in which the CPU creates one kernel thread for every user-level thread, causing numerous kernel threads to wait for each other.
Other advantages, objects, and features of the present invention will emerge in part from the description below, and in part will be appreciated by those skilled in the art through study and practice of the present invention.
Description of the Drawings
Figure 1 is a schematic diagram of the principle of a user-level thread control system according to the present disclosure.
Figure 2 shows a complete task node topology graph 102 containing labels according to the present disclosure.
Figure 3 is a schematic diagram of the principle by which a kernel thread controls user-level threads of the computing type that share the same position mark.
Figure 4 is a schematic diagram of the principle by which a kernel thread controls user-level threads of the transport type that share the same position mark.
Figure 5 is a schematic diagram of the principle by which a kernel thread controls user-level threads of another transport type that share the same position mark.
Embodiments of the Invention
The present invention is further described in detail below in conjunction with embodiments and the accompanying drawings, so that those skilled in the art can carry it out by reference to the text of the description.
Exemplary embodiments are described in detail here, examples of which are shown in the accompanying drawings. Where the following description involves the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. The singular forms "a/an", "said", and "the" used in the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various pieces of information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, one of two possible position marks may hereinafter be called the first position mark or the second position mark, and similarly the other of the two possible position marks may be called the second position mark or the first position mark. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
To help those skilled in the art better understand the present disclosure, the present disclosure is described in further detail below in conjunction with the drawings and specific embodiments.
For convenience of describing the principles of the present disclosure, the explanation is set out in a deep learning scenario. Figure 1 is a schematic diagram of the principle of a user-level thread control system 10 according to the present disclosure. As shown in Figure 1, the user-level thread control system 10 includes a label presetting component 11 and a kernel thread creation component 13. The label presetting component 11 classifies multiple task nodes having the same position mark and operation type in the complete task node topology graph 101 as task nodes of the same kind and assigns the same label to the task nodes of the same kind, thereby forming the labeled complete task node topology graph 102 shown in Figure 2. As shown in Figure 1, in a user-level thread control system adopting the present disclosure, the task nodes of the user-level threads to be generated are given a position mark and an operation type mark.
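To make the grouping step concrete, the following minimal Python sketch groups task nodes by (position mark, operation type) and hands each group one shared label. The `TaskNode` type, its field names, and the `KT` numbering scheme are illustrative assumptions, not data structures taken from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskNode:
    """Hypothetical stand-in for one node of the complete task node topology graph."""
    name: str        # e.g. "E1" or "E2-H1"
    position: str    # position mark, e.g. "G0/H1"
    op_type: str     # operation type mark: "ON" (computing) or "MN" (transport)

def assign_labels(nodes):
    """Give every (position mark, operation type) group one shared KT label."""
    groups = defaultdict(list)
    for node in nodes:
        groups[(node.position, node.op_type)].append(node)
    labels = {}
    for i, key in enumerate(sorted(groups)):   # deterministic KT numbering
        for node in groups[key]:
            labels[node.name] = f"KT{i}"
    return labels
```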
The complete task node topology graph 101 shown in Figure 1 contains computing task nodes ON and transport task nodes MN. "ON" is used to mark the operation type of computing task nodes, and "MN" to mark the operation type of transport task nodes. All classification marks used in the present disclosure are merely for convenience of description; in practical applications any marks that can be distinguished from one another may be used. Every computing task node originates from a computing logic node in a computing logic topology graph. In the processing of deep learning job tasks, when a job task is to be processed in shards on a distributed architecture, the architecture usually includes one or more hosts, and each host is connected to multiple computing devices, such as GPUs, TPUs, and other computing devices dedicated to large-scale simple operations. When data-parallel computation is needed, the large-scale data blocks to be processed are usually split into shards across multiple computing devices for parallel processing. When the model is rather large, the model too can usually be partitioned and distributed across different computing devices for processing. Every computing logic node is sharded based on the computing resources contained in the user's task configuration data and the description of the job task, so that each sharded tensor is deployed on a different computing device for shard processing. To this end, when the available devices on a host (HOST) are two, for example GPU0 and GPU1, the data can be split into two parts along dimension 0 of the data and distributed onto GPU0 and GPU1 for parallel processing; if the host is numbered H1, the computing task node obtained by sharding a computing logic node onto GPU0 of host H1 is given the position mark G0/H1, and likewise the computing task node obtained by sharding the computing logic node onto GPU1 of host H1 is given the position mark G1/H1. The way a computing logic node topology graph is transformed into a computing task node topology graph is conventional technology in this field and is therefore not repeated here. The computing task nodes E1 and E2 shown in Figure 1 are the computing task nodes formed by assigning the computing task of computing logic node E to GPU0 and GPU1 of host H1, so their position marks are G0/H1 and G1/H1 respectively. Likewise, computing task nodes A1 and A2 are formed by assigning the computing task of computing logic node A to GPU0 and GPU1 of host H1, so their position marks are G0/H1 and G1/H1 respectively, and computing task nodes B1 and B2 are formed by assigning the computing task of computing logic node B to GPU0 and GPU1 of host H1, so their position marks are likewise G0/H1 and G1/H1 respectively. By analogy, computing logic nodes C, D, and F are all located on the two GPU compute cards of host H2, so after processing by the computing task node deployment component, the position marks of their respective computing task nodes C1 and C2, D1 and D2, and F1 and F2 are G0/H2 and G1/H2 respectively. Thus, upon obtaining the computing logic node topology graph, the computing task node deployment component (not shown), based on the task configuration data in the task description entered by the user for the given computing resources, shards the task of any computing logic node in the computing logic node topology graph onto the designated computing resources, thereby generating one or more computing task nodes for each computing logic node and assigning each computing task node a position mark corresponding to the designated computing resource.
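Reusing the hypothetical `TaskNode` above, the sharding of one computing logic node onto several devices, each shard receiving its own position mark, could be sketched as follows; the helper name and the (device, host) placement format are assumptions for illustration only.

```python
def shard_logic_node(logic_name, placements):
    """Shard one computing logic node; placements is a list of
    (device, host) pairs such as [("G0", "H1"), ("G1", "H1")]."""
    return [TaskNode(f"{logic_name}{i + 1}", f"{dev}/{host}", "ON")
            for i, (dev, host) in enumerate(placements)]

# Splitting logic node E along dimension 0 of the data onto GPU0 and GPU1
# of host H1 yields E1 (position mark G0/H1) and E2 (position mark G1/H1):
e1, e2 = shard_logic_node("E", [("G0", "H1"), ("G1", "H1")])
```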
Since the upstream/downstream computing task node pairs E1 and B2, E2 and B1, A1 and B2, and A2 and B1 are on different computing devices, for a downstream computing device to obtain the data produced by computing task nodes on different upstream computing devices, the data must be migrated across computing devices, which creates a need for data transport. To achieve streaming control of data processing and reduce the overhead of data movement, the applicant of the present disclosure inserts a static transport task node between any pair of upstream and downstream computing task nodes located on two different computing devices. As shown in Figure 1, after the computing task topology graph is determined, a transport task node insertion component (not shown) inserts one or more transport task nodes between any two upstream/downstream computing task nodes with different position marks, thereby obtaining a complete task node topology graph 101 containing transport task nodes. Specifically, as shown in Figure 1, transport task nodes E1-H1 and H1-B2 are inserted between computing task nodes E1 and B2, transport task nodes E2-H1 and H1-B1 between computing task nodes E2 and B1, transport task nodes A1-H1 and H1-B2 between computing task nodes A1 and B2, and transport task nodes A2-H1 and H1-B1 between computing task nodes A2 and B1, finally forming the complete task node topology graph 101 of Figure 1. It should be pointed out, however, that owing to the limits of the drawing, Figure 1 shows only part of the complete task node topology graph: the complete topology among the computing task nodes E, A, and B after transport task node insertion, together with part of the topology between computing task nodes C and D, the other parts being indicated by ellipses. It should also be pointed out that where a direct access protocol exists between different computing devices (e.g. GPUs) connected to the same host, data migration between such devices under one host can dispense with inserting the transport task nodes referred to in the present disclosure.
As shown in Figure 1, computing logic node C is distributed on the two GPUs GPU0 and GPU1 of host H1, and its downstream computing logic node D is distributed on the two GPUs GPU0 and GPU1 of host H2; hence, as shown in Figure 1, the position marks of their respective computing task nodes C1 and C2 are G0/H1 and G1/H1, and those of computing task nodes D1 and D2 are G0/H2 and G1/H2. Therefore, when the input data required by computing task node D1 must come from computing task node C1, transport task nodes C1-H1, H1-H2, and H2-D1 need to be inserted between computing task node C1 and computing task node D1, as shown in Figure 1. If the input data required by D1 must also come from computing task node C2 (when the distribution of the output data tensor of computing logic node C does not match the distribution of the input data tensor of D), transport task nodes C2-H1, H1-H2, and H2-D1 likewise need to be inserted between C2 and D1. Similarly, when the input data required by computing task node D2 must come from computing task node C1, transport task nodes C1-H1, H1-H2, and H2-D2 need to be inserted between C1 and D2, and if D2's input data must also come from C2, transport task nodes C2-H1, H1-H2, and H2-D2 likewise need to be inserted between C2 and D2. Analogously, where a direct access protocol exists between host H1 or H2 and the computing devices (e.g. GPUs) connected to it, data migration between host and device can dispense with inserting the transport task nodes referred to in the present disclosure. In that case, only a single transport task node H1-H2 needs to be inserted between C1 or C2 and D1 or D2; that is, one transport task node H1-H2 can be shared among C1, C2, D1, and D2. Although, for intuitive understanding and ease of description, the part of the complete task node topology graph 101 shown in Figure 1 shows four separately inserted transport task nodes H1-H2, in fact, even when no direct access protocol exists between host H1 or H2 and its connected computing devices (e.g. GPUs), these four transport task nodes H1-H2 can be one transport task node. According to the present disclosure, when data migration occurs across hosts, only one transport task node needs to be inserted between a pair of computing logic nodes spanning the paired hosts. Furthermore, when the transport task node insertion component inserts a transport task node, it also marks the inserted node's position mark, as well as the source address and destination address of the transported data, that is, the transport direction of the data. The name of each transport node above is precisely the transport task node's source address, destination address, and transport direction. For data migration between hosts according to the present disclosure, the transport node's position is marked as the position of the host receiving the data; for example, for transport task node H1-H2, the position mark of the transport task node (H1-H2) is set to H2.
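The insertion rule described here (device to host, an optional host-to-host hop marked with the receiving host, then host to device) can be sketched as below. This is a simplified illustration that ignores the direct-access-protocol shortcuts and the node sharing discussed above; the function is an assumption, not the patent's actual insertion component.

```python
def insert_transport_nodes(src, dst):
    """Insert static transport task nodes between an upstream computing task
    node src and a downstream computing task node dst on different placements."""
    if src.position == dst.position:
        return []                                  # same device: nothing to insert
    src_host = src.position.split("/")[-1]         # "G1/H1" -> "H1"
    dst_host = dst.position.split("/")[-1]
    hops = [TaskNode(f"{src.name}-{src_host}", src.position, "MN")]  # device -> host
    if src_host != dst_host:
        # cross-host hop; its position mark is the host receiving the data
        hops.append(TaskNode(f"{src_host}-{dst_host}", dst_host, "MN"))
    hops.append(TaskNode(f"{dst_host}-{dst.name}", dst.position, "MN"))  # host -> device
    return hops

# C2 at G1/H1 feeding D2 at G1/H2 yields C2-H1, H1-H2 (marked H2), H2-D2:
print(insert_transport_nodes(TaskNode("C2", "G1/H1", "ON"),
                             TaskNode("D2", "G1/H2", "ON")))
```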
In conjunction with Figure 1, as shown in Figure 2, the computing task nodes having the same position mark G0/H1 and computing operation type mark ON, such as E1, A1, B1, and C1, are all given the label KT0, while the computing task nodes having the same position mark G1/H1 and computing operation type mark ON, such as E2, A2, B2, and C2, are all given the label KT1. By analogy, the transport task nodes having the same position mark G1/H1 and transport operation type mark MN, such as (E2-H1), (A2-H1), and (C2-H1), are all given the label KT5, and likewise the transport task nodes having the same position mark H2 and transport operation type mark MN, such as the several (H1-H2) transport task nodes, are all given the label KT8. The label "KT" is used to mark the to-be-created kernel thread to which each task node belongs. All labels used in the present disclosure are merely for convenience of description; in practical applications any labels that can be distinguished from one another may be used.
Referring back to Figure 1, after all task nodes of the complete task node topology graph have been labeled, the kernel thread creation component 13 creates one kernel thread for each said label and, at the same time, creates for every task node bearing that label a user-level thread of the same kind carrying the same label. As the kernel threads and user-level threads in Figure 1 show, computing task node E1 is created as the computing user-level thread E1UT and associated with the kernel thread KT0; similarly, computing task node E2 is created as the computing user-level thread E2UT and associated with the kernel thread KT1. Among the kernel threads KT0, KT1, KT2, KT3, KT4, KT5, ... KT8, through KTn, each kernel thread manages the user-level threads UT that have the same operation type and the same computing resources.
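A rough picture of this creation step, continuing the hypothetical sketches above: one kernel-thread object per label, each owning the shared message bin and a registry of its user-level threads. Class and field names are assumptions.

```python
import queue

class KernelThread:
    """One kernel thread per label; it owns the message bin that all of its
    user-level threads share."""
    def __init__(self, label):
        self.label = label
        self.message_bin = queue.Queue()   # shared FIFO message bin
        self.user_threads = {}             # user-level thread ID -> handler object

def create_kernel_threads(labels):
    """labels: task-node name -> KT label, as produced by assign_labels()."""
    return {label: KernelThread(label) for label in set(labels.values())}
```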
Once the network formed by the user-level threads enters data processing, the shared message bin of a kernel thread KT serving the user-level threads UT triggers, in the order in which messages arrive, changes of state of the finite state machines inside the user-level threads UT whose IDs are registered with that kernel thread; alternatively, a triggered user-level thread on the data-processing path, based on the messages sent by its upstream or downstream user-level threads, and provided the state of its finite state machine satisfies a given condition, drives its operation unit to enqueue the corresponding operation tasks into the corresponding task stream. Specifically, a user-level thread of the computing operation type inserts successive computing tasks into the task stream of a computing device such as a GPU, after which the GPU executes these inserted computing tasks in the order of the task stream and stores the computation results in the output data buffer pre-allocated in the GPU. Likewise, a user-level thread of the transport operation type inserts successive transport tasks into the task stream of a computing device such as a network card, after which the network card executes these inserted transport tasks in the order of the task stream and stores the transported data in page-locked memory such as a pre-designated output data buffer.
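The message-driven dispatch this paragraph describes can be reduced to a FIFO pump over the shared message bin. In this toy sketch (a hypothetical `Message` record, and a loop that drains the queue rather than blocking forever), handling a message never waits on a device, which is the point of the design.

```python
from dataclasses import dataclass

@dataclass
class Message:
    dst_id: str    # destination user-level thread ID, e.g. "B1UT"
    src_id: str    # sender user-level thread ID

def pump(kernel_thread):
    """Kernel-thread main loop: drain the shared message bin in FIFO order.
    Each message only flips a state machine or enqueues a task into a device
    task stream, so the kernel thread itself never blocks on the devices."""
    while not kernel_thread.message_bin.empty():
        msg = kernel_thread.message_bin.get()                 # oldest message first
        kernel_thread.user_threads[msg.dst_id].on_message(msg)
```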
Figure 3 is a schematic diagram of the principle by which a kernel thread controls user-level threads of the computing type that share the same position mark. As shown in Figure 3, the kernel thread KT0 is created on the basis of all computing user-level threads UT carrying the same label KT0, and all these computing user-level threads share the message bin located in the kernel thread KT0. That message bin is used to receive every message whose destination ID is the ID of a computing user-level thread UT carrying the label KT0, and to send the messages produced by these user-level threads UT either to the user-level thread, in another kernel thread, that carries the destination ID contained in the message, or directly back as a message received by the kernel thread KT0 itself. KT0's message bin contains a message queue MSG00, MSG01, MSG02, .... The message queue triggers, according to the first-in, first-out rule, the associated user-level thread UT that the corresponding message points to. In conjunction with Figure 1, as shown in Figure 3, after the operation unit of the computing user-level thread E1UT inserts the predetermined task into the predetermined task stream of its computing device GPU0, it sends messages to the message bins in the kernel threads KT0 and KT1, where its downstream computing user-level threads B1UT and B2UT reside. If B1 has other downstream user-level threads, messages are likewise sent to the kernel threads where those user-level threads reside, and at the same time its finite state machine changes its state. If at this moment the message queue of KT0's message bin contains no messages, this message is numbered MSG00 and placed in the first position of the queue. When the kernel thread receives the message MSG00, the finite state machine of B1UT, the downstream computing user-level thread of E1UT, is triggered and merely changes state (in the case where the kernel thread KT0 has not yet received the messages from E2UT, A1UT, and A2UT). For example, as the kernel thread KT0 receives, one after another, the message MSG01 sent from A1UT and the messages MSG01 and MSG02 sent by KT4 concerning H1-B1-UT1 and H1-B1-UT2, the finite state machine of the user-level thread B1UT comes to satisfy the predetermined condition, allowing the operation unit of the computing user-level thread B1UT to insert the computing task into the task stream of the GPU that its task points to. Likewise, if a direct access protocol exists among the multiple GPUs connected to the host H1, then H1-B1-UT1 and H1-B1-UT2 will not exist between E2UT or A2UT and B1UT, and the messages MSG01 and MSG02 will come directly from the user-level threads E2UT and A2UT. After the operation unit of the user-level thread B1UT inserts a computing task into the task stream of the GPU its task points to, its finite state machine likewise changes, and on the basis of that state machine change KT0's message bin sends feedback messages MSG to the user-level threads upstream of B1UT and messages MSG to its downstream user-level threads; for example, the messages MSG03 and MSG04 fed back to the user-level threads E1UT and A1UT are placed directly in KT0's message queue, while a message MSG sent to another kernel thread, such as KT11 (not shown), is placed in KT11's message queue.
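The behaviour of B1UT sketched above, merely changing state until enough upstream messages have arrived and only then letting the operation unit fire, amounts to a counting finite state machine. A minimal rendering, continuing the hypothetical sketches above and with the device task stream reduced to a plain list, might be:

```python
class UserThread:
    """User-level thread: a counting finite state machine plus an operation unit."""
    def __init__(self, thread_id, n_inputs, task_stream, kt):
        self.thread_id = thread_id
        self.n_inputs = n_inputs           # messages needed before firing
        self.waiting = n_inputs
        self.task_stream = task_stream     # stands in for a GPU/NIC task stream
        self.downstream = []               # (KernelThread, thread ID) pairs to notify
        kt.user_threads[thread_id] = self  # register with the owning kernel thread

    def on_message(self, msg):
        self.waiting -= 1                  # a pure state change: no blocking work
        if self.waiting > 0:
            return
        self.task_stream.append(f"op from {self.thread_id}")  # operation unit fires
        for kt, dst in self.downstream:    # notify consumers via their message bins
            kt.message_bin.put(Message(dst, self.thread_id))
        self.waiting = self.n_inputs       # rearm for the next round of data
```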
Similarly, Figure 4 is a schematic diagram of the principle by which a kernel thread controls user-level threads of the transport type that share the same position mark. As shown in Figure 4, the position of the transport user-level threads there is G1/H1, that is, data is transported from the GPU1 connected to the host H1 to the host H1. Usually, when one of these co-located transport user-level threads, such as E2-H1-UT, receives in the message bin of its kernel thread KT5 a message (such as MSG00) sent by the producer of the data it is to transport, such as E2UT, its finite state machine changes state and triggers its operation unit to issue a memory-access instruction directly to the memory-access unit. The messages in the message queue then trigger the execution states of the corresponding user-level threads in sequence. It should be pointed out that for data transport between different computing devices on the same host, such as the transport user-level threads E2-H1-UT and H1-B1-UT between E2 and B1, the two can be placed under the control of the same kernel thread. Optionally, even a single transport task node may be inserted between computing task node E2 and computing task node B1, so that the two required transport user-level threads E2-H1-UT and H1-B1-UT become one transport user-level thread.
Figure 5 is a schematic diagram of the principle by which a kernel thread controls user-level threads of another transport type that share the same position mark. As shown in Figure 5, the kernel thread KT8 controls the transport user-level threads H1-H2-UT used to transport data from one host H1 to another host H2. For such host-to-host transport user-level threads, the position mark is the host receiving the data, such as H2. When the message bin of the kernel thread KT8, to which the transport user-level thread H1-H2-UT1 (for example, the transport user-level thread between C2UT and D2UT) belongs, receives a message sent from the message bin of the kernel thread KT5, to which C2-H1-UT belongs (or, when C2-H1-UT is not needed because a direct access protocol exists between the host and its computing devices, a message sent from the message bin of the kernel thread KT15, to which C2UT belongs), for example the message MSG00, that message triggers the finite state machine of H1-H2-UT1 to change state and, through its operation unit, to issue the data-transport task to a network connection element (not shown), for example by inserting the relevant data-transport request into the task stream of the corresponding network card or into a specially designed transport-request tool, such as a transport-request aggregation component (not shown) deployed on the host H2, which centrally processes the transport instructions of all transport user-level threads on H2, so that the data produced by the computing user-level threads on the GPU1 connected to host H1 is transported to the host H2. The concrete process of transporting the data is not itself the technical problem the present disclosure sets out to solve and is therefore not described in detail here. After KT8's message bin has, via MSG00, triggered H1-H2-UT1's finite state machine to change state and thereby caused the operation unit to issue the transport instruction, all of this transport user-level thread's work in the kernel thread is complete, so the next received message, MSG01, triggers the finite state machine and operation unit of the transport user-level thread corresponding to the destination ID contained in that message, so that it changes state and sends a data-transport request to the corresponding underlying inter-host communication link.
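As a rough illustration of the host-to-host case, the operation unit of such a transport user-level thread can be modelled as enqueuing a transfer request on a per-host aggregation queue and returning immediately; every name below is hypothetical, and the actual network transfer is outside the sketch.

```python
import queue

class TransportRequest:
    """Hypothetical request handed to a per-host transport-request aggregator."""
    def __init__(self, src_addr, dst_addr, size):
        self.src_addr, self.dst_addr, self.size = src_addr, dst_addr, size

def issue_transport(aggregator, src_addr, dst_addr, size):
    """Operation unit of an H1-H2 transport user-level thread: enqueue the
    request and return at once. The NIC or communication link drains the
    queue asynchronously while the kernel thread handles its next message."""
    aggregator.put(TransportRequest(src_addr, dst_addr, size))

h2_aggregator = queue.Queue()   # aggregation component deployed on host H2
issue_transport(h2_aggregator, src_addr="H1:gpu1_buf", dst_addr="H2:buf", size=4096)
```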
In summary, since the kernel thread controls the running of the user-level threads of the same kind within it in a message-driven manner, and since a user-level thread, when driven by a message, merely changes state and sends to user space the instruction for the corresponding operation task, while a finite state machine's state change and the issuing of an operation task instruction take an extremely short time, the kernel thread is essentially never in a waiting state during data processing, let alone put to sleep by an overlong wait, which also removes the operating system's need to wake kernel threads frequently. Adopting the user-level thread control system of the present disclosure enables kernel threads to use CPU resources efficiently, so that CPU resources do not sit idle, and thus go to waste, because kernel threads are waiting or dormant. In addition, centralized management by one kernel thread of user-level threads of the same kind and the same position eliminates the situation in which the different types of user-level threads on one task-processing path (which together complete one overall task) wait on one another because each user-level thread proceeds at a different speed. More importantly, it avoids the situation in which the CPU creates one kernel thread for every user-level thread, causing numerous kernel threads to wait for each other.
Optionally, the user-level thread control system 10 of the present disclosure further includes a kernel thread preparation component 12. As shown in Figure 1, the kernel thread preparation component 12 counts the number of labels assigned by the label presetting component 11 and prepares the same number of kernel threads for the complete task node topology graph, thereby preparing one kernel thread for each label. Through the kernel thread preparation component 12, the kernel thread creation component 13 can know how many kernel threads need to be created for the complete task node topology.
Further, the message queue of each kernel thread's message bin according to the present disclosure is arranged in the order in which messages are received and triggers the corresponding user-level threads to perform predetermined operations in a first-in, first-out manner. It should be pointed out that although the present disclosure describes only user-level threads of the two operation types, computing and transport, in practice there are also disk operations (for example, reading and writing disks), network communication operations, parameter update operations, and so on. All of these can be realized by having the user-level thread's operation unit issue operation instructions or insert the operation task into the task stream managed by the corresponding user-space element. Precisely how these data computations, disk reads and writes, and parameter updates are implemented in user space is not a part the present disclosure needs to address, and is therefore not described in detail here.
Summarizing the above, the present disclosure encompasses a user-level thread control method, comprising: a label presetting step of classifying, by a label presetting component, multiple task nodes having the same position mark and operation type in a task node topology graph as task nodes of the same kind and assigning the same label to the task nodes of the same kind; and a kernel thread creation step of creating, by a kernel thread creation component, one kernel thread for each said label and, at the same time, creating, for each task node bearing that label, a user-level thread of the same kind carrying the same label, wherein the kernel thread comprises a shared message bin common to the user-level threads on the kernel thread, used to trigger the corresponding user-level thread to perform a predetermined operation upon receiving any message whose destination ID is a user-level thread ID associated with the kernel thread. Moreover, as stated above, the user-level thread control method of the present disclosure further comprises a kernel thread preparation step of counting, by a kernel thread preparation component, the number of labels assigned by the label presetting component and preparing the same number of kernel threads for the task node topology graph, thereby preparing one kernel thread for each label. The messages in the message queue of the message bin required to implement the user-level thread control method are arranged in the order in which they are received and trigger the corresponding user-level threads to perform predetermined operations in a first-in, first-out manner. Through these messages, the state machine and operation unit of the user-level thread produce the predetermined operations, including changing the state of the user-level thread's state machine, sending messages through the shared message bin, and issuing operation instructions through the operation unit of the user-level thread.
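Pulling the hypothetical sketches above together, a toy run of the method's two steps (labeling, then thread creation and message-driven execution) could look like this:

```python
nodes = [TaskNode("E1", "G0/H1", "ON"), TaskNode("A1", "G0/H1", "ON"),
         TaskNode("B1", "G0/H1", "ON")]

labels = assign_labels(nodes)            # step 1: all three share one KT label
kts = create_kernel_threads(labels)      # step 2: one kernel thread per label
kt0 = kts[labels["B1"]]

gpu0_stream = []                         # stand-in for GPU0's task stream
UserThread("B1UT", n_inputs=2, task_stream=gpu0_stream, kt=kt0)

kt0.message_bin.put(Message("B1UT", "E1UT"))  # first input: state change only
kt0.message_bin.put(Message("B1UT", "A1UT"))  # second input: B1UT's op unit fires
pump(kt0)
print(gpu0_stream)                            # -> ['op from B1UT']
```

The two messages arrive in FIFO order: the first only flips B1UT's state machine, the second lets its operation unit enqueue work, mirroring the waiting-free behaviour described above.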
Although the above describes in a certain order how kernel threads centrally control user-level threads of the same kind and the same position, the order described above is not meant to restrict this control order, because these steps have no obvious precedence. For example, kernel threads and user-level threads are generated almost simultaneously; or rather, since the user-level threads belonging to one kernel thread constitute part of that kernel thread, the process of creating a kernel thread is itself the process of creating each of the user-level threads within it, the two being created in association with each other through labels denoting the same kind and the same position. Hence, although the textual description necessarily has a sequence, the actual execution steps are not limited to the sequential relationship defined by the order of the textual description.
The basic principles of the present disclosure have been described above in conjunction with specific embodiments. It should be pointed out, however, that persons of ordinary skill in the art will understand that all or any of the steps or components of the methods and apparatus of the present disclosure can be implemented in hardware, firmware, software, or combinations thereof in any computing apparatus (including processors, storage media, etc.) or network of computing apparatus, which persons of ordinary skill in the art can accomplish using their basic programming skills after reading the description of the present disclosure.
Therefore, the object of the present disclosure can also be achieved by running a program or a set of programs on any computing apparatus. The computing apparatus may be a well-known general-purpose apparatus. Accordingly, the object of the present disclosure can also be achieved merely by providing a program product containing the program code that implements the method or apparatus; that is, such a program product also constitutes the present disclosure, and a storage medium storing such a program product also constitutes the present disclosure. Obviously, the storage medium may be any well-known storage medium or any storage medium developed in the future.
It should also be pointed out that in the apparatus and methods of the present disclosure, the components or steps can obviously be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalent solutions of the present disclosure. Moreover, the steps performing the above series of processing can naturally be executed chronologically in the order described, but they need not necessarily be executed chronologically; some steps can be performed in parallel or independently of one another.
The specific embodiments above do not limit the scope of protection of the present disclosure. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations, and substitutions can occur. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present disclosure shall be included within the scope of protection of the present disclosure.

Claims (11)

  1. A user-level thread control system, comprising:
    a label presetting component, which classifies multiple tasks having the same position mark and operation type among all job tasks as tasks of the same kind and assigns the same label to the tasks of the same kind; and
    a kernel thread creation component, which creates one kernel thread for each said label and, at the same time, creates, for each task bearing that label, a user-level thread of the same kind carrying the same label,
    wherein the kernel thread comprises a shared message bin common to the user-level threads on the kernel thread, configured to trigger the corresponding user-level thread to perform a predetermined operation upon receiving any message whose destination ID is a user-level thread ID associated with the kernel thread.
  2. The user-level thread control system according to claim 1, further comprising:
    a kernel thread preparation component, configured to count the number of labels assigned by the label presetting component and to prepare the same number of kernel threads for all the job tasks, thereby preparing one kernel thread for each label.
  3. The user-level thread control system according to claim 1, wherein the message bin has a message queue, and the messages in the message queue are arranged in the order in which they are received and trigger the corresponding user-level threads to perform predetermined operations in a first-in, first-out manner.
  4. The user-level thread control system according to any one of claims 1 to 3, wherein the user-level thread comprises a state machine and an operation unit, and the predetermined operation comprises changing the state of the state machine of the user-level thread, sending messages through the shared message bin, and issuing operation instructions through the operation unit of the user-level thread.
  5. The user-level thread control system according to claim 4, wherein the operation types comprise a computing operation type and a transport operation type.
  6. The user-level thread control system according to claim 5, wherein the transport operation type comprises transport from a host to a computing device, transport from a computing device to a host, transport from a first host to a second host, and transport for disk reads and writes.
  7. The user-level thread control system according to claim 5, wherein the computing operation type comprises a data computing operation type and a parameter update operation type.
  8. A user-level thread control method, comprising:
    a label presetting step of classifying, by a label presetting component, multiple task nodes having the same position mark and operation type in a task node topology graph as task nodes of the same kind and assigning the same label to the task nodes of the same kind; and
    a kernel thread creation step of creating, by a kernel thread creation component, one kernel thread for each said label and, at the same time, creating, for each task node bearing that label, a user-level thread of the same kind carrying the same label,
    wherein the kernel thread comprises a shared message bin common to the user-level threads on the kernel thread, configured to trigger the corresponding user-level thread to perform a predetermined operation upon receiving any message whose destination ID is a user-level thread ID associated with the kernel thread.
  9. The user-level thread control method according to claim 8, further comprising:
    a kernel thread preparation step of counting, by a kernel thread preparation component, the number of labels assigned by the label presetting component and preparing the same number of kernel threads for the task node topology graph, thereby preparing one kernel thread for each label.
  10. The user-level thread control method according to claim 8, wherein the message bin has a message queue, and the messages in the message queue are arranged in the order in which they are received and trigger the corresponding user-level threads to perform predetermined operations in a first-in, first-out manner.
  11. The user-level thread control method according to any one of claims 8 to 10, wherein the user-level thread comprises a state machine and an operation unit, and the predetermined operation comprises changing the state of the state machine of the user-level thread, sending messages through the shared message bin, and issuing operation instructions through the operation unit of the user-level thread.
PCT/CN2021/072790 2020-02-13 2021-01-20 User-level thread control system and method therefor WO2021159930A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010090333.3 2020-02-13
CN202010090333.3A CN110928696B (zh) 2020-02-13 2020-02-13 User-level thread control system and method therefor

Publications (1)

Publication Number Publication Date
WO2021159930A1 true WO2021159930A1 (zh) 2021-08-19

Family

ID=69854830

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/072790 WO2021159930A1 (zh) User-level thread control system and method therefor

Country Status (2)

Country Link
CN (1) CN110928696B (zh)
WO (1) WO2021159930A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110928696B (zh) * 2020-02-13 2020-10-09 北京一流科技有限公司 用户级线程控制系统及其方法
CN112631760A (zh) * 2020-12-31 2021-04-09 深圳市大富网络技术有限公司 A thread creation method, system, and apparatus, and computer storage medium
CN114035810B (zh) * 2022-01-10 2022-04-15 北京一流科技有限公司 Synchronous deployment system for multi-stream parallelism and method therefor
CN114461400A (zh) * 2022-02-14 2022-05-10 北京百度网讯科技有限公司 Data processing method and apparatus, electronic device, and storage medium
CN115098230A (zh) * 2022-06-17 2022-09-23 北京奥星贝斯科技有限公司 Method and apparatus for managing threads


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020124201A1 (en) * 2001-03-01 2002-09-05 International Business Machines Corporation Method and system for log repair action handling on a logically partitioned multiprocessing system
CN102402458A (zh) * 2010-10-01 2012-04-04 微软公司 Virtual machine and/or multi-level scheduling support on systems with asymmetric processor cores
CN104462302A (zh) * 2014-11-28 2015-03-25 北京京东尚科信息技术有限公司 A distributed data processing coordination method and system
CN107491346A (zh) * 2016-06-12 2017-12-19 阿里巴巴集团控股有限公司 A task processing method, apparatus, and system for an application
CN107391279A (zh) * 2017-07-31 2017-11-24 山东浪潮云服务信息科技有限公司 A message queue container creation method and apparatus, and message queue container
CN110928696A (zh) * 2020-02-13 2020-03-27 北京一流科技有限公司 User-level thread control system and method therefor

Also Published As

Publication number Publication date
CN110928696B (zh) 2020-10-09
CN110928696A (zh) 2020-03-27

Similar Documents

Publication Publication Date Title
WO2021159930A1 (zh) User-level thread control system and method therefor
US10911536B2 (en) Real-time synchronization of data between disparate cloud data sources
US8381230B2 (en) Message passing with queues and channels
US7337275B2 (en) Free list and ring data structure management
US10108458B2 (en) System and method for scheduling jobs in distributed datacenters
US8112559B2 (en) Increasing available FIFO space to prevent messaging queue deadlocks in a DMA environment
JPH03126158A (ja) Scheduling method and apparatus
CN104063293A (zh) A data backup method and stream computing system
TWI236595B (en) Data transfer mechanism
EP0566894A2 (en) Optimized buffer handling in cooperative processing
CN106407231A (zh) A multi-threaded data export method and system
TW202020855A (zh) System and method for implementing an intelligent processing computing architecture
CN105607956B (zh) A task allocation method and system in a computer
US20110173287A1 (en) Preventing messaging queue deadlocks in a dma environment
CN109144749A (zh) A method for implementing communication among multiple processors using a processor
WO2021218101A1 (zh) Cache management system, method, and apparatus for a solid-state drive
WO2021147876A1 (zh) In-place memory resource sharing decision system and method therefor
US20080072015A1 (en) Demand-based processing resource allocation
CN106775984A (zh) A method and apparatus for managing a thread pool
CN111240745A (zh) An enhanced scalar-vector dual-pipeline architecture with interleaved execution
Priya et al. A survey on multiprocessor scheduling using evolutionary technique
TW200405167A (en) Signal aggregation
Ntaryamira et al. An efficient FIFO buffer management to ensure task level and effect-chain level data properties
Kessler et al. RAVEL, a support system for the development of distributed, multi-user VE applications
CN110399206A (zh) An IDC virtualization scheduling energy-saving system based on a cloud computing environment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21754598

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21754598

Country of ref document: EP

Kind code of ref document: A1