WO2018077284A1 - Communication method and system, electronic device and computer cluster - Google Patents


Info

Publication number
WO2018077284A1
Authority
WO
WIPO (PCT)
Prior art keywords
thread
message
information
communication
target device
Application number
PCT/CN2017/108429
Other languages
French (fr)
Chinese (zh)
Inventor
郭颖迪
颜深根
Original Assignee
北京市商汤科技开发有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Application filed by 北京市商汤科技开发有限公司
Publication of WO2018077284A1
Priority to US16/234,890 (US10693816B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/35 Switches specially adapted for specific applications
    • H04L49/356 Switches specially adapted for specific applications for storage area networks
    • H04L49/358 Infiniband Switches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306 Intercommunication techniques
    • G06F15/17331 Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Definitions

  • The present application relates to the field of communications technologies, and in particular to a communication method and system, an electronic device, and a computer cluster for communicating with devices.
  • IB: InfiniBand
  • RDMA: Remote Direct Memory Access
  • The RDMA communication mechanism allows data to be transferred directly between the application address space and the network, removing the operating system kernel from the critical path of data transmission and reducing the number of memory copies; it is therefore an efficient data transmission mechanism.
  • MPI: Message Passing Interface
  • MPI software does not support InfiniBand and RDMA well and cannot perform multi-threaded transmission, which limits the data transfer rate.
  • The embodiments of the present application provide a technical solution for communicating with devices.
  • According to one aspect, a communication method is provided, including: creating corresponding threads for at least one of a plurality of target devices, where the threads created for a target device include a communication thread and a message processing thread, and the message processing thread includes a message sending thread and/or a message receiving thread; and communicating with the corresponding target device based on the created threads. Communication with a first target device includes: a first message sending thread sends an information sending message to a first communication thread, and the first communication thread, by calling an IB interface, sends information to the first target device based on the information sending message; and/or the first communication thread receives, by calling an IB interface, information sent by the first target device.
  • The first target device is one of the plurality of target devices, and the first communication thread, the first message sending thread, and the first message receiving thread are, respectively, the communication thread, the message sending thread, and the message receiving thread corresponding to the first target device.
  • According to another aspect, a communication system is provided, including: a thread configuration module, configured to create corresponding threads for at least one of a plurality of target devices, where the threads created for a target device include a communication thread and a message processing thread, and the message processing thread includes a message sending thread and/or a message receiving thread; and a data communication module, configured to communicate with the corresponding target device based on the created threads. The first message sending thread sends an information sending message to the first communication thread, and the first communication thread, by calling an IB interface, sends information to the first target device based on the information sending message; and/or the first communication thread receives, by calling an IB interface, information sent by the first target device, generates an information receiving message corresponding to the received information, and sends it to the first message receiving thread. The first target device is one of the plurality of target devices, and the first communication thread, the first message sending thread, and the first message receiving thread are respectively associated with the first target device.
  • According to another aspect, an electronic device is provided, including a processor, a memory, an IB communication part, and a communication bus, where the processor, the memory, and the communication part communicate with one another through the communication bus.
  • The memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the communication method described above.
  • According to another aspect, a computer cluster is provided, including a plurality of the electronic devices described above and a switching device connected to each of them; each electronic device communicates with the other electronic devices through its own IB communication part via the switching device.
  • The communication method and system, electronic device, and computer cluster of the present application use multiple threads to send data to and receive data from target devices, which increases the data transmission rate and makes effective use of the bandwidth.
  • FIG. 1 is a flowchart of an embodiment of a communication method according to the present application.
  • FIG. 2 is a flowchart of processing an information sending message according to an embodiment of the communication method of the present application.
  • FIG. 3 is a flowchart of processing an information receiving message according to an embodiment of the communication method of the present application.
  • FIG. 4 is a schematic diagram of a layered architecture design that uses the present application.
  • FIG. 5 is a block diagram of an embodiment of a communication system according to the present application.
  • FIG. 6 is a schematic diagram of an embodiment of an electronic device according to the present application.
  • FIG. 7 is a schematic diagram of an embodiment of a computer cluster according to the present application.
  • Embodiments of the present application can be applied to computer systems/servers, which can operate with numerous other general-purpose or special-purpose computing system environments or configurations.
  • Examples of well-known computing systems, environments, and/or configurations suitable for use with such computer systems/servers include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above.
  • The computer system/server can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system.
  • Program modules may include routines, programs, object programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • The computer system/server can be implemented in a distributed cloud computing environment, in which tasks are performed by remote processing devices linked through a communication network.
  • Program modules may be located in local or remote computing system storage media, including storage devices.
  • The communication method of the present application uses multithreading for data exchange with multiple target devices; multithreading may be used only for sending data, only for receiving data, or for both.
  • Corresponding threads are created for at least one, or each, of the plurality of target devices. For example, a communication thread and a message processing thread are created for a target device, and the message processing thread includes a message sending thread and a message receiving thread.
  • The first target device is any one of the plurality of target devices, and the first communication thread, the first message sending thread, and the first message receiving thread are, respectively, the communication thread, the message sending thread, and the message receiving thread corresponding to the first target device.
  • FIG. 1 is a flowchart of an embodiment of the communication method. As shown in FIG. 1:
  • Step 101: The first message sending thread sends an information sending message to the first communication thread.
  • For example, the information sending message may carry information about a product purchased by a user, to be sent to a target server.
  • Step 102: The first communication thread, by calling the IB interface, sends information to the first target device based on the information sending message.
  • In the present application, the information sent to and received from a target device includes data and/or control commands.
  • Step 103: The first communication thread, by calling the IB interface, receives the data sent by the first target device, generates an information receiving message corresponding to the received data, and sends that message to the first message receiving thread.
  • The first communication thread processes the received information sending message. For example, it extracts the information about the product purchased by the user from the information sending message, generates a send-data instruction according to a preset rule, passes the instruction to the IB interface, and calls the IB interface function to send it to the target device.
  • The information sending message may take multiple formats and includes a message header, a message body, and so on. Information about the corresponding target device, message sending thread, and communication thread is encapsulated in the message header, and the data to be sent is encapsulated in the message body.
  • The information receiving message may also take multiple formats and includes a message header, a message body, and so on. Information about the corresponding target device, message receiving thread, and communication thread is encapsulated in the message header, and the data received from the target device is encapsulated in the message body.
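The header/body layout described above can be sketched as two small data classes. This is a minimal illustration under stated assumptions: the field names `device_id`, `message_thread`, and `comm_thread_rank` are hypothetical, since the text only says that target-device, message-thread, and communication-thread information goes in the header and the payload goes in the body.

```python
from dataclasses import dataclass


@dataclass
class MessageHeader:
    # Hypothetical header fields: the target device, the message thread that
    # produced or will consume the message, and the rank of the carrying
    # communication thread.
    device_id: int
    message_thread: str
    comm_thread_rank: int


@dataclass
class Message:
    kind: str            # "send" (information sending) or "recv" (information receiving)
    header: MessageHeader
    body: bytes          # data and/or control command


def make_send_message(device_id: int, rank: int, payload: bytes) -> Message:
    """Wrap data destined for a target device in an information sending message."""
    return Message("send", MessageHeader(device_id, f"send-{device_id}", rank), payload)


def make_recv_message(device_id: int, rank: int, received: bytes) -> Message:
    """Wrap data received from a target device in an information receiving message."""
    return Message("recv", MessageHeader(device_id, f"recv-{device_id}", rank), received)
```

Keeping routing details in the header lets a scheduler route a message without touching its body.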
  • It is determined whether information is being sent to the first target device for the first time; if so, the first message sending thread corresponding to the first target device is created, and if not, the existing first message sending thread is used. Likewise, it is determined whether information sent by the first target device is being received for the first time; if so, the first message receiving thread is created, and if not, the existing first message receiving thread is used.
  • The IB interface may include, but is not limited to, IB VERBS, SDP (Sockets Direct Protocol), and IPoIB (Internet Protocol over InfiniBand).
  • It is determined whether a first communication thread for sending and receiving information with the first target device already exists; if not, the first communication thread is created.
  • Each communication thread is assigned a rank number (identification number) when it is created, and the rank number identifies the communication thread.
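The lazy, per-device thread creation and rank numbering described above can be sketched as follows. The sketch makes several assumptions. The class name `ThreadRegistry`, the role strings, and the handler are all hypothetical. Each thread is modelled as a worker that drains its own `queue.Queue`. For simplicity, every thread receives a rank number, not only communication threads.

```python
import itertools
import queue
import threading


class ThreadRegistry:
    """Creates one worker thread per (target device, role) on first use.

    Roles follow the text: "comm" (communication thread), "send" (message
    sending thread) and "recv" (message receiving thread). Each worker only
    drains its own queue into `handler`; a real communication thread would
    call the IB interface instead (not shown).
    """

    def __init__(self, handler):
        self._handler = handler          # callable(device_id, role, item)
        self._lock = threading.Lock()
        self._workers = {}               # (device_id, role) -> (rank, queue)
        self._ranks = itertools.count()  # next rank number to hand out

    def get(self, device_id, role):
        """Return (rank, queue) for the thread, creating it if it does not exist."""
        key = (device_id, role)
        with self._lock:
            if key not in self._workers:  # first use for this device and role
                rank, q = next(self._ranks), queue.Queue()
                worker = threading.Thread(
                    target=self._run, args=(device_id, role, q), daemon=True)
                self._workers[key] = (rank, q)
                worker.start()
            return self._workers[key]

    def _run(self, device_id, role, q):
        while True:
            item = q.get()
            try:
                self._handler(device_id, role, item)
            finally:
                q.task_done()
```

A second `get` for the same device and role returns the existing thread's rank and queue, which mirrors the rule "if not, the existing thread is used".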
  • The first communication thread, by calling the IB interface, establishes a socket connection with the first target device and exchanges RDMA communication environment information, such as buffer addresses and buffer counts, over that socket connection; information is then transferred through the RDMA mechanism.
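The out-of-band setup step can be sketched with an ordinary socket. The wire format below is an assumption for illustration: a buffer address as an unsigned 64-bit integer, followed by a buffer length and a buffer count as unsigned 32-bit integers, all in network byte order. A real IB verbs connection setup typically also exchanges queue-pair numbers, LIDs/GIDs, and remote keys, which are omitted here.

```python
import socket
import struct

# Hypothetical wire format: buffer address (u64), buffer length in bytes (u32),
# number of buffers (u32), network byte order.
CONN_INFO = struct.Struct("!QII")


def send_conn_info(sock: socket.socket, addr: int, length: int, count: int) -> None:
    """Send the local RDMA buffer description to the peer."""
    sock.sendall(CONN_INFO.pack(addr, length, count))


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket during setup")
        buf += chunk
    return buf


def recv_conn_info(sock: socket.socket) -> tuple:
    """Receive the peer's (addr, length, count) buffer description."""
    return CONN_INFO.unpack(recv_exact(sock, CONN_INFO.size))
```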
  • The first communication thread receives feedback information from the first target device, for example whether the data transmission succeeded or failed, and passes the feedback information to the first message sending thread.
  • The first communication thread, by calling the IB interface, receives the information sent by the first target device and delivers the corresponding information receiving message to the first message receiving thread corresponding to the first target device. The first message receiving thread processes the information receiving message; for example, it extracts the data from the message and writes the data to a database or similar store.
  • The first message receiving thread passes feedback information to the first communication thread, and the first communication thread sends it to the target device.
  • A communication thread, a message sending thread, and a message receiving thread are established for each target device. The communication thread calls the IB interface for data communication, while the message sending thread and the message receiving thread make sending and receiving information asynchronous, which improves data transmission efficiency and makes effective use of the bandwidth.
  • When it receives an information sending message, the first message sending thread determines whether the first communication thread is transmitting data. If so, it places the information sending message into the task pool; if not, it passes the information sending message to the first communication thread.
  • When the first communication thread receives data sent by the first target device, it determines whether the first message receiving thread is processing information. If so, the information receiving message is placed into the task pool; if not, it is delivered to the first message receiving thread.
  • The task pool (POOL) is used for caching and scheduling.
  • The task pool is global and caches the information sending messages and information receiving messages of all target devices.
  • The task pool may be implemented as an array of linked lists.
  • A global task processing thread that polls the task pool is started for all target devices.
  • When the global task processing thread finds that an information sending message for a target device is cached in the task pool, it determines whether the communication thread corresponding to that target device is idle; if so, it takes the information sending message out of the task pool and sends it to that communication thread.
  • When the global task processing thread finds that an information receiving message for a target device is cached in the task pool, it determines whether the message receiving thread corresponding to that target device is idle; if so, it takes the information receiving message out of the task pool and sends it to that message receiving thread.
  • The global task processing thread can obtain the information about the message receiving thread and the communication thread by parsing the message headers of the information receiving messages and information sending messages.
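A minimal sketch of the global task pool and one polling pass of the global task processing thread follows. It assumes several things. The pool is an array indexed by target device. Each slot holds a FIFO, with `collections.deque` standing in for the linked list. Send and receive messages are kept in separate slots. The idle checks and delivery are passed in as hypothetical callables (`comm_idle`, `recv_idle`, `deliver`).

```python
import collections
import threading


class TaskPool:
    """Global task pool: per-device FIFOs for sending and receiving messages."""

    def __init__(self, num_devices):
        self._lock = threading.Lock()
        self._send = [collections.deque() for _ in range(num_devices)]
        self._recv = [collections.deque() for _ in range(num_devices)]

    def _slot(self, kind, device_id):
        return (self._send if kind == "send" else self._recv)[device_id]

    def put(self, kind, device_id, msg):
        """Cache a message that could not be handed over directly."""
        with self._lock:
            self._slot(kind, device_id).append(msg)

    def take_if_idle(self, kind, device_id, target_idle):
        """Pop the oldest message for the device only if its target thread is idle."""
        with self._lock:
            slot = self._slot(kind, device_id)
            if slot and target_idle:
                return slot.popleft()
            return None


def poll_once(pool, num_devices, comm_idle, recv_idle, deliver):
    """One pass of the global task processing thread over all devices.

    comm_idle(d) / recv_idle(d) report whether device d's communication thread /
    message receiving thread is idle; deliver(kind, d, msg) hands the message over.
    Returns the number of messages delivered in this pass.
    """
    moved = 0
    for d in range(num_devices):
        for kind, is_idle in (("send", comm_idle), ("recv", recv_idle)):
            msg = pool.take_if_idle(kind, d, is_idle(d))
            if msg is not None:
                deliver(kind, d, msg)
                moved += 1
    return moved
```

In a running system, the global task processing thread would call `poll_once` in a loop, sleeping briefly when a pass moves nothing.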
  • FIG. 2 is a flowchart of processing an information sending message according to an embodiment of the communication method of the present application. As shown in FIG. 2:
  • Step 201: The first message sending thread receives an information sending message for the first target device.
  • Step 202: Determine whether the first communication thread is transmitting data. If so, go to step 203; if not, go to step 204.
  • Step 203: The first message sending thread puts the information sending message into the task pool. Go to step 205.
  • Step 204: The first message sending thread sends the information sending message to the first communication thread corresponding to the first target device. Go to step 207.
  • Step 205: The global task processing thread determines whether the first communication thread has finished transmitting data (or control commands), that is, whether the first communication thread is idle. If so, go to step 206; if not, repeat step 205.
  • Step 206: The global task processing thread takes the information sending message out of the task pool and sends it to the first communication thread.
  • Step 207: The first communication thread sends data to the first target device according to the information sending message.
  • FIG. 3 is a flowchart of processing an information receiving message according to an embodiment of the communication method of the present application. As shown in FIG. 3:
  • Step 301: The first communication thread receives the data sent by the first target device and generates an information receiving message.
  • Step 302: Determine whether the first message receiving thread is processing data (or a control command). If so, go to step 303; if not, go to step 304.
  • Step 303: The first communication thread puts the information receiving message into the task pool. Go to step 305.
  • Step 304: The first communication thread sends the information receiving message to the first message receiving thread corresponding to the first target device. Go to step 307.
  • Step 305: The global task processing thread determines whether the first message receiving thread has finished processing data, that is, whether the first message receiving thread is idle. If so, go to step 306; if not, repeat step 305.
  • Step 306: The global task processing thread takes the information receiving message out of the task pool and sends it to the first message receiving thread. Go to step 307.
  • Step 307: The first message receiving thread processes the information receiving message.
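FIG. 2 and FIG. 3 share the same decision logic: hand a message over directly if the target thread is idle; otherwise, park it in the task pool until the global task processing thread finds the thread idle. The model below is simplified. The names `BusyWorker`, `hand_off`, `drain`, and `make_pool` are hypothetical, one device's pool is a `collections.deque`, and the handler runs in the calling thread rather than in a dedicated thread.

```python
import collections
import threading


class BusyWorker:
    """Either idle or busy with one message. Models the communication thread
    (FIG. 2, handler = IB send of step 207) or the message receiving thread
    (FIG. 3, handler = processing of step 307)."""

    def __init__(self, handle):
        self._handle = handle
        self._lock = threading.Lock()
        self._busy = False

    def try_claim(self):
        """Atomically mark the worker busy; return False if it already was."""
        with self._lock:
            if self._busy:
                return False
            self._busy = True
            return True

    def release(self):
        with self._lock:
            self._busy = False

    def run(self, msg):
        """Handle a message for which the worker has already been claimed."""
        try:
            self._handle(msg)
        finally:
            self.release()


def make_pool():
    return collections.deque()  # FIFO task pool for one device


def hand_off(worker, msg, pool):
    """Steps 202-204 / 302-304: deliver directly if idle, else park in the pool."""
    if worker.try_claim():
        worker.run(msg)
        return True
    pool.append(msg)
    return False


def drain(worker, pool):
    """Steps 205-206 / 305-306 (global task processing thread): deliver pooled
    messages in arrival order for as long as the worker is idle."""
    delivered = 0
    while pool and worker.try_claim():
        worker.run(pool.popleft())
        delivered += 1
    return delivered
```

Because `try_claim` checks and sets the busy flag under one lock, two threads cannot both hand a message to the same worker at once.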
  • A flag can be set for each information sending message and information receiving message to identify the status and result of the send or receive operation; its values can be defined according to specific needs. For example, a flag is set for each information sending message and information receiving message to identify its processing status: 0 for initial, 1 for successful, and -1 for failed.
  • The message sending thread and the message receiving thread assign values to the flags according to the feedback information they receive, using a locking mechanism.
  • Each flag moves from one state to another, and only one thread can perform a given transition; no other thread can move the same flag from that state to a different one. A flag query can be provided so that the user obtains the result of an operation without calling the function again, which reduces the resources and time wasted waiting for function calls to return; the timing of asynchronous transfers can also be set.
  • The first message sending thread receives, from the first communication thread, the feedback information corresponding to the sent data, and assigns a value to the status flag of the information sending message according to that feedback information.
  • The first message receiving thread assigns a value to the status flag of the information receiving message according to the result of processing it.
  • The result of sending and receiving information can be determined from the flag; the next transmission waits until the flag shows that the current transmission succeeded.
  • The global task processing thread takes the information sending messages and information receiving messages corresponding to the first target device out of the task pool and sends them to the first communication thread and the first message receiving thread for processing. It determines whether the status flag of the information sending message and/or information receiving message taken from the task pool indicates success. If so, it takes the next information sending message and/or information receiving message for the first target device out of the task pool, in the order in which they entered the pool, and sends them to the first communication thread and the first message receiving thread for processing. If not, a processing exception is reported.
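The status flag and the in-order dispatch rule can be sketched as follows, assuming the 0/1/-1 encoding given above. `StatusFlag` and `next_for_device` are hypothetical names. The lock makes the transition out of the initial state happen at most once, so a second thread racing to set the flag has no effect. A message still in the initial state is treated as "wait and retry". The text does not spell this case out, so that treatment is an assumption.

```python
import threading

INITIAL, SUCCESS, FAILED = 0, 1, -1


class StatusFlag:
    """Status flag of one information sending/receiving message."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = INITIAL

    def set(self, value):
        """Transition out of INITIAL; return False if another thread already did."""
        with self._lock:
            if self._value != INITIAL:
                return False
            self._value = value
            return True

    def get(self):
        # Callers query the flag instead of re-invoking the send/receive call.
        with self._lock:
            return self._value


def next_for_device(pending, last_flag):
    """Return the device's next pooled message (oldest first) only if the
    previous one succeeded. `pending` is that device's list in arrival order;
    `last_flag` is None when nothing has been dispatched yet."""
    state = SUCCESS if last_flag is None else last_flag.get()
    if state == FAILED:
        raise RuntimeError("processing exception: previous message failed")
    if state == INITIAL:
        return None  # previous message still in flight; try again later
    return pending.pop(0) if pending else None
```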
  • When information sent by the first target device is received, the information includes a check code and a control command, and the received control command is stored in a memory block used for storing control commands.
  • A new check code for the control command is computed according to the progress of receiving the control command.
  • The new check code is compared with the received check code. If they match, verification succeeds: the information reception from the first target device, that is, the first target device's transmission, is complete, and execution of the control command starts. After information reception from the first target device is determined to be complete, the received check code may be invalidated, for example by filling it with zeros or setting it to a random number.
  • The new check code for the control command is computed with the same algorithm that the first target device used to generate its check code; for example, the new check code may be calculated with the CRC32 cyclic redundancy check algorithm.
  • This shortens the processing flow and thereby improves operating efficiency and speed; using this kind of verification when receiving information makes reception more stable and faster.
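The check-code verification can be sketched with Python's `zlib.crc32`, matching the CRC32 example in the text. Three choices here are assumptions. The frame layout is the control command followed by its CRC32 as four big-endian bytes. The class name `CommandReceiver` is hypothetical. Zero-filling is used to invalidate the check code; it is one of the two invalidation options the text names.

```python
import zlib


def make_frame(command: bytes) -> bytes:
    """Sender side (hypothetical layout): command + 4-byte big-endian CRC32."""
    return command + zlib.crc32(command).to_bytes(4, "big")


class CommandReceiver:
    def __init__(self):
        self.block = bytearray()  # memory block holding the control command
        self.crc = 0              # running CRC32 over what has arrived so far

    def feed(self, chunk: bytes) -> None:
        # Update the new check code incrementally as the command arrives.
        self.block += chunk
        self.crc = zlib.crc32(chunk, self.crc)

    def verify(self, received_code: bytearray) -> bool:
        """Compare the new check code with the received one; on success,
        invalidate the received check code by zero-filling it."""
        ok = self.crc == int.from_bytes(received_code, "big")
        if ok:
            received_code[:] = bytes(len(received_code))
        return ok
```

Updating the CRC chunk by chunk means no second pass over the memory block is needed once the last byte arrives.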
  • When an operation exception event is found by polling through the IB interface, it is determined whether an exception handling function corresponding to the exception event has been registered, for example by checking whether a pointer to the exception handling function instance can be obtained. If so, the exception handling function is registered; it is called back automatically and handles the exception. If not, no exception handling function is registered for that event.
  • Basic exception handling is performed regardless of whether an exception handling function corresponding to the exception event is registered.
  • Transmission exceptions include not only the various exceptions that occur when IB interface functions are called, but also exceptions in scheduling threads, deadlocks in the task pool, and so on.
  • Corresponding handling is provided for the various transmission exceptions, including command and data retransmission, link termination and disconnection, and thread stopping and cleanup. For example, when a serious transmission error occurs, the communication thread is stopped and the connection is closed; afterwards, when a new transmission task arrives, the connection is re-established and the communication thread is started again.
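The exception-handling rules above can be sketched as a small dispatcher. The sketch makes these assumptions:

* `ExceptionDispatcher` and the event names used in the test are hypothetical.
* The polling of the IB layer is not shown.
* "Retransmit" and "stop and disconnect" stand in for the basic handling the text lists.
* A registered handler is called first, and basic handling always runs afterwards.

```python
from typing import Callable, Dict, List


class ExceptionDispatcher:
    """Dispatches exception events polled from the IB layer (polling not shown)."""

    def __init__(self):
        self._handlers: Dict[str, Callable[[str], None]] = {}
        self.connected = True
        self.actions: List[str] = []

    def register(self, event: str, handler: Callable[[str], None]) -> None:
        self._handlers[event] = handler

    def on_event(self, event: str, severe: bool = False) -> None:
        handler = self._handlers.get(event)   # can a handler "pointer" be obtained?
        if handler is not None:
            handler(event)                     # automatic callback
        # Basic handling runs whether or not a handler was registered.
        if severe:
            self.actions.append("stop communication thread and disconnect")
            self.connected = False
        else:
            self.actions.append("retransmit")

    def submit(self, task: str) -> None:
        # A new transmission task after a disconnect re-establishes the link.
        if not self.connected:
            self.actions.append("reconnect and start communication thread")
            self.connected = True
        self.actions.append(f"send {task}")
```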
  • Callback functions can also be provided for events.
  • A callback function is a function called through a function pointer. Most events have basic processing functions. If a user-defined callback function has been registered (its pointer is not empty), it is called back when the corresponding event occurs, so that users can conveniently add their own actions for events of interest. For example, a callback function can be set for receiving control commands; when an event is determined to be a control-command-received event, the corresponding callback function is called and handles the received control command directly.
  • The communication method of the present application can be implemented in multiple ways.
  • For example, a multi-layer design can be adopted: a COMMON layer is built on top of the IB VERBS interface function layer at the bottom, and a SIMA main communication layer is built on top of the COMMON layer.
  • Users can use the SIMA main communication layer, or use the COMMON layer to customize operation functions as needed.
  • The COMMON layer encapsulates the IB VERBS interface functions and provides basic event processing logic and exception handling logic.
  • RMON operations can be performed directly using the COMMON layer.
  • Callback functions are provided in the COMMON layer.
  • The COMMON layer provides a storage location for an operation-error function pointer, which can hold a function to be executed when an IB operation error occurs.
  • When the SIMA main communication layer is present, this pointer is used by it: the SIMA layer's error callback handler is registered there (that is, the pointer points to that function).
  • While providing basic processing functions based on the results of the IB VERBS interface functions for sending and receiving data, the COMMON layer also checks the locations of user-defined function pointers. When such a location is not empty and the corresponding event occurs, the COMMON layer first calls the user-defined function and then performs its basic processing.
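The COMMON/SIMA layering and the error-function-pointer slot can be sketched as follows. This is illustrative only. `CommonLayer`, `SimaLayer`, and the `verbs_send` callable are hypothetical, and `verbs_send` stands in for the wrapped IB VERBS send-and-poll sequence. An optional attribute plays the role of the pointer slot. When the slot is set, the user function runs before the basic processing.

```python
from typing import Callable, List, Optional


class CommonLayer:
    """Wraps a (hypothetical) verbs send call and exposes an error-hook slot."""

    def __init__(self, verbs_send: Callable[[bytes], bool]):
        self._verbs_send = verbs_send
        self.on_error: Optional[Callable[[bytes], None]] = None  # pointer slot
        self.log: List[str] = []

    def send(self, data: bytes) -> bool:
        ok = self._verbs_send(data)
        if not ok:
            if self.on_error is not None:          # slot not empty: user hook first
                self.on_error(data)
            self.log.append("basic error handling")  # then basic processing
        return ok


class SimaLayer:
    """Main communication layer on top of CommonLayer; registers its error hook."""

    def __init__(self, common: CommonLayer):
        self.common = common
        self.failed: List[bytes] = []
        common.on_error = self.failed.append       # point the slot at our handler
```

A user who needs custom behaviour can skip `SimaLayer` and assign `on_error` on the `CommonLayer` directly, as the text allows.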
  • The SIMA main communication layer is compatible with the MPI (Message Passing Interface) startup mode and can run under mainstream platforms and cluster management software.
  • The SIMA main communication layer does not retry a failed transmission; instead, it handles the failure through a callback function or reports it directly to the caller, which decides whether to resend. Both sending and receiving are performed asynchronously, without blocking calls.
  • The communication method provided by the foregoing embodiments uses the IB VERBS interface or driver directly and performs multi-threaded asynchronous operations for sending data to and receiving data from target devices. It supports unequal numbers of receive and send operations, which increases the information transmission rate and makes effective use of the bandwidth.
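The non-blocking, no-retry send behaviour can be sketched with `concurrent.futures`. `AsyncSender` and the `transport` callable are hypothetical; `transport` stands in for the real transmission. `send` returns a `Future` immediately. A failure, meaning a `False` result or an exception, is only reported through the optional callback, and the caller decides whether to resend.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional


class AsyncSender:
    def __init__(self, transport: Callable[[bytes], bool],
                 on_fail: Optional[Callable[[bytes], None]] = None,
                 workers: int = 4):
        self._transport = transport
        self._on_fail = on_fail
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def send(self, data: bytes) -> Future:
        """Queue `data` for transmission and return immediately (non-blocking)."""
        fut = self._pool.submit(self._transport, data)

        def report(f: Future) -> None:
            # No retry: a failed send is only reported to the registered callback.
            if self._on_fail is not None and (f.exception() is not None or not f.result()):
                self._on_fail(data)

        fut.add_done_callback(report)
        return fut

    def close(self) -> None:
        """Wait for in-flight sends (and their callbacks) to finish."""
        self._pool.shutdown(wait=True)
```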
  • The present application provides a communication system 50, including a thread configuration module 51, a data communication module 52, a processing status setting module 53, and an information verification module 54.
  • The thread configuration module 51 creates corresponding threads for at least one of the plurality of target devices; the threads created for a target device include a communication thread and a message processing thread, and the message processing thread includes a message sending thread and/or a message receiving thread.
  • The data communication module 52 communicates with the corresponding target device based on the created threads.
  • The first message sending thread sends an information sending message to the first communication thread, and the first communication thread, by calling the IB interface, sends information to the first target device based on the information sending message.
  • The first communication thread, by calling the IB interface, receives the information sent by the first target device, generates an information receiving message corresponding to the received information, and sends that message to the first message receiving thread.
  • The first target device is one of the plurality of target devices, and the first communication thread, the first message sending thread, and the first message receiving thread are, respectively, the communication thread, the message sending thread, and the message receiving thread corresponding to the first target device.
  • the first message sending thread places an information transmission message to be sent to the first communication thread into the task pool.
  • the first communication thread places an information receiving message to be sent to the first message receiving thread into the task pool.
  • the task pool is used to cache information sending messages and/or information receiving messages corresponding to multiple target devices.
  • the thread configuration module 51 creates a global task processing thread corresponding to a plurality of target devices.
  • the global task processing thread fetches the information sending message from the task pool and sends it to the first communication thread.
  • the global task processing thread fetches the information receiving message from the task pool and sends it to the first message receiving thread.
  • the processing status setting module 53 sets a status flag for the information sending message and a status flag for the information receiving message.
  • the first message sending thread receives, from the first communication thread, feedback information corresponding to the sent information, and assigns a value to the status flag of the information sending message according to that feedback.
  • the first message receiving thread assigns a value to the status flag of the information receiving message according to the processing result of the information receiving message.
  • in response to the status flag of the information sending message and/or information receiving message corresponding to the first target device, previously taken from the task pool, indicating success, the global task processing thread accesses the task pool again:
  • the next information sending message and/or information receiving message corresponding to the first target device is taken out of the task pool and sent to the first communication thread and/or the first message receiving thread.
  • the information verification module 54 receives, according to a control command, the memory block used to store the control command; computes a new check code from the received information (which includes the check code, the control command, and so on); and compares the new check code with the received check code to be verified. When the verification succeeds, it determines that reception of the information from the first target device is complete.
  • in response to determining that reception of the information from the first target device is complete, the information verification module 54 invalidates the received check code.
  • the thread configuration module 51 creates the first message sending thread in response to determining that the current transmission is the first transmission of information to the first target device.
  • the first communication thread is created in response to determining that the first communication thread corresponding to the first target device has not yet been created.
  • when an exception event occurs, the first communication thread, in response to determining that an exception handling function has been registered for that event, calls back the exception handling function.
  • the embodiment of the present application further provides an electronic device, such as a mobile terminal, a personal computer (PC), a tablet computer, a server, and the like.
  • referring to FIG. 6, there is shown a schematic structural diagram of an electronic device 600 suitable for implementing a terminal device or server of an embodiment of the present application.
  • the computer system 600 includes one or more processors and a communication unit.
  • the one or more processors are, for example, one or more central processing units (CPUs) 601 and/or one or more graphics processing units (GPUs) 613.
  • the processors may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 602 or executable instructions loaded from a storage portion 608 into a random access memory (RAM) 603.
  • the communication unit 612 can include, but is not limited to, a network card, which can include, but is not limited to, an IB (Infiniband) network card.
  • the processor can communicate with the read-only memory 602 and/or the random access memory 603 to execute executable instructions, is connected to the communication unit 612 via the bus 604, and communicates with other target devices via the communication unit 612, thereby performing the operations corresponding to any of the communication methods provided by the embodiments of the present application,
  • for example: an instruction for creating a corresponding thread for each of the plurality of target devices, where the threads created for any target device include a communication thread and a message processing thread,
  • and the message processing thread includes a message sending thread and/or a message receiving thread; and an instruction for communicating with the corresponding target device based on the configured corresponding threads.
  • the communication process with the first target device includes: the first message sending thread sends an information sending message to the first communication thread, and the first communication thread sends data to the first target device based on the information sending message by calling the IB interface; and/or the first communication thread receives, by calling the IB interface, data sent by the first target device, generates an information receiving message corresponding to the received data, and sends it to the first message receiving thread; the first target device is any one of the plurality of target devices.
  • the first communication thread, the first message sending thread, and the first message receiving thread are, respectively, the communication thread, message sending thread, and message receiving thread corresponding to the first target device.
  • the RAM 603 can also store various programs and data required for the operation of the device.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • ROM 602 is an optional module.
  • the RAM 603 stores executable instructions or writes executable instructions to the ROM 602 at runtime, the executable instructions causing the processor 601 to perform operations corresponding to the above-described communication methods.
  • An input/output (I/O) interface 605 is also coupled to bus 604.
  • the communication unit 612 may be integrated, or may be provided with a plurality of sub-modules (e.g., a plurality of IB network cards) linked on the bus.
  • the following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), and a speaker; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet.
  • a drive 610 is also connected to the I/O interface 605 as needed.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like, is mounted on the drive 610 as needed so that a computer program read therefrom is installed into the storage portion 608 as needed.
  • FIG. 6 is only an optional implementation manner.
  • the number and type of components in FIG. 6 may be selected, deleted, added, or replaced according to actual needs;
  • different functional components may also be provided separately or in an integrated manner; for example, the GPU and CPU may be provided separately, or the GPU may be integrated on the CPU, and the communication portion may be provided separately or integrated on the CPU or GPU, and so on.
  • an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium. The computer program comprises program code for executing the method shown in the flowchart, and the program code may include instructions corresponding to the steps of the communication method provided by the embodiments of the present application, for example: an instruction for receiving information to be sent to a target device; an instruction for transmitting an information sending message to the sending thread corresponding to the target device; and an instruction for sending, according to the information sending message, a send-data instruction to
  • the underlying communication thread corresponding to the target device, where the underlying communication thread sends the data to the target device by calling the IB interface and passes the feedback information to the sending thread.
  • the computer program can be downloaded and installed from the network via the communication portion 612, and/or installed from the removable medium 611.
  • when the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the method of the present application are performed.
  • the embodiment of the present application further provides a computer cluster.
  • the computer cluster includes an IB switch 71 and a plurality of electronic devices 72, 73, 74, ..., 75, 76, 77 as described above.
  • the IB switch 71 and the plurality of electronic devices 72, 73, 74, ..., 75, 76, 77 can be connected by a bus, a network cable, or the like. Each electronic device is provided with a communication portion (for example, an IB network card), and the communication portion of each electronic device
  • communicates with the other electronic devices through the IB switch 71.
  • the communication method and system, the electronic device, and the computer cluster provided by the foregoing embodiments use multiple threads to send data to and receive data from target devices, send and receive data asynchronously, and provide flag bits that identify the result or state of each operation.
  • supporting unequal numbers of receive and send operations can increase the speed of data transfer and make effective use of the bandwidth. The embodiments adopt a hierarchical design, provide RDMA as the data transfer means by calling the IB interface or driver, and support custom callbacks so that users can define their own operations and implement special functions, making communication and computation more efficient.
  • the methods, apparatuses, and devices of the present application may be implemented in many ways.
  • the methods, apparatuses, and devices of the present application can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above-described sequence of steps for the method is for illustrative purposes only, and the steps of the method of the present application are not limited to the order specifically described above unless otherwise specifically stated.
  • the present application can also be implemented as a program recorded in a recording medium, the programs including machine readable instructions for implementing the method according to the present application.
  • the present application also covers a recording medium storing a program for executing the method according to the present application.
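The check-code verification performed by the information verification module 54 can be sketched as follows. The source names neither the check-code algorithm nor the layout of the received memory block, so this sketch assumes CRC-32 (Python's `zlib.crc32`) and a hypothetical `ReceivedBlock` type; "invalidation" is modeled simply as clearing the stored check code so that the same block cannot be accepted twice.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReceivedBlock:
    """Hypothetical layout of a received memory block."""
    payload: bytes             # control command (and any data) as received
    check_code: Optional[int]  # sender's check code; None once invalidated


def compute_check_code(payload: bytes) -> int:
    # Assumption: CRC-32 stands in for the unspecified check-code algorithm.
    return zlib.crc32(payload) & 0xFFFFFFFF


def verify_and_complete(block: ReceivedBlock) -> bool:
    """Return True if reception from the target device is judged complete."""
    if block.check_code is None:
        # Already invalidated by an earlier successful verification.
        return False
    new_code = compute_check_code(block.payload)
    if new_code != block.check_code:
        return False  # mismatch: data incomplete or corrupted
    # Success: invalidate the received check code so this block is not
    # accepted a second time (one plausible purpose of the invalidation step).
    block.check_code = None
    return True


if __name__ == "__main__":
    cmd = b"CMD:TRANSFER_DONE"
    good = ReceivedBlock(cmd, compute_check_code(cmd))
    print(verify_and_complete(good))  # True
    print(verify_and_complete(good))  # False, check code already invalidated
    bad = ReceivedBlock(cmd, compute_check_code(cmd) ^ 1)
    print(verify_and_complete(bad))   # False, codes differ
```

Clearing the code rather than deleting the block keeps the payload available to later processing while still preventing a stale block from being treated as a fresh, complete message.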

Abstract

Disclosed in the embodiments of the present application are a communication method and system, an electronic device and a computer cluster. The method comprises: establishing a corresponding thread for each of at least one target device among a plurality of target devices, the threads established for a target device comprising a communication thread and a message processing thread, and the message processing thread comprising a message sending thread and/or a message receiving thread; and communicating with the corresponding target device on the basis of the established threads.

Description

Communication method and system, electronic device and computer cluster
The present application claims priority to Chinese Patent Application No. 2016010967290.6, entitled "Communication Method and System, Electronic Device and Computer Cluster", filed with the China Patent Office on October 28, 2016, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of communications technologies, and in particular to a communication method and system for communicating with devices, an electronic device, and a computer cluster.
Background
Communication is a critical module in large-scale training clusters. In large-scale deep learning training tasks in particular, frequent communication is required to obtain better model parameters and accelerate model convergence, so communication is one of the bottlenecks of training speed. Highly integrated training systems and supercomputing centers therefore usually adopt the InfiniBand (IB) architecture to accelerate communication. InfiniBand uses dedicated hardware and a simplified protocol stack, so most of the work that previously required CPU involvement to share memory between two computers is performed directly by the IB hardware.
Communication overhead in deep learning is substantial, and InfiniBand together with Remote Direct Memory Access (RDMA) is commonly used to accelerate transfers. The RDMA mechanism lets data move directly between the application address space and the network, taking the operating system kernel off the critical path of data transfer and reducing the number of memory copies, which makes it an efficient data transfer mechanism. Upper-layer training programs usually use software implementing MPI (Message Passing Interface) for communication, but current MPI software does not support InfiniBand and RDMA well and cannot perform multi-threaded transfers, which severely limits the data transfer rate.
Summary
In view of this, embodiments of the present application provide a technical solution for communicating with devices.
According to one aspect of the embodiments of the present application, a communication method is provided, including: creating a corresponding thread for at least one target device among a plurality of target devices, where the threads created for a target device include a communication thread and a message processing thread, and the message processing thread includes a message sending thread and/or a message receiving thread; and communicating with the corresponding target device based on the configured corresponding threads. Communication with a first target device includes: a first message sending thread sends an information sending message to a first communication thread, and the first communication thread sends information to the first target device based on the information sending message by calling an IB interface; and/or the first communication thread receives, by calling the IB interface, information sent by the first target device, generates an information receiving message corresponding to the received information, and sends it to a first message receiving thread. The first target device is one of the plurality of target devices, and the first communication thread, the first message sending thread, and the first message receiving thread are, respectively, the communication thread, message sending thread, and message receiving thread corresponding to the first target device.
According to another aspect of the present application, a communication system is provided, including: a thread configuration module, configured to create a corresponding thread for at least one target device among a plurality of target devices, where the threads created for a target device include a communication thread and a message processing thread, and the message processing thread includes a message sending thread and/or a message receiving thread; and a data communication module, configured to communicate with the corresponding target device based on the created corresponding threads. A first message sending thread sends an information sending message to a first communication thread, and the first communication thread sends information to the first target device based on the information sending message by calling an IB interface; and/or the first communication thread receives, by calling the IB interface, information sent by the first target device, generates an information receiving message corresponding to the received information, and sends it to a first message receiving thread. The first target device is one of the plurality of target devices, and the first communication thread, the first message sending thread, and the first message receiving thread are, respectively, the communication thread, message sending thread, and message receiving thread corresponding to the first target device.
According to still another aspect of the present application, an electronic device is provided, including a processor, a memory, an IB communication portion, and a communication bus, where the processor, the memory, and the communication portion communicate with one another through the communication bus, and the memory stores at least one executable instruction that causes the processor to perform operations corresponding to the communication method described above.
According to yet another aspect of the embodiments of the present application, a computer cluster is provided, including a plurality of the electronic devices described above and a switching device connected to each of the electronic devices, where each electronic device communicates with the other electronic devices through its own IB communication portion via the switching device.
The communication method and system, electronic device, and computer cluster of the present application use multiple threads for sending data to and receiving data from target devices, which can increase data transfer speed and make effective use of bandwidth.
Additional aspects and advantages of the embodiments of the present application are set forth in part in the following description, and in part will become apparent from the description or be learned through practice of the present application.
Brief description of the drawings
The accompanying drawings, which form part of the specification, illustrate embodiments of the present application and, together with the description, serve to explain its principles.
The present application can be understood more clearly from the following detailed description with reference to the accompanying drawings.
Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort:
FIG. 1 is a flowchart of an embodiment of a communication method according to the present application;
FIG. 2 is a flowchart of processing an information sending message according to an embodiment of the communication method of the present application;
FIG. 3 is a flowchart of processing an information receiving message according to an embodiment of the communication method of the present application;
FIG. 4 is a schematic diagram of the hierarchical structure design adopted by the present application;
FIG. 5 is a block diagram of an embodiment of a communication system according to the present application;
FIG. 6 is a schematic diagram of an embodiment of an electronic device according to the present application;
FIG. 7 is a schematic diagram of an embodiment of a computer cluster according to the present application.
Detailed description
Various exemplary embodiments of the present application are now described in detail with reference to the drawings. Note that, unless otherwise specified, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application.
Also note that, for ease of description, the parts shown in the drawings are not drawn to scale.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present application or its application or use.
Techniques, methods, and devices known to a person of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the specification.
Note that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be discussed further in subsequent drawings.
Embodiments of the present application can be applied to computer systems/servers, which can operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with computer systems/servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems.
A computer system/server can be described in the general context of computer-system-executable instructions, such as program modules, executed by a computer system. Generally, program modules may include routines, programs, object programs, components, logic, data structures, and the like, which perform particular tasks or implement particular abstract data types. The computer system/server can be implemented in a distributed cloud computing environment, in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media, including storage devices.
In the following, "first" is used only to distinguish items in the description and has no other special meaning.
The communication method of the present application uses multi-threading for data exchange with a plurality of target devices. Multi-threading may be applied only to sending data, only to receiving data, or to both. Corresponding threads are created for at least one, or each, of the plurality of target devices; for example, a communication thread and message processing threads are created for a given target device, where the message processing threads include a message sending thread and a message receiving thread. For example, if the first target device is any one of the plurality of target devices, the first communication thread, the first message sending thread, and the first message receiving thread are, respectively, the communication thread, message sending thread, and message receiving thread corresponding to the first target device.
FIG. 1 is a flowchart of an embodiment of the communication method. As shown in FIG. 1:
Step 101: The first message sending thread sends an information sending message to the first communication thread. For example, the information sending message may carry information, to be sent to a target server, about a product purchased by a user.
Step 102: The first communication thread sends information to the first target device based on the information sending message by calling the IB interface. In the present application, the information sent to and received from a target device includes data and/or control commands.
Step 103: The first communication thread receives, by calling the IB interface, data sent by the first target device, generates an information receiving message corresponding to the received data, and sends it to the first message receiving thread.
The first communication thread processes the received information sending message; for example, it extracts the information about the product purchased by the user from the message, generates a send-data instruction according to a preset rule, passes the instruction to the IB interface, and calls an IB interface function to send it to the target device.
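The send path of steps 101 and 102 can be sketched with two threads and two queues: the message sending thread posts information sending messages, and the communication thread converts each into a send-data instruction, hands it to the IB layer, and returns feedback. This is a minimal sketch, not the patented implementation: `fake_ib_send` is a hypothetical stand-in for a real IB verbs call, and UTF-8 encoding stands in for the unspecified "preset rule".

```python
import queue
import threading


def fake_ib_send(device_id: str, payload: bytes) -> bool:
    # Hypothetical stand-in for an IB interface call (e.g. posting a send
    # work request through IB verbs); here it "succeeds" for non-empty data.
    return len(payload) > 0


def communication_thread(device_id, to_comm, feedback, stop):
    """Per-device communication thread: turns each information sending
    message into a send-data instruction and hands it to the IB layer."""
    while True:
        msg = to_comm.get()
        if msg is stop:
            break
        instruction = msg["body"].encode("utf-8")  # assumed "preset rule"
        ok = fake_ib_send(device_id, instruction)
        feedback.put({"id": msg["id"], "ok": ok})   # feedback to the sender


def message_sending_thread(items, to_comm, feedback, results):
    """Per-device message sending thread: posts information sending
    messages, then collects one feedback record per message."""
    for i, item in enumerate(items):
        to_comm.put({"id": i, "body": item})
    for _ in items:
        fb = feedback.get()
        results[fb["id"]] = fb["ok"]


def run_send_path(device_id, items):
    to_comm, feedback = queue.Queue(), queue.Queue()
    stop, results = object(), {}
    comm = threading.Thread(target=communication_thread,
                            args=(device_id, to_comm, feedback, stop))
    sender = threading.Thread(target=message_sending_thread,
                              args=(items, to_comm, feedback, results))
    comm.start()
    sender.start()
    sender.join()        # all feedback has been received
    to_comm.put(stop)    # then shut the communication thread down
    comm.join()
    return results


if __name__ == "__main__":
    print(run_send_path("dev-1", ["order #1: book", ""]))  # {0: True, 1: False}
```

Because the sending thread only enqueues messages and reads feedback later, it never blocks on the transfer itself, which is the asynchrony the description attributes to separating message threads from the communication thread.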
Data can be passed between threads through global variables, a messaging mechanism, and so on. The information sending message may take various formats, including a message header and a message body: the header encapsulates the corresponding target device information together with the message sending thread and communication thread information, and the body encapsulates the data to be sent. The information receiving message may likewise take various formats, including a message header and a message body: the header encapsulates the corresponding target device information together with the message receiving thread and communication thread information, and the body encapsulates the data received from the target device.
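The header/body layout just described could be modeled as two small immutable records. The field names and the `send-<device>` / `comm-<device>` thread-naming scheme below are hypothetical; the source only says which kinds of information the header and body carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageHeader:
    target_device: str   # target device the message belongs to
    handler_thread: str  # message sending or message receiving thread
    comm_thread: str     # communication thread for that device
    direction: str       # "send" or "recv"


@dataclass(frozen=True)
class InfoMessage:
    header: MessageHeader
    body: bytes          # data to send, or data received from the device


def make_message(device: str, direction: str, payload: bytes) -> InfoMessage:
    if direction not in ("send", "recv"):
        raise ValueError("direction must be 'send' or 'recv'")
    # Hypothetical naming scheme for the per-device threads.
    header = MessageHeader(device, f"{direction}-{device}",
                           f"comm-{device}", direction)
    return InfoMessage(header, payload)


if __name__ == "__main__":
    msg = make_message("dev-1", "send", b"user 42 bought item 7")
    print(msg.header.handler_thread, msg.header.comm_thread)
```

Freezing the dataclasses keeps a message from being mutated after one thread has handed it to another, which avoids a class of races when messages cross thread boundaries.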
The method determines whether data is being sent to the first target device for the first time; if so, it creates the first message sending thread corresponding to the first target device, and if not, it uses the existing first message sending thread. Likewise, it determines whether information sent by the first target device is being received for the first time; if so, it creates the first message receiving thread, and if not, it uses the existing first message receiving thread.
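The create-on-first-use rule above amounts to a lock-protected get-or-create registry keyed by device and direction. The sketch below assumes a hypothetical `HandlerRegistry` class whose worker threads simply record what they receive; the lock matters because two callers reaching the "first time" check concurrently must not both create a thread.

```python
import queue
import threading


class HandlerRegistry:
    """Creates the message sending or receiving thread for a target device
    the first time that direction is used, and reuses it afterwards."""

    def __init__(self):
        self._lock = threading.Lock()
        self._threads = {}   # (device_id, direction) -> (thread, inbox)
        self.handled = {}    # (device_id, direction) -> items processed
        self.created = 0     # number of threads actually created

    def _worker(self, key, inbox):
        # None is the shutdown sentinel.
        while (item := inbox.get()) is not None:
            self.handled[key].append(item)

    def inbox_for(self, device_id, direction):
        key = (device_id, direction)
        with self._lock:  # prevents two callers creating the same thread
            if key not in self._threads:          # first use: create
                inbox = queue.Queue()
                self.handled[key] = []
                t = threading.Thread(target=self._worker, args=(key, inbox))
                t.start()
                self._threads[key] = (t, inbox)
                self.created += 1
            return self._threads[key][1]          # otherwise: reuse

    def shutdown(self):
        with self._lock:
            for _, inbox in self._threads.values():
                inbox.put(None)
            for t, _ in self._threads.values():
                t.join()


if __name__ == "__main__":
    reg = HandlerRegistry()
    reg.inbox_for("dev-1", "send").put("msg-a")
    reg.inbox_for("dev-1", "send").put("msg-b")  # reuses the sending thread
    reg.inbox_for("dev-1", "recv").put("msg-c")  # first receive: new thread
    reg.shutdown()
    print(reg.created, reg.handled)
```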
A communication thread is created for each of the plurality of target devices; communication threads receive and send information, as well as various control commands, feedback information, and the like. The IB interface may include, but is not limited to, IB VERBS, SDP (Sockets Direct Protocol), and IPoIB (Internet Protocol over InfiniBand). For example, by calling the IB VERBS interface or driver directly and sending control instructions straight to it during IB transmission, information can be transferred using RDMA.
The method determines whether the first communication thread corresponding to the first target device has been created; if so, the first communication thread is used directly to send and receive information, and if not, the first communication thread is created. When a communication thread is created, it is assigned the rank number (identification number) that it handles; the rank number identifies the communication thread. For example, the first communication thread, by calling the IB interface, establishes a socket connection with the first target device and exchanges RDMA communication environment information, such as buffer addresses and counts, over that connection; the information itself is then transferred through the RDMA mechanism. The first communication thread receives feedback information from the first target device, for example that data was sent successfully or failed, and passes it to the first message sending thread.
The first communication thread receives, by calling the IB interface, information sent by the first target device and passes the corresponding information receiving message to the first message receiving thread, which processes it; for example, the first message receiving thread extracts data from the information receiving message and writes the data to a database. After processing the information receiving message, the first message receiving thread passes feedback information to the first communication thread, which sends it to the target device.
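The receive path can be sketched the same way. In this minimal sketch a queue simulates data arriving over the IB interface, a dict stands in for the database, and a list collects the feedback "sent back" to the device; the `key=value` payload format is a hypothetical example. For simplicity the communication thread waits for each feedback record before reading the next item, which a real implementation need not do.

```python
import queue
import threading


def communication_thread(from_device, to_receiver, from_receiver, to_device):
    """Wraps each chunk of data from the target device in an information
    receiving message, waits for the receiving thread's feedback, and sends
    that feedback back to the device."""
    seq = 0
    while (data := from_device.get()) is not None:  # None: connection closed
        to_receiver.put({"seq": seq, "body": data})
        to_device.append(from_receiver.get())        # forward feedback
        seq += 1
    to_receiver.put(None)                            # stop the receiver


def message_receiving_thread(to_receiver, from_receiver, database):
    """Extracts data from each information receiving message and stores it."""
    while (msg := to_receiver.get()) is not None:
        key, _, value = msg["body"].partition("=")
        database[key] = value                        # "write to a database"
        from_receiver.put({"seq": msg["seq"], "status": "ok"})


def run_receive_path(incoming):
    from_device, to_receiver, from_receiver = (
        queue.Queue(), queue.Queue(), queue.Queue())
    to_device, database = [], {}
    threads = [
        threading.Thread(target=communication_thread,
                         args=(from_device, to_receiver, from_receiver, to_device)),
        threading.Thread(target=message_receiving_thread,
                         args=(to_receiver, from_receiver, database)),
    ]
    for t in threads:
        t.start()
    for item in incoming:  # simulate data arriving over the IB interface
        from_device.put(item)
    from_device.put(None)
    for t in threads:
        t.join()
    return database, to_device


if __name__ == "__main__":
    db, fb = run_receive_path(["user=alice", "item=book"])
    print(db)  # {'user': 'alice', 'item': 'book'}
    print(fb)  # [{'seq': 0, 'status': 'ok'}, {'seq': 1, 'status': 'ok'}]
```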
在上述实施例中的通信方法,对于目标设备建立有对应的通信线程,以及消息发送线程和消息接收线程,通信线程用于调用IB接口进行数据通信,消息发送线程和消息接收线程能够实现对于发送和接收信息的异步操作,能够提高数据传输的效率,有效地利用带宽。In the communication method in the above embodiment, a corresponding communication thread is established for the target device, and a message sending thread and a message receiving thread are used, and the communication thread is used to invoke the IB interface for data communication, and the message sending thread and the message receiving thread can implement the sending. And the asynchronous operation of receiving information can improve the efficiency of data transmission and effectively utilize the bandwidth.
In one embodiment, when the first message sending thread receives an information sending message, it determines whether the first communication thread is currently transmitting data. If so, it places the information sending message into the task pool; if not, it passes the information sending message to the first communication thread. Likewise, when the first communication thread receives data sent by the first target device, it determines whether the first message receiving thread is currently processing information. If so, it places the information receiving message into the task pool; if not, it passes the information receiving message to the first message receiving thread.
When data for multiple target devices is received and sent concurrently by multiple threads, a task pool (POOL) is used for caching and scheduling. The task pool is global: it caches the information sending messages and information receiving messages for all target devices. It may be implemented as an array of linked lists, with locks around the critical steps to make it thread-safe and prevent dirty reads and writes. Receive and send operations that cannot be performed immediately are placed into the POOL and cached there until they can be processed.
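A minimal sketch of such a pool, assuming Python: a dictionary of `collections.deque` objects stands in for the array-of-linked-lists structure, and a single `threading.Lock` stands in for the locks on key steps. The class name `TaskPool` and the `"send"`/`"recv"` kind labels are illustrative assumptions, not names from the source.

```python
import threading
from collections import deque


class TaskPool:
    """Hypothetical global task pool: one FIFO per (device_id, kind), guarded by one lock.

    kind is "send" for information sending messages and "recv" for
    information receiving messages.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._queues = {}  # (device_id, kind) -> deque of pending messages

    def put(self, device_id, kind, message):
        """Cache a message that could not be handed over immediately."""
        with self._lock:
            self._queues.setdefault((device_id, kind), deque()).append(message)

    def pop(self, device_id, kind):
        """Remove and return the oldest message for (device_id, kind), or None."""
        with self._lock:
            queue = self._queues.get((device_id, kind))
            return queue.popleft() if queue else None

    def pending(self):
        """Return the (device_id, kind) keys that currently hold messages."""
        with self._lock:
            return [key for key, queue in self._queues.items() if queue]


pool = TaskPool()
pool.put("node-a", "send", "msg-1")
pool.put("node-a", "send", "msg-2")
pool.put("node-b", "recv", "data-1")
print(pool.pop("node-a", "send"))  # msg-1
print(pool.pending())              # [('node-a', 'send'), ('node-b', 'recv')]
```

Keeping a separate FIFO per device and direction preserves the order in which each device's messages entered the pool, while one lock keeps every read and write of the shared structure consistent.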
A global task processing thread serving all target devices is started to poll the task pool. When the global task processing thread finds that the task pool holds an information sending message for a given target device, it checks whether the communication thread for that target device is idle. If so, it removes the information sending message from the task pool and sends it to that communication thread.
When the global task processing thread finds that the task pool holds an information receiving message for a given target device, it checks whether the message receiving thread for that target device is idle. If so, it removes the information receiving message from the task pool and sends it to that message receiving thread. The global task processing thread can identify the relevant message receiving thread and communication thread by parsing the headers of the information receiving messages and information sending messages.
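The polling behavior can be sketched as a single pass that the global task processing thread repeats. This is a simplified, assumed model: the `(device_id, kind)` keys stand in for information parsed from message headers, `Worker.busy` stands in for however idleness is actually tracked, and `poll_once`, `global_task_loop`, and `Worker` are hypothetical names. In a real multithreaded system the busy flags would also need synchronization.

```python
import threading
from collections import deque


class Worker:
    """Hypothetical stand-in for a communication thread or a message receiving thread."""

    def __init__(self, name):
        self.name = name
        self.busy = False
        self.inbox = []

    def deliver(self, message):
        self.inbox.append(message)
        self.busy = True      # busy until it reports that it has finished

    def finish(self):
        self.busy = False


def poll_once(pool, lock, workers):
    """One polling pass of the global task processing thread.

    pool maps (device_id, kind) -> deque of cached messages, and workers maps
    the same key to the thread that handles it. At most one message per key
    is dispatched per pass, and only if the handling thread is idle.
    """
    dispatched = 0
    with lock:
        for key, queue in pool.items():
            worker = workers[key]
            if queue and not worker.busy:
                worker.deliver(queue.popleft())
                dispatched += 1
    return dispatched


def global_task_loop(pool, lock, workers, stop_event, interval=0.001):
    """Poll the pool repeatedly until stop_event is set."""
    while not stop_event.is_set():
        poll_once(pool, lock, workers)
        stop_event.wait(interval)


lock = threading.Lock()
pool = {("node-a", "send"): deque(["s1", "s2"]), ("node-a", "recv"): deque(["r1"])}
comm_a, recv_a = Worker("comm-a"), Worker("recv-a")
recv_a.busy = True                     # still processing earlier data
workers = {("node-a", "send"): comm_a, ("node-a", "recv"): recv_a}
print(poll_once(pool, lock, workers))  # 1: s1 dispatched, r1 stays cached
comm_a.finish()
recv_a.finish()
print(poll_once(pool, lock, workers))  # 2: s2 and r1 dispatched
print(comm_a.inbox, recv_a.inbox)      # ['s1', 's2'] ['r1']
```

In practice `global_task_loop` would run on its own thread; a condition variable signaled by finishing workers could replace the fixed polling interval.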
The following example shows how the task pool handles information sending messages and information receiving messages for the first target device. FIG. 2 is a flowchart of processing an information sending message according to an embodiment of the communication method of the present application. As shown in FIG. 2:
Step 201: The first message sending thread receives an information sending message for the first target device.
Step 202: Determine whether the first communication thread is transmitting data. If yes, go to step 203; if no, go to step 204.
Step 203: The first message sending thread places the information sending message into the task pool. Go to step 205.
Step 204: The first message sending thread sends the information sending message to the first communication thread corresponding to the first target device. Go to step 207.
Step 205: The global task processing thread determines whether the first communication thread has finished transmitting data (or a control command), that is, whether the first communication thread is idle. If yes, go to step 206; if no, repeat step 205.
Step 206: The global task processing thread removes the information sending message from the task pool and sends it to the first communication thread.
Step 207: The first communication thread sends data to the first target device according to the information sending message.
FIG. 3 is a flowchart of processing an information receiving message according to an embodiment of the communication method of the present application. As shown in FIG. 3:
Step 301: The first communication thread receives data sent by the first target device and generates an information receiving message.
Step 302: Determine whether the first message receiving thread is processing data (or a control command). If yes, go to step 303; if no, go to step 304.
Step 303: The first communication thread places the information receiving message into the task pool. Go to step 305.
Step 304: The first communication thread sends the information receiving message to the first message receiving thread corresponding to the first target device. Go to step 307.
Step 305: The global task processing thread determines whether the first message receiving thread has finished processing data, that is, whether it is idle. If yes, go to step 306; if no, repeat step 305.
Step 306: The global task processing thread removes the information receiving message from the task pool and sends it to the first message receiving thread. Go to step 307.
Step 307: The first message receiving thread processes the information receiving message.
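Because FIG. 2 and FIG. 3 have the same shape, one small function can trace the step numbers that either flowchart visits. This is only an illustrative sketch: `trace_flow` is a hypothetical name, the busy/idle answers are supplied by the caller rather than observed from real threads, and it assumes step 206 continues to step 207 in the same way that step 306 continues to step 307.

```python
def trace_flow(first, busy_answers):
    """Return the step numbers visited in FIG. 2 (first=201) or FIG. 3 (first=301).

    Step offsets from `first`:
      +0 message arrives           +1 is the handling thread busy?
      +2 park it in the task pool  +3 hand it over directly
      +4 global thread re-checks   +5 global thread hands it over
      +6 message is sent (FIG. 2) or processed (FIG. 3)
    busy_answers yields True while the handling thread (the communication
    thread in FIG. 2, the message receiving thread in FIG. 3) is busy.
    """
    steps = [first, first + 1]
    if next(busy_answers):
        steps.append(first + 2)
        while True:
            steps.append(first + 4)            # step 205 / 305
            if not next(busy_answers):
                break                          # idle now: leave the polling loop
        steps.append(first + 5)
    else:
        steps.append(first + 3)
    steps.append(first + 6)
    return steps


print(trace_flow(201, iter([False])))              # [201, 202, 204, 207]
print(trace_flow(301, iter([True, True, False])))  # [301, 302, 303, 305, 305, 306, 307]
```

Both paths end at the same final step; the only difference is whether the message detours through the task pool and the global task processing thread.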
A flag can be set for each information sending message and information receiving message to indicate the status and result of the send or receive operation; the flag values can be defined according to specific needs. For example, a status flag corresponding to each information sending message and information receiving message indicates its processing status, such as 0 = initial, 1 = success, and -1 = failure.
The message sending thread and the message receiving thread assign values to the flags based on the feedback information they receive. A locking mechanism ensures that each transition of a flag from one state to another is performed by only one thread, so no other thread can move the same state to a different one. Because the flag can be queried directly, the user can obtain the result of an operation without calling another function, which saves the resources and time spent on function calls and returns, and also makes it easier to control the timing of asynchronous transfers.
For example, the first message sending thread receives feedback information from the first communication thread regarding the sent data and assigns a value to the status flag of the information sending message accordingly. The first message receiving thread assigns a value to the status flag of the information receiving message according to the result of processing it. The result of sending or receiving information can thus be read from the flag, and the next transmission can be started only after the flag indicates that the current transmission succeeded.
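One way to realize a lock-protected flag with a one-time transition is a `threading.Condition`, sketched below in Python. The class `StatusFlag` is hypothetical, and allowing only a single transition out of the initial state is an assumption based on the statement that only one thread performs a given transition. The `value` property supports the non-blocking query described above, and `wait_done` supports waiting for success before starting the next transmission.

```python
import threading

INITIAL, SUCCESS, FAILED = 0, 1, -1


class StatusFlag:
    """Hypothetical per-message status flag: 0 = initial, 1 = success, -1 = failure."""

    def __init__(self):
        self._cond = threading.Condition()
        self._value = INITIAL

    def try_set(self, new_value):
        """Atomically move INITIAL -> new_value; return False if the outcome is already set."""
        if new_value not in (SUCCESS, FAILED):
            raise ValueError("a flag can only move to SUCCESS or FAILED")
        with self._cond:
            if self._value != INITIAL:
                return False           # another thread already performed this transition
            self._value = new_value
            self._cond.notify_all()    # wake anyone waiting for the result
            return True

    @property
    def value(self):
        """Query the current state without blocking."""
        with self._cond:
            return self._value

    def wait_done(self, timeout=None):
        """Block until the flag leaves INITIAL or the timeout expires; return its value."""
        with self._cond:
            self._cond.wait_for(lambda: self._value != INITIAL, timeout)
            return self._value


flag = StatusFlag()
# Simulated feedback from the communication thread, arriving on another thread.
feedback = threading.Thread(target=flag.try_set, args=(SUCCESS,))
feedback.start()
if flag.wait_done(timeout=1.0) == SUCCESS:
    print("start next transmission")
feedback.join()
print(flag.try_set(FAILED))  # False: the outcome was already decided
```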
The global task processing thread removes the information sending messages and information receiving messages corresponding to the first target device from the task pool and sends them to the first communication thread and the first message receiving thread, respectively, for processing. It then checks whether the status flag of the information sending message and/or information receiving message taken from the task pool indicates success. If so, the global task processing thread, following the order in which the messages entered the task pool, removes the next information sending message and/or information receiving message for the first target device and sends it to the first communication thread and/or the first message receiving thread for processing. If not, it reports a processing exception.
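The ordering rule can be sketched as follows, assuming each handler call returns the final status flag synchronously (a real global task processing thread would instead wait on the flag). The function `dispatch_in_order` and the `ProcessingError` exception are hypothetical; raising an exception is one assumed way to report a processing exception.

```python
from collections import deque

SUCCESS, FAILED = 1, -1


class ProcessingError(RuntimeError):
    """Reported when a dispatched message does not end in SUCCESS."""


def dispatch_in_order(queue, handle):
    """Dispatch one target device's cached messages strictly in FIFO order.

    queue: deque of messages in the order they entered the task pool.
    handle: processes one message and returns its status flag value.
    The next message is taken only after the previous one reports SUCCESS;
    any other status stops dispatching and raises ProcessingError, leaving
    the remaining messages in the queue.
    """
    completed = []
    while queue:
        message = queue.popleft()
        status = handle(message)
        if status != SUCCESS:
            raise ProcessingError(f"message {message!r} ended with status {status}")
        completed.append(message)
    return completed


print(dispatch_in_order(deque(["m1", "m2"]), lambda m: SUCCESS))  # ['m1', 'm2']
pending = deque(["m1", "bad", "m3"])
try:
    dispatch_in_order(pending, lambda m: FAILED if m == "bad" else SUCCESS)
except ProcessingError as exc:
    print(exc, list(pending))  # message 'bad' ended with status -1 ['m3']
```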
In one embodiment, information sent by the first target device is received, the information including a check code and a control command. According to a received storage control command, the control command sent by the first target device is stored in a memory block. While the control command is being stored, a new check code for it is computed progressively as the control command is received.
The new check code is compared with the received check code. If they match, verification succeeds: information reception from the first target device is complete, the first target device's transmission is determined to have finished, and processing of the control command begins. After reception from the first target device is confirmed complete, the received check code may be invalidated, for example by filling it with zeros or overwriting it with a random number. The new check code is computed with the same algorithm the first target device used to generate its check code; for example, the CRC32 cyclic redundancy check algorithm may be used.
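A minimal Python sketch of incremental verification uses `zlib.crc32`, whose optional second argument continues a running checksum, so the check code can be updated as each chunk of the control command arrives. The bytearray memory block, the 4-byte little-endian check-code header, the chunking, and the function name `receive_and_verify` are all assumptions for illustration; the source specifies only that CRC32 may be used and that the received check code may be zero-filled afterwards.

```python
import zlib


def receive_and_verify(chunks, expected_crc, memory_block):
    """Store a control command chunk by chunk while updating its CRC32.

    chunks: iterable of bytes objects in arrival order.
    expected_crc: check code sent by the target device (unsigned 32-bit).
    memory_block: preallocated bytearray that receives the command.
    Returns True when the recomputed check code matches expected_crc.
    """
    crc = 0
    offset = 0
    for chunk in chunks:
        memory_block[offset:offset + len(chunk)] = chunk  # copy into the memory block
        offset += len(chunk)
        crc = zlib.crc32(chunk, crc)                      # extend the running CRC32
    return crc == expected_crc


command = b"START job=42; shard=7"
header = bytearray(zlib.crc32(command).to_bytes(4, "little"))  # check code from the sender
memory_block = bytearray(len(command))
arrivals = [command[:8], command[8:15], command[15:]]

if receive_and_verify(arrivals, int.from_bytes(header, "little"), memory_block):
    header[:] = bytes(len(header))  # invalidate the received check code by zero-filling it
    print(bytes(memory_block), header)  # b'START job=42; shard=7' bytearray(b'\x00\x00\x00\x00')
```

Zero-filling the stored check code after a successful match means a stale value left in the buffer cannot accidentally validate the next command written to the same location.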
Verifying the control command against the check code in the received information to determine whether transmission is complete does not depend on the implementation logic of the underlying library and shortens the processing flow, improving efficiency and speed. In actual testing, this verification method proved more stable and faster when receiving information.
In one embodiment, when an operation exception event is found by polling through the IB interface, it is determined whether an exception handling function corresponding to the exception event has been registered. For example, if a pointer to an exception handling function instance can be obtained, the corresponding exception handling function is deemed registered; otherwise it is deemed not registered. If an exception handling function is registered, it is called back automatically and handles the exception.
Exception events are handled whether or not a corresponding exception handling function has been registered. Transmission exceptions include not only the various exceptions raised when calling IB interface functions, but also failures of the scheduling thread, deadlocks in the task pool, and so on. Each kind of transmission exception is handled accordingly, including retransmitting command data, terminating and disconnecting links, and stopping and cleaning up threads. For example, when a serious transmission error occurs, the communication thread is stopped and the connection is closed; while the connection is down, the arrival of a new send task triggers reconnection and restarts the communication thread.
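The ordering of registered and built-in handling, together with the teardown-and-reconnect behavior, can be sketched as a small state object. This is an assumed model with no real IB calls: `Connection`, the event names `"timeout"` and `"link_down"`, and the `serious` parameter are hypothetical, and a dictionary lookup stands in for checking whether a handler pointer can be obtained.

```python
class Connection:
    """Hypothetical per-device connection state; no real IB calls are made."""

    def __init__(self):
        self.connected = True
        self.comm_thread_running = True
        self.log = []

    def handle_exception(self, event, serious, handlers):
        """Run the registered handler (if any), then the built-in handling."""
        handler = handlers.get(event)          # None means nothing is registered
        if handler is not None:
            handler(event)
        if serious:
            self.comm_thread_running = False   # stop and clean up the communication thread
            self.connected = False             # terminate the link
            self.log.append(f"disconnected after {event}")
        else:
            self.log.append(f"retransmit after {event}")

    def submit_send(self, task):
        """A new send task on a dropped link first rebuilds the connection."""
        if not self.connected:
            self.connected = True
            self.comm_thread_running = True
            self.log.append("reconnected")
        self.log.append(f"send {task}")


seen = []
conn = Connection()
handlers = {"link_down": seen.append}  # a handler is registered for one event only
conn.handle_exception("timeout", serious=False, handlers=handlers)
conn.handle_exception("link_down", serious=True, handlers=handlers)
conn.submit_send("chunk-1")
print(seen)      # ['link_down']
print(conn.log)  # ['retransmit after timeout', 'disconnected after link_down', 'reconnected', 'send chunk-1']
```

Reconnecting lazily on the next send, rather than immediately after the error, avoids rebuilding links to devices that have nothing more to receive.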
Callback functions can also be provided for events. A callback function is a function invoked through a function pointer. Most events have a basic handler; if a registered callback is not null, the user-defined callback is also invoked when the corresponding event occurs, so users can easily watch for the events they care about and attach their own operations. For example, a callback can be set for receiving control commands: when an event is identified as a control-command-received event, the corresponding callback is invoked and can process the received control command directly.
In one embodiment, the communication method of the present application can be implemented in several ways. As shown in FIG. 4, a multi-layer design may be used: a COMMON layer is built on top of the bottom IB VERBS interface function layer, and a SIMA main communication layer is built on top of the COMMON layer. Users can choose the SIMA main communication layer, or use only the COMMON layer to customize specific operations as needed. The COMMON layer wraps the IB VERBS interface functions and provides basic event-handling and exception-handling logic; RDMA operations can be performed directly through the COMMON layer.
The COMMON layer provides callback support. It reserves a storage location for an operation-error function pointer, which holds the function to execute when an IB operation error occurs; at the SIMA main communication layer, this pointer is registered to the SIMA main communication layer's error callback handler (that is, it is set to point to that function). Based on the send and receive results obtained from the IB VERBS interface functions, the COMMON layer provides its own basic handlers and also provides locations for storing user-defined function pointers. When the COMMON layer detects that these locations are non-null, it calls the user-defined functions first when the corresponding events occur, and then performs its own basic handling.
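Python has no function pointers, but attributes holding callables play the same role. The sketch below assumes that arrangement: `CommonLayer` exposes nullable hook slots and calls a non-null hook before its own basic handling, and a hypothetical `SimaLayer` registers its error callback by pointing the slot at one of its methods. All class, attribute, and message names are illustrative assumptions.

```python
from typing import Callable, List, Optional


class CommonLayer:
    """Hypothetical COMMON layer: built-in handlers plus nullable user hook slots."""

    def __init__(self):
        self.on_error: Optional[Callable[[str], None]] = None    # operation-error slot
        self.on_recv: Optional[Callable[[bytes], None]] = None   # received-data slot
        self.trace: List[str] = []

    def report_error(self, reason: str) -> None:
        if self.on_error is not None:      # non-null slot: the registered function runs first
            self.on_error(reason)
        self.trace.append(f"common: basic error handling ({reason})")

    def report_recv(self, payload: bytes) -> None:
        if self.on_recv is not None:
            self.on_recv(payload)
        self.trace.append(f"common: stored {len(payload)} bytes")


class SimaLayer:
    """Hypothetical upper layer that registers its error callback in the COMMON slot."""

    def __init__(self, common: CommonLayer):
        self.common = common
        common.on_error = self._error_callback  # "register" = point the slot at this method

    def _error_callback(self, reason: str) -> None:
        self.common.trace.append(f"sima: notify caller ({reason})")


common = CommonLayer()
sima = SimaLayer(common)
common.on_recv = lambda p: common.trace.append(f"user: saw {p!r}")
common.report_recv(b"cmd")
common.report_error("send failed")
print("\n".join(common.trace))
```

The printed trace shows the user hook and the SIMA callback each running before the COMMON layer's own handling of the same event.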
The SIMA main communication layer is compatible with the MPI (Message Passing Interface) launch method and can run on mainstream platforms and under mainstream cluster management software. When a send fails, the SIMA main communication layer does not retry; instead, it handles the failure through a callback function or reports it directly to the caller, who decides whether to resend. Both sending and receiving are asynchronous and do not block the caller.
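Non-blocking sends with caller-driven resending can be sketched with `concurrent.futures`. In this assumed design, `AsyncSender` (a hypothetical name) returns a `Future` immediately, reports failures through an optional callback, and never retries on its own; `flaky_transport` simulates one transient failure. MPI-compatible launching is not modeled.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class AsyncSender:
    """Hypothetical non-blocking sender that never retries on its own."""

    def __init__(self, transport, on_failure=None):
        self._transport = transport      # callable(payload); raises on failure
        self._on_failure = on_failure    # optional callback(payload, exception)
        self._pool = ThreadPoolExecutor(max_workers=1)

    def send(self, payload) -> Future:
        """Queue payload and return immediately; the Future carries the outcome."""
        future = self._pool.submit(self._transport, payload)
        if self._on_failure is not None:
            def report(f, payload=payload):
                if not f.cancelled() and f.exception() is not None:
                    self._on_failure(payload, f.exception())  # report only; no retry
            future.add_done_callback(report)
        return future

    def close(self):
        """Wait for queued sends to finish and stop the worker thread."""
        self._pool.shutdown(wait=True)


attempts, failures = [], []


def flaky_transport(payload):
    """Fails on the first call only, standing in for a transient send error."""
    attempts.append(payload)
    if len(attempts) == 1:
        raise ConnectionError("link busy")


sender = AsyncSender(flaky_transport, on_failure=lambda p, e: failures.append((p, str(e))))
future = sender.send(b"block-0")      # returns at once; the caller is not blocked
if future.exception() is not None:    # the caller inspects the outcome...
    future = sender.send(b"block-0")  # ...and decides to resend
print(future.exception())             # None
sender.close()                        # joins the worker, so all callbacks have run
print(failures, len(attempts))        # [(b'block-0', 'link busy')] 2
```

Leaving the retry decision to the caller keeps the communication layer simple and lets the application decide whether a failed block is still worth sending.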
The communication method provided by the above embodiments uses the IB VERBS interface or driver directly at the bottom layer and handles data sending and receiving for each target device with multiple asynchronous threads. It supports unequal numbers of receive and send operations, which increases information transmission speed and makes effective use of bandwidth.
In one embodiment, as shown in FIG. 5, the present application provides a communication system 50, including a thread configuration module 51, a data communication module 52, a processing status setting module 53, and an information verification module 54.
The thread configuration module 51 creates a corresponding set of threads for at least one of the multiple target devices. The threads created for a target device include a communication thread and message processing threads, the latter including a message sending thread and/or a message receiving thread. The data communication module 52 communicates with each target device using the threads created for it.
The first message sending thread sends an information sending message to the first communication thread, and the first communication thread, by calling the IB interface, sends information to the first target device according to that message. The first communication thread also receives information sent by the first target device by calling the IB interface, generates an information receiving message corresponding to the received information, and sends it to the first message receiving thread. The first target device is one of the multiple target devices, and the first communication thread, first message sending thread, and first message receiving thread are, respectively, the communication thread, message sending thread, and message receiving thread corresponding to the first target device.
In response to determining that the first communication thread is currently busy, the first message sending thread places the information sending message destined for the first communication thread into the task pool. In response to determining that the first message receiving thread is currently busy, the first communication thread places the information receiving message destined for the first message receiving thread into the task pool. The task pool caches information sending messages and/or information receiving messages for the multiple target devices.
The thread configuration module 51 creates a global task processing thread serving the multiple target devices. In response to determining that the first communication thread is currently idle, the global task processing thread removes an information sending message from the task pool and sends it to the first communication thread. In response to determining that the first message receiving thread is currently idle, the global task processing thread removes an information receiving message from the task pool and sends it to the first message receiving thread.
The processing status setting module 53 sets a status flag for each information sending message and each information receiving message. The first message sending thread receives feedback information from the first communication thread regarding the sent information and assigns a value to the status flag of the information sending message accordingly. The first message receiving thread assigns a value to the status flag of the information receiving message according to the result of processing it.
In response to the status flag of the information sending message and/or information receiving message for the first target device taken from the task pool indicating success, the global task processing thread, following the order in which messages entered the task pool, removes the next information sending message and/or information receiving message for the first target device and sends it to the first communication thread and/or the first message receiving thread.
In response to receiving information (for example, information including a check code and a control command), the information verification module 54 determines a new check code according to the progress of receiving the control command into the memory block used to store it, compares the new check code with the received check code, and, when verification succeeds, determines that information reception from the first target device is complete. In response to that determination, the information verification module 54 invalidates the received check code.
The thread configuration module 51 creates the first message sending thread in response to determining that the current transmission is the first transmission of information to the first target device; creates the first communication thread in response to determining that no first communication thread for the first target device currently exists; and creates the first message receiving thread in response to determining that the current reception is the first reception of information from the first target device. When an operation exception event is received through the IB interface, the first communication thread calls back the exception handling function in response to determining that one is registered for that exception event.
An embodiment of the present application further provides an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, or a server. FIG. 6 is a schematic structural diagram of an electronic device 600 suitable for implementing a terminal device or server according to an embodiment of the present application. As shown in FIG. 6, the computer system 600 includes one or more processors, a communication unit, and so on. The one or more processors are, for example, one or more central processing units (CPUs) 601 and/or one or more graphics processing units (GPUs) 613. The processors may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 602 or loaded from a storage portion 608 into a random access memory (RAM) 603. The communication unit 612 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card.
The processor may communicate with the read-only memory 602 and/or the random access memory 603 to execute executable instructions, is connected to the communication unit 612 through a bus 604, and communicates with other target devices through the communication unit 612, thereby performing the operations corresponding to any communication method provided by the embodiments of the present application. Examples include instructions for creating a corresponding set of threads for each of multiple target devices, where the threads created for any target device include a communication thread and message processing threads, and any message processing thread is a message sending thread and/or a message receiving thread; and instructions for communicating with each target device using its configured threads. Communication with the first target device includes: the first message sending thread sends an information sending message to the first communication thread, and the first communication thread, by calling the IB interface, sends data to the first target device according to that message; and/or the first communication thread receives data sent by the first target device by calling the IB interface, generates an information receiving message corresponding to the received data, and sends it to the first message receiving thread. The first target device is any one of the multiple target devices, and the first communication thread, first message sending thread, and first message receiving thread are, respectively, the communication thread, message sending thread, and message receiving thread corresponding to the first target device.
In addition, the RAM 603 may store various programs and data required for the operation of the device. The CPU 601, ROM 602, and RAM 603 are connected to one another through the bus 604. When the RAM 603 is present, the ROM 602 is an optional module. The RAM 603 stores executable instructions, or executable instructions are written into the ROM 602 at runtime; these instructions cause the processor 601 to perform the operations corresponding to the communication method described above. An input/output (I/O) interface 605 is also connected to the bus 604. The communication unit 612 may be integrated, or may be configured with multiple sub-modules (for example, multiple IB network cards) connected to the bus.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including, for example, a cathode ray tube (CRT) or liquid crystal display (LCD) and a speaker; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing over a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage portion 608 as needed.
It should be noted that the architecture shown in FIG. 6 is only one optional implementation. In practice, the number and types of the components in FIG. 6 may be selected, reduced, increased, or replaced according to actual needs. Different functional components may also be arranged separately or in an integrated manner; for example, the GPU and CPU may be separate or the GPU may be integrated on the CPU, and the communication unit may be separate or integrated on the CPU or GPU. All of these alternative implementations fall within the scope of protection disclosed in the present application.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium. The computer program contains program code for performing the methods shown in the flowcharts, and the program code may include instructions corresponding to the steps of the communication method provided by embodiments of the present application, for example: an instruction to receive an information sending message for a target device; an instruction to pass the information sending message to the sending thread corresponding to the target device; and instructions whereby the sending thread, according to the information sending message, passes a send-data instruction to the underlying communication thread corresponding to the target device, and the underlying communication thread sends data to the target device by calling the IB interface and passes feedback information back to the sending thread. In such an embodiment, the computer program may be downloaded and installed from a network through the communication unit 612, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the method of the present application are performed.
An embodiment of the present application further provides a computer cluster. As shown in FIG. 7, the computer cluster includes an IB switch 71 and a plurality of the electronic devices described above, 72, 73, 74, ..., 75, 76, 77. The IB switch 71 may be connected to the electronic devices 72, 73, 74, ..., 75, 76, 77 via a bus, network cables, or the like. Each electronic device is provided with a communication part (for example, an IB network card), and the communication part of each electronic device communicates with the other electronic devices through the IB switch 71.
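Within such a cluster, each device keeps its own set of threads per peer. A minimal sketch of one node's bookkeeping, assuming one communication thread per peer that is created lazily on the first send to that peer (the condition stated in claim 8), might look like this. `PeerThreadRegistry` and its methods are hypothetical names, and delivery through the switch is simulated by appending to a list.

```python
import queue
import threading

class PeerThreadRegistry:
    """Hypothetical per-node registry: one communication thread per peer
    device, created lazily the first time this node sends to that peer."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inboxes = {}            # peer id -> queue feeding that peer's thread
        self._threads = {}            # peer id -> thread object
        self._delivered_lock = threading.Lock()
        self.delivered = []           # (peer, payload) pairs "sent" via the switch

    def _worker(self, peer, inbox):
        while True:
            payload = inbox.get()
            if payload is None:       # shutdown sentinel
                return
            with self._delivered_lock:
                self.delivered.append((peer, payload))

    def send(self, peer, payload):
        with self._lock:
            if peer not in self._inboxes:   # first send to this peer
                inbox = queue.Queue()
                worker = threading.Thread(target=self._worker, args=(peer, inbox))
                self._inboxes[peer] = inbox
                self._threads[peer] = worker
                worker.start()
            self._inboxes[peer].put(payload)

    def peers(self):
        with self._lock:
            return sorted(self._threads)

    def close(self):
        with self._lock:
            for inbox in self._inboxes.values():
                inbox.put(None)
            workers = list(self._threads.values())
        for worker in workers:
            worker.join()
```

Since each peer has a single FIFO worker, messages to the same peer keep their order, while different peers proceed independently.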
The communication method and system, electronic device, and computer cluster provided by the foregoing embodiments use multiple threads to send data to and receive data from each target device. Data is received and sent asynchronously, and identifier bits are provided to indicate the result or state of each operation, so that the numbers of receive and send operations need not match. This can increase the speed of data transmission and make effective use of bandwidth. A hierarchical design is adopted: RDMA is used as the data transmission means by calling the IB interface or driver, and custom callback support is provided so that users can define their own operations to implement special functions, allowing communication and computation to be combined more efficiently.
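The asynchronous hand-off and status identifier bits summarized above (detailed as the task pool, global task processing thread, and status flag bits of claims 2 to 5) can be modeled in a few lines. This is a single-threaded sketch under stated assumptions: thread state is reduced to plain fields, `global_dispatch` stands for one pass of the global task processing thread, and all class and function names are hypothetical. The text does not say how a failed message is handled, so the sketch simply leaves later messages for that device parked in the pool.

```python
from collections import deque
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    SUCCESS = "success"
    FAILED = "failed"

class Message:
    def __init__(self, device, payload):
        self.device = device
        self.payload = payload
        self.status = Status.PENDING   # status flag bit, set when the message is created

class CommThreadState:
    """Models only the dispatch state of one device's communication thread."""
    def __init__(self):
        self.current = None            # message currently handed to the thread

    def ready(self):
        # Idle if nothing was sent yet or the last message succeeded.
        return self.current is None or self.current.status is Status.SUCCESS

class TaskPool:
    """Per-device FIFO of pending messages, shared by all target devices."""
    def __init__(self):
        self._queues = {}

    def put(self, msg):
        self._queues.setdefault(msg.device, deque()).append(msg)

    def has_pending(self, device):
        return bool(self._queues.get(device))

    def pop_next(self, device):
        q = self._queues.get(device)
        return q.popleft() if q else None

def submit(pool, comm, msg):
    # Message sending thread: park the message if the communication thread is
    # busy (or earlier messages are still queued), otherwise hand it over.
    if not comm.ready() or pool.has_pending(msg.device):
        pool.put(msg)
    else:
        comm.current = msg

def on_feedback(comm, ok):
    # Feedback from the IB layer assigns the in-flight message's status flag.
    comm.current.status = Status.SUCCESS if ok else Status.FAILED

def global_dispatch(pool, comms):
    # One pass of the global task processing thread over all devices: for each
    # idle device, dispatch the next message in order of entry into the pool.
    dispatched = []
    for device, comm in comms.items():
        if comm.ready():
            nxt = pool.pop_next(device)
            if nxt is not None:
                comm.current = nxt
                dispatched.append(nxt.payload)
    return dispatched
```

Keeping one FIFO per device preserves each device's order of entry into the pool while letting different devices progress independently.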
The methods, apparatuses, and devices of the present application may be implemented in many ways. For example, they may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the method steps is for illustration only; unless specifically stated otherwise, the steps of the method of the present application are not limited to the order specifically described above. In addition, in some embodiments, the present application may also be implemented as programs recorded on a recording medium, the programs including machine-readable instructions for implementing the method according to the present application. Therefore, the present application also covers a recording medium storing a program for executing the method according to the present application.
The description of the present application is given for the purposes of illustration and description, and is not intended to be exhaustive or to limit the present application to the disclosed form. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to better explain the principles and practical application of the present application, and to enable those of ordinary skill in the art to understand the present application and thereby design various embodiments with various modifications suited to particular uses.

Claims (24)

  1. A communication method, comprising:
    creating corresponding threads for at least one target device among a plurality of target devices, wherein the created threads corresponding to a target device include a communication thread and a message processing thread, and the message processing thread includes a message sending thread and/or a message receiving thread;
    communicating with the corresponding target device based on the created corresponding threads, wherein a communication process with a first target device includes: a first message sending thread sends an information sending message to a first communication thread, and the first communication thread sends information to the first target device based on the information sending message by calling an InfiniBand (IB) interface; and/or the first communication thread receives information sent by the first target device by calling the IB interface, generates an information receiving message corresponding to the received information, and sends the information receiving message to a first message receiving thread;
    wherein the first target device is one of the plurality of target devices, and the first communication thread, the first message sending thread, and the first message receiving thread are respectively the communication thread, the message sending thread, and the message receiving thread corresponding to the first target device.
  2. The method according to claim 1, wherein the communication process with the first target device further includes:
    in response to determining that the first communication thread is currently busy, placing, by the first message sending thread, the information sending message to be sent to the first communication thread into a task pool; and/or
    in response to determining that the first message receiving thread is currently busy, placing, by the first communication thread, the information receiving message to be sent to the first message receiving thread into the task pool;
    wherein the task pool is used to cache pending information sending messages and/or information receiving messages corresponding to the plurality of target devices.
  3. The method according to claim 2, further comprising:
    in response to determining that the first communication thread is currently idle, retrieving, by a pre-created global task processing thread corresponding to the plurality of target devices, the information sending message corresponding to the first target device from the task pool and sending it to the first communication thread; and/or
    in response to determining that the first message receiving thread is currently idle, retrieving, by the pre-created global task processing thread corresponding to the plurality of target devices, the information receiving message corresponding to the first target device from the task pool and sending it to the first message receiving thread.
  4. The method according to claim 3, wherein the communication process with the first target device further includes:
    setting a status flag bit corresponding to the information sending message;
    receiving, by the first message sending thread, feedback information sent by the first communication thread corresponding to the sent information, and assigning a value to the status flag bit of the information sending message according to the feedback information; and/or
    setting a status flag bit corresponding to the information receiving message;
    assigning, by the first message receiving thread, a value to the status flag bit of the information receiving message according to the result of processing the information receiving message.
  5. The method according to claim 4, wherein the communication process with the first target device further includes:
    in response to the status flag bit of the information sending message and/or the information receiving message corresponding to the first target device retrieved from the task pool indicating success, retrieving, by the global task processing thread, the next information sending message and/or information receiving message corresponding to the first target device from the task pool, in the order in which the messages entered the task pool, and sending it to the first communication thread and/or the first message receiving thread.
  6. The method according to any one of claims 1 to 5, wherein the communication process with the first target device further includes:
    in response to the received information including a check code and a control command, determining a new check code according to the control-command receiving progress of a memory block used to store the control command, comparing the new check code with the received check code, and, when the check succeeds, determining that information reception from the first target device is complete.
  7. The method according to claim 6, wherein the communication process with the first target device further includes:
    in response to determining that information reception from the first target device is complete, invalidating the received check code.
  8. The method according to any one of claims 1 to 7, wherein creating the corresponding threads for the first target device includes:
    in response to determining that the current information sending is the first information sending to the first target device, creating the first message sending thread; and/or
    in response to determining that the first communication thread corresponding to the first target device has not yet been created, creating the first communication thread; and/or
    in response to determining that the current information reception is the first reception of information sent by the first target device, creating the first message receiving thread.
  9. The method according to any one of claims 1 to 8, further comprising:
    when an operation exception event is received through the IB interface, calling back, by the first communication thread, an exception handling function corresponding to the exception event in response to determining that the exception handling function has been registered.
  10. The method according to any one of claims 1 to 9, wherein:
    the IB interface includes an IB VERBS interface.
  11. A communication system, comprising:
    a thread configuration module, configured to create corresponding threads for at least one target device among a plurality of target devices, wherein the created threads corresponding to a target device include a communication thread and a message processing thread, and the message processing thread includes a message sending thread and/or a message receiving thread;
    a data communication module, configured to communicate with the corresponding target device based on the created corresponding threads;
    wherein a first message sending thread sends an information sending message to a first communication thread, and the first communication thread sends information to a first target device based on the information sending message by calling an IB interface; and/or the first communication thread receives information sent by the first target device by calling the IB interface, generates an information receiving message corresponding to the received information, and sends the information receiving message to a first message receiving thread;
    wherein the first target device is one of the plurality of target devices, and the first communication thread, the first message sending thread, and the first message receiving thread are respectively the communication thread, the message sending thread, and the message receiving thread corresponding to the first target device.
  12. The system according to claim 11, wherein:
    in response to determining that the first communication thread is currently busy, the first message sending thread places the information sending message to be sent to the first communication thread into a task pool; and/or
    in response to determining that the first message receiving thread is currently busy, the first communication thread places the information receiving message to be sent to the first message receiving thread into the task pool;
    wherein the task pool is used to cache pending information sending messages and/or information receiving messages corresponding to the plurality of target devices.
  13. The system according to claim 12, wherein:
    the thread configuration module is further configured to create a global task processing thread corresponding to the plurality of target devices;
    wherein, in response to determining that the first communication thread is currently idle, the global task processing thread retrieves the information sending message corresponding to the first target device from the task pool and sends it to the first communication thread; and/or, in response to determining that the first message receiving thread is currently idle, the global task processing thread retrieves the information receiving message corresponding to the first target device from the task pool and sends it to the first message receiving thread.
  14. The system according to claim 13, further comprising:
    a processing status setting module, configured to set a status flag bit corresponding to the information sending message, and/or to set a status flag bit corresponding to the information receiving message;
    wherein the first message sending thread receives feedback information sent by the first communication thread corresponding to the sent information, and assigns a value to the status flag bit of the information sending message according to the feedback information; and/or the first message receiving thread assigns a value to the status flag bit of the information receiving message according to the result of processing the information receiving message.
  15. The system according to claim 14, wherein:
    in response to the status flag bit of the information sending message and/or the information receiving message corresponding to the first target device retrieved from the task pool indicating success, the global task processing thread retrieves the next information sending message and/or information receiving message corresponding to the first target device from the task pool, in the order in which the messages entered the task pool, and sends it to the first communication thread and/or the first message receiving thread.
  16. The system according to any one of claims 11 to 15, further comprising:
    an information verification module, configured to, in response to the received information including a check code and a control command, determine a new check code according to the control-command receiving progress of a memory block used to store the control command, compare the new check code with the received check code, and, when the check succeeds, determine that information reception from the first target device is complete.
  17. The method according to claim 16, wherein:
    the information verification module is further configured to invalidate the received check code in response to determining that information reception from the first target device is complete.
  18. The system according to any one of claims 11 to 17, wherein:
    the thread configuration module is further configured to: create the first message sending thread in response to determining that the current information sending is the first information sending to the first target device; and/or create the first communication thread in response to determining that the first communication thread corresponding to the first target device has not yet been created; and/or create the first message receiving thread in response to determining that the current information reception is the first reception of information sent by the first target device.
  19. The system according to any one of claims 11 to 18, wherein:
    when an operation exception event is received through the IB interface, the first communication thread calls back an exception handling function corresponding to the exception event in response to determining that the exception handling function has been registered.
  20. The system according to any one of claims 11 to 19, wherein:
    the IB interface includes an IB VERBS interface.
  21. An electronic device, comprising a processor, a memory, an IB communication part, and a communication bus, wherein the processor, the memory, and the communication part communicate with one another via the communication bus;
    the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the communication method according to any one of claims 1 to 10.
  22. The electronic device according to claim 21, wherein the IB communication part includes an IB network card, and the processor communicates with other external electronic devices via the communication bus and through the IB network card.
  23. A computer cluster, comprising a plurality of the electronic devices according to claim 21 or 22 and a switching device connected to each of the electronic devices, wherein any electronic device communicates with the other electronic devices through its own IB communication part and via the switching device.
  24. A computer program, comprising computer instructions, wherein when the computer instructions are run in a processor of a device, the processor performs the steps of the communication method according to any one of claims 1 to 10.
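Claims 6 and 7 leave the check-code scheme open. One possible reading, sketched here purely as an assumption, is that the receiver recomputes a checksum over the portion of the control command that has landed in its memory block so far, declares reception complete only when that matches the sender's check code, and then invalidates the received code so a stale value cannot match again. CRC32 (Python's `zlib.crc32`) is a swapped-in choice of checksum, and `ControlBlock`, `sender_check_code`, and `try_complete` are hypothetical names; the real memory block would be filled by IB/RDMA writes rather than `land()`.

```python
import zlib

class ControlBlock:
    """Hypothetical receive-side memory block for one target device.

    `data` holds the control command as it lands, `progress` counts how many
    command bytes have arrived, and `check_code` is the sender's CRC32
    (or None once invalidated)."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.progress = 0
        self.check_code = None

    def land(self, chunk):
        # Simulates part of the command arriving in the block.
        end = self.progress + len(chunk)
        self.data[self.progress:end] = chunk
        self.progress = end

def sender_check_code(command):
    # Assumed checksum; the source does not name an algorithm.
    return zlib.crc32(command)

def try_complete(block):
    # Recompute a check code over what has been received so far and compare
    # it with the received one; on a match, reception is complete.
    if block.check_code is None:
        return None
    received = bytes(block.data[:block.progress])
    if zlib.crc32(received) != block.check_code:
        return None
    block.check_code = None    # invalidate so a stale code cannot re-match
    block.progress = 0         # block can be reused for the next command
    return received
```

Invalidating the code after completion matters when the same block is reused: without it, re-polling an unchanged block would report the old command as newly received.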
PCT/CN2017/108429 2016-10-28 2017-10-30 Communication method and system, electronic device and computer cluster WO2018077284A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/234,890 US10693816B2 (en) 2016-10-28 2018-12-28 Communication methods and systems, electronic devices, and computer clusters

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610967290.6A CN108011909B (en) 2016-10-28 2016-10-28 Communication method and system, electronic device and computer cluster
CN201610967290.6 2016-10-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/234,890 Continuation US10693816B2 (en) 2016-10-28 2018-12-28 Communication methods and systems, electronic devices, and computer clusters

Publications (1)

Publication Number Publication Date
WO2018077284A1 (en) 2018-05-03

Family

ID=62024391

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/108429 WO2018077284A1 (en) 2016-10-28 2017-10-30 Communication method and system, electronic device and computer cluster

Country Status (3)

Country Link
US (1) US10693816B2 (en)
CN (1) CN108011909B (en)
WO (1) WO2018077284A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110659140B (en) * 2018-06-30 2022-01-04 武汉斗鱼网络科技有限公司 Instruction execution method and related equipment
US11425183B2 (en) * 2019-06-07 2022-08-23 Eaton Intelligent Power Limited Multi-threaded data transfer to multiple remote devices using wireless hart protocol
CN110297722B (en) * 2019-06-28 2021-08-24 Oppo广东移动通信有限公司 Thread task communication method and related product
CN111899150A (en) * 2020-08-28 2020-11-06 Oppo广东移动通信有限公司 Data processing method and device, electronic equipment and storage medium
CN114764346A (en) * 2021-01-14 2022-07-19 华为技术有限公司 Data transmission method, system and computing node
CN115729688B (en) * 2022-11-23 2023-09-12 北京百度网讯科技有限公司 Multithreading scheduling method and device for processor, electronic equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101409715A (en) * 2008-10-22 2009-04-15 中国科学院计算技术研究所 Method and system for communication using InfiniBand network
CN102404212A (en) * 2011-11-17 2012-04-04 曙光信息产业(北京)有限公司 Cross-platform RDMA (Remote Direct Memory Access) communication method based on InfiniBand
US20120216216A1 (en) * 2011-02-21 2012-08-23 Universidade Da Coruna Method and middleware for efficient messaging on clusters of multi-core processors
CN104639596A (en) * 2013-11-08 2015-05-20 塔塔咨询服务有限公司 System and method for multiple sender support in low latency fifo messaging using rdma

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US8037224B2 (en) * 2002-10-08 2011-10-11 Netlogic Microsystems, Inc. Delegating network processor operations to star topology serial bus interfaces
US20050091334A1 (en) * 2003-09-29 2005-04-28 Weiyi Chen System and method for high performance message passing
CN103562882B (en) * 2011-05-16 2016-10-12 甲骨文国际公司 For providing the system and method for messaging application interface
US9086909B2 (en) * 2011-05-17 2015-07-21 Oracle International Corporation System and method for supporting work sharing muxing in a cluster
US9026705B2 (en) * 2012-08-09 2015-05-05 Oracle International Corporation Interrupt processing unit for preventing interrupt loss
FR3022420B1 (en) * 2014-06-13 2018-03-23 Bull Sas METHODS AND SYSTEMS FOR MANAGING AN INTERCONNECTION NETWORK

Also Published As

Publication number Publication date
US20190140982A1 (en) 2019-05-09
US10693816B2 (en) 2020-06-23
CN108011909B (en) 2020-09-01
CN108011909A (en) 2018-05-08

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 17865590; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

32PN Ep: public notification in the EP bulletin as address of the addressee cannot be established
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06/08/2019)

122 Ep: PCT application non-entry in European phase
Ref document number: 17865590; Country of ref document: EP; Kind code of ref document: A1