WO2022267304A1

WO2022267304A1 - Network communication method, computing device, and readable storage medium

Info

Publication number: WO2022267304A1
Application number: PCT/CN2021/129671
Authority: WO
Inventors: Ma Hailiang (马海亮); Meng Jie (孟杰); Xue Haolin (薛皓琳); Wu Kunpeng (吴昆鹏)
Original assignee: UnionTech Software Technology Co., Ltd. (统信软件技术有限公司)
Priority date: 2021-06-25
Filing date: 2021-11-10
Publication date: 2022-12-29
Also published as: CN113452532A; CN115242563A; CN113452532B; CN115242563B

Abstract

Disclosed in the present invention is a network communication method performed in a computing device that comprises a Loongson processor. The method comprises: acquiring the network communication software framework UCX; adding to the UCX a target function that supports the Loongson processor architecture, and adding to the processor mode acquisition function of the UCX the capability of acquiring the Loongson processor mode, to obtain a target UCX; and compiling and installing the target UCX on the computing device, so that the computing device performs network communication using an interface provided by the target UCX. Also disclosed in the present invention are a corresponding computing device and readable storage medium. The network communication method of the present invention enables a Loongson platform, which the original UCX does not support, to achieve high-speed network interconnection communication using the interface provided by UCX.

Description

A network communication method, computing device and readable storage medium

Technical field

The invention relates to the field of computers, and in particular to a network communication method, a computing device and a readable storage medium.

Background art

With the growing demand for computing power, high-performance parallel computing is becoming increasingly important. High-speed interconnection network communication is a key component of high-performance parallel computing and plays a vital role in its computational efficiency. UCX (Unified Communications X) is an existing framework that enables high-speed network interconnection communication.

UCX is a network communication framework (a collection of libraries and interfaces) that provides efficient and relatively simple ways to build widely used HPC (High Performance Computing) protocols, such as tag matching, remote memory access operations, streams and remote atomic operations.

However, the existing UCX has limited portability: it supports only the x86_64, Power8, Power9 and Armv8 architectures. Platforms based on other architectures therefore cannot use the interfaces provided by UCX to achieve high-speed network interconnection communication.

Summary of the invention

To this end, the present invention provides a network communication method, a computing device and a readable storage medium, in an attempt to solve or at least alleviate the problems described above.

According to one aspect of the present invention, a network communication method is provided, executed in a computing device that includes a Loongson processor. The method includes: obtaining the network communication software framework UCX; adding to UCX a target function supporting the Loongson processor architecture, and adding to the processor mode acquisition function of UCX the capability of acquiring the Loongson processor mode, to obtain a target UCX, where the target function includes a function for flushing the processor data and instruction caches, a function for counting leading zeros in a binary code, and an inline hook function; and compiling and installing the target UCX on the computing device, so that the computing device uses the interface provided by the target UCX for network communication.

Optionally, in the network communication method according to the present invention, the step of adding a target function supporting the Loongson processor architecture to UCX includes: adding, in the UCS part of UCX, a function for flushing the processor data and instruction caches and a function for counting leading zeros in a binary code, both supporting the Loongson processor architecture; and adding, in the UCM part of UCX, an inline hook function supporting the Loongson processor architecture.

Optionally, in the network communication method according to the present invention, the step of adding the capability of acquiring the Loongson processor mode to the processor mode acquisition function of UCX includes: adding a Loongson processor mode enumeration item to the processor mode enumeration type; and adding, to the processor mode acquisition function, logic for obtaining the Loongson processor mode enumeration item.

Optionally, in the network communication method according to the present invention, after the function for flushing the processor data and instruction caches supporting the Loongson processor architecture is added to the UCS part of UCX, the Loongson processor data and instruction caches are flushed with the following inline assembly expression:

asm volatile("sync":::"memory")

Here, asm declares an inline assembly expression; volatile tells the compiler that the inline assembly must not be optimized away; sync is the LoongISA instruction used to refresh processor data and caches; and "memory" declares that memory has changed.

Optionally, in the network communication method according to the present invention, after the function for counting leading zeros in a binary code supporting the Loongson processor architecture is added to the UCS part of UCX, the number of leading zeros in the binary code is calculated with the count-leading-zeros instructions of the LoongISA architecture.

Optionally, in the network communication method according to the present invention, after the inline hook function supporting the Loongson processor architecture is added to the UCM part of UCX, the inline hook function replaces a called system library function with a function defined by UCX, as follows: when a system library function is called, the address of the called system library function is obtained, together with the address of the UCX-defined function corresponding to it; then a jump instruction and the obtained address of the UCX-defined function are written to the address of the called system library function.

Optionally, in the network communication method according to the present invention, after the capability of acquiring the Loongson processor mode is added to the processor mode acquisition function of UCX, the current processor is determined to be a Loongson processor as follows: when the return value of the processor mode acquisition function is the Loongson processor mode enumeration item, the current processor is determined to be a Loongson processor.

According to yet another aspect of the present invention, there is provided a computing device comprising: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor and comprise instructions for executing the network communication method according to the present invention.

According to still another aspect of the present invention, a readable storage medium storing program instructions is provided, and when the program instructions are read and executed by a computing device, the computing device executes the network communication method according to the present invention.

According to the network communication method of the present invention, the network communication software framework UCX is first obtained. Then a target function supporting the Loongson processor architecture is added to UCX, and the capability of acquiring the Loongson processor mode is added to the processor mode acquisition function of UCX, yielding the target UCX. The target UCX is then compiled and installed on a computing device that includes a Loongson processor. In this way, that computing device can use the interface provided by the target UCX to achieve high-speed interconnection network communication, improving the efficiency of its high-performance parallel computing. The method thus enables the Loongson platform, which the original UCX does not support, to use the UCX interface for high-speed network interconnection communication, thereby improving the efficiency of high-performance parallel computing on the Loongson platform.

Description of drawings

To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in conjunction with the following description and drawings. These aspects indicate the various ways in which the principles disclosed herein may be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Like reference numerals generally refer to like parts or elements throughout this disclosure.

FIG. 1 shows a structural block diagram of a computing device 100 according to an embodiment of the present invention;

FIG. 2 shows a flowchart of a network communication method 200 according to an embodiment of the present invention;

FIG. 3 shows a schematic diagram of a function calling process using the Inline Hook method according to an embodiment of the present invention.

Detailed description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided to give a more thorough understanding of the present disclosure and to fully convey its scope to those skilled in the art.

With the accelerating drive for domestic technological self-reliance and the rapid development of China's information technology application innovation (Xinchuang) industry, achieving independent control of core technologies is becoming increasingly important. To achieve independent control of CPUs, China has independently developed the Loongson processor, which is now widely used across many industries.

The Loongson processor is based on the LoongISA architecture, which, as described above, the existing UCX does not support. To support high-performance parallel communication on the Loongson platform, the present invention builds on the existing UCX to implement inter-node communication over a high-speed network on the Loongson platform, and further implements a UCX communication interface for efficient intra-node communication based on a shared memory mechanism.

FIG. 1 shows a structural block diagram of a computing device 100 according to an embodiment of the present invention. It should be noted that the computing device 100 shown in FIG. 1 is only an example. In practice, the computing device used to implement the network communication method of the present invention may be any type of device, and its hardware configuration may be the same as or different from that of the computing device 100 shown in FIG. 1. Hardware components may be added to or removed from the computing device 100 shown in FIG. 1, and the present invention does not limit the specific hardware configuration of the computing device.

As shown in FIG. 1, in a basic configuration 102, computing device 100 typically includes system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.

Depending on the desired configuration, processor 104 may be any type of processor, including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. Processor 104 may include one or more levels of cache, such as L1 cache 110 and L2 cache 112, a processor core 114 and registers 116. An exemplary processor core 114 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.

Depending on the desired configuration, system memory 106 may be any type of memory, including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM or flash memory), or any combination thereof. The physical memory of a computing device usually refers to volatile RAM; data on disk must be loaded into physical memory before the processor 104 can read it. System memory 106 may include an operating system 120, one or more applications 122 and program data 124. In some implementations, the applications 122 may be arranged to be executed by the one or more processors 104 on the operating system, using the program data 124. The operating system 120 may be, for example, Linux or Windows, and includes program instructions for handling basic system services and performing hardware-dependent tasks. The applications 122 include program instructions for implementing various user-desired functions and may be, for example, a browser, instant messaging software or software development tools (such as an integrated development environment (IDE) or a compiler), but are not limited thereto. When an application 122 is installed on the computing device 100, a driver module may be added to the operating system 120.

When the computing device 100 starts running, the processor 104 reads the program instructions of the operating system 120 from the system memory 106 and executes them. The application 122 runs on the operating system 120 and uses the interfaces provided by the operating system 120 and the underlying hardware to implement various user-desired functions. When the user starts the application 122, it is loaded into the system memory 106, and the processor 104 reads and executes its program instructions from the system memory 106.

Computing device 100 also includes a storage device 132, which includes removable storage 136 and non-removable storage 138, both connected to a storage interface bus 134.

Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144 and communication devices 146) to the basic configuration 102 via a bus/interface controller 130. Example output devices 142 include a graphics processing unit 148 and an audio processing unit 150, which may be configured to communicate with various external devices such as a display or speakers via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to communicate via one or more I/O ports 158 with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printers, scanners). The example communication device 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 over a network communication link via one or more communication ports 164.

A network communication link may be one example of a communication medium. Communication media typically embody computer-readable instructions, data structures or program modules in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or a dedicated-line network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) or other wireless media. The term computer-readable media as used herein may include both storage media and communication media.

In the computing device 100 according to the present invention, the application 122 includes instructions for executing the network communication method 200 of the present invention, and the instructions may instruct the processor 104 to execute the network communication method of the present invention. Those skilled in the art can understand that, in addition to instructions for executing the network communication method 200, the application 122 may also include other applications 126 for implementing other functions.

FIG. 2 shows a flowchart of a network communication method 200 according to an embodiment of the present invention. The method 200 is suitable for execution in a computing device (such as the computing device 100 shown in FIG. 1) that includes a Loongson processor.

As shown in FIG. 2, the network communication method 200 of the present invention starts at step S210. In step S210, the network communication software framework UCX is obtained.

To aid understanding of the present invention, UCX is briefly described here. UCX mainly includes four parts: UCS, UCM, UCT and UCP. Specifically:

UCS is a service layer that provides the functionality needed to implement portable and efficient utilities. This layer mainly includes the following services: abstractions for accessing platform-specific functionality (atomic operations, thread safety, etc.), tools for efficient memory management (memory pools, memory allocators, etc.), and commonly used data structures (hash tables, trees, lists).

UCM is mainly responsible for intercepting the memory allocation and release events used by the memory registration cache.

UCT is a transport layer that abstracts the differences between various hardware architectures and provides a low-level API for implementing communication protocols. The main goal of this layer is to provide direct and efficient access to hardware networking functions. In addition, this layer provides constructs for communication context management (on a thread and application level) and for memory allocation and management. In its communication API, UCT defines three classes of operations according to data length: short data transfer (short), transfer with data copy (bcopy) and zero-copy transfer (zcopy). Short operations are optimized for transferring small amounts of data. Bcopy operations are optimized for medium-sized messages sent through a so-called bounce buffer, an intermediate buffer that is typically allocated with the network's constraints in mind and is ready for immediate use by the hardware. Because custom data packing routines can be provided, this method can also be used for non-contiguous I/O. Zcopy operations send messages directly from user buffers, or receive them directly into user buffers, without copying between network layers.
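The length-based choice among the three UCT operation classes can be illustrated with a minimal C sketch. The thresholds EXAMPLE_MAX_SHORT and EXAMPLE_MAX_BCOPY and all example_* names are hypothetical; in UCX the corresponding limits come from each transport interface's reported capabilities and tuning, so this is only an illustration of the idea, not UCX's selection logic.

```c
#include <stddef.h>

/* Hypothetical size limits; real values depend on the transport. */
#define EXAMPLE_MAX_SHORT 64u
#define EXAMPLE_MAX_BCOPY 8192u

typedef enum {
    EXAMPLE_OP_SHORT, /* small payload sent inline with the descriptor */
    EXAMPLE_OP_BCOPY, /* payload packed into a bounce buffer first */
    EXAMPLE_OP_ZCOPY  /* payload sent directly from the user buffer */
} example_op_class_t;

/* Pick an operation class for a message of `length` bytes. */
static example_op_class_t example_select_op(size_t length)
{
    if (length <= EXAMPLE_MAX_SHORT) {
        return EXAMPLE_OP_SHORT;
    }
    if (length <= EXAMPLE_MAX_BCOPY) {
        return EXAMPLE_OP_BCOPY;
    }
    return EXAMPLE_OP_ZCOPY;
}
```

Under these assumed thresholds, example_select_op(64) selects short, example_select_op(65) selects bcopy, and any message above 8192 bytes selects zcopy.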

UCP implements the higher-level protocols used by parallel programming models such as MPI and PGAS, using the lower-level functions exposed by the UCT layer. UCP mainly provides the following functions: initialization, remote memory access (RMA) communication, remote atomic memory operations (AMO), active messages, tag matching and streams.

Initialization: this interface sets up the communication context, queries network capabilities and initializes the local communication endpoints. A communication context is an abstraction of network transport resources. The endpoint setup interface initializes a UCP endpoint, which is an abstraction of all the resources associated with a particular connection. Endpoints are used as input to all communication operations to describe the source and destination of the communication.

Remote memory access (RMA) communication: this interface defines the low-overhead one-sided communication operations (such as PUT and GET) required to implement distributed and shared memory programming models with direct access to remote memory. UCP also includes a separate set of interfaces for transferring non-contiguous data, to support the communication requirements of various programming models and to take advantage of the scatter-gather capabilities of modern network hardware.

Remote atomic memory operations (AMO): this interface supports the atomic execution of operations on remote memory, an important class of operations in PGAS programming models (especially OpenSHMEM).

Tag matching: this interface supports tag matching for send-receive semantics, a key communication semantic defined by the MPI specification.

Active messages: this interface invokes a sender-specified callback on incoming data packets so that they can be processed by the receiving process. For example, a two-sided MPI interface can easily be implemented on top of this concept. However, these interfaces are more general and also apply to other programming paradigms in which the receiving process does not pre-post receives but wishes to react directly to incoming packets. Like the RMA and tag matching interfaces, the active message interface provides separate APIs for different message types and for non-contiguous data.

Streams: this interface provides ordered and reliable communication semantics. Data is treated as an ordered sequence of bytes pushed over the connection. In contrast to the tag matching interface, the message sizes on the sending side do not have to match those on the receiving side, as long as the total number of bytes is the same. This API is designed to match the widely used BSD socket programming model.
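Tag matching can be illustrated with a short C sketch modeled on the UCP tag-receive semantics, in which a posted receive carries a tag and a tag mask, and an incoming message matches when the sender's tag and the receive tag agree on every bit selected by the mask. All example_* names are hypothetical, and the linear search over posted receives is a simplification, not UCP's matching implementation.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t example_tag_t;

/* A posted receive: match incoming tags on the bits set in tag_mask. */
typedef struct {
    example_tag_t tag;
    example_tag_t tag_mask;
} example_posted_recv_t;

/* Nonzero when (sender_tag & mask) == (recv tag & mask). */
static int example_tag_matches(example_tag_t sender_tag,
                               const example_posted_recv_t *recv)
{
    return ((sender_tag ^ recv->tag) & recv->tag_mask) == 0;
}

/* Index of the first posted receive matching sender_tag, or -1. */
static int example_match_first(example_tag_t sender_tag,
                               const example_posted_recv_t *recvs,
                               size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (example_tag_matches(sender_tag, &recvs[i])) {
            return (int)i;
        }
    }
    return -1;
}
```

A mask of zero matches any tag, which is one way to express a wildcard receive in the style of MPI_ANY_TAG.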

Then, in step S220, a target function supporting the Loongson processor architecture is added to UCX, and the capability of acquiring the Loongson processor mode is added to the processor mode acquisition function of UCX, yielding the target UCX. The target function includes a function for flushing the processor data and instruction caches, a function for counting leading zeros in a binary code, and an inline hook function.

Adding the target function supporting the Loongson processor architecture to UCX involves the UCS and UCM parts of UCX. Specifically, a function for flushing the processor data and instruction caches and a function for counting leading zeros in a binary code, both supporting the Loongson processor architecture, are added to the UCS part of UCX, and an inline hook function supporting the Loongson processor architecture is added to the UCM part of UCX.

According to an embodiment of the present invention, the function for flushing the processor data and instruction caches is implemented with inline assembly, specifically with the inline assembly expression asm volatile("sync":::"memory"). Here, asm (with which every inline assembly expression begins) declares an inline assembly expression; volatile tells the compiler that this inline assembly must not be optimized away; sync is the LoongISA instruction used to refresh processor data and caches; and "memory" declares that memory has changed, that is, it tells the compiler that values must be read directly from memory and that copies held in registers must no longer be used.
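A minimal sketch of how the expression could be wrapped in a helper function follows. The name example_loongisa_fence is hypothetical. On MIPS-family targets (LoongISA is MIPS64-compatible) the sketch emits the embodiment's sync expression; on other hosts it falls back to the GCC/Clang builtin __sync_synchronize(), which issues a full memory barrier, only so that the sketch also compiles elsewhere.

```c
/* Hypothetical helper wrapping the embodiment's inline assembly. */
static inline void example_loongisa_fence(void)
{
#if defined(__mips__)
    /* Same as asm volatile("sync":::"memory"); the __asm__ __volatile__
     * spelling is also accepted in strict ISO C modes. The "memory"
     * clobber stops the compiler from reusing register copies of memory. */
    __asm__ __volatile__("sync" ::: "memory");
#else
    /* Fallback for non-MIPS hosts: compiler-provided full barrier. */
    __sync_synchronize();
#endif
}
```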

The function for counting leading zeros in a binary code is also implemented with inline assembly, using the count-leading-zeros instructions of the LoongISA architecture: clz and dclz. clz returns the number of 0 bits before the first 1 bit in a 32-bit binary value, and dclz returns the number of 0 bits before the first 1 bit in a 64-bit binary value.
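The two leading-zero counts can be sketched in C as follows. The function names are hypothetical. On MIPS targets the sketch issues clz, and dclz on 64-bit builds, through inline assembly; elsewhere it falls back to the GCC/Clang builtins __builtin_clz and __builtin_clzll. Because those builtins are undefined for a zero argument, the fallback returns 32 or 64 for zero explicitly, which is also what the clz and dclz instructions produce for zero.

```c
#include <stdint.h>

/* Number of 0 bits before the first 1 bit of a 32-bit value (32 for zero). */
static inline unsigned example_clz32(uint32_t x)
{
#if defined(__mips__) && (__mips_isa_rev >= 1)
    unsigned n;
    __asm__("clz %0, %1" : "=r"(n) : "r"(x)); /* LoongISA/MIPS clz */
    return n;
#else
    return x != 0 ? (unsigned)__builtin_clz(x) : 32u;
#endif
}

/* Number of 0 bits before the first 1 bit of a 64-bit value (64 for zero). */
static inline unsigned example_clz64(uint64_t x)
{
#if defined(__mips__) && defined(__mips64) && (__mips_isa_rev >= 1)
    unsigned n;
    __asm__("dclz %0, %1" : "=r"(n) : "r"(x)); /* LoongISA/MIPS64 dclz */
    return n;
#else
    return x != 0 ? (unsigned)__builtin_clzll(x) : 64u;
#endif
}
```

For example, example_clz32(1) is 31 and example_clz64(UINT64_C(1) << 40) is 23.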

The inline hook function supporting the Loongson processor architecture that is added to the UCM part of UCX deserves some explanation. Its purpose is to replace system library functions with functions defined by UCX. Specifically, when a program calls a given system library function, the UCX-defined function corresponding to that system library function is executed instead of the system library function itself.

An inline hook (Inline Hook) replaces a system library function with a UCX-defined function by modifying machine code. Specifically, when the program calls the system library function, the address of the called system library function is obtained, together with the address of the corresponding UCX-defined function. A jump instruction and the address of the UCX-defined function are then written to the address of the called system library function. As a result, when the program calls the system library function, execution jumps to the corresponding user-defined function.

For details, refer to FIG. 3, which shows a schematic diagram of a function call process using the Inline Hook method according to an embodiment of the present invention. When a program executes a system library function call, it first jumps to the address of the system library function. The jump instruction written at that address then transfers control to the address of the user-defined function. After the user-defined function finishes, execution returns to the instruction following the calling jalr. The specific implementation steps are as follows:

(1) Construct a jump instruction sequence: load the address of the custom function into the t9 register using machine code, then jump to the address held in t9.

(2) Write the constructed instructions to the address of the system library function.

(3) When the system library function is called, execution jumps to the user-defined function.

(4) After the custom function has executed, the original flow continues.
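Step (1) can be illustrated by building the jump sequence as raw MIPS64 instruction words. This is a minimal sketch under stated assumptions, not UCX's patching code: it assumes a full 64-bit target address, so t9 is filled 16 bits at a time with lui, ori and dsll before jr t9, followed by a nop in the branch delay slot. All example_* names are hypothetical, and the sketch only encodes the words into an array without modifying any code.

```c
#include <stdint.h>

#define EXAMPLE_REG_T9 25u            /* t9 is general-purpose register $25 */
#define EXAMPLE_TRAMPOLINE_WORDS 8

/* lui rt, imm: rt = imm << 16 (sign-extended) */
static uint32_t example_lui(uint32_t rt, uint16_t imm)
{
    return (0x0Fu << 26) | (rt << 16) | imm;
}

/* ori rt, rs, imm: rt = rs | imm */
static uint32_t example_ori(uint32_t rt, uint32_t rs, uint16_t imm)
{
    return (0x0Du << 26) | (rs << 21) | (rt << 16) | imm;
}

/* dsll rd, rt, sa: rd = rt << sa (64-bit shift) */
static uint32_t example_dsll(uint32_t rd, uint32_t rt, uint32_t sa)
{
    return (rt << 16) | (rd << 11) | (sa << 6) | 0x38u;
}

/* jr rs: jump to the address held in rs */
static uint32_t example_jr(uint32_t rs)
{
    return (rs << 21) | 0x08u;
}

/* Encode "load 64-bit target into t9; jr t9; nop" into `code`. */
static void example_build_trampoline(uint64_t target,
                                     uint32_t code[EXAMPLE_TRAMPOLINE_WORDS])
{
    const uint32_t t9 = EXAMPLE_REG_T9;
    code[0] = example_lui(t9, (uint16_t)(target >> 48));
    code[1] = example_ori(t9, t9, (uint16_t)(target >> 32));
    code[2] = example_dsll(t9, t9, 16);
    code[3] = example_ori(t9, t9, (uint16_t)(target >> 16));
    code[4] = example_dsll(t9, t9, 16);
    code[5] = example_ori(t9, t9, (uint16_t)target);
    code[6] = example_jr(t9);
    code[7] = 0u; /* nop in the branch delay slot */
}
```

For a target of 0x0011223344556677 the first word is 0x3C190011 (lui t9, 0x0011) and the seventh is 0x03200008 (jr t9). In an actual hook, these words would be copied over the entry of the system library function after making its page writable (for example with mprotect), and the instruction cache would then need to be synchronized (for example with GCC's __builtin___clear_cache) before the function is next called.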

Thus, once the Inline Hook method supporting the Loongson processor architecture has been added to the UCM part of UCX, a call to a system library function can be intercepted so that the corresponding UCX-defined function is executed instead. Specifically, the system library functions mmap, munmap, mremap, shmat, shmdt, sbrk, brk and madvise correspond to the UCX-defined functions ucm_mmap, ucm_munmap, ucm_mremap, ucm_shmat, ucm_shmdt, ucm_sbrk, ucm_brk and ucm_madvise, respectively.
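The correspondence listed above can be written as a lookup table. This is an illustrative sketch only: the example_* type, table and function names are hypothetical and do not describe how UCM stores its hooks internally.

```c
#include <stddef.h>
#include <string.h>

/* One intercepted system library function and its UCX-defined replacement. */
typedef struct {
    const char *symbol;      /* system library function name */
    const char *replacement; /* UCX-defined function name */
} example_hook_entry_t;

static const example_hook_entry_t example_hook_table[] = {
    {"mmap",    "ucm_mmap"},
    {"munmap",  "ucm_munmap"},
    {"mremap",  "ucm_mremap"},
    {"shmat",   "ucm_shmat"},
    {"shmdt",   "ucm_shmdt"},
    {"sbrk",    "ucm_sbrk"},
    {"brk",     "ucm_brk"},
    {"madvise", "ucm_madvise"},
};

/* Return the replacement name for `symbol`, or NULL if it is not hooked. */
static const char *example_find_replacement(const char *symbol)
{
    size_t n = sizeof(example_hook_table) / sizeof(example_hook_table[0]);
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(example_hook_table[i].symbol, symbol) == 0) {
            return example_hook_table[i].replacement;
        }
    }
    return NULL;
}
```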

At this point, the function for flushing the processor data and instruction caches, the function for counting leading zeros in a binary code and the inline hook function, all supporting the Loongson processor architecture, have been added to UCX.

Next, the step of adding the capability of acquiring the Loongson processor mode to the processor mode acquisition function of UCX is described. This capability is added to the processor mode acquisition function in the UCS part of UCX.

Specifically, a Loongson processor mode enumeration item is added to the processor mode enumeration type, and logic for obtaining this enumeration item is added to the processor mode acquisition function. Thus, when the processor mode acquisition function returns the Loongson processor mode enumeration item, the current processor can be determined to be a Loongson processor.

As a specific example, the UCS_CPU_MODEL_LOONGISA enumeration item is added to the enumeration type ucs_cpu_model_t, and logic for obtaining UCS_CPU_MODEL_LOONGISA is added to the ucs_arch_get_cpu_model function. Thus, when the ucs_arch_get_cpu_model function returns UCS_CPU_MODEL_LOONGISA, the CPU mode is LoongISA.
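The enumeration-plus-query pattern can be sketched as follows. The sketch uses a simplified, hypothetical enumeration example_cpu_model_t in place of UCX's real ucs_cpu_model_t, which defines further items for other processors, and it assumes for illustration that a build for a MIPS target runs on a LoongISA processor. The detection logic actually added to ucs_arch_get_cpu_model is not specified in the description.

```c
/* Simplified stand-in for ucs_cpu_model_t; only the LoongISA item
 * corresponds to an item named in the description. */
typedef enum {
    EXAMPLE_CPU_MODEL_UNKNOWN,
    EXAMPLE_CPU_MODEL_LOONGISA, /* plays the role of UCS_CPU_MODEL_LOONGISA */
    EXAMPLE_CPU_MODEL_LAST
} example_cpu_model_t;

/* Hypothetical counterpart of ucs_arch_get_cpu_model(). Assumption: a MIPS
 * build targets LoongISA; other builds report an unknown model. */
static example_cpu_model_t example_arch_get_cpu_model(void)
{
#if defined(__mips__)
    return EXAMPLE_CPU_MODEL_LOONGISA;
#else
    return EXAMPLE_CPU_MODEL_UNKNOWN;
#endif
}

/* The processor is treated as Loongson when the query returns the LoongISA
 * item, mirroring the determination rule in the description. */
static int example_is_loongson(void)
{
    return example_arch_get_cpu_model() == EXAMPLE_CPU_MODEL_LOONGISA;
}
```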

At this point, the function for flushing the processor data and instruction caches, the function for counting leading zeros in a binary code and the capability of acquiring the Loongson processor mode have been added to the UCS part of UCX, and the inline hook function supporting the Loongson processor architecture has been added to the UCM part of UCX, yielding the target UCX.

Note that the present invention uses the UCS_TEST_F(test_math, bitops) test to test the leading-zero counting function in UCS, the UCS_TEST_F(test_type, cpu_set) test to test the CPU mode function in UCS, and the UCS_TEST_F(malloc_hook_cplusplus, mmap_ptrs) and UCS_TEST_F(malloc_hook, bistro_patch) tests to test the Inline Hook function in UCM. After the test command make -C test/gtest test is run, all tests pass, which shows that, with Loongson processor support added, UCX can be compiled and run on platforms based on the Loongson processor architecture.

After the target UCX is obtained, step S230 is performed: the target UCX is compiled and installed on the computing device, so that the computing device uses the interface provided by the target UCX for network communication.

With the Inline Hook method supporting the Loongson processor architecture added to the UCM part, the system library functions mmap, munmap, mremap, shmat, shmdt, sbrk, brk and madvise can be intercepted, so that ucm_mmap, ucm_munmap, ucm_mremap, ucm_shmat, ucm_shmdt, ucm_sbrk, ucm_brk, ucm_madvise and similar functions are executed instead.

With the cache flushing function, the leading-zero counting function and the Loongson processor mode acquisition capability, all supporting the Loongson processor architecture, added to the UCS part, UCX can obtain the Loongson processor mode, flush the Loongson processor data and caches, and count leading zeros in binary codes. This in turn allows other UCS functionality, such as the abstractions for platform-specific functionality (atomic operations, thread safety, etc.), the tools for efficient memory management (memory pools, memory allocators, etc.) and the commonly used data structures (hash tables, trees, lists), to be used on platforms based on the Loongson processor architecture. Therefore, after the target UCX is compiled and installed on the computing device, the computing device can use the interface provided by the target UCX for network communication.

As a specific example: in the UCS part of UCX, functions supporting the LoongISA architecture are added for flushing processor data and instruction caches and for counting leading zeros in binary code; in the UCM part of UCX, an inline hook function supporting the LoongISA architecture is added; and logic for obtaining the Loongson processor mode is added to the processor-mode acquisition function in the UCS part of UCX. The result is the target UCX. Because the target UCX supports the Loongson platform, it can be compiled and installed there, so the interface provided by UCX can be used on the Loongson platform for high-speed network interconnection communication, improving the efficiency of high-performance parallel computing on that platform.
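The processor-mode part of this example, an extra enumeration item plus detection logic, can be sketched in C as below. The enum cpu_model_t, the functions cpu_model_from_line, get_cpu_model and is_loongson, and the detection rule (searching /proc/cpuinfo for the string "Loongson") are all hypothetical stand-ins for UCX's own processor-mode type and acquisition function. As in the claims, a caller treats the processor as Loongson exactly when the acquisition function returns the Loongson item.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical processor-mode enumeration with a new Loongson item. */
typedef enum {
    CPU_MODEL_UNKNOWN,
    CPU_MODEL_GENERIC,    /* any processor not recognised as Loongson */
    CPU_MODEL_LOONGSON,   /* enumeration item added for Loongson */
    CPU_MODEL_LAST
} cpu_model_t;

/* Assumed rule: a cpuinfo line mentioning "Loongson" identifies the CPU. */
static cpu_model_t cpu_model_from_line(const char *line)
{
    return strstr(line, "Loongson") != NULL ? CPU_MODEL_LOONGSON : CPU_MODEL_UNKNOWN;
}

/* Hypothetical acquisition function extended with Loongson detection. */
static cpu_model_t get_cpu_model(void)
{
    char line[256];
    cpu_model_t model = CPU_MODEL_GENERIC;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f == NULL)
        return CPU_MODEL_UNKNOWN; /* cannot inspect the processor */
    while (fgets(line, sizeof(line), f) != NULL) {
        if (cpu_model_from_line(line) == CPU_MODEL_LOONGSON) {
            model = CPU_MODEL_LOONGSON;
            break;
        }
    }
    fclose(f);
    /* Existing detection for other vendors would refine CPU_MODEL_GENERIC here. */
    return model;
}

/* The current processor is Loongson iff the acquisition function says so. */
static int is_loongson(void)
{
    return get_cpu_model() == CPU_MODEL_LOONGSON;
}
```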

In the network communication method of the present invention, the network communication software framework UCX is first obtained. A target function supporting the Loongson processor architecture is then added to UCX, and the ability to obtain the Loongson processor mode is added to UCX's processor-mode acquisition function, yielding the target UCX. The target UCX is then compiled and installed on the computing device that includes the preset (Loongson) processor, so that the device can use the interface provided by the target UCX for high-speed network interconnection communication, improving the efficiency of its high-performance parallel computing. The method thus enables the Loongson platform, which the original UCX does not support, to use the UCX interface for high-speed network interconnection communication, significantly improving the efficiency of high-performance parallel computing on that platform.

The various techniques described herein may be implemented in hardware, in software, or in a combination of both. Thus, the method and device of the present invention, or certain aspects or parts of them, may take the form of program code (i.e. instructions) embedded in a tangible medium such as a removable hard disk, USB flash drive, floppy disk, CD-ROM or any other machine-readable storage medium. When the program is loaded into a machine such as a computer and executed by it, the machine becomes an apparatus for practicing the invention.

Where the program code is executed on a programmable computer, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code, and the processor is configured to execute the network communication method of the present invention according to the instructions in the program code stored in the memory.

Readable media include, by way of example and not limitation, readable storage media and communication media. Readable storage media store information such as computer-readable instructions, data structures, program modules or other data. Communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of readable media.

In the description provided herein, the algorithms and displays are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems can also be used with examples of the invention. The structure required to construct such a system is apparent from the above description. Furthermore, the present invention is not specific to any particular programming language. It should be understood that various programming languages can be used to implement the content of the present invention described herein, and the above description of specific languages is for disclosing the best mode of the present invention.

In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure the understanding of this description.

It should be appreciated that in the above description of exemplary embodiments of the invention, in order to streamline this disclosure and to facilitate understanding of one or more of the various inventive aspects, various features of the invention are sometimes grouped together in a single embodiment, figure, or in its description. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will understand that the modules, units or components of the devices in the examples disclosed herein may be arranged in the device as described in this embodiment, or may alternatively be located in one or more devices different from the device in this example. The modules in the preceding examples may be combined into one module or may further be divided into a plurality of sub-modules.

Those skilled in the art can understand that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. Modules, units or components in the embodiments may be combined into one module, unit or component, and may further be divided into a plurality of sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.

Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments but not in others, combinations of features from different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

Furthermore, some of the described embodiments are described herein as a method or combination of method elements that may be implemented by a processor of a computer system or by other means for performing the described function. Thus, a processor with the necessary instructions for carrying out the described method or element of a method forms a means for carrying out the method or element of a method. Furthermore, elements described herein of an apparatus embodiment are examples of means for carrying out the function performed by the element for the purpose of carrying out the invention.

As used herein, unless otherwise specified, the use of the ordinal numbers "first", "second", "third", etc. to describe a common object merely indicates that different instances of similar objects are being referred to, and is not intended to imply that the objects so described must have a given order, whether temporally, spatially, in ranking, or in any other manner.

While the invention has been described in terms of a limited number of embodiments, it will be apparent to a person skilled in the art having the benefit of the above description that other embodiments are conceivable within the scope of the invention thus described. In addition, it should be noted that the language used in the specification has been chosen primarily for the purpose of readability and instruction rather than to explain or define the inventive subject matter. Accordingly, many modifications and alterations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the present invention, the disclosure of the present invention is intended to be illustrative rather than restrictive, and the scope of the present invention is defined by the appended claims.

Claims

A network communication method, adapted to be executed in a computing device comprising a Loongson processor, the method comprising:

obtaining the network communication software framework UCX;

adding to the UCX a target function supporting the Loongson processor architecture, and adding to the processor-mode acquisition function of the UCX the function of acquiring the Loongson processor mode, to obtain a target UCX, wherein the target function includes a function for flushing processor data and instruction caches, a function for counting leading zeros in binary code, and an inline hook function; and

compiling and installing the target UCX on the computing device, so that the computing device uses the interface provided by the target UCX to perform network communication.
The method according to claim 1, wherein the step of adding a target function supporting the Loongson processor architecture to the UCX comprises:

adding, in the UCS part of UCX, a function for flushing processor data and instruction caches and a function for counting leading zeros in binary code, both supporting the Loongson processor architecture; and

adding, in the UCM part of UCX, an inline hook function supporting the Loongson processor architecture.
The method according to claim 1 or 2, wherein the step of adding the function of obtaining the Loongson processor mode to the processor-mode acquisition function of the UCX comprises:

adding a Loongson processor mode enumeration item to the processor mode enumeration type; and

adding, to the processor-mode acquisition function, logic for obtaining the Loongson processor mode enumeration item.
The method according to claim 2, wherein, after the function supporting the Loongson processor architecture for flushing processor data and instruction caches is added to the UCS part of UCX, the Loongson processor data and instruction caches are flushed by the following inline assembly expression:

asm volatile("sync":::"memory")

where asm declares an inline assembly expression, volatile tells the compiler that the inline assembly must not be optimized away, sync flushes the processor data and caches in the LoongISA architecture, and memory declares to the compiler that memory has been modified.
The method according to claim 2 or 4, wherein, after the function supporting the Loongson processor architecture for counting leading zeros in binary code is added to the UCS part of UCX, the number of leading zeros in the binary code is computed with the count-leading-zeros instruction of the LoongISA architecture.
The method according to claim 2, wherein, after the inline hook function supporting the Loongson processor architecture is added to the UCM part of UCX, the inline hook function replaces a called system library function with a function defined by UCX in the following manner:

when the system library function is called, obtaining the address of the called system library function and the address of the UCX-defined function corresponding to it; and

writing a jump instruction and the obtained address of the UCX-defined function at the address of the called system library function.
The method according to claim 3, wherein, after the function of obtaining the Loongson processor mode is added to the processor-mode acquisition function of the UCX, whether the current processor is a Loongson processor is determined as follows:

when the return value of the processor-mode acquisition function is the Loongson processor mode enumeration item, the current processor is determined to be a Loongson processor.
A computing device comprising:

at least one processor; and

a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor and comprise instructions for performing the method according to any one of claims 1-7.
A readable storage medium storing program instructions which, when read and executed by a computing device, cause the computing device to execute the method according to any one of claims 1-7.