CN117453424A - Method and device capable of dynamically expanding operation efficiency of acceleration algorithm of computing resource - Google Patents


Publication number
CN117453424A
Authority
CN
China
Prior art keywords
algorithm
computer
server
resource
computing
Prior art date
Legal status
Granted
Application number
CN202311804579.2A
Other languages
Chinese (zh)
Other versions
CN117453424B (en)
Inventor
齐永兴
陈东
吴铤
王雷
于洋
Current Assignee
Hangzhou Innovation Research Institute of Beihang University
Original Assignee
Hangzhou Innovation Research Institute of Beihang University
Priority date
Filing date
Publication date
Application filed by Hangzhou Innovation Research Institute of Beihang University
Priority to CN202311804579.2A
Publication of CN117453424A
Application granted
Publication of CN117453424B
Legal status: Active

Classifications

    • G – PHYSICS
    • G06 – COMPUTING; CALCULATING OR COUNTING
    • G06F – ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 – Arrangements for program control, e.g. control units
    • G06F 9/06 – Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 – Multiprogramming arrangements
    • G06F 9/50 – Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 – Allocation of resources to service a request
    • G06F 9/5027 – Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G – PHYSICS
    • G06 – COMPUTING; CALCULATING OR COUNTING
    • G06F – ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 – Indexing scheme relating to G06F 9/00
    • G06F 2209/50 – Indexing scheme relating to G06F 9/50
    • G06F 2209/5011 – Pool

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to a method and a device for accelerating algorithm operation with dynamically expandable computing resources. The method comprises the following steps: identifying the operating system of a computer or server; identifying and recording the hardware resources of the computer or server through the operating system, and combining them with the hardware resources the device itself provides into a resource pool; providing an SDK corresponding to the supported algorithms so that a testing party can invoke test items to run algorithm tests, the completed tests returning performance test data for the supported algorithms; and, according to a computing instruction issued by an upper-layer service, the performance test data of the algorithms corresponding to the instruction, and the current usage of the computing resources in the resource pool, invoking other computing resources in the pool, singly or in combination, to participate in the operation of the computing instruction. The invention relieves the computing pressure on the CPU and schedules resources rationally: compute-intensive algorithms are migrated and split off from the CPU, greatly reducing CPU load and improving system stability.

Description

Method and device for accelerating algorithm operation with dynamically expandable computing resources
Technical Field
The present invention relates to the technical field of computing resource scheduling, and in particular to a method and an apparatus for accelerating algorithm operation with dynamically expandable computing resources.
Background
In recent years, important industries such as finance, education, telecommunications, and electric power have developed rapidly. The mature, widely deployed security products currently available are mainly built on the X86 hardware architecture, and such products hit performance bottlenecks in application-layer detection and packet processing.
In addition, the CPUs these security products are adapted to offer relatively low performance, so CPU computing resources are in short supply. Meanwhile, the national cryptographic (SM) algorithms face an inherent software bottleneck at the level of their mathematical principles; implemented on an inefficient CPU, they run slowly and can easily bring the system down.
Disclosure of Invention
First, the technical problem to be solved
In view of the above shortcomings of the prior art, the present invention provides a method and apparatus for accelerating algorithm operation with dynamically expandable computing resources, solving the technical problems of scarce computing resources on inefficient CPUs and low algorithm efficiency.
(II) technical scheme
In order to achieve the above purpose, the main technical scheme adopted by the invention is as follows:
In a first aspect, an embodiment of the present invention provides a method for accelerating algorithm operation with dynamically expandable computing resources, comprising:
identifying the operating system of a computer or server after accessing the computer or server;
identifying the hardware resources of the computer or server through system calls or APIs provided by the operating system, and combining them with the hardware resources the device itself provides into a unified resource pool;
providing an SDK corresponding to the supported algorithms to a testing party, so that the testing party can invoke test items to run algorithm tests, and receiving the performance test data returned by the testing party when testing completes;
according to a computing instruction issued by an upper-layer service, the performance test data of the algorithms corresponding to the instruction, and the current usage of the computing resources in the resource pool, and taking reduction of the CPU resource usage of the computer or server as the optimization target, invoking other computing resources in the resource pool, singly or in combination, to participate in the operation of the computing instruction.
Optionally, identifying the operating system of the computer or server after accessing the computer or server comprises:
after accessing the computer or server, providing a preconfigured driver installer so that the driver is installed on the computer or server;
identifying the operating system on the computer or server by means of the installed driver, and initiating an authorization request to the operating system to obtain permission to identify system hardware resources and schedule system resources.
Optionally, identifying the hardware resources of the computer or server through system calls or APIs provided by the operating system and forming a unified resource pool comprises:
performing, through system calls or APIs provided by the operating system, at least one of the following operations: obtaining the hardware device resource list, obtaining hardware device attribute information, and executing tasks on the hardware devices, including cryptographic operation tasks and image processing tasks, so that the hardware resources of the computer or server are identified and recorded;
forming a unified resource pool from the hardware resources of the computer or server together with the hardware resources the device itself provides.
Optionally, providing the SDK corresponding to the supported algorithms to the testing party, so that the testing party can invoke test items to run algorithm tests, and receiving the performance test data returned by the testing party when testing completes, comprises:
providing the SDK corresponding to the supported algorithms to the testing party so that the testing party can invoke test items to run algorithm tests, thereby obtaining performance test data covering the executions per second of each algorithm on each resource under single-threaded conditions and the memory occupied during algorithm execution;
receiving the performance test data returned by the testing party after the performance test completes.
Optionally, according to the computing instruction issued by the upper-layer service, the performance test data of the algorithms corresponding to the instruction, and the current usage of the computing resources in the resource pool, and taking reduction of the CPU resource usage of the computer or server as the optimization target, invoking other computing resources in the resource pool, singly or in combination, to participate in the operation of the computing instruction comprises:
obtaining a computing instruction issued by the upper-layer service;
parsing the computing instruction to obtain at least one stage task algorithm contained in it;
matching each stage task algorithm against the algorithms the device supports, and taking the performance test data of any supported algorithm whose degree of agreement exceeds a set threshold as the performance test data of that stage task algorithm;
judging, from the type, number, and performance test data of each stage task algorithm, whether the current CPU resources of the computer or server can meet the requirements of at least one stage task algorithm;
if the current CPU resources of the computer or server can meet the requirements of at least one stage task algorithm, directly using those CPU resources to process the corresponding stage task algorithm or algorithms, and handing the remaining stage task algorithms contained in the computing instruction, i.e. those not completed on the current CPU resources, to other computing resources in the resource pool to participate in the operation singly or in combination;
if the current CPU resources of the computer or server cannot meet the requirement of even one stage task algorithm, handing all stage task algorithms to other computing resources in the resource pool to participate in the operation singly or in combination;
wherein the requirements of a stage task algorithm include the expected target efficiency, the latest completion time, and the running rate.
Optionally, the participation of other computing resources in the resource pool, singly or in combination, in the operation of the computing instruction comprises:
identifying, through the driver, the current usage of the computing resources in the resource pool;
selecting, from the computing resources in the resource pool other than the CPU resources of the computer or server, those whose remaining resources satisfy constraint conditions comprising the type of the stage task algorithm, the number of stage task algorithms, the performance test data of the stage task algorithms, and the current usage of the pool's computing resources, to form a schedulable group;
taking reduction of the CPU usage ratio of the computer or server as the optimization target, either scheduling a single computing resource from the schedulable group to perform the operation, or scheduling a combination of resources in sequence as an algorithm chain that performs the operation in stages.
In a second aspect, an embodiment of the present invention provides a device for accelerating algorithm operation with dynamically expandable computing resources, where the device is a software-and-hardware-integrated apparatus connected to a computer or server and configured to perform the method described above.
Optionally, multiple devices can be connected to the computer or server simultaneously, and each device also supports expansion of its own hardware resources; whenever the hardware resources connected to the computer or server change, including the types and quantities of devices, the corresponding update information is registered with the system of the computer or server and the resource pool is updated.
Optionally, the device has multiple cryptographic algorithms built in and provides multi-level parallel encryption and decryption capability;
the cryptographic algorithms comprise the national (SM) algorithms SM2, SM3, and SM4, the international algorithms RSA, SHA, AES, DES, and 3DES, and the homomorphic encryption scheme Paillier; each cryptographic algorithm is implemented in both a CPU version and a GPU version, the two versions having different parallel acceleration modes.
Optionally, the device is programmable, and its built-in cryptographic algorithms support autonomous programming, modification, replacement, and optimization by the user.
(III) beneficial effects
The beneficial effects of the invention are as follows: without major modification of the computer or server, the device's own resources are added to the construction of the hardware resource pool. Taking the relief of CPU computing pressure as the optimization goal, CPU resource usage is reduced as far as possible and all resources in the pool are called upon rationally, so that the hardware resources are fully exploited; compute-intensive algorithms are migrated and split off from the CPU of the computer or server onto other resources, which greatly reduces the computing pressure on the CPU, improves system stability, and relieves the shortage of computing resources on inefficient CPUs.
Drawings
FIG. 1 is a flow chart of a method for accelerating algorithm operation with dynamically expandable computing resources according to an embodiment of the present invention;
FIG. 2 is a flowchart of step S1 of the method according to an embodiment of the present invention;
FIG. 3 is a flowchart of step S2 of the method according to an embodiment of the present invention;
FIG. 4 is a flowchart of step S4 of the method according to an embodiment of the present invention;
FIG. 5 is a schematic flowchart of the operation in which other computing resources in the resource pool participate, singly or in combination, in a computing instruction;
FIG. 6 is a schematic diagram of the composition and connections of a device for accelerating algorithm operation with dynamically expandable computing resources according to an embodiment of the present invention.
Detailed Description
For a better understanding, the invention is explained below through a detailed description of embodiments in conjunction with the accompanying drawings.
As shown in fig. 1, a method for accelerating algorithm operation with dynamically expandable computing resources according to an embodiment of the present invention comprises: identifying the operating system of a computer or server after accessing the computer or server; identifying the hardware resources of the computer or server through system calls or APIs provided by the operating system, and combining them with the hardware resources the device itself provides into a unified resource pool; providing an SDK corresponding to the supported algorithms to a testing party, so that the testing party can invoke test items to run algorithm tests, and receiving the performance test data returned by the testing party when testing completes; and, according to a computing instruction issued by an upper-layer service, the performance test data of the algorithms corresponding to the instruction, and the current usage of the computing resources in the resource pool, and taking reduction of the CPU resource usage of the computer or server as the optimization target, invoking other computing resources in the resource pool, singly or in combination, to participate in the operation of the computing instruction.
Without major modification of the computer or server, the device's own resources are added to the construction of the hardware resource pool. Taking the relief of CPU computing pressure as the optimization goal, CPU resource usage is reduced as far as possible and all resources in the pool are called upon rationally, so that the hardware resources are fully exploited; compute-intensive algorithms are migrated and split off from the CPU of the computer or server onto other resources, which greatly reduces the computing pressure on the CPU, improves system stability, and relieves the shortage of computing resources on inefficient CPUs.
In order to better understand the above technical solution, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Specifically, an embodiment of the invention provides a method for accelerating algorithm operation with dynamically expandable computing resources, comprising the following steps:
S1, after accessing a computer or server, identifying the operating system of the computer or server.
Further, as shown in fig. 2, step S1 includes:
S11, after the computer or server is accessed, providing a preconfigured driver installer so that the driver is installed on the computer or server.
S12, identifying the operating system on the computer or server by means of the installed driver, and initiating an authorization request to the operating system to obtain permission to identify system hardware resources and schedule system resources.
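The OS identification of steps S11 and S12 can be illustrated with a minimal sketch using only the Python standard library. The real identification happens inside the installed driver, so the function name and returned fields here are assumptions for illustration:

```python
import platform

def identify_host_os():
    """Identify the host operating system, as the installed driver must do
    before requesting authorization from it (illustrative helper, not the
    patent's actual driver code)."""
    name = platform.system()  # e.g. "Linux", "Windows"
    return {
        "os": name,
        "release": platform.release(),
        # the patent names Windows and Linux as examples of supported systems
        "supported": name in {"Windows", "Linux"},
    }
```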
S2, identifying and recording the hardware resources of the computer or server through system calls or APIs provided by the operating system, and combining them with the hardware resources the device itself provides into a unified resource pool.
Further, as shown in fig. 3, step S2 includes:
S21, performing, through system calls or APIs provided by the operating system, at least one of the following operations: obtaining the hardware device resource list, obtaining hardware device attribute information, and executing tasks on the hardware devices, including cryptographic operation tasks and image processing tasks, so that the hardware resources of the computer or server are identified and recorded. Cryptographic operation tasks include, for example, encryption and decryption or signing and verification; graphics processing tasks perform the numerical operations required to display high-quality graphics.
S22, forming a unified resource pool from the hardware resources of the computer or server together with the hardware resources the device itself provides.
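The pool construction of S22 can be sketched as merging two resource lists under one bookkeeping structure. The field names, ID scheme, and the `busy` load field are all illustrative assumptions, not the patent's actual data structures:

```python
def build_resource_pool(host_resources, device_resources):
    """Merge hardware discovered on the host via OS calls with the
    accelerator device's own resources (FPGA, SOC, on-board GPU) into one
    unified, schedulable pool. All field names are illustrative."""
    pool = {}
    for origin, resources in (("host", host_resources),
                              ("device", device_resources)):
        for res in resources:
            rid = f"{origin}:{res['type']}:{res['id']}"
            pool[rid] = {**res, "origin": origin, "busy": 0.0}
    return pool

pool = build_resource_pool(
    host_resources=[{"type": "cpu", "id": 0, "cores": 8}],
    device_resources=[{"type": "fpga", "id": 0}, {"type": "gpu", "id": 0}],
)
```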
S3, providing the SDK corresponding to the supported algorithms to the testing party so that the testing party can invoke test items to run algorithm tests, and receiving the performance test data returned by the testing party when testing completes. With the algorithm SDK provided, a user can start a benchmark of the local resources periodically or manually; the test returns the measured performance of every supported algorithm, from which a statistical performance analysis of the supported cryptographic algorithms can be derived. The performance test measures, for each resource on the device connected to the computer or server, the executions per second of each algorithm under single-threaded conditions and the memory the algorithm occupies during execution. "Algorithm" here refers to the various algorithms built into the device: for example, how many SM2 signatures can be completed per second. From these performance statistics the user can judge whether the device meets their computing needs, whether resource expansion is required, and so on.
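The single-threaded executions-per-second figure that the benchmark returns can be sketched with a toy timing harness. Since the real SDK and its test items are not published, SHA-256 stands in here for a built-in device algorithm such as SM3:

```python
import hashlib
import time

def benchmark_single_thread(fn, arg, duration=0.2):
    """Run `fn` single-threaded for roughly `duration` seconds and report
    executions per second, the figure the SDK benchmark reports per
    algorithm and per resource (toy harness, not the real SDK)."""
    count = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        fn(arg)
        count += 1
    elapsed = time.perf_counter() - start
    return {"ops_per_sec": count / elapsed, "runs": count}

# SHA-256 as a stand-in for a built-in device algorithm such as SM3
result = benchmark_single_thread(lambda m: hashlib.sha256(m).digest(), b"x" * 1024)
```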
S4, according to the computing instruction issued by the upper-layer service, the performance test data of the algorithms corresponding to the instruction, and the current usage of the computing resources in the resource pool, and taking reduction of the CPU resource usage of the computer or server as the optimization target, invoking other computing resources in the resource pool, singly or in combination, to participate in the operation of the computing instruction.
Further, as shown in fig. 4, step S4 includes:
S41, obtaining a computing instruction issued by the upper-layer service.
S42, parsing the computing instruction to obtain at least one stage task algorithm contained in it.
S43, matching each stage task algorithm against the algorithms the device supports, and taking the performance test data of any supported algorithm whose degree of agreement exceeds a set threshold as the performance test data of that stage task algorithm.
S44, judging, from the type, number, and performance test data of each stage task algorithm, whether the current CPU resources of the computer or server can meet the requirements of at least one stage task algorithm. The requirements of a stage task algorithm include the expected target efficiency, the latest completion time, and the running rate.
S45, if the current CPU resources of the computer or server can meet the requirements of at least one stage task algorithm, directly using those CPU resources to process the corresponding stage task algorithm or algorithms, and handing the remaining stage task algorithms contained in the computing instruction, i.e. those not completed on the current CPU resources, to other computing resources in the resource pool to participate in the operation singly or in combination.
S46, if the current CPU resources of the computer or server cannot meet the requirement of even one stage task algorithm, handing all stage task algorithms to other computing resources in the resource pool to participate in the operation singly or in combination.
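Steps S44 to S46 amount to a per-stage feasibility check against the CPU. A hedged sketch with invented figures and field names (a single `required_ops` rate stands in for the "expected target efficiency, latest completion time, and running rate" requirements):

```python
def dispatch(stage_tasks, cpu_capacity_ops, perf_data):
    """Decide per stage task whether the host CPU can meet the required
    rate (S44); feasible stages stay on the CPU (S45) and the rest are
    offloaded to pool resources (S45/S46). Thresholds are illustrative."""
    on_cpu, offloaded = [], []
    budget = cpu_capacity_ops
    for task in stage_tasks:
        need = task["required_ops"]            # e.g. signatures/sec required
        measured = perf_data.get(task["algo"], 0)
        if measured >= need and budget >= need:
            on_cpu.append(task["algo"])
            budget -= need
        else:
            offloaded.append(task["algo"])
    return on_cpu, offloaded

on_cpu, offloaded = dispatch(
    [{"algo": "SM2-sign", "required_ops": 500},
     {"algo": "Paillier-enc", "required_ops": 2000}],
    cpu_capacity_ops=1000,
    perf_data={"SM2-sign": 800, "Paillier-enc": 150},
)
```

With these made-up numbers the SM2 signing stage fits on the CPU while the Paillier stage is handed to the pool.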
Further, as shown in fig. 5, the participation of other computing resources in the resource pool, singly or in combination, in the operation of a computing instruction comprises:
A11, identifying, through the driver, the current usage of the computing resources in the resource pool.
A12, selecting, from the computing resources in the resource pool other than the CPU resources of the computer or server, those whose remaining resources satisfy constraint conditions comprising the type of the stage task algorithm, the number of stage task algorithms, the performance test data of the stage task algorithms, and the current usage of the pool's computing resources, to form a schedulable group.
A13, taking reduction of the CPU usage ratio of the computer or server as the optimization target, either scheduling a single computing resource from the schedulable group to perform the operation, or scheduling a combination of resources in sequence as an algorithm chain that performs the operation in stages.
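The group formation of A12 can be sketched as a filter: a non-CPU resource stays in the group only if its measured single-thread rate, discounted by its current load, still meets the task's requirement. The discounting rule and all field names are assumptions:

```python
def schedulable_group(pool_state, task, perf_data):
    """Form the schedulable group of A12: keep every non-CPU resource whose
    benchmark rate for the task's algorithm, scaled by its free fraction,
    meets the task's required rate (all names and rules illustrative)."""
    return sorted(
        rid for rid, res in pool_state.items()
        if res["type"] != "cpu"
        and perf_data.get((res["type"], task["algo"]), 0) * (1 - res["busy"])
            >= task["required_ops"]
    )

pool_state = {
    "host:cpu:0":    {"type": "cpu",  "busy": 0.9},
    "device:fpga:0": {"type": "fpga", "busy": 0.2},
    "device:gpu:0":  {"type": "gpu",  "busy": 0.5},
}
perf = {("fpga", "SM4-enc"): 10000, ("gpu", "SM4-enc"): 3000}
group = schedulable_group(pool_state, {"algo": "SM4-enc", "required_ops": 2000}, perf)
```

Here the 80%-free FPGA still delivers 8000 ops/sec and joins the group, while the half-busy GPU falls below the 2000 ops/sec requirement.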
After a computing instruction is received, the driver can identify the usage of the existing computing resources, such as the number of idle CPU cores, GPU occupancy, and overall resource utilization. Different computing resources are then selected for different algorithms: a compute-intensive cryptographic algorithm that would otherwise rely on the host's CPU or FPGA is preferably computed on the device's own resources or on a GPU, so that host CPU usage is reduced as far as possible and resources are saved; GPU resources of the device or of the host are mainly used for graphics processing and similar work. By assigning different computing resources to different algorithms or tasks, minimizing the use of host resources, and rationally calling all devices in the hardware resource pool, the hardware resources are fully exploited and cryptographic computation is accelerated.
In a specific embodiment, the upper-layer service usually runs a specific protocol flow rather than calling a cryptographic algorithm directly; the cryptographic algorithm is only one part of the protocol layer. For this situation the device likewise reduces CPU occupancy by rationally scheduling the CPU and other hardware resources; the scheduling and flow differ from protocol to protocol, but the optimization idea stays the same. The device does not provide the protocol-layer algorithms themselves; the user supplies the associated protocol flow, communication scheduling, and so on. Suppose, for example, that the user executes a secure multi-party computation protocol with two participants, divided into an algorithm scheduling process, an algorithm computation process, a local data circulation process, and so on. In the first step, both parties perform operations with a small resource footprint, such as initialization; even an inefficient CPU copes easily, so the CPU is used directly. In the second step, party A performs homomorphic encryption, and the device schedules resources according to the amount of data to be encrypted: if only a few or a dozen values are to be encrypted, the CPU is used directly; if the amount is large, other computing resources are called. When the computation finishes, the ciphertext result is sent to party B, the sending itself being done by the CPU. In the third step, party B performs homomorphic ciphertext operations on the received ciphertext, again calling computing resources according to the amount of computation: because homomorphic addition is cheap and offers little parallelism, dispatching the task to an accelerator would take longer than running the algorithm itself, so nothing is lost by keeping it on the CPU. When party B finishes, the result is sent back to party A. In the fourth step, party A performs homomorphic decryption and returns the recovered result to the user. In summary, even for homomorphic encryption with its large computational cost, the device schedules resources according to the actual computing demand: the computing resource is not chosen directly from the algorithm, but rationally from the actual amount of computation.
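The walk-through's rule of thumb, that a handful of homomorphic encryptions stays on the CPU while large batches are offloaded, reduces to a threshold check on the actual workload; the cutoff value here is invented for illustration:

```python
def choose_resource(n_items, cpu_threshold=16):
    """Pick a compute resource for a batch of homomorphic encryptions by
    actual workload, as in the protocol walk-through: a few or a dozen
    values stay on the CPU (dispatch overhead would dominate), large
    batches go to a parallel resource. The threshold is illustrative."""
    return "cpu" if n_items <= cpu_threshold else "gpu"
```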
As shown in fig. 6, an embodiment of the present invention further provides a device for accelerating algorithm operation with dynamically expandable computing resources, where the device is a software-and-hardware-integrated apparatus connected to a computer or server and configured to perform the method described above. Because the device integrates software and hardware, it has a hardware installation flow: it is installed in a server or computer simply through a PCIE slot. After physically installing the device, the user installs the corresponding driver from the provided installer so that the device can be called. Once the driver is installed, the device can identify the operating system of the server or computer, including but not limited to Windows and Linux.
Multiple devices can be connected to a computer or server simultaneously, and each device's own hardware resources, including its FPGA, SOC, graphics card, and so on, also support expansion. Whenever the hardware resources connected to the computer or server are updated, the corresponding update information is registered with the system of the computer or server and the resource pool is updated.
Intensive tasks are scheduled, and the system even expanded, according to the test results and the current usage of device resources. If a device's measured performance or its resources clearly fall short of the expected target efficiency, capacity can be expanded accordingly, i.e. one or more additional devices attached. If measured performance is good but the remaining CPU resources are insufficient, the intensive computation can be carried out on the device's FPGA or even a GPU. Conversely, if measured performance is poor but resources are plentiful, the computation can be completed by starting several parallel tasks that together use large amounts of computing resources other than the computer's CPU. Graphics processing tasks are handled similarly.
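The expansion-versus-offload cases above can be summarized as a small decision table; the three boolean inputs and the action strings are purely illustrative, not the patent's actual scheduler interface:

```python
def plan(perf_ok, cpu_free, offload_ok):
    """Map the cases above to actions: performance and resources both short
    of expectation -> attach more devices; good performance but exhausted
    CPU -> offload to the device's FPGA/GPU; poor performance but ample
    resources -> many parallel non-CPU tasks. Illustrative decision table."""
    if not perf_ok and not (cpu_free or offload_ok):
        return "attach additional device(s)"
    if perf_ok and not cpu_free:
        return "offload to device FPGA/GPU"
    if not perf_ok and offload_ok:
        return "start parallel tasks on non-CPU resources"
    return "run on CPU"
```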
The device also has cryptographic algorithms built in, provides multi-level encryption/decryption parallelism, and its driver simultaneously exposes an internal cryptographic-algorithm calling interface for upper-layer service calls. The cryptographic algorithms include the Chinese national algorithms SM2, SM3 and SM4; the international algorithms RSA, SHA, AES, DES and 3DES; and Paillier homomorphic encryption. Each cryptographic algorithm is implemented in both a CPU version and a GPU version, with different parallel acceleration schemes. For example, ECC uses a fully serial scheme on the CPU, while on the GPU the underlying finite-field operations run single-threaded and the upper-level modular exponentiation and elliptic-curve arithmetic run multi-threaded. AES and SM4 are accelerated and parallelized differently on the CPU and GPU: on Intel CPUs, AES and SM4 are optimized with the AES-NI instructions, while on the GPU, efficient parallel optimization is implemented on NVIDIA's CUDA architecture, providing multi-way parallel execution and dynamic load-balancing capability.
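One minimal way to model the CPU-version/GPU-version split is a registry keyed by (algorithm, backend). Here SHA-256 from Python's standard `hashlib` stands in for the built-in algorithms; the registry and its fallback behaviour are assumptions for illustration, not the device's real calling interface:

```python
import hashlib

IMPLS = {}  # maps (algorithm, backend) -> implementation function


def register(alg, backend):
    """Decorator that records an implementation under (alg, backend)."""
    def deco(fn):
        IMPLS[(alg, backend)] = fn
        return fn
    return deco


@register("SHA256", "cpu")
def sha256_cpu(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def dispatch(alg, data, prefer="gpu"):
    # Fall back to the CPU version when the preferred backend has no
    # implementation registered, mirroring the dual-version design above.
    fn = IMPLS.get((alg, prefer)) or IMPLS[(alg, "cpu")]
    return fn(data)
```

A GPU implementation registered under `("SHA256", "gpu")` would then be preferred automatically by the same `dispatch` call.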
Furthermore, the device is programmable, and all built-in cryptographic algorithms support autonomous programming, modification, replacement and optimization by the user. The algorithms provided in the device support user-defined implementations: if a user has a more efficient cryptographic algorithm implementation that they are unwilling to disclose, they can program the device autonomously through the provided development kit to replace or optimize any algorithm in the device, and accelerate that algorithm's computation by combining it with the device's method for accelerating algorithm operation efficiency. In addition, users can implement any algorithm they need in the device, and optimize and accelerate its computation through the resource-calling mechanism the device provides.
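A user-supplied replacement can be modeled as overriding an entry in an algorithm table. Everything below (the table, the built-in slot, and the user's replacement) is a hypothetical sketch of the development-kit workflow, not the actual SDK:

```python
import hashlib

# Hypothetical algorithm table shipped with the device: one named slot
# bound to a built-in implementation.
algorithms = {"digest": lambda data: hashlib.sha256(data).hexdigest()}


def replace_algorithm(table, name, impl):
    """Swap in a user-provided implementation under an existing slot name."""
    table[name] = impl


# A user's own implementation of the same slot (trivially different here:
# SHA-512 instead of SHA-256, standing in for a private optimized routine).
def my_digest(data: bytes) -> str:
    return hashlib.sha512(data).hexdigest()


replace_algorithm(algorithms, "digest", my_digest)
```

After the swap, upper-layer callers that look up `algorithms["digest"]` transparently get the user's version.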
In summary, embodiments of the present invention provide a method and an apparatus for accelerating algorithm operation efficiency with dynamically expandable computing resources. The apparatus is a software-and-hardware-integrated device that can dynamically expand its capacity and can identify and allocate the computing resources of a computer or server; it is inserted into a PCIE card slot on the computer or server via the PCIE interface, and the computer or server invokes the capabilities of the apparatus through a driver (SDK). The overall workflow of the apparatus is as follows: first, the software-and-hardware-integrated device is inserted into the computer or server through a PCIE slot; second, the operating system of the computer or server is identified, and a driver format suited to that operating system is selected for upper-layer use; next, the hardware resources of the computer or server, including but not limited to the CPU, GPU, etc., are identified and recorded and combined with the hardware resources provided by the device itself to form a unified resource pool; furthermore, the device has efficient cryptographic algorithms built in with parallel acceleration capability, and the driver simultaneously exposes an internal cryptographic-algorithm calling interface for upper-layer service calls; then, the upper-layer service determines the function to be called, and the device obtains the general application performance of that function; finally, the upper-layer service calls the function through the driver, and the device automatically allocates resources to complete the computing task.
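The workflow steps above can be strung together as a toy driver facade; every class, method, resource name, and threshold here is invented for illustration and does not describe the patent's real SDK:

```python
class AccelDeviceSDK:
    """Toy stand-in for the driver/SDK call path in the workflow above."""

    def __init__(self):
        self.pool = {}
        self.perf = {}

    def install(self, host_os: str, host_cpu_threads: int):
        # Steps 1-3: identify the OS, then record host and device
        # resources into a unified pool (illustrative entries).
        self.os = host_os
        self.pool = {"cpu": host_cpu_threads, "device_gpu": 1}

    def benchmark(self, alg: str, ops_per_sec: float):
        # Step 5: record the function's general application performance.
        self.perf[alg] = ops_per_sec

    def call(self, alg: str) -> str:
        # Step 6: auto-allocate a resource; a slow-on-CPU algorithm is
        # offloaded to spare the host CPU (threshold is arbitrary).
        return "device_gpu" if self.perf.get(alg, 0.0) < 1e4 else "cpu"


sdk = AccelDeviceSDK()
sdk.install("Linux", 16)
sdk.benchmark("SM4-encrypt", 5e3)
print(sdk.call("SM4-encrypt"))
```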
Thus, the present invention has several advantages:
(1) High performance: the device provides hierarchical parallelism and load-balancing capability, with built-in high-performance encryption and decryption algorithms whose performance is significantly higher than that of pure-software implementations.
(2) Relieved CPU computing pressure: resources are scheduled reasonably, and compute-intensive encryption and decryption algorithms are transferred and offloaded from the CPU to the device, greatly reducing the CPU's computational load and improving system stability.
(3) Strong expandability: the number of devices can be expanded arbitrarily according to different computing demands, the internal algorithms can be customized, and algorithm execution is accelerated.
Since the system/device described in the foregoing embodiments is used to implement the method of the foregoing embodiments, those skilled in the art can understand its specific structure and modifications from the method description, and it is therefore not described in detail here. All systems/devices used in the methods of the above embodiments fall within the scope of the present invention.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions.
It should be noted that the word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the inventive arrangements where several means are recited, several of these means can be embodied by one and the same item of hardware. The terms first, second, third, etc. are used for convenience of description only and do not denote any order; they may be understood as part of the component name.
Furthermore, it should be noted that in the description of the present specification, the terms "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., refer to a specific feature, structure, material, or characteristic described in connection with the embodiment or example being included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
While preferred embodiments of the present invention have been described, those skilled in the art may make additional variations and modifications to these embodiments once the basic inventive concept is known. Therefore, the present invention should be construed as including the preferred embodiments and all changes and modifications that fall within its scope.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, the present invention should also include such modifications and variations provided that they fall within the scope of the present invention and the equivalents thereof.

Claims (10)

1. A method for accelerating algorithm operation efficiency with dynamically expandable computing resources, comprising:
identifying an operating system of the computer or server after accessing the computer or server;
identifying hardware resources of the computer or server based on system calls or APIs provided by the operating system, and combining them with the hardware resources provided by the device itself to form a unified resource pool;
providing an SDK corresponding to the algorithms supported by the device for a testing party to invoke test items and perform algorithm tests, and receiving the performance test data returned by the testing party when the tests are complete;
according to a computing instruction transmitted by an upper-layer service, the performance test data of the algorithm corresponding to the computing instruction, and the current usage of the computing resources in the resource pool, calling other computing resources in the resource pool, singly or in combination, to participate in the computation of the computing instruction, with reducing the CPU resource usage of the computer or server as the optimization target.
2. The method for accelerating algorithm operation efficiency with dynamically expandable computing resources according to claim 1, wherein identifying the operating system of the computer or server after accessing the computer or server comprises:
after accessing the computer or server, providing a pre-configured driver installation program for the computer or server to install the driver;
identifying the operating system on the computer or server by means of the installed driver, and initiating an authorization request to the operating system to obtain authorization for identifying system hardware resources and scheduling system resources.
3. The method for accelerating algorithm operation efficiency with dynamically expandable computing resources according to claim 1, wherein identifying the hardware resources of the computer or server based on system calls or APIs provided by the operating system and forming a unified resource pool in combination with the hardware resources provided by the device itself comprises:
performing, through system calls or APIs provided by the operating system, at least one of acquiring the hardware device resource list, acquiring hardware device attribute information, and executing tasks related to the hardware devices, including cryptographic operation tasks and image processing tasks, so as to identify and record the hardware resources of the computer or server;
forming a unified resource pool from the hardware resources of the computer or server and the hardware resources provided by the device itself.
4. The method for accelerating algorithm operation efficiency with dynamically expandable computing resources according to claim 1, wherein providing the SDK corresponding to the algorithms supported by the device for the testing party to invoke test items and perform algorithm tests, and receiving the performance test data returned by the testing party when the tests are complete, comprises:
providing the SDK corresponding to the algorithms supported by the device for the testing party to invoke test items and perform algorithm tests, thereby obtaining performance test data comprising the number of executions per second of the algorithm on each resource under single-thread conditions and the memory resources occupied during algorithm execution;
receiving the performance test data returned by the testing party after the performance tests are complete.
5. The method according to any one of claims 1-4, wherein calling other computing resources in the resource pool, singly or in combination, to participate in the computation of the computing instruction according to the computing instruction transmitted by the upper-layer service, the performance test data of the algorithm corresponding to the computing instruction, and the current usage of the computing resources in the resource pool, with reducing the CPU resource usage of the computer or server as the optimization target, comprises:
acquiring a computing instruction issued by the upper-layer service;
parsing the computing instruction to obtain at least one stage task algorithm contained in the computing instruction;
matching each stage task algorithm against the algorithms supported by the device, and taking the performance test data of the algorithm whose degree of consistency exceeds a set threshold as the performance test data of that stage task algorithm;
judging, according to the type, number and performance test data of each stage task algorithm, whether the current CPU resources of the computer or server can meet the requirements of at least one stage task algorithm;
if the current CPU resources of the computer or server can meet the requirements of at least one stage task algorithm, directly using the current CPU resources of the computer or server to process the corresponding stage task algorithm or algorithms, and handing the remaining stage task algorithms contained in the computing instruction, which cannot be completed with the current CPU resources of the computer or server, to other computing resources in the resource pool to participate in the computation of the computing instruction singly or in combination;
if the current CPU resources of the computer or server cannot meet the requirements of any stage task algorithm, handing all stage task algorithms to other computing resources in the resource pool to participate in the computation of the computing instruction singly or in combination;
wherein the requirements of a stage task algorithm include the expected target efficiency, the latest completion time, and the running rate.
6. The method for accelerating algorithm operation efficiency with dynamically expandable computing resources according to claim 5, wherein calling other computing resources in the resource pool, singly or in combination, to participate in the computation of the computing instruction comprises:
identifying, through the driver, the current usage of the computing resources in the resource pool;
selecting, according to constraint conditions comprising the type of the stage task algorithm, the number of stage task algorithms, the performance test data of the stage task algorithms, and the current usage of the computing resources in the resource pool, those computing resources in the resource pool, other than the CPU resources of the computer or server, whose remaining resources satisfy the constraint conditions, so as to form a schedulable group;
with reducing the CPU resource usage ratio of the computer or server as the optimization target, either scheduling a single computing resource from the schedulable group to participate in the computation of the computing instruction, or sequentially scheduling and combining computing resources into an algorithm chain to participate in the computation in stages.
7. An apparatus for accelerating algorithm operation efficiency with dynamically expandable computing resources, wherein the apparatus is a software-and-hardware-integrated apparatus connected to a computer or server and configured to perform the method of any one of claims 1-6.
8. The apparatus for accelerating algorithm operation efficiency with dynamically expandable computing resources according to claim 7, wherein a plurality of said apparatuses can be connected to the computer or server simultaneously, each apparatus supports expansion of its own hardware resources, and when the hardware resources included in an apparatus are updated, the corresponding update information is registered with the system of the computer or server and the resources of the resource pool are updated.
9. The apparatus for accelerating algorithm operation efficiency with dynamically expandable computing resources according to claim 7, wherein a plurality of cryptographic algorithms are built into the apparatus, which has multi-level encryption/decryption parallel capability;
the cryptographic algorithms comprise the Chinese national algorithms SM2, SM3 and SM4, the international algorithms RSA, SHA, AES, DES and 3DES, and Paillier homomorphic encryption; and each cryptographic algorithm is implemented in both a CPU version and a GPU version, the two versions having different parallel acceleration schemes.
10. The apparatus for accelerating algorithm operation efficiency with dynamically expandable computing resources according to claim 9, wherein the apparatus is programmable, and the built-in cryptographic algorithms support autonomous programming, modification, replacement and optimization by the user.
CN202311804579.2A 2023-12-26 2023-12-26 Method and device capable of dynamically expanding operation efficiency of acceleration algorithm of computing resource Active CN117453424B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311804579.2A CN117453424B (en) 2023-12-26 2023-12-26 Method and device capable of dynamically expanding operation efficiency of acceleration algorithm of computing resource


Publications (2)

Publication Number Publication Date
CN117453424A true CN117453424A (en) 2024-01-26
CN117453424B CN117453424B (en) 2024-04-19

Family

ID=89589654


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016103179A (en) * 2014-11-28 2016-06-02 株式会社日立製作所 Allocation method for computer resource and computer system
CN107395452A (en) * 2017-06-22 2017-11-24 重庆大学 A kind of method for the HTTPS application performances that WebServer is improved using software-hardware synergism technology
CN113867882A (en) * 2020-06-30 2021-12-31 中国电信股份有限公司 Container resource scheduling method and device and computer readable storage medium
CN115408152A (en) * 2022-08-23 2022-11-29 吉兴信(广东)信息技术有限公司 Adaptive resource matching obtaining method and system
CN115525418A (en) * 2021-06-25 2022-12-27 复旦大学 Multi-kernel multi-group operating system construction method
CN116954873A (en) * 2023-09-21 2023-10-27 浪潮电子信息产业股份有限公司 Heterogeneous computing system, and method, device, equipment and medium for selecting power nodes of heterogeneous computing system
CN117032945A (en) * 2023-06-25 2023-11-10 广西电网有限责任公司 Heterogeneous computing architecture for adjusting computing resource balance energy consumption through energy consumption perception
CN117221328A (en) * 2023-10-24 2023-12-12 国电南瑞科技股份有限公司 Service scheduling method and system based on situation awareness


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YAN Jiankang; CHEN Gengsheng: "An Improved H-Storm Platform Based on Cooperative Scheduling of CPU/GPU Heterogeneous Resources", Computer Engineering (计算机工程), no. 04, 15 April 2018 (2018-04-15) *
LIU Xin: "Design and Implementation of a GPU Resource Scheduling System for Heterogeneous Computing Platforms", China Master's Theses Full-text Database (中国优秀博硕士学位论文全文数据库(硕士)), Information Science and Technology, 15 October 2023 (2023-10-15) *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant