KR102194513B1

KR102194513B1 - Web service system and method using GPGPU-based task queue

Info

Publication number: KR102194513B1
Application number: KR1020190073406A
Authority: KR
Inventors: 정회경; 김경환
Original assignee: 배재대학교 산학협력단
Priority date: 2019-06-20
Filing date: 2019-06-20
Publication date: 2020-12-23

Abstract

According to one embodiment of the present invention, a system for providing a web service using a GPGPU-based task queue comprises: a task queue that is created on a GPU and processes user requests for a web service by using the multi-thread processing capability of the GPGPU; and an HTTP server that, upon receiving from a client an HTTP request corresponding to the user request, creates a plurality of producer threads and a plurality of consumer threads, receives packets related to the HTTP request through the producer threads and transmits them to the task queue, checks through the consumer threads whether an error has occurred in the data preprocessed by the task queue, and transfers the data to a web application server if no error is found. The system can thereby improve performance by reducing the work of the CPU.

Description

Web service providing system and method using GPGPU-based task queue {WEB SERVICE SYSTEM AND METHOD USING GPGPU BASED TASK QUEUE}

Embodiments of the present invention relate to a system and method for providing a web service using a task queue based on General-Purpose Computing on Graphics Processing Units (GPGPU).

Web servers on the Internet are always available and highly open so that they can handle requests from an unspecified number of users. When a large volume of user requests arrives, however, failures can occur or the web service can no longer be delivered smoothly.

Even a web server that normally handles its request load without difficulty may fail to cope with a sudden surge of user requests and go down. Once the web server is down, restoring normal service requires rebooting the system or spending considerable time restarting the web server.

To guard against this problem, efforts have been made to increase web server resources and to use systems that allocate resources dynamically, such as cloud services. Most of these efforts are costly, and in some cases they do not provide a complete solution.

Meanwhile, GPGPU is widely used in fields such as HPC, games, simulation, and artificial intelligence, and its use continues to grow. Many kinds of work are now performed on GPUs, and their fields of application are diversifying.

The latest GPUs provide not only graphics processing but also high-performance floating-point arithmetic that supports GPGPU programs. Their performance also continues to improve so that they can create and process a large number of threads. As GPUs continue to develop so that this performance is available to general applications, and as they develop in a role that assists the CPU, the execution performance of general-purpose applications improves and the cost of hardware for running high-performance general-purpose applications can be reduced.

Related prior art includes Korean Patent Application Publication No. 10-2017-0116439 (title of the invention: Task scheduling method and apparatus; publication date: October 19, 2017).

An embodiment of the present invention provides a system and method for providing a web service using a GPGPU-based task queue, in which a task queue capable of handling a large volume of requests is designed using GPGPU and part of the HTTP processing is performed on the GPU, reducing the work of the CPU and improving performance.

The problems to be solved by the present invention are not limited to the problem(s) mentioned above, and other problem(s) not mentioned will be clearly understood by those skilled in the art from the following description.

A web service providing system using a GPGPU-based task queue according to an embodiment of the present invention includes: a task queue that is created on a GPU and processes user requests for a web service by using the multi-thread processing capability of the GPGPU; and an HTTP server that, upon receiving from a client an HTTP request corresponding to the user request, creates a plurality of producer threads and a plurality of consumer threads, receives packets related to the HTTP request through the plurality of producer threads and transmits them to the task queue, checks through the plurality of consumer threads whether an error has occurred in the data preprocessed by the task queue, and delivers the data to a web application server if the check finds no error.

The task queue may use a circular queue algorithm so that user requests arriving in large volume do not exceed the capacity of the task queue.

The task queue may be implemented using CUDA (Compute Unified Device Architecture), a GPGPU technology, such that the CUDA threads operate like a circular queue to enable application processing.

The CUDA implementation may use a kernel function to call an HTTP packet error-handling function, which is created on the GPU together with a plurality of threads. The HTTP packet error-handling function may take as parameters the task queue data structure defined in the kernel function and the size of that structure, check the validity of the packet related to the HTTP request, and return after writing a value to an error flag.

The consumer thread may read the data in the task queue asynchronously and check whether an error has occurred. If an error has occurred, it may send the result for the error code to the producer thread so that the result is returned to the user immediately through the producer thread; if no error has occurred, it may deliver the data to the web application server.

The consumer thread may access the task queue created by the producer thread, through a function capable of processing the HTTP protocol, and read the data asynchronously. Because the GPU has already checked at least one of the consistency of the data, HTTP errors, and errors in user input values and stored the result in a structure, the consumer thread may determine whether the data contains an error by referring to the values in that structure.

A thread of the task queue created by the producer thread may inspect the data of the user request and, if an error is found, store whether an error occurred and the error code in a Boolean-type structure variable; the consumer thread may determine whether the data contains an error by referring to the value of that variable.

When the user request cannot be processed because of CPU load, the task queue may transmit waiting information, including a waiting time and a position in the waiting order, to the client to inform the user.

A web service providing method using a GPGPU-based task queue includes: creating, by an HTTP server, a plurality of producer threads and a plurality of consumer threads in response to an HTTP request from a client that corresponds to a user request for a web service; receiving, by the HTTP server, packets related to the HTTP request through the plurality of producer threads and transmitting them to a task queue created on a GPU; processing, by the task queue, the HTTP request by using the multi-thread processing capability of the GPGPU; checking, by the HTTP server, through the plurality of consumer threads whether an error has occurred in the data preprocessed by the task queue; and, if the check finds no error, delivering, by the HTTP server, the data to a web application server.

In the method, the task queue may be implemented using CUDA, a GPGPU technology, such that the CUDA threads operate like a circular queue to enable application processing.

Details of other embodiments are included in the detailed description and the accompanying drawings.

According to an embodiment of the present invention, a task queue capable of handling a large volume of requests is designed using GPGPU, and part of the HTTP processing is performed on the GPU, which reduces the work of the CPU and improves performance.

According to an embodiment of the present invention, when a user request cannot be processed because the CPU is heavily loaded, the task queue can inform the user of the waiting time and the position in the waiting order, which prevents the user from waiting indefinitely.

According to an embodiment of the present invention, a task queue that can exploit the multi-thread processing capability of the GPGPU is developed to process user requests for a web service, and HTTP requests are stored in that task queue, so that HTTP is preprocessed on the GPU and the burden on the web server is reduced.

According to an embodiment of the present invention, applying a circular queue algorithm to the task queue keeps user requests that arrive in large volume from exceeding the task queue capacity. Web user requests are thus accepted only at a rate the web server can process, which allows the web server to deliver the web service more reliably.

FIG. 1 is a block diagram illustrating a system for providing a web service using a GPGPU-based task queue according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method of providing a web service using a GPGPU-based task queue according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example of an HTTP packet error-handling function that is called from the CUDA kernel function of the GPGPU-based task queue.
FIG. 4 is a diagram showing an example implementation of the Producer of the HTTP server.
FIG. 5 is a diagram showing an example implementation of the Consumer of the HTTP server.
FIG. 6 is a diagram showing the performance for each experimental scenario.
FIG. 7 is a diagram showing the error rates of two servers (a server that does not use the GPGPU-based task queue and a server that uses it).

Advantages and/or features of the present invention, and methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in a variety of different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that the scope of the invention is fully conveyed to those of ordinary skill in the art to which the present invention pertains; the present invention is defined only by the scope of the claims. The same reference numerals refer to the same elements throughout the specification.

In addition, to describe the technical components of the present invention efficiently, the preferred embodiments below omit, as far as possible, functional configurations that are already provided in each system or that are commonly provided in the technical field to which the present invention belongs, and focus on the functional configurations that must be additionally provided for the present invention. A person of ordinary skill in the art will readily understand the functions of the conventionally used components that are omitted below, and will also clearly understand the relationship between those omitted components and the components added for the present invention.

In addition, in the following description, the terms "transmission", "communication", "sending", "reception", and other terms of similar meaning applied to a signal or information cover not only direct transfer of the signal or information from one component to another but also transfer via other components. In particular, "transmitting" or "sending" a signal or information to a component indicates the final destination of that signal or information and does not imply a direct destination. The same applies to "reception" of a signal or information.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating a system for providing a web service using a GPGPU-based task queue according to an embodiment of the present invention.

Referring to FIG. 1, a web service providing system 100 using a GPGPU-based task queue according to an embodiment of the present invention may include a task queue 110, an HTTP server 120, and a web application server 130.

The consumer thread 124 of the HTTP server 120 may create the task queue 110 on the GPU using GPGPU programming and may also transfer to the GPU the C code to be executed there, so that a large number of threads can be created by executing that C code. The consumer thread 124 of the HTTP server 120 creates and manages a thread pool, and the producer thread 122 of the HTTP server 120 also uses the GPU through GPGPU programming so that it can access the task queue 110. When the producer thread 122 receives a user request, it may immediately transmit the request to the task queue 110.

Hereinafter, each of the components 110, 120, 122, 124, and 130 of the web service providing system 100 using a GPGPU-based task queue according to an embodiment of the present invention will be described in detail.

The task queue 110 is created on the GPU and may process user requests for the web service by using the multi-thread processing capability of the GPGPU. The task queue 110 may use a circular queue algorithm so that user requests arriving in large volume do not exceed the capacity of the task queue.
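The capacity-limiting behavior of a circular queue can be illustrated with a minimal CPU-side sketch. This is a Python model, not the CUDA implementation of the patent; the class name `CircularTaskQueue` and the choice to reject a request when the buffer is full are assumptions made for illustration, since the text does not state what happens to a request that arrives at a full queue.

```python
class CircularTaskQueue:
    """Fixed-capacity ring buffer (hypothetical name); rejects pushes when full."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._head = 0   # index of the oldest stored item
        self._count = 0  # number of occupied slots

    def __len__(self):
        return self._count

    def push(self, item):
        """Store item and return True, or return False if the queue is full."""
        if self._count == len(self._slots):
            return False  # assumption: a full queue refuses new requests
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = item
        self._count += 1
        return True

    def pop(self):
        """Remove and return the oldest item, or None if the queue is empty."""
        if self._count == 0:
            return None
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return item


# Three requests arrive at a queue with room for two.
queue = CircularTaskQueue(capacity=2)
accepted = [queue.push(r) for r in ("req-1", "req-2", "req-3")]
```

Because the indices wrap around with the modulo operation, the storage never grows: the number of queued requests is bounded by `capacity` regardless of how many requests arrive.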

The task queue 110 may be implemented using CUDA (Compute Unified Device Architecture), a GPGPU technology. Here, the CUDA threads may be implemented to operate like a circular queue to enable application processing.

In addition, the CUDA implementation may use a kernel function to call an HTTP packet error-handling function, and the HTTP packet error-handling function may be created on the GPU together with a plurality of threads.

The HTTP packet error-handling function may take as parameters the task queue data structure defined in the kernel function and the size of that structure, check the validity of the packet for the user request to the web service, that is, the HTTP request, and return after writing a value to an error flag according to the result of the check.
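The following sketch models, on the CPU and in Python, a function of the shape described above: it receives the queue entries and their count and writes an error flag and error code into each entry. The actual function of FIG. 3 is not reproduced in the text, so the entry fields, the internal error codes (1, 2, 3), the set of accepted methods, and the specific checks on the request line are all assumptions for illustration. On a GPU each entry would be handled by its own thread; here a loop stands in for that.

```python
from dataclasses import dataclass

# Hypothetical set of methods the server accepts.
ALLOWED_METHODS = {"GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS", "PATCH"}


@dataclass
class TaskQueueEntry:
    """Hypothetical layout of one task queue slot."""
    packet: bytes
    error_flag: bool = False
    error_code: int = 0  # 0 = no error; other values are illustrative only


def validate_packet(packet):
    """Return (error_flag, error_code) for one raw HTTP request."""
    request_line, separator, _rest = packet.partition(b"\r\n")
    if not separator:
        return True, 1  # no CRLF-terminated request line
    parts = request_line.split(b" ")
    if len(parts) != 3:
        return True, 1  # request line is not "METHOD target VERSION"
    method, _target, version = parts
    if method.decode("ascii", "replace") not in ALLOWED_METHODS:
        return True, 2  # unsupported method
    if not version.startswith(b"HTTP/"):
        return True, 3  # malformed protocol version
    return False, 0


def check_http_packets(entries, count):
    """Write the check result into each of the first `count` entries."""
    for i in range(count):  # stands in for one GPU thread per entry
        entries[i].error_flag, entries[i].error_code = validate_packet(entries[i].packet)


entries = [
    TaskQueueEntry(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"),
    TaskQueueEntry(b"garbage"),
    TaskQueueEntry(b"FETCH / HTTP/1.1\r\n\r\n"),
]
check_http_packets(entries, len(entries))
```

After the call, the first entry carries no error, while the second and third are flagged, so a reader of the queue only needs to inspect `error_flag` and `error_code` rather than parse the packet again.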

Meanwhile, when the task queue 110 cannot process the user request (HTTP request) because of CPU load, it may transmit waiting information, including a waiting time and a position in the waiting order, to the client. The user can then see the waiting information for the request on the screen of the client (e.g., a PC or smartphone).
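The text does not say how the waiting time is derived. As one possible sketch, assuming the waiting time is estimated as the number of requests ahead multiplied by an average service time, the waiting information could be built as follows. The names `WaitInfo` and `build_wait_info` and the estimation rule itself are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WaitInfo:
    """Hypothetical payload sent to the client while it waits."""
    position: int                  # place in the waiting order, starting at 1
    estimated_wait_seconds: float  # rough estimate, not a guarantee


def build_wait_info(requests_ahead, avg_service_seconds):
    """Estimate waiting info from queue depth (assumed rule: depth x average time)."""
    if requests_ahead < 0 or avg_service_seconds < 0:
        raise ValueError("arguments must be non-negative")
    return WaitInfo(
        position=requests_ahead + 1,
        estimated_wait_seconds=requests_ahead * avg_service_seconds,
    )


# Four requests are already queued and each takes about 0.25 s on average.
info = build_wait_info(requests_ahead=4, avg_service_seconds=0.25)
```

A client that receives such a payload can display the position and the estimate instead of leaving the user with an unanswered request.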

Upon receiving from a client an HTTP request corresponding to the user request, the HTTP server 120 may create a plurality of producer threads 122 and a plurality of consumer threads 124 in response.

The HTTP server 120 may receive packets related to the HTTP request through the plurality of producer threads 122 and transmit them to the task queue 110. The HTTP server 120 may then check, through the plurality of consumer threads 124, whether an error has occurred in the data preprocessed by the task queue 110.

Here, the consumer thread 124 may read the data in the task queue 110 asynchronously and check whether an error has occurred.

Specifically, the consumer thread 124 may access the task queue 110 created by the producer thread 122, through a function capable of processing the HTTP protocol, and read the data asynchronously. By the time the data is read, the GPU has already checked the consistency of the data, HTTP errors, errors in user input values, and the like, and has stored the result in a structure. The consumer thread 124 can therefore determine whether the data contains an error by referring to the values in that structure.

In other words, a thread of the task queue 110 created by the producer thread 122 may inspect the data of the user request and, if an error is found, store whether an error occurred and the error code in a Boolean-type structure variable. The consumer thread 124 can then determine whether the data contains an error by referring to the value of that variable.

If the check finds that an error has occurred, the consumer thread 124 may send the result for the error code to the producer thread 122, so that the result is returned to the user immediately through the producer thread 122.

If, on the other hand, the check finds no error, the consumer thread 124 may deliver the data to the web application server 130.
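The two branches taken by the consumer thread can be sketched as a small routing function. This is a Python illustration under stated assumptions: the dictionary keys (`request_id`, `packet`, `error_flag`, `error_code`) and the two callbacks are hypothetical stand-ins for the preprocessed structure, the path back to the producer thread, and the path to the web application server.

```python
def route_entry(entry, send_error_to_producer, forward_to_app_server):
    """Route one preprocessed entry based on its stored error flag.

    The flag is only read here; the validity check itself is assumed to
    have been done earlier (on the GPU in the described system).
    """
    if entry["error_flag"]:
        send_error_to_producer(entry["request_id"], entry["error_code"])
        return "error"
    forward_to_app_server(entry["request_id"], entry["packet"])
    return "forwarded"


errors = []     # (request_id, error_code) pairs sent back to the producer
forwarded = []  # request ids passed on to the application server

preprocessed = [
    {"request_id": 1, "packet": b"GET / HTTP/1.1\r\n\r\n", "error_flag": False, "error_code": 0},
    {"request_id": 2, "packet": b"garbage", "error_flag": True, "error_code": 1},
]
results = [
    route_entry(
        e,
        lambda rid, code: errors.append((rid, code)),
        lambda rid, pkt: forwarded.append(rid),
    )
    for e in preprocessed
]
```

Only valid requests reach the application server in this sketch; invalid ones are answered without involving it, which is the load reduction the description aims at.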

The apparatus described above may be implemented as a hardware component, a software component, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications executed on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as a parallel processor, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may command the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

FIG. 2 is a flowchart illustrating a method of providing a web service using a GPGPU-based task queue according to an embodiment of the present invention.

The web service providing method described here is only one embodiment of the present invention. Various steps may be added as needed, and the steps below may be performed in a different order, so the present invention is not limited to the steps described below or to their order.

Referring to FIG. 2, when a client first sends a user request to the web server (1. Http Request), the producer thread 122 of the HTTP server 120 receives the user's HTTP request and immediately transmits the packet to the task queue 110 on the GPU (2. Http Packet sending Task Queue).

The GPU thread then stores the packet data, checks for HTTP packet errors, user data errors, and the like, stores the details of any error in a variable, and waits.

Next, the consumer thread 124 of the HTTP server 120 reads the data in the task queue 110 asynchronously and checks the error status recorded by the GPU thread (3. Http Packet Reading Task Queue).

If the check finds an error in the data, the consumer thread 124 sends the result for the error code to the producer thread 122, and the producer thread 122 immediately returns the result to the user (4. Error Packet Response).

If, on the other hand, the check finds no error in the data, the consumer thread 124 delivers the data to the web application server 130 (5. Http Packet Forwarding).

Here, the producer thread 122 is written using TCP/IP socket server programming. That is, when an HTTP request arrives from a client, the server module accepts the request and immediately creates a thread. The thread pushes the data to the task queue 110 for storage and waits for a response from the consumer thread 124 and the web application server 130.

When an HTTP response is received from the web application server 130 (6. Http Response), the HTTP server 120 transmits the response to the client (7. Http Response) and terminates the thread.
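The numbered steps of FIG. 2 can be traced in a compact, CPU-only Python sketch. It is not the implementation shown in FIGS. 4 and 5: a bounded `queue.Queue` stands in for the GPU task queue, the function `gpu_stage` stands in for the GPU-side check and is run inside the consumer thread for brevity, and `app_server`, the reply queue, and the response strings are all hypothetical.

```python
import queue
import threading

task_queue = queue.Queue(maxsize=8)  # stand-in for the GPU task queue
STOP = object()                      # sentinel that tells the consumer to exit


def gpu_stage(packet):
    """Stand-in for the GPU-side check; returns (error_flag, error_code)."""
    ok = packet.startswith(b"GET ") and packet.rstrip().endswith(b"HTTP/1.1")
    return (False, 0) if ok else (True, 1)


def app_server(packet):
    """Hypothetical web application server."""
    return b"200 OK"


def consumer():
    """Step 3: read entries, then either report the error (4) or forward (5, 6)."""
    while True:
        item = task_queue.get()
        if item is STOP:
            break
        packet, reply = item
        error_flag, error_code = gpu_stage(packet)
        if error_flag:
            reply.put(b"error %d" % error_code)
        else:
            reply.put(app_server(packet))


def producer(packet):
    """Steps 1, 2 and 7: accept a request, enqueue it, return the response."""
    reply = queue.Queue(maxsize=1)  # per-request channel back to the producer
    task_queue.put((packet, reply))
    return reply.get(timeout=5)


worker = threading.Thread(target=consumer)
worker.start()
responses = [producer(b"GET /index.html HTTP/1.1"), producer(b"bogus")]
task_queue.put(STOP)
worker.join()
```

In this sketch the producer blocks on its own reply channel, which mirrors the description that the producer thread waits for a response from the consumer thread and the web application server before answering the client.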

According to this embodiment of the present invention, the consumer thread 124 accesses the task queue 110 created by the producer thread 122, through a function capable of processing the HTTP protocol, and reads the data asynchronously.

Because the data that is read has already been checked on the GPU for data consistency, HTTP errors, user input value errors, and the like, with the results stored in a structure, the consumer thread 124 refers to the values of that structure and, if the data contains an error, immediately sends an error response to the producer thread 122. The producer thread 122 then handles the response to the user.

The threads of the task queue 110 created by the producer thread 122 inspect the data of the user request and, if an error is found, store whether an error occurred, together with the error code, in a Boolean structure variable. The consumer thread 124 checks the value of the Boolean variable and, if an error occurred, immediately sends a response with the error code to the producer thread 122. If there is no error, the producer thread 122 transmits the HTTP packet to the web application server 130.

Meanwhile, according to an embodiment of the present invention, the consumer thread 124 creates the task queue 110 on the GPU. To create a large number of threads on the GPU, the consumer thread 124 does so through a CUDA (Compute Unified Device Architecture) application, and it creates threads on the GPU that wait to receive user request data asynchronously.

When the consumer thread 124 receives a user request from the task queue 110, it checks the error code and, if an error occurred, immediately notifies the producer thread 122. If no error occurred, the consumer thread 124 sends the data to the web application server 130 so that the user request can be processed. The response to the user request processed by the web application server 130 is sent to the user by the producer thread 122.

System implementation

The GPGPU-based task queue is implemented using CUDA. The task queue uses a circular queue algorithm, and the CUDA threads are implemented to operate like a circular queue so that simple application processing is possible.
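The circular queue behaviour described above can be illustrated with a minimal host-side sketch in Python. This is not the CUDA implementation of the embodiment; the class name CircularTaskQueue, its methods, and the choice to keep one slot empty to distinguish a full queue from an empty one are all assumptions made for illustration.

```python
from typing import Any, List, Optional


class CircularTaskQueue:
    """Fixed-capacity ring buffer (hypothetical stand-in for the GPU task queue)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.slots: List[Optional[Any]] = [None] * capacity
        self.front = 0  # index of the next entry to pop
        self.rear = 0   # index of the next free slot

    def is_empty(self) -> bool:
        return self.front == self.rear

    def is_full(self) -> bool:
        # One slot stays unused so that "full" and "empty" differ.
        return (self.rear + 1) % self.capacity == self.front

    def push(self, item: Any) -> bool:
        """Store item at rear; refuse (return False) rather than overwrite when full."""
        if self.is_full():
            return False
        self.slots[self.rear] = item
        self.rear = (self.rear + 1) % self.capacity
        return True

    def pop(self) -> Optional[Any]:
        """Remove and return the entry at front, or None when the queue is empty."""
        if self.is_empty():
            return None
        item = self.slots[self.front]
        self.slots[self.front] = None
        self.front = (self.front + 1) % self.capacity
        return item
```

Because push refuses new entries once the queue is full, the stored data never exceeds the queue's capacity; what the caller does with a refused request (retry, reject, or queue a waiting notice) is left open here.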

FIG. 3 shows an example of an HTTP packet error processing function that is called from the CUDA kernel function of the GPGPU-based task queue.

Referring to FIG. 3, the HTTP packet error processing function is called from a CUDA kernel function and is instantiated on the GPU across a large number of threads. It receives as parameters the task_queue_data structure defined in the CUDA kernel function and the size of that structure, checks the validity of the HTTP packet, and returns after writing the result to error_flag.
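The function of FIG. 3 is not reproduced in this text, so the following is only a hedged Python sketch of the idea: a check that takes a task_queue_data record and its size and records the outcome in error_flag. Only the names task_queue_data and error_flag come from the description; the other fields, the specific validation rules, the status codes chosen, and the run_kernel helper (a plain loop standing in for one GPU thread per queue entry) are assumptions.

```python
from dataclasses import dataclass
from typing import List

ALLOWED_METHODS = {"GET", "POST", "PUT", "DELETE", "HEAD"}  # assumed subset


@dataclass
class TaskQueueData:
    """Stand-in for task_queue_data; every field except error_flag is assumed."""
    method: str = ""
    path: str = ""
    version: str = ""
    error_flag: bool = False
    error_code: int = 0


def check_http_packet(data: TaskQueueData, size: int) -> TaskQueueData:
    """Validate one entry and write the result into error_flag and error_code."""
    if size <= 0:
        data.error_flag, data.error_code = True, 400
    elif data.method not in ALLOWED_METHODS:
        data.error_flag, data.error_code = True, 405
    elif not data.path.startswith("/"):
        data.error_flag, data.error_code = True, 400
    elif data.version not in ("HTTP/1.0", "HTTP/1.1"):
        data.error_flag, data.error_code = True, 505
    else:
        data.error_flag, data.error_code = False, 0
    return data


def run_kernel(entries: List[TaskQueueData]) -> List[TaskQueueData]:
    """Sequential loop standing in for a kernel launch with one thread per entry."""
    checked = []
    for entry in entries:
        # Field lengths stand in for the structure size passed in the original.
        size = len(entry.method) + len(entry.path) + len(entry.version)
        checked.append(check_http_packet(entry, size))
    return checked
```

Each entry is checked independently of the others, which is the property that lets the same check run as many parallel GPU threads.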

The HTTP server was implemented as a Producer server and a Consumer server. The Producer server receives user requests and sends their data to the task queue. The Consumer server reads the user data from the task queue, handles errors, and forwards the user request to the web application. As shown in FIG. 3, the Producer server is a multi-threaded server that receives an HTTP Request from a user client, separates the data to be processed into the task_queue_data structure, and sends it to the task queue.

The Producer server receives an HTTP Request packet from the client. From the raw HTTP Request data, it separates the HTTP request data and stores it in a structure, and it separates the data entered by the user and stores it in the http_user_data structure. Once all of the data has been stored in these structures, the task_queue structure is complete, and it is pushed onto the GPGPU task queue.
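As a rough illustration of this separation step, the Python sketch below splits a raw HTTP/1.x request into request fields and user-entered values. The structure names http_user_data and task_queue appear in the description, but the field layout, the parse_request function, and the assumption that user input arrives as a query string or a form-encoded body are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import parse_qs


@dataclass
class HttpUserData:
    """Stand-in for http_user_data: the values entered by the user."""
    fields: Dict[str, str] = field(default_factory=dict)


@dataclass
class TaskQueueEntry:
    """Stand-in for the completed task_queue structure; the layout is assumed."""
    method: str
    path: str
    version: str
    headers: Dict[str, str]
    user_data: HttpUserData
    error_flag: bool = False


def parse_request(raw: bytes) -> TaskQueueEntry:
    """Split a raw request into its parts; raises ValueError on a malformed request line."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    path, _, query = target.partition("?")
    # Assumption: GET carries user input in the query string, other methods in the body.
    form = query if method == "GET" else body.decode("utf-8")
    user = {key: values[0] for key, values in parse_qs(form).items()}
    return TaskQueueEntry(method, path, version, headers, HttpUserData(user))
```

The resulting entry is what a producer thread would push onto the task queue; error_flag starts as False and would be set by the GPU-side check.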

So that the Producer can share the Rear value of the task queue with the Consumer, the two share a shared_mem structure stored in shared memory. The HTTP server Producer was implemented as shown in FIG. 4. The Consumer server first initializes the shared memory so that it can read the Front information of the task queue from it. It then reads data from the task queue, checks error_flag, and, if the data contains no error, forwards the HTTP Request to the web application server (WAS).
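One way to picture the shared Front and Rear indices is the Python sketch below, which uses the standard multiprocessing.shared_memory module. The layout of shared_mem (two unsigned 32-bit integers) and the function names are assumptions; the embodiment's actual structure is not given in this text, and synchronization between the two servers is deliberately left out.

```python
import struct
from multiprocessing import shared_memory
from typing import Tuple

LAYOUT = struct.Struct("<II")      # assumed layout: front, then rear
REAR_OFFSET = 4                    # byte offset of rear within the block


def create_shared_mem(name: str) -> shared_memory.SharedMemory:
    """Create the shared block and initialize front and rear to zero."""
    shm = shared_memory.SharedMemory(name=name, create=True, size=LAYOUT.size)
    LAYOUT.pack_into(shm.buf, 0, 0, 0)
    return shm


def write_rear(shm: shared_memory.SharedMemory, rear: int) -> None:
    """Producer side: publish the new rear index without touching front."""
    struct.pack_into("<I", shm.buf, REAR_OFFSET, rear)


def read_indices(name: str) -> Tuple[int, int]:
    """Consumer side: attach to the block by name and read (front, rear)."""
    peer = shared_memory.SharedMemory(name=name)
    try:
        return LAYOUT.unpack_from(peer.buf, 0)
    finally:
        peer.close()
```

The process that created the block is expected to call close() and unlink() on it when the servers shut down.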

The Consumer server's thread structure is simple because the data processing program runs in the GPU task queue, so the Consumer server can send data to the WAS immediately after checking for errors. The Consumer server was therefore implemented with a simple thread structure.

The Consumer server is a multi-threaded server program that reads data from the task queue, checks the data for errors, and forwards it to the web application server so that the user request can be processed. The Consumer of the HTTP server was implemented as shown in FIG. 5.
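The Consumer of FIG. 5 is not reproduced here; the Python sketch below only illustrates the decision each consumer thread makes. A standard queue.Queue stands in for the GPU task queue, entries are plain dictionaries, and fake_was is a hypothetical stub for the web application server, so none of these names reflect the embodiment's actual code.

```python
import queue
import threading
from typing import Callable, Dict, List

SENTINEL = None  # placed on the queue to tell a worker thread to stop


def consumer_worker(tasks: "queue.Queue",
                    forward_to_was: Callable[[Dict], str],
                    reply_to_producer: Callable[[Dict, str], None]) -> None:
    """Read entries until the sentinel arrives; branch on error_flag."""
    while True:
        entry = tasks.get()
        if entry is SENTINEL:
            break
        if entry["error_flag"]:
            # Already flagged by the pre-check: answer without contacting the WAS.
            reply_to_producer(entry, "error %d" % entry["error_code"])
        else:
            reply_to_producer(entry, forward_to_was(entry))


def run_consumers(entries: List[Dict], workers: int = 4) -> Dict[int, str]:
    """Process all entries with a pool of consumer threads and collect the replies."""
    tasks: "queue.Queue" = queue.Queue()
    replies: Dict[int, str] = {}
    lock = threading.Lock()

    def reply(entry: Dict, body: str) -> None:
        with lock:
            replies[entry["id"]] = body

    def fake_was(entry: Dict) -> str:  # hypothetical web application server stub
        return "200 OK for %s" % entry["path"]

    threads = [threading.Thread(target=consumer_worker, args=(tasks, fake_was, reply))
               for _ in range(workers)]
    for thread in threads:
        thread.start()
    for entry in entries:
        tasks.put(entry)
    for _ in threads:
        tasks.put(SENTINEL)
    for thread in threads:
        thread.join()
    return replies
```

The worker does no parsing or validation of its own, which matches the description's point that the consumer stays simple because the checks have already been done on the GPU side.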

Experiment scenarios and results

Three scenarios were selected, and each was run in the same way against a server without the GPGPU-based task queue and against the GPGPU-based task queue server.

Scenario 1 tested the processing speed of the server without the GPGPU-based task queue and of the GPGPU-based task queue server under stable service conditions.

Scenario 2 created 1,000 threads in a short time and issued 3,000 HTTP Requests. This experiment allows both stability and performance to be compared.

Scenario 3 generated a large number of threads to test the stability and speed of both servers. A total of 9,000 HTTP Requests were issued to check whether the server would go down.

FIG. 6 shows the performance for each experimental scenario.

Referring to FIG. 6, comparing the two servers shows a performance difference (improvement) ranging from about 2 times down to about 1.3 times.

With a small number of threads, the performance difference for HTTP Requests is large; as the number of threads and the number of HTTP Requests increase, the speed gap between the two servers narrows. When many threads are generated, the performance of the server without the GPGPU-based task queue and that of the GPGPU-based task queue server gradually converge. This slowdown occurs in the GPGPU-based task queue server when a large amount of data is copied from the CPU to the GPU.

When the GPGPU-based task queue server copies data from the CPU to the GPU, the copy goes over the PCI (Peripheral Component Interconnect) Express interface. The slowdown is attributed to the limited speed of the PCI Express interface, so improving the data copy speed between the CPU and the GPU can reduce the slowdown caused by an increase in HTTP Requests.

FIG. 7 shows the error rates of the two servers (the server without the GPGPU-based task queue and the GPGPU-based task queue server).

Referring to FIG. 7, in Scenario 1 both servers process HTTP Requests stably. Scenario 1 is an experiment to check performance, and no errors occur when processing ordinary HTTP Requests.

In Scenario 2, the server without the GPGPU-based task queue showed an error rate of 2.43%. When a large number of threads were created in a short time to issue HTTP Requests, some requests were rejected. In contrast, the GPGPU-based task queue server processed all HTTP Requests stably.

Scenario 3 issued more threads and HTTP Requests than Scenario 2. The server without the GPGPU-based task queue showed an error rate of 15.4%. However, the server did not go down, and the remaining HTTP Requests were processed normally. In contrast, the GPGPU-based task queue server was able to process all HTTP Requests.

As FIGS. 6 and 7 show, the GPGPU-based task queue server outperforms the server without the GPGPU-based task queue in both performance and stability.

As described above, in an embodiment of the present invention, a GPGPU-based task queue is implemented using NVIDIA's CUDA in order to handle large volumes of web service requests and to keep the web server stable.

The experimental results show that the performance of the server using the GPGPU-based task queue improved by 136% to 233% compared with the server that did not use it. They also show that the GPU, which is widely used in artificial intelligence, games, and high-performance scientific computing, can be used to improve the performance of general applications.

With a conventional web service configuration, handling a large volume of user requests by upgrading the web server or by installing and operating multiple web servers is costly. With the GPGPU-based task queue presented in the present invention, however, installing a GPU in a single server that handles a large volume of user requests makes it possible to provide a more stable, higher-performance web service at low cost.

In addition, if the GPGPU-based task queue server is given multiple GPUs so that it can handle more HTTP Requests, the performance of the web server is expected to improve further. In other words, when an existing web service receives a large volume of user requests, the problem of service interruption is expected to be solvable at low cost by using a GPU.

GPGPU is widely used in HPC, games, simulation, artificial intelligence, and other fields, and its use continues to grow. Many kinds of work are performed on GPUs, and the range of GPU applications keeps expanding.

Modern GPUs provide not only graphics processing but also high-performance floating-point arithmetic that supports GPGPU programs. Their ability to create and process large numbers of threads also continues to improve. If GPUs continue to develop so that this performance is available to general applications, and develop in a way that assists the CPU, the execution performance of general-purpose applications will improve and the cost of hardware for running high-performance general-purpose applications can be reduced further.

The method according to the embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiment, and vice versa.

Although the embodiments have been described above with reference to limited embodiments and drawings, those of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

110: task queue
120: HTTP server
122: Producer thread
124: Consumer thread
130: web application server

Claims

A web service providing system using a GPGPU-based task queue, comprising:
a task queue that is created in a GPU and processes a user request for a web service by utilizing the multi-thread processing capability of the GPGPU; and
an HTTP server that, when there is an HTTP request corresponding to the user request from a client, generates a plurality of producer threads and a plurality of consumer threads in response to the HTTP request, receives a packet related to the HTTP request through the plurality of producer threads and transmits the packet to the task queue, and, when the task queue has preprocessed the data of the packets related to the HTTP request, which are generated in large quantities, using a circular queue algorithm so that the data does not exceed the capacity of the task queue, checks through the plurality of consumer threads whether an error has occurred in the data preprocessed by the task queue and, if there is no error as a result of the check, delivers the data to a web application server.

delete

The system of claim 1,
wherein the task queue is implemented using CUDA (Compute Unified Device Architecture), which is based on GPGPU technology.

The system of claim 3,
wherein the CUDA calls an HTTP packet error processing function using a kernel function so that the function is created in the GPU together with a plurality of threads, and the HTTP packet error processing function receives, as parameters, a structure of task queue data defined in the kernel function and the size of the structure, checks the suitability of the packet for the HTTP request, puts a value in an error flag, and returns it.

The system of claim 1,
wherein the consumer thread asynchronously reads the data preprocessed by the task queue to check whether an error has occurred, transmits the result for the error code to the producer thread if an error has occurred as a result of the check so that the result is sent directly to the user through the producer thread, and delivers the data to the web application server if no error has occurred as a result of the check.

The system of claim 1,
wherein the consumer thread accesses the task queue created by the producer thread through a function capable of processing the HTTP protocol and asynchronously reads the preprocessed data, and, because the read data has already been checked in the GPU for at least one of data consistency, HTTP errors, and user input value errors and stored in a structure, checks whether an error has occurred in the data by referring to the value of the structure.

The system of claim 6,
wherein a thread of the task queue created by the producer thread checks the data of the user request and, if an error occurs, stores whether an error occurred and the error code in a Boolean structure variable, and
the consumer thread checks whether an error has occurred in the preprocessed data by referring to the value of the Boolean structure variable.

The system of claim 1,
wherein the task queue, when the user request cannot be processed due to CPU load, transmits waiting information including a waiting time and a waiting sequence number to the client to guide the user.

A method of providing a web service using a GPGPU-based task queue, the method comprising:
generating, by an HTTP server, a plurality of producer threads and a plurality of consumer threads in response to an HTTP request corresponding to a user request for a web service from a client;
receiving, by the HTTP server, a packet related to the HTTP request through the plurality of producer threads and transmitting the packet to a task queue created in a GPU;
processing, by the task queue, the HTTP request using the multi-thread processing capability of the GPGPU;
when the task queue has preprocessed the data of the packets related to the HTTP request, which are generated in large quantities, using a circular queue algorithm so that the data does not exceed the capacity of the task queue, checking, by the HTTP server, through the plurality of consumer threads whether an error has occurred in the data preprocessed by the task queue; and
if there is no error as a result of the check, transferring, by the HTTP server, the preprocessed data to a web application server.

delete

The method of claim 9,
wherein the task queue is implemented using CUDA based on GPGPU technology.

The method of claim 11,
wherein the CUDA calls an HTTP packet error processing function using a kernel function so that the function is created in the GPU together with a plurality of threads, and the HTTP packet error processing function receives, as parameters, a structure of task queue data defined in the kernel function and the size of the structure, checks the suitability of the packet for the HTTP request, puts a value in an error flag, and returns it.

The method of claim 9,
wherein the consumer thread accesses the task queue created by the producer thread through a function capable of processing the HTTP protocol and asynchronously reads the preprocessed data, and, because the read data has already been checked in the GPU for at least one of data consistency, HTTP errors, and user input value errors and stored in a structure, checks whether an error has occurred in the preprocessed data by referring to the value of the structure.