KR101338566B1

KR101338566B1 - Apparatus and method for classifying i/o feature for disk drive

Info

Publication number: KR101338566B1
Application number: KR1020100136476A
Authority: KR
Inventors: 강수용; 원유집; 차재혁; 윤성로; 최종무; 서범준
Original assignee: 한양대학교 산학협력단
Priority date: 2010-12-28
Filing date: 2010-12-28
Publication date: 2013-12-06
Also published as: KR20120074593A

Abstract

An input/output characteristic classification apparatus and method are disclosed. The input/output characteristic classification apparatus may include a set generator that generates at least one set by grouping input/output commands; a feature extractor that extracts page-domain-based features from each of the sets; and a cluster generator that uses the extracted features to generate at least one cluster corresponding to an input/output type (IO pattern).

Description

Apparatus and method for classifying I/O characteristics for a storage device {APPARATUS AND METHOD FOR CLASSIFYING I/O FEATURE FOR DISK DRIVE}

The present invention relates to an apparatus and method for classifying input/output characteristics, and more particularly to an apparatus and method for generating clusters corresponding to input/output types according to the characteristics of input/output commands.

When building a storage device, knowing in advance which kinds of input/output types it will handle is very important for improving its performance. That is, if the input/output commands delivered to the storage device are classified by input/output type and the design is optimized accordingly, the performance of the storage device can be improved.

However, techniques for systematically analyzing and recognizing the input/output types of the commands issued to a storage device have not been actively developed. Techniques for classifying input/output types have been used when needed, but their range of application has been very limited.

The present invention provides an apparatus and method that enable an optimized storage device design by generating sets from input/output commands and generating clusters corresponding to input/output types according to the features of the sets.

The present invention provides an apparatus and method that can classify input/output types more accurately by classifying newly input input/output commands by input/output type and training on them.

An input/output characteristic classification apparatus according to an embodiment of the present invention may include a set generator that generates at least one set by grouping input/output commands; a feature extractor that extracts page-domain-based features from each of the sets; and a cluster generator that uses the extracted features to generate at least one cluster corresponding to an input/output type (IO pattern).

The input/output characteristic classification apparatus according to an embodiment of the present invention may further include an input/output command classifier that classifies a newly input input/output command into one of the at least one cluster.

An input/output characteristic classification method according to an embodiment of the present invention may include generating at least one set by grouping input/output commands; extracting page-domain-based features from each of the sets; and using the extracted features to generate at least one cluster corresponding to an input/output type (IO pattern).

The input/output characteristic classification method according to an embodiment of the present invention may further include classifying a newly input input/output command into one of the at least one cluster.

According to an embodiment of the present invention, an optimized storage device design becomes possible by generating sets from input/output commands and generating clusters corresponding to input/output types according to the features of the sets.

According to an embodiment of the present invention, input/output types can be classified more accurately by classifying newly input input/output commands by input/output type and training on them.

FIG. 1 is a block diagram showing the overall configuration of an input/output characteristic classification apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing the operation of the input/output characteristic classification apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram showing a tree-based input/output command classification method according to an embodiment of the present invention.
FIG. 4 is a diagram showing input/output types according to an embodiment of the present invention.
FIG. 5 is a graph showing the silhouette value against the number of clusters according to an embodiment of the present invention.
FIG. 6 is a flowchart showing an input/output characteristic classification method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the overall configuration of an input/output characteristic classification apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the input/output characteristic classification apparatus 100 according to an embodiment of the present invention may include a set generator 101, a feature extractor 102, a cluster generator 103, and an input/output command classifier 104. The apparatus 100 analyzes the characteristics of the input/output commands issued to a storage device and recognizes the input/output types that can occur in the storage device, so that the storage device can be built with a design optimized for those input/output types. As a result, the performance of a storage device built with such an optimized design can be improved.

The set generator 101 may generate at least one set by grouping input/output commands. Here, the input/output commands may be input in the form of a trace, in which a plurality of input/output commands are listed in time order, and a single input/output command for the storage device may consist of an address, an offset, and a timestamp. Data may be transferred through a plurality of input/output commands. For example, the set generator 101 may group the input/output commands according to a time window and generate one set per time window. Here, the size of the time window may correspond to a number of input/output commands.
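The grouping performed by the set generator can be sketched as follows. This is a minimal illustration, not the patented implementation: the `IOCommand` type, the `make_sets` helper, and the reading of `offset` as a request length in pages are assumptions made for the example.

```python
from typing import List, NamedTuple


class IOCommand(NamedTuple):
    address: int      # starting page address (assumed unit: pages)
    offset: int       # request length in pages (assumed meaning of "offset")
    timestamp: float  # time at which the command was issued


def make_sets(trace: List[IOCommand], window_size: int) -> List[List[IOCommand]]:
    """Split a trace into consecutive windows of `window_size` commands each."""
    ordered = sorted(trace, key=lambda c: c.timestamp)  # a trace is time-ordered
    return [ordered[i:i + window_size] for i in range(0, len(ordered), window_size)]


# Hypothetical 10-command trace, grouped with a window of 4 commands.
trace = [IOCommand(address=i * 4, offset=4, timestamp=float(i)) for i in range(10)]
sets = make_sets(trace, window_size=4)
print([len(s) for s in sets])  # the last window may hold fewer commands
```

In this sketch the final window is kept even when it is shorter than the window size; whether such a partial window should be kept or discarded is not stated in the text.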

The feature extractor 102 may extract page-domain-based features from each of the sets generated per time window.

In a first embodiment, the feature extractor 102 may extract the number of break points (BP) from each of the generated sets. Specifically, for each set generated per time window, the feature extractor 102 may extract the number of break points within that time window.

In a second embodiment, the feature extractor 102 may extract, from each of the sets, the ratio of connected points to all break points (Ratio of connected points to break points: RCB). Specifically, for each set generated per time window, the feature extractor 102 may extract the ratio of connected points to all break points within that time window.

In a third embodiment, the feature extractor 102 may extract, from each of the sets, the size ratio of pages having random points (Size of Random Points: SR). Specifically, for each set generated per time window, the feature extractor 102 may extract the size ratio of the pages having random points within that time window.

In a fourth embodiment, the feature extractor 102 may extract, from each set generated per time window, the ratio of connected points to all break points in the area of the time window excluding pages that include random points (Ratio of connected points to break points except random points: RER).

In a fifth embodiment, the feature extractor 102 may extract, from each of the sets, a ratio indicating whether the position written after a break point increases or decreases relative to the break point (Ratio of Increase and Decrease tendency: RID). Specifically, for each set generated per time window, the feature extractor 102 may extract this ratio within that time window.

The cluster generator 103 may use the extracted features to generate at least one cluster corresponding to an input/output type (IO pattern). For example, the cluster generator 103 may generate clusters by gathering sets with similar features together and separating sets with different features. K-means and hierarchical clustering may be used as the clustering algorithm. In a preferred embodiment, one input/output type corresponds to one cluster, but depending on the accuracy of the clustering algorithm, two or more clusters may correspond to a single input/output type.

The components described above take part in generating clusters from traces composed of a plurality of input/output commands that have already been input, whereas the input/output command classifier 104 described below takes part in classifying a single newly input input/output command into the cluster corresponding to its input/output type.

The input/output command classifier 104 may classify a newly input input/output command into one of the at least one cluster. The accuracy of the input/output command classifier 104 may be improved through an iterative training process, and classification algorithms such as K-nearest neighbor, support vector machine, logistic regression, decision tree, or random forest may be used.
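As one illustration of the listed algorithms, a K-nearest neighbor classifier over feature vectors might look like the sketch below. The function name `knn_classify`, the two-feature (BP, RCB) vectors, and the sample values are all hypothetical; the text does not specify how the classifier is trained or which features it uses.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_classify(train: List[Sequence[float]], labels: List[int],
                 query: Sequence[float], k: int = 3) -> int:
    """Return the majority cluster label among the k training vectors nearest to query."""
    nearest = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Hypothetical (BP, RCB) vectors that were already assigned to clusters 0 and 1.
train = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (20.0, 0.8), (22.0, 0.9), (25.0, 0.7)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_classify(train, labels, (21.0, 0.85)))  # -> 1
print(knn_classify(train, labels, (1.5, 0.05)))   # -> 0
```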

Hereinafter, the operation of the input/output characteristic classification apparatus 100 will be described in detail.

FIG. 2 is a diagram showing the operation of the input/output characteristic classification apparatus according to an embodiment of the present invention.

In Step I, the input/output characteristic classification apparatus 100 may apply a time window of a preset size W to a trace of input/output commands. As described above, a trace is the collection of input/output commands ordered over time. When W is 8000, one time window contains 8000 input/output commands.

In Step II, the input/output characteristic classification apparatus 100 may generate a set corresponding to each time window. That is, if 100 time windows are applied to the trace of input/output commands, 100 sets are generated.

In Step IV, the input/output characteristic classification apparatus 100 may extract N features from each of the sets. For example, the features may include at least one of: the number of break points (BP); the ratio of connected points to all break points (RCB); the size ratio of pages having random points (SR); the ratio of connected points to all break points in the area excluding pages that include random points (RER); and the ratio indicating whether the position written after a break point increases or decreases relative to the break point (RID).

In a first embodiment, the input/output characteristic classification apparatus 100 may extract, from each set generated per time window, the number of break points (BP) within that time window. Specifically, for the pages written in the time window, the apparatus 100 may extract the number of break points, that is, points at which the continuity of writing is interrupted. BP is counted on a page basis, and information on the lengths of continuous runs in the time window can be derived from BP.
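One possible reading of the break point count is sketched below. It assumes each command is encoded as a hypothetical `(start_page, length_in_pages)` pair and counts a break whenever the next command does not start where the previous one ended; the text does not give an exact counting rule, so this is only an illustration.

```python
from typing import List, Tuple

Command = Tuple[int, int]  # (start_page, length_in_pages): an assumed encoding


def count_break_points(window: List[Command]) -> int:
    """Count positions where the next command does not continue the previous one."""
    breaks = 0
    for (start, length), (next_start, _) in zip(window, window[1:]):
        if next_start != start + length:  # continuity of writing is interrupted
            breaks += 1
    return breaks


sequential = [(0, 4), (4, 4), (8, 4)]
scattered = [(0, 4), (100, 4), (4, 4), (50, 2)]
print(count_break_points(sequential), count_break_points(scattered))  # -> 0 3
```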

In a second embodiment, the input/output characteristic classification apparatus 100 may extract, from each set generated per time window, the ratio of connected points to all break points (RCB) within that time window. Specifically, among all break points, that is, the points at which the continuity of writing to pages in the time window is interrupted, a connected point is defined as a break point after which writing continues at another location and later resumes at the interrupted location, so that the writing is continuous once the portion written elsewhere in between is excluded.

That is, RCB denotes the ratio of connected points to all break points (CP/BP). RCB can be used to distinguish sequential clusters from interleaved (random) clusters. Because RCB is determined from connected points, which can directly affect the input/output type, and from break points that are not connected points, it allows input/output types to be distinguished more specifically.
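The CP/BP ratio can be illustrated with the sketch below. It assumes the same hypothetical `(start_page, length_in_pages)` encoding and treats a break point as connected when some later command in the window starts exactly at the interrupted position; returning 0.0 for a window without break points is also an assumption.

```python
from typing import List, Tuple

Command = Tuple[int, int]  # (start_page, length_in_pages): an assumed encoding


def rcb(window: List[Command]) -> float:
    """Ratio of connected points (CP) to break points (BP) in one window."""
    break_points = 0
    connected_points = 0
    for i in range(len(window) - 1):
        end = window[i][0] + window[i][1]
        if window[i + 1][0] != end:  # break point: writing jumps elsewhere
            break_points += 1
            # Connected point: a later command resumes at the interrupted position.
            if any(start == end for start, _ in window[i + 2:]):
                connected_points += 1
    return connected_points / break_points if break_points else 0.0


interleaved = [(0, 4), (100, 4), (4, 4), (104, 4), (8, 4)]  # two alternating streams
random_like = [(0, 4), (500, 1), (37, 2), (900, 1)]
print(rcb(interleaved), rcb(random_like))  # -> 0.75 0.0
```

In the interleaved example, three of the four break points are later resumed, while none are in the random-like example, which matches the stated use of RCB to separate such patterns.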

In a third embodiment, the input/output characteristic classification apparatus 100 may extract, from each set generated per time window, the size ratio of the pages having random points within that time window. Specifically, among the break points at which the continuity of writing is interrupted in the time window, a break point that is not a connected point, or whose length does not exceed a certain threshold, is defined as a random point.

That is, SR denotes the ratio of the size of the pages having random points to the size of the whole time window (Size of Random / Size of Window); the larger the size of the pages having random points, the more random the corresponding cluster tends to be. Through SR, the input/output characteristic classification apparatus 100 can clearly identify random points even when various input/output types are mixed.

In a fourth embodiment, the input/output characteristic classification apparatus 100 may extract, from each set generated per time window, the ratio of connected points to all break points in the area of the time window excluding pages that include random points (RER). That is, RER denotes the ratio of connected points to the break points that remain after random points are excluded from all break points. Through RER, the apparatus 100 can find the underlying input/output type even when the time window contains random points. RER is an extension of RCB, characterized in that it explicitly removes random points and uses only the remaining break points and connected points.

In a fifth embodiment, the input/output characteristic classification apparatus 100 may extract, from each set generated per time window, a ratio (RID) indicating whether the position written after a break point in that time window increases or decreases relative to the break point. That is, the frequency with which the position written after a break point is higher or lower than the break point makes it possible to determine whether the cluster shows an increasing or decreasing tendency in its input/output type. Even in a time window containing random points, a large RID can be interpreted as the cluster having a particular tendency. RID is characterized in that it considers not only the storage position immediately following a break point but also the overall tendency within the time window.

In Step III, the input/output characteristic classification apparatus 100 may convert the N features extracted from each of the sets into an N-dimensional vector. Then, if there are X sets, a total of M (N*X) vectors may be generated.

In Step IV, the input/output characteristic classification apparatus 100 may generate K clusters by applying a clustering algorithm to the M vectors. Each of the K clusters may be recognized as one of K input/output types.
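The clustering step can be sketched with a plain K-means loop (Lloyd's algorithm). This is an illustration only: the `kmeans` function, the seeding of centroids from the first K vectors, the fixed iteration count, and the two-feature sample vectors are assumptions, and a practical setup would likely normalize the features first.

```python
import math
from typing import List, Sequence


def kmeans(vectors: List[Sequence[float]], k: int, iterations: int = 20) -> List[int]:
    """Cluster feature vectors with Lloyd's algorithm; returns one label per vector."""
    centroids = [list(v) for v in vectors[:k]]  # assumption: seed with the first k vectors
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its member vectors.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical (BP, RCB) feature vectors, one per time window.
vectors = [(0.0, 0.9), (1.0, 0.8), (40.0, 0.1), (42.0, 0.0)]
cluster_labels = kmeans(vectors, k=2)
print(cluster_labels)  # -> [0, 0, 1, 1]
```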

In Step V, when one new input/output command is input, the input/output characteristic classification apparatus 100 may classify which input/output type the input command corresponds to.

FIG. 3 is a diagram showing a tree-based input/output command classification method according to an embodiment of the present invention.

The tree-based input/output command classification method classifies an input/output command according to whether a threshold is exceeded at each branch. Referring to FIG. 3, when one new input/output command is input, the input/output characteristic classification apparatus 100 may extract the BP of the input command. If BP<1, the apparatus 100 may classify the input/output command as input/output type (1).

If 1≤BP<8 or 8≤BP, the input/output characteristic classification apparatus 100 may extract the RCB of the input/output command. If 1≤BP<8 and RCB<50%, the apparatus 100 may classify the input/output command as input/output type (2); if 1≤BP<8 and RCB>50%, the apparatus 100 may classify the input/output command as input/output type (3).

If 8≤BP and RCB<50%, the input/output characteristic classification apparatus 100 extracts the SR of the input/output command; if 8≤BP and RCB>50%, the apparatus 100 may classify the input/output command as input/output type (5).

If 8≤BP, RCB<50%, and SR<20%, the input/output characteristic classification apparatus 100 extracts the RER of the input/output command; if 8≤BP, RCB<50%, and SR>20%, the apparatus 100 may classify the input/output command as input/output type (7) or (8).

If 8≤BP, RCB<50%, SR<20%, and RER<50%, the input/output characteristic classification apparatus 100 may classify the input/output command as input/output type (3); if 8≤BP, RCB<50%, SR<20%, and RER>50%, the apparatus 100 may classify the input/output command as input/output type (6).
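The branches described above can be written as a small function. The thresholds and type numbers are taken from the text as stated; the function name `classify_io`, the use of fractions instead of percentages, and the handling of exact equality (for example RCB equal to 50%, which the text leaves open) are assumptions of this sketch.

```python
def classify_io(bp: float, rcb: float, sr: float, rer: float) -> str:
    """Walk the described decision tree; ratios are fractions in [0, 1]."""
    if bp < 1:
        return "1"
    if bp < 8:                       # 1 <= BP < 8
        return "2" if rcb < 0.5 else "3"
    # From here on, 8 <= BP.
    if rcb >= 0.5:                   # assumption: equality falls on the ">" branch
        return "5"
    if sr >= 0.2:                    # assumption: equality falls on the ">" branch
        return "7 or 8"              # the text does not separate types (7) and (8)
    return "3" if rer < 0.5 else "6"  # the text assigns type (3) to RER < 50% here


print(classify_io(bp=0, rcb=0.0, sr=0.0, rer=0.0))   # -> 1
print(classify_io(bp=10, rcb=0.2, sr=0.1, rer=0.9))  # -> 6
```

Features are evaluated lazily in the described tree (RCB, SR, and RER are extracted only on the branches that need them); this sketch takes all four as arguments purely for brevity.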

FIG. 3 illustrates the tree-based input/output command classification method (decision tree) used by the input/output characteristic classification apparatus 100; depending on the configuration of the apparatus, other input/output command classification methods such as K-nearest neighbor, support vector machine, logistic regression, or random forest may be used. Such a classification method is embedded in the firmware of the storage device, and the coefficient information used by the method may be supplied when it is embedded. The performance of the classification method may improve as training is repeated.

FIG. 4 is a diagram showing input/output types according to an embodiment of the present invention.

FIG. 4 shows examples of input/output types corresponding to clusters. Specifically, input/output types may be divided into sequential, interleaved, and random. The longer the written segments delimited by break points, the more sequential the input/output type; the shorter the segments, the more interleaved the input/output type. The more irregular the distribution of segments, the more random the input/output type.

The design of a storage device may be optimized according to the input/output type as follows.

Embodiments of the present invention may be applied to a hard disk drive (HDD) or a solid-state drive (SSD). Storage devices such as HDDs and SSDs support a limited number of write operations. Reducing the number of write operations is therefore an important factor in the lifetime of the storage device. Accordingly, write-less techniques, which reduce the number of write operations actually performed for the same write commands, are important. Three representative write-less techniques are the compression method, the de-duplication method, and the differential write method.

The compression method does not perform the write operation immediately when a write command is input, but compresses the data first and then performs the write operation. The de-duplication method, when a write command is input and the same content already exists elsewhere on the storage device, does not write the content targeted by the write command and instead refers to the content that already exists. The differential write method, when a location that already holds content is to be overwritten by another input, leaves the location as it is without overwriting if the content before the write operation and the content after it are identical.
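The de-duplication and differential write ideas can be illustrated with a toy in-memory store that counts physical writes. The `WriteLessStore` class, its content-hash lookup, and the logical-to-content mapping table are assumptions made for this sketch and are not described in the text.

```python
import hashlib
from typing import Dict


class WriteLessStore:
    """Toy page store that skips writes whose content is already present."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}  # content digest -> stored content
        self.mapping: Dict[int, str] = {}   # logical address -> content digest
        self.physical_writes = 0

    def write(self, address: int, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        if self.mapping.get(address) == digest:
            return  # differential write: content at this address is unchanged
        if digest not in self.blocks:
            self.blocks[digest] = data  # genuinely new content: one physical write
            self.physical_writes += 1
        self.mapping[address] = digest  # de-duplication: reuse the existing copy

    def read(self, address: int) -> bytes:
        return self.blocks[self.mapping[address]]


store = WriteLessStore()
store.write(0, b"A")  # new content
store.write(0, b"A")  # same content at the same address: skipped
store.write(1, b"A")  # same content at another address: mapped, not rewritten
store.write(1, b"B")  # new content
print(store.physical_writes)  # -> 2
```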

When the input/output type is sequential, a single process often performs write operations in succession. Therefore, for a sequential input/output type, the compression method is effective because it greatly reduces the number of write operations. However, for a sequential input/output type, the de-duplication and differential write methods are inefficient, because the intervals of written content to be compared are too long and the same input/output type does not occur frequently.

When the input/output type is random, multiple processes often perform write operations in succession. Therefore, for a random input/output type, the compression method is inefficient because the compression efficiency is low. Conversely, for a random input/output type, the de-duplication and differential write methods are more efficient than the compression method, considering the length of the intervals to be compared and the similarity of patterns between written content.
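The pairing of input/output types with write-less techniques stated above can be captured as a simple lookup. The table name and helper below are hypothetical, and the interleaved type is left without an entry because the text does not recommend a technique for it.

```python
from typing import Dict, List

# Pairings as stated in the text; "interleaved" is intentionally absent.
WRITE_LESS_BY_TYPE: Dict[str, List[str]] = {
    "sequential": ["compression"],
    "random": ["de-duplication", "differential write"],
}


def choose_write_less(io_type: str) -> List[str]:
    """Return the write-less techniques the text describes as efficient for io_type."""
    return WRITE_LESS_BY_TYPE.get(io_type, [])


print(choose_write_less("sequential"))  # -> ['compression']
print(choose_write_less("random"))      # -> ['de-duplication', 'differential write']
```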

FIG. 5 is a graph showing the silhouette value against the number of clusters according to an embodiment of the present invention.

Because the input/output characteristic classification apparatus 100 cannot increase the number of clusters without limit according to the features of the sets, it may determine the number of clusters from the graph of FIG. 5. Referring to FIG. 5, the number of clusters at which the average silhouette value is maximal becomes the number of clusters to be finally generated. The number of generated clusters then becomes the number of input/output types.
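Choosing the cluster count by the maximum average silhouette value can be sketched as follows. The `mean_silhouette` function uses the standard silhouette definition with Euclidean distance; the sample points and the candidate labelings per cluster count are hypothetical, and in practice each labeling would come from running the clustering algorithm with that count.

```python
import math
from typing import Dict, List, Sequence


def mean_silhouette(points: List[Sequence[float]], labels: List[int]) -> float:
    """Average silhouette value of a labeling; needs at least two clusters."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette is defined as 0
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
candidates: Dict[int, List[int]] = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
best_k = max(candidates, key=lambda k: mean_silhouette(points, candidates[k]))
print(best_k)  # -> 2
```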

FIG. 6 is a flowchart illustrating an input/output characteristic classification method according to an embodiment of the present invention.

In operation S601, the input/output characteristic classification apparatus 100 may group input/output commands to generate at least one set. For example, the input/output characteristic classification apparatus 100 may group input/output commands according to a time window and generate one set per time window. The size of the time window may correspond to a number of input/output commands.
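As a rough illustration of operation S601, the following Python sketch groups a stream of commands into windows of a fixed command count. The command representation `(start_page, page_count)`, the function name `group_by_window`, and the choice to keep a trailing partial window are assumptions for illustration and are not taken from the specification.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical command record: (start_page, page_count).
Command = Tuple[int, int]

def group_by_window(commands: Iterable[Command], window_size: int) -> Iterator[List[Command]]:
    """Yield one set of commands per window of `window_size` commands."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    window: List[Command] = []
    for cmd in commands:
        window.append(cmd)
        if len(window) == window_size:
            yield window
            window = []
    if window:
        # Assumption: a trailing partial window is kept rather than dropped.
        yield window
```

Because the function is a generator, it can consume a long trace without holding all commands in memory at once.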

In operation S602, the input/output characteristic classification apparatus 100 may extract a page-domain-based feature from each of the sets.

In a first embodiment, the input/output characteristic classification apparatus 100 may extract the number of break points from each of the sets. In a second embodiment, the input/output characteristic classification apparatus 100 may extract, from each of the sets, the ratio of connection points to total break points. In a third embodiment, the input/output characteristic classification apparatus 100 may extract, from each of the sets, the size ratio of pages having a random point. In a fourth embodiment, the input/output characteristic classification apparatus 100 may extract, from each of the sets generated per time window, the ratio of connection points to total break points in the region of the time window that excludes pages containing a random point. In a fifth embodiment, the input/output characteristic classification apparatus 100 may extract, from each of the sets generated per time window, a ratio indicating whether the position written after a break point within the time window increases or decreases relative to the break point.
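Two of these features, the break point count and the increase/decrease ratio, can be sketched in Python as follows. The sketch rests on assumptions that this passage does not confirm: a command is modeled as `(start_page, page_count)`, and a break point is taken to occur whenever a command does not start at the page immediately following the previous command. The formal definitions of break, connection, and random points are given elsewhere in the specification and may differ, so the other three features are not attempted here.

```python
from typing import Dict, List, Tuple

# Hypothetical command record: (start_page, page_count).
Command = Tuple[int, int]

def break_points(commands: List[Command]) -> List[Tuple[int, int]]:
    """Return (expected_next_page, actual_start_page) for each discontinuity."""
    breaks = []
    for (prev_start, prev_len), (start, _) in zip(commands, commands[1:]):
        expected = prev_start + prev_len  # page right after the previous command
        if start != expected:
            breaks.append((expected, start))
    return breaks

def extract_features(commands: List[Command]) -> Dict[str, float]:
    """Compute the break point count and the fraction of breaks that jump forward."""
    breaks = break_points(commands)
    increases = sum(1 for expected, actual in breaks if actual > expected)
    ratio = increases / len(breaks) if breaks else 0.0
    return {"break_point_count": len(breaks), "increase_ratio": ratio}
```

A purely sequential set yields a break point count of zero, while a random set yields a count close to the number of commands, which is what makes the count a plausible input for clustering.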

In operation S603, the input/output characteristic classification apparatus 100 may use the extracted features to generate at least one cluster corresponding to an input/output type (IO pattern).

In operation S604, the input/output characteristic classification apparatus 100 may classify a newly input input/output command into any one of the at least one cluster.
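One simple way to realize operation S604 is nearest-centroid assignment, sketched below in Python. This passage does not state how the classification is performed, so the nearest-centroid rule, the Euclidean distance, and the names `nearest_cluster` and `centroids` are assumptions for illustration only.

```python
import math
from typing import Sequence

def nearest_cluster(feature_vector: Sequence[float],
                    centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to the new feature vector."""
    if not centroids:
        raise ValueError("at least one cluster centroid is required")
    return min(range(len(centroids)),
               key=lambda i: math.dist(feature_vector, centroids[i]))
```

Here the feature vector would be extracted from the newly input commands in the same way as in operation S602, and each centroid would summarize one cluster generated in operation S603.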

For details not described with reference to FIG. 6, refer to the descriptions of FIGS. 1 to 5.

The methods according to embodiments of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the present invention, or may be known and available to those skilled in the art of computer software.

Although the present invention has been described above with reference to limited embodiments and drawings, the present invention is not limited to those embodiments, and those of ordinary skill in the art to which the present invention pertains can make various modifications and variations based on this description.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the claims set forth below and by their equivalents.

100: input/output characteristic classification apparatus
101: set generation unit
102: feature extraction unit
103: cluster generation unit
104: input/output command classification unit

Claims

An input/output characteristic classification apparatus comprising:
a set generation unit that groups input/output commands to generate at least one set;
a feature extraction unit that extracts a page-domain-based feature from each of the sets; and
a cluster generation unit that generates at least one cluster corresponding to an input/output type (IO pattern) by using the extracted features,
wherein the feature extraction unit extracts, from each of the sets, at least one of: the number of break points; the ratio of connection points to total break points; the size ratio of pages having a random point; the ratio of connection points to total break points in the region excluding pages containing a random point; and a ratio indicating whether the position written after a break point increases or decreases relative to the break point.

The apparatus of claim 1,
wherein the set generation unit groups input/output commands according to a time window to generate one set per time window.

3. The apparatus of claim 2,
wherein the size of the time window corresponds to a number of input/output commands.

The apparatus of claim 1,
further comprising an input/output command classification unit that classifies a newly input input/output command into any one of the at least one cluster.

(Deleted)

An input/output characteristic classification method comprising:
grouping input/output commands to generate at least one set;
extracting a page-domain-based feature from each of the sets; and
generating at least one cluster corresponding to an input/output type (IO pattern) by using the extracted features,
wherein the extracting of the feature comprises extracting, from each of the sets, at least one of: the number of break points; the ratio of connection points to total break points; the size ratio of pages having a random point; the ratio of connection points to total break points in the region excluding pages containing a random point; and a ratio indicating whether the position written after a break point increases or decreases relative to the break point.

The method of claim 10,
wherein the generating of the set comprises grouping input/output commands according to a time window to generate one set per time window.

12. The method of claim 11,
wherein the size of the time window corresponds to a number of input/output commands.

The method of claim 10,
further comprising classifying a newly input input/output command into any one of the at least one cluster.

(Deleted)