KR101893661B1

KR101893661B1 - A METHOD OF COMPUTING δ-APPROXIMATE PERIODS AND γ-APPROXIMATE PERIODS OF STRINGS OVER INTEGER ALPHABETS

Info

Publication number: KR101893661B1
Application number: KR1020160166451A
Authority: KR
Inventors: 심정섭; 김영호
Original assignee: 인하대학교 산학협력단 (Inha University Industry-Academic Cooperation Foundation)
Priority date: 2016-12-08
Filing date: 2016-12-08
Publication date: 2018-08-30
Also published as: KR20180065496A

Abstract

A method of computing δ-approximate periods and γ-approximate periods of strings over integer alphabets is disclosed. The approximate period computation method according to an embodiment may include the following steps:

- generating an array of length n/2 for a string T of length n consisting of integer characters;
- computing a δ-approximate period or a γ-approximate period of length m ($1 \le m \le n/2$) of the string T; and
- obtaining a string P associated with the δ-approximate period or the γ-approximate period.

Description

METHOD OF COMPUTING δ-APPROXIMATE PERIODS AND γ-APPROXIMATE PERIODS OF STRINGS OVER INTEGER ALPHABETS

The following description relates to a technique for computing δ-approximate periods and γ-approximate periods of strings over integer alphabets.

Repetitive strings have been studied in many fields, such as data compression, bioinformatics, and computer-aided music analysis. Given a string T, a repetition is a substring of T that consists of consecutive occurrences of the same string. For example, if T = aababab, then aa and ababab are repetitions of T.

Here, $aa = a^2$ is called a square and $ababab = (ab)^3$ a cube. Let $P^r$ denote the string obtained by concatenating a string P with itself r times. Given a string T, if $T = P^r P'$ for some $r > 0$ and some prefix $P'$ of P, then P is called a period of T. For example, the periods of T = abcabcab are abc, abcabc, and T itself. The shortest of the periods is called the shortest period, so abc is the shortest period of T. If T has a period P with $|P| \le |T|/2$, then T is called periodic.
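The period definition above can be checked directly. The following Python sketch uses the helper names periods and is_periodic, which are illustrative and do not come from the source. It relies on the standard fact that the prefix P of length p is a period exactly when T[i] = T[i − p] for every i ≥ p. This condition is equivalent to T = P^r P' with P' a prefix of P.

```python
def periods(t: str) -> list[str]:
    """Return every period P of t (shortest first), i.e. every prefix P
    such that t == P^r P' for some r > 0 and some prefix P' of P."""
    n = len(t)
    result = []
    for p in range(1, n + 1):
        # t[:p] is a period iff each character equals the one p positions earlier
        if all(t[i] == t[i - p] for i in range(p, n)):
            result.append(t[:p])
    return result


def is_periodic(t: str) -> bool:
    """t is periodic if it has a period of length at most |t| / 2."""
    return any(2 * len(p) <= len(t) for p in periods(t))


print(periods("abcabcab"))  # the example from the text
```

For the string aababab from the repetition example, the only period is the whole string, so that string is not periodic.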

One technique uses a distance function as the objective function to define radius-based approximate periods of strings, and proposes three problems. It presents polynomial-time algorithms for two of the problems and proves that the third problem is NP-complete.

Another technique defines approximate periods of strings based on sums of distances. Given strings P and X, it presents algorithms that compute the minimum approximate period distance of P with respect to X. Separate algorithms, each with its own time bound, are given for the weighted edit distance, the edit distance, and the Hamming distance.

Meanwhile, approximate string matching over strings of integer characters has also been studied. For example, the pitch of a note can be represented by a MIDI number. The pitch interval, that is, the difference in pitch between two notes, can be represented as a number of semitones. A music sequence can therefore be viewed as a string over an integer alphabet. Similarly, stock prices can also be viewed as a string over an integer alphabet.

Furthermore, the following techniques have been proposed:

- definitions of δ-approximate pattern matching and (δ, γ)-approximate pattern matching;
- definitions of, and algorithms for, the problems of finding δ-approximate squares and (δ, γ)-approximate squares;
- definitions of problems on δ-approximate repetitions and (δ, γ)-approximate repetitions, with an algorithm for each problem;
- a Boyer-Moore-based algorithm for the δ-approximate pattern matching problem;
- new problems related to (δ, γ)-matching, together with algorithms that apply the FFT (Fast Fourier Transform) to them; and
- a study of the (δ, γ)-matching problem with don't care symbols.

The present invention uses δ-approximation and γ-approximation to define δ-approximate periods and γ-approximate periods of strings over integer alphabets. It further defines the problems of finding a minimum δ-approximate period and a minimum γ-approximate period, and provides algorithms that solve each of them in $O(n^2)$ time.

The approximate period computation method may include the following steps:

- generating an array of length n/2 for a string T of length n consisting of integer characters;
- computing a δ-approximate period or a γ-approximate period of length m ($1 \le m \le n/2$) of the string T; and
- obtaining a string P associated with the δ-approximate period or the γ-approximate period.

Let the strings $T \in \Sigma^n$ and $P \in \Sigma^m$ be given. For each $1 \le j \le m$, let $E_j$ be the set of indices i ($1 \le i \le n$) with $i \equiv j \pmod{m}$. If $|T[i] - P[j]| \le d$ for all j ($1 \le j \le m$) and all $i \in E_j$, the string P may be a δ-approximate period of the string T with distance d. If $\sum_{j=1}^{m} \sum_{i \in E_j} |T[i] - P[j]| \le d$, the string P may be a γ-approximate period of the string T with distance d.

Computing the δ-approximate period or the γ-approximate period of length m ($1 \le m \le n/2$) of the string T may take either of two forms:

- computing the string P and the distance d associated with the minimum δ-approximate period among the δ-approximate periods of length m of the string T; or
- computing the string P and the distance d associated with the minimum γ-approximate period among the γ-approximate periods of length m of the string T.

The minimum δ-approximate period of T is the shortest string with the minimum distance among the δ-approximate periods of T. The minimum γ-approximate period of T is the shortest string with the minimum distance among the γ-approximate periods of T.

Computing the δ-approximate period or the γ-approximate period of length m ($1 \le m \le n/2$) of the string T may include the following. The substrings produced by dividing the string T ($|T| = n$) into pieces of length m are called period blocks. The k-th period block starts at position $(k-1)m + 1$, for $1 \le k \le \lceil n/m \rceil$, and is denoted by $T_k$. The length of the δ-approximate period is increased from 1 to n/2, and for each length a δ-approximate period is computed with the ComputeDeltaAP function. Given the string T and the length m of the approximate period, the ComputeDeltaAP function may compute the minimum distance $d_\delta$ and the δ-approximate period for length m. It does so by taking the arithmetic mean of the minimum and maximum values, which minimizes the errors of the blocks $T_k$.

Alternatively, with the period blocks $T_k$ defined in the same way, computing the δ-approximate period or the γ-approximate period of length m ($1 \le m \le n/2$) of the string T may include the following. The length of the γ-approximate period is increased from 1 to n/2, and for each length a γ-approximate period is computed with the ComputeGammaAP function. Given the string T and the length m of the approximate period, the ComputeGammaAP function may compute the minimum distance $d_\gamma$ and the γ-approximate period for length m. It does so by using the median, which minimizes the sum of the errors of the blocks $T_k$.

The median may be defined as follows. Sort the t integers in ascending order. If t is odd, the median is the $((t+1)/2)$-th value. If t is even, the median is the arithmetic mean of the $(t/2)$-th and $(t/2+1)$-th values.

The present invention defines δ-approximate periods and γ-approximate periods of strings over integer alphabets. It also defines the problems of finding a minimum δ-approximate period and a minimum γ-approximate period, and solves each problem in $O(n^2)$ time.

When a given string contains don't care symbols, the algorithms proposed in the present invention can still find the minimum δ-approximate period and the minimum γ-approximate period.

In addition, (δ, γ)-matching over strings represented by integers, as addressed by the present invention, can be applied to the study of music sequences or stock prices.

The present invention can also be applied more easily to strings composed of real numbers.

FIG. 1 is an example illustrating a method of computing a δ-approximate period according to an embodiment.
FIG. 2 is an example illustrating a method of computing a γ-approximate period according to an embodiment.
FIG. 3 is an example illustrating a method of computing a δ-approximate period of length m with minimum distance d according to an embodiment.
FIG. 4 is an example illustrating a method of computing a γ-approximate period of length m with minimum distance d according to an embodiment.
FIG. 5 is a flowchart illustrating a method of computing a minimum δ-approximate period according to an embodiment.
FIG. 6 is a flowchart illustrating a method of computing a minimum γ-approximate period according to an embodiment.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings.

The set of strings over a finite alphabet Σ is denoted by $\Sigma^*$, and the set of strings of length m over Σ is denoted by $\Sigma^m$. The length of a string X is denoted by $|X|$, and the i-th character of X ($1 \le i \le |X|$) is denoted by $X[i]$. The substring $X[i]X[i+1]\cdots X[j]$ ($1 \le i \le j \le |X|$), running from the i-th to the j-th character of X, is denoted by $X[i..j]$. If $X = WU$, then W is called a prefix of X and U a suffix of X. $T[i..j]$ with $i > j$, as well as $T[0]$, is defined as the empty string and denoted by $\varepsilon$. A don't care symbol is a character that matches every character, including itself, and is denoted by ★. In other words, for every character $a \in \Sigma$, a and ★ match each other, and ★ matches ★.

Given strings X and P and a distance function d, approximate periods under various distance functions can be defined as follows. Let $X = X_1 X_2 \cdots X_r X'$, and let $P'$ be a prefix of P. If $d(P, X_i) \le t$ for $1 \le i \le r$ and $d(P', X') \le t$, then P is called a t-approximate period of X, or an approximate period of X with distance t. Prior techniques give algorithms for various distance functions d, namely the Hamming distance, the edit distance, and the metric weighted edit distance. These algorithms find the minimum approximate period distance t for which P is a t-approximate period of X.

Let a and b be two characters of an integer alphabet Σ and let $\delta \ge 0$ be an integer. Then a and b are said to be δ-approximate if and only if $|a - b| \le \delta$, written $a =_\delta b$.

Let X and Y be strings with $|X| = |Y| = m$, and let $\delta, \gamma \ge 0$ be integers. X and Y are said to be δ-approximate if and only if $X[i] =_\delta Y[i]$ for all $1 \le i \le m$, written $X =_\delta Y$. X and Y are said to be γ-approximate if and only if $\sum_{i=1}^{m} |X[i] - Y[i]| \le \gamma$, written $X =_\gamma Y$. If X and Y are both δ-approximate and γ-approximate, they are said to be (δ, γ)-approximate, written $X =_{\delta,\gamma} Y$.
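The three string-level approximation notions above can be expressed as simple predicates. This is a minimal Python sketch; the function names are hypothetical, and it assumes |X| = |Y| as in the definition. The sample values are illustrative MIDI pitches, not data from the source.

```python
def delta_approx(x, y, delta):
    """X =_delta Y: equal length and |X[i] - Y[i]| <= delta at every position."""
    return len(x) == len(y) and all(abs(a - b) <= delta for a, b in zip(x, y))


def gamma_approx(x, y, gamma):
    """X =_gamma Y: equal length and the total absolute difference is <= gamma."""
    return len(x) == len(y) and sum(abs(a - b) for a, b in zip(x, y)) <= gamma


def delta_gamma_approx(x, y, delta, gamma):
    """X =_{delta,gamma} Y: both conditions hold at once."""
    return delta_approx(x, y, delta) and gamma_approx(x, y, gamma)


melody = (60, 62, 64)   # illustrative MIDI pitches
variant = (61, 62, 66)  # per-position differences 1, 0, 2; total 3
print(delta_approx(melody, variant, 2), gamma_approx(melody, variant, 3))
```

The pair is (2, 3)-approximate but not (1, 3)-approximate, because the third pitch differs by 2.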

FIG. 1 is an example illustrating a method of computing a δ-approximate period according to an embodiment. FIG. 2 is an example illustrating a method of computing a γ-approximate period according to an embodiment.

Let strings $T \in \Sigma^n$ and $P \in \Sigma^m$ be given. For each $1 \le j \le m$, let $E_j$ be the set of indices i ($1 \le i \le n$) with $i \equiv j \pmod{m}$. If $|T[i] - P[j]| \le d$ for all j ($1 \le j \le m$) and all $i \in E_j$, P is defined to be a δ-approximate period of T with distance d. If $\sum_{j=1}^{m} \sum_{i \in E_j} |T[i] - P[j]| \le d$, P is defined to be a γ-approximate period of T with distance d.

Referring to FIGS. 1 and 2, suppose that T = (17, 12, 21, 16, 14, 20, 23, 7, 25, 19, 15) is given.

- P = (17, 13, 20) is a δ-approximate period of T with distance 6.
- P = (19, 11, 22) is a δ-approximate period of T with distance 4.
- P = (19, 11, 22) is also a γ-approximate period of T with distance 27.
- P = (18, 13, 21) is a γ-approximate period of T with distance 24.

A given string T may therefore have several δ-approximate periods and several γ-approximate periods. Among the δ-approximate periods of T, the shortest string with the minimum distance is defined as the minimum δ-approximate period of T. Among the γ-approximate periods of T, the shortest string with the minimum distance is defined as the minimum γ-approximate period of T.
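The distances in the example above can be recomputed with a short Python sketch. The helper names delta_distance and gamma_distance are illustrative and do not come from the source. Indices are 0-based, so position i belongs to the set E_j with j = i mod m. Each function returns the smallest d for which P is a δ- or γ-approximate period of T, respectively.

```python
def delta_distance(t, p):
    """Smallest d with |T[i] - P[j]| <= d for every i in E_j (the maximum error)."""
    m = len(p)
    return max(abs(x - p[i % m]) for i, x in enumerate(t))


def gamma_distance(t, p):
    """Smallest d with sum_j sum_{i in E_j} |T[i] - P[j]| <= d (the total error)."""
    m = len(p)
    return sum(abs(x - p[i % m]) for i, x in enumerate(t))


T = (17, 12, 21, 16, 14, 20, 23, 7, 25, 19, 15)  # example string from the text
print(delta_distance(T, (17, 13, 20)), gamma_distance(T, (18, 13, 21)))
```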

FIG. 3 is an example illustrating a method of computing a δ-approximate period of length m with minimum distance d according to an embodiment. FIG. 4 is an example illustrating a method of computing a γ-approximate period of length m with minimum distance d according to an embodiment.

The following description presents algorithms that solve Problems 1 to 4. Throughout, Σ is assumed to be a finite integer alphabet.

The substrings produced by dividing a given string T ($|T| = n$) into pieces of length m are called period blocks. The k-th period block starts at position $(k-1)m + 1$, for $1 \le k \le \lceil n/m \rceil$, and is denoted by $T_k$.
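A minimal Python sketch of the period blocks, assuming 0-based slicing. The helper names period_blocks and columns are hypothetical. The slice t[j::m] collects the (j+1)-th character of every block, which is exactly the set of values T[i] with i in E_(j+1). For the example string with m = 3, the sketch also shows that the last block is T_4 = (19, 15).

```python
def period_blocks(t, m):
    """Split t into consecutive blocks of length m; block k (1-based) starts at
    position (k-1)*m + 1, and the last block may be shorter than m."""
    return [tuple(t[s:s + m]) for s in range(0, len(t), m)]


def columns(t, m):
    """For each j, the values T[i] with i in E_(j+1): the j-th entry of every block."""
    return [tuple(t[j::m]) for j in range(m)]


T = (17, 12, 21, 16, 14, 20, 23, 7, 25, 19, 15)  # example string from the text
print(period_blocks(T, 3))
print(columns(T, 3))
```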

Problem 1. Find a minimum δ-approximate period P of a given string $T \in \Sigma^n$.

FIG. 5 is a flowchart illustrating a method of computing a minimum δ-approximate period. Referring to FIG. 5, steps 510 to 530 may be performed. The minimum δ-approximate period may be computed by an approximate period computation system.

In step 510, the approximate period computation system may generate arrays of length n/2 for a string T of length n consisting of integer characters.

In step 520, the approximate period computation system may compute the length of the string P and the distance d associated with the minimum δ-approximate period among the δ-approximate periods of length m ($1 \le m \le n/2$) of the string T.

In step 530, the approximate period computation system may obtain the string P associated with the minimum δ-approximate period of the string T.

Steps 510 to 530 are described in more detail below.

Problem 1 can be solved with Algorithm 1, which computes the δ-approximate period. Each character $P[j]$ ($1 \le j \le m$) of the δ-approximate period is computed by the ComputeDeltaAP function. Given the string T and the length m of the approximate period, this function computes the minimum distance $d_\delta$ and the δ-approximate period for length m.

Specifically, for each $1 \le j \le m$, the function takes the values $T_k[j]$ from the period blocks $T_k$ ($1 \le k \le \lceil n/m \rceil$) that have a j-th character. The minimum of these values is stored in $A[j]$ and the maximum in $B[j]$ (lines 1 to 9). Each $P[j]$ is then computed as $(A[j] + B[j])/2$, and $d_\delta$ is computed as $\max_{1 \le j \le m} (B[j] - A[j])/2$ (lines 10 to 12).
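The following Python sketch mirrors the ComputeDeltaAP description: the minimum and maximum of each column, their midpoint, and the largest half-range. It is an illustration under stated assumptions and not the patent's own listing. It uses 0-based indices, Python lists instead of the preallocated arrays, and assumes m ≤ n so that every column is non-empty. Each column slice touches about n/m values, so one call does O(n) work in total.

```python
def compute_delta_ap(t, m):
    """Sketch of ComputeDeltaAP: returns (d_delta, P) for period length m."""
    a = [min(t[j::m]) for j in range(m)]  # A[j]: smallest value in column j
    b = [max(t[j::m]) for j in range(m)]  # B[j]: largest value in column j
    p = [(a[j] + b[j]) / 2 for j in range(m)]  # midpoint minimizes the max error
    d = max((b[j] - a[j]) / 2 for j in range(m))  # worst half-range over columns
    return d, p


T = (17, 12, 21, 16, 14, 20, 23, 7, 25, 19, 15)  # example string from the text
print(compute_delta_ap(T, 3))
```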

For example, referring to FIG. 3, let T = (17, 12, 21, 16, 14, 20, 23, 7, 25, 19, 15) and m = 3. The period blocks are $T_1$ = (17, 12, 21), $T_2$ = (16, 14, 20), $T_3$ = (23, 7, 25), and $T_4$ = (19, 15), so A = (16, 7, 20) and B = (23, 15, 25). The δ-approximate period is therefore P = (19.5, 11, 22.5), with $d_\delta = \max(3.5, 4, 2.5) = 4$.

Algorithm 1 allocates arrays A, B, and P of size n/2. It uses the ComputeDeltaAP function to compute the distance $d_\delta$ and the length m' of the minimum δ-approximate period (lines 6 to 10). Once m' is known, the minimum δ-approximate period is recomputed with length m' (line 11). The minimum δ-approximate period must consist of integer characters, so the fractional part of each character P[j] is rounded up or down to an integer before output. If there is no integer constraint, P is output without conversion.

In the example above, m' = 3. The minimum δ-approximate period is (19.5, 11, 22.5) before integer conversion. After conversion, at least one of (19, 11, 22), (20, 11, 22), (19, 11, 23), and (20, 11, 23) may be output. The ComputeDeltaAP function runs in $O(n)$ time, and Algorithm 1 calls it (n/2) + 1 times, so Algorithm 1 runs in $O(n^2)$ time.
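Algorithm 1 as a whole can be sketched in Python as below. The sketch makes the following assumptions:

- n ≥ 2;
- real-valued distances are compared, as described;
- on ties, the first (shortest) m is kept;
- when an integer period is required, each character is rounded down.

The text equally permits rounding up, so the choice of floor is arbitrary. The loop makes n//2 calls plus one recomputation, matching the (n/2) + 1 calls counted in the text.

```python
import math


def compute_delta_ap(t, m):
    """ComputeDeltaAP sketch: midpoint of each column's minimum and maximum."""
    lo = [min(t[j::m]) for j in range(m)]  # A[j]
    hi = [max(t[j::m]) for j in range(m)]  # B[j]
    p = [(lo[j] + hi[j]) / 2 for j in range(m)]
    d = max((hi[j] - lo[j]) / 2 for j in range(m))
    return d, p


def min_delta_ap(t):
    """Algorithm 1 sketch: returns (m', d_delta, real-valued P, integer P)."""
    n = len(t)
    if n < 2:
        raise ValueError("need at least two characters so that n // 2 >= 1")
    best_d, best_m = math.inf, 0
    for m in range(1, n // 2 + 1):
        d, _ = compute_delta_ap(t, m)
        if d < best_d:  # strict '<' keeps the shortest m among equal distances
            best_d, best_m = d, m
    d, p = compute_delta_ap(t, best_m)  # recompute P for m'
    p_int = [math.floor(x) for x in p]  # round down; rounding up is also allowed
    return best_m, d, p, p_int


T = (17, 12, 21, 16, 14, 20, 23, 7, 25, 19, 15)  # example string from the text
print(min_delta_ap(T))
```

For the example string, the lengths 1 to 5 give distances 9, 6.5, 4, 5.5, and 7, so m' = 3 as stated in the text.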

Problem 2. Find a minimum γ-approximate period P of a given string $T \in \Sigma^n$.

FIG. 6 is a flowchart illustrating a method of computing a minimum γ-approximate period. Referring to FIG. 6, steps 610 to 630 may be performed. The minimum γ-approximate period may be computed by an approximate period computation system.

In step 610, the approximate period computation system may generate an array of length n/2 for a string T of length n consisting of integer characters.

In step 620, the approximate period computation system may compute the length of the string P and the distance d associated with the minimum γ-approximate period among the γ-approximate periods of length m ($1 \le m \le n/2$) of the string T.

In step 630, the approximate period computation system may obtain the string P associated with the minimum γ-approximate period of the string T.

Steps 610 to 630 are described in more detail below.

Problem 2 can be solved with Algorithm 2, which computes the γ-approximate period. Each character $P[i]$ ($1 \le i \le m$) of the γ-approximate period is computed by the ComputeGammaAP function. Given the string T and the length m of the approximate period, this function computes the minimum distance $d_\gamma$ and the γ-approximate period for length m.

The median is defined as follows. Sort the t integers in ascending order. If t is odd, the median is the $((t+1)/2)$-th value. If t is even, the median is the arithmetic mean of the $(t/2)$-th and $(t/2+1)$-th values. For a set S of t real numbers, if x is a median of S, then $\sum_{s \in S} |s - x|$ is minimal.

Specifically, for each $1 \le i \le m$, the function computes the median of the values $T_k[i]$ over the period blocks. Each $P[i]$ is computed by the ComputeMedian function (lines 1 to 2). Then $d_\gamma$ is computed as $\sum_{i=1}^{m} \sum_{k} |T_k[i] - P[i]|$ (line 3).

For example, referring to FIG. 4, let T = (17, 12, 21, 16, 14, 20, 23, 7, 25, 19, 15) and m = 3.

- P[1] = 18 is the median of (17, 16, 23, 19).
- P[2] = 13 is the median of (12, 14, 7, 15).
- P[3] = 21 is the median of (21, 20, 25).

Accordingly, the γ-approximate period is P = (18, 13, 21), with $d_\gamma = 24$.
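A Python sketch of ComputeMedian and ComputeGammaAP, following the median definition above. The sketch is sort-based and uses 0-based indices. The function names are Python renderings of the source's function names, not its actual code. For the example with m = 3, it reproduces P = (18, 13, 21) and d_γ = 24.

```python
def compute_median(values):
    """Median by sorting: the middle value for odd t, or the mean of the
    two middle values for even t."""
    s = sorted(values)
    t = len(s)
    if t % 2 == 1:
        return s[t // 2]  # the ((t+1)/2)-th value in 1-based terms
    return (s[t // 2 - 1] + s[t // 2]) / 2


def compute_gamma_ap(t, m):
    """Sketch of ComputeGammaAP: column medians and the total absolute error."""
    p = [compute_median(t[j::m]) for j in range(m)]
    d = sum(abs(x - p[i % m]) for i, x in enumerate(t))
    return d, p


T = (17, 12, 21, 16, 14, 20, 23, 7, 25, 19, 15)  # example string from the text
print(compute_gamma_ap(T, 3))
```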

Algorithm 2 allocates an array P of size n/2 to store the γ-approximate period. It uses the ComputeGammaAP function to compute the distance $d_\gamma$ and the length m' of the minimum γ-approximate period (lines 6 to 10). Once m' is known, the minimum γ-approximate period is recomputed with length m' (line 11). The minimum γ-approximate period must consist of integer characters, so the fractional part of each character P[j] may be rounded up or down to an integer before output. If there is no integer constraint, P can be output as is. In the example above, m' = 3 and the minimum γ-approximate period is (18, 13, 21).
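Algorithm 2 can be sketched in the same way as Algorithm 1, using the median-based ComputeGammaAP in place of ComputeDeltaAP. This sketch assumes n ≥ 2. Rounding a half-integer median down is safe here. For an even number of values, every point between the two middle values gives the same sum of absolute deviations, and the rounded value stays within that range.

```python
import math


def compute_median(values):
    s = sorted(values)
    t = len(s)
    return s[t // 2] if t % 2 == 1 else (s[t // 2 - 1] + s[t // 2]) / 2


def compute_gamma_ap(t, m):
    p = [compute_median(t[j::m]) for j in range(m)]
    d = sum(abs(x - p[i % m]) for i, x in enumerate(t))
    return d, p


def min_gamma_ap(t):
    """Algorithm 2 sketch: returns (m', d_gamma, real-valued P, integer P)."""
    n = len(t)
    if n < 2:
        raise ValueError("need at least two characters so that n // 2 >= 1")
    best_d, best_m = math.inf, 0
    for m in range(1, n // 2 + 1):
        d, _ = compute_gamma_ap(t, m)
        if d < best_d:  # strict '<' keeps the shortest m among equal distances
            best_d, best_m = d, m
    d, p = compute_gamma_ap(t, best_m)  # recompute P for m'
    p_int = [math.floor(x) for x in p]  # still a median, so d_gamma is unchanged
    return best_m, d, p, p_int


T = (17, 12, 21, 16, 14, 20, 23, 7, 25, 19, 15)  # example string from the text
print(min_gamma_ap(T))
```

For the example string, the lengths 1 to 5 give total errors 44, 43, 24, 36, and 44, so m' = 3, in agreement with the text.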

The complexity of Algorithm 2 is analyzed as follows. In the ComputeMedian function, the median of t values can be computed with a sorting algorithm in $O(t \log t)$ time. Each of the m columns holds about n/m values, and the ComputeGammaAP function calls the ComputeMedian function m times. The time complexity of ComputeGammaAP is therefore $O(m \cdot \frac{n}{m} \log \frac{n}{m}) = O(n \log \frac{n}{m})$. Algorithm 2 calls the ComputeGammaAP function n/2 times, once for each $1 \le m \le n/2$ (lines 6 to 10). The time complexity of Algorithm 2 is therefore $O\big(\sum_{m=1}^{n/2} n \log \frac{n}{m}\big)$, which the following derivation shows to be $O(n^2)$.

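Equation 1:

$$\sum_{m=1}^{n/2} n \log \frac{n}{m} = n \log \prod_{m=1}^{n/2} \frac{n}{m} = n \log \frac{n^{n/2}}{(n/2)!}$$

By Stirling's approximation, $k! \approx \sqrt{2\pi k}\,(k/e)^k$, so $(n/2)! \approx \sqrt{\pi n}\,\left(\frac{n}{2e}\right)^{n/2}$.

Since $n^{n/2} = (2e)^{n/2} \left(\frac{n}{2e}\right)^{n/2}$, it follows that $\frac{n^{n/2}}{(n/2)!} \approx \frac{(2e)^{n/2}}{\sqrt{\pi n}}$.

Moreover, $k! \ge (k/e)^k$ for every $k \ge 1$, so $\frac{n^{n/2}}{(n/2)!} \le (2e)^{n/2}$. Taking the logarithm of both sides gives $\log \frac{n^{n/2}}{(n/2)!} \le \frac{n}{2} \log (2e)$.

Equation 2:

$$\log \frac{n^{n/2}}{(n/2)!} = O(n)$$

Substituting Equation 2 into Equation 1 gives Equation 3.

Equation 3:

$$\sum_{m=1}^{n/2} n \log \frac{n}{m} = n \cdot O(n) = O(n^2)$$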
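The bound in Equation 2 can be sanity-checked numerically. This Python sketch uses hypothetical helper names. It computes log(n^(n/2) / (n/2)!) exactly with math.lgamma and compares the result with the Stirling estimate (n/2)·log(2e) − ½·log(πn) from the derivation. Natural logarithms are assumed, which does not affect the O(n) conclusion. The ratio log_ratio(n)/n stays below ½·log(2e) ≈ 0.847, which is the O(n) behaviour used in Equation 3.

```python
import math


def log_ratio(n):
    """log(n^(n/2) / (n/2)!) for even n, computed exactly via lgamma."""
    k = n // 2
    return k * math.log(n) - math.lgamma(k + 1)


def stirling_estimate(n):
    """(n/2) log(2e) - (1/2) log(pi n), the Stirling-based estimate."""
    return (n / 2) * math.log(2 * math.e) - 0.5 * math.log(math.pi * n)


for n in (10, 100, 1000, 10**6):
    print(n, log_ratio(n) / n, stirling_estimate(n) / n)
```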

If the ComputeMedian function uses a selection algorithm instead of a sorting algorithm, it is even easier to see that the time complexity of Algorithm 2 is $O(n^2)$. With a selection algorithm, the ComputeMedian function computes the median of t values in $O(t)$ time. The time complexity of the ComputeGammaAP function is then $O(m \cdot \frac{n}{m}) = O(n)$. Algorithm 2 calls ComputeGammaAP (n/2) + 1 times, so its time complexity is $O(n^2)$.
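The text does not name a specific selection algorithm. As one possible choice, the Python sketch below uses randomized quickselect, which runs in expected O(t) time. A deterministic median-of-medians selection would be needed for a worst-case O(t) bound. The names select and compute_median_select are hypothetical.

```python
import random


def select(values, k):
    """Return the k-th smallest element (0-based) by randomized quickselect;
    expected O(t) time for t values."""
    items = list(values)
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        n_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows  # answer lies among the smaller values
        elif k < len(lows) + n_equal:
            return pivot  # answer equals the pivot
        else:
            k -= len(lows) + n_equal  # answer lies among the larger values
            items = highs


def compute_median_select(values):
    """Median per the text's definition, using selection instead of sorting."""
    t = len(values)
    if t % 2 == 1:
        return select(values, t // 2)
    return (select(values, t // 2 - 1) + select(values, t // 2)) / 2


print(compute_median_select([17, 16, 23, 19]))
```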

Furthermore, Problems 3 and 4 can be solved without increasing the time complexity, using algorithms based on those that solve Problems 1 and 2.

Problem 3. Find a minimum δ-approximate period P of a given string T of length n over the alphabet Σ ∪ {★}.

Problem 4. Find a minimum γ-approximate period P of a given string T of length n over the alphabet Σ ∪ {★}.

For Problem 3, the ComputeDeltaAP function is replaced with a ComputeDeltaAPwithDC function. This yields the minimum δ-approximate period that accounts for don't care symbols.

For Problem 4, every T[i] equal to ★ must be excluded from the median and $d_\gamma$ computations. To do this, the ComputeMedian and ComputeGammaAP functions are replaced with ComputeMedianwithDC and ComputeGammaAPwithDC functions. This yields the minimum γ-approximate period that accounts for don't care symbols.
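A Python sketch of the don't care variants. The source only states that ★ entries are excluded from the median and d_γ computations. This sketch makes three further assumptions:

- ★ entries are also excluded from the minimum and maximum in ComputeDeltaAPwithDC;
- ★ is encoded as None;
- ★ is output for a column that contains only ★.

The example string is the text's T with two entries hypothetically replaced by ★.

```python
STAR = None  # hypothetical encoding of the don't care symbol


def compute_delta_ap_with_dc(t, m):
    """ComputeDeltaAP that ignores STAR entries when taking min and max."""
    d, p = 0.0, []
    for j in range(m):
        col = [x for x in t[j::m] if x is not STAR]
        if not col:  # column holds only STAR: any character matches
            p.append(STAR)
            continue
        lo, hi = min(col), max(col)
        p.append((lo + hi) / 2)
        d = max(d, (hi - lo) / 2)
    return d, p


def compute_median_with_dc(values):
    """Median of the non-STAR values; STAR if there are none."""
    s = sorted(x for x in values if x is not STAR)
    t = len(s)
    if t == 0:
        return STAR
    return s[t // 2] if t % 2 == 1 else (s[t // 2 - 1] + s[t // 2]) / 2


def compute_gamma_ap_with_dc(t, m):
    """ComputeGammaAP that leaves STAR positions out of the median and d_gamma."""
    p = [compute_median_with_dc(t[j::m]) for j in range(m)]
    d = sum(abs(x - p[i % m]) for i, x in enumerate(t) if x is not STAR)
    return d, p


T_DC = (17, 12, STAR, 16, 14, 20, 23, STAR, 25, 19, 15)
print(compute_delta_ap_with_dc(T_DC, 3), compute_gamma_ap_with_dc(T_DC, 3))
```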

The apparatus described above may be implemented with hardware components, software components, and/or combinations of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers. Examples include a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications running on the OS. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described. However, those skilled in the art will appreciate that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these. It may configure a processing device to operate as desired, or command the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium, or device. This allows them to be interpreted by the processing device or to provide instructions or data to it. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or they may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include:

- magnetic media such as hard disks, floppy disks, and magnetic tape;
- optical media such as CD-ROMs and DVDs;
- magneto-optical media such as floptical disks; and
- hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like.

The embodiments have been described above with reference to a limited set of embodiments and drawings. Nevertheless, those skilled in the art can make various modifications and variations based on the above description. For example, appropriate results can still be achieved in any of the following cases:

- the described techniques are performed in an order different from the described method;
- components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method; or
- such components are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

1. An approximate period computation method, comprising:
generating an array of length n/2 for a string T of length n consisting of integer characters;
computing a δ-approximate period or a γ-approximate period of length m ($1 \le m \le n/2$) of the string T; and
obtaining a string P associated with the δ-approximate period or the γ-approximate period,
wherein, given the strings $T \in \Sigma^n$ and $P \in \Sigma^m$, and with $E_j$ denoting, for each $1 \le j \le m$, the set of indices i ($1 \le i \le n$) such that $i \equiv j \pmod{m}$,
the δ-approximate period is defined such that the string P is a δ-approximate period of the string T with distance d if $|T[i] - P[j]| \le d$ for all j ($1 \le j \le m$) and all $i \in E_j$, and
the γ-approximate period is defined such that the string P is a γ-approximate period of the string T with distance d if $\sum_{j=1}^{m} \sum_{i \in E_j} |T[i] - P[j]| \le d$.

2. (Deleted)

3. The method of claim 1, wherein computing the δ-approximate period or the γ-approximate period of length m ($1 \le m \le n/2$) of the string T comprises:
computing the string P and the distance d associated with the minimum δ-approximate period among the δ-approximate periods of length m of the string T, or computing the string P and the distance d associated with the minimum γ-approximate period among the γ-approximate periods of length m of the string T,
wherein the minimum δ-approximate period of the string T is the shortest string with the minimum distance among the δ-approximate periods of the string T, and
the minimum γ-approximate period of the string T is the shortest string with the minimum distance among the γ-approximate periods of the string T.

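4. The method of claim 3, wherein computing the δ-approximate period or the γ-approximate period of length m ($1 \le m \le n/2$) of the string T comprises:
computing a δ-approximate period for each length by using a ComputeDeltaAP function while increasing the length of the δ-approximate period from 1 to n/2, where the substrings produced by dividing the string T ($|T| = n$) into pieces of length m are called period blocks and the k-th period block, starting at position $(k-1)m + 1$, is denoted by $T_k$,
wherein the ComputeDeltaAP function, given the string T and the length m of the approximate period, computes the minimum distance $d_\delta$ and the δ-approximate period for length m by using the arithmetic mean of the minimum and maximum values so as to minimize the errors of the blocks $T_k$.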

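5. The method of claim 3, wherein computing the δ-approximate period or the γ-approximate period of length m ($1 \le m \le n/2$) of the string T comprises:
computing a γ-approximate period for each length by using a ComputeGammaAP function while increasing the length of the γ-approximate period from 1 to n/2, where $T_k$ denotes the k-th period block, starting at position $(k-1)m + 1$, of the string T divided into pieces of length m,
wherein the ComputeGammaAP function, given the string T and the length m of the approximate period, computes the minimum distance $d_\gamma$ and the γ-approximate period for length m by using the median so as to minimize the sum of the errors of the blocks $T_k$.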

6. The method of claim 5, wherein the median is defined by sorting the t integers in ascending order and taking the $((t+1)/2)$-th value if t is odd, or the arithmetic mean of the $(t/2)$-th and $(t/2+1)$-th values if t is even.