KR101714270B1

KR101714270B1 - XML schema transformation method and device

Info

Publication number: KR101714270B1
Application number: KR1020150175776A
Authority: KR
Inventors: 한요섭; 고상기; 김근해
Original assignee: 연세대학교 산학협력단
Priority date: 2015-12-10
Filing date: 2015-12-10
Publication date: 2017-03-08

Abstract

Disclosed are a method and an apparatus for XML schema transformation that transform a normalized regular hedge grammar (NRHG) expression into a Relax NG schema. The XML schema transformation method comprises the steps of: generating at least one automaton by using the NRHG expression; generating a regular expression for the automaton by using a state transition condition of the automaton; and transforming the regular expression into a Relax NG schema by using a schema language corresponding to the regular expression.

Description

The present invention relates to an XML schema conversion method and apparatus, and more particularly, to an XML schema conversion method and apparatus for converting a normalized regular hedge grammar expression into a Relax NG schema.

A regular hedge grammar (RHG) is one of the grammars used to express regular tree languages, which are tree-structured formal languages. In particular, the normalized regular hedge grammar (normalized RHG, NRHG) uses a more restricted grammar than the regular hedge grammar yet has the same expressive power, and is therefore the form most often referred to in theoretical computer science problems concerning regular tree languages.

A representative application of regular tree languages is the XML schema language Relax NG (REgular LAnguage for XML Next Generation). The Relax NG schema language is an international standard (ISO/IEC 19757-2), is relatively simple compared with other schema languages, and supports XML syntax.

FIG. 1 is a diagram for explaining a normalized regular hedge grammar.

The three tree-structured data items shown in FIG. 1 can be expressed by a normalized regular hedge grammar expression, as in [Equation 1].

A normalized regular hedge grammar is defined by a 5-tuple and four production rules. The 5-tuple consists of a set of terminals, a set of tree variables, a set of forest variables, a set of production rules P, and a starting symbol s, which belongs to the set of tree variables.

The four production rules are listed in [Table 1]. The nodes in [Table 1] correspond to nodes of the tree. Forest variables serve to connect nodes.

[Table 1]
1. A tree variable forms a terminal node x.
2. A tree variable forms a non-terminal node a together with a forest variable.
3. A forest variable generates a tree variable.
4. A forest variable generates a tree variable and another forest variable.
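As a rough illustration of the 5-tuple and the four rule forms in [Table 1], the grammar can be held in a small data structure. The following Python sketch is not taken from the patent: the class names (`Leaf`, `Node`, `Single`, `Pair`, `NRHG`), the `check` helper, and the toy grammar are all assumptions introduced for illustration.

```python
from dataclasses import dataclass

# Hypothetical encodings of the four rule forms in [Table 1].
@dataclass(frozen=True)
class Leaf:      # rule 1: tree variable forms a terminal node
    tree_var: str
    terminal: str

@dataclass(frozen=True)
class Node:      # rule 2: tree variable forms a non-terminal node with a forest variable
    tree_var: str
    label: str
    forest_var: str

@dataclass(frozen=True)
class Single:    # rule 3: forest variable generates a tree variable
    forest_var: str
    tree_var: str

@dataclass(frozen=True)
class Pair:      # rule 4: forest variable generates a tree variable and another forest variable
    forest_var: str
    tree_var: str
    next_forest_var: str

@dataclass
class NRHG:
    terminals: set
    tree_vars: set
    forest_vars: set
    rules: list
    start: str

    def check(self) -> bool:
        """Return True if the start symbol and every rule use declared symbols."""
        if self.start not in self.tree_vars:
            return False
        for r in self.rules:
            if isinstance(r, Leaf):
                ok = r.tree_var in self.tree_vars and r.terminal in self.terminals
            elif isinstance(r, Node):
                ok = r.tree_var in self.tree_vars and r.forest_var in self.forest_vars
            elif isinstance(r, Single):
                ok = r.forest_var in self.forest_vars and r.tree_var in self.tree_vars
            else:
                ok = (r.forest_var in self.forest_vars
                      and r.tree_var in self.tree_vars
                      and r.next_forest_var in self.forest_vars)
            if not ok:
                return False
        return True

# A toy grammar (not the one in [Equation 1]) that uses rules 1, 2, and 3.
toy = NRHG(
    terminals={"x"},
    tree_vars={"T0", "T1"},
    forest_vars={"F0"},
    rules=[Node("T0", "a", "F0"), Single("F0", "T1"), Leaf("T1", "x")],
    start="T0",
)
```

Keeping one class per rule form makes it easy to see that only four shapes of production are allowed, which is the restriction that distinguishes the normalized grammar.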

Returning to FIG. 1, the relationship between the first tree (Tree 1) and the NRHG of [Equation 1] is as follows. The tree variable T₀ forms the non-terminal node records together with the forest variable F₀. The forest variable F₀ generates the tree variable T₁, and T₁ forms the car node together with the forest variable F₁. The forest variable F₁ generates the tree variable T₂ together with the forest variable F₂, and F₂ generates the tree variable T₃.

The tree variables T₂ and T₃ form the terminal nodes record and country, respectively. The expression of [Equation 1] therefore generates the structure of the first tree, in which the non-terminal node car is connected to the non-terminal node records, and the terminal nodes record and country are connected to the car node.
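The derivation of the first tree can be traced mechanically. The sketch below is a minimal illustration, assuming the rules exactly as they are described in the prose above (the formulas of [Equation 1] are not reproduced in this text); the dictionary layout and the function names `expand_tree` and `expand_forest` are hypothetical.

```python
# Rules for Tree 1 as described in the text; the dictionary layout is an assumption.
TREE_RULES = {
    "T0": ("records", "F0"),   # non-terminal node records with forest variable F0
    "T1": ("car", "F1"),       # non-terminal node car with forest variable F1
    "T2": ("record", None),    # terminal node record
    "T3": ("country", None),   # terminal node country
}
FOREST_RULES = {
    "F0": ("T1", None),        # F0 generates T1
    "F1": ("T2", "F2"),        # F1 generates T2 together with F2
    "F2": ("T3", None),        # F2 generates T3
}

def expand_forest(forest_var):
    """Expand a forest variable into the list of sibling subtrees it derives."""
    tree_var, next_forest = FOREST_RULES[forest_var]
    siblings = [expand_tree(tree_var)]
    if next_forest is not None:
        siblings += expand_forest(next_forest)
    return siblings

def expand_tree(tree_var):
    """Expand a tree variable into a (label, children) pair."""
    label, forest_var = TREE_RULES[tree_var]
    children = expand_forest(forest_var) if forest_var is not None else []
    return (label, children)

tree1 = expand_tree("T0")
print(tree1)
# ('records', [('car', [('record', []), ('country', [])])])
```

The nested result mirrors the description: records contains car, and car contains record and country as leaves.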

In terms of data structures, various studies on schema conversion are in progress, and the normalized regular hedge grammar expression described above can likewise be converted into other schemas.

Related prior art includes, as patent literature, Korean Patent Laid-Open Publication No. 2011-0081945 and, as non-patent literature, "Classifying XML Documents Based on Structure/Content Similarity", Guangming Xing, Zhonghang Xia, INEX 2006 Workshop Pre-Proceedings.

An object of the present invention is to provide an XML schema conversion method and apparatus for converting a normalized regular hedge grammar expression into a Relax NG schema.

To achieve the above object, an embodiment of the present invention provides an XML schema conversion method comprising: generating at least one automaton using a normalized regular hedge grammar (NRHG) expression; generating a regular expression for the automaton using a state transition condition of the automaton; and converting the regular expression into a Relax NG schema using a schema language corresponding to the regular expression.

To achieve the above object, another embodiment of the present invention provides an XML schema conversion apparatus comprising: an automata generating unit that generates at least one automaton using a normalized regular hedge grammar expression; a regular expression generating unit that generates a regular expression for the automaton using a state transition condition of the automaton; and a schema conversion unit that converts the regular expression into a Relax NG schema using a schema language corresponding to the regular expression.

According to the present invention, a normalized regular hedge grammar expression can be converted into a Relax NG schema.

FIG. 1 is a diagram for explaining a normalized regular hedge grammar.
FIG. 2 is a diagram for explaining a regular expression and an automaton.
FIG. 3 is a diagram for explaining an XML schema conversion apparatus according to an embodiment of the present invention.
FIG. 4 is a flowchart of an XML schema conversion method according to an embodiment of the present invention.
FIG. 5 shows automata generated from a normalized regular hedge grammar expression.
FIG. 6 shows the converted Relax NG schema.
FIG. 7 shows a Relax NG schema simplified in comparison with FIG. 6.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to the specific embodiments, and the invention should be understood to include all modifications, equivalents, and alternatives falling within its spirit and technical scope. Like reference numerals are used for like elements in describing each drawing.

The present invention provides a method and apparatus for converting a normalized regular hedge grammar expression into a Relax NG schema, which is one of the XML schema languages. The present invention converts the normalized regular hedge grammar expression into automata and regular expressions, and finally generates a Relax NG schema.

First, regular expressions and automata are described, and then embodiments according to the present invention are described in more detail with reference to the accompanying drawings.

FIG. 2 is a diagram for explaining a regular expression and an automaton.

A regular expression is an expression that describes a pattern to be searched for. Regular expressions are used mainly for text pattern matching in areas such as search engines, bioinformatics, operating system commands, text editors, and network security. Most applications make use of regular expressions, and academically they are used in almost every field of computer science.

A regular expression denotes a set of strings defined by a combination of the syntax elements shown in [Table 2]. As an example, the regular expression "(fa|mo|b?o)ther" matches the strings "father", "mother", "bother", and "other".
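The example expression can be checked directly with Python's standard `re` module. This is only an illustrative check; the word list, including the non-matching word "brother", is an assumption added for contrast.

```python
import re

# The example pattern from the text; fullmatch requires the whole word to match.
pattern = re.compile(r"(fa|mo|b?o)ther")

words = ["father", "mother", "bother", "other", "brother"]
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)  # ['father', 'mother', 'bother', 'other']
```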

Regular expressions are a very powerful search tool because they increase the efficiency of pattern searching: a single expression performs efficient pattern matching over many search terms of varying patterns rather than over one specific keyword.

[Table 2]
Expression (Function): Description
. (Character): Matches any single character. In single-line mode, the newline character is excluded.
| (Alternation): Selects one of several expressions. For example, "abc|adc" matches both the string abc and the string adc.
^ (Negation): Selects any character other than those in the character class. For example, "[^abc]d" does not match ad, bd, or cd, but matches ed, fd, and so on. "[^a-z]" means any character that is not a lowercase letter.
[] (Character class): Selects one of the characters between "[" and "]". It is equivalent to combining several alternatives with "|". For example, "[abc]d" means ad, bd, or cd. A range can be specified with the "-" sign: "[a-z]" means one character from a to z, and "[1-9]" means one character from 1 to 9.
() (Subexpression): Groups several expressions into one. "abc|adc" and "a(b|d)c" have the same meaning.
* (Zero or more): Repeats the character marked with * zero or more times. "a*b" matches "b", "ab", "aab", and "aaab".
+ (One or more): Repeats the character marked with + one or more times. "a+b" matches "ab", "aab", and "aaab", but not "b".
? (Zero or one): "a?b" matches "b" and "ab".
{m} (Exactly m times): "a{3}b" matches only "aaab".
{m,} (m or more times): "a{2,}b" matches "aab", "aaab", and "aaaab", but not "ab".
{m,n} (At least m and at most n times): "a{1,3}b" matches "ab", "aab", and "aaab", but not "b" or "aaaab".
\ (Backreference): Matches again, later in the expression, the string captured by a parenthesized subexpression.
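The behaviour of the operators in [Table 2] can be confirmed with a few one-line checks. The sketch below uses Python's `re` module, whose syntax covers all of the listed operators; the helper name `full` and the backreference example "(ab)\1" are assumptions added for illustration.

```python
import re

def full(pattern, text):
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Each entry pairs a matching example with a non-matching one where the table gives both.
checks = {
    "alternation": full(r"abc|adc", "adc") and full(r"a(b|d)c", "adc"),
    "negated class": full(r"[^abc]d", "ed") and not full(r"[^abc]d", "ad"),
    "star": full(r"a*b", "b") and full(r"a*b", "aaab"),
    "plus": full(r"a+b", "ab") and not full(r"a+b", "b"),
    "optional": full(r"a?b", "b") and full(r"a?b", "ab"),
    "exact count": full(r"a{3}b", "aaab") and not full(r"a{3}b", "aab"),
    "at least": full(r"a{2,}b", "aaaab") and not full(r"a{2,}b", "ab"),
    "range": full(r"a{1,3}b", "aab") and not full(r"a{1,3}b", "aaaab"),
    "backreference": full(r"(ab)\1", "abab") and not full(r"(ab)\1", "abba"),
}
print(all(checks.values()))
```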

When a regular expression is used in an application, it is generally converted into a nondeterministic finite automaton (NFA). FIG. 2 shows an NFA for the regular expression "[ab]{3}|bad".

In FIG. 2, a circle containing a number represents a state of the NFA, and the character written beside an arrow represents a state transition condition. When an input character satisfies a state transition condition, the state transitions in the direction of the arrow. The circle labeled 0 represents the start state, and a circle drawn as two concentric circles represents a final state. If, starting from the start state, the transitions lead to a final state, the string being searched for has been found.

More specifically, starting from state 0, the start state, the input string is read one character at a time while checking whether any state moves to another state. If the input string is "gjekf3jmbab0d1f", the first character is "g", but no transition moves state 0 to another state on input "g", so the automaton remains in state 0. The second character, "j", likewise has no corresponding transition, so the automaton stays in state 0 and reads the next character. In this case, the only state that must be checked for a transition, that is, the only active state, is state 0, for a total of one active state.

The automaton stays in state 0 in this way until "gjekf3jm" has been read. The next input character is "b". In the NFA of FIG. 2, "b" is a state transition condition that moves state 0 to state 1 and to state 2, so state 0 moves to both state 1 and state 2. The next input character is "a", which is a state transition condition that moves the active state 0 to state 1, the active state 1 to state 3, and the active state 2 to state 4. States 0, 1, and 2 therefore move to states 1, 3, and 4, respectively.

In this way, the state transitions whenever the input satisfies a state transition condition. Because a final state is reached after "bab" has been input, the desired string "bab" is regarded as having been found in the input string.
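The walk-through above amounts to simulating the NFA with a set of active states. The following Python sketch reproduces it under stated assumptions: FIG. 2 is not available here, so the transition table is reconstructed from the prose (states 0 to 4 follow the text, while a single final state numbered 5 is an assumption), and the function name `search` is hypothetical.

```python
# Transition table for "[ab]{3}|bad", reconstructed from the description of FIG. 2.
# States 0-4 follow the text; the final state number 5 is an assumption.
TRANSITIONS = {
    (0, "a"): {1},
    (0, "b"): {1, 2},   # "b" moves state 0 to both state 1 and state 2
    (1, "a"): {3},
    (1, "b"): {3},
    (2, "a"): {4},
    (3, "a"): {5},
    (3, "b"): {5},
    (4, "d"): {5},
}
START, FINAL = 0, 5

def search(text):
    """Return the end index of the first match, or -1 if no match is found."""
    active = {START}
    for i, ch in enumerate(text):
        nxt = {START}   # state 0 stays active so that a match may begin anywhere
        for state in active:
            nxt |= TRANSITIONS.get((state, ch), set())
        if FINAL in nxt:
            return i + 1
        active = nxt
    return -1

text = "gjekf3jmbab0d1f"
end = search(text)
print(text[end - 3:end])  # bab
```

Every string accepted by this expression has length 3, which is why the matched substring can be recovered as the three characters before the end index.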

FIG. 3 is a diagram for explaining an XML schema conversion apparatus according to an embodiment of the present invention.

As shown in FIG. 3, the XML schema conversion apparatus according to the present invention may be a computer including a processor or a separate terminal device, and includes an automata generating unit 310, a regular expression generating unit 320, and a schema conversion unit 330.

The automata generating unit 310 generates at least one automaton using the normalized regular hedge grammar expression. Depending on the normalized regular hedge grammar expression, one or more automata may be generated, and each automaton may be a nondeterministic finite automaton.

The automata generating unit 310 may generate an automaton by using the forest variables of the normalized regular hedge grammar expression as states and the tree variables associated with those forest variables as state transition conditions.

The regular expression generating unit 320 generates a regular expression for the automaton using the state transition conditions of the automaton. The automata generating unit 310 may reduce the number of states of a generated automaton by removing a state located between the start state and the final state, and the regular expression generating unit 320 may then generate the regular expression using the state transition conditions of the reduced automaton.

The schema conversion unit 330 converts the regular expression into a Relax NG schema using a schema language corresponding to the regular expression. More specifically, the schema conversion unit 330 may convert the tree variables of the normalized regular hedge grammar expression into a Relax NG schema and, together with them, convert the regular expression into a Relax NG schema.

In one embodiment, when the character before or after a character marked with the Kleene star operator (*) in the regular expression is identical to the character marked with the Kleene star, the schema conversion unit 330 may modify the regular expression using the Kleene plus operator (⁺) and convert the modified regular expression into a Relax NG schema.

FIGS. 4 to 7 are diagrams for explaining an XML schema conversion method according to an embodiment of the present invention. FIG. 4 is a flowchart of the method, FIG. 5 shows automata generated from a normalized regular hedge grammar expression, FIG. 6 shows the converted Relax NG schema, and FIG. 7 shows a Relax NG schema simplified in comparison with FIG. 6.

FIGS. 4 to 7 describe, as one embodiment, a method in which the XML schema conversion apparatus according to the present invention converts the normalized regular hedge grammar expression described with reference to FIG. 1 into a Relax NG schema.

The XML schema conversion apparatus according to the present invention generates at least one automaton using the normalized regular hedge grammar expression (S410) and, in one embodiment, may generate automata (NFAs) such as those shown in FIG. 5. A forest variable in the normalized regular hedge grammar expression corresponds to a state of the automaton, and the tree variable associated with a forest variable corresponds to a state transition condition.

For example, two automata can be generated from the production rule set P of [Equation 1]: one pair of production rules yields the automaton shown in FIG. 5(a), and another pair yields the automaton shown in FIG. 5(b). That is, because there are three forest variables, three states are created, and depending on the relationships among the forest variables, the states may or may not be connected to one another.
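A minimal Python sketch of step S410 follows. It treats each forest variable as a state and each tree variable as a transition label, and groups connected states into separate automata. The forest-variable rules are an assumption inferred from the regular expressions reported for FIG. 5 (the rule formulas are not reproduced in this text), and the marker `"END"` for the final state and the function name `build_automata` are hypothetical.

```python
from collections import defaultdict

# Forest-variable rules inferred from the description of FIG. 5 (an assumption):
# each entry is (forest variable, tree variable, next forest variable or None).
FOREST_RULES = [
    ("F0", "T1", "F0"),
    ("F0", "T1", None),
    ("F1", "T2", "F2"),
    ("F2", "T3", None),
]

def build_automata(rules):
    """Group forest variables into connected automata and list their transitions."""
    parent = {}

    def find(v):
        # Follow parent links to the representative of v's group.
        parent.setdefault(v, v)
        while parent[v] != v:
            v = parent[v]
        return v

    # Forest variables that appear in the same rule belong to the same automaton.
    for src, _, dst in rules:
        find(src)
        if dst is not None:
            parent[find(dst)] = find(src)

    automata = defaultdict(list)
    for src, tree_var, dst in rules:
        target = dst if dst is not None else "END"   # "END" marks the final state
        automata[find(src)].append((src, tree_var, target))
    return dict(automata)

automata = build_automata(FOREST_RULES)
for name, transitions in automata.items():
    print(name, transitions)
```

With these rules, F0 forms one automaton on its own, while F1 and F2 are connected and form the other, which matches the statement that states may or may not be connected.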

The XML schema conversion apparatus then generates a regular expression for each automaton using the state transition conditions of the automaton (S420). The regular expression may be generated from the state transition conditions encountered on the way from the start state of the automaton to its final state.

The automaton of FIG. 5(a) yields the regular expression T₁*T₁, and the automaton of FIG. 5(b) yields the regular expression T₂T₃. The XML schema conversion apparatus may reduce the states of an automaton before generating the regular expression. In the automaton of FIG. 5(b), for example, the state corresponding to F₂ is removed, so that the automaton is expressed with only the start state and the final state.
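One common way to obtain a regular expression by removing intermediate states is the state elimination technique. The text only says that states between the start and final states are removed, so the Python sketch below is an illustration of that general technique rather than the patented procedure. The transition lists, the `"END"` final-state marker, and the function names are assumptions, and the resulting strings use the document's notation (T1*T1) rather than executable regular expression syntax.

```python
def star(expr, atoms):
    """Apply the Kleene star, parenthesising anything that is not a single symbol."""
    return expr + "*" if expr in atoms else "(" + expr + ")*"

def union(a, b):
    """Combine two alternatives; the parentheses keep concatenation unambiguous."""
    return "(" + a + "|" + b + ")"

def to_regex(transitions, start, final):
    """Eliminate intermediate states, then read the expression from start to final.

    Assumes the final state has no outgoing transitions.
    """
    atoms = {sym for _, sym, _ in transitions}
    edges = {}
    for src, sym, dst in transitions:
        key = (src, dst)
        edges[key] = union(edges[key], sym) if key in edges else sym
    states = {s for s, _, _ in transitions} | {d for _, _, d in transitions}
    for k in sorted(states - {start, final}):
        loop = star(edges[(k, k)], atoms) if (k, k) in edges else ""
        ins = [(p, e) for (p, q), e in edges.items() if q == k and p != k]
        outs = [(q, e) for (p, q), e in edges.items() if p == k and q != k]
        edges = {pq: e for pq, e in edges.items() if k not in pq}
        for p, a in ins:
            for q, b in outs:
                path = a + loop + b          # bypass k: enter, loop, leave
                key = (p, q)
                edges[key] = union(edges[key], path) if key in edges else path
    loop = star(edges[(start, start)], atoms) if (start, start) in edges else ""
    return loop + edges[(start, final)]

fig5a = [("F0", "T1", "F0"), ("F0", "T1", "END")]
fig5b = [("F1", "T2", "F2"), ("F2", "T3", "END")]
print(to_regex(fig5a, "F0", "END"))  # T1*T1
print(to_regex(fig5b, "F1", "END"))  # T2T3
```

For the second automaton the only intermediate state is F2, and removing it leaves a single transition labelled T2T3 from the start state to the final state.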

The XML schema conversion apparatus then converts the regular expression into a Relax NG schema using a schema language corresponding to the regular expression (S430). Together with the Relax NG schema for the tree variables of the normalized regular hedge grammar expression, shown in FIG. 6(a), the Relax NG schema for the regular expressions generated in step S420, shown in FIG. 6(b), can be generated.

In FIG. 6, box 610 represents the Relax NG schema for the regular expression T₁*T₁, where zeroOrMore is the schema language element corresponding to the Kleene star operator. Box 620 in FIG. 6 represents the Relax NG schema for the regular expression T₂T₃.
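Since FIG. 6 is not reproduced here, the following Python sketch shows one plausible shape of such a schema rather than the figure's actual content. It uses the Relax NG XML syntax elements `grammar`, `start`, `define`, `element`, `ref`, `zeroOrMore`, `oneOrMore`, and `text`. The token layout, the use of `text` for terminal nodes, and the function names are assumptions.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"

def content_pattern(parent, tokens):
    """Append Relax NG patterns for a list of (tree variable, quantifier) tokens."""
    for tree_var, quantifier in tokens:
        holder = parent
        if quantifier == "*":
            holder = ET.SubElement(parent, "zeroOrMore")   # Kleene star
        elif quantifier == "+":
            holder = ET.SubElement(parent, "oneOrMore")    # Kleene plus
        ET.SubElement(holder, "ref", name=tree_var)

def build_schema(start, elements):
    """elements maps a tree variable to (element name, content tokens or None)."""
    grammar = ET.Element("grammar", xmlns=RNG_NS)
    ET.SubElement(ET.SubElement(grammar, "start"), "ref", name=start)
    for tree_var, (label, tokens) in elements.items():
        define = ET.SubElement(grammar, "define", name=tree_var)
        element = ET.SubElement(define, "element", name=label)
        if tokens is None:
            ET.SubElement(element, "text")   # terminal node: text content is assumed
        else:
            content_pattern(element, tokens)
    return grammar

schema = build_schema("T0", {
    "T0": ("records", [("T1", "*"), ("T1", "")]),   # regular expression T1*T1
    "T1": ("car", [("T2", ""), ("T3", "")]),        # regular expression T2T3
    "T2": ("record", None),
    "T3": ("country", None),
})
print(ET.tostring(schema, encoding="unicode"))
```

Each tree variable becomes a named definition, and the regular expression for its forest variable becomes the content of the corresponding element.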

As described above, the Kleene star operator denotes zero or more repetitions. If the character before or after a character A marked with the Kleene star operator is identical to A, that is, if A is duplicated, A is repeated at least once, so the expression can be rewritten with the Kleene plus operator. For example, in the regular expression T₁*T₁ described above, the same T₁ follows the T₁ marked with the Kleene star operator, so the expression can be replaced by T₁⁺.

The schema language element corresponding to the Kleene plus operator is oneOrMore, so the Relax NG schema shown in box 610 can be further converted as shown in FIG. 7.

That is, when the character before or after a character marked with the Kleene star operator in the regular expression is identical to the character marked with the Kleene star, the XML schema conversion apparatus may modify the regular expression using the Kleene plus operator and convert the modified regular expression into the Relax NG schema.
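The rewriting rule can be sketched as a single pass over the symbols of the regular expression. In the Python sketch below, the expression is assumed to be a flat list of (symbol, quantifier) tokens. That representation, the function name `simplify`, and the `RELAX_NG_NAME` lookup table are illustrative assumptions, not details given in the text.

```python
# Schema language elements corresponding to the two repetition operators.
RELAX_NG_NAME = {"*": "zeroOrMore", "+": "oneOrMore"}

def simplify(tokens):
    """Merge an X* token with an identical adjacent X token into X+."""
    result = []
    for symbol, quantifier in tokens:
        if result:
            prev_symbol, prev_quantifier = result[-1]
            # One of the two neighbours is starred and the other is plain.
            if prev_symbol == symbol and {prev_quantifier, quantifier} == {"*", ""}:
                result[-1] = (symbol, "+")
                continue
        result.append((symbol, quantifier))
    return result

print(simplify([("T1", "*"), ("T1", "")]))   # [('T1', '+')]
print(simplify([("T2", ""), ("T3", "")]))    # [('T2', ''), ('T3', '')]
```

The check covers both orders, T₁*T₁ and T₁T₁*, because the rule applies whether the duplicate appears before or after the starred symbol.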

The technical content described above may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

As described above, the present invention has been explained with reference to specific matters such as particular components, limited embodiments, and drawings, but these are provided only to aid a more general understanding of the invention. The present invention is not limited to the above embodiments, and those having ordinary knowledge in the field to which the invention pertains can make various modifications and variations from this description. Accordingly, the spirit of the present invention should not be limited to the described embodiments, and not only the claims set out below but also everything equal or equivalent to those claims belongs to the scope of the spirit of the present invention.

Claims

1. An XML schema conversion method comprising:
generating at least one nondeterministic finite automaton using a normalized regular hedge grammar (NRHG) expression;
generating a regular expression for the automaton using a state transition condition of the automaton; and
converting the regular expression into a Relax NG schema using a schema language corresponding to the regular expression.

2. The method of claim 1, wherein the step of generating the automaton
generates the automaton by using a forest variable of the normalized regular hedge grammar expression as a state and using a tree variable for the forest variable as the state transition condition.

3. The method of claim 2, wherein the step of generating a regular expression for the automaton
generates the regular expression using the state transition conditions set from the start state of the automaton until the final state is reached.

4. The method of claim 3, wherein the step of converting the regular expression into a Relax NG schema comprises:
modifying the regular expression using the Kleene plus operator when, in the regular expression, the character before or after a character marked with the Kleene star operator is identical to the character marked with the Kleene star; and
converting the modified regular expression into the Relax NG schema.

5. An XML schema conversion apparatus comprising:
an automata generating unit that generates at least one nondeterministic finite automaton using a normalized regular hedge grammar expression;
a regular expression generating unit that generates a regular expression for the automaton using a state transition condition of the automaton; and
a schema conversion unit that converts the regular expression into a Relax NG schema using a schema language corresponding to the regular expression.

6. The apparatus of claim 5, wherein
the automata generating unit reduces the states of the automaton by removing a state located between a start state and a final state of the automaton, and
the regular expression generating unit generates the regular expression using the state transition condition of the reduced automaton.

7. The apparatus of claim 6, wherein the schema conversion unit,
when the character before or after a character marked with the Kleene star operator in the regular expression is identical to the character marked with the Kleene star, modifies the regular expression using the Kleene plus operator and generates the Relax NG schema.