KR102056709B1

KR102056709B1 - Method and computer program for extracting table record from spread sheet

Info

Publication number: KR102056709B1
Application number: KR1020180046856A
Authority: KR
Inventors: 김선권
Original assignee: 한다시스템 주식회사
Priority date: 2017-04-25
Filing date: 2018-04-23
Publication date: 2019-12-17
Also published as: KR20180119501A

Abstract

A method of automatically extracting table records from the contents of a spreadsheet is disclosed. The extraction method according to the present invention may comprise: detecting rectangular table areas in the spreadsheet, each comprising a plurality of cells containing data, wherein one or more table areas that lie at different positions and do not overlap are detected; determining a table direction for each detected table area; searching the table area along the table direction to determine a column area containing the column names of the table and a record area containing the table data, and analyzing the structure of the column area to calculate a column level; generating mapping information by comparing the column level with the structure of the record area; and extracting the column names and records from the table area using the generated mapping information and generating a table containing the extracted column names and records.

Description

Method and computer program for extracting table records from a spreadsheet

The present invention relates to a method of automatically extracting table records from the contents of a spreadsheet and to a computer program for executing the method.

Conventionally, moving data that was created and collected in a spreadsheet such as Excel into a database required exporting the data to a CSV file and then merging it into the database on the basis of matching fields.

In particular, when a single spreadsheet contains several tables arranged in a complex and irregular manner, even exporting directly to a CSV file without modification is difficult.

Accordingly, there is a need for a technique that automatically extracts table records from a spreadsheet without the cumbersome steps of modifying the spreadsheet data, or exporting it to a CSV file, before merging it into a database.

The present invention aims to provide a table record extraction method that analyzes the contents of a spreadsheet to automatically recognize the table areas to be merged, extracts column names and data records from the recognized table areas, and generates a table from them, together with a convenient and intuitive user interface for this purpose.

A method of extracting table records from a spreadsheet according to an embodiment of the present invention may comprise, as performed by a computer: detecting rectangular table areas in the spreadsheet, each comprising a plurality of cells containing data, wherein one or more table areas that lie at different positions and do not overlap are detected; determining a table direction for each detected table area; searching the table area along the table direction to determine a column area containing the column names of the table and a record area containing the table data, and analyzing the structure of the column area to calculate a column level; generating mapping information by comparing the column level with the structure of the record area; and extracting the column names and records from the table area using the generated mapping information and generating a table containing the extracted column names and records.

Calculating the column level may comprise: initializing a column level value; searching the table area along the table direction and increasing the column level while the column area continues; and storing the current column level value when the column area no longer continues.

Generating the mapping information may comprise: when the record area has a structure corresponding to the column level of the column area, generating mapping information that instructs the column names extracted from the column area and the records extracted from the record area to be mapped one-to-one; and when the column area is a multi-row column with a column level of 2 or more and the corresponding record area has a single-row structure, generating mapping information that instructs the data of the multi-row column to be combined into column names, to which the records extracted from the record area are then mapped.

According to the present invention, a table can be generated by automatically extracting table records simply by selecting the spreadsheet to be merged.

In addition, because the present invention automatically identifies and extracts only the table data from a complex spreadsheet containing various types of data and tables, integration into a database can be carried out conveniently.

FIG. 1 is a flowchart illustrating a method of extracting table records from a spreadsheet according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a method of extracting table areas according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating a method of determining the table direction for a plurality of table areas having different structures and directions according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method of estimating the column level of a column area according to an embodiment of the present invention.
FIGS. 5 and 6 are diagrams illustrating various embodiments of extracting the column level of a column area from a spreadsheet and generating mapping information accordingly.
FIG. 7 is a diagram illustrating the table extracted from the table area of FIG. 6.
FIG. 8 shows an example of a user interface for performing the method of extracting table records from a spreadsheet.

Hereinafter, the present invention is described in detail with reference to preferred embodiments and the accompanying drawings, on the premise that the same reference numerals in the drawings denote the same components.

When the detailed description or the claims state that one component "includes" another, this is not to be construed as meaning that it consists only of that component unless specifically stated otherwise; it should be understood that other components may further be included.

In addition, components named "~means", "~unit", "~module", or "~block" in the detailed description or the claims denote units that process at least one function or operation, each of which may be implemented in software, hardware, or a combination of the two.

The method of extracting table records from a spreadsheet according to the present invention may be implemented in the form of a computer program running on a computing device. That is, the invention may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types.

The invention may operate in a variety of general-purpose or special-purpose computing system environments. Personal computers, server computers, handheld or laptop devices, multiprocessor systems, programmable consumer electronics, network PCs, and the like may be used to implement the invention, but the invention is not limited to these. The described embodiments may also be practiced in distributed computing environments in which tasks are performed by remote processing devices linked through a communications network.

FIG. 1 is a flowchart illustrating a method of extracting table data from a spreadsheet according to an embodiment of the present invention.

In step S10, rectangular table areas, each comprising a plurality of cells containing data, are detected in the spreadsheet.

A table area is a rectangular area of the spreadsheet whose perimeter is enclosed by border lines, and it consists of a column area and a record area. The column area is located at the top or on the left side of the rectangle and is the set of cell(s) in which the column names are written. The record area is located below or to the right of the column area; it is the area in which cells running in one direction hold data of generally the same format for each column. The table areas are rectangular areas that lie at different positions in the spreadsheet and do not overlap. The table area detection module searches the entire spreadsheet and detects one or more areas that satisfy these conditions.

The input spreadsheet may contain not only one or more table areas but also other information such as a document title or graphs. In step S10, only the rectangular table area(s) that can be converted into table data are detected from the contents of the spreadsheet, and a user interface may be provided through which the user confirms the table areas automatically detected by the table area detection means.

FIG. 2 is a diagram illustrating a method of extracting table areas according to an embodiment of the present invention.

Referring to FIG. 2, the rectangular areas B2:B2 (21), F2:F2 (24), B3:C4 (22), E3:F4 (25), C5:C5 (23), and E5:E5 (26), each composed of cell(s) containing data and enclosed by border lines, may be detected in the spreadsheet. These rectangular areas lie at different positions and occupy non-overlapping regions. Of these, the 1×1 areas (21, 24, 23, 26), which cannot contain both a column area and a record area, are excluded from the table areas.
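The exclusion of the 1×1 areas can be illustrated with a minimal sketch. The `Region` type, the `RegionFilter` class, and the 1-based row and column numbers are hypothetical names and conventions introduced only for this illustration; the rule "keep rectangles with more than one cell" is one reading of the requirement that a table area must be able to hold both a column area and a record area.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical value type for a detected rectangle (1-based rows and columns).
public readonly record struct Region(string Name, int Row, int Col, int EndRow, int EndCol)
{
    public int Rows => EndRow - Row + 1;
    public int Cols => EndCol - Col + 1;
}

public static class RegionFilter
{
    // A table needs at least one column-name cell and one data cell,
    // so rectangles holding a single cell are dropped.
    public static List<Region> KeepTableCandidates(IEnumerable<Region> regions) =>
        regions.Where(r => r.Rows * r.Cols > 1).ToList();
}

public static class Program
{
    public static void Main()
    {
        // The six rectangles named for FIG. 2 (B=2, C=3, E=5, F=6).
        var detected = new List<Region>
        {
            new Region("B2:B2", 2, 2, 2, 2),
            new Region("F2:F2", 2, 6, 2, 6),
            new Region("B3:C4", 3, 2, 4, 3),
            new Region("E3:F4", 3, 5, 4, 6),
            new Region("C5:C5", 5, 3, 5, 3),
            new Region("E5:E5", 5, 5, 5, 5),
        };

        foreach (var region in RegionFilter.KeepTableCandidates(detected))
            Console.WriteLine(region.Name);   // prints B3:C4, then E3:F4
    }
}
```

Only areas 22 and 25 survive the filter, which matches the outcome described for FIG. 2.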

According to an embodiment of the present invention, the table area detection module may be implemented with the following algorithms to search the spreadsheet and detect table areas.

First, the following is an example of an algorithm that begins the search and looks for the top-left cell of a rectangular table area.

The following is an example of an algorithm that continues the search to fix the top-right vertex of the rectangular table area.

void UpdateRT() // fix the top-right vertex
{
    if (this.LeftTopR == null) return;

    // Walk right from the top-left cell along the top border.
    var cell = this.LeftTopR;
    Cell found = null;
    while (cell != null)
    {
        if (cell.NoTopLine()) break;   // the top border has ended
        if (cell.HasRightLine())
        {
            found = cell;              // latest cell that also has a right border
        }
        cell = cell.GetRight();
    }

    this.RightTop = found;
}

The following is an example of an algorithm that continues the search to estimate the bottom-right vertex of the rectangular table area.

void UpdateRBCandidate() // estimate the bottom-right vertex
{
    if (this.RightTop == null) return;

    // Walk down from the top-right cell along the right border.
    var cell = this.RightTop.EndCellR;
    Cell found = null;
    while (cell != null)
    {
        if (cell.NoRightLine()) break;   // the right border has ended
        if (cell.HasBottomLine())
        {
            found = cell;
            // Check whether this area collides with a larger area.
            var check = cell.GetRight();
            if (check != null && check.HasBottomLine())
            {
                break;
            }
        }
        cell = cell.GetDown();
    }
    this.RightBottom = found;
}

The following is an example of an algorithm that continues the search to estimate the bottom-left vertex of the rectangular table area.

void UpdateLBCandidate() // estimate the bottom-left vertex
{
    if (this.LeftTopR == null) return;

    // Walk down from the top-left cell along the left border.
    var cell = this.LeftTopR;
    Cell found = null;
    while (cell != null)
    {
        if (cell.NoLeftLine()) break;   // the left border has ended
        if (cell.HasBottomLine())
        {
            found = cell;
            // Stop when the cell to the left also has a bottom border.
            var check = cell.GetLeft();
            if (check != null && check.HasBottomLine())
            {
                break;
            }
        }
        cell = cell.GetDown();
    }
    this.LeftBottom = found;
}

The following is an example of an algorithm that fixes each vertex of the rectangular table area and ends the detection of the table area.

void UpdateLBRBByAuto()
{
    if (this.LeftBottom == null || this.RightBottom == null) return;

    // Align both bottom corners on the upper of the two candidate rows.
    if (this.LeftBottom.EndRow < this.RightBottom.EndRow)
    {
        this.RightBottom = this.SheetR.Cells[this.LeftBottom.Row, this.RightBottom.Column];
    }
    if (this.LeftBottom.EndRow > this.RightBottom.EndRow)
    {
        // The left corner stays in its own column; only its row changes.
        this.LeftBottom = this.SheetR.Cells[this.RightBottom.Row, this.LeftBottom.Column];
    }

    this.UpdateRangePositioin(new RangePosition(LT, RB));
    this.RangeFindType = RangeFindType.Auto;
}

In step S11, the table direction is determined for each table area detected as described above. If the column area is at the top of the table area and the record data for each column is written in the cell(s) below the column name, the table is judged to be a vertical table; if the column area is on the left and the record data for each column is written in the cell(s) to the right of the column name, it is judged to be a horizontal table.

FIG. 3 is a diagram illustrating a method of determining the table direction for a plurality of table areas having different structures and directions according to an embodiment of the present invention.

Referring to FIG. 3, the table area detection means will detect a first table area (31) whose rectangle has the vertices [B4, E4, E10, B10] and a second table area (32) whose rectangle has the vertices [B12, G12, G15, B15].

In the first table area (31), the cells containing the column names, which are character strings, are arranged at the top of the table, and record data of the same type is listed consecutively in the vertical direction from row 5 to row 10, so the area can be judged to be a vertical table. In the second table area (32), the cells containing the column names are arranged on the left side of the table, and data of the same type is listed consecutively in the horizontal direction from column C to column G, so the area can be judged to be a horizontal table.
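One way to turn the "same type of data listed in one direction" criterion into code is sketched below. This is an illustrative heuristic, not the procedure of the patent: the `DirectionGuess` class, the three coarse cell kinds, the uniformity score, and the tie-break toward a vertical table are all assumptions made for the example. The sample grid uses the values of the second table area (32), with the column names rendered in English.

```csharp
using System;
using System.Globalization;
using System.Linq;

public enum TableDirection { Vertical, Horizontal }

public static class DirectionGuess
{
    // Coarse cell kind used for the comparison: "empty", "num", or "text".
    static string Kind(string cell)
    {
        if (string.IsNullOrWhiteSpace(cell)) return "empty";
        return double.TryParse(cell, NumberStyles.Float, CultureInfo.InvariantCulture, out _)
            ? "num" : "text";
    }

    // Share of lines (columns or rows) whose cells after the leading label cell
    // all have the same kind. Assumes a grid with at least one row and one column.
    static double Uniformity(string[,] t, bool alongColumns)
    {
        int rows = t.GetLength(0), cols = t.GetLength(1);
        int lines = alongColumns ? cols : rows;
        int length = alongColumns ? rows : cols;
        int uniform = 0;
        for (int line = 0; line < lines; line++)
        {
            int kinds = Enumerable.Range(1, length - 1)
                .Select(i => Kind(alongColumns ? t[i, line] : t[line, i]))
                .Distinct()
                .Count();
            if (kinds == 1) uniform++;
        }
        return (double)uniform / lines;
    }

    // Ties fall back to Vertical, an arbitrary choice made for this sketch.
    public static TableDirection Guess(string[,] t) =>
        Uniformity(t, alongColumns: false) > Uniformity(t, alongColumns: true)
            ? TableDirection.Horizontal
            : TableDirection.Vertical;
}

public static class Program
{
    public static void Main()
    {
        var area32 = new string[,]
        {
            { "Name",   "AAA", "BBB", "CCC", "DDD", "EEE" },
            { "Height", "120", "130", "134", "132", "125" },
            { "Weight", "30",  "34",  "34",  "32",  "40"  },
            { "Other",  "",    "",    "",    "",    ""    },
        };
        Console.WriteLine(DirectionGuess.Guess(area32));   // prints Horizontal
    }
}
```

Every row of the sample is uniform to the right of its label, while only the label column is uniform from top to bottom, so the horizontal reading wins. A table whose records are all text would defeat this heuristic, which is why a user confirmation step remains useful.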

In step S12, the table area is searched along the table direction determined above to separate the column area from the record area, and the structure of the column area is analyzed to calculate the column level.

The column area is a label area containing the names of one or more columns of the table, and the record area is the area in which the record data for each column is written. The column level of the column area indicates how many rows or columns the column area consists of, and it can be calculated by counting the rows or columns that contain column names.

For a vertical table, the first row of the table area is generally the column area containing the column names, in which case the column level is 1. However, the column names may span two or more rows (a multi-row column), in which case the column level is the number of rows containing column names.

For a horizontal table, the first column of the table area is generally the column area containing the column names, in which case the column level is 1. However, the column names may span two or more columns, the horizontal counterpart of a multi-row column, in which case the column level is the number of columns containing column names.

In the example of FIG. 3, the column area of the first table area (31) consists of the four cells B4 to E4, and its record area consists of the 24 cells B5 to E10. The column area of the second table area (32) consists of the four cells B12 to B15, and its record area consists of the 20 cells C12 to G15.

Because the column area of the first table area (31) consists of one row, it is judged to be a single-row column area with a column level of 1; because the column area of the second table area (32) consists of one column, it is likewise judged to be a single-row column area with a column level of 1.

In step S13, mapping information is generated by comparing the calculated column level with the structure of the record area.

The mapping information is the information used to map the column name of each column to the values of each record. When the record area has a structure corresponding to the column level of the column area, mapping information is generated that instructs the column names extracted from the column area and the records extracted from the record area to be mapped one-to-one.

On the other hand, when the column area is a multi-row column with a column level of 2 or more and the corresponding record area has a single-row structure, mapping information is generated that instructs the data of the multi-row column to be combined into column names, to which the records extracted from the record area are then mapped.
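The two cases of step S13 can be summarized as a small decision function. This is a sketch under stated assumptions: the names `MappingKind` and `MappingRule.Decide` are hypothetical, "a structure corresponding to the column level" is read here as "each record occupies as many lines as the column area", and the `Unsupported` result is added only so that the function is total.

```csharp
using System;

public enum MappingKind { OneToOne, CombineColumnNames, Unsupported }

public static class MappingRule
{
    // columnLevel: number of lines in the column area.
    // linesPerRecord: number of lines one record occupies in the record area.
    public static MappingKind Decide(int columnLevel, int linesPerRecord)
    {
        if (columnLevel < 1 || linesPerRecord < 1)
            throw new ArgumentException("Column levels and record heights start at 1.");

        // Record structure matches the column area: map names and records one-to-one.
        if (linesPerRecord == columnLevel) return MappingKind.OneToOne;

        // Multi-row column over single-row records: combine the stacked names first.
        if (columnLevel >= 2 && linesPerRecord == 1) return MappingKind.CombineColumnNames;

        // Not covered by the two cases described in the text.
        return MappingKind.Unsupported;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(MappingRule.Decide(1, 1));   // prints OneToOne
        Console.WriteLine(MappingRule.Decide(2, 1));   // prints CombineColumnNames
        Console.WriteLine(MappingRule.Decide(2, 2));   // prints OneToOne
    }
}
```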

In the example of FIG. 3, the first table area (31) is vertical, contains a single-row column area with a column level of 1, and has a record area whose structure corresponds to that column level. Accordingly, mapping information is generated that instructs each of the records arranged as rows below the column area (B5 to E5, B6 to E6, B7 to E7, B8 to E8, B9 to E9, B10 to E10) to be mapped to the column set (number, age, gender, school).

The second table area (32) is horizontal, contains a single-row column area with a column level of 1, and has a record area whose structure corresponds to that column level. Accordingly, mapping information is generated that instructs each of the records arranged as columns to the right of the column area (C12 to C15, D12 to D15, E12 to E15, F12 to F15, G12 to G15) to be mapped to the column set (name, height, weight, other).

In step S14, the column names and records are extracted from the table area using the generated mapping information, and a table containing the extracted column names and records is generated.

With the spreadsheet of FIG. 3 as input, two tables will be generated, one for the first table area (31) and one for the second table area (32). For example, the following table may be generated for the second table area (32).

Name | Height | Weight | Other
AAA | 120 | 30 | null
BBB | 130 | 34 | null
CCC | 134 | 34 | null
DDD | 132 | 32 | null
EEE | 125 | 40 | null
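How a horizontal table with a column level of 1 yields such records can be sketched as follows. The `HorizontalTable.ToRecords` helper, the dictionary-per-record representation, and the `Column{n}` fallback name are assumptions made for this illustration; the grid reproduces the values of the second table area (32), with empty cells held as `null` and the column names rendered in English.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class HorizontalTable
{
    // Grid column 0 holds the column names; every later grid column is one record.
    public static List<Dictionary<string, string?>> ToRecords(string?[,] grid)
    {
        int rows = grid.GetLength(0), cols = grid.GetLength(1);
        var records = new List<Dictionary<string, string?>>();
        for (int c = 1; c < cols; c++)
        {
            var record = new Dictionary<string, string?>();
            for (int r = 0; r < rows; r++)
            {
                // Fallback name for a blank label cell (an assumption of this sketch).
                string name = grid[r, 0] ?? $"Column{r + 1}";
                record[name] = grid[r, c];
            }
            records.Add(record);
        }
        return records;
    }
}

public static class Program
{
    public static void Main()
    {
        var area32 = new string?[,]
        {
            { "Name",   "AAA", "BBB", "CCC", "DDD", "EEE" },
            { "Height", "120", "130", "134", "132", "125" },
            { "Weight", "30",  "34",  "34",  "32",  "40"  },
            { "Other",  null,  null,  null,  null,  null  },
        };

        var records = HorizontalTable.ToRecords(area32);
        Console.WriteLine(records.Count);           // prints 5
        Console.WriteLine(records[0]["Height"]);    // prints 120
    }
}
```

Each record is one spreadsheet column read top to bottom, so the result is the transpose of the record area, as in the table above.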

The following is an example of an algorithm for extracting the column names from the table area.

The following is an example of an algorithm for extracting the records from the record area.

void UpdateRecords(DataTable dt)
{
    var info = this.DetectInfo;
    var rp = this.RangePositionR;
    // Number of extra column lines beyond the first one.
    var off = info.ColumnLevel - 1;
    if (info.TableDirections == TableDirections.Downward)
    {
        // Vertical table: each row after the column area is one record.
        for (int i = rp.Row + off + 1; i <= rp.EndRow; i++)
        {
            this.AddRecord(dt, i, info.TableDirections);
        }
    }
    if (info.TableDirections == TableDirections.RightHanded)
    {
        // Horizontal table: each column after the column area is one record.
        for (int i = rp.Col + off + 1; i <= rp.EndCol; i++)
        {
            this.AddRecord(dt, i, info.TableDirections);
        }
    }
}
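The index arithmetic in the listing above, `rp.Row + off + 1` with `off = ColumnLevel - 1`, reduces to "first line of the table area plus the column level". The sketch below isolates that calculation; `RecordLines.Indices` is a hypothetical helper, and the spreadsheet row and column numbers of FIG. 3 are used directly as indices, since the listing does not say whether its indices are 0-based or 1-based.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecordLines
{
    // first / last: first and last row (vertical table) or column (horizontal table)
    // of the table area; columnLevel: number of lines taken by the column area.
    public static IEnumerable<int> Indices(int first, int last, int columnLevel)
    {
        int start = first + columnLevel;   // equals first + (columnLevel - 1) + 1
        return start > last
            ? Enumerable.Empty<int>()
            : Enumerable.Range(start, last - start + 1);
    }
}

public static class Program
{
    public static void Main()
    {
        // First table area (31): rows 4 to 10, column level 1.
        Console.WriteLine(string.Join(",", RecordLines.Indices(4, 10, 1)));   // prints 5,6,7,8,9,10

        // Second table area (32): columns B to G (2 to 7), column level 1.
        Console.WriteLine(string.Join(",", RecordLines.Indices(2, 7, 1)));    // prints 3,4,5,6,7
    }
}
```

The results agree with the record ranges given for FIG. 3: rows 5 to 10 for the first area and columns C to G for the second.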

FIG. 4 is a flowchart illustrating a method of estimating the column level of a column area according to an embodiment of the present invention.

The column level describes the structure of the column area: the column level value is 1 when the column area consists of one row or one column, and n when it consists of n rows or n columns (n ≥ 2).

Referring to FIG. 4, the column level value is first initialized to 1 in step S40.

In step S41, if the table direction is vertical, the process proceeds to step S43 and the table area is searched from top to bottom; if the table direction is horizontal, the process proceeds to step S42 and the table area is searched from left to right.

In step S44, it is determined whether the column area continues in the current search direction. If it does, the level value is increased by 1 (S45) and the search continues. If it does not, the current level value is stored as the column level and the process ends (S46).

Whether the column area continues may be decided according to whether the next adjacent cell holds a cell value to be used as a column name, or whether a data cell adjacent to the column cell is found instead. In the former case the column area is judged to continue; in the latter case it is judged not to continue.

In addition, if the table direction is vertical and the current cell is horizontally merged, the level value may be increased and the search of the column area continued; likewise, if the table direction is horizontal and the current cell is vertically merged, the level value may be increased and the search of the column area continued.
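The loop of FIG. 4 can be sketched as below. The `HeaderCell` record, the `ColumnLevel.Compute` helper, and the use of a caller-supplied predicate for step S44 are assumptions of this illustration. The sample predicate applies only the merged-cell rule for a vertical table (a horizontally merged cell on the previous line means the column area continues), and the header layout is an assumed shape of a two-row column area, not data taken from the figures.

```csharp
using System;
using System.Linq;

// Hypothetical header cell: its text and the number of grid columns it spans.
public record HeaderCell(string Text, int ColSpan = 1);

public static class ColumnLevel
{
    // lineCount: number of lines in the table area along the table direction.
    // columnAreaContinuesAt(line): the S44 test for the given line index.
    public static int Compute(int lineCount, Func<int, bool> columnAreaContinuesAt)
    {
        int level = 1;                                            // S40: initialize
        for (int line = 1; line < lineCount && columnAreaContinuesAt(line); line++)
            level++;                                              // S45: column area continues
        return level;                                             // S46: store and finish
    }
}

public static class Program
{
    public static void Main()
    {
        var rows = new[]
        {
            // Line 0: the last column name is merged across five grid columns.
            new[] { new HeaderCell("日"), new HeaderCell("曜日"), new HeaderCell("摘要", 5) },
            // Line 1: sub-names under the merged cell.
            new[] { new HeaderCell("年休"), new HeaderCell("缺勤"), new HeaderCell("遲刻"),
                    new HeaderCell("早退"), new HeaderCell("出張") },
            // Line 2: first data line.
            new[] { new HeaderCell("21"), new HeaderCell("月") },
        };

        int level = ColumnLevel.Compute(
            rows.Length,
            line => rows[line - 1].Any(cell => cell.ColSpan > 1));

        Console.WriteLine(level);   // prints 2
    }
}
```

Passing the continuation test as a delegate keeps the counting loop separate from the harder question of how a column-name cell is told apart from a data cell.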

도 5 및 도 6은 스프레드 시트에서 컬럼 영역의 컬럼 레벨을 추출하고 그에 따라 매핑 정보를 생성하는 다양한 실시예를 설명하기 위한 도면이다.5 and 6 are diagrams for describing various embodiments of extracting a column level of a column region from a spreadsheet and generating mapping information accordingly.

도 5를 참조하면, 스프레드 시트에서 검출되는 테이블 영역(51, 52, 53)은 3개이다. 각 테이블 영역의 탐색을 통해, 제1 테이블 영역(51)은 단일행 컬럼 영역을 가지는 수직형 테이블이고, 제2 테이블 영역(52)은 단일행 컬럼 영역을 가지는 수평형 테이블이며, 제3 테이블 영역(52)은 2행으로 구성되는 다중행 컬럼 영역을 가지는 수직형 테이블로서 레코드 영역은 다중행 컬럼 구조와 일치하지 않은 단일행 레코드를 가지고 있음이 검출될 것이다.Referring to FIG. 5, three table areas 51, 52, and 53 are detected in the spreadsheet. Through exploration of each table area, the first table area 51 is a vertical table having a single row column area, the second table area 52 is a horizontal table having a single row column area, and the third table area. Reference numeral 52 is a vertical table having a multi-row column area composed of two rows, and it will be detected that the record area has a single row record that does not match the multi-row column structure.

제1 및 제2 테이블 영역(51, 52)은 레코드 영역이 컬럼 영역의 컬럼 레벨에 대응하는 구조를 가지고 있으므로, 컬럼 영역에서 추출한 컬럼명과 레코드 영역에서 추출한 레코드를 일대일로 매핑하도록 지시하는 매핑 정보가 생성될 것이다. Since the record area has a structure corresponding to the column level of the column area, the first and second table areas 51 and 52 have mapping information indicating that the column names extracted from the column area and the records extracted from the record area are mapped one-to-one. Will be generated.

제1 테이블 영역(51)에서 컬럼명 勞務部에 대응하여 레코드 영역의 데이터 셀은 3개이므로 3개의 데이터를 결합하여 대응하는 레코드 데이터를 생성할 수 있다. 즉, (勞務部, 所屬課長) 컬럼들에 대응하는 제1 레코드로 (null/null/null, null)이 생성될 것이다.Since there are three data cells in the record area corresponding to the column names in the first table area 51, three data can be combined to generate corresponding record data. That is, (null / null / null, null) will be generated as the first record corresponding to the (勞務部, 所屬課長) columns.

한편, 제3 테이블 영역(53)은 컬럼 영역의 컬럼 레벨이 2인 다중행 컬럼인데대응하는 레코드 영역이 단일행 구조인 바, 다중행 컬럼의 데이터를 결합하여 컬럼명을 생성한 후 레코드 영역에서 추출한 레코드를 매핑하도록 지시하는 매핑 정보가 생성될 것이다. Meanwhile, the third table area 53 is a multi-row column having a column level of 2 in the column area. The corresponding record area has a single row structure. The column area is generated by combining data of the multi-row column to generate a column name. Mapping information will be generated to instruct to map the extracted records.

Specifically, the table extracted and generated from the third table area (53) will look as follows. The column names for the multi-row portion of the column area can be generated by combining the data of the multi-row columns, for example 摘要/年休, 摘要/缺勤, 摘要/遲刻, 摘要/早退, and 摘要/出張.

日 | 曜日 | 出社時刻 | 退社時刻 | 實勤務時間 | 摘要/年休 | 摘要/缺勤 | 摘要/遲刻 | 摘要/早退 | 摘要/出張
21 | 月 | 時分 | 時分 | 時分 | 0 | 0 | 0 | null | null
22 | 火 | 時分 | 時分 | 時分 | 0 | 0 | 0 | null | null
23 | 水 | 時分 | 時分 | 時分 | 1 | null | 0 | null | 0
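Combined column names such as 摘要/年休 can be derived from a two-row column area roughly as in the following Python sketch. The function name `combine_header_rows` is hypothetical, and the input layout is an assumption: the upper header row already repeats a merged parent name for every column it spans, and the lower row holds `None` where a column name occupies both rows. The actual cell layout of FIG. 5 is not reproduced here.

```python
from typing import List, Optional


def combine_header_rows(
    top: List[str], bottom: List[Optional[str]], sep: str = "/"
) -> List[str]:
    """Combine a two-row column area into one column name per column.

    A column with no lower-row entry keeps its upper-row name unchanged.
    """
    if len(top) != len(bottom):
        raise ValueError("header rows must have the same number of columns")
    names = []
    for parent, child in zip(top, bottom):
        names.append(parent if child is None else f"{parent}{sep}{child}")
    return names


# Assumed header layout for the third table area (53).
top = ["日", "曜日", "出社時刻", "退社時刻", "實勤務時間"] + ["摘要"] * 5
bottom = [None] * 5 + ["年休", "缺勤", "遲刻", "早退", "出張"]

for name in combine_header_rows(top, bottom):
    print(name)
```

Each single-row record can then be mapped position by position onto the resulting list of names.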

FIG. 6 shows an example of data contained in a spreadsheet, in which the detected table area includes a multi-row record area (62) whose structure matches the multi-row column area (61) consisting of two rows. Because the structures of the column area and the record area match, mapping information will be generated that instructs a one-to-one mapping between the column names extracted from the column area (61) and the records extracted from the record area (62).

Using the mapping information obtained by analyzing the table area of FIG. 6, a table such as that of FIG. 7 can be generated.
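The two mapping cases described for FIGS. 5 and 6 can be summarized in a small Python decision sketch. Everything named here is hypothetical: the `MappingInfo` class, the mode strings, and in particular the simplification that "the record area has a structure corresponding to the column level" means the number of rows per record equals the column level. The "unsupported" branch is only a placeholder of this sketch for cases the description does not discuss.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingInfo:
    mode: str             # "one_to_one" or "combine_column_names"
    column_level: int     # rows (vertical table) or columns (horizontal) of names
    rows_per_record: int  # cells one record occupies along the table direction


def make_mapping_info(column_level: int, rows_per_record: int) -> MappingInfo:
    """Choose a mapping mode by comparing column level and record structure."""
    if column_level < 1 or rows_per_record < 1:
        raise ValueError("levels must be positive")
    if rows_per_record == column_level:
        # Record structure matches the column area (e.g. areas 51, 52, FIG. 6).
        mode = "one_to_one"
    elif column_level >= 2 and rows_per_record == 1:
        # Multi-row column area over single-row records (e.g. area 53).
        mode = "combine_column_names"
    else:
        mode = "unsupported"  # placeholder, not described in the source
    return MappingInfo(mode, column_level, rows_per_record)


print(make_mapping_info(2, 2).mode)  # one_to_one
print(make_mapping_info(2, 1).mode)  # combine_column_names
```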

FIG. 8 shows an example of a user interface for performing the method of extracting table records from a spreadsheet.

Referring to FIG. 8, the user interface may include an area in which the user selects the spreadsheet file to work on, an area that shows the selected spreadsheet together with the spreadsheet search process and its results, and an area that shows the table generated by extracting table records from the spreadsheet.

In this example, the table area detection module detected three table areas in the spreadsheet selected in FIG. 8, but only one table area (70) was selected by the user as the target of table record extraction. As illustrated, the table (71) generated by analyzing the table area (70) includes some column names generated by combining data written in multiple rows of the column area.

Meanwhile, the method of extracting table records from a spreadsheet according to the present invention may be implemented as software, that is, a set of computer-readable instructions, and recorded on a recording medium.

Here, the recording medium may include any kind of computer-readable medium; examples include tangible media such as DVD-ROM, CD-ROM, hard disk, USB memory, and flash memory.

The expression "recorded on a recording medium" covers not only recording on a tangible recording medium of this kind but also provision over a communication line in the form of an intangible carrier wave.

The technical idea of the present invention has been described above through several embodiments.

It is apparent that a person of ordinary skill in the art to which the present invention pertains can modify or change the embodiments described above in various ways based on the description of the present invention. Furthermore, even where not explicitly illustrated or described, it is apparent that such a person can make various modifications embodying the technical idea of the present invention based on its description, and these modifications still fall within the scope of the present invention. The embodiments described above with reference to the accompanying drawings are presented for the purpose of explaining the present invention, and the scope of the present invention is not limited to these embodiments.

Claims

A method of extracting table records from a spreadsheet in order to integrate data created or collected using the spreadsheet into a database, the method comprising, by a computer:
detecting, in the spreadsheet, rectangular table areas each containing a plurality of cells in which data is written, the detecting finding one or more table areas that are located at different positions and do not overlap, wherein each table area is a rectangular area enclosed by lines within the spreadsheet and includes a column area and a record area;
determining a table direction for the detected table area, wherein the table is defined as vertical if the column area is at the top of the table area and the record data corresponding to each column is written in the cells below the column name, and as horizontal if the column area is on the left side and the record data corresponding to each column is written in the cells to the right of the column name;
searching the table area in the table direction to determine the column area containing the column names of the table and the record area containing the table data, and calculating a column level by analyzing the structure of the column area, wherein the column level of a vertical table is determined by the number of rows containing column names and the column level of a horizontal table is determined by the number of columns containing column names;
generating mapping information by comparing the column level with the structure of the record area, wherein the mapping information is information for mapping the column name of each column to the value of each record; and
extracting column names and records from the table area using the generated mapping information and generating a table including the extracted column names and records,
wherein, when the record area has a structure corresponding to the column level of the column area, the mapping information is generated so as to instruct a one-to-one mapping between the column names extracted from the column area and the records extracted from the record area,
wherein, when the column level of the column area is 2 or more, indicating multi-row columns, the mapping information is generated so as to instruct generating column names by combining the data of the multi-row columns and then mapping the combined column names to the records extracted from the corresponding record area, and
wherein the data is integrated into the database in vertical table form or horizontal table form in accordance with the mapping information.

The method of claim 1,
wherein calculating the column level comprises:
initializing a column level value;
searching the table area in the table direction and incrementing the column level while the column area continues; and
storing the current column level value when the column area no longer continues.

(Deleted)

A computer program recorded on a recording medium for executing the method of extracting table records from a spreadsheet according to claim 1.