KR20170042145A - Apparatus and method for generating application and method for zing the same - Google Patents

Apparatus and method for generating application and method for zing the same

Info

Publication number
KR20170042145A
Authority
KR
South Korea
Prior art keywords
document
unit
distribution
pattern
sentence
Prior art date
Application number
KR1020150141726A
Other languages
Korean (ko)
Other versions
KR101740926B1 (en)
Inventor
안정우
이종규
최보람
Original Assignee
주식회사 한글과컴퓨터
Priority date
2015-10-08
Filing date
2015-10-08
Publication date
2017-04-18
Application filed by 주식회사 한글과컴퓨터
Priority to KR1020150141726A
Publication of KR20170042145A
Application granted
Publication of KR101740926B1

Classifications

    • G06F17/21
    • G06F17/211
    • G06F17/2705

Abstract

The present invention relates to a document format-based document reduction apparatus and a document summarizing method using the same, which summarize a document using its objects, format information, and paragraph information so that a user can easily understand the content and flow of the document. The summarizing method includes the steps of: loading, by a document loading unit, a document to be summarized; analyzing, by a document pattern/format distribution analysis unit, the number and size of paragraphs in the loaded document, the frequency and distribution of formatting used in its sentences, and the distribution and number of objects used in the document; determining, by an analysis type determination unit, the type of the analyzed document; determining, by a format/paragraph classification unit, a summarization method according to the determined analysis type; selecting and marking, by an important phrase selection unit, as important phrases, the sentences, outlines, and objects designated as selection targets by the document pattern/format distribution analysis unit; and generating, by a summary document generation unit, a summary document from the selected and marked sentences, outlines, and objects.

Description

Technical Field

The present invention relates to document reduction based on document format and, more particularly, to a document format-based document reduction apparatus, and a document summarizing method using the same, that summarize a document using its objects, format information, and paragraph information so that the content and flow of the document can be easily understood.

For modern people, the web is one of the fastest and most convenient ways to obtain necessary information. As terminals such as PCs, smartphones, and tablet PCs become widespread, the use of such services is increasing.

However, existing terminals do not provide a summary of an original document; they either provide the original document to the user as-is or separately store the main parts that the user has marked in the original document and provide those to the user.

Therefore, when a document retrieved on a terminal runs to hundreds of pages, the user must read all of those pages to obtain the desired information from it.

In addition, the document the user has read may not contain the desired information at all. There is therefore a need for an alternative that lets the user grasp the contents of a document more quickly by providing a summary document that summarizes the original.

Korean Patent No. 10-0435442 (June 10, 2004 Announcement) - Document summarization method and system

SUMMARY OF THE INVENTION The present invention has been made to solve the above problems of the prior art, and it is an object of the present invention to provide a document format-based document reduction apparatus capable of easily summarizing a document using its objects, format information, and paragraph information, and a document summarizing method using the same.

According to an aspect of the present invention, there is provided a document format-based document reduction apparatus including: a document loading unit that loads a document to be summarized; a document pattern/format distribution analysis unit that analyzes the pattern and format distribution of the loaded document by analyzing the number and size of its paragraphs, analyzing the frequency and distribution of formatting used in its sentences, and identifying the distribution and number of objects used in the document; an analysis type determination unit that determines the type of the document analyzed by the document pattern/format distribution analysis unit; a format/paragraph classification unit that classifies formats and paragraphs according to the analysis type determined by the analysis type determination unit; an important phrase selection unit that selects and marks, as important phrases, the sentences, outlines, and objects of the document designated as selection targets by the document pattern/format distribution analysis unit; and a summary document generation unit that generates a summary document using the sentences, outlines, and objects selected and marked by the important phrase selection unit.

Here, the analysis of the format distribution of the document analyzes the frequency and distribution of formatting used in the sentences and designates a sentence as a selection target according to the frequency of use of the formatted words and the format information.

The format information is at least one of bold, italic, color, and underline applied to a word, and the objects used in the document whose distribution and number are identified are tables, pictures, and charts.

Meanwhile, the document type determined by the analysis type determination unit is such that the sentences, outlines, and objects of the document can be handled in .hwp and doc files as well as markup-based documents including OWPML, OOXML, HTML, and PDF.

According to another aspect of the present invention, there is provided a document summarizing method using the document format-based document reduction apparatus, the method including: loading, by a document loading unit, a document to be summarized; analyzing, by a document pattern/format distribution analysis unit, the document pattern and format distribution of the loaded document by analyzing the number and size of its paragraphs, analyzing the frequency and distribution of formatting used in its sentences, and identifying the distribution and number of objects used in the document; determining, by an analysis type determination unit, the type of the document analyzed by the document pattern/format distribution analysis unit; determining, by a format/paragraph classification unit, a summarization method according to the analysis type determined by the analysis type determination unit; selecting and marking, by an important phrase selection unit, as important phrases, the sentences, outlines, and objects of the document designated as selection targets by the document pattern/format distribution analysis unit; and generating, by a summary document generation unit, a summary document using the sentences, outlines, and objects selected and marked by the important phrase selection unit.

Here, the step of analyzing the pattern and format distribution of the loaded document may include: analyzing the paragraph structure; analyzing the formatting in the loaded document; and identifying the distribution and number of the objects used in the document.

In the step of analyzing the paragraph structure, the number and size of the paragraphs are analyzed, and each paragraph is designated to be reduced, in units of sentences, to a predetermined length according to the number and size of the analyzed paragraphs. If outlines are used more than a preset amount and the content is extensive, the sub-outlines are reduced; if outlines are used less than the preset amount, selection centers on the outlines, mainly taking the first paragraph.

Meanwhile, in the step in which the analysis type determination unit determines the type of the document analyzed by the document pattern/format distribution analysis unit, the sentences, outlines, and objects of the document designated as selection targets by the document pattern/format distribution analysis unit are assigned to the corresponding type so that the method can be applied to .hwp and doc files as well as markup-based documents such as OWPML, OOXML, HTML, and PDF.

In addition, the step of determining the summarization method in the format/paragraph classification unit classifies the formats and paragraphs in a manner suited to the document type determined by the analysis type determination unit.

Here, in the step of selecting and marking important phrases in the important phrase selection unit, the unmarked areas are set to be hidden.

The present invention has the following effects.

First, long texts can be summarized and organized using only document structure and format, so users can quickly recognize and process large amounts of information.

Second, summarizing a document using its objects, format information, and paragraph information lets the user easily understand the content and flow of the document.

Third, the invention is applicable to markup-based documents such as OWPML, OOXML, HTML, and PDF.

FIG. 1 is a block diagram illustrating a document format-based document reduction apparatus according to the present invention.
FIG. 2 is a diagram illustrating an example of a summary document produced using a document format-based document reduction apparatus according to the present invention.
FIG. 3 is a flowchart illustrating a document summarizing method using a document format-based document reduction apparatus according to the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Although the terms used in the present invention have been selected, as far as possible, from general terms in wide current use, some terms have been chosen arbitrarily by the applicant. Because the meaning of such terms is described in detail in the relevant part of the description, the present invention should be understood through the meaning of each term rather than its mere name. Further, in describing the embodiments, technical content that is well known in the technical field to which the present invention belongs and that is not directly related to the present invention is omitted. This is to convey the gist of the present invention more clearly by omitting unnecessary explanation.

FIG. 1 is a block diagram illustrating a document format-based document reduction apparatus according to the present invention, and FIG. 2 is a diagram illustrating an example of a summary document using a document format-based document reduction apparatus according to the present invention.

As shown in FIG. 1, the document format-based document reduction apparatus according to the present invention includes a document loading unit 10, a document pattern/format distribution analysis unit 20, an analysis type determination unit 30, a format/paragraph classification unit 40, an important phrase selection unit 50, and a summary document generation unit 60.

The document loading unit 10 loads the document to be summarized.

The document pattern/format distribution analysis unit 20 analyzes the pattern and format distribution of the loaded document. Here, the pattern of a document is its paragraph structure: the unit analyzes the number and size of the paragraphs and designates each paragraph to be reduced, in units of sentences, to a predetermined length according to the number and size of the analyzed paragraphs. If outlines are used heavily and there is a lot of content, the unit designates the sub-outlines to be reduced; if outlines are used sparingly, it designates the selection targets around the outlines, mainly the first paragraph. For reference, how far paragraphs are reduced in units of sentences can be determined by field experience or experimentation.
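The paragraph-structure analysis described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `ParagraphStats`, `analyze_paragraphs`, `sentences_to_keep`, and `reduce_paragraph` are hypothetical, paragraph size is measured in characters, and the threshold of 20 paragraphs is an arbitrary placeholder, since the description leaves the actual values to field experience or experimentation.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class ParagraphStats:
    count: int        # number of paragraphs in the document
    sizes: List[int]  # size of each paragraph, in characters (assumed unit)


def analyze_paragraphs(paragraphs: List[str]) -> ParagraphStats:
    """Collect the number and size of the paragraphs."""
    return ParagraphStats(count=len(paragraphs), sizes=[len(p) for p in paragraphs])


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def sentences_to_keep(stats: ParagraphStats, many_paragraphs: int = 20) -> int:
    """Hypothetical rule: keep fewer sentences per paragraph in longer documents."""
    return 1 if stats.count >= many_paragraphs else 2


def reduce_paragraph(paragraph: str, keep: int) -> str:
    """Reduce a paragraph to its first `keep` sentences."""
    return " ".join(split_sentences(paragraph)[:keep])


paragraphs = ["A one. A two. A three.", "B one."]
stats = analyze_paragraphs(paragraphs)
keep = sentences_to_keep(stats)
reduced = [reduce_paragraph(p, keep) for p in paragraphs]
```

Keeping the leading sentences is just one possible reduction policy; the description does not say which sentences within a paragraph are retained.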

Meanwhile, the document pattern/format distribution analysis unit 20 analyzes the formatting of the document, that is, the frequency and distribution of formatting used in the sentences. If formatting is used frequently, a sentence is designated as a selection target according to the frequency of use of the formatted words and the format information. Here, the format information includes bold, italic, color, and underline.
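As a rough sketch of this formatting analysis, the example below counts how many words carry each kind of formatting and designates sentences that contain enough formatted words. The `Run` type, the function names, and the `min_formatted_words` threshold are all assumptions made for illustration; the description does not specify a data model or a scoring rule.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

# The four kinds of format information named in the description.
FORMATS = frozenset({"bold", "italic", "color", "underline"})


@dataclass
class Run:
    text: str                          # a stretch of text with uniform formatting
    formats: frozenset = frozenset()   # e.g. frozenset({"bold"})


def format_frequency(sentences: List[List[Run]]) -> Counter:
    """Count formatted words per format kind across all sentences."""
    counts = Counter()
    for sentence in sentences:
        for run in sentence:
            n_words = len(run.text.split())
            for fmt in run.formats & FORMATS:
                counts[fmt] += n_words
    return counts


def formatted_word_count(sentence: List[Run]) -> int:
    """Number of words in a sentence that carry at least one tracked format."""
    return sum(len(r.text.split()) for r in sentence if r.formats & FORMATS)


def designate_sentences(sentences: List[List[Run]], min_formatted_words: int = 1) -> List[int]:
    """Return indexes of sentences designated as selection targets (hypothetical rule)."""
    return [i for i, s in enumerate(sentences)
            if formatted_word_count(s) >= min_formatted_words]


sentences = [
    [Run("Plain text here.")],
    [Run("This is "), Run("very important", frozenset({"bold"})), Run(".")],
]
frequency = format_frequency(sentences)
designated = designate_sentences(sentences)
```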

The document pattern/format distribution analysis unit 20 also identifies the distribution and number of the objects (tables, figures, and charts) used in the document. In the case of a document composed mainly of objects, the objects are designated as selection targets according to the format or paragraph structure.
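Counting objects and recording where they occur might look like the sketch below. It assumes the document has already been flattened into an ordered list of block kinds; the function names and the 0.5 ratio used to call a document object-heavy are hypothetical and not taken from the description.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Object kinds named in the description.
OBJECT_KINDS = ("table", "figure", "chart")


def object_distribution(blocks: List[str]) -> Tuple[Counter, Dict[str, List[int]]]:
    """Return per-kind counts and the block positions at which each kind occurs."""
    counts = Counter()
    positions = defaultdict(list)
    for index, kind in enumerate(blocks):
        if kind in OBJECT_KINDS:
            counts[kind] += 1
            positions[kind].append(index)
    return counts, dict(positions)


def is_object_heavy(blocks: List[str], ratio: float = 0.5) -> bool:
    """Hypothetical test for a document composed mainly of objects."""
    counts, _ = object_distribution(blocks)
    return bool(blocks) and sum(counts.values()) / len(blocks) >= ratio


blocks = ["paragraph", "table", "paragraph", "figure", "table"]
counts, positions = object_distribution(blocks)
```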

The analysis type determination unit 30 determines the type of the document analyzed by the document pattern/format distribution analysis unit 20. In this determination, the sentences, outlines, and objects of the document designated as selection targets by the document pattern/format distribution analysis unit 20 are assigned to the corresponding type so that the apparatus can be applied to .hwp and doc files as well as markup-based documents such as OWPML, OOXML, HTML, and PDF.
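The description does not say how the document type is detected. One simple possibility, shown here purely as an assumption, is to map the file extension to one of the listed types. The mapping table and the `determine_type` function are hypothetical, including the choice to treat `.hwpx` as OWPML and `.docx` as OOXML.

```python
from pathlib import Path

# Hypothetical extension-to-type table; not specified in the description.
EXTENSION_TO_TYPE = {
    ".hwp": "hwp",
    ".doc": "doc",
    ".hwpx": "OWPML",
    ".docx": "OOXML",
    ".html": "HTML",
    ".htm": "HTML",
    ".pdf": "PDF",
}


def determine_type(filename: str) -> str:
    """Return the document type for a file name, or raise ValueError if unsupported."""
    suffix = Path(filename).suffix.lower()
    if suffix not in EXTENSION_TO_TYPE:
        raise ValueError(f"unsupported document type: {suffix or '(no extension)'}")
    return EXTENSION_TO_TYPE[suffix]


doc_type = determine_type("report.HWP")
```

A production system would more likely inspect the file contents rather than trust the extension; the extension lookup is used here only to keep the sketch short.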

The format/paragraph classification unit 40 classifies formats and paragraphs according to the analysis type determined by the analysis type determination unit 30.

The important phrase selection unit 50 selects and marks, as important phrases, the sentences, outlines, and objects of the document designated as selection targets by the document pattern/format distribution analysis unit 20.

The summary document generation unit 60 generates a summary document using the sentences, outlines, and objects of the document selected and marked by the important phrase selection unit 50.

Meanwhile, an example of a document summarized using the document format-based document reduction apparatus of the present invention can be configured as shown in FIG. 2.

For example, the title of the document is displayed at the top of the summary document, followed by a summary of the document's important sentences and its figures or tables.
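The layout just described (title first, then the important sentences and the figures or tables) could be rendered as plain text along these lines. The `render_summary` function and its bullet and bracket conventions are illustrative assumptions; the actual appearance is whatever FIG. 2 shows.

```python
from typing import List, Tuple


def render_summary(title: str, sentences: List[str], objects: List[Tuple[str, str]]) -> str:
    """Render a plain-text summary: title, then important sentences, then objects.

    `objects` is a list of (kind, caption) pairs, e.g. ("table", "Sales by year").
    """
    lines = [title, ""]
    lines.extend(f"- {sentence}" for sentence in sentences)
    lines.extend(f"[{kind}] {caption}" for kind, caption in objects)
    return "\n".join(lines)


summary = render_summary(
    "Annual Report",
    ["Revenue grew.", "Costs fell."],
    [("table", "Sales by year")],
)
```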

FIG. 3 is a flowchart illustrating a document summarizing method using a document format-based document reduction apparatus according to the present invention.

As shown in FIG. 3, the document loading unit 10 loads a document to be summarized (S100).

Next, the document pattern/format distribution analysis unit 20 first analyzes the paragraph structure of the loaded document as part of analyzing its pattern and format distribution (S110). Here, the number and size of the paragraphs are analyzed, and each paragraph is designated to be reduced, in units of sentences, to a predetermined length according to the number and size of the analyzed paragraphs. If outlines are used heavily and there is a lot of content, the sub-outlines are reduced; if outlines are used sparingly, selection centers on the outlines, mainly taking the first paragraph.
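One way to read the outline rule in step S110 is sketched below: when outlines are used heavily, sub-outlines are dropped and only top-level headings are kept; otherwise each heading is kept along with the first paragraph under it. The `Section` type, the `select_by_outline` function, and the threshold of 10 outlines are hypothetical, and the sketch simplifies the source condition by looking only at the number of outlines, not at the amount of content.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    heading: str
    level: int                                  # 1 = top-level outline, 2+ = sub-outline
    paragraphs: List[str] = field(default_factory=list)


def select_by_outline(sections: List[Section], heavy_outline_count: int = 10) -> List[str]:
    """Choose selection targets from the outline structure (hypothetical rule)."""
    selected: List[str] = []
    if len(sections) >= heavy_outline_count:
        # Heavy outline use: reduce sub-outlines, keeping top-level headings only.
        for section in sections:
            if section.level == 1:
                selected.append(section.heading)
    else:
        # Sparse outline use: keep each heading and the first paragraph under it.
        for section in sections:
            selected.append(section.heading)
            if section.paragraphs:
                selected.append(section.paragraphs[0])
    return selected


sections = [
    Section("Intro", 1, ["p1", "p2"]),
    Section("Detail", 2, ["p3"]),
]
sparse_selection = select_by_outline(sections)
heavy_selection = select_by_outline(sections, heavy_outline_count=2)
```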

Then, the document pattern/format distribution analysis unit 20 analyzes the formatting of the loaded document (S120). This analysis examines the frequency and distribution of formatting used in the sentences. If formatting is used heavily, a sentence is designated as a selection target according to the frequency of use of the formatted words and the format information. Here, the format information includes bold, italic, color, and underline.

In addition, the document pattern/format distribution analysis unit 20 identifies the distribution and number of the objects (tables, figures, and charts) used in the document (S130). In the case of a document composed mainly of objects, the objects are designated as selection targets according to the format or paragraph structure.

The analysis type determination unit 30 then determines the type of the document analyzed by the document pattern/format distribution analysis unit 20 (S140). In this determination, the sentences, outlines, and objects of the document designated as selection targets by the document pattern/format distribution analysis unit 20 are assigned to the corresponding type so that the method can be applied to .hwp and doc files as well as markup-based documents such as OWPML, OOXML, HTML, and PDF.

Then, the format/paragraph classification unit 40 determines a summarization method according to the analysis type determined by the analysis type determination unit 30 (S150). Determining the summarization method means classifying formats and paragraphs in a manner suited to the document type determined by the analysis type determination unit 30.

Then, the important phrase selection unit 50 selects and marks, as important phrases, the sentences, outlines, and objects of the document designated as selection targets by the document pattern/format distribution analysis unit 20 (S160). At this time, the unmarked areas are set to be hidden.

The summary document generation unit 60 generates a summary document using the sentences, outlines, and objects of the document selected and marked by the important phrase selection unit 50 (S170). At this time, the unmarked areas are hidden, which reduces the document.
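The marking and generation steps (S160 and S170) can be illustrated with a minimal sketch in which every block carries a designation flag from the earlier analysis steps, marking copies that flag, and unmarked blocks are hidden rather than deleted. The `Block` type and both function names are assumptions made for this example, not the patent's own data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    kind: str                  # "outline", "sentence", "table", "figure", or "chart"
    text: str
    designated: bool = False   # set by the analysis steps (S110 to S130)
    marked: bool = False       # set when selected as an important phrase (S160)
    hidden: bool = False       # set for unmarked blocks (S160)


def mark_important(blocks: List[Block]) -> List[Block]:
    """S160: mark designated blocks and set every unmarked block to hidden."""
    for block in blocks:
        block.marked = block.designated
        block.hidden = not block.marked
    return blocks


def generate_summary(blocks: List[Block]) -> List[Block]:
    """S170: the summary document consists of the blocks that are not hidden."""
    return [block for block in blocks if not block.hidden]


document = [
    Block("outline", "1. Overview", designated=True),
    Block("sentence", "Background details."),
    Block("sentence", "Key finding.", designated=True),
    Block("table", "Results table", designated=True),
]
summary_blocks = generate_summary(mark_important(document))
summary_texts = [block.text for block in summary_blocks]
```

Hiding instead of deleting keeps the original document intact, so the unmarked areas could in principle be shown again; the description only states that they are hidden.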

While the present invention has been shown and described with reference to exemplary embodiments thereof, various changes and modifications will be apparent to those skilled in the art, and the invention is not limited to the embodiments described above. Accordingly, the scope of protection of the present invention should be construed according to the following claims, and all technical ideas within a scope equivalent thereto, including alterations and substitutions, should be construed as falling within the scope of the present invention. In addition, some elements in the drawings are intended to explain the configuration more clearly and are shown exaggerated or reduced relative to their actual size.

10: document loading unit
20: document pattern/format distribution analysis unit
30: analysis type determination unit
40: format/paragraph classification unit
50: important phrase selection unit
60: summary document generation unit

Claims (10)

1. A document format-based document reduction apparatus comprising:
a document loading unit that loads a document to be summarized;
a document pattern/format distribution analysis unit that analyzes the pattern and format distribution of the loaded document by analyzing the number and size of its paragraphs, analyzing the frequency and distribution of formatting used in its sentences, and identifying the distribution and number of objects used in the document;
an analysis type determination unit that determines the type of the document analyzed by the document pattern/format distribution analysis unit;
a format/paragraph classification unit that classifies formats and paragraphs according to the analysis type determined by the analysis type determination unit;
an important phrase selection unit that selects and marks, as important phrases, the sentences, outlines, and objects of the document designated as selection targets by the document pattern/format distribution analysis unit; and
a summary document generation unit that generates a summary document using the sentences, outlines, and objects of the document selected and marked by the important phrase selection unit.
2. The apparatus according to claim 1,
wherein the analysis of the format distribution of the document analyzes the frequency and distribution of formatting used in the sentences and designates a sentence as a selection target according to the frequency of use of the formatted words and the format information.
3. The apparatus according to claim 2,
wherein the format information is at least one of bold, italic, color, and underline applied to a word, and
wherein the objects used in the document whose distribution and number are identified are tables, figures, and charts.
4. The apparatus according to claim 1,
wherein the document type determined by the analysis type determination unit is such that the sentences, outlines, and objects of the document can be handled in .hwp and doc files as well as markup-based documents including OWPML, OOXML, HTML, and PDF.
5. A document summarizing method using a document format-based document reduction apparatus, the method comprising:
loading, by a document loading unit, a document to be summarized;
analyzing, by a document pattern/format distribution analysis unit, the document pattern and format distribution of the loaded document by analyzing the number and size of its paragraphs, analyzing the frequency and distribution of formatting used in its sentences, and identifying the distribution and number of objects used in the document;
determining, by an analysis type determination unit, the type of the document analyzed by the document pattern/format distribution analysis unit;
determining, by a format/paragraph classification unit, a summarization method according to the analysis type determined by the analysis type determination unit;
selecting and marking, by an important phrase selection unit, as important phrases, the sentences, outlines, and objects of the document designated as selection targets by the document pattern/format distribution analysis unit; and
generating, by a summary document generation unit, a summary document using the sentences, outlines, and objects of the document selected and marked by the important phrase selection unit.
6. The method according to claim 5,
wherein the step of analyzing the pattern and format distribution of the loaded document by the document pattern/format distribution analysis unit comprises: analyzing the paragraph structure; analyzing the formatting in the loaded document; and identifying the distribution and number of the objects used in the document.
7. The method according to claim 6,
wherein analyzing the paragraph structure comprises:
analyzing the number and size of the paragraphs and designating each paragraph to be reduced, in units of sentences, to a predetermined length according to the number and size of the analyzed paragraphs, such that if outlines are used more than a preset amount and the content is extensive, the sub-outlines are reduced, and if outlines are used less than the preset amount, selection centers on the outlines, mainly taking the first paragraph.
8. The method according to claim 5,
wherein, in the step of determining the type of the document analyzed by the document pattern/format distribution analysis unit, the sentences, outlines, and objects of the document designated as selection targets by the document pattern/format distribution analysis unit are assigned to the corresponding type so that the method can be applied to .hwp and doc files as well as markup-based documents such as OWPML, OOXML, HTML, and PDF.
9. The method according to claim 5,
wherein the step of determining the summarization method in the format/paragraph classification unit
classifies formats and paragraphs in a manner suited to the document type determined by the analysis type determination unit.
10. The method according to claim 5,
wherein the unmarked areas are set to be hidden in the step of selecting and marking important phrases in the important phrase selection unit.
KR1020150141726A 2015-10-08 2015-10-08 Apparatus and method for generating application and method for zing the same KR101740926B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150141726A KR101740926B1 (en) 2015-10-08 2015-10-08 Apparatus and method for generating application and method for zing the same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150141726A KR101740926B1 (en) 2015-10-08 2015-10-08 Apparatus and method for generating application and method for zing the same

Publications (2)

Publication Number Publication Date
KR20170042145A (en) 2017-04-18
KR101740926B1 (en) 2017-05-29

Family

ID=58704100

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150141726A KR101740926B1 (en) 2015-10-08 2015-10-08 Apparatus and method for generating application and method for zing the same

Country Status (1)

Country Link
KR (1) KR101740926B1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200114604A (en) * 2019-03-29 2020-10-07 주식회사 한글과컴퓨터 Electronic device capable of generating a summary image through merging of objects inserted in an electronic document and operating method thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3597697B2 (en) 1998-03-20 2004-12-08 富士通株式会社 Document summarizing apparatus and method

Also Published As

Publication number Publication date
KR101740926B1 (en) 2017-05-29

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant