KR101754473B1 - Method and system for automatically summarizing documents to images and providing the image-based contents - Google Patents


Info

Publication number
KR101754473B1
Authority
KR
South Korea
Prior art keywords
document
sentence
image
sentences
Prior art date
Application number
KR1020150094112A
Other languages
Korean (ko)
Other versions
KR20170004154A (en)
Inventor
김정희
하정우
강동엽
표현아
Original Assignee
네이버 주식회사
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 네이버 주식회사
Priority to KR1020150094112A
Publication of KR20170004154A
Application granted
Publication of KR101754473B1

Classifications

    • G06F17/211
    • G06F17/2705
    • G06F17/277

Abstract

A method and system for providing a summary of a document as image-based content is disclosed. A computer-implemented method comprises: extracting the sentences included in a given document; calculating similarity scores and diversity scores for the sentences and summarizing the document into at least one key sentence using those scores; selecting an image associated with the core sentence from at least one of the images included in the document and the images on a database; and generating summary content for the document by combining the selected image with the core sentence.

Description

Technical Field [0001] The present invention relates to a method and system for automatically summarizing a document into image-based content and providing that content.

The description below refers to a technique for automatically summarizing a document.

Recently, with the spread of computers and the development of the Internet, the amount of electronic documents has rapidly increased, and accordingly, it takes a comparatively long time to extract desired documents out of numerous electronic documents.

A document retrieval system is typically a keyword-based search system: the user can retrieve desired information simply by entering a keyword.

Today, however, the number of search results returned for a query is vast, and it cannot be determined in advance which results are actually relevant, so the user must inspect the documents corresponding to the search results.

Techniques have been developed to automatically summarize the original document so that the user can more quickly and easily grasp the contents of the document.

A document summary is, simply put, a 'reduction of the contents of a document to a certain size'; in more detail, it is a technique for compressing the contents of a document while preserving its key information.

For example, Korean Patent Registration No. 10-0435442 (registered June 1, 2004), "Document summary method and system," discloses a technique that analyzes the structural characteristics of a document, structures the document according to certain rules, extracts recurring patterns, and automatically summarizes the document using natural language processing (NLP) technology.

Provided are a document summarization method and system that can deliver the key information of a document to a user quickly and effectively by summarizing the contents of the original document into a small number of images and representative sentences.

A computer-implemented method comprises: extracting the sentences included in a given document; calculating similarity scores and diversity scores for the sentences and summarizing the document into at least one key sentence using those scores; selecting an image associated with the core sentence from at least one of the images included in the document and the images on a database; and generating summary content for the document by combining the selected image with the core sentence.

Also provided is a system comprising: a document summarization unit that extracts the sentences included in a given document, calculates a similarity score and a diversity score for the sentences, and summarizes the document into at least one core sentence using the similarity score and the diversity score; an image selection unit that selects an image associated with the core sentence from at least one of the images included in the document and the images on a database; and a content generation unit that generates summary content for the document by combining the selected image with the core sentence.

By summarizing the body of a given document as a core sentence and combining images that are highly relevant to the content of the summarized key sentence, the document can be summarized and generated as image-based content such as short posts.

Considering the limited display characteristics of the mobile terminal environment, a new type of service becomes possible that summarizes source documents such as Internet news articles, blog posts, or posts of online communities and social networks into a small number of images and representative sentences and delivers their key information effectively.

FIG. 1 is a diagram illustrating an example of an operating environment of a system according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the internal configuration of an electronic device and a server according to an embodiment of the present invention.
FIG. 3 is a diagram schematically illustrating a process of summarizing and providing a document in a server according to an embodiment of the present invention.
FIG. 4 is a block diagram illustrating a processor included in a server according to an embodiment of the present invention.
FIGS. 5 to 7 are diagrams for explaining a process of extracting a key sentence from a document according to an embodiment of the present invention.
FIGS. 8 to 10 are diagrams for explaining a process of selecting an image highly relevant to a document according to an embodiment of the present invention.
FIGS. 11 to 13 are diagrams illustrating a process of providing summary content of a document according to an embodiment of the present invention.
FIG. 14 illustrates a learning module for document summarization according to an embodiment of the present invention.
FIG. 15 illustrates an execution module for document summarization according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The following description relates to a technique for automatically summarizing a document, and more particularly to a document summarizing method and system for summarizing a document into image-based content.

In this specification, 'document' refers to data in which content such as text is expressed in a logical structure; it can mean not only structured data such as a database (DB) but also unstructured data such as web data from blogs and cafes. Documents include, but are not limited to, online multimodal documents such as Internet news articles, blog posts, or posts in online communities and social networks. Here, a multimodal document means a document in which at least one or more different expression methods, such as text, image, and voice, are used to convey the meaning of the document.

FIG. 1 is a diagram illustrating an example of an operating environment of a system for summarizing and providing documents in an embodiment of the present invention. FIG. 1 shows an example of an operating environment including a plurality of electronic devices 110, 120, 130, and 140, a plurality of servers 150 and 160, and a network 170.

Each of the electronic devices 110, 120, 130, and 140 may be a fixed terminal or a mobile terminal implemented as a computing system. Examples of the electronic devices 110, 120, 130, and 140 include a smart phone, a mobile phone, a navigation device, a computer, a notebook, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a tablet PC, and the like. Each of the electronic devices 110, 120, 130, and 140 may communicate with the other electronic devices and/or the servers 150 and 160 via the network 170 using a wireless or wired communication scheme.

The communication method is not limited, and may include communication using networks that the network 170 may comprise (for example, a mobile communication network, the wired Internet, the wireless Internet, or a broadcasting network) as well as short-range wireless communication between the devices. For example, the network 170 may include any one or more of a personal area network (PAN), a local area network (LAN), a campus area network (CAN), a metropolitan area network (MAN), a wide area network (WAN), a broadband network (BBN), the Internet, and the like. The network 170 may also include any one or more network topologies including a bus network, a star network, a ring network, a mesh network, a star-bus network, a tree or hierarchical network, and the like, but is not limited thereto.

Each of the servers 150 and 160 may be implemented as a device or a plurality of devices that communicate with the electronic devices 110, 120, 130, and 140 through the network 170 to provide contents for a service. The servers 150 and 160 may together constitute a single system for providing services or contents to the electronic devices 110, 120, 130, and 140, or may be separate systems each providing different services or contents.

The servers 150 and 160 may transmit, in response to a user's request made through the electronic devices 110, 120, 130, and 140, code for composing a screen to those devices, and the electronic devices 110, 120, 130, and 140 may compose and display a screen using the provided code through a program (for example, a browser or a specific application) installed on them, thereby providing the contents to the user.

The servers 150 and 160 may provide services and contents to the electronic devices 110, 120, 130, and 140 according to a user's request made through those devices. For example, the electronic devices 110, 120, 130, and 140 may receive code, files, data, and the like provided by the servers 150 and 160, and may receive the services and contents of the servers 150 and 160 by using programs (applications) installed on the devices together with the received code, files, and data.

Hereinafter, various embodiments of the present invention will be described in terms of one electronic device 110, such as a smart phone, and the server 150.

FIG. 2 is a block diagram illustrating the internal configuration of an electronic device and a server according to an embodiment of the present invention. The electronic device 110 may include a memory 211, a processor 212, a communication module 213, and an input/output interface 214. Similarly, the server 150 may include a memory 221, a processor 222, a communication module 223, and an input/output interface 224.

The memories 211 and 221 are computer-readable recording media and may include permanent mass storage devices such as random access memory (RAM), read-only memory (ROM), and disk drives. The operating system and at least one program code (for example, code for an application installed and run on the electronic device 110) may be stored in the memories 211 and 221. These software components may be loaded from a computer-readable recording medium separate from the memories 211 and 221 using a drive mechanism. Such computer-readable recording media may include a floppy drive, a disk, a tape, a DVD/CD-ROM drive, a memory card, and the like. In other embodiments, the software components may be loaded into the memories 211 and 221 via the communication modules 213 and 223 rather than from a computer-readable recording medium. For example, at least one program may be loaded into the memories 211 and 221 based on a program installed from files provided by developers via the network 170.

The processors 212 and 222 may be configured to process the instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processors 212 and 222 by the memories 211 and 221 or the communication modules 213 and 223. For example, the processors 212 and 222 may be configured to execute program code stored in a recording device such as the memories 211 and 221.

The communication modules 213 and 223 may provide functions for the electronic device 110 and the server 150 to communicate with each other through the network 170, and may allow the electronic device 110 or the server 150 to communicate with other electronic devices or other servers. For example, a request message generated by the processor 212 of the electronic device 110 under the user's control may be transmitted to the server 150 through the network 170 under the control of the communication module 213. Conversely, content provided by the processor 222 of the server 150 may be received via the communication module 223 and the network 170 through the communication module 213 of the electronic device 110 and passed to the processor 212 or the memory 211.

The input/output interfaces 214 and 224 may be means for interfacing with various input devices and output devices. For example, an input device may include a device such as a keyboard or a mouse, and an output device may include a device such as a display for displaying the application and its communication sessions. As another example, the input/output interfaces 214 and 224 may be means for interfacing with a device in which input and output functions are integrated, such as a touch screen.

The processor 212 of the electronic device 110 may be configured, in processing the commands of the computer program loaded in the memory 211, to display a service screen 215 composed using data provided by the server 150 on the display via the input/output interface 214.

Also, in other embodiments, the electronic device 110 and the server 150 may include more components than those shown in FIG. 2. However, most conventional components need not be explicitly illustrated. For example, the electronic device 110 may further include other components such as a touch screen display, a transceiver, a Global Positioning System (GPS) module, a camera, and the like.

FIG. 3 is a diagram schematically showing a process of summarizing a document in an embodiment of the present invention. FIG. 3 shows the server 150 and a plurality of electronic devices 310. The process of summarizing the document 301 and providing it as image-based content is briefly described below from the viewpoint of the server 150.

In step (1), the server 150 may summarize the body of each given document 301 into one or more core sentences. For example, in summarizing the document 301, the server 150 may define a new score that takes into account the similarity and diversity between the sentences included in the document 301, and extract the core sentences from the document 301 using this score. Here, the similarity may mean the similarity score between the words constituting a sentence, and the diversity may mean the entropy calculated from the occurrence probabilities of the words constituting a sentence.

In step (2), the server 150 can select an image highly relevant to the content of the core sentence extracted in step (1). The server 150 may manage the database 320 directly, or may obtain information on the images stored in the database 320 by communicating over the network with a database 320 installed outside the server 150. Accordingly, the server 150 may calculate the degree of similarity between the key sentence and each image and select an image with a high degree of similarity from among the images stored in the database 320.

In step (3), the server 150 combines the key sentence extracted in step (1) with the image selected in step (2), thereby generating new summary-type content (hereinafter referred to as 'summary content') for the document 301. To improve readability, the server 150 may determine at least one of the position and the color of the key sentence to be combined with the image in consideration of the overall color pattern of the image.

In step (4), the server 150 may store and maintain the summary content, in which the key sentence and the associated image are combined, in association with the corresponding document 301, and may provide it to the plurality of electronic devices 310. When the document 301 is to be served to the plurality of electronic devices 310, the server 150 can provide the summary content, summarized in the form of a post image, to the corresponding electronic devices 310. For example, when the document 301 is retrieved based on a user query input through an electronic device 310, the server 150 can serve the summary content to the user by outputting the retrieved document 301 in the form of a post containing one or more key sentences and the images associated with them.

Accordingly, the server 150 can summarize a given document into content such as a short image post, thereby delivering the core information of the document effectively.

FIG. 4 is a block diagram for explaining a processor included in a server according to an embodiment of the present invention, and FIGS. 5, 8, and 11 are flowcharts illustrating methods performed by the server in an embodiment of the present invention. As shown in FIG. 4, the processor 222 included in the server 150 may include a document summarizing unit 410, an image selecting unit 420, a content generating unit 430, and a content providing unit 440. These components may be implemented to execute the steps included in the methods of FIGS. 5, 8, and 11 through an operating system and at least one program code contained in the memory 221.

FIG. 5 is a flowchart for explaining a process of extracting a key sentence from a document according to an embodiment of the present invention.

In step 510, the document summarizing unit 410 may receive a document containing text as input and extract its body text, title, and meta information. The document summarizing unit 410 can convert all the sentences in the given document into sets of words including part-of-speech information through morphological analysis. The objects for which the sets of words are generated may include the title of the document.

In step 520, the document summarizing unit 410 may convert each set of words into a descriptor (feature) using a pre-learned text descriptor learning model. For example, the text descriptor can be represented as a numerical multidimensional vector.

A word frequency histogram, term frequency-inverse document frequency (TF-IDF), or a language learning model (e.g., word2vec, phrase2vec, document2vec, etc.) can be used to generate a text descriptor.

Specifically, for example, the following example sentence can be expressed as a 512-dimensional real-valued vector s:

Example sentence: <Kim Yuna, Let it go>

→ s = {s1 = 0.032, s2 = -0.595, ..., s512 = 1.22} (where sn is the n-th component of the vector s)

The real-vector value of a sentence is calculated through learning that uses an entire corpus as training data; the model is trained so that the distance between the 512-dimensional vectors of two sentences becomes small when their meanings are similar. Through this learning, not only sentences but also individual words are represented as high-dimensional real-valued vectors, and the number of dimensions of the sentence and word vectors can be set to any integer greater than zero.
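
The patent does not prescribe a particular embedding model, so the following sketch only illustrates the idea: a word2vec-style model (gensim is assumed here) is trained on a toy corpus and a sentence vector is obtained by averaging its word vectors, so that semantically similar sentences end up close together.

```python
# Sketch: map a sentence to a fixed-size real vector by averaging word embeddings.
# The gensim Word2Vec model, the toy corpus, and mean pooling are assumptions made
# for illustration; the patent only requires that semantically similar sentences
# end up close together in the vector space.
import numpy as np
from gensim.models import Word2Vec

corpus = [
    ["kim", "yuna", "let", "it", "go"],
    ["figure", "skating", "gala", "performance"],
    ["short", "program", "music", "selection"],
]

model = Word2Vec(sentences=corpus, vector_size=512, min_count=1, seed=0)

def sentence_vector(tokens, model):
    """Average the embeddings of known words; zero vector if none are known."""
    vecs = [model.wv[w] for w in tokens if w in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.wv.vector_size)

s = sentence_vector(["kim", "yuna", "let", "it", "go"], model)
print(s.shape)  # (512,)
```

A document2vec-style model that embeds whole sentences directly, which the description also mentions, would replace the averaging step.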

In step 530, the document summarizing unit 410 may calculate, for every sentence of the document, a score based on the similarity to the title of the document and on diversity, using the text descriptors generated in step 520.

In step 540, the document summarizing unit 410 may stochastically select a core sentence among all the sentences included in the document using the calculated scores.

Scores can be calculated using similarity and diversity to extract key sentences in the document.

For example, for a sentence represented by the vector s = {s1, s2, ..., s512}, the score of the sentence is the weighted sum of its similarity (S) and diversity (D), which is defined as Equation (1).

F(s) = w1 * S(s, t) + w2 * D(s, C)    (1)

where s is a sentence in the body of the document, t is the title of the document, C is the group of sentences containing s, and w1 and w2 are the weights of the similarity (S) and the diversity (D), respectively. Here, C may be a single cluster comprising all the sentences in the document, or the sentences may be grouped into several clusters of similar meaning.

The document summarizing unit 410 may extract one or more sentences with high scores, according to the score of each sentence in the document, and select the extracted sentences as the core sentences of the document.

At this time, the similarity between sentences is calculated from the similarity scores between the words constituting the sentences. For example, the similarity between words can be calculated from a model whose semantics are learned based on the co-occurrence patterns of words across all documents in an online document database.

For example, given the title t = {t1, t2, ...} and one sentence s = {s1, s2, ...} in the body, the similarity between the sentence and the title can be obtained as S(s, t) = cosine_similarity({s1, s2, ...}, {t1, t2, ...}) when cosine similarity is used.

The cosine similarity between vectors A and B is defined by Equation (2).

cosine_similarity(A, B) = (A · B) / (||A|| * ||B||)    (2)

The measure used to determine the similarity between sentences is not limited to cosine similarity; various measures, such as Euclidean distance and Hamming distance, can be applied according to the characteristics of the text descriptors.
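
Equation (2) can be computed directly from two descriptor vectors; the sketch below is a minimal implementation, with the toy title and sentence vectors being placeholders.

```python
# Cosine similarity between two descriptor vectors, as in Equation (2).
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

title_vec = np.array([0.1, 0.8, 0.3])      # placeholder title descriptor
sentence_vec = np.array([0.2, 0.7, 0.1])   # placeholder sentence descriptor
print(cosine_similarity(sentence_vec, title_vec))  # value in [-1, 1]
```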

Accordingly, the document summarizing unit 410 can extract sentences similar to the title of the document based on the similarity S(s, t) between each sentence and the title.

The diversity among sentences can be calculated based on the uncertainty computed from the occurrence probabilities of the words composing each sentence. Sentences whose words have similar occurrence probabilities can be regarded as similar sentences, while sentences whose words have completely different occurrence probabilities can be understood as sentences carrying different meanings.

To extract key sentences with diversity from a given document, the sentence vectors are grouped by a clustering method into a suitable number of clusters of semantically similar sentences, and the sentence with the highest uncertainty is extracted from each cluster as a key sentence.

At this time, the number of clusters K is given as an integer greater than 0; when K = 1, the entire text of the document forms a single cluster. Examples of clustering methods include K-means clustering, mean-shift clustering, and hierarchical clustering.

If the uncertainty of sentence s with respect to cluster C is Entropy(s, C), the key sentence s' selected from the sentences belonging to the m-th cluster Cm is the sentence that maximizes the entropy, i.e., s' = argmax_s Entropy(s, Cm).

As an example, the uncertainty computed from the sentence vectors in a particular cluster can use the Shannon (Boltzmann) entropy calculation that is widely used in information theory.

The sentence with the highest uncertainty in each cluster is the sentence expressed by the most diverse words within that cluster.
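
A minimal sketch of this diversity step: sentence vectors are grouped with K-means and, within each cluster, the sentence whose words have the highest Shannon entropy is taken. The toy sentence vectors, the unigram probability estimate, and the cluster count K are assumptions for illustration, not values fixed by the patent.

```python
# Sketch: pick the most "diverse" sentence per cluster via the Shannon entropy of
# its word occurrence probabilities. Toy vectors, the unigram probability estimate,
# and K = 2 are illustrative assumptions.
from collections import Counter
import numpy as np
from sklearn.cluster import KMeans

sentences = [
    ["skater", "wins", "gold", "medal"],
    ["gold", "medal", "ceremony", "held"],
    ["music", "chosen", "for", "free", "program"],
    ["free", "program", "music", "surprises", "fans"],
]
# Toy sentence vectors; in practice these come from the text descriptor model.
vectors = np.random.RandomState(0).rand(len(sentences), 8)

# Word occurrence probabilities estimated over the whole document.
counts = Counter(w for s in sentences for w in s)
total = sum(counts.values())
p = {w: c / total for w, c in counts.items()}

def entropy(sentence):
    """Shannon entropy over the occurrence probabilities of the sentence's words."""
    return -sum(p[w] * np.log2(p[w]) for w in set(sentence))

K = 2
labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(vectors)
for k in range(K):
    members = [i for i, label in enumerate(labels) if label == k]
    best = max(members, key=lambda i: entropy(sentences[i]))
    print(f"cluster {k}: {' '.join(sentences[best])}")
```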

FIGS. 6 and 7 illustrate an example of a process of extracting a core sentence from a document.

Referring to FIG. 6, the document 600 may consist of a title 601 and a plurality of sentences 602. At this time, the sentences 602 may be grouped into at least one cluster 603 based on the occurrence probabilities of the words constituting each sentence.

Referring to FIG. 7, the similarity (S) between the title of the document and each sentence is calculated for every sentence included in the document. The sentence whose meaning is most similar to the title is then selected, and the diversity (D) between each sentence of each cluster 603 and the selected sentence is calculated.

A weighted sum F = w1*S + w2*D of the similarity S and the diversity D is then calculated for each sentence, and the sentences whose weighted sum F exceeds a threshold can be selected as the core sentences 704 of the document. As another example, a predetermined number of sentences can be selected as the key sentences 704 in descending order of the weighted sum F.

Accordingly, the document summarizing unit 410 can extract key sentences that summarize the document by using a score that simultaneously considers the similarity and diversity among the sentences in the document.
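
Putting the two scores together, the following sketch applies Equation (1) with a threshold-based selection; the weights w1 and w2, the threshold, and the stand-in score values are illustrative assumptions.

```python
# Sketch of Equation (1): F = w1*S + w2*D with threshold-based selection.
# The weights, threshold, and stand-in score values are illustrative assumptions.
def select_core_sentences(scored, w1=0.7, w2=0.3, threshold=0.5):
    """scored: list of (sentence, similarity_to_title, diversity) tuples."""
    kept = []
    for sentence, S, D in scored:
        F = w1 * S + w2 * D
        if F >= threshold:
            kept.append((F, sentence))
    return [sentence for _, sentence in sorted(kept, reverse=True)]

scored = [
    ("Kim Yuna wins the gold medal.", 0.91, 0.40),
    ("Tickets went on sale last week.", 0.22, 0.35),
    ("Her free program music surprised fans.", 0.63, 0.71),
]
print(select_core_sentences(scored))
```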

FIG. 8 is a flowchart illustrating a process of selecting an image highly relevant to a document according to an exemplary embodiment of the present invention.

In step 850, the image selecting unit 420 inputs the text descriptor of each core sentence into the previously learned text-image common semantic space learning model and generates an image descriptor corresponding to the text descriptor through probabilistic inference. The text-image common semantic space learning model is described in detail below.

In step 860, the image selecting unit 420 may compare the image descriptor generated in step 850 with the images on the database and select highly similar images. In other words, the image selecting unit 420 calculates the similarity between the image descriptor generated from the core sentence of the document and the image descriptors of the pre-built database, and stochastically selects a predefined number of high-similarity images as the images representing the sentence.

As another example, when the given document includes images, the image selecting unit 420 may select at least one image included in the document for at least one of the key sentences. For example, the image selecting unit 420 may select, from the images included in the document, the image associated with the key sentence having the highest weighted sum (F) among the key sentences of the document.

As shown in FIG. 9, the image selecting unit 420 can select, for each of the core sentences 904 extracted from the document, an associated image 905 having a high degree of similarity with the corresponding sentence 904.

The image selecting unit 420 may use various learning models to select an image associated with a key sentence. For example, the image selecting unit 420 may select the associated images of the core sentences through a text-image common semantic space learning model that learns the semantic space of text and images in a single model. FIG. 10 illustrates an exemplary text-image common semantic space learning model 1000.

In order to learn the images efficiently, the category or label of an image can be set automatically using at least one word included in the title of the document. In other words, an automated labeling technique can be applied that uses important words included in the title of the document as the category or label information of the images included in that document.

To improve the quality of the image descriptors, important words among the words included in the title of the document to which an image belongs (e.g., characters, concepts, or words with high computed scores such as TF-IDF (term frequency - inverse document frequency)) can be set as labels for learning the text-image relationship, and supervised or semi-supervised learning can be conducted.

An image is represented by a high-dimensional real-vector descriptor through preprocessing techniques for learning the common semantic pattern between the images and the title of the document containing them. For this purpose, CNN (convolutional neural network) features, SIFT (Scale Invariant Feature Transform), HOG (Histogram of Oriented Gradients), SURF (Speeded Up Robust Features), and the like can be used. For example, if one image is defined as a 1024-dimensional real vector i, it can be expressed as i = {i1 = 0.000, i2 = 0.859, ..., i1024 = 1.245}. At this time, the descriptor is defined such that the more similar two images are, the closer their two vectors are in the 1024-dimensional space. The number of dimensions is exemplary and is not limited to 1024.
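
As one hypothetical way to obtain such a CNN-feature descriptor, the sketch below pools features from a pretrained ResNet-18 via torchvision; the choice of backbone and its 512-dimensional output are assumptions, whereas the patent's 1024-dimensional example is only illustrative.

```python
# Sketch: extract a real-vector image descriptor from a pretrained CNN.
# ResNet-18 and its 512-dimensional pooled output are assumptions; the patent's
# 1024-dimensional example only illustrates a fixed-size real vector per image.
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the classifier, keep pooled features
backbone.eval()

def image_descriptor(path):
    """Return a 512-dimensional descriptor for the image at the given path."""
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        feat = backbone(preprocess(img).unsqueeze(0))
    return feat.squeeze(0).numpy()

# descriptor = image_descriptor("article_photo.jpg")  # hypothetical file
```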

In the present invention, in order to retrieve images associated with the key sentences of a document, a technique can be applied that represents the text information (for example, the title or body text) and the image information of documents containing images in the same semantic space.

For example, a text represented by a 512-dimensional real vector and an image represented by a 1024-dimensional real vector can both be mapped onto another real vector space of the same dimension, for example 200 dimensions, and the model is trained so that text and images with similar meanings lie close together in this defined semantic space. The semantic space represented by this model can be depicted as shown in FIG. 10, where text and images with similar meanings are located a short distance apart.

When a key sentence of the document is input, the image selecting unit 420 may infer image descriptor information with a meaning similar to that sentence by using the text-image common semantic space learning model 1000. For example, when a sentence is input, its 512-dimensional real vector s = {s1 = 0.02, ..., s512 = 0.324} is transformed into a 1024-dimensional real vector i = {i1 = 0.102, ..., i1024 = 0.999} representing an image through stochastic inference of the learned model. The dimensions of the text descriptor and the image descriptor here are illustrative and not restrictive.

The image selecting unit 420 can select the most similar images from the database using the image descriptor information generated through the text-image common semantic space learning model 1000. For example, the image selecting unit 420 may calculate image similarity through various measures such as the Euclidean distance or the cosine similarity between two images, and select the image associated with the key sentence based on that similarity.
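
A toy sketch of the retrieval step in the common semantic space: two linear projections map a 512-dimensional text descriptor and 1024-dimensional image descriptors into a shared 200-dimensional space, and the candidate image closest to the projected core sentence is returned. The random projection matrices stand in for the learned model and are assumptions for illustration only.

```python
# Toy sketch of a text-image common semantic space: two linear projections map a
# 512-d text descriptor and 1024-d image descriptors into a shared 200-d space,
# and the nearest database image is picked by cosine similarity. The random
# projection matrices are stand-ins for the learned model.
import numpy as np

rng = np.random.RandomState(0)
W_text = rng.randn(200, 512) * 0.01    # text -> common space (assumed shapes)
W_image = rng.randn(200, 1024) * 0.01  # image -> common space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def select_associated_image(sentence_vec, image_vecs):
    """Return the index of the database image closest to the core sentence."""
    q = W_text @ sentence_vec
    scores = [cosine(q, W_image @ v) for v in image_vecs]
    return int(np.argmax(scores))

sentence_vec = rng.rand(512)                   # placeholder core-sentence descriptor
image_db = [rng.rand(1024) for _ in range(5)]  # placeholder image descriptors
print(select_associated_image(sentence_vec, image_db))
```

In the patent the projections are learned so that related text and images land close together; here they are random only to keep the sketch runnable.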

FIG. 11 is a flowchart illustrating a process of providing summary content of a document in an embodiment of the present invention.

In step 1170, the content generating unit 430 can generate the summary content of the given document by combining the key sentence, which was used as the input for the image search, with the image selected by the image selecting unit 420 as the image associated with that core sentence.

Referring to FIG. 12, the content generating unit 430 may synthesize a core sentence 1202 of the document with an image 1201 having a meaning similar to that sentence and generate the result as the summary content of the document.

To increase the readability of the key sentence 1202 on the image 1201, the content generating unit 430 can adaptively determine at least one of the color and the position of the key sentence 1202 in consideration of the color and brightness of the image 1201. For example, the core sentence 1202 can be placed over the background in consideration of the pattern of the image 1201, or the color of the core sentence 1202 can be set to the complementary color of the background color of the image 1201, or to a color similar to that complementary color.
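
A minimal sketch of this composition step using Pillow; the crude complementary-color rule and fixed bottom placement are simplifications of the adaptive behavior described above, and the file names are hypothetical.

```python
# Sketch: overlay a core sentence on its associated image with Pillow. The crude
# complementary-color rule and fixed bottom placement are simplifications of the
# adaptive behavior described above; the file names are hypothetical.
from PIL import Image, ImageDraw, ImageStat

def compose_summary_card(image_path, core_sentence, out_path):
    img = Image.open(image_path).convert("RGB")
    avg = ImageStat.Stat(img).mean                 # average R, G, B of the image
    text_color = tuple(255 - int(c) for c in avg)  # rough complement for contrast
    draw = ImageDraw.Draw(img)
    draw.text((20, img.height - 40), core_sentence, fill=text_color)
    img.save(out_path)

# compose_summary_card("article_photo.jpg", "Kim Yuna wins the gold medal.", "summary_card.jpg")
```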

In step 1180, the content providing unit 440 stores the summary content, in which the key sentence and the related image are combined, in association with the document, and provides the summary content to the user through various service channels such as the web or mobile.

For example, when a query term for a search request is received through a user terminal, the content providing unit 440 may output a post-form image containing the core sentence as the summary content of a document retrieved on the basis of the query term.

Referring to FIG. 13, for a service target document 1300, post images 1310, each containing one of the core sentences 1303 extracted from the document 1300, may be exposed as the summary content of the document 1300.

Furthermore, the summary content exposed through the service channel may include a user interface for receiving feedback information such as user ratings of the content. The content providing unit 440 may collect feedback information on the summary content from the user terminals, and the collected user feedback information may be used to train or update the learning model for document summarization and the text-image common semantic space learning model, which is the image learning model.

Figure 14 illustrates a learning module 1400 for document summarization in one embodiment of the present invention.

Referring to FIG. 14, the learning module 1400 may construct a learning model 1410 for text sentence summarization. In particular, in order to summarize a document in consideration of the similarity and diversity among the sentences included in it, the learning model 1410 expresses the document as a set of words through morphological analysis and expresses each of the words as a real-vector descriptor.

In addition, the learning module 1400 can construct a text-image common semantic space learning model 1420 that learns the semantic space of the text and the images included in online documents in a single model, expressing the text and the images as real-vector descriptors in the same semantic space.

At this time, the text can be represented by a real vector descriptor through learning using the entire corpus as training data. In the case of an image, it can be expressed as a real vector descriptor using the characteristics of the image itself or using words included in the title of the document to which the image belongs.

The learning model 1410 for text sentence summarization and the text-image common semantic space learning model 1420 of the learning module 1400 may be constructed to reflect user feedback information on the documents.

Figure 15 illustrates an execution module 1500 for document summarization, in one embodiment of the present invention.

The execution module 1500 receives a given document, such as an Internet news article, a blog post, or a post from an online community or social network, as input, extracts its body text, title, meta information, and the like (1501), and converts all sentences, including the title, into sets of words (1502).

The execution module 1500 converts each set of words into a text descriptor in the form of a numerical multidimensional vector using the previously learned document summary model 1510 (1503), calculates the real-vector value of each sentence included in the document, and then extracts the key sentences from the document in consideration of similarity and diversity within the document (1504).

The execution module 1500 generates an image descriptor corresponding to the text descriptor of each extracted core sentence by inputting that text descriptor into the previously learned text-image common semantic space model 1520 (1505), compares the generated image descriptor with the image descriptors on the database, and selects an associated image having high similarity (1506).

The execution module 1500 may generate the summarized post-form content (1507) by combining each of the extracted core sentences with its selected associated image.

According to the embodiments of the present invention, the body of a given document is summarized into key sentences, and images highly relevant to the content of the summarized key sentences are combined with those sentences, so that the document can be summarized and generated as image-based content. Therefore, by summarizing the original document into a small number of images and representative sentences, a new type of service becomes possible that delivers the key information of a document to users more quickly and effectively. Furthermore, various personalization services can be implemented by applying the technique of summarizing a document into image-based content not only to image-text based document data but also to data mining tasks such as multi-sensor personal logs.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may execute an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but those skilled in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as a parallel processor, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or instruct the processing device independently or collectively. The software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may also be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be those specially designed and configured for the embodiments or may be those known and available to those skilled in computer software. Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, the invention is not limited to the disclosed embodiments. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described methods, and/or the components of the described systems, structures, devices, and circuits are combined in a different form or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims are also within the scope of the following claims.

Claims (13)

A computer-implemented method comprising:
extracting sentences included in the document from a given document;
summarizing the document into at least one key sentence using, as information on the sentences, a similarity score indicating the degree of similarity between the words constituting a sentence and a diversity score indicating the entropy calculated from the probabilities that the words constituting the sentence appear in the document;
selecting an image associated with the core sentence from at least one of an image included in the document and an image on a database; and
generating summary content for the document by combining the selected image with the core sentence.
The method according to claim 1,
wherein the summarizing comprises:
obtaining a weighted sum of the similarity score and the diversity score; and
selecting a sentence having a high weighted sum as the key sentence.
The method according to claim 1,
wherein the summarizing comprises:
converting the sentences included in the document into text descriptors represented by numerical vectors, using a previously learned text descriptor learning model in which the similarity between sentences or words is expressed as a numerical vector.
The method according to claim 1,
wherein the summarizing comprises:
calculating, for each of the sentences included in the document, a similarity score according to the degree of similarity between the title of the document and the words constituting the sentence;
calculating, for each of the sentences included in the document, a diversity score between the sentences based on the sentence having the highest similarity score among the sentences included in the document; and
selecting, as the core sentence, at least one of the sentences included in the document based on the similarity score and the diversity score.
The method according to claim 1,
wherein the summarizing comprises:
converting the sentences included in the document into text descriptors represented by numerical vectors using a previously learned first learning model,
and wherein the selecting comprises:
generating, as an image descriptor, a numerical vector corresponding to the text descriptor of the core sentence using a second learning model in which text and images are expressed as numerical vectors in the same semantic space; and
selecting the image associated with the core sentence from at least one of an image included in the document and an image on a database using the image descriptor.
The method according to claim 5,
wherein the second learning model is
learned by setting the label information of the images included in the document to at least one word contained in the title of the document.
The method according to claim 1,
wherein the generating comprises:
determining at least one of the color and the position of the key sentence to be combined with the selected image according to the pattern of the selected image.
The method according to claim 5, further comprising:
providing an image combined with the core sentence to a user terminal over a network as the summary content of the document; and
collecting user feedback information on the summary content from the user terminal,
wherein the user feedback information is used for the first learning model and the second learning model.
A computer-readable recording medium having recorded thereon a program for executing the method according to any one of claims 1 to 8.
A system comprising:
a document summarization unit that extracts the sentences included in the document from a given document and summarizes the document into at least one key sentence using, as information on the sentences, a similarity score indicating the degree of similarity between the words constituting a sentence and a diversity score indicating the entropy (uncertainty) calculated from the probabilities that the words constituting the sentence appear in the document;
an image selection unit that selects an image associated with the core sentence from at least one of an image included in the document and an image on a database; and
a content generation unit that generates summary content for the document by combining the selected image with the core sentence.
The system according to claim 10,
wherein the document summarization unit
calculates a weighted sum of the similarity score and the diversity score and then selects a sentence having a higher weighted sum as the key sentence.
The system according to claim 10,
wherein the document summarization unit
converts the sentences included in the document into text descriptors represented by numerical vectors using a previously learned text descriptor learning model,
calculates the similarity score and the diversity score using the text descriptors, and
selects at least one of the sentences included in the document as the key sentence using the similarity score and the diversity score.
The system according to claim 10,
wherein the document summarization unit
converts the sentences included in the document into text descriptors represented by numerical vectors using a previously learned first learning model,
and wherein the image selection unit
generates, as an image descriptor, a numerical vector corresponding to the text descriptor of the core sentence using a second learning model in which text and images are expressed as numerical vectors in the same semantic space, and
retrieves at least one of the images included in the document and the images on the database using the image descriptor to select the image associated with the core sentence.
KR1020150094112A 2015-07-01 2015-07-01 Method and system for automatically summarizing documents to images and providing the image-based contents KR101754473B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150094112A KR101754473B1 (en) 2015-07-01 2015-07-01 Method and system for automatically summarizing documents to images and providing the image-based contents

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150094112A KR101754473B1 (en) 2015-07-01 2015-07-01 Method and system for automatically summarizing documents to images and providing the image-based contents

Publications (2)

Publication Number Publication Date
KR20170004154A KR20170004154A (en) 2017-01-11
KR101754473B1 (en) 2017-07-05

Family

ID=57832667

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150094112A KR101754473B1 (en) 2015-07-01 2015-07-01 Method and system for automatically summarizing documents to images and providing the image-based contents

Country Status (1)

Country Link
KR (1) KR101754473B1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101871828B1 (en) * 2017-07-03 2018-06-28 (주)망고플레이트 Apparatus and method for selecting representative images of online contents
US10699062B2 (en) * 2017-08-01 2020-06-30 Samsung Electronics Co., Ltd. Apparatus and method for providing summarized information using an artificial intelligence model
KR102542049B1 (en) * 2017-08-01 2023-06-12 삼성전자주식회사 Apparatus and Method for providing a summarized information using a artificial intelligence model
JP7137815B2 (en) * 2018-04-19 2022-09-15 Jcc株式会社 Recording playback system
KR101981746B1 (en) * 2018-09-10 2019-06-03 주식회사 시스메틱 Method, apparatus and computer-readable medium for providing information contents based on keyword
US10831821B2 (en) 2018-09-21 2020-11-10 International Business Machines Corporation Cognitive adaptive real-time pictorial summary scenes
KR102545666B1 (en) 2018-12-18 2023-06-21 삼성전자주식회사 Method for providing sententce based on persona and electronic device for supporting the same
KR102270989B1 (en) * 2019-06-20 2021-06-30 (주)대왕시스템 Artificial intelligence fashion coordination system
KR102293950B1 (en) * 2019-06-20 2021-08-26 민 정 고 Apparatus and method for converting text to image based on learning
CN112667826A (en) * 2019-09-30 2021-04-16 北京国双科技有限公司 Chapter de-noising method, device and system and storage medium
KR102445932B1 (en) * 2020-11-02 2022-09-21 한양대학교 산학협력단 Image generation technique using multi-modal mapping information on knowledge distillation
KR102446305B1 (en) * 2020-11-20 2022-09-23 네이버 주식회사 Method and apparatus for sentiment analysis service including highlighting function
KR102571595B1 (en) * 2020-12-28 2023-08-28 한국과학기술원 Method and apparatus for extracting hashtags based on recurrent generation model with hashtag feedback
KR102604055B1 (en) * 2021-10-28 2023-11-21 주식회사 메타소프트 Intelligent information tracking presentation system to support exploratory reading of e-books
KR102556487B1 (en) * 2022-02-15 2023-07-14 이종혁 System for producing election promotional material using election promises

Also Published As

Publication number Publication date
KR20170004154A (en) 2017-01-11


Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right