JP5112027B2 - Document group presentation device and document group presentation program - Google Patents

Publication number
JP5112027B2
Authority
JP
Japan
Prior art keywords
document
concept
abstraction level
important topic
document group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP2007308151A
Other languages
Japanese (ja)
Other versions
JP2009134378A (en
Inventor
嘉隆 伊藤
Original Assignee
株式会社日立ソリューションズ
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立ソリューションズ
Priority to JP2007308151A
Publication of JP2009134378A
Application granted
Publication of JP5112027B2
Application status is Expired - Fee Related

Description

The present invention relates to a document group presentation apparatus and a document group presentation program in a document management system, and in particular to a technique that starts from an overview of the entire document set and uses the semantic hierarchical relationships among concepts to semantically enlarge or reduce the view of important topics.

As a conventional information flow detection method, a method of automatically extracting and classifying topics and calculating related words is known. (See Patent Document 1 below)
As a conventional information flow detection method, a method of automatically extracting and presenting topics arranged in time series is known. (See Patent Document 2 below)
Furthermore, a classification method using abstraction is known as an automatic document classification method in a conventional document management system. (See Patent Document 3 below)

As prior art documents related to the invention of the present application, there are the following.
Patent Document 1: JP 2006-277767 A
Patent Document 2: JP-A-11-175530
Patent Document 3: JP 2003-85189 A

In the method described in Patent Document 1, topics are extracted based on appearance frequency, and topic strengths are calculated and related to one another, but topics whose semantic relationship is unknown may be combined. Therefore, the method cannot be used to present a document group by tracing from a topic of a superordinate concept down to topics of its subordinate concepts.
Further, in the method described in Patent Document 2, topics are projected onto a graph whose axes are the number of appearances and the date and time, which improves the at-a-glance overview, but a huge number of topics could not be presented.
Furthermore, the method described in Patent Document 3 does not show a specific method of presenting a document group, so the classified document group cannot be presented.
The present invention has been made to solve the above problems of the prior art. An object of the present invention is to provide a document group presentation device that can create the result of integrating and dividing a document group at a specified abstraction level, and that can construct an interface in which the document space of the document group to be classified is treated as a map that can be semantically enlarged or reduced.
Another object of the present invention is to provide a program for causing a computer to execute the above document group presentation device.
The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.

Of the inventions disclosed in this application, the outline of typical ones will be briefly described as follows.
In order to achieve the above object, the present invention is a document group presentation device for presenting a document group of a document management system having functions of inputting, registering, storing, searching, and displaying documents, the device comprising: concept tree construction means for reading a concept dictionary and constructing the concepts of words as a tree structure; important topic extraction means for reading a plurality of documents, extracting words, and determining and extracting important topics based on the number of occurrences of each word or on whether the word is a headword; document classification table construction means for constructing a document classification table, including a concept tree representing the abstraction level, from the concept tree constructed by the concept tree construction means and the important topics extracted by the important topic extraction means; document group integration / division means for analyzing the document classification table constructed by the document classification table construction means and updating it based on the abstraction level input this time, the abstraction level input last time, and the current abstraction level; and presenting means for presenting the document group based on the document classification table updated by the document group integration / division means.

Further, in the present invention, when an important topic extracted by the important topic extraction means exists in the concept tree constructed by the concept tree construction means, the document classification table construction means sets the concept identifier of that topic as the concept identifier of the document classification table, the document identifier of the document associated with the important topic as the document identifier of the document classification table, and the hierarchy level of the important topic in the concept tree as the abstraction level of the document classification table. When the abstraction level input last time is greater than the abstraction level input this time, the document group integration / division means updates the document classification table, based on the abstraction level input this time and the current abstraction level in the document classification table, so that important topics subordinate to an important topic of the abstraction level input this time are integrated into that important topic. When the abstraction level input last time is less than the abstraction level input this time, the document group integration / division means updates the document classification table, based on the abstraction level input this time and the current abstraction level in the document classification table, so that an important topic into which subordinate important topics have been integrated is divided into the important topics of the abstraction level input this time. Based on the document classification table updated by the document group integration / division means, the presenting means presents the document groups associated with the important topics at or below the abstraction level input this time on a graph whose vertical axis is the appearance frequency and whose horizontal axis is the appearance date and time, each in an area proportional to the number of documents associated with the important topic.
The present invention is also a document group presentation program for presenting a document group of a document management system having functions of inputting, registering, storing, searching, and displaying documents, the program causing a computer to realize each means of the document group presentation apparatus.

The effects obtained by the representative ones of the inventions disclosed in the present application will be briefly described as follows.
According to the present invention, it is possible to create the result of integrating and dividing a document group at a specified abstraction level, and to construct an interface in which the document space of the document group to be classified is treated as a map that can be semantically enlarged or reduced.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
In all the drawings for explaining the embodiments, parts having the same functions are given the same reference numerals, and repeated explanation thereof is omitted.
[Function block diagram]
FIG. 1 shows a functional block diagram of a document group presentation apparatus according to an embodiment of the present invention.
As shown in FIG. 1, the document group presentation device 11 includes a concept tree construction unit 101, an important topic extraction unit 102, a document classification table construction unit 103, a concept dictionary storage unit 104, a document storage unit 105, a document group integration / division unit 106, a document group presentation unit 107, an abstraction level input unit 108, and a display unit 109.
The concept tree construction unit 101 reads a concept word from the concept dictionary storage unit 104, analyzes the parent-child relationship, and constructs a tree structure on the memory.
The important topic extraction unit 102 reads a document from the document storage unit 105, extracts its text, performs morphological analysis to extract topics, determines their importance by a predetermined procedure, and extracts the important topics.
Based on the concept tree constructed by the concept tree construction unit 101 and the important topics extracted by the important topic extraction unit 102, the document classification table construction unit 103 constructs a document classification table that includes a concept identifier, a document identifier, a concept tree representing the abstraction level, the current abstraction level, and the current position.
The document group integration / division unit 106 analyzes the document classification table according to the abstraction level input by the abstraction level input unit 108, integrates or divides the document classifications of lower abstraction levels into the document classifications of the abstraction level to be displayed, and updates the document classification table.
The document group presentation unit 107 analyzes the document classification table updated by the document group integration / division unit 106 and displays the document groups to be displayed on the display device.

[Hardware configuration]
FIG. 2 shows a hardware configuration of a computer device that executes the document group presentation device 11 shown in FIG.
As shown in FIG. 2, the computer executing the document group presentation apparatus 11 includes a display 201, a CPU 202, a memory 203, a keyboard / mouse 204, a hard disk 205, and a CD-ROM drive 206 for reading a CD-ROM 207. And a communication circuit 208 connected to the Internet 209. The hard disk 205 stores a document presentation program 2051, a concept dictionary database 2052, and a document database 2053.
The concept tree construction unit 101 of FIG. 1 uses a concept dictionary database 2052 and is realized by a document presentation program executed by the CPU 202 using the memory 203.
The important topic extraction unit 102 in FIG. 1 uses a document database 2053 and is realized by a document presentation program 2051 that is executed by the CPU 202 using the memory 203.
The document classification table construction unit 103 in FIG. 1 is realized by a document presentation program 2051 that the CPU 202 executes using the memory 203.
The concept dictionary storage unit 104 in FIG. 1 is realized by the concept dictionary database 2052. The document storage unit 105 in FIG. 1 is realized by the document database 2053.
The document group integration / division means 106 in FIG. 1 is realized by a document presentation program 2051 that is executed by the CPU 202 using the memory 203.
The document group presentation unit 107 in FIG. 1 is realized by a document presentation program 2051 that the CPU 202 executes using the memory 203.
The abstraction level input means 108 in FIG. 1 is realized by a keyboard / mouse 204. The display unit 109 in FIG. 1 is realized by the display 201.

[Processing details]
The processing procedure of this embodiment will be described with reference to FIGS.
FIG. 4 is a flowchart showing a processing procedure of the concept tree construction unit 101 of the document group presentation device 11 shown in FIG.
FIG. 5 is a flowchart showing a processing procedure of the important topic extraction means 102 of the document group presentation device 11 shown in FIG.
FIG. 6 is a flowchart showing a processing procedure of the document classification table construction unit 103 of the document group presentation device 11 shown in FIG.
FIG. 7 is a flowchart showing a processing procedure of the document group integration / division means 106 of the document group presentation apparatus 11 shown in FIG.
FIG. 8 is a flowchart showing a processing procedure of the document group presenting means 107 of the document group presenting apparatus 11 shown in FIG.
The processing procedures of FIGS. 4 to 8 are realized by the document presentation program 2051.

[Concept tree construction processing]
When the user starts the document presentation program 2051, the concept tree construction process by the concept tree construction unit 101 starts. In the concept tree construction process, an area for retaining the concept tree of the construction result is secured in the memory 203 in advance.
First, one concept is read from the concept dictionary storage unit 104 (step S401). Here, the concept dictionary is assumed to consist of the concept part shown in FIG. 9 and the relation part shown in FIG. 10. The first concept 901 in FIG. 9 is read, and then the relation part is searched for a relation in which this concept is the subordinate concept. In this case, no relation is read for the concept 901 because no relation has it as the subordinate concept.
Next, an object shown in the concept 301 of FIG. 3 is created in the memory 203, and “1”, the concept identifier of the read concept, and “concept”, its concept name, are set (step S402). Since there is neither a superordinate concept nor a subordinate concept, neither is set.
Next, while tracing the resulting concept tree from the root element to the lower level concept (step S403), it is determined whether or not there is a higher level concept whose subordinate concept is the read concept (step S404). In this case, since there is no element in the concept tree yet, the current concept is set as the root element (step S405).

Next, it is determined whether the read concept is the last concept (step S406). In this case, since it is not the last concept yet, the next concept is read (step S401).
The second concept, “subject” 902 in FIG. 9, is read, and then the row 1001 of the relation part in which it is the subordinate concept is read.
Next, an object shown in the concept 301 of FIG. 3 is created in the memory 203, and the read concept identifier “2”, the concept name “subject”, and, as the superordinate concept identifier, “1”, the identifier of “concept”, are set. In this case, since there is no subordinate concept, none is set.
Next, while tracing the resulting concept tree from the root element toward the lower-level concepts (step S403), it is determined whether or not there is a superordinate concept whose subordinate concept is the read concept (step S404). In this case, since the concept object of “subject” is a subordinate concept of the concept object of “concept” that already exists in the concept tree, the concept object of “subject” is set as a subordinate concept of the concept object of “concept”.
Thereafter, this process is repeated until the last concept, and the concept tree shown in FIG. 11 is constructed. Here, concept objects of “subject”, “thing”, “event”, “position”, and “time” are set as subordinate concepts of the concept object of “concept”.
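The construction loop above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the `Concept` class, the tuple layouts for the concept part and the relation part, and the `depth` helper are all assumptions made for the example. It also looks up the parent in a dictionary instead of tracing the tree from the root (steps S403 and S404), and it assumes each superordinate concept is read before its subordinate concepts, as in the walkthrough.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Concept:
    """One node of the concept tree (hypothetical stand-in for concept 301)."""
    concept_id: int
    name: str
    parent_id: Optional[int] = None                           # superordinate concept
    children: List["Concept"] = field(default_factory=list)   # subordinate concepts


def build_concept_tree(
    concept_part: List[Tuple[int, str]],
    relation_part: List[Tuple[int, int]],
) -> Tuple[Optional[Concept], Dict[int, Concept]]:
    """Build a tree from (id, name) rows and (parent_id, child_id) rows."""
    parent_of = {child: parent for parent, child in relation_part}
    nodes: Dict[int, Concept] = {}
    root: Optional[Concept] = None
    for concept_id, name in concept_part:                     # S401: read one concept
        node = Concept(concept_id, name, parent_of.get(concept_id))  # S402
        nodes[concept_id] = node
        parent = nodes.get(node.parent_id) if node.parent_id is not None else None
        if parent is not None:
            parent.children.append(node)                      # set as subordinate concept
        elif root is None:
            root = node                                       # S405: becomes the root element
        # Concepts whose parent has not been read yet are left unattached here.
    return root, nodes


def depth(nodes: Dict[int, Concept], concept_id: int) -> int:
    """Hierarchy level of a concept, counting the root as level 1."""
    level = 1
    while nodes[concept_id].parent_id is not None:
        concept_id = nodes[concept_id].parent_id
        level += 1
    return level
```

With a concept part of `[(1, "concept"), (2, "subject"), (3, "thing")]` and a relation part of `[(1, 2), (1, 3)]`, the root is “concept” and its subordinate concepts are “subject” and “thing”, each at level 2.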

[Important topic extraction processing]
When the concept tree construction process ends, the important topic extraction unit 102 starts the important topic extraction process.
In the important topic extraction process, first, one document is read from the document storage unit 105 (step S501). In this case, the document is assumed to consist of a heading 1201 and a body 1202 as shown in FIG. 12.
Next, an object shown in the document 302 of FIG. 3 is created in the memory 203. A sequential document identifier, in this case “0”, is automatically assigned and set, and the path to the document file, in this case “Concept.doc”, is set as the file path (step S502).
Next, all text including the heading 1201 and the body 1202 is extracted from the document, and one sentence is cut out (step S503). The first sentence is cut out of the captured text by, for example, detecting the first sentence-ending punctuation mark “.” or a line break. In this case, if the captured text is, for example, “concept is a general / general meaning of things. For a certain thing…”, step S503 cuts out “concept is a general / general meaning of things.”
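The sentence cut-out in step S503 can be illustrated with a short Python sketch. The function name `split_first_sentence` and the choice to treat both “.” and the Japanese full stop “。” as sentence terminators are assumptions for this example; the source only says that the first punctuation mark or a line break is detected.

```python
import re
from typing import Tuple


def split_first_sentence(text: str) -> Tuple[str, str]:
    """Return (first sentence, remaining text), cutting at the first
    sentence-ending mark or line break, whichever comes first."""
    match = re.search(r"[.。]|\n", text)
    if match is None:
        # No terminator found: the whole text is treated as one sentence.
        return text.strip(), ""
    end = match.end()
    return text[:end].strip(), text[end:].lstrip()
```

Calling the function repeatedly on the remaining text until it is empty yields the sentences one at a time, which corresponds to the loop back from step S514 to step S503.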
Next, the extracted text sentence is subjected to morphological analysis processing and decomposed into parts of speech (step S504). In the morphological analysis process, the extracted sentence is decomposed into words and part-of-speech information is generated. A known method can be used for such morphological analysis processing.
In this case, analysis processing is performed in the form of 1302 shown in FIG. 13 to obtain a list of words 1303.

Next, the first topic is acquired using the decomposed word list as the topic list (step S505). In this case, “concept” is acquired.
Next, it is determined whether or not “concept” exists in the concept tree constructed by the concept tree construction unit 101 (step S506). In this case, since “concept” exists in the concept tree, an object shown in the document classification 303 of FIG. 3 is generated (step S507).
Next, in the object shown in the document classification 303 of FIG. 3, “1”, the identifier of “concept”, is set as the concept identifier and “1” is set as the document identifier. The appearance frequency of the “concept” object shown in the concept 301 of FIG. 3 is incremented by 1, and the document creation date “June 15, 2003” is set as the appearance date and time (step S508). Each time the appearance frequency of the “concept” object shown in the concept 301 of FIG. 3 is incremented, the appearance date and time is updated to the creation date of the newest document.
Next, it is determined whether or not “concept” is a headword (step S509). If it is not a headword, its importance is determined by, for example, whether the appearance frequency of the “concept” object shown in the concept 301 of FIG. 3 exceeds a threshold set in a stored file (step S510). In this case, since “concept” is a headword, it is determined to be an important topic.

Next, the document classification object generated in step S507 is added to the document classification table 304 of FIG. 3 (step S512).
Next, it is confirmed whether or not the current topic is the last topic (step S513). In this case, since it is not the last topic, the next topic is acquired (step S505).
Thereafter, the steps from Step S505 to Step S513 are repeated, and if the current topic is the last topic, it is confirmed whether it is the last sentence (Step S514). In this case, since it is not the last sentence, the next sentence is extracted (step S503).
Thereafter, the steps S503 to S514 are repeated, and if the current sentence is the last sentence, it is confirmed whether or not it is the last document (step S515). In this case, since it is not the last document, the next document is read (step S501).
Thereafter, steps S501 to S515 are repeated, and if the current document is the last document, the process ends.
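The nested loop over documents, sentences, and topics can be sketched as below. Several things here are assumptions for illustration: the function name, the `(heading, body)` tuple layout, and the `(concept_id, document_id)` output rows are hypothetical; a simple regular-expression word split stands in for the morphological analysis of step S504; and the frequency threshold is applied after all documents have been counted, whereas the source evaluates it while iterating.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def extract_important_topics(
    documents: List[Tuple[str, str]],   # (heading, body) per document
    concept_ids: Dict[str, int],        # concept name -> concept identifier
    threshold: int,
) -> List[Tuple[int, int]]:
    """Return (concept_id, document_id) rows for topics judged important."""
    frequency: Counter = Counter()
    candidates: List[Tuple[str, int, bool]] = []
    for doc_id, (heading, body) in enumerate(documents):          # S501
        heading_words = set(re.findall(r"\w+", heading.lower()))
        for word in re.findall(r"\w+", (heading + "\n" + body).lower()):
            if word not in concept_ids:                            # S506: not in concept tree
                continue
            frequency[word] += 1                                   # S508: count the appearance
            candidates.append((word, doc_id, word in heading_words))

    table: List[Tuple[int, int]] = []
    seen = set()
    for word, doc_id, is_headword in candidates:
        # S509/S510: a headword is always important; otherwise use the threshold.
        important = is_headword or frequency[word] > threshold
        if important and (word, doc_id) not in seen:
            seen.add((word, doc_id))
            table.append((concept_ids[word], doc_id))              # S512
    return table
```

In this sketch a topic that appears only in body text must occur more than `threshold` times across the whole document set before it is added to the table, while a topic that appears in a heading is added regardless of its count.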

[Document classification table construction process]
When the important topic extraction process ends, the document classification table construction process by the document classification table construction unit 103 starts.
In the document classification table construction process, first, one document classification object is acquired from the document classification table 304 created by the important topic extraction unit 102 (step S601). In this case, a document classification whose important topic name is “concept” is acquired.
Next, it is determined whether or not the concept corresponding to the document classification whose important topic name is “concept” exists in the concept tree constructed by the concept tree construction unit 101 (step S602). In this case, since the concept “concept” exists in the concept tree, “1”, the level of this concept in the concept tree hierarchy, is set as the abstraction level of the document classification object (step S603), and the same value “1” is set as the current abstraction level of the document classification object (step S604).
Next, the document classification table is updated with the updated document classification (step S605).
Next, it is determined whether it is the last document classification (step S606). In this case, since it is not the last document classification, the next document classification is acquired (step S601).
Thereafter, steps S601 to S606 are repeated until the last document classification, and the document classification table shown in FIG. 14 is constructed.
This document classification table is an example of the document classification table 304 of FIG. 3. For example, one row 1401 of the table corresponds to the document classification 303 in FIG. 3, and the three columns labeled “concept tree representation” 1402 express, by concept names, the hierarchical structure corresponding to the current concept of the document classification 303.
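As a rough illustration of steps S601 to S606, the following Python sketch assigns each row an abstraction level equal to the depth of its concept in the tree. The `DocClass` fields, the `parent_of` mapping (concept identifier to superordinate identifier, `None` for the root), and the identifiers used in the usage note are hypothetical; the source describes the table only in prose.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DocClass:
    """One row of the document classification table (hypothetical layout)."""
    concept_id: int
    document_id: int
    abstraction: int            # level of the concept in the concept tree
    current_abstraction: int    # level at which the row is currently shown
    current_concept_id: int     # concept under which the row is currently shown


def level_of(parent_of: Dict[int, Optional[int]], concept_id: int) -> int:
    """Depth of a concept in the tree, counting the root as level 1."""
    level = 1
    while parent_of[concept_id] is not None:
        concept_id = parent_of[concept_id]
        level += 1
    return level


def build_table(
    entries: List[Tuple[int, int]],            # (concept_id, document_id)
    parent_of: Dict[int, Optional[int]],
) -> List[DocClass]:
    table = []
    for concept_id, document_id in entries:    # S601: take one document classification
        if concept_id not in parent_of:        # S602: concept must exist in the tree
            continue
        level = level_of(parent_of, concept_id)
        # S603 and S604: abstraction level and current abstraction level start equal.
        table.append(DocClass(concept_id, document_id, level, level, concept_id))
    return table
```

For example, with `parent_of = {1: None, 2: 1, 10: 2, 27: 10, 32: 27}`, an entry for concept 27 receives abstraction level 4 and an entry for concept 32 receives level 5, mirroring the “teacher” and “high school teacher” levels used in the walkthrough (the intermediate concept 10 is invented for the example).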

[Document group integration / division processing]
When the document classification table construction process ends, the document group integration / division process by the document group integration / division means 106 starts.
The document group integration / division process includes a document group integration process and a document group division process. When the input abstraction level is lower than the previous abstraction level, the document group integration process starts. When the input abstraction level is higher than the previous abstraction level, the document group division process starts.
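The choice between the two processes can be written as a small Python function. The function name and the string return values are assumptions for illustration, and the “unchanged” case is not described in the source; it is included only so that the function is total.

```python
def choose_process(previous_level: int, input_level: int) -> str:
    """Pick the process to run for a newly input abstraction level."""
    if input_level < previous_level:
        return "integrate"   # roll subordinate topics up into their superordinate topic
    if input_level > previous_level:
        return "divide"      # split an integrated topic back into subordinate topics
    return "unchanged"       # assumption: nothing to do when the level is the same
```

In the walkthrough that follows, moving from “5” to “4” selects integration, and moving from “4” back to “5” selects division.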
[Document group integration processing]
First, the document group integration process starts when the user inputs an abstraction level (step S701). In this case, it is assumed that “4” is input as the abstraction level.
Next, one document classification is acquired from the document classification table shown in FIG. 14 constructed by the document classification table construction unit 103 (step S702). In this case, a document classification whose important topic name is “concept” is acquired.
Next, the abstraction level input last time is compared with the abstraction level input this time (step S703). In this case, since the table has been expanded to the maximum abstraction level, “5” is set as the previously input abstraction level, so the input abstraction level is compared with the abstraction level of the document classification (step S710).
In this case, the current abstraction level of the document classification is “1”, the input abstraction level is “4”, and the input abstraction level is higher than the current abstraction level of the document classification, so the target is moved to the next document classification (step S702).
Thereafter, similarly, steps S702 to S710 are repeated up to the 31st line in FIG.

When the target document classification is the document classification on the 32nd line, whose important topic name is “high school teacher”, the input abstraction level is smaller than the current abstraction level of the document classification (step S711), so it is determined whether the abstraction level of the superordinate concept equals the input abstraction level (step S712).
In this case, the superordinate concept of the document classification whose important topic name is “high school teacher” is the document classification whose important topic name is “teacher”, and the abstraction level of this document classification is “4”. Since this matches the input abstraction level, the input abstraction level “4” is set as the current abstraction level (step S713).
Next, a superordinate concept is set to the current concept shown in the document classification 303 of FIG. 3 (step S714). In this case, a concept whose name is “teacher” is set. The concept objects set in the current concept form a tree structure having a superordinate concept and a subordinate concept.
Next, it is confirmed whether or not the current document classification is the last document classification (step S715). In this case, since it is not the last document classification, the target is moved to the next document classification (step S702).
Thereafter, steps S702 to S715 are repeated to construct the document classification table shown in FIG.
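The integration pass can be sketched in Python as follows. The dictionary-based row layout, the `parent_of` mapping, and the concept identifiers are hypothetical. The walkthrough only checks the immediate superordinate concept (step S712); this sketch generalizes that by walking up the tree until the input abstraction level is reached, which is an assumption about how deeper rows would be handled.

```python
from typing import Dict, List, Optional


def level_of(parent_of: Dict[int, Optional[int]], concept_id: int) -> int:
    """Depth of a concept in the tree, counting the root as level 1."""
    level = 1
    while parent_of[concept_id] is not None:
        concept_id = parent_of[concept_id]
        level += 1
    return level


def integrate(table: List[dict], parent_of: Dict[int, Optional[int]],
              input_level: int) -> None:
    """Roll rows deeper than input_level up to their ancestor at that level."""
    for row in table:                                    # S702: next document classification
        if input_level >= row["current_abstraction"]:    # S710/S711: nothing to integrate
            continue
        concept = row["current_concept"]
        while level_of(parent_of, concept) > input_level:
            concept = parent_of[concept]                 # S712: move to superordinate concept
        row["current_abstraction"] = input_level         # S713
        row["current_concept"] = concept                 # S714
```

The original `concept` and `abstraction` fields of each row are left untouched, so a later division pass can still recover where the row came from.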

[Document group split processing]
The document group division process starts with the user inputting an abstraction level (step S701). In this case, it is assumed that “5” is input as the abstraction level.
Next, one document classification is acquired from the document classification table shown in FIG. 15 constructed by the document integration process (step S702). In this case, a document classification whose important topic name is “concept” is acquired.
Next, the abstraction level input last time is compared with the abstraction level input this time (step S703). In this case, since “4” is set as the abstraction level input last time, it is determined whether the abstraction level of the document classification is different from the current abstraction level of the document classification (step S704). In this case, since the abstract level of the document category is “1” and the current abstract level of the document category is “1”, the target is moved to the next document category (step S702). Thereafter, similarly, steps S702 to S704 are repeated until the 31st line in FIG.
When the target document classification is the document classification on line 32, whose important topic name is “high school teacher”, the abstraction level of the document classification, “5”, differs from its current abstraction level, “4”, so the input abstraction level is compared with the abstraction level of the document classification (step S705).
Here, the abstraction level of the document classification is “5”, the input abstraction level is “5”, and the input abstraction level is less than or equal to the abstraction level of the document classification, so the subordinate concepts of the current concept of the document classification (“teacher” in FIG. 15) are traced (step S706), and it is determined whether or not each is a concept of the input abstraction level (step S707).

In this case, the subordinate concepts set in “teacher”, which is the current concept of the document classification whose important topic name is “high school teacher”, are the document classifications “high school teacher”, “junior high school teacher”, and “primary school teacher”. Their abstraction level is “5”, which matches the input abstraction level.
Next, it is determined whether or not each subordinate concept set in “teacher”, the current concept of the document classification whose important topic name is “high school teacher”, is a concept that includes the concept of the current document classification as a child concept (step S708). In this case, since the concept of the current document classification is “high school teacher” and the subordinate concepts are “high school teacher”, “junior high school teacher”, and “elementary school teacher”, “high school teacher” is selected.
Next, the current abstraction level is set to the input abstraction level (step S709). In this case, the input abstraction level “5” is set as the current abstraction level of “high school teacher”, which is the current document classification.
Next, the subordinate concept is set as the current concept shown in the document classification 303 of FIG. 3 (step S714). In this case, the concept whose name is “high school teacher” is set. The concept objects set as the current concept form a tree structure having superordinate and subordinate concepts.
Next, it is confirmed whether or not the current document classification is the last document classification (step S715). In this case, since it is not the last document classification, the target is moved to the next document classification (step S702).
Thereafter, steps S702 to S715 are repeated to construct the document classification table shown in FIG.
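The division pass can be sketched in the same style. The row layout, `parent_of` mapping, and identifiers are again hypothetical. The source traces downward from the current concept to find the subordinate concept that contains the row's own concept (steps S706 to S708); because each concept has a single superordinate concept in a tree, this sketch reaches the same node by walking upward from the row's own concept, which is a substitution made for brevity.

```python
from typing import Dict, List, Optional


def level_of(parent_of: Dict[int, Optional[int]], concept_id: int) -> int:
    """Depth of a concept in the tree, counting the root as level 1."""
    level = 1
    while parent_of[concept_id] is not None:
        concept_id = parent_of[concept_id]
        level += 1
    return level


def divide(table: List[dict], parent_of: Dict[int, Optional[int]],
           input_level: int) -> None:
    """Split integrated rows back down to input_level (or to their own level)."""
    for row in table:                                        # S702
        if row["abstraction"] == row["current_abstraction"]:
            continue                                         # S704: row was never integrated
        # A row cannot be shown deeper than its own concept.
        target = min(input_level, row["abstraction"])        # S705
        concept = row["concept"]
        while level_of(parent_of, concept) > target:
            concept = parent_of[concept]                     # ancestor at the target level
        row["current_abstraction"] = target                  # S709
        row["current_concept"] = concept                     # S714
```

Applied to a “high school teacher” row that is currently shown under “teacher” at level 4, an input level of 5 restores the row to its own concept at level 5.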

[Document group presentation processing]
When the document group integration / division process ends, the document group presentation process by the document group presentation unit 107 starts. In this case, it is assumed that the document classification table has been integrated at abstraction level “4”.
In the document group presentation process, first, one document classification object is acquired from the document classification table shown in FIG. 15 created by the document group integration / division unit 106 (step S801). In this case, a document classification whose important topic name is “concept” is acquired.
Next, it is determined whether or not the current abstraction level of the document classification object is equal to or less than the input abstraction level (step S802). In this case, since the input abstraction level is “4” and the current abstract level of the document classification whose important topic name is “concept” is “1”, which is below the input abstraction level, the subsequent processing Continue.
Next, it is confirmed from the document classification table 304 whether the same current concept as the current concept of the document classification object exists (step S803). In this case, since there is no document classification whose current concept is “concept”, the subsequent processing is continued.
Next, the display area is calculated from the number of documents belonging to the current concept (step S808). In this case, since the number of documents whose important topic name is “concept” is “1”, the display area is, for example, 32 dots vertically (32 × 1) and 32 dots horizontally (32 × 1). The larger the number of documents, the larger the display area.
Next, the position of the display area is calculated, and the display area of 32 dots vertically and 32 dots horizontally is drawn on a graph whose vertical axis represents the appearance frequency and whose horizontal axis represents the appearance date and time (step S809). In this case, the display area is positioned so that the coordinates of its center are the appearance frequency of the concept (for example, “1”) and the appearance date and time of the concept (for example, “June 15, 2003”).
Next, the name of the current concept is displayed in the display area (step S810). In this case, “concept” is displayed.

Next, a document file path is acquired based on the document identifier of the document classification object, and the document itself is displayed in the display area (step S811). In this case, FIG. 12 which is the content of the document “concept.doc” is displayed in the display area.
Next, it is confirmed whether or not the current document classification is the last document classification (step S812). In this case, since it is not the last document classification yet, the target is moved to the next document classification (step S801).
Thereafter, steps S801 to S812 are repeated in the same way up to the 26th line in FIG. When the target document classification becomes the one on the 27th line, whose important topic name is “teacher”, document classifications with the same current concept exist on the 32nd to 34th lines. Those document classifications are therefore extracted (step S804), and the appearance frequency of the current concept of each extracted document classification is added to the appearance frequency of the current concept of the current document classification (step S805). The appearance date and time of the current concept of each extracted document classification is compared with that of the current document classification, and the latter is updated if the extracted one is newer (step S806). The documents corresponding to the document identifiers of the extracted document classifications are also made display targets (step S807).
In this case, the document classification objects whose important topic names are “high school teacher”, “junior high school teacher”, and “primary school teacher” are extracted as document classifications with the same current concept. The appearance frequency of each is added to the appearance frequency of the “teacher” concept, the appearance date and time is updated after the comparison, and the documents with document identifiers 1 to 4 are made display targets as documents of the “teacher” concept.
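Steps S804 to S807 amount to folding every document classification that shares a current concept into one record. The following is a minimal sketch under stated assumptions: the class `DocumentClassification`, its field names, and the individual frequencies and dates are hypothetical values chosen only so that the merged result matches the example figures in the text (total frequency 100, latest date September 20, 2007, documents 1 to 4).

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DocumentClassification:
    # Hypothetical record layout; the actual table columns may differ.
    topic_name: str
    current_concept: str
    frequency: int
    appeared_on: date
    document_ids: set[int] = field(default_factory=set)


def merge_same_concept(target, table):
    """Return a copy of `target` with all same-concept rows of `table` folded in."""
    merged = DocumentClassification(
        target.topic_name, target.current_concept,
        target.frequency, target.appeared_on, set(target.document_ids))
    for other in table:
        if other is target or other.current_concept != target.current_concept:
            continue                                  # S804: extract same-concept rows only
        merged.frequency += other.frequency           # S805: add appearance frequencies
        if other.appeared_on > merged.appeared_on:    # S806: keep the newer date
            merged.appeared_on = other.appeared_on
        merged.document_ids |= other.document_ids     # S807: also display their documents
    return merged


# Illustrative rows (values are assumptions, not taken from the document classification table).
teacher = DocumentClassification("teacher", "teacher", 40, date(2005, 4, 1), {1})
table = [
    teacher,
    DocumentClassification("high school teacher", "teacher", 30, date(2007, 9, 20), {2}),
    DocumentClassification("junior high school teacher", "teacher", 20, date(2006, 1, 10), {3}),
    DocumentClassification("primary school teacher", "teacher", 10, date(2004, 7, 7), {4}),
]
merged = merge_same_concept(teacher, table)
print(merged.frequency, merged.appeared_on, sorted(merged.document_ids))
```

The sketch returns a new record instead of modifying the table in place, which keeps the original rows available when the abstraction level is changed again.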

Next, the display area is calculated from the number of documents belonging to the current concept (step S808). In this case, the number of documents is “4”, so the display area is, for example, 32 × 4 = 128 dots in the vertical direction and 32 × 4 = 128 dots in the horizontal direction. The larger the number of documents, the larger the display area.
Next, the position of the display area is calculated, and the display area itself of 128 dots vertically and 128 dots horizontally is displayed on the graph where the vertical axis represents the appearance frequency and the horizontal axis represents the appearance date and time (step S809). In this case, the display area is positioned so that the coordinates of its center are the appearance frequency of the teacher (for example, “100”) and the appearance date and time of the teacher (for example, “September 20, 2007”).
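The positioning of step S809 can be illustrated with a simple linear mapping from graph coordinates to screen coordinates. The specification does not state how the axes are scaled, so the axis ranges, the canvas size, and the function names `to_pixels` and `display_rectangle` below are all assumptions made for this sketch.

```python
from datetime import date


def to_pixels(appeared_on, frequency, *, first_day, last_day,
              max_frequency, width, height):
    """Map (appearance date, appearance frequency) to canvas pixel coordinates."""
    span_days = (last_day - first_day).days
    if span_days <= 0 or max_frequency <= 0:
        raise ValueError("axis ranges must be positive")
    x = (appeared_on - first_day).days / span_days * width
    # Pixel y grows downward, so higher frequencies are drawn nearer the top.
    y = height - frequency / max_frequency * height
    return round(x), round(y)


def display_rectangle(center, side):
    """Return (left, top, right, bottom) of a square of the given side centered on `center`."""
    cx, cy = center
    half = side // 2
    return cx - half, cy - half, cx + half, cy + half


# "teacher" example from the text: frequency 100, appearance date September 20, 2007.
# The axis ranges and the 1000 x 600 canvas are hypothetical.
center = to_pixels(date(2007, 9, 20), 100,
                   first_day=date(2003, 1, 1), last_day=date(2008, 1, 1),
                   max_frequency=200, width=1000, height=600)
print(display_rectangle(center, 128))
```

Keeping the coordinate mapping separate from the rectangle calculation makes it straightforward to shift a display area afterwards, for example to avoid overlapping figures.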
Next, the name of the current concept is displayed in the display area (step S810). In this case, “teacher” is displayed.
Next, a document file path is acquired based on the document identifier of the document classification object, and the document itself is displayed in the display area (step S811). In this case, the contents of the documents with document identifiers 1 to 4 are displayed in the display area. Thereafter, the document classification on the 35th to 37th lines is processed in the same manner.
Finally, it is confirmed whether or not the current document classification is the last document classification (step S812). In this case, since it is the last document classification, it ends.
By performing the above-described processing, a screen such as that shown in FIG. 17 is displayed, for example. FIG. 17 is a diagram showing an example of a screen in which “4” is specified as the abstraction level. To keep the display easy to read, the coordinates of the center position of each displayed “current concept” display area (the appearance frequency and appearance date and time of each important topic name) do not match the above description.

The advantages of the present invention described in the above embodiment will be described below with reference to FIGS.
FIG. 17 is a screen display example in which “4” is specified as the abstraction level: 1701 is a slide bar for specifying the abstraction level, 1702 is a document display area, 1705 is the “concept” document group, 1704 is the “teacher” document group, and 1703 is the “car” document group.
FIG. 18 is a screen display example in which “5” is input as the abstraction level: 1801 is a slide bar for specifying the abstraction level, 1802 is a document display area, 1806 is the “concept” document group, 1807 is the “teacher” document group, 1808 is the “high school teacher” document group, 1809 is the “junior high school teacher” document group, 1810 is the “primary school teacher” document group, 1803 is the “sports” document group, 1804 is the “RV” document group, and 1805 is the “sedan” document group.
When the abstraction level transitions from “4” to “5”, the teacher 1704 in FIG. 17 can be regarded as having been divided into the teacher 1807, the high school teacher 1808, the junior high school teacher 1809, and the primary school teacher 1810 in FIG. 18. In FIG. 18, the “concept” display area has been moved to the vicinity of the center of the display to avoid overlapping figures that would be difficult to see, so the coordinates of the center position of each displayed “current concept” display area (the appearance frequency and appearance date and time of each important topic name) do not match the above description.
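The relationship between FIG. 17 and FIG. 18 can be reproduced by mapping each important topic to its nearest ancestor at or below the input abstraction level. The sketch below recomputes the grouping directly from a concept tree fragment rather than updating the document classification table incrementally as the integration / division means does. The `PARENT` and `LEVEL` tables are assumptions: the text gives level 1 for “concept” and level 4 for “teacher”, while the levels of the car-related topics and all parent links are inferred from the figures.

```python
# Hypothetical fragment of the concept tree (child -> parent).
PARENT = {
    "high school teacher": "teacher",
    "junior high school teacher": "teacher",
    "primary school teacher": "teacher",
    "sports": "car",
    "RV": "car",
    "sedan": "car",
}
# Abstraction level = depth of the concept in the tree (assumed for the car branch).
LEVEL = {
    "concept": 1, "teacher": 4, "car": 4,
    "high school teacher": 5, "junior high school teacher": 5,
    "primary school teacher": 5, "sports": 5, "RV": 5, "sedan": 5,
}
TOPICS = ["concept", "teacher", "high school teacher", "junior high school teacher",
          "primary school teacher", "sports", "RV", "sedan"]


def current_concept(topic, input_level, parent, level):
    """Walk up the tree until the concept is at or below the input abstraction level."""
    node = topic
    while level[node] > input_level:
        node = parent[node]
    return node


def group_topics(topics, input_level, parent, level):
    """Group important topics by their current concept at the given abstraction level."""
    groups = {}
    for topic in topics:
        key = current_concept(topic, input_level, parent, level)
        groups.setdefault(key, []).append(topic)
    return groups


print(group_topics(TOPICS, 4, PARENT, LEVEL))  # integrated view, as in FIG. 17
print(group_topics(TOPICS, 5, PARENT, LEVEL))  # divided view, as in FIG. 18
```

At level 4 the topics collapse into the three groups “concept”, “teacher”, and “car”; at level 5 each of the eight topics forms its own group.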

As described above, according to the present embodiment, the result of integrating and dividing document groups can be created for a specified abstraction level. Using that result, a screen such as those shown in FIGS. 17 and 18 can be constructed, in which the document space is treated as a map and the document groups to be classified can be semantically enlarged (document groups divided) and reduced (document groups integrated).
In other words, in this embodiment, a semantic space of the words in the documents is constructed by associating the words appearing in each document with a hierarchical structure of word meanings, the topics are projected onto a graph, and enlargement and reduction are performed in terms of word meaning. By treating the document space as a map and semantically enlarging and reducing concepts, a user interface can be constructed that presents the concept of interest in more detail, and a document group presentation device that allows analysis work to be performed efficiently can be provided.
The invention made by the present inventor has been specifically described above based on the embodiment. However, the present invention is not limited to the above embodiment, and various modifications can of course be made without departing from the scope of the invention.

It is a figure which shows the functional block of the document group presentation apparatus of the Example of this invention.
It is a block diagram which shows the hardware constitutions of the computer which performs the document group presentation apparatus of the Example of this invention.
It is a class diagram which shows the data structure in the document group presentation apparatus of the Example of this invention.
It is a flowchart which shows the process sequence of the concept tree construction means of the document group presentation apparatus shown in FIG.
It is a flowchart which shows the process sequence of the important topic extraction means of the document group presentation apparatus shown in FIG.
It is a flowchart which shows the process sequence of the document classification table construction means of the document group presentation apparatus shown in FIG.
It is a flowchart which shows the process sequence of the document group integration / division means of the document group presentation apparatus shown in FIG.
It is a flowchart which shows the process sequence of the document group presentation means of the document group presentation apparatus shown in FIG.
It is a figure which shows an example of the conceptual part of the concept dictionary stored in the concept dictionary storage means shown in FIG.
It is a figure which shows an example of the relationship part of the concept dictionary stored in the concept dictionary storage means shown in FIG.
It is a figure which shows an example of the concept tree constructed by the concept tree construction means of the document group presentation apparatus shown in FIG.
It is a figure which shows an example of the document stored in the document storage means shown in FIG.
It is a figure which shows an example of the important topics extracted by the important topic extraction means of the document group presentation apparatus shown in FIG.
It is a figure which shows an example of the document classification table in case the abstraction level is 5 in the document group presentation apparatus of the Example of this invention.
It is a figure which shows an example of the document classification table when the abstraction level is changed from 5 to 4 in the document group presentation apparatus of the Example of this invention.
It is a figure which shows an example of the document classification table when the abstraction level is changed from 4 to 5 in the document group presentation apparatus of the Example of this invention.
It is a figure which shows an example of the screen display in case the abstraction level is designated 4 in the document group presentation apparatus of the Example of this invention.
It is a figure which shows an example of the screen display in case the abstraction level is changed from 4 to 5 in the document group presentation apparatus of the Example of this invention.

Explanation of symbols

11 Document group presentation device
101 Concept tree construction means
102 Important topic extraction means
103 Document classification table construction means
104 Concept dictionary storage means
105 Document storage means
106 Document group integration / division means
107 Document group presentation means
108 Abstraction level input means
109 Display means
201 Display
202 CPU
203 Memory
204 Keyboard / Mouse
205 Hard Disk
206 CD-ROM Drive
207 CD-ROM
208 Communication Circuit
209 Internet
2051 Document Presentation Program
2052 Concept Dictionary Database
2053 Document Database
1701, 1801 Slide Bar
1702, 1802 Document Display Area

Claims (3)

  1. A document group presentation device for presenting a document group of a document management system having functions for inputting, registering, storing, searching, and displaying documents, the device comprising:
    a concept tree construction means for reading a concept dictionary and constructing word concepts as a tree structure;
    an important topic extraction means for reading a plurality of documents, extracting words, and determining and extracting important topics according to the number of occurrences of each word or whether it is a headword;
    a document classification table construction means for constructing, based on the concept tree constructed by the concept tree construction means and the important topics extracted by the important topic extraction means, a document classification table that expresses the concept identifier, document identifier, abstraction level, and current abstraction level necessary for presenting a document group;
    a document group integration / division means for analyzing the document classification table constructed by the document classification table construction means and updating the document classification table based on the abstraction level input this time, the abstraction level input last time, and the current abstraction level; and
    a presentation means for presenting the document group based on the document classification table updated by the document group integration / division means.
  2. The document group presentation device according to claim 1, wherein:
    the document classification table construction means, when an important topic extracted by the important topic extraction means exists in the concept tree constructed by the concept tree construction means, sets the concept identifier of the important topic as the concept identifier of the document classification table, sets the document identifier of the document associated with the important topic as the document identifier of the document classification table, and sets the hierarchy level of the important topic on the concept tree as the abstraction level and the current abstraction level of the document classification table;
    the document group integration / division means, when the previously input abstraction level is greater than the abstraction level input this time, updates the document classification table, based on the abstraction level input this time and the current abstraction level of the document classification table, so as to integrate the important topics of subordinate concepts into the important topic at the abstraction level input this time, and, when the previously input abstraction level is less than the abstraction level input this time, updates the document classification table, based on the abstraction level input this time and the current abstraction level of the document classification table, so as to divide the important topic of the superordinate concept, into which the important topics of the subordinate concepts were integrated, into the important topics at the abstraction level input this time; and
    the presenting means, based on the document classification table updated by the document group integration / division means, presents each document group associated with an important topic at or below the abstraction level input this time, on a graph whose vertical axis is the appearance frequency and whose horizontal axis is the appearance date and time, with an area proportional to the number of documents associated with that important topic.
  3. A document group presentation program for presenting a document group of a document management system having functions for inputting, registering, storing, searching, and displaying a document,
    The document group presentation program causes a computer to realize each means of the document group presentation device according to claim 1 or 2.
JP2007308151A 2007-11-29 2007-11-29 Document group presentation device and document group presentation program Expired - Fee Related JP5112027B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2007308151A JP5112027B2 (en) 2007-11-29 2007-11-29 Document group presentation device and document group presentation program

Publications (2)

Publication Number Publication Date
JP2009134378A JP2009134378A (en) 2009-06-18
JP5112027B2 true JP5112027B2 (en) 2013-01-09

Family

ID=40866221

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2007308151A Expired - Fee Related JP5112027B2 (en) 2007-11-29 2007-11-29 Document group presentation device and document group presentation program

Country Status (1)

Country Link
JP (1) JP5112027B2 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5436356B2 (en) * 2010-07-05 2014-03-05 日本電信電話株式会社 Period-specific subject phrase extraction apparatus, method, and program
JP6511954B2 (en) 2015-05-15 2019-05-15 富士ゼロックス株式会社 Information processing apparatus and program
WO2017158812A1 (en) * 2016-03-18 2017-09-21 株式会社日立製作所 Data classification method and data classification device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3577819B2 (en) * 1995-07-14 2004-10-20 富士ゼロックス株式会社 Information search apparatus and information search method
JP4084445B2 (en) * 1996-07-18 2008-04-30 松下電器産業株式会社 Data search support device, data search support method, and medium storing program
JP3001460B2 (en) * 1997-05-21 2000-01-24 日本電気株式会社 Document classification apparatus
JP2000348041A (en) * 1999-06-03 2000-12-15 Nec Corp Document retrieval method, device therefor and mechanically readable recording medium
JP3925003B2 (en) * 1999-09-29 2007-06-06 富士ゼロックス株式会社 Document processing apparatus and document processing method
JP3880534B2 (en) * 2003-04-02 2007-02-14 淳 安達 Document classification method and document classification program
JP4650293B2 (en) * 2006-02-15 2011-03-16 富士フイルム株式会社 Image classification display device and image classification display program

Also Published As

Publication number Publication date
JP2009134378A (en) 2009-06-18

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20100929

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20120703

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20120830

A711 Notification of change in applicant

Free format text: JAPANESE INTERMEDIATE CODE: A712

Effective date: 20120830

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20121009

A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20121010

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20151019

Year of fee payment: 3

LAPS Cancellation because of no payment of annual fees