WO2016132558A1 - Information processing device and method, and program - Google Patents

Information processing device and method, and program

Publication number
WO2016132558A1
WO2016132558A1 PCT/JP2015/054890 JP2015054890W WO2016132558A1 WO 2016132558 A1 WO2016132558 A1 WO 2016132558A1 JP 2015054890 W JP2015054890 W JP 2015054890W WO 2016132558 A1 WO2016132558 A1 WO 2016132558A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
concept
database
information processing
processing apparatus
Prior art date
Application number
PCT/JP2015/054890
Other languages
French (fr)
Japanese (ja)
Inventor
ヤコブ ハルスコウ
秀樹 武田
Original Assignee
株式会社Ubic
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社Ubic filed Critical 株式会社Ubic
Priority to PCT/JP2015/054890 priority Critical patent/WO2016132558A1/en
Publication of WO2016132558A1 publication Critical patent/WO2016132558A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present invention relates to an information processing apparatus, method, and program, and is suitably applied to, for example, an information processing apparatus that monitors electronic mail.
  • Patent Document 1 discloses an abnormality detection system that efficiently detects an abnormality that occurs in a control system and isolates the control system in which the abnormality is recognized.
  • the present invention has been made in consideration of the above points, and proposes an information processing apparatus, method, and program that improve convenience for the user by presenting the entire image of the data to the user.
  • the information processing apparatus includes: a database creation unit that creates a database in which a selected target concept is associated with a data element that is a subordinate concept of the target concept; a summary creation unit that extracts, from the data, data including a data element registered in the database and creates a summary that expresses the content of the extracted data in a superordinate concept of the data element; and a display unit that classifies, based on the summary, the data including the data elements registered in the database and displays the classification result.
  • the information processing method includes: a first step in which the information processing apparatus creates a database in which a selected target concept is associated with a data element that is a subordinate concept of the target concept; a second step in which the information processing apparatus extracts, from the data, data including a data element registered in the database and creates a summary expressing the content of the extracted data in a superordinate concept of the data element; and a third step in which the information processing apparatus classifies, based on the summary, the data including the data elements registered in the database and displays the classification result.
  • the program causes the information processing apparatus to execute a process including: a first step of creating a database in which a selected target concept is associated with a data element that is a subordinate concept of the target concept; a second step of extracting, from the data, data including a data element registered in the database and creating a summary expressing the content of the extracted data in a superordinate concept of the data element; and a third step of classifying, based on the summary, the data including the data elements registered in the database and displaying the classification result.
  • the user can grasp the entire image of the data based on the display result of the information processing apparatus, so the effort of checking each item of data individually can be omitted.
  • FIG. 1 is a block diagram showing the schematic configuration of the information processing apparatus according to this embodiment.
  • FIG. 2 is a graph used to describe the electronic dictionary.
  • FIG. 3A is a conceptual diagram used for the outline
  • FIG. 3B is a diagram showing an example of the display format of a classification result.
  • A conceptual diagram is used to describe the target concept.
  • A conceptual diagram shows the schematic structure of the extracted e-mail management table.
  • FIG. 8 is a graph used to describe the abstraction level filtering process.
  • FIG. 9 is a flowchart showing the procedure of the display process.
  • reference numeral 1 denotes an information processing device to which this embodiment is applied as a whole.
  • the information processing apparatus 1 is a computer device equipped with an e-mail monitoring function, which monitors e-mail distributed through a network 2 such as an in-house LAN (Local Area Network) and notifies an administrator when a specific preset keyword is detected in the e-mail data (including the subject, text, and attached files), and with a topic detection function to be described later.
  • the information processing apparatus 1 includes a CPU 10, a memory 11, a hard disk device 12, an interface 13, an input device 14, and a display device 15.
  • the CPU 10 is a processor (controller) having a function for controlling the operation of the entire information processing apparatus 1.
  • the memory 11 is composed of, for example, a nonvolatile semiconductor memory and is used as a work memory for the CPU 10.
  • the memory 11 stores an email monitoring program 20, a topic detection program 21, and an extracted email management table 22.
  • the email monitoring program 20 is a program that executes various processes for realizing the email monitoring function described above. Details of the topic detection program 21 and the extracted email management table 22 will be described later.
  • the hard disk device 12 is used for storing various programs and various data for a long period of time.
  • the hard disk device 12 stores an electronic dictionary 23 and a target concept extraction database 24.
  • the electronic dictionary 23 is a dictionary in which Japanese words and concepts are classified hierarchically and recorded in a systematic form. By using this electronic dictionary 23, it is possible to construct a graph representing the vertical relationship of concepts as shown in FIG. 2, for example. Details of the target concept extraction database 24 will be described later.
  • the input device 14 is composed of, for example, a keyboard and a mouse, and is used by a user to perform operation input and settings.
  • the display device 15 includes a liquid crystal display and is used for displaying various types of information.
  • the topic detection function installed in the information processing apparatus 1 will be described.
  • with the topic detection function, the information processing apparatus 1 extracts, from the e-mails distributed through the network 2 within a predetermined period, the e-mails whose text contains keywords of pre-selected concepts (hereinafter referred to as target concepts), creates a summary of the content of each extracted e-mail at a moderate level of abstraction, classifies (clusters) the e-mails based on the created summaries, and displays an entire image of the corresponding e-mails within the predetermined period.
  • Such a topic detection function is realized by two phases, a preparation phase and an application phase.
  • the preparation phase is a phase in which only the keywords of the subordinate concepts of each target concept preset by the user are extracted from the electronic dictionary 23 (FIG. 1), and the target concept extraction database 24 is created by associating the extracted keywords with the corresponding target concepts.
  • the application phase is a phase in which the target concept extraction database 24 created in the preparation phase is used to create a summary expressing the contents of each corresponding e-mail in a superordinate concept, the corresponding e-mails are classified based on the created summaries, and the classification result is displayed in response to a request from the user.
  • the “corresponding e-mail” mentioned here refers to an e-mail including the keyword registered in the target concept extraction database 24 in the text. The same applies to the following.
  • the user selects several target concepts corresponding to the topic to be detected from the text of the email, and registers the selected target concepts in the information processing apparatus 1 in advance.
  • for example, when the topics to be detected are “injustice” and “dissatisfaction”, the concept categories are “behavior”, “emotion”, “nature and state”, “risk”, and “money”, and target concepts are set for each category, such as “dangerous” for “risk” and “money paid for human labor” for “money”.
  • the information processing apparatus 1 searches the electronic dictionary 23 for keywords representing the subordinate concepts of each registered target concept, and creates the above-described target concept extraction database 24, in which each keyword found by the search is associated with the corresponding target concept.
  • using the target concept extraction database 24 created as described above, the information processing apparatus 1 extracts, from the e-mails distributed through the network 2, the e-mails whose text contains a keyword registered in the target concept extraction database 24. Further, the information processing apparatus 1 creates a summary of each e-mail extracted in this way, using the superordinate concept of the keyword detected at that time.
  • the information processing apparatus 1 then classifies the corresponding e-mails within a predetermined period according to their content, based on the summaries created in this way, and presents the classification result to the user.
  • the memory 11 (FIG. 1) of the information processing apparatus 1 stores the topic detection program 21 and the extracted email management table 22, as described above with reference to FIG. 1.
  • the topic detection program 21 is a program for executing various processes related to the topic detection function described above, and includes a database creation unit 30, a summary creation unit 31, and a display unit 32 as shown in FIG.
  • the database creation unit 30 is a module having a function of creating the above-described target concept extraction database 24 based on the target concept set by the user.
  • the summary creation unit 31 is a module having a function of extracting an e-mail including a keyword registered in the target concept extraction database 24 in the text and creating the summary.
  • the display unit 32 is a module having a function of classifying corresponding e-mails using the summary in accordance with a request from the user and displaying an entire image of the corresponding e-mails within a predetermined period.
  • the extracted e-mail management table 22 is a table used for managing e-mails that are extracted in the application phase and that include keywords registered in the target concept extraction database 24 in the text.
  • the extracted email management table 22 includes a transmission date / time column 22A, a content column 22B, a transmission source address column 22C, a transmission destination address column 22D, and the like.
  • the transmission date and time column 22A stores the date and time when the electronic mail was transmitted from the transmission source.
  • the content column 22B stores the above-described summary created for the electronic mail.
  • the transmission source address column 22C stores the email address of the sender of the email.
  • the transmission destination address column 22D stores the email address of the recipient of the email.
  • FIG. 6, FIG. 7 and FIG. 9 show specific process contents of various processes executed in the information processing apparatus 1 in relation to the topic detection function described above.
  • in the following, the processing entity of each process is described as a module (“... unit”); in practice, the CPU 10 executes the process based on that module.
  • FIG. 6 shows a flow of a series of processes in the preparation phase. This processing (hereinafter referred to as database creation processing) is executed by the database creation unit 30.
  • when the database creation unit 30 starts the database creation process shown in FIG. 6, it first waits for one or more target concepts to be selected (SP1).
  • the database creation unit 30 then searches the electronic dictionary for the subordinate concepts of each target concept selected at that time, and extracts all of those subordinate concepts (SP2).
  • the database creation unit 30 extracts, from the electronic dictionary, all keywords related to the subordinate concepts for all the subordinate concepts extracted in step SP2 (SP3).
  • the database creation unit 30 creates the target concept extraction database 24 in which all the keywords extracted in step SP3 are associated with the corresponding target concepts (SP4). Then, the database creation unit 30 thereafter ends this database creation process.
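The preparation phase (SP1 to SP4) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the dictionary contents, the function names, and the choice to treat every subordinate concept as a keyword (merging SP2 and SP3) are assumptions made for the example.

```python
from collections import deque

# Hypothetical stand-in for the electronic dictionary 23:
# each concept maps to its direct subordinate concepts.
DICTIONARY = {
    "risk": ["danger", "hazard"],
    "danger": ["dangerous", "unsafe"],
    "hazard": ["hazardous"],
    "money": ["wage"],
    "wage": ["salary", "overtime pay"],
}


def subordinate_keywords(target, dictionary):
    """Collect every concept below `target` (SP2 and SP3, merged here)."""
    found = []
    seen = set()
    queue = deque(dictionary.get(target, []))
    while queue:
        concept = queue.popleft()
        if concept in seen:
            continue
        seen.add(concept)
        found.append(concept)
        queue.extend(dictionary.get(concept, []))
    return found


def create_target_concept_db(targets, dictionary):
    """Associate each keyword with the target concepts it falls under (SP4)."""
    db = {}
    for target in targets:
        for keyword in subordinate_keywords(target, dictionary):
            db.setdefault(keyword, []).append(target)
    return db


# SP1: the user has selected the target concepts "risk" and "money".
db = create_target_concept_db(["risk", "money"], DICTIONARY)
```

Keying the result by keyword rather than by target concept is a design choice for this sketch: it makes the later lookup "is this morpheme registered, and under which target concept?" a single dictionary access.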
  • FIG. 7 shows, within the series of processes in the application phase, the flow from extracting an e-mail whose text includes a keyword registered in the target concept extraction database 24 to creating a summary of that e-mail.
  • This processing (hereinafter referred to as summary creation processing) is executed by the summary creation unit 31.
  • when the summary creation unit 31 starts the summary creation process illustrated in FIG. 7, it first selects one e-mail to be analyzed from among the e-mails captured from the network 2 for the above-described e-mail monitoring function (SP10).
  • the summary creation unit 31 then performs morphological analysis on the text of the selected e-mail to divide the text into individual morphemes (the smallest units of the language that carry meaning) (SP11). It searches the target concept extraction database 24 for each morpheme obtained and determines whether any of them is registered as a keyword in the target concept extraction database 24 (SP12).
  • if the summary creation unit 31 obtains a negative result in this determination, it returns to step SP10 and moves on to the next unprocessed e-mail.
  • if the summary creation unit 31 obtains a positive result in the determination at step SP12, then for each morpheme obtained in step SP11 that is registered as a keyword in the target concept extraction database 24, it refers to the target concept extraction database 24 and detects each target concept that is a superordinate concept of that morpheme (keyword) (SP13).
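Steps SP11 to SP13 can be illustrated with the following sketch. It assumes a toy keyword table and replaces real morphological analysis (which for Japanese text would need a dedicated analyzer) with a simple lowercase word split; both are assumptions for the example, as are the function names.

```python
# Hypothetical excerpt of the target concept extraction database 24:
# keyword -> target concepts it is registered under.
TARGET_DB = {
    "dangerous": ["risk"],
    "salary": ["money"],
}


def tokenize(text):
    """Stand-in for morphological analysis (SP11): strip punctuation, split on spaces."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()


def detect_target_concepts(text, target_db):
    """Return {keyword: target concepts} for registered keywords in the text (SP12, SP13)."""
    hits = {}
    for token in tokenize(text):
        if token in target_db:
            hits[token] = target_db[token]
    return hits


hits = detect_target_concepts("This salary is unfair, and the work is dangerous.", TARGET_DB)
skipped = detect_target_concepts("Lunch at noon?", TARGET_DB)
```

An empty result corresponds to the negative branch at SP12, where the process moves on to the next unprocessed e-mail.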
  • the summary creation unit 31 executes an abstraction filtering process for extracting a concept having a predetermined abstraction from the subordinate concepts for each target concept detected in step SP13 (SP14). The reason is that a summary created using a concept that is too high does not let the user recognize the content of the e-mail; the summary is therefore created using a superordinate concept whose abstraction level still lets the user recognize that content.
  • in the abstraction level filtering process, as shown in FIG. 8, the summary creation unit 31 works on the graph showing the hierarchical relationship of concepts constructed by using the electronic dictionary, as described above with reference to FIG. 2. From among the keywords registered in the target concept extraction database 24 for each target concept, it detects, as the superordinate concept used for the summary, the superordinate concept whose average distance to the leaf-level keywords (keywords that do not have subordinate concepts, corresponding to “leaf_1” to “leaf_3” in FIG. 8) is less than a preset threshold and is the largest.
  • for example, the average distance from the node “C:” to the three leaf nodes “leaf_1” to “leaf_3” is calculated by dividing the total distance from the node “C:” to the three leaf nodes by the number of leaf nodes.
  • the distance from the node “leaf_1” to the node “C:” and the distance from the node “leaf_2” to the node “C:” are both “2”.
  • the total distance, summed over all three leaf nodes, is “5” (which implies a distance of “1” for “leaf_3”). Therefore, 5/3 (≈ 1.67), obtained by dividing “5” by the number of leaf nodes “3”, is the average distance from the node “C:” to the three leaf nodes “leaf_1” to “leaf_3”.
  • that is, in step SP14, for each target concept detected in step SP13, the summary creation unit 31 selects, from among the keywords registered in the target concept extraction database 24, all the keywords that are superordinate to the morpheme (keyword) detected in step SP12, calculates the average distance from each of these superordinate concepts to the leaf nodes, and extracts the one superordinate concept whose average distance is smaller than the preset threshold and closest to the threshold.
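The average-distance rule can be checked with a small sketch. The tree below is an assumed reconstruction that reproduces the distances stated in the text (2, 2, and a total of 5 from node “C:”); the intermediate node "B", the root "A", and the threshold value 2.0 are hypothetical, and the hierarchy is assumed to be a tree with one parent per node.

```python
from collections import deque

# Hypothetical concept hierarchy: parent -> direct subordinate concepts.
CHILDREN = {
    "A": ["C"],
    "C": ["B", "leaf_3"],
    "B": ["leaf_1", "leaf_2"],
}


def leaf_distances(node, children):
    """Edge distance from `node` to every leaf below it (breadth-first)."""
    distances = []
    queue = deque([(node, 0)])
    while queue:
        current, depth = queue.popleft()
        kids = children.get(current, [])
        if not kids:
            distances.append(depth)
        for kid in kids:
            queue.append((kid, depth + 1))
    return distances


def average_leaf_distance(node, children):
    distances = leaf_distances(node, children)
    return sum(distances) / len(distances)


def ancestors(node, children):
    """Superordinate concepts of `node`, nearest first (tree assumed)."""
    parents = {kid: parent for parent, kids in children.items() for kid in kids}
    chain = []
    while node in parents:
        node = parents[node]
        chain.append(node)
    return chain


def select_summary_concept(keyword, children, threshold):
    """Pick the ancestor whose average leaf distance is below, and closest to, the threshold."""
    candidates = [(average_leaf_distance(a, children), a) for a in ancestors(keyword, children)]
    below = [c for c in candidates if c[0] < threshold]
    return max(below)[1] if below else None


chosen = select_summary_concept("leaf_1", CHILDREN, threshold=2.0)
```

With this tree, node "C" has an average leaf distance of 5/3, "B" has 1, and "A" has 8/3, so a threshold of 2.0 selects "C"; lowering the threshold yields a less abstract concept.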
  • the summary creation unit 31 creates a summary of the email by arranging the superordinate concepts extracted in this way for each target concept (SP15), stores the necessary information regarding the email in the extracted e-mail management table 22 described above (SP16), and then returns to step SP10.
  • FIG. 9 shows, within the series of processing in the application phase, the flow of processing executed in the information processing apparatus 1 when the user gives an instruction to display the entire image of the corresponding e-mails within a predetermined period (hereinafter referred to as the entire image display instruction). This process (hereinafter referred to as display process) is executed by the display unit 32 (FIG. 1).
  • the display unit 32 starts the display process shown in FIG. 9 when the input device 14 is operated and the entire image display instruction is given.
  • the display unit 32 first classifies, according to the contents of their summaries, all of the e-mails registered in the extracted email management table 22 that were transmitted from the transmission source within the predetermined period (SP20).
  • as the classification method at this time, for example, e-mails whose summary contents completely match may be classified into the same group, or e-mails may be classified into the same group when the superordinate concepts of the concepts constituting their summaries match, even if the summary contents themselves do not completely match.
  • the display unit 32 displays the classification result of step SP20 on the display device 15 (FIG. 1) in the predetermined format described above with reference to FIG. 3B, for example (SP21), and thereafter ends this display processing.
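The exact-match variant of the classification in the display process can be sketched as below. The record layout mirrors the columns of the extracted email management table 22, but the field names, sample rows, and the share calculation used for display are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ExtractedEmail:
    sent_at: datetime   # transmission date and time column 22A
    summary: tuple      # content column 22B (superordinate concepts)
    source: str         # transmission source address column 22C
    destination: str    # transmission destination address column 22D


def classify_by_summary(rows, start, end):
    """Group e-mails sent in [start, end) whose summaries match exactly."""
    groups = defaultdict(list)
    for row in rows:
        if start <= row.sent_at < end:
            groups[row.summary].append(row)
    return dict(groups)


def shares(groups):
    """Fraction of the classified e-mails that falls into each group."""
    total = sum(len(members) for members in groups.values())
    return {key: len(members) / total for key, members in groups.items()} if total else {}


rows = [
    ExtractedEmail(datetime(2015, 2, 2), ("danger", "wage"), "a@example.com", "b@example.com"),
    ExtractedEmail(datetime(2015, 2, 9), ("danger", "wage"), "c@example.com", "b@example.com"),
    ExtractedEmail(datetime(2015, 2, 16), ("anger",), "a@example.com", "d@example.com"),
    ExtractedEmail(datetime(2015, 3, 1), ("anger",), "c@example.com", "d@example.com"),
]
groups = classify_by_summary(rows, datetime(2015, 2, 1), datetime(2015, 3, 1))
ratio = shares(groups)
```

The per-group fractions are the kind of values a pie chart or bar graph of the classification result would plot.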
  • as described above, in the information processing apparatus 1, the target concept extraction database 24 is created by associating each selected target concept with keywords representing subordinate concepts of that target concept; e-mails whose text contains a keyword registered in the target concept extraction database 24 are extracted; a summary representing the contents of each such e-mail in a superordinate concept is created; and, in response to a request from the user, the corresponding e-mails are classified based on the summaries and the classification result is displayed.
  • therefore, even when no e-mail including a preset keyword is detected during the monitoring process of the e-mail monitoring function, the user can recognize, based on the classification result, the entire image of the e-mails including keywords registered in the target concept extraction database 24, and can thus recognize that the information processing apparatus 1 is functioning normally. That is, according to the information processing apparatus 1, the user can recognize an entire image of the contents of the e-mails within a predetermined period without looking through the text of each e-mail. Thus, according to the information processing apparatus 1, it is possible to improve the convenience as viewed from the user.
  • Second Embodiment: In the first embodiment, by registering target concepts related to a specific topic desired by the user, e-mails including keywords of subordinate concepts of those target concepts are extracted and the entire image of these e-mails is displayed. Alternatively, the information processing apparatus 1 may create summaries for all e-mails, classify the e-mails based on the created summaries, and display an entire image of the classification results.
  • in this case, the preparation phase described above is not necessary. Morphological analysis is performed on the text of each e-mail, characteristic morphemes are extracted from the result (characteristic morpheme extraction process), superordinate concepts of the extracted morphemes are detected (superordinate concept detection process), and superordinate concepts of an appropriate level are extracted from the detected superordinate concepts (abstraction filtering and superordinate concept ranking process). Based on the result, the e-mails may be classified in the same manner as in the above-described embodiment, and an entire image of the classification result may be displayed.
  • a reference corpus is prepared.
  • the reference corpus is a structure in which natural language sentences are structured and accumulated on a large scale, and the appearance frequency of morphemes can be easily extracted from the reference corpus.
  • let $O_{11}$ be the frequency with which a morpheme appears in the unknown data to be analyzed, $O_{12}$ the frequency with which that morpheme appears in the reference corpus, $O_{21}$ the frequency with which all other morphemes appear in the unknown data, and $O_{22}$ the frequency with which all other morphemes appear in the reference corpus.
  • a log-likelihood-ratio is calculated by the following equation.
  • the higher the log likelihood ratio, the higher the probability that the morpheme characterizes the unknown data. Therefore, for example, a morpheme whose log likelihood ratio is equal to or greater than a preset threshold is extracted as a characteristic morpheme.
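The equation itself does not survive in this text. As an assumption, the sketch below uses the widely used Dunning log-likelihood statistic for a 2×2 table, $G^2 = 2\sum_{i,j} O_{ij}\ln(O_{ij}/E_{ij})$ with $E_{ij} = R_i C_j / N$, where $O_{11}$ and $O_{12}$ are the counts of the morpheme in the unknown data and the reference corpus, and $O_{21}$ and $O_{22}$ are the counts of all other morphemes in each. It may differ from the formula in the original publication; the function name and the sample counts are hypothetical.

```python
import math


def log_likelihood_ratio(o11, o12, o21, o22):
    """Dunning's G^2 for a 2x2 contingency table (assumed form, not taken from the source)."""
    n = o11 + o12 + o21 + o22
    row_totals = (o11 + o12, o21 + o22)   # this morpheme / all other morphemes
    col_totals = (o11 + o21, o12 + o22)   # unknown data / reference corpus
    observed = ((o11, o12), (o21, o22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            expected = row_totals[i] * col_totals[j] / n
            if o > 0:  # a zero cell contributes nothing to the sum
                g2 += o * math.log(o / expected)
    return 2.0 * g2


# Same relative frequency in both collections: the morpheme is not characteristic.
balanced = log_likelihood_ratio(10, 100, 90, 900)
# Much more frequent in the unknown data than in the reference corpus.
skewed = log_likelihood_ratio(50, 100, 50, 900)
```

Note that this statistic is also large when a morpheme is unusually rare in the unknown data, so a practical filter would likely also check that the morpheme is over-represented rather than under-represented.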
  • in the superordinate concept detection process, the superordinate concepts of the morphemes extracted by the above characteristic morpheme extraction process are detected by searching the electronic dictionary 23 described above.
  • in the abstraction filtering and superordinate concept ranking process, superordinate concepts having a certain degree of abstraction are extracted from the superordinate concepts detected in the superordinate concept detection process by the abstraction degree filtering process described above for step SP14 in FIG. 7. If multiple superordinate concepts are extracted by this process, the concept appearance frequency (CF) is obtained for each of them, the concepts are ranked by CF, and either a predetermined number of the most frequent concepts or the concepts whose appearance frequency is equal to or higher than a preset threshold are extracted. The set of these superordinate concepts is taken as the summary of the e-mail.
  • besides ranking by the concept appearance frequency (CF) alone, a ranking method using a value calculated from CF and the document frequency (DF), such as CF/DF, a value based on TF-iDF (an index calculated from the word appearance frequency and the document frequency), or other methods can also be used. Thereafter, all e-mails within a predetermined period are classified using this summary, and the classification result is displayed.
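Ranking by concept appearance frequency (CF) can be sketched as follows. Only the plain CF ranking is shown; the function name, its parameters, and the sample concepts are hypothetical, and the CF/DF and TF-iDF variants are left out because the text does not define them precisely.

```python
from collections import Counter


def rank_concepts(concepts, top_n=None, min_count=None):
    """Rank superordinate concepts by how often they occur in one e-mail.

    `top_n` keeps a fixed number of the most frequent concepts;
    `min_count` keeps concepts that occur at least that many times.
    """
    ranked = Counter(concepts).most_common()
    if min_count is not None:
        ranked = [(concept, count) for concept, count in ranked if count >= min_count]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [concept for concept, _ in ranked]


detected = ["wage", "danger", "wage", "anger", "danger", "wage"]
summary = rank_concepts(detected, top_n=2)
```

Either selection rule (a fixed number of concepts or a frequency threshold) produces the list of superordinate concepts that serves as the e-mail's summary.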
  • the user since all electronic mails can be classified according to the contents thereof, the user recognizes the entire picture of the contents of all electronic mails within a predetermined period. Thus, the convenience as seen from the user can be further improved.
  • in the embodiment described above, the case has been described in which, in step SP14 of the summary creation process described above with reference to FIG. 7, the concept whose average distance to the leaf level is less than a preset threshold is used as the superordinate concept of the keyword extracted from the text of the e-mail, and the summary of the e-mail is created using that superordinate concept. However, the present invention is not limited to this; for example, a superordinate concept of the keywords that are extracted from the text of the e-mail and registered in the target concept extraction database 24 may be obtained, and the summary of the e-mail may be created using that superordinate concept.
  • a chart such as a pie chart, a bar graph, or a line graph that clearly shows the ratio of each summarized and classified result to the whole may be displayed as such an entire image (for example, topic A occupies 20% of the whole, topic B occupies 10% of the whole, topic C occupies 5% of the whole, and other topics occupy 65% of the whole).
  • in the embodiment described above, the case has been described in which the electronic mail monitoring function and the topic detection function are mounted on the same information processing apparatus 1 (that is, the electronic mail monitoring program 20 and the topic detection program 21 are installed in one information processing apparatus). However, the present invention is not limited to this, and these two functions may be mounted in separate information processing apparatuses (for example, the e-mail monitoring program 20 and the topic detection program 21 may be installed in different information processing apparatuses).
  • the system may be constructed as a distributed system in which an electronic mail monitoring function and a topic detection function are executed by a plurality of information processing apparatuses.
  • the information processing apparatus 1 may analyze data in consideration of the correlation (co-occurrence) between a concept (first concept) and another concept (second concept) different from that concept. For example, if the first concept “system” (an evaluation target) and a second concept expressing a value judgment often appear together in the same data, the information processing apparatus 1 may be able to present that value judgment to the user, for example that the evaluation of the evaluation target “system” is low.
  • in the preparation phase, the information processing apparatus 1 may not only associate each keyword with a target concept but also associate with it, as a concept emotion score, a score for the keyword (an index that quantifies, for example with a value from 0 to 1, whether the keyword indicates a positive emotion or a negative emotion). In the application phase, based on the concept emotion scores corresponding to the concepts extracted from the data (for example, by adding and accumulating the concept emotion scores), the apparatus may then be able to present to the user the emotion (value judgment) toward a concept (evaluation target).
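The score accumulation mentioned above can be sketched as follows. The keyword table, the score values, and the convention that values closer to 1 are more positive are all assumptions; the text only states that the score is a value from 0 to 1 and that scores may be added up.

```python
from collections import defaultdict

# Hypothetical preparation-phase table: keyword -> (target concept, concept emotion score).
# Assumed convention for this sketch: values closer to 1 are more positive.
SCORED_DB = {
    "delighted": ("emotion", 0.75),
    "annoyed": ("emotion", 0.25),
    "dangerous": ("risk", 0.125),
}


def accumulate_emotion(tokens, scored_db):
    """Add up the concept emotion scores of the registered keywords, per target concept."""
    totals = defaultdict(float)
    for token in tokens:
        if token in scored_db:
            concept, score = scored_db[token]
            totals[concept] += score
    return dict(totals)


totals = accumulate_emotion(["delighted", "but", "annoyed", "later"], SCORED_DB)
```

A plain sum grows with the number of matched keywords, so comparing e-mails of different lengths would probably call for an average or some other normalization instead.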
  • in the embodiment described above, an e-mail containing a keyword belonging to a subordinate concept of a target concept selected in advance is extracted, and a summary of the e-mail is created using the superordinate concept of the keyword. Alternatively, the information processing apparatus 1 may extract a verb phrase included in a sentence as a superordinate concept, and create a summary of the data including the sentence using the extracted verb phrase.
  • the information processing apparatus 1 may extract the verb phrase “I enjoyed” from the sentence “I enjoyed cooking” and present the verb phrase as a summary to the user.
  • the present invention can be applied to an Internet application system.
  • data such as messages posted by the user to the SNS, recommended information and reviews posted on the website, and user or group profiles can be summarized and provided to the user by the information processing apparatus of the present invention.
  • because the information processing apparatus can present an evaluation target (for example, the product in the case of a product review posted by the user on the website) together with a value judgment (a summary of how the product was evaluated), the convenience of the user regarding the Internet can be improved.
  • the present invention can also be applied to a medical application system (for example, a system that predicts a patient's prognosis or verifies a drug effect using electronic medical records, nursing records, patient diaries, and the like as data).
  • for example, predicting that the patient may fall into a dangerous state (for example, a fall) can be facilitated.
  • the present invention can also be applied to a discovery support system. For example, by summarizing data such as documents, e-mails, and spreadsheet data with the information processing apparatus of the present invention, the user can efficiently extract only the documents related to a lawsuit and submit them to the court.
  • the present invention can also be applied to a forensic system.
  • for example, by summarizing data such as documents, e-mails, and spreadsheet data with the information processing apparatus of the present invention, the extraction of evidence that proves criminal activity can be facilitated, and work efficiency can be improved.
  • a data analysis system equipped with a predictive coding function may include a client device (for example, a user terminal such as a personal computer or a smartphone) that executes part or all of a data analysis program that executes the data analysis, and a server device that executes part or all of the data analysis program and returns the execution result to the client device, and may be configured so that the processing included in the data analysis program is shared arbitrarily between the client device and the server device.
  • the score calculated for the data by the predictive coding function may be adjusted based on the value judgment indicated by the data summary. For example, when a value judgment indicating “not interested” is obtained from the data as a summary, the information processing apparatus of the present invention may be able to adjust the score, for example, by reducing the calculated score.
  • the present invention can also be applied to a patent search system.
  • for example, by summarizing data such as patent documents and documents summarizing an invention with the information processing apparatus, the user can efficiently perform the work of extracting invalidation materials from a large number of patent documents.
  • the information processing apparatus of the present invention can be widely applied not only to the information processing apparatus 1 that monitors e-mails but also to various systems such as a forensic system, a discovery support system, a medical application system, an Internet application system, and a patent search system.
  • the information processing apparatus of the present invention can be widely applied to any system such as a portal site management system, a project evaluation system, a transaction management system, a call center escalation system, and a marketing system. That is, the present invention can be widely applied to any system that presents an entire image of data to the user by extracting the superordinate concept from the data, creating a summary expressed by the superordinate concept, and presenting the summary to the user.
  • the present invention can be widely applied to various information processing apparatuses such as an information processing apparatus that detects a change in environment or a specific state, and a server apparatus that provides a web page on the Internet.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

[Problem] To provide an information processing device, method, and program which improve usability. [Solution] An information processing device creates a database which maps selected subject concepts with data elements which are subordinate concepts of the subject concepts; extracts, from among data, data including the data elements which are registered with the database, and creates a summary which expresses the content of the extracted data with superordinate concepts of the data elements; and, on the basis of the created summary, classifies the data which includes the data elements which are registered with the database, and displays the result of the classification.

Description

Information processing apparatus, method, and program
The present invention relates to an information processing apparatus, method, and program, and is suitable for application to, for example, an information processing apparatus that monitors e-mail.
Systems that notify a user when a change in the environment or a specific state is detected have been widely studied. For example, Patent Document 1 discloses an abnormality detection system that efficiently detects an abnormality occurring in a control system and isolates the control system in which the abnormality was found.
Patent Document 1: JP 2012-168755 A
In such a system, however, when the system has detected no “change” or “specific state”, the user cannot tell whether the system is functioning normally and no “change” or “specific state” has actually occurred, or whether the system is malfunctioning and has therefore failed to detect one.
Therefore, if such a system could give the user an overall picture of, for example, the contents of the e-mails within a predetermined period while no “change” or “specific state” is being detected, the user could easily recognize that the system is functioning normally and that no “change” or “specific state” has actually occurred, which would improve the user's sense of security about, and trust in, the system. This would also let the user grasp the overall picture of the e-mail contents within the predetermined period without reading each e-mail, which would improve the convenience of the system from the user's point of view.
In recent years, moreover, a growing number of online shopping sites and restaurant guide sites publish user reviews of products, restaurants, and the like. Such reviews are useful to a user who is about to buy the product or visit the restaurant, but reading all of them takes considerable time and effort.
Therefore, if such a website could give the user an overall picture of the reviews, the time and effort of reading individual reviews could be saved, which would improve the convenience of the Internet system as a whole from the user's point of view.
The present invention has been made in view of the above points and proposes an information processing apparatus, method, and program that can improve convenience from the user's point of view by presenting the user with an overall picture of data.
To solve this problem, an information processing apparatus according to the present invention includes: a database creation unit that creates a database associating each selected target concept with data elements that are subordinate concepts of that target concept; a summary creation unit that extracts, from the data to be processed, data including a data element registered in the database and creates a summary expressing the content of the extracted data in superordinate concepts of the data element; and a display unit that classifies, based on the summary, the data including the data elements registered in the database and displays the classification result.
An information processing method according to the present invention includes: a first step in which an information processing apparatus creates a database associating each selected target concept with data elements that are subordinate concepts of that target concept; a second step in which the information processing apparatus extracts, from data, data including a data element registered in the database and creates a summary expressing the content of the extracted data in superordinate concepts of the data element; and a third step in which the information processing apparatus classifies, based on the summary, the data including the data elements registered in the database and displays the classification result.
A program according to the present invention causes an information processing apparatus to execute processing including: a first step of creating a database associating each selected target concept with data elements that are subordinate concepts of that target concept; a second step of extracting, from data, data including a data element registered in the database and creating a summary expressing the content of the extracted data in superordinate concepts of the data element; and a third step of classifying, based on the summary, the data including the data elements registered in the database and displaying the classification result.
According to the information processing apparatus, information processing method, and program, the user can grasp the overall picture of the data from what the information processing apparatus displays, and is therefore spared the effort of reading each item of data.
According to the present invention, an information processing apparatus, method, and program that can improve convenience for the user can be realized.
FIG. 1 is a block diagram showing the schematic configuration of the information processing apparatus according to this embodiment.
FIG. 2 is a graph used to explain the electronic dictionary.
FIG. 3(A) is a conceptual diagram used to outline the present invention, and FIG. 3(B) is a schematic diagram showing an example of the display format of a classification result.
FIG. 4 is a conceptual diagram used to explain target concepts.
FIG. 5 is a conceptual diagram showing the schematic configuration of the extracted e-mail management table.
FIG. 6 is a flowchart showing the procedure of the database creation process.
FIG. 7 is a flowchart showing the procedure of the summary creation process.
FIG. 8 is a graph used to explain the abstraction level filtering process.
FIG. 9 is a flowchart showing the procedure of the display process.
An embodiment of the present invention is described in detail below with reference to the drawings.
(1) First Embodiment
(1-1) Configuration of the Information Processing Apparatus According to This Embodiment
In FIG. 1, reference numeral 1 denotes, as a whole, an information processing apparatus to which this embodiment is applied. The information processing apparatus 1 is a computer apparatus equipped with two functions: an e-mail monitoring function, which monitors e-mail flowing through a network 2 such as an in-house LAN (Local Area Network) and notifies an administrator when a specific preset keyword is detected in the e-mail data (including the subject, body, and attachments), and a topic detection function described later. The information processing apparatus 1 includes a CPU 10, a memory 11, a hard disk device 12, an interface 13, an input device 14, and a display device 15.
The CPU 10 is a processor (controller) that controls the operation of the information processing apparatus 1 as a whole. The memory 11 is composed of, for example, a nonvolatile semiconductor memory and is used as the work memory of the CPU 10. The memory 11 stores an e-mail monitoring program 20, a topic detection program 21, and an extracted e-mail management table 22. The e-mail monitoring program 20 executes the various processes that implement the e-mail monitoring function described above. The topic detection program 21 and the extracted e-mail management table 22 are described in detail later.
The hard disk device 12 is used for long-term storage of programs and data. The hard disk device 12 stores an electronic dictionary 23 and a target concept extraction database 24. The electronic dictionary 23 is a dictionary that classifies Japanese words and concepts hierarchically and records them in a systematized form. Using the electronic dictionary 23, a graph representing the hierarchical relationships among concepts, such as the one shown in FIG. 2, can be constructed. The target concept extraction database 24 is described in detail later.
The input device 14 is composed of, for example, a keyboard and a mouse, and is used by the user for operation input and settings. The display device 15 is composed of a liquid crystal display or the like and is used to display various kinds of information.
(1-2) Topic Detection Function
Next, the topic detection function of the information processing apparatus 1 is described. As shown in FIG. 3(A), the topic detection function extracts, from the e-mails that flowed through the network 2 within a predetermined period, those whose text contains a keyword that is a subordinate concept of a preselected concept (hereinafter referred to as a target concept); creates a summary of each extracted e-mail at a moderate level of abstraction; classifies (clusters) the e-mails based on the summaries; and presents the classification result for the predetermined period to the user in a format such as that of FIG. 3(B).
The topic detection function is realized in two phases, a preparation phase and an application phase. The preparation phase extracts from the electronic dictionary 23 (FIG. 1) only the keywords that are subordinate concepts of the target concepts set in advance by the user, and creates the target concept extraction database 24 (FIG. 1), in which each extracted keyword is associated with its corresponding target concept. The application phase uses the target concept extraction database 24 created in the preparation phase to create summaries that express the contents of the corresponding e-mails in superordinate concepts, classifies the corresponding e-mails based on the summaries, and displays the classification result in response to a request from the user. Here and below, a “corresponding e-mail” is an e-mail whose text contains a keyword registered in the target concept extraction database 24.
In the preparation phase, the user first selects several target concepts according to the topics to be detected in the e-mail text and registers them in the information processing apparatus 1 in advance. For example, if the topics to be detected are “misconduct” and “dissatisfaction”, the concepts are divided into five categories, “behavior”, “emotion”, “nature and state”, “risk”, and “money”, as shown in FIG. 4. Target concepts are then set for each category: for “behavior”, concepts such as “take revenge” and “despise”; for “emotion”, “suffering” and “getting angry”; for “nature and state”, “dull and slow” and “bad in heart or attitude”; for “risk”, “threaten” and “deceive”; and for “money”, “money paid for a person's labor”.
Once the target concepts have been set in this way, the information processing apparatus 1 searches the electronic dictionary 23, for each registered target concept, for keywords representing its subordinate concepts, and creates the target concept extraction database 24 in which each keyword found is associated with its corresponding target concept.
In the application phase, the information processing apparatus 1 uses the target concept extraction database 24 created as described above to extract, from the e-mails flowing through the network 2, those whose text contains a keyword registered in the target concept extraction database 24. For each e-mail extracted in this way, the information processing apparatus 1 then creates a summary that expresses the content of the text using superordinate concepts of the keywords detected in it.
In the case of FIG. 3, for example, as shown in FIG. 3(A), the target concepts “system”, “sell”, and “do” are extracted from the passage “monitoring system order received” in “e-mail_1”, and the superordinate concepts “system”, “sell”, and “do” are extracted from the passage “accounting system introduced” in “e-mail_2”. The summary “system sell do” is therefore created for both “e-mail_1” and “e-mail_2”.
Thereafter, when the user so requests, the information processing apparatus 1 classifies the corresponding e-mails within the predetermined period according to their content, based on the summaries created in this way, and presents the classification result to the user.
In the case of FIG. 3, for example, the same summary “system sell do” is created for “e-mail_1” and “e-mail_2” as described above, so “e-mail_1” and “e-mail_2” are classified into the same group. The classification result is then displayed in a format in which the summary appears as the “content”, as in FIG. 3(B).
As the means for realizing this topic detection function, the memory 11 (FIG. 1) of the information processing apparatus 1 stores the topic detection program 21 and the extracted e-mail management table 22, as described above with reference to FIG. 1.
The topic detection program 21 executes the various processes of the topic detection function and, as shown in FIG. 1, consists of a database creation unit 30, a summary creation unit 31, and a display unit 32.
The database creation unit 30 is a module that creates the target concept extraction database 24 based on the target concepts set by the user. The summary creation unit 31 is a module that extracts e-mails whose text contains a keyword registered in the target concept extraction database 24 and creates their summaries. The display unit 32 is a module that, in response to a request from the user, classifies the corresponding e-mails using their summaries and displays the overall picture of the corresponding e-mails within a predetermined period.
The extracted e-mail management table 22 is used to manage the e-mails extracted in the application phase, that is, the e-mails whose text contains a keyword registered in the target concept extraction database 24.
As shown in FIG. 5, the extracted e-mail management table 22 includes a transmission date and time column 22A, a content column 22B, a source address column 22C, a destination address column 22D, and so on. The transmission date and time column 22A stores the date and time at which the e-mail was sent from its source, and the content column 22B stores the summary created for the e-mail. The source address column 22C stores the e-mail address of the sender, and the destination address column 22D stores the e-mail address of the recipient.
Accordingly, the example of FIG. 5 shows that an e-mail with the content “system sell do” was sent at “2014/12/15 09:31:15” from the address “a_okamoto@aaa.co.jp” (source) to the address “m_higasi@aaa.co.jp” (destination).
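One row of the table described above could be modeled as a small record type. The following is only a sketch: the class name and field names are hypothetical, the summary string is an English rendering of the content shown in FIG. 5, and the source does not say how the table is actually stored.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ExtractedEmail:
    """Hypothetical row of the extracted e-mail management table 22."""
    sent_at: datetime  # transmission date and time column 22A
    summary: str       # content column 22B (the generated summary)
    sender: str        # source address column 22C
    recipient: str     # destination address column 22D


# The example row described for FIG. 5.
row = ExtractedEmail(
    sent_at=datetime(2014, 12, 15, 9, 31, 15),
    summary="system sell do",
    sender="a_okamoto@aaa.co.jp",
    recipient="m_higasi@aaa.co.jp",
)
print(row)
```

A frozen dataclass is used here only so that rows are hashable and cannot be changed after extraction; that is a design choice of this sketch, not something the source requires.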
FIGS. 6, 7, and 9 show the specific contents of the processes executed in the information processing apparatus 1 in relation to the topic detection function. In the following, the entity that performs each process is described as a “module (... unit)”, but in practice the CPU 10 of course executes the process based on that module.
FIG. 6 shows the flow of the series of processes in the preparation phase. This process (hereinafter referred to as the database creation process) is executed by the database creation unit 30.
In practice, the database creation unit 30 starts the database creation process of FIG. 6 when the input device 14 (FIG. 1) is operated and an instruction to create the target concept extraction database 24 is input, and first waits for the user to select one or more target concepts (SP1).
When one or more target concepts have been selected, the database creation unit 30 searches the electronic dictionary 23 for the subordinate concepts of each selected target concept and extracts all of them (SP2).
Next, for every subordinate concept extracted in step SP2 for each target concept, the database creation unit 30 extracts from the electronic dictionary 23 all keywords related to that subordinate concept (SP3).
The database creation unit 30 then creates the target concept extraction database 24, in which every keyword extracted in step SP3 is associated with its corresponding target concept (SP4), and ends the database creation process.
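Steps SP2 to SP4 can be illustrated with a short sketch. It assumes the electronic dictionary is available as a plain mapping from each concept to its direct subordinate concepts, and it treats every descendant as a keyword, which simplifies the separate keyword lookup of step SP3. The dictionary entries, function names, and variable names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: concept -> direct subordinate concepts.
# None of these entries come from the actual electronic dictionary.
HYPONYMS = {
    "sell": ["receive an order", "introduce a product"],
    "receive an order": ["order intake"],
    "introduce a product": ["rollout"],
    "system": ["monitoring system", "accounting system"],
}


def subordinate_concepts(concept, hyponyms):
    """Collect every concept below `concept` in the hierarchy (SP2 and SP3)."""
    found, stack = [], list(hyponyms.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.append(node)
            stack.extend(hyponyms.get(node, []))
    return found


def build_extraction_db(target_concepts, hyponyms):
    """Associate each keyword with the target concepts it falls under (SP4)."""
    db = defaultdict(set)
    for target in target_concepts:
        for keyword in subordinate_concepts(target, hyponyms):
            db[keyword].add(target)
    return dict(db)


db = build_extraction_db(["sell", "system"], HYPONYMS)
print(sorted(db))
```

Keying the result by keyword rather than by target concept makes the later lookup in the application phase a single dictionary access per token.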
FIG. 7 shows, within the series of processes in the application phase, the flow from extracting an e-mail whose text contains a keyword registered in the target concept extraction database 24 to creating its summary. This process (hereinafter referred to as the summary creation process) is executed by the summary creation unit 31.
In practice, the summary creation unit 31 starts the summary creation process of FIG. 7 when the database creation process described with reference to FIG. 6 ends, and first selects one e-mail to analyze from the e-mails taken in from the network 2 for the e-mail monitoring function (SP10).
Next, the summary creation unit 31 performs morphological analysis on the text of the selected e-mail to divide it into individual morphemes (the smallest meaningful units of a language) (SP11). It then looks up each resulting morpheme in the target concept extraction database 24 to determine whether any of the morphemes is registered there as a keyword (SP12).
If the result of this determination is negative, the summary creation unit 31 returns to step SP10 and moves on to the next unprocessed e-mail. If the result of the determination in step SP12 is affirmative, the summary creation unit 31 refers to the target concept extraction database 24 and detects, for each morpheme from step SP11 that is registered as a keyword, the target concept that is the superordinate concept of that morpheme (keyword) (SP13).
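The lookup in steps SP11 to SP13 can be sketched as follows. Real Japanese text would need a morphological analyzer; here a simple lowercase word split stands in for that step, and the keyword database is a hypothetical three-entry mapping, so this is an illustration of the control flow only.

```python
import re

# Hypothetical keyword -> target concepts mapping, standing in for database 24.
EXTRACTION_DB = {
    "order": {"sell"},
    "monitoring": {"system"},
    "accounting": {"system"},
}


def tokenize(text):
    """Crude stand-in for morphological analysis (SP11): lowercase word split."""
    return re.findall(r"[a-z]+", text.lower())


def detect_target_concepts(text, db):
    """Return the target concepts of registered keywords found in `text`.

    Returns None when no token is registered, which corresponds to the
    negative result at SP12 (the e-mail is skipped).
    """
    hits = [tok for tok in tokenize(text) if tok in db]
    if not hits:
        return None
    concepts = set()
    for tok in hits:
        concepts |= db[tok]  # SP13: keyword -> its target concepts
    return concepts


print(detect_target_concepts("Monitoring system order received", EXTRACTION_DB))
print(detect_target_concepts("Lunch at noon?", EXTRACTION_DB))
```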
Next, for each target concept detected in step SP13, the summary creation unit 31 executes an abstraction level filtering process that extracts, from among its subordinate concepts, a concept having a predetermined level of abstraction (SP14). A summary written in overly high-level concepts would not let the user grasp the content of the e-mail, so the summary is created with superordinate concepts whose level of abstraction still lets the user recognize the content.
In this embodiment, as shown in FIG. 8, the abstraction level filtering process works on the graph of hierarchical relationships among concepts that is built from the electronic dictionary 23 described with reference to FIG. 2. For each target concept, from among the keywords registered in the target concept extraction database 24, the summary creation unit 31 detects as the superordinate concept to be used in the summary the one whose average distance to the leaf-level keywords (keywords with no subordinate concepts, corresponding to “leaf_1” to “leaf_3” in FIG. 8) is less than a preset threshold and is the largest among those below the threshold.
Here, the average distance from the node “C:” in FIG. 8 to the three leaf nodes “leaf_1” to “leaf_3” can be calculated by computing the total distance from the node “C:” to the three leaf nodes and dividing that total by the number of leaf nodes.
Specifically, in the example of FIG. 8, the distance from the node “leaf_1” to the node “C:” and the distance from the node “leaf_2” to the node “C:” are both 2, and the distance from the node “leaf_3” to the node “C:” is 1, so the total distance is their sum, 5. Dividing 5 by the number of leaf nodes, 3, gives 5/3 (approximately 1.67) as the average distance from the node “C:” to the three leaf nodes.
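The average-distance calculation can be checked with a small sketch. The tree below is an assumption: only the distances 2, 2, and 1 are given in the text, so the intermediate node "X" is hypothetical and simply makes those distances come out.

```python
from collections import deque

# Hypothetical tree consistent with the distances quoted for FIG. 8.
# The intermediate node "X" is an assumption of this sketch.
CHILDREN = {
    "C": ["X", "leaf_3"],
    "X": ["leaf_1", "leaf_2"],
}


def leaf_distances(root, children):
    """Breadth-first search returning {leaf: number of edges from root}."""
    distances, queue = {}, deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        kids = children.get(node, [])
        if not kids:  # a node without subordinate concepts is a leaf
            distances[node] = depth
        for kid in kids:
            queue.append((kid, depth + 1))
    return distances


def average_leaf_distance(root, children):
    """Total distance to all leaves divided by the number of leaves."""
    d = leaf_distances(root, children)
    return sum(d.values()) / len(d)


print(leaf_distances("C", CHILDREN))
print(average_leaf_distance("C", CHILDREN))
```

Under these assumptions the total distance is 2 + 2 + 1 = 5 over 3 leaves, matching the 5/3 quoted in the text.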
Accordingly, in step SP14, for each target concept detected in step SP13, the summary creation unit 31 performs this calculation for every concept, among the keywords registered in the target concept extraction database 24, that is superordinate to the morpheme (keyword) detected in step SP12. It thereby obtains the average distance from each such superordinate concept to the leaf nodes, and extracts the one superordinate concept whose average distance is smaller than the preset threshold and closest to it.
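The selection rule of step SP14 amounts to a filter followed by a maximum. In the sketch below, the function name and the candidate values are hypothetical apart from the 5/3 for node "C"; the source also does not say how ties are broken, so this sketch simply keeps the first candidate encountered.

```python
def pick_summary_concept(avg_distance_by_concept, threshold):
    """Pick the concept whose average leaf distance is below `threshold`
    and, among those, the largest (that is, closest to the threshold).

    Returns None when no concept is below the threshold.
    """
    eligible = {
        concept: dist
        for concept, dist in avg_distance_by_concept.items()
        if dist < threshold
    }
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


# Hypothetical average distances for concepts above a detected keyword.
candidates = {"A": 3.0, "B": 2.4, "C": 5 / 3, "leaf_3": 0.0}

print(pick_summary_concept(candidates, threshold=2.5))
print(pick_summary_concept(candidates, threshold=2.0))
```

Raising the threshold selects more abstract concepts, and lowering it keeps the summary closer to the literal keywords.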
The summary creation unit 31 then creates the summary of the e-mail by arranging the superordinate concepts extracted in this way for each target concept (SP15), stores the necessary information about the e-mail in the extracted e-mail management table 22 described with reference to FIG. 5 (SP16), and returns to step SP10.
FIG. 9 shows, within the series of processes in the application phase, the flow of the process executed in the information processing apparatus 1 when the user gives an instruction to display the overall picture of the corresponding e-mails within a predetermined period (hereinafter referred to as an overall picture display instruction). This process (hereinafter referred to as the display process) is executed by the display unit 32 (FIG. 1).
In practice, the display unit 32 starts the display process of FIG. 9 when the input device 14 is operated and an overall picture display instruction is given, and first classifies all the e-mails registered in the extracted e-mail management table 22 that were sent within the predetermined period according to the content of their summaries (SP20).
As the classification method, it is possible to use, for example, a method that places e-mails whose summaries match exactly in the same group, or a method that, even when the summaries do not match exactly, places e-mails in the same group when the superordinate concepts of the concepts making up their summaries match completely or partially.
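The first classification method, grouping by exactly matching summaries, can be sketched in a few lines. The e-mail identifiers follow FIG. 3, while "e-mail_3" and its summary are hypothetical additions so that the example has more than one group.

```python
from collections import defaultdict


def group_by_summary(emails):
    """Group (email_id, summary) pairs so that identical summaries share a group."""
    groups = defaultdict(list)
    for email_id, summary in emails:
        groups[summary].append(email_id)
    return dict(groups)


emails = [
    ("e-mail_1", "system sell do"),
    ("e-mail_2", "system sell do"),
    ("e-mail_3", "money pay do"),  # hypothetical extra e-mail
]

for summary, ids in group_by_summary(emails).items():
    print(summary, ids)
```

The second method, matching on superordinate concepts of the summary terms, would need the concept hierarchy as an extra input and is not shown here.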
Subsequently, the display unit 32 displays the classification result of step SP20 on the display device 15 (FIG. 1), for example in the predetermined format described above with reference to FIG. 3(B) (SP21), and then ends the display processing.
(1-3) Effects of the present embodiment
As described above, the information processing apparatus 1 of the present embodiment creates the target concept extraction database 24, which associates each selected target concept with keywords representing subordinate concepts of that target concept; extracts e-mails whose text contains a keyword registered in the target concept extraction database 24; creates, for each such e-mail, a summary that expresses its content in superordinate concepts; and, in response to a request from the user, classifies the relevant e-mails based on their summaries and displays the classification result.
Therefore, according to the information processing apparatus 1, even when no e-mail containing a keyword preset for the e-mail monitoring function has been detected during the monitoring processing based on that function, the overall picture of the e-mails containing keywords registered in the target concept extraction database 24 can be recognized from the classification result, so the user can confirm that the information processing apparatus 1 is functioning normally. That is, the user can grasp the overall picture of the content of the e-mails within a predetermined period without reading through the text of each e-mail. The information processing apparatus 1 can thus improve convenience for the user.
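The flow summarized above (build the database, extract matching e-mails, summarize them) can be illustrated with a small sketch. Everything here is hypothetical: the toy dictionary `HYPONYMS`, the function names, and whitespace tokenization stand in for the electronic dictionary 23 and the morphological analysis, and the summary simply lists the matching target concepts instead of applying the abstraction level filtering of step SP14.

```python
# Hypothetical toy dictionary: concept -> direct subordinate concepts/keywords.
HYPONYMS = {
    "payment": ["bribe", "rebate"],
    "bribe": ["kickback"],
    "meeting": ["dinner"],
}


def build_database(target_concepts, hyponyms):
    """Associate every keyword found below a target concept with that concept."""
    database = {}
    for target in target_concepts:
        stack = list(hyponyms.get(target, []))
        seen = set()
        while stack:
            word = stack.pop()
            if word in seen:  # guard against cycles in the dictionary
                continue
            seen.add(word)
            database.setdefault(word, set()).add(target)
            stack.extend(hyponyms.get(word, []))
    return database


def summarize(text, database):
    """Return the sorted target concepts whose keywords occur in the text.
    An empty list means the e-mail is not extracted."""
    found = set()
    for word in text.lower().split():
        found |= database.get(word, set())
    return sorted(found)


database = build_database(["payment", "meeting"], HYPONYMS)
summary = summarize("The kickback was discussed at dinner", database)
```

Here `summary` lists both target concepts, so the e-mail would be extracted and later grouped with other e-mails that have the same summary.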
(2) Second Embodiment
In the first embodiment, the user registers a target concept relating to a specific topic of interest, e-mails containing keywords that are subordinate concepts of that target concept are extracted, and the overall picture of those e-mails is displayed. Alternatively, the information processing apparatus 1 may create a summary for every e-mail, classify the e-mails based on the created summaries, and display the overall picture of the classification result.
In this case, the preparation phase described above is unnecessary. It suffices to perform morphological analysis on the text of each e-mail, extract characteristic morphemes from the result (characteristic morpheme extraction processing), detect the superordinate concepts of the extracted morphemes (superordinate concept detection processing), extract superordinate concepts of a suitable level from among those detected (abstraction level filtering and superordinate concept ranking processing), and, based on the result, classify the e-mails and display the overall picture of the classification result in the same manner as in the embodiment described above.
Specifically, the characteristic morpheme extraction processing proceeds as follows.
(A) A reference corpus is prepared. The reference corpus is a large-scale, structured collection of natural-language text from which the frequency of occurrence of each morpheme can easily be obtained.
(B) Let O_{11} be the frequency with which a given morpheme appears in the unknown data to be analyzed and O_{12} the frequency with which it appears in the reference corpus; let O_{21} be the frequency with which all other morphemes appear in the unknown data and O_{22} the frequency with which they appear in the reference corpus.
(C) With R_1 and R_2 defined as
R_1 = O_{11} + O_{12}
R_2 = O_{21} + O_{22}
and C_1, C_2, and N defined as
C_1 = O_{11} + O_{21}
C_2 = O_{12} + O_{22}
N = O_{11} + O_{12} + O_{21} + O_{22}
the expected frequencies E_{11} to E_{22} are calculated by the following equations.
E_{11} = \frac{R_1 C_1}{N}
E_{12} = \frac{R_1 C_2}{N}
E_{21} = \frac{R_2 C_1}{N}
E_{22} = \frac{R_2 C_2}{N}
(D) The log-likelihood ratio is calculated by the following equation.
\mathrm{LLR} = 2 \sum_{i=1}^{2} \sum_{j=1}^{2} O_{ij} \ln \frac{O_{ij}}{E_{ij}}
The higher the value of this log-likelihood ratio, the higher the probability that the morpheme characterizes the unknown data. Accordingly, for example, morphemes whose log-likelihood ratio is equal to or greater than a preset threshold are extracted as characteristic morphemes.
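Steps (B) to (D) can be worked through in code. The sketch below assumes the standard 2x2 contingency-table form of the log-likelihood ratio, in which each expected frequency is the row total times the column total divided by the grand total; the function name and the example counts are illustrative only.

```python
import math


def log_likelihood_ratio(o11, o12, o21, o22):
    """Log-likelihood ratio of a morpheme from a 2x2 frequency table.

    o11: morpheme in the unknown data    o12: morpheme in the reference corpus
    o21: other morphemes, unknown data   o22: other morphemes, reference corpus
    """
    observed = [[o11, o12], [o21, o22]]
    rows = [o11 + o12, o21 + o22]   # R1, R2
    cols = [o11 + o21, o12 + o22]   # C1, C2
    n = rows[0] + rows[1]           # N
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # E_ij
            if observed[i][j] > 0:            # a zero cell contributes nothing
                total += observed[i][j] * math.log(observed[i][j] / expected)
    return 2.0 * total


# Same relative frequency in both collections: not characteristic.
neutral = log_likelihood_ratio(10, 100, 90, 900)
# Much more frequent in the unknown data than in the reference corpus.
skewed = log_likelihood_ratio(50, 100, 50, 900)
```

A morpheme would then be kept as characteristic when its value is at or above the preset threshold.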
In the superordinate concept detection processing, the superordinate concepts of the morphemes extracted by the characteristic morpheme extraction processing described above are detected by looking them up in the electronic dictionary 23 described above with reference to FIG. 1.
Further, in the abstraction level filtering and superordinate concept ranking processing, superordinate concepts having an appropriate degree of abstraction are first extracted from among the superordinate concepts detected in the superordinate concept detection processing, using the abstraction level filtering processing described above for step SP14 in FIG. 7. When this extraction yields a plurality of superordinate concepts, the concept frequency (CF) is obtained by the following equation
Figure JPOXMLDOC01-appb-M000011
and the concepts are ranked by it: either a predetermined number of superordinate concepts with the highest frequency, or those whose frequency is equal to or greater than a preset threshold, are extracted, and these superordinate concepts arranged in order are used as the summary of the e-mail. Besides simply ordering the concepts by the magnitude of their frequency, the ranking may instead use, for example, a value calculated from CF/DF (document frequency) or CF/TF-iDF (an index calculated from term frequency and document frequency), or some other method. Thereafter, all e-mails within the predetermined period are classified using these summaries, and the classification result is displayed.
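The ranking step can be sketched as below. Because the equation for CF is not reproduced in the text, this example assumes CF is the relative frequency of a concept among the superordinate concepts detected for one e-mail; the function name and the `top_k` and `min_cf` parameters are likewise hypothetical.

```python
from collections import Counter


def rank_concepts(concepts, top_k=None, min_cf=None):
    """Rank superordinate concepts by an assumed concept frequency (CF).

    CF is taken here to be count / total count, which is an assumption.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(concepts)
    total = sum(counts.values())
    ranked = sorted(
        ((concept, n / total) for concept, n in counts.items()),
        key=lambda item: (-item[1], item[0]),
    )
    if min_cf is not None:   # keep concepts at or above the threshold
        ranked = [(c, cf) for c, cf in ranked if cf >= min_cf]
    if top_k is not None:    # keep a predetermined number of concepts
        ranked = ranked[:top_k]
    return [concept for concept, _ in ranked]


detected = ["finance", "finance", "travel", "meeting", "finance", "travel"]
summary_top2 = rank_concepts(detected, top_k=2)
summary_threshold = rank_concepts(detected, min_cf=0.5)
```

The two parameters correspond to the two selection rules in the text: a fixed number of top concepts, or all concepts whose frequency reaches a preset threshold.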
According to the information processing apparatus of the present embodiment described above, all e-mails can be classified according to their content, so the user can grasp the overall picture of the content of all e-mails within a predetermined period, which further improves convenience for the user.
(3) Other Embodiments
In the first and second embodiments described above, the case where the information processing apparatus 1 holds the electronic dictionary has been described. However, the present invention is not limited to this. The system (apparatus) may be constructed so that the information processing apparatus 1 does not hold the electronic dictionary and instead requests an external apparatus that holds the electronic dictionary to perform the various searches on it and receives the results.
In the first and second embodiments described above, a case has been described in which, in step SP14 of the summary creation processing described above with reference to FIG. 7, a concept whose average distance to the leaf level is less than a preset threshold is taken as the superordinate concept of a keyword extracted from the text of the e-mail, and the summary of the e-mail is created using that superordinate concept. However, the present invention is not limited to this. For example, the superordinate concepts of the keywords that are extracted from the text of the e-mail and registered in the target concept extraction database 24 may be obtained, and the summary of the e-mail may be created using those superordinate concepts.
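The abstraction test of step SP14 can be illustrated as follows. The sketch assumes that the "average distance to the leaf level" of a concept is the mean path length from that concept to every leaf reachable below it in an acyclic concept hierarchy; that definition, the toy hierarchy, and the function names are assumptions made for illustration.

```python
def leaf_distances(concept, children):
    """Return the path length from `concept` to each leaf below it."""
    kids = children.get(concept, [])
    if not kids:
        return [0]  # the concept is itself a leaf
    return [d + 1 for kid in kids for d in leaf_distances(kid, children)]


def average_leaf_distance(concept, children):
    distances = leaf_distances(concept, children)
    return sum(distances) / len(distances)


def filter_by_abstraction(candidates, children, threshold):
    """Keep concepts whose average distance to the leaf level is below threshold."""
    return [c for c in candidates
            if average_leaf_distance(c, children) < threshold]


# Hypothetical hierarchy: concept -> direct subordinate concepts.
children = {
    "entity": ["animal", "vehicle"],
    "animal": ["dog", "cat"],
    "vehicle": ["car"],
    "dog": ["poodle"],
}
kept = filter_by_abstraction(["entity", "animal", "vehicle"], children, 2.0)
```

With a threshold of 2.0, the overly abstract concept "entity" is dropped while "animal" and "vehicle" remain as candidate summary concepts.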
Further, in the first and second embodiments described above, a case has been described in which the overall picture of the relevant e-mails is displayed in a format such as that of FIG. 3(B). However, the present invention is not limited to this, and a wide variety of other display formats can be applied. For example, a chart such as a pie chart, bar graph, or line graph that shows the proportion of the whole accounted for by each summarized and classified result may be displayed as the overall picture (for example, topic A accounts for 20% of the whole, topic B for 10%, topic C for 5%, and the other topics for 65%).
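The proportions that such a chart would show can be computed from the classification result as in the sketch below. The function name and the input format (one topic label per classified e-mail) are assumptions; drawing the chart itself is left to whatever plotting facility the system uses.

```python
from collections import Counter


def topic_shares(labels):
    """Return the percentage of e-mails assigned to each topic."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {topic: 100.0 * n / total for topic, n in counts.items()}


# Twenty classified e-mails reproducing the proportions given in the text.
labels = ["A"] * 4 + ["B"] * 2 + ["C"] * 1 + ["other"] * 13
shares = topic_shares(labels)
```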
Further, in the first and second embodiments described above, a case has been described in which the e-mail monitoring function and the topic detection function are installed in the same information processing apparatus 1 (that is, the e-mail monitoring program 20 and the topic detection program 21 are implemented in one information processing apparatus 1). However, the present invention is not limited to this, and the two functions may be installed in separate information processing apparatuses (for example, the e-mail monitoring program 20 and the topic detection program 21 may be implemented in separate information processing apparatuses). The system may also be constructed as a distributed system in which the e-mail monitoring function and the topic detection function are executed by a plurality of information processing apparatuses.
Further, in the first and second embodiments described above, a case has been described in which a summary of each e-mail is created, the e-mails are classified based on the summaries, and the overall picture of the classification result is provided to the user. However, the present invention is not limited to this. For example, the information processing apparatus 1 may be configured to analyze data in consideration of the correlation (co-occurrence) between one concept (a first concept) and another, different concept (a second concept). For example, when the first concept "system" (the evaluation target) and the second concept "getting angry" (the value judgment) frequently appear together in the same data, the information processing apparatus 1 may present to the user the value judgment that the evaluation target "system" is rated poorly.
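The co-occurrence analysis can be sketched by counting how often two concepts appear in the same item of data. The function names and the representation of each item as a list of concept strings are assumptions; how the concepts are detected in the first place is outside this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for every pair of concepts, the documents containing both."""
    pair_counts = Counter()
    for concepts in documents:
        # Sort so that each pair has one canonical (alphabetical) key.
        for a, b in combinations(sorted(set(concepts)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def cooccurrence_rate(documents, first, second):
    """Fraction of documents containing `first` that also contain `second`."""
    with_first = [set(d) for d in documents if first in d]
    if not with_first:
        return 0.0
    return sum(1 for d in with_first if second in d) / len(with_first)


documents = [
    ["system", "getting angry"],
    ["system", "getting angry", "delay"],
    ["system"],
    ["meeting"],
]
pairs = cooccurrence_counts(documents)
rate = cooccurrence_rate(documents, "system", "getting angry")
```

A high rate for an evaluation target paired with a negative value judgment is the kind of signal the apparatus could present to the user.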
Further, in the first embodiment described above, a case has been described in which the target concept extraction database 24 created in the preparation phase merely associates keywords with target concepts. However, the present invention is not limited to this. For example, in the preparation phase the information processing apparatus 1 may not only associate each keyword with a target concept but also associate with it, as a concept emotion score, a score for the keyword (an index that quantifies, for example as a value from 0 to 1, whether the keyword expresses a positive or a negative emotion). In the application phase, the apparatus may then present to the user the emotion (value judgment) toward a concept (evaluation target) based on the concept emotion scores corresponding to the concepts extracted from the data (for example, by summing or accumulating those scores).
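One way to combine concept emotion scores is sketched below. The database layout (keyword mapped to a target concept and a score), the convention that 0 is negative and 1 is positive, and the choice to report both the sum and the mean are assumptions made for illustration.

```python
def concept_sentiment(keywords_found, database):
    """Aggregate concept emotion scores per target concept.

    database: keyword -> (target_concept, score in [0, 1]); hypothetical layout.
    Returns: target_concept -> (sum of scores, mean score).
    """
    scores = {}
    for keyword in keywords_found:
        if keyword in database:
            concept, score = database[keyword]
            scores.setdefault(concept, []).append(score)
    return {c: (sum(s), sum(s) / len(s)) for c, s in scores.items()}


database = {
    "crash": ("system", 0.25),
    "freeze": ("system", 0.5),
    "delicious": ("food", 1.0),
}
result = concept_sentiment(["crash", "freeze", "delicious", "hello"], database)
```

Under the assumed convention, the low mean score for "system" would be presented as a negative value judgment about that evaluation target.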
Further, in the first embodiment described above, a case has been described in which e-mails containing keywords that belong to subordinate concepts of a preselected target concept are extracted and the summary of each e-mail is created using the superordinate concepts of those keywords. However, the present invention is not limited to this. For example, the information processing apparatus 1 may extract a verb phrase contained in a sentence as the superordinate concept and create a summary of the data containing the sentence using the extracted verb phrase. For example, the information processing apparatus 1 may extract the verb phrase "enjoyed" from the sentence "I enjoyed cooking" and present that verb phrase to the user as the summary.
Further, in the first and second embodiments described above, a case has been described in which the present invention is applied to the information processing apparatus 1 that monitors e-mail. However, the present invention is not limited to this and can also be applied to the following purposes and embodiments.
For example, the present invention can be applied to an Internet application system. Data such as messages posted by users on an SNS, recommendations and reviews published on websites, and profiles of users or organizations can be summarized by the information processing apparatus of the present invention and provided to the user. That is, because the information processing apparatus can show both the evaluation target (for example, in the case of a product review posted by a user on a website, the product) and the value judgment (a summary of how the product was evaluated), it can improve the convenience of Internet use for the user.
The present invention can also be applied to a medical application system (for example, a system that uses electronic medical records, nursing records, patient diaries, and the like as data to predict a patient's prognosis or verify the efficacy of a drug). In this case, presenting summaries of the electronic medical records, nursing records, patient diaries, and the like created by the information processing apparatus of the present invention makes it easier to predict, for example, that a patient will fall into a dangerous situation (such as a fall).
Further, the present invention can be applied to a discovery support system. For example, by summarizing data such as documents, e-mails, and spreadsheet data with the information processing apparatus of the present invention, the user can efficiently extract only the documents relevant to the lawsuit in question and submit them to the court.
Further, the present invention can be applied to a forensic system. In this case, summarizing data such as documents, e-mails, and spreadsheet data with the information processing apparatus of the present invention makes it easier, for example, to extract evidence proving the criminal act in question, and improves the efficiency of that work.
Further, the present invention can be applied to a data analysis system equipped with, for example, a predictive coding function (a function that ranks a large amount of unknown data by calculating, on the basis of a small amount of training data, a score for each item of unknown data, the score being an index of how closely that item relates to a given matter). A data analysis system equipped with the predictive coding function has a client apparatus (for example, a user terminal such as a personal computer or smartphone) that executes part or all of a data analysis program, and a server apparatus that executes part or all of the data analysis program and returns the results to the client apparatus; the processing included in the data analysis program may be divided between the client apparatus and the server apparatus in any manner.
When the present invention is applied to a data analysis system equipped with the predictive coding function, the score calculated for an item of data by the predictive coding function may be adjusted based on the value judgment indicated by the summary of that data. For example, suppose the predictive coding function assigns higher scores to data considered to better match the user's preferences, and the summary of an item of data indicates the value judgment "not interested" (that is, the score and the summary contradict each other). In that case, the information processing apparatus of the present invention may adjust the score, for example by reducing the calculated score.
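The score adjustment can be sketched as a simple rule. The function name, the set of concepts treated as negative value judgments, and the multiplicative `penalty` factor are all assumptions; the text only states that the score may be reduced when it contradicts the summary.

```python
def adjust_score(score, summary_concepts, negative_concepts, penalty=0.5):
    """Reduce a predictive coding score when the summary contradicts it.

    `penalty` is a hypothetical multiplicative factor applied when any
    concept in the summary expresses a negative value judgment.
    """
    if any(concept in negative_concepts for concept in summary_concepts):
        return score * penalty
    return score


negative = {"not interested"}
lowered = adjust_score(0.8, ["cooking", "not interested"], negative)
unchanged = adjust_score(0.8, ["cooking"], negative)
```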
Further, the present invention can be applied to a patent search system. For example, by summarizing data such as patent documents and documents summarizing inventions with the information processing apparatus, the user can efficiently extract invalidating prior art from a large volume of patent documents.
As described above, the information processing apparatus of the present invention can be widely applied not only to the information processing apparatus 1 that monitors e-mail but also to various systems such as forensic systems, discovery support systems, medical application systems, Internet application systems, and patent search systems. It can also be widely applied to any other system, such as a portal site management system, project evaluation system, transaction management system, call center escalation system, or marketing system. That is, the present invention is widely applicable to systems that present the overall picture of data to a user by extracting superordinate concepts from the data, creating a summary expressed in those superordinate concepts, and presenting the summary to the user.
The present invention can be widely applied to various information processing apparatuses, such as an information processing apparatus that detects a change in environment or a specific state, and a server apparatus that provides web pages on the Internet.
Description of reference numerals: 1 ... information processing apparatus, 10 ... CPU, 15 ... display device, 21 ... topic detection program, 22 ... extracted e-mail management table, 23 ... electronic dictionary, 24 ... target concept extraction database, 30 ... database creation unit, 31 ... summary creation unit, 32 ... display unit.

Claims (6)

  1. An information processing apparatus comprising:
     a database creation unit that creates a database in which a selected target concept is associated with data elements that are subordinate concepts of the target concept;
     a summary creation unit that extracts, from target data, data containing the data elements registered in the database and creates a summary expressing the content of the extracted data in superordinate concepts of the data elements; and
     a display unit that classifies, based on the summary, the data containing the data elements registered in the database and displays a classification result.
  2. The information processing apparatus according to claim 1, wherein
     a dictionary that hierarchically classifies and records the data elements and concepts is provided in advance, and
     the database creation unit
     searches the dictionary for all subordinate concepts of a target concept selected from the dictionary,
     extracts all of the data elements corresponding to all of the subordinate concepts detected by the search, and
     creates the database by associating each of the extracted data elements with the corresponding target concept.
  3. The information processing apparatus according to claim 1, wherein
     the summary creation unit
     detects the target concept that is a superordinate concept of the data element that is contained in the data and registered in the database, and
     detects, from among the concepts subordinate to the detected target concept, a concept that is a superordinate concept of the data element extracted from the data and that has a predetermined degree of abstraction, and creates the summary using the detected concept.
  4. The information processing apparatus according to claim 2 or 3, wherein
     the concept having the predetermined degree of abstraction is a concept whose average distance to the leaf level, in a graph representing the hierarchical relationship between concepts, is less than a preset threshold.
  5. An information processing method comprising:
     a first step in which an information processing apparatus creates a database in which a selected target concept is associated with data elements that are subordinate concepts of the target concept;
     a second step in which the information processing apparatus extracts, from data, data containing the data elements registered in the database and creates a summary expressing the content of the extracted data in superordinate concepts of the data elements; and
     a third step in which the information processing apparatus classifies, based on the summary, the data containing the data elements registered in the database and displays a classification result.
  6. A program that causes an information processing apparatus to execute processing comprising:
     a first step of creating a database in which a selected target concept is associated with data elements that are subordinate concepts of the target concept;
     a second step of extracting, from data, data containing the data elements registered in the database and creating a summary expressing the content of the extracted data in superordinate concepts of the data elements; and
     a third step of classifying, based on the summary, the data containing the data elements registered in the database and displaying a classification result.
PCT/JP2015/054890 2015-02-20 2015-02-20 Information processing device and method, and program WO2016132558A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2015/054890 WO2016132558A1 (en) 2015-02-20 2015-02-20 Information processing device and method, and program

Publications (1)

Publication Number Publication Date
WO2016132558A1 true WO2016132558A1 (en) 2016-08-25

Family

ID=56692068

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/054890 WO2016132558A1 (en) 2015-02-20 2015-02-20 Information processing device and method, and program

Country Status (1)

Country Link
WO (1) WO2016132558A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021240832A1 (en) * 2020-05-27 2021-12-02 日本電信電話株式会社 Processing device, processing method and processing program

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006318398A (en) * 2005-05-16 2006-11-24 Nippon Telegr & Teleph Corp <Ntt> Vector generation method and device, information classifying method and device, and program, and computer readable storage medium with program stored therein
US20120066210A1 (en) * 2010-09-14 2012-03-15 Microsoft Corporation Interface to navigate and search a concept hierarchy
JP2015001834A (en) * 2013-06-14 2015-01-05 日本電信電話株式会社 Content summarization device, content summarization method and content summarization program

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021240832A1 (en) * 2020-05-27 2021-12-02 日本電信電話株式会社 Processing device, processing method and processing program
WO2021240686A1 (en) * 2020-05-27 2021-12-02 日本電信電話株式会社 Processing device, processing method, and processing program
JP7477791B2 (en) 2020-05-27 2024-05-02 日本電信電話株式会社 Processing device, processing method, and processing program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15882659

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

NENP Non-entry into the national phase

Ref country code: JP

122 Ep: pct application non-entry in european phase

Ref document number: 15882659

Country of ref document: EP

Kind code of ref document: A1