WO2016132558A1

WO2016132558A1 - Information processing device and method, and program

Info

Publication number: WO2016132558A1
Application number: PCT/JP2015/054890
Authority: WO
Inventors: ヤコブハルスコウ; 秀樹武田
Original assignee: 株式会社Ｕｂｉｃ
Priority date: 2015-02-20
Filing date: 2015-02-20
Publication date: 2016-08-25

Abstract

[Problem] To provide an information processing device, method, and program which improve usability. [Solution] An information processing device creates a database which maps selected subject concepts with data elements which are subordinate concepts of the subject concepts; extracts, from among data, data including the data elements which are registered with the database, and creates a summary which expresses the content of the extracted data with superordinate concepts of the data elements; and, on the basis of the created summary, classifies the data which includes the data elements which are registered with the database, and displays the result of the classification.

Description

Information processing apparatus and method, and program

The present invention relates to an information processing apparatus, method, and program, and is suitably applied to, for example, an information processing apparatus that monitors electronic mail.

Conventionally, systems that detect a change in the environment or a specific state and notify the user of the detection have been widely studied. For example, Patent Document 1 discloses an abnormality detection system that efficiently detects an abnormality occurring in a control system and isolates the control system in which the abnormality is recognized.

Patent Document 1: JP 2012-168755 A

In such a system, however, when the system detects no “change” or “specific state”, the user cannot tell whether the system is functioning normally and no “change” or “specific state” has actually occurred, or whether the “change” or “specific state” has gone undetected because the system is not functioning normally.

Therefore, if such a system, while in a state where no “change” or “specific state” has been detected, could provide the user with, for example, an overall picture of the contents of the e-mails within a predetermined period, the user could easily recognize that the system is functioning normally and that no “change” or “specific state” has actually occurred, and the security and reliability of the system could be improved. In addition, the user could grasp the overall picture of the contents of the e-mails within the predetermined period without looking at each e-mail, so the convenience of the system as seen from the user could also be improved.

In recent years, moreover, user reviews of products, restaurants, and the like are increasingly posted on Internet sales sites and restaurant guide sites. Although such reviews are useful to users who purchase the product or use the restaurant, reading through all of them takes considerable time and effort.

Therefore, if such a website could provide the user with an overall picture of the reviews, the time and effort required to read through individual reviews could be saved, and the convenience of the Internet system as a whole as seen from the user could be improved.

The present invention has been made in view of the above points, and proposes an information processing apparatus, method, and program that can improve convenience as seen from the user by presenting an overall picture of data to the user.

To solve this problem, the present invention provides an information processing apparatus comprising: a database creation unit that creates a database in which a selected target concept is associated with data elements that are subordinate concepts of the target concept; a summary creation unit that extracts, from data, data including the data elements registered in the database and creates a summary expressing the content of the extracted data in superordinate concepts of the data elements; and a display unit that, based on the summary, classifies the data including the data elements registered in the database and displays the classification result.

The present invention also provides an information processing method comprising: a first step in which an information processing apparatus creates a database in which a selected target concept is associated with data elements that are subordinate concepts of the target concept; a second step in which the information processing apparatus extracts, from data, data including the data elements registered in the database and creates a summary expressing the content of the extracted data in superordinate concepts of the data elements; and a third step in which the information processing apparatus, based on the summary, classifies the data including the data elements registered in the database and displays the classification result.

The present invention further provides a program that causes an information processing apparatus to execute processing comprising: a first step of creating a database in which a selected target concept is associated with data elements that are subordinate concepts of the target concept; a second step of extracting, from data, data including the data elements registered in the database and creating a summary expressing the content of the extracted data in superordinate concepts of the data elements; and a third step of classifying, based on the summary, the data including the data elements registered in the database and displaying the classification result.

According to the information processing apparatus, the information processing method, and the program, the user can grasp the overall picture of the data from what the information processing apparatus displays, so the effort of checking each item of data individually can be omitted.

According to the present invention, it is possible to realize an information processing apparatus, method, and program that can improve convenience for the user.

A block diagram showing the schematic configuration of the information processing apparatus according to this embodiment.
A graph used to describe the electronic dictionary.
(A) A conceptual diagram used to outline the present invention; (B) a diagram showing an example of the display format of a classification result.
A conceptual diagram used to describe the target concepts.
A conceptual diagram showing the schematic structure of the extracted e-mail management table.
A flowchart showing the processing sequence of the database creation process.
A flowchart showing the processing sequence of the summary creation process.
A graph used to describe the abstraction level filtering process.
A flowchart showing the processing sequence of the display process.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

(1) First Embodiment

(1-1) Configuration of the Information Processing Apparatus According to this Embodiment

In FIG. 1, reference numeral 1 denotes, as a whole, an information processing apparatus to which this embodiment is applied. The information processing apparatus 1 is a computer device equipped with an e-mail monitoring function, which monitors e-mail distributed through a network 2 such as an in-house LAN (Local Area Network) and notifies an administrator when a specific preset keyword is detected in the e-mail data (including the subject, body text, and attached files), and with a topic detection function described later. The information processing apparatus 1 includes a CPU 10, a memory 11, a hard disk device 12, an interface 13, an input device 14, and a display device 15.

The CPU 10 is a processor (controller) having a function for controlling the operation of the entire information processing apparatus 1. The memory 11 is composed of, for example, a nonvolatile semiconductor memory and is used as a work memory for the CPU 10. The memory 11 stores an email monitoring program 20, a topic detection program 21, and an extracted email management table 22. The email monitoring program 20 is a program that executes various processes for realizing the email monitoring function described above. Details of the topic detection program 21 and the extracted email management table 22 will be described later.

The hard disk device 12 is used for long-term storage of various programs and data. The hard disk device 12 stores an electronic dictionary 23 and a target concept extraction database 24. The electronic dictionary 23 is a dictionary in which Japanese words and concepts are classified hierarchically and recorded in systematic form. Using this electronic dictionary 23, it is possible to construct a graph representing the hierarchical (superordinate and subordinate) relationships among concepts, as shown for example in FIG. 2. Details of the target concept extraction database 24 will be described later.

The input device 14 is composed of, for example, a keyboard and a mouse, and is used by a user to perform operation input and settings. The display device 15 includes a liquid crystal display and is used for displaying various types of information.

(1-2) Topic Detection Function

Next, the topic detection function installed in the information processing apparatus 1 will be described. As shown in FIG. 3A, the information processing apparatus 1 has a topic detection function that extracts, from the e-mails distributed through the network 2 within a predetermined period, those whose text contains keywords of pre-selected concepts (hereinafter referred to as target concepts), creates a summary of the contents of each extracted e-mail at a moderate level of abstraction, classifies (clusters) the e-mails based on the created summaries, and presents the classification result for the predetermined period to the user in a format such as that shown in FIG. 3B.

Such a topic detection function is realized in two phases, a preparation phase and an application phase. The preparation phase is a phase in which only the keywords of the subordinate concepts of each target concept preset by the user are extracted from the electronic dictionary 23 (FIG. 1) and associated with the corresponding target concepts to create the target concept extraction database 24 (FIG. 1). The application phase is a phase in which the target concept extraction database 24 created in the preparation phase is used to create a summary expressing the contents of each corresponding e-mail in superordinate concepts, the corresponding e-mails are classified based on the created summaries, and the classification result is displayed in response to a request from the user. Here, a “corresponding e-mail” is an e-mail whose text includes a keyword registered in the target concept extraction database 24. The same applies hereinafter.

In the preparation phase, the user first selects several target concepts corresponding to the topics to be detected in the text of e-mails, and registers the selected target concepts in the information processing apparatus 1 in advance. For example, if the topics to be detected are “injustice” and “dissatisfaction”, the user takes concept categories such as “behavior”, “emotion”, “nature and state”, “risk”, and “money”, as illustrated in the drawings, and sets target concepts for each category: for example, “behave” and “despise” for “behavior”, “suffer” and “be angry” for “emotion”, “dangerous” for “risk”, and “money paid for human labor” for “money”.

When the target concepts have been set in this way, the information processing apparatus 1 searches the electronic dictionary 23 for keywords representing the subordinate concepts of each registered target concept, and creates the above-described target concept extraction database 24, in which each keyword found by the search is associated with the corresponding target concept.

In the application phase, on the other hand, the information processing apparatus 1 uses the target concept extraction database 24 created as described above to extract, from the e-mails distributed through the network 2, those whose text contains a keyword registered in the target concept extraction database 24. The information processing apparatus 1 then creates a summary of each e-mail extracted in this way, using the superordinate concepts of the keywords detected in it.

For example, in the case of FIG. 3A, the superordinate concepts “system”, “sales”, and “do” are extracted from the phrase “monitoring system order” in “e-mail_1”, and the same superordinate concepts “system”, “sales”, and “do” are extracted from the phrase “accounting system introduction” in “e-mail_2”, so the summary “system sales” is created for both “e-mail_1” and “e-mail_2”.

Then, upon a request from the user, the information processing apparatus 1 classifies the corresponding e-mails within the predetermined period by content, based on the summaries created in this way, and presents the classification result to the user.

For example, in the case of FIG. 3, the same summary “system sales” is created for “e-mail_1” and “e-mail_2” as described above, so “e-mail_1” and “e-mail_2” are classified into the same group. The classification result is then displayed, for example, in a format in which the summary appears as the “content”, as shown in FIG. 3B.

As means for realizing the topic detection function described above, the memory 11 (FIG. 1) of the information processing apparatus 1 stores the topic detection program 21 and the extracted e-mail management table 22, as described above with reference to FIG. 1.

The topic detection program 21 is a program for executing the various processes related to the topic detection function described above, and includes a database creation unit 30, a summary creation unit 31, and a display unit 32, as shown in FIG. 1.

The database creation unit 30 is a module having a function of creating the above-described target concept extraction database 24 based on the target concept set by the user. The summary creation unit 31 is a module having a function of extracting an e-mail including a keyword registered in the target concept extraction database 24 in the text and creating the summary. Further, the display unit 32 is a module having a function of classifying corresponding e-mails using the summary in accordance with a request from the user and displaying an entire image of the corresponding e-mails within a predetermined period.

The extracted e-mail management table 22 is a table used for managing e-mails that are extracted in the application phase and that include keywords registered in the target concept extraction database 24 in the text.

As shown in FIG. 5, the extracted e-mail management table 22 includes a transmission date/time column 22A, a content column 22B, a transmission source address column 22C, a transmission destination address column 22D, and the like. The transmission date/time column 22A stores the date and time at which the e-mail was transmitted from the transmission source, and the content column 22B stores the above-described summary created for the e-mail. The transmission source address column 22C stores the e-mail address of the sender of the e-mail, and the transmission destination address column 22D stores the e-mail address of the recipient of the e-mail.

Therefore, the example of FIG. 5 shows that an e-mail with the content “Sell system” was sent at “2014/12/15 09:31:15” from “a_okamoto@aaa.co.jp” (transmission source) to “m_higasi@aaa.co.jp” (transmission destination).

FIG. 6, FIG. 7, and FIG. 9 show the specific contents of the various processes executed in the information processing apparatus 1 in relation to the topic detection function described above. In the following description, the processing entity of each process is described as a “module (... unit)”; in practice, however, the CPU 10 executes the process based on that module.

FIG. 6 shows a flow of a series of processes in the preparation phase. This processing (hereinafter referred to as database creation processing) is executed by the database creation unit 30.

In practice, when the input device 14 (FIG. 1) is operated to input an instruction to create the target concept extraction database 24, the database creation unit 30 starts the database creation process shown in FIG. 6 and first waits for one or more target concepts to be selected (SP1).

When one or more target concepts have been selected, the database creation unit 30 searches the electronic dictionary 23 for the subordinate concepts of each selected target concept and extracts all of them (SP2).

Subsequently, the database creation unit 30 extracts, from the electronic dictionary, all keywords related to the subordinate concepts for all the subordinate concepts extracted in step SP2 (SP3).

Furthermore, the database creation unit 30 creates the target concept extraction database 24 in which all the keywords extracted in step SP3 are associated with the corresponding target concepts (SP4). Then, the database creation unit 30 thereafter ends this database creation process.
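Steps SP2 to SP4 can be illustrated with a minimal sketch. It assumes the electronic dictionary can be represented as a mapping from each concept to its direct subordinates, and it merges the subordinate-concept search (SP2) and the keyword extraction (SP3) into one traversal. The names `TOY_DICTIONARY`, `collect_subordinates`, and `build_extraction_database`, and the dictionary entries themselves, are hypothetical and do not appear in the publication.

```python
from collections import deque

# Hypothetical toy dictionary: concept -> direct subordinate concepts/keywords.
TOY_DICTIONARY = {
    "emotion": ["be angry", "suffer"],
    "be angry": ["furious", "irritated"],
    "suffer": ["anguish"],
    "risk": ["dangerous"],
    "dangerous": ["hazard"],
}


def collect_subordinates(dictionary, concept):
    """Return every concept/keyword below `concept`, breadth-first."""
    found, seen = [], set()
    queue = deque(dictionary.get(concept, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        found.append(node)
        queue.extend(dictionary.get(node, []))
    return found


def build_extraction_database(dictionary, target_concepts):
    """Map each subordinate keyword to the target concepts it falls under (SP4)."""
    database = {}
    for target in target_concepts:
        for keyword in collect_subordinates(dictionary, target):
            database.setdefault(keyword, []).append(target)
    return database


database = build_extraction_database(TOY_DICTIONARY, ["be angry", "dangerous"])
print(database)
```

Storing the database keyed by keyword is a design choice of this sketch: it makes the later lookup from a morpheme to its target concepts a single dictionary access.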

FIG. 7, on the other hand, shows the part of the series of processes in the application phase that runs from extracting e-mails whose text includes a keyword registered in the target concept extraction database 24 to creating their summaries. This processing (hereinafter referred to as summary creation processing) is executed by the summary creation unit 31.

In practice, when the database creation process described above with reference to FIG. 6 is completed, the summary creation unit 31 starts the summary creation process shown in FIG. 7, and first selects one e-mail to be analyzed from among the e-mails captured from the network 2 for the above-described e-mail monitoring function (SP10).

Subsequently, the summary creation unit 31 performs morphological analysis on the text of the selected e-mail to divide the text into individual morphemes (the smallest meaningful units of the language) (SP11). It then looks up each morpheme obtained by the morphological analysis in the target concept extraction database 24 to determine whether any of them is registered there as a keyword (SP12).

If the summary creation unit 31 obtains a negative result in this determination, it returns to step SP10 and moves on to the next unprocessed e-mail. If it obtains a positive result in the determination at step SP12, then, for each morpheme obtained at step SP11 that is registered as a keyword in the target concept extraction database 24, it refers to the target concept extraction database 24 and detects each target concept that is a superordinate concept of that morpheme (keyword) (SP13).
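Steps SP11 to SP13 can be sketched as follows. The sketch assumes English text and uses whitespace splitting as a stand-in for real morphological analysis, which Japanese text would require; the database contents and the names `EXTRACTION_DATABASE`, `detect_target_concepts`, and `process_email` are hypothetical.

```python
# Hypothetical extraction database: keyword -> target concept.
EXTRACTION_DATABASE = {
    "order": "sales",
    "introduction": "sales",
    "system": "system",
}


def detect_target_concepts(tokens, database):
    """Return (keyword, target concept) pairs for tokens registered as keywords."""
    return [(token, database[token]) for token in tokens if token in database]


def process_email(body, database):
    # SP11: whitespace splitting stands in for morphological analysis (assumption).
    tokens = body.lower().split()
    # SP12: check whether any token is registered as a keyword.
    hits = detect_target_concepts(tokens, database)
    if not hits:
        return None  # negative result: the caller moves on to the next e-mail
    # SP13: the target concept above each detected keyword.
    return hits


print(process_email("Monitoring system order", EXTRACTION_DATABASE))
print(process_email("Lunch on Friday", EXTRACTION_DATABASE))
```

Returning `None` for an e-mail without registered keywords mirrors the branch back to step SP10.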

Subsequently, the summary creation unit 31 executes an abstraction level filtering process that, for each target concept detected in step SP13, extracts a concept having a predetermined level of abstraction from among its subordinate concepts (SP14). The reason is that if a summary is created using concepts that are too high in the hierarchy, the user cannot recognize the content of the e-mail from the summary; the summary is therefore created using superordinate concepts at a level of abstraction from which the user can recognize the content of the e-mail.

In the present embodiment, the summary creation unit 31 performs this abstraction level filtering process as follows. For each target concept, among the keywords registered in the target concept extraction database 24, it refers to the graph representing the hierarchical relationship of concepts constructed using the electronic dictionary 23, as described above with reference to FIG. 2, and detects, as the superordinate concept to be used for the summary, the superordinate concept whose average distance to the leaf-level keywords (keywords that have no subordinate concepts, corresponding to “leaf_1” to “leaf_3” in FIG. 8) is less than a preset threshold and is the largest among those below the threshold.

Here, in FIG. 8, the average distance from the node “C:” to the three leaf nodes “leaf_1” to “leaf_3” is calculated by dividing the total of the distances from the node “C:” to each of the three leaf nodes “leaf_1” to “leaf_3” by the number of leaf nodes.

Specifically, in the example of FIG. 8, the distance from the node “leaf_1” to the node “C:” and the distance from the node “leaf_2” to the node “C:” are both “2”, and the distance from the node “leaf_3” to the node “C:” is “1”, so the total distance is “5”, the sum of these distances. Therefore, “5/3 (≈ 1.67)”, obtained by dividing “5” by the number of leaf nodes, “3”, is the average distance from the node “C:” to the three leaf nodes “leaf_1” to “leaf_3”.

Accordingly, in step SP14, for each target concept detected in step SP13, the summary creation unit 31 calculates in this way the average distance to the leaf nodes for every concept (superordinate concept), among the keywords registered in the target concept extraction database 24, that is above the morpheme (keyword) detected in step SP12, and extracts the one superordinate concept whose calculated average distance is smaller than the preset threshold and closest to that threshold.
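The average-distance calculation and the threshold selection of step SP14 can be sketched as below. The graph reproduces the distances of the FIG. 8 example (two leaves at distance 2 and one at distance 1 from node "C"); the intermediate node "D" and the names `CHILDREN`, `leaf_distances`, `average_leaf_distance`, and `pick_summary_concept` are assumptions made for illustration.

```python
# Hypothetical concept graph shaped like the FIG. 8 example: parent -> children.
CHILDREN = {
    "C": ["D", "leaf_3"],
    "D": ["leaf_1", "leaf_2"],
}


def leaf_distances(children, node, depth=0):
    """Return the distance from the starting node to every leaf below it."""
    kids = children.get(node, [])
    if not kids:
        return [depth]
    distances = []
    for kid in kids:
        distances.extend(leaf_distances(children, kid, depth + 1))
    return distances


def average_leaf_distance(children, node):
    """Total distance to the leaves divided by the number of leaves."""
    distances = leaf_distances(children, node)
    return sum(distances) / len(distances)


def pick_summary_concept(children, candidates, threshold):
    """Pick the candidate whose average leaf distance is below the threshold
    and closest to it, i.e. the largest qualifying average."""
    scored = [(average_leaf_distance(children, c), c) for c in candidates]
    qualifying = [pair for pair in scored if pair[0] < threshold]
    return max(qualifying)[1] if qualifying else None


print(average_leaf_distance(CHILDREN, "C"))  # 5/3, as in the FIG. 8 example
print(pick_summary_concept(CHILDREN, ["C", "D"], threshold=2.0))
```

With a threshold of 2.0 the sketch selects "C" (average 5/3); with a threshold of 1.5 it selects the hypothetical node "D" (average 1.0), and with a threshold of 1.0 no candidate qualifies.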

Next, the summary creation unit 31 creates a summary of the e-mail by arranging the superordinate concepts extracted in this way for each target concept (SP15), stores the necessary information about the e-mail in the extracted e-mail management table 22 described above with reference to FIG. 5 (SP16), and then returns to step SP10.

FIG. 9, on the other hand, shows the part of the series of processes in the application phase that is executed in the information processing apparatus 1 when the user gives an instruction to display the overall picture of the corresponding e-mails within a predetermined period (hereinafter referred to as the overall picture display instruction). This process (hereinafter referred to as display process) is executed by the display unit 32 (FIG. 1).

In practice, the display unit 32 starts the display process shown in FIG. 9 when the input device 14 is operated and the overall picture display instruction is given. First, from among the e-mails registered in the extracted e-mail management table 22, the display unit 32 classifies all e-mails transmitted from their transmission sources within the predetermined period according to the contents of their summaries (SP20).

As the classification method, it is possible to apply, for example, a method of classifying e-mails whose summaries match completely into the same group, or a method of classifying e-mails into the same group, even if their summaries do not match completely, when the superordinate concepts of the concepts constituting the summaries match.

Subsequently, the display unit 32 displays the classification result of step SP20 on the display device 15 (FIG. 1), for example in the predetermined format described above with reference to FIG. 3B (SP21), and then ends this display process.
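The exact-match classification of step SP20 and the display of step SP21 can be sketched as follows. The rows stand in for entries of the extracted e-mail management table; the field names, the sample rows other than the FIG. 5 timestamp, and the function name `classify_by_summary` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows of the extracted e-mail management table.
ROWS = [
    {"sent": datetime(2014, 12, 15, 9, 31, 15), "content": "system sales", "id": "e-mail_1"},
    {"sent": datetime(2014, 12, 16, 14, 0, 0), "content": "system sales", "id": "e-mail_2"},
    {"sent": datetime(2015, 1, 20, 8, 0, 0), "content": "system sales", "id": "e-mail_3"},
]


def classify_by_summary(rows, start, end):
    """Group e-mails sent within [start, end] by identical summary text (SP20)."""
    groups = defaultdict(list)
    for row in rows:
        if start <= row["sent"] <= end:
            groups[row["content"]].append(row["id"])
    return dict(groups)


result = classify_by_summary(ROWS, datetime(2014, 12, 1), datetime(2014, 12, 31))
# SP21: a plain-text stand-in for the display format.
for content, ids in result.items():
    print(f"{content}: {len(ids)} e-mail(s) {ids}")
```

This sketch covers only the complete-match method; grouping by matching superordinate concepts would additionally need the concept hierarchy.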

(1-3) Effects of the Present Embodiment

As described above, the information processing apparatus 1 of the present embodiment creates the target concept extraction database 24 by associating each selected target concept with keywords representing subordinate concepts of the target concept, extracts e-mails whose text contains a keyword registered in the target concept extraction database 24, creates summaries expressing the contents of those e-mails in superordinate concepts, and, in response to a request from the user, classifies the corresponding e-mails based on the summaries and displays the classification result.

Therefore, according to the information processing apparatus 1, even while no e-mail containing a keyword preset for the e-mail monitoring function is being detected in the monitoring process of that function, the user can recognize, from the classification result, the overall picture of the e-mails containing keywords registered in the target concept extraction database 24, and can thus recognize that the information processing apparatus 1 is functioning normally. Moreover, according to the information processing apparatus 1, the user can recognize the overall picture of the contents of the e-mails within a predetermined period without looking through the text of each e-mail. The information processing apparatus 1 can thus improve convenience as seen from the user.

(2) Second Embodiment

In the first embodiment, target concepts related to a specific topic desired by the user are registered, e-mails containing keywords of the subordinate concepts of those target concepts are extracted, and the overall picture of those e-mails is displayed. However, the information processing apparatus 1 may instead create summaries for all e-mails, classify the e-mails based on the created summaries, and display the overall picture of the classification result.

In this case, the preparation phase described above is unnecessary. Morphological analysis is performed on the text of each e-mail, characteristic morphemes are extracted from the result (characteristic morpheme extraction process), the superordinate concepts of the extracted morphemes are detected (superordinate concept detection process), and superordinate concepts of an appropriate level are extracted from the detected superordinate concepts (abstraction level filtering and superordinate concept ranking process). Based on the result, the e-mails may be classified in the same manner as in the first embodiment described above and the overall picture of the classification result displayed.

Specifically, in the characteristic morpheme extraction process,
(A) A reference corpus is prepared. Here, the reference corpus is a large-scale, structured collection of natural language sentences, from which the appearance frequency of morphemes can easily be extracted.

(B) Let O₁₁ be the frequency with which a given morpheme appears in the unknown data to be analyzed, O₁₂ the frequency with which that morpheme appears in the reference corpus, O₂₁ the frequency with which all other morphemes appear in the unknown data, and O₂₂ the frequency with which all other morphemes appear in the reference corpus.

(C) R₁ and R₂, and C₁, C₂, and N, are each obtained from these frequencies, and the expected frequencies E₁₁ to E₂₂ are then calculated from them.

(D) A log-likelihood ratio is calculated from these values.

The higher the log-likelihood ratio, the more likely it is that the morpheme characterizes the unknown data. Therefore, for example, morphemes whose log-likelihood ratio is equal to or greater than a preset threshold are extracted as characteristic morphemes.
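The equations for steps (C) and (D) are not reproduced in this text. The following sketch therefore assumes the conventional log-likelihood ratio for a 2×2 contingency table: R₁ = O₁₁ + O₁₂ and R₂ = O₂₁ + O₂₂ are row totals, C₁ = O₁₁ + O₂₁ and C₂ = O₁₂ + O₂₂ are column totals, N is the grand total, each expected frequency is Eᵢⱼ = RᵢCⱼ/N, and the ratio is 2 Σ Oᵢⱼ ln(Oᵢⱼ/Eᵢⱼ). These definitions, the function name `log_likelihood_ratio`, and the sample counts are assumptions for illustration, not taken from the publication.

```python
import math


def log_likelihood_ratio(o11, o12, o21, o22):
    """Log-likelihood ratio of a 2x2 table of observed frequencies.

    Assumes at least one count is non-zero (otherwise N would be zero).
    """
    r1, r2 = o11 + o12, o21 + o22  # row totals (assumed definition)
    c1, c2 = o11 + o21, o12 + o22  # column totals (assumed definition)
    n = r1 + r2                    # grand total
    observed = [o11, o12, o21, o22]
    expected = [r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n]
    # Cells with zero observed frequency contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


# Hypothetical counts: a morpheme seen 50 times in 100 morphemes of unknown data
# and 100 times in 1000 morphemes of the reference corpus.
print(log_likelihood_ratio(50, 100, 50, 900))
# The same relative frequency in both sources gives a ratio of zero.
print(log_likelihood_ratio(10, 100, 90, 900))
```

A morpheme whose relative frequency is the same in the unknown data and the reference corpus scores zero under this formulation, so only morphemes that are unusually frequent or unusually rare in the unknown data score highly.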

In the superordinate concept detection process, the superordinate concepts of the morphemes extracted by the characteristic morpheme extraction process are detected by searching the electronic dictionary 23 described above.

In the abstraction level filtering and superordinate concept ranking process, superordinate concepts having a certain level of abstraction are first extracted from the superordinate concepts detected in the superordinate concept detection process, using the abstraction level filtering process described above for step SP14 in FIG. 7. If this extraction yields multiple superordinate concepts, the concept appearance frequency (CF) is obtained for each of them and the concepts are ranked by it; a predetermined number of the concepts with the highest appearance frequency, or the concepts whose appearance frequency is equal to or higher than a preset threshold, are extracted, and the extracted superordinate concepts taken together serve as the summary of the e-mail. As the method for ranking the superordinate concepts, besides simply ordering them by appearance frequency, it is also possible to use, for example, ranking by a value calculated based on CF/DF (Document Frequency) or CF/TF-iDF (an index calculated from the word appearance frequency and the document frequency), or other methods. Thereafter, all e-mails within a predetermined period are classified using these summaries, and the classification result is displayed.
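The ranking step can be sketched as below. Because the expression for the concept appearance frequency (CF) is not reproduced in this text, the sketch simply assumes CF is the raw number of times a superordinate concept was detected; the function name `rank_superordinate_concepts` and the sample data are hypothetical.

```python
from collections import Counter


def rank_superordinate_concepts(concepts, top_n=None, min_count=None):
    """Rank concepts by occurrence count; keep the top N and/or those at or
    above a minimum count."""
    ranked = Counter(concepts).most_common()
    if min_count is not None:
        ranked = [(concept, count) for concept, count in ranked if count >= min_count]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


detected = ["system", "sales", "system", "meeting", "sales", "system"]
print(rank_superordinate_concepts(detected, top_n=2))
print(rank_superordinate_concepts(detected, min_count=2))
```

The CF/DF or CF/TF-iDF variants mentioned above would replace the raw count with a weighted score before sorting.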

According to the information processing apparatus of the present embodiment as described above, all e-mails can be classified according to their contents, so the user can recognize the overall picture of the contents of all e-mails within a predetermined period, and convenience as seen from the user can be further improved.

(3) Other Embodiments

In the first and second embodiments described above, the case where the information processing apparatus 1 holds the electronic dictionary has been described. However, the present invention is not limited to this; the information processing apparatus 1 may be configured not to hold the electronic dictionary itself, but to request various searches of the electronic dictionary from an external device that holds it and receive the results.

Further, in the first and second embodiments described above, the case has been described where, in step SP14 of the summary creation process described above with reference to FIG. 7, a concept whose average distance to the leaf level is less than a preset threshold is taken as the superordinate concept of a keyword extracted from the text of the e-mail and the summary of the e-mail is created using that superordinate concept. However, the present invention is not limited to this; for example, a superordinate concept of a keyword that is extracted from the text of the e-mail and registered in the target concept extraction database 24 may be obtained, and the summary of the e-mail created using that superordinate concept.

Further, in the first and second embodiments described above, the case has been described where the overall picture of the corresponding e-mails is displayed in a format such as that shown in FIG. 3B. However, the present invention is not limited to this; for example, a chart such as a pie chart, bar graph, or line graph that clearly shows the proportion of the whole accounted for by each summarized and classified result may be displayed as the overall picture (for example, topic A accounts for 20% of the whole, topic B for 10%, topic C for 5%, and other topics for 65%).
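The proportions that such a chart would plot can be computed as in the following sketch. The label counts are hypothetical and chosen to reproduce the 20%, 10%, 5%, and 65% example; the function name `topic_shares` is likewise an assumption.

```python
from collections import Counter


def topic_shares(labels):
    """Return each topic's share of all classified items, in percent."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {topic: 100 * count / total for topic, count in counts.items()}


# Hypothetical classification result for 20 e-mails.
labels = ["A"] * 4 + ["B"] * 2 + ["C"] * 1 + ["other"] * 13
print(topic_shares(labels))
```

The resulting mapping can be passed to any charting library to draw the pie chart or bar graph.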

Further, in the first and second embodiments described above, the case has been described where the e-mail monitoring function and the topic detection function are installed in the same information processing apparatus 1 (that is, the e-mail monitoring program 20 and the topic detection program 21 are installed in one information processing apparatus). However, the present invention is not limited to this, and the two functions may be installed in separate information processing apparatuses (for example, the e-mail monitoring program 20 and the topic detection program 21 may be installed in different information processing apparatuses). The system may also be constructed as a distributed system in which the e-mail monitoring function and the topic detection function are executed by a plurality of information processing apparatuses.

Further, in the first and second embodiments described above, the case has been described in which a summary of each e-mail is created, the e-mails are classified based on the summaries, and the overall classification result is provided to the user. However, the present invention is not limited to this. For example, the information processing apparatus 1 may analyze data taking into account the correlation (co-occurrence) between a concept (first concept) and another, different concept (second concept). For example, if the first concept “system” (evaluation target) and the second concept “value determination” (value judgment) often appear together in the same data, the information processing apparatus 1 may present to the user the value judgment that the evaluation of the evaluation target “system” is low.
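One possible way to quantify such co-occurrence is sketched below. It assumes that the concepts have already been extracted from each piece of data; the concept names ("complaint", "schedule"), the function names, and the use of a simple conditional rate are assumptions of this sketch.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count in how many documents each unordered pair of concepts appears together."""
    pair_counts = Counter()
    for concepts in documents:
        for pair in combinations(sorted(set(concepts)), 2):
            pair_counts[pair] += 1
    return pair_counts


def cooccurrence_rate(documents, first, second):
    """Fraction of the documents containing `first` that also contain `second`."""
    with_first = [set(c) for c in documents if first in c]
    if not with_first:
        return 0.0
    return sum(second in c for c in with_first) / len(with_first)


# Hypothetical concepts extracted from four pieces of data.
documents = [
    ["system", "complaint"],
    ["system", "complaint", "schedule"],
    ["system"],
    ["schedule"],
]
counts = cooccurrence_counts(documents)
rate = cooccurrence_rate(documents, "system", "complaint")
```

A high rate for an evaluation target and a negative value judgment could then be reported to the user as a low evaluation of that target.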

Furthermore, in the first embodiment described above, the case has been described in which the target concept extraction database 24, in which only keywords and target concepts are associated, is created in the preparation phase. However, the present invention is not limited to this. For example, in the preparation phase the information processing apparatus 1 may associate each keyword not only with its target concept but also with a score for the keyword (for example, an index from 0 to 1 quantifying whether the keyword indicates a positive or a negative emotion) as a concept emotion score. In the application phase, the information processing apparatus 1 may then present to the user the emotion (value judgment) toward a concept (evaluation target), based on the concept emotion scores corresponding to the concepts extracted from the data (for example, by adding and accumulating the concept emotion scores).
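The accumulation of concept emotion scores could be sketched as follows. The database contents, the convention that values near 1 are positive and values near 0 are negative, and the whitespace tokenization are assumptions made only for this illustration.

```python
from collections import defaultdict

# Hypothetical database rows: keyword -> (target concept, concept emotion score in [0, 1]).
# Assumed convention: values near 1 are positive, values near 0 are negative.
CONCEPT_DB = {
    "crash": ("system", 0.1),
    "slow": ("system", 0.2),
    "reliable": ("system", 0.9),
    "delicious": ("food", 0.95),
}


def accumulate_emotion(texts, db):
    """Return {concept: (accumulated score, average score)} over all keyword hits."""
    totals = defaultdict(float)
    hits = defaultdict(int)
    for text in texts:
        for word in text.lower().split():
            if word in db:
                concept, score = db[word]
                totals[concept] += score
                hits[concept] += 1
    return {c: (totals[c], totals[c] / hits[c]) for c in totals}


texts = ["The system is slow", "Another crash today", "Lunch was delicious"]
emotion = accumulate_emotion(texts, CONCEPT_DB)
```

In this sample, the concept "system" receives a low average score, which could be presented to the user as a negative value judgment on that evaluation target.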

Furthermore, in the first embodiment described above, the case has been described in which an e-mail containing a keyword belonging to a subordinate concept of a target concept selected in advance is extracted, and a summary of the e-mail is created using the superordinate concept of the keyword. However, the present invention is not limited to this. For example, the information processing apparatus 1 may extract a verb phrase included in a sentence as the superordinate concept and use the extracted verb phrase to create a summary of the data that includes the sentence. For example, the information processing apparatus 1 may extract the verb phrase “I enjoyed” from the sentence “I enjoyed cooking” and present the verb phrase to the user as a summary.

Furthermore, in the first and second embodiments described above, the case where the present invention is applied to the information processing apparatus 1 that monitors e-mail has been described. However, the present invention is not limited to this, and can also be applied to the following purposes and embodiments.

For example, the present invention can be applied to an Internet application system. For example, data such as messages posted by a user to an SNS, recommendations and reviews posted on a website, and user or group profiles can be summarized by the information processing apparatus of the present invention and provided to the user. That is, the information processing apparatus can present a summary that includes an evaluation target (for example, the product, in the case of a product review posted by a user on a website) and a value judgment (how the product was evaluated), so that the convenience of the user on the Internet can be improved.

The present invention can also be applied to a medical application system (for example, a system that predicts a patient's prognosis or verifies the effect of a drug using electronic medical records, nursing records, patient diaries, and the like as data). In this case, by presenting summaries of the electronic medical records, nursing records, patient diaries, and the like created by the information processing apparatus of the present invention, it becomes easier to predict, for example, that a patient may fall into a dangerous state (for example, a fall).

Furthermore, the present invention can also be applied to a discovery support system. For example, when data such as documents, e-mails, and spreadsheet data are summarized by the information processing apparatus of the present invention, the user can efficiently extract only the documents related to a lawsuit and submit them to the court.

Furthermore, the present invention can also be applied to a forensic system. In this case, by summarizing data such as documents, e-mails, and spreadsheet data with the information processing apparatus of the present invention, the work of extracting evidence that proves criminal activity can be made easier and more efficient.

Furthermore, the present invention can also be applied to a data analysis system equipped with a predictive coding function, that is, a function that, based on a small amount of training data, calculates a score for each of a large number of unknown data (an index indicating the degree of relevance between the unknown data and a predetermined case) and thereby ranks the unknown data. A data analysis system equipped with a predictive coding function includes a client device (for example, a user terminal such as a personal computer or a smartphone) that executes part or all of a data analysis program for performing the data analysis, and a server device that executes part or all of the data analysis program and returns the execution result to the client device. The system is configured so that the processing included in the data analysis program can be shared arbitrarily between the client device and the server device.

When the present invention is applied to a data analysis system equipped with a predictive coding function, the score calculated for the data by the predictive coding function may be adjusted based on the value judgment indicated by the summary of the data. For example, suppose that the predictive coding function gives a higher score to data that appears to match the user's preference. If a value judgment of “not interested” is obtained as the summary of such data (that is, if the score and the summary contradict each other), the information processing apparatus of the present invention may adjust the score, for example, by reducing the calculated score.
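A minimal sketch of such an adjustment is shown below. The score range of 0 to 1, the threshold above which a score is treated as indicating a match, and the multiplicative penalty are hypothetical choices; the description only states that the score may be reduced when it contradicts the summary.

```python
def adjust_score(score, summary_judgment, match_threshold=0.5, penalty=0.5):
    """Reduce a predictive coding score when the summary's value judgment contradicts it.

    Assumptions of this sketch: scores lie in [0, 1], a score at or above
    `match_threshold` means "appears to match the user's preference", and a
    contradiction is handled by multiplying the score by `penalty`.
    """
    contradicts = score >= match_threshold and summary_judgment == "not interested"
    return score * penalty if contradicts else score


adjusted = adjust_score(0.8, "not interested")   # contradiction: score is reduced
unchanged = adjust_score(0.8, "interested")      # no contradiction: score is kept
```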

Furthermore, the present invention can also be applied to a patent search system. For example, when data such as patent documents and documents summarizing an invention are summarized by the information processing apparatus, the user can efficiently extract invalidation materials from a large number of patent documents.

Thus, the information processing apparatus of the present invention can be widely applied not only to the information processing apparatus 1 that monitors e-mails but also to various systems such as a forensic system, a discovery support system, a medical application system, an Internet application system, and a patent search system. Furthermore, the information processing apparatus of the present invention can be widely applied to other systems such as a portal site management system, a project evaluation system, a transaction management system, a call center escalation system, and a marketing system. That is, the present invention can be widely applied to any system that presents an overall picture of data to the user by extracting superordinate concepts from the data, creating a summary expressed by those superordinate concepts, and presenting the summary to the user.

The present invention can be widely applied to various information processing apparatuses such as an information processing apparatus that detects a change in environment or a specific state, and a server apparatus that provides a web page on the Internet.

DESCRIPTION OF SYMBOLS 1 ... Information processing apparatus, 10 ... CPU, 15 ... Display apparatus, 21 ... Topic detection program, 22 ... Extraction e-mail management table, 23 ... Electronic dictionary, 24 ... Target concept extraction database, 30 ... Database creation section, 31 ... Summary creation section, 32 ... Display section.

Claims

1. An information processing apparatus comprising:
a database creation unit that creates a database associating a selected target concept with data elements that are subordinate concepts of the target concept;
a summary creation unit that extracts, from target data, data including the data elements registered in the database, and creates a summary expressing the content of the extracted data using a superordinate concept of the data elements; and
a display unit that classifies the data including the data elements registered in the database based on the summary and displays a classification result.

2. The information processing apparatus according to claim 1, wherein
a dictionary that hierarchically classifies the data elements and concepts and contains the data elements and the concepts is given in advance, and
the database creation unit:
searches the dictionary for all subordinate concepts of the target concept selected from the dictionary;
extracts all the data elements corresponding to all the subordinate concepts detected by the search; and
creates the database by associating all the extracted data elements with the corresponding target concept.

3. The information processing apparatus according to claim 1, wherein
the summary creation unit:
detects the target concept that is a superordinate concept of the data element that is included in the data and registered in the database; and
detects, from among the subordinate concepts of the detected target concept, a concept that is a superordinate concept of the data element extracted from the data and has a predetermined abstraction level, and creates the summary using the detected concept.

4. The information processing apparatus according to claim 2 or 3, wherein
the concept having the predetermined abstraction level is a concept whose average distance to the leaf level, in a graph representing the hierarchical relationship of concepts, is less than a preset threshold.

5. An information processing method comprising:
a first step in which an information processing apparatus creates a database associating a selected target concept with data elements that are subordinate concepts of the target concept;
a second step in which the information processing apparatus extracts, from data, data including the data elements registered in the database, and creates a summary expressing the content of the extracted data using a superordinate concept of the data elements; and
a third step of classifying the data including the data elements registered in the database based on the summary and displaying a classification result.

6. A program that causes an information processing apparatus to execute processing comprising:
a first step of creating a database associating a selected target concept with data elements that are subordinate concepts of the target concept;
a second step of extracting, from data, data including the data elements registered in the database, and creating a summary expressing the content of the extracted data using a superordinate concept of the data elements; and
a third step of classifying the data including the data elements registered in the database based on the summary and displaying a classification result.