WO2016129124A1

WO2016129124A1 - Data analysis system, data analysis method, and data analysis program

Info

Publication number: WO2016129124A1
Application number: PCT/JP2015/054041
Authority: WO
Inventors: 秀樹武田; 彰晃花谷
Original assignee: 株式会社Ｕｂｉｃ
Priority date: 2015-02-13
Filing date: 2015-02-13
Publication date: 2016-08-18

Abstract

The present invention is a data analysis system provided with a controller that executes a process for separating desired data from a data group stored in memory in accordance with a prescribed purpose of a user. When extracting part of the data from the data group and separating the desired data, the controller is configured to set, for the extracted part of the data, an evaluation criterion whose content is associated with that purpose; classify the extracted part of the data in accordance with the result of evaluating it on the basis of the evaluation criterion; use that classification result to rank and evaluate the object data, that is, the data in the data group other than the extracted part, in conformity with the classification of the extracted part; and separate the desired data from the data group by utilizing the result of evaluating the object data.

Description

Data analysis system, data analysis method, and data analysis program

The present invention relates to a data analysis system that executes processing for separating desired data from a data group stored in a memory according to a predetermined purpose of a user.

With the development of the intellectual information industry in recent years, operators in that industry handle large amounts of information, and large amounts of information have also accumulated in computer systems both inside and outside those operators. As a result, such information must increasingly be used effectively.

For example, a business operator that provides consulting services summarizes in reports the results of various projects, such as business consulting related to company personnel and accounting, management consulting related to corporate acquisitions and integration, and industry consulting specialized in specific industries. It is preferable to be able to refer to and use these reports as needed so that the quality of consulting can be improved and projects can be carried out efficiently. To be able to review project results, it is necessary to evaluate each project and record the evaluation results.

One system related to project evaluation is described in JP 2010-033424. That system is intended to help project managers grasp the situation and create documents by presenting, in a simple form, information and documents on the activity status and results of a project at the end of the project or during a specific period. It comprises a project management system that manages the execution of tasks; a task database that subdivides the work performed in a project into tasks and records the tasks and task-related information together with the time at which that information was created; a document database that stores documents related to the tasks; and an information extraction unit that extracts, from the task database and the document database, information representing the project at a specific point in time.

JP 2010-033424

In order to be able to utilize project results, it is necessary not merely to evaluate them but to classify and record the evaluation results in advance as far as possible. By doing so, a person who intends to utilize project results can accurately and efficiently select, from among many projects, a case suited to the purpose of utilization.

However, the perspectives from which project evaluations are classified usually change with the situation, for example with the passage of time: the social significance of a project may once have been considered important, whereas customer satisfaction is now emphasized. Consequently, when trying to use project evaluations, there has been a problem that cases suited to the purpose of use are not selected accurately.

Therefore, an object of the present invention is to provide a data analysis system, a method for the same, and a computer program capable of accurately selecting, from a data group, useful data suited to a purpose of utilization that can change dynamically.

In order to achieve the above object, a first invention is a data analysis system including a controller that executes processing for separating desired data from a data group stored in a memory according to a predetermined purpose of a user. When separating the desired data, the controller extracts a part of the data from the data group; sets, for the partial data, an evaluation criterion having contents related to the purpose in order to classify the partial data; classifies the partial data according to the result of evaluating the extracted partial data based on the evaluation criterion; uses the classification result for the partial data to rank and evaluate the target data, that is, the data in the data group other than the extracted partial data, in accordance with the classification of the partial data; and separates the desired data from the data group using the result of the evaluation of the target data.

A second invention is a data analysis method in which a controller executes processing for separating desired data from a data group stored in a memory according to a predetermined purpose of a user. In this method, when separating the desired data, the controller extracts a part of the data from the data group; sets, for the partial data, an evaluation criterion having contents related to the purpose in order to classify the partial data; classifies the partial data according to the result of evaluating the extracted partial data based on the evaluation criterion; uses the classification result for the partial data to rank and evaluate the target data, that is, the data in the data group other than the extracted partial data, in accordance with the classification of the partial data; and separates the desired data from the data group using the result of the evaluation of the target data.

A third invention is a data analysis program for causing a computer to execute processing for separating desired data from a data group according to a predetermined purpose of a user. When separating the desired data, the program causes the computer to extract a part of the data from the data group; set, for the partial data, an evaluation criterion having contents related to the purpose in order to classify the partial data; classify the partial data according to the result of evaluating the extracted partial data based on the evaluation criterion; use the classification result for the partial data to rank and evaluate the target data, that is, the data in the data group other than the extracted partial data, in accordance with the classification of the partial data; and separate the desired data from the data group using the result of the evaluation of the target data.

According to the present invention, an evaluation criterion having contents related to the user's purpose is set each time desired data is to be separated. Compared with the case where data is classified in advance from a fixed viewpoint and the data to be referred to must be selected from within that classification, it is therefore possible to provide a data analysis system, method, and computer program capable of accurately selecting, from the data group, useful data suited to the purpose of use.

A block diagram showing the hardware configuration of the data analysis system.
A functional block diagram of the business server.
A flowchart showing the operation of the data analysis system.
An interface (input screen) of the evaluation criteria setting tool.
An example of the evaluation classification input screen of a specific client device.
Another example of the evaluation classification input screen of a specific client device.
An example of a table used when the business server registers a score for each unknown data in the database.
An example in which the client device sorts and displays project-related data in descending order of score.
An example of a display screen displayed on the client device.

Next, an embodiment of the present invention will be described with reference to the drawings, taking a data analysis system in a consulting company as an example. FIG. 1 is a block diagram showing the hardware configuration of the data analysis system. The data analysis system includes a business server 14 capable of executing the core processing of data analysis; one or more client devices 10 capable of executing peripheral processing of data analysis; a storage system 18 including a database 22 that records project-related data, such as document data, image data, and audio data relating to project results, together with the evaluations and classification results for that project-related data; and a management computer 12 that executes management functions for data analysis with respect to the client devices 10 and the business server 14.

The client device 10 provides part of the project-related data of existing projects, as sampling data, to a user who is authorized to evaluate and classify project results, and enables that authorized user to evaluate and classify the sampling data.

The client device 10 includes known computer hardware resources. Specifically, the client device 10 includes a memory (HDD, flash memory, etc.), a controller (CPU), a bus, an input/output interface for an input device such as a keyboard and an output device such as a display, and a communication interface to the business server 14 and the management computer 12. The client device 10 is connected to the business server 14 and the management computer 12 by communication means 20 such as a LAN.

An application program necessary for evaluating and classifying sampling data is stored in the memory. By executing this program, the controller can provide the authorized user with the input and output necessary for the evaluation and classification processing.

The business server 14 uses the classification result for the sampling data to classify the related data on project results other than the sampling data. The management computer 12 executes the necessary management processing for the client device 10 and the business server 14. Like the client device 10, the business server 14 and the management computer 12 each include a memory (HDD, flash memory, etc.), a controller (CPU), and a communication interface as hardware resources.

In the memory of the business server 14, an application program for executing classification on related data is stored, and the controller executes data search, arithmetic processing, and the like based on the program. The memory of the management computer 12 stores an application program for the controller to execute management processing.

The storage system 18 is composed of, for example, a disk array system, and includes a data group made up of related data on project results and a database 22 that records the evaluation and classification results for the related data. The business server 14 and the storage system 18 are connected (16) by DAS or SAN. Related data on results exists for each of a plurality of projects, which are events in the field of consulting business, and the database 22 stores the plurality of related data as a data set.

Note that the hardware configuration shown in FIG. 1 is merely an example, and the data analysis system can be realized by other hardware configurations. For example, part or all of the processing executed in the business server 14 may instead be executed in the client device 10, or the storage system 18 may be built into the business server 14. Those skilled in the art will understand that various hardware configurations can implement the data analysis system and that the hardware configuration is not limited to any one of them (for example, the configuration illustrated in FIG. 1).

As shown in FIG. 2, the business server 14 includes: an extraction unit 102 that extracts a part of the data (sampling data), based on a predetermined standard, from the related data on project results stored in the database 22; a display processing unit 103 that displays the sampling data and the like on the screen of the client device 10; a classification code receiving unit 104 that receives, from a user with evaluator authority, a request to set (tag) a classification code for the sampling data; a selection unit 105 that classifies the extracted sampling data by classification code, analyzes and selects features of the sampling data (for example, data elements) from the classified sampling data, and evaluates the degree of influence of each feature (the evaluation value of the data element); a storage execution unit 201 that stores the selected data elements and their evaluation values in the database 22; a search unit 106 that searches the database and looks for the data elements in the related data on project results other than the sampling data (hereinafter referred to as unknown data); a score calculation unit 107 that uses the search result obtained by the search unit 106, the data elements determined by the selection unit 105, and the evaluation values to calculate, for each unknown data, a score corresponding to the relevance between the classification code and the unknown data; an automatic classification unit 108 that automatically assigns a classification code to unknown data based on the calculated score; and a learning unit 110 that increases or decreases the evaluation values of the data elements selected by the selection unit 105 based on the score calculated by the score calculation unit 107.

Note that each component described as a "**** unit", such as the extraction unit, is a functional configuration realized by the controller based on the program, so "**** unit" may be rephrased as "**** processing". A "**** unit" can also be replaced with hardware resources as necessary. That is, those skilled in the art will understand that these functional blocks can be realized in various forms, by hardware only, by software only, or by a combination thereof, and are not limited to any one of these.

Sampling data is related data on the results of a predetermined number of projects, extracted based on a predetermined standard from the plurality of projects recorded in the database 22. The evaluator authority user refers to the contents of the sampling data and evaluates it, and the business server 14 classifies the sampling data based on that evaluation result.

The related data on the remaining project results, other than the sampling data, is the target data to be analyzed by the data analysis system. Because it has not undergone evaluation and classification by an evaluator authority user, it is referred to as unknown data. The data analysis system learns the evaluator authority users' evaluation and classification results for the sampling data and then classifies the target data consisting of unknown data. The sampling data therefore serves as training data for the data analysis system.

The classification code is an identifier (tag) for classifying sampling data and unknown data. An evaluation criterion determines whether or not a tag is given to sampling data; it may also be called an evaluation viewpoint or an evaluation axis. The content of an evaluation criterion changes in relation to the purpose for which the data in the database 22 is to be classified, that is, the purpose for which the user wishes to separate desired data from other data. One example is a user who wants to separate out, for reference, the past projects judged most suitable for achieving his or her own project from among a large number of past projects.

In the present embodiment, the consulting company aims to extract highly evaluated cases from among past projects. For that purpose, one or more evaluation criteria can be set, such as the following, provided they are ones that a consulting firm whose main business is advising on various corporate activities would readily be conscious of: (evaluation index 1) Was the degree of influence on the customer large? (evaluation index 2) Was the project's social attractiveness high? (evaluation index 3) Was the contribution to the customer's human resource development large? (evaluation index 4) Was the project highly efficient? (evaluation index 5) Was the compatibility with the customer's management strategy and business strategy high? The data analysis system sets the evaluation criteria when it attempts to classify the data in the database, which is the target of data analysis, for a predetermined purpose.

When the evaluator authority user evaluates each of the plurality of sampling data against each evaluation criterion and affirms a criterion, the business server 14 sets a tag (flag) corresponding to that criterion. For example, if the evaluator authority user affirms all of evaluation indexes 1-5, five types of tags are set for the sampling data. The selection unit classifies the sampling data based on the tags set for it. For example, the first sampling data is classified as “yes” for all of evaluation indexes 1-5, the second sampling data as “yes” for evaluation indexes 1-4 and “no” for evaluation index 5, and so on. The correspondence between the set tags and the classification items may be determined as appropriate; for example, sampling data may be classified as “relevant” when three or more of evaluation indexes 1-5 are “yes”. One tag need not correspond 1:1 to one classification item.
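The tagging and classification just described can be sketched in a few lines. This is a minimal illustration, not the system's actual implementation: the function names `set_tags` and `classify`, the `min_yes` parameter, and the label "not relevant" are assumptions introduced here. Only the five evaluation indexes and the "three or more yes means relevant" example come from the text.

```python
EVALUATION_INDEXES = (1, 2, 3, 4, 5)

def set_tags(answers):
    """Return the set of evaluation indexes the evaluator affirmed.

    answers maps an evaluation index to True ("yes") or False ("no").
    """
    return {idx for idx in EVALUATION_INDEXES if answers.get(idx, False)}

def classify(tags, min_yes=3):
    """Map a set of tags to one classification item (assumed rule)."""
    return "relevant" if len(tags) >= min_yes else "not relevant"

# First sampling data: "yes" for all five indexes.
first = set_tags({1: True, 2: True, 3: True, 4: True, 5: True})
# Second sampling data: "yes" for indexes 1-4, "no" for index 5.
second = set_tags({1: True, 2: True, 3: True, 4: True, 5: False})
# A hypothetical third item with a single "yes".
third = set_tags({1: True, 2: False, 3: False, 4: False, 5: False})

print(classify(first), classify(second), classify(third))
```

Keeping the tags as a set and deriving the classification item from them separately reflects the point that one tag need not map 1:1 to one classification item.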

Furthermore, when there are a plurality of evaluator authority users, the evaluation policy for each evaluation index varies from user to user, so tags are set separately for each evaluator authority user. In addition to the evaluation indexes, the evaluator authority user is therefore itself a specific example of an evaluation criterion. By providing a plurality of evaluation criteria, the data analysis system can ensure diversity of evaluation, that is, accommodate changes in, and the diversity of, the purposes for which data is utilized.

Related data is data related to the object to which the data belongs; in this embodiment, the object is the result of a project in the consulting business. The data is mainly document data, but broadly includes image data, audio data, video data, and the like. Document data is digital information including at least text information, and covers a wide range of data whose structural definition is incomplete (unstructured data such as natural language), for example e-mails, presentation materials, spreadsheet materials, meeting materials, contracts, organization charts, and business plans.

A data element is a meaningful element that forms at least part of the related data. A typical data element is a keyword of document data. A keyword is a group of character strings having a certain meaning in a certain language, that is, a morpheme.

In addition to keywords, sentences and paragraphs can also be data elements. When the related data is something other than document data, a partial image of an entire image, a partial audio segment of an entire audio recording, or a subset of the frames of a video corresponds to a data element. The data analysis system extracts useful data elements from a plurality of sampling data that have been classified according to a predetermined classification code and assigned the same classification code, and analyzes, based on those data elements, whether unknown data can be classified in the same way as the sampling data. Data elements are extracted for each of a plurality of classification codes.

As described above, the data elements, including keywords, selected by the selection unit 105 are recorded in the database 22. In addition, the business server 14 can determine in advance data elements that are highly relevant to the superiority or inferiority of project results, such that related data containing them can be classified as excellent, and register those data elements in the database.

Also, based on the results of past classification processing, data elements that are highly relevant to related data to which a code indicating project excellence has been assigned can be registered in the database. Keywords once registered in the database 22 are increased or decreased according to the learning result of the learning unit 110, and can also be added and deleted manually.

Next, the operation of the data analysis system will be described with reference to the flowchart. A management user having administrator authority issues a request 300 to the management computer 12 to extract sampling data. The extraction request may take the form of randomly sampling the related data of a predetermined number of projects from the related data on project results recorded in the database 22, or of sampling the related data of a predetermined range of projects, for example a predetermined number of projects in order from the most recent end date and time.

The predetermined number can be set as appropriate by the management user, for example as a predetermined percentage of the total number of projects. The management computer 12 generates an extraction command based on the extraction request and transmits it to the business server 14 (302). The extraction unit 102 of the business server 14 extracts the predetermined number of sampling data from the database 22 based on the extraction command from the management computer 12 (304).
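The two forms of extraction request described above (random sampling, and sampling in order from the most recent end date) can be sketched as follows. This is an illustrative sketch under stated assumptions: the `Project` record, the `extract_sampling` function, and its `mode` and `seed` parameters are hypothetical names, not part of the described system.

```python
import random
from dataclasses import dataclass
from datetime import date

@dataclass
class Project:
    project_id: str
    end_date: date

def extract_sampling(projects, count, mode="random", seed=None):
    """Return `count` projects to be used as sampling data."""
    count = min(count, len(projects))
    if mode == "random":
        # Random sampling without replacement.
        return random.Random(seed).sample(projects, count)
    if mode == "latest":
        # Most recently finished projects first.
        ordered = sorted(projects, key=lambda p: p.end_date, reverse=True)
        return ordered[:count]
    raise ValueError(f"unknown mode: {mode}")

projects = [Project(f"P{i}", date(2014, 1 + i, 1)) for i in range(10)]
latest = extract_sampling(projects, 3, mode="latest")
random_pick = extract_sampling(projects, 3, mode="random", seed=0)
print([p.project_id for p in latest])
```

The count could equally be derived from a percentage of `len(projects)`, which matches the option of specifying the predetermined number as a share of all projects.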

Further, using the evaluation criteria setting tool of the management computer 12, the management user sets in the management computer 12 the evaluation criteria that the evaluator authority users will apply when evaluating and classifying the sampling data (306). FIG. 4 shows the interface (input screen) of the evaluation criteria setting tool, which includes a plurality of evaluation index input fields 400 and an input field 402 for the user IDs to which classification evaluation authority is given.

In the former input field 400, the management user can input one or a plurality of evaluation indexes. The management user can freely define the content of each evaluation index, without being limited to evaluation indexes 1-5 described above. Therefore, even if the purpose of using project deliverables changes with the environment or with individual preferences, the data analysis system can dynamically change the evaluation indexes (evaluation criteria) accordingly.

By classifying the related data based on these evaluation indexes, project results consistent with the current purpose of utilization can always be presented to the user. The management user may also be allowed to select desired evaluation indexes from a pull-down menu of predefined evaluation indexes.

On the other hand, the policies and standpoints for evaluating and classifying sampling data differ from evaluator to evaluator. For example, evaluator A may want to sort out projects with a large “degree of influence on customers”, while evaluator B may want to sort out projects with high “social attractiveness of projects”. The same applies when the evaluators' backgrounds differ.

For example, the evaluators may be a development manager (who tends to evaluate with emphasis on the technical aspects of a project), a finance manager (who tends to evaluate with emphasis on the cost management of a project), a planning manager (who tends to evaluate with emphasis on the significance of the project theme), a customer response manager (who tends to evaluate with emphasis on customer satisfaction with a project), and a labor manager (who tends to evaluate with emphasis on labor management in the project process).

The data analysis system allows multiple evaluators to participate for the same evaluation index in order to give diversity to the data classification results. The purpose of utilizing project results varies from individual to individual. If the classification results are diverse, however, an individual who intends to utilize project results can refer to the evaluation and classification results of an evaluator whose tendencies match that individual's own purpose of utilization.

For example, an individual who wants to use the results of past projects in a project he or she is working on may prefer the evaluation and classification results of evaluator B, who places emphasis on the “social attractiveness of projects”, over those of evaluator A, who emphasizes the “degree of influence on customers”.

When the management computer 12 receives the evaluation criteria setting information from the management user, it sends the information to the client device 10 of the evaluator authority user (the specific client device) (310) and also to the business server 14 (308). The business server 14 sends the sampling data extracted by the extraction unit 102 to the specific client device (312). The specific client device executes the classification evaluation setting program to activate the evaluation classification input interface, and presents the evaluation classification input screen to the evaluator authority user. FIG. 5 shows an example of the screen, which includes a sampling data list 500, contents 504 for each of a plurality of evaluation indexes, and a check box 502 for each evaluation index.

When the evaluator authority user selects an item in the sampling data list, the details 506 of the selected sampling data are displayed, as shown in the other example of the evaluation classification input screen. The list of sampling data is presented as a project ID 510 and a project name 512 (for example, **** Construction of a personnel evaluation system for a company). The details 506 include text data containing an outline of the contents of the project and an evaluation of the project.

The evaluator authority user reviews the evaluation indexes in order while referring to the sampling data details 506, and evaluates whether or not each evaluation index is satisfied. For example, when the evaluator authority user determines for evaluation index 1 that the sampling data had a large influence on the customer, the user checks the check box corresponding to evaluation index 1.

On the other hand, if the user evaluates for evaluation index 2 that the social attractiveness of the project in the sampling data was not high, the check box corresponding to evaluation index 2 is left unchecked. When a check box is checked, the business server 14 sets a tag for the checked evaluation index.

When the evaluator authority user finishes the evaluation and classification of the sampling data, the client device 10 transmits evaluation classification input information to the business server 14 (314). The business server 14 determines the necessity of tag setting for each evaluation index and each evaluator user based on the evaluation classification input information obtained from all evaluator authority users, and registers the result in the database 22.

The selection unit 105 of the business server 14 refers to the tag setting information in the database 22 and, from the collection of sampling data in which tags are set for each evaluation index and for each evaluator user, extracts according to a predetermined selection criterion the data elements that are characteristic of that collection and useful for automatically classifying unknown data (316). Here, “useful” means effective for evaluating whether or not the same tag should be set for unknown data whose content is similar to the sampling data in which the tag is set.

For evaluator A, the selection unit 105 of the business server 14 extracts useful data elements based on the sampling data in which the tag of the first evaluation index is set, and repeats this for every evaluation index from the second onward. The selection unit 105 then repeats the same procedure for the remaining evaluator authority users. The selection unit 105 thus extracts useful data elements for each evaluation index and for each evaluator authority user.

Examples of useful data elements include a keyword that appears in a plurality of sampling data with a tag set, or a keyword that appears at a predetermined frequency or more in a predetermined number of sampling data. Note that useful data elements may also be set by the management user.
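One way to picture the frequency-based selection of keywords is the sketch below. It is only an illustration under assumptions: splitting on whitespace stands in for proper morphological analysis, and the function name `select_keywords` and the `min_docs` parameter are hypothetical.

```python
from collections import Counter

def select_keywords(tagged_docs, min_docs=2):
    """Return keywords that appear in at least `min_docs` tagged documents."""
    doc_freq = Counter()
    for text in tagged_docs:
        # Count each keyword once per document (document frequency).
        doc_freq.update(set(text.lower().split()))
    return sorted(word for word, n in doc_freq.items() if n >= min_docs)

tagged_docs = [
    "personnel evaluation system",
    "evaluation of personnel costs",
    "logistics review",
]
keywords = select_keywords(tagged_docs)
print(keywords)
```

In practice this selection would be run once per evaluation index and per evaluator, on the sampling data carrying that tag.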

The selection unit 105 evaluates the degree of usefulness of each of the plurality of data elements according to a predetermined evaluation criterion. As one such criterion, a data element can be evaluated using the amount of transmitted information, which indicates its dependency on an evaluation index. For example, when the selection unit extracts a keyword as a data element from document information (text), the keyword is evaluated by calculating a keyword weight. “Weight” refers to the degree of the evaluation value of a data element, such as its magnitude, degree, or superiority or inferiority, regardless of the type of data element (keyword, partial audio, partial image, partial video, and so on).
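As a rough illustration of weighting a keyword by its dependency on a tag, the sketch below computes the mutual information between keyword presence and tag presence. Reading the "amount of transmitted information" as mutual information is an assumption made here, and the function name and data layout are hypothetical.

```python
import math

def mutual_information(docs, tags, keyword):
    """Mutual information (in bits) between keyword presence and tag.

    docs is a list of token sets; tags is a parallel list of booleans.
    """
    n = len(docs)
    mi = 0.0
    for has_kw in (True, False):
        for tagged in (True, False):
            joint = sum(
                1 for d, t in zip(docs, tags)
                if (keyword in d) == has_kw and t == tagged
            ) / n
            p_kw = sum(1 for d in docs if (keyword in d) == has_kw) / n
            p_tag = sum(1 for t in tags if t == tagged) / n
            if joint > 0:  # skip empty cells; 0 * log(0) is taken as 0
                mi += joint * math.log2(joint / (p_kw * p_tag))
    return mi

docs = [{"merger", "plan"}, {"merger"}, {"plan"}, {"audit"}]
tags = [True, True, False, False]
mi_merger = mutual_information(docs, tags, "merger")
mi_plan = mutual_information(docs, tags, "plan")
print(mi_merger, mi_plan)
```

Here "merger" appears exactly in the tagged documents, so it carries the maximum of one bit, while "plan" is independent of the tag and carries none.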

The learning unit 110 calibrates the weight of each keyword according to a predetermined algorithm. For example, in a training data set (a data set including a plurality of combinations of training data and classification information (tags) for classifying that training data), the learning unit 110 evaluates, as the weight and based on a predetermined criterion (for example, the amount of transmitted information), the degree to which each of a plurality of data elements constituting at least part of the training data contributes to those combinations. Further, the learning unit 110 can repeatedly reevaluate and recalculate the weight of each keyword until the scores of the sampling data with the tag set are higher than the scores of the sampling data with no tag set.

Specifically, the learning unit 110 first calculates, based on the weights calculated once, a score for each sampling data for which the evaluator has already decided whether or not to set a tag, and arranges the sampling data in order of score. At this point, it is desirable that the sampling data with the tag set rank higher than the sampling data with no tag set.

Therefore, the learning unit 110 continues to correct the weights until such an ordering is obtained. The learning unit 110 then takes the intermediate value between the lowest score of the sampling data with the tag set and the highest score of the sampling data with no tag set as the threshold for automatically determining whether or not to set the tag for unknown data. The learning unit 110 calculates the weight wgt of a data element using, for example, equation (1).

$wgt_{i,0}$ indicates the initial value of the weight of the i-th selected keyword before learning. $wgt_{i,L}$ represents the weight of the i-th selected keyword after the L-th learning. $\gamma_L$ means the learning parameter in the L-th learning, and $\theta$ means the learning effect threshold.
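The calibration loop and the threshold described above can be sketched as follows. This is a minimal illustration under stated assumptions and does not implement equation (1): the additive score, the `step` size, and the rule of raising weights of keywords found only in the lowest tagged item while lowering those found only in the highest untagged item are all assumptions introduced here.

```python
def score(doc, weights):
    """Assumed score: sum of the weights of the keywords present in doc."""
    return sum(w for kw, w in weights.items() if kw in doc)

def calibrate(weights, tagged, untagged, step=0.5, max_rounds=100):
    """Adjust weights until every tagged item outscores every untagged one."""
    weights = dict(weights)
    for _ in range(max_rounds):
        low = min(tagged, key=lambda d: score(d, weights))
        high = max(untagged, key=lambda d: score(d, weights))
        if score(low, weights) > score(high, weights):
            break  # desired ordering reached
        for kw in weights:
            if kw in low and kw not in high:
                weights[kw] += step
            elif kw in high and kw not in low:
                weights[kw] = max(0.0, weights[kw] - step)
    # Threshold: midpoint between lowest tagged and highest untagged score.
    # If max_rounds was exhausted, the midpoint may not separate the groups.
    threshold = (min(score(d, weights) for d in tagged)
                 + max(score(d, weights) for d in untagged)) / 2
    return weights, threshold

initial = {"merger": 1.0, "plan": 1.0, "audit": 1.0}
tagged = [{"merger", "plan"}, {"merger"}]
untagged = [{"plan", "audit"}, {"audit"}]
weights, threshold = calibrate(initial, tagged, untagged)
print(weights, threshold)
```

In this toy run a single correction is enough: "merger" is raised, "plan" and "audit" are lowered, and the threshold falls between the two groups.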

The business server 14 stores the data element extracted by the selection unit 105, the evaluation value for each data element, and the threshold value in the database. The data element, the evaluation value of the data element, and the threshold value are stored in the database for each evaluation index and each classification evaluator.

Next, the business server 14 compares the data elements with the unknown data, evaluates the degree of relevance between the classification result of the sampling data and the unknown data, and classifies the unknown data without requiring input from the user. That is, the search unit 106 reads from the database 22 a plurality of items of unknown data to be classified automatically, sequentially reads the data elements recorded in the database, and searches each item of unknown data of each project for the presence or absence of those data elements (320). When an item of unknown data contains data elements found by the search unit 106, the score calculation unit 107 calculates a score for that unknown data based on the evaluation values corresponding to the data elements found, and orders the unknown data by score (322).

When the data element is a keyword, the score calculation unit 107 can calculate a score from the following formula based on the weight of the keyword. The score is a quantitative evaluation of the strength of association of unknown data with a classification code.
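The formula referred to above is not reproduced in this text, so the following sketch does not claim to be it. It assumes, purely for illustration, that the score is the sum of the weights of the keywords found in an item of unknown data, and then orders the unknown data by that score. The names `score` and `rank` and the sample values are hypothetical.

```python
def score(document_keywords, weights):
    # Assumption: score = sum of the weights of the keywords that appear.
    return sum(weights.get(kw, 0.0) for kw in document_keywords)


def rank(unknown, weights):
    """Return (name, score) pairs in descending order of score."""
    scored = [(name, score(kws, weights)) for name, kws in unknown.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


weights = {"delay": 2.0, "budget": 1.0}
unknown = {
    "doc1": {"budget"},
    "doc2": {"delay", "budget"},
    "doc3": {"lunch"},
}
ranking = rank(unknown, weights)
```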

Alternatively, the score calculation unit 107 may calculate the score based on both the result of evaluating a first data element included in the data (the weight of the first data element) and the result of evaluating a second data element included in the data (the weight of the second data element). That is, when the first data element appears in the data, the score calculation unit 107 can also take into account the frequency with which the second data element appears in the same data (that is, the correlation or co-occurrence between the first data element and the second data element). Because the data analysis system can thereby calculate the score in consideration of the correlation between data elements, it can extract unknown data related to the training data with higher accuracy.
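One possible way to take co-occurrence into account is sketched below. The text does not specify how the correlation enters the score, so the additive pair bonus used here is an assumption, and `score_with_cooccurrence` and `pair_weights` are hypothetical names.

```python
from itertools import combinations


def score_with_cooccurrence(doc, weights, pair_weights):
    """Score a document from single-element weights plus pair bonuses.

    doc:          set of data elements (keywords) found in the document.
    weights:      weight of each individual data element.
    pair_weights: assumed extra weight for two elements appearing together,
                  keyed by frozenset({first, second}).
    """
    base = sum(weights.get(kw, 0.0) for kw in doc)
    bonus = sum(
        pair_weights.get(frozenset(pair), 0.0)
        for pair in combinations(sorted(doc), 2)
    )
    return base + bonus


weights = {"price": 1.0, "fixing": 1.0}
pair_weights = {frozenset({"price", "fixing"}): 2.0}

together = score_with_cooccurrence({"price", "fixing"}, weights, pair_weights)
alone = score_with_cooccurrence({"price"}, weights, pair_weights)
```

In this sketch a document containing both elements scores more than the sum of their individual weights, which is one simple way of reflecting their correlation.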

Further, the score calculation unit 107 can not only rank the data by calculating one score for each item of data (and treating that score as the evaluation result of the data), but can also, for example, calculate a score for each sentence or paragraph included in the data, integrate those scores (for example, by taking the maximum score or by adding a predetermined number of scores in descending order), and use the integrated score as the evaluation result of the data. The data analysis system can thereby select useful data suited to the purpose of use from the data group more accurately. In addition, when the data includes at least a user's evaluation of an event, the score calculation unit 107 can extract from the data the emotion that the user who generated the data holds toward the event, based on that evaluation. The score calculation unit 107 can also cluster the data by the context that the data have in common. Further, the score calculation unit 107 can evaluate the data for each phase (for example, a proposal stage or an execution stage), a phase being an index of each stage through which a predetermined action (for example, a consultant proposing a solution to a customer's problem) progresses, and can identify the current phase based on the result of the evaluation.
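The two integration methods named above (taking the maximum, or adding a predetermined number of scores in descending order) can be sketched as follows. The function name `integrate`, its parameters, and the sample scores are hypothetical.

```python
import heapq


def integrate(sentence_scores, method="max", top_k=3):
    """Combine per-sentence (or per-paragraph) scores into one document score.

    method="max":       use the highest sentence score.
    method="top_k_sum": add the top_k highest sentence scores.
    """
    if not sentence_scores:
        return 0.0
    if method == "max":
        return max(sentence_scores)
    if method == "top_k_sum":
        return sum(heapq.nlargest(top_k, sentence_scores))
    raise ValueError(f"unknown integration method: {method}")


sentence_scores = [0.25, 1.0, 0.5, 0.125]
by_max = integrate(sentence_scores, method="max")
by_top2 = integrate(sentence_scores, method="top_k_sum", top_k=2)
```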

The automatic classification unit 108 automatically evaluates the unknown data based on the evaluation of the sampling data and on the calculated score, which is a numerical index of the relationship between the classification result and the unknown data, and decides whether to set the same tag for the unknown data. If the score is equal to or greater than the above-described threshold, the tag is set for the unknown data. The business server 14 can also exclude from score calculation, in advance, any unknown data that contains none of the keywords and related terms registered beforehand in the database 22 and none of the data elements selected by the selection unit 105.
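The threshold decision and the advance exclusion described above can be sketched as follows. The score is again assumed to be a simple sum of weights, and `auto_tag` and the sample data are hypothetical.

```python
def auto_tag(unknown, weights, threshold):
    """Decide, for each item of unknown data, whether the tag is set.

    Items that contain none of the registered data elements are excluded
    from score calculation and do not appear in the result.
    """
    decisions = {}
    for name, keywords in unknown.items():
        if keywords.isdisjoint(weights):
            continue  # no registered data element: skip scoring entirely
        # Assumption: score = sum of the weights of the elements found.
        total = sum(weights[kw] for kw in keywords if kw in weights)
        decisions[name] = total >= threshold
    return decisions


weights = {"delay": 2.0, "budget": 1.0}
unknown = {
    "doc1": {"budget"},
    "doc2": {"delay", "budget"},
    "doc3": {"lunch"},
}
decisions = auto_tag(unknown, weights, threshold=1.5)
```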

The business server 14 registers a score in the database 22 for each item of unknown data. FIG. 7 shows an example of a table registered in the database. A score is recorded for each item of unknown data (data 1, 2, 3, ...), for each evaluation index (evaluation indexes 1-5), and for each evaluator (evaluators A and B). Each of Ad represents a score value. The business server 14 determines from the score value whether the tag applies, and the tag information may also be registered in the database for each item of unknown data, for each evaluation index (evaluation indexes 1-5), and for each evaluator (evaluators A and B). Evaluation of unknown data by the business server 14 includes predetermined calculation processing based on the degree of relevance of the unknown data to the sampling data, such as ranking the items of unknown data relative to one another by score, setting a tag for each item of unknown data, and making the items of unknown data distinguishable by the magnitude of their scores.

A user who wants to refer to and make use of past project results through the classification of the project-related data may, after the data classification described above is completed, designate an evaluation index (one or more of the evaluation indexes 1-5) and an evaluator (one evaluator or a plurality of evaluators) via the client device 10 and transmit the designation to the business server 14.

The business server 14 refers to the database 22, extracts the project-related data to which a classification tag has been assigned for the designated evaluation index and whose score was evaluated using the evaluations of the designated evaluator, and transmits the data together with the score value for each evaluation index to the client device 10. The client device 10 sorts the project-related data in descending order of score value and displays it.
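The score table and the lookup by evaluation index and evaluator can be pictured with a small in-memory sketch. This is not the database schema of the embodiment; the dictionary layout and the names `register` and `lookup` are assumptions made for illustration.

```python
def register(table, data_id, index, evaluator, score, threshold):
    # One entry per (data, evaluation index, evaluator): the score and
    # whether the tag applies (score at or above the threshold).
    table[(data_id, index, evaluator)] = (score, score >= threshold)


def lookup(table, index, evaluators):
    """Tagged data for one evaluation index and the chosen evaluators,
    sorted in descending order of score."""
    hits = [
        (data_id, score)
        for (data_id, idx, ev), (score, tagged) in table.items()
        if idx == index and ev in evaluators and tagged
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


table = {}
register(table, 1, 1, "A", 0.8, threshold=0.5)
register(table, 2, 1, "A", 0.3, threshold=0.5)
register(table, 3, 1, "A", 0.9, threshold=0.5)
register(table, 1, 1, "B", 0.6, threshold=0.5)

result = lookup(table, index=1, evaluators={"A"})
```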

FIG. 8 is an example of the display screen and shows the score values for evaluator A when evaluation indexes 1-5 are selected. Each aj is a score. The total value is an index that comprehensively evaluates the selected evaluation indexes and is, for example, the sum or the average of the scores for those evaluation indexes. In this case, the higher an item of data appears in the list, the higher its score on the selected evaluation items. The total value may also be calculated with a different weight for each evaluation index. For example, evaluation index 1 may be regarded as important and given a weight of 40%, with each of the remaining indexes given a weight of 15%. The score values may also be sorted for each evaluation index. Furthermore, the data analysis system can display, in a visually recognizable manner, how the ratio of data associated with predetermined classification information (a tag) to all data is distributed over the results of evaluating each of the plurality of data, using a gradation corresponding to that ratio. For example, the data analysis system can display the distribution of that ratio over the scores calculated for the data using a gradation that changes from green to red as the proportion of data that the evaluator judged to have “a large influence on the customer” (that is, data for which the tag for evaluation criterion 1 was set) increases. Furthermore, the data analysis system can also evaluate data based on a plurality of evaluation criteria and display a radar chart showing the plurality of evaluation results with the plurality of evaluation criteria as axes.
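The weighted total in the example above (40% for evaluation index 1 and 15% for each of the other four) can be sketched as follows. The function name `weighted_total` and the sample scores are hypothetical; only the 40%/15% split comes from the text.

```python
def weighted_total(scores, index_weights):
    """Weighted sum of the scores for each evaluation index.

    scores:        {evaluation index: score} for one item of data.
    index_weights: {evaluation index: weight}; the weights must sum to 1.
    """
    if abs(sum(index_weights.values()) - 1.0) > 1e-9:
        raise ValueError("index weights must sum to 1")
    return sum(scores[i] * w for i, w in index_weights.items())


# Evaluation index 1 is weighted 40%, the remaining four 15% each.
index_weights = {1: 0.40, 2: 0.15, 3: 0.15, 4: 0.15, 5: 0.15}

data = {
    "a": {1: 1.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0},
    "b": {1: 0.0, 2: 1.0, 3: 1.0, 4: 0.0, 5: 0.0},
}
totals = {name: weighted_total(s, index_weights) for name, s in data.items()}

# Display order: descending total value.
order = sorted(totals, key=totals.get, reverse=True)
```

Here item "a" scores on only one index but ranks above item "b", which scores on two, because evaluation index 1 carries more weight.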

In the above-described embodiment, the data analysis system is described as being realized by the client device and the server, but it may be realized by the client device alone. In addition, although a system that evaluates and classifies project results and selects the data best suited to making use of the results of past projects has been described, the present invention can also be applied to other technical fields, such as the utilization of electronic medical records in hospitals.

Furthermore, the above-described embodiment has been described as a system having a plurality of evaluators and a plurality of evaluation criteria. However, as described above, the feature of the present invention is that an evaluation criterion having content related to the user's purpose is set each time desired data is to be classified. Compared with a system in which the data is classified in advance from a specific viewpoint and the data to be referred to must be selected from within that classification, useful data suited to the purpose of use can be selected from the data group more accurately. Therefore, there may be only a single evaluator and a single evaluation criterion.

[Other application examples]
In the above embodiment, an example in which the data analysis system is realized as a “project evaluation system” (that is, an example in which the object analyzed by the data analysis system is related data concerning project results) has been described. The data analysis system can also be applied to the following purposes or embodiments.

For example, the data analysis system can be applied to an information asset utilization system that makes use of information stored within a company. In other words, the data analysis system is realized as a system that dynamically utilizes the information assets held by companies and experts according to the situation. For example, (1) at a development site that wants to shorten the development period, information on products developed in the past can be reused according to the requirements of the current development, or (2) useful information assets can be identified based on the expertise of skilled engineers. More specifically, by changing the evaluation criteria of the data analysis system appropriately according to the characteristics of the target for which the data is to be used, such as the technical field to which the target belongs or its technical or economic characteristics, information that is highly likely to meet the current individual and specific requirements can be extracted dynamically. Useful data can thus be selected accurately from a large data group. This also applies to the other technical fields described below.

The data analysis system of the present invention can also be applied to Internet application systems. In this case, the data analysis system evaluates data (for example, a message posted by a user to an SNS, recommended information posted on a website, or a profile of a user or organization) based on a predetermined evaluation criterion (for example, whether another user's preferences are similar to the user's preferences, or whether the user's preferences match a restaurant's attributes). It can thereby, for example, display a list of other users with similar preferences, present restaurant information that suits the user's preferences, or warn of organizations that may harm the user. Thereby, the data analysis system can accurately select useful data suitable for the purpose of use from the data group.

Also, the data analysis system can be applied to a driving support system. In this case, the data analysis system evaluates data (for example, data acquired from an in-vehicle sensor, a camera, or a microphone) based on a predetermined evaluation criterion (for example, whether the data is information that a skilled driver pays attention to while driving). It can thereby, for example, automatically extract useful information that makes driving safe and comfortable. Thereby, the data analysis system can accurately select useful data suitable for the purpose of use from the data group.

Also, the data analysis system can be applied to finance-related systems. In this case, the data analysis system evaluates data (for example, a report document submitted to a bank, or stock market prices) based on a predetermined evaluation criterion (for example, whether there is a risk of fraud, or whether the stock price will rise). It can thereby, for example, detect a report made for a fraudulent purpose or predict a future stock price. Thereby, the data analysis system can accurately select useful data suitable for the purpose of use from the data group.

Also, the data analysis system can be applied to a medical application system (a system that uses electronic medical records, nursing records, patient diaries, and the like as data to estimate whether a patient will engage in a specific dangerous behavior). In this case, the data analysis system evaluates data (for example, an electronic medical record, a nursing record, or a patient diary) based on a predetermined evaluation criterion (for example, whether the patient will take a specific dangerous action). It can thereby, for example, predict that the patient will fall into a dangerous state (for example, suffer a fall). Thereby, the data analysis system can accurately select useful data suitable for the purpose of use from the data group.

Also, the data analysis system can be applied to a smart mail system. In this case, the data analysis system evaluates data (for example, an e-mail or an attached file) based on a predetermined evaluation criterion (for example, whether the e-mail requires a reply). It can thereby, for example, extract important mails (mails that require action) from a large number of mails. Thereby, the data analysis system can accurately select useful data suitable for the purpose of use from the data group.

The data analysis system can also be applied to a discovery support system. In this case, the data analysis system evaluates data (for example, a document, an e-mail, or spreadsheet data) based on a predetermined evaluation criterion (for example, whether the data should be submitted in the lawsuit). It can thereby, for example, extract only the documents related to the case at hand for submission to the court. Thereby, the data analysis system can accurately select useful data suitable for the purpose of use from the data group.

Also, the data analysis system can be applied to a forensic system. In this case, the data analysis system evaluates data (for example, a document, an e-mail, or spreadsheet data) based on a predetermined evaluation criterion (for example, whether the data is evidence that can prove a criminal act). It can thereby, for example, extract evidence that proves the criminal act. Thereby, the data analysis system can accurately select useful data suitable for the purpose of use from the data group.

Also, the data analysis system can be applied to an email audit system. In this case, the data analysis system evaluates data (for example, an e-mail or an attached file) based on a predetermined evaluation criterion (for example, whether the user who sent or received the e-mail is attempting fraud). It can thereby, for example, find signs of misconduct such as information leakage or collusion. Thereby, the data analysis system can accurately select useful data suitable for the purpose of use from the data group.

Also, the data analysis system can be applied to a patent search system. In this case, the data analysis system evaluates data (for example, patent literature or a document summarizing an invention) based on a predetermined evaluation criterion (for example, whether the patent literature can serve as grounds for rejecting or invalidating a given patent). It can thereby, for example, extract invalidation materials from a large number of patent documents. Thereby, the data analysis system can accurately select useful data suitable for the purpose of use from the data group.

In this way, the data analysis system can be applied not only to a project evaluation system but also to any system that achieves its object by evaluating data based on a predetermined evaluation criterion, such as a forensic system, a discovery support system, a medical application system, an email audit system, an Internet application system, a driving support system, a finance-related system, or a patent search system.

The present invention can be widely applied to arbitrary computers such as personal computers, servers, workstations, mainframes, and the like.

10 Client device
12 Management computer
14 Business server
18 Storage system
22 Database

Claims

A data analysis system comprising a controller that executes processing for separating desired data from a data group stored in a memory according to a predetermined purpose of a user, wherein
the controller:
extracts partial data from the data group;
sets, for the partial data, an evaluation criterion having content related to the purpose in order to classify the partial data when the desired data is to be separated;
classifies the partial data according to the result of evaluating the extracted partial data based on the evaluation criterion;
ranks and evaluates, using the classification result for the partial data, target data in the data group other than the extracted partial data in accordance with the classification of the partial data; and
enables the desired data to be separated from the data group using the result of the evaluation of the target data.
The data analysis system according to claim 1, wherein
the controller sets the evaluation criterion having predetermined content based on an input by the user when the desired data is to be separated by the user.
The data analysis system according to claim 1, wherein
the controller:
permits evaluation inputs from a plurality of evaluators with respect to the evaluation criterion; and
ranks and evaluates, based on the evaluation results for each of the plurality of evaluators, the target data in the data group other than the extracted partial data in accordance with the classification of the partial data.
The data analysis system according to claim 1, wherein
the controller:
sets a plurality of evaluation criteria whose contents differ from one another; and
ranks and evaluates, based on the evaluation results for each of the plurality of evaluation criteria, the target data in the data group other than the extracted partial data in accordance with the classification of the partial data.
The data analysis system according to claim 1, wherein
the controller:
permits evaluation inputs from a plurality of evaluators with respect to the evaluation criterion, and sets a plurality of the evaluation criteria whose contents differ from one another; and
ranks and evaluates, based on the evaluation results for each of the plurality of evaluators and for each of the plurality of evaluation criteria, the target data in the data group other than the extracted partial data in accordance with the classification of the partial data.
The data analysis system according to claim 5, wherein
the controller makes the classified target data available for reference within a range selected based on a first selection input for selecting a predetermined evaluator from the plurality of evaluators and/or a second selection input for selecting a predetermined evaluation criterion from the plurality of evaluation criteria.
The data analysis system according to claim 1, wherein
the controller:
assigns a classification code for the evaluation criterion to the partial data when the evaluation criterion is affirmed;
evaluates the relationship between the target data and the partial data to which the classification code is assigned; and
determines that the target data matches the evaluation criterion when the evaluated numerical value is greater than or equal to a predetermined value.
The data analysis system according to claim 7, wherein
the controller:
extracts a data element related to the evaluation criterion from the partial data to which the classification code is assigned;
evaluates the extracted data element based on a predetermined criterion; and
quantifies, in accordance with the data element and its evaluation result, the relevance between the partial data to which the classification code is assigned and the target data.
A data analysis system comprising a controller that executes processing for separating desired data from a data group stored in a memory according to a predetermined purpose of a user, wherein
the data group includes a plurality of data sets, each for an independent event, and
the controller:
extracts a predetermined number of data sets from the data group;
sets, for the predetermined number of data sets, a plurality of evaluation criteria having content related to the purpose when the desired data set is to be classified;
permits evaluation inputs from a plurality of evaluators for each of the plurality of evaluation criteria;
classifies the predetermined number of data sets by attaching to each a classification code according to its evaluation result, for each of the plurality of evaluation criteria and for each of the plurality of evaluators;
extracts a predetermined data element from each of the classified predetermined number of data sets;
evaluates the extracted data element based on a predetermined criterion;
quantifies, based on the data element and its evaluation result, the relationship between the predetermined number of data sets and a target data set other than the predetermined number of data sets;
ranks and evaluates the target data set based on the quantification;
classifies the target data set by the classification code based on the result of the evaluation of the target data set; and
enables the desired data to be separated from the data group using the classification result for the target data set.
A data analysis method in which a controller executes processing for separating desired data from a data group stored in a memory according to a predetermined purpose of a user, wherein
the controller:
extracts partial data from the data group;
sets, for the partial data, an evaluation criterion having content related to the purpose in order to classify the partial data when the desired data is to be separated;
classifies the partial data according to the result of evaluating the extracted partial data based on the evaluation criterion;
ranks and evaluates, using the classification result for the partial data, target data in the data group other than the extracted partial data in accordance with the classification of the partial data; and
enables the desired data to be separated from the data group using the result of the evaluation of the target data.
A data analysis program for causing a computer to execute processing for separating desired data from a data group according to a predetermined purpose of a user, the processing comprising:
extracting partial data from the data group;
setting, for the partial data, an evaluation criterion having content related to the purpose in order to classify the partial data when the desired data is to be separated;
classifying the partial data according to the result of evaluating the extracted partial data based on the evaluation criterion;
ranking and evaluating, using the classification result for the partial data, target data in the data group other than the extracted partial data in accordance with the classification of the partial data; and
enabling the desired data to be separated from the data group using the result of the evaluation of the target data.