WO2016189605A1 - Data analysis system, control method, control program, and recording medium - Google Patents

Data analysis system, control method, control program, and recording medium

Info

Publication number
WO2016189605A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
component
target data
evaluation
analysis system
Prior art date
Application number
PCT/JP2015/064832
Other languages
English (en)
Japanese (ja)
Inventor
秀樹 武田
和巳 蓮子
Original Assignee
株式会社Ubic
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社Ubic
Priority to JP2017520082A priority Critical patent/JPWO2016189605A1/ja
Priority to PCT/JP2015/064832 priority patent/WO2016189605A1/fr
Publication of WO2016189605A1 publication Critical patent/WO2016189605A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present invention relates to a data analysis system that analyzes data, and can be applied to, for example, an artificial intelligence system that analyzes big data.
  • the present invention has been made in view of such problems, and its purpose is to provide a data analysis system capable of accurately analyzing a data group regardless of the amount of partial data available for that data group, and related technology.
  • a data analysis system for analyzing data comprises a memory, an input control device, and a controller.
  • the system generates an index for ranking a plurality of target data; the index corresponds to the relevance between each target data and a predetermined case, and changes based on an input given by the user via the input control device.
  • the memory stores the plurality of target data at least temporarily.
  • the input control device presents sample data for the target data to a user and receives input of classification information from the user.
  • based on that input, the classification information is associated with the sample data to classify the sample data, and the combination of the sample data and the classification information received from the user is provided to the controller as reference data.
  • the controller obtains a plurality of the reference data and extracts a first component from them; the first component constitutes at least a part of the reference data. The controller evaluates the degree to which the first component contributes to the combination.
  • the controller extracts, from at least one of the plurality of target data, a second component having relevance to the evaluated first component; the second component constitutes at least a part of the target data. The controller evaluates the second component and generates the index based on the evaluation result of the second component, thereby evaluating the relevance between the plurality of target data and the predetermined case.
  • a control method of a data analysis system for analyzing data includes: a first step of generating an index for ranking a plurality of target data, the index corresponding to the relevance between each target data and a predetermined case and changing based on an input from a user; a second step of storing the plurality of target data at least temporarily;
  • and a third step of presenting sample data for the target data to the user, receiving an input of classification information from the user, and associating the classification information with the sample data based on the input to classify the sample data.
  • in the method, a combination of the sample data and the classification information received from the user is provided as reference data.
  • the third invention is a data analysis system control program for causing a computer to execute each step included in the control method of the data analysis system.
  • the invention is characterized in that it is a computer-readable recording medium on which a control program of the data analysis system is recorded.
  • the data analysis system, the control method, the control program, and the recording medium according to one aspect of the present invention have the effect that a data group can be accurately analyzed regardless of the amount of partial data available for that data group.
  • FIG. 1 is a block diagram illustrating an example of a hardware configuration of a data analysis system (hereinafter, simply referred to as “system”) according to the present embodiment.
  • the system includes, for example, an arbitrary recording medium (eg, memory, hard disk, etc.) capable of storing data (including digital data and analog data), and a controller capable of executing a control program stored in the recording medium.
  • “data” may be any data expressed in a format that can be processed by the computer.
  • the data may be, for example, unstructured data whose structure definition is at least partly incomplete, and broadly includes document data (for example, e-mail (including attached files and header information), technical documents (a wide range of documents explaining technical matters, such as academic papers, patent publications, product specifications, and design drawings), presentation materials, spreadsheets, financial statements, meeting materials, reports, sales documents, contracts, organization charts, business plans, company analysis information, electronic medical records, web pages, blogs, and comments posted on social network services), audio data (for example, recorded conversation or music), image data (for example, data composed of a plurality of pixels or vector information), and video data (for example, data composed of a plurality of frame images).
  • “reference data” may be, for example, data associated with classification information by a user (data that has been classified, which is a combination of data and classification information).
  • the “target data” may be data not associated with the classification information (unclassified data that has not been presented to the user as reference data and has not been classified by the user).
  • the “classification information” may be an identification label used for classifying reference data. For example, it may be information that classifies the reference data into three categories, such as a “Related” label indicating that the reference data and a predetermined case are related, a “High” label indicating that they are highly related, and a “Non-Related” label indicating that they are not related, or information that classifies the reference data into five categories such as “good”, “slightly good”, “normal”, “slightly bad”, and “bad”.
  • the “predetermined case” includes a wide range of targets for which the system is evaluated for relevance to data, and the scope thereof is not limited.
  • the predetermined case may be, for example: a case for which the discovery procedure is required, when the system is realized as a discovery support system; a crime that is the subject of an investigation, when realized as a criminal investigation support system; fraudulent activity (for example, information leakage or collusion), when realized as an email monitoring system; a case or case example related to medicine, when realized as a medical application system (for example, a pharmacovigilance support system, clinical trial efficiency system, medical risk hedging system, fall prediction (fall prevention) system, prognosis prediction system, or diagnosis support system); a case or case example related to the Internet, when realized as an Internet application system (for example, a smart mail system, information aggregation (curation) system, user monitoring system, or social media management system); a project performed in the past, when realized as a project evaluation system; a product or service targeted for marketing, when realized as a marketing support system; an intellectual property subject to evaluation, when realized as an intellectual property evaluation system; a fraudulent financial transaction, when realized as an unauthorized transaction monitoring system; a past response case, when realized as a call center escalation system; a subject of a credit check, when realized as a credit check system; a matter related to driving, when realized as a driving support system; or business results, when realized as a sales support system.
  • the data analysis system 1 may include, for example, a server device 2 that can execute the main processing of data analysis, one or more client devices 3 that can execute related processing of data analysis, a storage system 5 including a database 4 for recording data and evaluation results for the data, and a management computer 6 that provides a management function for data analysis to the client device 3 and the server device 2.
  • the client device 3 can present a part of a plurality of target data to the user as sample data before classification. As a result, the user can input for evaluation / classification of the sample data via the client device 3.
  • the server device 2 can randomly sample a plurality of target data, extract a predetermined number of sample data, and provide this to a predetermined client device.
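The random sampling described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `draw_sample`, the `seed` parameter, and the example document identifiers are assumptions introduced here.

```python
import random

def draw_sample(target_data, sample_size, seed=None):
    """Randomly pick `sample_size` items from the target data without replacement."""
    rng = random.Random(seed)  # the optional seed only makes the draw repeatable
    return rng.sample(list(target_data), sample_size)

# Hypothetical target data: 1000 document identifiers.
documents = [f"doc-{i}" for i in range(1000)]
sample = draw_sample(documents, sample_size=10, seed=0)
```

The drawn items would then be presented to the user on a client device for classification.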
  • the sample data may be data belonging to a data group that is not included in the target data to be analyzed but has a predetermined case that is the same as or similar to the target data.
  • the client device 3 includes, as hardware resources, for example, a memory, a controller, a bus, an input/output interface (for example, a keyboard and a display), and a communication interface, and is communicably connected to the server device 2 and the management computer 6 via communication means using a predetermined network.
  • based on the sample data to which the classification information is attached, that is, the combination of the sample data and the classification information (referred to as “reference data”), the server device 2 learns patterns included in the data (broadly, abstract rules, meanings, concepts, styles, distributions, samples, and the like, not limited to so-called “specific patterns”) and, based on these patterns, evaluates the relevance between the target data and a predetermined case. That is, based on the learned pattern, the server device 2 can evaluate the relevance between the target data and a lawsuit, between the target data and a criminal investigation, between the target data and the user's preference, and between the target data and any other event. Similarly to the client device 3, the server device 2 may include, for example, a memory, a controller, a bus, an input/output interface, and a communication interface as hardware resources.
  • the management computer 6 executes predetermined management processing for the client device 3, the server device 2, and the storage system 5.
  • the management computer 6 may include, for example, a memory, a controller, a bus, an input / output interface, and a communication interface as hardware resources.
  • application programs that can control each device are stored in the memory provided in each of the client device 3, the server device 2, and the management computer 6, and each controller executes its application program so that the programs (software resources) and the hardware resources cooperate to operate each device.
  • the storage system 5 may be composed of, for example, a disk array system, and may include a database 4 that records data and results of evaluation / classification of the data.
  • the server apparatus 2 and the storage system 5 are connected by a DAS (Direct Attached Storage) method or a SAN (Storage Area Network).
  • the hardware configuration shown in FIG. 1 is merely an example, and the above system can be realized by other hardware configurations.
  • a part or all of the processing executed in the server device 2 may be executed in the client device 3, or a part or all of the processing executed in the client device 3 may be executed in the server device 2.
  • the storage system 5 may be built in the server device 2. It is understood by those skilled in the art that there can be various hardware configurations capable of realizing the system, and the present invention is not limited to one specific configuration (for example, the configuration illustrated in FIG. 1).
  • FIG. 2 is a functional block diagram showing an example of the predictive coding function realized by the data analysis system according to the present embodiment.
  • the system can include a predictive coding unit 10.
  • the predictive coding (Predictive Coding) unit 10 evaluates a large number of data (target data not associated with classification information; for example, big data) based on a small number of manually classified data (the reference data described above), so that significant information can be extracted from the target data.
  • the predictive coding unit 10 can include, for example, a data acquisition unit 11, a classification information acquisition unit 12, a data classification unit 13, a component extraction unit 14, a component evaluation unit 15, a component storage unit 16, and a data evaluation unit 17.
  • the data acquisition unit 11 acquires data from an arbitrary storage resource (for example, the database 4, a web server on the Internet, a mail server on the intranet, etc.).
  • the data acquisition unit 11 provides all data to be subjected to data analysis to the component extraction unit 14 as target data, and also randomly samples the target data to acquire a predetermined number of sample data, which it provides to the data classification unit 13.
  • the classification information acquisition unit 12 acquires the classification information input by the user for each sample data from an arbitrary input device (for example, the client device 3), and outputs the classification information to the data classification unit 13.
  • the data classification unit 13 combines the plurality of sample data sent from the data acquisition unit 11 with the classification information input for each sample data from the classification information acquisition unit 12, and outputs the combinations to the component extraction unit 14 as a plurality of reference data.
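The pairing of sample data with classification information can be sketched as a small data structure. This is an illustrative sketch only: the class name `ReferenceData`, the helper `build_reference_data`, and the example texts are assumptions, not names from the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReferenceData:
    sample: str  # sample data that was shown to the user
    label: str   # classification information entered by the user

def build_reference_data(samples, labels_by_sample):
    """Pair every sample that received a label with that label."""
    return [ReferenceData(s, labels_by_sample[s]) for s in samples if s in labels_by_sample]

# Hypothetical sample data and user-supplied labels.
samples = ["adjust the price", "lunch on friday"]
labels = {"adjust the price": "Related", "lunch on friday": "Non-Related"}
reference = build_reference_data(samples, labels)
```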
  • the component extraction unit 14 extracts the components constituting the reference data from the plurality of reference data received from the data classification unit 13.
  • the “component” may be partial data constituting at least a part of the data, for example: a morpheme, keyword, sentence, paragraph, and/or metadata (for example, email header information) constituting a document; partial audio, volume (gain) information, and/or timbre information constituting audio; a partial image, partial pixels, and/or luminance information constituting an image; or a frame image, motion information, and/or 3D information constituting a video.
  • the component extraction unit 14 outputs the extracted component and classification information corresponding to the component to the component evaluation unit 15. Further, the constituent element extraction unit 14 extracts constituent elements constituting the target data from the target data input from the data acquisition unit 11 and outputs the constituent elements to the data evaluation unit 17.
  • the component evaluation unit 15 evaluates the components input from the component extraction unit 14. For example, the component evaluation unit 15 evaluates, for each of the plurality of components constituting at least part of the reference data, the degree to which it contributes to the combination (in other words, how the appearance of the component is distributed across the classification information). More specifically, the component evaluation unit 15 calculates the evaluation value of each component using, for example, a transmission information amount (for example, an information amount calculated from a predetermined definition formula using the appearance probability of the component and the appearance probability of the classification information). In this way, the component evaluation unit 15 can learn the pattern contained in the reference data. The component evaluation unit 15 outputs each component and its evaluation value to the component storage unit 16.
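The text does not give the definition formula for the transmission information amount. As a hedged illustration, the sketch below assumes it can be read as the mutual information between "the component appears in the data" and "the data carries a given label"; the function name, the set-of-tokens document representation, and the example data are all assumptions.

```python
import math
from collections import Counter

def mutual_information(docs, labels, component, positive="Related"):
    """Mutual information (in bits) between 'component appears' and 'label is positive'.

    docs: list of token sets; labels: list of label strings of the same length.
    """
    n = len(docs)
    joint = Counter((component in doc, label == positive) for doc, label in zip(docs, labels))
    mi = 0.0
    for (present, is_pos), count in joint.items():
        p_xy = count / n
        p_x = sum(c for (x, _), c in joint.items() if x == present) / n
        p_y = sum(c for (_, y), c in joint.items() if y == is_pos) / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi

# Hypothetical reference data: "price" appears only in "Related" documents.
docs = [{"price", "adjust"}, {"price", "meeting"}, {"lunch"}, {"meeting"}]
labels = ["Related", "Related", "Non-Related", "Non-Related"]
mi_price = mutual_information(docs, labels, "price")
mi_meeting = mutual_information(docs, labels, "meeting")
```

In this toy data, "price" receives a larger value than "meeting", which appears equally often under both labels, so it would be treated as more characteristic of the "Related" classification.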
  • the component storage unit 16 associates the component and the evaluation value input from the component evaluation unit 15, and stores both in an arbitrary memory (for example, the storage system 5).
  • the data evaluation unit 17 reads the evaluation values associated with the components input from the component extraction unit 14 from an arbitrary memory (for example, the database 4 of the storage system 5), and evaluates the target data based on those evaluation values. More specifically, the data evaluation unit 17 can derive an index of the target data (for example, numerical values, letters, and/or symbols that allow the target data to be ranked), for example by adding up the evaluation values associated with the components constituting at least a part of the target data. A suitable form of the index is a score obtained by adding the evaluation values. The data evaluation unit 17 associates the target data with the index and stores both in an arbitrary memory (for example, the storage system 5).
  • the component evaluation unit 15 can repeatedly select and evaluate components, correcting their evaluation values, until the evaluation of data labeled “Related” or “High” becomes larger than the evaluation of data with no label set. As a result, the component evaluation unit 15 can find components that appear in a plurality of reference data labeled “Related” or “High” and that influence the combination of the reference data and the label.
  • the component evaluation unit 15 calculates the evaluation value wgt of the component using, for example, the following formula.
  • wgt_{i,0} indicates the initial value of the evaluation value of the i-th component before evaluation.
  • wgt_{i,L} indicates the evaluation value of the i-th component after the L-th evaluation.
  • γ_L means an evaluation parameter in the L-th evaluation, and θ means a threshold value in the evaluation.
  • the component evaluation unit 15 can evaluate, for example, that the larger the calculated transmission information amount, the more strongly a component expresses the characteristics of the predetermined classification information.
  • the component evaluation unit 15 can use an intermediate value between the lowest index value of the reference data set with “Related” and the highest index value of the reference data set with “Non-Related” as a threshold (predetermined reference value) for automatically determining whether or not “Related” is set for target data.
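The intermediate-value threshold can be illustrated with a short sketch. It assumes the "intermediate value" is the arithmetic midpoint, which the text does not state explicitly; the function name and the example scores are hypothetical.

```python
def related_threshold(reference_scores):
    """reference_scores: list of (score, label) pairs for the reference data.

    Returns the midpoint between the lowest "Related" score and the
    highest "Non-Related" score (midpoint is an assumption of this sketch).
    """
    lowest_related = min(s for s, label in reference_scores if label == "Related")
    highest_non_related = max(s for s, label in reference_scores if label == "Non-Related")
    return (lowest_related + highest_non_related) / 2

# Hypothetical scores of four reference data items.
scored = [(8.0, "Related"), (6.0, "Related"), (3.0, "Non-Related"), (1.0, "Non-Related")]
threshold = related_threshold(scored)
```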
  • the data evaluation unit 17 calculates a score for each of the plurality of target data and each of the plurality of reference data from the following formula.
  • the score is an index that quantitatively evaluates the strength of the connection of these data to the classification code.
  • m_i: frequency of occurrence of the i-th component
  • wgt_i: evaluation value of the i-th component
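The score formula itself is not reproduced in this text. As one possible reading, consistent with the statement that the score is obtained by adding evaluation values, the sketch below assumes a frequency-weighted sum of the evaluation values; the exact form used in the patent may differ, and the function name and example values are hypothetical.

```python
from collections import Counter

def score(tokens, weights):
    """Assumed score: sum over components of (occurrence frequency * evaluation value)."""
    freq = Counter(tokens)  # occurrence frequency of each component in this data item
    return sum(freq[component] * wgt for component, wgt in weights.items())

# Hypothetical evaluation values learned from reference data.
weights = {"price": 2.0, "adjust": 0.5}
doc_score = score(["price", "adjust", "price", "lunch"], weights)
```

Components without a stored evaluation value ("lunch" here) contribute nothing to the score.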
  • each “... unit” described above is a functional configuration realized by a controller of the data analysis system executing a program (data analysis program), and may be paraphrased as “... processing” or “... function”.
  • each “... unit” can also be replaced by hardware resources; those skilled in the art will understand that these functional blocks can be realized in various forms by hardware only, software only, or a combination thereof, and are not limited to any one of these.
  • FIG. 3 is a flowchart showing an example of processing executed by the predictive coding unit 10 included in the data analysis system according to the present embodiment.
  • the data acquisition unit 11 acquires sample data from an arbitrary memory (step 10, hereinafter “step” is abbreviated as “S”).
  • the classification information acquisition unit 12 acquires the classification information input by the user from an arbitrary input device (S11).
  • the data classification unit 13 classifies the data by combining the data and the classification information to form reference data (S12), and the component extraction unit 14 extracts the components constituting the reference data from the reference data (S13).
  • the component evaluation unit 15 evaluates the component (S14), and the component storage unit 16 associates the component with the evaluation value and stores both in an arbitrary memory (S15).
  • the processing of S10 to S15 is referred to as a “learning phase” (a phase in which the system learns a pattern).
  • the data acquisition unit 11 acquires target data from an arbitrary memory (S16).
  • the component extraction unit 14 extracts the components constituting the target data from the target data (S17).
  • the data evaluation unit 17 reads an evaluation value associated with the constituent element from an arbitrary memory, and evaluates the target data based on the evaluation value (S18).
  • the processing of S16 to S18 is referred to as an “evaluation phase” (a phase in which the system evaluates target data based on the pattern).
  • each process included in the learning phase is not an essential process in the system.
  • a memory that stores components in association with their evaluation values may be given in advance, in which case the predictive coding unit 10 can evaluate target data based on the components and evaluation values stored in that memory.
  • the data analysis system is required to evaluate a large number of target data accurately regardless of the ratio of the number of reference data to the number of target data, even when the number of reference data is insufficient.
  • for this purpose, the data analysis system sets new components based on the components extracted from the reference data, so that the target data can be evaluated accurately by supplementing the new components while maintaining the classification policy for the reference data.
  • in order to distinguish between the component extracted from the reference data and the new component, the former is referred to as a reference component and the latter as a related component for convenience.
  • a related component is a component that is not included in the reference data but is included in the target data and has an attribute related to the reference component; it is effective in evaluating the relevance of the target data to the predetermined case (determining the score of the target data).
  • the “related attribute” is a characteristic of the related component with respect to the reference component: for example, that the related component (morpheme) co-occurs with the reference component (morpheme), that the two are synonyms, or that the meta information of the two is common.
  • for example, when “price” is a reference component, “adjustment”, “decision”, and “consultation”, which appear in the same context as “price” in phrases such as “adjust price”, “determine price”, and “consult price”, are co-occurrence words for “price”, that is, related components.
  • “adjustment”, “determination”, and “consultation” may be determined as synonyms for “price” based on a database or the like.
  • FIG. 4 shows an example of a flowchart of the addition process.
  • the component evaluation unit 15 determines whether additional processing is necessary (S40).
  • the data analysis program of the server device 2 allows an operation mode (operation policy) for the additional processing to be selected when the data analysis operation manager sets the operating environment for data search.
  • the operation modes include, for example, (1) performing the additional processing, (2) not performing the additional processing, and (3) performing the additional processing depending on the situation.
  • the “situation” is, for example, a state in which the number of reference data tends to be insufficient compared to the number of target data; the component evaluation unit 15 can determine that this situation has occurred when, for example, the score distribution of the plurality of target data is biased or the number of reference components used for the evaluation is relatively small.
  • the additional process control program may always perform the additional process at the time of data analysis without the necessity determination step of the additional process.
  • the data analysis program may be configured to select whether to notify the operator that the additional processing is performed.
  • when the component evaluation unit 15 denies the necessity of the additional process, the flow ends without performing the additional process; when it affirms the necessity, the flow proceeds to setting the related components based on the evaluation result of the reference data. To set the related components, the component evaluation unit 15 determines, from among the components (reference components) extracted from the reference data, a specific reference component that serves as the basis for determining the related components (S41).
  • the specific reference component may be one, several, or all of the reference components extracted from the reference data. How the specific reference component is determined may be based on configuration information set for the operating environment for data analysis. In a preferred aspect, a predetermined number of reference components are selected as specific reference components in descending order of evaluation value (for example, only the reference component having the highest evaluation value). This is because the higher the evaluation value of a reference component, the higher the relevance to the predetermined case of the related components derived from it. If the number of specific reference components is greater than the optimal value, there is a concern that the evaluation of the target data tends to become inconsistent with the classification in the reference data, while if it is less than the optimal value, the target data may not be evaluated adequately. The “predetermined number” may be determined appropriately by the system.
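Selecting a predetermined number of reference components in descending order of evaluation value can be sketched as below. The function name, the dictionary representation of evaluation values, and the example values are assumptions made for illustration.

```python
import heapq

def pick_specific_components(evaluation_values, how_many):
    """Return the `how_many` reference components with the highest evaluation values."""
    return heapq.nlargest(how_many, evaluation_values, key=evaluation_values.get)

# Hypothetical evaluation values of reference components.
evaluation_values = {"price": 0.9, "meeting": 0.1, "contract": 0.6}
specific = pick_specific_components(evaluation_values, how_many=2)
```

Passing `how_many=1` corresponds to the case in which only the reference component with the highest evaluation value is used.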
  • the component evaluation unit 15 determines a related component based on the specific reference component (S42).
  • the component evaluation unit 15 extracts, from all or a part of the target data, components that do not exist in the reference data and that have a co-occurrence relationship with the specific reference component, and sets the extracted components as the related components. For example, if “price” is a specific reference component, “adjustment”, “decision”, and “consultation”, which co-occur with “price” in phrases such as “adjust price”, “determine price”, and “consult price”, are extracted and set as related components.
  • the related component is information that the system adds automatically, without user input, for the evaluation (scoring) of the target data and without increasing the reference data. If no related component exists in the target data, the component evaluation unit 15 may add new specific reference components until a related component can be extracted from the target data.
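The extraction of related components by co-occurrence can be sketched as follows. The sketch assumes that co-occurrence means "appears in the same target data item" and that each item is already tokenized; the function name and example tokens are hypothetical.

```python
def find_related_components(target_docs, reference_vocab, specific_component):
    """Collect components that co-occur with the specific reference component
    in the target data but are absent from the reference data."""
    related = set()
    for tokens in target_docs:
        if specific_component in tokens:
            related.update(t for t in tokens if t not in reference_vocab)
    return related

# Hypothetical tokenized target data and reference vocabulary.
reference_vocab = {"price", "meeting"}
target_docs = [
    ["adjust", "price"],
    ["determine", "price"],
    ["consult", "price", "meeting"],
    ["lunch", "menu"],
]
related = find_related_components(target_docs, reference_vocab, "price")
```

Tokens such as "lunch" are not picked up because they never appear together with "price".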
  • the constituent element evaluation unit 15 performs the evaluation according to the attribute (for example, information transmission amount) of the related constituent element (S43).
  • for example, the component evaluation unit 15 detects target data in which related components such as “adjustment”, “decision”, and “consultation” co-occur with the specific reference component (“price”), and specifies the number (n) of the detected target data.
  • the additional processing control program regards the detected target data as relevant to the predetermined case (that is, as data to which the classification information “Related” or “High” corresponds), calculates the information transmission amount from a predetermined definition formula based on the appearance probability of the related component and the appearance probability of the classification information in all the target data, and estimates the evaluation value corresponding to each related component.
  • the component evaluation unit 15 can evaluate the evaluation value (weight) of the related component according to the following formula.
  • the component evaluation unit 15 may also evaluate the evaluation value of a related component according to the following formula.
  • CF represents the frequency with which the j_0-th reference component m_{j0} and the j_1-th related component m_{j1} co-occur in the same sentence (collocation frequency).
  • DF represents the frequency with which both co-occur in the same data.
  • w represents the weight (evaluation result) of the reference component m_{j0}.
  • F represents an arbitrary function.
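The formulas referred to above are not reproduced in this text, so the following is only a hypothetical illustration of how the quantities CF, DF, w, and F could be combined. It assumes the weight of a related component is w multiplied by F(CF, DF), with F defaulting to the ratio CF/DF; both the combination and the default F are assumptions of this sketch, not the patent's formula.

```python
def cooccurrence_counts(docs, reference, related):
    """docs: list of documents, each a list of sentences, each a list of tokens.

    Returns (CF, DF): sentence-level and document-level co-occurrence counts.
    """
    cf = sum(1 for doc in docs for sent in doc if reference in sent and related in sent)
    df = sum(1 for doc in docs
             if any(reference in s for s in doc) and any(related in s for s in doc))
    return cf, df

def related_weight(w, cf, df, f=lambda cf, df: cf / df if df else 0.0):
    """Assumed combination: reference weight w scaled by F(CF, DF)."""
    return w * f(cf, df)

# Hypothetical target data: three documents split into sentences.
docs = [
    [["adjust", "price"], ["price", "list"]],
    [["price", "rise"], ["adjust", "plan"]],
    [["lunch"]],
]
cf, df = cooccurrence_counts(docs, "price", "adjust")
weight = related_weight(2.0, cf, df)
```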
  • the component evaluation unit 15 evaluates the related component, whether the target data including the related component is “Relative” or “Non-Relative” based on the evaluation of the reference component, and The evaluation may be performed based on the evaluation result (score value) (S18) of the target data.
  • the data evaluation unit 17 re-evaluates all target data (recalculates the score) based on the evaluation value of the reference component and the evaluation value of the related component (S44). Furthermore, the data evaluation unit 17 ranks all the target data according to the evaluation result of all the target data and creates ranking information of all the targets. The data evaluation unit 17 evaluates each target data according to a predetermined value. Compared with the threshold information, classification information is set for each target data. The data evaluation unit 17 can output the above-described ranking information including the classification information to the client device 3.
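Ranking the target data by score and assigning classification information against a threshold can be sketched as below. Treating a score equal to the threshold as "Related" is an assumption of this sketch, as are the function name and the example scores.

```python
def rank_and_classify(scores, threshold):
    """scores: mapping of data id to score.

    Returns (id, score, label) tuples sorted from highest to lowest score.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        (doc_id, s, "Related" if s >= threshold else "Non-Related")
        for doc_id, s in ranked
    ]

# Hypothetical recalculated scores for three target data items.
ranking = rank_and_classify({"a": 1.0, "b": 7.5, "c": 4.5}, threshold=4.5)
```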
  • a component related to a component included in the reference data can thus be added to the evaluation of the target data as a new component. As a result, a large number of target data can be evaluated accurately regardless of the ratio of the number of reference data to the number of target data, even when the number of reference data is insufficient.
  • the constituent element evaluation unit 15 may determine the related constituent element from a constituent element that is a synonym of the specific reference constituent element and is not included in the reference data but included in the target data. At this time, the component evaluation unit 15 may use the search table of the database 4 to select synonyms for the specific reference component.
  • here, synonyms are two different morphemes that are related through, for example, a common higher-level concept morpheme.
  • the constituent element evaluation unit 15 may set the related constituent elements from the synonyms and the morphemes described above having a co-occurrence relationship with the specific reference constituent element. Furthermore, the constituent element evaluation unit 15 may set related constituent elements from morphemes having a co-occurrence relationship with respect to the synonyms. Furthermore, when the synonym of the specific reference constituent element does not exist in the target data, the constituent element evaluation unit 15 may use another synonym having a similar meaning to the synonym as a candidate for the related constituent element.
  • in the above description, the target data is evaluated based on the reference component and also based on the related component together with the reference component, so that the difference between the two evaluations can be presented to the user and the former evaluation results can be applied to the determination and evaluation of related components. However, the target data may also be evaluated based on the related component without first performing the evaluation based on the reference component.
  • when the operating environment is set so as to notify the user of the additional processing, the server device 12 sends the specific reference component and the related component candidates to the client device 3.
  • the candidates can be displayed in order of evaluation value, and the user can select whether or not to adopt each one for data analysis.
  • the predictive coding unit 10 can optimize the evaluation values of components based on given reference data and/or newly obtained reference data, for example, as described in (1) to (3) below.
  • the component evaluation unit 15 calculates the recall rate or the precision rate based on the result of evaluating the target data, and can update the learned pattern by repeatedly evaluating the degree to which each component contributes to the combination of the data and the classification information, so that the recall rate or the precision rate increases.
  • the above-mentioned “recall rate” (Recall Rate) is an index indicating the ratio (coverage) of the data to be discovered that is contained in a predetermined number of data. For example, “the recall rate is 80% with respect to 30% of all data” indicates that 80% of the data to be discovered is included in the top 30% of the data by index. (If the data is reviewed by brute force (linear review) without using a data analysis system, the amount of data discovered is proportional to the amount reviewed, so the greater the deviation from this proportion, the better the performance of the system.)
  • the “precision rate” (Precision Rate) is an index indicating the ratio (accuracy) of the data that should truly be discovered to the data discovered by the system. For example, “the precision rate is 80% when 30% of all data is processed” indicates that 80% of the top 30% of the data by index is data to be discovered.
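  • The two rates defined above can be computed for the top fraction of a ranking as in the following sketch. The function name, the dictionary-based input, and the rounding of the cut-off are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

def recall_precision_at(
    scores: Dict[str, float], to_discover: Set[str], fraction: float
) -> Tuple[float, float]:
    """Recall and precision over the top `fraction` of data ranked by index."""
    ranked = sorted(scores, key=lambda data_id: scores[data_id], reverse=True)
    k = max(1, round(len(ranked) * fraction))   # number of items in the top slice
    top = set(ranked[:k])
    hits = len(top & to_discover)               # data to be discovered found in the slice
    recall = hits / len(to_discover) if to_discover else 0.0
    precision = hits / k
    return recall, precision

# Ten items d0..d9 with strictly decreasing indices; three should be discovered.
scores = {f"d{i}": float(10 - i) for i in range(10)}
to_discover = {"d0", "d1", "d5"}
recall, precision = recall_precision_at(scores, to_discover, fraction=0.3)
```

  • In this example the top 30% contains d0, d1, and d2, so two of the three items to be discovered are covered and two of the three reviewed items are correct.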
  • the component extraction unit 14 calculates the recall rate or the precision rate based on the result evaluated by the data evaluation unit 17 and, when the recall rate or the precision rate is lower than the target value, re-extracts components from the data until the recall rate or the precision rate exceeds the target value. At this time, the component extraction unit 14 may extract components other than those extracted previously, or may replace a part of the previously extracted components with new components.
  • when the data evaluation unit 17 derives the index of the target data using the re-extracted components, it derives the index (second index) of each data using the re-extracted components and their evaluation values.
  • the recall rate or the precision rate may then be derived again from the second index and the first index obtained before the components were re-extracted. Thereby, the data analysis system further exhibits an additional effect that the accuracy of data analysis can be improved.
  • the component evaluation unit 15 evaluates a component included in the reference data and then convolves the evaluation values of the other components into that component.
  • in other words, the component can be re-evaluated so that the evaluation values of the other components are reflected in its evaluation value.
  • because the relevance between the component and the other components is thereby evaluated as part of the evaluation value of the component, the data analysis system further exhibits an additional effect that the accuracy of data analysis can be improved.
  • the component evaluation unit 15 can update a pattern (for example, a combination of a component and an evaluation value of the component) at an arbitrary timing. That is, for example, the component evaluation unit 15 (a) at a timing when an update request is received from an administrative user who manages the system, (b) at a timing when a preset date and time arrives, and / or (c) The pattern can be updated at a timing when an input regarding the additional review is received from the user.
  • the user can confirm (confirmation review) the content of the target data from which the index is derived by the data evaluation unit 17, and can newly input classification information for the target data.
  • the classification information acquisition unit 12 may acquire newly input classification information, and the data classification unit 13 may combine the target data and the classification information and use the combination as new reference data.
  • the new reference data is stored in an arbitrary memory, and is fed back to the system, for example, at the timings (a) to (c).
  • the component extraction unit 14 extracts the component from the new reference data, and the component evaluation unit 15 evaluates the component.
  • if the component is already stored in the memory, the component storage unit 16 replaces its evaluation value with the new evaluation result (evaluation value) and stores it. If not, the component and the evaluation value are associated with each other and newly stored in the memory.
  • in this way, the predictive coding unit 10 can update the learned pattern at an arbitrary timing (for example, timings (a) to (c) described above) by re-evaluating the degree to which each of the plurality of components constituting at least a part of the data contributes to the combination of the data and the classification information.
  • the data analysis system further exhibits an additional effect that the accuracy of data analysis can be improved.
  • the predictive coding unit 10 can further include a management unit 18 (for example, the management unit 18 has the following functions (1) to (5)).
  • as an example, consider the case where the data evaluation unit 17 derives an index for each of a plurality of target data, and the user confirms each target data and gives it classification information (confirmation review), for example in descending order of the relevance to the predetermined case indicated by the index. At this time, the management unit 18 can display, in a visible manner, how the ratio of target data associated with the classification information to all the target data is distributed over the results of evaluating the plurality of target data, using a gradation corresponding to that ratio.
  • for example, when the data evaluation unit 17 derives a numerical value in the range of 0 to 10000 as the index, the management unit 18 classifies the target data into ranges obtained by dividing the index every 1000 (that is, 0 to 1000 as the first section, 1001 to 2000 as the second section, 2001 to 3000 as the third section, and so on; for example, target data with an index of 2500 is classified into the third section).
  • for a given range, the management unit 18 can display the range using a gradation corresponding to the ratio of target data associated with the classification information in that range (for example, the higher the ratio, the closer to a warm color, and the lower the ratio, the closer to a cold color).
  • the management unit 18 displays the other ranges in the same manner.
  • because the management unit 18 can display the distribution of the ratio in each range using gradation, it can suggest that the confirmation review by the user may be wrong when, for example, a range whose index indicates high relevance between the target data and the predetermined case (for example, the ninth section, where the index is 8001 to 9000) is nevertheless shown in a cold color. That is, the data analysis system further provides an additional effect that allows the user to grasp the distribution at a glance.
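  • The sectioning and per-section ratio described above can be sketched as follows. The section boundaries follow the example in the text; the function names and the two-step `tone` mapping are assumptions standing in for a real color gradation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def section_of(index: int, width: int = 1000) -> int:
    """Map an index in 0..10000 to its 1-based section (0-1000 -> 1, 1001-2000 -> 2, ...)."""
    return max(1, (index - 1) // width + 1)

def classified_ratio_by_section(items: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """items: (index, has_classification_info) pairs; returns the ratio per section."""
    total: Dict[int, int] = defaultdict(int)
    classified: Dict[int, int] = defaultdict(int)
    for index, has_info in items:
        section = section_of(index)
        total[section] += 1
        classified[section] += int(has_info)
    return {s: classified[s] / total[s] for s in sorted(total)}

def tone(ratio: float) -> str:
    """Hypothetical two-step gradation: warm for high ratios, cold for low ones."""
    return "warm" if ratio >= 0.5 else "cold"

items = [(500, True), (2500, True), (2600, False), (8500, False)]
ratios = classified_ratio_by_section(items)
```

  • A cold tone in a high section, such as the ninth section in this sample, is the situation the text describes as a hint that the confirmation review may need to be checked.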
  • the management unit 18 can visualize interrelationships (e.g., hierarchical relationships, series relationships, data transmission/reception, etc.) between a plurality of subjects (e.g., people, organizations, computers, etc.). For example, when an e-mail is transmitted from a first computer to a second computer, the management unit 18 can display, on a predetermined display device (for example, a display provided in the client device 3), a diagram in which a first circle representing the first computer and a second circle representing the second computer are connected by an arrow from the first circle to the second circle (for example, an arrow whose thickness corresponds to the size of the e-mail).
  • the management unit 18 can visualize the interrelationship according to the result evaluated by the data evaluation unit 17. For example, when the data evaluation unit 17 derives a numerical value in the range of 0 to 10000 as the index, the management unit 18 can display the diagram on the predetermined display device based only on the target data associated with an index belonging to a specified section (for example, e-mails transmitted from the first computer to the second computer). Thereby, the data analysis system further exhibits an additional effect that allows the user to grasp the mutual relationship between a plurality of subjects at a glance.
  • the management unit 18 determines whether or not a first component representing a predetermined operation is included in the target data. When it determines that the first component is included, the management unit 18 can identify a second component representing the target of the predetermined operation.
  • the management unit 18 associates meta information (attribute information), which indicates an attribute (property or feature) of the target data including the first component and the second component, with the first component and the second component.
  • the meta information is information indicating a predetermined attribute of data. For example, when the target data is an e-mail, the meta information may be the name of the person who sent the e-mail, the name of the person who received the e-mail, the e-mail address, the date and time of transmission/reception, and the like.
  • the management unit 18 associates the two components with the meta information and displays them on a predetermined display device (for example, a display provided in the client device 3).
  • for example, the management unit 18 can display on the display device a diagram in which a first circle representing the first component and a second circle representing the second component are connected by an arrow from the first circle to the second circle.
  • the data analysis system further exhibits an additional effect that the user can grasp the predetermined operation and the target at a glance.
  • the management unit 18 can extract, from a plurality of target data, data including components that correspond to subordinate concepts of a preselected concept, and can output content (e.g., sentences, graphs, tables, etc.) that summarizes the plurality of target data.
  • the user selects some concepts according to the topic to be detected from the target data, and registers the selected concepts in the management unit 18 in advance. For example, if the topic to be detected is “illegal” or “dissatisfied”, the concepts are divided into five categories, “behavior”, “emotion”, “nature/state”, “risk”, and “money”, and the user registers concepts for each category in the management unit 18: for example, “despise” for “behavior”, “being angry” for “emotion”, “dullness” and “bad attitude” for “nature/state”, “risk” and “danger” for “risk”, and “money paid for human labor” for “money”.
  • for each registered concept, the management unit 18 searches the reference data for components corresponding to subordinate concepts of the concept, associates the retrieved components with the concept, and stores them in an arbitrary memory (for example, the storage system 18). Then, the management unit 18 extracts the stored components from the target data, specifies the concepts associated with those components, and outputs a summary using the concepts.
  • for example, the management unit 18 extracts the concepts “system”, “sell”, and “do” from the text “monitoring system order” included in a certain e-mail, extracts the same concepts from the text “accounting system introduction” included in another e-mail, and outputs “sell system” as a summary of these e-mails.
  • the management unit 18 can show, for example, a graph (for example, a pie chart) indicating the ratio of target data including the concept of “sell system” to all target data.
  • the data analysis system further exhibits an additional effect of allowing the user to grasp the entire image of the target data.
  • Topic clustering: the management unit 18 can cluster the plurality of target data according to topics (subjects) included in the plurality of target data.
  • the management unit 18 can cluster a plurality of target data using an arbitrary classification model (for example, K-means, support vector machine, spherical clustering, etc.).
  • the predictive coding unit 10 may further include a phase analysis unit 19 (not shown in FIG. 2).
  • the phase analysis unit 19 has the following functions (1) to (3), for example.
  • phase analysis part 19 can analyze the phase which shows each step in which a predetermined case progresses.
  • a flow in which the phase analysis unit 19 analyzes a phase based on an example in which the above system is realized as a criminal investigation support system and a predetermined case is “collusion” will be described.
  • collusion is known to progress in the order of a relationship building phase (the stage of building relationships with competitors), a preparation phase (the stage of exchanging information about competitors with competitors), and a competition phase (the stage of providing prices to customers, obtaining feedback, and communicating). Therefore, the system administrator sets the above three phases in the phase analysis unit 19.
  • the system learns a plurality of patterns corresponding to the plurality of phases from a plurality of types of reference data respectively prepared for the preset phases, and analyzes the target data based on each of the plurality of patterns. This makes it possible to specify, for example, which phase the organization to be analyzed is currently in.
  • the component evaluation unit 15 refers to a plurality of types of reference data respectively prepared for a plurality of preset phases, evaluates the components included in the plurality of types of reference data, and stores each component in the memory in association with the result (evaluation value) of evaluating it, for each phase (that is, a plurality of patterns corresponding to the plurality of phases are respectively learned).
  • the data evaluation unit 17 derives an index for each of a plurality of phases by analyzing the target data based on the pattern learned for each phase.
  • the phase analysis unit 19 determines whether or not the index satisfies a predetermined determination criterion (for example, a threshold value) set in advance for each phase (for example, whether or not the index exceeds the threshold value), and, when the criterion is satisfied, increases the count value corresponding to the phase. Finally, the phase analysis unit 19 specifies the current phase based on the count values (for example, the phase having the maximum count value is set as the current phase). Or when it determines with the parameter
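  • The counting procedure described above can be sketched as follows. The phase names come from the collusion example in the text; the data layout (one list of indices per phase), the function name, and the threshold values are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

def current_phase(
    indices_by_phase: Dict[str, List[float]],
    thresholds: Dict[str, float],
) -> Tuple[str, Dict[str, int]]:
    """Count, per phase, the indices that exceed that phase's threshold,
    then take the phase with the largest count as the current phase."""
    counts = {
        phase: sum(1 for value in values if value > thresholds[phase])
        for phase, values in indices_by_phase.items()
    }
    best = max(counts, key=lambda phase: counts[phase])
    return best, counts

# Indices derived for three target data under each phase's learned pattern.
indices_by_phase = {
    "relationship building": [0.9, 0.8, 0.2],
    "preparation": [0.7, 0.1, 0.1],
    "competition": [0.1, 0.2, 0.3],
}
thresholds = {"relationship building": 0.5, "preparation": 0.5, "competition": 0.5}
phase, counts = current_phase(indices_by_phase, thresholds)
```

  • With these sample values the relationship building phase collects the most indices above its threshold and is reported as the current phase.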
  • Phase progress prediction based on a prediction model: based on a model that can predict the progress of a predetermined action related to a predetermined case, and on the indices derived by evaluating a plurality of target data, the phase analysis unit 19 can predict and present the next action.
  • the phase analysis unit 19 uses the index derived for the first phase (for example, the relationship building phase) and the index derived for the second phase (for example, the preparation phase) as variables. Assuming a regression model (a model in which the progress can be predicted), the possibility (for example, the probability) of proceeding to the third phase (for example, the competitive phase) can be predicted based on the regression coefficient optimized in advance. Thereby, the data analysis system further exhibits an additional effect that the result of predicting the progress of the predetermined action related to the predetermined case can be suggested to the user.
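  • One way to turn the two phase indices into a probability is a logistic regression with coefficients fixed in advance, sketched below. The text only says "regression model", so the logistic form, the function name, and the coefficient values are all assumptions made for illustration.

```python
import math
from typing import Tuple

def progression_probability(
    first_phase_index: float,
    second_phase_index: float,
    coefficients: Tuple[float, float] = (1.2, 0.8),  # illustrative, assumed pre-optimized
    intercept: float = -1.0,                          # illustrative
) -> float:
    """Probability of proceeding to the third phase under an assumed logistic model."""
    z = (
        intercept
        + coefficients[0] * first_phase_index
        + coefficients[1] * second_phase_index
    )
    return 1.0 / (1.0 + math.exp(-z))

low = progression_probability(0.1, 0.1)
high = progression_probability(2.0, 2.0)
```

  • With positive coefficients, larger indices for the first and second phases yield a higher predicted probability of reaching the third phase.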
  • the phase analysis unit 19 can optimize, according to given data, the above-mentioned determination criteria (the predetermined criteria, for example thresholds, set in advance for each phase in order to specify phases based on the index derived by the data evaluation unit 17).
  • the management unit 18 performs regression analysis on the relationship between the index derived for each of the plurality of target data and the ranking of the index (that is, the rank when the indices are arranged in ascending order), and the regression Based on the result of the analysis, the determination criterion can be reset (for example, the threshold value is changed).
  • the administrator of the system previously sets a ranking threshold for the ranking.
  • for example, the phase analysis unit 19 fits the relationship between the index derived by the data evaluation unit 17 and the ranking of the index to a function $y = e^{\alpha x + \beta}$ (where $e$ is the base of the natural logarithm, and $\alpha$ and $\beta$ are parameters that take real values), for example by determining the parameters of the function by the method of least squares, and newly sets, as the determination criterion, the index that the function gives for the ranking threshold.
  • the data analysis system can optimize the determination criterion according to given data, and thus has the additional effect of improving the accuracy of data analysis.
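  • The threshold reset described above can be sketched as a least-squares fit of the exponential function followed by evaluation at the ranking threshold. Fitting in log space (a straight line through ln y) is an assumption chosen for simplicity, as are the function names and the sample data; indices must be positive for the logarithm to exist.

```python
import math
from typing import List, Tuple

def fit_exponential(ranks: List[float], indices: List[float]) -> Tuple[float, float]:
    """Fit y = exp(alpha * x + beta) by ordinary least squares on ln(y).
    x is the ranking and y the (strictly positive) index."""
    logs = [math.log(y) for y in indices]
    n = len(ranks)
    mean_x = sum(ranks) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in ranks)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(ranks, logs))
    alpha = sxy / sxx
    beta = mean_l - alpha * mean_x
    return alpha, beta

def index_threshold_for(alpha: float, beta: float, ranking_threshold: float) -> float:
    """New determination criterion: the fitted index at the ranking threshold."""
    return math.exp(alpha * ranking_threshold + beta)

# Synthetic data generated from alpha = 0.5, beta = 1.0 (ranks in ascending order).
ranks = [1.0, 2.0, 3.0, 4.0]
indices = [math.exp(0.5 * r + 1.0) for r in ranks]
alpha, beta = fit_exponential(ranks, indices)
new_threshold = index_threshold_for(alpha, beta, ranking_threshold=2.0)
```

  • Because the sample data lie exactly on the curve, the fit recovers the generating parameters up to floating-point error.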
  • Each unit included in the predictive coding unit 10 can have, for example, the following auxiliary functions (1) to (6).
  • the data evaluation unit 17 can evaluate target data with high resolution. That is, the data evaluation unit 17 not only derives an index for the target data but also divides the target data into a plurality of parts (for example, sentences or paragraphs (partial target data) included in the target data). Based on the learned pattern, each of the plurality of partial target data can be evaluated (an index is derived for the partial target data).
  • the data evaluation unit 17 can also integrate the plurality of indices derived for the plurality of partial target data and use the integrated index as the evaluation result of the target data. For example, when each index is derived as a numerical value, the maximum of the indices may be extracted and used as the integrated index for the target data, the average of the indices may be used as the integrated index, or a predetermined number of the indices may be added in descending order and the sum used as the integrated index.
  • the data analysis system further exhibits an additional effect that the accuracy of data analysis can be improved.
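  • The three integration options listed above (maximum, average, and sum of the largest few) can be sketched as follows. The function name, the `method` keywords, and the default `top_n` are assumptions introduced for this illustration.

```python
from typing import List

def integrate_indices(indices: List[float], method: str = "max", top_n: int = 3) -> float:
    """Combine the indices of partial target data (e.g. sentences) into one index."""
    if not indices:
        raise ValueError("at least one partial index is required")
    if method == "max":
        return max(indices)
    if method == "mean":
        return sum(indices) / len(indices)
    if method == "top_sum":
        # Add a predetermined number of indices in descending order.
        return sum(sorted(indices, reverse=True)[:top_n])
    raise ValueError(f"unknown method: {method}")

partial = [1.0, 5.0, 3.0]
by_max = integrate_indices(partial, "max")
by_mean = integrate_indices(partial, "mean")
by_top2 = integrate_indices(partial, "top_sum", top_n=2)
```

  • Taking the maximum emphasizes a single highly relevant sentence, while the average and the top-N sum reward target data in which several parts score well.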
  • the component evaluation unit 15 learns a pattern from each set of reference data obtained by dividing the reference data at predetermined time intervals (for example, the reference data of the first section, the reference data of the second section, and so on); that is, it acquires the components and the results of evaluating the components for each predetermined period.
  • the data evaluation unit 17 can evaluate the target data based on each of the patterns. That is, the data evaluation unit 17 can derive an index for the target data along the time series. Thereby, the data analysis system further exhibits an additional effect that the accuracy of data analysis can be improved.
  • the data evaluation unit 17 can predict a future index based on the temporal change of the index. For example, the data evaluation unit 17 sets a model for time series analysis (for example, an autoregressive model or a moving average model) and, based on the indices derived within a predetermined period (for example, the past month) before new target data is obtained, can predict the next index that will be obtained when the new target data is evaluated. Thereby, the data analysis system can further exhibit an additional effect that an event that can occur in the future (for example, a risk that an undesirable situation occurs) can be presented to the user.
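  • As a minimal illustration of the autoregressive option mentioned above, the sketch below fits a first-order model x[t] = c + phi * x[t-1] by least squares and predicts the next index. The model order, the function names, and the sample series are assumptions; the series must contain at least two values.

```python
from typing import List, Tuple

def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    if sxx == 0:
        return mean_curr, 0.0  # flat history: fall back to predicting the mean
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = sxy / sxx
    return mean_curr - phi * mean_prev, phi

def predict_next(series: List[float]) -> float:
    """Predict the index expected for the next target data."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]

# Indices derived over the predetermined period, oldest first.
recent_indices = [1.0, 2.0, 4.0, 8.0]
forecast = predict_next(recent_indices)
```

  • The sample series doubles at every step, so the fitted model has phi close to 2 and c close to 0.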
  • Case-by-case evaluation: when analyzing data whose nature changes depending on the type of case (for example, litigation-related documents whose contents change according to the type of lawsuit, such as violation of antitrust law, information leakage, or patent infringement), the component evaluation unit 15 learns a pattern from the reference data prepared for each case (for example, reference data related to violation of the Antimonopoly Act, reference data related to information leakage, etc.); that is, it acquires the components and the results of evaluating the components for each case. The data evaluation unit 17 can then evaluate the target data based on each of the patterns.
  • the data analysis system further exhibits an additional effect that the accuracy of data analysis can be improved.
  • the data evaluation unit 17 can analyze the structure of the target data and reflect the analysis result in the evaluation of the target data. For example, when the target data at least partially includes text, the data evaluation unit 17 can analyze the expression form of each sentence included in the text (for example, whether the sentence is in a positive form or a negative form) and reflect the result of the analysis in the index derived for the target data.
  • the positive form is an expression that affirms the subject (for example, “the dish is delicious”).
  • the negative form is an expression that denies the subject (for example, “the dish is not delicious”).
  • the depolarized form may be an expression that affirms or denies the subject matter (for example, “the food was not delicious”).
  • the data evaluation unit 17 can adjust the index according to the expression form. For example, when the data evaluation unit 17 derives a numerical value in a predetermined range as the index, it can adjust the index by adding, for example, “+α” for the positive form, “−β” for the negative form, and “+γ” for the depolarized form (α, β, and γ may each be arbitrary numerical values). Further, when the data evaluation unit 17 detects that a sentence included in the target data is in the negative form, it may, for example, cancel the sentence so that the components included in the sentence are not used as a basis for deriving the index (the components are not considered).
  • the constituent element evaluation unit 15 can increase or decrease the evaluation value of the constituent element depending on, for example, whether a certain morpheme (constituent element) is a subject, an object, or a predicate of the sentence. Thereby, the data analysis system further exhibits an additional effect that the accuracy of data analysis can be improved.
  • the data evaluation unit 17 can derive the index for the target data in consideration of the correlation between a first component and a second component included in the target data (co-occurrence, for example, the frequency with which both appear together).
  • for example, the data evaluation unit 17 can derive the index based on the number of times a second keyword (second component) appears at a second position in the vicinity of a first position at which a first keyword (first component) appears (for example, a position included in a predetermined range including the first position).
  • the data analysis system further exhibits an additional effect that the accuracy of data analysis can be improved.
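  • Counting the second keyword in the vicinity of the first keyword can be sketched as follows. The token-window definition of "vicinity", the function name, and the sample sentence are assumptions made for illustration.

```python
from typing import List

def cooccurrence_count(tokens: List[str], first: str, second: str, window: int = 5) -> int:
    """Count occurrences of `second` within `window` tokens of each occurrence of `first`."""
    count = 0
    for i, token in enumerate(tokens):
        if token != first:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)  # exclusive upper bound
        count += sum(1 for j in range(start, end) if j != i and tokens[j] == second)
    return count

tokens = "we agreed the price before the bid and the price after".split()
near = cooccurrence_count(tokens, first="bid", second="price", window=3)
narrow = cooccurrence_count(tokens, first="bid", second="price", window=2)
```

  • The resulting count could then be used as one input when deriving the index; how it is weighted is left open here.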
  • the data evaluation unit 17 can extract from the target data the emotion, toward the predetermined case, of the user who generated the target data (that is, it evaluates the emotion included in the target data).
  • for example, when data included in a website introducing a product/service (for example, an online shopping site or a restaurant guide) is to be analyzed, the data evaluation unit 17 can evaluate the target data (for example, data included in other websites) based on the relationship between the components included in comments (reviews) on the product/service (for example, keywords such as “good”, “fun”, “bad”, “boring”) and the evaluation of the product/service (for example, “very good”, “good”).
  • the data evaluation unit 17 can increase or decrease the evaluation result according to, for example, intensifying expressions (for example, “very”).
  • the data analysis system further exhibits an additional effect that the accuracy of data analysis can be improved.
  • Example of the data analysis system processing data other than document data: the above description mainly assumes the case where the data analysis system analyzes document data.
  • however, the system is not limited to document data and can also analyze other data (for example, audio data, image data, video data, etc.).
  • when analyzing audio data, the system may analyze the audio data itself, or may convert the audio data into document data by speech recognition and analyze the converted document data.
  • for example, the system can analyze the audio data by dividing it into partial audio segments of a predetermined length to form components, and identifying the partial segments using an arbitrary audio analysis method (for example, a hidden Markov model or a Kalman filter).
  • alternatively, the speech can be recognized using an arbitrary speech recognition algorithm (for example, a recognition method using a hidden Markov model), and the recognized data can be analyzed by a procedure similar to the procedure described in the embodiment.
  • when analyzing image data, the system can, for example, divide the image data into partial images of a predetermined size to form components, and identify the partial images using an arbitrary image recognition method (for example, pattern matching, a support vector machine, or a neural network).
  • when analyzing video data, the system can, for example, divide each of the plurality of frame images included in the video data into partial images of a predetermined size to form components, and analyze the video data by identifying the partial images using an arbitrary image recognition method (for example, pattern matching, a support vector machine, or a neural network).
  • the control block of the data analysis system may be realized by a logic circuit (hardware) formed on an integrated circuit (IC chip) or the like, or may be realized by software using a CPU.
  • in the latter case, the system includes a CPU that executes a program (the control program for the data analysis system), which is software that implements each function; a ROM (Read Only Memory) or a storage device (these are referred to as “recording media”) in which the program and various data are recorded so as to be readable by a computer (or CPU); a RAM (Random Access Memory) into which the program is loaded; and the like.
  • the computer (or CPU) reads the program from the recording medium and runs it.
  • as the recording medium, a “non-transitory tangible medium” such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used.
  • the program may be supplied to the computer via an arbitrary transmission medium (such as a communication network or a broadcast wave) that can transmit the program.
  • the present invention can also be realized in the form of a data signal embedded in a carrier wave in which the program is embodied by electronic transmission.
  • the above program can be implemented in any programming language, for example, a script language such as Python, ActionScript, or JavaScript (registered trademark), an object-oriented programming language such as Objective-C or Java (registered trademark), or a markup language such as HTML5. Also, any recording medium that records the above program falls within the scope of the present invention.
  • the system uses related components, which are related to the components included in the reference data, for the evaluation of the target data, so that the target data can be evaluated accurately even if the number of reference data is small.
  • the system can be implemented as any of a variety of systems, including, for example, a discovery support system, a forensic system, an e-mail monitoring system, a medical application system (for example, a pharmacovigilance support system, a clinical trial efficiency system, a medical risk hedging system, a fall prediction (fall prevention) system, a prognosis prediction system, or a diagnosis support system), an Internet application system (for example, a smart mail system, an information aggregation (curation) system, a user monitoring system, or a social media management system), an information leakage detection system, a project evaluation system, a marketing support system, an intellectual property evaluation system, an unauthorized transaction monitoring system, a call center escalation system, and a credit investigation system.
  • The data analysis system of the present invention evaluates target data (for example, documents, e-mails, spreadsheet data) against a predetermined evaluation standard (for example, whether the data should be submitted in the discovery procedure of the lawsuit in question). By using related components that are related to the components included in the reference data for the evaluation of the target data, the system can evaluate the target data accurately even if the number of reference data is small, so that only the documents related to the case are submitted to the court efficiently and reliably.
  • The data analysis system of the present invention evaluates target data (for example, documents, e-mails, spreadsheet data) against a predetermined evaluation standard (for example, whether the data is evidence that can prove a criminal act). By using related components that are related to the components included in the reference data for the evaluation of the target data, the system can evaluate the target data accurately even if the number of reference data is small, and can efficiently and reliably extract evidence that proves the criminal act.
  • When the data analysis system of the present invention is realized as an e-mail monitoring system, the data analysis system evaluates target data (for example, e-mails, attached files) against a predetermined evaluation standard (for example, whether the user who sent or received the e-mail has attempted fraud). By using related components that are related to the components included in the reference data for the evaluation of the target data, the system can evaluate the target data accurately even if the number of reference data is small, and can efficiently and reliably detect signs of fraud such as information leakage and collusion.
  • When the data analysis system of the present invention is realized as a medical application system (for example, a pharmacovigilance support system, a clinical trial efficiency system, a medical risk hedging system, a fall prediction (fall prevention) system, a prognosis prediction system, or a diagnosis support system), the data analysis system evaluates target data (for example, electronic medical records, nursing records, patient diaries) against predetermined evaluation criteria (for example, whether the patient will take a specific risky action, or whether a drug is effective). By using related components that are related to the components included in the reference data for the evaluation of the target data, the system can evaluate the target data accurately even if the number of reference data is small. For example, it can predict that a patient will fall into a dangerous state (for example, a fall) and objectively evaluate the efficacy of drugs, efficiently and reliably.
  • When the data analysis system of the present invention is realized as an Internet application system (for example, a smart mail system, an information aggregation (curation) system, a user monitoring system, or a social media management system), the data analysis system evaluates target data (for example, messages posted by users to an SNS, recommended information posted on a website, profiles of users or groups) against a predetermined evaluation standard (for example, the preference of the user and the preference of other users). Even if the number of reference data is small, the system can evaluate the target data accurately and can efficiently and reliably, for example, display a list of other users with whom the user is likely to feel at ease, present restaurant information that suits the user's preferences, or issue a warning about a group that may cause harm to the user.
  • The data analysis system of the present invention evaluates target data (for example, e-mails, database access log information) against a predetermined evaluation criterion (for example, whether the user who sent and received the e-mails is trying to commit fraud). By using related components that are related to the components included in the reference data for the evaluation of the target data, the system can evaluate the target data accurately even if the number of reference data is small, and can efficiently and reliably find signs of information leakage.
  • When the data analysis system of the present invention is realized as an information asset utilization system (project evaluation system), the data analysis system extracts, dynamically according to the situation of the project, information that is useful for the project from the information assets (target data) held by companies and experts. By using related components that are related to the components included in the reference data for the evaluation of the target data, the system can evaluate the target data accurately even if the number of reference data is small. For example, it can efficiently and reliably (1) reuse information on products developed in the past according to the requirements of the current development, in order to improve efficiency at development sites where a shorter development period is desired, and (2) identify useful information assets based on the expertise of skilled engineers.
  • The data analysis system of the present invention evaluates target data (for example, company or individual profiles, product information) against a predetermined evaluation standard (for example, whether the product is for men or for women, or whether consumers have a favorable impression of the product), using related components that are related to the components included in the reference data for the evaluation of the target data.
  • The data analysis system of the present invention evaluates target data (for example, patent publications, documents summarizing an invention, academic papers) against a predetermined evaluation standard (for example, whether the patent publication can be used as evidence to reject or invalidate a given patent).
  • The data analysis system acquires as reference data, for example, combinations of each claim of a patent to be invalidated with a “Related” label (classification information) and combinations of each claim of an unrelated patent with a “Non-Related” label (classification information). It learns a pattern from the reference data and calculates an index for a large number of documents (target data). For example, it can calculate an index for each paragraph of a patent publication and add up a predetermined number of the highest paragraph indices to obtain the index of the patent publication, and thereby evaluate the target data.
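The paragraph-level aggregation described for patent publications can be sketched as follows. This is only an illustration under stated assumptions: each paragraph is taken to already have a numeric index, and the function name `publication_index` and the parameter `top_n` are hypothetical names that do not appear in the patent.

```python
from typing import Sequence


def publication_index(paragraph_indices: Sequence[float], top_n: int) -> float:
    """Sum the top_n highest paragraph indices to score one publication.

    Assumption: a higher paragraph index means a stronger relation to the
    patent to be invalidated.
    """
    if top_n <= 0:
        raise ValueError("top_n must be positive")
    ranked = sorted(paragraph_indices, reverse=True)
    # Slicing is safe even when there are fewer than top_n paragraphs.
    return sum(ranked[:top_n])


scores = [0.5, 2.0, 1.5, 0.25]
print(publication_index(scores, top_n=2))
```

Summing only the highest paragraph indices keeps a long publication with many weakly related paragraphs from outranking a short one with a few strongly related paragraphs.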
  • The data analysis system of the present invention evaluates target data (for example, e-mails, financial transaction information, bid information) against a predetermined evaluation criterion (for example, whether the user who sent and received the e-mails is going to conduct fraudulent transactions). By using related components that are related to the components included in the reference data for the evaluation of the target data, the system can detect signs of fraud such as cartels and collusion.
  • The data analysis system of the present invention evaluates target data (for example, telephone call history, recorded voice) against a predetermined evaluation criterion (for example, past history).
  • When the data analysis system of the present invention is implemented as a credit check system, the data analysis system evaluates target data (for example, company profiles, information about company performance, information about stock prices, press releases) against predetermined evaluation criteria (for example, whether the company will go bankrupt, or whether the company will grow). By using related components that are related to the components included in the reference data for the evaluation of the target data, the system can evaluate the target data accurately even if the number of reference data is small, and can, for example, predict corporate growth or bankruptcy efficiently and reliably.
  • The data analysis system of the present invention evaluates target data (for example, data acquired from an in-vehicle sensor, a camera, or a microphone) against a predetermined evaluation standard (for example, whether the data is information that an expert driver pays attention to while driving). By using related components that are related to the components included in the reference data for the evaluation of the target data, the system can evaluate the target data accurately and can, for example, automatically extract useful information that makes driving safe and comfortable, efficiently and reliably.
  • The data analysis system of the present invention evaluates target data (for example, company or individual profiles, product information) based on a predetermined evaluation standard (for example, whether the product is for men or for women, or whether consumers have a favorable impression of the product), using related components that are related to the components included in the reference data for the evaluation of the target data.
  • The data analysis system of the present invention evaluates target data (for example, the market price of a stock) against a predetermined evaluation standard (for example, a stock price).
  • Preprocessing (for example, extracting an important part from the data and making only that part the analysis target) may be applied, or the mode of displaying the data analysis result may be changed. Those skilled in the art will understand that a variety of such variations can exist, and all of them fall within the scope of the present invention.
  • A data analysis system includes a memory, an input control device, and a controller. The controller generates an index that ranks a plurality of target data. The index corresponds to the relevance between each target data and a predetermined case, and changes based on an input given by a user via the input control device. The memory stores the plurality of target data at least temporarily.
  • The input control device presents sample data for the target data to the user and accepts input of classification information from the user. The classification information is associated with the sample data based on the input in order to classify the sample data.
  • A combination of the sample data and the classification information received from the user constitutes reference data.
  • The controller obtains a plurality of the reference data and extracts a first component from the plurality of reference data, the first component constituting at least a part of the reference data. The controller evaluates the degree of contribution of the first component to the combination.
  • The controller extracts, from at least one of the plurality of target data, a second component having relevance to the evaluated first component, the second component constituting at least a part of the target data. The controller evaluates the second component and obtains the evaluation result of the second component.
  • The controller generates the index based on the evaluation result of the first component and the evaluation result of the second component, and thereby evaluates the relevance between the plurality of target data and the predetermined case. According to this data analysis system, the second component can be supplemented for the evaluation of the target data, so the data group can be analyzed accurately regardless of the amount of partial data available for the data group.
  • The controller selects the first component from among the plurality of components included in the reference data, based on how the evaluation of each component compares with the others.
  • The controller determines, as the first component, the component having the highest evaluation among the plurality of components included in the target data. This achieves the additional effect that a component having a high evaluation value can be selected as the second component.
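One way to picture the evaluation and selection of first components is the sketch below. It rests on several assumptions that are not in the source: components are whitespace-separated words, the labels are the strings "Related" and "Non-Related", and the evaluation is approximated by the difference between a word's document frequency in the two classes. The patent does not specify this measure, and the names `contribution_scores` and `select_first_components` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

ReferenceData = Tuple[str, str]  # (sample text, classification label)


def contribution_scores(reference: List[ReferenceData]) -> Dict[str, float]:
    """Score each word by how much more often it appears in Related samples.

    Stand-in measure (an assumption): document frequency in "Related"
    samples minus document frequency in all other samples.
    """
    related = Counter()
    unrelated = Counter()
    n_rel = n_non = 0
    for text, label in reference:
        words = set(text.lower().split())
        if label == "Related":
            related.update(words)
            n_rel += 1
        else:
            unrelated.update(words)
            n_non += 1
    vocab = set(related) | set(unrelated)
    return {
        w: related[w] / max(n_rel, 1) - unrelated[w] / max(n_non, 1)
        for w in vocab
    }


def select_first_components(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring components (ties broken alphabetically)."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


reference = [
    ("price fixing agreement", "Related"),
    ("agreement on price", "Related"),
    ("lunch agreement today", "Non-Related"),
]
scores = contribution_scores(reference)
print(select_first_components(scores, k=1))
```

In this toy data, "agreement" appears in every sample and therefore scores zero, while "price" appears only in "Related" samples and is selected.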
  • The controller sets the relevance between the second component and the first component based on a predetermined criterion, and sets the relationship according to that criterion.
  • The controller extracts the second component based on at least one of the following relationships between the second component and the first component: a co-occurrence relationship, a similar-word relationship, and a relationship of having common meta information. The second component related to the first component can thereby be analyzed.
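The co-occurrence case can be illustrated with a small sketch. It covers only co-occurrence (not similar words or common meta information) and assumes that two words co-occur when they appear in the same target document; the function name `cooccurring_components` and the threshold `min_count` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def cooccurring_components(
    first_components: Set[str],
    target_texts: Iterable[str],
    min_count: int = 2,
) -> List[str]:
    """Return words that share a target document with a first component
    at least min_count times, excluding the first components themselves."""
    counts = Counter()
    for text in target_texts:
        words = set(text.lower().split())
        if words & first_components:
            # Every other word in this document co-occurs with a first component.
            counts.update(words - first_components)
    return sorted(w for w, c in counts.items() if c >= min_count)


targets = [
    "price cartel meeting",
    "cartel price list",
    "team lunch meeting",
]
print(cooccurring_components({"price"}, targets))
```

Here "cartel" is returned because it appears alongside "price" in two target documents, whereas "meeting" and "list" each co-occur with it only once.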
  • The controller evaluates the second component according to the frequency with which the second component appears, with the predetermined relationship, in the plurality of target data, and evaluates the plurality of target data based on the evaluation result of the second component. This achieves the effect that the second component can be evaluated accurately and reliably.
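A frequency-based evaluation of a second component might look like the sketch below. The weighting rule (the first component's weight multiplied by the fraction of documents containing the first component that also contain the second) is an assumption chosen for illustration, not the patent's formula; `second_component_weight` and `target_index` are hypothetical names.

```python
from typing import Dict, List


def second_component_weight(
    second: str, first: str, first_weight: float, target_texts: List[str]
) -> float:
    """Weight a second component by how often it co-occurs with the first one."""
    docs = [set(t.lower().split()) for t in target_texts]
    with_first = [d for d in docs if first in d]
    if not with_first:
        return 0.0
    frequency = sum(1 for d in with_first if second in d) / len(with_first)
    return first_weight * frequency


def target_index(text: str, weights: Dict[str, float]) -> float:
    """Index of one target datum: sum of the weights of the components it contains."""
    return sum(weights.get(w, 0.0) for w in set(text.lower().split()))


targets = ["price cartel meeting", "cartel price list", "cartel memo"]
weights = {"price": 1.0}  # weight of the first component (assumed given)
weights["cartel"] = second_component_weight("cartel", "price", 1.0, targets)
print([target_index(t, weights) for t in targets])
```

The third document contains no first component, yet it still receives a non-zero index through the second component "cartel", which is the supplementing effect the text describes.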
  • A control method for a data analysis system for analyzing data includes: a first step of generating an index for ranking a plurality of target data, the index corresponding to the relevance between each target data and a predetermined case and changing based on an input from a user; a second step of storing the plurality of target data at least temporarily; a third step of presenting sample data for the target data to the user; and a fourth step of receiving an input of classification information from the user, the classification information being associated with the sample data based on the input in order to classify the sample data. The combination of the sample data and the classification information received from the user is treated as reference data.
  • A second component having the relevance is extracted from at least one of the plurality of target data, the second component constituting at least a part of the target data.
  • A control program for the data analysis system causes a computer to execute each step included in the control method for the data analysis system.
  • A recording medium records the control program of the data analysis system. According to the control program and the recording medium, the second component can be supplemented for the evaluation of the target data, so the data group can be analyzed accurately regardless of the amount of partial data available for the data group.
  • A data analysis system is, for example, a data analysis system that evaluates target data. The system includes a memory, an input control device, and a controller.
  • The controller evaluates a plurality of target data. The evaluation corresponds to, for example, the relevance between each target data and a predetermined case, and generates an index that enables ranking of the plurality of target data.
  • The index can be changed based on an input given by a user via the input control device. The memory stores, for example, at least temporarily the plurality of target data evaluated by the controller.
  • The input control device allows the user, for example, to make an input that causes the controller to rank the plurality of target data. The order of the plurality of target data changes, for example, according to the index, which changes based on the input.
  • The input concerns, for example, reference data different from the plurality of target data, and classifies the relevance between the reference data and the predetermined case. The classification is divided, for example, into a plurality of classification information according to the content of the reference data, and at least one of the plurality of classification information is given to the reference data by the input.
  • The reference data is presented to the user, and a combination of the reference data and the at least one classification information given to the presented reference data by the user input is provided to the controller.
  • The controller evaluates, for example, the degree to which each of a plurality of components included in the reference data contributes to the combination provided by the input control device, and thereby extracts from the reference data a pattern that characterizes the reference data according to the classification information given by the input.
  • Based on the extracted pattern, the controller determines the index by evaluating the relevance between the target data and the predetermined case, sets the determined index in the target data, ranks the plurality of target data according to the index, and notifies the user of the plurality of target data arranged in order.
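The final ranking step can be sketched as follows, assuming an index has already been set for each target datum. The identifiers `doc-a`, `doc-b`, and `doc-c` and the function name `rank_targets` are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_targets(indices: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order target data from highest to lowest index (ties by identifier)."""
    return sorted(indices.items(), key=lambda item: (-item[1], item[0]))


indices = {"doc-a": 0.4, "doc-b": 1.7, "doc-c": 0.9}
for rank, (doc_id, index) in enumerate(rank_targets(indices), start=1):
    print(rank, doc_id, index)
```

Because the index changes with the user's classification input, recomputing the indices and calling `rank_targets` again would yield the updated order to present to the user.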
  • The present invention can be widely applied to any computer, such as a personal computer, a server device, a workstation, or a mainframe, and is particularly applicable to an artificial intelligence system.

Abstract

The invention relates to a controller configured to: extract a first component from a plurality of reference data; evaluate the first component and, based on that evaluation, extract a second component from one or more of a plurality of target data; evaluate the second component; generate an index based on the evaluation results for the first component and the evaluation results for the second component; and, based on the generated index, evaluate the relationship between the plurality of target data and a predetermined case.
PCT/JP2015/064832 2015-05-22 2015-05-22 Data analysis system, control method, control program, and recording medium WO2016189605A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2017520082A JPWO2016189605A1 (ja) 2015-05-22 2015-05-22 System relating to data analysis, control method, control program, and recording medium therefor
PCT/JP2015/064832 WO2016189605A1 (fr) 2015-05-22 2015-05-22 Data analysis system, control method, control program, and recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2015/064832 WO2016189605A1 (fr) 2015-05-22 2015-05-22 Data analysis system, control method, control program, and recording medium

Publications (1)

Publication Number Publication Date
WO2016189605A1 (fr) 2016-12-01

Family

ID=57394061

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/064832 WO2016189605A1 (fr) 2015-05-22 2015-05-22 Data analysis system, control method, control program, and recording medium

Country Status (2)

Country Link
JP (1) JPWO2016189605A1 (fr)
WO (1) WO2016189605A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
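JP2003242170A (ja) * 2002-02-15 2003-08-29 Ricoh Co Ltd Document retrieval device, document retrieval method, and recording medium
JP2013182338A (ja) * 2012-02-29 2013-09-12 Ubic:Kk Document classification system, document classification method, and document classification program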

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020502712A (ja) * 2016-12-11 2020-01-23 ディープ バイオ インク Disease diagnosis system using neural network and method therefor
US11074686B2 (en) 2016-12-11 2021-07-27 Deep Bio, Inc. System for diagnosing disease using neural network and method therefor
JP2019133478A (ja) * 2018-01-31 2019-08-08 株式会社Fronteo Computer system
CN113065065A (zh) * 2021-03-30 2021-07-02 广联达科技股份有限公司 Method, apparatus, device, and readable storage medium for evaluating search performance

Also Published As

Publication number Publication date
JPWO2016189605A1 (ja) 2018-02-15

Similar Documents

Publication Publication Date Title
JP6182279B2 (ja) Data analysis system, data analysis method, data analysis program, and recording medium
Li et al. Tourism companies' risk exposures on text disclosure
JP5885875B1 (ja) Data analysis system, data analysis method, program, and recording medium
Liu et al. A two-phase sentiment analysis approach for judgement prediction
WO2017067153A1 (fr) Method and device for evaluating credit risk based on text analysis, and storage medium
JP6748710B2 (ja) Data analysis system, control method therefor, program, and recording medium
WO2016125310A1 (fr) Data analysis system, method, and program
Abrahams et al. Audience targeting by B-to-B advertisement classification: A neural network approach
Afsana et al. Automatically assessing quality of online health articles
WO2016203652A1 (fr) System relating to data analysis, control method, control program, and recording medium therefor
Liu et al. Physician selection based on user-generated content considering interactive criteria and risk preferences of patients
Li et al. Evaluating Online Review Helpfulness Based on Elaboration Likelihood Model: the Moderating Role of Readability.
WO2016189605A1 (fr) Data analysis system, control method, control program, and recording medium
JP5933863B1 (ja) Data analysis system, control method, control program, and recording medium
JP2017201543A (ja) Data analysis system, data analysis method, data analysis program, and recording medium
WO2016121127A1 (fr) Data evaluation system, data evaluation method, and data evaluation program
JP6178480B1 (ja) Data analysis system, control method therefor, program, and recording medium
JP6026036B1 (ja) Data analysis system, control method therefor, program, and recording medium
Stankevich et al. Predicting personality traits from social network profiles
Tanaltay et al. Can Social Media Predict Soccer Clubs’ Stock Prices? The Case of Turkish Teams and Twitter
Dey et al. Applying Text Mining to Understand Customer Perception of Mobile Banking App
WO2016111007A1 (fr) Data analysis system, data analysis system control method, and data analysis system control program
Kim et al. Analyzing Dissatisfaction Factors of Weather Service Users Using Twitter and News Headlines
Afsana et al. Automatically Assessing Quality of Online Health
Li et al. Empirical study of factors that influence the perceived usefulness of online mental health community members

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15893241

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2017520082

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15893241

Country of ref document: EP

Kind code of ref document: A1