WO2013190627A1

WO2013190627A1 - Correlation analyzing device and correlation analyzing method

Info

Publication number: WO2013190627A1
Application number: PCT/JP2012/065552
Authority: WO
Inventors: 靖英森; 孝史野口; 光一朗飯島
Original assignee: 株式会社日立製作所 (Hitachi, Ltd.)
Priority date: 2012-06-18
Filing date: 2012-06-18
Publication date: 2013-12-27

Abstract

A correlation analyzing device includes a data conversion section for converting a plurality of data items generated in different events and having different types or different temporal resolutions to data having a common data format by using relationship information that manages the relationships among the plurality of data items, and a correlation calculation section for calculating the correlations among the data obtained by the conversion in the data conversion section.

Description

Correlation analysis apparatus and method

The present invention relates to an apparatus for analyzing the relationship between events, and more particularly to a correlation analysis apparatus and method for supporting cause analysis of an event using a plurality of types of data.

Background techniques include RCA (Root Cause Analysis) and FTA (Fault Tree Analysis), both of which define procedures carried out by hand. RCA pursues the root cause by tracing causes back through several stages while listing them in a tree structure. FTA expresses causal relationships as a tree with AND and OR branches.

Patent Document 1 describes a system for supporting cause investigation work. It discloses a function that displays other selected data when the change per unit time exceeds a preset threshold value.

Patent Document 2 also describes a system for supporting cause investigation work. When the cause of a deviation event is analyzed, it refers to a human error tree diagram and presents the analyst with information that supports the analysis of that cause.

Patent Document 1: JP 2003-216238 A
Patent Document 2: JP 2004-287649 A

The inventions described in the above patent documents assume that the time-series feature values they handle are all of one uniform kind. In practice, however, the basic data used to pursue a cause is generally a mixture of several types of data, such as language data (text) and numerical data (sensor data and various measurement data). Cause analysis using several types of basic data therefore requires comparing the relationships among those data, but the prior art gives no consideration to this point. Nor does the prior art consider how to reduce the amount of calculation and data needed to apply such analysis to large-scale data.

An object of the present invention is to provide a correlation analysis apparatus and method capable of calculating correlations among data even when multiple types of data are mixed.

To solve the above problem, the present invention converts a plurality of data generated for each event, which differ in type or time resolution, into data having a common data format by using relevance information that manages the relationships among those data, and then calculates the correlations among the converted data.

According to the present invention, even when multiple types of data are mixed, the correlation between the multiple types of data can be calculated.

FIG. 1 is a block diagram showing a first embodiment of a computer system according to the present invention.
FIG. 2 is a configuration diagram of sensor system original data.
FIG. 3 is a configuration diagram of text-based original data.
FIG. 4 is a configuration diagram of sensor system feature data.
FIG. 5 is a configuration diagram of text-based feature data.
FIG. 6 is a configuration diagram of multiple time scale numerical feature data.
FIG. 7 is a configuration diagram of a feature quantity/keyword correspondence table.
FIG. 8 is a configuration diagram of feature quantity/keyword conversion coefficients.
FIG. 9 is a flowchart explaining sensor system feature extraction processing.
FIG. 10 is a flowchart explaining text-based feature extraction processing.
FIG. 11 is a flowchart explaining correlation calculation processing.
FIG. 12 is a block diagram showing a second embodiment of a computer system according to the present invention.
FIG. 13 is a configuration diagram of importance data.
FIG. 14 is a flowchart explaining importance calculation and partial correlation calculation processing.
FIG. 15 is a flowchart explaining approximate correlation calculation processing.

Embodiments of the present invention will be described below with reference to the drawings.
(First embodiment)
In the present embodiment, an example will be described in which a correlation between a plurality of types of data can be calculated.

FIG. 1 is a block diagram showing a first embodiment of a computer system according to the present invention. In FIG. 1, the computer system includes a correlation analysis server 1001, a plurality of terminals 1002, and a network 1003; the correlation analysis server 1001 and each terminal 1002 are connected to the network 1003. The correlation analysis server 1001 is a correlation analysis device that analyzes correlations. Each terminal 1002 presents results to the user and accepts the user's operation inputs. Because there are two or more terminals 1002, each user can request and be presented with different analysis results, and individual operations from the plurality of terminals 1002 can be handled by an existing configuration such as a server-client system.

The network 1003 carries information between the correlation analysis server 1001 and the terminals 1002. It may use any communication system capable of transmitting the operation instructions and result displays described later. As another configuration example, a terminal 1002 may be connected directly to the correlation analysis server 1001 without using the network 1003. Sensor data 1004 and text data 1005 are assumed as examples of input data.

Next, the internal configuration of the correlation analysis server 1001 will be described. The correlation analysis server 1001 includes a processing unit 1010 that performs various types of analysis processing, a storage device 1020 that stores various types of data, a data collection interface (I/F) 1030, and a display interface (I/F) 1040.

The processing unit 1010 consists of a CPU (Central Processing Unit) and includes: a data management unit 1011 that stores the sensor data 1004 or text data 1005 in the storage device 1020 as original data 1021 and manages the stored original data 1021; a feature amount extraction unit 1012 that extracts feature amounts from the original data 1021 by a procedure described later; a correlation calculation unit 1013 that calculates correlations from the feature amounts; and a display content configuration unit 1014 that configures display content for presenting the correlations calculated by the correlation calculation unit 1013 to the user on various displays.

The storage device 1020 stores various data, such as original data 1021 (sensor data, text data, and the like), feature data 1022, and correlation data 1023.

The data collection I/F 1030 is an interface that collects data such as the sensor data 1004 and text data 1005 and inputs the collected data to the data management unit 1011 in the processing unit 1010.

The display I/F 1040 is an interface that displays the content configured by the display content configuration unit 1014 in the processing unit 1010 on the terminal 1002.

FIG. 2 is a configuration diagram of sensor system original data such as sensor data. In FIG. 2, sensor system original data 2010 is original data (measurement data) composed of data extracted from the information of measurement devices (sensors) installed in a facility and from operation control panels, and consists of a data ID 2011 and description content 2012.

The data ID 2011 identifies the equipment and sensor from which the original data 2010, obtained from the sensor data 1004 within the original data 1021, was generated. The description content 2012 consists of a date and time 2013 indicating when the data was generated, a sensor ID 2014 indicating the sensor or measuring device that generated the data, a sensor output value 2015 indicating the output value of the sensor, and a flag 2016 indicating whether an event has occurred at the sensor. The flag 2016 records “Y (present)” when an event has occurred at the sensor and “N (none)” when no event has occurred.

Many sensors have only one output value or flag value. However, several measuring devices are sometimes grouped together and treated as one sensor, so in general a sensor is assumed to be able to have one or more output values and flag values.

Here, an event refers to an occurrence such as various sensors collecting measurement data or an operator creating a business record. The data generated by events includes measurement data obtained by the various sensors (measurement devices) and language data, for example text data, obtained by digitizing the business records created by operators.

FIG. 3 is a configuration diagram of text-based original data such as text data. In FIG. 3, text-based original data 3010 originates from text entered by workers, such as daily reports and various report information, and consists of a data ID 3011 and description content 3012. The data ID 3011 identifies the equipment or business in which the original data 3010, obtained from the text data 1005 within the original data 1021, was generated. The description content 3012 consists of a date and time 3013 indicating when the data was generated, a report ID 3014 identifying the reporter, and a report item 3015 in which items such as work content, results, and special notes are entered in natural language.

FIG. 4 is a configuration diagram of sensor system feature data. In FIG. 4, sensor system feature data 4010 is integrated feature data obtained by converting the sensor system original data 2010 into feature data, and consists of a type ID 4011, a data ID 4012, a starting time 4013, a keyword 4014, a numerical feature amount 4015, and a parameter 4016. The type ID 4011 indicates whether the original data 2010 is text-based or sensor-based; for sensor system data it may indicate a more detailed type, such as the kind of sensor. The data ID 4012 identifies the equipment or business in which the data was generated. The starting time 4013 indicates when the data was generated. The keyword 4014 is a typical keyword for the feature amount, estimated from past examples of feature amount patterns and their corresponding keywords and then attached to the data. The estimation method is described later.

The numerical feature amount 4015 is obtained by converting the sensor system original data 2010 into feature amounts in a predetermined format and expressing them numerically. For example, the measurement data is divided by a set value larger than its maximum value, which yields numerical feature data of 1 or less. The predetermined format expands the features over multiple time scales and expresses them at multiple scales, so that correlations can be calculated at different time scales (for example, 1 day and 1 hour). When numerical feature data obtained from measurement data is expanded to multiple time scales, the data at each time scale is managed as time series data with its own time resolution.
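
The normalization in the example above can be sketched as follows. This is a minimal Python illustration, not the applicant's implementation; the function name `normalize_by_set_value`, the sample readings, and the set value 200.0 are hypothetical.

```python
def normalize_by_set_value(values, set_value):
    """Divide each measurement by a set value larger than the maximum
    measurement, so every numerical feature is 1 or less (assumed sketch)."""
    if values and set_value <= max(values):
        raise ValueError("set_value must be larger than the maximum measurement")
    return [v / set_value for v in values]


# Hypothetical readings; 200.0 is an assumed set value above the maximum.
readings = [120.0, 150.0, 180.0]
features = normalize_by_set_value(readings, 200.0)
print(features)  # [0.6, 0.75, 0.9]
```

Choosing the set value per sensor type keeps features from different sensors on a comparable 0-to-1 range, which is the point of the conversion.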

For example, numerical feature amounts 1 to 100 are managed as time series data in one-hour units, and numerical feature amounts 101 to 200 as time series data in one-day units. If the time series data has gaps, they can be filled by interpolation or by averaging based on the same or other time series data. The time-scale expansion of the feature amounts is described separately with reference to FIG. 6.

The parameter 4016 is information attached to the measurement data and is used to decide at which time delays correlations are calculated. For example, parameter 1 indicates that the time delay for correlation with the feature amount whose type ID 4011 is A and whose data ID 4012 is X (hereinafter, feature amount X), that is, the difference between the time at which the data to be correlated was obtained and the time at which feature amount X was obtained, is 2 hours. This means that when that time difference is 2 hours, the time-resolution condition is satisfied and matching is established between the two data.

Similarly, parameter M indicates that the time delay for correlation with the feature amount whose type ID 4011 is Z and whose data ID 4012 is Y (feature amount Y) is 12 hours. The values of these parameters are determined with reference to the characteristics of each type ID 4011 and data ID 4012. If a single piece of original data cannot cover the time scale of the feature amount, original data obtained at several times from the sensor with the same ID are collected into one feature amount.
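
One way to hold a row of FIG. 4 (or FIG. 5) together with its delay parameters in memory is sketched below. The class and field names (`FeatureRecord`, `DelayParameter`, `delay_for`) and the use of whole hours for delays are assumptions made for illustration; the source does not specify a storage layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DelayParameter:
    """One parameter entry (hypothetical layout): counterpart and delay."""
    type_id: str       # type ID of the counterpart feature amount
    data_id: str       # data ID of the counterpart feature amount
    delay_hours: int   # time delay at which the pair is correlated


@dataclass
class FeatureRecord:
    """Hypothetical in-memory form of one row of FIG. 4 or FIG. 5."""
    type_id: str
    data_id: str
    start_time: str
    keywords: List[str] = field(default_factory=list)
    numeric: List[float] = field(default_factory=list)
    parameters: List[DelayParameter] = field(default_factory=list)

    def delay_for(self, other: "FeatureRecord") -> Optional[int]:
        """Return the delay registered for `other`, or None if there is none."""
        for p in self.parameters:
            if (p.type_id, p.data_id) == (other.type_id, other.data_id):
                return p.delay_hours
        return None


feature = FeatureRecord("S1", "P1", "2012-06-18T00:00",
                        parameters=[DelayParameter("A", "X", 2),
                                    DelayParameter("Z", "Y", 12)])
feature_x = FeatureRecord("A", "X", "2012-06-18T02:00")
print(feature.delay_for(feature_x))  # 2
```

A `None` result corresponds to a pair for which no delay is registered, which the correlation step can treat as unmatchable.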

FIG. 5 is a configuration diagram of text-based feature data. In FIG. 5, text-based feature data 5010 is integrated feature data obtained by converting the text-based original data 3010 into feature data, and consists of a type ID 5011, a data ID 5012, a starting time 5013, a keyword 5014, a numerical feature amount 5015, and a parameter 5016. The type ID 5011 indicates whether the original data 3010 is text-based or sensor-based; for text-based data it may indicate a more detailed type, such as a daily report or a defect report. The data ID 5012 identifies the equipment or business in which the data was generated. The starting time 5013 indicates when the data was generated. The keyword 5014 is extracted from the text of the original data 3010 by document processing; at this point, only words present in a predetermined vocabulary may be kept. The numerical feature amount 5015 is obtained by estimating the corresponding feature amount from the text data and expressing it numerically.

A typical method assigns feature amounts to keywords based on past examples of sensor system data value patterns and their corresponding keywords. Keywords in text data are associated with the sensor system value patterns known to have occurred at the same time in the past, and the feature amounts likely to occur with each specific keyword are stored as a correspondence table, for example the feature quantity/keyword correspondence table. Using this table, the most typical feature amount is assigned to each keyword extracted this time. Finally, the feature amounts assigned to the extracted keywords are averaged to obtain the estimated feature amount for the current text.
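
A minimal sketch of this estimation is shown below. It assumes that the "most typical" feature amount for a keyword is the mean of the feature vectors that co-occurred with it in the past; the function names and sample data are hypothetical.

```python
def build_keyword_table(past_pairs):
    """past_pairs: (keywords, feature_vector) observed at the same time.
    Returns keyword -> mean of the co-occurring feature vectors, used here
    as the 'most typical' feature amount (assumption)."""
    sums, counts = {}, {}
    for keywords, vec in past_pairs:
        for kw in set(keywords):
            acc = sums.setdefault(kw, [0.0] * len(vec))
            for i, v in enumerate(vec):
                acc[i] += v
            counts[kw] = counts.get(kw, 0) + 1
    return {kw: [s / counts[kw] for s in acc] for kw, acc in sums.items()}


def estimate_feature(keywords, table):
    """Average the typical feature amounts of the keywords found in the table."""
    known = [table[kw] for kw in keywords if kw in table]
    if not known:
        return None  # no keyword has a known feature amount
    return [sum(col) / len(known) for col in zip(*known)]


past = [(["noise", "high temperature"], [0.2, 0.8]),
        (["noise"], [0.4, 0.6]),
        (["vapor pressure"], [0.9, 0.1])]
table = build_keyword_table(past)
estimate = estimate_feature(["noise", "vapor pressure", "unknown"], table)
```

Here "noise" maps to roughly [0.3, 0.7] and "vapor pressure" to [0.9, 0.1], so the estimate is roughly [0.6, 0.4]; the unknown keyword is ignored.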

The above estimation method relies on past data. Alternatively, expressions that indicate an obvious feature, such as temporal expressions in the document data, can be used and reflected directly in the numerical feature amount. For example, if the text data contains a report such as “abnormal noise gradually from the afternoon”, the keywords “from the afternoon”, “gradually”, and “abnormal noise” are extracted by document analysis and reflected in the feature amount.

If the sensor relates to sound, these expressions can be reflected, for example, by assigning values to the afternoon portion of the time-ordered feature fields and, following the expression “gradually”, making those values increase as time passes.
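
The rule-based reflection described above might look like the following sketch. The hourly layout, the choice of noon as the start of the afternoon, the linear ramp for “gradually”, and the function name are all assumptions for illustration.

```python
def reflect_temporal_expressions(keywords, hours=24, afternoon_start=12, level=1.0):
    """Hypothetical rules turning temporal expressions into an hourly
    feature vector for a sound-related sensor:
      'abnormal noise'     -> the feature is non-zero at all
      'from the afternoon' -> values start at hour `afternoon_start`
      'gradually'          -> values rise linearly, reaching `level` at the last hour
    """
    values = [0.0] * hours
    if "abnormal noise" not in keywords:
        return values
    start = afternoon_start if "from the afternoon" in keywords else 0
    span = hours - start
    for h in range(start, hours):
        if "gradually" in keywords:
            values[h] = level * (h - start + 1) / span
        else:
            values[h] = level
    return values


vector = reflect_temporal_expressions(["from the afternoon", "gradually", "abnormal noise"])
```

The morning fields stay at zero, and the afternoon fields climb from 1/12 to 1.0.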

As in FIG. 4, the parameter 5016 is used to decide at which time delays correlations are calculated; its content is the same as that of the parameter 4016.

FIG. 6 is a configuration diagram of multiple time scale numerical feature data. In FIG. 6, the multiple time scale numerical feature data 6010 shows a specific arrangement of the numerical feature amount 4015 of FIG. 4. For simplicity, FIG. 6 shows three time scales (time resolutions). In the time scale 1 fields, for example, sensor values are arranged in time order every hour. In the time scale 2 fields, for example, a morning measurement value and an afternoon measurement value are entered. In the time scale 3 field, for example, one measurement value per day is entered.

In general, depending on the sensor type and measurement rules, the original feature amounts rarely cover all of these scales. The missing fields are therefore estimated from the other fields and interpolated. For example, if measurements are given at the shortest time scale, representative values such as the average, mode, median, or the value at a specific time can be calculated and used as the values of the other fields. Conversely, when estimating values at a shorter time scale, the simplest method is to copy the measurement value of the larger scale into all of the corresponding fields. If measurements are possible in principle only at a long scale, there is no alternative to this kind of interpolation. Alternatively, the larger-scale value may be entered only at a representative time, and the other fields filled with a predetermined value suited to the type of sensing, such as a symbol indicating a missing value or zero.
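
The interpolation rules above can be illustrated with the three scales of FIG. 6. In this minimal sketch the field layout (24 hourly fields, then morning and afternoon fields, then one daily field), the use of the mean as the representative value, `None` as the missing-value symbol, and the function name `expand_day` are assumptions.

```python
from statistics import mean

MISSING = None  # assumed symbol for a field that cannot be estimated


def expand_day(hourly=None, half_day=None, daily=None):
    """Fill the 24 + 2 + 1 fields of one day (hourly, morning/afternoon, daily).

    Coarser fields are derived from finer ones by averaging; finer fields are
    filled by copying the coarser value into every field it covers.
    """
    if hourly is not None:
        if half_day is None:
            half_day = [mean(hourly[:12]), mean(hourly[12:])]
        if daily is None:
            daily = mean(hourly)
    elif half_day is not None:
        hourly = [half_day[0]] * 12 + [half_day[1]] * 12
        if daily is None:
            daily = mean(half_day)
    elif daily is not None:
        half_day = [daily, daily]
        hourly = [daily] * 24
    else:
        return [MISSING] * 27
    return list(hourly) + list(half_day) + [daily]


fields = expand_day(hourly=[1.0] * 12 + [3.0] * 12)
print(fields[24:])  # [1.0, 3.0, 2.0]
```

Averaging is only one of the representative values the text lists; the median or mode could be swapped in at the same points.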

On the other hand, if measurement at shorter intervals than in the current original data is possible in principle, and measurement values of similar data from the same sensor are available for reference, the value can be distributed with weights so that it follows the same shape as that reference change pattern. After any of these interpolation methods, every scale of the converted feature amount is assumed to contain either a value or a symbol indicating a missing value.
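
Pattern-shaped distribution can be sketched as scaling a reference pattern so that its mean equals the coarse value. Treating the coarse value as the mean of the fine values is an assumption (a sum-preserving variant is equally plausible), and `distribute_by_pattern` is a hypothetical name.

```python
def distribute_by_pattern(coarse_value, reference_pattern):
    """Spread one coarse-scale value over finer fields so the result has the
    same shape as a reference change pattern from the same sensor.
    Assumption: the coarse value is the mean of the fine values, so the
    output is scaled to have that mean."""
    ref_mean = sum(reference_pattern) / len(reference_pattern)
    if ref_mean == 0:
        # No usable shape: fall back to copying the coarse value.
        return [coarse_value] * len(reference_pattern)
    return [coarse_value * r / ref_mean for r in reference_pattern]


fine = distribute_by_pattern(4.0, [1.0, 2.0, 3.0, 2.0])
print(fine)  # [2.0, 4.0, 6.0, 4.0]
```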

FIG. 6 shows the case of a single type of sensor value. When several types of data are handled, as illustrated in FIG. 2, each type is converted so as to have multiple time scales. The fields are then ordered either by scale, with all types arranged within each scale, or by type, with each sensor's multi-scale data arranged one type after another.

FIG. 7 is a configuration diagram of the feature quantity/keyword correspondence table. In FIG. 7, the feature quantity/keyword correspondence table 7010 is used to estimate keywords from numerical feature amounts or, conversely, numerical feature amounts from keywords, and consists of an ID 7011, a representative feature amount 7012, and a keyword/weight 7013. Together, these items form the relevance information that manages the relationships among data having different types or time resolutions. The ID 7011 identifies a feature amount. The representative feature amount 7012 is the numerical value of the representative feature amount among the feature amounts. The keyword/weight 7013 holds the keywords corresponding to the feature amount and their weights: the upper row shows the keywords and the lower row the weights.

Specifically, the ID 7011, representative feature amount 7012, and keyword/weight 7013 serve both as relevance information defining the correspondence between the numerical feature amount 4015 and the keyword 4014 (first relevance information) and as relevance information defining the correspondence between the numerical feature amount 5015 and the keyword 5014 (second relevance information). As information specifying the data generated for each event, the ID 7011 stores information identifying a place, time, facility, or worker.

To construct the feature quantity/keyword correspondence table 7010, first prepare as much data as possible in which corresponding keywords and feature amounts co-occur. Such data may be prepared manually, or generated by associating feature data with text data that clearly co-occurred in the past. These data pairs could be used directly as the table 7010, but the following processing is performed to reduce variation in the correspondences.

First, the data are grouped based on the numerical closeness of their feature amounts. For closeness, for example, the arrangement of feature amounts is regarded as one numerical vector and the Euclidean distance is used. Once a closeness measure is fixed, grouping data by it is called clustering. After clustering, a representative feature amount is calculated from the feature amounts belonging to each group, for example by regarding each sequence of feature amounts as a numerical vector and taking the average vector.

Meanwhile, all keywords paired with the feature amounts in each cluster are collected from the data in that cluster, and the frequency of each keyword is counted. A predetermined number of keywords, taken in descending order of frequency, are adopted as the corresponding keywords in the table. The frequency itself, or the frequency normalized so that the weights of the adopted keywords sum to 1.0, is used as the keyword weight. This processing yields, for each cluster, a pair consisting of a representative feature amount (vector) and a weighted keyword group. The ID of each pair may be assigned arbitrarily when the pair is determined.
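
The table construction can be sketched with plain k-means, which is one common clustering method but not necessarily the one intended in the source. The seeding rule (first k vectors), the `top_n` cut-off, the row layout, and all function names are assumptions.

```python
import math
from collections import Counter


def centroid(vectors):
    """Component-wise average of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(vectors, k, iterations=20):
    """Plain k-means with Euclidean distance; the first k vectors seed the centers."""
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(v, centers[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels


def build_correspondence_table(pairs, k, top_n=3):
    """pairs: (feature_vector, keywords) co-occurring in past data.
    Returns rows (id, representative_feature, [(keyword, weight), ...])."""
    labels = kmeans([vec for vec, _ in pairs], k)
    table = []
    for c in range(k):
        members = [p for p, lab in zip(pairs, labels) if lab == c]
        if not members:
            continue
        representative = centroid([vec for vec, _ in members])
        freq = Counter(kw for _, kws in members for kw in kws)
        top = freq.most_common(top_n)
        total = sum(n for _, n in top)
        weights = [(kw, n / total) for kw, n in top]  # weights sum to 1.0
        table.append((len(table) + 1, representative, weights))
    return table


pairs = [([0.9, 0.1], ["vapor pressure", "high temperature"]),
         ([0.8, 0.2], ["vapor pressure"]),
         ([0.1, 0.9], ["abnormal noise"]),
         ([0.2, 0.8], ["abnormal noise", "vibration"])]
table = build_correspondence_table(pairs, k=2)
```

With this data the two clusters have representatives near [0.85, 0.15] and [0.15, 0.85], and the most frequent keyword in each gets weight 2/3.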

As another way of associating keywords with feature amounts, the correspondence between the feature amount vector and a vector representing the frequency of each word may be calculated directly, without clustering. In the same co-occurrence data as above, the feature amounts are regarded as a numerical vector, as in clustering, and a vector whose components are the appearance frequencies of the keywords (hereinafter, the keyword appearance frequency vector) is prepared. Each piece of co-occurrence data thus yields one pair of a feature vector and a keyword appearance frequency vector, so it suffices to determine the parameters of a conversion function between the two such that the conversion error is minimized.

A typical representation of the conversion function assumes a linear transformation, whose coefficients consist of one coefficient matrix and one coefficient vector. With coefficient matrix A and coefficient vector b, the keyword appearance frequency vector is predicted from the feature vector as
(keyword appearance frequency vector) = A × (feature quantity vector) + b
where A × (feature quantity vector) is the ordinary matrix-vector product and + b is vector addition.

Conversely, an expression that predicts the feature quantity vector from the keyword appearance frequency vector can be obtained in the same way by swapping the roles of the two vectors. The coefficients can be calculated by multiple regression analysis, a standard technique of multivariate analysis.
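
The coefficient calculation can be sketched as an ordinary least-squares fit of k = A f + b, solving the normal equations in pure Python. This is a minimal illustration under that assumption, not the applicant's procedure; the function names and sample data are hypothetical, and a numerical library would normally be preferred for real data.

```python
def solve(M, rhs):
    """Solve M X = rhs (M square, rhs a list of rows) by Gauss-Jordan
    elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(rhs[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_linear_map(features, keyword_freqs):
    """Least-squares fit of k = A f + b via the normal equations."""
    X = [list(f) + [1.0] for f in features]  # trailing 1 carries the intercept b
    d, m = len(X[0]), len(keyword_freqs[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(d)] for i in range(d)]
    XtY = [[sum(x[i] * y[j] for x, y in zip(X, keyword_freqs)) for j in range(m)]
           for i in range(d)]
    W = solve(XtX, XtY)  # d x m: W[i][j] weights input i for output j
    A = [[W[i][j] for i in range(d - 1)] for j in range(m)]
    b = list(W[d - 1])
    return A, b


def predict(A, b, f):
    """Apply k = A f + b to one feature vector."""
    return [sum(a * x for a, x in zip(row, f)) + bj for row, bj in zip(A, b)]


features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
freqs = [[1.0, 0.5], [3.0, 0.5], [2.0, 3.5], [4.0, 3.5]]  # = [[2,1],[0,3]] f + [1, 0.5]
A, b = fit_linear_map(features, freqs)
```

Because the sample data is exactly linear, the fit recovers A = [[2, 1], [0, 3]] and b = [1, 0.5] up to rounding; swapping the two argument lists fits the inverse direction described above.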

FIG. 8 shows an example of a conversion function assuming this linear transformation. In FIG. 8, feature quantity/keyword conversion coefficient 4201 is the coefficient matrix A, and feature quantity/keyword conversion coefficient 4202 is the coefficient vector b. Each component of the coefficient matrix A indicates the strength of the relationship between a component of the feature quantity vector and a component of the keyword appearance frequency vector, and the coefficient vector b is added to adjust the magnitude of the keyword vector obtained after conversion by the coefficient matrix A (4201).

Next, the sensor system feature amount extraction process is described with reference to the flowchart of FIG. 9. This process converts sensor system original data into sensor system feature amounts using the feature quantity/keyword correspondence table 7010.

First, the feature amount extraction unit 1012 determines the type of the original data 1021 stored in the storage device 1020. If it determines that the original data 1021 is sensor system original data 2010, it assigns a data ID 4012 to the original data 2010 (S901). Next, the feature amount extraction unit 1012 shapes the feature amounts of the original data 2010 according to the data type (S902). Shaping includes, for example, interpolating missing values or, when the measurement times of the data do not match the standard times, converting the data by interpolation into values shifted in time.
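
Missing-value interpolation during shaping (S902) might be sketched as linear interpolation between the nearest known neighbors. Copying the nearest known value at the edges, using `None` for a missing sample, and the name `interpolate_missing` are assumptions.

```python
def interpolate_missing(values):
    """Linearly interpolate None entries between the nearest known values;
    leading or trailing gaps copy the nearest known value (assumed rule)."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)  # nothing to interpolate from
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            t = (i - left) / (right - left)
            out[i] = values[left] + t * (values[right] - values[left])
    return out


shaped = interpolate_missing([1.0, None, 3.0, None, None, 6.0, None])
```

The result is [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0]. Time shifting onto a standard grid can reuse the same formula with fractional positions.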

Next, the feature amount extraction unit 1012 expands the shaped feature amounts into multiple scales (S903), as described with reference to FIG. 6. The feature amount extraction unit 1012 then estimates keywords as described with reference to FIG. 4 (S904): a typical keyword is assigned to the feature amount based on past examples of feature amount patterns and their corresponding keywords.

As an example, a method using the feature quantity/keyword table 7010 described with reference to FIG. 7 is explained. First, for a given feature amount pattern, the representative feature amount 7012 in the table 7010 that is closest to the pattern is selected. The closeness between feature amounts is calculated using, for example, the Euclidean distance used when the table 7010 was created.

Once the closest representative feature amount is selected, the keywords associated with it in the feature quantity/keyword table 7010 are used as the estimated keywords.
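
The nearest-representative lookup can be sketched as follows. The row layout `(id, representative_feature, [(keyword, weight), ...])` and the function name are assumptions; `math.dist` computes the Euclidean distance.

```python
import math


def estimate_keywords(feature, table):
    """table rows: (id, representative_feature, [(keyword, weight), ...]).
    Return the keywords of the row whose representative feature is
    closest to `feature` in Euclidean distance."""
    best = min(table, key=lambda row: math.dist(feature, row[1]))
    return [kw for kw, _ in best[2]]


table = [(1, [0.85, 0.15], [("vapor pressure", 0.67), ("high temperature", 0.33)]),
         (2, [0.15, 0.85], [("abnormal noise", 0.67), ("vibration", 0.33)])]
print(estimate_keywords([0.2, 0.7], table))  # ['abnormal noise', 'vibration']
```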

Finally, the feature amount extraction unit 1012 determines the parameters of the feature amount (S905). When correlations are calculated exhaustively for all pairs of feature amounts, the parameter specifies the time delay at which the correlation of each pair should be calculated.

These parameters derive from the characteristics of each combination of feature amounts and are basically set manually. For example, suppose the feature amount of one data ID 4012 indicates the pressure in a vessel of some facility, the feature amount of another data ID 4012 indicates the temperature of a device attached to that facility, and the two are known to be highly correlated with a delay of one hour. The time delay for calculating their correlation is then set to one hour. Alternatively, past data may be analyzed at several time delays, and the delay that empirically gives a high correlation is set as the parameter.
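
The empirical alternative can be sketched as scanning candidate delays and keeping the one with the highest Pearson correlation. Measuring delays in samples, using Pearson correlation, and the function names are assumptions; the pressure and temperature series are invented for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if too short or either series is constant."""
    n = len(a)
    if n < 2:
        return 0.0
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def lagged_correlation(x, y, delay):
    """Correlate x[t] with y[t + delay] over the overlapping samples."""
    shifted = y[delay:]
    n = min(len(x), len(shifted))
    return pearson(x[:n], shifted[:n])


def best_delay(x, y, candidate_delays):
    """Pick the candidate delay with the highest lagged correlation."""
    return max(candidate_delays, key=lambda d: lagged_correlation(x, y, d))


pressure = [0, 1, 2, 5, 3, 1, 0, 0]
temperature = [0, 0, 1, 2, 5, 3, 1, 0]  # same shape, one sample later
print(best_delay(pressure, temperature, [0, 1, 2, 3]))  # 1
```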

Next, the text feature extraction process is described with reference to the flowchart of FIG. 10. This process converts text-based original data into text-based feature amounts using the feature quantity/keyword correspondence table 7010.

The feature amount extraction unit 1012 determines the type of the original data 1021 stored in the storage device 1020, and if it determines that the original data 1021 is text-based original data 3010, it assigns a data ID 5012 to the original data 3010 (S1001).

Next, the feature amount extraction unit 1012 processes the original data 3010 obtained from the text data 1005 using natural language processing techniques such as word segmentation and part-of-speech tagging (S1002). The results include words that are generally unsuitable as keywords, such as numbers, particles, and other non-independent words. The feature amount extraction unit 1012 therefore extracts candidate keywords from the original data 3010 using frequency, part of speech, and other features (S1003).
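
A rough sketch of S1002 and S1003 for English text follows. Japanese reports would need a morphological analyzer for word segmentation and part-of-speech tagging; here a regular-expression tokenizer and a small stop-word list stand in for both, and the stop words, vocabulary, and function name are assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list standing in for part-of-speech filtering.
STOP_WORDS = {"a", "an", "and", "at", "from", "in", "is", "of", "the", "to", "was"}


def extract_keywords(text, vocabulary=None, min_count=1):
    """Lower-case word tokens, drop numbers and stop words, keep words seen at
    least `min_count` times and (optionally) present in `vocabulary`,
    ordered by descending frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())  # letters only: numbers drop out
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [w for w, n in counts.most_common()
            if n >= min_count and (vocabulary is None or w in vocabulary)]


report = "Abnormal noise from pump 3; the noise increased in the afternoon."
print(extract_keywords(report, vocabulary={"noise", "pump", "afternoon"}))
# ['noise', 'pump', 'afternoon']
```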

Next, in the reverse of the procedure described with reference to FIG. 9, the feature amount extraction unit 1012 uses the feature quantity/keyword table 7010 to convert the keywords into feature amounts (S1004). For example, all rows of the table 7010 that contain a keyword extracted from the current data are retrieved, and the average of their representative feature amounts, weighted by the keyword weights, is taken as the converted feature amount.

For example, suppose the keyword “vapor pressure” appears in the first and second rows of the table 7010 and the keyword “high temperature” in the first and third rows. The weight for the feature amount in the first row is then the sum of the weights from “vapor pressure” and “high temperature”, the weight for the second row is the “vapor pressure” weight, and the weight for the third row is the “high temperature” weight; the weighted average of the three representative feature amounts is then computed.
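
The weighted average in this example can be sketched as follows. The row layout (a dict of keyword weights per row), the sample weights and representative vectors, and the function name are hypothetical.

```python
def keywords_to_feature(keywords, table):
    """table rows: (id, representative_feature, {keyword: weight}).
    A row's weight is the sum of the weights of the extracted keywords it
    contains; the result is the weighted average of the representative features."""
    weighted, total = None, 0.0
    for _row_id, rep, kw_weights in table:
        w = sum(kw_weights.get(kw, 0.0) for kw in keywords)
        if w == 0:
            continue  # the row contains none of the extracted keywords
        if weighted is None:
            weighted = [0.0] * len(rep)
        weighted = [acc + w * r for acc, r in zip(weighted, rep)]
        total += w
    if weighted is None:
        return None
    return [acc / total for acc in weighted]


table = [(1, [1.0, 0.0], {"vapor pressure": 0.5, "high temperature": 0.5}),
         (2, [0.0, 1.0], {"vapor pressure": 0.5, "noise": 0.5}),
         (3, [0.0, 0.0], {"high temperature": 1.0})]
feature = keywords_to_feature(["vapor pressure", "high temperature"], table)
```

The row weights are 1.0, 0.5, and 1.0, so the result is [1.0, 0.5] / 2.5, roughly [0.4, 0.2].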

Finally, the feature amount extraction unit 1012 determines the parameters (S1005). As in the process of FIG. 9, depending on the data contents, the parameters are either given manually or derived from past data.

Next, the correlation calculation process is described with reference to the flowchart of FIG. 11. This process calculates the correlation between two feature amounts converted by the processes of FIGS. 9 and 10. In the matching determination, the parameters of both feature amounts are examined; if they match, the correlation is calculated, and otherwise a value indicating a mismatch is returned.

First, the correlation calculation unit 1013 reads the feature amounts of the two target data from the feature data 1022 stored in the storage device 1020 (S1101). Next, it reads the various IDs from the input feature amounts (S1102): the type ID indicating the type of the feature amount and the data ID indicating the origin of the data.

Next, the correlation calculation unit 1013 reads the parameters corresponding to the IDs of the counterpart feature amount (S1103). As described above, each parameter describes the delay time relative to a counterpart with which matching is possible.

Next, the correlation calculation unit 1013 determines whether the read parameters allow matching (S1104). If the parameters show that the combination can be matched, a correlation value between the feature quantities is calculated using the time delay specified by the parameters (S1105). For example, when the time delay of each read parameter satisfies the time resolution condition and matching is established between the two feature quantities, the correlation calculation unit 1013 calculates the correlation value between them.
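The source does not name the correlation measure used in S1105. The following minimal sketch assumes the Pearson correlation coefficient, with the delay expressed as a whole number of samples at a common time resolution. The function `lagged_correlation` is a hypothetical name.

```python
import math
from typing import Sequence


def lagged_correlation(a: Sequence[float], b: Sequence[float], delay: int) -> float:
    """Pearson correlation between a[t] and b[t + delay] (delay in samples, >= 0)."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    # Align the series so that `a` leads `b` by `delay` samples.
    n = min(len(a), len(b) - delay)
    if n < 2:
        raise ValueError("not enough overlapping samples")
    xs = a[:n]
    ys = b[delay:delay + n]
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # design choice: a constant series is treated as uncorrelated
    return cov / math.sqrt(vx * vy)
```

With `delay=2`, a series that reappears two samples later in the partner series gives a correlation of 1.0, whereas the same pair compared without the delay does not.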

Subsequently, the correlation calculation unit 1013 also calculates the degree of co-occurrence between keywords (S1106). For example, from the keywords 4014 and 5014, the correlation calculation unit 1013 selects those for which the time delay of each read parameter satisfies the time resolution condition. It then calculates the degree of co-occurrence between the selected keywords 4014 and 5014.

Here, the degree of keyword co-occurrence indicates what proportion of keywords the two feature quantities have in common. It is calculated, for example, using the following formula:
(Degree of co-occurrence) = (Number of common keywords) / {(Total number of keywords in data 1) × (Total number of keywords in data 2)}
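The formula above translates directly into the following minimal sketch. It assumes that each piece of data's keywords are treated as a set, so duplicates are counted once. The function name is hypothetical.

```python
def keyword_cooccurrence(keywords1: set[str], keywords2: set[str]) -> float:
    """(number of common keywords) / (total keywords in data 1 * total keywords in data 2)."""
    if not keywords1 or not keywords2:
        return 0.0  # no keywords on one side: nothing can co-occur
    common = len(keywords1 & keywords2)
    return common / (len(keywords1) * len(keywords2))
```

The denominator follows the text exactly, as a product of the two totals. Other normalizations would yield different values.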

On the other hand, if the parameters show that matching is not possible, the correlation calculation unit 1013 returns the determination result “mismatch” (S1107). Matching is judged impossible when the time delay is larger than the time width stored in each feature quantity.
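The matching determination (S1103, S1104, and S1107) can be sketched as below. This assumes a hypothetical parameter layout in which each feature quantity carries a map from partner type ID to delay time, expressed in the same unit as its stored time width. `FeatureInfo` and `matching_delay` are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FeatureInfo:
    """Hypothetical view of one converted feature quantity."""
    type_id: str       # type ID read in S1102
    time_width: float  # time width stored with the feature quantity
    # Hypothetical parameter layout: partner type ID -> delay time.
    delays: dict[str, float] = field(default_factory=dict)


def matching_delay(own: FeatureInfo, partner: FeatureInfo) -> Optional[float]:
    """Return the delay to apply if the pair can be matched (S1104),
    or None to signal "mismatch" (S1107)."""
    delay = own.delays.get(partner.type_id)  # S1103: parameter for the partner's ID
    if delay is None:
        return None  # no parameter describes this partner
    # Matching is impossible when the delay exceeds the time width of either feature quantity.
    if delay > own.time_width or delay > partner.time_width:
        return None
    return delay
```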

In the present embodiment, the feature quantity extraction unit 1012 functions as a data conversion unit. It converts a plurality of pieces of data, which are generated for each event and differ in type or time resolution, into a common data format. The conversion uses the feature quantity / keyword correspondence table 7010, which manages the relationships among those pieces of data. The correlation calculation unit 1013 functions as a correlation calculation unit that calculates the correlation among the pieces of data converted by the feature quantity extraction unit 1012.

Consider the case in which the data generated for each event is either measurement data obtained by a measuring device or language data (text data) indicating a worker's work record. Suppose also that this data consists of a plurality of pieces of time-series data whose time resolutions differ and which indicate when the data was generated. In this case, the feature quantity extraction unit 1012 functions as a data conversion unit in two directions. First, it extracts, for each time resolution, a numerical feature quantity (first numerical feature quantity data) 4015 indicating a feature of the measurement data. It then converts each extracted numerical feature quantity 4015 into a keyword (first language data) 4014 related to the language data, based on the feature quantity / keyword correspondence table (relevance information) 7010. Second, it extracts, for each time resolution, a keyword (second language data) 5014 related to the event from the language data. It then converts each extracted keyword 5014 into a numerical feature quantity (second numerical feature quantity data) 5015 indicating a feature of the keyword 5014, again based on the feature quantity / keyword correspondence table 7010.

In addition, the correlation calculation unit 1013 functions as a correlation calculation unit in two ways. First, it selects, from the numerical feature quantities 4015 and 5015 obtained by the feature quantity extraction unit 1012, those that satisfy the time resolution condition specified by the parameters 4016 and 5016 attached to the measurement data or language data (text data); these are the numerical feature quantities for which matching is established. It then calculates the correlation between the selected numerical feature quantities 4015 and 5015. Second, it selects, from the keywords 4014 and 5014 obtained by the feature quantity extraction unit 1012, those that satisfy the time resolution condition specified by the same parameters 4016 and 5016; these are the keywords for which matching is established. It then calculates the correlation between the selected keywords 4014 and 5014.

Further, suppose numerical feature quantity data is missing at some time resolution (time scale). In that case, the feature quantity extraction unit 1012 functions as a data conversion unit that estimates the missing numerical feature quantity data from one of two sources. It can use numerical feature quantity data at a different time resolution, or numerical feature quantity data at the same time resolution as the missing data.
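The text does not specify how the missing values are estimated. The following minimal sketch assumes two techniques of our own choosing. For the same-resolution case, it uses linear interpolation between neighbouring known values. For the different-resolution case, it uses the mean of the finer-scale values covering the same interval. Missing values are represented as `None`, and both function names are hypothetical.

```python
from __future__ import annotations

from typing import Optional, Sequence


def fill_from_same_resolution(values: Sequence[Optional[float]]) -> list[Optional[float]]:
    """Fill gaps by linear interpolation between neighbouring values at the same resolution."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is not None and right is not None:
            frac = (i - left) / (right - left)
            out[i] = values[left] + frac * (values[right] - values[left])
        elif left is not None:
            out[i] = values[left]   # hold the last known value
        elif right is not None:
            out[i] = values[right]  # back-fill from the next known value
    return out


def fill_from_finer_resolution(
    coarse: Sequence[Optional[float]], fine: Sequence[Optional[float]], ratio: int
) -> list[Optional[float]]:
    """Estimate each missing coarse value as the mean of the `ratio` fine-scale
    values that cover the same interval."""
    out = list(coarse)
    for i, v in enumerate(coarse):
        if v is None:
            window = [x for x in fine[i * ratio:(i + 1) * ratio] if x is not None]
            if window:
                out[i] = sum(window) / len(window)
    return out
```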

According to the present embodiment, the feature quantity / keyword correspondence table 7010 makes it possible to calculate the correlation among multiple types of data even when text data (language data) and measurement data are mixed.

Further, according to this embodiment, the correlation between different types of data can be calculated on an appropriate time scale, for example by taking a time delay into account.

In addition, according to the present embodiment, importance and correlation criteria are set so that only the correlations of part of the data are calculated. This allows the relationships across a wide range of data to be monitored while keeping the amount of computation and data down. The amount of data can also be adjusted as needed by retaining data in descending order of importance.

(Second embodiment)
In this embodiment, in addition to the correlation calculation method described in the first embodiment, a method of processing and storing data will be described. In this method, correlations are calculated for only part of the data, based on importance and correlation criteria.

The method of the first embodiment can calculate correlations even between different types of data. In actual system operation, however, calculating the correlations among all data is difficult because of computation time and storage capacity. A method is therefore needed that presents approximate relationships based on a partial calculation. The present embodiment provides such a method by calculating the importance of each piece of data.

FIG. 12 is a block diagram showing a second embodiment of the computer system according to the present invention. In this embodiment, an approximate correlation calculation unit 1015 is added to the processing unit 1010. Based on the correlations calculated by the correlation calculation unit 1013, this unit calculates correlations between other elements for which no correlation has been calculated. Importance data 1024 is also added to the storage device 1020. The rest of the configuration is the same as in the first embodiment shown in FIG.

FIG. 13 is a configuration diagram of importance data used in this embodiment. In FIG. 13, importance level data 1024 includes a data ID 1301, an original importance level 1302, a propagation importance level 1303, and a propagation source 1304, and each row 1311 to 1314 corresponds to one piece of data. The calculation procedure will be described later.

The data ID 1301 identifies each piece of data registered in the importance data 1024. The original importance 1302 indicates the inherent importance of each piece of data. The propagation importance 1303 is calculated from the correlation status and the original importance. The propagation source 1304 indicates the data from which importance was propagated.
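One row of the importance data 1024 can be modelled as a small record. This is a sketch in which the field types, and the representation of the propagation source as a list of data IDs, are assumptions. The `effective_importance` rule, which prefers the propagation importance when it exists and otherwise falls back to the original importance, follows the approximate correlation procedure described later in this section.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ImportanceRecord:
    """One row of the importance data 1024 (field types are assumptions)."""
    data_id: str                                   # data ID 1301
    original_importance: float                     # original importance 1302
    propagation_importance: Optional[float] = None  # propagation importance 1303, unset until computed
    propagation_sources: list[str] = field(default_factory=list)  # propagation source 1304

    def effective_importance(self) -> float:
        """Propagation importance if present, otherwise the original importance."""
        if self.propagation_importance is not None:
            return self.propagation_importance
        return self.original_importance
```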

Next, the importance calculation / partial correlation calculation process will be described with reference to the flowchart of FIG. This process performs the importance calculation described above and calculates the correlations of only part of the data. Each time a piece of sensor data 1004 or text data 1005 is added as new data, the procedure is performed between the new data and every piece of existing data.

The correlation calculation unit 1013 first inputs the feature quantity of the new data (S1401). It then determines the original importance of the new data (S1402). The original importance is determined by considering factors such as how frequently the data has been a major factor in past cause analyses and whether it was related to an important event.

Next, the correlation calculation unit 1013 reads the existing data in order (S1403) and proceeds to the correlation calculation between each piece of existing data and the new data. In this procedure, the correlation calculation unit 1013 first reads the already-calculated propagation importance of the existing data, which is defined below (S1404). It then calculates the correlation between the existing data and the new data (S1405).

Next, the correlation calculation unit 1013 calculates a connection importance index from the calculated correlation value (S1406). The connection importance is an index defined for each pair of data that expresses both correlation and importance as a single number. It takes a larger value the higher the correlation and the higher the importance.

In its simplest form, it can be defined as:
(Connection importance) = (Propagation importance of existing data) × (Magnitude of correlation)

Next, the correlation calculation unit 1013 determines whether the connection importance is greater than a predetermined storage threshold (S1407). If it is, the correlation is stored (S1408) and the propagation importance of the new data is calculated (S1409).

The propagation importance of the new data is calculated, for example, as:
(Propagation importance) = (Original importance) + (Coefficient) × (Propagation importance of existing data) × (Magnitude of correlation)
Here, the coefficient is a predetermined value that keeps the propagation importance from becoming excessively large. An example is the reciprocal of the average number of connections over all past data.

Next, the correlation calculation unit 1013 updates the propagation importance of existing data (S1410).

The update is performed, for example, as:
(Updated propagation importance) = (Propagation importance before update) + (Coefficient) × (Propagation importance of new data) × (Magnitude of correlation)
Here, the coefficient is a predetermined value set in the same manner as for the propagation importance of new data.

On the other hand, the correlation calculation unit 1013 determines whether all existing data has been processed (S1411). It does so either when step S1407 finds that the connection importance is not greater than the storage threshold, or after the processing in step S1410. If existing data remains to be processed, the process returns to step S1403, and steps S1403 to S1411 are repeated for the next existing data. If all existing data has been processed, this routine ends.
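Steps S1403 to S1411 for one piece of new data can be sketched as the loop below. This is a minimal sketch under several assumptions of ours. "Magnitude of correlation" is taken as the absolute value. The new data's propagation importance accumulates over all existing data that pass the threshold. Existing data whose propagation importance has not yet been computed is entered with its original importance. The function name `register_new_data` and the `correlate` callback are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def register_new_data(
    new_id: str,
    original_importance: float,
    propagation: dict[str, float],
    correlate: Callable[[str, str], float],
    storage_threshold: float,
    coefficient: float,
) -> dict[tuple[str, str], float]:
    """Sketch of S1403-S1411 for one piece of new data.

    `propagation` maps each existing data ID to its propagation importance and is
    updated in place. The correlations that pass the storage threshold are returned.
    """
    new_importance = original_importance  # starts from the original importance (S1402)
    stored: dict[tuple[str, str], float] = {}
    for old_id in list(propagation):  # S1403: existing data in order
        old_importance = propagation[old_id]  # S1404
        corr = correlate(old_id, new_id)  # S1405
        connection = old_importance * abs(corr)  # S1406: connection importance
        if connection <= storage_threshold:  # S1407
            continue
        stored[(old_id, new_id)] = corr  # S1408
        # S1409: contributions are accumulated over existing data (an assumption).
        new_importance += coefficient * old_importance * abs(corr)
        # S1410: update the existing data's propagation importance.
        propagation[old_id] = old_importance + coefficient * new_importance * abs(corr)
    propagation[new_id] = new_importance  # the new data becomes existing data
    return stored
```

Only pairs whose connection importance exceeds the storage threshold are kept. This is what bounds the stored data in this embodiment.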

Next, the approximate correlation calculation process will be described with reference to the flowchart of FIG. This process shows a procedure for approximately calculating the correlation between arbitrary data.

First, the approximate correlation calculation unit 1015 designates the data pair whose correlation is to be calculated (S1501). It then determines whether a stored correlation, as described above, exists for the designated pair (S1502). If one exists, the stored correlation is output (S1503).

On the other hand, if step S1502 finds no stored correlation, the approximate correlation calculation unit 1015 obtains the importance of each piece of data and determines whether both are larger than a calculation threshold (S1504). If both importances are larger than the calculation threshold, the correlation between the two pieces of data is calculated (S1505). If at least one is not, the correlation between the two pieces of data is approximated as zero (S1506). The propagation importance of a piece of data is used as its importance when it exists; otherwise, the original importance is used.
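The lookup-then-threshold logic of S1501 to S1506 can be sketched as follows. This assumes that a stored correlation may be found under either ordering of the pair. The `importance` and `correlate` callbacks and the function name are hypothetical stand-ins for the importance data 1024 and the exact correlation calculation.

```python
from __future__ import annotations

from typing import Callable


def approximate_correlation(
    a: str,
    b: str,
    stored: dict[tuple[str, str], float],
    importance: Callable[[str], float],      # propagation importance if present, else original
    correlate: Callable[[str, str], float],  # exact correlation calculation
    calc_threshold: float,
) -> float:
    """Sketch of S1501-S1506 for the designated pair (a, b)."""
    for key in ((a, b), (b, a)):  # S1502: stored pairs treated as unordered (assumption)
        if key in stored:
            return stored[key]  # S1503
    if importance(a) > calc_threshold and importance(b) > calc_threshold:  # S1504
        return correlate(a, b)  # S1505
    return 0.0  # S1506: approximated as zero
```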

According to the present embodiment, the importance of the connection between events is calculated from the importance of each event and the magnitude of the correlation. Based on the result, only the correlations of part of the data are calculated, which reduces the amount of computation and data.

The present invention is not limited to the embodiments described above and includes various modifications. For example, the above embodiments have been described in detail to make the present invention easy to understand, and the invention is not necessarily limited to configurations that include all of the described elements. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to that of one embodiment. Other configurations can also be added to, deleted from, or substituted for part of the configuration of each embodiment.

For example, suppose the plurality of pieces of data of different types consists of measurement data and language data. The feature quantity extraction unit 1012 can then function as a data conversion unit in two directions. First, it extracts from the measurement data a numerical feature quantity (first numerical feature quantity data) 4015 indicating a feature of the measurement data. It converts the extracted numerical feature quantity 4015 into a keyword (first language data) 4014 based on the feature quantity / keyword correspondence table 7010. Second, it extracts from the language data a keyword (second language data) 5014 related to an event. It converts the extracted keyword 5014 into a numerical feature quantity (second numerical feature quantity data) 5015 based on the feature quantity / keyword correspondence table 7010.

In this case, the correlation calculation unit 1013 can function as a correlation calculation unit that performs two calculations. It calculates the correlation between the numerical feature quantities 4015 and 5015 obtained by the feature quantity extraction unit 1012. It also calculates the correlation between the keywords 4014 and 5014 obtained by the feature quantity extraction unit 1012.

In addition, some or all of the above-described configurations, functions, processing units, processing means, and the like may be realized in hardware, for example by designing them as an integrated circuit. The above-described configurations, functions, and the like may also be realized in software, with a processor interpreting and executing a program that implements each function. Information such as programs, tables, and files that implement each function can be stored in a recording device such as a memory, a hard disk, or an SSD (Solid State Drive). It can also be stored on a recording medium such as an IC (Integrated Circuit) card, an SD (Secure Digital) memory card, or a DVD (Digital Versatile Disc).

1001 Correlation analysis server, 1002 terminal, 1004 sensor data, 1005 text data, 1010 processing unit, 1020 storage device, 1011 data management unit, 1012 feature amount extraction unit, 1013 correlation calculation unit, 1014 display content configuration unit.

Claims

A correlation analyzer that analyzes the relationships among a plurality of events and extracts other events related to an event under analysis, the correlation analyzer comprising:
a data conversion unit that converts a plurality of pieces of data, which are generated for each event and differ in type or time resolution, into data having a common data format by using relevance information that manages the relevance among the plurality of pieces of data; and
a correlation calculation unit that calculates the correlation among the pieces of data converted by the data conversion unit.
The correlation analyzer according to claim 1,
wherein the data conversion unit,
when the plurality of pieces of data of different types are measurement data obtained by measurement with a measuring device and language data indicating a work record of an operator, extracts, from the measurement data, first numerical feature quantity data indicating a feature quantity of the measurement data, converts the extracted first numerical feature quantity data into first language data related to the language data based on the relevance information, extracts, from the language data, second language data related to the event, and converts the extracted second language data into second numerical feature quantity data indicating a feature quantity of the second language data based on the relevance information; and
the correlation calculation unit
calculates the correlation between the first numerical feature quantity data and the second numerical feature quantity data based on the first numerical feature quantity data extracted by the data conversion unit and the second numerical feature quantity data converted by the data conversion unit, and calculates the correlation between the first language data and the second language data based on the first language data converted by the data conversion unit and the second language data extracted by the data conversion unit.
The correlation analyzer according to claim 2,
wherein the relevance information
includes first relevance information defining the correspondence between the first numerical feature quantity data and the first language data, and second relevance information defining the correspondence between the second numerical feature quantity data and the second language data, and each of the first relevance information and the second relevance information includes information specifying the data generated for each event and information specifying a time, a piece of equipment, or an operator.
The correlation analyzer according to claim 1,
wherein the data conversion unit,
when the plurality of pieces of data having different time resolutions consist of a plurality of pieces of time-series data, differing in time resolution, that indicate the generation times of the data generated for each event, converts the time-series data of each time resolution into numerical feature quantity data; and
the correlation calculation unit
calculates the correlation among the pieces of numerical feature quantity data converted by the data conversion unit.
The correlation analyzer according to claim 4,
wherein the data conversion unit,
when numerical feature quantity data is missing from the converted numerical feature quantity data of any time resolution, estimates the missing numerical feature quantity data from numerical feature quantity data of another time resolution different from that of the missing data, or from numerical feature quantity data of the same time resolution as the missing data.
The correlation analyzer according to claim 1,
wherein the data conversion unit,
when the data generated for each event is measurement data obtained by measurement with a measuring device or language data indicating a work record of an operator, and consists of a plurality of pieces of time-series data, differing in time resolution, that indicate the generation times of the data generated for each event, extracts, for each time resolution, first numerical feature quantity data indicating a feature quantity of the measurement data from the measurement data, converts each piece of extracted first numerical feature quantity data into first language data related to the language data based on the relevance information, extracts, for each time resolution, second language data related to the event from the language data, and converts each piece of extracted second language data into second numerical feature quantity data indicating a feature quantity of the second language data based on the relevance information; and
the correlation calculation unit
selects, from the first numerical feature quantity data extracted by the data conversion unit and the second numerical feature quantity data converted by the data conversion unit, numerical feature quantity data satisfying the time resolution condition specified by parameters attached to the measurement data or the language data, calculates the correlation between the selected first numerical feature quantity data and second numerical feature quantity data, selects, from the first language data converted by the data conversion unit and the second language data extracted by the data conversion unit, language data satisfying the time resolution condition specified by the parameters attached to the measurement data or the language data, and calculates the correlation between the selected first language data and second language data.
The correlation analyzer according to claim 1,
wherein the correlation calculation unit,
when the data converted by the data conversion unit includes new data and existing data, calculates the connection importance of each piece of existing data based on the importance of that existing data and the magnitude of the correlation, and calculates the correlation between the new data and the existing data when the calculation result exceeds a threshold value.
The correlation analyzer according to claim 7,
wherein the correlation calculation unit
updates the importance of each piece of existing data based on the importance of the new data and the magnitude of the correlation.
A correlation analysis method that analyzes the relationships among a plurality of events and extracts other events related to an event under analysis, the method comprising:
a data conversion step of converting a plurality of pieces of data, which are generated for each event and differ in type or time resolution, into data having a common data format by using relevance information that manages the relevance among the plurality of pieces of data; and
a correlation calculation step of calculating the correlation among the pieces of data converted in the data conversion step.
The correlation analysis method according to claim 9,
wherein, in the data conversion step,
when the plurality of pieces of data of different types are measurement data obtained by measurement with a measuring device and language data indicating a work record of an operator, first numerical feature quantity data indicating a feature quantity of the measurement data is extracted from the measurement data, the extracted first numerical feature quantity data is converted into first language data related to the language data based on the relevance information, second language data related to the event is extracted from the language data, and the extracted second language data is converted into second numerical feature quantity data indicating a feature quantity of the second language data based on the relevance information; and
in the correlation calculation step,
the correlation between the first numerical feature quantity data and the second numerical feature quantity data is calculated based on the first numerical feature quantity data extracted in the data conversion step and the second numerical feature quantity data converted in the data conversion step, and the correlation between the first language data and the second language data is calculated based on the first language data converted in the data conversion step and the second language data extracted in the data conversion step.
The correlation analysis method according to claim 10,
wherein the relevance information
includes first relevance information defining the correspondence between the first numerical feature quantity data and the first language data, and second relevance information defining the correspondence between the second numerical feature quantity data and the second language data, and each of the first relevance information and the second relevance information includes information specifying the data generated for each event and information specifying a time, a piece of equipment, or an operator.
The correlation analysis method according to claim 9,
wherein, in the data conversion step,
when the plurality of pieces of data having different time resolutions consist of a plurality of pieces of time-series data, differing in time resolution, that indicate the generation times of the data generated for each event, the time-series data of each time resolution is converted into numerical feature quantity data; and
in the correlation calculation step,
the correlation among the pieces of numerical feature quantity data converted in the data conversion step is calculated.
The correlation analysis method according to claim 12,
wherein, in the data conversion step,
when numerical feature quantity data is missing from the converted numerical feature quantity data of any time resolution, the missing numerical feature quantity data is estimated from numerical feature quantity data of another time resolution different from that of the missing data, or from numerical feature quantity data of the same time resolution as the missing data.
The correlation analysis method according to claim 9,
wherein, in the data conversion step,
when the data generated for each event is measurement data obtained by measurement with a measuring device or language data indicating a work record of an operator, and consists of a plurality of pieces of time-series data, differing in time resolution, that indicate the generation times of the data generated for each event, first numerical feature quantity data indicating a feature quantity of the measurement data is extracted from the measurement data for each time resolution, each piece of extracted first numerical feature quantity data is converted into first language data related to the language data based on the relevance information, second language data related to the event is extracted from the language data for each time resolution, and each piece of extracted second language data is converted into second numerical feature quantity data indicating a feature quantity of the second language data based on the relevance information; and
in the correlation calculation step,
numerical feature quantity data satisfying the time resolution condition specified by parameters attached to the measurement data or the language data is selected from the first numerical feature quantity data extracted in the data conversion step and the second numerical feature quantity data converted in the data conversion step, the correlation between the selected first numerical feature quantity data and second numerical feature quantity data is calculated, language data satisfying the time resolution condition specified by the parameters attached to the measurement data or the language data is selected from the first language data converted in the data conversion step and the second language data extracted in the data conversion step, and the correlation between the selected first language data and second language data is calculated.
The correlation analysis method according to claim 9,
wherein, in the correlation calculation step,
when the data converted in the data conversion step includes new data and existing data, the connection importance of each piece of existing data is calculated based on the importance of that existing data and the magnitude of the correlation, and the correlation between the new data and the existing data is calculated when the calculation result exceeds a threshold value.