WO2019130419A1 - Analysis device, analysis method, and program recording medium - Google Patents

Info

Publication number
WO2019130419A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
residual
residual matrix
refinement
analysis
Prior art date
Application number
PCT/JP2017/046608
Other languages
French (fr)
Japanese (ja)
Inventor
Tsubasa Takahashi (翼 高橋)
Original Assignee
NEC Corporation (日本電気株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corporation (日本電気株式会社)
Priority to PCT/JP2017/046608
Publication of WO2019130419A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Definitions

  • The present invention relates to an analysis device, an analysis method, and a program for analyzing topics included in data, in particular topics included in matrix data consisting of a set of event vectors.
  • Devices such as a network intrusion detection system (IDS: Intrusion Detection System) and a factory temperature control device are provided with sensor devices that observe a state or value related to an observation target.
  • These sensor devices continuously generate data that associates a state or value related to the observation target (hereinafter, observation value) with information including the time at which the observation value was observed (hereinafter, time stamp).
  • By distributing the observation values generated moment by moment, each linked to its time stamp, as data in a stream format, a network, a factory, or the like can be monitored continuously.
  • A sequence of data consisting of observation values and time stamps distributed in a stream format in this manner is called a data stream.
  • Examples of data streams include microblogs such as Twitter (registered trademark), proxy server logs, and IDS alert logs. By acquiring and analyzing the data stream of an observation target, it becomes possible to determine whether the observation target is normal, to grasp its state (for example, by finding suspicious behavior), and to classify events.
  • The data contained in a data stream is a mixture of various events. Focusing on frequency of occurrence, for example, the events can be classified into events that occur frequently (hereinafter, major events), events that rarely occur (hereinafter, rare events), and events that occur with moderate frequency.
  • The main pattern in the data corresponding to each event is called a topic.
  • A pattern means a combination of values that appear in common.
  • A pattern formed by a combination of key values is called a topic.
  • In this document, patterns and topics are treated as equivalent.
  • Finding a pattern that represents an event is important for understanding the characteristics of the security device that issues the alert, and for understanding anomalies that do not normally occur.
  • Data in formats such as data streams, sequences, and documents is converted into vector format containing, for example, the frequencies of the events, words, keywords, and the like included in the data.
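As a minimal illustration of this vectorization step, the Python sketch below turns records (for example, alert-log entries, each given as a list of tokens) into rows of occurrence counts, giving a matrix whose rows are records and whose columns are vocabulary terms. The function name `to_frequency_matrix`, the token strings, and the choice of a sorted vocabulary are illustrative assumptions, not part of the disclosed device.

```python
from collections import Counter


def to_frequency_matrix(records, vocabulary=None):
    """Convert each record (a list of event/word/keyword tokens) into a row of
    occurrence counts. Hypothetical helper: rows = records, columns = terms."""
    if vocabulary is None:
        # Assumption: use the sorted set of all tokens seen in the records.
        vocabulary = sorted({tok for rec in records for tok in rec})
    index = {tok: j for j, tok in enumerate(vocabulary)}
    matrix = []
    for rec in records:
        counts = Counter(tok for tok in rec if tok in index)
        row = [0] * len(vocabulary)
        for tok, c in counts.items():
            row[index[tok]] = c
        matrix.append(row)
    return vocabulary, matrix


vocab, M = to_frequency_matrix(
    [["login_fail", "port_scan", "login_fail"], ["port_scan"], ["malware"]]
)
print(vocab)  # ['login_fail', 'malware', 'port_scan']
print(M)      # [[2, 0, 1], [0, 0, 1], [0, 1, 0]]
```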
  • topic analysis can also be applied to sequence data.
  • FIG. 16 is a graph showing the relationship between the topic corresponding to an event and the frequency of the topic.
  • The frequencies of topics follow a power law.
  • The frequency of the rare event shown by the dashed frame in FIG. 16 is extremely small compared with that of the major events.
  • The data stream may also contain a great deal of simple noise that cannot be identified as any event.
  • From the viewpoint of major events, rare events have relatively small values and are therefore regarded as errors. As a result, in simple topic analysis, rare events are likely to be misinterpreted as noise rather than acquired as topics. It is therefore necessary to clearly distinguish the rare events contained in a data stream from noise.
  • Non-Patent Document 1 discloses a topic analysis method using L-Ens NMF (Local Ensemble of Nonnegative Matrix Factorization).
  • In this method, a predetermined number of topics are acquired by matrix decomposition, and a residual matrix, which is the portion not corresponding to the acquired topics, is generated.
  • Next, the portions (events and the like) of the generated residual matrix that could not be acquired as topics are emphasized (boosted), and a predetermined number of topics are again acquired by matrix decomposition of the boosted residual matrix. In the topic analysis method of Non-Patent Document 1, this operation is repeated recursively until a set number of topics is obtained.
  • Non-Patent Document 2 discloses Group Lasso (Least Absolute Shrinkage and Selection Operator) regularization, which is a type of sparse regularization.
  • Group Lasso regularization drives all the variables belonging to a group to 0 simultaneously. That is, it is a regularization that forces sparsity at the level of groups of variables.
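The group-level sparsity can be seen in the proximal operator of the Group Lasso penalty λ Σ_g ||x_g||_2, sketched below in Python with NumPy. This is the standard group soft-thresholding step, shown only to illustrate the regularization. The function name and the example groups are hypothetical, and this is not presented as the method of Non-Patent Document 2.

```python
import numpy as np


def group_soft_threshold(x, groups, lam):
    """Proximal operator of lam * sum_g ||x_g||_2 (the Group Lasso penalty).

    Each group is shrunk toward zero as a unit. A group whose Euclidean norm
    is at most lam is set to zero in its entirety.
    """
    out = np.zeros_like(x, dtype=float)
    for g in groups:
        xg = x[g]
        norm = np.linalg.norm(xg)
        if norm > lam:
            out[g] = (1.0 - lam / norm) * xg
        # Otherwise the whole group stays at 0.
    return out


x = np.array([3.0, 4.0, 0.1, 0.2])
print(group_soft_threshold(x, groups=[[0, 1], [2, 3]], lam=1.0))
# The first group (norm 5) is scaled by 0.8; the second (norm < 1) becomes all zeros.
```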
  • Non-Patent Document 3 discloses Joint Sparse PCA (JSPCA) and Generalized Joint Sparse PCA (GJSPCA), which are improved versions of Principal Component Analysis (PCA).
  • In these methods, Group Lasso regularization is used to form variable groups from the leading (high-order) components of PCA.
  • The leading components capture more of the major components of the data. Therefore, a variable group formed by the method of Non-Patent Document 3 is estimated as a dense pattern containing many features that much of the data has in common.
  • Non-Patent Document 4 and Non-Patent Document 5 disclose matrix decomposition using Group Lasso regularization. Their methods make it possible to perform robust matrix decomposition, in which the patterns obtained by principal component analysis are less susceptible to noise and outliers.
  • With the method of Non-Patent Document 1, not only major topics but also topics of medium frequency can be acquired.
  • However, the method of Non-Patent Document 1 only emphasizes the portions of the matrix under topic analysis that could not be acquired as topics, and it has no mechanism for distinguishing residuals from noise. Consequently, a low-frequency residual may be captured as noise, or noise may be mixed into the low-frequency residual. That is, the method of Non-Patent Document 1 has difficulty obtaining topics from low-frequency residuals.
  • Variable groups can be estimated as patterns that appear consistently across many data items.
  • An object of the present invention is to solve the above-mentioned problems and to provide an analysis device that makes it possible to discover patterns related to infrequent events.
  • An analysis apparatus includes: a topic analysis unit that performs topic analysis on an analysis target matrix to generate a dictionary matrix storing the topics included in the analysis target matrix and an index matrix indicating the degree to which the analysis target matrix includes the topics, and that calculates the matrix product of the index matrix and the dictionary matrix; a matrix product storage unit in which matrix products are stored; a residual matrix deriving unit that acquires the analysis target matrix and at least one matrix product stored in the matrix product storage unit, and derives a residual matrix corresponding to the difference between the analysis target matrix and the matrix product; a residual matrix refinement unit that acquires the residual matrix and generates a refinement residual matrix by removing the noise included in the residual matrix; a residual matrix boost unit that acquires the analysis target matrix and the refinement residual matrix and, based on them, derives an enhanced residual matrix in which elements including topics that have not yet been acquired are emphasized; and an enhanced residual matrix storage unit in which the enhanced residual matrix derived by the residual matrix boost unit is accumulated.
  • In the analysis method, topic analysis is performed on an analysis target matrix to generate a dictionary matrix storing the topics included in the analysis target matrix and an index matrix indicating the degree to which the analysis target matrix includes the topics; the matrix product of the index matrix and the dictionary matrix is calculated and accumulated; a residual matrix corresponding to the difference between the analysis target matrix and at least one stored matrix product is derived; a refinement residual matrix is generated by removing the noise contained in the residual matrix; an enhanced residual matrix, in which elements including topics that have not yet been acquired are emphasized, is derived based on the analysis target matrix and the refinement residual matrix; and the enhanced residual matrix is accumulated as an analysis target matrix.
  • A program causes a computer to execute: a process of performing topic analysis on an analysis target matrix to generate a dictionary matrix storing the topics included in the analysis target matrix and an index matrix indicating the degree to which the analysis target matrix includes the topics; a process of calculating the matrix product of the index matrix and the dictionary matrix; a process of accumulating the matrix product; a process of deriving a residual matrix corresponding to the difference between the analysis target matrix and at least one stored matrix product; a process of generating a refinement residual matrix by removing the noise included in the residual matrix; a process of deriving, based on the analysis target matrix and the refinement residual matrix, an enhanced residual matrix in which elements including topics not yet acquired are emphasized; and a process of accumulating the enhanced residual matrix as an analysis target matrix.
  • A conceptual diagram showing an example of the L1 norm values calculated by the residual matrix boost unit of the analysis device according to the first embodiment of the present invention. A conceptual diagram showing an example in which the residual matrix boost unit of the analysis device according to the first embodiment of the present invention selects a row and a column.
  • NMF: Nonnegative Matrix Factorization
  • FIG. 1 is a block diagram showing an example of the configuration of the analyzer 1 of the present embodiment.
  • The analysis device 1 includes a topic analysis unit 11, a matrix product storage unit 12, a residual matrix derivation unit 13, a residual matrix storage unit 14, a residual matrix refinement unit 15, a refinement residual matrix storage unit 16, a residual matrix boost unit 17, and an enhanced residual matrix storage unit 18.
  • FIG. 1 illustrates an example of analyzing the input matrix A stored in the storage device 100.
  • The input matrix A may be acquired via a network from a storage device 100 that forms part of an external system, or from a storage device 100 provided alongside the analysis apparatus 1.
  • the topic analysis unit 11 performs topic analysis using NMF on a matrix to be analyzed (hereinafter, referred to as analysis target matrix).
  • The analysis target matrix is either the input matrix A or the enhanced residual matrix R_L.
  • The enhanced residual matrix R_L is a matrix in which the part not yet captured as a topic is emphasized in each iteration.
  • The enhanced residual matrix R_L generated in each iteration is stored in the enhanced residual matrix storage unit 18.
  • The topic analysis unit 11 repeats topic analysis until a predetermined condition is satisfied. For example, it repeats topic analysis a predetermined number of times (m times, where m is a natural number), or until the number of acquired topics reaches a predetermined number.
  • FIG. 2 is a conceptual diagram showing an example of the input matrix A.
  • In the figure, the cell for each element is shaded according to the magnitude of the element: cells with large values are marked L, cells with medium values M, and cells with small values S. Blank cells have the value 0.
  • From this point onward, cells with very small values may be denoted VS. Also, in subsequent figures, cells marked with the same notation may differ in shading or pattern.
  • the input matrix A is a matrix of I rows and J columns (I and J are natural numbers).
  • the value of the element of the i-th row and the j-th column of the input matrix A is expressed as A [i, j] (i and j are natural numbers).
  • the notation of the values of matrix elements is the same as in other matrices.
  • When the topic analysis unit 11 receives an input matrix A, it starts the first iteration on that input matrix A. When starting the second and subsequent iterations, the topic analysis unit 11 refers to the enhanced residual matrix storage unit 18 and repeats topic analysis on the enhanced residual matrix R_L generated in the previous iteration.
  • The topic analysis unit 11 repeats topic analysis until the number of acquired topics reaches a predetermined topic number k (k is a natural number). In each iteration, the topic analysis unit 11 acquires by NMF a number ks of topics smaller than the predetermined number k (ks is a natural number satisfying ks < k). In practice, ks can be set to 1, 2, or k/2.
  • Alternatively, the topic analysis unit 11 may repeat topic analysis until the enhanced residual matrix R_L stored in the enhanced residual matrix storage unit 18 becomes empty. A matrix is empty when all of its cells have the value 0.
  • The topic analysis unit 11 performs topic analysis of the analysis target matrix and generates a dictionary matrix (Dictionary Matrix) storing topics and an index matrix (Membership Matrix) indicating which topics are included and to what extent.
  • The dictionary matrix and the index matrix generated in the m-th iteration are denoted H_m and W_m, respectively (m is an integer of 1 or more).
  • The topic analysis unit 11 calculates the matrix product W_m H_m of the generated index matrix W_m and dictionary matrix H_m.
  • The topic analysis unit 11 stores the calculated matrix product W_m H_m in the matrix product storage unit 12.
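The document specifies NMF for topic analysis but not a particular solver. As one possible sketch, the Python code below uses the Lee-Seung multiplicative updates for the Frobenius loss (an assumption) to obtain the index matrix W_m, the dictionary matrix H_m, and their product W_m H_m. The function name `nmf` and its parameters are hypothetical.

```python
import numpy as np


def nmf(A, ks, n_iter=500, eps=1e-10, seed=0):
    """Factorize a nonnegative I x J matrix A as W @ H.

    W (I x ks) plays the role of the index (membership) matrix W_m and
    H (ks x J) the dictionary matrix H_m. Uses Lee-Seung multiplicative
    updates for ||A - W H||_F^2; eps guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    I, J = A.shape
    W = rng.random((I, ks)) + eps
    H = rng.random((ks, J)) + eps
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


A = np.array([[3.0, 3.0, 0.0], [3.0, 3.0, 0.0], [0.0, 0.0, 1.0]])
W, H = nmf(A, ks=1)
WH = W @ H  # matrix product W_m H_m, stored for later iterations
print(np.round(WH, 3))
```

With ks = 1 on a matrix that has one dominant block, the product W_m H_m reproduces that block and leaves the small remaining entry to be handled through the residual.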
  • FIG. 3 is an example of the matrix product W_m H_m of the index matrix W_m and the dictionary matrix H_m generated by the topic analysis unit 11 using the input matrix A of FIG.
  • In this matrix product, the cells corresponding to the small-value cells (S) of the input matrix A (FIG. 2) have the value 0.
  • The residual matrix deriving unit 13 calculates, as a matrix difference Ro, the difference between the analysis target matrix (the input matrix A or the enhanced residual matrix R_L) and the matrix product W_m H_m generated by the topic analysis unit 11. Because negative values adversely affect NMF, the residual matrix deriving unit 13 then replaces all negative values in the matrix difference Ro with 0. The matrix obtained by replacing the negative elements of Ro with 0 is called the residual matrix R.
  • FIG. 4 is a conceptual diagram showing an example of the residual matrix R.
  • In this example, the residual matrix deriving unit 13 calculates the matrix difference Ro between the input matrix A (FIG. 2) and the matrix product W_m H_m (FIG. 3), and generates the residual matrix R (FIG. 4) by replacing the negative elements of Ro with 0.
  • the residual matrix deriving unit 13 stores the generated residual matrix R in the residual matrix storage unit 14.
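The residual derivation reduces to an elementwise difference followed by clipping at zero. A minimal NumPy sketch follows; the function name `residual_matrix` is hypothetical.

```python
import numpy as np


def residual_matrix(X, WH):
    """Residual matrix R: the matrix difference Ro = X - WH with every
    negative element replaced by 0, because the next NMF needs a
    nonnegative input. X is the analysis target matrix (A or R_L)."""
    Ro = np.asarray(X, dtype=float) - np.asarray(WH, dtype=float)
    return np.maximum(Ro, 0.0)


X = np.array([[3.0, 1.0], [0.0, 2.0]])
WH = np.array([[2.5, 1.5], [0.2, 2.0]])
print(residual_matrix(X, WH))  # negative differences are clipped to 0
```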
  • the residual matrix refinement unit 15 obtains the residual matrix R from the residual matrix storage unit 14.
  • The residual matrix refinement unit 15 generates a refinement residual matrix R* by removing the noise included in the residual matrix R.
  • The residual matrix refinement unit 15 stores the generated refinement residual matrix R* in the refinement residual matrix storage unit 16.
  • The residual matrix refinement unit 15 generates a refinement residual matrix R* in which each element of the residual matrix R that is below a threshold θ1 (also referred to as the first threshold) is replaced with 0.
  • Alternatively, the residual matrix refinement unit 15 may subtract a predetermined value from each element of the residual matrix R and replace the elements that become negative with 0 to generate the refinement residual matrix R*.
  • The residual matrix refinement unit 15 may also generate the refinement residual matrix R* using a sparse estimation method such as Lasso (Least Absolute Shrinkage and Selection Operator).
  • The residual matrix refinement unit 15 may also narrow down (thin out) the elements subject to noise removal before generating the refinement residual matrix.
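The first two noise-removal options above can be sketched as elementwise operations. The function names and the threshold values are hypothetical. For a nonnegative residual, subtracting δ and clipping at 0 coincides with elementwise soft-thresholding, which is the closed-form Lasso solution when the design matrix is the identity. The document does not say which Lasso formulation it uses.

```python
import numpy as np


def refine_hard(R, theta1):
    """Replace the elements of R that are below the first threshold theta1 with 0."""
    return np.where(R < theta1, 0.0, R)


def refine_soft(R, delta):
    """Subtract delta from every element and replace negative results with 0.

    For R >= 0 this equals elementwise soft-thresholding, i.e. the minimizer
    of 0.5 * ||X - R||_F^2 + delta * ||X||_1 over X.
    """
    return np.maximum(R - delta, 0.0)


R = np.array([[0.05, 0.8], [0.3, 0.0]])
print(refine_hard(R, 0.1))  # small entries zeroed, others unchanged
print(refine_soft(R, 0.1))  # every entry shrunk by 0.1, floored at 0
```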
  • The residual matrix refinement unit 15 refers to the matrix product storage unit 12 and obtains the matrix product W_m H_m stored there.
  • The residual matrix refinement unit 15 derives the positions (also referred to as first positions) of the elements of the acquired matrix product W_m H_m whose values are equal to or greater than a threshold θ2 (also referred to as the second threshold).
  • The residual matrix refinement unit 15 generates the refinement residual matrix R* by removing noise from the elements of the residual matrix R at the first positions L_WH.
  • The residual matrix refinement unit 15 may set, for each of a set of groups, the thinning amount, that is, the degree to which the elements subject to noise removal are thinned out.
  • When specific rows or columns of the matrix product are treated as groups, the residual matrix refinement unit 15 may set the thinning amount of the elements subject to noise removal by using a group-wise sparse estimation method such as Group Lasso.
  • FIG. 5 is an example in which 1 is set in the cells at the first positions L_WH derived by the residual matrix refinement unit 15.
  • Instead of a matrix as shown in FIG., the residual matrix refinement unit 15 may represent the cells at the first positions L_WH using a list of cells or a hash.
  • FIG. 6 is an example of the refinement residual matrix R* generated based on the residual matrix R of FIG.
  • In FIG. 6, the elements at the first positions L_WH whose residual values fall below the threshold are filled with white. The value of a white-filled cell is 0.
  • The refinement residual matrix R* excludes the small values left in cells whose topic was acquired in a previous iteration. Because these minute leftover values are excluded from the topic analysis of subsequent iterations, errors generated by topic analysis can be distinguished from rare events. That is, the refinement residual matrix R* increases the chance of acquiring rare events as topics.
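The position-restricted refinement can be sketched with boolean masks. The first positions L_WH are derived from W_m H_m, and residual elements that lie there and are at or below θ1 are then zeroed, following the comparisons stated for steps S15 and S16. The function names and the example thresholds are hypothetical.

```python
import numpy as np


def first_positions(WH, theta2):
    """Boolean mask L_WH: True where the matrix product W_m H_m is >= theta2,
    i.e. cells already explained by an acquired topic."""
    return WH >= theta2


def refine_at_positions(R, WH, theta1, theta2):
    """Zero the elements of R that lie at a first position L_WH and are <= theta1.
    Elements outside L_WH (where no topic has been acquired) are kept."""
    L_WH = first_positions(WH, theta2)
    R_star = np.array(R, dtype=float, copy=True)
    R_star[L_WH & (R_star <= theta1)] = 0.0
    return R_star


WH = np.array([[3.0, 0.0], [0.0, 2.0]])
R = np.array([[0.1, 0.1], [0.5, 0.2]])
print(refine_at_positions(R, WH, theta1=0.3, theta2=1.0))
```

The 0.1 at position (0, 1) survives even though it is small. It is not at a first position, so it may belong to a rare event rather than being leftover error from an acquired topic.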
  • The residual matrix boost unit 17 obtains the refinement residual matrix R* from the refinement residual matrix storage unit 16.
  • Using the following procedure, the residual matrix boost unit 17 generates an enhanced residual matrix R_L in which the values of specific rows and columns of the acquired refinement residual matrix R* are emphasized.
  • The residual matrix boost unit 17 calculates the L1 norm of each row of the refinement residual matrix R*.
  • The L1 norm of a row is the sum of the absolute values of the elements in that row of R*.
  • The residual matrix boost unit 17 treats the L1 norm Pr[i] of each row i as the weight of a probability distribution over rows, and selects one row i according to that distribution.
  • the row selected by the residual matrix boost unit 17 is referred to as a reference row i *.
  • The residual matrix boost unit 17 may select the reference row i* at random, or may select the row i with the largest L1 norm as the reference row i*.
  • The residual matrix boost unit 17 likewise calculates the L1 norm of each column of the refinement residual matrix R*.
  • The residual matrix boost unit 17 generates a column reference vector Pc whose j-th element Pc[j] is the L1 norm of the j-th column of the refinement residual matrix R* (Equation 2).
  • J is the number of columns of the uncaptured degree matrix U.
  • Pc = (Pc[1], ..., Pc[J])   (2)
  • The residual matrix boost unit 17 treats Pc[j] as the weight with which column j is selected from a probability distribution, and selects one column j.
  • The column selected by the residual matrix boost unit 17 is referred to as the reference column j*.
  • The residual matrix boost unit 17 may select the reference column j* at random, or may select the column with the largest L1 norm as the reference column j*.
  • Although the reference row and the reference column are selected here based on the L1 norm, the residual matrix boost unit 17 may select the reference row or the reference column based on any statistic that can be calculated for each row or column.
  • For example, the residual matrix boost unit 17 may select the reference row or the reference column using the L2 norm.
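The weighted selection of the reference row i* and the reference column j* can be sketched as follows. The L1 norms Pr and Pc serve as unnormalized selection weights, or the argmax is taken for the deterministic variant. The function name and the `greedy` flag are hypothetical.

```python
import numpy as np


def select_reference(R_star, rng=None, greedy=False):
    """Return (i*, j*) for the refinement residual matrix R_star.

    Pr[i] and Pc[j] are the L1 norms of row i and column j. They are used as
    unnormalized selection weights, or the largest is taken when greedy=True.
    """
    Pr = np.abs(R_star).sum(axis=1)   # row reference vector
    Pc = np.abs(R_star).sum(axis=0)   # column reference vector (Equation 2)
    if Pr.sum() == 0:
        raise ValueError("R_star is empty (all zeros); nothing left to boost")
    if greedy:
        return int(np.argmax(Pr)), int(np.argmax(Pc))
    if rng is None:
        rng = np.random.default_rng()
    i_star = int(rng.choice(len(Pr), p=Pr / Pr.sum()))
    j_star = int(rng.choice(len(Pc), p=Pc / Pc.sum()))
    return i_star, j_star


R_star = np.array([[1.0, 2.0, 0.0], [0.0, 0.0, 0.5]])
print(select_reference(R_star, greedy=True))  # (0, 1)
print(select_reference(R_star, rng=np.random.default_rng(0)))
```

Rows or columns whose L1 norm is 0 can never be drawn, so cells already cleaned to 0 by the refinement step do not attract the boost.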
  • FIG. 7 is a conceptual diagram showing an example of the L1 norm values calculated in the row and column directions of the refinement residual matrix R* of FIG.
  • The number to the right of each row in FIG. 7 is the L1 norm of that row.
  • The number above each column in FIG. 7 is the L1 norm of that column. That is, the numbers to the right of the rows in FIG. 7 are the elements of the row reference vector Pr, and the numbers above the columns are the elements of the column reference vector Pc.
  • FIG. 8 is a conceptual diagram showing an example in which the row and column corresponding to the reference row i* and the reference column j* (hereinafter, the selected row and the selected column) are selected from the refinement residual matrix R* based on the L1 norm values of FIG.
  • The first row of the refinement residual matrix R* is selected as the selected row i*.
  • The second column of the refinement residual matrix R* is selected as the selected column j*.
  • The residual matrix boost unit 17 generates a selected row vector A[i*, :] consisting of the values of the elements in the selected row i* of the input matrix A, and calculates the similarity sim(i*, i) between A[i*, :] and every row of the input matrix A. Similarly, the residual matrix boost unit 17 generates a selected column vector R[:, j*] consisting of the values of the elements in the selected column j* of the residual matrix R, and calculates the similarity sim(j*, j) between R[:, j*] and every column of the input matrix A.
  • The residual matrix boost unit 17 uses cosine similarity as the similarity sim(·, ·). However, it may calculate sim(·, ·) by a method other than cosine similarity. In the second and subsequent iterations, sim(·, ·) may be calculated for the enhanced residual matrix being analyzed.
  • The residual matrix boost unit 17 generates an I × I diagonal matrix D_c whose diagonal elements D_c[i, i] are the similarities sim(i*, i) and whose off-diagonal elements are 0. Similarly, it generates a J × J diagonal matrix D_r whose diagonal elements D_r[j, j] are the similarities sim(j*, j) and whose off-diagonal elements are 0.
  • The residual matrix boost unit 17 calculates the matrix product D_c R D_r of the diagonal matrix D_c, the residual matrix R, and the diagonal matrix D_r.
  • This matrix product D_c R D_r is the enhanced residual matrix R_L.
  • The residual matrix boost unit 17 stores the resulting enhanced residual matrix R_L in the enhanced residual matrix storage unit 18.
  • The diagonal matrix D_c emphasizes the values of the selected row i* and of the rows of the input matrix A similar to it, and attenuates the values of the other rows.
  • The diagonal matrix D_r emphasizes the values of the columns similar to the selected column j* and attenuates the values of the other columns.
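The boosting step R_L = D_c R D_r with cosine similarities can be sketched in NumPy as below. Following the description above, row similarities are computed between the rows of the analysis target matrix and its selected row. Column similarities are computed between the selected residual column R[:, j*] and the columns of the analysis target matrix. The function names are hypothetical. Whether R or R* is passed as `residual` is left to the caller, because the text refers to both.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity; returns 0 when either vector is all zeros."""
    d = np.linalg.norm(u) * np.linalg.norm(v)
    return 0.0 if d == 0 else float(u @ v / d)


def boost(X, residual, i_star, j_star):
    """Return the enhanced residual R_L = D_c @ residual @ D_r.

    X        : analysis target matrix (input matrix A in the first iteration)
    residual : residual matrix to be boosted
    """
    I, J = X.shape
    # D_c[i, i] = sim(i*, i): similarity of row i of X to the selected row X[i*, :]
    D_c = np.diag([cosine(X[i_star, :], X[i, :]) for i in range(I)])
    # D_r[j, j] = sim(j*, j): similarity of column j of X to residual[:, j*]
    D_r = np.diag([cosine(residual[:, j_star], X[:, j]) for j in range(J)])
    return D_c @ residual @ D_r


A = np.array([[2.0, 0.0], [0.0, 3.0]])
R = np.array([[1.0, 0.5], [0.5, 1.0]])
print(boost(A, R, i_star=0, j_star=0))  # row 1 is dissimilar to row 0, so it is zeroed
```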
  • FIG. 9 is an example of the enhanced residual matrix R_L generated by the residual matrix boost unit 17.
  • The values of the columns with low similarity to column j* are attenuated. As a result, the values in the first row are strongly attenuated (from M to S).
  • The rows with low similarity to row i* are also attenuated. However, the original values in the third and fourth rows (third column) are small (S), so the attenuation has little effect on them.
  • The refinement residual matrix R* generated by the residual matrix refinement unit 15 emphasizes values other than the topics already acquired by the topic analysis unit 11, that is, the values of topics not yet acquired. Furthermore, the residual matrix refinement unit 15 eliminates the errors caused by topic acquisition by squashing to 0 the values of cells whose residuals, left by topics already acquired, are smaller than a predetermined threshold. As a result, in subsequent iterations the topic analysis unit 11 has a better chance of acquiring parts not yet acquired as topics. That is, rare topics that could not be acquired in the first topic analysis become more likely to be acquired.
  • The topic analysis unit 11 receives the enhanced residual matrix R_L as input and repeats the above processing until a predetermined condition is satisfied, for example, until the predetermined number k of topics has been obtained or the enhanced residual matrix R_L becomes empty.
  • the structure of the analyzer 1 of this embodiment is not limited to the above-mentioned structure.
  • the function of one component may be assigned to another component, or the function of one component may be shared with another component.
  • a single component may be configured to have a function shared by separate components.
  • another function may be added to the function of each component.
  • FIG. 10 is a flowchart for explaining the first iteration by the analyzer 1.
  • FIG. 11 is a flowchart for explaining the second and subsequent iterations by the analyzer 1.
  • In the following, the analyzer 1 is described as the subject of each operation.
  • the analyzer 1 receives an input matrix A (step S11).
  • The analyzer 1 performs topic analysis of the input matrix A (step S12). At this time, the analysis apparatus 1 generates a dictionary matrix H_1 and an index matrix W_1 of the input matrix A.
  • The analyzer 1 calculates the matrix product W_1 H_1 of the index matrix W_1 and the dictionary matrix H_1 (step S13).
  • The analyzer 1 calculates the matrix difference Ro between the input matrix A and the matrix product W_1 H_1, and generates a residual matrix R in which the negative elements of Ro are replaced with 0 (step S14).
  • The analyzer 1 obtains the matrix product W_1 H_1 and derives the first positions L_WH, that is, the positions of the cells of W_1 H_1 whose values are θ2 or more (step S15).
  • When an element of the residual matrix R at a first position L_WH is less than or equal to the threshold θ1, the analyzer 1 replaces that element with 0, generating a refinement residual matrix R* (step S16).
  • The analyzer 1 refers to the input matrix A and the refinement residual matrix R* and generates an enhanced residual matrix R_L in which the elements of specific columns and rows of the input matrix A are emphasized (step S17).
  • The analysis device 1 stores the generated enhanced residual matrix R_L in the enhanced residual matrix storage unit 18 (step S18). After step S18, the process proceeds to A in the flowchart of FIG.
  • The analysis device 1 executes topic analysis of the enhanced residual matrix R_L stored in the enhanced residual matrix storage unit 18 (step S21). At this time, the analyzer 1 generates a dictionary matrix H_m and an index matrix W_m.
  • The analyzer 1 calculates the matrix product W_m H_m of the index matrix W_m and the dictionary matrix H_m (step S22).
  • The analysis device 1 calculates the matrix difference Ro between the enhanced residual matrix R_L and the matrix product W_m H_m, and generates a residual matrix R in which the negative elements of Ro are replaced with 0 (step S23).
  • The analyzer 1 obtains the matrix product W_m H_m and derives the first positions L_WH, that is, the positions of the cells of W_m H_m whose values are θ2 or more (step S24).
  • When an element of the residual matrix R at a first position L_WH is less than or equal to the threshold θ1, the analyzer 1 replaces that element with 0, generating a refinement residual matrix R* (step S25).
  • The analyzer 1 refers to the residual matrix R and the refinement residual matrix R* and generates an enhanced residual matrix R_L in which the elements of specific columns and rows of the residual matrix R are emphasized (step S26).
  • The analysis device 1 stores the generated enhanced residual matrix R_L in the enhanced residual matrix storage unit 18 (step S27).
  • If the predetermined condition is not satisfied (No in step S28), the process returns to step S21 to execute the next iteration. If it is satisfied (Yes in step S28), the process ends.
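The steps of the first iteration and of the second and subsequent iterations can be combined into a compact end-to-end sketch, shown below. All parameter values (k, ks, θ1, θ2, iteration counts) are hypothetical. Several further choices are assumptions rather than the disclosed implementation: NMF uses Lee-Seung multiplicative updates, the reference row and column are chosen greedily by L1 norm, and boosting is applied to R* rather than R.

```python
import numpy as np


def _cos(u, v):
    """Cosine similarity; 0 when either vector is all zeros."""
    d = np.linalg.norm(u) * np.linalg.norm(v)
    return 0.0 if d == 0 else float(u @ v / d)


def analyze(A, k=2, ks=1, theta1=0.1, theta2=0.5, n_iter=300, seed=0, eps=1e-10):
    """Iterate topic analysis, residual derivation, refinement, and boosting.
    Returns the dictionary matrices H_m acquired in each iteration."""
    rng = np.random.default_rng(seed)
    X = np.asarray(A, dtype=float)             # analysis target matrix
    dictionaries, n_topics = [], 0
    while n_topics < k and X.any():            # step S28 (or R_L empty)
        # Steps S12/S21: NMF via multiplicative updates (assumed solver)
        W = rng.random((X.shape[0], ks)) + eps
        H = rng.random((ks, X.shape[1])) + eps
        for _ in range(n_iter):
            H *= (W.T @ X) / (W.T @ W @ H + eps)
            W *= (X @ H.T) / (W @ H @ H.T + eps)
        WH = W @ H                             # steps S13/S22
        dictionaries.append(H)
        n_topics += ks
        R = np.maximum(X - WH, 0.0)            # steps S14/S23
        R_star = R.copy()                      # steps S15-S16 / S24-S25
        R_star[(WH >= theta2) & (R <= theta1)] = 0.0
        if not R_star.any():
            break
        # Steps S17/S26: greedy reference row/column by L1 norm (entries >= 0)
        i_s = int(np.argmax(R_star.sum(axis=1)))
        j_s = int(np.argmax(R_star.sum(axis=0)))
        D_c = np.diag([_cos(X[i_s], X[i]) for i in range(X.shape[0])])
        D_r = np.diag([_cos(R[:, j_s], X[:, j]) for j in range(X.shape[1])])
        X = D_c @ R_star @ D_r                 # enhanced residual R_L (S18/S27)
    return dictionaries


A = np.array([[5, 5, 0, 0], [5, 5, 0, 0], [5, 5, 0, 0], [0, 0, 1, 1]], dtype=float)
for m, H in enumerate(analyze(A), start=1):
    print(m, np.round(H / H.max(), 3))
```

On this matrix, which has a dominant 3×2 block and a small separate block, the first dictionary row concentrates on the dominant block and the second on the small one. This is the behavior the embodiment aims for.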
  • the analysis device of the present embodiment generates an enhanced residual matrix in which values other than those of the already acquired topics are emphasized.
  • the analyzer according to the present embodiment can emphasize portions that have not yet been acquired as topics, even when the residual produced by an acquired topic has a large value.
  • repeating the topic analysis increases the chance that a pattern not yet acquired as a topic is acquired in a later topic analysis.
  • rare topics that cannot be acquired in an earlier topic analysis are more likely to be acquired in a subsequent topic analysis. That is, according to the present embodiment, it is possible to discover patterns related not only to high-frequency and medium-frequency events but also to low-frequency events.
  • the analyzer of the present embodiment has a simplified version of the configuration of the analyzer 1 of the first embodiment.
  • FIG. 12 is a block diagram showing the configuration of the analyzer 2 of this embodiment.
  • the analysis device 2 includes a topic analysis unit 21, a matrix product storage unit 22, a residual matrix derivation unit 23, a residual matrix refinement unit 25, a residual matrix boost unit 27, and an enhanced residual matrix storage unit 28.
  • the connection lines between the components are an example; the connections between the components are not limited to those shown.
  • the topic analysis unit 21 receives an input matrix as the analysis target matrix in the first iteration. In the second and subsequent iterations, the topic analysis unit 21 receives, as the analysis target matrix, the enhanced residual matrix generated in the previous iteration.
  • the topic analysis unit 21 performs topic analysis on the input analysis target matrix to generate a dictionary matrix storing topics and an index matrix indicating which topics are included and to what extent.
  • the topic analysis unit 21 calculates a matrix product of the generated index matrix and the dictionary matrix.
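One way to realize this topic analysis is nonnegative matrix factorization, which the supplementary notes mention. The sketch below assumes the classic Lee-Seung multiplicative updates for the Frobenius-norm objective; the solver, random initialization, iteration count, and function names are hypothetical choices, not a procedure the source specifies. Here W plays the role of the index matrix (rows of the target by topics) and H the dictionary matrix (topics by columns of the target).

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Nested-list matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def nmf(v: Matrix, k: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor a nonnegative matrix V (m x n) into W (m x k) and H (k x n)."""
    rng = random.Random(seed)
    # Strictly positive start so multiplicative updates can move every entry.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in v]
    h = [[rng.random() + 0.1 for _ in v[0]] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[x * p / (q + eps) for x, p, q in zip(*rows)] for rows in zip(h, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[x * p / (q + eps) for x, p, q in zip(*rows)] for rows in zip(w, num, den)]
    return w, h
```

The matrix product that the topic analysis unit stores is then simply `matmul(w, h)`. Multiplicative updates keep every entry nonnegative, which is why they are a common textbook choice for NMF.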
  • the topic analysis unit 21 stores the calculated matrix product in the matrix product storage unit 22.
  • the matrix product storage unit 22 stores the matrix product calculated by the topic analysis unit 21.
  • the residual matrix deriving unit 23 receives an analysis target matrix as an input. Further, the residual matrix deriving unit 23 refers to the matrix product storage unit 22 and acquires the matrix product corresponding to the input analysis target matrix. The residual matrix deriving unit 23 calculates the matrix difference between the analysis target matrix and the matrix product generated from it, and generates a residual matrix in which the negative elements of the calculated matrix difference are replaced with 0. The residual matrix deriving unit 23 outputs the generated residual matrix to the residual matrix refinement unit 25.
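The residual derivation amounts to an element-wise max(target − WH, 0). A minimal sketch, assuming matrices are plain nested Python lists; the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Nested-list matrix product a @ b (columns of b come from zip(*b))."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def derive_residual(target: Matrix, w: Matrix, h: Matrix) -> Matrix:
    """Residual matrix: element-wise max(target - W H, 0)."""
    product = matmul(w, h)
    return [
        [max(t - p, 0.0) for t, p in zip(t_row, p_row)]
        for t_row, p_row in zip(target, product)
    ]
```

For example, with target `[[3, 1], [0, 2]]`, W = `[[1], [1]]` and H = `[[1, 1]]`, the difference is `[[2, 0], [-1, 1]]`, and clipping the negative entry gives `[[2, 0], [0, 1]]`.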
  • the residual matrix refinement unit 25 obtains the residual matrix from the residual matrix derivation unit 23.
  • the residual matrix refinement unit 25 generates a refinement residual matrix by removing noise included in the residual matrix.
  • the residual matrix refinement unit 25 outputs the derived refinement residual matrix to the residual matrix boost unit 27.
  • the residual matrix refinement unit 25 derives a refinement residual matrix by replacing elements below the first threshold with 0. Alternatively, the residual matrix refinement unit 25 may derive a refinement residual matrix by subtracting a predetermined value from each element of the residual matrix and replacing the elements that become negative with 0.
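The two global refinement variants can be sketched as follows. This is a minimal illustration with hypothetical names. Two assumptions: "below" is read as strictly less than, and in the subtraction variant the surviving elements keep their reduced values (a soft threshold), since the source does not say whether they are restored.

```python
from typing import List

Matrix = List[List[float]]


def refine_by_threshold(residual: Matrix, theta1: float) -> Matrix:
    """Replace elements below the first threshold theta1 with 0 (strict '<' assumed)."""
    return [[0.0 if x < theta1 else x for x in row] for row in residual]


def refine_by_shift(residual: Matrix, delta: float) -> Matrix:
    """Subtract a predetermined value delta from every element and clip negatives to 0."""
    return [[max(x - delta, 0.0) for x in row] for row in residual]
```

The first variant keeps large residuals unchanged; the second also shrinks them by `delta`, which uniformly damps small noise-like values at the cost of biasing the remaining ones downward.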
  • the residual matrix refinement unit 25 refers to the matrix product storage unit 22 and obtains the matrix product stored in the matrix product storage unit 22.
  • the residual matrix refinement unit 25 derives a first position corresponding to the position of an element equal to or greater than the second threshold with respect to the acquired matrix product.
  • the residual matrix refinement unit 25 derives a refinement residual matrix by replacing, in the residual matrix, the elements at the first position that are less than or equal to the first threshold with 0.
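The position-restricted refinement can be sketched as below, with hypothetical names. One plausible reading, stated here as an interpretation rather than the source's claim, is that where the acquired topics already reconstruct the target strongly (large matrix product), a small leftover residual is more likely noise than an uncaptured topic, so only those cells are cleaned.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]


def first_positions(product: Matrix, theta2: float) -> Set[Tuple[int, int]]:
    """Cells (i, j) of the matrix product whose element is >= theta2."""
    return {(i, j) for i, row in enumerate(product) for j, v in enumerate(row) if v >= theta2}


def refine_at_positions(residual: Matrix, positions: Set[Tuple[int, int]],
                        theta1: float) -> Matrix:
    """Zero residual elements <= theta1, but only at the first positions."""
    return [
        [0.0 if (i, j) in positions and v <= theta1 else v for j, v in enumerate(row)]
        for i, row in enumerate(residual)
    ]
```

A small residual outside the first positions (for example in a cell the acquired topics barely cover) is left untouched, so it can still seed a new topic in a later iteration.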
  • the residual matrix boost unit 27 receives a refinement residual matrix as an input.
  • the residual matrix boost unit 27 generates an enhanced residual matrix in which elements of specific rows and columns of the refinement residual matrix are enhanced.
  • the residual matrix boost unit 27 stores the generated enhanced residual matrix in the enhanced residual matrix storage unit 28.
  • the enhanced residual matrix storage unit 28 stores the enhanced residual matrix generated by the residual matrix boost unit 27.
  • FIG. 13 is a block diagram showing an example of the configuration of residual matrix refinement unit 25.
  • the residual matrix refinement unit 25 includes an input unit 51, a first cell derivation unit 52, a refinement residual matrix generation unit 53, and an output unit 54.
  • the connections between the components of the residual matrix refinement unit 25 are omitted.
  • each component in FIG. 13 may be shared with another component, may be divided, or another component may be added.
  • the input unit 51 receives the matrix product stored in the matrix product storage unit 22 as an input.
  • the input unit 51 outputs the matrix product to the first cell derivation unit 52.
  • the input unit 51 also receives the residual matrix of the analysis target matrix from the residual matrix derivation unit 23.
  • the input unit 51 outputs the residual matrix of the analysis target matrix to the refinement residual matrix generation unit 53.
  • a matrix product is input to the first cell derivation unit 52 from the input unit 51.
  • the first cell derivation unit 52 derives, for the acquired matrix product, the position (also referred to as a first position) of each cell (also referred to as a first cell) whose element is greater than or equal to the second threshold.
  • the first cell derivation unit 52 outputs the derived first position to the refinement residual matrix generation unit 53.
  • the refinement residual matrix generation unit 53 acquires a residual matrix.
  • the refinement residual matrix generation unit 53 generates a refinement residual matrix by replacing with 0, in the acquired residual matrix, the elements of the cells at the first position that are less than or equal to the first threshold.
  • the refinement residual matrix generation unit 53 outputs the generated refinement residual matrix to the output unit 54.
  • the output unit 54 outputs the refinement residual matrix to the residual matrix boost unit 27.
  • FIG. 14 is a block diagram showing an example of the configuration of the residual matrix boost unit 27.
  • the residual matrix boost unit 27 includes an input unit 71, a statistic calculation unit 72, a selection unit 73, a diagonal matrix generation unit 74, an enhanced residual matrix calculation unit 75, and an output unit 76.
  • the connection between the components of the residual matrix boost unit 27 is omitted.
  • each component in FIG. 14 may be shared with another component, may be divided, or another component may be added.
  • the residual matrix is input from the residual matrix derivation unit 23 to the input unit 71, and the refinement residual matrix is input from the residual matrix refinement unit 25.
  • the input unit 71 outputs the refinement residual matrix to the statistic calculation unit 72, and outputs the residual matrix to the selection unit 73, the diagonal matrix generation unit 74, and the enhanced residual matrix calculation unit 75.
  • the refinement residual matrix is input to the statistic calculator 72 from the input unit 71.
  • the statistic calculator 72 calculates a statistic for each row and each column of the refinement residual matrix. For example, the statistic calculator 72 calculates a statistic such as the L1 norm or the L2 norm of each row and each column of the refinement residual matrix.
  • the statistic calculation unit 72 generates vectors whose elements are the statistics of the rows and of the columns of the refinement residual matrix (a row reference vector and a column reference vector, respectively).
  • the statistic calculation unit 72 outputs the row reference vector and the column reference vector of the refinement residual matrix to the selection unit 73.
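The row and column reference vectors can be sketched as follows, with the norm passed in as a parameter so either the L1 or the L2 statistic mentioned above can be used. The function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def l1_norm(v: Sequence[float]) -> float:
    return sum(abs(x) for x in v)


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def reference_vectors(refined: Matrix,
                      norm: Callable[[Sequence[float]], float] = l1_norm
                      ) -> Tuple[List[float], List[float]]:
    """Row reference vector (one statistic per row) and column reference vector (one per column)."""
    row_ref = [norm(row) for row in refined]
    col_ref = [norm(col) for col in zip(*refined)]
    return row_ref, col_ref
```

Because the refinement residual matrix is nonnegative, the L1 norm of a row is just its sum; the L2 norm weights a few large residuals more heavily than many small ones.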
  • the row reference vector and the column reference vector are input from the statistic calculation unit 72 to the selection unit 73.
  • the selection unit 73 selects one row and one column (a reference row and a reference column) of the refinement residual matrix on the basis of the statistics in the row reference vector and the column reference vector. For example, the selection unit 73 randomly selects one reference row and one reference column from the rows and columns. Alternatively, for example, the selection unit 73 may select the row and the column with the largest statistics in the respective reference vectors as the reference row and the reference column.
  • the selection unit 73 selects a row (also referred to as a selected row) corresponding to the reference row from the residual matrix, and generates a vector (also referred to as a selected row vector) composed of the values of the selected row. Similarly, the selection unit 73 selects a column (also referred to as a selected column) corresponding to the reference column from the residual matrix, and generates a vector (also referred to as a selected column vector) composed of the values of the selected column. The selection unit 73 outputs the selected row vector and the selected column vector to the diagonal matrix generation unit 74.
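Selecting the reference row and column, and reading the selected vectors from the residual matrix, might look like this. A minimal sketch with hypothetical names; for the "random" strategy it assumes sampling in proportion to the statistics (using Python's `random.Random.choices`), since the text says only that selection is random and based on the statistics.

```python
import random
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def select_index(stats: Sequence[float], strategy: str = "max",
                 rng: Optional[random.Random] = None) -> int:
    """Pick a reference index: the largest statistic, or a draw weighted by the statistics."""
    if strategy == "max":
        return max(range(len(stats)), key=lambda i: stats[i])
    rng = rng or random.Random()
    return rng.choices(range(len(stats)), weights=stats, k=1)[0]


def selected_vectors(residual: Matrix, row_ref: Sequence[float], col_ref: Sequence[float],
                     strategy: str = "max", rng: Optional[random.Random] = None
                     ) -> Tuple[List[float], List[float]]:
    """Selected row/column vectors are read from the residual matrix, not the refined one."""
    r = select_index(row_ref, strategy, rng)
    c = select_index(col_ref, strategy, rng)
    return list(residual[r]), [row[c] for row in residual]
```

Weighted sampling still favors rows and columns with large refined residuals but occasionally explores others, whereas the "max" strategy is deterministic.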
  • the selected row vector and the selected column vector are input from the selection unit 73 to the diagonal matrix generation unit 74, and the residual matrix is input from the input unit 71.
  • the diagonal matrix generation unit 74 calculates, for every row of the residual matrix, the similarity between that row and the selected row vector.
  • similarly, the diagonal matrix generation unit 74 calculates, for every column of the residual matrix, the similarity between that column and the selected column vector.
  • the diagonal matrix generation unit 74 generates a matrix (hereinafter referred to as a first diagonal matrix) whose diagonal elements are the similarities calculated with respect to the selected row vector and whose off-diagonal elements are 0. Similarly, the diagonal matrix generation unit 74 generates a matrix (hereinafter referred to as a second diagonal matrix) whose diagonal elements are the similarities calculated with respect to the selected column vector and whose off-diagonal elements are 0. The diagonal matrix generation unit 74 outputs the generated first and second diagonal matrices to the enhanced residual matrix calculation unit 75.
  • the enhanced residual matrix calculation unit 75 receives the residual matrix from the input unit 71, and receives the first diagonal matrix and the second diagonal matrix from the diagonal matrix generation unit 74.
  • the enhanced residual matrix calculator 75 generates an enhanced residual matrix by calculating a matrix product of the first diagonal matrix, the residual matrix, and the second diagonal matrix.
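The diagonal matrices and the product that yields the enhanced residual matrix can be sketched as follows. The source does not name the similarity measure; this sketch assumes cosine similarity between each row (column) of the residual matrix and the selected row (column) vector. Function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def diag(values: Sequence[float]) -> Matrix:
    """Square matrix with `values` on the diagonal and 0 elsewhere."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def enhanced_residual(residual: Matrix, row_vec: Sequence[float],
                      col_vec: Sequence[float]) -> Matrix:
    """R_L = D1 R D2 with similarity-valued diagonal matrices."""
    d1 = diag([cosine(row, row_vec) for row in residual])        # first diagonal matrix (m x m)
    d2 = diag([cosine(col, col_vec) for col in zip(*residual)])  # second diagonal matrix (n x n)
    return matmul(matmul(d1, residual), d2)
```

Building explicit diagonal matrices mirrors the description; in practice the same result is obtained more cheaply by scaling row i by the i-th row similarity and column j by the j-th column similarity. Rows and columns unlike the selected ones shrink toward 0, so the next topic analysis concentrates on the emphasized region.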
  • the enhanced residual matrix calculation unit 75 outputs the generated enhanced residual matrix to the output unit 76.
  • the output unit 76 stores the enhanced residual matrix generated by the enhanced residual matrix calculation unit 75 in the enhanced residual matrix storage unit 28.
  • with the analysis device of the present embodiment, low-frequency events are emphasized as the iterations are repeated, which makes it easier to find topics related to low-frequency events.
  • FIG. 15 is a block diagram showing the configuration of a computer 90 as an example of the hardware configuration that implements the analysis device of each embodiment.
  • the computer 90 includes a central processing unit 91 (CPU: Central Processing Unit), a first memory 92 (ROM: Read Only Memory), and a second memory 93 (RAM: Random Access Memory).
  • the computer 90 also includes an internal storage device 94, an input / output connection circuit 95 (IOC: Input Output Circuit), and a network interface circuit 96 (NIC: Network Interface Circuit).
  • the computer 90 is also connected to the input device 98 and the display device 99 via the input / output connection circuit 95.
  • the computer 90 in FIG. 15 is a configuration example for realizing the analyzer of each embodiment, and does not limit the scope of the present invention.
  • the central processing unit 91 reads the program from the first memory 92.
  • the central processing unit 91 controls the second memory 93, the internal storage device 94, the input / output connection circuit 95, and the network interface circuit 96 based on the read program.
  • the central processing unit 91 may use the second memory 93 or the internal storage device 94 as a program storage area when realizing the functions of the analysis apparatus of each embodiment.
  • the central processing unit 91 may read the program from a storage medium in which the program is stored so as to be readable by the computer 90 using a storage medium reading device (not shown).
  • the central processing unit 91 receives a program from an external device (not shown) via the input / output connection circuit 95, stores the received program in the second memory 93, and based on the program stored in the second memory 93. May operate.
  • the first memory 92 is a non-volatile storage medium for storing programs executed by the central processing unit 91 and fixed data.
  • the first memory 92 can be realized by, for example, a PROM (Programmable ROM) or a flash ROM.
  • the second memory 93 is a volatile storage medium for temporarily storing programs executed by the central processing unit 91 and data.
  • the second memory 93 can be realized by, for example, a DRAM (Dynamic RAM).
  • the internal storage device 94 is a non-volatile storage medium for storing data and programs to be stored for a long time.
  • the internal storage device 94 may be operated as a temporary storage device of the central processing unit 91.
  • the internal storage device 94 can be realized by a hard disk device, a magneto-optical disk device, a solid state drive (SSD), a disk array device, a flash memory, or the like.
  • the central processing unit 91 is operable based on a program stored in at least one of the first memory 92, the internal storage device 94, and the second memory 93. That is, the central processing unit 91 can operate using a non-volatile storage medium or a volatile storage medium.
  • the computer 90 may be equipped with a disk drive (not shown) as needed.
  • the disk drive is connected to the bus 97.
  • the disk drive mediates, between the central processing unit 91 and a recording medium (program recording medium, not shown), the reading of data and programs from the recording medium, the writing of processing results of the computer 90 to the recording medium, and the like.
  • the recording medium can be realized by an optical recording medium such as a CD (Compact Disc) or a DVD (Digital Versatile Disc).
  • the recording medium may be realized by a semiconductor recording medium such as a Universal Serial Bus (USB) memory or a Secure Digital (SD) card, a magnetic recording medium such as a flexible disk, or another recording medium.
  • the input / output connection circuit 95 is a circuit that mediates the exchange of data between the central processing unit 91 and input / output devices such as the input device 98 and the display device 99. That is, the input / output connection circuit 95 is an interface for connecting the computer 90 and peripheral devices based on the standards and specifications.
  • the input / output connection circuit 95 can be realized by an IO (Input Output) interface card, a USB (Universal Serial Bus) card, or the like.
  • the input device 98 is a device that receives an input instruction input by the operator of the computer 90.
  • the input device 98 is realized by a keyboard, a mouse, a touch panel or the like.
  • the display device 99 is a device that provides the operator of the computer 90 with display information.
  • the display device 99 is realized by a liquid crystal display, a projector, or the like.
  • the network interface circuit 96 is a circuit that relays data exchange between an external device (not shown) and the computer via a network. That is, the network interface circuit 96 is an interface for connecting to an external system or apparatus through a network such as the Internet or an intranet.
  • the network interface circuit 96 is realized by a LAN (Local Area Network) card.
  • in this way, the functions of the analysis device of each embodiment can be realized.
  • the analysis device of each embodiment may be configured by hardware in which a plurality of constituent elements are combined.
  • the components of the analyzer of each embodiment may be configured by at least one hardware circuit.
  • the components of the analyzer of each embodiment may be configured by combining a plurality of hardware circuits.
  • the components of the analysis device of each embodiment may be configured by a plurality of devices connected via a network.
  • the above is an example of the hardware configuration for enabling the analyzer according to each embodiment of the present invention.
  • the hardware configuration in FIG. 15 is an example of the hardware configuration for realizing the analyzer according to each embodiment, and does not limit the scope of the present invention.
  • a program that causes a computer to execute the process related to the analyzer according to each embodiment is also included in the scope of the present invention.
  • a program recording medium recording the program according to each embodiment is also included in the scope of the present invention.
  • the components of the analyzer of each embodiment can be arbitrarily combined.
  • the components of the analysis device of each embodiment may be realized by software or circuits.
  • Topic analysis means for calculating the matrix product of an index matrix, which is generated by performing topic analysis on an analysis target matrix, and a dictionary matrix; matrix product storage means in which the matrix product is stored; residual matrix deriving means for obtaining at least one of the matrix products stored in the matrix product storage means and the analysis target matrix, and for deriving a residual matrix equivalent to the difference between the analysis target matrix and the matrix product;
  • residual matrix refinement means for obtaining the residual matrix and removing noise contained in the residual matrix to generate a refinement residual matrix;
  • residual matrix boosting means for obtaining the analysis target matrix and the refinement residual matrix and deriving, based on them, an enhanced residual matrix in which elements including topics not yet acquired are emphasized; and enhanced residual matrix storage means in which the enhanced residual matrix derived by the residual matrix boosting means is accumulated.
  • The analyzer according to Supplementary Note 1, wherein the topic analysis means generates the dictionary matrix and the index matrix by performing nonnegative matrix factorization on the analysis target matrix.
  • The analyzer according to Supplementary Note 1 or 2, wherein the residual matrix deriving means calculates a matrix difference corresponding to the difference between the analysis target matrix and the matrix product, and derives the residual matrix by replacing the negative elements of the matrix difference with 0.
  • The analyzer according to any one of Supplementary Notes 1 to 3, wherein the residual matrix refining unit generates the refinement residual matrix by replacing, among the elements of the residual matrix, those below the first threshold with 0.
  • The analyzer according to any one of Supplementary Notes 1 to 3, wherein the residual matrix refining unit generates the refinement residual matrix by subtracting a predetermined value from each element of the residual matrix and replacing the elements that become negative with 0.
  • The analyzer according to any one of Supplementary Notes …, wherein the residual matrix refining unit obtains the matrix product from the matrix product storage means, derives, as a first position, the position of each element of the matrix product that is greater than or equal to the second threshold, and removes noise from the elements at the first position in the residual matrix.
  • The analyzer according to Supplementary Note 7, wherein the residual matrix refining unit obtains the matrix product from the matrix product storage means, sets at least one of a specific row and a specific column selected from the matrix product as a group, and sets the first position for each set group.
  • The analyzer according to Supplementary Note 8, wherein the residual matrix refining unit generates the refinement residual matrix by performing group-wise sparse estimation on the residual matrix.
  • The residual matrix boost unit calculates statistics of each row and each column of the refinement residual matrix; selects one reference row and one reference column from the rows and columns of the residual matrix based on the calculated statistics; selects, from the residual matrix, a selected row and a selected column corresponding to the reference row and the reference column, respectively; generates a selected row vector whose elements are the values of the selected row and a selected column vector whose elements are the values of the selected column; calculates the similarity between each of the selected row vector and the selected column vector and the elements of the analysis target matrix; and generates a first diagonal matrix whose diagonal elements are the similarities calculated for the rows and a second diagonal matrix whose diagonal elements are the similarities calculated for the columns;
  • The analyzer according to Supplementary Note 10, wherein the residual matrix boost unit calculates the L1 norm of each row and each column of the refinement residual matrix as the statistic.
  • The analyzer according to Supplementary Note 10, wherein the residual matrix boost unit calculates the L2 norm of each row and each column of the refinement residual matrix as the statistic.
  • (Supplementary Note 14) The analysis device according to any one of Supplementary Notes 1 to 13, further comprising refinement residual matrix storage means for storing the refinement residual matrix.
  • The analyzer according to any one of Supplementary Notes 1 to 14, wherein the topic analysis means repeats topic analysis on the analysis target matrix, including the enhanced residual matrix stored in the enhanced residual matrix storage means, until a predetermined condition is satisfied.
  • By performing topic analysis on the analysis target matrix, a dictionary matrix storing the topics included in the analysis target matrix and an index matrix indicating the degree to which the analysis target matrix includes each topic are generated.
  • Reference Signs List: 1, 2 analysis device; 11, 21 topic analysis unit; 12, 22 matrix product storage unit; 13, 23 residual matrix derivation unit; 14 residual matrix storage unit; 15, 25 residual matrix refinement unit; 16 refinement residual matrix storage unit; 17, 27 residual matrix boost unit; 18, 28 enhanced residual matrix storage unit; 51 input unit; 52 first cell derivation unit; 53 refinement residual matrix generation unit; 54 output unit; 71 input unit; 72 statistic calculation unit; 73 selection unit; 74 diagonal matrix generation unit; 75 enhanced residual matrix calculation unit; 76 output unit

Abstract

In order to enable a pattern relating to an infrequent event to be detected, this analysis device is provided with: a topic analysis unit which calculates a matrix product of an index matrix, which is generated by performing topic analysis on a matrix to be analyzed, and a dictionary matrix; a residual matrix derivation unit which acquires at least one accumulated matrix product and the matrix to be analyzed, and derives a residual matrix corresponding to a difference between the matrix to be analyzed and the at least one matrix product; a residual matrix refining unit which removes noise from the residual matrix to generate a refined residual matrix; a residual matrix boosting unit which derives an emphasized residual matrix in which each element including an uncaptured topic is emphasized, on the basis of the matrix to be analyzed and the refined residual matrix; and an emphasized residual matrix storage unit which accumulates emphasized residual matrices.

Description

Analysis device, analysis method, and program recording medium
 The present invention relates to an analysis device, an analysis method, and a program for analyzing topics included in data. In particular, the present invention relates to an analysis device, an analysis method, and a program for analyzing topics included in matrix data composed of a set of event vectors.
 A network intrusion detection system (IDS), a factory temperature control device, and the like are provided with sensor devices that observe states or values related to an observation target. These sensor devices continuously generate data in which a state or value related to the observation target (hereinafter referred to as an observation value) is associated with information including the time at which the observation value was observed (hereinafter referred to as a timestamp). By distributing such continuously generated data, in which observation values are linked to timestamps, in a stream format, a network, a factory, or the like can be monitored constantly. A sequence of data containing observation values and timestamps distributed in a stream format in this manner is called a data stream. Examples of data streams include microblog posts (tweets) on services such as Twitter (registered trademark), proxy server logs, and IDS alert logs. Acquiring the data stream of an observation target and analyzing the observed data makes it possible to determine whether the observation target is normal, to grasp its state (for example, by finding suspicious behavior), and to categorize and classify events.
 The data contained in a data stream is a mixture of various events. For example, in terms of frequency of occurrence, events can be classified into events that occur frequently (hereinafter referred to as major events), events that occur only rarely (hereinafter referred to as rare events), and events that occur with moderate frequency. A main pattern in the data corresponding to each event is called a topic. Here, a pattern means a combination of values that appear together. In particular, a pattern of combinations of principal values is called a topic. In the following, patterns and topics are treated as equivalent.
 Finding patterns that represent events is important for understanding the characteristics of security devices that issue alerts and for understanding anomalies that do not normally occur. To analyze events, data in formats such as data streams, sequences, and documents is converted, using the events, words, keywords, and the like contained in that data as units, into a vector format that includes, for example, their frequencies. Hereinafter, such vector-format data is referred to as an event vector.
 In general, methods for finding the main patterns and topics in data include principal component analysis of event vectors and topic analysis by matrix decomposition such as singular value decomposition. In particular, topic analysis can also be applied to sequence data.
 However, because general topic analysis focuses on finding the major topics in a data set, it is difficult to capture rare events as topics. This is because general topic analysis aims to find a set of topics that sufficiently compresses the data set, so major topics tend to be captured preferentially.
 図16は、イベントに対応するトピックと、そのトピックの頻度との間の関係を示すグラフである。一般に、トピックを頻度順に整列させると、トピックの頻度にはべき乗則の関係が成り立つ。通常、図16中に破線枠で囲んで示すレアイベントの頻度は、メジャーイベントに比べると極めて小さい。また、データストリームには、イベントとして同定することのできないような単なる雑音が多数混じっていることがある。このように、一般的なトピック分析では、レアイベントは、メジャーイベントから見れば相対的に微小な値であるため、誤差のように捉えられてしまう。そのため、単純なトピック分析では、レアイベントは、誤差に紛れてトピックとして捉えられない可能性が高い。そのため、データストリームに含まれるレアイベントを雑音から明確に区別することが求められる。 FIG. 16 is a graph showing the relationship between the topic corresponding to an event and the frequency of the topic. In general, when topics are arranged in order of frequency, the frequency of topics has a power-law relationship. Usually, the frequency of the rare event shown by a dashed line frame in FIG. 16 is extremely small compared to the major event. Also, the data stream may contain many simple noises that can not be identified as events. As described above, in general topic analysis, rare events are regarded as errors because they are relatively small values from the viewpoint of major events. Therefore, in simple topic analysis, rare events are likely to be misinterpreted as topics. Therefore, it is required to clearly distinguish rare events contained in a data stream from noise.
 非特許文献1には、L-EnsNMF(Local Ensemble of Nonnegative Matrix Factorization)を用いたトピック分析法について開示されている。非特許文献1のトピック分析法では、所定の個数のトピックを行列分解で獲得し、行列分解で獲得したトピックに該当しない部分である残差行列を生成する。そして、非特許文献1のトピック分析法では、生成した残差行列に対して、未だトピックとして獲得できていない部分(イベント等)を強調(ブースティング)し、所定の個数のトピックを再びブースティングした残差行列に対して行列分解する。非特許文献1のトピック分析法では、上述の演算を設定された個数分のトピックが得られるまで再帰的に繰り返す。 Non-Patent Document 1 discloses a topic analysis method using L-Ens NMF (Local Ensemble of Non-Continuous Matrix Factorization). In the topic analysis method of Non-Patent Document 1, a predetermined number of topics are acquired by matrix decomposition, and a residual matrix which is a portion not corresponding to the topics acquired by matrix decomposition is generated. Then, in the topic analysis method of Non-Patent Document 1, a portion (event or the like) which can not be acquired as a topic is emphasized (boosted) with respect to the generated residual matrix, and a predetermined number of topics are again boosted. Matrix decomposition for the residual matrix In the topic analysis method of Non-Patent Document 1, the above operation is recursively repeated until a set number of topics are obtained.
Non-Patent Document 2 discloses Group Lasso (Least Absolute Shrinkage and Selection Operator) regularization, a type of sparse regularization. For a group of variables, Group Lasso regularization shrinks all variables belonging to the group to 0 simultaneously. In other words, Group Lasso is a regularization that drives the solution toward sparsity at the group level.
Non-Patent Document 3 discloses JSPCA (Joint Sparse PCA) and GJSPCA, which are improved versions of sparse PCA (Principal Component Analysis). The method of Non-Patent Document 3 uses Group Lasso regularization to form variable groups from the top components of principal component analysis. In principal component analysis, the higher-ranked a component is, the more of the dominant structure of the data it captures. The variable groups formed by this method are therefore estimated as dense patterns rich in features that many data points have in common.
Non-Patent Documents 4 and 5 disclose matrix factorization using Group Lasso regularization. Their methods enable robust matrix factorization in which the patterns obtained by principal component analysis are less susceptible to noise and outliers.
The method of Non-Patent Document 1 can acquire not only major topics but also topics of medium frequency. However, it only emphasizes the parts of the target matrix that have not yet been acquired as topics, and has no mechanism for distinguishing residuals from noise. Consequently, low-frequency residuals may be mistaken for noise, or noise may contaminate them. That is, the method of Non-Patent Document 1 has difficulty acquiring topics from low-frequency residuals.
The method of Non-Patent Document 3 can estimate variable groups as patterns that appear consistently across much of the data. However, it merely gathers the characteristic patterns of the dataset as a whole into the top components, so it has difficulty acquiring patterns related to rare events.
An object of the present invention is to solve the above problems and to provide an analysis device that makes it possible to discover patterns related to infrequent events.
An analysis device according to one aspect of the present invention includes: a topic analysis unit that performs topic analysis on an analysis target matrix to generate a dictionary matrix storing the topics contained in the analysis target matrix and an index matrix indicating the degree to which the analysis target matrix contains each topic, and that calculates the matrix product of the index matrix and the dictionary matrix; a matrix product storage unit in which the matrix products are accumulated; a residual matrix derivation unit that acquires the analysis target matrix and at least one matrix product accumulated in the matrix product storage unit, and derives a residual matrix corresponding to the difference between the analysis target matrix and the matrix product; a residual matrix refinement unit that acquires the residual matrix and generates a refinement residual matrix by removing the noise contained in the residual matrix; a residual matrix boost unit that acquires the analysis target matrix and the refinement residual matrix and, based on them, derives an enhanced residual matrix in which elements containing topics not yet acquired are emphasized; and an enhanced residual matrix storage unit in which the enhanced residual matrices derived by the residual matrix boost unit are accumulated.
In an analysis method according to one aspect of the present invention, topic analysis is performed on an analysis target matrix to generate a dictionary matrix storing the topics contained in the analysis target matrix and an index matrix indicating the degree to which the analysis target matrix contains each topic; the matrix product of the index matrix and the dictionary matrix is calculated and accumulated; a residual matrix corresponding to the difference between the analysis target matrix and at least one accumulated matrix product is derived; a refinement residual matrix is generated by removing the noise contained in the residual matrix; an enhanced residual matrix in which elements containing topics not yet acquired are emphasized is derived based on the analysis target matrix and the refinement residual matrix; and the enhanced residual matrix is accumulated as one of the analysis target matrices.
A program according to one aspect of the present invention causes a computer to execute: a process of performing topic analysis on an analysis target matrix to generate a dictionary matrix storing the topics contained in the analysis target matrix and an index matrix indicating the degree to which the analysis target matrix contains each topic; a process of calculating the matrix product of the index matrix and the dictionary matrix; a process of accumulating the matrix product; a process of deriving a residual matrix corresponding to the difference between the analysis target matrix and at least one accumulated matrix product; a process of generating a refinement residual matrix by removing the noise contained in the residual matrix; a process of deriving, based on the analysis target matrix and the refinement residual matrix, an enhanced residual matrix in which elements containing topics not yet acquired are emphasized; and a process of accumulating the enhanced residual matrix as one of the analysis target matrices.
According to the present invention, it is possible to provide an analysis device that makes it possible to discover patterns related to infrequent events.
FIG. 1 is a block diagram showing an example of the configuration of the analysis device according to the first embodiment of the present invention.
FIG. 2 is a conceptual diagram showing an example of the input matrix given to the analysis device according to the first embodiment.
FIG. 3 is a conceptual diagram showing an example of the matrix product generated by the topic analysis unit of the analysis device according to the first embodiment.
FIG. 4 is a conceptual diagram showing an example of the residual matrix derived by the residual matrix derivation unit of the analysis device according to the first embodiment.
FIG. 5 is a conceptual diagram showing an example of a matrix in which the residual matrix refinement unit of the analysis device according to the first embodiment has set the elements at the first position to 1.
FIG. 6 is a conceptual diagram showing an example of the refinement residual matrix generated by the residual matrix refinement unit of the analysis device according to the first embodiment.
FIG. 7 is a conceptual diagram showing an example of the L1 norm values calculated by the residual matrix boost unit of the analysis device according to the first embodiment.
FIG. 8 is a conceptual diagram showing an example in which the residual matrix boost unit of the analysis device according to the first embodiment selects a row and a column.
FIG. 9 is a conceptual diagram showing an example of the enhanced residual matrix generated by the residual matrix boost unit of the analysis device according to the first embodiment.
FIG. 10 is a flowchart for explaining the operation of the analysis device according to the first embodiment in the first iteration.
FIG. 11 is a flowchart for explaining the operation of the analysis device according to the first embodiment in the second and subsequent iterations.
FIG. 12 is a block diagram showing an example of the configuration of the analysis device according to the second embodiment of the present invention.
FIG. 13 is a block diagram showing an example of the configuration of the residual matrix refinement unit of the analysis device according to the second embodiment.
FIG. 14 is a block diagram showing an example of the configuration of the residual matrix boost unit of the analysis device according to the second embodiment.
FIG. 15 is a block diagram showing an example of a hardware configuration that implements the analysis device according to each embodiment of the present invention.
FIG. 16 is a graph of the frequencies of topics sorted in order of frequency.
Hereinafter, embodiments for carrying out the present invention will be described with reference to the drawings. Although the embodiments described below include limitations that are technically preferable for carrying out the present invention, they do not limit the scope of the invention to what follows. In all the drawings used to describe the embodiments, the same parts are given the same reference numerals unless there is a particular reason otherwise. In the following embodiments, repeated description of similar configurations and operations may be omitted. The directions of the arrows in the drawings are examples and do not limit the directions of signals between blocks.
First Embodiment
First, the configuration of the analysis device according to the first embodiment of the present invention will be described with reference to the drawings. The following describes an example in which the analysis device of this embodiment performs topic analysis using nonnegative matrix factorization (NMF), but the topic analysis performed by the analysis device of this embodiment is not limited to NMF.
FIG. 1 is a block diagram showing an example of the configuration of the analysis device 1 of this embodiment. As shown in FIG. 1, the analysis device 1 includes a topic analysis unit 11, a matrix product storage unit 12, a residual matrix derivation unit 13, a residual matrix storage unit 14, a residual matrix refinement unit 15, a refinement residual matrix storage unit 16, a residual matrix boost unit 17, and an enhanced residual matrix storage unit 18.
FIG. 1 shows an example in which an input matrix A stored in a storage device 100 is analyzed. The input matrix A may be acquired over a network from a storage device 100 belonging to an external system, or from a storage device 100 provided alongside the analysis device 1.
The topic analysis unit 11 performs topic analysis using NMF on the matrix to be analyzed (hereinafter, the analysis target matrix). In this embodiment, the analysis target matrix is either the input matrix A or an enhanced residual matrix R_L. The enhanced residual matrix R_L is a matrix in which the parts not yet captured are emphasized in each iteration, and the enhanced residual matrix R_L generated in each iteration is stored in the enhanced residual matrix storage unit 18. The topic analysis unit 11 repeats topic analysis on the input matrix A until a predetermined condition is satisfied. For example, it repeats the topic analysis a predetermined number of times m (m is a natural number), or until the number of acquired topics reaches a predetermined number.
FIG. 2 is a conceptual diagram showing an example of the input matrix A. In FIG. 2, the cell for each element is shaded according to the magnitude of that element, and cells with large, medium, and small values are labeled L, M, and S, respectively. Blank cells have the value 0. In similar figures hereafter, cells with very small values may be labeled VS, and cells with the same label may be drawn with different shading or patterns. In the following, the input matrix A is a matrix with I rows and J columns (I and J are natural numbers), and the element in the i-th row and j-th column of A is written A[i,j] (i and j are natural numbers). The same notation is used for the elements of other matrices.
First, when the input matrix A is input, the topic analysis unit 11 starts the first iteration on it. When starting the second and subsequent iterations, the topic analysis unit 11 refers to the enhanced residual matrix storage unit 18 and repeats topic analysis on the enhanced residual matrix R_L generated in the preceding iterations.
The topic analysis unit 11 repeats topic analysis until a predetermined number of topics k has been acquired (k is a natural number). In other words, in each round the topic analysis unit 11 uses NMF to acquire ks topics, where ks is a natural number smaller than k (ks < k). In practice, ks can be set to 1, 2, k/2, or the like. Alternatively, the topic analysis unit 11 may repeat topic analysis until the enhanced residual matrix R_L stored in the enhanced residual matrix storage unit 18 becomes empty, that is, until all of its cells have the value 0.
The topic analysis unit 11 performs topic analysis on the analysis target matrix and generates a dictionary matrix that stores the topics and an index matrix (membership matrix) that indicates which topics the matrix contains and to what extent. Hereinafter, the dictionary matrix and the index matrix generated in the m-th iteration are denoted H_m and W_m, respectively (m is an integer of 1 or more).
The topic analysis unit 11 calculates the matrix product W_mH_m of the generated index matrix W_m and dictionary matrix H_m, and stores the calculated matrix product W_mH_m in the matrix product storage unit 12.
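The text does not specify which NMF algorithm produces W_m and H_m. As a minimal sketch, assuming NumPy and the Lee-Seung multiplicative updates for the squared (Frobenius) error, one factorization step and the product W_mH_m might look like this; the function `nmf` and its parameters are hypothetical.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix X (I x J) as W @ H, with W (I x k) and H (k x J).

    W plays the role of the index (membership) matrix W_m and H of the
    dictionary matrix H_m; each row of H is one acquired topic.
    Assumption: Lee-Seung multiplicative updates for the Frobenius loss.
    """
    rng = np.random.default_rng(seed)
    I, J = X.shape
    W = rng.random((I, k)) + eps
    H = rng.random((k, J)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

A = np.array([[5.0, 4.0, 0.0, 0.0],
              [4.0, 5.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 1.0]])
W1, H1 = nmf(A, k=2)   # ks = 2 topics in this round
WH1 = W1 @ H1          # matrix product W_m H_m to be accumulated
```

Any other NMF solver could be substituted; only the nonnegativity of W_m and H_m matters for the later steps.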
FIG. 3 shows an example of the matrix product W_mH_m of the index matrix W_m and the dictionary matrix H_m generated by the topic analysis unit 11 from the input matrix A of FIG. 2. In the matrix product W_mH_m of FIG. 3, the cells that had small values (S) in the input matrix A (FIG. 2) are 0.
The residual matrix derivation unit 13 calculates, as a matrix difference Ro, the difference between the analysis target matrix (the input matrix A or the enhanced residual matrix R_L) and the matrix product W_mH_m generated by the topic analysis unit 11. Because negative values adversely affect NMF, the residual matrix derivation unit 13 then replaces all negative values contained in Ro with 0. The matrix obtained by replacing the negative elements of Ro with 0 is called the residual matrix R.
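The residual derivation reduces to a subtraction followed by clipping at zero. A minimal NumPy sketch, with the hypothetical function name `residual_matrix`:

```python
import numpy as np

def residual_matrix(X, WH):
    """Residual matrix R: the matrix difference Ro = X - WH with negatives set to 0.

    X is the current analysis target matrix (the input matrix A in the first
    iteration, an enhanced residual matrix R_L afterwards); WH is W_m @ H_m.
    """
    Ro = X - WH                   # matrix difference Ro
    return np.maximum(Ro, 0.0)    # drop negative values, which NMF cannot use

A = np.array([[3.0, 1.0], [0.0, 2.0]])
WH = np.array([[2.5, 1.5], [0.0, 1.0]])
R = residual_matrix(A, WH)        # the -0.5 entry is clipped to 0
```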
FIG. 4 is a conceptual diagram showing an example of the residual matrix R. The residual matrix derivation unit 13 calculates the matrix difference Ro between the input matrix A (FIG. 2) and the matrix product W_mH_m (FIG. 3), and generates the residual matrix R (FIG. 4) by replacing the negative elements of Ro with 0. The residual matrix derivation unit 13 stores the generated residual matrix R in the residual matrix storage unit 14.
The residual matrix refinement unit 15 acquires the residual matrix R from the residual matrix storage unit 14, generates a refinement residual matrix R* by removing the noise contained in R, and stores the generated refinement residual matrix R* in the refinement residual matrix storage unit 16.
The residual matrix refinement unit 15 generates the refinement residual matrix R* by replacing every element of the residual matrix R that is at or below a threshold θ1 (also called the first threshold) with 0. Alternatively, it may generate R* by subtracting a predetermined value from each element of R and replacing the elements that become negative with 0. It may also generate R* using a sparse estimation method such as Lasso (Least Absolute Shrinkage and Selection Operator).
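The first two refinement options can be sketched element-wise in NumPy. This assumes that "subtract a predetermined value and zero the negatives" means the surviving elements keep the subtracted value; for nonnegative entries that is soft thresholding, the proximal step used by Lasso. The function names are hypothetical.

```python
import numpy as np

def refine_hard(R, theta1):
    """Set every element of R at or below the first threshold theta1 to 0."""
    return np.where(R <= theta1, 0.0, R)

def refine_shrink(R, delta):
    """Subtract delta from every element and set the negative results to 0.

    Assumption: this reading equals soft thresholding for a nonnegative R.
    """
    return np.maximum(R - delta, 0.0)

R = np.array([[0.1, 0.5],
              [0.3, 0.0]])
R_hard = refine_hard(R, theta1=0.3)     # 0.3 is "at or below", so it becomes 0
R_soft = refine_shrink(R, delta=0.2)    # survivors are reduced by 0.2
```

Hard thresholding keeps surviving values intact, while shrinkage also lowers them; which is preferable depends on whether the later boosting step should see the original residual magnitudes.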
The residual matrix refinement unit 15 may also narrow down the elements from which noise is to be removed before generating the refinement residual matrix. In that case, it refers to the matrix product storage unit 12, acquires the stored matrix product W_mH_m, and derives the positions of the elements of W_mH_m that are at or above a threshold θ2 (also called the second threshold); these positions are called the first position L_WH. The residual matrix refinement unit 15 then generates the refinement residual matrix R* by removing noise from the elements of the residual matrix R at the first position L_WH.
For example, the residual matrix refinement unit 15 may treat specific rows or columns of the matrix product as groups, subtract a predetermined value from all elements belonging to each group, and replace the elements that become negative with 0. In other words, it may set, for each group, how strongly the elements subject to noise removal are thinned out. When treating specific rows or columns of the matrix product as groups, it may set this thinning amount using a group-wise sparse estimation method such as Group Lasso.
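A minimal sketch of group-wise refinement, assuming each row is one group (the text allows rows or columns of the matrix product). The first function applies a per-group subtraction amount; the second swaps in the standard Group Lasso proximal operator (group soft thresholding) as one possible group-wise sparse estimator. Both function names are hypothetical.

```python
import numpy as np

def refine_rows_by_group(R, deltas):
    """Treat each row of R as a group and subtract that group's own amount.

    deltas[i] is the thinning amount for row i; negative results become 0.
    """
    deltas = np.asarray(deltas, dtype=float).reshape(-1, 1)  # one amount per row
    return np.maximum(R - deltas, 0.0)

def group_lasso_shrink_rows(R, lam):
    """Group soft thresholding, the proximal operator of lam * sum_i ||R[i, :]||_2.

    Each row is scaled by max(0, 1 - lam / ||row||_2), so weak rows vanish whole.
    """
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return R * scale

R = np.array([[3.0, 4.0],
              [0.3, 0.4]])
by_group = refine_rows_by_group(R, [1.0, 0.35])
shrunk = group_lasso_shrink_rows(R, lam=1.0)  # row norms are 5.0 and 0.5
```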
FIG. 5 shows an example in which the residual matrix refinement unit 15 sets 1 in the cells of the derived first position L_WH. Comparing only the elements of the residual matrix R at the first position L_WH with the first threshold requires less computation than comparing every element of R with it. The residual matrix refinement unit 15 may also represent the cells of the first position L_WH as a list of cells or a hash instead of a matrix like that of FIG. 5.
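The masked refinement can be sketched with the first position L_WH held as a list of cells, as the text permits. A minimal NumPy sketch; the function names and the example values are hypothetical.

```python
import numpy as np

def first_positions(WH, theta2):
    """First position L_WH: (row, col) cells where W_m H_m >= theta2, as a list."""
    rows, cols = np.nonzero(WH >= theta2)
    return list(zip(rows.tolist(), cols.tolist()))

def refine_at_positions(R, positions, theta1):
    """Zero out residual elements <= theta1, checking only the listed cells."""
    R_star = R.copy()
    for i, j in positions:
        if R_star[i, j] <= theta1:
            R_star[i, j] = 0.0
    return R_star

WH = np.array([[4.0, 0.0],
               [0.0, 3.0]])
R = np.array([[0.2, 0.2],
              [0.0, 0.9]])
L_WH = first_positions(WH, theta2=1.0)
R_star = refine_at_positions(R, L_WH, theta1=0.5)
```

The small residual at cell (0, 1) survives because no topic covers that cell, so it remains a candidate rare event; the equally small value at (0, 0) is treated as leftover factorization error.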
FIG. 6 shows an example of the refinement residual matrix R* generated from the residual matrix R of FIG. 4. In FIG. 6, the cells at the first position L_WH whose elements are at or below the second threshold are filled in white; the value of a white cell is 0. In R*, the tiny values left in cells whose topics were acquired in earlier iterations have been removed. Using R* therefore excludes those leftover values from the topic analysis of subsequent iterations, so that errors produced by the topic analysis can be distinguished from rare events. In other words, the refinement residual matrix R* raises the chance that rare events are acquired as topics.
The residual matrix boost unit 17 acquires the refinement residual matrix R* from the refinement residual matrix storage unit 16 and, by the following procedure, generates an enhanced residual matrix R_L in which the values of specific rows and columns of R* are emphasized.
First, the residual matrix boost unit 17 calculates the L1 norm of each row of the refinement residual matrix R*, i.e., the sum of the absolute values of the elements in that row. It then generates a row reference vector Pr whose i-th element Pr[i] is the L1 norm of the i-th row of R* (Equation 1), where I is the number of rows of R*.
Pr = (Pr[1], ..., Pr[I])   ... (1)
The residual matrix boost unit 17 treats Pr as a probability distribution in which row i is chosen with weight Pr[i], and selects one row. The row selected by the residual matrix boost unit 17 is called the reference row i*. The residual matrix boost unit 17 may select the reference row i* at random in this way, or may select the row with the largest L1 norm as the reference row i*.
Similarly, the residual matrix boost unit 17 calculates the L1 norm of each column of the refinement residual matrix R*. It generates a column reference vector Pc whose j-th element Pc[j] is the L1 norm of the j-th column of R* (Equation 2), where J is the number of columns of R*.
Pc = (Pc[1], ..., Pc[J])   ... (2)
The residual matrix boost unit 17 treats Pc as a probability distribution in which column j is chosen with weight Pc[j], and selects one column. The column selected by the residual matrix boost unit 17 is called the reference column j*. The residual matrix boost unit 17 may select the reference column j* at random in this way, or may select the column with the largest L1 norm as the reference column j*.
The above describes selecting the reference row and reference column based on the L1 norm, but the residual matrix boost unit 17 may select them based on any statistic that can be calculated for each row or column. For example, the residual matrix boost unit 17 may use the L2 norm to select the reference row and reference column.
When the L2 norm is used, rows and columns of the refinement residual matrix R* that contain larger values are more likely to be selected as the reference row i* and the reference column j*.
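The reference-row and reference-column selection can be sketched with NumPy: the per-row or per-column norms form Pr or Pc, and the index is either sampled in proportion to them or taken as the argmax. The function name `reference_index` and its parameters are hypothetical; the sampling step assumes R* is not empty (all zeros), which the stopping condition above is meant to prevent.

```python
import numpy as np

def reference_index(R_star, axis, norm_ord=1, rng=None):
    """Pick a reference row (axis=1) or reference column (axis=0) of R*.

    axis=1 scores each row (row reference vector Pr); axis=0 scores each
    column (column reference vector Pc). norm_ord=1 gives the L1 norm,
    norm_ord=2 the L2 variant. With an rng, the index is sampled with
    probability proportional to its score; without one, the argmax is used.
    """
    scores = np.linalg.norm(R_star, ord=norm_ord, axis=axis)
    if rng is None:
        return int(np.argmax(scores)), scores
    return int(rng.choice(len(scores), p=scores / scores.sum())), scores

R_star = np.array([[0.0, 3.0, 1.0],
                   [0.0, 1.0, 0.0],
                   [2.0, 0.0, 0.0]])
i_star, Pr = reference_index(R_star, axis=1)   # Pr = [4, 1, 2]
j_star, Pc = reference_index(R_star, axis=0)   # Pc = [2, 4, 1]
rng = np.random.default_rng(0)
i_sampled, _ = reference_index(R_star, axis=1, rng=rng)
```

Rows or columns whose norm is 0 get probability 0, so fully refined-away parts of R* are never chosen.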
FIG. 7 is a conceptual diagram showing an example of the L1 norms calculated along the rows and columns of the refinement residual matrix R* of FIG. 6. The number to the right of each row in FIG. 7 is the L1 norm of that row, and the number above each column in FIG. 7 is the L1 norm of that column. That is, the numbers to the right of the rows in FIG. 7 are the elements of the row reference vector Pr, and the numbers above the columns are the elements of the column reference vector Pc.
FIG. 8 is a conceptual diagram showing an example in which the row and column corresponding to the reference row i* and reference column j* (hereinafter, the selected row and selected column) are selected from the refinement residual matrix R* based on the L1 norms of FIG. 7. In the example of FIG. 8, the first row of R* is selected as the selected row i*, and the second column of R* is selected as the selected column j*.
Next, the residual matrix boost unit 17 generates a selected row vector A[i*,:] consisting of the elements of the selected row i* of the input matrix A, and calculates the similarity sim(i*,i) between A[i*,:] and every row i of A. Similarly, it generates a selected column vector R[:,j*] consisting of the elements of the selected column j* of the residual matrix R, and calculates the similarity sim(j*,j) between R[:,j*] and every column j of A. For example, the residual matrix boost unit 17 uses cosine similarity as sim(·,·), although it may calculate the similarity by a method other than cosine similarity. In the second and subsequent iterations, sim(·,·) may be calculated with respect to the enhanced residual matrix being analyzed.
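The similarity step can be sketched as cosine similarity between one vector and every row or column of a matrix. A minimal NumPy sketch; the function name `cosine_to_all` is hypothetical, and rows or columns with zero norm are assumed to get similarity 0.

```python
import numpy as np

def cosine_to_all(v, M, axis):
    """Cosine similarity between v and every row (axis=1) or column (axis=0) of M.

    Rows/columns with zero norm (or a zero v) get similarity 0.
    """
    if axis == 1:
        dots = M @ v                        # one dot product per row
        norms = np.linalg.norm(M, axis=1)
    else:
        dots = v @ M                        # one dot product per column
        norms = np.linalg.norm(M, axis=0)
    denom = norms * np.linalg.norm(v)
    return np.divide(dots, denom, out=np.zeros_like(dots, dtype=float),
                     where=denom > 0)

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
sim_rows = cosine_to_all(X[0, :], X, axis=1)   # sim(i*, i) with i* = 0
sim_cols = cosine_to_all(X[:, 0], X, axis=0)   # sim(j*, j) with j* = 0
```

Following the text, the row query would be A[i*,:] compared against the rows of A, and the column query R[:,j*] compared against the columns of A.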
The residual matrix boost unit 17 generates an I×I diagonal matrix D_c whose diagonal elements D_c[i,i] are set to sim(i*, i) and whose off-diagonal elements are 0. Similarly, it generates a J×J diagonal matrix D_r whose diagonal elements D_r[j,j] are set to sim(j*, j) and whose off-diagonal elements are 0.
The residual matrix boost unit 17 then calculates the matrix product D_c R D_r of the diagonal matrix D_c, the residual matrix R, and the diagonal matrix D_r. This product is the enhanced residual matrix R_L, which the residual matrix boost unit 17 stores in the enhanced residual matrix storage unit 18.
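The product D_c R D_r can be written directly as below. This is an illustrative sketch with hypothetical helper names; the similarity values are made up rather than taken from the figures.

```python
def diag(values):
    """Square diagonal matrix with the given values on the diagonal and 0.0 elsewhere."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def enhance(R, row_sims, col_sims):
    """Enhanced residual matrix R_L = D_c R D_r."""
    D_c = diag(row_sims)  # I x I, D_c[i][i] = sim(i*, i)
    D_r = diag(col_sims)  # J x J, D_r[j][j] = sim(j*, j)
    return matmul(matmul(D_c, R), D_r)


R = [[2.0, 4.0], [6.0, 8.0]]
R_L = enhance(R, row_sims=[1.0, 0.5], col_sims=[0.25, 1.0])
print(R_L)  # [[0.5, 4.0], [0.75, 4.0]]
```

Because D_c and D_r are diagonal, the same result can be obtained by scaling row i by sim(i*, i) and column j by sim(j*, j), which takes O(IJ) operations instead of forming the full products; the explicit form above simply mirrors the notation of the text.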
The diagonal matrix D_c emphasizes the values of the selected row i* and of the rows of the input matrix A that are similar to it, and attenuates the values of the other rows. Likewise, D_r emphasizes the values of the selected column j* and of the columns of A that are similar to it, and attenuates the values of the other columns.
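Because D_c and D_r are diagonal, this effect can be read off element by element. The short derivation below uses only the definitions of D_c and D_r given above:

```latex
(R_L)_{ij} = (D_c R D_r)_{ij}
           = \sum_{k=1}^{I} \sum_{l=1}^{J} (D_c)_{ik} \, R_{kl} \, (D_r)_{lj}
           = \mathrm{sim}(i^*, i) \, R_{ij} \, \mathrm{sim}(j^*, j)
```

Each residual R_ij is thus weighted by the similarity of its row to row i* and of its column to column j*. If the matrices involved are nonnegative (the residual matrix R is nonnegative by construction) and cosine similarity is used, both factors lie in [0, 1], so no cell grows in absolute value: "emphasis" means that cells in rows and columns similar to i* and j* keep most of their value while the others are scaled down.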
FIG. 9 shows an example of the enhanced residual matrix R_L generated by the residual matrix boost unit 17. In this example, the values in the other columns, which have low similarity to column j*, are attenuated; as a result, the values in the first row are strongly attenuated (from M to S). The other rows, which have low similarity to row i* (the first row), are also attenuated, but the cells in the third and fourth rows of the third column are affected little, because their original values (S) are already small.
As described above, the refinement residual matrix R* generated by the residual matrix refinement unit 15 emphasizes values other than the topics already acquired by the topic analysis unit 11, that is, values belonging to topics that the topic analysis unit 11 has not yet acquired. In addition, the residual matrix refinement unit 15 sets to 0 the cells in which the residual left by the already-acquired topics is smaller than a predetermined threshold, which removes the error introduced by acquiring those topics. As a result, in subsequent iterations the topic analysis unit 11 is more likely to capture the parts of the data that have not yet been acquired as topics; that is, rare topics that could not be acquired in the first topic analysis are more likely to be acquired.
In the second and subsequent iterations, the topic analysis unit 11 takes the enhanced residual matrix R_L as input and repeats the above processing until a predetermined condition is satisfied, for example until a predetermined number k of topics has been obtained or the enhanced residual matrix R_L becomes empty.
This concludes the description of the configuration of the analyzer 1 of the present embodiment. The configuration of the analyzer 1 is not limited to the one described above. For example, the function of one component may be assigned to another component or shared with other components, functions distributed across separate components may be combined into a single component, and further functions may be added to any component.
(Operation)
Next, the operation of the analyzer 1 of the present embodiment (also referred to as the analysis method) will be described with reference to the drawings. FIG. 10 is a flowchart of the first iteration performed by the analyzer 1, and FIG. 11 is a flowchart of the second and subsequent iterations. In the description of these flowcharts, the analyzer 1 is treated as the entity performing each step.
In FIG. 10, the analyzer 1 first receives the input matrix A (step S11).
Next, the analyzer 1 performs topic analysis on the input matrix A (step S12), generating the dictionary matrix H_1 and the index matrix W_1 of A.
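The text does not name the topic-analysis algorithm at this point. A common choice for obtaining a nonnegative dictionary matrix H and index matrix W with A ≈ WH is nonnegative matrix factorization (NMF); the sketch below uses the Lee-Seung multiplicative updates purely as an assumed stand-in, with a hypothetical function name and parameters (k, iters, seed, eps).

```python
import random


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Factor a nonnegative I x J matrix A as W (I x k, index) times H (k x J, dictionary).

    Assumed stand-in for "topic analysis": Lee-Seung multiplicative updates for the
    squared Frobenius error; eps guards against division by zero.
    """
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in A]  # strictly positive start
    H = [[rng.random() + 0.1 for _ in A[0]] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(*rows)] for rows in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(*rows)] for rows in zip(W, num, den)]
    return W, H


A = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # hypothetical rank-1 input matrix
W1, H1 = nmf(A, k=1)
```

Multiplicative updates keep W and H nonnegative when they start positive, which matches the nonnegative "topic weight" reading of the index and dictionary matrices.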
Next, the analyzer 1 calculates the matrix product W_1 H_1 of the index matrix W_1 and the dictionary matrix H_1 (step S13).
Next, the analyzer 1 calculates the matrix difference R_o = A - W_1 H_1 and generates the residual matrix R by replacing the negative elements of R_o with 0 (step S14).
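Step S14 (and step S23 in later iterations) amounts to an elementwise clipped difference. A minimal sketch with made-up W_1 and H_1; the function name is hypothetical.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def residual_matrix(target, W, H):
    """Return (R, WH) where R = max(target - W H, 0), elementwise."""
    WH = matmul(W, H)
    R = [[max(t - p, 0.0) for t, p in zip(t_row, p_row)] for t_row, p_row in zip(target, WH)]
    return R, WH


A = [[3.0, 1.0], [0.0, 2.0]]
W1 = [[1.0], [1.0]]
H1 = [[2.0, 1.0]]
R, W1H1 = residual_matrix(A, W1, H1)
print(W1H1)  # [[2.0, 1.0], [2.0, 1.0]]
print(R)     # [[1.0, 0.0], [0.0, 1.0]]  (the -2.0 in R_o is clipped to 0)
```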
Next, the analyzer 1 acquires the matrix product W_1 H_1 and derives the first position L_WH, i.e., the positions of the cells of W_1 H_1 whose values are greater than or equal to the threshold θ_2 (step S15).
Next, for each cell of the residual matrix R at the first position L_WH whose element is less than or equal to the threshold θ_1, the analyzer 1 replaces that element with 0, generating the refinement residual matrix R* (step S16).
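Steps S15 and S16 can be sketched as follows: only cells where the reconstruction is large (at least θ_2) and the remaining residual is small (at most θ_1) are zeroed. The thresholds and matrix values are made up, and the function name is hypothetical.

```python
def refine(R, WH, theta1, theta2):
    """Return (R_star, L_WH) following steps S15-S16."""
    # S15: first position L_WH = cells where the matrix product is >= theta2
    L_WH = {(i, j) for i, row in enumerate(WH) for j, v in enumerate(row) if v >= theta2}
    # S16: zero the residual at those cells if it is <= theta1 (R itself is not modified)
    R_star = [row[:] for row in R]
    for i, j in L_WH:
        if R_star[i][j] <= theta1:
            R_star[i][j] = 0.0
    return R_star, L_WH


WH = [[5.0, 0.5], [4.0, 3.0]]
R = [[0.4, 0.3], [2.0, 0.1]]
R_star, L_WH = refine(R, WH, theta1=0.5, theta2=3.0)
print(sorted(L_WH))  # [(0, 0), (1, 0), (1, 1)]
print(R_star)        # [[0.0, 0.3], [2.0, 0.0]]
```

Cell (0, 1) keeps its small residual 0.3 because the acquired topics reconstruct little there (0.5 < θ_2), while cell (1, 0) keeps 2.0 because that residual exceeds θ_1.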
Next, referring to the input matrix A and the refinement residual matrix R*, the analyzer 1 generates the enhanced residual matrix R_L, in which the elements of specific rows and columns of A are emphasized (step S17).
The analyzer 1 then stores the generated enhanced residual matrix R_L in the enhanced residual matrix storage unit 18 (step S18). After step S18, the processing proceeds to A in the flowchart of FIG. 11.
This concludes the description of the first iteration performed by the analyzer 1 according to the flowchart of FIG. 10.
Next, the second and subsequent iterations performed by the analyzer 1 will be described with reference to the flowchart of FIG. 11, taking the m-th iteration as an example (m is an integer of 2 or more).
In FIG. 11, the analyzer 1 first performs topic analysis on the enhanced residual matrix R_L stored in the enhanced residual matrix storage unit 18 (step S21), generating the dictionary matrix H_m and the index matrix W_m.
Next, the analyzer 1 calculates the matrix product W_m H_m of the index matrix W_m and the dictionary matrix H_m (step S22).
Next, the analyzer 1 calculates the matrix difference R_o = R_L - W_m H_m and generates the residual matrix R by replacing the negative elements of R_o with 0 (step S23).
Next, the analyzer 1 acquires the matrix product W_m H_m and derives the first position L_WH, i.e., the positions of the cells whose elements are greater than or equal to the threshold θ_2 (step S24).
Next, for each cell of the residual matrix R at the first position L_WH whose element is less than or equal to the threshold θ_1, the analyzer 1 replaces that element with 0, generating the refinement residual matrix R* (step S25).
Next, referring to the residual matrix R and the refinement residual matrix R*, the analyzer 1 generates the enhanced residual matrix R_L, in which the elements of specific rows and columns of R are emphasized (step S26).
The analyzer 1 then stores the generated enhanced residual matrix R_L in the enhanced residual matrix storage unit 18 (step S27).
If the predetermined condition is not satisfied (No in step S28), the processing returns to step S21 and the next iteration is executed; if it is satisfied (Yes in step S28), the processing ends.
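The two flowcharts can be combined into a single loop. The sketch below is an assumption-laden illustration, not the patent's implementation: topic analysis is stood in for by multiplicative-update NMF, the reference row and column are chosen by the largest L1 norm of R*, cosine similarity is computed against the current analysis target, "empty" is taken to mean all zeros, the thresholds use the product computed in the same iteration, and all names and parameters (k_per_iter, max_topics, theta1, theta2) are hypothetical.

```python
import math
import random


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def nmf(A, k, iters=100, seed=0, eps=1e-9):
    """Assumed topic-analysis step: multiplicative-update NMF, A ~ W H."""
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in A]
    H = [[rng.random() + 0.1 for _ in A[0]] for _ in range(k)]
    for _ in range(iters):
        num, den = matmul(transpose(W), A), matmul(matmul(transpose(W), W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(*r)] for r in zip(H, num, den)]
        num, den = matmul(A, transpose(H)), matmul(W, matmul(H, transpose(H)))
        W = [[w * n / (d + eps) for w, n, d in zip(*r)] for r in zip(W, num, den)]
    return W, H


def cosine(u, v):
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else sum(a * b for a, b in zip(u, v)) / (nu * nv)


def boost(target, R, R_star):
    """R_L = D_c R D_r; R* is nonnegative, so a plain sum is its L1 norm."""
    i_s = max(range(len(R_star)), key=lambda i: sum(R_star[i]))
    j_s = max(range(len(R_star[0])), key=lambda j: sum(row[j] for row in R_star))
    row_sim = [cosine(target[i_s], row) for row in target]
    col_sim = [cosine([row[j_s] for row in R], col) for col in zip(*target)]
    return [[row_sim[i] * v * col_sim[j] for j, v in enumerate(row)] for i, row in enumerate(R)]


def analyze(A, k_per_iter, max_topics, theta1, theta2):
    topics, target = [], A
    while len(topics) < max_topics and any(v > 0 for row in target for v in row):
        W, H = nmf(target, k_per_iter)                       # S12 / S21
        topics.extend(H)
        WH = matmul(W, H)                                    # S13 / S22
        R = [[max(t - p, 0.0) for t, p in zip(a, b)] for a, b in zip(target, WH)]  # S14 / S23
        R_star = [[0.0 if p >= theta2 and r <= theta1 else r for r, p in zip(a, b)]
                  for a, b in zip(R, WH)]                    # S15-S16 / S24-S25
        target = boost(target, R, R_star)                    # S17 / S26, the new R_L
    return topics[:max_topics]


# Hypothetical data: a frequent pattern in rows 0-2 and a rare one in row 3
A = [[5.0, 5.0, 0.0, 0.0] for _ in range(3)] + [[0.0, 0.0, 1.0, 1.0]]
topics = analyze(A, k_per_iter=1, max_topics=2, theta1=0.1, theta2=1.0)
```

In this toy example the first topic concentrates on columns 0-1 (the frequent pattern), and the enhanced residual matrix passed to the second iteration contains only the rare pattern, so the second topic concentrates on columns 2-3.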
This concludes the description according to the flowchart of FIG. 11. The processing in the flowchart of FIG. 11 is an example and does not limit the operation of the analyzer 1 of the present embodiment. For example, any step may be divided into multiple steps, processing divided into separate steps may be performed in a single step, and other processing may be added to any step.
As described above, the analyzer of the present embodiment generates an enhanced residual matrix in which values other than the already-acquired topics are emphasized. Consequently, it can emphasize the parts that have not yet been acquired, even when the residuals left by the already-acquired topics have large values.
According to the present embodiment, repeating the topic analysis increases the chance that topics not yet acquired are acquired in later topic analyses. As a result, rare topics that could not be acquired in earlier topic analyses are more likely to be acquired in subsequent ones. That is, the present embodiment makes it possible to discover patterns relating not only to high- and medium-frequency events but also to low-frequency events.
(Second Embodiment)
Next, an analyzer according to a second embodiment of the present invention will be described with reference to the drawings. The analyzer of the present embodiment is a simplified version of the analyzer 1 of the first embodiment.
FIG. 12 is a block diagram showing the configuration of the analyzer 2 of the present embodiment. As shown in FIG. 12, the analyzer 2 includes a topic analysis unit 21, a matrix product storage unit 22, a residual matrix derivation unit 23, a residual matrix refinement unit 25, a residual matrix boost unit 27, and an enhanced residual matrix storage unit 28. The lines connecting the components are an example and do not limit the connections between them.
In the first iteration, the topic analysis unit 21 receives the input matrix as the analysis target matrix. In the second and subsequent iterations, it receives as the analysis target matrix the enhanced residual matrix generated in the preceding iteration.
The topic analysis unit 21 performs topic analysis on the input analysis target matrix to generate a dictionary matrix that stores the topics and an index matrix that indicates which topics are contained and to what degree. It then calculates the matrix product of the index matrix and the dictionary matrix and stores the result in the matrix product storage unit 22.
The matrix product storage unit 22 stores the matrix product calculated by the topic analysis unit 21.
The residual matrix derivation unit 23 receives the analysis target matrix and, by referring to the matrix product storage unit 22, the matrix product corresponding to that analysis target matrix. It calculates the matrix difference between the analysis target matrix and the matrix product generated from it, generates a residual matrix by replacing the negative elements of the difference with 0, and outputs the residual matrix to the residual matrix refinement unit 25.
The residual matrix refinement unit 25 acquires the residual matrix from the residual matrix derivation unit 23, generates a refinement residual matrix by removing the noise contained in the residual matrix, and outputs the refinement residual matrix to the residual matrix boost unit 27.
For example, the residual matrix refinement unit 25 derives the refinement residual matrix by replacing the elements that are less than or equal to a first threshold with 0. Alternatively, it may derive the refinement residual matrix by subtracting a predetermined value from each element of the residual matrix and replacing the elements that become negative with 0.
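The two simpler refinement rules can be sketched as below (hypothetical names, made-up values). The source does not say whether the elements that stay nonnegative after the subtraction keep their original value or the reduced one; this sketch assumes the reduced value (soft thresholding). Under the other reading, the second rule reduces to the first one with a strict inequality.

```python
def refine_threshold(R, theta1):
    """Replace every element <= theta1 with 0 (first-threshold rule)."""
    return [[0.0 if v <= theta1 else v for v in row] for row in R]


def refine_subtract(R, delta):
    """Subtract delta from every element and clamp negative results to 0 (assumed soft threshold)."""
    return [[max(v - delta, 0.0) for v in row] for row in R]


R = [[0.2, 1.5], [0.5, 3.0]]
print(refine_threshold(R, theta1=0.5))  # [[0.0, 1.5], [0.0, 3.0]]
print(refine_subtract(R, delta=0.5))    # [[0.0, 1.0], [0.0, 2.5]]
```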
As another example, the residual matrix refinement unit 25 refers to the matrix product storage unit 22 and acquires the stored matrix product. It derives the first position, i.e., the positions of the elements of the matrix product that are greater than or equal to a second threshold. It then derives the refinement residual matrix by replacing with 0 those elements of the residual matrix at the first position that are less than or equal to the first threshold.
The residual matrix boost unit 27 receives the refinement residual matrix, generates an enhanced residual matrix in which the elements of specific rows and columns of the refinement residual matrix are emphasized, and stores it in the enhanced residual matrix storage unit 28.
The enhanced residual matrix storage unit 28 stores the enhanced residual matrix generated by the residual matrix boost unit 27.
This concludes the description of the configuration of the analyzer 2 of the present embodiment.
[Residual matrix refinement unit]
Next, the detailed configuration of the residual matrix refinement unit 25 of the analyzer 2 will be described with reference to the drawings. The following example derives the first position and removes noise from the elements of the residual matrix at the positions corresponding to the first position.
FIG. 13 is a block diagram showing an example of the configuration of the residual matrix refinement unit 25. As shown in FIG. 13, the residual matrix refinement unit 25 includes an input unit 51, a first cell derivation unit 52, a refinement residual matrix generation unit 53, and an output unit 54. The connections between the components of the residual matrix refinement unit 25 are omitted from FIG. 13. Each component in FIG. 13 may be merged with another component or divided, and further components may be added.
The input unit 51 receives the matrix product stored in the matrix product storage unit 22 and outputs it to the first cell derivation unit 52. The input unit 51 also receives the residual matrix of the analysis target matrix from the residual matrix derivation unit 23 and outputs it to the refinement residual matrix generation unit 53.
The first cell derivation unit 52 receives the matrix product from the input unit 51 and derives the positions (the first position) of the cells of the matrix product (the first cells) whose elements are greater than or equal to the second threshold. It outputs the derived first position to the refinement residual matrix generation unit 53.
The refinement residual matrix generation unit 53 acquires the residual matrix and generates a refinement residual matrix by replacing with 0 those cells at the first position whose elements are less than or equal to the first threshold. It outputs the generated refinement residual matrix to the output unit 54.
The output unit 54 outputs the refinement residual matrix to the residual matrix boost unit 27.
This concludes the description of the configuration of the residual matrix refinement unit 25.
[Residual matrix boost unit]
Next, the detailed configuration of the residual matrix boost unit 27 of the analyzer 2 will be described with reference to the drawings. FIG. 14 is a block diagram showing an example of the configuration of the residual matrix boost unit 27. As shown in FIG. 14, the residual matrix boost unit 27 includes an input unit 71, a statistic calculation unit 72, a selection unit 73, a diagonal matrix generation unit 74, an enhanced residual matrix calculation unit 75, and an output unit 76. The connections between the components of the residual matrix boost unit 27 are omitted from FIG. 14. Each component in FIG. 14 may be merged with another component or divided, and further components may be added.
The input unit 71 receives the residual matrix from the residual matrix derivation unit 23 and the refinement residual matrix from the residual matrix refinement unit 25. It outputs the refinement residual matrix to the statistic calculation unit 72 and the residual matrix to the selection unit 73, the diagonal matrix generation unit 74, and the enhanced residual matrix calculation unit 75.
The statistic calculation unit 72 receives the refinement residual matrix from the input unit 71 and calculates a statistic, such as the L1 norm or the L2 norm, for each row and each column of the refinement residual matrix. From these it generates a row reference vector, whose elements are the statistics of the rows, and a column reference vector, whose elements are the statistics of the columns, and outputs both to the selection unit 73.
The selection unit 73 receives the row reference vector and the column reference vector from the statistic calculation unit 72. Based on the statistics in these vectors, it selects one row (the reference row) and one column (the reference column) of the refinement residual matrix. For example, the selection unit 73 may select the reference row and the reference column at random, or it may select the row and the column with the largest statistic in the respective reference vector.
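A sketch of the statistic calculation unit 72 and the selection unit 73, with hypothetical names. The text does not say how a random selection uses the statistics; drawing an index with probability proportional to its statistic is an assumption made here. The example values show that the choice between the L1 and L2 norm can change which row is selected.

```python
import math
import random


def statistic(vector, kind="l1"):
    """L1 or L2 norm of a row or column."""
    if kind == "l1":
        return sum(abs(x) for x in vector)
    if kind == "l2":
        return math.sqrt(sum(x * x for x in vector))
    raise ValueError(f"unknown statistic: {kind}")


def reference_vectors(M, kind="l1"):
    """Row reference vector and column reference vector of the refinement residual matrix."""
    return [statistic(r, kind) for r in M], [statistic(c, kind) for c in zip(*M)]


def select(stats, mode="max", rng=None):
    """Pick one index: the largest statistic, or a random draw weighted by the statistics."""
    if mode == "max":
        return max(range(len(stats)), key=stats.__getitem__)
    rng = rng or random.Random()
    # Python 3.9+ raises ValueError here if every weight is zero
    return rng.choices(range(len(stats)), weights=stats, k=1)[0]


R_star = [[1.0, 1.0, 1.0], [0.0, 0.0, 2.5]]        # hypothetical refinement residual matrix
rows_l1, cols_l1 = reference_vectors(R_star, "l1")  # [3.0, 2.5], [1.0, 1.0, 3.5]
rows_l2, _ = reference_vectors(R_star, "l2")        # [about 1.732, 2.5]
print(select(rows_l1), select(rows_l2))  # 0 1
```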
The selection unit 73 selects the row of the residual matrix corresponding to the reference row (the selected row) and generates a vector consisting of the values of the selected row (the selected row vector). Similarly, it selects the column of the residual matrix corresponding to the reference column (the selected column) and generates a vector consisting of the values of the selected column (the selected column vector). The selection unit 73 outputs the selected row vector and the selected column vector to the diagonal matrix generation unit 74.
The diagonal matrix generation unit 74 receives the selected row vector and the selected column vector from the selection unit 73 and the residual matrix from the input unit 71. For each row of the residual matrix, it calculates the similarity between that row and the selected row vector. Similarly, for each column of the residual matrix, it calculates the similarity between that column and the selected column vector.
The diagonal matrix generation unit 74 generates a matrix (hereinafter, the first diagonal matrix) whose diagonal elements are the similarities calculated with respect to the selected row vector and whose off-diagonal elements are 0. Similarly, it generates a matrix (hereinafter, the second diagonal matrix) whose diagonal elements are the similarities calculated with respect to the selected column vector and whose off-diagonal elements are 0. It outputs the first and second diagonal matrices to the enhanced residual matrix calculation unit 75.
The enhanced residual matrix calculation unit 75 receives the residual matrix from the input unit 71 and the first and second diagonal matrices from the diagonal matrix generation unit 74. It generates the enhanced residual matrix by calculating the matrix product of the first diagonal matrix, the residual matrix, and the second diagonal matrix, and outputs the result to the output unit 76.
The output unit 76 stores the enhanced residual matrix generated by the enhanced residual matrix calculation unit 75 in the enhanced residual matrix storage unit 28.
This concludes the description of the configuration of the residual matrix boost unit 27.
As described above, with the analyzer of the present embodiment, low-frequency events are emphasized as the iterations are repeated, which makes topics relating to low-frequency events easier to discover.
(Hardware configuration)
A hardware configuration for realizing the analyzer according to each embodiment of the present invention will now be described with reference to the drawings. In the following, a computer is used as an example of such a hardware configuration; however, the hardware configuration for realizing the analyzer of each embodiment is not limited to the one described below.
FIG. 15 is a block diagram showing the configuration of a computer 90, an example of a hardware configuration that implements the analyzer of each embodiment.
As shown in FIG. 15, the computer 90 includes a central processing unit 91 (CPU: Central Processing Unit), a first memory 92 (ROM: Read Only Memory), a second memory 93 (RAM: Random Access Memory), an internal storage device 94, an input/output connection circuit 95 (IOC: Input Output Circuit), and a network interface circuit 96 (NIC: Network Interface Circuit). The computer 90 is connected to an input device 98 and a display device 99 via the input/output connection circuit 95. The computer 90 of FIG. 15 is a configuration example for realizing the analyzer of each embodiment and does not limit the scope of the present invention.
The central processing unit 91 reads a program from the first memory 92 and, based on the read program, controls the second memory 93, the internal storage device 94, the input/output connection circuit 95, and the network interface circuit 96. When realizing the functions of the analyzer of each embodiment, the central processing unit 91 may use the second memory 93 or the internal storage device 94 as a storage area for the program.
The central processing unit 91 may also read the program, using a storage medium reader (not shown), from a storage medium on which the program is stored in a form readable by the computer 90. Alternatively, the central processing unit 91 may receive the program from an external device (not shown) via the input/output connection circuit 95, store it in the second memory 93, and operate based on the stored program.
The first memory 92 is a non-volatile storage medium that stores programs executed by the central processing unit 91 and fixed data. It can be realized by, for example, a PROM (Programmable ROM) or a flash ROM.
 第2メモリ93は、中央演算装置91が実行するプログラムおよびデータを一時的に記憶させる揮発性の記憶媒体である。第2メモリ93は、例えば、DRAM(Dynamic RAM)によって実現できる。 The second memory 93 is a volatile storage medium for temporarily storing programs executed by the central processing unit 91 and data. The second memory 93 can be realized by, for example, a DRAM (Dynamic RAM).
 内部記憶装置94は、長期的に保存させるデータやプログラムを記憶させるための不揮発性の記憶媒体である。なお、内部記憶装置94は、中央演算装置91の一時記憶装置として動作させてもよい。例えば、内部記憶装置94は、ハードディスク装置や光磁気ディスク装置、SSD(Solid State Drive)、ディスクアレイ装置、フラッシュメモリなどによって実現できる。 The internal storage device 94 is a non-volatile storage medium for storing data and programs to be stored for a long time. The internal storage device 94 may be operated as a temporary storage device of the central processing unit 91. For example, the internal storage device 94 can be realized by a hard disk device, a magneto-optical disk device, a solid state drive (SSD), a disk array device, a flash memory, or the like.
 中央演算装置91は、第1メモリ92、内部記憶装置94、および第2メモリ93の少なくともいずれかに記憶されているプログラムに基づいて動作可能である。つまり、中央演算装置91は、不揮発性記憶媒体または揮発性記憶媒体を用いて動作可能である。 The central processing unit 91 is operable based on a program stored in at least one of the first memory 92, the internal storage device 94, and the second memory 93. That is, the central processing unit 91 can operate using a non-volatile storage medium or a volatile storage medium.
 また、コンピュータ90には、必要に応じて、ディスクドライブ(図示しない)を備え付けてもよい。ディスクドライブは、バス97に接続される。例えば、ディスクドライブは、中央演算装置91と図示しない記録媒体(プログラム記録媒体)との間で、記録媒体からのデータ・プログラムの読み出し、コンピュータ90の処理結果の記録媒体への書き込みなどを仲介する。例えば、記録媒体は、CD(Compact Disc)やDVD(Digital Versatile Disc)などの光学記録媒体で実現できる。また、記録媒体は、USB(Universal Serial Bus)メモリやSD(Secure Digital)カードなどの半導体記録媒体や、フレキシブルディスクなどの磁気記録媒体、その他の記録媒体によって実現してもよい。 In addition, the computer 90 may be equipped with a disk drive (not shown) as needed. The disk drive is connected to the bus 97. For example, the disk drive mediates reading of the data program from the recording medium, writing of the processing result of the computer 90 to the recording medium, and the like between the central processing unit 91 and the recording medium (program recording medium) not shown. . For example, the recording medium can be realized by an optical recording medium such as a CD (Compact Disc) or a DVD (Digital Versatile Disc). Also, the recording medium may be realized by a semiconductor recording medium such as a Universal Serial Bus (USB) memory or a Secure Digital (SD) card, a magnetic recording medium such as a flexible disk, or another recording medium.
 入出力接続回路95は、入力機器98や表示機器99などの入出力装置と、中央演算装置91との間のデータ授受を仲介する回路である。すなわち、入出力接続回路95は、規格や仕様に基づいて、コンピュータ90と周辺機器とを接続するためのインターフェースである。例えば、入出力接続回路95は、IO(Input Output Circuit)インターフェースカードやUSB(Universal Serial Bus)カードなどによって実現できる。 The input / output connection circuit 95 is a circuit that mediates the exchange of data between the central processing unit 91 and input / output devices such as the input device 98 and the display device 99. That is, the input / output connection circuit 95 is an interface for connecting the computer 90 and peripheral devices based on the standards and specifications. For example, the input / output connection circuit 95 can be realized by an IO (Input Output Circuit) interface card, a USB (Universal Serial Bus) card, or the like.
 入力機器98は、コンピュータ90の操作者によって入力される入力指示を受け付ける機器である。例えば、入力機器98は、キーボードやマウス、タッチパネルなどによって実現される。 The input device 98 is a device that receives an input instruction input by the operator of the computer 90. For example, the input device 98 is realized by a keyboard, a mouse, a touch panel or the like.
 表示機器99は、コンピュータ90の操作者に表示情報を提供する機器である。例えば、表示機器99は、液晶ディスプレイやプロジェクタなどによって実現される。 The display device 99 is a device that provides the operator of the computer 90 with display information. For example, the display device 99 is realized by a liquid crystal display, a projector, or the like.
 ネットワークインターフェース回路96は、ネットワークを介して、図示しない外部の装置とコンピュータとの間のデータ授受を中継する回路である。すなわち、ネットワークインターフェース回路96は、インターネットやイントラネットなどのネットワークを通じて、外部のシステムや装置に接続するためのインターフェースである。例えば、ネットワークインターフェース回路96は、LAN(Local Area Network)カードによって実現される。 The network interface circuit 96 is a circuit that relays data exchange between an external device (not shown) and the computer via a network. That is, the network interface circuit 96 is an interface for connecting to an external system or apparatus through a network such as the Internet or an intranet. For example, the network interface circuit 96 is realized by a LAN (Local Area Network) card.
 以上のように、コンピュータ90の中央演算装置91がプログラムに基づいて動作することによって、各実施形態の分析装置の機能が実現できる。なお、各実施形態の分析装置は、複数の構成要素を組み合わせたハードウェアによって構成してもよい。また、各実施形態の分析装置の構成要素は、少なくとも一つのハードウェア回路によって構成してもよい。また、各実施形態の分析装置の構成要素は、複数のハードウェア回路を組み合わせて構成してもよい。また、各実施形態の分析装置の構成要素は、ネットワークを介して接続された複数の装置によって構成してもよい。 As described above, when the central processing unit 91 of the computer 90 operates based on a program, the functions of the analysis device of each embodiment can be realized. Note that the analysis device of each embodiment may be configured by hardware in which a plurality of constituent elements are combined. In addition, the components of the analyzer of each embodiment may be configured by at least one hardware circuit. In addition, the components of the analyzer of each embodiment may be configured by combining a plurality of hardware circuits. In addition, the components of the analysis device of each embodiment may be configured by a plurality of devices connected via a network.
 以上が、本発明の各実施形態に係る分析装置を可能とするためのハードウェア構成の一例である。なお、図15のハードウェア構成は、各実施形態に係る分析装置を実現するためのハードウェア構成の一例であって、本発明の範囲を限定するものではない。また、各実施形態に係る分析装置に関する処理をコンピュータに実行させるプログラムも本発明の範囲に含まれる。さらに、各実施形態に係るプログラムを記録したプログラム記録媒体も本発明の範囲に含まれる。 The above is an example of the hardware configuration for enabling the analyzer according to each embodiment of the present invention. The hardware configuration in FIG. 15 is an example of the hardware configuration for realizing the analyzer according to each embodiment, and does not limit the scope of the present invention. Further, a program that causes a computer to execute the process related to the analyzer according to each embodiment is also included in the scope of the present invention. Furthermore, a program recording medium recording the program according to each embodiment is also included in the scope of the present invention.
 また、各実施形態の分析装置の構成要素は、任意に組み合わせることができる。各実施形態の分析装置の構成要素は、ソフトウェアによって実現してもよいし、回路によって実現してもよい。 Moreover, the components of the analyzer of each embodiment can be arbitrarily combined. The components of the analysis device of each embodiment may be realized by software or circuits.
Although the present invention has been described above with reference to the embodiments, the present invention is not limited to the above embodiments. Various modifications that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.
[Supplementary Note]
Some or all of the above embodiments may also be described as in the following supplementary notes, but are not limited to them.
(Supplementary Note 1)
An analysis device comprising:
topic analysis means for performing topic analysis on an analysis target matrix to generate a dictionary matrix that stores topics included in the analysis target matrix and an index matrix that indicates the degree to which the analysis target matrix includes each topic, and for calculating the matrix product of the index matrix and the dictionary matrix;
matrix product storage means in which the matrix product is accumulated;
residual matrix deriving means for acquiring the analysis target matrix and at least one matrix product accumulated in the matrix product storage means, and for deriving a residual matrix corresponding to the difference between the analysis target matrix and the matrix product;
residual matrix refinement means for acquiring the residual matrix and generating a refinement residual matrix by removing noise contained in the residual matrix;
residual matrix boosting means for acquiring the analysis target matrix and the refinement residual matrix and deriving, based on them, an enhanced residual matrix in which elements containing topics not yet acquired are emphasized; and
enhanced residual matrix storage means in which the enhanced residual matrix derived by the residual matrix boosting means is accumulated.
(Supplementary Note 2)
The analysis device according to Supplementary Note 1, wherein the topic analysis means generates the dictionary matrix and the index matrix by performing non-negative matrix factorization on the analysis target matrix.
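The non-negative matrix factorization named in Supplementary Note 2 can be illustrated with the well-known Lee-Seung multiplicative updates. The following is a minimal pure-Python sketch, not the applicant's implementation. The function name `nmf_topics`, the Frobenius loss, the random initialization, and the fixed iteration count are all assumptions. Rows of the analysis target matrix are assumed to be events and columns features, so the index matrix is n x k and the dictionary matrix is k x m.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of a (n x t) and b (t x m)."""
    cols = list(zip(*b))
    return [[sum(u * v for u, v in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf_topics(x, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative x (n x m) as index (n x k) @ dictionary (k x m).

    Hypothetical helper: Lee-Seung multiplicative updates for the Frobenius
    loss, which keep both factors non-negative.
    Returns (index, dictionary, index @ dictionary).
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    index = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    dictionary = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        it = transpose(index)
        num, den = matmul(it, x), matmul(matmul(it, index), dictionary)
        dictionary = [[dictionary[i][j] * num[i][j] / (den[i][j] + eps)
                       for j in range(m)] for i in range(k)]
        dt = transpose(dictionary)
        num, den = matmul(x, dt), matmul(index, matmul(dictionary, dt))
        index = [[index[i][j] * num[i][j] / (den[i][j] + eps)
                  for j in range(k)] for i in range(n)]
    return index, dictionary, matmul(index, dictionary)
```

The third return value is the matrix product that Supplementary Note 1 accumulates in the matrix product storage means.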
(Supplementary Note 3)
The analysis device according to Supplementary Note 1 or 2, wherein the residual matrix deriving means calculates a matrix difference corresponding to the difference between the analysis target matrix and the matrix product, and derives the residual matrix by replacing the negative elements of the matrix difference with 0.
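Supplementary Note 3 amounts to an element-wise difference clipped at zero. Below is a minimal sketch using the hypothetical name `residual_matrix`. The notes do not say how several stored matrix products are combined, so subtracting their element-wise sum is an assumption.

```python
def residual_matrix(x, products):
    """Element-wise max(x - sum(products), 0), as in Supplementary Note 3.

    Assumption: when several matrix products are stored, their element-wise
    sum is subtracted from x before negative entries are replaced by 0.
    """
    return [[max(v - sum(p[i][j] for p in products), 0.0) for j, v in enumerate(row)]
            for i, row in enumerate(x)]
```

For example, `residual_matrix([[3.0, 1.0], [0.0, 2.0]], [[[1.0, 2.0], [1.0, 1.0]]])` gives `[[2.0, 0.0], [0.0, 1.0]]`. The two entries that the stored product over-explains are clipped to 0.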
(Supplementary Note 4)
The analysis device according to any one of Supplementary Notes 1 to 3, wherein the residual matrix refinement unit generates the refinement residual matrix by replacing with 0 each element of the residual matrix that is equal to or less than a first threshold.
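The refinement in Supplementary Note 4 is a hard threshold. In the sketch below, note that an element exactly equal to the first threshold is also zeroed, matching "equal to or less than". The name `refine_hard` and the parameter `t1` are hypothetical.

```python
def refine_hard(residual, t1):
    """Hard thresholding: entries <= t1 are treated as noise and set to 0."""
    return [[0.0 if v <= t1 else v for v in row] for row in residual]
```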
(Supplementary Note 5)
The analysis device according to any one of Supplementary Notes 1 to 3, wherein the residual matrix refinement unit generates the refinement residual matrix by replacing with 0 each element of the residual matrix that becomes negative when a predetermined value is subtracted from it.
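One reading of Supplementary Note 5 is the non-negative soft-thresholding operator. Under that reading, the predetermined value is subtracted from every element and the results that fall below 0 are clipped. This is an interpretation, not a statement of the applicant's method; `refine_soft` and `lam` are hypothetical names. For reference, max(r - lam, 0) is also the closed-form minimizer of 0.5 * (r - s)^2 + lam * |s| subject to s >= 0. That makes it one simple instance of the "sparse estimation method" mentioned in Supplementary Note 6.

```python
def refine_soft(residual, lam):
    """Non-negative soft thresholding: subtract lam, then clip negatives to 0."""
    return [[max(v - lam, 0.0) for v in row] for row in residual]
```

Unlike the hard threshold, every surviving entry is shrunk by `lam`, which biases large entries downwards but makes the output change continuously with the input.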
(Supplementary Note 6)
The analysis device according to any one of Supplementary Notes 1 to 3, wherein the residual matrix refinement unit generates the refinement residual matrix by using a sparse estimation method.
(Supplementary Note 7)
The analysis device according to any one of Supplementary Notes 4 to 6, wherein the residual matrix refinement unit
acquires the matrix product from the matrix product storage means, and
derives, as first positions, the positions of the elements of the matrix product that are equal to or greater than a second threshold, and removes noise from the elements at the first positions in the residual matrix.
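Supplementary Note 7 restricts noise removal to "first positions", meaning cells that the stored matrix product already explains strongly. The sketch below applies the hard threshold of Supplementary Note 4 only at those cells. That choice, and the names `first_positions`, `refine_at_positions`, `t1`, and `t2`, are assumptions made for illustration.

```python
def first_positions(product, t2):
    """Positions (i, j) at which the stored matrix product is >= t2."""
    return {(i, j) for i, row in enumerate(product) for j, v in enumerate(row) if v >= t2}


def refine_at_positions(residual, positions, t1):
    """Hard-threshold (<= t1 -> 0) only the residual entries at the given positions."""
    return [[0.0 if (i, j) in positions and v <= t1 else v for j, v in enumerate(row)]
            for i, row in enumerate(residual)]
```

With `product = [[5.0, 0.0], [0.0, 5.0]]` and `t2 = 1.0`, only the diagonal cells are first positions. A small residual of 0.2 at (0, 0) is therefore removed, while the equally small values off the diagonal are kept.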
(Supplementary Note 8)
The analysis device according to Supplementary Note 7, wherein the residual matrix refinement unit
acquires the matrix product from the matrix product storage means, and
sets at least one of specific rows and columns selected from the matrix product as a group, and sets the first positions for each group so set.
(Supplementary Note 9)
The analysis device according to Supplementary Note 8, wherein the residual matrix refinement unit generates the refinement residual matrix by performing group-wise sparse estimation on the residual matrix.
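Supplementary Notes 8 and 9 leave the group-wise sparse estimator open. A common choice is the proximal operator of the group lasso penalty, often called group soft thresholding: each group is scaled by max(0, 1 - lam / ||g||_2). The sketch below is a hypothetical reading in which each row's first positions form one group. Groups whose L2 norm is below `lam` vanish entirely; larger groups shrink but keep their shape. The function name and parameters are assumptions.

```python
import math


def group_refine_rows(residual, product, t2, lam):
    """Row-wise group soft thresholding restricted to first positions.

    Hypothetical reading of Supplementary Notes 7 to 9: in each row, the
    residual entries whose matrix-product value is >= t2 form one group. That
    group is shrunk by the group-lasso proximal operator, and all other
    entries are left unchanged.
    """
    refined = []
    for r_row, p_row in zip(residual, product):
        group = [j for j, p in enumerate(p_row) if p >= t2]
        norm = math.sqrt(sum(r_row[j] ** 2 for j in group))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        new_row = list(r_row)
        for j in group:
            new_row[j] = scale * r_row[j]
        refined.append(new_row)
    return refined
```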
(Supplementary Note 10)
The analysis device according to any one of Supplementary Notes 1 to 9, wherein the residual matrix boost unit
calculates a statistic for each row and each column of the refinement residual matrix and, based on the calculated statistics, selects one reference row from the rows and one reference column from the columns of the residual matrix;
selects, from the residual matrix, a selected row and a selected column corresponding to the reference row and the reference column, respectively;
generates a selected row vector whose elements are the values of the selected row, and a selected column vector whose elements are the values of the selected column;
calculates the similarity between the selected row vector and each row of the analysis target matrix, and between the selected column vector and each column of the analysis target matrix;
generates a first diagonal matrix whose diagonal elements are the similarities calculated for the rows, and a second diagonal matrix whose diagonal elements are the similarities calculated for the columns; and
derives, as the enhanced residual matrix, the matrix product of the first diagonal matrix, the analysis target matrix, and the second diagonal matrix.
(Supplementary Note 11)
The analysis device according to Supplementary Note 10, wherein the residual matrix boost unit calculates the L1 norm of each row and each column of the refinement residual matrix as the statistic.
(Supplementary Note 12)
The analysis device according to Supplementary Note 10, wherein the residual matrix boost unit calculates the L2 norm of each row and each column of the refinement residual matrix as the statistic.
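The boosting steps of Supplementary Notes 10 to 12 can be sketched as follows. `p=1` uses the L1 norm (Note 11) as the statistic and `p=2` the L2 norm (Note 12). Two points are assumptions not fixed by the notes: the reference row and column are taken to be those with the largest statistic, and similarity is cosine similarity. The function names are hypothetical. The diagonal matrices are never materialized, because (D1 @ X @ D2)[i][j] equals d1[i] * X[i][j] * d2[j] when D1 and D2 are diagonal.

```python
import math


def _norm(vec, p):
    return sum(abs(v) for v in vec) if p == 1 else math.sqrt(sum(v * v for v in vec))


def _cosine(a, b):
    na, nb = _norm(a, 2), _norm(b, 2)
    if na == 0 or nb == 0:
        return 0.0
    return sum(u * v for u, v in zip(a, b)) / (na * nb)


def boost(x, residual, refined, p=2):
    """Enhanced residual matrix D1 @ x @ D2 (sketch of Supplementary Notes 10 to 12)."""
    row_stat = [_norm(row, p) for row in refined]
    col_stat = [_norm(col, p) for col in zip(*refined)]
    ref_row = max(range(len(row_stat)), key=row_stat.__getitem__)  # assumed: argmax
    ref_col = max(range(len(col_stat)), key=col_stat.__getitem__)
    sel_row = residual[ref_row]                      # selected row vector
    sel_col = [row[ref_col] for row in residual]     # selected column vector
    d1 = [_cosine(sel_row, row) for row in x]        # diagonal of D1, one per row
    d2 = [_cosine(sel_col, col) for col in zip(*x)]  # diagonal of D2, one per column
    return [[d1[i] * v * d2[j] for j, v in enumerate(row)] for i, row in enumerate(x)]
```

With `x = [[2.0, 0.0], [0.0, 3.0]]` and a residual that keeps only the (1, 1) cell, the enhanced matrix keeps `x[1][1]` and zeros the already-explained (0, 0) cell. This matches the notes' intent of emphasizing elements whose topic has not yet been acquired.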
(Supplementary Note 13)
The analysis device according to any one of Supplementary Notes 1 to 12, further comprising residual matrix storage means for storing the residual matrix.
(Supplementary Note 14)
The analysis device according to any one of Supplementary Notes 1 to 13, further comprising refinement residual matrix storage means for storing the refinement residual matrix.
(Supplementary Note 15)
The analysis device according to any one of Supplementary Notes 1 to 14, wherein the topic analysis means repeats the topic analysis on the analysis target matrix, including the enhanced residual matrix accumulated in the enhanced residual matrix storage means, until a predetermined condition is satisfied.
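The pieces above can be tied together into the repeated analysis of Supplementary Note 15. This is a self-contained pure-Python sketch under several assumptions, none of which the notes fix:
- the enhanced residual matrix becomes the next topic-analysis target;
- multiple stored products are summed before clipping;
- refinement is the hard threshold of Note 4;
- boosting uses the L2 norm and cosine similarity;
- the "predetermined condition" is either a round budget or a refined residual with negligible total mass.

All function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product."""
    cols = list(zip(*b))
    return [[sum(u * v for u, v in zip(row, col)) for col in cols] for row in a]


def nmf_product(x, k, iters=100, seed=0, eps=1e-9):
    """Rank-k NMF via Lee-Seung multiplicative updates; returns index @ dictionary."""
    rng = random.Random(seed)
    w = [[rng.random() + eps for _ in range(k)] for _ in x]     # index matrix
    h = [[rng.random() + eps for _ in x[0]] for _ in range(k)]  # dictionary matrix
    for _ in range(iters):
        wt = [list(c) for c in zip(*w)]
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        # element-wise h * num / (den + eps), walking the three matrices row by row
        h = [[hv * n / (d + eps) for hv, n, d in zip(*rows)] for rows in zip(h, num, den)]
        ht = [list(c) for c in zip(*h)]
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[wv * n / (d + eps) for wv, n, d in zip(*rows)] for rows in zip(w, num, den)]
    return matmul(w, h)


def norm2(vec):
    return math.sqrt(sum(v * v for v in vec))


def cosine(a, b):
    na, nb = norm2(a), norm2(b)
    return 0.0 if na == 0 or nb == 0 else sum(u * v for u, v in zip(a, b)) / (na * nb)


def analyze(x, k=1, max_rounds=3, t1=0.1, tol=1e-6):
    """Repeat topic analysis on boosted residuals; returns the stored matrix products.

    Assumed stopping condition: max_rounds is reached, or the refined residual
    (hard threshold t1) has total mass <= tol.
    """
    products, target = [], x
    for _ in range(max_rounds):
        products.append(nmf_product(target, k))
        # residual = max(x - sum of stored products, 0), element-wise
        residual = [[max(v - sum(p[i][j] for p in products), 0.0)
                     for j, v in enumerate(row)] for i, row in enumerate(x)]
        refined = [[v if v > t1 else 0.0 for v in row] for row in residual]
        if sum(map(sum, refined)) <= tol:
            break
        # reference row/column: largest L2 norm in the refined residual
        cols = list(zip(*refined))
        r = max(range(len(refined)), key=lambda i: norm2(refined[i]))
        c = max(range(len(cols)), key=lambda j: norm2(cols[j]))
        sel_row, sel_col = residual[r], [row[c] for row in residual]
        d1 = [cosine(sel_row, row) for row in x]        # diagonal of D1
        d2 = [cosine(sel_col, col) for col in zip(*x)]  # diagonal of D2
        # enhanced residual D1 @ x @ D2 becomes the next analysis target
        target = [[d1[i] * v * d2[j] for j, v in enumerate(row)] for i, row in enumerate(x)]
    return products
```

On a block-diagonal input with one strong and one weak block, the first rank-1 round captures the strong block. The boosted residual then isolates the weak block, so the second round recovers it as a separate topic that a single rank-1 factorization would have missed.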
(Supplementary Note 16)
An analysis method comprising:
performing topic analysis on an analysis target matrix to generate a dictionary matrix that stores topics included in the analysis target matrix and an index matrix that indicates the degree to which the analysis target matrix includes each topic;
calculating the matrix product of the index matrix and the dictionary matrix;
accumulating the matrix product;
deriving a residual matrix corresponding to the difference between the analysis target matrix and at least one accumulated matrix product;
generating a refinement residual matrix by removing noise contained in the residual matrix;
deriving, based on the analysis target matrix and the refinement residual matrix, an enhanced residual matrix in which elements containing topics not yet acquired are emphasized; and
including the enhanced residual matrix in the analysis target matrix and accumulating it.
(Supplementary Note 17)
A program that causes a computer to execute:
a process of performing topic analysis on an analysis target matrix to generate a dictionary matrix that stores topics included in the analysis target matrix and an index matrix that indicates the degree to which the analysis target matrix includes each topic;
a process of calculating the matrix product of the index matrix and the dictionary matrix;
a process of accumulating the matrix product;
a process of deriving a residual matrix corresponding to the difference between the analysis target matrix and at least one accumulated matrix product;
a process of generating a refinement residual matrix by removing noise contained in the residual matrix;
a process of deriving, based on the analysis target matrix and the refinement residual matrix, an enhanced residual matrix in which elements containing topics not yet acquired are emphasized; and
a process of including the enhanced residual matrix in the analysis target matrix and accumulating it.
Reference Signs List
 1, 2  Analysis device
 11, 21  Topic analysis unit
 12, 22  Matrix product storage unit
 13, 23  Residual matrix derivation unit
 14  Residual matrix storage unit
 15, 25  Residual matrix refinement unit
 16  Refinement residual matrix storage unit
 17, 27  Residual matrix boost unit
 18, 28  Enhanced residual matrix storage unit
 51  Input unit
 52  First cell derivation unit
 53  Refinement residual matrix generation unit
 54  Output unit
 71  Input unit
 72  Statistic calculation unit
 73  Selection unit
 74  Diagonal matrix generation unit
 75  Enhanced residual matrix calculation unit
 76  Output unit

Claims (17)

  1.  An analysis device comprising:
     topic analysis means for performing topic analysis on an analysis target matrix to generate a dictionary matrix that stores topics included in the analysis target matrix and an index matrix that indicates the degree to which the analysis target matrix includes each topic, and for calculating the matrix product of the index matrix and the dictionary matrix;
     matrix product storage means in which the matrix product is accumulated;
     residual matrix deriving means for acquiring the analysis target matrix and at least one matrix product accumulated in the matrix product storage means, and for deriving a residual matrix corresponding to the difference between the analysis target matrix and the matrix product;
     residual matrix refinement means for acquiring the residual matrix and generating a refinement residual matrix by removing noise contained in the residual matrix;
     residual matrix boosting means for acquiring the analysis target matrix and the refinement residual matrix and deriving, based on them, an enhanced residual matrix in which elements containing topics not yet acquired are emphasized; and
     enhanced residual matrix storage means in which the enhanced residual matrix derived by the residual matrix boosting means is accumulated.
  2.  The analysis device according to claim 1, wherein the topic analysis means generates the dictionary matrix and the index matrix by performing non-negative matrix factorization on the analysis target matrix.
  3.  The analysis device according to claim 1 or 2, wherein the residual matrix deriving means calculates a matrix difference corresponding to the difference between the analysis target matrix and the matrix product, and derives the residual matrix by replacing the negative elements of the matrix difference with 0.
  4.  The analysis device according to any one of claims 1 to 3, wherein the residual matrix refinement unit generates the refinement residual matrix by replacing with 0 each element of the residual matrix that is equal to or less than a first threshold.
  5.  The analysis device according to any one of claims 1 to 3, wherein the residual matrix refinement unit generates the refinement residual matrix by replacing with 0 each element of the residual matrix that becomes negative when a predetermined value is subtracted from it.
  6.  The analysis device according to any one of claims 1 to 3, wherein the residual matrix refinement unit generates the refinement residual matrix by using a sparse estimation method.
  7.  The analysis device according to any one of claims 4 to 6, wherein the residual matrix refinement unit
     acquires the matrix product from the matrix product storage means, and
     derives, as first positions, the positions of the elements of the matrix product that are equal to or greater than a second threshold, and removes noise from the elements at the first positions in the residual matrix.
  8.  The analysis device according to claim 7, wherein the residual matrix refinement unit
     acquires the matrix product from the matrix product storage means, and
     sets at least one of specific rows and columns selected from the matrix product as a group, and sets the first positions for each group so set.
  9.  The analysis device according to claim 8, wherein the residual matrix refinement unit generates the refinement residual matrix by performing group-wise sparse estimation on the residual matrix.
  10.  The analysis device according to any one of claims 1 to 9, wherein the residual matrix boost unit
     calculates a statistic for each row and each column of the refinement residual matrix and, based on the calculated statistics, selects one reference row from the rows and one reference column from the columns of the residual matrix;
     selects, from the residual matrix, a selected row and a selected column corresponding to the reference row and the reference column, respectively;
     generates a selected row vector whose elements are the values of the selected row, and a selected column vector whose elements are the values of the selected column;
     calculates the similarity between the selected row vector and each row of the analysis target matrix, and between the selected column vector and each column of the analysis target matrix;
     generates a first diagonal matrix whose diagonal elements are the similarities calculated for the rows, and a second diagonal matrix whose diagonal elements are the similarities calculated for the columns; and
     derives, as the enhanced residual matrix, the matrix product of the first diagonal matrix, the analysis target matrix, and the second diagonal matrix.
  11.  The analysis device according to claim 10, wherein the residual matrix boost unit calculates the L1 norm of each row and each column of the refinement residual matrix as the statistic.
  12.  The analysis device according to claim 10, wherein the residual matrix boost unit calculates the L2 norm of each row and each column of the refinement residual matrix as the statistic.
  13.  The analysis device according to any one of claims 1 to 12, further comprising residual matrix storage means for storing the residual matrix.
  14.  The analysis device according to any one of claims 1 to 13, further comprising refinement residual matrix storage means for storing the refinement residual matrix.
  15.  The analysis device according to any one of claims 1 to 14, wherein the topic analysis means repeats the topic analysis on the analysis target matrix, including the enhanced residual matrix accumulated in the enhanced residual matrix storage means, until a predetermined condition is satisfied.
  16.  An analysis method comprising:
     performing topic analysis on an analysis target matrix to generate a dictionary matrix that stores topics included in the analysis target matrix and an index matrix that indicates the degree to which the analysis target matrix includes each topic;
     calculating the matrix product of the index matrix and the dictionary matrix;
     accumulating the matrix product;
     deriving a residual matrix corresponding to the difference between the analysis target matrix and at least one accumulated matrix product;
     generating a refinement residual matrix by removing noise contained in the residual matrix;
     deriving, based on the analysis target matrix and the refinement residual matrix, an enhanced residual matrix in which elements containing topics not yet acquired are emphasized; and
     including the enhanced residual matrix in the analysis target matrix and accumulating it.
  17.  A program storage medium storing a program that causes a computer to execute:
     a process of performing topic analysis on an analysis target matrix to generate a dictionary matrix that stores topics included in the analysis target matrix and an index matrix that indicates the degree to which the analysis target matrix includes each topic;
     a process of calculating the matrix product of the index matrix and the dictionary matrix;
     a process of accumulating the matrix product;
     a process of deriving a residual matrix corresponding to the difference between the analysis target matrix and at least one accumulated matrix product;
     a process of generating a refinement residual matrix by removing noise contained in the residual matrix;
     a process of deriving, based on the analysis target matrix and the refinement residual matrix, an enhanced residual matrix in which elements containing topics not yet acquired are emphasized; and
     a process of including the enhanced residual matrix in the analysis target matrix and accumulating it.
PCT/JP2017/046608 2017-12-26 2017-12-26 Analysis device, analysis method, and program recording medium WO2019130419A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2017/046608 WO2019130419A1 (en) 2017-12-26 2017-12-26 Analysis device, analysis method, and program recording medium

Publications (1)

Publication Number Publication Date
WO2019130419A1 true WO2019130419A1 (en) 2019-07-04

Family

ID=67066780

Country Status (1)

Country Link
WO (1) WO2019130419A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012523025A (en) * 2009-04-01 2012-09-27 アイ−セタナ ピーティーワイ リミテッド System and method for detecting anomalies from data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIANG, RUOYI ET AL.: "Anomaly Localization for Network Data Streams with Graph Joint Sparse PCA", PROCEEDINGS OF THE 17TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, August 2011 (2011-08-01), pages 886 - 894, XP55623122 *
SUN, SANGHO ET AL.: "L-EnsNMF: Boosted Local Topic Discovery via Ensemble of Nonnegative Matrix Factorization", 2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), December 2016 (2016-12-01), pages 479 - 488, XP033056010, doi:10.1109/ICDM.2016.0059 *
TAKAHASHI, TSUBASA ET AL.: "AutoCyclone: Automatic Mining of Cyclic Online Activities with Robust Tensor Factorization", INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW), 7 April 2017 (2017-04-07), pages 213 - 221, XP055557706, doi:10.1145/3038912.3052595 *

Similar Documents

Publication Publication Date Title
US10733149B2 (en) Template based data reduction for security related information flow data
JP6751235B2 (en) Machine learning program, machine learning method, and machine learning device
US7761398B2 (en) Apparatus and method for identifying process elements using request-response pairs, a process graph and noise reduction in the graph
CN113676484B (en) Attack tracing method and device and electronic equipment
CN112702342B (en) Network event processing method and device, electronic equipment and readable storage medium
CN113162794B (en) Next attack event prediction method and related equipment
CN107341095B (en) Method and device for intelligently analyzing log data
CN111183620A (en) Intrusion investigation
JP2017045080A (en) Business flow specification regeneration method
WO2019130419A1 (en) Analysis device, analysis method, and program recording medium
Khodabakhsh et al. Cloud-based fault detection and classification for oil & gas industry
Zhang et al. Mapping time series into complex networks based on equal probability division
WO2019130416A1 (en) Analysis device, analysis method, and program recording medium
JP6988827B2 (en) Abnormality identification system, method and program
JP6556681B2 (en) Anonymization table generation device, anonymization table generation method, program
JP6549076B2 (en) Anonymization table generation device, anonymization table generation method, program
WO2019073913A1 (en) Pseudo-data generating device, method and program
Tammana et al. An Exploration on Competent Video Processing Architectures
CN113971119A (en) Unsupervised model-based user behavior anomaly analysis and evaluation method and system
Wei et al. Identification and reconstruction of chaotic systems using multiresolution wavelet decompositions
CN114357445A (en) Method, device and storage medium for identifying terminal side attack path
Cogranne et al. Statistical detection of LSB matching in the presence of nuisance parameters
JP6741203B2 (en) Analysis equipment
CN114418130B (en) Model training method, data processing method and related equipment
CA3101842A1 (en) A method of digital signal feature extraction comprising multiscale analysis

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

NENP Non-entry into the national phase

Ref country code: JP

122 Ep: pct application non-entry in european phase

Ref document number: 17936347

Country of ref document: EP

Kind code of ref document: A1