WO2019130419A1

WO2019130419A1 - Analysis device, analysis method, and program recording medium

Info

Publication number: WO2019130419A1
Application number: PCT/JP2017/046608
Authority: WO
Inventors: Tsubasa Takahashi (翼高橋)
Original assignee: NEC Corporation (日本電気株式会社)
Priority date: 2017-12-26
Filing date: 2017-12-26
Publication date: 2019-07-04

Abstract

To make it possible to detect patterns related to infrequent events, this analysis device comprises: a topic analysis unit that calculates the matrix product of an index matrix and a dictionary matrix, both generated by performing topic analysis on a matrix to be analyzed; a residual matrix derivation unit that acquires at least one accumulated matrix product and the matrix to be analyzed and derives a residual matrix corresponding to the difference between the matrix to be analyzed and the at least one matrix product; a residual matrix refining unit that removes noise from the residual matrix to generate a refined residual matrix; a residual matrix boosting unit that, on the basis of the matrix to be analyzed and the refined residual matrix, derives an emphasized residual matrix in which elements containing uncaptured topics are emphasized; and an emphasized residual matrix storage unit that accumulates the emphasized residual matrices.

Description

Analyzer, analysis method and program recording medium

The present invention relates to an analyzer, an analysis method, and a program for analyzing topics included in data. In particular, the present invention relates to an analyzer, an analysis method, and a program for analyzing topics included in matrix data of a set of event vectors.

Devices such as a network intrusion detection system (IDS) and a factory temperature control device are equipped with sensor devices that observe a state or value of an observation target. Such a sensor device continuously generates data that associates the observed state or value (hereinafter, the observation value) with information including the time at which it was observed (hereinafter, the time stamp). By distributing the continuously generated observation values and their linked time stamps in stream format, a network, factory, or similar target can be monitored constantly. A sequence of data consisting of observation values and time stamps distributed in stream format in this way is called a data stream. Examples of data streams include microblog posts (tweets) on services such as Twitter (registered trademark), proxy server logs, and IDS alert logs. Acquiring and analyzing the observed data stream makes it possible to determine whether the observation target is normal, to grasp its state (for example, by finding suspicious behavior), and to classify events.

The data contained in a data stream are a mixture of various events. Focusing on frequency of occurrence, for example, events can be classified into events that occur frequently (hereinafter, major events), events that occur rarely (hereinafter, rare events), and events that occur with moderate frequency. The main pattern in the data corresponding to each event is called a topic. Here, a pattern means a combination of values that appear together in common; in particular, a pattern consisting of a combination of key values is called a topic. In the following, patterns and topics are treated as equivalent.

Finding patterns that represent events is important for understanding the characteristics of the security device that issues alerts and for understanding anomalies that do not normally occur. To analyze events, data in formats such as data streams, sequences, and documents are converted into vectors whose elements are, for example, the frequencies of the events, words, or keywords contained in the data. Hereinafter, such vector-format data is referred to as an event vector.
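As a minimal illustration of this conversion, the sketch below builds a matrix of event vectors in which each row counts keyword occurrences within one window of records. The whitespace tokenization, the fixed vocabulary, and the function name build_event_matrix are assumptions made for illustration; the document does not specify how event vectors are constructed.

```python
from collections import Counter

def build_event_matrix(windows, vocabulary):
    """Return one event vector per window: the count of each vocabulary
    keyword in the window's records (hypothetical helper)."""
    index = set(vocabulary)
    matrix = []
    for records in windows:
        # Whitespace tokenization; tokens outside the vocabulary are ignored.
        counts = Counter(tok for rec in records for tok in rec.split() if tok in index)
        matrix.append([counts[word] for word in vocabulary])
    return matrix

windows = [["login fail admin", "login ok"],  # window 1
           ["scan port 22"]]                  # window 2
vocabulary = ["login", "fail", "ok", "scan", "port"]
A = build_event_matrix(windows, vocabulary)
# A == [[2, 1, 1, 0, 0], [0, 0, 0, 1, 1]]
```

Stacking such event vectors row by row yields a nonnegative matrix of the kind used as the input matrix A below.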

Generally, as a method of finding main patterns and topics in data, a principal component analysis on event vectors and a topic analysis by matrix decomposition such as singular value decomposition are used. In particular, topic analysis can also be applied to sequence data.

However, because general topic analysis focuses on finding major topics in the data set, it is difficult to capture rare events as topics. This is because general topic analysis is aimed at finding a set of topics that sufficiently compresses a data set, so major topics tend to be captured preferentially.

FIG. 16 is a graph showing the relationship between the topics corresponding to events and their frequencies. In general, when topics are arranged in order of frequency, their frequencies follow a power law. The frequency of the rare events, shown by the dashed frame in FIG. 16, is usually extremely small compared with that of major events. A data stream may also contain a great deal of simple noise that cannot be identified as any event. Because rare events take relatively small values from the viewpoint of major events, general topic analysis regards them as errors. Consequently, simple topic analysis is likely to mistake rare events for noise instead of capturing them as topics. It is therefore necessary to clearly distinguish the rare events contained in a data stream from noise.

Non-Patent Document 1 discloses a topic analysis method using L-EnsNMF (Local Ensemble of Nonnegative Matrix Factorization). In this method, a predetermined number of topics are acquired by matrix decomposition, and a residual matrix, which is the portion not corresponding to the acquired topics, is generated. The portions of the residual matrix that could not be acquired as topics (events and the like) are then emphasized (boosted), and a predetermined number of topics is again acquired by matrix decomposition of the boosted residual matrix. In the method of Non-Patent Document 1, this operation is repeated recursively until the set number of topics is obtained.

Non-Patent Document 2 discloses Group Lasso (Least Absolute Shrinkage and Selection Operator) regularization, a type of sparse regularization. Group Lasso regularization is defined over groups of variables and shrinks all the variables belonging to a group to 0 simultaneously; that is, it forces sparsity at the level of whole groups.

Non-Patent Document 3 discloses Joint Sparse PCA (JSPCA) and Generalized Joint Sparse PCA (GJSPCA), which are improved versions of principal component analysis (PCA). The method of Non-Patent Document 3 uses Group Lasso regularization to form variable groups in the leading components of PCA. In PCA, the leading (higher-ranked) components generally capture more of the major structure of the data. Therefore, the variable groups formed by the method of Non-Patent Document 3 are estimated as dense patterns containing features shared by many data items.

Non-Patent Document 4 and Non-Patent Document 5 disclose matrix decomposition using Group Lasso regularization. Their methods achieve robust matrix decomposition in which the patterns obtained by principal component analysis are less susceptible to noise and outliers.

The method of Non-Patent Document 1 can acquire not only major topics but also topics of medium frequency. However, it merely emphasizes the portions of the analysis target matrix that could not be acquired as topics and has no mechanism for distinguishing residuals from noise. As a result, low-frequency residuals are treated as noise, or noise is mixed into low-frequency residuals. That is, the method of Non-Patent Document 1 has difficulty obtaining topics from low-frequency residuals.

The method of Non-Patent Document 3 can estimate variable groups as patterns that appear consistently in many data items. However, only the patterns characteristic of the data set as a whole are collected in the leading components, so patterns related to rare events are difficult to obtain.

An object of the present invention is to solve the above-mentioned problems and to provide an analysis device that makes it possible to find out patterns related to infrequent events.

An analysis device according to one aspect of the present invention comprises: a topic analysis unit that performs topic analysis on an analysis target matrix to generate a dictionary matrix storing the topics contained in the analysis target matrix and an index matrix indicating the degree to which the analysis target matrix contains each topic, and that calculates the matrix product of the index matrix and the dictionary matrix; a matrix product storage unit in which the matrix products are stored; a residual matrix derivation unit that acquires at least one matrix product stored in the matrix product storage unit and the analysis target matrix, and derives a residual matrix corresponding to the difference between the analysis target matrix and the at least one matrix product; a residual matrix refinement unit that acquires the residual matrix and removes the noise contained in it to generate a refinement residual matrix; a residual matrix boost unit that acquires the analysis target matrix and the refinement residual matrix and, based on them, derives an enhanced residual matrix in which elements containing topics not yet acquired are emphasized; and an enhanced residual matrix storage unit in which the enhanced residual matrices derived by the residual matrix boost unit are accumulated.

In an analysis method according to one aspect of the present invention, topic analysis is performed on an analysis target matrix to generate a dictionary matrix storing the topics contained in the analysis target matrix and an index matrix indicating the degree to which the analysis target matrix contains each topic; the matrix product of the index matrix and the dictionary matrix is calculated and accumulated; a residual matrix corresponding to the difference between the analysis target matrix and the at least one accumulated matrix product is derived; a refinement residual matrix is generated by removing the noise contained in the residual matrix; an enhanced residual matrix in which elements containing topics not yet acquired are emphasized is derived based on the analysis target matrix and the refinement residual matrix; and the enhanced residual matrix is included in the analysis target matrix and accumulated.

A program according to one aspect of the present invention causes a computer to execute: a process of performing topic analysis on an analysis target matrix to generate a dictionary matrix storing the topics contained in the analysis target matrix and an index matrix indicating the degree to which the analysis target matrix contains each topic; a process of calculating the matrix product of the index matrix and the dictionary matrix; a process of accumulating the matrix product; a process of deriving a residual matrix corresponding to the difference between the analysis target matrix and the at least one accumulated matrix product; a process of generating a refinement residual matrix by removing the noise contained in the residual matrix; a process of deriving, based on the analysis target matrix and the refinement residual matrix, an enhanced residual matrix in which elements containing topics not yet acquired are emphasized; and a process of including the enhanced residual matrix in the analysis target matrix and accumulating it.

According to the present invention, it is possible to provide an analysis device that makes it possible to discover patterns related to infrequent events.

FIG. 1 is a block diagram showing an example of the configuration of the analyzer according to the first embodiment of the present invention.
FIG. 2 is a conceptual diagram showing an example of the input matrix input to the analyzer according to the first embodiment.
FIG. 3 is a conceptual diagram showing an example of the matrix product generated by the topic analysis unit of the analyzer according to the first embodiment.
FIG. 4 is a conceptual diagram showing an example of the residual matrix derived by the residual matrix derivation unit of the analyzer according to the first embodiment.
FIG. 5 is a conceptual diagram showing an example of a matrix in which the residual matrix refinement unit of the analyzer according to the first embodiment has set the elements at the first positions to 1.
FIG. 6 is a conceptual diagram showing an example of the refinement residual matrix generated by the residual matrix refinement unit of the analyzer according to the first embodiment.
FIG. 7 is a conceptual diagram showing an example of the L¹ norm values calculated by the residual matrix boost unit of the analyzer according to the first embodiment.
FIG. 8 is a conceptual diagram showing an example in which the residual matrix boost unit of the analyzer according to the first embodiment selects a row and a column.
FIG. 9 is a conceptual diagram showing an example of the enhanced residual matrix generated by the residual matrix boost unit of the analyzer according to the first embodiment.
FIG. 10 is a flowchart for explaining the operation of the analyzer according to the first embodiment in the first iteration.
FIG. 11 is a flowchart for explaining the operation of the analyzer according to the first embodiment in the second and subsequent iterations.
FIG. 12 is a block diagram showing an example of the configuration of the analyzer according to the second embodiment of the present invention.
FIG. 13 is a block diagram showing an example of the configuration of the residual matrix refinement unit of the analyzer according to the second embodiment.
FIG. 14 is a block diagram showing an example of the configuration of the residual matrix boost unit of the analyzer according to the second embodiment.
FIG. 15 is a block diagram showing an example of a hardware configuration that implements the analyzer according to each embodiment of the present invention.
FIG. 16 is a graph of the frequencies of topics arranged in order of frequency.

Embodiments of the present invention are described below with reference to the drawings. Although the embodiments described below include technically preferable limitations for carrying out the present invention, the scope of the invention is not limited to them. In all the drawings used in the following description of the embodiments, the same reference numerals are given to the same parts unless there is a particular reason to do otherwise. In the following embodiments, repeated descriptions of the same configuration and operation may be omitted. The directions of the arrows in the drawings show examples and do not limit the directions of signals between blocks.

First Embodiment
First, the configuration of the analyzer according to the first embodiment of the present invention will be described with reference to the drawings. An example in which the analyzer of the present embodiment performs topic analysis using nonnegative matrix factorization (hereinafter, NMF) is described below, but the topic analysis performed by the analyzer of the present embodiment is not limited to NMF.

FIG. 1 is a block diagram showing an example of the configuration of the analyzer 1 of the present embodiment. As shown in FIG. 1, the analyzer 1 includes a topic analysis unit 11, a matrix product storage unit 12, a residual matrix derivation unit 13, a residual matrix storage unit 14, a residual matrix refinement unit 15, a refinement residual matrix storage unit 16, a residual matrix boost unit 17, and an enhanced residual matrix storage unit 18.

The example of FIG. 1 illustrates analysis of the input matrix A stored in the storage device 100. The input matrix A may be acquired via a network from a storage device 100 belonging to an external system, or from a storage device 100 provided alongside the analyzer 1.

The topic analysis unit 11 performs topic analysis using NMF on the matrix to be analyzed (hereinafter, the analysis target matrix). In the present embodiment, the analysis target matrix is either the input matrix A or the enhanced residual matrix R_L. The enhanced residual matrix R_L is a matrix in which the parts not yet captured in each iteration are emphasized; the enhanced residual matrix R_L generated in each iteration is stored in the enhanced residual matrix storage unit 18. The topic analysis unit 11 repeats the topic analysis, starting from the input matrix A, until a predetermined condition is satisfied: for example, until the analysis has been repeated a predetermined number of times m (m is a natural number), or until the number of acquired topics reaches a predetermined number.

FIG. 2 is a conceptual diagram showing an example of the input matrix A. In FIG. 2, the cell corresponding to each element is shaded according to the magnitude of the element: cells with large values are labeled L, cells with medium values M, and cells with small values S. Blank cells have the value 0. In the subsequent figures of the same kind, cells with very small values may be labeled VS, and cells with the same label may be drawn with different shading or patterns. Hereinafter, the input matrix A is assumed to be a matrix with I rows and J columns (I and J are natural numbers), and the element in the i-th row and j-th column of A is written A[i, j] (i and j are natural numbers). Elements of other matrices are written in the same way.

When the topic analysis unit 11 receives the input matrix A, it starts the first iteration on that input matrix A. When starting the second and subsequent iterations, the topic analysis unit 11 refers to the enhanced residual matrix storage unit 18 and repeats the topic analysis on the enhanced residual matrix R_L generated in the previous iterations.

The topic analysis unit 11 repeats topic analysis until the number of acquired topics reaches a predetermined number k (k is a natural number). In each iteration, the topic analysis unit 11 acquires by NMF a number ks of topics that is smaller than k, where ks is a natural number satisfying ks < k. In practice, ks can be set to 1, 2, or k/2. Alternatively, the topic analysis unit 11 may repeat the topic analysis until the enhanced residual matrix R_L stored in the enhanced residual matrix storage unit 18 becomes empty. A matrix is empty when the values of all its cells are 0.

The topic analysis unit 11 performs topic analysis of the analysis target matrix and generates a dictionary matrix, which stores the topics, and an index matrix (membership matrix), which indicates which topics are contained and to what extent. In the following, the dictionary matrix and the index matrix generated in the m-th iteration are denoted H_m and W_m, respectively (m is an integer of 1 or more).

The topic analysis unit 11 calculates the matrix product W_m H_m of the generated index matrix W_m and dictionary matrix H_m, and stores the calculated matrix product W_m H_m in the matrix product storage unit 12.
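The document does not state which NMF algorithm the topic analysis unit 11 uses. The sketch below assumes the well-known Lee-Seung multiplicative updates for the squared Frobenius error and uses plain Python lists; the names nmf, matmul, and transpose, the iteration count, and the toy matrix are hypothetical. It returns an index matrix W (playing the role of W_m), a dictionary matrix H (H_m), and their product W_m H_m.

```python
import random

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def nmf(A, ks, iters=1000, eps=1e-9, seed=0):
    """Approximate a nonnegative I x J matrix A as W H, with W (I x ks) and
    H (ks x J) nonnegative, using multiplicative updates that do not
    increase the squared Frobenius error ||A - W H||^2."""
    rng = random.Random(seed)
    I, J = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(ks)] for _ in range(I)]
    H = [[rng.random() + eps for _ in range(J)] for _ in range(ks)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), elementwise; eps avoids division by 0.
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T), elementwise.
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

A = [[5, 5, 0, 0],
     [4, 4, 0, 0],
     [0, 0, 3, 3],
     [0, 0, 3, 3]]
W, H = nmf(A, ks=2)   # index matrix W_m and dictionary matrix H_m
WH = matmul(W, H)     # matrix product W_m H_m
```

Each row of H can be read as one topic (a weighting over the J columns), and each row of W as the degree to which the corresponding row of A contains each topic.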

FIG. 3 is an example of the matrix product W_m H_m of the index matrix W_m and the dictionary matrix H_m generated by the topic analysis unit 11 using the input matrix A of FIG. 2. In the matrix product W_m H_m of FIG. 3, the cells corresponding to the small-valued elements (S) of the input matrix A (FIG. 2) have the value 0.

The residual matrix derivation unit 13 calculates, as a matrix difference Ro, the difference between the analysis target matrix (either the input matrix A or the enhanced residual matrix R_L) and the matrix product W_m H_m generated by the topic analysis unit 11. Because negative values adversely affect NMF, the residual matrix derivation unit 13 then replaces all negative values contained in the matrix difference Ro with 0. The matrix obtained by replacing the negative elements of Ro with 0 in this way is called the residual matrix R.

FIG. 4 is a conceptual diagram showing an example of the residual matrix R. The residual matrix derivation unit 13 calculates the matrix difference Ro between the input matrix A (FIG. 2) and the matrix product W_m H_m (FIG. 3), and generates the residual matrix R (FIG. 4) by replacing the negative elements of Ro with 0. The residual matrix derivation unit 13 stores the generated residual matrix R in the residual matrix storage unit 14.
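The derivation of the residual matrix R reduces to an elementwise subtraction followed by clipping at 0. A minimal sketch, assuming a single matrix product and the hypothetical function name residual_matrix:

```python
def residual_matrix(target, product):
    """R = max(target - product, 0), elementwise: the matrix difference Ro
    with its negative entries replaced by 0, so R stays valid NMF input."""
    return [[max(t - p, 0.0) for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target, product)]

A = [[5.0, 5.0, 1.0],
     [4.0, 4.0, 0.0]]
WH = [[4.5, 5.5, 0.0],
      [4.0, 3.5, 0.2]]
R = residual_matrix(A, WH)
# R == [[0.5, 0.0, 1.0], [0.0, 0.5, 0.0]]
```

How several accumulated matrix products would be combined is not spelled out in this passage, so the sketch handles only one.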

The residual matrix refinement unit 15 acquires the residual matrix R from the residual matrix storage unit 14. The residual matrix refinement unit 15 generates a refinement residual matrix R* by removing the noise contained in the residual matrix R, and stores the generated refinement residual matrix R* in the refinement residual matrix storage unit 16.

The residual matrix refinement unit 15 generates the refinement residual matrix R* by replacing with 0 every element of the residual matrix R that is below a threshold θ₁ (also referred to as the first threshold). Alternatively, the residual matrix refinement unit 15 may subtract a predetermined value from each element of the residual matrix R and replace the elements that become negative with 0 to generate R*. As a further alternative, the residual matrix refinement unit 15 may generate R* using a sparse estimation method such as Lasso (Least Absolute Shrinkage and Selection Operator).
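The two simplest refinement options described above can be sketched as follows, assuming R is a nonnegative matrix stored as a list of rows; hard_threshold and shrink are hypothetical names. For a nonnegative matrix, subtracting a fixed value and clipping at 0 coincides with the soft-thresholding operator that underlies Lasso.

```python
def hard_threshold(R, theta1):
    """Replace every element below the first threshold theta1 with 0."""
    return [[r if r >= theta1 else 0.0 for r in row] for row in R]

def shrink(R, delta):
    """Subtract delta from every element and replace negatives with 0
    (soft-thresholding, for nonnegative entries)."""
    return [[max(r - delta, 0.0) for r in row] for row in R]

R = [[0.5, 0.0, 2.0],
     [0.1, 1.5, 0.0]]
R_hard = hard_threshold(R, theta1=1.0)  # [[0.0, 0.0, 2.0], [0.0, 1.5, 0.0]]
R_soft = shrink(R, delta=0.5)           # [[0.0, 0.0, 1.5], [0.0, 1.0, 0.0]]
```

Hard thresholding keeps surviving values intact, whereas shrinkage also reduces them by delta; which behavior suits a given data stream is a tuning choice the document leaves open.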

Alternatively, the residual matrix refinement unit 15 may thin out the elements subject to noise removal before generating the refinement residual matrix. In this case, the residual matrix refinement unit 15 refers to the matrix product storage unit 12 and acquires the matrix product W_m H_m stored there. The residual matrix refinement unit 15 derives the positions (also referred to as the first positions) of the elements of the acquired matrix product W_m H_m whose values are equal to or greater than a threshold θ₂ (also referred to as the second threshold). The residual matrix refinement unit 15 then generates the refinement residual matrix R* by removing noise from the elements of the residual matrix R located at the first positions L_WH.

For example, the residual matrix refinement unit 15 may treat a specific row or column of the matrix product as a group, subtract a predetermined value from all the elements belonging to that group, and replace the elements that become negative with 0. In other words, the residual matrix refinement unit 15 may set, for each group, the thinning amount of the elements subject to noise removal. When specific rows or columns of the matrix product are treated as groups, the residual matrix refinement unit 15 may set this thinning amount using a group-wise sparse estimation method such as Group Lasso.
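One way to realize group-wise thinning is group soft-thresholding, the proximal operator of the Group Lasso penalty. The sketch below assumes that each row of R (corresponding to a row of the matrix product) forms one group and that lam is a hypothetical tuning parameter; the document fixes neither choice, and group_shrink_rows is an illustrative name.

```python
import math

def group_shrink_rows(R, lam):
    """Group soft-thresholding with each row treated as one group.
    A row whose L2 norm is at most lam is set to 0 as a whole; any other
    row is scaled by the factor 1 - lam / ||row||_2."""
    out = []
    for row in R:
        norm = math.sqrt(sum(r * r for r in row))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * r for r in row])
    return out

R = [[3.0, 4.0],
     [0.1, 0.1]]
refined = group_shrink_rows(R, lam=1.0)
# First row (L2 norm 5) is scaled by 0.8; second row (norm ~0.14) becomes all zeros.
```

Unlike elementwise thresholding, a weak group is removed entirely, which matches the description of Group Lasso as driving all variables of a group to 0 simultaneously.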

FIG. 5 is an example in which 1 is set in the cells at the first positions L_WH derived by the residual matrix refinement unit 15. Comparing only the elements of the residual matrix R at the first positions L_WH with the first threshold requires less computation than comparing all the elements of R with the first threshold. Instead of a matrix such as the one in FIG. 5, the residual matrix refinement unit 15 may represent the cells at the first positions L_WH using a list of cells or a hash.
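The position-restricted refinement, with the first positions L_WH held in a hash set rather than a 0/1 matrix, might look like the following sketch. The function names and toy values are hypothetical.

```python
def first_positions(WH, theta2):
    """Positions (i, j) where the matrix product W_m H_m is at least theta2,
    kept in a set (a hash) instead of a 0/1 mask matrix."""
    return {(i, j) for i, row in enumerate(WH)
            for j, v in enumerate(row) if v >= theta2}

def refine_at_positions(R, positions, theta1):
    """Set elements of R below theta1 to 0, but only at the given positions;
    all other elements are copied unchanged."""
    refined = [row[:] for row in R]
    for i, j in positions:
        if refined[i][j] < theta1:
            refined[i][j] = 0.0
    return refined

WH = [[4.0, 0.0],
      [3.0, 0.0]]
R = [[0.2, 0.3],
     [1.5, 0.0]]
positions = first_positions(WH, theta2=1.0)              # {(0, 0), (1, 0)}
R_star = refine_at_positions(R, positions, theta1=0.5)   # [[0.0, 0.3], [1.5, 0.0]]
```

Element (0, 1) of R stays at 0.3 even though it is below θ₁, because it is not at a first position; only cells where a topic was already captured are cleaned.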

FIG. 6 is an example of the refinement residual matrix R* generated based on the residual matrix R of FIG. 4. In the example of FIG. 6, the elements at the first positions L_WH whose values are below the first threshold are filled with white; white-filled cells have the value 0. The refinement residual matrix R* excludes the small values left in cells whose topics were acquired in the previous iteration. In other words, using R*, such minute values can be excluded from the topic analysis of subsequent iterations, so errors produced by topic analysis can be distinguished from rare events. That is, the refinement residual matrix R* increases the chance of acquiring rare events as topics.

The residual matrix boost unit 17 acquires the refinement residual matrix R* from the refinement residual matrix storage unit 16. The residual matrix boost unit 17 generates, by the following procedure, an enhanced residual matrix R_L in which the values of specific rows and columns of the acquired refinement residual matrix R* are emphasized.

First, the residual matrix boost unit 17 calculates the L¹ norm of each row of the refinement residual matrix R*. The L¹ norm of a row is the sum of the absolute values of its elements. The residual matrix boost unit 17 generates a row reference vector Pr whose i-th element Pr[i] is the L¹ norm of the i-th row of R* (Equation 1), where I is the number of rows of R*.
Pr = (Pr[1], ..., Pr[I])   ... (1)
The residual matrix boost unit 17 treats the row reference vector Pr as a probability distribution in which row i is selected with weight Pr[i], and selects one row i. The selected row is referred to as the reference row i*. The residual matrix boost unit 17 may select the reference row i* at random according to this distribution, or may select the row i with the largest L¹ norm as the reference row i*.

Similarly, the residual matrix boost unit 17 calculates the L¹ norm of each column of the refinement residual matrix R*, and generates a column reference vector Pc whose j-th element Pc[j] is the L¹ norm of the j-th column of R* (Equation 2), where J is the number of columns of R*.
Pc = (Pc[1], ..., Pc[J])   ... (2)
The residual matrix boost unit 17 treats the column reference vector Pc as a probability distribution in which column j is selected with weight Pc[j], and selects one column j. The selected column is referred to as the reference column j*. The residual matrix boost unit 17 may select the reference column j* at random according to this distribution, or may select the column with the largest L¹ norm as the reference column j*.

In the above, the reference row and reference column are selected based on the L¹ norm, but the residual matrix boost unit 17 may select the reference row or reference column based on any statistic that can be calculated for each row or column. For example, the residual matrix boost unit 17 may select the reference row or reference column using the L² norm.

When the L² norm is used to select the reference row or reference column, rows and columns containing larger values in the refinement residual matrix R* are more likely to be selected as the reference row i* and reference column j*.

FIG. 7 is a conceptual diagram showing an example of the L¹ norm values calculated in the row and column directions of the refinement residual matrix R* of FIG. 6. The number to the right of each row in FIG. 7 is the L¹ norm of that row, and the number above each column in FIG. 7 is the L¹ norm of that column. That is, the numbers to the right of the rows in FIG. 7 are the elements of the row reference vector Pr, and the numbers above the columns in FIG. 7 are the elements of the column reference vector Pc.

FIG. 8 is a conceptual diagram showing an example in which the row and column corresponding to the reference row i* and reference column j* (hereinafter, the selected row and selected column) are selected from the refinement residual matrix R* based on the L¹ norm values of FIG. 7. In the example of FIG. 8, the first row of R* is selected as the selected row i*, and the second column of R* is selected as the selected column j*.

Next, the residual matrix boost unit 17 generates a selected row vector A[i*, :] consisting of the elements of the selected row i* of the input matrix A, and calculates the similarity sim(i*, i) between A[i*, :] and every row of the input matrix A. Similarly, the residual matrix boost unit 17 generates a selected column vector R[:, j*] consisting of the elements of the selected column j* of the residual matrix R, and calculates the similarity sim(j*, j) between R[:, j*] and every column of the input matrix A. For example, the residual matrix boost unit 17 uses cosine similarity as sim(·, ·), although it may calculate the similarity by a method other than cosine similarity. In the second and subsequent iterations, the similarity sim(·, ·) may be calculated on the enhanced residual matrix under analysis.
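A sketch of this similarity computation with cosine similarity, following the description above (the selected row is taken from A, the selected column from R, and both are compared against A). The helper names are hypothetical, and defining the similarity of an all-zero vector as 0 is an assumption made to avoid division by zero.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def row_similarities(A, i_star):
    """sim(i*, i) between the selected row vector A[i*, :] and every row of A."""
    return [cosine(A[i_star], row) for row in A]

def column_similarities(A, R, j_star):
    """sim(j*, j) between the selected column vector R[:, j*] and every column of A."""
    selected = [row[j_star] for row in R]
    return [cosine(selected, list(col)) for col in zip(*A)]

A = [[1.0, 0.0],
     [2.0, 0.0],
     [0.0, 3.0]]
R = [[0.0, 0.0],
     [1.0, 0.0],
     [0.0, 0.0]]
d_rows = row_similarities(A, 0)        # [1.0, 1.0, 0.0]
d_cols = column_similarities(A, R, 0)  # [2 / sqrt(5), 0.0]
```

These two lists are exactly the diagonals of the matrices D_c and D_r described next.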

The residual matrix boost unit 17 generates an I-by-I diagonal matrix D_c whose diagonal elements D_c[i, i] are the similarities sim(i*, i) and whose off-diagonal elements are 0. Similarly, the residual matrix boost unit 17 generates a J-by-J diagonal matrix D_r whose diagonal elements D_r[j, j] are the similarities sim(j*, j) and whose off-diagonal elements are 0.

The residual matrix boost unit 17 then calculates the matrix product D_c R D_r of the diagonal matrix D_c, the residual matrix R, and the diagonal matrix D_r. This matrix product D_c R D_r is the enhanced residual matrix R_L. The residual matrix boost unit 17 stores the calculated enhanced residual matrix R_L in the enhanced residual matrix storage unit 18.
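Because D_c and D_r are diagonal, the product D_c R D_r does not require the diagonal matrices to be built: element (i, j) of R_L is simply D_c[i, i] · R[i, j] · D_r[j, j]. A minimal sketch, with the function name and values chosen for illustration:

```python
def enhanced_residual(R, dc, dr):
    """R_L = D_c R D_r with D_c = diag(dc) and D_r = diag(dr).
    Left-multiplying by a diagonal matrix scales row i by dc[i];
    right-multiplying scales column j by dr[j]."""
    return [[dc[i] * r * dr[j] for j, r in enumerate(row)]
            for i, row in enumerate(R)]

R = [[2.0, 4.0],
     [1.0, 3.0]]
dc = [1.0, 0.5]   # sim(i*, i) for each row i
dr = [0.5, 1.0]   # sim(j*, j) for each column j
R_L = enhanced_residual(R, dc, dr)
# R_L == [[1.0, 4.0], [0.25, 1.5]]
```

This costs O(I·J) operations instead of the O(I²·J + I·J²) of two dense matrix products.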

The diagonal matrix D_c has the effect of emphasizing the values in rows that are similar, in the input matrix A, to the selected row i*, and attenuating the values in the other rows. Similarly, the diagonal matrix D_r has the effect of emphasizing the values in columns that are similar, in the input matrix A, to the selected column j*, and attenuating the values in the other columns.

FIG. 9 is an example of the enhanced residual matrix R_L generated by the residual matrix boosting unit 17. In the example of FIG. 9, the values in the other columns, which have low similarity to column j*, are attenuated; as a result, the values in the first row are strongly attenuated (from M to S). The other rows, which have low similarity to row i* (the first row), are also attenuated; however, because the original values in the third and fourth rows (in the third column) are small (S), the attenuation has little effect on them.

As described above, the refinement residual matrix R* generated by the residual matrix refinement unit 15 emphasizes values other than those of the topics already acquired by the topic analysis unit 11, that is, the values of topics not yet acquired. Furthermore, the residual matrix refinement unit 15 removes errors arising from topic acquisition by squashing to 0 the value of each cell whose residual, caused by a topic acquired by the topic analysis unit 11, is smaller than a predetermined threshold. As a result, in subsequent iterations the topic analysis unit 11 is more likely to acquire, as a topic, a part that has not yet been acquired. That is, the possibility of acquiring rare topics that could not be acquired in the first topic analysis increases.

In the second and subsequent iterations, the topic analysis unit 11 receives the enhanced residual matrix R_L as input, and the above-described processing is repeated until a predetermined condition is satisfied, for example, until a predetermined number k of topics has been obtained or the enhanced residual matrix R_L becomes empty.

The above is the description of the configuration of the analyzer 1 of the present embodiment. In addition, the structure of the analyzer 1 of this embodiment is not limited to the above-mentioned structure. For example, the function of one component may be assigned to another component, or the function of one component may be shared with another component. Also, for example, a single component may be configured to have a function shared by separate components. Also, for example, another function may be added to the function of each component.

(Operation)
Next, the operation of the analyzer 1 of the present embodiment (also referred to as an analysis method) will be described with reference to the drawings. FIG. 10 is a flowchart of the first iteration by the analyzer 1, and FIG. 11 is a flowchart of the second and subsequent iterations. In the description along the flowcharts of FIGS. 10 and 11, the analyzer 1 is treated as the subject of each operation.

In FIG. 10, first, the analyzer 1 receives an input matrix A (step S11).

Next, the analyzer 1 performs topic analysis on the input matrix A (step S12). At this time, the analyzer 1 generates a dictionary matrix H_1 and an index matrix W_1 of the input matrix A.

Then, the analyzer 1 calculates the matrix product W_1H_1 of the index matrix W_1 and the dictionary matrix H_1 (step S13).
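Supplementary Note 2 states that the topic analysis may be performed by nonnegative matrix factorization (NMF). As a minimal sketch of steps S12 and S13, assuming NMF with the Lee-Seung multiplicative update rules for the Frobenius-norm objective (an assumed choice of algorithm; the document does not specify one), the index matrix W_1 (I × k) and the dictionary matrix H_1 (k × J) can be computed in plain Python as follows. The function names, the number of topics `k`, the iteration count, the seed and the small constant `eps` are all hypothetical.

```python
import random

Matrix = list[list[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def transpose(X: Matrix) -> Matrix:
    return [list(c) for c in zip(*X)]


def nmf(A: Matrix, k: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-9) -> tuple[Matrix, Matrix]:
    """Factor a nonnegative I x J matrix A as W (I x k) @ H (k x J)."""
    rng = random.Random(seed)
    I, J = len(A), len(A[0])
    # Strictly positive random start so multiplicative updates can move every entry.
    W = [[rng.random() + eps for _ in range(k)] for _ in range(I)]
    H = [[rng.random() + eps for _ in range(J)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H)   (element-wise multiply and divide)
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[r][c] * num[r][c] / (den[r][c] + eps) for c in range(J)]
             for r in range(k)]
        # W <- W * (A H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[r][c] * num[r][c] / (den[r][c] + eps) for c in range(k)]
             for r in range(I)]
    return W, H  # index matrix, dictionary matrix
```

Step S13 is then `matmul(W, H)`. The multiplicative form keeps every entry of W and H nonnegative, which is why it is a common default for NMF; any other NMF solver would serve the same role.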

Next, the analyzer 1 calculates the matrix difference Ro = A - W_1H_1 between the input matrix A and the matrix product W_1H_1, and generates a residual matrix R by replacing the negative elements of Ro with 0 (step S14).
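Step S14 amounts to an element-wise subtraction followed by clipping at zero. A minimal plain-Python sketch, assuming both matrices are lists of lists of the same shape (the function name `residual_matrix` is hypothetical):

```python
Matrix = list[list[float]]


def residual_matrix(A: Matrix, WH: Matrix) -> Matrix:
    """Ro = A - WH, then every negative element of Ro is replaced with 0."""
    return [[max(a - p, 0.0) for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(A, WH)]
```

Negative entries of Ro are cells that the reconstruction W_1H_1 overestimates; clipping them keeps R nonnegative, so it can be factored again in a later iteration.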

Then, for the matrix product W_1H_1, the analyzer 1 derives a first position L_WH corresponding to the positions of the cells whose values are equal to or greater than a threshold θ_2 (step S15).

Next, the analyzer 1 generates a refinement residual matrix R* in which each cell of the residual matrix R at the first position L_WH whose element is equal to or less than a threshold θ_1 is replaced with 0 (step S16).
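Steps S15 and S16 can be sketched as follows, assuming the thresholds θ_1 and θ_2 are given as plain floats (`theta1`, `theta2`) and the first position L_WH is represented as a set of (row, column) index pairs; these representations and the function names are hypothetical.

```python
Matrix = list[list[float]]


def first_positions(WH: Matrix, theta2: float) -> set[tuple[int, int]]:
    """Step S15: cells of the matrix product whose value is >= theta2."""
    return {(i, j) for i, row in enumerate(WH)
            for j, v in enumerate(row) if v >= theta2}


def refine(R: Matrix, WH: Matrix, theta1: float, theta2: float) -> Matrix:
    """Step S16: zero the residual at first positions where it is <= theta1."""
    L_WH = first_positions(WH, theta2)
    return [[0.0 if (i, j) in L_WH and v <= theta1 else v
             for j, v in enumerate(row)]
            for i, row in enumerate(R)]
```

The intuition is that a cell already well explained by an acquired topic (large W_1H_1) but left with only a small residual is treated as noise from that topic, while large residuals and cells outside L_WH are kept.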

Next, the analyzer 1 refers to the input matrix A and the refinement residual matrix R* to generate an enhanced residual matrix R_L in which the elements of specific columns and rows of the input matrix A are enhanced (step S17).

Then, the analyzer 1 stores the generated enhanced residual matrix R_L in the enhanced residual matrix storage unit 18 (step S18). After step S18, the process proceeds to A of the flowchart of FIG. 11.

The above is the description of the first iteration by the analyzer 1 along the flowchart of FIG. 10.

Subsequently, the second and subsequent iterations by the analyzer 1 will be described along the flowchart of FIG. 11, taking the m-th iteration as an example (m is an integer of 2 or more).

In FIG. 11, first, the analyzer 1 performs topic analysis on the enhanced residual matrix R_L stored in the enhanced residual matrix storage unit 18 (step S21). At this time, the analyzer 1 generates a dictionary matrix H_m and an index matrix W_m.

Next, the analyzer 1 calculates the matrix product W_mH_m of the index matrix W_m and the dictionary matrix H_m (step S22).

Next, the analyzer 1 calculates the matrix difference Ro = R_L - W_mH_m between the enhanced residual matrix R_L and the matrix product W_mH_m, and generates a residual matrix R by replacing the negative elements of Ro with 0 (step S23).

Then, for the matrix product W_mH_m, the analyzer 1 derives a first position L_WH corresponding to the positions of the cells whose elements are equal to or greater than the threshold θ_2 (step S24).

Next, the analyzer 1 generates a refinement residual matrix R* in which each cell of the residual matrix R at the first position L_WH whose element is equal to or less than the threshold θ_1 is replaced with 0 (step S25).

Next, the analyzer 1 refers to the residual matrix R and the refinement residual matrix R* to generate an enhanced residual matrix R_L in which the elements of specific columns and rows of the residual matrix R are emphasized (step S26).

Then, the analyzer 1 stores the generated enhanced residual matrix R_L in the enhanced residual matrix storage unit 18 (step S27).

If the predetermined condition is not satisfied (No in step S28), the process returns to step S21 to execute the next iteration; if the predetermined condition is satisfied (Yes in step S28), the process ends.
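The control flow of steps S21 to S28 can be sketched as a loop. In this minimal sketch, `step` is a hypothetical callable standing in for steps S21 to S26 (topic analysis, residual derivation, refinement and boosting); it returns the enhanced residual matrix R_L and the number of topics acquired in that iteration. The stop condition follows the examples given earlier in the text (k topics obtained, or R_L empty, taken here to mean that every element is zero); the `max_iters` safety guard and the list used as storage are added assumptions.

```python
from typing import Callable

Matrix = list[list[float]]
# Hypothetical: one iteration maps the analysis target to (R_L, topics acquired).
Step = Callable[[Matrix], tuple[Matrix, int]]


def is_empty(M: Matrix, tol: float = 0.0) -> bool:
    """Treat a matrix whose elements are all (near) zero as empty."""
    return all(abs(v) <= tol for row in M for v in row)


def run_iterations(A: Matrix, step: Step, k: int, max_iters: int = 100):
    storage: list[Matrix] = []  # stands in for the enhanced residual matrix storage unit
    target, topics = A, 0
    for _ in range(max_iters):
        R_L, new_topics = step(target)        # steps S21 to S26
        storage.append(R_L)                   # step S27
        topics += new_topics
        if topics >= k or is_empty(R_L):      # step S28: predetermined condition
            break
        target = R_L                          # next iteration analyzes R_L
    return storage, topics
```

Injecting `step` as a parameter keeps the loop independent of the particular topic analysis method used in each iteration.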

The above is the description along the flowchart of FIG. 11. The process along the flowchart of FIG. 11 is an example, and the operation of the analyzer 1 of this embodiment is not limited to it. For example, any step may be divided into a plurality of steps, processing divided into separate steps may be performed as a single step, and another process may be added to each step.

As described above, the analysis device of the present embodiment generates an enhanced residual matrix in which values other than the acquired topic are emphasized. The analyzer according to the present embodiment can emphasize a portion that has not been acquired, even if the residual generated by the acquired topic has a large value.

According to the present embodiment, repeating the topic analysis increases the chance that a pattern not yet acquired as a topic is acquired in a later topic analysis. As a result, rare topics that cannot be acquired in an earlier topic analysis are more likely to be acquired in a subsequent one. That is, according to the present embodiment, it is possible to discover patterns relating not only to high-frequency and medium-frequency events but also to low-frequency events.

Second Embodiment
Next, an analyzer according to a second embodiment of the present invention will be described with reference to the drawings. The analyzer of the present embodiment is a simplification of the configuration of the analyzer 1 of the first embodiment.

FIG. 12 is a block diagram showing the configuration of the analyzer 2 of this embodiment. As illustrated in FIG. 12, the analyzer 2 includes a topic analysis unit 21, a matrix product storage unit 22, a residual matrix derivation unit 23, a residual matrix refinement unit 25, a residual matrix boost unit 27, and an enhanced residual matrix storage unit 28. The connection lines between the components are an example, and the connections between the components are not limited to them.

The topic analysis unit 21 receives an input matrix as the analysis target matrix in the first iteration. In the second and subsequent iterations, the topic analysis unit 21 receives, as the analysis target matrix, the enhanced residual matrix generated in the previous iteration.

The topic analysis unit 21 performs topic analysis on the input analysis target matrix to generate a dictionary matrix storing topics and an index matrix indicating which topics are included and to what extent. The topic analysis unit 21 calculates a matrix product of the generated index matrix and the dictionary matrix. The topic analysis unit 21 stores the calculated matrix product in the matrix product storage unit 22.

The matrix product storage unit 22 stores the matrix product calculated by the topic analysis unit 21.

The residual matrix deriving unit 23 receives the analysis target matrix as an input, and refers to the matrix product storage unit 22 to acquire the matrix product corresponding to the analysis target matrix. The residual matrix deriving unit 23 calculates the matrix difference between the analysis target matrix and the matrix product generated from it, and generates a residual matrix in which the negative elements of the matrix difference are replaced with 0. The residual matrix deriving unit 23 outputs the generated residual matrix to the residual matrix refinement unit 25.

The residual matrix refinement unit 25 obtains the residual matrix from the residual matrix derivation unit 23. The residual matrix refinement unit 25 generates a refinement residual matrix by removing noise included in the residual matrix. The residual matrix refinement unit 25 outputs the derived refinement residual matrix to the residual matrix boost unit 27.

For example, the residual matrix refinement unit 25 derives the refinement residual matrix by replacing elements below a first threshold with 0. Alternatively, the residual matrix refinement unit 25 may derive the refinement residual matrix by subtracting a predetermined value from each element of the residual matrix and replacing the elements that become negative with 0.
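The two noise-removal options above can be sketched as follows, assuming lists of lists of floats. The names `hard_threshold` and `soft_threshold` are hypothetical. Interpreting the second option as subtracting the predetermined value from every element and keeping the reduced values (soft thresholding) is an assumption, since the text does not say whether the surviving elements keep their original values.

```python
Matrix = list[list[float]]


def hard_threshold(R: Matrix, theta1: float) -> Matrix:
    """Elements below the first threshold theta1 become 0; others are unchanged."""
    return [[v if v >= theta1 else 0.0 for v in row] for row in R]


def soft_threshold(R: Matrix, c: float) -> Matrix:
    """Subtract c from every element; elements that become negative become 0
    (assumed soft-thresholding reading of the second option)."""
    return [[max(v - c, 0.0) for v in row] for row in R]
```

Hard thresholding keeps large residuals intact, whereas soft thresholding also shrinks them by c, which biases the surviving values downward but changes them continuously with the input.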

For example, the residual matrix refinement unit 25 refers to the matrix product storage unit 22 to acquire the stored matrix product, and derives first positions corresponding to the positions of the elements of the acquired matrix product that are equal to or greater than a second threshold. The residual matrix refinement unit 25 then derives the refinement residual matrix by replacing with 0 those elements of the residual matrix at the first positions that are equal to or less than the first threshold.

The residual matrix boost unit 27 receives a refinement residual matrix as an input. The residual matrix boost unit 27 generates an enhanced residual matrix in which elements of specific rows and columns of the refinement residual matrix are enhanced. The residual matrix boost unit 27 stores the generated enhanced residual matrix in the enhanced residual matrix storage unit 28.

The enhanced residual matrix storage unit 28 stores the enhanced residual matrix generated by the residual matrix boost unit 27.

The above is the description of the configuration of the analyzer 2 of the present embodiment.

[Residual matrix refinement unit]
Next, the detailed configuration of the residual matrix refinement unit 25 included in the analysis device 2 will be described using the drawings. The following shows an example of deriving a first position and removing noise from the element of the position corresponding to the first position in the residual matrix.

FIG. 13 is a block diagram showing an example of the configuration of the residual matrix refinement unit 25. As illustrated in FIG. 13, the residual matrix refinement unit 25 includes an input unit 51, a first cell derivation unit 52, a refinement residual matrix generation unit 53, and an output unit 54. In FIG. 13, the connections between the components of the residual matrix refinement unit 25 are omitted. Each component in FIG. 13 may be shared with another component or divided, and another component may be added.

The input unit 51 receives, as inputs, the matrix product stored in the matrix product storage unit 22 and the residual matrix of the analysis target matrix from the residual matrix derivation unit 23. The input unit 51 outputs the matrix product to the first cell derivation unit 52 and the residual matrix to the refinement residual matrix generation unit 53.

The first cell derivation unit 52 receives the matrix product from the input unit 51 and derives the positions (also referred to as first positions) of the cells (also referred to as first cells) of the matrix product whose elements are equal to or greater than the second threshold. The first cell derivation unit 52 outputs the derived first positions to the refinement residual matrix generation unit 53.

The refinement residual matrix generation unit 53 acquires the residual matrix and generates a refinement residual matrix in which, among the cells of the residual matrix at the first positions, those whose elements are equal to or less than the first threshold are replaced with 0. The refinement residual matrix generation unit 53 outputs the generated refinement residual matrix to the output unit 54.

The output unit 54 outputs the refinement residual matrix to the residual matrix boost unit 27.

The above is the description of the configuration of the residual matrix refinement unit 25.

[Residual matrix boost unit]
Next, the detailed configuration of the residual matrix boost unit 27 included in the analyzer 2 will be described with reference to the drawings. FIG. 14 is a block diagram showing an example of the configuration of the residual matrix boost unit 27. As shown in FIG. 14, the residual matrix boost unit 27 includes an input unit 71, a statistic calculation unit 72, a selection unit 73, a diagonal matrix generation unit 74, an enhanced residual matrix calculation unit 75, and an output unit 76. In FIG. 14, the connections between the components of the residual matrix boost unit 27 are omitted. Each component in FIG. 14 may be shared with another component or divided, and another component may be added.

The residual matrix is input from the residual matrix derivation unit 23 to the input unit 71, and the refinement residual matrix is input from the residual matrix refinement unit 25. The input unit 71 outputs the refinement residual matrix to the statistic calculation unit 72, and outputs the residual matrix to the selection unit 73, the diagonal matrix generation unit 74, and the enhanced residual matrix calculation unit 75.

The statistic calculation unit 72 receives the refinement residual matrix from the input unit 71 and calculates a statistic for each row and each column of the refinement residual matrix, for example the L1 norm or the L2 norm. The statistic calculation unit 72 generates a row reference vector whose elements are the statistics of the rows and a column reference vector whose elements are the statistics of the columns, and outputs both vectors to the selection unit 73.

The selection unit 73 receives the row reference vector and the column reference vector from the statistic calculation unit 72, and selects one row and one column (the reference row and the reference column) of the refinement residual matrix on the basis of the statistics in those vectors. For example, the selection unit 73 selects the reference row and the reference column at random from the rows and columns. Alternatively, the selection unit 73 may select the row and the column with the largest statistics as the reference row and the reference column.
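A minimal sketch of the statistic calculation unit 72 and the selection unit 73, assuming the refinement residual matrix is a list of lists of floats. The function names and the `norm` parameter are hypothetical. Selecting the row and column with the largest statistic follows the text; for the random option, weighting the choice by the statistics is an assumption, since the document does not specify the distribution.

```python
import math
import random


def row_col_stats(Rs, norm: str = "l1"):
    """Row and column reference vectors: L1 (default) or L2 norm per row/column."""
    def stat(vec):
        if norm == "l1":
            return sum(abs(v) for v in vec)
        return math.sqrt(sum(v * v for v in vec))
    row_ref = [stat(row) for row in Rs]
    col_ref = [stat(col) for col in zip(*Rs)]
    return row_ref, col_ref


def select_reference(row_ref, col_ref, rng=None):
    """Return (reference row, reference column) indices.

    Without rng: argmax of each reference vector (ties go to the first index).
    With rng: sample proportionally to the statistics (assumed weighting);
    a matrix whose statistics are all zero would need separate handling.
    """
    if rng is None:
        i = max(range(len(row_ref)), key=row_ref.__getitem__)
        j = max(range(len(col_ref)), key=col_ref.__getitem__)
        return i, j
    i = rng.choices(range(len(row_ref)), weights=row_ref)[0]
    j = rng.choices(range(len(col_ref)), weights=col_ref)[0]
    return i, j
```

Weighting by the norms makes rows and columns with large unexplained residuals likely to be picked, while still letting smaller ones be selected in some iterations.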

The selection unit 73 selects a row (also referred to as a selected row) corresponding to the reference row from the residual matrix, and generates a vector (also referred to as a selected row vector) composed of the values of the selected row. Similarly, the selection unit 73 selects a column (also referred to as a selected column) corresponding to the reference column from the residual matrix, and generates a vector (also referred to as a selected column vector) composed of the values of the selected column. The selection unit 73 outputs the selected row vector and the selected column vector to the diagonal matrix generation unit 74.

The diagonal matrix generation unit 74 receives the selected row vector and the selected column vector from the selection unit 73, and the residual matrix from the input unit 71. For every row of the residual matrix, the diagonal matrix generation unit 74 calculates the similarity between that row and the selected row vector. Similarly, for every column of the residual matrix, the diagonal matrix generation unit 74 calculates the similarity between that column and the selected column vector.

The diagonal matrix generation unit 74 generates a matrix (hereinafter referred to as a first diagonal matrix) in which the similarities calculated for the rows are set as the diagonal elements and the off-diagonal elements are set to 0. Similarly, the diagonal matrix generation unit 74 generates a matrix (hereinafter referred to as a second diagonal matrix) in which the similarities calculated for the columns are set as the diagonal elements and the off-diagonal elements are set to 0. The diagonal matrix generation unit 74 outputs the generated first diagonal matrix and second diagonal matrix to the enhanced residual matrix calculation unit 75.

The enhanced residual matrix calculation unit 75 receives the residual matrix from the input unit 71, and receives the first diagonal matrix and the second diagonal matrix from the diagonal matrix generation unit 74. The enhanced residual matrix calculation unit 75 generates an enhanced residual matrix by calculating the matrix product of the first diagonal matrix, the residual matrix, and the second diagonal matrix. The enhanced residual matrix calculation unit 75 outputs the generated enhanced residual matrix to the output unit 76.
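Because the first and second diagonal matrices have zero off-diagonal elements, the triple product reduces to an element-wise scaling. Writing the first diagonal matrix as D_c, the second as D_r, and the residual matrix as R (the notation of the first embodiment), each element of the enhanced residual matrix R_L is:

```latex
(R_L)_{ij} = (D_c R D_r)_{ij}
           = \sum_{p=1}^{I} \sum_{q=1}^{J} (D_c)_{ip} \, R_{pq} \, (D_r)_{qj}
           = (D_c)_{ii} \, R_{ij} \, (D_r)_{jj}
```

since (D_c)_{ip} = 0 for p ≠ i and (D_r)_{qj} = 0 for q ≠ j. An implementation could therefore multiply each element by the similarity of its row and of its column without materializing the I × I and J × J diagonal matrices; this is an observation about the arithmetic, not a statement about how the analyzer 2 is implemented.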

The output unit 76 stores the enhanced residual matrix generated by the enhanced residual matrix calculation unit 75 in the enhanced residual matrix storage unit 28.

The above is the description of the configuration of the residual matrix boosting unit 27.

As described above, in the analyzer of the present embodiment, low-frequency events are emphasized as the iterations are repeated, which makes it easier to find topics related to low-frequency events.

(Hardware configuration)
Here, a hardware configuration for realizing the analyzer according to each embodiment of the present invention will be described with reference to the drawings. In the following, a computer will be mentioned as a hardware configuration for realizing the analysis device of each embodiment. However, the hardware configuration for realizing the analysis device of each embodiment is not limited to the following configuration.

FIG. 15 is a block diagram showing the configuration of a computer 90 as an example of the hardware configuration that implements the analysis device of each embodiment.

As shown in FIG. 15, the computer 90 includes a central processing unit 91 (CPU: Central Processing Unit), a first memory 92 (ROM: Read Only Memory), and a second memory 93 (RAM: Random Access Memory). The computer 90 also includes an internal storage device 94, an input / output connection circuit 95 (IOC: Input Output Circuit), and a network interface circuit 96 (NIC: Network Interface Circuit). The computer 90 is also connected to the input device 98 and the display device 99 via the input / output connection circuit 95. The computer 90 in FIG. 15 is a configuration example for realizing the analyzer of each embodiment, and does not limit the scope of the present invention.

The central processing unit 91 reads the program from the first memory 92. The central processing unit 91 controls the second memory 93, the internal storage device 94, the input/output connection circuit 95, and the network interface circuit 96 based on the read program. The central processing unit 91 may use the second memory 93 or the internal storage device 94 as a program storage area when realizing the functions of the analyzer of each embodiment.

The central processing unit 91 may read the program, using a storage medium reading device (not shown), from a storage medium in which the program is stored so as to be readable by the computer 90. The central processing unit 91 may also receive a program from an external device (not shown) via the input/output connection circuit 95, store the received program in the second memory 93, and operate based on the program stored in the second memory 93.

The first memory 92 is a non-volatile storage medium for storing programs executed by the central processing unit 91 and fixed data. The first memory 92 can be realized by, for example, a PROM (Programmable ROM) or a flash ROM.

The second memory 93 is a volatile storage medium for temporarily storing programs executed by the central processing unit 91 and data. The second memory 93 can be realized by, for example, a DRAM (Dynamic RAM).

The internal storage device 94 is a non-volatile storage medium for storing data and programs to be stored for a long time. The internal storage device 94 may be operated as a temporary storage device of the central processing unit 91. For example, the internal storage device 94 can be realized by a hard disk device, a magneto-optical disk device, a solid state drive (SSD), a disk array device, a flash memory, or the like.

The central processing unit 91 is operable based on a program stored in at least one of the first memory 92, the internal storage device 94, and the second memory 93. That is, the central processing unit 91 can operate using a non-volatile storage medium or a volatile storage medium.

The computer 90 may also be equipped with a disk drive (not shown) as needed. The disk drive is connected to the bus 97. For example, the disk drive mediates, between the central processing unit 91 and a recording medium (program recording medium, not shown), the reading of data and programs from the recording medium and the writing of processing results of the computer 90 to the recording medium. For example, the recording medium can be realized by an optical recording medium such as a CD (Compact Disc) or a DVD (Digital Versatile Disc). The recording medium may also be realized by a semiconductor recording medium such as a Universal Serial Bus (USB) memory or a Secure Digital (SD) card, a magnetic recording medium such as a flexible disk, or another recording medium.

The input / output connection circuit 95 is a circuit that mediates the exchange of data between the central processing unit 91 and input / output devices such as the input device 98 and the display device 99. That is, the input / output connection circuit 95 is an interface for connecting the computer 90 and peripheral devices based on the standards and specifications. For example, the input / output connection circuit 95 can be realized by an IO (Input Output Circuit) interface card, a USB (Universal Serial Bus) card, or the like.

The input device 98 is a device that receives input instructions from the operator of the computer 90. For example, the input device 98 is realized by a keyboard, a mouse, a touch panel, or the like.

The display device 99 is a device that provides the operator of the computer 90 with display information. For example, the display device 99 is realized by a liquid crystal display, a projector, or the like.

The network interface circuit 96 is a circuit that relays data exchange between an external device (not shown) and the computer via a network. That is, the network interface circuit 96 is an interface for connecting to an external system or apparatus through a network such as the Internet or an intranet. For example, the network interface circuit 96 is realized by a LAN (Local Area Network) card.

As described above, when the central processing unit 91 of the computer 90 operates based on a program, the functions of the analysis device of each embodiment can be realized. Note that the analysis device of each embodiment may be configured by hardware in which a plurality of constituent elements are combined. In addition, the components of the analyzer of each embodiment may be configured by at least one hardware circuit. In addition, the components of the analyzer of each embodiment may be configured by combining a plurality of hardware circuits. In addition, the components of the analysis device of each embodiment may be configured by a plurality of devices connected via a network.

The above is an example of the hardware configuration for enabling the analyzer according to each embodiment of the present invention. The hardware configuration in FIG. 15 is an example of the hardware configuration for realizing the analyzer according to each embodiment, and does not limit the scope of the present invention. Further, a program that causes a computer to execute the process related to the analyzer according to each embodiment is also included in the scope of the present invention. Furthermore, a program recording medium recording the program according to each embodiment is also included in the scope of the present invention.

Moreover, the components of the analyzer of each embodiment can be arbitrarily combined. The components of the analysis device of each embodiment may be realized by software or circuits.

Although the present invention has been described above with reference to the embodiments, the present invention is not limited to the above embodiments. Various modifications that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.
[Supplementary Note]
Some or all of the above embodiments may be described as in the following supplementary notes, but are not limited to the following.
(Supplementary Note 1)
Topic analysis means for generating, by performing topic analysis on an analysis target matrix, a dictionary matrix storing topics included in the analysis target matrix and an index matrix indicating the degree to which the analysis target matrix includes each topic, and for calculating a matrix product of the index matrix and the dictionary matrix;
Matrix product storage means in which the matrix product is stored;
Residual matrix deriving means for obtaining the analysis target matrix and at least one of the matrix products stored in the matrix product storage means, and for deriving a residual matrix corresponding to the difference between the analysis target matrix and the matrix product;
Residual matrix refinement means for obtaining the residual matrix and generating a refinement residual matrix by removing noise contained in the residual matrix;
Residual matrix boosting means for obtaining the analysis target matrix and the refinement residual matrix and for deriving, on the basis of them, an enhanced residual matrix in which elements including topics not yet acquired are emphasized; and
Enhanced residual matrix storage means in which the enhanced residual matrix derived by the residual matrix boosting means is accumulated.
(Supplementary Note 2)
The topic analysis means
The analyzer according to appendix 1, wherein the dictionary matrix and the index matrix are generated by performing nonnegative matrix factorization on the analysis target matrix.
(Supplementary Note 3)
The residual matrix deriving means
The analyzer according to appendix 1 or 2, wherein a matrix difference corresponding to the difference between the analysis target matrix and the matrix product is calculated, and the residual matrix is derived by replacing negative elements of the matrix difference with 0.
(Supplementary Note 4)
The residual matrix refinement means
The analyzer according to any one of appendices 1 to 3, wherein the refinement residual matrix is generated by replacing elements of the residual matrix below a first threshold with 0.
(Supplementary Note 5)
The residual matrix refinement means
The analyzer according to any one of appendices 1 to 3, wherein the refinement residual matrix is generated by subtracting a predetermined value from each element of the residual matrix and replacing elements that become negative with 0.
(Supplementary Note 6)
The residual matrix refinement means
The analyzer according to any one of appendices 1 to 3, wherein the refinement residual matrix is generated using a sparse estimation method.
(Supplementary Note 7)
The residual matrix refinement means
Obtaining the matrix product from the matrix product storage means;
The analyzer according to any one of the above appendices, wherein the position of each element of the matrix product that is equal to or greater than a second threshold is derived as a first position, and noise is removed from the element at the first position in the residual matrix.
(Supplementary Note 8)
The residual matrix refinement means
Obtaining the matrix product from the matrix product storage means;
The analyzer according to appendix 7, wherein at least one of a specific row and a specific column selected from the matrix product is set as a group, and the first position is set for each group.
(Supplementary Note 9)
The residual matrix refinement means
The analyzer according to appendix 8, wherein the refinement residual matrix is generated by performing sparse estimation on the residual matrix group by group.
(Supplementary Note 10)
The residual matrix boosting means
Calculating statistics of each row and each column of the refinement residual matrix, and selecting one reference row and reference column from each of the rows and columns of the residual matrix based on the calculated statistics;
Selecting a selected row and a selected column corresponding to each of the reference row and the reference column from the residual matrix;
Generating a selected row vector having the value of the selected row as an element, and a selected column vector having the value of the selected column as an element;
Calculating the similarity between each of the generated selected row vector and the elements of the selected column vector and each of the elements of the analysis target matrix;
Generating a first diagonal matrix in which the similarity calculated for each row is set to a diagonal element, and a second diagonal matrix in which the similarity calculated for each column is set to a diagonal element;
The analysis device according to any one of appendices 1 to 9, wherein a matrix product of the first diagonal matrix, the analysis target matrix, and the second diagonal matrix is derived as the enhanced residual matrix.
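A sketch of the boosting steps of Supplementary Note 10. The source fixes neither the selection rule nor the similarity measure. This sketch assumes that the reference row and column are those with the largest statistic (L1 norm by default) and that similarity is cosine similarity. Because both outer factors are diagonal, the product diag(r) · X · diag(c) simply scales element (i, j) of X by r_i · c_j, so the code applies that scaling directly instead of building the diagonal matrices. All function names are hypothetical.

```python
import math


def l1_norm(v):
    return sum(abs(x) for x in v)


def column(matrix, j):
    return [row[j] for row in matrix]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def boost(target, residual, refined, statistic=l1_norm):
    """Return diag(row_sim) @ target @ diag(col_sim) as the enhanced residual matrix."""
    n_rows, n_cols = len(target), len(target[0])
    # Reference row/column: the one whose statistic in the refined residual is largest.
    ref_row = max(range(n_rows), key=lambda i: statistic(refined[i]))
    ref_col = max(range(n_cols), key=lambda j: statistic(column(refined, j)))
    # Selected row/column vectors are taken from the (unrefined) residual matrix.
    selected_row = residual[ref_row]
    selected_col = column(residual, ref_col)
    # Diagonal entries: similarity to each row/column of the analysis target matrix.
    row_sim = [cosine(selected_row, target[i]) for i in range(n_rows)]
    col_sim = [cosine(selected_col, column(target, j)) for j in range(n_cols)]
    # Diagonal matrices on both sides scale element (i, j) by row_sim[i] * col_sim[j].
    return [
        [row_sim[i] * target[i][j] * col_sim[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]


target = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
residual = [[0.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
enhanced = boost(target, residual, refined=residual)
print(enhanced)  # approximately [[0.0, 0.0], [0.0, 0.707], [0.0, 0.707]]
```

Rows 1 and 2 of the target point in the same direction as the selected row, so they are kept, while row 0, which does not, is suppressed. This is how elements related to the uncaptured pattern end up emphasized.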
(Supplementary Note 11)
The analyzer according to Supplementary Note 10, wherein the residual matrix boosting means calculates the L1 norm of each row and each column of the refined residual matrix as the statistic.
(Supplementary Note 12)
The analyzer according to Supplementary Note 10, wherein the residual matrix boosting means calculates the L2 norm of each row and each column of the refined residual matrix as the statistic.
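The choice between the L1 and L2 statistics can change which reference row is picked. In the hypothetical refined residual matrix below, the L1 norm favors the row with several moderate values, while the L2 norm favors the row with one dominant value. The helper names are illustrative only.

```python
import math


def l1_norm(v):
    return sum(abs(x) for x in v)


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def row_and_column_statistics(matrix, norm):
    """Apply `norm` to every row and every column of `matrix`."""
    rows = [norm(row) for row in matrix]
    cols = [norm([row[j] for row in matrix]) for j in range(len(matrix[0]))]
    return rows, cols


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


refined = [[3.0, 0.0, 0.0], [1.5, 1.5, 1.5]]
rows_l1, cols_l1 = row_and_column_statistics(refined, l1_norm)
rows_l2, cols_l2 = row_and_column_statistics(refined, l2_norm)
print(argmax(rows_l1), argmax(rows_l2))  # 1 0
```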
(Supplementary Note 13)
The analyzer according to any one of Supplementary Notes 1 to 12, further comprising residual matrix storage means for storing the residual matrix.
(Supplementary Note 14)
The analyzer according to any one of Supplementary Notes 1 to 13, further comprising refined residual matrix storage means for storing the refined residual matrix.
(Supplementary Note 15)
The analyzer according to any one of Supplementary Notes 1 to 14, wherein the topic analysis means repeats the topic analysis on the analysis target matrix, including the enhanced residual matrix stored in the enhanced residual matrix storage means, until a predetermined condition is satisfied.
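A control-flow sketch of the repetition in Supplementary Note 15. It makes three assumptions. First, the enhanced residual matrix becomes the analysis target of the next round. Second, the residual is always measured against the original matrix minus the sum of all accumulated matrix products, with negative entries clipped to 0. Third, the predetermined condition is either a round limit or the residual's total mass falling to a tolerance. The step functions are passed in as callables, so any refinement or boosting rule can be plugged in. Every name here is hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def iterate_topic_analysis(x, topic_analysis, refine, boost, max_rounds=5, tol=1e-9):
    """Repeat topic analysis on the enhanced residual until an assumed stopping condition holds.

    topic_analysis(target) -> (index_matrix, dictionary_matrix)
    refine(residual) -> refined residual matrix
    boost(x, residual, refined) -> enhanced residual matrix
    Returns the list of accumulated matrix products.
    """
    products = []
    target = x
    for _ in range(max_rounds):
        index, dictionary = topic_analysis(target)
        products.append(matmul(index, dictionary))
        # Residual against the original matrix, negatives clipped to 0.
        residual = [
            [max(x[i][j] - sum(p[i][j] for p in products), 0.0) for j in range(len(x[0]))]
            for i in range(len(x))
        ]
        if sum(map(sum, residual)) <= tol:  # assumed "predetermined condition"
            break
        refined = refine(residual)
        target = boost(x, residual, refined)  # next round analyzes the enhanced residual
    return products


x = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
# Stand-in topic analysis that factors any target exactly as identity @ target.
exact = lambda target: (identity, target)
products = iterate_topic_analysis(x, exact, refine=lambda r: r, boost=lambda _x, r, f: f)
print(len(products))  # 1
```

With an exact stand-in factorization, the first product already reproduces x, the residual is zero, and the loop stops after one round.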
(Supplementary Note 16)
An analysis method comprising:
generating, by performing topic analysis on an analysis target matrix, a dictionary matrix storing the topics contained in the analysis target matrix and an index matrix indicating the degree to which the analysis target matrix contains each topic;
calculating the matrix product of the index matrix and the dictionary matrix;
accumulating the matrix product;
deriving a residual matrix corresponding to the difference between the analysis target matrix and at least one accumulated matrix product;
generating a refined residual matrix by removing noise contained in the residual matrix;
deriving, based on the analysis target matrix and the refined residual matrix, an enhanced residual matrix in which elements containing a topic not yet captured are emphasized; and
accumulating the enhanced residual matrix so that it is included in the analysis target matrix.
(Supplementary Note 17)
A program causing a computer to execute:
a process of generating, by performing topic analysis on an analysis target matrix, a dictionary matrix storing the topics contained in the analysis target matrix and an index matrix indicating the degree to which the analysis target matrix contains each topic;
a process of calculating the matrix product of the index matrix and the dictionary matrix;
a process of accumulating the matrix product;
a process of deriving a residual matrix corresponding to the difference between the analysis target matrix and at least one accumulated matrix product;
a process of generating a refined residual matrix by removing noise contained in the residual matrix;
a process of deriving, based on the analysis target matrix and the refined residual matrix, an enhanced residual matrix in which elements containing a topic not yet captured are emphasized; and
a process of accumulating the enhanced residual matrix so that it is included in the analysis target matrix.

Reference Signs List

1, 2 analysis device
11, 21 topic analysis unit
12, 22 matrix product storage unit
13, 23 residual matrix derivation unit
14 residual matrix storage unit
15, 25 residual matrix refinement unit
16 refined residual matrix storage unit
17, 27 residual matrix boost unit
18, 28 enhanced residual matrix storage unit
51 input unit
52 first cell derivation unit
53 refined residual matrix generation unit
54 output unit
71 input unit
72 statistic calculation unit
73 selection unit
74 diagonal matrix generation unit
75 enhanced residual matrix calculation unit
76 output unit

Claims

An analyzer comprising:
topic analysis means for generating, by performing topic analysis on an analysis target matrix, a dictionary matrix storing the topics contained in the analysis target matrix and an index matrix indicating the degree to which the analysis target matrix contains each topic, and for calculating the matrix product of the index matrix and the dictionary matrix;
matrix product storage means in which the matrix product is stored;
residual matrix derivation means for obtaining the analysis target matrix and at least one matrix product stored in the matrix product storage means, and for deriving a residual matrix corresponding to the difference between the analysis target matrix and the matrix product;
residual matrix refinement means for obtaining the residual matrix and generating a refined residual matrix by removing noise contained in the residual matrix;
residual matrix boosting means for obtaining the analysis target matrix and the refined residual matrix and deriving, based on them, an enhanced residual matrix in which elements containing a topic not yet captured are emphasized; and
enhanced residual matrix storage means in which the enhanced residual matrix derived by the residual matrix boosting means is accumulated.
The analyzer according to claim 1, wherein the topic analysis means generates the dictionary matrix and the index matrix by performing nonnegative matrix factorization on the analysis target matrix.
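The claim does not say how the factorization is computed. One widely used option, assumed here, is the multiplicative update rule of Lee and Seung, which minimizes the Frobenius reconstruction error. The following symbols are assumptions, not the source's notation: A (n × k, nonnegative) is the index matrix, D (k × m, nonnegative) is the dictionary matrix, and k is the number of topics. The two updates are alternated until convergence:

```latex
X \approx A D, \qquad
D \leftarrow D \odot \frac{A^{\top} X}{A^{\top} A D}, \qquad
A \leftarrow A \odot \frac{X D^{\top}}{A D D^{\top}}
```

Here ⊙ and the fraction bar denote element-wise multiplication and division. Starting from nonnegative A and D, both factors stay nonnegative because each update multiplies them by a ratio of nonnegative quantities.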
The analyzer according to any one of the preceding claims, wherein the residual matrix derivation means calculates a matrix difference corresponding to the difference between the analysis target matrix and the matrix product, and derives the residual matrix by replacing the negative elements of the matrix difference with 0.
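A sketch of the residual derivation with negative clipping. It assumes matrices are lists of lists and, when several accumulated matrix products are used, subtracts their element-wise sum. The name `residual_matrix` is hypothetical.

```python
def residual_matrix(target, products):
    """max(target - sum(products), 0), computed element-wise."""
    return [
        [max(value - sum(p[i][j] for p in products), 0.0) for j, value in enumerate(row)]
        for i, row in enumerate(target)
    ]


target = [[2.0, 1.0], [0.0, 3.0]]
products = [[[1.0, 1.5], [0.0, 1.0]], [[0.5, 0.0], [0.5, 1.0]]]
print(residual_matrix(target, products))  # [[0.5, 0.0], [0.0, 1.0]]
```

Clipping at 0 keeps the residual nonnegative, which keeps it a valid input for nonnegative methods such as nonnegative matrix factorization.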
The analyzer according to any one of claims 1 to 3, wherein the residual matrix refinement means generates the refined residual matrix by replacing with 0 each element of the residual matrix that is below a first threshold.
The analyzer according to any one of claims 1 to 3, wherein the residual matrix refinement means generates the refined residual matrix by replacing with 0 each element of the residual matrix that becomes negative when a predetermined value is subtracted from it.
The analyzer according to any one of claims 1 to 3, wherein the residual matrix refinement means generates the refined residual matrix using a sparse estimation method.
The analyzer according to any one of the preceding claims, wherein the residual matrix refinement means obtains the matrix product from the matrix product storage means, derives, as a first position, the position of each element of the matrix product that is above a second threshold, and removes noise from the element at the first position in the residual matrix.
The analyzer according to claim 7, wherein the residual matrix refinement means obtains the matrix product from the matrix product storage means, sets as a group at least one of a specific row and a specific column selected from the matrix product, and sets the first position for each group so set.
The analyzer according to claim 8, wherein the residual matrix refinement means generates the refined residual matrix by performing sparse estimation on the residual matrix group by group.
The analyzer according to any one of claims 1 to 9, wherein the residual matrix boosting means:
calculates a statistic for each row and each column of the refined residual matrix and, based on the calculated statistics, selects one reference row from the rows and one reference column from the columns of the residual matrix;
selects, from the residual matrix, a selected row and a selected column corresponding to the reference row and the reference column, respectively;
generates a selected row vector whose elements are the values of the selected row, and a selected column vector whose elements are the values of the selected column;
calculates the similarity between the selected row vector and each row of the analysis target matrix, and the similarity between the selected column vector and each column of the analysis target matrix;
generates a first diagonal matrix whose diagonal elements are the similarities calculated for the rows, and a second diagonal matrix whose diagonal elements are the similarities calculated for the columns; and
derives, as the enhanced residual matrix, the matrix product of the first diagonal matrix, the analysis target matrix, and the second diagonal matrix.
The analyzer according to claim 10, wherein the residual matrix boosting means calculates the L1 norm of each row and each column of the refined residual matrix as the statistic.
The analyzer according to claim 10, wherein the residual matrix boosting means calculates the L2 norm of each row and each column of the refined residual matrix as the statistic.
The analyzer according to any one of claims 1 to 12, further comprising residual matrix storage means for storing the residual matrix.
The analyzer according to any one of claims 1 to 13, further comprising refined residual matrix storage means for storing the refined residual matrix.
The analyzer according to any one of claims 1 to 14, wherein the topic analysis means repeats the topic analysis on the analysis target matrix, including the enhanced residual matrix stored in the enhanced residual matrix storage means, until a predetermined condition is satisfied.
An analysis method comprising:
generating, by performing topic analysis on an analysis target matrix, a dictionary matrix storing the topics contained in the analysis target matrix and an index matrix indicating the degree to which the analysis target matrix contains each topic;
calculating the matrix product of the index matrix and the dictionary matrix;
accumulating the matrix product;
deriving a residual matrix corresponding to the difference between the analysis target matrix and at least one accumulated matrix product;
generating a refined residual matrix by removing noise contained in the residual matrix;
deriving, based on the analysis target matrix and the refined residual matrix, an enhanced residual matrix in which elements containing a topic not yet captured are emphasized; and
accumulating the enhanced residual matrix so that it is included in the analysis target matrix.
A program storage medium storing a program that causes a computer to execute:
a process of generating, by performing topic analysis on an analysis target matrix, a dictionary matrix storing the topics contained in the analysis target matrix and an index matrix indicating the degree to which the analysis target matrix contains each topic;
a process of calculating the matrix product of the index matrix and the dictionary matrix;
a process of accumulating the matrix product;
a process of deriving a residual matrix corresponding to the difference between the analysis target matrix and at least one accumulated matrix product;
a process of generating a refined residual matrix by removing noise contained in the residual matrix;
a process of deriving, based on the analysis target matrix and the refined residual matrix, an enhanced residual matrix in which elements containing a topic not yet captured are emphasized; and
a process of accumulating the enhanced residual matrix so that it is included in the analysis target matrix.