KR101613397B1 - Method and apparatus for associating topic data with numerical time series - Google Patents


Info

Publication number
KR101613397B1
Authority
KR
South Korea
Prior art keywords
time
topic
series
data
text data
Prior art date
Application number
KR1020150076402A
Other languages
Korean (ko)
Inventor
문일철
박성래
Original Assignee
한국과학기술원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 한국과학기술원 (KAIST)
Priority to KR1020150076402A
Application granted
Publication of KR101613397B1

Classifications

    • G06F17/2745
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/277

Abstract

Embodiments relate to a method and apparatus for associating text data with time-series numerical data. The method, performed by a computing device, comprises obtaining a data set including time-series text data and time-series numerical data that correspond to each other in time, and applying an Associative Topic Model (ATM) to the text data and the numerical data. Applying the ATM comprises calculating a time-dependent trajectory of topic proportions from the time-series text data and correlating the topic proportions along that trajectory with the time-series numerical data on the time axis.

Description

TECHNICAL FIELD [0001] The present invention relates to a method and apparatus for associating time-series text data with time-series numerical data.

The disclosed technique relates to big data processing, and more particularly to a technique for associating big-data-based time-series text data with time-series numerical data.

In recent years, as the amount of data generated and exchanged on the Internet has grown, big-data mining techniques have been proposed to collect and analyze such online data and extract useful information. For example, studies are under way to forecast economic conditions and stock price fluctuations by aggregating and analyzing information posted on social network services (SNS) such as Twitter or Facebook.

Typically, when a series of events occurs, various types of time-series data are generated, such as posts written by individuals on SNS, analysis articles on specialized sites, media articles, and statistical numerical information. For example, a series of economic events affects not only stock market indices but also the stream of economic news.

Accordingly, methods for associating different types of data with each other have been proposed. Since around 2010, researchers have studied techniques for predicting whether the stock market will rise or fall the next day from financial information extracted from big data collected on SNS. For example, Bollen published a paper that predicted the Dow index (i.e., DJI, Dow Jones Industrial Average) using OpinionFinder and the Google Profile of Mood States (GPOMS). That paper analyzed Twitter-based big data with sentiment analysis systems such as OpinionFinder and GPOMS and classified the up/down movement of the Dow index with 87% accuracy using Granger causality analysis and a neural network classifier. Various other papers have since been published, but there has been little research on analysis techniques that can be used in practice with big-data analysis technology.

It is an object of the disclosed technique to provide a method and an apparatus for associating text data with time-series numerical data so as to improve understanding of a specific event.

In particular, the disclosed technique aims to provide a method and apparatus for associating text data with time-series numerical data based on a topic model, namely an Associative Topic Model (ATM), that finds the topics in the text data that are affected by and related to the time-series numerical data.

The disclosed technique also aims to provide a method and apparatus for associating text data and time-series numerical data that identify topics associated with time-series characteristics from various types of data and predict the time-series numerical data with higher accuracy than an iterative model.

The above objects are achieved by the method and apparatus for associating text data and time-series numerical data provided according to the embodiments.

A method of associating text data and time-series numerical data according to one aspect of the embodiments is performed by a computing device and includes obtaining a data set including time-series text data and time-series numerical data corresponding to each other in time, and applying an Associative Topic Model (ATM) to the corresponding text and numerical data. Applying the ATM includes calculating a time-based trajectory of topic proportions from the time-series text data and correlating the topic proportions along the trajectory with the time-series numerical data on the time axis.

Calculating the trajectory of the topic proportions from the time-series text data may be performed, as in a Dynamic Topic Model (DTM), on the basis of a time-series numerical variable generated from the same prior information as the topic proportions.

Correlating the topic proportions along the trajectory with the time-series numerical data on the time axis may further include approximating the corpus-level state of the topic proportion at time t and the document-level state of the topic proportion at time t using a variational method.

In addition, correlating the topic proportions along the trajectory with the time-series numerical data on the time axis may further include using a Kalman filter to estimate the dynamics of the topic proportions in the corpus over time.

In addition, obtaining the data set including the temporally corresponding time-series text data and time-series numerical data may include finding and removing stop words and terms that appear in the data set with a frequency below a predefined ratio.

An apparatus for associating time-series text data and time-series numerical data according to another aspect of the embodiments includes a data acquisition module for acquiring a data set of time-series text data and time-series numerical data corresponding to each other in time, and an Associative Topic Model (ATM) module for calculating a time-based trajectory of topic proportions from the time-series text data and correlating the topic proportions along the trajectory with the time-series numerical data on the time axis.

In addition, the ATM module can be configured to calculate the time-based trajectory of the topic proportions from the time-series text data on the basis of a time-series numerical variable generated from the same prior information as the topic proportions of a Dynamic Topic Model (DTM).

The ATM module may also be configured to approximate the corpus-level state of the topic proportion at time t and the document-level state of the topic proportion at time t using a variational method in order to correlate the topic proportions along the trajectory with the time-series numerical data on the time axis.

The ATM module may further be configured to estimate the dynamics of the topic proportions in the corpus over time, using a Kalman filter for the estimation, in order to correlate the topic proportions along the trajectory with the time-series numerical data on the time axis.

Further, the data acquisition module may be configured to find and remove stop words or terms that appear in the data set with a frequency below a predefined ratio.

According to another aspect of the embodiments, there is provided a computer-readable medium having recorded thereon a program which, when executed by a computer, performs obtaining a data set including time-series text data and time-series numerical data corresponding to each other in time, and applying an Associative Topic Model (ATM) to the corresponding text and numerical data, wherein applying the ATM includes calculating a time-based trajectory of topic proportions from the time-series text data and correlating the topic proportions along the trajectory with the time-series numerical data on the time axis.

In addition, calculating the time-based trajectory of the topic proportions from the time-series text data may be performed, as with the topic proportions of a Dynamic Topic Model (DTM), on the basis of a time-series numerical variable generated from the same prior information.

Also, correlating the topic proportions along the trajectory with the time-series numerical data on the time axis may include approximating the corpus-level state of the topic proportion at time t and the document-level state of the topic proportion at time t using a variational method.

In addition, correlating the topic proportions along the trajectory with the time-series numerical data on the time axis may further include using a Kalman filter to estimate the dynamics of the topic proportions in the corpus over time.

Further, as part of obtaining the data set including the temporally corresponding time-series text data and time-series numerical data, the program may find and remove stop words or terms appearing in the data set with a frequency below a predefined ratio.

An apparatus according to another aspect includes a memory for storing time-series text data and time-series numerical data that correspond to each other in time, and a processor configured to calculate, according to an Associative Topic Model (ATM), a time-based trajectory of topic proportions from the time-series text data stored in the memory, and to correlate the topic proportions along the trajectory with the time-series numerical data stored in the memory on the time axis.

The processor may further be configured to calculate the time-based trajectory of the topic proportions from the time-series text data on the basis of a time-series numerical variable generated from the same prior information as the topic proportions of a Dynamic Topic Model (DTM).

The processor may also be configured to approximate the corpus-level state of the topic proportion at time t and the document-level state of the topic proportion at time t using a variational method in order to correlate the topic proportions along the trajectory with the time-series numerical data on the time axis.

The processor may also be configured to estimate the dynamics of the topic proportions in the corpus over time, using a Kalman filter for the estimation, in order to correlate the topic proportions along the trajectory with the time-series numerical data on the time axis.

Another apparatus according to the embodiments includes a memory for storing time-series text data and time-series numerical data that correspond to each other in time, and a processor with an Associative Topic Model (ATM) module configured to calculate a time-based trajectory of topic proportions from the time-series text data and to correlate the topic proportions along the trajectory with the time-series numerical data stored in the memory on the time axis.

The features and advantages of the embodiments will become more apparent from the following detailed description based on the accompanying drawings.

According to the embodiments, it is possible to provide a method and apparatus for associating text data and time-series numerical data that can improve understanding of a specific event by associating text data with time-series numeric data.

The disclosed technique provides a method and apparatus for associating text data with time-series numerical data based on a topic model, namely an Associative Topic Model (ATM), that finds the topics affected by both the numerical and the text time-series data.

The disclosed technique also identifies topics associated with time-series characteristics from various types of data, and provides a method and an apparatus for associating textual and time-series numerical data that predict the numerical time series with higher accuracy than an iterative model.

FIG. 1 is a schematic diagram showing a general dynamic topic model (DTM).
FIG. 2 is a schematic diagram showing an Associative Topic Model (ATM) according to an embodiment.
FIG. 3 is a flowchart showing the process of an ATM according to an embodiment.
FIG. 4 is a diagram showing the top eight related topics obtained by applying ATM to stock returns according to an embodiment.
FIG. 5 (a) is a graph showing the dynamics of the topic proportions among the results analyzed for the stock return in the example of FIG. 4.
FIG. 5 (b) is a graph comparing the actual stock return in the example of FIG. 4 with the stock return predicted by ATM.
FIG. 6 is a schematic diagram showing the associated topics among the results analyzed for stock volatilities according to an embodiment.
FIG. 7 (a) is a graph showing the dynamics of the topic proportions among the results analyzed for the stock volatility in the example of FIG. 6.
FIG. 7 (b) is a graph comparing the actual stock volatility in the example of FIG. 6 with the stock volatility predicted by ATM.
FIG. 8 is a schematic diagram showing the associated topics among the results analyzed for the Obama approval index according to an embodiment.
FIG. 9 (a) is a graph showing the dynamics of the topic proportions among the results analyzed for the Obama approval rate in the example of FIG. 8.
FIG. 9 (b) is a graph comparing the actual Obama approval rate in the example of FIG. 8 with the approval rate predicted by ATM.
FIG. 10 is a schematic diagram showing the associated topics among the results analyzed for the stock return according to an embodiment.
FIG. 11 (a) is a graph showing the dynamics of the topic proportions among the results analyzed for the stock return in the example of FIG. 10.
FIG. 11 (b) is a graph comparing the actual stock return in the example of FIG. 10 with the stock return predicted by ATM.
FIG. 12 is a schematic diagram showing the associated topics among the results analyzed for the stock volatility according to an embodiment.
FIG. 13 (a) is a graph showing the dynamics of the topic proportions among the results analyzed for the stock volatility in the example of FIG. 12.
FIG. 13 (b) is a graph comparing the actual stock volatility in the example of FIG. 12 with the stock volatility predicted by ATM.
FIG. 14 is a schematic diagram showing the associated topics among the results analyzed for the Obama approval index according to an embodiment.
FIG. 15 (a) is a graph showing the dynamics of the topic proportions among the results analyzed for the Obama approval rate in the example of FIG. 14.
FIG. 15 (b) is a graph comparing the actual Obama approval rate in the example of FIG. 14 with the approval rate predicted by ATM.
FIG. 16 is a graph comparing various existing models with ATM in terms of log likelihood over the entire test period, among the results analyzed for the stock return according to an embodiment.
FIG. 17 is a graph comparing various existing models with ATM in terms of the log likelihood estimated during the last test period, among the results analyzed for the stock return in the example of FIG. 16.
FIG. 18 is a graph comparing various existing models with ATM in terms of log likelihood over the entire test period, among the results analyzed for the stock volatility according to an embodiment.
FIG. 19 is a graph comparing various existing models with ATM in terms of the log likelihood estimated during the last test period, among the results analyzed for the stock volatility in the example of FIG. 18.
FIG. 20 is a graph comparing various existing models with ATM in terms of log likelihood over the entire test period, among the results analyzed for the Obama approval rate according to an embodiment.
FIG. 21 is a graph comparing various existing models with ATM in terms of the log likelihood estimated during the last test period, among the results analyzed for the Obama approval rate in the example of FIG. 20.

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings.

FIG. 1 is a schematic diagram showing a general dynamic topic model (DTM), and FIG. 2 is a schematic diagram showing an Associative Topic Model (ATM) according to an embodiment.

Referring to FIGS. 1 and 2, each circle represents a random variable. A gray filled circle is an observed variable, and an unfilled circle is a hidden (latent) variable. A large rectangle containing several random variables is called a plate, and the number in its corner (for example, D_t or N) indicates how many times the set of random variables inside the plate is replicated. The arrows indicate a statistical relationship, represented by a probability distribution, between the two connected variables.

The ATM (see FIG. 2) can be configured to learn, from time-series numerical values, the associated topics extracted from time-series text data. ATM computes the trajectory of topic proportions over time and identifies which topics are correlated with the numerical variable over time. ATM can be thought of as a combination of a Kalman filter and latent Dirichlet allocation (LDA). ATM is somewhat similar to DTM (see FIG. 1): DTM models the dynamics over time of the topic proportions and of the appearance frequencies of words within each topic. The differences between DTM and ATM are that ATM 1) adds a time-series numerical variable influenced by the topic proportions, and 2) simplifies the word distributions of the topics.

Since ATM must include a time-series numerical variable, a simple link from the topic proportions to the numerical variable is needed. To this end, it is assumed that the latent topics extracted from the corpus generate not only the words in the corpus but also the numerical values. To make the topics easier to interpret, a single set of topic words is used within a given period, which simplifies the DTM. The topic proportion at time t in ATM follows a Gaussian distribution and generates both the numerical variable and the topic proportions of the individual documents.

FIGS. 1 and 2 are graphical representations of DTM and ATM. Compared with DTM, ATM has one additional variable, the time-series numerical variable. In both DTM and ATM, the corpus-level topic proportion is modeled with a Gaussian distribution and represents the dynamics of the frequency of occurrence of each topic over time. It serves as the prior for the topic proportions of the individual documents, which follow distributions of the same type, as shown in Equations (1) and (2).

[Equation (1)]

[Equation (2)]

Here, the covariance matrices are scalar multiples of the K-dimensional identity matrix; each covariance matrix is modeled as a scalar matrix to reduce the computational cost. Within a document, the topic of each word is assigned according to the topic proportions of that document, as expressed in Equation (3).

[Equation (3)]

In Equation (3), the topic indicator denotes the topic assigned to each word of a document at time t. To transform the Gaussian-distributed topic state into a prior for the multinomial distribution, a softmax function, defined as in Equation (4), is applied.

[Equation (4)]

In the softmax function, the magnitude of each component of the topic state determines the probability that the corresponding topic is chosen for a word. As a result, for each word position a topic is drawn, and the word is then drawn from that topic's distribution over the vocabulary. Up to this point, ATM's modeling approach is very similar to DTM.
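For illustration, the softmax rescaling of Equation (4) may be sketched in Python as follows; the function name and the numerical example are illustrative and do not appear in the original disclosure.

```python
import numpy as np

def softmax(eta):
    """Map a real-valued topic-state vector onto the probability simplex.

    Subtracting the maximum is a standard numerical-stability trick and
    does not change the result.
    """
    e = np.exp(eta - np.max(eta))
    return e / e.sum()

# Example: a 3-topic state vector; the largest component receives the
# largest probability of being chosen as a word's topic.
print(softmax(np.array([1.0, 0.2, -0.5])))
```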

The difference between ATM and DTM is the time-series numerical variable generated from the topic proportions. To use the same prior information as the text data, that is, the same information used when assigning a topic to a word, the time-series variable is generated from the topic proportions rescaled with the same softmax function as in Equation (4). Equation (5) models the causal relationship between the topic proportions and the numerical variable.

[Equation (5)]

Here, the coefficient vector is a K-dimensional vector of linear coefficients. Previous studies that integrate word-level features into DTM suggest a simple linear combination of Gaussian distributions, and the corpus-level numerical feature could be modeled in a similar way. However, the raw magnitude of the corpus-level topic state does not by itself generate the corpus. To model the numerical variable consistently in both directions, the topic state is rescaled to fit the topic selection procedure, and the numerical value is modeled as a Gaussian whose mean is a linear combination of the rescaled topic proportions and the coefficient vector. This assumption is the same as a linear regression that uses the topic proportions as independent variables.
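The generative assumptions of Equations (1) to (5) may be illustrated by the following Python sketch. It uses standard DTM/sLDA-style names (alpha for the corpus-level topic state, eta for the document-level state, beta for the per-topic word distributions, b for the regression coefficients); these names and the default variances are assumptions for illustration, not the patent's exact symbols or values.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def generate(T, D, N, K, V, beta, b,
             sigma_alpha=0.1, sigma_eta=0.1, sigma_y=0.1):
    """Sample T time steps of documents plus one numerical value per step.

    beta: (K, V) per-topic word distributions, held fixed over time to
          mirror the patent's simplification of the word distributions.
    b:    (K,) regression coefficients linking topics to the numeral.
    """
    alpha = np.zeros(K)
    docs, y = [], []
    for t in range(T):
        # Corpus-level topic state follows a Gaussian random walk (Eq. 1).
        alpha = rng.normal(alpha, sigma_alpha)
        # Numerical value regresses on the rescaled topic state (Eq. 5).
        y.append(rng.normal(b @ softmax(alpha), sigma_y))
        step_docs = []
        for _ in range(D):
            # Document-level state drawn around the corpus state (Eq. 2).
            eta = rng.normal(alpha, sigma_eta)
            words = []
            for _ in range(N):
                z = rng.choice(K, p=softmax(eta))       # topic (Eqs. 3, 4)
                words.append(rng.choice(V, p=beta[z]))  # word from the topic
            step_docs.append(words)
        docs.append(step_docs)
    return docs, np.array(y)
```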

In general, in linear regression the appropriate number of coefficients is determined by the amount of data in order to avoid over-fitting. In the same sense, an appropriate number of topics needs to be selected to prevent over-fitting. In Equation (5), the variance of the numerical variable acts as a strength factor that controls how tightly the fitted value follows the topic proportions. Letting this variance change freely means that ATM accepts an appropriate amount of time-series error when explaining the relationship between the text and the numerical values. On the other hand, if this variance is fixed to a small value, the dynamics of the topic proportions induced by ATM become highly related to the numerical variable; that is, ATM can then only find topics that are highly relevant for explaining the exact trajectory of the time series during the training period. Hereinafter, an ATM with this processing is referred to as a fixed-ATM. FIG. 3 is a flowchart showing the process of the ATM according to the embodiment and summarizes the generative process based on the assumptions described above.

The posterior inference of ATM is intractable, so a variational method is used to approximate the posterior of the ATM of FIG. 2. The idea of the variational method is to optimize the free parameters of the distributions over the latent variables by minimizing the KL (Kullback-Leibler) divergence between the variational distribution and the true posterior.

In ATM, the latent variables are 1) the corpus-level latent state of the topic proportions, 2) the document-level latent state of the topic proportions, and 3) the topic indicators. Equation (6) below represents the factorized form of the assumed variational distribution.

[Equation (6)]

Here, the remaining symbols are the variational parameters of the assumed variational distributions for the latent variables. The document-level states and the topic indicators are given fully factorized variational distributions. However, the variational distribution of the corpus-level state is defined through Gaussian variational observations and maintains the sequential structure of the corpus-level topic representation. In DTM, a variational Kalman filter is used to represent the topic dynamics; ATM uses a modified variational Kalman filter as the model for estimating the dynamics of the topic proportions in the corpus over time. The main idea of the variational Kalman filter is that the variational observations play the role of the observations of a standard Kalman filter, and the posterior distribution of the latent states in the standard Kalman filter model is taken as the variational distribution.

The variational Kalman filter according to the embodiment is expressed by Equations (7) and (8) below.

[Equation (7)]

[Equation (8)]

Using the standard Kalman filter calculations, the forward mean and variance are given by Equation (9) below, together with fixed initial conditions.

[Equation (9)]

The backward mean and variance are given by Equation (10) below.

[Equation (10)]

In Equation (10), the required means and variances are derived from the variational Kalman filter calculations, and together the forward and backward passes define the posterior over the corpus-level state space.
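The forward and backward recursions referred to in Equations (7) to (10) follow the structure of a standard Kalman filter and smoother, with the variational observations playing the role of the observations. The following sketch shows a generic one-dimensional Kalman filter with an RTS smoothing pass for a random-walk state; it illustrates the form of the recursions only and is not the patent's exact variational update.

```python
import numpy as np

def rts_smoother_1d(obs, sigma_state, sigma_obs, m0=0.0, v0=1.0):
    """Kalman filtering plus RTS smoothing for a 1-D Gaussian random walk.

    obs corresponds to the variational observations described in the text;
    sigma_state and sigma_obs are the state-transition and observation
    standard deviations. Returns smoothed means and variances per step.
    """
    T = len(obs)
    m_f, v_f = np.zeros(T), np.zeros(T)
    m, v = m0, v0
    for t in range(T):                       # forward (filtering) pass
        v_pred = v + sigma_state ** 2        # random-walk prediction
        gain = v_pred / (v_pred + sigma_obs ** 2)
        m = m + gain * (obs[t] - m)          # predicted mean = previous mean
        v = (1.0 - gain) * v_pred
        m_f[t], v_f[t] = m, v
    m_s, v_s = m_f.copy(), v_f.copy()
    for t in range(T - 2, -1, -1):           # backward (smoothing) pass
        v_pred = v_f[t] + sigma_state ** 2
        j = v_f[t] / v_pred
        m_s[t] = m_f[t] + j * (m_s[t + 1] - m_f[t])
        v_s[t] = v_f[t] + j ** 2 * (v_s[t + 1] - v_pred)
    return m_s, v_s
```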

Using these variational posteriors and Jensen's inequality, a lower bound of the log likelihood can be found, as shown in Equation (11).

[Equation (11)]

This bound includes four expectation terms associated with the observed data. The first term relates to the latent states shared by both data sources, the second and third terms are associated with the text data, and the fourth term is associated with the continuous time-series data. The first term on the right-hand side of Equation (11) is expressed as Equation (12).

[Equation (12)]

Equation (12) uses the Gaussian quadratic-form identity given in Equation (13) below.

[Equation (13)]

The second term on the right side of Equation (11) is shown in Equation (14).

[Equation (14)]

The third term on the right side of Equation (11) is expressed by Equation (15).

[Equation (15)]

Because the input of the softmax function in Equation (15) is drawn from a Gaussian distribution, the closed form of this expectation term cannot be computed; however, a lower bound can be found, and taking this lower bound preserves the lower bound on the log likelihood.

The fourth term on the right side of Equation (11) is expressed by Equation (16).

[Equation (16)]

Because the regression onto the numerical values is modeled from the discretized (softmax-rescaled) topic proportions, the softmax function is applied to the corpus-level topic state. The rationale for this discretization is that the topics extracted from the text must influence the regression, so the same softmax processing is applied to both the text and the numerical data. The corresponding discretization for topic extraction in DTM combines a multinomial distribution with a Gaussian distribution, whereas ATM differs in that it combines two Gaussian distributions. Combining two Gaussian distributions through a softmax function makes the expectation calculation non-trivial. In Equation (16), finding a lower bound with respect to the variational parameters of this expectation term is intractable because of the non-concavity caused by the opposite signs within the term. Moreover, the closed form of the expectation cannot be computed exactly because of log-normality and the difficulty of handling the ratio of two random variables. ATM therefore uses an approximate approach that computes local expectations with Taylor expansions for the ratio estimation. Inference in a probabilistic graphical model with an approximate expectation of a softmax function under a Gaussian prior is a distinctive feature of ATM. The expectation of the simple softmax function is expressed in Equation (17).

[Equation (17)]

In Equation (17), a new auxiliary symbol is introduced. The combined expectation of the two softmax functions is consequently approximated as in Equations (18) and (19).

[Equation (18)]

[Equation (19)]

The last term of Equation (11) is an entropy term and is expressed by the following Equation (20).

[Equation (20)]

Using the expectation terms described above, an approximate lower bound of the log likelihood can be found.
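The difficulty of this expectation can be illustrated numerically: a plug-in approximation that applies the softmax to the variational mean differs from a Monte Carlo estimate of the true expectation. The following sketch is an illustrative check only and uses Monte Carlo sampling rather than the Taylor-expansion approximation of Equations (17) to (19); the mean and variance values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax_rows(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

mu = np.array([0.5, 0.0, -0.5])   # variational mean of the topic state
sigma = 0.8                        # variational standard deviation

# Plug-in (zeroth-order) approximation: softmax of the mean.
plug_in = softmax_rows(mu[None, :])[0]

# Monte Carlo estimate of E[softmax(eta)] with eta ~ N(mu, sigma^2 I).
samples = rng.normal(mu, sigma, size=(100_000, mu.size))
mc = softmax_rows(samples).mean(axis=0)

print(plug_in, mc)  # the two differ, showing why an approximation is needed
```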

Model parameter learning

Using the variational distribution, which approximates the posterior distribution of the latent variables, update equations for the model parameters can be derived. For the topic-word distributions, the update is given by Equation (21).

[Equation (21)]

Here, the indicator function equals 1 when the corresponding word of the document is the vocabulary term in question, and 0 otherwise. The parameters associated with the text data and with the numerical time-series data may be updated as shown in Equations (22) and (23), respectively.

[Equation (22)]

[Equation (23)]

The document-level latent state of each topic proportion is generated from the corpus-level time-series latent state of the topic proportions, and the numerical time-series variable is generated from the same latent state with its own variance. By learning these distributions, the degree of association between the text data and the time-series numerical data can be found. A low value of this variance means that the two data sources (i.e., the text data and the time-series numerical data) are strongly correlated and that the text data can help predict the time-series variable. On the other hand, a high value means that the two data sources are not related to each other. If the variance is fixed at a low value, ATM tends to learn topics with high explanatory power over the trajectory of the time-series values in the training data set.

As mentioned above, the numerical value is modeled as a linear combination of the rescaled topic proportions and the coefficient vector. The coefficient vector is estimated so as to maximize the lower bound of the log likelihood. Equation (24) below gives the update, based on the softmax-rescaled topic proportions, for inferring the coefficient vector.

[Equation (24)]

In Equation (24), one matrix collects the softmax-rescaled topic proportions over the time periods of the training data and the other collects the corresponding numerical values over the same periods. This update equation is similar to the Gaussian-response update of supervised latent Dirichlet allocation (LDA).
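Because the update of Equation (24) has the form of a least-squares (Gaussian-response) estimate, it can be sketched as follows; the function signature, the matrix names, and the small ridge term are illustrative assumptions rather than the patent's exact expression.

```python
import numpy as np

def update_coefficients(pi_alpha, y, ridge=1e-6):
    """Least-squares estimate of the coefficient vector b.

    pi_alpha: (T, K) softmax-rescaled topic proportions, one row per step.
    y:        (T,)  observed numerical time series.
    The small ridge term keeps the solve well conditioned (an assumption,
    not part of the patent's equations).
    """
    K = pi_alpha.shape[1]
    gram = pi_alpha.T @ pi_alpha + ridge * np.eye(K)
    return np.linalg.solve(gram, pi_alpha.T @ y)
```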

Prediction

After all parameters have been learned, ATM can be used as a prediction model that observes new text data and predicts the future time-series variable. The predictive distribution of the future value can be expressed by Equation (25) below.

[Equation (25)]

To compute the expectation of the softmax function for the new time step, the posterior distributions of the document-level states and of the corpus-level topic state are inferred using the learned model parameters. This inference follows Equation (11), except that the fourth expectation term is excluded. After sufficient iterations of variational inference, the inferred topic state can be used to predict the numerical value of the next time step.
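Under the regression assumption of Equation (5), the point prediction for the next time step reduces to applying the learned coefficients to the softmax-rescaled topic state inferred from the new documents. A minimal sketch, assuming the inferred corpus-level state is already available:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def predict_next(alpha_new, b):
    """Predict the next numerical value from the inferred topic state.

    alpha_new: (K,) corpus-level topic state inferred from new documents.
    b:         (K,) learned regression coefficients.
    """
    return float(b @ softmax(alpha_new))
```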

An apparatus and method for associating text data and time-series numeric data using ATM as described above according to an embodiment are provided.

The apparatus for associating text data and time-series numerical data using ATM can be implemented as a computing device. The computing device includes, without limitation, any apparatus having a processor for performing data processing and a memory for storing programs and data, such as a personal computer, a server computer, a desktop, a laptop, or a palmtop. The computing device may be a single independent device, but it is also possible to implement a distributed computing system in which a plurality of devices connected by a data communication network cooperate with each other.

In an embodiment, the text data from which topics are extracted may be collected from SNS such as Twitter and Facebook. In other embodiments, text data may also be collected from sources such as shopping malls, newsgroups, and media sites. The numerical time-series data consist of a series of numerical values announced at regular intervals. The text data and the time-series numerical data cover the same total collection period, and their collection periods per unit time correspond to each other. For example, if the time-series numerical data are weekly closing stock returns announced every Friday and collected over a total period of one year, this return series is time-series data with a period of one week, and the corresponding text data are likewise collected over the one-year period as time-series data with a one-week period. In this case, the text data collected from the previous Saturday through Friday are grouped as one unit; that is, the set of texts collected between the times when the numerical values y(t-1) and y(t) are generated corresponds to y(t).
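This alignment rule can be sketched as a simple bucketing step; the dates, the data layout, and the function name below are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

def bucket_texts(articles, release_dates):
    """Group articles into weekly corpora aligned with numeric releases.

    articles:      list of (publication_date, text) pairs.
    release_dates: sorted list of dates on which the numeric value y(t)
                   is announced (e.g., every Friday).
    Returns {release_date: texts published after the previous release and
    up to this release}, i.e., the corpus unit that corresponds to y(t).
    """
    buckets = defaultdict(list)
    for pub_date, text in articles:
        for i, rel in enumerate(release_dates):
            prev = release_dates[i - 1] if i > 0 else date.min
            if prev < pub_date <= rel:
                buckets[rel].append(text)
                break
    return buckets

# Example: weekly Friday releases; a Wednesday article falls into the
# bucket of the upcoming Friday value.
fridays = [date(2011, 1, 7) + timedelta(weeks=k) for k in range(3)]
articles = [(date(2011, 1, 5), "market rallies on earnings")]
print(bucket_texts(articles, fridays))
```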

The text data and time-series numerical data thus collected may be stored in a memory or hard disk built into the computing device and then made available to the processor. Big data collected in other ways may be stored on an optical disk or portable memory and then provided to the computing device, or stored on another remote computing device or a cloud server and made available through a data communication network such as the Internet.

The method of associating text data and time-series numerical data using ATM can be performed by a corresponding module of an independent apparatus. It may also be implemented as a software program installed on a general-purpose computing device, which includes, for example, a processor and memory, and executed by the processor of that device.

Experiment

Hereinafter, actual examples of associating time-series text data with time-series numerical data using ATM according to the embodiments are described. The examples apply ATM to a financial news corpus (text data) and stock price indexes (numerical data), and to a news corpus related to Obama (text data) and President Obama's approval rating (numerical data). Table 1 shows the data sets used in these experiments.

Experimental Example | Text data | Numerical data
1 | financial news articles from Bloomberg (2011.1 ~ 2013.4, 120 weeks) | weekly closing returns of the Dow-Jones Industrial Average (DJIA) (2011.1 ~ 2013.4, 120 weeks)
2 | financial news articles from Bloomberg (2011.1 ~ 2013.4, 120 weeks) | weekly closing volatilities of the Dow-Jones Industrial Average (DJIA) (2011.1 ~ 2013.4, 120 weeks)
3 | news articles retrieved with the name of President Obama from The Washington Post (2009.1 ~ 2014.6, 284 weeks) | weekly approval index of President Obama (2009.1 ~ 2014.6, 284 weeks)

As shown in Table 1, Experimental Example 1 is a data set in which the text data are Bloomberg financial news articles collected weekly for 120 weeks and the numerical data are the corresponding weekly closing returns of the Dow Jones index. Experimental Example 2 combines the same text data as Experimental Example 1 with the corresponding weekly closing volatilities of the Dow Jones index. Experimental Example 3 combines news articles retrieved from The Washington Post with President Obama's name, collected weekly for 284 weeks, with the corresponding weekly approval index.

From these data sets, text data were sampled at random, and the number of documents (i.e., news articles) collected per unit collection period, one week, was preset. For example, for the Bloomberg articles it was set to 500 per week, and for the Washington Post articles it did not exceed 100 per week. Before the ATM analysis, the text data were processed to remove stop words, such as articles and pronouns that carry no significant meaning (e.g., 'the' and 'I'm'). In addition, words appearing in only a small number of documents, such as person names that do not significantly affect the semantic distinction between documents, were removed for efficiency of the analysis; specifically, words appearing in less than 2% of the documents were removed.
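This preprocessing can be sketched as follows; the stop-word list and the way the 2% threshold is parameterized are illustrative assumptions.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "an", "i", "i'm", "is", "of", "to", "and"}  # illustrative

def preprocess(docs, min_doc_ratio=0.02):
    """Remove stop words and rare terms from tokenized documents.

    docs: list of documents, each a list of lowercase tokens.
    Terms appearing in fewer than min_doc_ratio of the documents are dropped.
    """
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))
    min_docs = min_doc_ratio * len(docs)
    keep = {w for w, c in doc_freq.items()
            if c >= min_docs and w not in STOP_WORDS}
    return [[w for w in doc if w in keep] for doc in docs]
```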

By applying the above-described ATM to each of these data sets, the text data were associated with the numerical data such as stock returns, stock volatilities, and approval ratings. In these examples, the topic-word distributions were initialized as randomly perturbed uniform topics, the variance of the numerical variable was initialized to the sample variance of the time-series data, two further variance parameters were set to 0.1, and the coefficient vector was initialized to the zero vector with length equal to the number of topics. The results are shown in FIGS. 4 to 9.

FIG. 4 is a schematic diagram showing the associated topics among the results analyzed for stock returns according to an embodiment using ATM, and FIG. 5 shows the dynamics of the topic proportions among the results analyzed for the stock return in the example of FIG. 4, together with the actual stock return compared with the return predicted by ATM. FIG. 6 is a schematic diagram showing the associated topics among the results analyzed for stock volatilities according to an embodiment using ATM, and FIG. 7 shows the dynamics of the topic proportions among the results analyzed for the stock volatility in the example of FIG. 6, together with the actual volatility compared with the volatility predicted by ATM. FIG. 8 is a schematic diagram showing the associated topics among the results analyzed for the Obama approval index according to an embodiment using ATM, and FIG. 9 shows the dynamics of the topic proportions among the results analyzed for the Obama approval rate in the example of FIG. 8, together with the actual approval rate compared with the rate predicted by ATM.

FIGS. 4, 6, and 8 show the related topics, each represented by eight words selected in order of occurrence frequency. The upper graphs of FIGS. 5, 7, and 9 show the dynamics of the topic proportions in the same order as in FIGS. 4, 6, and 8, respectively. In these upper graphs, the color of a topic proportion represents the effect of that topic: blue at the top of the graph indicates a positive effect, and red at the bottom indicates a negative effect. The lower graphs of FIGS. 5, 7, and 9 compare the actual stock returns, stock volatilities, and approval ratings, respectively, with those predicted by ATM.

Referring to FIGS. 4 through 7, which show the results of analyzing the stock return and the stock volatility, some related topics are shared between the two time series, as is natural for the financial domain: for example, topics about tax cuts and economic reports appear in both. However, some related topics differ. For example, topics on Asia's energy and economy are related to the stock return, while topics on federal law are related to the stock volatility. The dynamics of the topic proportions were also very different, because different topics were selected from the same text data. These results qualitatively demonstrate that ATM identifies different topics associated with different time-series values even when the initial settings and the text data are the same. Comparing the actual and predicted values of the stock return and the stock volatility in FIGS. 5 and 7 shows that ATM predicts the stock volatility better than the stock return. This is expected from efficient market theory, which speaks to the difficulty of predicting stock prices and returns; in the financial domain, forecasting stock volatility is the more common problem.

Referring now to FIGS. 8 and 9, the analysis of the text data and approval-rating data for President Obama is shown. The results show that topics related to family life are positively associated with the approval rating, while some topics, such as those about the Romney party, war, and policy, are negatively associated. Some topics, such as education, tax, and agencies, do not show high relevance. These results qualitatively demonstrate that ATM identifies reasonably relevant topics.

As can be seen from the above, the text-and-numerical-data association technique using ATM according to the embodiment shows how the two kinds of data are related to each other, and can extract the related topics from different types of data collected from two different sources.

ATM, on the other hand, can also be used to analyze existing data rather than to make predictions. In the ATM modeling process, the time-series numerical data and DTM are integrated, and in the time-series modeling the numerical hidden state is generated with a Gaussian error. If this error variance is fixed to a small value, ATM will find only the relevant topics that describe the exact trajectory of the time series during the learning period. As already mentioned above, the ATM in this case is referred to as fixed-ATM.

For the application of fixed-ATM, the data sets were identical to those shown in Table 1. The topic-word distributions were again initialized as randomly perturbed uniform topics, two variance parameters were set to 0.1, and the coefficient vector was initialized to the zero vector with length equal to the number of topics. The number of topics was set to 10 for the stock returns and the approval ratings, and to 5 for the stock volatilities. FIGS. 10 to 15 show these results.

FIG. 10 is a schematic diagram showing the associated topics among the results analyzed for the stock return according to an embodiment using fixed-ATM, and FIG. 11 shows the dynamics of the topic proportions among the results analyzed for the stock return in the example of FIG. 10, together with the actual stock return compared with the return predicted by the model. FIG. 12 is a schematic diagram showing the associated topics among the results analyzed for the stock volatility according to an embodiment using fixed-ATM, and FIG. 13 shows the dynamics of the topic proportions among the results analyzed for the stock volatility in the example of FIG. 12, together with the actual volatility compared with the volatility predicted by the model. FIG. 14 is a schematic diagram showing the associated topics among the results analyzed for the Obama approval rate according to an embodiment using fixed-ATM, and FIG. 15 shows the dynamics of the topic proportions among the results analyzed for the approval rate in the example of FIG. 14, together with the actual approval rate compared with the rate predicted by the model.

FIGS. 10, 12, and 14 show the related topics, each represented by eight words selected in order of occurrence frequency. The upper graphs of FIGS. 11, 13, and 15 show the dynamics of the topic proportions in the same order as in FIGS. 10, 12, and 14, respectively. In these upper graphs, the color of a topic proportion represents the effect of that topic: blue at the top of the graph indicates a positive effect, and red at the bottom indicates a negative effect. The lower graphs of FIGS. 11, 13, and 15 compare the actual stock returns, stock volatilities, and approval ratings, respectively, with those predicted by the model.

In the fixed-ATM analyses shown, the topic-proportion dynamics all change dramatically so as to absorb the time-series errors. Fixed-ATM is therefore not useful for prediction, but it can be used to analyze the topics that are highly relevant during the learning period. The results of applying ATM and fixed-ATM may differ; for example, for the stock returns shown in FIGS. 10 and 11, topics related to Bloomberg editors and stories showed the most negative impact, whereas this was not the case in the ATM results of FIGS. 4 and 5.

Evaluation of prediction performance

To quantitatively evaluate the predictive performance of ATM, the model is assumed to have been trained on the earlier (prior) data, and the value at the next time step is predicted. For the data sets of Table 1, prediction was performed for 21 weeks in the case of the stock returns and the stock volatilities, and for 25 weeks in the case of the approval rate. The comparison models (AR, LDA-LR, IT-LDA, DTM-LR, IT-DTM) and ATM each inferred five topics, and all models were set with the same initial parameters and approximated by the same criteria. For AR(p), p was chosen to give the best performance on the test data. For the ITMTF model, the prior feedback loop was repeated three times with a confidence threshold of 70%. After prediction, the mean squared error (MSE) and mean absolute error (MAE) were computed, where M is the number of test points and the errors compare the actual values with the predicted values.
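The two measures are the usual mean squared error and mean absolute error over the M test points; a minimal sketch:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over the test points."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error over the test points."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred)))
```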

[Table 2: prediction performance (MSE and MAE) of the comparison models and ATM]

Table 2 above shows the predictive performance of the comparison models and of the proposed model (ATM) in terms of MSE and MAE. Bold fonts mark the best performance among the models that analyze text and numbers simultaneously and provide both interpretation and prediction. Underlined fonts, such as AR, mark the best performance among the numeric-only models. ATM shows the best overall performance.

A performance comparison between ATM and the comparison models was also carried out with the log likelihood metric under the same conditions as the prediction-performance evaluation, and the results are shown in FIGS. 16 to 21.

FIG. 16 is a graph comparing various existing models with ATM in terms of log likelihood over the entire test period, among the results analyzed for the stock return in this performance test, and FIG. 17 compares the same models with ATM in terms of the log likelihood estimated during the last test period for the stock return in the example of FIG. 16. FIG. 18 is a graph comparing various existing models with ATM in terms of log likelihood over the entire test period for the stock volatility, and FIG. 19 compares the same models with ATM in terms of the log likelihood estimated during the last test period for the stock volatility in the example of FIG. 18. FIG. 20 is a graph comparing various existing models with ATM in terms of log likelihood over the entire test period for the Obama approval rate, and FIG. 21 compares the same models with ATM in terms of the log likelihood estimated during the last test period for the approval rate in the example of FIG. 20.

Referring to FIGS. 16, 18, and 20, which show the results for the entire test period, all of the tested models have similar values. These results show that ATM achieves similar likelihood performance despite the strong assumption that two different sources are generated from the same hidden state.

Referring to FIGS. 17, 19, and 21, which show the results for the last unit period, ATM does not have the best performance. However, even though ATM incorporates the time-series values into the probabilistic modeling, it shows performance similar to that of the comparison models.

A new topic model, i. E. ATM, according to the above embodiment can find the relationship between numerical data and corpus collected in time. ATM can be useful in a wide variety of applications with varying numerical data and text data collected from the crowd. For example, the model can be applied not only to political SNS messages, ratings, but also to product reviews, sales records, and so on.

Various modified configurations are possible by referring to and combining the various features described herein. Accordingly, the scope of the embodiments is not limited to the described embodiments but should be construed in accordance with the appended claims.

Claims (20)

A method of associating time-series text data and time-series numerical data, performed by a computing device, the method comprising:
Obtaining a data set including time-series text data and time-series numerical data corresponding to each other in time, and
Applying an Associative Topic Model (ATM) to the time-series text data and the time-series numerical data corresponding to each other in time,
Wherein applying the ATM comprises calculating a time-based trajectory of topic proportions from the time-series text data and correlating the topic proportions according to the trajectory with the time-series numerical data on the time axis,
And wherein calculating the trajectory of the topic proportions from the time-series text data is performed, as in a Dynamic Topic Model (DTM), on the basis of a time-series numerical variable generated from the same prior information as the topic proportions.
delete
The method according to claim 1,
Wherein correlating the topic proportions according to the trajectory with the time-series numerical data on the time axis comprises approximating a corpus-level state of the topic proportion at time t and a document-level state of the topic proportion at time t using a variational method.
The method of claim 3,
Wherein correlating the topic proportions according to the trajectory with the time-series numerical data on the time axis further comprises using a Kalman filter to estimate the dynamics of the topic proportions in the corpus over time.
The method according to claim 1,
Wherein obtaining the data set including the temporally corresponding time-series text data and time-series numerical data includes finding and removing stop words or terms appearing in the data set with a frequency below a predefined ratio.
An apparatus for associating time-series text data and time-series numerical data, comprising:
A data acquisition module for acquiring a data set of time-series text data and time-series numerical data corresponding to each other in time, and
An Associative Topic Model (ATM) module for calculating a time-based trajectory of topic proportions from the time-series text data and correlating the topic proportions according to the trajectory with the time-series numerical data on the time axis,
Wherein the ATM module is configured to calculate the time-based trajectory of the topic proportions from the time-series text data on the basis of a time-series numerical variable generated from the same prior information as the topic proportions of a Dynamic Topic Model (DTM).
delete
The apparatus according to claim 6,
Wherein the ATM module is further configured to approximate a corpus-level state of the topic proportion at time t and a document-level state of the topic proportion at time t using a variational method in order to correlate the topic proportions according to the trajectory with the time-series numerical data on the time axis.
9. The apparatus of claim 8,
Wherein the ATM module is further configured to estimate the dynamics of the topic proportions in the corpus over time in order to correlate the topic proportions according to the trajectory with the time-series numerical data on the time axis, and to use a Kalman filter for the estimation.
10. The apparatus of claim 9,
Wherein the data acquisition module for acquiring the temporally corresponding time-series text data and time-series numerical data is configured to find and remove stop words or terms appearing in the data set with a frequency below a predefined ratio.
A computer-readable recording medium having recorded thereon a program which, when executed by a computer, performs:
Obtaining a data set including time-series text data and time-series numerical data corresponding to each other in time, and
Applying an Associative Topic Model (ATM) to the time-series text data and the time-series numerical data corresponding to each other in time,
Wherein applying the ATM comprises calculating a trajectory of the topic proportions in time from the time-series text data and correlating the topic proportions according to the trajectory with the time-series numerical data on the time axis,
And wherein calculating the time-based trajectory of the topic proportions from the time-series text data is performed, as with the topic proportions of a Dynamic Topic Model (DTM), on the basis of a time-series numerical variable generated from the same prior information.
delete 12. The method of claim 11,
The program includes:
Wherein correlating the topic proportions according to the trajectory with the time-series numerical data on the time axis comprises approximating the corpus-level state of the topic proportion at time t and the document-level state of the topic proportion at time t using a variational method.
14. The computer-readable recording medium of claim 13,
The program includes:
Wherein correlating the topic proportions according to the trajectory with the time-series numerical data on the time axis further comprises using a Kalman filter to estimate the dynamics of the topic proportions in the corpus over time.
15. The computer-readable recording medium of claim 14,
The program includes:
Wherein obtaining the data set including the temporally corresponding time-series text data and time-series numerical data includes finding and removing stop words or terms appearing in the data set with a frequency below a predefined ratio.
An apparatus for associating time-series text data and time-series numerical data, comprising:
A memory for storing time-series text data and time-series numerical data corresponding to each other in time, and
A processor,
Wherein the processor is configured to calculate, according to an Associative Topic Model (ATM), a time-based trajectory of topic proportions from the time-series text data stored in the memory, and to correlate the topic proportions according to the trajectory with the time-series numerical data stored in the memory on the time axis,
And wherein the processor is configured to calculate the time-based trajectory of the topic proportions from the time-series text data on the basis of a time-series numerical variable generated from the same prior information as the topic proportions of a Dynamic Topic Model (DTM).
delete 17. The method of claim 16,
Wherein the processor is configured to approximate a corpus-level state of the topic proportion at time t and a document-level state of the topic proportion at time t using a variational method in order to correlate the topic proportions according to the trajectory with the time-series numerical data on the time axis.
19. The apparatus of claim 18,
Wherein the processor is configured to estimate the dynamics of the topic proportions in the corpus over time in order to correlate the topic proportions according to the trajectory with the time-series numerical data on the time axis, and to use a Kalman filter for the estimation.
delete
KR1020150076402A 2015-05-29 2015-05-29 Method and apparatus for associating topic data with numerical time series KR101613397B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150076402A KR101613397B1 (en) 2015-05-29 2015-05-29 Method and apparatus for associating topic data with numerical time series

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150076402A KR101613397B1 (en) 2015-05-29 2015-05-29 Method and apparatus for associating topic data with numerical time series

Publications (1)

Publication Number Publication Date
KR101613397B1 true KR101613397B1 (en) 2016-04-18

Family

ID=55916954

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150076402A KR101613397B1 (en) 2015-05-29 2015-05-29 Method and apparatus for associating topic data with numerical time series

Country Status (1)

Country Link
KR (1) KR101613397B1 (en)



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120095952A1 (en) * 2010-10-19 2012-04-19 Xerox Corporation Collapsed gibbs sampler for sparse topic models and discrete matrix factorization

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
박성래 (Sungrae Park), Topic Model Associated with Social Indicators, KAIST Master's Thesis, 2014.*

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019013376A1 (en) * 2017-07-14 2019-01-17 한국과학기술원 Method and device for predicting approval rating by using text-compensated automatic statistical model
KR20190007915A (en) * 2017-07-14 2019-01-23 한국과학기술원 Method and apparatus for predicting approval rates of politicians with text augmented automatic statistician
KR101991569B1 (en) * 2017-07-14 2019-06-19 한국과학기술원 Method and apparatus for predicting approval rates of politicians with text augmented automatic statistician
CN111125305A (en) * 2019-12-05 2020-05-08 东软集团股份有限公司 Hot topic determination method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
Xu et al. E-commerce product review sentiment classification based on a naïve Bayes continuous learning framework
US11361200B2 (en) System and method for learning contextually aware predictive key phrases
US10600005B2 (en) System for automatic, simultaneous feature selection and hyperparameter tuning for a machine learning model
Zamani et al. Neural query performance prediction using weak supervision from multiple signals
US20210042590A1 (en) Machine learning system using a stochastic process and method
US11037080B2 (en) Operational process anomaly detection
El Morr et al. Descriptive, predictive, and prescriptive analytics
JP2021504789A (en) ESG-based corporate evaluation execution device and its operation method
Landeiro et al. Robust text classification in the presence of confounding bias
KR102105319B1 (en) Esg based enterprise assessment device and operating method thereof
Dang et al. Framework for retrieving relevant contents related to fashion from online social network data
AlDahoul et al. A comparison of machine learning models for suspended sediment load classification
US11615361B2 (en) Machine learning model for predicting litigation risk in correspondence and identifying severity levels
Prasad et al. Hybrid topic cluster models for social healthcare data
Badenes-Olmedo et al. Efficient clustering from distributions over topics
KR101613397B1 (en) Method and apparatus for associating topic data with numerical time series
Obiedat Predicting the popularity of online news using classification methods with feature filtering techniques
Iwata et al. Sequential modeling of topic dynamics with multiple timescales
CN111694957B (en) Method, equipment and storage medium for classifying problem sheets based on graph neural network
Fritsche et al. Deciphering professional forecasters' stories: Analyzing a corpus of textual predictions for the German economy
Gutsche Automatic weak signal detection and forecasting
CN113256383B (en) Recommendation method and device for insurance products, electronic equipment and storage medium
Nasr et al. Natural language processing: Text categorization and classifications
Vanipriya et al. Stock market prediction using sequential events
Hewa Nadungodage et al. Online multi-dimensional regression analysis on concept-drifting data streams

Legal Events

Date Code Title Description
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant
FPAY Annual fee payment

Payment date: 20190402

Year of fee payment: 4