WO2024045005A1 - Data classification method based on dynamic bayesian network classifier - Google Patents

Data classification method based on dynamic Bayesian network classifier

Info

Publication number
WO2024045005A1
WO2024045005A1 PCT/CN2022/116055 CN2022116055W WO2024045005A1 WO 2024045005 A1 WO2024045005 A1 WO 2024045005A1 CN 2022116055 W CN2022116055 W CN 2022116055W WO 2024045005 A1 WO2024045005 A1 WO 2024045005A1
Authority
WO
WIPO (PCT)
Prior art keywords
attribute
time series
sample
variable
data set
Application number
PCT/CN2022/116055
Inventor
周亮 (Zhou Liang)
吴韬 (Wu Tao)
张斯雯 (Zhang Siwen)
孔平 (Kong Ping)
王双成 (Wang Shuangcheng)
Original Assignee
上海健康医学院 (Shanghai University of Medicine & Health Sciences)
Application filed by 上海健康医学院 (Shanghai University of Medicine & Health Sciences)
Priority to PCT/CN2022/116055
Publication of WO2024045005A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis


Abstract

The present invention belongs to the technical field of data classification. Provided is a data classification method based on a dynamic Bayesian network classifier. The data classification method comprises: acquiring a time series sample data set; constructing a Bayesian network classifier according to the time series sample data set, and learning the structure and the weight coefficient of the Bayesian network classifier, so as to determine an optimal classifier; and on the basis of the optimal classifier, determining a class variable corresponding to each attribute variable to be classified in time series data to be classified. Therefore, time series data can be accurately classified.

Description

A data classification method based on a dynamic Bayesian network classifier
Technical field
The invention relates to the field of data classification, and in particular to a data classification method based on a dynamic Bayesian network classifier.
Background
The classes and attributes of time series data do not change synchronously. A dynamic Bayesian network is an extension of the traditional Bayesian network and is suited to time-dependent uncertainty problems, for example stock-trend prediction in economics and disease diagnosis and prognosis in medicine. However, because the directed edges in its structure mainly express causal relationships rather than serving as channels or paths for information transmission, it is better suited to dynamic analysis and inference than to direct classification.
Summary of the invention
The purpose of the present invention is to provide a data classification method based on a dynamic Bayesian network classifier that can accurately classify time series data.
To achieve the above purpose, the present invention provides the following solution:
A data classification method based on a dynamic Bayesian network classifier, comprising:
obtaining a time series sample data set, wherein the time series sample data set includes sample attribute variables at multiple historical time points, the actual class variable corresponding to each sample attribute variable, and the transitive dependency information, direct derived dependency information and indirect derived dependency information of each sample attribute variable;
building a Bayesian network classifier from the time series sample data set, and learning the structure and weight coefficients of the Bayesian network classifier to determine an optimal classifier;
obtaining time series data to be classified;
based on the optimal classifier, determining the class variable corresponding to each attribute variable to be classified in the time series data to be classified.
According to the specific embodiments provided, the present invention has the following technical effect: by building a Bayesian network classifier from a time series sample data set, time series data can be classified accurately.
Description of drawings
Figure 1 is a flow chart of the data classification method based on the dynamic Bayesian network classifier;
Figure 2 shows the local structure of the classifier;
Figure 3 is a schematic diagram of the evolution mode of the classifier.
Detailed description of embodiments
To make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in Figure 1, the data classification method based on a dynamic Bayesian network classifier comprises:
S1: Obtain the time series sample data set. The time series sample data set includes sample attribute variables at multiple historical time points, the actual class variable corresponding to each sample attribute variable, and transitive dependency information, direct derived dependency information and indirect derived dependency information.
S2: Build a Bayesian network classifier from the time series sample data set, learn the structure and weight coefficients of the Bayesian network classifier, and determine the optimal classifier.
S3: Obtain the time series data to be classified. The time series data to be classified includes attribute variables to be classified at multiple current time points, together with the transitive dependency information, direct derived dependency information and indirect derived dependency information of each attribute variable to be classified.
S4: Based on the optimal classifier, determine the class variable corresponding to each attribute variable to be classified in the time series data to be classified.
Non-time-series data sets are converted into time series data sets. $X_1[t], X_2[t], \ldots, X_n[t]$ and $C[t]$ denote the time series attribute variables and the class variable, respectively, where $t$ takes discrete time points with $1 \le t \le T$, and $x_1[t], x_2[t], \ldots, x_n[t], c[t]$ are their specific values. $D[n,T] = \{x_1[t], x_2[t], \ldots, x_n[t], c[t] \mid 1 \le t \le T\}$ is a time series classification data set with $T$ records, and the records in $D[n,T]$ are temporally dependent. A classifier is first built from the given time series data set $D[n,T]$ and is then used to predict
Figure PCTCN2022116055-appb-000001
where
Figure PCTCN2022116055-appb-000002
is the order of the classifier. This embodiment studies the classifier for
Figure PCTCN2022116055-appb-000003
.
S1 includes: obtaining a first time series data set, which includes sample attribute variables at multiple historical time points, the actual class variable corresponding to each sample attribute variable, and transitive dependency information, direct derived dependency information and indirect derived dependency information. Based on the Markov assumption, a time series transformation is applied to the first time series data set to obtain a second time series data set. Using a misalignment (shift) transformation determined by the order of the dynamic Bayesian network classifier, the shifted correspondence between the sample attribute variables and the class variables in the second time series data set is established, yielding the time series sample data set.
The converted time series sample data set is $D[n,T] = \{x_1[t-1], x_1[t], x_2[t-1], x_2[t], \ldots, x_n[t-1], x_n[t], c[t], c[t+1] \mid 2 \le t \le T\}$.
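The shift transformation above can be sketched as follows. This is a minimal illustration assuming a first-order classifier (one step of lag, matching the converted set) and in-memory lists; the function name `shift_transform` is hypothetical. Because $c[T+1]$ is not yet observed, the sketch emits only rows whose target class is known; the final pair $(x[T-1], x[T], c[T])$ is the one that would be classified.

```python
from typing import List, Sequence, Tuple

# One converted row: (x[t-1], x[t], c[t], c[t+1]).
Row = Tuple[List[float], List[float], int, int]


def shift_transform(x: Sequence[Sequence[float]], c: Sequence[int]) -> List[Row]:
    """Pair each attribute vector with its predecessor and the next class.

    x[t] is the attribute vector at time t (0-based here) and c[t] its class.
    Assumption: first-order (lag-1) shift; rows without a known c[t+1] are skipped.
    """
    if len(x) != len(c):
        raise ValueError("x and c must have the same length")
    rows: List[Row] = []
    for t in range(1, len(x) - 1):
        rows.append((list(x[t - 1]), list(x[t]), c[t], c[t + 1]))
    return rows
```

For a series of four time points this yields two training rows, because the first point has no predecessor and the last has no successor class.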
S2 includes: determining an initial attribute tree from the time series sample data set by maximum likelihood estimation. The initial attribute tree contains each sample attribute variable and the first prediction class variable corresponding to it. Specifically, for each sample attribute variable, the classification accuracy with respect to its actual class variable is computed for every target sample attribute variable, where a target sample attribute variable is any other sample attribute variable in the time series sample data set. The target sample attribute variable that gives the maximum classification accuracy is taken as the first prediction class variable of that sample attribute variable. From the sample attribute variables and their first prediction class variables, attribute tree learning is performed with a forward greedy search to obtain the initial attribute tree. The first classification accuracy is determined from the first prediction class variables and the true class variables of the sample attribute variables. Sample attribute variables at any number of consecutive time points are then selected from the time series sample data set to form a time series segment data set.
Using a greedy random search, the initial attribute tree is optimized by maximum likelihood estimation on the time series segment data set, guided by the first classification accuracy, to obtain the optimal attribute tree. The optimal attribute tree contains each sample attribute variable and the second prediction class variable corresponding to it, and it constitutes the structure of the Bayesian network classifier. The second classification accuracy is determined from the second prediction class variables and the actual class variables. Finally, the weight coefficient of the Bayesian network classifier is determined from the first classification accuracy, the second classification accuracy, the initial attribute tree and the optimal attribute tree, yielding the optimal classifier.
The first classification accuracy is determined by the following formula:
Figure PCTCN2022116055-appb-000004
Figure PCTCN2022116055-appb-000005
where $\mathrm{accuracy}(D[n,T], T_0)$ is the first classification accuracy, $D[n,T]$ is the time series sample data set, $n$ is the number of sample attribute variables, $T$ is the total period, $T_0$ is the test threshold, $c_{\mathrm{prediction}}[t]$ is the first prediction class variable of the sample attribute variable at time point $t$, and $c_{\mathrm{true}}[t]$ is the true class variable of the sample attribute variable at time point $t$.
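The exact formula exists only as a figure image, but the listed symbols suggest an accuracy measured over the last $T_0$ time points. The sketch below assumes exactly that: the fraction of those points where $c_{\mathrm{prediction}}[t] = c_{\mathrm{true}}[t]$. The function name `tail_accuracy` is hypothetical, and the averaging over $T_0$ points is an assumption, not the published formula.

```python
from typing import Sequence


def tail_accuracy(c_pred: Sequence[int], c_true: Sequence[int], t0: int) -> float:
    """Share of the last t0 time points whose predicted class equals the true class.

    Assumption: the patent's accuracy(D[n,T], T0) is a simple hit rate over
    the final T0 test points; the published formula is only available as an image.
    """
    if len(c_pred) != len(c_true):
        raise ValueError("predictions and true classes must be aligned in time")
    if not 0 < t0 <= len(c_true):
        raise ValueError("t0 must lie in 1..T")
    hits = sum(p == y for p, y in zip(c_pred[-t0:], c_true[-t0:]))
    return hits / t0
```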
The probability distributions of the initial attribute tree and of the optimal attribute tree are estimated. The weight coefficient of the Bayesian network classifier is then determined from these two distributions and from the first and second classification accuracies:
Figure PCTCN2022116055-appb-000006
where $p_{\mathrm{new}}$ is the weight coefficient of the Bayesian network classifier, $\alpha$ is the first classification accuracy, $\beta$ is the second classification accuracy, $p_{\mathrm{before}}$ is the probability distribution of the initial attribute tree, and $p_{\mathrm{after}}$ is the probability distribution of the optimal attribute tree.
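Since the weighting formula is given only as an image, the following sketch assumes one natural reading of the listed symbols: an accuracy-weighted average, $p_{\mathrm{new}} = (\alpha\, p_{\mathrm{before}} + \beta\, p_{\mathrm{after}}) / (\alpha + \beta)$, applied to class distributions. This form and the function name `average_distributions` are assumptions for illustration only.

```python
from typing import Dict


def average_distributions(p_before: Dict[int, float], p_after: Dict[int, float],
                          alpha: float, beta: float) -> Dict[int, float]:
    """Accuracy-weighted mix of two class distributions.

    Assumption: p_new = (alpha * p_before + beta * p_after) / (alpha + beta),
    with alpha and beta the first and second classification accuracies.
    """
    if alpha < 0 or beta < 0 or alpha + beta == 0:
        raise ValueError("accuracies must be non-negative and not both zero")
    classes = set(p_before) | set(p_after)
    return {c: (alpha * p_before.get(c, 0.0) + beta * p_after.get(c, 0.0)) / (alpha + beta)
            for c in classes}
```

Dividing by $\alpha + \beta$ keeps the result a valid distribution whenever both inputs are, and the more accurate tree receives proportionally more weight.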
The class node in the classifier is the parent of all attribute nodes, which lets the classifier make full use of transitive dependency information. The tree or forest structure among attributes, together with Gaussian-function density estimation, effectively exploits direct and indirect derived dependency information while avoiding overfitting of the data. The time-lag transformation of variables combines lagged and non-lagged information, and the misalignment transformation enables asynchronous classification and prediction. The evolutionary learning and classification mode lets the optimal classifier continuously accumulate classification information and improve its classification ability.
Structure and representation of the classifier: given $X_1[t-1], X_2[t-1], \ldots, X_n[t-1], C[t]$, the variables $X_1[t], X_2[t], \ldots, X_n[t], C[t+1]$ are conditionally independent of the other time-lagged variables under the Markov assumption. From Bayesian network theory and the conditional independence relationships in Figure 2, we obtain:
Figure PCTCN2022116055-appb-000007
where $\gamma$ is a quantity independent of $c[t+1]$; $\pi_i[t-1]$ is the value of the parent node $\Pi_i[t-1]$ of $X_i[t-1]$ among $X_1[t-1], \ldots, X_{i-1}[t-1]$; $\pi_i[t-1,t]$ is the value of the parent node $\Pi_i[t-1,t]$ of $X_i[t]$ among $X_1[t-1], \ldots, X_n[t-1], X_1[t], \ldots, X_{i-1}[t]$; $f(\cdot)$ denotes a density; and $p(c[t+1] \mid c[t], x_1[t-1], \ldots, x_n[t-1], x_1[t], \ldots, x_n[t])$ is the probability that the sample $x_1[t-1], \ldots, x_n[t-1], x_1[t], \ldots, x_n[t]$ belongs to class $c[t+1]$.
Figure PCTCN2022116055-appb-000008
The probabilities are estimated by maximum likelihood, and the attribute densities are estimated with Gaussian functions.
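The factorization and estimation step can be sketched as follows, under several stated assumptions: the $2n$ lagged and current attributes are concatenated into one vector `z`; each attribute has at most one parent (`None` for no parent); every attribute density is conditioned on the candidate class $c[t+1]$; the transition $p(c[t+1] \mid c[t])$ is a maximum likelihood count ratio; and "Gaussian function" is read as a Gaussian kernel with a hypothetical bandwidth `h`. Normalizing over classes absorbs the constant $\gamma$. All function names are illustrative, not the patent's.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# One training row: (z, c[t], c[t+1]) where z = x[t-1] + x[t] (2n values).
Row = Tuple[Sequence[float], int, int]


def gauss(u: float, h: float) -> float:
    """Gaussian kernel with bandwidth h."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def transition_mle(rows: Sequence[Row]) -> Dict[Tuple[int, int], float]:
    """Maximum likelihood estimate of p(c[t+1] | c[t]) from transition counts."""
    pairs = Counter((c, c_next) for _, c, c_next in rows)
    prev = Counter(c for _, c, _ in rows)
    return {(c, cn): k / prev[c] for (c, cn), k in pairs.items()}


def cond_density(rows_c: List[Row], i: int, j: Optional[int],
                 z: Sequence[float], h: float) -> float:
    """Kernel estimate of f(z_i | z_j, class); with no parent, f(z_i | class)."""
    if j is None:
        return sum(gauss(z[i] - r[0][i], h) for r in rows_c) / len(rows_c)
    num = sum(gauss(z[i] - r[0][i], h) * gauss(z[j] - r[0][j], h) for r in rows_c)
    den = sum(gauss(z[j] - r[0][j], h) for r in rows_c)
    return num / den if den > 0.0 else 0.0


def class_posterior(rows: Sequence[Row], parents: Sequence[Optional[int]],
                    z: Sequence[float], c_now: int, h: float = 0.5) -> Dict[int, float]:
    """Normalized p(c[t+1] | c[t], z); normalizing over classes absorbs gamma."""
    trans = transition_mle(rows)
    scores: Dict[int, float] = {}
    for c_next in sorted({r[2] for r in rows}):
        rows_c = [r for r in rows if r[2] == c_next]
        score = trans.get((c_now, c_next), 0.0)
        for i, j in enumerate(parents):  # each attribute has at most one parent
            score *= cond_density(rows_c, i, j, z, h)
        scores[c_next] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total > 0.0 else scores
```

In this sketch the parent list `[None, 0]` means the second attribute depends on the first, which mirrors the "at most one parent among earlier attributes" rule of the attribute tree.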
For the time series data set $D[n,T]$, the threshold $T_0$ is set according to the length of the time series, the validity of the class probability estimates, the attribute density estimation, or practical needs. Let $\mathrm{accuracy}(\mathrm{fmdbn}, D[n,T], T_0)$ denote the classification accuracy of the classifier, let $c_{\mathrm{prediction}}[T-T_0+1]$ be the classification result for $c[T-T_0+1]$ obtained with $D[n,T-T_0]$ as the training set, and let $c_{\mathrm{true}}[T-T_0+1]$ be the true result. Then
Figure PCTCN2022116055-appb-000009
where
Figure PCTCN2022116055-appb-000010
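The definition above trains on $D[n,T-T_0]$ to predict $c[T-T_0+1]$. A common way to extend this to all $T_0$ test points is walk-forward evaluation: retrain on everything before each test point, predict it, and count hits. The sketch assumes that extension; `fit`, `predict` and `label` are hypothetical callables standing in for the classifier, not an API from the source.

```python
from typing import Any, Callable, Sequence


def walk_forward_accuracy(rows: Sequence[Any], t0: int,
                          fit: Callable[[Sequence[Any]], Any],
                          predict: Callable[[Any, Any], Any],
                          label: Callable[[Any], Any]) -> float:
    """Time-ordered accuracy over the last t0 records.

    Assumption: record k (0-based, k >= T - t0) is predicted by a model
    trained only on records 0..k-1, so no future data leaks into training.
    """
    T = len(rows)
    if not 0 < t0 < T:
        raise ValueError("t0 must lie in 1..T-1 so that training data exist")
    hits = 0
    for k in range(T - t0, T):
        model = fit(rows[:k])  # train on the past only
        hits += predict(model, rows[k]) == label(rows[k])
    return hits / t0
```

A usage example with a majority-class stand-in: `walk_forward_accuracy([0, 1, 1, 1, 1], 2, lambda tr: Counter(tr).most_common(1)[0][0], lambda m, r: m, lambda r: r)` (with `from collections import Counter`) trains on `[0, 1, 1]` and then `[0, 1, 1, 1]`, predicting class 1 both times.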
Classifier learning is divided into initial learning and evolutionary learning; each stage comprises structure learning and parameter learning over the ordered variables. Structure learning is the core: once the classifier structure is fixed, the parameters can be estimated from it and the input data set. Structure learning focuses on building and adjusting the attribute tree or forest.
(1) Initial learning: the attribute tree is initialized and then learned by combining the temporal progressive classification accuracy criterion, the attribute order and a forward greedy search, yielding a locally optimal attribute tree. For a given attribute order, a parent node may only be searched for among the preceding attributes, and each attribute has at most one parent node, so the result is a locally optimal attribute tree or forest.
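The initial-learning rule can be sketched as a forward greedy search. The sketch assumes a caller-supplied `score` function (hypothetical) that returns the temporal progressive classification accuracy of a candidate structure; structures are lists where `parents[i]` is the index of attribute `i`'s parent or `None`. Restricting parents to earlier indices guarantees a tree or forest.

```python
from typing import Callable, List, Optional

Parents = List[Optional[int]]


def greedy_attribute_tree(n_vars: int, score: Callable[[Parents], float]) -> Parents:
    """Forward greedy parent selection in a fixed attribute order.

    Each attribute i may take at most one parent j < i, and only if doing so
    strictly improves the score; otherwise it stays a root (None).
    """
    parents: Parents = [None] * n_vars
    for i in range(n_vars):
        best_j, best_s = None, score(parents)  # baseline: attribute i has no parent
        for j in range(i):                     # search only earlier attributes
            trial = parents.copy()
            trial[i] = j
            s = score(trial)
            if s > best_s:
                best_j, best_s = j, s
        parents[i] = best_j
    return parents
```

Because each decision is fixed once made, the result is locally (not globally) optimal, which is the limitation the evolutionary stage is meant to address.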
(2) Evolutionary learning: the attribute tree obtained by initial learning is adjusted repeatedly, and after each adjustment the new classifier serves as the base classifier for the next round. A greedy random search is used to avoid converging to a local optimum. For each node, a random integer between 0 and 2n is generated, and the node corresponding to that number is taken as the node's initial parent node $b_j$. The classification accuracy is computed from $b_j$ and the node's actual class variable, and the node giving the highest classification accuracy is taken as the parent node of that attribute variable.
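One round of the greedy random search might look like the sketch below. Several points are assumptions not fixed by the text: a draw of 0 is read as "no parent" and draws 1..2n as 1-based node ids; a candidate that does not precede the node in the attribute order is skipped to keep the graph acyclic; and `score` is a hypothetical accuracy function over whole structures.

```python
import random
from typing import Callable, List, Optional

Parents = List[Optional[int]]


def evolve_once(parents: Parents, score: Callable[[Parents], float],
                rng: random.Random) -> Parents:
    """One greedy random-search pass over all 2n nodes.

    Assumptions: draw 0 -> no parent, draw r in 1..2n -> node r-1 (0-based);
    candidates that do not precede the node are skipped to stay acyclic.
    A change is kept only if it strictly raises the score.
    """
    n_vars = len(parents)                 # 2n lagged + current attributes
    current = parents.copy()
    for j in range(n_vars):
        r = rng.randint(0, n_vars)        # inclusive bounds: 0..2n
        cand = None if r == 0 else r - 1
        if cand is not None and cand >= j:
            continue                      # would break the attribute order
        trial = current.copy()
        trial[j] = cand
        if score(trial) > score(current):
            current = trial               # greedy acceptance
    return current
```

The random draw lets a node jump to a parent that the ordered greedy pass never compared directly, while greedy acceptance ensures the score never decreases from round to round.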
(3) Evolutionary classification: classification is performed with a new classifier obtained by model averaging of the classifiers before and after adjustment.
Let $\mathrm{FMDBN}_{\mathrm{before}}$ and $\mathrm{FMDBN}_{\mathrm{after}}$ denote the classifiers obtained from the attributes before adjustment, $\{\mathrm{before\_a}_h \mid 1 \le h \le 2n\}$, and after adjustment, $\{\mathrm{after\_a}_h \mid 1 \le h \le 2n\}$, respectively. Model averaging of $\mathrm{FMDBN}_{\mathrm{before}}$ and $\mathrm{FMDBN}_{\mathrm{after}}$ yields the new classifier $\mathrm{FMDBN}_{\mathrm{new}}$, as shown in Figure 3. Iterating this process continuously accumulates and compresses classification information and improves the classification ability of the classifier.
Specific examples are used herein to explain the principles and implementations of the present invention; the content of this description should not be construed as limiting the present invention.

Claims (7)

  1. A data classification method based on a dynamic Bayesian network classifier, characterized in that the data classification method comprises:
    obtaining a time series sample data set, wherein the time series sample data set includes sample attribute variables at multiple historical time points, the actual class variable corresponding to each sample attribute variable, and the transitive dependency information, direct derived dependency information and indirect derived dependency information of each sample attribute variable;
    building a Bayesian network classifier from the time series sample data set, and learning the structure and weight coefficients of the Bayesian network classifier to determine an optimal classifier;
    obtaining time series data to be classified;
    based on the optimal classifier, determining the class variable corresponding to each attribute variable to be classified in the time series data to be classified.
  2. The data classification method based on a dynamic Bayesian network classifier according to claim 1, characterized in that obtaining the time series sample data set specifically comprises:
    obtaining a first time series data set, wherein the first time series data set includes sample attribute variables at multiple historical time points, the actual class variable corresponding to each sample attribute variable, and the transitive dependency information, direct derived dependency information and indirect derived dependency information of each sample attribute variable;
    based on the Markov assumption, performing a time series transformation on the first time series data set to obtain a second time series data set;
    based on a misalignment transformation determined by the order of the dynamic Bayesian network classifier, establishing the misaligned correspondence between the sample attribute variables and the class variables in the second time series data set to obtain the time series sample data set.
  3. The data classification method based on a dynamic Bayesian network classifier according to claim 1, wherein constructing a Bayesian network classifier from the time series sample data set and learning the structure and weight coefficients of the Bayesian network classifier to determine the optimal classifier specifically comprises:
    determining an initial attribute tree from the time series sample data set by maximum likelihood estimation, the initial attribute tree comprising each sample attribute variable and the first predicted class variable corresponding to each sample attribute variable;
    determining a first classification accuracy from the first predicted class variables and the true class variables corresponding to the sample attribute variables;
    selecting the sample attribute variables at any number of consecutive time points from the time series sample data set to obtain a time series segment data set;
    optimizing the initial attribute tree by a greedy random search, using the time series segment data set, the first classification accuracy and maximum likelihood estimation, to obtain an optimal attribute tree, the optimal attribute tree comprising each sample attribute variable and the second predicted class variable corresponding to each sample attribute variable, and the optimal attribute tree being the structure of the Bayesian network classifier;
    determining a second classification accuracy from the second predicted class variables and the actual class variables corresponding to the attribute variables;
    determining the weight coefficient of the Bayesian network classifier from the first classification accuracy, the second classification accuracy, the initial attribute tree and the optimal attribute tree, to obtain the optimal classifier.
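Claim 3 does not spell out the greedy random search. The sketch below assumes it behaves like a hill climb: starting from the initial attribute tree, a hypothetical `neighbour` function proposes a modified tree, a random run of consecutive time points is drawn as the time series segment, and the move is kept only if it scores strictly higher than the incumbent on that segment. In this reading the incumbent's starting accuracy plays the role of the first classification accuracy. The names `random_segment`, `greedy_random_search`, `neighbour` and `accuracy` are all illustrative assumptions.

```python
import random
from typing import Callable, TypeVar

S = TypeVar("S")  # a candidate structure, e.g. an attribute tree


def random_segment(n_points: int, rng: random.Random) -> range:
    """Return a random run of consecutive time points [start, end)."""
    start = rng.randrange(n_points)
    end = rng.randrange(start + 1, n_points + 1)
    return range(start, end)


def greedy_random_search(initial: S,
                         neighbour: Callable[[S, random.Random], S],
                         accuracy: Callable[[S, range], float],
                         n_points: int,
                         n_iter: int = 200,
                         seed: int = 0) -> S:
    """Hill-climb from `initial`, judging each move on a random segment."""
    rng = random.Random(seed)
    current = initial
    for _ in range(n_iter):
        segment = random_segment(n_points, rng)
        candidate = neighbour(current, rng)
        # Greedy acceptance: keep the move only if it is strictly better
        # than the incumbent on the same randomly chosen segment.
        if accuracy(candidate, segment) > accuracy(current, segment):
            current = candidate
    return current
```

Evaluating each move on a fresh random segment, rather than on the whole series, is one way to make acceptance noisier and less prone to overfitting a single split; whether the patent intends this is not stated.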
  4. The data classification method based on a dynamic Bayesian network classifier according to claim 3, wherein determining the initial attribute tree from the time series sample data set by maximum likelihood estimation specifically comprises:
    for each sample attribute variable, computing the classification accuracy between the actual class variable corresponding to that sample attribute variable and each target sample attribute variable, a target sample attribute variable being any other sample attribute variable in the time series sample data set;
    taking the target sample attribute variable with the highest classification accuracy as the first predicted class variable of that sample attribute variable;
    learning an attribute tree by forward greedy search from the sample attribute variables and their corresponding first predicted class variables, to obtain the initial attribute tree.
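Claim 4 leaves open how the accuracy between an attribute and the class is computed. As an assumption, the sketch below scores each pair of discrete attributes (X_i, X_j) by the resubstitution accuracy of the maximum-likelihood rule that predicts, within each observed (x_i, x_j) cell, the most frequent class, and then picks for each X_i the partner X_j with the highest score. The subsequent forward greedy tree search is not shown, and `pair_accuracy` and `best_partners` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def pair_accuracy(xi: Sequence[int], xj: Sequence[int], c: Sequence[int]) -> float:
    """Resubstitution accuracy of predicting c from (xi, xj) with the
    maximum-likelihood (most frequent) class in each cell."""
    counts: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
    for a, b, y in zip(xi, xj, c):
        counts[(a, b)][y] += 1
    # The ML prediction in each cell is its modal class, so the number of
    # correct predictions in that cell equals the modal count.
    correct = sum(cnt.most_common(1)[0][1] for cnt in counts.values())
    return correct / len(c)


def best_partners(X: Sequence[Sequence[int]], c: Sequence[int]) -> List[int]:
    """For each attribute column i, the other column j maximising pair_accuracy.
    Ties go to the smallest j."""
    n_attr = len(X[0])
    cols = [[row[k] for row in X] for k in range(n_attr)]
    partners = []
    for i in range(n_attr):
        scores = {j: pair_accuracy(cols[i], cols[j], c)
                  for j in range(n_attr) if j != i}
        partners.append(max(scores, key=scores.get))
    return partners
```

For example, with rows `[[0, 0, 0], [0, 1, 0], [1, 0, 1], [1, 1, 1]]` (so X2 duplicates X0) and class `[0, 1, 1, 0]` (X0 XOR X1), `best_partners` returns `[1, 0, 1]`: neither attribute alone predicts an XOR class, but the right pair does perfectly.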
  5. The data classification method based on a dynamic Bayesian network classifier according to claim 3, wherein the first classification accuracy is determined by the following formula:
    Figure PCTCN2022116055-appb-100001
    where
    Figure PCTCN2022116055-appb-100002
    accuracy(D[n,T], T_0) is the first classification accuracy, D[n,T] is the time series sample data set, n is the number of sample attribute variables, T is the total period, T_0 is the test threshold, c_prediction[t] is the first predicted class variable of the sample attribute variable at time point t, and c_true[t] is the true class variable of the sample attribute variable at time point t.
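The formula itself survives only as an image. Reading the symbol definitions, one natural interpretation, which is an assumption and not a transcription, is the fraction of test time points t = T_0+1, ..., T at which c_prediction[t] equals c_true[t], with time points 1, ..., T_0 reserved for training. A minimal sketch of that reading (the name `first_accuracy` is hypothetical):

```python
from typing import Sequence


def first_accuracy(c_prediction: Sequence[int], c_true: Sequence[int], t0: int) -> float:
    """Share of time points t0+1..T (1-based) where prediction equals truth.

    Assumed reading: time points 1..t0 train the model and t0+1..T test it;
    the patent's formula image may define the split differently.
    """
    T = len(c_true)
    if len(c_prediction) != T:
        raise ValueError("prediction and truth must have the same length")
    if not 0 <= t0 < T:
        raise ValueError("need 0 <= t0 < T")
    # 1-based time points t0+1..T are 0-based indices t0..T-1.
    hits = sum(1 for t in range(t0, T) if c_prediction[t] == c_true[t])
    return hits / (T - t0)
```

For example, `first_accuracy([1, 0, 1, 1, 0], [1, 1, 1, 0, 0], 2)` compares only the last three time points, matches two of them, and returns 2/3.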
  6. The data classification method based on a dynamic Bayesian network classifier according to claim 3, wherein determining the weight coefficient of the Bayesian network classifier from the first classification accuracy, the second classification accuracy, the initial attribute tree and the optimal attribute tree, to obtain the optimal classifier, specifically comprises:
    estimating the probability distribution of the initial attribute tree and the probability distribution of the optimal attribute tree;
    determining the weight coefficient of the Bayesian network classifier from the probability distribution of the initial attribute tree, the probability distribution of the optimal attribute tree, the first classification accuracy and the second classification accuracy.
  7. The data classification method based on a dynamic Bayesian network classifier according to claim 6, wherein the weight coefficient of the Bayesian network classifier is determined by the following formula:
    Figure PCTCN2022116055-appb-100003
    where p_new is the weight coefficient of the Bayesian network classifier, α is the first classification accuracy, β is the second classification accuracy, p_before is the probability distribution of the initial attribute tree, and p_after is the probability distribution of the optimal attribute tree.
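The formula for p_new is available only as an image. Purely as an illustration, and not as a transcription of that image, one common way to merge two distributions using their accuracies is an accuracy-weighted average, p_new = (α·p_before + β·p_after) / (α + β). The sketch below implements that assumed form; the name `combine_distributions` is hypothetical.

```python
from typing import List, Sequence


def combine_distributions(p_before: Sequence[float], p_after: Sequence[float],
                          alpha: float, beta: float) -> List[float]:
    """Accuracy-weighted average of two distributions (assumed form)."""
    if len(p_before) != len(p_after):
        raise ValueError("distributions must cover the same outcomes")
    if alpha < 0 or beta < 0 or alpha + beta == 0:
        raise ValueError("need non-negative accuracies with a positive sum")
    total = alpha + beta
    # Each tree's distribution contributes in proportion to its accuracy.
    return [(alpha * a + beta * b) / total for a, b in zip(p_before, p_after)]
```

Because the weights α/(α+β) and β/(α+β) are non-negative and sum to 1, the result is a convex combination and remains a valid probability distribution. For example, `combine_distributions([0.2, 0.8], [0.6, 0.4], 1.0, 3.0)` gives approximately `[0.5, 0.5]`.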
Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/116055 WO2024045005A1 (en) 2022-08-31 2022-08-31 Data classification method based on dynamic bayesian network classifier

Publications (1)

Publication Number Publication Date
WO2024045005A1 (en) 2024-03-07

Family

ID=90099871

Country Status (1)

Country Link
WO (1) WO2024045005A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140085475A1 (en) * 2011-05-19 2014-03-27 The Regents Of The University Of California Dynamic bayesian networks for vehicle classification in video
CN106021524A (en) * 2016-05-24 2016-10-12 成都希盟泰克科技发展有限公司 Working method for tree-augmented Navie Bayes classifier used for large data mining based on second-order dependence
CN110568286A (en) * 2019-09-12 2019-12-13 齐鲁工业大学 Transformer fault diagnosis method and system based on weighted double-hidden naive Bayes
CN114186639A (en) * 2021-12-13 2022-03-15 国网宁夏电力有限公司营销服务中心(国网宁夏电力有限公司计量中心) Electrical accident classification method based on dual-weighted naive Bayes

Non-Patent Citations (1)

Title
WANG SHUANG-CHENG, RUI GAO: "Learning and optimization of dynamic naive Bayesian classifiers for small time series", CONTROL AND DECISION., vol. 32, no. 1, 1 June 2017 (2017-06-01), pages 163 - 166, XP093145122, DOI: 10.13195/j.kzyjc.2015.1556 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22956837

Country of ref document: EP

Kind code of ref document: A1