CN102521325B - XML (Extensible Markup Language) structural similarity measuring method based on frequent associated tag sequences - Google Patents


Info

Publication number
CN102521325B
Authority
CN
China
Prior art keywords
sequence
tsdb
document
label
frequent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 201110398187
Other languages
Chinese (zh)
Other versions
CN102521325A (en)
Inventor
张利军
李战怀
陈群
李霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Di'an Technology Co ltd
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN 201110398187
Publication of CN102521325A
Application granted
Publication of CN102521325B
Current legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an XML (Extensible Markup Language) structural similarity measuring method based on frequent associated tag sequences. The method comprises the following steps: parsing an XML document set C to obtain a tag sequence database (TSDB); mining the set of all frequent tag sequences (FTS) from the TSDB; selecting the set of maximal frequent tag sequences (MFTS) from the FTS; transforming the TSDB into a new database TSDB'; mining the set of closed frequent associated tag sequences from TSDB'; and representing each document in TSDB' as the set of closed frequent associated tag sequences it contains, then calculating the structural similarity between any two documents in the document set C. The method can improve the accuracy of clustering results.

Description

XML structural similarity measuring method based on frequent associated tag sequences
Technical field
The invention belongs to the technical field of data management and relates to a method for measuring the structural similarity of XML documents, in particular to a method that measures this similarity using, as features, frequent associated tag sequences mined from the XML document set.
Background art
XML is the de facto standard for representing and exchanging data on the Internet and is widely used. As the number of XML documents keeps growing, storing, filtering, retrieving and managing XML data effectively is becoming increasingly important in the database and information retrieval fields. Many operations on XML need to measure the similarity between XML documents, so XML document similarity measurement has become a basic problem of many XML processing techniques and is applied in several areas, such as semi-structured data integration, classification and clustering of XML documents, and XML retrieval.
Unlike a traditional text document, an XML document contains a hierarchical structure in addition to its content. How to use this structural information to calculate the structural similarity between XML documents is a key issue in XML similarity calculation. Researchers have proposed many different methods for this problem. Some of them are path-based: they represent the structure of an XML document as a set of paths and then calculate the structural similarity between documents with set or vector operations. For example, consider the three XML documents in Fig. 1. In the bag of paths model (called the BOTP model in this specification) proposed in document [1] "Joshi, S., Agrawal, N., Krishnapuram, R., Negi, S.: A Bag of Paths Model for Measuring Structural Similarity in Web Documents. In: Proceedings of the 9th International Conference on Knowledge Discovery and Data Mining (SIGKDD). (2003) 577-582.", the structure of a document is represented as a set of paths, where a path is the sequence of nodes from the root node to a leaf node in the corresponding DOM tree. The three documents in Fig. 1 are represented under this model as listed in the BOTP column of Table 1. As can be seen, the paths "a/b/c" and "a/b", and likewise "a/e/f/g" and "a/h/f/g", in doc1 and doc2 are treated as completely different paths. In fact, both pairs of paths match partially and are similar to a large extent. In addition, although the bag of paths model preserves the parent-child relationships between nodes, it ignores their sibling relationships and treats paths as mutually independent and unrelated. For example, the paths "a/b/c" and "a/d" in doc1 and doc3 are considered independent, whereas in fact they are siblings and frequently occur together in the same document. Document [1] also proposes another bag of paths model based on XPath (called the BOXP model). This model captures the sibling relationships between some nodes, but not all of them.
Document [2] "Leung, H.P., Chung, F.L., Chan, S.C., Luk, R.: XML Document Clustering Using Common XPath. In: Proceedings of the International Workshop on Challenges in Web Information Retrieval and Integration. (2005) 91-96." mines frequent XPaths, called commonXPath, from the document set and then represents each XML document as a vector of commonXPath entries. For example, with a minimum support of 60%, the three documents in Fig. 1 are represented under this model as in the third column of Table 1. Although the path "a/e/f/g" in doc1 and the path "a/h/f/g" in doc2 are considered similar through the commonXPath "a/*/f/g", the path "a/f/g" in doc3 is still considered dissimilar. In fact, the three paths "a/e/f/g", "a/h/f/g" and "a/f/g" are all very similar. In addition, document [2] likewise treats paths as independent when calculating similarity with vectors. For example, all three documents contain the paths "a/b" and "a/d", which are siblings, but document [2] ignores this relationship.
Document [3] "Rafiei, D., Moise, D.L., Sun, D.: Finding Syntactic Similarities Between XML Documents. In: Proceedings of the 17th International Conference on Database and Expert Systems Applications (DEXA). (2006) 512-516." uses as features not only the full paths from the root node to the leaf nodes but also the subpaths of those full paths. The three documents in Fig. 1 are represented under this model as in the fourth column of Table 1. This method still ignores the sibling relationships between nodes when calculating similarity.
Table 1. Different path representations of the XML documents
In summary, existing path-based methods for calculating the structural similarity of XML documents have the following two problems:
1. They cannot handle partial matches between paths well. In the example above, the similarity among "a/e/f/g", "a/h/f/g" and "a/f/g" is not handled well.
2. Although they capture the parent-child or ancestor-descendant relationships between nodes, they partly or entirely ignore the sibling relationships between nodes. In the example above, the paths "a/b" and "a/d" are considered mutually independent.
Because these methods do not make full use of the information contained in XML documents, the structural similarity they calculate is not accurate enough, and some accuracy is lost when they are applied to XML document clustering or classification.
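The first problem can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a path-based model compares documents by the Jaccard overlap of their path sets; the cited models may compute similarity differently. The two path sets contain only the paths of doc1 and doc2 quoted in the text above, not the full documents of Fig. 1.

```python
def jaccard(x, y):
    """Set overlap used here as a stand-in for a path-based similarity (assumption)."""
    return len(x & y) / len(x | y)

# Only the paths quoted in the background discussion (illustrative subset).
doc1_paths = {"a/b/c", "a/e/f/g"}
doc2_paths = {"a/b", "a/h/f/g"}

# The paths match partially, yet as whole strings they share nothing.
score = jaccard(doc1_paths, doc2_paths)
print(score)  # 0.0
```

Under this assumption the two documents score zero even though each path of doc1 partially matches a path of doc2.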
Summary of the invention
To overcome the deficiencies of existing path-based methods for measuring the structural similarity of XML documents, the invention introduces the concepts of association mining and sequential pattern mining and proposes a document structural similarity calculation method based on frequent associated tag sequences. The method overcomes the deficiencies of existing path-based methods, and the similarity it calculates is more accurate.
The problem to be solved by the invention is: given an XML document set C, calculate the structural similarity between any two documents $d_i$ and $d_j$ in it.
The technical solution adopted by the invention comprises the following steps:
1. Preprocessing. Parse all XML documents in the XML document set C and model the structure of each XML document as an ordered tag tree. Each node in the tree represents an element of the document and is labeled with the element name, called a tag. The set of all tags extracted from all documents is called the tag set. The structure of each XML document is represented as a set of tag sequences, which yields the tag sequence database TSDB.
A tag sequence is an ordered list of tags from the tag set. The order of the tags is the order in which they are traversed along a path from the root node to a leaf node in the tag tree of the XML document. A tag sequence α can be written formally as $\langle a_1, a_2, \ldots, a_n \rangle$, where each $a_i$ is a tag in the tag set. The number of tags in a tag sequence is called its length, and a tag sequence of length l is called an l-tag sequence.
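The preprocessing step could be sketched as follows. This is a minimal illustration using Python's standard `xml.etree.ElementTree` parser; the function name `tag_sequences`, the sample document and the dictionary layout of `tsdb` are assumptions made for the example, not part of the patent.

```python
import xml.etree.ElementTree as ET

def tag_sequences(xml_text):
    """Return the set of root-to-leaf tag sequences of one XML document."""
    root = ET.fromstring(xml_text)
    sequences = set()  # a set, so each distinct path is stored once

    def walk(node, prefix):
        path = prefix + (node.tag,)
        children = list(node)
        if not children:           # leaf element: the path is complete
            sequences.add(path)
        for child in children:     # children are visited in document order
            walk(child, path)

    walk(root, ())
    return sequences

# Hypothetical sample document; the TSDB maps document ids to tag sequence sets.
doc = "<a><b><c/></b><d/><e><f><g/></f></e></a>"
tsdb = {"doc1": tag_sequences(doc)}
```

Here `tsdb["doc1"]` holds the three sequences `("a", "b", "c")`, `("a", "d")` and `("a", "e", "f", "g")`.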
2. Mining frequent tag sequences. Use a frequent sequential pattern mining algorithm to mine the set FTS of all frequent tag sequences from the TSDB.
Given a minimum support threshold δ (0 < δ ≤ 1), a tag sequence α is called a frequent tag sequence in the TSDB if its support in the TSDB is greater than or equal to δ.
The support of a tag sequence α in the TSDB, written support(α), is the ratio of the number of documents in the TSDB that support α to the total number of documents in the TSDB.
A document supports α if the document contains a tag sequence β such that β contains α.
A tag sequence β: $\langle b_1, b_2, \ldots, b_n \rangle$ contains a tag sequence α: $\langle a_1, a_2, \ldots, a_m \rangle$ if there exist integers $i_1 < i_2 < \ldots < i_m$ such that $a_1 = b_{i_1}, a_2 = b_{i_2}, \ldots, a_m = b_{i_m}$. In that case α is also called a sub tag sequence of β, and β a super tag sequence of α.
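The containment and support definitions can be made concrete with a short Python sketch. The helper names `contains` and `support` and the toy database are hypothetical; the database is loosely based on the paths quoted in the background section and does not reproduce Fig. 1.

```python
def contains(beta, alpha):
    """True if alpha is a sub tag sequence of beta (same order, gaps allowed)."""
    remaining = iter(beta)
    # `tag in remaining` advances the iterator, so matches must occur in order.
    return all(tag in remaining for tag in alpha)

def support(alpha, tsdb):
    """Fraction of documents that have at least one tag sequence containing alpha."""
    supporting = sum(
        1 for seqs in tsdb.values() if any(contains(beta, alpha) for beta in seqs)
    )
    return supporting / len(tsdb)

# Hypothetical toy TSDB: document id -> set of tag sequences.
tsdb = {
    "doc1": {("a", "b", "c"), ("a", "d"), ("a", "e", "f", "g")},
    "doc2": {("a", "b"), ("a", "d"), ("a", "h", "f", "g")},
    "doc3": {("a", "b"), ("a", "d"), ("a", "f", "g")},
}
```

With this data, `("a", "f", "g")` is supported by all three documents because it is a sub tag sequence of both `("a", "e", "f", "g")` and `("a", "h", "f", "g")`, which is the partial match that whole-path comparison misses.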
3. Maximization. Select the maximal frequent tag sequences from FTS to obtain the maximal frequent tag sequence set MFTS.
A frequent tag sequence α is a maximal frequent tag sequence if none of its super tag sequences in the TSDB is also frequent.
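A straightforward way to perform the maximization is to discard every frequent tag sequence that is contained in another frequent tag sequence. The sketch below uses a quadratic pairwise comparison for clarity; it is an illustration only, and the names `contains`, `maximal` and the sample FTS are assumptions.

```python
def contains(beta, alpha):
    """True if alpha is a sub tag sequence of beta (same order, gaps allowed)."""
    remaining = iter(beta)
    return all(tag in remaining for tag in alpha)

def maximal(fts):
    """Keep the frequent tag sequences that have no frequent super tag sequence."""
    return {
        alpha
        for alpha in fts
        if not any(alpha != beta and contains(beta, alpha) for beta in fts)
    }

# Hypothetical FTS for illustration.
fts = {("a",), ("b",), ("a", "b"), ("f", "g"), ("a", "g"), ("a", "f", "g")}
mfts = maximal(fts)
```

For this sample, only `("a", "b")` and `("a", "f", "g")` remain, since every other sequence is a sub tag sequence of one of them.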
4. Database transformation. For each tag sequence α in each document of the TSDB, if MFTS contains a sub tag sequence of α, replace α with that sub tag sequence; otherwise delete α. Once all tag sequences have been processed, the new database TSDB' is obtained.
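The transformation could look like the following sketch. The text does not say what happens when several maximal frequent tag sequences are sub tag sequences of the same α; the sketch assumes that all of them are kept. The function names and sample data are hypothetical.

```python
def contains(beta, alpha):
    """True if alpha is a sub tag sequence of beta (same order, gaps allowed)."""
    remaining = iter(beta)
    return all(tag in remaining for tag in alpha)

def transform(tsdb, mfts):
    """Rewrite each document as the maximal frequent tag sequences its paths contain."""
    new_db = {}
    for doc_id, seqs in tsdb.items():
        kept = set()
        for alpha in seqs:
            # Assumption: every matching maximal sequence replaces alpha.
            # If none matches, nothing is added, which deletes alpha.
            kept.update(m for m in mfts if contains(alpha, m))
        new_db[doc_id] = kept
    return new_db

tsdb = {
    "doc1": {("a", "b", "c"), ("a", "e", "f", "g")},
    "doc2": {("a", "b"), ("a", "x")},
}
mfts = {("a", "b"), ("a", "f", "g")}
tsdb_prime = transform(tsdb, mfts)
```

In this example `("a", "b", "c")` becomes `("a", "b")`, `("a", "e", "f", "g")` becomes `("a", "f", "g")`, and `("a", "x")` is dropped because no maximal frequent tag sequence is contained in it.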
5. Mining closed frequent associated tag sequences. Use a closed frequent itemset mining algorithm to mine from TSDB' the set FATS of all closed frequent associated tag sequences.
An associated tag sequence is a set of tag sequences in which, for any tag sequence α in the set, there is no other tag sequence β in the set such that β contains α or α contains β.
Given a minimum support threshold δ (0 < δ ≤ 1), an associated tag sequence γ is called a frequent associated tag sequence in TSDB' if its support in TSDB' is greater than or equal to δ.
The support of an associated tag sequence γ in TSDB', written support(γ), is the ratio of the number of documents in TSDB' that support γ to the total number of documents in TSDB'.
A document supports an associated tag sequence γ if it supports every tag sequence α in γ.
A closed frequent associated tag sequence γ is one that is frequent in TSDB' and has no proper superset η with the same support in TSDB'.
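For very small inputs, the closed frequent associated tag sequences can be found by brute-force enumeration, as in the sketch below. It is not an efficient closed itemset miner and is exponential in the number of distinct items. It also assumes that every item in TSDB' is a maximal frequent tag sequence, so that no item contains another and a document supports γ exactly when γ is a subset of the document's items. All names are hypothetical.

```python
from itertools import combinations

def closed_frequent_itemsets(db, delta):
    """Map each closed frequent set of tag sequences to its support (brute force)."""
    docs = list(db.values())
    items = sorted(set().union(*docs))
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            gamma = frozenset(combo)
            sup = sum(gamma <= doc for doc in docs) / len(docs)
            if sup >= delta:
                frequent[gamma] = sup
    # Closed: no proper superset has the same support. Such a superset would
    # itself be frequent, so searching among the frequent sets is sufficient.
    return {
        gamma: sup
        for gamma, sup in frequent.items()
        if not any(gamma < eta and sup == frequent[eta] for eta in frequent)
    }

P, Q = ("a", "b"), ("a", "f", "g")
tsdb_prime = {"d1": {P, Q}, "d2": {P, Q}, "d3": {P}}
fats = closed_frequent_itemsets(tsdb_prime, delta=0.3)
```

In this example `{Q}` is frequent but not closed, because its superset `{P, Q}` has the same support of 2/3; `{P}` and `{P, Q}` are closed.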
6. Document representation. Each document $d_i$ in TSDB' is represented as the set of closed frequent associated tag sequences that it contains, that is:
$d_i = \{\, fats \mid fats \in FATS \wedge d_i \text{ supports } fats \,\}$
$d_j = \{\, fats \mid fats \in FATS \wedge d_j \text{ supports } fats \,\}$
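The representation step amounts to filtering FATS by document. A minimal sketch follows, assuming each document in TSDB' is stored as a set of maximal frequent tag sequences and that "supports" reduces to a subset test; the name `represent` and the sample data are hypothetical.

```python
def represent(doc_items, fats):
    """Return the closed frequent associated tag sequences supported by a document."""
    return {gamma for gamma in fats if gamma <= doc_items}

P, Q = ("a", "b"), ("a", "f", "g")
fats = {frozenset({P}), frozenset({P, Q})}

d1 = represent({P, Q}, fats)   # supports both {P} and {P, Q}
d3 = represent({P}, fats)      # supports only {P}
```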
7. Structural similarity calculation. The structural similarity $\mathrm{sim}(d_i, d_j)$ between any two documents $d_i$ and $d_j$ in the document set C is calculated with the following formula:
$$\mathrm{sim}(d_i, d_j) = \frac{|d_i \cap d_j| + |p_j^i| + |p_i^j|}{|d_i \cup d_j|}$$
where:
Figure BDA0000115665130000042
Figure BDA0000115665130000043
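The formula can be evaluated as in the sketch below. Because the definitions of $p_j^i$ and $p_i^j$ are given only in the formula images, the sketch does not compute them; it takes them as arguments (`p_ji` and `p_ij`, hypothetical names) and defaults them to empty sets, in which case the value reduces to the Jaccard coefficient of the two representations. Returning 0.0 for two empty representations is also an assumption.

```python
def similarity(d_i, d_j, p_ji=frozenset(), p_ij=frozenset()):
    """Evaluate (|d_i ∩ d_j| + |p_ji| + |p_ij|) / |d_i ∪ d_j|.

    p_ji and p_ij must be supplied by the caller according to the patent's
    definitions, which are not reproduced in this sketch.
    """
    union = d_i | d_j
    if not union:
        return 0.0  # assumption: two empty representations are treated as dissimilar
    return (len(d_i & d_j) + len(p_ji) + len(p_ij)) / len(union)

P, Q = ("a", "b"), ("a", "f", "g")
d1 = {frozenset({P}), frozenset({P, Q})}
d3 = {frozenset({P})}
score = similarity(d1, d3)  # one shared element out of two in the union
```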
The beneficial effects of the invention are as follows. The invention adopts the concepts of sequential patterns and association patterns: it treats an XML document as a set of tag sequences, mines closed frequent associated tag sequences from them, and uses these as document features to calculate the structural similarity between XML documents. Introducing sequential patterns overcomes the inability of existing path-based methods to handle partial matches between paths, and introducing association patterns compensates, to a certain extent, for their neglect of sibling relationships between XML document elements. The similarity between documents calculated with the method of the invention is therefore more accurate. Experimental results on real data sets show that, when applied to XML document clustering, the method improves the accuracy of the clustering result compared with other path-based structural similarity calculation methods.
The invention is further described below with reference to the accompanying drawings.
Description of drawings
Fig. 1 shows sample XML document structure trees;
Fig. 2 is a flowchart of the XML document structural similarity measuring method;
Fig. 3 is a flowchart of XML document preprocessing;
Fig. 4 is a flowchart of maximizing the frequent tag sequences;
Fig. 5 is a flowchart of transforming the tag sequence database;
Fig. 6 shows the precision of the clustering results.
Embodiment
For a given XML document set C, the specific procedure by which the invention calculates the similarity between any two documents is shown in Fig. 2 and comprises the following steps:
1. Preprocess the document set to obtain the tag sequence database TSDB. The processing flow is shown in Fig. 3. During parsing, identical paths within the same XML document appear only once in the TSDB. In the figure, d_TS denotes the set of tag sequences contained in document d, and d.id denotes the identifier of document d.
A tag sequence is an ordered list of tags from the tag set. The order of the tags is the order in which they are traversed along a path from the root node to a leaf node in the tag tree of the XML document. A tag sequence α can be written formally as $\langle a_1, a_2, \ldots, a_n \rangle$, where each $a_i$ is a tag in the tag set. The number of tags in a tag sequence is called its length, and a tag sequence of length l is called an l-tag sequence.
2. Given a minimum support threshold δ, mine the frequent tag sequence set FTS from the TSDB. Several frequent sequential pattern mining algorithms can be used to mine frequent tag sequences; our implementation uses the PrefixSpan algorithm, which is described in detail in document [4] "Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.C.: PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In: Proceedings of the 17th International Conference on Data Engineering (ICDE). (2001) 215-224.".
Given a minimum support threshold δ (0 < δ ≤ 1), a tag sequence α is called a frequent tag sequence in the TSDB if its support in the TSDB is greater than or equal to δ.
The support of a tag sequence α in the TSDB, written support(α), is the ratio of the number of documents in the TSDB that support α to the total number of documents in the TSDB.
A document supports α if the document contains a tag sequence β such that β contains α.
A tag sequence β: $\langle b_1, b_2, \ldots, b_n \rangle$ contains a tag sequence α: $\langle a_1, a_2, \ldots, a_m \rangle$ if there exist integers $i_1 < i_2 < \ldots < i_m$ such that $a_1 = b_{i_1}, a_2 = b_{i_2}, \ldots, a_m = b_{i_m}$.
This is denoted as
Figure BDA0000115665130000064
In that case α is also called a sub tag sequence of β, and β a super tag sequence of α.
In our implementation, the minimum support threshold δ is set to 0.3.
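To show what the mining step produces, the sketch below enumerates frequent tag sequences naively with δ = 0.3. It is not PrefixSpan: it generates every subsequence of every path, which is exponential in path length and only suitable for toy inputs. The function names and the four-document database are assumptions for illustration.

```python
from itertools import combinations

def subsequences(seq):
    """Yield every non-empty sub tag sequence of seq, preserving order."""
    for size in range(1, len(seq) + 1):
        for idx in combinations(range(len(seq)), size):
            yield tuple(seq[i] for i in idx)

def frequent_tag_sequences(tsdb, delta=0.3):
    """Map each tag sequence with support >= delta to its support (naive search)."""
    counts = {}
    for seqs in tsdb.values():
        seen = set()                 # count each sequence once per document
        for beta in seqs:
            seen.update(subsequences(beta))
        for alpha in seen:
            counts[alpha] = counts.get(alpha, 0) + 1
    n = len(tsdb)
    return {alpha: c / n for alpha, c in counts.items() if c / n >= delta}

# Hypothetical toy TSDB with four documents.
tsdb = {
    "doc1": {("a", "b", "c")},
    "doc2": {("a", "b")},
    "doc3": {("a", "d")},
    "doc4": {("a", "d")},
}
fts = frequent_tag_sequences(tsdb, delta=0.3)
```

With four documents and δ = 0.3, a sequence needs at least two supporting documents, so `("a", "b")` is frequent while `("a", "b", "c")` and `("c",)` are not.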
3. Maximize the frequent tag sequence set FTS obtained in the previous step to obtain the maximal frequent tag sequence set MFTS. The maximization flow is shown in Fig. 4, where m denotes the length of the longest frequent tag sequence in FTS.
A frequent tag sequence α is a maximal frequent tag sequence if none of its super tag sequences in the TSDB is also frequent.
4. Transform the tag sequence database TSDB into the database TSDB', which is represented with maximal frequent tag sequences. The transformation flow is shown in Fig. 5, where d_FTS denotes the set of frequent tag sequences contained in document d.
5. Mine the closed frequent associated tag sequence set FATS from TSDB'. Several closed frequent itemset mining algorithms can be used to mine closed frequent associated tag sequences; our implementation uses the CLOSET+ algorithm, which is described in detail in document [5].
An associated tag sequence is a set of tag sequences in which, for any tag sequence α in the set, there is no other tag sequence β in the set such that β contains α or α contains β.
Given a minimum support threshold δ (0 < δ ≤ 1), an associated tag sequence γ is called a frequent associated tag sequence in TSDB' if its support in TSDB' is greater than or equal to δ.
The support of an associated tag sequence γ in TSDB', written support(γ), is the ratio of the number of documents in TSDB' that support γ to the total number of documents in TSDB'.
A document supports an associated tag sequence γ if it supports every tag sequence α in γ.
A closed frequent associated tag sequence γ is one that is frequent in TSDB' and has no proper superset η with the same support in TSDB'.
Here too, our implementation sets the minimum support threshold δ to 0.3.
Once the closed frequent associated tag sequence set FATS has been obtained through the preceding steps, the structural similarity between any two documents can be calculated according to steps 6 and 7 of the summary of the invention above,
where $d_i = \{\, fats \mid fats \in FATS \wedge d_i \text{ supports } fats \,\}$,
$d_j = \{\, fats \mid fats \in FATS \wedge d_j \text{ supports } fats \,\}$,
$$\mathrm{sim}(d_i, d_j) = \frac{|d_i \cap d_j| + |p_j^i| + |p_i^j|}{|d_i \cup d_j|},$$
Figure BDA0000115665130000072
Figure BDA0000115665130000073
To verify that the method of the invention effectively improves on the accuracy of path-based similarity calculation methods, we compared it experimentally with several other path-based similarity calculation methods (BOTP, BOXP, commonXPath, subPath). The experiments use two real data sets. One comes from reference [6] "Kurt, A., Tozal, E.: Classification of XSLT-Generated Web Documents With Support Vector Machines. In: Proceedings of the 1st International Workshop on Knowledge Discovery from XML Documents. (2006) 33-42." and is called the Texas data set. The other comes from the XML version of ACM Sigmod Record (see "Sigmod Record in XML, http://www.sigmod.org/publications/sigmodrecord/xml-edition.") and is called the Sigmod data set. The experiments mainly compare the precision of the results when these methods are applied to XML document clustering. The comparison is shown in Fig. 6: on both data sets, the precision of the method of the invention is higher, to varying degrees, than that of the other methods.

Claims (1)

1. An XML structural similarity measuring method based on frequent associated tag sequences, characterized by comprising the following steps:
1) Preprocessing: parse all XML documents in the XML document set C and model the structure of each XML document as an ordered tag tree, in which each node represents an element of the document and is labeled with the element name, called a tag; the set of all tags extracted from all documents is called the tag set; the structure of each XML document is represented as a set of tag sequences, which yields the tag sequence database TSDB;
A tag sequence is an ordered list of tags from the tag set, the order of the tags being the order in which they are traversed along a path from the root node to a leaf node in the tag tree of the XML document; a tag sequence α can be written formally as $\langle a_1, a_2, \ldots, a_n \rangle$, where each $a_i$ is a tag in the tag set; the number of tags in a tag sequence is called its length, and a tag sequence of length l is called an l-tag sequence;
2) Mining frequent tag sequences: use a frequent sequential pattern mining algorithm to mine the set FTS of all frequent tag sequences from the TSDB;
Given a minimum support threshold δ, 0 < δ ≤ 1, a tag sequence α is called a frequent tag sequence in the TSDB if its support in the TSDB is greater than or equal to δ;
The support of a tag sequence α in the TSDB, written support(α), is the ratio of the number of documents in the TSDB that support α to the total number of documents in the TSDB;
A document supports α if the document contains a tag sequence β such that β contains α;
A tag sequence β: $\langle b_1, b_2, \ldots, b_n \rangle$ contains a tag sequence α: $\langle a_1, a_2, \ldots, a_m \rangle$ if there exist integers $i_1 < i_2 < \ldots < i_m$ such that $a_1 = b_{i_1}, a_2 = b_{i_2}, \ldots, a_m = b_{i_m}$;
This is denoted as
Figure FDA00002806645300012
In that case α is also called a sub tag sequence of β, and β a super tag sequence of α;
3) Maximization: select the maximal frequent tag sequences from FTS to obtain the maximal frequent tag sequence set MFTS;
A frequent tag sequence α is a maximal frequent tag sequence if none of its super tag sequences in the TSDB is also frequent;
4) Database transformation: for each tag sequence α in each document of the TSDB, if MFTS contains a sub tag sequence of α, replace α with that sub tag sequence; otherwise delete α; once all tag sequences have been processed, the new database TSDB' is obtained;
5) Mining closed frequent associated tag sequences: use a closed frequent itemset mining algorithm to mine from TSDB' the set FATS of all closed frequent associated tag sequences;
An associated tag sequence is a set of tag sequences in which, for any tag sequence α in the set, there is no other tag sequence β in the set such that β contains α or α contains β;
Given a minimum support threshold δ, 0 < δ ≤ 1, an associated tag sequence γ is called a frequent associated tag sequence in TSDB' if its support in TSDB' is greater than or equal to δ;
The support of an associated tag sequence γ in TSDB', written support(γ), is the ratio of the number of documents in TSDB' that support γ to the total number of documents in TSDB';
A document supports an associated tag sequence γ if it supports every tag sequence α in γ;
A closed frequent associated tag sequence γ is one that is frequent in TSDB' and has no proper superset η with the same support in TSDB';
6) Document representation: each document $d_i$ in TSDB' is represented as the set of closed frequent associated tag sequences that it contains, i.e. $d_i = \{\, fats \mid fats \in FATS \wedge d_i \text{ supports } fats \,\}$;
7) Structural similarity calculation: using the formula
$$\mathrm{sim}(d_i, d_j) = \frac{|d_i \cap d_j| + |p_j^i| + |p_i^j|}{|d_i \cup d_j|}$$
calculate the structural similarity $\mathrm{sim}(d_i, d_j)$ between any two documents $d_i$ and $d_j$ in the document set C,
where:
Figure FDA00002806645300022
CN 201110398187 2011-12-02 2011-12-02 XML (Extensible Markup Language) structural similarity measuring method based on frequent associated tag sequences Active CN102521325B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110398187 CN102521325B (en) 2011-12-02 2011-12-02 XML (Extensible Markup Language) structural similarity measuring method based on frequent associated tag sequences

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110398187 CN102521325B (en) 2011-12-02 2011-12-02 XML (Extensible Markup Language) structural similarity measuring method based on frequent associated tag sequences

Publications (2)

Publication Number Publication Date
CN102521325A CN102521325A (en) 2012-06-27
CN102521325B true CN102521325B (en) 2013-04-24

Family

ID=46292242

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110398187 Active CN102521325B (en) 2011-12-02 2011-12-02 XML (Extensible Markup Language) structural similarity measuring method based on frequent associated tag sequences

Country Status (1)

Country Link
CN (1) CN102521325B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103778104B (en) * 2012-10-22 2017-05-03 富士通株式会社 Information processing device, information processing method and electronic device
CN104036051B (en) * 2014-07-04 2017-04-05 南开大学 A kind of database schema abstraction generating method propagated based on label
CN104750609B (en) * 2015-03-26 2018-01-19 广东欧珀移动通信有限公司 Determine the method and device of interface layout compatibility
CN110297946A (en) * 2019-07-17 2019-10-01 哈尔滨工业大学 A kind of uncertain XML data storage method of magnanimity
CN110543467B (en) * 2019-08-14 2020-06-23 清华大学 Mode conversion method and device for time series database
CN110598869B (en) * 2019-08-27 2024-01-19 创新先进技术有限公司 Classification method and device based on sequence model and electronic equipment
CN114039744B (en) * 2021-09-29 2024-02-27 中孚信息股份有限公司 Abnormal behavior prediction method and system based on user feature labels

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8688673B2 (en) * 2005-09-27 2014-04-01 Sarkar Pte Ltd System for communication and collaboration
CN101127037A (en) * 2006-08-15 2008-02-20 临安微创网格信息工程有限公司 Periodic associated rule discovery algorithm based on time sequence vector diverse sequence method clustering

Also Published As

Publication number Publication date
CN102521325A (en) 2012-06-27

Similar Documents

Publication Publication Date Title
CN102521325B (en) XML (Extensible Markup Language) structural similarity measuring method based on frequent associated tag sequences
Liu et al. MMKG: multi-modal knowledge graphs
Day et al. Reference metadata extraction using a hierarchical knowledge representation framework
Tu et al. Indices of novelty for emerging topic detection
CN104462582B (en) A kind of web data similarity detection method based on structure and content secondary filtration
Nayak et al. XML schema clustering with semantic and hierarchical similarity measures
CN101727498A (en) Automatic extraction method of web page information based on WEB structure
CN103942335A (en) Construction method of uninterrupted crawler system oriented to web page structure change
CN105893382A (en) Priori knowledge based microblog user group division method
Jung Boosting social collaborations based on contextual synchronization: An empirical study
Zhao et al. XML structural delta mining: Issues and challenges
Vadrevu et al. Information extraction from web pages using presentation regularities and domain knowledge
Xu et al. Construction of chinese sports knowledge graph based on neo4j
CN104217025A (en) System and method for extracting record items of multi-record web page
Sun et al. An on-line sequential learning method in social networks for node classification
CN107491524B (en) Method and device for calculating Chinese word relevance based on Wikipedia concept vector
CN103488757A (en) Clustering feature equivalent histogram maintaining method based on cloud computing
Nica et al. Exploring heterogeneous sequential data on river networks with relational concept analysis
Yang et al. Layered graph data model for data management of dataspace support platform
Wang et al. Research on a frequent maximal induced subtrees mining method based on the compression tree sequence
KR20080008573A (en) Method for extracting association rule from xml data
Leung et al. A new sequential mining approach to XML document similarity computation
Shin et al. On-line generation association rules over data streams
Chen et al. Robust and Efficient Annotation based on Ontology Evolution for Deep Web Data.
Paik et al. Fast Extraction of Maximal Frequent Subtrees Using Bits Representation.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220623

Address after: 201100 room 13, floor 2, building 4, No. 728, Guanghua Road, Minhang District, Shanghai

Patentee after: SHANGHAI DI'AN TECHNOLOGY Co.,Ltd.

Address before: 710072 No. 127 Youyi West Road, Shaanxi, Xi'an

Patentee before: Northwestern Polytechnical University
