CN102622432A - Measuring method of similarity between Extensible Markup Language (XML) file structure outlines - Google Patents

Info

Publication number
CN102622432A
Authority
CN
China
Prior art keywords
node
chain
xml
xml document
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012100484439A
Other languages
Chinese (zh)
Other versions
CN102622432B (en)
Inventor
高明霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN 201210048443
Publication of CN102622432A
Application granted
Publication of CN102622432B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data mining, and in particular to a method for measuring the similarity between structural outlines of Extensible Markup Language (XML) documents. Its aim is fast online clustering of XML data streams from a structural point of view, a setting in which algorithms face strict memory and time requirements. A structural outline of an XML document is also provided. The algorithm parses the XML document with SAX, uses a global name-code index table and a running-stack technique to turn the document into an outline data structure that can be represented incrementally, the element chain (NodeList), and then calculates the similarity between two element chains with a user-defined formula. Because the method parses the XML document with SAX and obtains level values from the running stack, building the structural outline consumes little memory. Almost all memory is spent on storing the clustering results, which are kept as element chains, and the global name-code index table.

Description

A method for measuring similarity between XML document structural outlines
Technical field
The present invention relates to the field of data mining, and specifically to a method for measuring the similarity between structural outlines that suits XML document streams or document sets with strict memory and time requirements.
Background
XML is a self-describing language for data exchange and sharing. Since it became a W3C Recommendation in February 1998, it has been widely adopted. Web applications and services that follow this standard produce massive volumes of streaming XML data that arrive over time and change continuously during real-time data transmission and exchange. Examples include the medical records and examination data of medical institutions, the passenger information of airlines, the user requests handled by search services, and the network resources processed by network monitoring. To analyze such data effectively, one possible solution is to cluster similar data according to document structure, content, and semantics.
Existing XML data clustering techniques support only static XML data sets. Their core idea is to treat each XML document as a data point, compute the distance or dissimilarity matrix between the documents to be clustered with a chosen XML document similarity measure, and then complete the clustering with a traditional clustering technique such as k-means or hierarchical clustering. The XML document similarity measure is the key to clustering quality. Existing measures fall roughly into two classes: methods based on tree edit distance and methods based on document feature sets.
Methods based on tree edit distance usually model an XML document as a tree or a graph, and measure the similarity between two XML documents by the edit distance between the two trees or graphs. The basic idea of edit distance is to define the distance between two trees as the minimum cost of converting one tree into the other with edit operations such as deletion, insertion, and pruning. Methods of this class consider only that nodes differ; they do not account for the fact that nodes at different levels affect similarity to different degrees.
Methods based on document feature sets are more direct. They first propose some way to represent the features of an XML document, and then measure the similarity between XML documents by directly computing the distance between these features. The features vary widely: some methods represent XML document features with bitmap indexes, some with the vector space model (VSM), some represent the XML document structure as a time series, and some represent structural features with LevelStructure, a level structure that corresponds to a simplified labeled tree. Methods of this class mainly handle static data and generally need to read and parse a document repeatedly, whereas an XML data stream allows only a single pass, with access and parsing in the order in which the data arrive.
Recent research on XML data has also extended to XML data streams, but existing work on such streams concentrates mostly on direct querying. For example, Christoph Koch and Stefanie Scherzinger introduced XML Stream Attribute Grammars (XSAGs), a language for continuous XML queries, and Koch et al. proposed FluXQuery, an optimizing XQuery engine. Further knowledge mining, such as online classification and online clustering, is seldom addressed.
Summary of the invention
The object of the invention is fast online clustering of XML data streams from a structural point of view. To meet the strict memory and time requirements that such algorithms face, the invention provides a structural outline of an XML document and a method for measuring the similarity between such outlines. After parsing an XML document with SAX, the algorithm uses a global name-code index table and a running-stack technique to convert the document into an outline data structure that can be represented incrementally, the element chain (NodeList), and then calculates the similarity between two element chains with a user-defined formula.
The technique provided by the invention for building the structural outline of an XML document has the following steps:
1) Define a global element name-code index table for the XML document stream (or document set) to be processed, and initialize it as empty. Each node in the table has two parts: a string holding the name of a distinct element contained in the stream (or document set), and an integer holding the code of that element. The coding rule is as follows: when the XML documents are parsed with SAX, the integer is the order in which the element's start event first appears in the stream of start events of all distinct elements (only element start events are recorded).
2) Parse the XML document with SAX and obtain the start event of each element, then look up the global element name-code index table. If the element name is already in the table, the code of the element is the integer stored with that name. If the element name is not in the table, the code of the element is the largest integer already in the table plus one, and the element name and its integer code are inserted into the global element name-code index table as a new node.
3) Obtain the level value of each element with the running-stack technique. The concrete operation is as follows: while the XML document is parsed with SAX, the document start event creates an empty stack; as elements are encountered in the XML document, push and pop operations are performed, that is, an element start event pushes the corresponding element and an element end event pops it. The level value of an element equals the position of the stack pointer for that element; the pointer starts at 0 and increases by one with each push.
4) Use the integer codes of the distinct elements and their level values to build the structural outline of the XML document as a partially ordered element chain that can be represented incrementally.
5) The element chain is indexed by the integer codes of its elements and is combinable, provided that the combined result keeps only one copy of elements that have the same name and the same level. The combination proceeds as follows. Given two element chains a and b, start at the heads of the chains and compare the codes of the first nodes. If the codes are equal, compare the level values of the two nodes: if the level values are also equal, insert the node from a into the result chain; otherwise insert both the node from a and the node from b into the result chain. Then continue with the next nodes of both chains. If the code in a is greater than the code in b, insert the node from b into the result chain and continue by comparing the same node of a with the next node of b. If the code in a is less than the code in b, insert the node from a into the result chain and continue by comparing the same node of b with the next node of a.
6) Compare two partially ordered element chains to obtain their common elements and the corresponding level values. The comparison proceeds as follows. Given two element chains a and b, start at the heads of the chains and move one node at a time. If the element code in a is less than or equal to the element code in b, a moves to its next node; otherwise b moves to its next node. The comparison then continues. Equal element codes and their level values are recorded during the comparison and are used to calculate the similarity between the element chains.
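Steps 1) to 4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `OutlineHandler`, the function name `build_element_chain`, and the representation of an element chain as a sorted list of (code, level) pairs are hypothetical choices made for the example. Only the standard `xml.sax` module is used, and the level (layer) value is taken as the stack position of the element, with the root at 0.

```python
import xml.sax


class OutlineHandler(xml.sax.ContentHandler):
    """Collects distinct (code, level) pairs in a single SAX pass.

    Hypothetical helper: the name and the tuple layout are assumptions.
    """

    def __init__(self, name_to_code):
        super().__init__()
        self.name_to_code = name_to_code  # global name-code table, shared by all documents
        self.stack = []                   # running stack of element codes
        self.nodes = set()                # distinct (code, level) pairs of this document

    def startElement(self, name, attrs):
        code = self.name_to_code.get(name)
        if code is None:
            # Codes are handed out as 1, 2, 3, ..., so the next code is
            # the current maximum plus one.
            code = len(self.name_to_code) + 1
            self.name_to_code[name] = code
        level = len(self.stack)           # stack position after the push; root is 0
        self.stack.append(code)
        self.nodes.add((code, level))

    def endElement(self, name):
        self.stack.pop()


def build_element_chain(xml_text, name_to_code):
    """Returns the element chain of one document, sorted by code and then level."""
    handler = OutlineHandler(name_to_code)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return sorted(handler.nodes)


table = {}
print(build_element_chain("<a><b><c/></b><b/></a>", table))  # [(1, 0), (2, 1), (3, 2)]
```

The table is passed in rather than created inside the function so that later documents in the stream reuse the codes assigned by earlier ones, which is what makes chains from different documents comparable.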
The invention uses the following user-defined weighting formula to calculate the similarity between two element chains (NodeSim):
\[
\mathrm{NodeSim}_{1 \leftrightarrow 2}
= \frac{\mathrm{ComWeight}_1 + \mathrm{ComWeight}_2}{\mathrm{ObjWeight}_1 + \mathrm{ObjWeight}_2}
= \frac{\sum_{i=1}^{M_1} (1/r)^{L_{1i}} + \sum_{j=1}^{M_2} (1/r)^{L_{2j}}}{\sum_{k=1}^{N_1} (1/r)^{L_{1k}} + \sum_{k=1}^{N_2} (1/r)^{L_{2k}}}
\]
Here $\mathrm{ComWeight}_1$ and $\mathrm{ComWeight}_2$ are the summed weights of the common elements contained in the first and second element chains, respectively; $\mathrm{ObjWeight}_1$ and $\mathrm{ObjWeight}_2$ are the summed weights of all elements contained in the first and second element chains, respectively; $N_1$ and $N_2$ are the numbers of elements in the first and second element chains; $M_1$ and $M_2$ are the numbers of common elements in the first and second element chains; $L_{1i}$ is the level of the i-th common element of the first element chain, and $L_{2j}$ is the level of the j-th common element of the second element chain; $L_{1k}$ and $L_{2k}$ are the levels of the k-th element of the first and second element chains, respectively. $r$ is the decay factor of element weight across levels. It is a user-defined parameter with a value greater than 1; according to experimental results, $r$ is commonly set to 2 or 4.
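The weighting formula can be sketched in Python as follows. The sketch assumes that each element chain is a list of (code, level) pairs sorted by code, and it treats an element as common when its code occurs in both chains, weighting it by its level in its own chain. This is one reading of the definitions above, and the function names are hypothetical. Unlike the literal comparison rule, the scan consumes every node with a matching code in both chains, so that a code stored at several levels is counted on each side.

```python
def weight(level, r=2.0):
    """Weight of an element at the given level: (1/r) ** level."""
    return (1.0 / r) ** level


def common_nodes(a, b):
    """Two-pointer scan over two chains sorted by code.

    Returns the nodes of a and the nodes of b whose codes occur in both chains.
    """
    i = j = 0
    common_a, common_b = [], []
    while i < len(a) and j < len(b):
        code_a, code_b = a[i][0], b[j][0]
        if code_a < code_b:
            i += 1
        elif code_a > code_b:
            j += 1
        else:
            # Same code: take every node with this code from both chains.
            while i < len(a) and a[i][0] == code_a:
                common_a.append(a[i])
                i += 1
            while j < len(b) and b[j][0] == code_a:
                common_b.append(b[j])
                j += 1
    return common_a, common_b


def node_sim(a, b, r=2.0):
    """NodeSim = (ComWeight1 + ComWeight2) / (ObjWeight1 + ObjWeight2)."""
    common_a, common_b = common_nodes(a, b)
    com = sum(weight(lv, r) for _, lv in common_a) + sum(weight(lv, r) for _, lv in common_b)
    obj = sum(weight(lv, r) for _, lv in a) + sum(weight(lv, r) for _, lv in b)
    return com / obj if obj else 0.0
```

With r = 2, two chains that share only their root, for example [(1, 0), (2, 1)] and [(1, 0), (3, 1)], give (1 + 1) / (1.5 + 1.5), about 0.667. Because the weight shrinks with depth, a mismatch near the root lowers the similarity more than a mismatch deep in the document.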
The invention parses XML documents with SAX and obtains level values with the running-stack technique, so building the structural outline consumes very little memory. Almost all memory is spent on storing the clustering results, which are kept as element chains, and the global name-code index table.
Description of drawings
Fig. 1-(a): Example of the SAX parsing format of an XML document
Fig. 1-(b): Example of obtaining level values with the running stack
Fig. 1-(c): Global element name-code index table
Fig. 1-(d): Element chain corresponding to the XML document
Fig. 2: Example of combining two element chains
Fig. 3: Example of comparing two element chains to obtain their common elements
Fig. 4: Common elements obtained after the comparison
Fig. 5-(a): Memory consumption as a function of the number of documents
Fig. 5-(b): Time cost as a function of the number of documents
Embodiment
The invention is described in detail below with reference to the drawings and a specific embodiment. The XML document in the following embodiment can be any individual sample in an online XML document stream, and the whole process is assumed to start from the first document to be processed. The concrete procedure for building the element chain of an XML document and comparing chains to obtain common elements is as follows:
1) Define a global element name-code index table for the XML document stream (or document set) to be processed, and initialize it as empty.
Parsing an XML document fragment with SAX yields a stream of tags, as shown in Fig. 1-(a). The start event of each distinct element is labeled with an integer giving its order in the stream of element start events. For example, the start event of the first element name, "W4F-DOC", is the first to appear and is therefore labeled with the integer "1".
2) Following the parsing format, obtain an integer code for the start event of each element. Fig. 1-(c) shows the state when the fifth element is processed after four distinct elements have been handled. The element name "LastName" is first looked up in the global name-code index table. The name is not yet in the table, so the code of this element is the largest integer already in the table, "4", plus one, and the element name and its integer code are inserted into the global element name-code index table as a new node.
3) Fig. 1-(b) is an example of obtaining the level value of an element with the running-stack technique. As shown in the figure, the start event of the fourth element pushes the integer code "4"; the stack pointer is then 3, so the level value of this element is "3". When the end event of this element occurs, the integer code is popped.
4) Following steps 2) and 3), each distinct element obtains its integer code and level value, which gives the partially ordered element chain shown in Fig. 1-(d) as the structural outline of the XML document in Fig. 1-(a).
5) The element chain is indexed by the integer codes of its elements and is combinable, provided that the combined result keeps only one copy of elements that have the same name and the same level. Fig. 2 shows the combination of two element chains a and b. As shown in the figure, the first nodes of a and b have equal codes and equal level values, so only the first node of a is inserted into the result chain. The third nodes have equal codes but different level values, so the third nodes of both a and b are inserted into the result chain.
6) To calculate the similarity between two element chains (NodeSim), the two partially ordered element chains must be compared to obtain their common elements and the corresponding level values. Fig. 3 shows an example of the comparison for two given element chains a and c. Starting at the heads of the chains, the codes of the first nodes are equal, so a moves to its next node; in the figure, after the first comparison a has moved to its second node, which is then compared with the first node of c. The code of the second node of a is greater than the code of the first node of c, so c moves to its second node, and the comparison continues. This comparison yields the common elements of the two element chains, marked with thick lines in Fig. 4.
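The combination described in step 5) can be sketched in Python as follows. The sketch assumes that an element chain is a list of (code, level) pairs sorted by code and then by level, and the function name `merge_chains` is hypothetical. It compares whole (code, level) pairs rather than codes first and levels second; this is a slight generalization of the step-by-step description, chosen so that chains holding one code at several levels still merge without duplicate nodes.

```python
def merge_chains(a, b):
    """Merges two sorted element chains, keeping one copy of each (code, level) pair."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Same code and same level: keep a single copy, advance both chains.
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Smaller code (or same code, smaller level) comes from a.
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    # One chain is exhausted; the rest of the other is already in order.
    result.extend(a[i:])
    result.extend(b[j:])
    return result


a = [(1, 0), (2, 1), (3, 2)]
b = [(1, 0), (3, 1), (4, 2)]
print(merge_chains(a, b))  # [(1, 0), (2, 1), (3, 1), (3, 2), (4, 2)]
```

In this usage example the first nodes agree in code and level, so one copy is kept, while code 3 appears at two different levels, so both nodes are kept. Each node is visited once, so the merge is linear in the combined length of the two chains.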
Once the common elements have been obtained, the user-defined weighting formula of the invention can be used to calculate the similarity between the two element chains. Suppose the weight decay factor is set to $r = 2$, so that $1/r = 0.5$. From Fig. 4, element chain a has $N_1 = 8$ elements and element chain c has $N_2 = 9$ elements, and the number of common elements is the same in both chains, $M_1 = M_2 = 8$. $L_{1i}$ is the level of the i-th common element of element chain a, and $L_{2j}$ is the level of the j-th common element of element chain c; $L_{1k}$ and $L_{2k}$ are the levels of the k-th element of element chain a and of element chain c, respectively. The similarity between the two element chains in Fig. 4 is then calculated as follows:
\[
\begin{aligned}
\mathrm{NodeSim}_{1 \leftrightarrow 2}
&= \frac{\mathrm{ComWeight}_1 + \mathrm{ComWeight}_2}{\mathrm{ObjWeight}_1 + \mathrm{ObjWeight}_2} \\
&= \frac{\sum_{i=1}^{M_1} (1/r)^{L_{1i}} + \sum_{j=1}^{M_2} (1/r)^{L_{2j}}}{\sum_{k=1}^{N_1} (1/r)^{L_{1k}} + \sum_{k=1}^{N_2} (1/r)^{L_{2k}}} \\
&= \frac{(0.5^0 + 0.5^1 + 2 \times 0.5^2 + 2 \times 0.5^3 + 2 \times 0.5^4) + (0.5^0 + 0.5^1 + 2 \times 0.5^2 + 2 \times 0.5^3 + 2 \times 0.5^4)}{(0.5^0 + 0.5^1 + 2 \times 0.5^2 + 2 \times 0.5^3 + 2 \times 0.5^4) + (0.5^0 + 0.5^1 + 2 \times 0.5^2 + 3 \times 0.5^3 + 2 \times 0.5^4)} \\
&\approx 0.974
\end{aligned}
\]
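The arithmetic of this example can be checked with a short Python sketch. The level counts below are read off the expanded expression (for instance, chain c has three elements at level 3, where chain a has two); the helper name `total_weight` and the dictionary layout are hypothetical.

```python
def total_weight(level_counts, r=2.0):
    """Sum of (1/r) ** level over all elements, given {level: number of elements}."""
    return sum(n * (1.0 / r) ** level for level, n in level_counts.items())


# Levels of the 8 common elements; the counts are the same in chains a and c.
common = {0: 1, 1: 1, 2: 2, 3: 2, 4: 2}
all_a = {0: 1, 1: 1, 2: 2, 3: 2, 4: 2}  # N1 = 8 elements
all_c = {0: 1, 1: 1, 2: 2, 3: 3, 4: 2}  # N2 = 9 elements

node_sim = (2 * total_weight(common)) / (total_weight(all_a) + total_weight(all_c))
print(round(node_sim, 3))  # 0.974
```

The numerator is 2 × 2.375 = 4.75 and the denominator is 2.375 + 2.5 = 4.875, so the single extra level-3 element in chain c lowers the similarity by only about 0.026.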
To evaluate the time and memory performance of this method for measuring similarity between XML document structural outlines, a series of clustering experiments was carried out in which similarity was measured with this method and clustering was performed with a traditional partitioning method. Under the same conditions, the results were compared with the XCLS algorithm, which assesses similarity with the recent level structure LevelStructure and then also clusters with a partitioning method. The experimental design and the final results are as follows.
Experimental conditions: a Pentium IV PC with 2.4G of memory; the programs were implemented in Java. The user-defined parameters were set identically for both methods: the weight decay factor was r = 2, so 1/r = 0.5; the minimum similarity threshold was 0.8; and the maximum number of clusters was 130.
Experimental data: the experiments used a simulated data set of 10419 documents. The documents were generated randomly with the XML tool oXygen XML Editor from mature XML Schemas used in traditional industries such as civil aviation and network applications. Document sizes range from a few KB to several hundred KB.
Experimental results: Fig. 5 shows the results of one group of comparative experiments. Although both clustering methods use a specific XML document structural outline and a corresponding similarity measure, the results show that the invention clearly outperforms the XCLS algorithm in both time cost and memory consumption. An analysis of how the two measures operate explains the advantages of the invention in time cost and memory consumption as follows:
(1) The structural outline of the invention is indexed by ordered integers, so the worst-case time complexity of comparing two chains to obtain their common elements is $O(\max\{N_1, N_2\})$, whereas the worst-case time complexity of obtaining the common elements when similarity is assessed with the level structure LevelStructure is $O(N_1 \times N_2)$.
(2) In terms of memory consumption, the invention parses XML documents with SAX and obtains level values with the running-stack technique, so building the structural outline consumes very little memory. Almost all memory is spent on storing the clustering results, which are kept as element chains, and the global name-code index table. The level structure used in the XCLS algorithm, by contrast, is based on a simplified labeled tree, which can be regarded as a simplified form of the DOM tree. Building that summary structure therefore requires a labeled tree to be built for each document, and when a document is large this consumes a great deal of memory. Although storing clustering results in the level-structure form costs about as much memory as storing them as element chains, the memory consumed while building the structures differs greatly, so the final results also differ markedly.

Claims (1)

1. A method for measuring similarity between XML document structural outlines, characterized by the following steps:
1) define a global element name-code index table for the XML document stream or document set to be processed, and initialize it as empty; each node in the table has two parts: a string holding the name of a distinct element contained in the stream or document set, and an integer holding the code of that element; the coding rule is as follows: when the XML documents are parsed with SAX, the integer is the order in which the element's start event first appears in the stream of start events of all distinct elements;
2) parse the XML document with SAX and obtain the start event of each element, then look up the global element name-code index table; if the element name is already in the table, the code of the element is the integer stored with that name; if the element name is not in the table, the code of the element is the largest integer already in the table plus one, and the element name and its integer code are inserted into the global element name-code index table as a new node;
3) obtain the level value of each element with the running-stack technique; the concrete operation is as follows: while the XML document is parsed with SAX, the document start event creates an empty stack; as elements are encountered in the XML document, push and pop operations are performed, that is, an element start event pushes the corresponding element and an element end event pops it; the level value of an element equals the position of the stack pointer for that element;
4) use the integer codes of the distinct elements and their level values to build the structural outline of the XML document as a partially ordered element chain that can be represented incrementally;
5) the element chain is indexed by the integer codes of its elements and is combinable, provided that the combined result keeps only one copy of elements that have the same name and the same level; the combination proceeds as follows: given two element chains a and b, start at the heads of the chains and compare the codes of the first nodes; if the codes are equal, compare the level values of the two nodes: if the level values are also equal, insert the node from a into the result chain, otherwise insert both the node from a and the node from b into the result chain; then continue with the next nodes of both chains; if the code in a is greater than the code in b, insert the node from b into the result chain and continue by comparing the same node of a with the next node of b; if the code in a is less than the code in b, insert the node from a into the result chain and continue by comparing the same node of b with the next node of a;
6) compare two partially ordered element chains to obtain their common elements and the corresponding level values; the comparison proceeds as follows: given two element chains a and b, start at the heads of the chains and move one node at a time; if the element code in a is less than or equal to the element code in b, a moves to its next node, otherwise b moves to its next node, and the comparison continues; equal element codes and their level values are recorded during the comparison and are used to calculate the similarity between the element chains (NodeSim) with the following formula:
\[
\mathrm{NodeSim}_{1 \leftrightarrow 2}
= \frac{\mathrm{ComWeight}_1 + \mathrm{ComWeight}_2}{\mathrm{ObjWeight}_1 + \mathrm{ObjWeight}_2}
= \frac{\sum_{i=1}^{M_1} (1/r)^{L_{1i}} + \sum_{j=1}^{M_2} (1/r)^{L_{2j}}}{\sum_{k=1}^{N_1} (1/r)^{L_{1k}} + \sum_{k=1}^{N_2} (1/r)^{L_{2k}}}
\]
where $\mathrm{ComWeight}_1$ and $\mathrm{ComWeight}_2$ are the summed weights of the common elements contained in the first and second element chains, respectively; $\mathrm{ObjWeight}_1$ and $\mathrm{ObjWeight}_2$ are the summed weights of all elements contained in the first and second element chains, respectively; $N_1$ and $N_2$ are the numbers of elements in the first and second element chains; $M_1$ and $M_2$ are the numbers of common elements in the first and second element chains; $L_{1i}$ is the level of the i-th common element of the first element chain, and $L_{2j}$ is the level of the j-th common element of the second element chain; $L_{1k}$ and $L_{2k}$ are the levels of the k-th element of the first and second element chains, respectively; $r$ is the decay factor of the weight, and its value is greater than 1.
CN 201210048443 2012-02-27 2012-02-27 Measuring method of similarity between Extensible Markup Language (XML) file structure outlines Expired - Fee Related CN102622432B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201210048443 CN102622432B (en) 2012-02-27 2012-02-27 Measuring method of similarity between Extensible Markup Language (XML) file structure outlines

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201210048443 CN102622432B (en) 2012-02-27 2012-02-27 Measuring method of similarity between Extensible Markup Language (XML) file structure outlines

Publications (2)

Publication Number Publication Date
CN102622432A true CN102622432A (en) 2012-08-01
CN102622432B CN102622432B (en) 2013-07-31

Family

ID=46562351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201210048443 Expired - Fee Related CN102622432B (en) 2012-02-27 2012-02-27 Measuring method of similarity between Extensible Markup Language (XML) file structure outlines

Country Status (1)

Country Link
CN (1) CN102622432B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103077228A (en) * 2013-01-02 2013-05-01 北京科技大学 Set characteristic vector-based quick clustering method and device
CN106528508A (en) * 2016-10-27 2017-03-22 乐视控股(北京)有限公司 Repeated text judgment method and apparatus
CN108733681A (en) * 2017-04-14 2018-11-02 华为技术有限公司 Information processing method and device
CN109240903A (en) * 2017-06-15 2019-01-18 北京京东尚科信息技术有限公司 A kind of method and apparatus assessed automatically
CN114547404A (en) * 2022-01-10 2022-05-27 普瑞纯证医疗科技(苏州)有限公司 Big data platform

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101799825A (en) * 2010-03-05 2010-08-11 南开大学 XML (Extensible Markup Language) document structure based on extended adjacent matrix and semantic similarity calculation method
CN101876995A (en) * 2009-12-18 2010-11-03 南开大学 Method for calculating similarity of XML documents
US20100306273A1 (en) * 2009-06-01 2010-12-02 International Business Machines Corporation Apparatus, system, and method for efficient content indexing of streaming xml document content
CN101996252A (en) * 2010-11-17 2011-03-30 浙江省电力试验研究院 Expression method of indexing information for node element in XML (Extensible Markup Language) file
CN102043848A (en) * 2010-12-20 2011-05-04 北京大学 XML document tree example query method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
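姚文集 et al.: "Clustering algorithm for XML data streams based on a sliding window", Computer Engineering, vol. 36, no. 13, 31 July 2010 (2010-07-31) *
郑仕辉 et al.: "Research on similarity measures and structural indexing of XML documents", Chinese Journal of Computers, vol. 26, no. 9, 30 September 2003 (2003-09-30) *
高明霞 et al.: "Clustering-oriented exponential histograms in XML data streams", Journal of Beijing University of Technology, vol. 37, no. 8, 31 August 2011 (2011-08-31) *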

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103077228A (en) * 2013-01-02 2013-05-01 北京科技大学 Set characteristic vector-based quick clustering method and device
CN103077228B (en) * 2013-01-02 2016-03-02 北京科技大学 A kind of Fast Speed Clustering based on set feature vector and device
CN106528508A (en) * 2016-10-27 2017-03-22 乐视控股(北京)有限公司 Repeated text judgment method and apparatus
CN108733681A (en) * 2017-04-14 2018-11-02 华为技术有限公司 Information processing method and device
US11132346B2 (en) 2017-04-14 2021-09-28 Huawei Technologies Co., Ltd. Information processing method and apparatus
CN108733681B (en) * 2017-04-14 2021-10-22 华为技术有限公司 Information processing method and device
CN109240903A (en) * 2017-06-15 2019-01-18 北京京东尚科信息技术有限公司 A kind of method and apparatus assessed automatically
CN114547404A (en) * 2022-01-10 2022-05-27 普瑞纯证医疗科技(苏州)有限公司 Big data platform

Also Published As

Publication number Publication date
CN102622432B (en) 2013-07-31

Similar Documents

Publication Publication Date Title
CN101174273B (en) News event detecting method based on metadata analysis
Publio et al. ML-schema: exposing the semantics of machine learning with schemas and ontologies
Jaschke et al. Trias--An algorithm for mining iceberg tri-lattices
US8069190B2 (en) System and methodology for parallel stream processing
CN102622432B (en) Measuring method of similarity between extensive makeup language (XML) file structure outlines
CN106250412A (en) The knowledge mapping construction method merged based on many source entities
CN102982131B (en) A kind of based on markovian book recommendation method
CN107133213A (en) A kind of text snippet extraction method and system based on algorithm
CN105893641A (en) Job recommending method
CN107077459A (en) Equipment with communication interface and the method for access of controlling database
CN106067094A (en) A kind of dynamic assessment method and system
CN102982168B (en) A kind of metadata model matching process based on XML document
CN101136018A (en) Method and apparatus for preprocessing multiple documents and displaying searched result for retrieval
CN112100394B (en) Knowledge graph construction method for recommending medical expert
CN105975440A (en) Matrix decomposition parallelization method based on graph calculation model
Ji et al. Tag tree template for Web information and schema extraction
CN109460354A (en) A method of test case reduction is carried out based on RDF reasoning
CN102262658B (en) Method for extracting web data from bottom to top based on entity
CN105677638A (en) Web information extraction method
CN103544299B (en) A kind of construction method of business intelligence cloud computing system
CN114491082A (en) Plan matching method based on network security emergency response knowledge graph feature extraction
CN110019634A (en) The geographical spatial data correlating method and device of quantitative accurate
Kuntarto et al. Dwipa ontology III: Implementation of ontology method enrichment on tourism domain
CN113590779B (en) Construction method of intelligent question-answering system of knowledge graph in air traffic control field
CN102457569B (en) Redundancy check method and system for Web services facing IOT (Internet of Things) application

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130731

Termination date: 20140227