WO2017116341A2 - System for parallel processing and data modelling - Google Patents

System for parallel processing and data modelling

Info

Publication number
WO2017116341A2
Authority
WO
WIPO (PCT)
Prior art keywords
xml
unit
processing
data
parallel
Application number
PCT/TR2016/000209
Other languages
English (en)
Other versions
WO2017116341A3 (fr)
Inventor
Sezgin ONDER
Original Assignee
Turkcell Teknoloji Arastirma Ve Gelistirme Anonim Sirketi
Application filed by Turkcell Teknoloji Arastirma Ve Gelistirme Anonim Sirketi
Publication of WO2017116341A2
Publication of WO2017116341A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/84 Mapping; Conversion
    • G06F16/86 Mapping to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/221 Parsing markup language streams

Definitions

  • The present invention relates to a system which ensures that XML (Extensible Markup Language) documents are processed in parallel and that the data contained in them is modelled and written to a database so that it is ready for data analysis, which can then be performed in parallel as well.
  • XML is a widely used technology for creating semi-structured data. It is extensible and universal, and is generally used for configuration management of software solutions and for messaging between devices and software inside networks.
  • For network providers, the data used while carrying out system maintenance can be turned into value: being able to analyse XML documents after processing and data modelling can reduce operational costs by analysing operational changes that were made manually and including automation solutions to
  • A solution that enables efficient and quick XML processing, data modelling and subsequent data analysis for the widely used XML technology can also contribute significantly to the data analysis methods used in the fields of OSS/BSS (Operations Support System/Business Support System), particularly in the telecom world.
  • United States patent document no. US20140089332, an application in the state of the art, discloses a system for converting XML documents in parallel.
  • United States patent document no. US2009089658, another application in the state of the art, discloses a system for modelling XML documents and recording them to a database.
  • An objective of the present invention is to realize a system which ensures that XML (Extensible Markup Language) documents are processed in parallel and that the data contained in them is modelled and written to a database so that it is ready for data analysis, which can then be performed in parallel as well.
  • Figure 1 is a schematic view of the inventive system. The components illustrated in the figure are individually numbered, where the numbers refer to the following: 1.
  • The inventive system (1) for parallel processing and data modelling comprises:
  • At least one XML processing and modelling unit (3), to which the XML receiving unit (2) transmits the received XML documents, and which processes the XML documents in parallel and models the data obtained from them in a tree structure in memory;
  • M mediator unit
  • At least one XML writing unit (5), to which the XML processing and modelling unit (3) transmits the data obtained and modelled by processing the XML documents, and which writes this data to the database (4) in parallel.
  • The description of the inventive system (1) refers to XML documents fed by or received from an XML source (K); the term "XML document" is used throughout the rest of the description.
  • In different embodiments of the invention, XML documents can also be XML messages, and every transaction carried out on XML documents in the inventive system (1) can also be carried out on XML messages.
  • Transactions that are stated to be carried out in parallel in the inventive system (1) are transactions carried out by a plurality of threads at the same time.
  • The XML receiving unit (2) is a unit to which XML documents flow from an XML source (K), or which receives XML documents from an XML source (K).
  • The XML receiving unit (2) receives XML documents from the XML source (K) in parallel by means of a plurality of threads, handling each thread in the same way, and transfers the documents, again in parallel, to the XML processing and modelling unit (3), which processes the XML documents in parallel and models the data.
  • The XML processing and modelling unit (3) is a unit to which the XML receiving unit (2) transmits the received XML documents, and which processes the XML documents in parallel and models the data obtained from them in a tree structure in memory.
  • The XML processing and modelling unit (3) is configured so that it can process a correctly formatted XML document without knowing its schema, data sequence, structure or data types.
  • The XML processing and modelling unit (3) is configured so that it does not require the XML document to be valid.
  • The modelled document, which is created by modelling the data obtained by processing the XML document, can also be created by the XML processing and modelling unit (3) in such a form that any mediator unit (M) can analyse it via a text-based search without having to know the XML structure.
  • The XML processing and modelling unit (3) is configured so that it can take an XML document of which only a part is correctly formatted and process that correctly formatted part.
  • The XML processing and modelling unit (3) produces, as output, modelled data that can be written to relational or non-relational databases (4).
  • The mediator unit (M) can query the database (4) directly with SQL (Structured Query Language) commands, for only the relevant part of the modelled data.
  • The mediator unit (M) can make queries by means of SQL and its derivative methods via an interface layer.
  • The XML processing and modelling unit (3) models XML documents so that they can be inserted into a database (4) model having a single table.
  • The database (4) is a central unit to which the XML writing unit (5) writes the modelled data in parallel. It is configured so that a mediator unit (M) can access it and perform analysis on it in parallel; the mediator unit (M) generates reports, events, analysis results or alarms to be transmitted to end systems (S), communicated to persons, or displayed on an interface.
  • The database (4) can be a relational or non-relational database.
  • The database (4) keeps data in a table with the following columns: ID, a unique identifier; PARENTID, the unique identifier of the parent tag in the tag hierarchy of the XML document; TAGNAME, the string value of the tag; CONTENT TYPE, an enumeration value that indicates the content type of the tag and relates to the tag hierarchy; CONTENT, the value inside the tag; and CONTENT SEQ, which allows a value to be split across multiple rows of the table if the hierarchy value of the tag, or of a feature under the tag, is too long.
  • CREATE DATE, which holds the date value at the moment each row is added to the table, and APP_NAME, which identifies the application for which the table is being filled, can be two further columns included in the database (4).
  • The said columns can be varied in many other ways.
  • The table in the database (4) is a table in which logical relations are created by means of additional columns over fields that can be diversified by enumeration (for example, CONTENT TYPE).
  • The database (4) is a central storage space for the XML data received from the XML source (K) and modelled by the XML processing and modelling unit (3); analyses serving many purposes can be performed on this data according to the content of the XML document.
  • The XML writing unit (5) is a unit to which the XML processing and modelling unit (3) transmits the data obtained and modelled by processing the XML documents, and which writes this data to the database (4) in parallel.
  • The XML writing unit (5) serializes the modelled XML data, which it receives in parallel from the XML processing and modelling unit (3), using one serializer for each parallel branch.
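
The single-table model described above (ID, PARENTID, TAGNAME, CONTENT TYPE, CONTENT, CONTENT SEQ) and a mediator-style SQL query can be illustrated with a minimal sketch. It is not the implementation of the invention: the table name `xml_model`, the function `model_document`, the `ContentType` enumeration values, the chunk size, the underscore column spellings, the sample document and the use of Python's `xml.etree.ElementTree` with an in-memory SQLite database are all assumptions made for illustration. Reading the "feature under the tag" as an XML attribute is likewise an interpretation. The optional CREATE DATE and APP_NAME columns are omitted.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET
from enum import IntEnum


class ContentType(IntEnum):
    """Hypothetical enumeration for the CONTENT TYPE column."""
    ELEMENT = 1    # a tag; CONTENT holds its text, if any
    ATTRIBUTE = 2  # an attribute of the parent tag (assumed meaning of "feature")


CHUNK = 16  # assumed maximum CONTENT length per row (tiny, for demonstration)


def model_document(xml_text):
    """Flatten an XML document into rows without consulting any schema.

    Each row is (ID, PARENTID, TAGNAME, CONTENT_TYPE, CONTENT, CONTENT_SEQ).
    Text that follows a closing tag (tail text) is ignored in this sketch.
    """
    ids = itertools.count(1)
    rows = []

    def emit(parent_id, name, ctype, value):
        row_id = next(ids)
        value = value or ""
        # A long value is split over several rows that share one ID and
        # are ordered by CONTENT_SEQ.
        chunks = [value[i:i + CHUNK] for i in range(0, len(value), CHUNK)] or [""]
        for seq, chunk in enumerate(chunks):
            rows.append((row_id, parent_id, name, int(ctype), chunk, seq))
        return row_id

    def walk(elem, parent_id):
        text = (elem.text or "").strip()
        elem_id = emit(parent_id, elem.tag, ContentType.ELEMENT, text)
        for attr, val in elem.attrib.items():
            emit(elem_id, attr, ContentType.ATTRIBUTE, val)
        for child in elem:
            walk(child, elem_id)

    walk(ET.fromstring(xml_text), None)
    return rows


def create_table(conn):
    conn.execute(
        "CREATE TABLE xml_model ("
        "ID INTEGER, PARENTID INTEGER, TAGNAME TEXT, "
        "CONTENT_TYPE INTEGER, CONTENT TEXT, CONTENT_SEQ INTEGER, "
        "PRIMARY KEY (ID, CONTENT_SEQ))"
    )


doc = '<cell id="42"><name>A-very-long-cell-name-value</name><state>up</state></cell>'
conn = sqlite3.connect(":memory:")
create_table(conn)
conn.executemany("INSERT INTO xml_model VALUES (?, ?, ?, ?, ?, ?)", model_document(doc))

# A mediator-style query that asks only for the relevant part of the model.
states = conn.execute(
    "SELECT CONTENT FROM xml_model WHERE TAGNAME = 'state' AND CONTENT_TYPE = ?",
    (int(ContentType.ELEMENT),),
).fetchall()
```

Because every node becomes a row that points at its parent through PARENTID, the querying side needs no knowledge of the original XML structure; a value split by CONTENT SEQ can be reassembled by ordering on that column.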
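
The description states that the processing and modelling unit (3) can process the correctly formatted part of a document that is not correctly formatted as a whole. One possible way to approximate this behaviour, shown here only as a sketch, is an incremental parser that hands over every element completed before the first error. The function name `parse_well_formed_part` and the sample document are hypothetical; `XMLPullParser` is the standard-library class from `xml.etree.ElementTree`, and nothing in the source says the invention uses it.

```python
import xml.etree.ElementTree as ET


def parse_well_formed_part(xml_text):
    """Return (tag, text) for every element that closed before the first
    formatting error, together with the error itself (or None)."""
    parser = ET.XMLPullParser(events=("end",))
    parser.feed(xml_text)
    recovered, error = [], None
    try:
        # Events queued before the error are delivered first; the error
        # is raised only once they have been consumed.
        for _event, elem in parser.read_events():
            recovered.append((elem.tag, (elem.text or "").strip()))
    except ET.ParseError as exc:
        error = exc
    return recovered, error


# The third <alarm> is never closed, so the document is not correctly formatted.
damaged = "<alarms><alarm>link down</alarm><alarm>high load</alarm><alarm></alarms>"
recovered, error = parse_well_formed_part(damaged)
```

In this sketch the two complete `alarm` elements are recovered and the mismatched closing tag is reported as an error instead of discarding the whole document.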
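
The flow from the receiving unit (2) through the processing and modelling unit (3) to the writing unit (5), with a plurality of threads and one serializer per parallel branch, might be sketched as follows. All names here (`process_and_model`, `Serializer`, `run_pipeline`, the `branches` parameter) are hypothetical, a Python list stands in for the database (4), and a thread pool is only one way to realize the parallel branches. Note also that in standard CPython, threads mainly help when the work is I/O-bound, such as receiving documents or writing to a database.

```python
import threading
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def process_and_model(xml_text):
    """Stand-in for the processing and modelling unit (3): parse one
    document and return an in-memory tree of (tag, text, children)."""
    def to_tree(elem):
        return (elem.tag, (elem.text or "").strip(), [to_tree(c) for c in elem])
    return to_tree(ET.fromstring(xml_text))


class Serializer:
    """Stand-in for one serializer of the writing unit (5). A real system
    would write to the database (4); this one appends to a shared list."""
    def __init__(self, sink, lock):
        self._sink, self._lock = sink, lock

    def write(self, tree):
        with self._lock:
            self._sink.append(tree)


def run_pipeline(documents, branches=4):
    sink, lock = [], threading.Lock()
    local = threading.local()  # holds one Serializer per worker thread

    def branch(xml_text):
        # Each parallel branch lazily creates its own serializer.
        if not hasattr(local, "serializer"):
            local.serializer = Serializer(sink, lock)
        local.serializer.write(process_and_model(xml_text))

    with ThreadPoolExecutor(max_workers=branches) as pool:
        # list() waits for completion and re-raises any worker exception.
        list(pool.map(branch, documents))
    return sink


documents = [f"<msg><n>{i}</n></msg>" for i in range(20)]
written = run_pipeline(documents)
```

The order of `written` depends on thread scheduling, so a consumer should not rely on it matching the order in which the documents arrived.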

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a system (1) which ensures that XML (Extensible Markup Language) documents are processed in parallel and that the data contained in the XML documents is written to a database so that, once modelled, it is ready for data analysis, which can therefore also be performed in parallel. The inventive system (1) comprises: an XML receiving unit (2), an XML processing unit (3), a database (4) and an XML writing unit (5).
PCT/TR2016/000209 2015-12-31 2016-12-26 System for parallel processing and data modelling WO2017116341A2 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
TR201517649 2015-12-31
TR2015/17649 2015-12-31

Publications (2)

Publication Number Publication Date
WO2017116341A2 (fr) 2017-07-06
WO2017116341A3 (fr) 2017-08-03

Family

ID=58213311

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/TR2016/000209 WO2017116341A2 (fr) 2015-12-31 2016-12-26 System for parallel processing and data modelling

Country Status (1)

Country Link
WO (1) WO2017116341A2 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050055355A1 (en) 2003-09-05 2005-03-10 Oracle International Corporation Method and mechanism for efficient storage and query of XML documents based on paths
US20090089658A1 (en) 2007-09-27 2009-04-02 The Research Foundation, State University Of New York Parallel approach to xml parsing
US20140089332A1 (en) 2012-09-27 2014-03-27 Siemens Product Lifecycle Management Software Inc. Efficient conversion of xml data into a model using persistent stores and parallelism

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6631379B2 (en) * 2001-01-31 2003-10-07 International Business Machines Corporation Parallel loading of markup language data files and documents into a computer database
US7899834B2 (en) * 2004-12-23 2011-03-01 Sap Ag Method and apparatus for storing and maintaining structured documents
US20110289118A1 (en) * 2010-05-20 2011-11-24 Microsoft Corporation Mapping documents to a relational database table with a document position column

Also Published As

Publication number Publication date
WO2017116341A3 (fr) 2017-08-03

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16843211

Country of ref document: EP

Kind code of ref document: A2