CN110347879A - A rule-based data normalization method and system - Google Patents

A rule-based data normalization method and system

Info

Publication number
CN110347879A
Authority
CN
China
Prior art keywords
data
rule
conversion
management
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910630119.XA
Other languages
Chinese (zh)
Inventor
严春利
杨波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI XILING INFORMATION TECHNOLOGY Co Ltd
Original Assignee
SHANGHAI XILING INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI XILING INFORMATION TECHNOLOGY Co Ltd
Priority to CN201910630119.XA
Publication of CN110347879A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/84 Mapping; Conversion

Abstract

The invention discloses a rule-based data normalization method and system in the field of data processing. The method comprises rule management, conversion management and output management: rule management defines the data normalization rules; conversion management converts input data into standardized structured data according to the rules defined by rule management; and output management persists the standardized structured data. The invention can generate different rules according to the user's specification requirements. Rule files are implemented as XML configuration files and are easy to understand. Accessing new data requires neither a change of scheme nor secondary development; adding the corresponding conversion rule is enough to standardize data in different formats.

Description

A rule-based data normalization method and system
Technical field
The present invention relates to the field of data processing, and in particular to a rule-based data normalization method and system.
Background art
A data networking system is a very important system in informatization construction. All trades and professions and government departments at all levels have built a large number of information systems. Taking the smart city as an example, provinces, cities and districts at every level have built their own systems and are gradually connecting data from various industries and departments. Because these systems were built at different times and to different standards, the contractors' development specifications and external interfaces vary widely. The construction of a data networking system has therefore become an important part of smart-city system engineering at every level. Building the networking platform completes the core technology platform of the smart city at each level and lays a solid technical foundation for the subsequent construction and access of industry applications. As the basic platform under the overall smart-city architecture, the data networking system processes, analyzes and mines the business data scattered across departments to form a unified, complete and ordered data asset system, and realizes cross-industry, cross-department and cross-region integrated application and data sharing through shared exchange. During data networking, the data vary widely, which makes analysis more difficult. As the application platform is docked with more systems, the number of data types it must support grows accordingly. Every networking docking requires separate development, and the parsed data cannot be reused. Development is therefore complex and costs are hard to reduce. Data standardization has thus become a very important topic in information system development.
For data normalization, the following methods are common in the existing market:
1. Customization on demand
A solution is customized for the data supplied by the user and developed to the user's requirements so that the data become usable.
The disadvantage of this method is that it cannot meet the standardization needs of unknown data access: each data docking requires customized development according to the format of the docked data, so the development workload is large and the docking period is long.
2. Supporting different data formats by exposing code
For example, languages such as C, NodeJS and Java are embedded in the conversion platform so that enterprises can write their own code to standardize data in different formats.
The disadvantage of this method is that operation and maintenance costs are high and the platform is not easy to extend.
Therefore, those skilled in the art are committed to developing a rule-based data normalization method and system that standardizes data according to predefined rule files, without development work.
Summary of the invention
In view of the above drawbacks of the prior art, the technical problem to be solved by the present invention is to generate different rules according to the user's specification requirements, so that data in different formats are standardized by configuring the corresponding rules. Accessing new data requires no change of scheme and no development; only new rules need to be added.
To achieve the above object, the present invention provides a rule-based data normalization method comprising rule management, conversion management and output management. Rule management defines the data normalization rules; conversion management converts input data into standardized structured data according to the data normalization rules defined by rule management; and output management persists the standardized structured data.
Further, the data normalization rules use the XML file format.
Further, rule management includes rule parsing and rule matching: rule parsing loads the XML file into memory, and rule matching provides the corresponding data normalization rule to conversion management.
Further, rule parsing comprises the following steps:
S101, read the data normalization rule file from the configured path;
S102, parse the data normalization rule file with an XML tool;
S103, create a node element;
S104, if a consecutive start tag is encountered, create a model node;
S105, if the node is a Condition node, treat it as a special node: do not create a node element, but take out its attributes and assign them to the condition storage structure Condition of the model node;
S106, XML values are strings by default; to support subsequent conversion, convert the value to the corresponding type according to the type attribute;
S107, take out the attributes and assign them to the condition storage structure Condition of the model node;
S108, apply special handling to adjust the start tag and end tag so that the next node and the Condition node are under the same parent node;
S109, if the element storage of the model node is empty, create the storage space manually;
S110, put the newly created node element into the map of the model node;
S111, set the parent model node of the newly created node element to the current model;
S112, if a consecutive end tag is encountered, move back up one level to the parent model node;
S113, end-tag structuring is complete;
S114, store the structured rule in the hash table under its corresponding ID.
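The parsing steps above can be illustrated with a minimal Python sketch. The patent text does not reproduce its rule file schema, so the tag and attribute names used here (`Rule`, `Model`, `Field`, `Condition`, `id`, `source`, `target`, `key`, `value`, `type`) and the `ModelNode` and `Condition` classes are assumptions made only for illustration. The sketch also walks an already parsed tree recursively instead of reacting to start and end tags, which is a simplification of S104 and S112 rather than the patented procedure.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical rule file; the real schema is not given in the source text.
RULE_XML = """
<Rule id="motor_vehicle_sdk">
  <Field source="MotorVehicleID" target="MotorVehicleID" type="string"/>
  <Model source="SubImageList">
    <Condition key="Type" value="14" type="string"/>
    <Field source="ImageID" target="ScenesImageID" type="string"/>
  </Model>
</Rule>
"""

# S106: XML values are strings, so cast them according to the type attribute.
CASTS = {"string": str, "int": int, "float": float}


@dataclass
class Condition:
    key: str
    value: Any


@dataclass
class ModelNode:
    name: str
    condition: Optional[Condition] = None
    elements: Dict[str, str] = field(default_factory=dict)  # source key -> target field
    children: List["ModelNode"] = field(default_factory=list)


def build_model(xml_node: ET.Element) -> ModelNode:
    """Turn one XML element and its descendants into a model node."""
    model = ModelNode(name=xml_node.get("source", xml_node.tag))
    for child in xml_node:
        if child.tag == "Condition":
            # S105/S107: a Condition is stored on the model node, not as an element.
            cast = CASTS[child.get("type", "string")]
            model.condition = Condition(child.get("key"), cast(child.get("value")))
        elif child.tag == "Field":
            # S110: put the element into the model node's map.
            model.elements[child.get("source")] = child.get("target")
        else:
            # Nested model node; the recursion returns to the parent when done.
            model.children.append(build_model(child))
    return model


def load_rules(xml_text: str, table: Dict[str, ModelNode]) -> None:
    """S102 and S114: parse the rule text and store it in a hash table by ID."""
    root = ET.fromstring(xml_text)
    table[root.get("id")] = build_model(root)


RULES: Dict[str, ModelNode] = {}
load_rules(RULE_XML, RULES)
```

After loading, `RULES["motor_vehicle_sdk"]` holds the top-level field mapping and one child model whose condition requires `Type` to equal `"14"`.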
Further, rule matching comprises the following steps:
S201, obtain the rule ID according to the data definition model;
S202, look up the conversion rule corresponding to the data according to the rule ID;
S203, return the matched conversion rule as a pointer to the structured rule.
Further, conversion management includes data parsing and data conversion: data parsing parses the structure of source data in different formats, and data conversion converts the source data into standardized data according to the conversion rule.
Further, data parsing comprises the following steps:
S301, receive the source data and structure it in memory;
S302, obtain the header identification information of the source data;
S303, generate the rule ID from the header identification information;
S304, obtain the corresponding data conversion rule according to the rule ID.
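As a rough illustration of data parsing and rule matching, the following Python sketch assumes that the source data arrive as JSON and that the single top-level key serves as the header identification information. Both are assumptions, as are the function names and the contents of `RULE_TABLE`; the source text does not specify how the rule ID is derived.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical rule table: rule ID -> conversion rule (a plain key mapping here).
RULE_TABLE: Dict[str, Dict[str, str]] = {
    "MotorVehicleListObject": {"MotorVehicleID": "MotorVehicleID"},
}


def parse_source(raw: str) -> Dict[str, Any]:
    """S301: structure the raw source data in memory (JSON is assumed)."""
    return json.loads(raw)


def make_rule_id(data: Dict[str, Any]) -> str:
    """S302-S303: derive the rule ID from the header; here, the only top-level key."""
    if len(data) != 1:
        raise ValueError("expected exactly one top-level header key")
    return next(iter(data))


def match_rule(rule_id: str) -> Dict[str, str]:
    """S304 and S201-S203: hash-table lookup that returns the stored rule object."""
    try:
        return RULE_TABLE[rule_id]
    except KeyError:
        raise LookupError(f"no conversion rule registered for {rule_id!r}") from None


def parse_and_match(raw: str) -> Tuple[Dict[str, Any], Dict[str, str]]:
    data = parse_source(raw)
    return data, match_rule(make_rule_id(data))
```

Returning the stored rule object itself, rather than a copy, mirrors the idea in S203 of handing back a reference to the structured rule.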
Further, data conversion comprises the following steps:
S401, obtain the keys of the conversion rule;
S402, search the data for a field corresponding to each key in the rule;
S403, when a corresponding field is found, decide the next operation according to the data format;
S404, if it is an array, loop over all elements of the array;
S405, if it is an object, split the object further;
S406, if it is a single element, take out the data value, map it to the conversion target field corresponding to the element key, and put it into the cache;
S407, execute steps S403, S404, S405 and S406 recursively, traversing all fields in the source data and finding the corresponding conversion result according to the rule;
S408, output the structured result of the conversion.
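The recursive traversal in S403 to S407 can be sketched in Python as follows. The rule is modeled as a flat mapping from source key to target field, which is a simplifying assumption; the function name `convert` and the sample field names in the usage note are hypothetical.

```python
from typing import Any, Dict


def convert(data: Any, rule: Dict[str, str], out: Dict[str, Any]) -> Dict[str, Any]:
    """Walk the source data and copy every field named in the rule into `out`."""
    if isinstance(data, list):
        # S404: an array, so visit every element.
        for item in data:
            convert(item, rule, out)
    elif isinstance(data, dict):
        for key, value in data.items():
            if isinstance(value, (dict, list)):
                # S405: an object or nested array, so split it further.
                convert(value, rule, out)
            elif key in rule:
                # S406: a single element; store the value under its target field.
                out[rule[key]] = value
    return out
```

For example, converting `{"Header": {"ID": "v1", "Items": [{"Speed": 60}]}}` with the rule `{"ID": "VehicleID", "Speed": "VehicleSpeed"}` fills the cache with both target fields. In this simplified form, a key that occurs more than once overwrites the earlier value, so distinguishing repeated elements needs a condition mechanism such as the Condition nodes of the rule file.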
Further, output management assembles the standardized structured data into JSON format and persists it. Persistence methods include Kafka, REST, databases and the like. The specific implementation comprises the following steps:
S501, obtain the internal standard data structure according to the data type;
S502, obtain the data type according to the field name;
S503, convert the external field type to the standardized type according to the data type;
S504, repeat S502 and S503 until all types in the data have been converted to standardized types;
S505, package the standardized data as JSON for unified output;
S506, persist the output according to the user's requirements.
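A minimal Python sketch of the output steps is given below. The standard types assigned to each field in `STANDARD_TYPES` are assumptions (the source does not list them), and the persistence target is abstracted as a callable `sink` so that a Kafka producer, a REST client or a database writer could be supplied by the caller.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical internal standard structure: field name -> standardized type (S501).
STANDARD_TYPES: Dict[str, Callable[[Any], Any]] = {
    "MotorVehicleID": str,
    "InfoKind": int,
    "SourceID": str,
}


def standardize_types(record: Dict[str, Any]) -> Dict[str, Any]:
    """S502-S504: look up each field's standard type and cast the value to it."""
    result: Dict[str, Any] = {}
    for name, value in record.items():
        cast = STANDARD_TYPES.get(name)
        result[name] = cast(value) if cast else value
    return result


def package(record: Dict[str, Any]) -> str:
    """S505: package the standardized record as a JSON string."""
    return json.dumps(standardize_types(record), ensure_ascii=False, sort_keys=True)


def persist(payload: str, sink: Callable[[str], None]) -> None:
    """S506: hand the payload to whichever persistence target the user chose."""
    sink(payload)
```

Passing `list.append` of an in-memory list as the sink is enough to try the flow without any external service.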
The invention also discloses a rule-based data normalization system comprising an input data module, a data model module, a rule management module, a data conversion module, a data packaging module and an output data module, which standardizes data according to the foregoing method.
The present invention can generate different rules according to the user's specification requirements. Accessing new data requires neither a change of scheme nor secondary development; adding the corresponding conversion rule is enough to standardize data in different formats. Rule files are implemented as XML configuration files and are easy to understand.
The concept, specific structure and technical effects of the present invention are further described below with reference to the accompanying drawings, so that the purpose, features and effects of the present invention can be fully understood.
Brief description of the drawings
Fig. 1 is the data interaction logic diagram of a preferred embodiment of the present invention;
Fig. 2 is the data processing flow chart of a preferred embodiment of the present invention;
Fig. 3 is the logical structure diagram of the rule file of a preferred embodiment of the present invention.
Detailed description of the embodiments
Several preferred embodiments of the present invention are described below with reference to the accompanying drawings to make the technical content clearer and easier to understand. The present invention can be embodied in many different forms, and its scope of protection is not limited to the embodiments mentioned herein.
In the drawings, structurally identical components are denoted by the same reference numerals, and components with similar structure or function are denoted by similar reference numerals. The size and thickness of each component shown in the drawings are arbitrary; the present invention does not limit the size or thickness of any component. For clarity of illustration, the thickness of components is exaggerated in some places in the drawings.
As shown in the figure, the rule-based data normalization system comprises an input data module, a data model module, a rule management module, a data conversion module, a data packaging module and an output data module. The input data module is responsible for the input of source data, and the data model module determines the rule model needed for the conversion. The rule management module manages the conversion rules and adds the conversion rules required for each type of source data. The data conversion module performs the actual conversion of the source data according to the conversion rules; the data packaging module generates structured standard data; and the output data module persists the structured data generated by the data packaging module.
In an intelligent security system, motor vehicle cameras generate large amounts of data, and departments frequently need to exchange these data. Because different systems generate data in different formats, the data must be converted according to a standard specification.
A specific piece of source data is as follows:
The source data contain the basic information of the motor vehicle; information captured in use, such as the lane, speed, capture time, device number, picture format and picture storage path; and information on several related pictures.
To share data between different departments and between higher and lower levels, the data must be standardized into a unified format. The standardized data contain the following fields: MotorVehicleID, InfoKind, SourceID, ScenesImageID, ScenesEventSort, ScenesDeviceID, ScenesType, FaceImageID, FaceEventSort, FaceDeviceID and FaceType. Some of the fields required by the standardized data have corresponding fields in the source data and can be obtained directly; others require data to be selectively extracted and converted according to the picture type.
For this purpose, a conversion rule for the source data "motor vehicle SDK" must be added to the rule file. The rule is described in XML file format, and the specific rule file is as follows:
The logical structure of the rule file is shown in Fig. 3, which describes the logical relationships in the rule file and the logic for standardizing picture information under different conditions. The MotorVehicleID, InfoKind and SourceID fields can be extracted directly. ScenesImageID, ScenesEventSort, ScenesDeviceID and ScenesType are converted from the information of the picture whose Type value in the source data is "14"; FaceImageID, FaceEventSort, FaceDeviceID and FaceType are converted from the information of the picture whose Type value in the source data is "1". Accordingly, two Condition nodes are defined in the rule file; they test the Type value of the pictures with different ImageID values in SubImageInfoObject and assign values according to the result.
The data to be converted are taken as source data and converted according to the data conversion flow chart shown in Fig. 2, in the following steps:
Step 1: start.
Step 2: initialize the conversion rules by loading the conversion rule named "motor vehicle SDK" into memory. The specific loading steps are:
Read the rule file "motor vehicle SDK" from the configured path;
Parse the rule file with an XML tool;
Establish the conversion rules for MotorVehicleID, InfoKind and SourceID in turn;
Establish a condition node: if the value of Type is "14", establish the conversion rules for ScenesImageID, ScenesEventSort, ScenesDeviceID and ScenesType in turn;
Establish another condition node: if the value of Type is "1", establish the conversion rules for FaceImageID, FaceEventSort, FaceDeviceID and FaceType in turn;
The tag structure is fully parsed.
Step 3: check whether initialization succeeded; if not, terminate; if so, continue with step 4.
Step 4: check whether there are data.
Step 5: obtain the data.
Step 6: data parsing: parse all fields in MotorVehicleListObject.
Step 7: convert the data according to the rules described in step 2. MotorVehicleID, InfoKind and SourceID are converted in turn, and then the two nodes in SubImageInfoObject are parsed. The Type value of the first node is "14", so the values of ImageID, EventSort, DeviceID and Type of the corresponding picture (ImageID=10001) are assigned in turn to ScenesImageID, ScenesEventSort, ScenesDeviceID and ScenesType. The Type value of the second node is "1", so the values of ImageID, EventSort, DeviceID and Type of the corresponding picture (ImageID=10002) are assigned in turn to FaceImageID, FaceEventSort, FaceDeviceID and FaceType.
Step 8: data packaging: assemble the MotorVehicleID, InfoKind, SourceID, ScenesImageID, ScenesEventSort, ScenesDeviceID, ScenesType, FaceImageID, FaceEventSort, FaceDeviceID and FaceType fields converted in step 7 into JSON format.
Step 9: output the data, completing the persistence of the data.
Step 10: check whether there are more data; if so, return to step 4 and repeat the conversion; if all data have been processed, the process ends.
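The conditional assignment in step 7 can be illustrated with a short Python sketch. The field names, the Type values "14" and "1", and the ImageID values 10001 and 10002 come from the description; the EventSort and DeviceID values in the sample input are invented placeholders, and the function `convert_images` is a hypothetical stand-in for the two Condition nodes, not the rule engine itself.

```python
from typing import Any, Dict, List

# Picture Type value -> prefix of the target fields, as described in the embodiment.
PREFIX_BY_TYPE = {"14": "Scenes", "1": "Face"}
IMAGE_FIELDS = ("ImageID", "EventSort", "DeviceID", "Type")


def convert_images(images: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assign each picture's fields to Scenes* or Face* targets based on its Type."""
    out: Dict[str, Any] = {}
    for image in images:
        prefix = PREFIX_BY_TYPE.get(str(image.get("Type")))
        if prefix is None:
            continue  # neither condition matches this picture type
        for name in IMAGE_FIELDS:
            if name in image:
                out[prefix + name] = image[name]
    return out


# Placeholder input: EventSort and DeviceID values are invented for illustration.
sample = [
    {"ImageID": "10001", "EventSort": 0, "DeviceID": "dev-A", "Type": "14"},
    {"ImageID": "10002", "EventSort": 0, "DeviceID": "dev-A", "Type": "1"},
]
result = convert_images(sample)
```

With this input, the picture with ImageID 10001 populates the four Scenes fields and the picture with ImageID 10002 populates the four Face fields, matching the assignment described in step 7.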
Through the above steps, the standardized output data obtained for the source data given above are:
The preferred embodiments of the present invention have been described in detail above. It should be understood that those of ordinary skill in the art can make many modifications and variations according to the concept of the present invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain through logical analysis, reasoning or limited experimentation on the basis of the prior art and under the concept of the present invention shall fall within the scope of protection determined by the claims.

Claims (10)

1. A rule-based data normalization method, characterized by comprising rule management, conversion management and output management, wherein the rule management defines data normalization rules, the conversion management converts input data into standardized structured data according to the data normalization rules defined by the rule management, and the output management persists the standardized structured data.
2. The rule-based data normalization method as claimed in claim 1, characterized in that the data normalization rules use the XML file format.
3. The rule-based data normalization method as claimed in claim 2, characterized in that the rule management includes rule parsing and rule matching, the rule parsing loads the XML file into memory, and the rule matching provides the corresponding data normalization rule to the conversion management.
4. The rule-based data normalization method as claimed in claim 3, characterized in that the rule parsing comprises the following steps:
S101, read the data normalization rule file from the configured path;
S102, parse the data normalization rule file with an XML tool;
S103, create a node element;
S104, if a consecutive start tag is encountered, create a model node;
S105, if the node is a Condition node, treat it as a special node: do not create a node element, but take out its attributes and assign them to the condition storage structure Condition of the model node;
S106, XML values are strings by default; to support subsequent conversion, convert the value to the corresponding type according to the type attribute;
S107, take out the attributes and assign them to the condition storage structure Condition of the model node;
S108, apply special handling to adjust the start tag and end tag so that the next node and the Condition node are under the same parent node;
S109, if the element storage of the model node is empty, create the storage space manually;
S110, put the newly created node element into the map of the model node;
S111, set the parent model node of the newly created node element to the current model;
S112, if a consecutive end tag is encountered, move back up one level to the parent model node;
S113, end-tag structuring is complete;
S114, store the structured rule in the hash table under its corresponding ID.
5. The rule-based data normalization method as claimed in claim 3, characterized in that the rule matching comprises the following steps:
S201, obtain the rule ID according to the data definition model;
S202, look up the conversion rule corresponding to the data according to the rule ID;
S203, return the matched conversion rule as a pointer to the structured rule.
6. The rule-based data normalization method as claimed in claim 2, characterized in that the conversion management includes data parsing and data conversion, the data parsing parses the structure of source data in different formats, and the data conversion converts the source data into standardized data according to the conversion rule.
7. The rule-based data normalization method as claimed in claim 6, characterized in that the data parsing comprises the following steps:
S301, receive the source data and structure it in memory;
S302, obtain the header identification information of the source data;
S303, generate the rule ID from the header identification information;
S304, obtain the corresponding data conversion rule according to the rule ID.
8. The rule-based data normalization method as claimed in claim 6, characterized in that the data conversion comprises the following steps:
S401, obtain the keys of the conversion rule;
S402, search the data for a field corresponding to each key in the rule;
S403, when a corresponding field is found, decide the next operation according to the data format;
S404, if it is an array, loop over all elements of the array;
S405, if it is an object, split the object further;
S406, if it is a single element, take out the data value, map it to the conversion target field corresponding to the element key, and put it into the cache;
S407, execute steps S403, S404, S405 and S406 recursively, traversing all fields in the source data and finding the corresponding conversion result according to the rule;
S408, output the structured result of the conversion.
9. The rule-based data normalization method as claimed in claim 2, characterized in that the output management assembles the standardized structured data into JSON format and persists it, the persistence methods include Kafka, REST, databases and the like, and the specific implementation comprises the following steps:
S501, obtain the internal standard data structure according to the data type;
S502, obtain the data type according to the field name;
S503, convert the external field type to the standardized type according to the data type;
S504, repeat S502 and S503 until all types in the data have been converted to standardized types;
S505, package the standardized data as JSON for unified output;
S506, persist the output according to the user's requirements.
10. A rule-based data normalization system, characterized by comprising an input data module, a data model module, a rule management module, a data conversion module, a data packaging module and an output data module, wherein the modules standardize data according to the method of any one of claims 1 to 9.
CN201910630119.XA 2019-07-12 2019-07-12 A kind of rule-based data normalization method and system Pending CN110347879A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910630119.XA CN110347879A (en) 2019-07-12 2019-07-12 A kind of rule-based data normalization method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910630119.XA CN110347879A (en) 2019-07-12 2019-07-12 A kind of rule-based data normalization method and system

Publications (1)

Publication Number Publication Date
CN110347879A (en) 2019-10-18

Family

ID=68176063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910630119.XA Pending CN110347879A (en) 2019-07-12 2019-07-12 A kind of rule-based data normalization method and system

Country Status (1)

Country Link
CN (1) CN110347879A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930523A (en) * 2016-05-25 2016-09-07 中国科学院新疆理化技术研究所 Dynamic configurable rule-based data cleaning framework under big data background
CN107870917A (en) * 2016-09-23 2018-04-03 中国电信股份有限公司 Transmission network management system data convert and inverse transformation method and standardized system
CN108874847A (en) * 2017-12-26 2018-11-23 北京安天网络安全技术有限公司 Matching process, device, electronic equipment and the storage medium of custom rule

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125997A (en) * 2019-12-27 2020-05-08 中国银行股份有限公司 Text data standardization processing method and device
CN111597390A (en) * 2020-03-17 2020-08-28 用友网络科技股份有限公司 Data format conversion framework
CN112000652A (en) * 2020-08-17 2020-11-27 杭州数云信息技术有限公司 Standardized processing engine and processing method based on real-time computing data
CN112487072A (en) * 2020-11-24 2021-03-12 云汉芯城(上海)互联网科技股份有限公司 Method, device, system and medium for standardizing parameter structure of electronic component
CN112487072B (en) * 2020-11-24 2021-06-08 云汉芯城(上海)互联网科技股份有限公司 Method, device, system and medium for standardizing parameter structure of electronic component
CN112650806A (en) * 2020-12-30 2021-04-13 邦邦汽车销售服务(北京)有限公司 ERP system docking accessory data standardization method and device and storage medium
CN112948637A (en) * 2021-03-30 2021-06-11 上海熙菱信息技术有限公司 Rule-based data standardization system
CN112948479A (en) * 2021-04-16 2021-06-11 深圳市今天国际物流技术股份有限公司 Data structure interconversion method based on aviator
CN113836211A (en) * 2021-09-24 2021-12-24 央视国际网络无锡有限公司 Data extraction method for accessing Internet of things equipment data to JAVA platform
CN113836211B (en) * 2021-09-24 2024-02-20 央视国际网络无锡有限公司 Data extraction method for accessing data of internet of things equipment to JAVA platform
CN114064720A (en) * 2021-11-15 2022-02-18 中国建设银行股份有限公司 Heterogeneous stream data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination