CN109446157A - Data format checking system and method based on formatted data - Google Patents

Data format checking system and method based on formatted data

Info

Publication number
CN109446157A
Authority
CN
China
Prior art keywords
data
check
rule
format
inspect
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811212343.9A
Other languages
Chinese (zh)
Other versions
CN109446157B (en)
Inventor
杨文杰 (Yang Wenjie)
李昕 (Li Xin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HONGXU INFORMATION TECHNOLOGY Co Ltd WUHAN
Original Assignee
HONGXU INFORMATION TECHNOLOGY Co Ltd WUHAN
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-10-18
Filing date
2018-10-18
Publication date
2019-03-08
Application filed by HONGXU INFORMATION TECHNOLOGY Co Ltd WUHAN
Priority to CN201811212343.9A
Publication of CN109446157A
Application granted
Publication of CN109446157B
Legal status: Active
Anticipated expiration


Landscapes

  • Information Transfer Between Computers (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a data format checking system and method based on formatted data, relating to the field of digital text data transmission. In the system, a data acquisition module (1), a data checking and filtering module (2), and an erroneous data processing module (3) interact in sequence to check, filter, and store format errors in formatted data; the data checking and filtering module (2) interacts with a formatted data checking rule module (4) to compute checking-rule statistics and set rule priorities. Advantages: 1. the invention checks the format of formatted data efficiently, and error detection is accurate; 2. rule hits are counted in real time, which improves checking efficiency; 3. samples of various kinds of erroneous data can be saved for study and analysis; 4. rules are configurable, giving strong extensibility. The invention provides an efficient, easy-to-use technical method for format checking of large volumes of formatted data of all kinds.

Description

Data format checking system and method based on formatted data
Technical field
The invention relates to the field of digital text data transmission, and in particular to a data format checking system and method based on formatted data.
Background art
With the rapid development of technologies such as the Internet, big data, and artificial intelligence, the transmission, storage, analysis, processing, and mining of massive data are booming, and formatted data accounts for a significant share of that data. Data conversion between different systems also places strict requirements on the correctness of data formats. In the development and application of hardware and software in these fields, there is an increasingly pressing need for auxiliary means that can quickly and effectively verify the format correctness of the data generated by each module.
For data in general standard protocol formats, many software tools such as Wireshark provide powerful checking and analysis functions: they can display standard-format data field by field according to the protocol standard, and can also correlate the data and flag erroneous places. Data in proprietary protocol formats, however, has no such powerful and mature software to support checking. Especially in the development phase, an effective auxiliary tool for checking various types of data in different formats is needed to help verify the correctness of the generated data.
Summary of the invention
The object of the invention is to provide a data format checking system and method based on formatted text data.
The invention detects whether the format of transmitted or stored formatted data conforms to the standard and whether the data is valid, and it can monitor the data in real time, which helps in repairing and optimizing the data. Checking formatted data in big data rejects erroneous, invalid data and makes data analysis and mining more accurate. The invention monitors data in transit in real time, and during development it also provides an effective auxiliary system for checking and debugging the formatted data output by the relevant modules.
Technical solution for achieving the object of the invention:
The invention mainly targets format problems in transmitted or stored formatted data. It counts and monitors the types and quantities of format errors in real time in order to compute and correct checking-rule parameters, thereby checking formatted data efficiently for format problems. Based on the collected statistics on data classes and error categories, it saves a sample of every kind of error for each class of data, providing support for maintenance and development.
The invention studies real-time adjustment of checking-rule parameters and achieves efficient verification of formatted data structures by the following means.
1. Setting the checking rules: all checking rules are configured and formulated dynamically by the checking rule module, which supports setting rules through configuration. The checking rule module periodically issues the latest rules to the data checking module. Checking rules and their parameters can be added, deleted, or modified at any time as needed, and it can be configured which field formats must be checked and which can be ignored or skipped, so as to adapt to different requirements and obtain better verification effect and performance.
2. Counting rule hits and calculating rule priority: the rule module counts how many times each rule is hit within a unit of checking time, calculates the hit frequency of each rule per unit time, and combines it with the other configured parameters and the existing priority to re-formulate the checking rules and their parameters. The priority of frequently hit rules is raised, and rules that are not hit for a long time are masked.
3. The statistical indicators are displayed as a reference for manually analyzing and correcting the real-time rule parameters, so that the data checking module obtains a streamlined and efficient set of checking rules.
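The rule-setting behavior in item 1 (add, delete, or modify rules at any time; skip fields; periodically issue the latest rules) can be sketched as a small versioned rule store. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Rule` fields, the `RuleRegistry` class, and the convention that a larger `priority` is checked first are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Rule:
    """One checking rule (hypothetical schema, not taken from the patent)."""
    name: str
    field_index: int      # index of the field the rule applies to
    priority: int = 0     # higher value = checked earlier
    enabled: bool = True  # validity flag; disabled rules are not issued


class RuleRegistry:
    """Hypothetical store for the checking rule module: rules can be added,
    modified or deleted at any time, fields can be marked as skipped, and the
    checker receives an immutable, versioned snapshot."""

    def __init__(self) -> None:
        self._rules: Dict[str, Rule] = {}
        self._skipped_fields: Set[int] = set()
        self.version = 0

    def upsert(self, rule: Rule) -> None:
        """Add a new rule or replace the rule with the same name."""
        self._rules[rule.name] = rule
        self.version += 1

    def delete(self, name: str) -> None:
        if self._rules.pop(name, None) is not None:
            self.version += 1

    def skip_field(self, index: int) -> None:
        """Stop checking a field entirely (the 'ignore or skip' option)."""
        self._skipped_fields.add(index)
        self.version += 1

    def snapshot(self) -> Tuple[int, Tuple[Rule, ...]]:
        """Rules to issue: enabled, not on a skipped field, highest priority first."""
        active = [r for r in self._rules.values()
                  if r.enabled and r.field_index not in self._skipped_fields]
        active.sort(key=lambda r: (-r.priority, r.name))
        return self.version, tuple(active)


registry = RuleRegistry()
registry.upsert(Rule("id_is_int", field_index=0, priority=1))
registry.upsert(Rule("name_max_len", field_index=1, priority=5))
registry.upsert(Rule("note_max_len", field_index=2))
registry.skip_field(2)
version, rules = registry.snapshot()
print(version, [r.name for r in rules])  # 4 ['name_max_len', 'id_is_int']
```

Handing out an immutable snapshot with a version number lets the checking module cheaply tell whether the rules it holds are stale, which is one way to realize periodic issuing.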
Specifically:
I. Data format checking system based on formatted data (the system)
The system comprises a data acquisition module, a data checking and filtering module, an erroneous data processing module, and a formatted data checking rule module.
Their interactions are:
The data acquisition module, the data checking and filtering module, and the erroneous data processing module interact in sequence to check, filter, and store format errors in formatted data;
The data checking and filtering module interacts with the formatted data checking rule module to compute checking-rule statistics and set rule priorities.
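The interactions among the four modules can be pictured with a toy in-process pipeline. This sketch assumes that records are comma-separated lines, that a rule is a `(name, predicate)` pair whose predicate returns `True` when the record violates it, and that checking stops at the first hit; all class and function names are hypothetical, and real modules would exchange messages rather than call each other directly.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical rule representation: (name, predicate). The predicate returns
# True when the split record VIOLATES the rule, i.e. the rule is "hit".
Rule = Tuple[str, Callable[[List[str]], bool]]


def acquire(lines: Iterable[str]) -> Iterator[str]:
    """Module 1: hand over non-empty records one at a time."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line:
            yield line


class RuleModule:
    """Module 4: issues rules and counts how often each one is hit."""
    def __init__(self, rules: List[Rule]) -> None:
        self.rules = rules
        self.hits: Counter = Counter()

    def issue(self) -> List[Rule]:
        return list(self.rules)

    def report_hit(self, name: str) -> None:
        self.hits[name] += 1


class ErrorProcessor:
    """Module 3: keeps the first erroneous record seen for each rule."""
    def __init__(self) -> None:
        self.samples: Dict[str, str] = {}

    def receive(self, record: str, rule_name: str) -> None:
        self.samples.setdefault(rule_name, record)


class CheckFilter:
    """Module 2: checks records, forwarding errors to module 3 and hits to module 4."""
    def __init__(self, rule_module: RuleModule, errors: ErrorProcessor, sep: str = ",") -> None:
        self.rule_module = rule_module
        self.errors = errors
        self.sep = sep

    def check(self, record: str) -> bool:
        fields = record.split(self.sep)
        for name, violated in self.rule_module.issue():
            if violated(fields):
                self.errors.receive(record, name)   # data + hit rule -> module 3
                self.rule_module.report_hit(name)   # hit rule -> module 4
                return False                        # stop at the first hit
        return True


rules: List[Rule] = [
    ("field_count", lambda f: len(f) != 3),
    ("id_not_int", lambda f: not f[0].isdigit()),
]
rule_module, errors = RuleModule(rules), ErrorProcessor()
checker = CheckFilter(rule_module, errors)
valid = [r for r in acquire(["1,a,b\n", "x,a,b\n", "2,a\n", "y,c,d\n"]) if checker.check(r)]
print(valid, dict(rule_module.hits))  # ['1,a,b'] {'id_not_int': 2, 'field_count': 1}
```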
II. Data format checking method based on formatted data (the method)
The method comprises the following steps:
1. The formatted data checking rule module first completes the initial configuration of the rules, including the data length, type, separator, and number of fields of the formatted data structure, the data type, value range, and length of each field, and the checking priority and validity of each rule, and issues the rules to the data checking and filtering module;
2. The data acquisition module receives formatted data and saves it uniformly into files, or directly reads formatted data in file form into memory; it transmits the data as messages to the data checking and filtering module, supplying the formatted data record by record for processing;
3. The data checking and filtering module checks each piece of formatted data against every rule issued by the formatted data checking rule module in turn, applying the rules in order of their priority;
4. The data checking and filtering module identifies data with format problems according to the various rules and records the rules that are hit; it sends the data with format problems and the corresponding hit-rule information to the erroneous data processing module, and sends the hit-rule information to the formatted data checking rule module;
5. The erroneous data processing module is the storage center for examples of erroneous-format data; based on the rule hit by a piece of erroneous data, it determines whether a sample of that kind has already been stored;
6. The formatted data checking rule module is the real-time regulator of the rules: it receives and counts the information on hit rules, calculates the frequency with which each rule is hit, and plans the checking order and retention of rules according to this frequency.
The invention has the following advantages and beneficial effects:
1. A format checking system checks the format of formatted data efficiently, and error detection is accurate;
2. Rule hits are counted in real time; the priority of checking rules is adjusted according to hit frequency, and rules that have not been hit for a long time are deactivated, which improves checking efficiency;
3. The frequency and number of hits of each checking rule can be counted and displayed, and samples of various kinds of erroneous data can be saved for research and analysis;
4. Rules are configurable, giving strong extensibility.
In short, the invention provides an efficient, easy-to-use technical method for format checking of large volumes of formatted data of all kinds.
Brief description of the drawings
Fig. 1 is a structural block diagram of the system.
In the figure:
1 - data acquisition module;
2 - data checking and filtering module;
3 - erroneous data processing module;
4 - formatted data checking rule module.
Glossary of English terms
1. SOCKET: the word originally means "hole" or "socket"; as the inter-process communication mechanism of BSD UNIX, the latter sense is used. A socket is usually described by an IP address and a port and serves as the handle of a communication link; it can be used to implement communication between different virtual machines or different computers. Two programs on a network exchange data through a bidirectional communication connection.
2. Wireshark: formerly called Ethereal, network packet analysis software that uses WinPcap as its interface to exchange data packets directly with the network interface card.
Detailed description of the embodiments
The invention is described in detail below with reference to the drawings and embodiments.
I. System
As shown in Fig. 1, the system comprises a data acquisition module 1, a data checking and filtering module 2, an erroneous data processing module 3, and a formatted data checking rule module 4.
Their interactions are:
The data acquisition module 1, the data checking and filtering module 2, and the erroneous data processing module 3 interact in sequence to check, filter, and store format errors in formatted data;
The data checking and filtering module 2 interacts back and forth with the formatted data checking rule module 4 to compute checking-rule statistics and set rule priorities.
2. Functional modules
1. Data acquisition module 1
The data acquisition module 1 implements a data acquisition method.
It receives the transmitted formatted data and determines its data type, then writes the data to files with uniform numbering, or directly reads data in file form into memory, and passes the data to the data checking and filtering module 2.
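A minimal sketch of the data acquisition module's file handling: classify each received record, append it to a numbered per-type file, and read a file back record by record. It assumes the first field of each record is a type tag and that the tag is safe to use in a filename; the naming scheme `NNN_<type>.dat` and both function names are hypothetical.

```python
import os
import tempfile
from typing import Dict, Iterable, Iterator


def store_by_type(records: Iterable[str], out_dir: str, sep: str = ",") -> Dict[str, str]:
    """Classify records by their first field (assumed to be a type tag) and append
    each one to a numbered per-type file. Returns {type_tag: file_path}."""
    paths: Dict[str, str] = {}
    for record in records:
        type_tag = record.split(sep, 1)[0]
        if type_tag not in paths:
            # files are numbered in order of first appearance: 000_gps.dat, 001_log.dat, ...
            paths[type_tag] = os.path.join(out_dir, f"{len(paths):03d}_{type_tag}.dat")
        with open(paths[type_tag], "a", encoding="utf-8") as fh:
            fh.write(record + "\n")
    return paths


def read_records(path: str) -> Iterator[str]:
    """Read a stored file back one record at a time for the checking module."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line:
                yield line


with tempfile.TemporaryDirectory() as workdir:
    files = store_by_type(["gps,30.5,114.3", "log,started", "gps,30.6,114.2"], workdir)
    names = sorted(os.path.basename(p) for p in files.values())
    gps_records = list(read_records(files["gps"]))
print(names, gps_records)  # ['000_gps.dat', '001_log.dat'] ['gps,30.5,114.3', 'gps,30.6,114.2']
```

Opening the file once per record keeps the sketch short; a real acquisition module would keep per-type handles open or buffer its writes.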
2. Data checking and filtering module 2
The data checking and filtering module 2 implements a method of verifying the format of formatted data against rules.
It checks each received piece of formatted data against the formulated checking rules: it splits out the fields as specified by the format, checks the total number of fields, the definition of each field, its value range, and whether its written format is correct, records which rules are hit, and feeds this information back to the formatted data checking rule module 4 to support the generation of new checking rules. Based on the format check, it sends the recorded hit information and the related erroneous data to the erroneous data processing module 3, which saves samples of the erroneous data.
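The field-level checks performed by module 2 (field count, field type, value range, written format) might look like the following. This is an illustrative sketch, not the patent's rule format: `FieldSpec`, `check_record`, and the `"<field>:<check>"` naming of rule hits are assumptions, and the checks run in field order and stop at the first hit rather than following a configured priority.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldSpec:
    """Hypothetical per-field definition: type, value range, maximum length and
    an optional full-match regular expression for the written format."""
    name: str
    kind: str = "str"                  # "int", "float" or "str"
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    max_len: Optional[int] = None
    pattern: Optional[str] = None


def check_record(record: str, specs: List[FieldSpec], sep: str = ",") -> Optional[str]:
    """Split a record into fields and return the name of the first rule it hits,
    or None if every field passes."""
    fields = record.split(sep)
    if len(fields) != len(specs):
        return "field_count"
    for value, spec in zip(fields, specs):
        if spec.max_len is not None and len(value) > spec.max_len:
            return f"{spec.name}:length"
        if spec.pattern is not None and re.fullmatch(spec.pattern, value) is None:
            return f"{spec.name}:format"
        if spec.kind in ("int", "float"):
            try:
                number = int(value) if spec.kind == "int" else float(value)
            except ValueError:
                return f"{spec.name}:type"
            if spec.min_value is not None and number < spec.min_value:
                return f"{spec.name}:range"
            if spec.max_value is not None and number > spec.max_value:
                return f"{spec.name}:range"
    return None


SPECS = [
    FieldSpec("id", "int", min_value=1),
    FieldSpec("time", pattern=r"\d{2}:\d{2}:\d{2}"),
    FieldSpec("temp", "float", min_value=-50, max_value=150),
]
for rec in ["7,12:30:00,21.5", "7,12:30,21.5", "0,12:30:00,21.5", "7,12:30:00,hot", "7,12:30:00"]:
    print(rec, "->", check_record(rec, SPECS))
```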
3. Erroneous data processing module 3
The erroneous data processing module 3 implements a method of collecting erroneous data types and storing erroneous data samples.
By statistically analyzing the received rule-hit information, the erroneous data processing module 3 saves one erroneous data sample for each kind of error in each class of data, so that developers can analyze the cause of the error.
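Module 3's bookkeeping (count errors per kind, keep one sample per kind) fits in a few lines. The key `(data_type, rule)` is an assumption about what distinguishes one kind of error from another, and `ErrorSampleStore` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Tuple


class ErrorSampleStore:
    """Hypothetical in-memory model of the erroneous data processing module."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()                # errors seen per (data_type, rule)
        self.samples: Dict[Tuple[str, str], str] = {}   # one stored sample per kind

    def receive(self, data_type: str, rule: str, record: str) -> bool:
        """Count the error and keep the record only if it is the first of its kind.
        Returns True when a new sample was stored."""
        key = (data_type, rule)
        self.counts[key] += 1
        if key in self.samples:
            return False          # this kind of error already has a sample: discard
        self.samples[key] = record
        return True


store = ErrorSampleStore()
stored = [
    store.receive("gps", "lat:range", "gps,999.0,114.3"),
    store.receive("gps", "lat:range", "gps,500.0,114.3"),
    store.receive("log", "field_count", "log"),
]
print(stored, store.counts[("gps", "lat:range")])  # [True, False, True] 2
```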
4. Formatted data checking rule module 4
The formatted data checking rule module 4 implements a calculation method for formulating checking rules in real time.
It collects the format errors found in formatted data structures during checking and debugging, and analyzes error frequencies statistically to change the rule parameters: it counts how often each rule is hit within a unit of time, raises the checking priority of rules with a higher hit frequency accordingly, and shuts down checking rules that have not been hit even once for a long time. It thereby formulates new rules so that the data checking and filtering module 2 works efficiently.
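The real-time adjustment in module 4 can be sketched as follows, assuming that the caller closes a counting window once per unit of time, that rules are reordered by descending hit count within that window, and that a rule with no hits for `idle_windows` consecutive windows is disabled. `RuleScheduler` and its parameters are hypothetical; the patent also combines hit frequency with other configured parameters, which this sketch omits.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


class RuleScheduler:
    """Hypothetical sketch of the rule module's real-time adjustment."""

    def __init__(self, rule_names: Iterable[str], idle_windows: int = 3) -> None:
        self.rules: List[str] = list(rule_names)        # current checking order
        self.window_hits: Counter = Counter()           # hits in the current unit of time
        self.idle: Dict[str, int] = {name: 0 for name in self.rules}
        self.idle_windows = idle_windows                # hitless windows before disabling
        self.disabled: Set[str] = set()

    def record_hit(self, name: str) -> None:
        self.window_hits[name] += 1

    def close_window(self) -> List[str]:
        """End the current unit of time: update idle counters, disable long-idle
        rules, reorder the rest by hit count, and return the order to issue."""
        for name in self.rules:
            if self.window_hits[name]:
                self.idle[name] = 0
            else:
                self.idle[name] += 1
                if self.idle[name] >= self.idle_windows:
                    self.disabled.add(name)
        # sorted() is stable, so rules with equal counts keep their previous order
        self.rules = sorted((r for r in self.rules if r not in self.disabled),
                            key=lambda r: -self.window_hits[r])
        self.window_hits.clear()
        return list(self.rules)


scheduler = RuleScheduler(["field_count", "id:type", "temp:range"], idle_windows=2)
for hit in ["temp:range", "temp:range", "id:type"]:
    scheduler.record_hit(hit)
first = scheduler.close_window()
scheduler.record_hit("id:type")
second = scheduler.close_window()
print(first, second, scheduler.disabled)
# ['temp:range', 'id:type', 'field_count'] ['id:type', 'temp:range'] {'field_count'}
```

Reordering by hit count pays off when checking stops at the first hit, because records that fail a common rule are rejected after fewer checks. Disabling idle rules also means their errors are no longer detected, so essential structural rules may need to be exempt in practice.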
3. Working mechanism of the system
1. The formatted data checking rule module 4 reads the configuration file, uses it to initialize the checking rules, and after initialization issues the rules to the data checking and filtering module 2. The data acquisition module 1 receives formatted data, classifies it, and generates files in preparation for later reading, or directly reads files of formatted data; it reads the data record by record into memory and passes it to the data checking and filtering module 2 for rule verification.
2. The data checking and filtering module 2 sorts the received formatted data and verifies the format of each field according to the checking rules. It passes erroneous data that hits a rule, together with information on the rule that was hit, to the erroneous data processing module 3 for subsequent processing, and passes the error information of hit rules to the formatted data checking rule module 4 as the basis for adjusting rule parameters.
3. After receiving data, the erroneous data processing module 3 analyzes the information of the named rule to determine whether a sample of this error has already been stored; if so, the erroneous data is discarded, otherwise the data is saved as the sample for that error.
4. The formatted data checking rule module 4 counts the received information on the rules hit by errors, analyzes how often each rule is hit within a unit of time, adjusts rule priorities according to this frequency, deactivates rules that have not been hit by any error for a long time, formulates new checking rules, and periodically sends them to the data checking and filtering module 2.
II. Method
1. Step 1:
A. The checking rules are written into a configuration file, and the formatted data checking rule module 4 initializes the rules by reading the configuration file;
B. The formatted data checking rule module 4 adjusts the rule parameters according to the rule-hit statistics, regenerates new rules, and periodically issues them to the data checking and filtering module 2;
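Step 1's configuration file could take many forms; the patent does not specify one. The sketch below assumes a hypothetical JSON layout holding the separator, the field count, and each rule's name, priority, and validity flag, and returns the enabled rules in descending priority order. Per-field type, range, and length definitions are omitted for brevity, and `load_config` is an illustrative name.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class RuleConfig:
    name: str
    priority: int
    enabled: bool


@dataclass
class FormatConfig:
    separator: str
    field_count: int
    rules: List[RuleConfig]   # enabled rules only, highest priority first


def load_config(text: str) -> FormatConfig:
    """Parse a rule configuration written in a hypothetical JSON layout."""
    raw = json.loads(text)
    rules = [
        RuleConfig(r["name"], int(r.get("priority", 0)), bool(r.get("enabled", True)))
        for r in raw["rules"]
    ]
    active = sorted((r for r in rules if r.enabled), key=lambda r: -r.priority)
    return FormatConfig(raw.get("separator", ","), int(raw["field_count"]), active)


EXAMPLE_CONFIG = """
{
  "separator": "|",
  "field_count": 3,
  "rules": [
    {"name": "field_count", "priority": 10},
    {"name": "id:type",     "priority": 5},
    {"name": "note:length", "priority": 1, "enabled": false}
  ]
}
"""

config = load_config(EXAMPLE_CONFIG)
print(config.separator, config.field_count, [r.name for r in config.rules])
# | 3 ['field_count', 'id:type']
```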
2. Step 2:
A. The data acquisition module 1 acts as the SOCKET server and establishes a connection with the module acting as the SOCKET client;
B. It receives formatted data, classifies it statistically, and writes it to files for storage;
C. It reads the formatted data in the files record by record into memory and passes it to the data checking and filtering module 2.
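Step 2's SOCKET server role can be sketched with Python's standard `socket` module. This assumes records are UTF-8 text terminated by newlines; `receive_records`, `serve_once`, and the default address `127.0.0.1:9000` are placeholders, and the demonstration uses `socket.socketpair()` so it runs without a network listener.

```python
import socket
from typing import List


def receive_records(conn: socket.socket) -> List[str]:
    """Read newline-terminated records from a connected socket until the peer
    closes its side; blank lines are skipped."""
    records: List[str] = []
    with conn.makefile("r", encoding="utf-8", newline="") as stream:
        for line in stream:
            line = line.rstrip("\r\n")
            if line:
                records.append(line)
    return records


def serve_once(host: str = "127.0.0.1", port: int = 9000) -> List[str]:
    """Act as the SOCKET server: accept one client and collect its records.
    The address and port are placeholders."""
    with socket.create_server((host, port)) as server:
        conn, _addr = server.accept()
        with conn:
            return receive_records(conn)


# Local demonstration without a network: a connected socket pair.
left, right = socket.socketpair()
right.sendall(b"1,a,b\n2,c,d\r\n\n3,e")
right.close()
received = receive_records(left)
left.close()
print(received)  # ['1,a,b', '2,c,d', '3,e']
```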
3. Step 3:
I. The data checking and filtering module 2 receives the formatted data and the checking rules sent to it, checks each received piece of formatted data against the formulated checking rules, splits out the fields as specified by the format, and checks the format of each field;
II. Data found to be erroneous, together with information on the rules it hit, is sent to the erroneous data processing module 3;
III. Information on the hit rules is sent to the formatted data checking rule module 4.
4. Step 4:
Whether new data needs to be saved is determined from the erroneous data samples already stored: if erroneous data of the same kind exists, it is not saved again; data with a new type of error is numbered and saved to a file as an error sample.
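Step 4 (skip errors already on file, number and save new ones) can be sketched against a sample directory. The filename pattern `NNNN_<key>.txt`, the character substitution that makes the key filename-safe, and `save_if_new` itself are assumptions; note that two different keys could map to the same safe name after substitution.

```python
import os
import re
import tempfile

SAMPLE_NAME = re.compile(r"\d+_(.+)\.txt")


def save_if_new(sample_dir: str, error_key: str, record: str) -> bool:
    """Store `record` as the sample for `error_key` unless a sample of the same
    kind already exists on disk. Samples are numbered in arrival order."""
    safe_key = re.sub(r"[^A-Za-z0-9_-]", "_", error_key)   # make the key filename-safe
    stored = [m.group(1) for name in os.listdir(sample_dir)
              if (m := SAMPLE_NAME.fullmatch(name))]
    if safe_key in stored:
        return False                                        # similar error already stored
    path = os.path.join(sample_dir, f"{len(stored):04d}_{safe_key}.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(record + "\n")
    return True


with tempfile.TemporaryDirectory() as sample_dir:
    results = [
        save_if_new(sample_dir, "gps/lat:range", "gps,999.0,114.3"),
        save_if_new(sample_dir, "gps/lat:range", "gps,500.0,114.3"),
        save_if_new(sample_dir, "log/field_count", "log"),
    ]
    saved_files = sorted(os.listdir(sample_dir))
print(results, saved_files)
# [True, False, True] ['0000_gps_lat_range.txt', '0001_log_field_count.txt']
```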

Claims (7)

1. A data format checking system based on formatted data, characterized in that:
it comprises a data acquisition module (1), a data checking and filtering module (2), an erroneous data processing module (3), and a formatted data checking rule module (4);
their interactions are:
the data acquisition module (1), the data checking and filtering module (2), and the erroneous data processing module (3) interact in sequence to check, filter, and store format errors in formatted data;
the data checking and filtering module (2) interacts with the formatted data checking rule module (4) to compute checking-rule statistics and set rule priorities.
2. The data format checking system according to claim 1, characterized in that:
the data acquisition module (1) implements a data acquisition method;
the data checking and filtering module (2) implements a method of verifying the format of formatted data against rules;
the erroneous data processing module (3) implements a method of collecting erroneous data types and storing erroneous data samples;
the formatted data checking rule module (4) implements a calculation method for formulating checking rules in real time.
3. A data format checking method using the data format checking system according to claim 1 or 2, characterized by comprising the following steps:
1. the formatted data checking rule module (4) first completes the initial configuration of the rules, including the data length, type, separator, and number of fields of the formatted data structure, the data type, value range, and length of each field, and the checking priority and validity of each rule, and issues the rules to the data checking and filtering module (2);
2. the data acquisition module (1) receives formatted data and saves it uniformly into files, or directly reads formatted data in file form into memory, and transmits the data as messages to the data checking and filtering module (2), supplying the formatted data record by record for processing;
3. the data checking and filtering module (2) checks each piece of formatted data against every rule issued by the formatted data checking rule module (4) in turn, applying the rules in order of their priority;
4. the data checking and filtering module (2) identifies data with format problems according to the various rules and records the rules that are hit; it sends the data with format problems and the corresponding hit-rule information to the erroneous data processing module (3), and sends the hit-rule information to the formatted data checking rule module (4);
5. the erroneous data processing module (3) is the storage center for examples of erroneous-format data and, based on the rule hit by a piece of erroneous data, determines whether a sample of that kind has already been stored;
6. the formatted data checking rule module (4) is the real-time regulator of the rules: it receives and counts the information on hit rules, calculates the frequency with which each rule is hit, and plans the checking order and retention of rules according to this frequency.
4. The data format checking method according to claim 3, characterized in that step 1 comprises:
A. writing the checking rules into a configuration file, the formatted data checking rule module (4) initializing the rules by reading the configuration file;
B. the formatted data checking rule module (4) adjusting the rule parameters according to the rule-hit statistics, regenerating new rules, and periodically issuing them to the data checking and filtering module (2).
5. The data format checking method according to claim 3, characterized in that step 2 comprises:
A. the data acquisition module (1) acting as the SOCKET server and establishing a connection with the module acting as the SOCKET client;
B. receiving formatted data, classifying it statistically, and writing it to files for storage;
C. reading the formatted data in the files record by record into memory and passing it to the data checking and filtering module (2).
6. The data format checking method according to claim 3, characterized in that step 3 comprises:
I. the data checking and filtering module (2) receiving the formatted data and the checking rules sent to it, checking each received piece of formatted data against the formulated checking rules, splitting out the fields as specified by the format, and checking the format of each field;
II. sending data found to be erroneous, together with information on the rules it hit, to the erroneous data processing module (3);
III. sending information on the hit rules to the formatted data checking rule module (4).
7. The data format checking method according to claim 3, characterized in that step 4 comprises:
determining from the erroneous data samples already stored whether new data needs to be saved: if erroneous data of the same kind exists, it is not saved again; data with a new type of error is numbered and saved to a file as an error sample.
CN201811212343.9A 2018-10-18 2018-10-18 Data format checking system and method based on formatted data Active CN109446157B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811212343.9A CN109446157B (en) 2018-10-18 2018-10-18 Data format checking system and method based on formatted data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811212343.9A CN109446157B (en) 2018-10-18 2018-10-18 Data format checking system and method based on formatted data

Publications (2)

Publication Number Publication Date
CN109446157A true CN109446157A (en) 2019-03-08
CN109446157B CN109446157B (en) 2021-10-29

Family

ID=65547283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811212343.9A Active CN109446157B (en) 2018-10-18 2018-10-18 Data format checking system and method based on formatted data

Country Status (1)

Country Link
CN (1) CN109446157B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110189093A (en) * 2019-04-16 2019-08-30 红云红河烟草(集团)有限责任公司 Data error prevention system
CN110675048A (en) * 2019-09-19 2020-01-10 国网福建省电力有限公司 Power data quality detection method and system
CN110753092A (en) * 2019-09-24 2020-02-04 深圳指芯智能科技有限公司 Method and device for server to obtain external data
CN111756781A (en) * 2019-03-28 2020-10-09 上海新微技术研发中心有限公司 Sensor integrated interaction device and interaction method
CN112699636A (en) * 2021-01-08 2021-04-23 中南大学 Multi-source Markdown geological data text format standardization method and system
CN112860520A (en) * 2021-02-23 2021-05-28 合肥大多数信息科技有限公司 Information data formatting assembly based on artificial intelligence

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1122639A (en) * 1993-03-31 1996-05-15 英国电讯有限公司 Data correction system for communications network
US6295505B1 (en) * 1995-01-10 2001-09-25 Schlumberger Technology Corporation Method of filter generation for seismic migration using Remez algorithm
CN101779194A (en) * 2007-07-09 2010-07-14 美光科技公司 Error correction for memory
CN103034738A (en) * 2012-12-29 2013-04-10 天津南大通用数据技术有限公司 Relevant database for managing heterogeneous unstructured data and method for creating and inquiring description information of unstructured data thereof
CN104462604A (en) * 2014-12-31 2015-03-25 成都市卓睿科技有限公司 Data processing method and system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111756781A (en) * 2019-03-28 2020-10-09 上海新微技术研发中心有限公司 Sensor integrated interaction device and interaction method
CN111756781B (en) * 2019-03-28 2023-08-11 上海新微技术研发中心有限公司 Sensor integrated interaction device and interaction method
CN110189093A (en) * 2019-04-16 2019-08-30 红云红河烟草(集团)有限责任公司 Data error prevention system
CN110675048A (en) * 2019-09-19 2020-01-10 国网福建省电力有限公司 Power data quality detection method and system
CN110753092A (en) * 2019-09-24 2020-02-04 深圳指芯智能科技有限公司 Method and device for server to obtain external data
CN112699636A (en) * 2021-01-08 2021-04-23 中南大学 Multi-source Markdown geological data text format standardization method and system
CN112860520A (en) * 2021-02-23 2021-05-28 合肥大多数信息科技有限公司 Information data formatting assembly based on artificial intelligence

Also Published As

Publication number Publication date
CN109446157B (en) 2021-10-29

Similar Documents

Publication Publication Date Title
CN109446157A (en) Data format checking system and method based on formatted data
EP2244418A1 (en) Database security monitoring method, device and system
US9135280B2 (en) Grouping interdependent fields
CN108600192A (en) DBC file parsing and message parsing method based on regular expressions
CN107634848A (en) System and method for collecting and analyzing network device information
CN112988762B (en) Real-time identification and early warning method suitable for unit of losing message
CN112380131B (en) Module testing method and device and electronic equipment
CN106228068A (en) Android malicious code detecting method based on composite character
CN109800259A (en) Collecting method, device and terminal device
CN111181800A (en) Test data processing method and device, electronic equipment and storage medium
CN108363649B (en) Distributed log access amount statistical method and device
KR101319299B1 (en) Device for handling korean variable message format message and method thereof
CN103838739B (en) Method and system for detecting error-correction terms in a search engine
CN106250397A (en) Method and device for analyzing user behavior features
CN103345527A (en) Intelligent data statistical system
CN117130851B (en) High-performance computing cluster operation efficiency evaluation method and system
CN112711614A (en) Service data management method and device
CN113592116B (en) Equipment state analysis method, device, equipment and storage medium
CN108833156B (en) Evaluation method and system for simulation performance index of power communication network
CN110297747A (en) Method and terminal for testing a statistics function
CN104065490A (en) System and method for simulating transceiver signaling based on online charging environment
CN113282657A (en) Frequent item business data mining analysis method and business data mining equipment
Cohen et al. Sketching unaggregated data streams for subpopulation-size queries
CN110781637A (en) Chip verification auxiliary environment and chip verification system
CN113377801A (en) Data inspection method, data inspection device, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Luo Jiao

Inventor after: Yang Wenjie

Inventor after: Li Xin

Inventor before: Yang Wenjie

Inventor before: Li Xin

GR01 Patent grant