CN109446157A - Data format checking system and method based on formatted data - Google Patents
- Publication number
- CN109446157A (application CN201811212343.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Transfer Between Computers (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention discloses a data format checking system and method based on formatted data, relating to the field of digital text data transmission. In the system, the data acquisition module (1), the data check-and-filter module (2), and the erroneous-data processing module (3) interact in sequence to check, filter, and store format errors in formatted data; the data check-and-filter module (2) interacts with the formatted-data checking-rule module (4) to collect hit statistics for the checking rules and set their priorities. The invention: 1. checks the format of formatted data with high efficiency and accurate error detection; 2. improves checking efficiency by collecting rule-hit statistics in real time; 3. saves samples of the various kinds of erroneous data for study and analysis; 4. uses configurable rules and is therefore highly extensible. The invention provides an efficient, easy-to-use technical method for checking the format of all kinds of massive formatted data.
Description
Technical field
The present invention relates to the field of digital text data transmission, and in particular to a data format checking system and method based on formatted data.
Background technique
With the rapid development of technologies such as the Internet, big data, and artificial intelligence, work on the transmission, storage, analysis, processing, and mining of massive data is in full swing, and formatted data accounts for a large share of it. Data conversion between different systems also places strict requirements on the correctness of the data format. In the development and application of hardware and software in these fields, the need for auxiliary tools that can effectively and quickly verify the format correctness of the data produced by each module is increasingly prominent.
For the formats of common standard protocol data, many software tools such as Wireshark provide powerful inspection and analysis functions: they can display standard-format data byte by byte according to the protocol standard, show how the data items relate to one another, and flag the locations of errors. Data formats of proprietary protocols, however, have no such powerful and mature software to support checking. Especially during development, effective checking tools for the different formats of all kinds of data are needed to help verify the correctness of the generated data.
Summary of the invention
The object of the present invention is to provide a data format checking system and method based on formatted text data.
The invention detects whether the format of transmitted or stored formatted data complies with the standard, that is, whether it is valid data, and can monitor the data in real time, helping to repair and optimize it. For formatted data in big-data settings, checking makes it possible to reject erroneous and invalid data, so that data analysis and mining become more accurate. For data in transit, real-time monitoring also provides an effective auxiliary system for checking and debugging the formatted data output by the relevant modules during development.
The technical solution that achieves the object of the invention is as follows:
The invention mainly addresses format problems in transmitted or stored formatted data. It counts and monitors the types and quantities of format errors in real time and uses them to compute and correct the checking-rule parameters, so that format problems in formatted data are checked efficiently. Based on the counted data classes and error classes, it saves a sample of each kind of error for each class of data, providing auxiliary support for maintenance and development.
By adjusting the checking-rule parameters in real time, the invention achieves efficient verification of formatted data structures through the following methods.
1, Setting the checking rules: all checking rules are formulated through dynamic configuration in the checking-rule module, which supports setting rules through configuration. The checking-rule module periodically issues the latest rules to the data checking module. Rules and their parameters can be added, deleted, or changed at any time as needed, and configuration determines which field formats must be checked and which can be ignored and skipped, so as to meet different requirements and achieve better checking results and performance.
2, Rule-hit statistics and rule-priority calculation: the checking-rule module counts the number of times each rule is hit per unit time, computes each rule's hit frequency per unit time, and combines it with the rule's other configured parameters and its original priority to re-formulate the checking rules and their parameters. The priority of rules with a high hit frequency is raised, and rules that have not been hit for a long time are disabled.
3, The real-time rule parameters and their statistical indicators are displayed for reference, so that they can be analyzed and corrected manually and the data checking module obtains a concise and efficient set of checking rules.
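The hit-frequency adjustment in item 2 can be sketched as follows. This is a minimal illustration, not the patented formula: the text does not say how hit frequency is combined with the configured priority, so the linear combination, the `hit_weight` and `max_idle_windows` parameters, and all class and field names here are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    """Per-rule bookkeeping; every field name here is hypothetical."""
    name: str
    priority: float                  # configured base priority (higher is checked first)
    hits: int = 0                    # hits counted in the current time window
    idle_windows: int = 0            # consecutive windows without a single hit
    enabled: bool = True
    effective_priority: float = 0.0  # base priority adjusted by hit frequency


def end_of_window(rules, window_seconds, hit_weight=1.0, max_idle_windows=3):
    """Close one statistics window and return the enabled rules in checking order."""
    for rule in rules:
        frequency = rule.hits / window_seconds
        rule.idle_windows = 0 if rule.hits else rule.idle_windows + 1
        # Disable rules that have gone unhit for too many windows (assumed threshold).
        if rule.idle_windows >= max_idle_windows:
            rule.enabled = False
        # Assumed combination: base priority plus a weighted hit frequency.
        rule.effective_priority = rule.priority + hit_weight * frequency
        rule.hits = 0  # start the next window from zero
    active = [rule for rule in rules if rule.enabled]
    return sorted(active, key=lambda rule: rule.effective_priority, reverse=True)
```

Calling `end_of_window` once per unit of time yields the rule order that would then be issued to the checking module; a frequently hit rule moves to the front even if its configured priority is low.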
Specifically:
One, the data format checking system based on formatted data (hereinafter, the system)
The system comprises a data acquisition module, a data check-and-filter module, an erroneous-data processing module, and a formatted-data checking-rule module;
The interaction among them is:
The data acquisition module, the data check-and-filter module, and the erroneous-data processing module interact in sequence to check, filter, and store format errors in formatted data;
The data check-and-filter module interacts with the formatted-data checking-rule module to collect hit statistics for the checking rules and set their priorities.
Two, the data format checking method based on formatted data (hereinafter, the method)
The method includes the following steps:
1. The formatted-data checking-rule module first completes the initial configuration of the rules, which covers the data length, type, separator, and number of fields of the formatted data structure, the data type, value range, and length of each field, and the checking priority and validity of each rule, and issues the rules to the data check-and-filter module;
2. The data acquisition module receives formatted data and saves it uniformly to files, or directly reads formatted data that already exists as files into memory; it passes the data as messages to the data check-and-filter module, supplying the formatted data record by record for processing;
3. Formatted-data checking means checking each record against each rule issued by the formatted-data checking-rule module in turn, in the order given by the rule priorities;
4. The data check-and-filter module identifies data with format problems according to the various rules and records the rules that are hit; it sends the data with format problems and the information of the corresponding rules to the erroneous-data processing module, and sends the information of the rules hit to the formatted-data checking-rule module;
5. The erroneous-data processing module is the storage center for instances of incorrectly formatted data; based on the rule that an erroneous record hits, it determines whether a sample of that error has already been stored;
6. The formatted-data checking-rule module is the real-time adjuster of the rules: it receives and counts information on the rules that are hit, computes how frequently each rule is hit, and uses that frequency to plan the checking order of the rules and decide which rules to retain.
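Steps 1, 3, and 4 above can be illustrated with a short sketch of checking one delimited record against prioritized rules. It is written under stated assumptions: a rule is "hit" when the record violates it, checking stops at the first hit, and the names `FormatSpec`, `first_hit`, `SPEC`, `RULES`, and the four rule functions are hypothetical, as are the example separator and value ranges.

```python
from dataclasses import dataclass, field


@dataclass
class FormatSpec:
    """Expected shape of one record (hypothetical structure)."""
    separator: str
    field_types: list                            # one converter per field, e.g. int, str, float
    ranges: dict = field(default_factory=dict)   # field index -> (low, high)
    max_length: int = 1024


def too_long(record, spec):
    return len(record) > spec.max_length


def wrong_field_count(record, spec):
    return len(record.split(spec.separator)) != len(spec.field_types)


def bad_field_type(record, spec):
    for text, convert in zip(record.split(spec.separator), spec.field_types):
        try:
            convert(text)
        except ValueError:
            return True
    return False


def out_of_range(record, spec):
    parts = record.split(spec.separator)
    for index, (low, high) in spec.ranges.items():
        try:
            value = spec.field_types[index](parts[index])
        except (ValueError, IndexError):
            return True
        if not low <= value <= high:
            return True
    return False


def first_hit(record, spec, rules):
    """Return the name of the first rule the record violates, or None if it is valid."""
    for name, _priority, violates in sorted(rules, key=lambda r: r[1], reverse=True):
        if violates(record, spec):
            return name
    return None


# Example configuration: id | label | reading, with assumed value ranges.
SPEC = FormatSpec("|", [int, str, float], {0: (1, 9999), 2: (0.0, 100.0)})
RULES = [
    ("too_long", 40, too_long),
    ("wrong_field_count", 30, wrong_field_count),
    ("bad_field_type", 20, bad_field_type),
    ("out_of_range", 10, out_of_range),
]
```

With this ordering, cheap structural rules run before per-field rules; reordering `RULES` by observed hit frequency is what lets a malformed record be rejected after fewer checks.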
The present invention has the following advantages and positive effects:
1. The format checking system checks the format of formatted data with high efficiency and accurate error detection;
2. Rule hits are counted in real time; the priority of the checking rules is modified according to hit frequency, and rules that have gone a long time without a hit are disabled, which improves checking efficiency;
3. The frequency and number of rule hits during data checking can be counted and displayed, and samples of the various kinds of erroneous data can be saved for study and analysis;
4. Rules are configurable, so the system is highly extensible.
In short, the invention provides an efficient, easy-to-use technical method for checking the format of all kinds of massive formatted data.
Brief description of the drawings
Fig. 1 is the structural block diagram of the system.
In the figure:
1 - data acquisition module;
2 - data check-and-filter module;
3 - erroneous-data processing module;
4 - formatted-data checking-rule module.
Glossary of English terms
1, SOCKET: the original meaning is "hole" or "socket"; as the inter-process communication mechanism of BSD UNIX, the latter meaning is used. A socket describes an IP address and port and is the handle of a communication link; it can be used for communication between different virtual machines or different computers. Two programs on a network exchange data over a two-way communication connection.
2, Wireshark (formerly called Ethereal): a network packet analysis program that uses WinPcap as its interface to exchange data packets directly with the network card.
Specific embodiment
The invention is described in detail below with reference to the drawing and embodiments.
One, system
As shown in Fig. 1, the system comprises a data acquisition module 1, a data check-and-filter module 2, an erroneous-data processing module 3, and a formatted-data checking-rule module 4;
The interaction among them is:
The data acquisition module 1, the data check-and-filter module 2, and the erroneous-data processing module 3 interact in sequence to check, filter, and store format errors in formatted data;
The data check-and-filter module 2 and the formatted-data checking-rule module 4 interact in both directions to collect hit statistics for the checking rules and set their priorities.
2, Functional modules
1. Data acquisition module 1
The data acquisition module 1 implements a data acquisition method;
It receives the transmitted formatted data and determines the data type, writes the data to files that are stored under unified numbering, or directly reads data that already exists as files into memory, and passes it to the data check-and-filter module 2.
2. Data check-and-filter module 2
The data check-and-filter module 2 implements a method for verifying the format of formatted data against rules;
Each formatted record received is checked against the configured checking rules: the record is split into fields according to the format specification, and the total number of fields, the definition of each field, its value range, and whether its format is written correctly are checked. Rule hits are recorded and fed back to the formatted-data checking-rule module 4 to support the generation of new rules. The hit information and the related erroneous data are sent to the erroneous-data processing module 3, which saves samples of the erroneous data.
3. Erroneous-data processing module 3
The erroneous-data processing module 3 implements a method for managing erroneous-data types and storing erroneous-data samples;
By statistically analyzing the rule-hit information it receives, the erroneous-data processing module 3 saves one erroneous-data sample for each kind of error in each class of data, so that developers can analyze the cause of the error.
4. Formatted-data checking-rule module 4
The formatted-data checking-rule module 4 implements a method for computing and formulating checking rules in real time;
Based on the errors found when checking formatted data structures, it statistically analyzes how often errors occur in order to modify the rule parameters. It counts how often each rule is hit per unit time: the higher a rule's hit frequency, the more its checking priority is raised, and rules that have not been hit even once for a long time are disabled. New rules are thereby formulated so that the data check-and-filter module 2 works efficiently.
3, Working mechanism of the system
1, The formatted-data checking-rule module 4 reads the configuration file, initializes the checking rules from it, and issues them to the data check-and-filter module 2 once initialization is complete. The data acquisition module 1 receives formatted data, classifies it, and writes it to files in preparation for subsequent reading, or directly reads files of formatted data; it reads the data into memory record by record and passes it to the data check-and-filter module 2 for rule checking.
2, The data check-and-filter module 2 parses the received formatted data according to the checking rules and verifies the format of each field. It passes the erroneous data that hit a rule, together with the information of the rule hit, to the erroneous-data processing module 3 for subsequent processing, and passes the information of the rule hit to the formatted-data checking-rule module 4 as the basis for adjusting rule parameters.
3, After receiving data, the erroneous-data processing module 3 determines from the information of the rule hit whether a sample of that error has already been stored; if so, the erroneous data is discarded, and if not, it is saved as the sample for that error.
4, The formatted-data checking-rule module 4 collects statistics from the received rule-hit information, analyzes how frequently each rule is hit per unit time, adjusts rule priorities according to that frequency, disables rules that have not been hit for a long time, formulates new checking rules, and periodically sends them to the data check-and-filter module 2.
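The rule initialization from a configuration file described in item 1 of the working mechanism might look like the sketch below. The patent does not specify a configuration format, so JSON, the key names (`rules`, `priority`, `enabled`, and the per-rule parameters), and the function name `load_rules` are all assumptions made for illustration.

```python
import json


def load_rules(config_text):
    """Parse a JSON rule configuration and return the enabled rules, highest priority first."""
    config = json.loads(config_text)
    # A rule without an explicit "enabled" key is treated as enabled (assumed default).
    rules = [rule for rule in config["rules"] if rule.get("enabled", True)]
    return sorted(rules, key=lambda rule: rule["priority"], reverse=True)


# Hypothetical configuration file content.
EXAMPLE_CONFIG = """
{
  "separator": "|",
  "rules": [
    {"name": "field_count", "priority": 30, "expected": 3},
    {"name": "max_length", "priority": 40, "limit": 1024},
    {"name": "field_0_range", "priority": 10, "low": 1, "high": 9999, "enabled": false}
  ]
}
"""
```

Because the rules live in data rather than code, adding, removing, or disabling a rule only requires editing the configuration and re-issuing the parsed list to the checking module.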
Two, method
1, Step 1:
A, The checking rules are written in a configuration file, and the formatted-data checking-rule module 4 initializes the rules by reading the configuration file;
B, The formatted-data checking-rule module 4 adjusts the rule parameters according to the rule-hit statistics, regenerates the rules, and periodically issues them to the data check-and-filter module 2;
2, Step 2:
A, The data acquisition module 1 acts as the SOCKET server and establishes a connection with the module acting as the SOCKET client;
B, It receives formatted data, classifies it, and writes it to files for storage;
C, It reads the formatted data in the files into memory record by record and passes it to the data check-and-filter module 2.
3, Step 3:
I, The data check-and-filter module 2 receives the formatted data and the checking rules sent to it, checks each formatted record received against the configured checking rules, splits it into fields according to the format specification, and checks the format of each field;
II, Data found to contain errors, together with the information of the rules hit, is sent to the erroneous-data processing module 3;
III, The information of the rules hit is sent to the formatted-data checking-rule module 4.
4, Step 4:
Whether new data needs to be saved is judged against the stored erroneous-data samples: if similar erroneous data already exists, it is not saved again; data with a new type of error is numbered and saved to a file as an error sample.
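The sample-saving logic of step 4 can be sketched as a store that keeps one numbered sample file per kind of error. This is an illustrative sketch only: treating "similar erroneous data" as "same data class and same rule hit" is an assumption, and the class name `ErrorSampleStore`, the method `offer`, and the file-naming scheme are hypothetical.

```python
import os


class ErrorSampleStore:
    """Keeps at most one sample file per (data class, rule hit) pair."""

    def __init__(self, directory):
        self.directory = directory
        self.saved = {}  # (data_class, rule_name) -> path of the saved sample

    def offer(self, data_class, rule_name, record):
        """Save the record if this kind of error is new; return True when it was saved."""
        key = (data_class, rule_name)
        if key in self.saved:
            return False  # a sample of this error already exists, so discard the record
        number = len(self.saved) + 1  # sequential sample number
        filename = f"sample_{number:04d}_{data_class}_{rule_name}.txt"
        path = os.path.join(self.directory, filename)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(record + "\n")
        self.saved[key] = path
        return True
```

Keying on the rule that was hit keeps the store small even under a flood of identical errors, while still preserving one concrete example of each failure for developers to inspect.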
Claims (7)
1. A data format checking system based on formatted data, characterized in that:
it comprises a data acquisition module (1), a data check-and-filter module (2), an erroneous-data processing module (3), and a formatted-data checking-rule module (4);
the interaction among them is:
the data acquisition module (1), the data check-and-filter module (2), and the erroneous-data processing module (3) interact in sequence to check, filter, and store format errors in formatted data;
the data check-and-filter module (2) interacts with the formatted-data checking-rule module (4) to collect hit statistics for the checking rules and set their priorities.
2. The data format checking system according to claim 1, characterized in that:
the data acquisition module (1) implements a data acquisition method;
the data check-and-filter module (2) implements a method for verifying the format of formatted data against rules;
the erroneous-data processing module (3) implements a method for managing erroneous-data types and storing erroneous-data samples;
the formatted-data checking-rule module (4) implements a method for computing and formulating checking rules in real time.
3. A data format checking method using the data format checking system of claims 1-2, characterized by comprising the following steps:
1. the formatted-data checking-rule module (4) first completes the initial configuration of the rules, which covers the data length, type, separator, and number of fields of the formatted data structure, the data type, value range, and length of each field, and the checking priority and validity of each rule, and issues the rules to the data check-and-filter module (2);
2. the data acquisition module (1) receives formatted data and saves it uniformly to files, or directly reads formatted data that already exists as files into memory; it passes the data as messages to the data check-and-filter module (2), supplying the formatted data record by record for processing;
3. formatted-data checking means checking each record against each rule issued by the formatted-data checking-rule module (4) in turn, in the order given by the rule priorities;
4. the data check-and-filter module (2) identifies data with format problems according to the various rules and records the rules that are hit; it sends the data with format problems and the information of the corresponding rules to the erroneous-data processing module (3), and sends the information of the rules hit to the formatted-data checking-rule module (4);
5. the erroneous-data processing module (3) is the storage center for instances of incorrectly formatted data; based on the rule that an erroneous record hits, it determines whether a sample of that error has already been stored;
6. the formatted-data checking-rule module (4) is the real-time adjuster of the rules: it receives and counts information on the rules that are hit, computes how frequently each rule is hit, and uses that frequency to plan the checking order of the rules and decide which rules to retain.
4. The data format checking method according to claim 3, characterized in that step 1 comprises:
A, the checking rules are written in a configuration file, and the formatted-data checking-rule module (4) initializes the rules by reading the configuration file;
B, the formatted-data checking-rule module (4) adjusts the rule parameters according to the rule-hit statistics, regenerates the rules, and periodically issues them to the formatted-data checking-rule module (4).
5. The data format checking method according to claim 3, characterized in that step 2 comprises:
A, the data acquisition module (1) acts as the SOCKET server and establishes a connection with the module acting as the SOCKET client;
B, it receives formatted data, classifies it, and writes it to files for storage;
C, it reads the formatted data in the files into memory record by record and passes it to the data check-and-filter module (2).
6. The data format checking method according to claim 3, characterized in that step 3 comprises:
I, the data check-and-filter module (2) receives the formatted data and the checking rules sent to it, checks each formatted record received against the configured checking rules, splits it into fields according to the format specification, and checks the format of each field;
II, data found to contain errors, together with the information of the rules hit, is sent to the erroneous-data processing module (3);
III, the information of the rules hit is sent to the formatted-data checking-rule module (4).
7. The data format checking method according to claim 3, characterized in that step 4 comprises:
whether new data needs to be saved is judged against the stored erroneous-data samples: if similar erroneous data already exists, it is not saved again; data with a new type of error is numbered and saved to a file as an error sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811212343.9A CN109446157B (en) | 2018-10-18 | 2018-10-18 | Data format checking system and method based on formatted data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109446157A (en) | 2019-03-08 |
CN109446157B (en) | 2021-10-29 |
Family
ID=65547283
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811212343.9A (Active, CN109446157B) | Data format checking system and method based on formatted data | 2018-10-18 | 2018-10-18 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109446157B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN109446157B (en) | 2021-10-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | Inventor after: Luo Jiao, Yang Wenjie, Li Xin; inventor before: Yang Wenjie, Li Xin |
GR01 | Patent grant | ||