CN109995784A - UDP-based data extraction acceleration method - Google Patents
UDP-based data extraction acceleration method
- Publication number
- CN109995784A (application CN201910267845.XA)
- Authority
- CN
- China
- Prior art keywords
- rule
- data
- regular expression rule
- server
- regular expression
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/069—Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/16—Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
- H04L69/164—Adaptation or special uses of UDP protocol
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Computer And Data Communications (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a UDP-based data extraction acceleration method. Logs generated by programs running on server A are transmitted to server B via syslog for centralized management. First, log-processing regular expression rules are defined; preprocessing rules are then generated from these rules. Each preprocessing rule contains a source address, rule content, and a rule action: the source address specifies that the preprocessing rule only handles logs sent from that address; if the log data contains the rule content, the rule action is executed; and the rule action appends a tag plus the regular expression rule ID to the end of each log record that satisfies the preprocessing rule. Finally, real-time log data processing is carried out. By using preprocessing to make explicit which regular expression rule should process each record, the invention reduces the number of regular expression match attempts. It addresses the severe drop in processing performance caused by a growing number of regular expression rules for the same source, and ensures stable and efficient data extraction.
Description
Technical field
The invention belongs to the field of log processing, and in particular relates to a UDP-based data extraction acceleration method.
Background art
With the continuous development of information technology, networks keep growing in scale, and large volumes of log data are generated in networks every day. These logs reflect the operating state of network devices and network services, and people increasingly value this data and want to obtain useful information from it. This requires processing the data, and extracting specific substrings from a string is a routine operation in data processing. The most common way to extract substrings is regular expression matching, but regular expression matching consumes considerable system resources: executing a single regular expression match per record is far more efficient than executing several. For example, when logs are processed centrally, the source address is the only basis for deciding which regular expression should process a record. When the same source address has multiple regular expression rules, it cannot be determined in advance which rule the data from that source should use, so the rules are tried one by one until a match succeeds. As the number of regular expressions tried per record grows, performance drops noticeably.
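To make the cost described above concrete, here is a minimal Python sketch of the conventional approach, in which every rule registered for a source address is tried in turn. The rule table, its patterns, and the source address `10.0.0.5` are hypothetical examples, not taken from the patent; the function simply counts how many regular expression attempts a record needs.

```python
import re

# Hypothetical rule table: source address -> ordered list of (rule_id, regex).
# The addresses and patterns are illustrative, not taken from the patent.
RULES_BY_SOURCE = {
    "10.0.0.5": [
        (1, re.compile(r"^(.*?)login user=(?P<user>\w+)&")),
        (2, re.compile(r"^(.*?)logout user=(?P<user>\w+)&")),
        (3, re.compile(r"^(.*?)client=(?P<client>\d{1,3}(?:\.\d{1,3}){3})&")),
    ],
}


def match_sequentially(source, line):
    """Try each rule registered for `source` in order until one matches.

    Returns (rule_id, fields, attempts); rule_id and fields are None
    when no rule matches.
    """
    attempts = 0
    for rule_id, pattern in RULES_BY_SOURCE.get(source, []):
        attempts += 1
        m = pattern.search(line)
        if m:
            return rule_id, m.groupdict(), attempts
    return None, None, attempts


line = "2018-10-10 16:54:09 user=test&client=192.168.1.12&action=update"
print(match_sequentially("10.0.0.5", line))
# (3, {'client': '192.168.1.12'}, 3)
```

With three rules registered for the source, this record needs three attempts before the third rule matches; every additional rule for the same source raises the worst case further.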
Summary of the invention
The purpose of the present invention is to provide a UDP-based data extraction acceleration method that solves the prior-art problem of matching data against multiple regular expressions during extraction.
By modifying the UDP packets, the invention makes explicit which regular expression rule should be used to extract each record, reducing the number of regular expression attempts and improving processing performance. The concrete implementation steps are as follows:
Step 1: Logs generated by programs running on server A are transmitted to server B via syslog for centralized management.
Step 2: Define the log-processing regular expression rules as follows:
(2.1) Each kind of log has a fixed output format; one regular expression rule is configured to process each format, and each rule is assigned a unique ID.
(2.2) In a rule, a field is written as "%{field regex:field name}", where the field regex is one of the built-in common regular expressions and the field name is a user-defined name.
(2.3) In a rule, a field is located by the fixed characters that appear before and after it in the log.
Step 3: Generate preprocessing rules as follows:
(3.1) The fixed string before a field in Step 2 is called the prefix, and the fixed string after the field is called the suffix.
(3.2) A preprocessing rule consists of a source address, rule content, and a rule action. The source address means that the preprocessing rule only handles logs sent from that address. The rule content is the prefix and suffix; if the log data contains the rule content, the rule action is executed. The rule action appends a tag plus the regular expression rule ID to the end of each log record that satisfies the preprocessing rule.
Step 4: Real-time log data is processed as follows:
(4.1) Server A sends log data to server B in UDP packets using the syslog protocol.
(4.2) Server B receives the UDP packets sent by server A and, according to the preprocessing rules, modifies each packet that satisfies a preprocessing rule by appending the tag plus the regular expression rule ID at its end; the packets are then forwarded to the syslog service on server B.
(4.3) After the syslog service receives the log data, the data is stored in a buffer queue.
(4.4) Server B reads data from the buffer queue and checks whether it contains the tag string; if it does, go to step 4.5, otherwise go to step 4.6.
(4.5) Cut off the tag and the content after it, look up the corresponding regular expression rule by the rule ID in the cut-off content, extract the data directly with that rule, and go to step 4.7.
(4.6) According to the source address of the log data, match the data against the regular expression rules for that source address one by one until a match succeeds and the extraction result is returned, then go to step 4.7.
(4.7) Store the processed data on disk.
The beneficial effects of the present invention are as follows: the proposed UDP-based data extraction acceleration method uses preprocessing to make explicit which regular expression rule processes each record, reducing the number of regular expression match attempts. It addresses the severe drop in processing performance caused by a growing number of regular expression rules for the same source, and ensures stable and efficient data extraction.
Description of the drawings
Fig. 1 is a flow diagram of real-time log data processing.
Specific embodiment
The invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the UDP-based data extraction acceleration method provided by the invention comprises the following steps:
Step 1: Multiple programs run on server A, and each program may generate logs. These logs are transmitted to server B via syslog for centralized management.
Step 2: Define the log-processing regular expression rules as follows:
(2.1) Each kind of log has a fixed output format; one regular expression rule is configured to process each format, and each rule is assigned a unique ID.
(2.2) In a rule, a field is written as "%{field regex:field name}", where the field regex is one of the built-in common regular expressions, such as those for an IP address, a time, or a user name, and the field name is a user-defined name.
(2.3) In a rule, a field is located by the fixed characters that appear before and after it in the log. For example, to extract the IP address from "2018-10-10 16:54:09 user=test&client=192.168.1.12&action=update", note that "client=" before the field and "&" after it never change, so the generated rule is: ^(.*?)client=%{IP:client}&
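The rule syntax of steps (2.1) to (2.3) can be sketched in Python by expanding each `%{FIELD:name}` placeholder into a named capture group. The built-in field library `BUILTIN_FIELDS`, its patterns, and the function name `compile_rule` are hypothetical; the patent does not specify how placeholders are expanded.

```python
import re

# Hypothetical built-in field library (step 2.2); names and patterns are
# illustrative assumptions, not taken from the patent.
BUILTIN_FIELDS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "TIME": r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}",
    "USER": r"\w+",
}

# Matches a "%{FIELD:name}" placeholder.
PLACEHOLDER = re.compile(r"%\{(\w+):(\w+)\}")


def compile_rule(template):
    """Replace each %{FIELD:name} with a named capture group and compile."""
    def expand(m):
        field, name = m.group(1), m.group(2)
        return "(?P<%s>%s)" % (name, BUILTIN_FIELDS[field])
    return re.compile(PLACEHOLDER.sub(expand, template))


# Each rule is stored under its unique ID (step 2.1).
RULES = {3: compile_rule(r"^(.*?)client=%{IP:client}&")}

line = "2018-10-10 16:54:09 user=test&client=192.168.1.12&action=update"
print(RULES[3].search(line).group("client"))  # 192.168.1.12
```

Text outside the placeholders is passed through as ordinary regular expression syntax, which is why the example rule can keep its `^(.*?)` anchor.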
Step 3: Generate preprocessing rules from the regular expression rules produced in Step 2:
(3.1) The fixed string before a field in Step 2 is called the prefix, and the fixed string after the field is called the suffix.
(3.2) A preprocessing rule consists of a source address, rule content, and a rule action. The source address means that the preprocessing rule only handles logs sent from that address. The rule content is the prefix and suffix; if the log data contains the rule content, the rule action is executed. The rule action appends a tag (which can take the form "@###@") plus the regular expression rule ID to the end of each log record that satisfies the preprocessing rule.
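A minimal sketch of Step 3 follows, assuming the prefix and suffix are the runs of literal (non-metacharacter) text immediately before and after the first placeholder in a rule. That extraction heuristic, the `PreprocessingRule` class, and the reading of "contains the rule content" as "contains both the prefix and the suffix" are interpretations, not details given in the patent.

```python
import re
from dataclasses import dataclass

PLACEHOLDER = re.compile(r"%\{(\w+):(\w+)\}")
REGEX_META = set(".^$*+?()[]{}|\\")
TAG = "@###@"  # tag form suggested in step (3.2)


@dataclass
class PreprocessingRule:
    source: str   # only logs from this source address are handled
    prefix: str   # fixed string immediately before the field
    suffix: str   # fixed string immediately after the field
    rule_id: int  # ID of the regular expression rule to apply later


def literal_run(text, from_end):
    """Longest run of non-metacharacters at the end (or start) of `text`."""
    chars = reversed(text) if from_end else iter(text)
    run = []
    for c in chars:
        if c in REGEX_META:
            break
        run.append(c)
    return "".join(reversed(run)) if from_end else "".join(run)


def make_preprocessing_rule(source, rule_id, template):
    """Derive prefix and suffix from the first placeholder in `template`."""
    m = PLACEHOLDER.search(template)
    if m is None:
        raise ValueError("rule has no %{FIELD:name} placeholder")
    prefix = literal_run(template[:m.start()], from_end=True)
    suffix = literal_run(template[m.end():], from_end=False)
    return PreprocessingRule(source, prefix, suffix, rule_id)


def apply_rule(rule, source, line):
    """Rule action: append TAG + rule ID if the line contains prefix and suffix."""
    if source == rule.source and rule.prefix in line and rule.suffix in line:
        return line + TAG + str(rule.rule_id)
    return line


rule = make_preprocessing_rule("10.0.0.5", 3, r"^(.*?)client=%{IP:client}&")
print(rule.prefix, rule.suffix)  # client= &
```

The heuristic stops at any regex metacharacter, including a backslash, so fixed text containing escapes such as `\.` would yield a shorter prefix or suffix than intended, and a rule that starts or ends with a placeholder would yield an empty prefix or suffix that matches every line.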
Step 4: Real-time log data is processed as follows:
(4.1) Server A sends log data to server B in UDP packets using the syslog protocol.
(4.2) Server B receives the UDP packets sent by server A and, according to the preprocessing rules, modifies each packet that satisfies a preprocessing rule by appending the tag plus the regular expression rule ID at its end; the packets are then forwarded to the syslog service on server B.
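Steps (4.1) and (4.2) amount to a small UDP relay in front of the syslog service. The sketch below uses Python's standard `socket` module; the preprocessing table, the listening port 514, and the assumption that the local syslog daemon listens on UDP port 5514 are hypothetical choices, not values from the patent.

```python
import socket

TAG = "@###@"  # tag form suggested in step (3.2)

# Hypothetical preprocessing table built in Step 3:
# source address -> list of (prefix, suffix, rule_id).
PREPROCESSING = {"10.0.0.5": [("client=", "&", 3)]}


def tag_datagram(payload: bytes, source: str) -> bytes:
    """Step 4.2: append TAG + rule ID if a preprocessing rule for `source` matches."""
    text = payload.decode("utf-8", errors="replace")
    for prefix, suffix, rule_id in PREPROCESSING.get(source, []):
        if prefix in text and suffix in text:
            # Strip a trailing newline so the tag stays on the same line.
            return (text.rstrip("\r\n") + TAG + str(rule_id)).encode("utf-8")
    return payload  # no rule matched: forward unchanged


def relay(listen_port: int = 514, syslog_addr=("127.0.0.1", 5514)) -> None:
    """Receive syslog datagrams from server A, tag them, forward them locally.

    Both port numbers are assumptions (binding port 514 usually needs root).
    """
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("0.0.0.0", listen_port))
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        payload, (source_ip, _port) = rx.recvfrom(65535)
        tx.sendto(tag_datagram(payload, source_ip), syslog_addr)
```

Because the relay re-sends each datagram from its own socket, the syslog service sees the relay's address rather than server A's as the UDP source. The original source address would have to be carried some other way (for example, in the hostname field of the syslog header) for the per-source fallback of step (4.6) to work; the patent does not say how this is handled.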
(4.3) After the syslog service receives the log data, the data is stored in a buffer queue.
(4.4) Server B reads data from the buffer queue and checks whether it contains the tag string; if it does, go to step 4.5, otherwise go to step 4.6.
(4.5) Cut off the tag and the content after it, look up the corresponding regular expression rule by the rule ID in the cut-off content, extract the data directly with that rule, and go to step 4.7.
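Steps (4.4) and (4.5) can be sketched as a lookup keyed by the ID that follows the tag. This assumes the `@###@` tag form from step (3.2), that the tag appears once at the end of the record, and a hypothetical rule table `RULES`; returning an empty result when the tagged rule fails to match is this sketch's choice, since the patent does not cover that case.

```python
import re

TAG = "@###@"  # tag form suggested in step (3.2)

# Hypothetical rule table keyed by the unique rule ID from step (2.1).
RULES = {
    3: re.compile(r"^(.*?)client=(?P<client>\d{1,3}(?:\.\d{1,3}){3})&"),
}


def fast_extract(record):
    """Steps 4.4-4.5: apply only the rule named by the tag.

    Returns None for an untagged record (step 4.6 takes over), otherwise
    the extracted fields ({} if the tagged rule unexpectedly fails).
    """
    body, sep, rule_id = record.rpartition(TAG)
    if not sep:
        return None
    m = RULES[int(rule_id)].search(body)  # exactly one regex attempt
    return m.groupdict() if m else {}
```

`str.rpartition` splits on the last occurrence of the tag, so the rule ID is whatever follows it and `body` is the original log text.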
(4.6) According to the source address of the log data, match the data against the regular expression rules for that source address one by one until a match succeeds and the extraction result is returned, then go to step 4.7 (when the same source address has multiple regular expression rules, several regular expression matches may have to be performed here).
(4.7) Store the processed data on disk.
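Putting steps (4.3) to (4.7) together, the consumer side on server B could look like the sketch below: records wait in a buffer queue, tagged records take the single-rule path, untagged records fall back to trying each rule for their source, and results are appended to a file as JSON lines. The `(source, record)` queue items, the rule tables, and the JSON-lines output format are all assumptions for illustration.

```python
import json
import queue
import re

TAG = "@###@"

# Hypothetical rule tables: rule ID -> regex, source address -> rule IDs.
RULES = {3: re.compile(r"^(.*?)client=(?P<client>\d{1,3}(?:\.\d{1,3}){3})&")}
RULES_BY_SOURCE = {"10.0.0.5": [3]}


def extract(source, record):
    """Return extracted fields, or None if no rule matches."""
    body, sep, rule_id = record.rpartition(TAG)
    if sep:  # step 4.5: tagged record, one regex attempt
        m = RULES[int(rule_id)].search(body)
        return m.groupdict() if m else None
    for rid in RULES_BY_SOURCE.get(source, []):  # step 4.6: try rules in turn
        m = RULES[rid].search(record)
        if m:
            return m.groupdict()
    return None


def drain(buffer, out_path):
    """Steps 4.4-4.7: empty the buffer queue, appending results as JSON lines."""
    with open(out_path, "a", encoding="utf-8") as out:
        while True:
            try:
                source, record = buffer.get_nowait()
            except queue.Empty:
                break
            fields = extract(source, record)
            if fields is not None:
                out.write(json.dumps({"source": source, "fields": fields}) + "\n")
```

A long-running consumer would block on the queue instead of returning when it is empty; this sketch drains what is queued and returns, which keeps it easy to test. Only the fallback branch performs more than one regular expression attempt per record.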
The above embodiment is intended to illustrate the invention, not to limit it. Any modification or change made to the invention within the spirit of the invention and the scope of protection of the claims falls within the protection scope of the invention.
Claims (1)
1. A UDP-based data extraction acceleration method, characterized by comprising the following steps:
Step 1: Logs generated by programs running on server A are transmitted to server B via syslog for centralized management.
Step 2: Define the log-processing regular expression rules as follows:
(2.1) Each kind of log has a fixed output format; one regular expression rule is configured to process each format, and each rule is assigned a unique ID.
(2.2) In a rule, a field is written as "%{field regex:field name}", where the field regex is one of the built-in common regular expressions and the field name is a user-defined name.
(2.3) In a rule, a field is located by the fixed characters that appear before and after it in the log.
Step 3: Generate preprocessing rules as follows:
(3.1) The fixed string before a field in Step 2 is called the prefix, and the fixed string after the field is called the suffix.
(3.2) A preprocessing rule consists of a source address, rule content, and a rule action. The source address means that the preprocessing rule only handles logs sent from that address. The rule content is the prefix and suffix; if the log data contains the rule content, the rule action is executed. The rule action appends a tag plus the regular expression rule ID to the end of each log record that satisfies the preprocessing rule.
Step 4: Real-time log data is processed as follows:
(4.1) Server A sends log data to server B in UDP packets using the syslog protocol.
(4.2) Server B receives the UDP packets sent by server A and, according to the preprocessing rules, modifies each packet that satisfies a preprocessing rule by appending the tag plus the regular expression rule ID at its end; the packets are then forwarded to the syslog service on server B.
(4.3) After the syslog service receives the log data, the data is stored in a buffer queue.
(4.4) Server B reads data from the buffer queue and checks whether it contains the tag string; if it does, go to step 4.5, otherwise go to step 4.6.
(4.5) Cut off the tag and the content after it, look up the corresponding regular expression rule by the rule ID in the cut-off content, extract the data directly with that rule, and go to step 4.7.
(4.6) According to the source address of the log data, match the data against the regular expression rules for that source address one by one until a match succeeds and the extraction result is returned, then go to step 4.7.
(4.7) Store the processed data on disk.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910267845.XA CN109995784B (en) | 2019-04-03 | 2019-04-03 | UDP-based data extraction acceleration method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109995784A | 2019-07-09 |
CN109995784B | 2022-02-11 |
Family ID: 67130979
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910267845.XA Active CN109995784B (en) | 2019-04-03 | 2019-04-03 | UDP-based data extraction acceleration method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109995784B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112948845A (en) * | 2021-02-01 | 2021-06-11 | 航天科技控股集团股份有限公司 | Data processing method and system based on Internet of things data center |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103377260A (en) * | 2012-04-28 | 2013-10-30 | 阿里巴巴集团控股有限公司 | Analysis method and device of URLs (Uniform Resource Locator) of weblog |
US20150057978A1 (en) * | 2009-12-04 | 2015-02-26 | Tektronix, Inc. | Serial bit stream regular expression with states |
CN105138593A (en) * | 2015-07-31 | 2015-12-09 | 山东蚁巡网络科技有限公司 | Method for extracting log key information in user-defined way by using regular expressions |
CN106021554A (en) * | 2016-05-30 | 2016-10-12 | 北京奇艺世纪科技有限公司 | Log analysis method and device |
CN106294317A (en) * | 2016-07-29 | 2017-01-04 | 浪潮(北京)电子信息产业有限公司 | The form information method of calibration at a kind of cloud platform interface and system |
CN106598827A (en) * | 2016-12-19 | 2017-04-26 | 东软集团股份有限公司 | Method and device for extracting log data |
CN109361701A (en) * | 2018-12-07 | 2019-02-19 | 北京知道创宇信息技术有限公司 | Network security detection method, device and server |
CN109450671A (en) * | 2018-10-22 | 2019-03-08 | 北京安信天行科技有限公司 | A kind of log multiple groups close alarm classifying method and system |
Non-Patent Citations (1)
Title |
---|
Chen Tingting et al.: "Web page main-text extraction based on an improved content analysis algorithm", Computer Engineering and Design * |
Also Published As
Publication number | Publication date |
---|---|
CN109995784B (en) | 2022-02-11 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20220424. Address after: 310012 Rooms 403, 405, 407, 409 and 411, North floor, Building 5, No. 90, Wensan Road, Xihu District, Hangzhou, Zhejiang. Patentee after: HANGZHOU PALLADIUM NETWORKING TECHNOLOGY CO.,LTD. Address before: 310012 Room 401, North, 4th floor, Building 5, No. 90, Wensan Road, Xihu District, Hangzhou City, Zhejiang Province. Patentee before: HANGZHOU LEADSINO INFORMATION TECHNOLOGY CO.,LTD. |