CN109995784A - UDP-based data extraction acceleration method - Google Patents

UDP-based data extraction acceleration method

Info

Publication number
CN109995784A
CN109995784A CN201910267845.XA CN201910267845A CN109995784A CN 109995784 A CN109995784 A CN 109995784A CN 201910267845 A CN201910267845 A CN 201910267845A CN 109995784 A CN109995784 A CN 109995784A
Authority
CN
China
Prior art keywords
rule
data
regex rule
server
regular expression
Legal status
Granted
Application number
CN201910267845.XA
Other languages
Chinese (zh)
Other versions
CN109995784B (en)
Inventor
陈云
吴建波
Current Assignee
Hangzhou Palladium Networking Technology Co ltd
Original Assignee
杭州汉领信息科技有限公司 (Hangzhou Leadsino Information Technology Co., Ltd.)
Priority date
2019-04-03
Filing date
2019-04-03
Publication date
2019-07-09
Application filed by 杭州汉领信息科技有限公司
Priority to CN201910267845.XA
Publication of CN109995784A
Application granted
Publication of CN109995784B
Active


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/164 Adaptation or special uses of UDP protocol

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer And Data Communications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a UDP-based data extraction acceleration method. Logs generated by programs running on server A are transmitted to server B via syslog for centralized management. First, log processing regex rules are defined; preprocessing rules are then generated from the regex rules. A preprocessing rule comprises a source address, rule content and a rule action: the source address specifies that the preprocessing rule processes only logs sent from that address; if the log data contains the rule content, the rule action is executed; and the rule action appends a tag plus the regex ID to the end of log data that satisfies the preprocessing rule. Finally, real-time log data processing is carried out. By using preprocessing to specify explicitly which regex rule processes the data, the invention reduces the number of regex-rule matches. It solves the severe performance degradation caused by an increasing number of regex rules for the same source, and ensures stable and efficient data extraction.

Description

UDP-based data extraction acceleration method
Technical field
The invention belongs to the field of log processing, and in particular relates to a UDP-based data extraction acceleration method.
Background art
With the continuous development of information technology, networks keep growing in scale and large volumes of log data are generated in them every day. These logs reflect the operating state of network devices and network services, and people increasingly value this data and want to obtain useful information from it. This requires processing the data, and extracting specific substrings from strings is a routine operation in data processing. The most common way to extract substrings is regex matching, but regex matching is relatively expensive in system resources, and executing several regex matches per record is far less efficient than executing one. For example, when logs are processed centrally, the only way to decide which regex should process a record is by its source address. When the same source address has several regex rules, it cannot be determined which regex a record from that address should use, so the rules are tried one by one until a match succeeds. As the number of regexes tried per record grows, performance drops noticeably.
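To make the cost described above concrete, the following Python sketch (an illustration only; the three rules and the sample line are hypothetical) matches one record against every regex configured for its source address in turn and counts the attempts, which is the behaviour the background describes:

```python
import re

# Hypothetical regex rules configured for ONE source address, tried in order.
RULES_FOR_SOURCE = [
    re.compile(r"^login user=(?P<user>\w+)"),
    re.compile(r"^logout user=(?P<user>\w+)"),
    re.compile(r"client=(?P<client>\d{1,3}(?:\.\d{1,3}){3})&"),
]


def match_sequentially(line, rules):
    """Try each rule in order; return (fields, attempts) at the first match."""
    attempts = 0
    for rule in rules:
        attempts += 1
        m = rule.search(line)
        if m:
            return m.groupdict(), attempts
    return None, attempts  # no rule matched after trying all of them


fields, attempts = match_sequentially(
    "user=test&client=192.168.1.12&action=update", RULES_FOR_SOURCE
)
print(fields, attempts)  # {'client': '192.168.1.12'} 3
```

Here the record only matches the third rule, so three regex evaluations are spent on it; every additional rule configured for the same source can add one more.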
Summary of the invention
The object of the present invention is to provide a UDP-based data extraction acceleration method that solves the prior-art problem of matching data against multiple regexes during data extraction.
Building on modification of the UDP packets, the invention explicitly specifies which regex each record is extracted with, which reduces the number of regex attempts and improves processing performance. The concrete implementation steps are as follows:
Step 1: Logs generated by programs running on server A are transmitted to server B via syslog for centralized management.
Step 2: Define the log processing regex rules as follows:
(2.1) Each kind of log has a fixed output format. One regex rule is configured to process each format, and every regex rule is assigned a unique ID.
(2.2) Within a regex rule, a field is written as "%{field regex:field name}", where the field regex is one of the built-in common regexes and the field name is a user-defined name.
(2.3) Within a regex rule, the fixed characters that precede and follow a field in the log are used to locate the field.
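A minimal Python sketch of how the "%{field regex:field name}" notation of step (2.2) might be expanded into an ordinary regex with named capture groups. The names BUILTIN_PATTERNS and expand_rule, and the concrete IP, TIME and USER sub-patterns, are assumptions for illustration; the patent only states that common regexes are built in.

```python
import re

# Hypothetical built-in field regexes (the patent names IP, time and user name
# as examples but does not give the patterns).
BUILTIN_PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "TIME": r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}",
    "USER": r"[A-Za-z0-9_.-]+",
}

# Matches one placeholder of the form %{PATTERN_NAME:field_name}.
PLACEHOLDER = re.compile(r"%\{(\w+):(\w+)\}")


def expand_rule(rule_text):
    """Replace each %{NAME:field} with a named capture group (?P<field>...)."""
    def substitute(m):
        name, field_name = m.group(1), m.group(2)
        # A function replacement is inserted literally, so backslashes survive.
        return f"(?P<{field_name}>{BUILTIN_PATTERNS[name]})"
    return PLACEHOLDER.sub(substitute, rule_text)


print(expand_rule("client=%{IP:client}&"))
# client=(?P<client>\d{1,3}(?:\.\d{1,3}){3})&
```

An unknown pattern name raises KeyError, which is one simple way to reject a misconfigured rule.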
Step 3: Generate the preprocessing rules as follows:
(3.1) In the regex rules of step 2, the fixed string before a field is called the prefix, and the fixed string after a field is called the suffix.
(3.2) A preprocessing rule comprises a source address, rule content and a rule action. The source address means that the preprocessing rule processes only logs sent from that address. The rule content is the prefix and suffix; if the log data contains the rule content, the rule action is executed. The rule action appends a tag plus the regex ID to the end of log data that satisfies the preprocessing rule.
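The preprocessing rule of step (3.2) can be sketched in Python as follows. The tag "@###@" is the form suggested in the embodiment; the PreprocessRule type, the function names, and the reading of "contains the rule content" as "the prefix occurs and the suffix occurs after it" are assumptions of this sketch.

```python
from dataclasses import dataclass

TAG = "@###@"  # tag form suggested in the embodiment


@dataclass(frozen=True)
class PreprocessRule:
    source: str   # only logs sent from this address are considered
    prefix: str   # fixed text before the field
    suffix: str   # fixed text after the field
    rule_id: int  # unique ID of the regex rule to use downstream


def contains_rule_content(payload: str, rule: PreprocessRule) -> bool:
    """Assumed reading of 'contains the rule content': prefix, then suffix after it."""
    start = payload.find(rule.prefix)
    return start >= 0 and payload.find(rule.suffix, start + len(rule.prefix)) >= 0


def preprocess(payload: str, source: str, rules) -> str:
    """Rule action: append TAG + regex ID for the first applicable rule."""
    for rule in rules:
        if rule.source == source and contains_rule_content(payload, rule):
            return f"{payload}{TAG}{rule.rule_id}"
    return payload  # no rule applies: forward unchanged
```

For a rule PreprocessRule("192.168.1.10", "client=", "&", 7), the line "user=test&client=1.2.3.4&action=x" from 192.168.1.10 becomes "user=test&client=1.2.3.4&action=x@###@7"; the same line from any other address is left unchanged.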
Step 4: Real-time log data is processed as follows:
(4.1) Server A sends log data to server B in UDP packets using the syslog protocol.
(4.2) Server B receives the UDP packets sent by server A and, according to the preprocessing rules, modifies each packet that satisfies a preprocessing rule by appending the tag plus the regex ID to its tail. The packet is then forwarded to the syslog service on server B.
(4.3) After the syslog service receives the log data, the data is stored in a buffer queue.
(4.4) Server B reads data from the buffer queue and checks whether the data contains the tag string. If it does, go to step 4.5; otherwise go to step 4.6.
(4.5) Cut off the tag and the content that follows it, look up the corresponding regex rule by the regex ID in the cut-off content, extract the data directly with that regex rule, and go to step 4.7.
(4.6) Using the source address of the log data, match the data against the regex rules for that source address one by one until a match succeeds, return the extraction result, and go to step 4.7.
(4.7) Store the processed data to disk.
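Steps (4.1) and (4.2) amount to a small UDP relay on server B. The Python sketch below is a hypothetical illustration, not the patent's implementation: it assumes IPv4, assumes rules given as (source_ip, prefix, suffix, rule_id) tuples, and forwards each datagram to a local syslog listener whose address is passed in as syslog_addr.

```python
import socket

TAG = b"@###@"  # tag form suggested in the embodiment


def tag_payload(payload: bytes, source_ip: str, rules) -> bytes:
    """rules: iterable of (source_ip, prefix, suffix, rule_id) tuples."""
    for src, prefix, suffix, rule_id in rules:
        if src != source_ip:
            continue  # a preprocessing rule only applies to its own source
        start = payload.find(prefix)
        if start >= 0 and payload.find(suffix, start + len(prefix)) >= 0:
            return payload + TAG + str(rule_id).encode("ascii")
    return payload


def relay_once(listen_sock: socket.socket, forward_sock: socket.socket,
               syslog_addr, rules) -> bytes:
    """Receive one syslog datagram, tag it if a rule applies, and forward it
    to the local syslog service at syslog_addr. Returns what was sent."""
    payload, (source_ip, _port) = listen_sock.recvfrom(65535)  # IPv4 address tuple
    out = tag_payload(payload, source_ip, rules)
    forward_sock.sendto(out, syslog_addr)
    return out
```

Because this relay re-sends each datagram from its own socket, the syslog service sees server B as the sender. The fallback of step (4.6) needs the original source address, so a real deployment would have to preserve it some other way, for example by rewriting the packet in place instead of re-sending it; the patent does not say how this is done.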
The beneficial effects of the present invention are as follows: the UDP-based data extraction acceleration method proposed by the invention uses preprocessing to specify explicitly which regex rule processes each record, which reduces the number of regex-rule matches. It solves the severe performance degradation caused by a growing number of regex rules for the same source, and ensures stable and efficient data extraction.
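A rough count makes the saving concrete. Assume, as a simplification that the patent does not state, that a source address has k regex rules, that exactly one of them matches each record, and that the matching rule is equally likely to sit at any position in the try order. Sequential matching per step 4.6 then needs on average, in the worst case, and with a tag, respectively:

$$
E[N_{\text{seq}}] = \frac{1}{k}\sum_{i=1}^{k} i = \frac{k+1}{2},
\qquad N_{\text{seq}}^{\max} = k,
\qquad N_{\text{tag}} = 1
$$

For k = 9 this is 5 regex attempts on average and 9 in the worst case, against 1 for a tagged record. The count ignores the prefix and suffix substring checks that server B performs in step 4.2, which partly offset the saving.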
Description of the drawings
Fig. 1 is a flow diagram of real-time log data processing.
Specific embodiment
The invention is described in further detail below with reference to the drawings and a specific embodiment.
As shown in Fig. 1, the UDP-based data extraction acceleration method provided by the invention comprises the following steps:
Step 1: Multiple programs run on server A, each of which may generate logs. These logs are transmitted to server B via syslog for centralized management.
Step 2: Define the log processing regex rules as follows:
(2.1) Each kind of log has a fixed output format. One regex rule is configured to process each format, and every regex rule is assigned a unique ID.
(2.2) Within a regex rule, a field is written as "%{field regex:field name}", where the field regex is one of the built-in common regexes, such as IP address, time or user name, and the field name is a user-defined name.
(2.3) Within a regex rule, a field is located by the fixed characters that precede and follow it in the log. For example, to extract the IP address from "2018-10-10 16:54:09 user=test&client=192.168.1.12&action=update", the "client=" before the field and the "&" after it are fixed, so the generated regex rule is: ^(.*?)client=%{IP:client}&
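Applying the example rule to the example line shows how the fixed "client=" and "&" pin down the field while the leading (.*?) absorbs the variable start of the line. In this Python sketch, %{IP:client} is written out as a named group; both the IPv4 sub-pattern and the restored spacing in the timestamp are assumptions.

```python
import re

# Example log line from the embodiment (timestamp spacing restored; assumption).
LOG = "2018-10-10 16:54:09 user=test&client=192.168.1.12&action=update"

# The rule ^(.*?)client=%{IP:client}& with %{IP:client} written out as a named
# group; the IPv4 sub-pattern stands in for the built-in "IP" regex (assumption).
RULE = re.compile(r"^(.*?)client=(?P<client>\d{1,3}(?:\.\d{1,3}){3})&")

m = RULE.match(LOG)
print(m.group(1))         # 2018-10-10 16:54:09 user=test&
print(m.group("client"))  # 192.168.1.12
```

A line that lacks the fixed "client=" text simply does not match, so the field cannot be picked up from the wrong position.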
Step 3: Generate the preprocessing rules from the regex rules produced in step 2:
(3.1) In the regex rules of step 2, the fixed string before a field is called the prefix, and the fixed string after a field is called the suffix.
(3.2) A preprocessing rule comprises a source address, rule content and a rule action. The source address means that the preprocessing rule processes only logs sent from that address. The rule content is the prefix and suffix; if the log data contains the rule content, the rule action is executed. The rule action appends a tag (which may take the form "@###@") plus the regex ID to the end of log data that satisfies the preprocessing rule.
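Step 3 can derive each preprocessing rule mechanically from the regex rule text. The Python sketch below uses a hypothetical heuristic: starting from the first %{NAME:field} placeholder, it collects literal characters backwards and forwards until it reaches a regex metacharacter or another placeholder. The function names and the dictionary layout of the result are assumptions; rules containing escaped characters such as \. would need a more careful scan.

```python
import re

PLACEHOLDER = re.compile(r"%\{(\w+):(\w+)\}")
# Characters at which the literal prefix/suffix scan stops: regex metacharacters,
# plus '%' so that a following placeholder is not swallowed (heuristic).
STOP_CHARS = set(r".^$*+?()[]{}|\%")


def prefix_suffix(rule_text):
    """Return the fixed text right before and right after the first field."""
    m = PLACEHOLDER.search(rule_text)
    if m is None:
        raise ValueError("rule has no %{NAME:field} placeholder")
    i = m.start()
    while i > 0 and rule_text[i - 1] not in STOP_CHARS:
        i -= 1  # walk left over literal characters
    j = m.end()
    while j < len(rule_text) and rule_text[j] not in STOP_CHARS:
        j += 1  # walk right over literal characters
    return rule_text[i:m.start()], rule_text[m.end():j]


def make_preprocess_rule(source, rule_id, rule_text):
    """Source address, rule content (prefix and suffix), and the regex ID to append."""
    prefix, suffix = prefix_suffix(rule_text)
    return {"source": source, "prefix": prefix, "suffix": suffix, "rule_id": rule_id}


print(make_preprocess_rule("192.168.1.10", 1, "^(.*?)client=%{IP:client}&"))
# {'source': '192.168.1.10', 'prefix': 'client=', 'suffix': '&', 'rule_id': 1}
```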
Step 4: Real-time log data is processed as follows:
(4.1) Server A sends log data to server B in UDP packets using the syslog protocol.
(4.2) Server B receives the UDP packets sent by server A and, according to the preprocessing rules, modifies each packet that satisfies a preprocessing rule by appending the tag plus the regex ID to its tail. The packet is then forwarded to the syslog service on server B.
(4.3) After the syslog service receives the log data, the data is stored in a buffer queue.
(4.4) Server B reads data from the buffer queue and checks whether the data contains the tag string. If it does, go to step 4.5; otherwise go to step 4.6.
(4.5) Cut off the tag and the content that follows it, look up the corresponding regex rule by the regex ID in the cut-off content, extract the data directly with that regex rule, and go to step 4.7.
(4.6) Using the source address of the log data, match the data against the regex rules for that source address one by one until a match succeeds, return the extraction result, and go to step 4.7 (this is the case in which the same source address has several regex rules, so several regex matches may have to be executed).
(4.7) Store the processed data to disk.
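The receiving side of steps (4.3) to (4.6) can be sketched in Python as below. The rule tables, their contents, and the function names are hypothetical; the sketch returns the number of regex evaluations alongside each result so the effect of the tag is visible. Writing the results to disk (step 4.7) is left out.

```python
import queue
import re

TAG = "@###@"  # tag form suggested in the embodiment

# Hypothetical rule tables: regex by unique ID, and rule IDs per source address.
RULES_BY_ID = {
    1: re.compile(r"^login (?P<user>\w+)$"),
    2: re.compile(r"client=(?P<client>\d{1,3}(?:\.\d{1,3}){3})&"),
}
RULE_IDS_BY_SOURCE = {"192.168.1.10": [1, 2]}


def extract(line, source):
    """Steps 4.4-4.6 for one record; returns (fields or None, regex attempts)."""
    head, sep, tail = line.rpartition(TAG)
    if sep and tail.isdecimal() and int(tail) in RULES_BY_ID:
        # 4.5: tag present -> cut it off and apply exactly one regex rule.
        m = RULES_BY_ID[int(tail)].search(head)
        return (m.groupdict() if m else None), 1
    # 4.6: no tag -> try the source's rules one by one until one matches.
    attempts = 0
    for rule_id in RULE_IDS_BY_SOURCE.get(source, []):
        attempts += 1
        m = RULES_BY_ID[rule_id].search(line)
        if m:
            return m.groupdict(), attempts
    return None, attempts


def drain(buffer: queue.Queue):
    """4.3/4.4: read (line, source) records from the buffer queue until empty."""
    results = []
    while True:
        try:
            line, source = buffer.get_nowait()
        except queue.Empty:
            return results
        results.append(extract(line, source))
```

A tagged record costs one regex evaluation regardless of how many rules its source has, while the same record without a tag is tried against rule 1 before rule 2 matches.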
The above embodiment illustrates the present invention rather than limiting it. Any modification or change made to the invention within the spirit of the invention and the scope of protection of the claims falls within the scope of protection of the invention.

Claims (1)

1. A UDP-based data extraction acceleration method, characterized by comprising the following steps:
Step 1: Logs generated by programs running on server A are transmitted to server B via syslog for centralized management.
Step 2: Define the log processing regex rules as follows:
(2.1) Each kind of log has a fixed output format; one regex rule is configured to process each format, and each regex rule is assigned a unique ID.
(2.2) Within a regex rule, a field is written as "%{field regex:field name}", where the field regex is a built-in common regex and the field name is a user-defined name.
(2.3) Within a regex rule, the fixed characters before and after a field in the log are used to locate the field.
Step 3: Generate the preprocessing rules as follows:
(3.1) In step 2, the fixed string before a field is called the prefix, and the fixed string after a field is called the suffix.
(3.2) A preprocessing rule comprises a source address, rule content and a rule action; the source address indicates that the preprocessing rule processes only logs sent from that source address; the rule content is the prefix and suffix, and if the log data contains the rule content, the rule action is executed; the rule action appends a tag plus the regex ID to the end of log data that satisfies the preprocessing rule.
Step 4: Real-time log data is processed as follows:
(4.1) Server A sends log data to server B in UDP packets using the syslog protocol.
(4.2) Server B receives the UDP packets sent by server A and, according to the preprocessing rules, modifies each packet that satisfies a preprocessing rule by appending the tag plus the regex ID to its tail, and then forwards the packet to the syslog service on server B.
(4.3) After the syslog service receives the log data, the data is stored in a buffer queue.
(4.4) Server B reads data from the buffer queue and determines whether the data contains the tag string; if so, go to step 4.5, otherwise go to step 4.6.
(4.5) Cut off the tag and the content that follows it, look up the corresponding regex rule by the regex ID in the cut-off content, extract the data directly with that regex rule, and go to step 4.7.
(4.6) According to the source address of the log data, match the data against the regex rules for that source address one by one until a match succeeds, return the extraction result, and go to step 4.7.
(4.7) Store the processed data to disk.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910267845.XA CN109995784B (en) 2019-04-03 2019-04-03 UDP-based data extraction acceleration method

Publications (2)

Publication Number Publication Date
CN109995784A (en) 2019-07-09
CN109995784B (en) 2022-02-11

Family

ID=67130979

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910267845.XA Active CN109995784B (en) 2019-04-03 2019-04-03 UDP-based data extraction acceleration method

Country Status (1)

Country Link
CN (1) CN109995784B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112948845A (en) * 2021-02-01 2021-06-11 航天科技控股集团股份有限公司 Data processing method and system based on Internet of things data center

Citations (8)

Publication number Priority date Publication date Assignee Title
CN103377260A (en) * 2012-04-28 2013-10-30 阿里巴巴集团控股有限公司 Analysis method and device of URLs (Uniform Resource Locator) of weblog
US20150057978A1 (en) * 2009-12-04 2015-02-26 Tektronix, Inc. Serial bit stream regular expression with states
CN105138593A (en) * 2015-07-31 2015-12-09 山东蚁巡网络科技有限公司 Method for extracting log key information in user-defined way by using regular expressions
CN106021554A (en) * 2016-05-30 2016-10-12 北京奇艺世纪科技有限公司 Log analysis method and device
CN106294317A (en) * 2016-07-29 2017-01-04 浪潮(北京)电子信息产业有限公司 Method and system for verifying form information on a cloud platform interface
CN106598827A (en) * 2016-12-19 2017-04-26 东软集团股份有限公司 Method and device for extracting log data
CN109361701A (en) * 2018-12-07 2019-02-19 北京知道创宇信息技术有限公司 Network security detection method, device and server
CN109450671A (en) * 2018-10-22 2019-03-08 北京安信天行科技有限公司 Multi-combination log alarm classification method and system

Non-Patent Citations (1)

Title
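Chen Tingting (陈婷婷) et al.: "Web page main-text extraction based on an improved content analysis algorithm" (基于改进内容分析算法的网页正文提取), Computer Engineering and Design (《计算机工程与设计》) *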

Also Published As

Publication number Publication date
CN109995784B (en) 2022-02-11

Similar Documents

Publication Publication Date Title
CN110058987B (en) Method, apparatus, and computer readable medium for tracking a computing system
US8381201B2 (en) Processing of expressions
US7548848B1 (en) Method and apparatus for semantic processing engine
RU2419986C2 (en) Combining multiline protocol accesses
CN105162626B (en) Network flow depth recognition system and recognition methods based on many-core processor
US8825750B2 (en) Application server management system, application server management method, management apparatus, application server and computer program
US8141149B1 (en) Keyword obfuscation
EP1203297A4 (en) Method and system for extracting application protocol characteristics
US20090119774A1 (en) Network implemented content processing system
CN110232146B (en) Data grabbing method and grabbing device
JP2008042892A (en) Network monitoring system and its operation method
CN114157502B (en) Terminal identification method and device, electronic equipment and storage medium
CN105471635B (en) Processing method, device and system for system logs
CN107368578B (en) Method and system for quickly generating ES query statement
CN111679886A (en) Heterogeneous computing resource scheduling method, system, electronic device and storage medium
US11681606B2 (en) Automatic configuration of logging infrastructure for software deployments using source code
WO2012126301A1 (en) Processing method and device for message transmission and reception
CN106034113A (en) Data processing method and data processing device
CN111104188A (en) Scheduling method and device of vulnerability scanner
US20170220218A1 (en) Automatic Generation of Regular Expression Based on Log Line Data
CN114760369A (en) Protocol metadata extraction method, device, equipment and storage medium
CN109995784A (en) UDP-based data extraction acceleration method
CN109284319A (en) A kind of auditing system based on big data visualization technique
CN111427710B (en) Communication method, device, equipment and storage medium of components in application program
CN112883088B (en) Data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220424

Address after: 310012 rooms 403, 405, 407, 409 and 411, North floor, building 5, No. 90, Wensan Road, Xihu District, Hangzhou, Zhejiang

Patentee after: HANGZHOU PALLADIUM NETWORKING TECHNOLOGY CO.,LTD.

Address before: 310012 Room 401, north, 4th floor, building 5, No. 90, Wensan Road, Xihu District, Hangzhou City, Zhejiang Province

Patentee before: HANGZHOU LEADSINO INFORMATION TECHNOLOGY CO.,LTD.
