CN110245155A - Data processing method, device, computer readable storage medium and terminal device - Google Patents

Info

Publication number
CN110245155A
Authority
CN
China
Prior art keywords
data
regular expression
format
data packet
data processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910423175.6A
Other languages
Chinese (zh)
Inventor
孙云雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910423175.6A
Priority to PCT/CN2019/103039 (WO2020232880A1)
Publication of CN110245155A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention belongs to the field of computer technology and in particular relates to a data processing method, a device, a computer-readable storage medium, and a terminal device. The method includes: receiving a data packet collected and sent by a preset packet capture tool, the data packet containing one or more data records; performing format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, where the regular expression resource library contains one or more regular expressions and each regular expression corresponds to one data format; looking up a target processing rule in a preset data processing rule library, the target processing rule being the data processing rule that corresponds to the data format of the data packet; and processing each data record in the data packet according to the target processing rule to obtain a processed data packet. The whole process requires no manual intervention, which saves considerable time and labor and greatly improves efficiency.

Description

Data processing method, device, computer readable storage medium and terminal device
Technical field
The invention belongs to the field of computer technology and in particular relates to a data processing method, a device, a computer-readable storage medium, and a terminal device.
Background
As big data technology becomes more widespread, more and more scenarios require analysis and computation over massive amounts of data. Before such analysis can take place, the data must first be preprocessed, that is, converted into a data format that analysis tools can readily work with. At present this data processing work is done mainly by hand. When the data volume is large, it consumes considerable time and labor, and efficiency is very low.
Summary of the invention
In view of this, embodiments of the present invention provide a data processing method, a device, a computer-readable storage medium, and a terminal device, to solve the problem that existing manual data processing consumes considerable time and labor and is very inefficient.
A first aspect of the embodiments of the present invention provides a data processing method, which may include:
receiving a data packet collected and sent by a preset packet capture tool, the data packet containing one or more data records;
performing format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, where the regular expression resource library contains one or more regular expressions, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
looking up a target processing rule in a preset data processing rule library, the target processing rule being the data processing rule that corresponds to the data format of the data packet;
processing each data record in the data packet according to the target processing rule to obtain a processed data packet.
A second aspect of the embodiments of the present invention provides a data processing device, which may include:
a data packet receiving module, configured to receive a data packet collected and sent by a preset packet capture tool, the data packet containing one or more data records;
a format matching module, configured to perform format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, where the regular expression resource library contains one or more regular expressions, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
a processing rule lookup module, configured to look up a target processing rule in a preset data processing rule library, the target processing rule being the data processing rule that corresponds to the data format of the data packet;
a data processing module, configured to process each data record in the data packet according to the target processing rule to obtain a processed data packet.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the following steps:
receiving a data packet collected and sent by a preset packet capture tool, the data packet containing one or more data records;
performing format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, where the regular expression resource library contains one or more regular expressions, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
looking up a target processing rule in a preset data processing rule library, the target processing rule being the data processing rule that corresponds to the data format of the data packet;
processing each data record in the data packet according to the target processing rule to obtain a processed data packet.
A fourth aspect of the embodiments of the present invention provides a terminal device including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
receiving a data packet collected and sent by a preset packet capture tool, the data packet containing one or more data records;
performing format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, where the regular expression resource library contains one or more regular expressions, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
looking up a target processing rule in a preset data processing rule library, the target processing rule being the data processing rule that corresponds to the data format of the data packet;
processing each data record in the data packet according to the target processing rule to obtain a processed data packet.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects. After receiving a data packet collected and sent by a preset packet capture tool, the embodiment first performs format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, then looks up in a preset data processing rule library the target processing rule corresponding to that data format, and finally processes each data record in the data packet according to the target processing rule to obtain a processed data packet. The data format of the data packet is determined automatically by regular expression matching, and the data is then processed automatically according to the corresponding processing rule. The complete procedure of format matching and data processing is thus fully automated and requires no manual intervention, which saves considerable time and labor and greatly improves the efficiency of data processing.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of one embodiment of a data processing method in the embodiments of the present invention;
Fig. 2 is a schematic flowchart of performing format matching on a data packet;
Fig. 3 is a schematic diagram of setting up multiple standby data processing terminals to process data packets in parallel;
Fig. 4 is a schematic flowchart of routing data packets for distributed processing;
Fig. 5 is a structural diagram of one embodiment of a data processing device in the embodiments of the present invention;
Fig. 6 is a schematic block diagram of a terminal device in the embodiments of the present invention.
Detailed description of the embodiments
To make the objects, features, and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, one embodiment of a data processing method in the embodiments of the present invention may include:
Step S101: receive a data packet collected and sent by a preset packet capture tool.
A packet capture tool is a tool that collects data transmitted over a network. In this embodiment, packet capture tools include but are not limited to tools such as Fiddler and Wireshark.
The packet capture tool can package the collected data into a number of data packets and send them to a preset data processing terminal, which is the entity that executes this embodiment. Each data packet contains one or more data records. The number of data records a data packet holds can be set according to the actual situation, for example to 1000, 2000, 5000, or another value.
The following is a specific example of the data records in one data packet:
c0-e1=string:tsalesApplyCustContact
c0-e4=string:tsalesApplyCust
c0-e6=string:true
c0-e7=string:upd
c0-e8=string:2011068
c0-e9=string:2
c0-e10=string:1
c0-e11=string:1
c0-e12=string:
c0-e13=string:9
c0-e14=string:tsalesApplyCust
c0-e15=string:1
Here, each line is one data record.
It should be noted that all data records in a data packet are collected for the same business scenario, so the records in one data packet all share the same data format, while records in different data packets may have the same data format or different data formats. A data format is the regular formatting pattern that the data records exhibit. In the example above, the data format is: each data record begins with c, followed by one or more decimal digits, followed by "-e", followed by one or more decimal digits, followed by "=string:", followed by a string made up of digits or other characters (the string may be empty).
Step S102: perform format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet.
The regular expression resource library contains one or more regular expressions, each corresponding to one data format. For each business scenario, a regular expression corresponding to its data format can be set in advance, and the regular expressions for all data formats are collected into the regular expression resource library shown in the table:
A regular expression (Regular Expression) is a concept in computer science, commonly used to search for and replace text that matches a given pattern (rule). It combines predefined special characters into a "rule string" that expresses filtering logic over strings. A regular expression is a text pattern that describes one or more strings to match when searching text.
When the data processing terminal needs to perform format matching on a received data packet, it first selects one regular expression from the regular expression resource library and matches it against a data record in the data packet. If the match succeeds, the data format of the data packet is determined to be the data format that corresponds to this regular expression in the regular expression resource library.
Because all data records in one data packet share the same data format, format matching with a regular expression can be performed on any single data record chosen from the data packet (the target record); there is no need to match every data record in the data packet.
If format matching fails, the next regular expression is selected and matched against the data packet, and this process is repeated until format matching succeeds.
It should be noted that the contents of the regular expression resource library can be adjusted according to the actual situation. For example, when data of a certain format no longer needs to be analyzed, its entry can be deleted from the regular expression resource library; when data of a new format needs to be analyzed, a corresponding entry can be added; and the regular expression for a given data format can be modified as needed, so that the regular expression resource library stays suited to the latest business scenarios.
Preferably, considering that data packets of different formats may occur with very different probabilities (for example, one or a few data formats may account for the vast majority of all data packets while the others occur only rarely), format matching can be performed on the data packet according to the process shown in Fig. 2 in order to reduce the number of matching attempts:
Step S1021: calculate the matching success rate of each regular expression in the regular expression resource library according to the historical matching records within a preset statistical period.
The statistical period can be set according to the actual situation to 1 month, 2 months, 3 months, half a year, 1 year, or another value. Because data from too long ago has little reference value, a period of no more than 1 year is generally advisable.
The matching success rate is positively correlated with the number of times a regular expression has matched successfully in the historical matching records: the more successful matches, the higher the success rate, and the fewer successful matches, the lower the success rate. The historical matching records note which regular expression was used each time a data packet was successfully format-matched. For example, suppose 50 data packets have been format-matched in total, of which 30 were matched by regular expression 1, 14 by regular expression 2, and 6 by regular expression 3. Matching with regular expression 1 then has the highest success rate, regular expression 2 the second highest, and regular expression 3 the lowest, so regular expression 1 is assigned the highest matching success rate, regular expression 2 the second highest, and regular expression 3 the lowest.
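The counting in this example can be sketched in Python as follows. The text only requires that the success rate increase with the number of successful matches; normalizing each count by the total, and the names history, success_rate, and ranking, are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical history: one entry per successfully matched packet, naming
# the expression that matched (30 / 14 / 6 as in the example above).
history = ["regex_1"] * 30 + ["regex_2"] * 14 + ["regex_3"] * 6

counts = Counter(history)
total = sum(counts.values())

# Assumed normalization: share of all successful matches.
success_rate = {name: hits / total for name, hits in counts.items()}

# Expressions ordered from highest to lowest success rate.
ranking = sorted(success_rate, key=success_rate.get, reverse=True)
```

Any other monotonically increasing function of the counts would produce the same ranking.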
For a more precise calculation, in one specific implementation of this embodiment the statistical period can first be divided into T sub-periods, where T is a positive integer whose value can be set according to the actual situation, for example to 5, 10, 20, or another value. It should be noted that the larger T is, the greater the amount of computation but the higher the precision; the smaller T is, the smaller the amount of computation but the lower the precision. The two must be weighed against each other according to the actual situation.
Then, the number of successful matches of each regular expression in the regular expression resource library is counted for each sub-period, and the matching success rate of each regular expression in the regular expression resource library is calculated according to the following formula:
Here, n is the index of a regular expression, 1 ≤ n ≤ N, and N is the total number of regular expressions in the regular expression resource library; t is the index of a sub-period in chronological order, 1 ≤ t ≤ T, with earlier sub-periods having smaller values of t; MatSucNum_{n,t} is the number of successful matches of the n-th regular expression in the regular expression resource library during the t-th sub-period; Weight_t is a preset weight coefficient with Weight_t < Weight_{t+1}, that is, later sub-periods carry larger weights, because data closer to the current time has more reference value and data further in the past has less (for example, this week's records clearly reflect a user's current usage habits better than records from several months ago); and MatSucRatio_n is the matching success rate of the n-th regular expression in the regular expression resource library.
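The formula itself does not appear in the text above. One form consistent with the symbol definitions, given here as an assumption rather than as the patent's own expression, is the weighted share $MatSucRatio_n = \frac{\sum_{t=1}^{T} Weight_t \cdot MatSucNum_{n,t}}{\sum_{m=1}^{N} \sum_{t=1}^{T} Weight_t \cdot MatSucNum_{m,t}}$. A minimal Python sketch of that assumed form follows; the function name and the sample counts are hypothetical.

```python
def weighted_success_rates(mat_suc_num, weights):
    """Assumed form of the success-rate calculation.

    mat_suc_num[n][t]: successful matches of expression n in sub-period t,
                       with sub-periods ordered oldest first.
    weights[t]:        weight of sub-period t, strictly increasing with t.
    """
    if any(a >= b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must increase toward recent sub-periods")
    # Weighted success count per expression.
    scores = [sum(w * c for w, c in zip(weights, row)) for row in mat_suc_num]
    total = sum(scores)
    if total == 0:
        return [0.0] * len(scores)
    return [score / total for score in scores]


counts = [
    [10, 2, 0],  # expression 1: mostly old successes
    [0, 2, 6],   # expression 2: mostly recent successes
]
rates = weighted_success_rates(counts, weights=[1, 2, 3])
```

With these sample counts, expression 1 has more raw successes (12 against 8), yet expression 2 receives the higher rate because its successes are more recent, which is the effect the increasing weights are meant to produce.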
Step S1022: from the regular expression resource library, select the regular expression with the highest matching success rate among those not yet selected as the candidate expression.
Step S1023: perform format matching on the target record using the candidate expression.
For example, suppose the candidate expression is "^c[0-9]{1,}-e[0-9]{1,}=string:", where ^ marks the start of the line, [0-9] matches any one digit from 0 to 9, and {1,} means at least one occurrence. This regular expression matches data records that begin with c, followed by one or more decimal digits, followed by "-e", followed by one or more decimal digits, followed by "=string:", followed by a string of digits or other characters (the string may be empty). Taking the example data packet given above, any one of its data records matches the candidate expression, so format matching is determined to have succeeded; otherwise, format matching is determined to have failed.
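As an illustration, the candidate expression can be tried against a few records with Python's standard re module. The helper name matches_format and the third, non-matching record are hypothetical.

```python
import re

# Candidate expression from the text, written without stray spaces.
CANDIDATE = re.compile(r"^c[0-9]{1,}-e[0-9]{1,}=string:")


def matches_format(record: str) -> bool:
    """Return True if the record starts with c<digits>-e<digits>=string:."""
    return CANDIDATE.match(record) is not None


records = [
    "c0-e1=string:tsalesApplyCustContact",
    "c0-e12=string:",      # empty trailing string is still a match
    "d0-e1=string:oops",   # hypothetical record in some other format
]
results = [matches_format(record) for record in records]
```

The expression constrains only the prefix up to "=string:", so whatever follows, including nothing at all, is accepted.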
Step S1024: determine whether format matching succeeded.
If format matching failed, return to step S1022 and its subsequent steps until format matching succeeds; if format matching succeeded, perform step S1025.
Step S1025: determine the data format corresponding to the candidate expression as the data format of the data packet.
During format matching, the regular expressions are selected from the regular expression resource library in descending order of matching success rate. In this way the format matching process can be completed with the fewest matching attempts, which speeds up format matching of data packets.
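Steps S1022 to S1025 can be sketched as a loop over the library in descending order of success rate. The library contents, the format names, and the rates below are hypothetical, and returning None when no expression matches is an assumption, since the text does not say what happens in that case.

```python
import re


def detect_format(target_record, library, success_rate):
    """Try expressions from highest to lowest success rate.

    library:      {format_name: pattern}
    success_rate: {format_name: rate}
    Returns (format_name, attempts), or (None, attempts) if nothing matches.
    """
    ordered = sorted(library, key=lambda name: success_rate.get(name, 0.0),
                     reverse=True)
    for attempts, name in enumerate(ordered, start=1):
        if re.match(library[name], target_record):
            return name, attempts
    return None, len(ordered)


library = {
    "format_a": r"^c[0-9]{1,}-e[0-9]{1,}=string:",
    "format_b": r"^[0-9]{4}-[0-9]{2}-[0-9]{2},",
}
success_rate = {"format_a": 0.7, "format_b": 0.3}

packet = ["c0-e6=string:true", "c0-e7=string:upd"]
# Any one record can serve as the target record; the first is used here.
fmt, attempts = detect_format(packet[0], library, success_rate)
```

Because the most frequently successful expression is tried first, the common case finishes after a single attempt.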
Step S103: look up a target processing rule in a preset data processing rule library.
The target processing rule is the data processing rule that corresponds to the data format of the data packet.
In this embodiment, data packets of different data formats are handled with different data processing rules so as to produce a data format that downstream data analysis tools can readily work with. A data processing rule corresponding to each data format can be set in advance, and the rules for all data formats are collected into the data processing rule library shown in the table:
Data format      Data processing rule
Data format a    Data processing rule 1
Data format b    Data processing rule 2
Data format c    Data processing rule 3
……               ……
……               ……
It should be noted that the contents of the data processing rule library can be adjusted according to the actual situation, including but not limited to adding, deleting, and modifying data processing rules.
Step S104: process each data record in the data packet according to the target processing rule to obtain a processed data packet.
Take a data packet in the following data format as an example:
c0-e1=string:tsalesApplyCustContact
c0-e4=string:tsalesApplyCust
c0-e6=string:true
c0-e7=string:upd
c0-e8=string:2011068
c0-e9=string:2
c0-e10=string:1
c0-e11=string:1
c0-e12=string:
c0-e13=string:9
c0-e14=string:tsalesApplyCust
c0-e15=string:1
A corresponding data processing rule could be defined as follows: split each data record into two parts, the first being the data before the equals sign (c0-e1) and the second being the data after the equals sign (string:tsalesApplyCustContact); wrap each part in quotation marks and separate the two parts with a colon ("c0-e1":"string:tsalesApplyCustContact"); finally, separate the records with commas and enclose the whole in braces, forming a data packet in the following data format:
{"c0-e1":"string:tsalesApplyCustContact","c0-e4":"string:tsalesApplyCust","c0-e6":"string:true","c0-e7":"string:upd","c0-e8":"string:2011068","c0-e9":"string:2","c0-e10":"string:1","c0-e11":"string:1","c0-e12":"string:","c0-e13":"string:9","c0-e14":"string:tsalesApplyCust","c0-e15":"string:1"}
The above is only one example of a data processing rule. In actual use, a data processing rule corresponding to the data format of the data packets to be processed can be set according to the specific scenario; the details are not repeated here.
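The rule lookup of step S103 and the example rule of step S104 can be sketched together in Python. The rule library is modeled as a dictionary from format name to function; that representation, the names RULE_LIBRARY and records_to_json, and the use of the standard json module in place of manual quoting are all assumptions made for illustration.

```python
import json


def records_to_json(records):
    """Split each record at the first '=' and emit one JSON object."""
    pairs = {}
    for record in records:
        key, _, value = record.partition("=")
        pairs[key] = value  # a repeated key would overwrite the earlier value
    # Compact separators reproduce the spacing-free layout of the example.
    return json.dumps(pairs, separators=(",", ":"))


# Hypothetical rule library: data format name -> processing function.
RULE_LIBRARY = {"format_a": records_to_json}

packet = [
    "c0-e1=string:tsalesApplyCustContact",
    "c0-e12=string:",
    "c0-e13=string:9",
]
target_rule = RULE_LIBRARY["format_a"]  # step S103: look up the rule
processed = target_rule(packet)         # step S104: apply it to the records
```

Letting json.dumps handle quoting also escapes any quotation marks or backslashes that appear inside a value, which plain string concatenation would not.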
Further, in practical applications an extreme situation may arise in which a very large number of data packets are waiting to be processed. If only the data processing terminal handled them, it would be overloaded. To solve this problem, as shown in Fig. 3, this embodiment can also set up multiple standby data processing terminals to process data packets in parallel.
Specifically, after step S102 determines the data format of the data packet, the total number of data packets waiting to be processed at the data processing terminal is counted first. If that total is less than or equal to a preset quantity threshold, processing continues according to the process shown in Fig. 1. The quantity threshold can be set according to the actual situation, for example to 100, 200, 500, or another value. If the total is greater than the quantity threshold, processing follows the process shown in Fig. 4:
Step S401: obtain the preset configuration file of each standby data processing terminal, and determine from the configuration files the data format that corresponds to each standby data processing terminal.
Each standby data processing terminal is dedicated to processing data packets of one particular data format. This correspondence can be stored in advance in the configuration file of each standby data processing terminal; the data processing terminal can obtain these configuration files and determine from them the data format that corresponds to each standby data processing terminal.
Step S402: assign each standby data processing terminal to its corresponding data processing cluster.
As shown in Fig. 4, in this embodiment the standby data processing terminals are preferably divided into two or more data processing clusters, where the standby data processing terminals within one data processing cluster all correspond to the same data format.
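Steps S401 and S402 amount to grouping terminals by the data format named in their configuration. A minimal Python sketch, assuming the configuration files have already been read into a dictionary (the terminal identifiers and format names are hypothetical):

```python
from collections import defaultdict

# Hypothetical configuration contents: terminal id -> data format it handles.
terminal_configs = {
    "terminal_1": "format_a",
    "terminal_2": "format_b",
    "terminal_3": "format_a",
}


def build_clusters(configs):
    """Group standby terminals into one cluster per data format."""
    clusters = defaultdict(list)
    for terminal, data_format in configs.items():
        clusters[data_format].append(terminal)
    return dict(clusters)


clusters = build_clusters(terminal_configs)
```

Keying the clusters by data format means the target cluster for a packet can later be found with a single dictionary lookup.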
Step S403: select the target cluster corresponding to the data packet.
The data format corresponding to each standby data processing terminal in the target cluster is the same as the data format of the data packet.
Step S404: send the data packet to the target cluster for processing.
Because every data processing terminal in the target cluster corresponds to the data format of the data packet, the data packet can be processed more quickly.
Further, the data processing terminal can send a data packet query request to each standby data processing terminal in the target cluster and receive the number of pending data packets that each of them reports back. It then selects from the target cluster the standby data processing terminal with the fewest pending data packets as the preferred processing terminal and assigns the data packet to it for processing. The processing performed by the preferred processing terminal is similar to that in step S104; refer to the description above for details, which are not repeated here.
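The threshold check and the choice of the preferred processing terminal can be sketched as follows. The pending_counts dictionary stands in for the replies to the data packet query requests, and the function names, the "local" marker, and the sample figures are hypothetical.

```python
def choose_terminal(cluster, pending_counts):
    """Return the terminal in the cluster with the fewest pending packets."""
    return min(cluster, key=lambda terminal: pending_counts[terminal])


def route_packet(data_format, local_pending, threshold, clusters, pending_counts):
    """Process locally within the threshold; otherwise offload to the cluster."""
    if local_pending <= threshold:
        return "local"
    target_cluster = clusters[data_format]  # step S403
    return choose_terminal(target_cluster, pending_counts)


clusters = {"format_a": ["terminal_1", "terminal_3"], "format_b": ["terminal_2"]}
pending_counts = {"terminal_1": 12, "terminal_2": 4, "terminal_3": 7}

light = route_packet("format_a", 80, 100, clusters, pending_counts)
heavy = route_packet("format_a", 350, 100, clusters, pending_counts)
```

Here terminal_2 has the shortest queue overall but is never chosen for format_a packets, because selection is restricted to the target cluster.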
Through the process shown in Fig. 4, after format matching is performed on each data packet in the data stream, each data packet is routed according to the matching result to the data processing cluster that corresponds to its data format. The standby data processing terminals in the data processing clusters then process data packets of the various data formats in parallel, which improves overall data processing efficiency.
In summary, after receiving a data packet collected and sent by a preset packet capture tool, the embodiment of the present invention first performs format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, then looks up in a preset data processing rule library the target processing rule corresponding to that data format, and finally processes each data record in the data packet according to the target processing rule to obtain a processed data packet. The data format of the data packet is determined automatically by regular expression matching, and the data is then processed automatically according to the corresponding processing rule. The complete procedure of format matching and data processing is thus fully automated and requires no manual intervention, which saves considerable time and labor and greatly improves the efficiency of data processing.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic and should not limit the implementation of the embodiments of the present invention in any way.
Corresponding to a kind of data processing method described in foregoing embodiments, Fig. 5 shows provided in an embodiment of the present invention one One embodiment structure chart of kind data processing equipment.
In this embodiment, a data processing device may include:
a packet receiving module 501, configured to receive a data packet acquired and sent by a preset packet capturer, the data packet including one or more data records;
a format matching module 502, configured to perform format matching on a target record according to a preset regular expression resource library and determine the data format of the data packet, wherein the regular expression resource library includes more than one regular expression, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
a processing rule lookup module 503, configured to look up a target processing rule in a preset data processing rule library, the target processing rule being the data processing rule corresponding to the data format of the data packet;
a data processing module 504, configured to process each data record in the data packet according to the target processing rule to obtain the processed data packet.
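The four modules above can be sketched as a short pipeline. This is a minimal illustration, not the patented implementation: the two data formats, their regular expressions, the rule functions, and the names `REGEX_LIBRARY`, `RULE_LIBRARY`, `determine_format` and `process_packet` are all hypothetical, and the first record of the packet is arbitrarily used as the target record.

```python
import re

# Hypothetical regular expression library: one pattern per data format.
REGEX_LIBRARY = {
    "csv_line": re.compile(r"[^,]+(,[^,]+)+"),
    "key_value": re.compile(r"\w+=\S+(;\w+=\S+)*"),
}

# Hypothetical processing rule library: one rule (a callable) per data format.
RULE_LIBRARY = {
    "csv_line": lambda record: record.split(","),
    "key_value": lambda record: dict(pair.split("=", 1) for pair in record.split(";")),
}


def determine_format(packet):
    """Match one record of the packet (the target record) against the library."""
    target_record = packet[0]  # assumption: any record may serve as the target
    for data_format, pattern in REGEX_LIBRARY.items():
        if pattern.fullmatch(target_record):
            return data_format
    raise ValueError("no regular expression matched the target record")


def process_packet(packet):
    """Determine the format, look up the matching rule, process every record."""
    data_format = determine_format(packet)
    rule = RULE_LIBRARY[data_format]
    return [rule(record) for record in packet]


processed = process_packet(["a=1;b=2", "c=3;d=4"])
```

Only one record is matched against the library; the rule found for it is then applied to every record, which reflects the assumption that all records in a packet share one format.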
Further, the format matching module may include:
a match success rate calculation unit, configured to calculate the match success rate of each regular expression in the regular expression resource library according to the historical matching records within a preset statistical period;
a candidate expression selection unit, configured to select, from the regular expression resource library, the regular expression with the highest match success rate among those not yet selected as the candidate expression;
a format matching unit, configured to perform format matching on the target record using the candidate expression;
a first processing unit, configured to, if format matching fails, return to the step of selecting from the regular expression resource library the regular expression with the highest match success rate among those not yet selected as the candidate expression, until format matching succeeds;
a second processing unit, configured to, if format matching succeeds, determine the data format corresponding to the successfully matched candidate expression as the data format of the data packet.
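The selection loop performed by these units can be sketched as follows. The sketch assumes the match success rates are already available as a dictionary; the function name `match_format`, the two example patterns, and the rate values are hypothetical. Returning `None` when every expression fails is also an assumption, since the text does not say what happens in that case.

```python
import re


def match_format(target_record, library, success_rates):
    """Try expressions in descending order of success rate until one matches.

    library: dict mapping a data format name to a compiled pattern.
    success_rates: dict mapping the same names to their match success rates.
    """
    # Sorting once by descending rate is equivalent to repeatedly picking the
    # not-yet-selected expression with the highest success rate.
    candidates = sorted(
        library, key=lambda name: success_rates.get(name, 0.0), reverse=True
    )
    for name in candidates:
        if library[name].fullmatch(target_record):
            return name  # this candidate's format becomes the packet's format
    return None  # assumption: no expression matched


library = {
    "ipv4": re.compile(r"\d{1,3}(\.\d{1,3}){3}"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}
rates = {"ipv4": 0.7, "date": 0.3}
```

Trying the historically most successful expression first reduces the expected number of match attempts per packet when a few formats dominate the traffic.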
Further, the match success rate calculation unit may include:
a sub-period division subunit, configured to divide the statistical period into T sub-periods, T being a positive integer;
a count statistics subunit, configured to count the number of successful matches of each regular expression in the regular expression resource library in each sub-period;
a match success rate calculation subunit, configured to calculate the match success rate of each regular expression in the regular expression resource library according to the following formula:
where n is the index of a regular expression, 1 ≤ n ≤ N, N is the total number of regular expressions in the regular expression resource library, t is the index of a sub-period in chronological order, 1 ≤ t ≤ T, MatSucNum_{n,t} is the number of successful matches of the n-th regular expression in the regular expression resource library in the t-th sub-period, Weight_t is a preset weight coefficient with Weight_t < Weight_{t+1}, and MatSucRatio_n is the match success rate of the n-th regular expression in the regular expression resource library.
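The formula itself is not reproduced in this text, so the following sketch assumes one form that is consistent with the symbol definitions: MatSucRatio_n is taken to be the weighted match count of expression n, Σ_t Weight_t · MatSucNum_{n,t}, divided by the same weighted sum taken over all N expressions. This normalization is an assumption rather than the confirmed formula, and the function name `match_success_rates` is hypothetical.

```python
def match_success_rates(counts, weights):
    """Weighted match success rate per regular expression (assumed formula).

    counts[n][t]: successful matches of expression n in sub-period t,
                  with sub-periods ordered from oldest to newest.
    weights[t]:   preset weight of sub-period t, strictly increasing with t
                  so that recent sub-periods count more.
    """
    if any(a >= b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must satisfy Weight_t < Weight_{t+1}")
    # Weighted match count of each expression over all sub-periods.
    weighted = [sum(w * c for w, c in zip(weights, row)) for row in counts]
    total = sum(weighted)
    if total == 0:
        return [0.0 for _ in counts]  # no successful matches recorded at all
    return [value / total for value in weighted]


rates = match_success_rates([[4, 2], [0, 3]], [1, 2])
```

Because the weights increase with t, an expression whose matches are concentrated in recent sub-periods ranks higher than one with the same raw count spread over older sub-periods.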
Further, the data processing device may also include:
a packet count statistics module, configured to count the total number of data packets waiting to be processed;
a configuration file acquisition module, configured to, if the total number of data packets waiting to be processed is greater than a preset number threshold, obtain the configuration file of each preset standby data processing terminal and determine, according to the configuration file, the data format corresponding to each standby data processing terminal;
a cluster division module, configured to assign each standby data processing terminal to a corresponding data processing cluster, wherein the standby data processing terminals in the same data processing cluster correspond to the same data format;
a cluster selection module, configured to select the target cluster corresponding to the data packet, wherein the data format corresponding to each standby data processing terminal in the target cluster is consistent with the data format of the data packet;
a packet sending module, configured to send the data packet to the target cluster for processing.
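The threshold check and cluster division described above can be sketched as follows. The configuration layout (a `data_format` key per terminal), the terminal identifiers, and the function names are hypothetical; returning `None` when the backlog is within the threshold is an assumption about how the non-offloading case would be signalled.

```python
from collections import defaultdict


def build_clusters(terminal_configs):
    """Group terminals by the data format declared in their configuration."""
    clusters = defaultdict(list)
    for terminal_id, config in terminal_configs.items():
        clusters[config["data_format"]].append(terminal_id)
    return dict(clusters)


def route_packet(packet_format, pending_total, threshold, terminal_configs):
    """Return the target cluster, or None if the backlog is within the threshold."""
    if pending_total <= threshold:
        return None  # assumption: the packet is then processed without offloading
    clusters = build_clusters(terminal_configs)
    # The target cluster is the one whose terminals handle the packet's format.
    return clusters.get(packet_format, [])


configs = {
    "t1": {"data_format": "json"},
    "t2": {"data_format": "csv"},
    "t3": {"data_format": "json"},
}
target_cluster = route_packet("json", pending_total=120, threshold=100,
                              terminal_configs=configs)
```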
Further, the data processing device may also include:
a count query module, configured to send a data packet query request to each standby data processing terminal in the target cluster and to receive the number of pending data packets fed back by each standby data processing terminal in the target cluster;
a terminal selection module, configured to select, from the target cluster, the standby data processing terminal with the smallest number of pending data packets as the preferred processing terminal;
a packet allocation module, configured to allocate the data packet to the preferred processing terminal for processing.
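Selecting the preferred processing terminal amounts to a minimum over the reported backlog. In the sketch below the query and feedback exchange is replaced by a plain dictionary of reported counts; the function name and terminal identifiers are hypothetical.

```python
def choose_preferred_terminal(pending_counts):
    """Pick the terminal that reported the fewest pending data packets.

    pending_counts: dict mapping a terminal id to its reported backlog.
    Ties are resolved in favour of the terminal listed first.
    """
    if not pending_counts:
        raise ValueError("target cluster has no terminals")
    return min(pending_counts, key=pending_counts.get)


preferred = choose_preferred_terminal({"t1": 7, "t3": 2, "t5": 4})
```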
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the device, modules and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not detailed or recorded in one embodiment, reference may be made to the related descriptions of other embodiments.
Fig. 6 shows a schematic block diagram of a terminal device provided by an embodiment of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown.
In this embodiment, the terminal device 6 may be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The terminal device 6 may include a processor 60, a memory 61, and computer-readable instructions 62 stored in the memory 61 and executable on the processor 60, for example computer-readable instructions for executing the above data processing method. When executing the computer-readable instructions 62, the processor 60 implements the steps in each of the above data processing method embodiments, for example steps S101 to S104 shown in Fig. 1. Alternatively, when executing the computer-readable instructions 62, the processor 60 implements the functions of the modules/units in each of the above device embodiments, for example the functions of modules 501 to 504 shown in Fig. 5.
By way of example, the computer-readable instructions 62 may be divided into one or more modules/units, which are stored in the memory 61 and executed by the processor 60 to carry out the present invention. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, the instruction segments being used to describe the execution of the computer-readable instructions 62 in the terminal device 6.
The processor 60 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 61 may be an internal storage unit of the terminal device 6, such as a hard disk or internal memory of the terminal device 6. The memory 61 may also be an external storage device of the terminal device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card fitted to the terminal device 6. Further, the memory 61 may include both an internal storage unit of the terminal device 6 and an external storage device. The memory 61 is used to store the computer-readable instructions and other instructions and data required by the terminal device 6. The memory 61 may also be used to temporarily store data that has been output or is about to be output.
The functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several computer-readable instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media capable of storing computer-readable instructions.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A data processing method, characterized by comprising:
receiving a data packet acquired and sent by a preset packet capturer, the data packet including one or more data records;
performing format matching on a target record according to a preset regular expression resource library and determining the data format of the data packet, wherein the regular expression resource library includes more than one regular expression, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
looking up a target processing rule in a preset data processing rule library, the target processing rule being the data processing rule corresponding to the data format of the data packet;
processing each data record in the data packet according to the target processing rule to obtain the processed data packet.
2. The data processing method according to claim 1, characterized in that performing format matching on the target record according to the preset regular expression resource library and determining the data format of the data packet comprises:
calculating the match success rate of each regular expression in the regular expression resource library according to the historical matching records within a preset statistical period;
selecting, from the regular expression resource library, the regular expression with the highest match success rate among those not yet selected as the candidate expression;
performing format matching on the target record using the candidate expression;
if format matching fails, returning to the step of selecting from the regular expression resource library the regular expression with the highest match success rate among those not yet selected as the candidate expression, until format matching succeeds;
if format matching succeeds, determining the data format corresponding to the successfully matched candidate expression as the data format of the data packet.
3. The data processing method according to claim 2, characterized in that calculating the match success rate of each regular expression in the regular expression resource library according to the historical matching records within the preset statistical period comprises:
dividing the statistical period into T sub-periods, T being a positive integer;
counting the number of successful matches of each regular expression in the regular expression resource library in each sub-period;
calculating the match success rate of each regular expression in the regular expression resource library according to the following formula:
where n is the index of a regular expression, 1 ≤ n ≤ N, N is the total number of regular expressions in the regular expression resource library, t is the index of a sub-period in chronological order, 1 ≤ t ≤ T, MatSucNum_{n,t} is the number of successful matches of the n-th regular expression in the regular expression resource library in the t-th sub-period, Weight_t is a preset weight coefficient with Weight_t < Weight_{t+1}, and MatSucRatio_n is the match success rate of the n-th regular expression in the regular expression resource library.
4. The data processing method according to any one of claims 1 to 3, characterized in that, after determining the data format of the data packet, the method further comprises:
counting the total number of data packets waiting to be processed;
if the total number of data packets waiting to be processed is greater than a preset number threshold, obtaining the configuration file of each preset standby data processing terminal and determining, according to the configuration file, the data format corresponding to each standby data processing terminal;
assigning each standby data processing terminal to a corresponding data processing cluster, wherein the standby data processing terminals in the same data processing cluster correspond to the same data format;
selecting the target cluster corresponding to the data packet, wherein the data format corresponding to each standby data processing terminal in the target cluster is consistent with the data format of the data packet;
sending the data packet to the target cluster for processing.
5. The data processing method according to claim 4, characterized in that, after sending the data packet to the target cluster for processing, the method further comprises:
sending a data packet query request to each standby data processing terminal in the target cluster, and receiving the number of pending data packets fed back by each standby data processing terminal in the target cluster;
selecting, from the target cluster, the standby data processing terminal with the smallest number of pending data packets as the preferred processing terminal;
allocating the data packet to the preferred processing terminal for processing.
6. A data processing device, characterized by comprising:
a packet receiving module, configured to receive a data packet acquired and sent by a preset packet capturer, the data packet including one or more data records;
a format matching module, configured to perform format matching on a target record according to a preset regular expression resource library and determine the data format of the data packet, wherein the regular expression resource library includes more than one regular expression, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
a processing rule lookup module, configured to look up a target processing rule in a preset data processing rule library, the target processing rule being the data processing rule corresponding to the data format of the data packet;
a data processing module, configured to process each data record in the data packet according to the target processing rule to obtain the processed data packet.
7. The data processing device according to claim 6, characterized in that the format matching module comprises:
a match success rate calculation unit, configured to calculate the match success rate of each regular expression in the regular expression resource library according to the historical matching records within a preset statistical period;
a candidate expression selection unit, configured to select, from the regular expression resource library, the regular expression with the highest match success rate among those not yet selected as the candidate expression;
a format matching unit, configured to perform format matching on the target record using the candidate expression;
a first processing unit, configured to, if format matching fails, return to the step of selecting from the regular expression resource library the regular expression with the highest match success rate among those not yet selected as the candidate expression, until format matching succeeds;
a second processing unit, configured to, if format matching succeeds, determine the data format corresponding to the successfully matched candidate expression as the data format of the data packet.
8. The data processing device according to claim 7, characterized in that the match success rate calculation unit comprises:
a sub-period division subunit, configured to divide the statistical period into T sub-periods, T being a positive integer;
a count statistics subunit, configured to count the number of successful matches of each regular expression in the regular expression resource library in each sub-period;
a match success rate calculation subunit, configured to calculate the match success rate of each regular expression in the regular expression resource library according to the following formula:
where n is the index of a regular expression, 1 ≤ n ≤ N, N is the total number of regular expressions in the regular expression resource library, t is the index of a sub-period in chronological order, 1 ≤ t ≤ T, MatSucNum_{n,t} is the number of successful matches of the n-th regular expression in the regular expression resource library in the t-th sub-period, Weight_t is a preset weight coefficient with Weight_t < Weight_{t+1}, and MatSucRatio_n is the match success rate of the n-th regular expression in the regular expression resource library.
9. A computer-readable storage medium storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by a processor, implement the steps of the data processing method according to any one of claims 1 to 5.
10. A terminal device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor, when executing the computer-readable instructions, implements the steps of the data processing method according to any one of claims 1 to 5.
CN201910423175.6A 2019-05-21 2019-05-21 Data processing method, device, computer readable storage medium and terminal device Pending CN110245155A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910423175.6A CN110245155A (en) 2019-05-21 2019-05-21 Data processing method, device, computer readable storage medium and terminal device
PCT/CN2019/103039 WO2020232880A1 (en) 2019-05-21 2019-08-28 Data processing method and apparatus, storage medium and terminal device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910423175.6A CN110245155A (en) 2019-05-21 2019-05-21 Data processing method, device, computer readable storage medium and terminal device

Publications (1)

Publication Number Publication Date
CN110245155A true CN110245155A (en) 2019-09-17

Family

ID=67884683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910423175.6A Pending CN110245155A (en) 2019-05-21 2019-05-21 Data processing method, device, computer readable storage medium and terminal device

Country Status (2)

Country Link
CN (1) CN110245155A (en)
WO (1) WO2020232880A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113660530A (en) * 2021-07-27 2021-11-16 中央广播电视总台 Program stream data capturing method and device, computer equipment and readable storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113656659A (en) * 2021-08-31 2021-11-16 上海观安信息技术股份有限公司 Data extraction method, device and system and computer readable storage medium
CN115757423B (en) * 2022-11-29 2024-01-30 中诚智信工程咨询集团股份有限公司 Engineering cost data correction method, system, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107786545A (en) * 2017-09-29 2018-03-09 中国平安人寿保险股份有限公司 A kind of attack detection method and terminal device
CN109299164A (en) * 2018-09-03 2019-02-01 中国平安人寿保险股份有限公司 A kind of data query method, computer readable storage medium and terminal device
CN109656487A (en) * 2018-12-24 2019-04-19 平安科技(深圳)有限公司 A kind of data processing method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8374986B2 (en) * 2008-05-15 2013-02-12 Exegy Incorporated Method and system for accelerated stream processing
CN103078808B (en) * 2012-12-29 2015-09-30 大连环宇移动科技有限公司 The data flow being applicable to multithread matching regular expressions exchanges multiplex system and method
CN107766466A (en) * 2017-09-29 2018-03-06 上海望友信息科技有限公司 Recognition methods, system, computer-readable recording medium and the equipment of data type
CN107729475B (en) * 2017-10-16 2021-07-02 深圳视界信息技术有限公司 Webpage element acquisition method, device, terminal and computer-readable storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113660530A (en) * 2021-07-27 2021-11-16 中央广播电视总台 Program stream data capturing method and device, computer equipment and readable storage medium
CN113660530B (en) * 2021-07-27 2024-03-19 中央广播电视总台 Program stream data grabbing method and device, computer equipment and readable storage medium

Also Published As

Publication number Publication date
WO2020232880A1 (en) 2020-11-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination