CN110245155A - Data processing method, device, computer readable storage medium and terminal device - Google Patents
- Publication number
- CN110245155A (application CN201910423175.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- regular expression
- format
- data packet
- data processing
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24564—Applying rules; Deductive queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention belongs to the field of computer technology, and in particular relates to a data processing method, a device, a computer-readable storage medium and a terminal device. The method includes: receiving a data packet collected and sent by a preset packet capture tool, the data packet containing one or more data records; performing format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, where the regular expression resource library contains more than one regular expression and each regular expression corresponds to one data format; looking up a target processing rule in a preset data processing rule library, the target processing rule being the data processing rule corresponding to the data format of the data packet; and processing each data record in the data packet according to the target processing rule to obtain a processed data packet. The whole process requires no manual intervention, which saves considerable time and labor cost and greatly improves efficiency.
Description
Technical field
The invention belongs to the field of computer technology, and in particular relates to a data processing method, a device, a computer-readable storage medium and a terminal device.
Background
As big data technology becomes more widespread, more and more scenarios require analysis and computation over massive amounts of data. Before such analysis, the data must first be preprocessed, that is, converted into a data format that analysis tools can readily work with. At present this preprocessing relies mainly on manual work. When the data volume is large, it consumes considerable time and labor cost and is very inefficient.
Summary of the invention
In view of this, embodiments of the present invention provide a data processing method, a device, a computer-readable storage medium and a terminal device, to solve the problem that existing manual data processing consumes considerable time and labor cost and is very inefficient.
A first aspect of the embodiments of the present invention provides a data processing method, which may include:
Receiving a data packet collected and sent by a preset packet capture tool, the data packet containing one or more data records;
Performing format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, wherein the regular expression resource library contains more than one regular expression, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
Looking up a target processing rule in a preset data processing rule library, the target processing rule being the data processing rule corresponding to the data format of the data packet;
Processing each data record in the data packet according to the target processing rule to obtain a processed data packet.
A second aspect of the embodiments of the present invention provides a data processing device, which may include:
A data packet receiving module, configured to receive a data packet collected and sent by a preset packet capture tool, the data packet containing one or more data records;
A format matching module, configured to perform format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, wherein the regular expression resource library contains more than one regular expression, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
A processing rule lookup module, configured to look up a target processing rule in a preset data processing rule library, the target processing rule being the data processing rule corresponding to the data format of the data packet;
A data processing module, configured to process each data record in the data packet according to the target processing rule to obtain a processed data packet.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the following steps:
Receiving a data packet collected and sent by a preset packet capture tool, the data packet containing one or more data records;
Performing format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, wherein the regular expression resource library contains more than one regular expression, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
Looking up a target processing rule in a preset data processing rule library, the target processing rule being the data processing rule corresponding to the data format of the data packet;
Processing each data record in the data packet according to the target processing rule to obtain a processed data packet.
A fourth aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
Receiving a data packet collected and sent by a preset packet capture tool, the data packet containing one or more data records;
Performing format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, wherein the regular expression resource library contains more than one regular expression, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
Looking up a target processing rule in a preset data processing rule library, the target processing rule being the data processing rule corresponding to the data format of the data packet;
Processing each data record in the data packet according to the target processing rule to obtain a processed data packet.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects. After receiving a data packet collected and sent by a preset packet capture tool, an embodiment first performs format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, then looks up in a preset data processing rule library the target processing rule corresponding to that data format, and finally processes each data record in the data packet according to the target processing rule to obtain a processed data packet. The data format of the data packet is determined automatically by regular expression matching, and the data is then processed automatically according to the corresponding processing rule. Format matching and data processing are thus completed in a fully automated way, with no manual intervention, which saves considerable time and labor cost and greatly improves the efficiency of data processing.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention. A person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of one embodiment of a data processing method in the embodiments of the present invention;
Fig. 2 is a schematic flowchart of performing format matching on a data packet;
Fig. 3 is a schematic diagram of setting up multiple standby data processing terminals to process data packets in parallel;
Fig. 4 is a schematic flowchart of distributing data packets for processing;
Fig. 5 is a structural diagram of one embodiment of a data processing device in the embodiments of the present invention;
Fig. 6 is a schematic block diagram of a terminal device in the embodiments of the present invention.
Detailed description of the embodiments
To make the objects, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The embodiments described below are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, one embodiment of a data processing method in the embodiments of the present invention may include:
Step S101: receive a data packet collected and sent by a preset packet capture tool.
The packet capture tool is a tool that collects data transmitted over a network. In this embodiment, packet capture tools include but are not limited to Fiddler and Wireshark.
The packet capture tool can package the collected data into a number of data packets and send them to a preset data processing terminal, which is the entity that carries out this embodiment. Each data packet contains one or more data records. The number of data records held in each data packet can be set according to the actual situation, for example 1000, 2000, 5000 or another value.
The following is a specific example of the data records in a data packet:
c0-e1=string:tsalesApplyCustContact
c0-e4=string:tsalesApplyCust
c0-e6=string:true
c0-e7=string:upd
c0-e8=string:2011068
c0-e9=string:2
c0-e10=string:1
c0-e11=string:1
c0-e12=string:
c0-e13=string:9
c0-e14=string:tsalesApplyCust
c0-e15=string:1
Here, each line is one data record.
It should be noted that the data records in each data packet are collected for the same business scenario, so all data records in the same data packet share the same data format, while data records in different data packets may have the same or different data formats. Here, a data format is the regular formal pattern exhibited by the data records. In the example above, the data format is: each data record starts with "c", followed by one or more decimal digits, followed by "-e", followed by one or more decimal digits, followed by "=string:", followed by a string of decimal digits or characters (whose length may be 0).
Step S102: perform format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet.
The regular expression resource library contains more than one regular expression, and each regular expression corresponds to one data format. For each business scenario, a regular expression corresponding to its data format can be set in advance, and the regular expressions of the data formats are assembled into the regular expression resource library shown in the table:
A regular expression (Regular Expression) is a computer science concept, commonly used to search for and replace text that matches a certain pattern (rule). It combines predefined special characters into a "rule string" that expresses filtering logic over strings. A regular expression is a text pattern that describes one or more strings to match when searching text.
When the data processing terminal needs to perform format matching on a received data packet, it first selects one regular expression from the regular expression resource library and matches it against a data record in the data packet. If the match succeeds, the data format of the data packet is determined to be the data format that corresponds to this regular expression in the regular expression resource library.
Because all data records in the same data packet share the same data format, format matching with a regular expression only needs one arbitrarily chosen data record from the data packet (namely the target record). There is no need to match every data record in the data packet.
If format matching fails, the next regular expression is selected to perform format matching on the data packet, and this process is repeated until format matching succeeds.
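The basic matching loop described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the library contents, the format labels `format_a` and `format_b`, and the function name `detect_format` are all hypothetical.

```python
import re

# Hypothetical regular expression resource library: each entry pairs a
# data format label with the pattern that identifies it.
REGEX_LIBRARY = [
    ("format_a", re.compile(r"^\d{4}-\d{2}-\d{2},")),            # illustrative only
    ("format_b", re.compile(r"^c[0-9]{1,}-e[0-9]{1,}=string:")),
]

def detect_format(packet):
    """Return the label of the first pattern matching one target record, else None."""
    if not packet:
        return None
    # Any record can serve as the target record, because all records
    # in one packet are assumed to share the same data format.
    target_record = packet[0]
    for label, pattern in REGEX_LIBRARY:
        if pattern.match(target_record):
            return label
    return None

packet = ["c0-e1=string:tsalesApplyCustContact", "c0-e6=string:true"]
print(detect_format(packet))  # format_b
```

Only one record is tested per packet, so the cost of format detection does not grow with the number of records in the packet.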
It should be noted that the contents of the regular expression resource library can be adjusted according to the actual situation. For example, when data of a certain data format no longer needs to be analyzed, its entry can be deleted from the regular expression resource library. When data of a new data format needs to be analyzed, a corresponding entry can be added. The regular expression corresponding to a data format can also be modified as needed, so that the regular expression resource library stays suited to the latest business scenarios.
Preferably, considering that data packets of different data formats may occur with very different probabilities (for example, packets of one or a few data formats may make up the vast majority of all packets, while packets of other data formats are rare), format matching can be performed on the data packet according to the process shown in Fig. 2 in order to reduce the number of matching attempts:
Step S1021: calculate the match success rate of each regular expression in the regular expression resource library according to the history match records within a preset statistical period.
The statistical period can be set according to the actual situation to 1 month, 2 months, 3 months, half a year, 1 year or another value. Because data that is too old has little reference value, a period of no more than 1 year is generally advisable.
The match success rate is positively correlated with the number of successful matches of a regular expression in the history match records: the more successful matches, the higher the rate, and the fewer successful matches, the lower the rate. The history match records note which regular expression was used each time format matching on a data packet succeeded. For example, if format matching has been performed on 50 data packets in total, of which 30 were matched by regular expression 1, 14 by regular expression 2 and 6 by regular expression 3, then regular expression 1 is assigned the highest match success rate, regular expression 2 the second highest, and regular expression 3 the lowest.
For a more accurate calculation, in one specific implementation of this embodiment the statistical period can first be divided into T sub-periods, where T is a positive integer whose value can be set according to the actual situation, for example 5, 10, 20 or another value. It should be noted that the larger T is, the greater the amount of calculation but the higher the precision, and the smaller T is, the smaller the amount of calculation but the lower the precision. The two need to be weighed against each other according to the actual situation.
Then, the number of successful matches of each regular expression in the regular expression resource library within each sub-period is counted, and the match success rate of each regular expression in the regular expression resource library is calculated according to the following formula:
where n is the index of a regular expression, 1≤n≤N, and N is the total number of regular expressions in the regular expression resource library; t is the index of a sub-period in chronological order, 1≤t≤T, with earlier sub-periods having smaller values of t; MatSucNum_{n,t} is the number of successful matches of the n-th regular expression in the regular expression resource library within the t-th sub-period; Weight_t is a preset weight coefficient with Weight_t < Weight_{t+1}, that is, later sub-periods have larger weight coefficients, because data closer to the current time has more reference value and data further from the current time has less (for example, records from this week clearly reflect a user's current usage habits better than records from a few months ago); and MatSucRatio_n is the match success rate of the n-th regular expression in the regular expression resource library.
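As a hedged illustration, one expression consistent with the symbol definitions above is a time-weighted match count normalized over all regular expressions in the library. This form is an assumption, not necessarily the patent's exact formula:

```latex
\[
\mathrm{MatSucRatio}_n
  = \frac{\sum_{t=1}^{T} \mathrm{Weight}_t \cdot \mathrm{MatSucNum}_{n,t}}
         {\sum_{m=1}^{N} \sum_{t=1}^{T} \mathrm{Weight}_t \cdot \mathrm{MatSucNum}_{m,t}},
\qquad \mathrm{Weight}_t < \mathrm{Weight}_{t+1}
\]
```

Under this assumed form, with a single sub-period (T = 1) and a weight of 1, the earlier example of 30, 14 and 6 successful matches out of 50 gives rates of 0.60, 0.28 and 0.12 for regular expressions 1, 2 and 3.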
Step S1022: select from the regular expression resource library the not-yet-selected regular expression with the highest match success rate as the candidate expression.
Step S1023: perform format matching on the target record using the candidate expression.
For example, suppose the candidate expression is "^c[0-9]{1,}-e[0-9]{1,}=string:", where ^ denotes the start of a line, [0-9] denotes any digit from 0 to 9, and {1,} denotes at least one occurrence. This regular expression matches data records that start with "c", followed by one or more decimal digits, followed by "-e", followed by one or more decimal digits, followed by "=string:", followed by a string of decimal digits or characters (whose length may be 0). Taking the data packet in the example above, any of its data records matches the candidate expression, so format matching is determined to have succeeded. Otherwise, format matching is determined to have failed.
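The candidate expression can be checked against sample records with a short sketch. The matching records are taken from the example packet above, while the non-matching strings are invented here purely for contrast.

```python
import re

# The candidate expression from the text, anchored at the start of the record.
CANDIDATE = re.compile(r"^c[0-9]{1,}-e[0-9]{1,}=string:")

records = [
    "c0-e1=string:tsalesApplyCustContact",
    "c0-e12=string:",           # an empty value after "=string:" still matches
    "c0-e8=string:2011068",
]

# Hypothetical records that break the format in different places.
non_matching = [
    "x0-e1=string:a",           # does not start with "c"
    "c-e1=string:a",            # no digit after "c"
    "c0-e1=int:5",              # type tag is not "string"
]

print(all(CANDIDATE.match(r) for r in records))        # True
print(any(CANDIDATE.match(r) for r in non_matching))   # False
```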
Step S1024: determine whether format matching succeeded.
If format matching failed, return to step S1022 and its subsequent steps until format matching succeeds. If format matching succeeded, perform step S1025.
Step S1025: determine the data format corresponding to the candidate expression as the data format of the data packet.
During format matching, the regular expressions are selected from the regular expression resource library in descending order of match success rate, so that format matching can be completed with as few matching attempts as possible, which speeds up format matching on data packets.
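Steps S1021 to S1025 can be sketched as follows. The weighted score used here is an assumed form (a weighted sum of per-sub-period match counts). Because only the ranking matters for choosing candidates, the score is left unnormalized. All names, patterns and counts are hypothetical.

```python
import re

def weighted_scores(counts, weights):
    """counts[label] lists successful matches per sub-period, oldest first."""
    return {
        label: sum(w * c for w, c in zip(weights, per_period))
        for label, per_period in counts.items()
    }

def match_in_rate_order(target_record, library, counts, weights):
    """Try patterns from highest to lowest score; return (label, attempts)."""
    scores = weighted_scores(counts, weights)
    attempts = 0
    for label in sorted(library, key=lambda k: scores.get(k, 0), reverse=True):
        attempts += 1
        if library[label].match(target_record):
            return label, attempts
    return None, attempts

library = {
    "format_a": re.compile(r"^[A-Z]{3}\|"),
    "format_b": re.compile(r"^c[0-9]{1,}-e[0-9]{1,}=string:"),
    "format_c": re.compile(r"^\d+;"),
}
counts = {"format_a": [5, 1], "format_b": [2, 9], "format_c": [1, 1]}
weights = [1.0, 2.0]  # later sub-periods weigh more

print(match_in_rate_order("c0-e9=string:2", library, counts, weights))
# ('format_b', 1)
```

Here `format_b` scores 20 against 7 and 3 for the others, so it is tried first and the record is identified in a single attempt.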
Step S103: look up a target processing rule in a preset data processing rule library.
The target processing rule is the data processing rule corresponding to the data format of the data packet.
In this embodiment, different data processing rules are applied to data packets of different data formats, so as to produce a data format that subsequent data analysis tools can readily analyze. A data processing rule corresponding to each data format can be set in advance, and the data processing rules of the data formats are assembled into the data processing rule library shown in the table:
Data format | Data processing rule |
Data format a | Data processing rule 1 |
Data format b | Data processing rule 2 |
Data format c | Data processing rule 3 |
…… | …… |
…… | …… |
It should be noted that the contents of the above data processing rule library can be adjusted according to the actual situation, including but not limited to adding, deleting and modifying data processing rules.
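A rule library of this kind can be modeled as a mapping from data format to a processing function. The sketch below is illustrative only: the format labels and the placeholder rules are hypothetical stand-ins for whatever rules a deployment registers.

```python
# Placeholder rules; real rules would reshape a record into the
# format expected by the downstream analysis tool.
def rule_1(record):
    return record.strip()

def rule_2(record):
    return record.upper()

# Hypothetical data processing rule library: data format -> rule.
RULE_LIBRARY = {
    "data_format_a": rule_1,
    "data_format_b": rule_2,
}

def find_target_rule(data_format):
    """Look up the processing rule registered for a data format."""
    try:
        return RULE_LIBRARY[data_format]
    except KeyError:
        raise LookupError(f"no processing rule for {data_format!r}") from None

def process_packet(packet, data_format):
    """Apply the target rule to every record in the packet."""
    rule = find_target_rule(data_format)
    return [rule(record) for record in packet]

print(process_packet(["  abc  ", "def "], "data_format_a"))  # ['abc', 'def']
```

Adding, deleting or modifying a rule then amounts to editing one entry of the mapping.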
Step S104: process each data record in the data packet according to the target processing rule to obtain a processed data packet.
Take a data packet in the following data format as an example:
c0-e1=string:tsalesApplyCustContact
c0-e4=string:tsalesApplyCust
c0-e6=string:true
c0-e7=string:upd
c0-e8=string:2011068
c0-e9=string:2
c0-e10=string:1
c0-e11=string:1
c0-e12=string:
c0-e13=string:9
c0-e14=string:tsalesApplyCust
c0-e15=string:1
The corresponding data processing rule can be set as follows: split each data record into two parts, the first part being the data before the equals sign (c0-e1) and the second part being the data after the equals sign (string:tsalesApplyCustContact); wrap each part in quotation marks and separate the two parts with a colon ("c0-e1":"string:tsalesApplyCustContact"); finally, separate the data records with commas and wrap the whole in braces, producing a data packet in the following data format:
{"c0-e1":"string:tsalesApplyCustContact","c0-e4":"string:tsalesApplyCust","c0-e6":"string:true","c0-e7":"string:upd","c0-e8":"string:2011068","c0-e9":"string:2","c0-e10":"string:1","c0-e11":"string:1","c0-e12":"string:","c0-e13":"string:9","c0-e14":"string:tsalesApplyCust","c0-e15":"string:1"}
It should be noted that the above is only one example of a data processing rule. In actual use, a data processing rule corresponding to the data format of the data packets to be processed can be set according to the specific scenario, which is not described further here.
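The example rule above can be sketched in a few lines. The function name `to_quoted_pairs` is hypothetical, and the sketch assumes that keys and values contain no quotation marks (no escaping is performed).

```python
def to_quoted_pairs(packet):
    """Turn 'key=value' records into one brace-wrapped, comma-separated string."""
    parts = []
    for record in packet:
        # Split at the first equals sign only, so the value may itself contain "=".
        key, _, value = record.partition("=")
        parts.append(f'"{key}":"{value}"')
    return "{" + ",".join(parts) + "}"

packet = [
    "c0-e1=string:tsalesApplyCustContact",
    "c0-e12=string:",
    "c0-e15=string:1",
]
print(to_quoted_pairs(packet))
# {"c0-e1":"string:tsalesApplyCustContact","c0-e12":"string:","c0-e15":"string:1"}
```

If values may contain quotation marks or backslashes, building a dictionary and serializing it with Python's standard `json` module would handle the escaping.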
Further, considering that in practice there may be extreme cases in which a large number of data packets await processing, and that in such cases processing by the data processing terminal alone would overload it, this embodiment can, as shown in Fig. 3, also set up multiple standby data processing terminals to process data packets in parallel.
Specifically, after the data format of the data packet is determined in step S102, the total number of data packets waiting to be processed in the data processing terminal can first be counted. If that total is less than or equal to a preset quantity threshold, processing continues according to the flow shown in Fig. 1. The quantity threshold can be set according to the actual situation, for example 100, 200, 500 or another value. If the total is greater than the quantity threshold, processing follows the flow shown in Fig. 4:
Step S401: obtain the preset configuration file of each standby data processing terminal, and determine from the configuration files the data format corresponding to each standby data processing terminal.
Each standby data processing terminal is dedicated to processing data packets of one particular data format. This correspondence can be stored in advance in the configuration file of each standby data processing terminal. The data processing terminal can obtain these configuration files and determine from them the data format corresponding to each standby data processing terminal.
Step S402: assign each standby data processing terminal to its corresponding data processing cluster.
As shown in Fig. 4, in this embodiment the data processing terminals are preferably divided into more than two data processing clusters, where the standby data processing terminals in the same data processing cluster correspond to the same data format.
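Steps S401 and S402 amount to grouping terminals by the data format named in their configuration. A minimal sketch follows, assuming each configuration has already been loaded into a dictionary with a hypothetical `data_format` key. The terminal identifiers are invented.

```python
from collections import defaultdict

def build_clusters(terminal_configs):
    """Group terminal ids by the data format each terminal is dedicated to."""
    clusters = defaultdict(list)
    for terminal_id, config in terminal_configs.items():
        clusters[config["data_format"]].append(terminal_id)
    return dict(clusters)

# Hypothetical configurations of three standby terminals.
configs = {
    "t1": {"data_format": "a"},
    "t2": {"data_format": "b"},
    "t3": {"data_format": "a"},
}
print(build_clusters(configs))  # {'a': ['t1', 't3'], 'b': ['t2']}
```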
Step S403: select the target cluster corresponding to the data packet.
The data format corresponding to each standby data processing terminal in the target cluster is the same as the data format of the data packet.
Step S404: send the data packet to the target cluster for processing.
Because every data processing terminal in the target cluster handles the same data format as the data packet, the data packet can be processed more quickly.
Further, the data processing terminal can send a data packet query request to each standby data processing terminal in the target cluster and receive the number of pending data packets reported by each of them. It then selects from the target cluster the standby data processing terminal with the fewest pending data packets as the preferred processing terminal and assigns the data packet to it for processing. The processing performed by the preferred processing terminal is similar to that in step S104. Refer to the foregoing description for details, which are not repeated here.
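The threshold check and the choice of the least-loaded terminal can be sketched together. Everything here is illustrative: `query_pending` stands in for the query request sent to a terminal, the return value `"local"` denotes processing on the data processing terminal itself, and falling back to local processing when no cluster exists for the format is an assumption not stated in the text.

```python
def choose_terminal(pending_local, threshold, data_format, clusters, query_pending):
    """Pick where a packet should be processed.

    pending_local: packets waiting on the main data processing terminal.
    clusters: mapping of data format -> list of standby terminal ids.
    query_pending: callable returning the pending count of a terminal.
    """
    if pending_local <= threshold:
        return "local"              # normal flow (Fig. 1)
    target_cluster = clusters.get(data_format)
    if not target_cluster:
        return "local"              # assumed fallback: no matching cluster
    # Preferred terminal: fewest pending packets within the target cluster.
    return min(target_cluster, key=query_pending)

clusters = {"a": ["t1", "t3"], "b": ["t2"]}
pending = {"t1": 7, "t2": 0, "t3": 3}   # hypothetical query responses

print(choose_terminal(501, 500, "a", clusters, pending.__getitem__))  # t3
print(choose_terminal(500, 500, "a", clusters, pending.__getitem__))  # local
```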
Through the flow shown in Fig. 4, after format matching is performed on each data packet in the data stream, each data packet is routed, according to the format matching result, to the data processing cluster corresponding to its data format. The standby data processing terminals in the various data processing clusters then process data packets of the various data formats in parallel, improving overall data processing efficiency.
In summary, after receiving a data packet collected and sent by a preset packet capture tool, an embodiment of the present invention first performs format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, then looks up in a preset data processing rule library the target processing rule corresponding to that data format, and finally processes each data record in the data packet according to the target processing rule to obtain a processed data packet. The data format of the data packet is determined automatically by regular expression matching, and the data is then processed automatically according to the corresponding processing rule. Format matching and data processing are thus completed in a fully automated way, with no manual intervention, which saves considerable time and labor cost and greatly improves the efficiency of data processing.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
Corresponding to the data processing method described in the foregoing embodiments, Fig. 5 shows a structural diagram of one embodiment of a data processing device provided by the embodiments of the present invention.
In this embodiment, a data processing device may include:
A data packet receiving module 501, configured to receive a data packet collected and sent by a preset packet capture tool, the data packet containing one or more data records;
A format matching module 502, configured to perform format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, wherein the regular expression resource library contains more than one regular expression, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
A processing rule lookup module 503, configured to look up a target processing rule in a preset data processing rule library, the target processing rule being the data processing rule corresponding to the data format of the data packet;
A data processing module 504, configured to process each data record in the data packet according to the target processing rule to obtain a processed data packet.
Further, the format matching module may include:
A match success rate calculation unit, configured to calculate the match success rate of each regular expression in the regular expression resource library according to the historical matching records within a preset statistical period;
A candidate expression selection unit, configured to select, from the regular expression resource library, the not-yet-selected regular expression with the highest match success rate as the candidate expression;
A format matching unit, configured to perform format matching on the target record using the candidate expression;
A first processing unit, configured to, if format matching fails, return to the step of selecting from the regular expression resource library the not-yet-selected regular expression with the highest match success rate as the candidate expression, until format matching succeeds;
A second processing unit, configured to, if format matching succeeds, determine the data format corresponding to the successfully matched candidate expression as the data format of the data packet.
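The selection loop carried out by these units can be sketched as follows. This is an illustrative sketch only: `detect_format`, its parameters and the example patterns are hypothetical names, and returning `None` when every expression has been tried without success is an assumption, since the text only describes the case where a match is eventually found.

```python
import re
from typing import Dict, Optional, Tuple


def detect_format(target_record: str,
                  library: Dict[str, re.Pattern],
                  success_rate: Dict[str, float]) -> Tuple[Optional[str], int]:
    """Try expressions in descending order of match success rate.

    Returns the matched data format (or None) and the number of attempts.
    """
    # Sorting once by descending rate is equivalent to repeatedly picking
    # the highest-rated expression that has not yet been selected.
    ordered = sorted(library, key=lambda f: success_rate.get(f, 0.0), reverse=True)
    for attempts, data_format in enumerate(ordered, start=1):
        if library[data_format].fullmatch(target_record):
            return data_format, attempts
    return None, len(ordered)  # assumption: no expression matched


library = {
    "ipv4": re.compile(r"\d{1,3}(\.\d{1,3}){3}"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}
rates = {"ipv4": 0.2, "date": 0.7}
```

With these example rates, a date record is recognised on the first attempt, while an IPv4 record needs two attempts because the lower-rated expression is tried last.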
Further, the match success rate calculation unit may include:
A sub-period division subunit, configured to divide the statistical period into T sub-periods, where T is a positive integer;
A count statistics subunit, configured to count the number of successful matches of each regular expression in the regular expression resource library in each sub-period;
A match success rate calculation subunit, configured to calculate the match success rate of each regular expression in the regular expression resource library according to the following formula:
Where n is the index of a regular expression, 1≤n≤N, N is the total number of regular expressions in the regular expression resource library, t is the index of a sub-period in chronological order, 1≤t≤T, MatSucNum_{n,t} is the number of successful matches of the n-th regular expression in the regular expression resource library in the t-th sub-period, Weight_t is a preset weight coefficient with Weight_t < Weight_{t+1}, and MatSucRatio_n is the match success rate of the n-th regular expression in the regular expression resource library.
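A time-weighted success rate of this kind can be sketched in Python. The formula itself is not reproduced in this text, so the form used below is an assumption: each expression's score is the weighted sum of its per-sub-period success counts, normalised by the sum of the scores of all expressions. The function name `weighted_success_rates` is likewise hypothetical. Only the definitions of the symbols and the constraint Weight_t < Weight_{t+1} are taken from the text.

```python
from typing import List, Sequence


def weighted_success_rates(counts: Sequence[Sequence[int]],
                           weights: Sequence[float]) -> List[float]:
    """counts[n][t] is MatSucNum for expression n in sub-period t (0-based)."""
    # The text requires Weight_t < Weight_{t+1}: later sub-periods count more.
    if any(a >= b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be strictly increasing")
    if any(len(row) != len(weights) for row in counts):
        raise ValueError("each expression needs one count per sub-period")
    # ASSUMPTION: score = weighted sum of per-sub-period success counts.
    scores = [sum(w * c for w, c in zip(weights, row)) for row in counts]
    total = sum(scores)
    # ASSUMPTION: normalise so that the rates of all expressions sum to 1.
    return [s / total if total else 0.0 for s in scores]
```

Under this assumed form, two expressions with the same total number of successes receive different rates when one of them succeeded more recently, which is the effect the increasing weights are meant to produce.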
Further, the data processing device may also include:
A data packet count statistics module, configured to count the total number of data packets waiting to be processed;
A configuration file acquisition module, configured to, if the total number of data packets waiting to be processed is greater than a preset number threshold, obtain the configuration file of each preset standby data processing terminal and determine, according to the configuration file, the data format corresponding to each standby data processing terminal;
A cluster division module, configured to divide the standby data processing terminals into corresponding data processing clusters, where the standby data processing terminals in the same data processing cluster correspond to the same data format;
A cluster selection module, configured to select a target cluster corresponding to the data packet, where the data format corresponding to each standby data processing terminal in the target cluster is consistent with the data format of the data packet;
A data packet sending module, configured to send the data packet to the target cluster for processing.
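The grouping of standby terminals into clusters and the selection of a target cluster can be sketched as below. The sketch assumes that the configuration files have already been read into a mapping from terminal identifier to data format; `build_clusters`, `choose_target_cluster` and the terminal names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_clusters(terminal_formats: Dict[str, str]) -> Dict[str, List[str]]:
    """Group standby terminals so that each cluster shares one data format."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for terminal_id, data_format in terminal_formats.items():
        clusters[data_format].append(terminal_id)
    return dict(clusters)


def choose_target_cluster(pending_total: int,
                          threshold: int,
                          packet_format: str,
                          terminal_formats: Dict[str, str]) -> List[str]:
    """Return the terminals of the cluster matching the packet's format.

    An empty list means no offloading: either the backlog does not exceed
    the threshold or no standby terminal handles this format.
    """
    if pending_total <= threshold:
        return []
    return build_clusters(terminal_formats).get(packet_format, [])


# Hypothetical result of reading each standby terminal's configuration file.
terminal_formats = {"t1": "json", "t2": "csv", "t3": "json"}
```

Keying the clusters by data format makes the cluster lookup for a packet a single dictionary access once the packet's format is known.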
Further, the data processing device may also include:
A count query module, configured to send a data packet query request to each standby data processing terminal in the target cluster, and to receive the number of pending data packets fed back by each standby data processing terminal in the target cluster;
A terminal selection module, configured to select, from the target cluster, the standby data processing terminal with the smallest number of pending data packets as the preferred processing terminal;
A data packet allocation module, configured to allocate the data packet to the preferred processing terminal for processing.
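Choosing the least-loaded terminal within the target cluster can be sketched as follows. The query request is modelled by a caller-supplied function `query_pending`, which is a hypothetical stand-in for the network round trip; `choose_preferred_terminal` is also a hypothetical name.

```python
from typing import Callable, Dict, List


def choose_preferred_terminal(cluster: List[str],
                              query_pending: Callable[[str], int]) -> str:
    """Query every terminal in the cluster and pick the least-loaded one."""
    if not cluster:
        raise ValueError("target cluster is empty")
    # One query per terminal, mirroring the per-terminal query request.
    pending: Dict[str, int] = {tid: query_pending(tid) for tid in cluster}
    # On a tie, min() keeps the terminal that appears first in the cluster.
    return min(pending, key=lambda tid: pending[tid])


# Hypothetical feedback values standing in for the terminals' replies.
feedback = {"t1": 7, "t2": 3, "t3": 5}
```

For example, `choose_preferred_terminal(["t1", "t2", "t3"], feedback.__getitem__)` selects `"t2"`, the terminal reporting the fewest pending data packets.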
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not detailed or recorded in one embodiment, refer to the related descriptions of the other embodiments.
Fig. 6 shows a schematic block diagram of a terminal device provided by an embodiment of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown.
In this embodiment, the terminal device 6 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server. The terminal device 6 may include a processor 60, a memory 61, and computer-readable instructions 62 stored in the memory 61 and executable on the processor 60, for example computer-readable instructions for executing the data processing method described above. When executing the computer-readable instructions 62, the processor 60 implements the steps in each of the above data processing method embodiments, for example steps S101 to S104 shown in Fig. 1. Alternatively, when executing the computer-readable instructions 62, the processor 60 implements the functions of the modules/units in each of the above device embodiments, for example the functions of modules 501 to 504 shown in Fig. 5.
Illustratively, the computer-readable instructions 62 may be divided into one or more modules/units, which are stored in the memory 61 and executed by the processor 60 to carry out the present invention. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, the instruction segments being used to describe the execution of the computer-readable instructions 62 in the terminal device 6.
The processor 60 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 61 may be an internal storage unit of the terminal device 6, such as a hard disk or internal memory of the terminal device 6. The memory 61 may also be an external storage device of the terminal device 6, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the terminal device 6. Further, the memory 61 may include both an internal storage unit and an external storage device of the terminal device 6. The memory 61 is used to store the computer-readable instructions and other instructions and data required by the terminal device 6. The memory 61 may also be used to temporarily store data that has been output or is about to be output.
The functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several computer-readable instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store computer-readable instructions, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A data processing method, characterized by comprising:
Receiving a data packet collected and sent by a preset packet capturer, the data packet including one or more data records;
Performing format matching on a target record according to a preset regular expression resource library and determining the data format of the data packet, where the regular expression resource library includes more than one regular expression, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
Looking up a target processing rule in a preset data processing rule library, the target processing rule being the data processing rule corresponding to the data format of the data packet;
Processing each data record in the data packet according to the target processing rule to obtain a processed data packet.
2. The data processing method according to claim 1, characterized in that performing format matching on the target record according to the preset regular expression resource library and determining the data format of the data packet comprises:
Calculating the match success rate of each regular expression in the regular expression resource library according to the historical matching records within a preset statistical period;
Selecting, from the regular expression resource library, the not-yet-selected regular expression with the highest match success rate as the candidate expression;
Performing format matching on the target record using the candidate expression;
If format matching fails, returning to the step of selecting from the regular expression resource library the not-yet-selected regular expression with the highest match success rate as the candidate expression, until format matching succeeds;
If format matching succeeds, determining the data format corresponding to the successfully matched candidate expression as the data format of the data packet.
3. The data processing method according to claim 2, characterized in that calculating the match success rate of each regular expression in the regular expression resource library according to the historical matching records within the preset statistical period comprises:
Dividing the statistical period into T sub-periods, where T is a positive integer;
Counting the number of successful matches of each regular expression in the regular expression resource library in each sub-period;
Calculating the match success rate of each regular expression in the regular expression resource library according to the following formula:
Where n is the index of a regular expression, 1≤n≤N, N is the total number of regular expressions in the regular expression resource library, t is the index of a sub-period in chronological order, 1≤t≤T, MatSucNum_{n,t} is the number of successful matches of the n-th regular expression in the regular expression resource library in the t-th sub-period, Weight_t is a preset weight coefficient with Weight_t < Weight_{t+1}, and MatSucRatio_n is the match success rate of the n-th regular expression in the regular expression resource library.
4. The data processing method according to any one of claims 1 to 3, characterized in that, after determining the data format of the data packet, the method further comprises:
Counting the total number of data packets waiting to be processed;
If the total number of data packets waiting to be processed is greater than a preset number threshold, obtaining the configuration file of each preset standby data processing terminal, and determining, according to the configuration file, the data format corresponding to each standby data processing terminal;
Dividing the standby data processing terminals into corresponding data processing clusters, where the standby data processing terminals in the same data processing cluster correspond to the same data format;
Selecting a target cluster corresponding to the data packet, where the data format corresponding to each standby data processing terminal in the target cluster is consistent with the data format of the data packet;
Sending the data packet to the target cluster for processing.
5. The data processing method according to claim 4, characterized in that, after sending the data packet to the target cluster for processing, the method further comprises:
Sending a data packet query request to each standby data processing terminal in the target cluster, and receiving the number of pending data packets fed back by each standby data processing terminal in the target cluster;
Selecting, from the target cluster, the standby data processing terminal with the smallest number of pending data packets as the preferred processing terminal;
Allocating the data packet to the preferred processing terminal for processing.
6. A data processing device, characterized by comprising:
A data packet receiving module, configured to receive a data packet collected and sent by a preset packet capturer, the data packet including one or more data records;
A format matching module, configured to perform format matching on a target record according to a preset regular expression resource library and determine the data format of the data packet, where the regular expression resource library includes more than one regular expression, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
A processing rule lookup module, configured to look up a target processing rule in a preset data processing rule library, the target processing rule being the data processing rule corresponding to the data format of the data packet;
A data processing module, configured to process each data record in the data packet according to the target processing rule, obtaining a processed data packet.
7. The data processing device according to claim 6, characterized in that the format matching module comprises:
A match success rate calculation unit, configured to calculate the match success rate of each regular expression in the regular expression resource library according to the historical matching records within a preset statistical period;
A candidate expression selection unit, configured to select, from the regular expression resource library, the not-yet-selected regular expression with the highest match success rate as the candidate expression;
A format matching unit, configured to perform format matching on the target record using the candidate expression;
A first processing unit, configured to, if format matching fails, return to the step of selecting from the regular expression resource library the not-yet-selected regular expression with the highest match success rate as the candidate expression, until format matching succeeds;
A second processing unit, configured to, if format matching succeeds, determine the data format corresponding to the successfully matched candidate expression as the data format of the data packet.
8. The data processing device according to claim 7, characterized in that the match success rate calculation unit comprises:
A sub-period division subunit, configured to divide the statistical period into T sub-periods, where T is a positive integer;
A count statistics subunit, configured to count the number of successful matches of each regular expression in the regular expression resource library in each sub-period;
A match success rate calculation subunit, configured to calculate the match success rate of each regular expression in the regular expression resource library according to the following formula:
Where n is the index of a regular expression, 1≤n≤N, N is the total number of regular expressions in the regular expression resource library, t is the index of a sub-period in chronological order, 1≤t≤T, MatSucNum_{n,t} is the number of successful matches of the n-th regular expression in the regular expression resource library in the t-th sub-period, Weight_t is a preset weight coefficient with Weight_t < Weight_{t+1}, and MatSucRatio_n is the match success rate of the n-th regular expression in the regular expression resource library.
9. A computer-readable storage medium storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by a processor, implement the steps of the data processing method according to any one of claims 1 to 5.
10. A terminal device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor, when executing the computer-readable instructions, implements the steps of the data processing method according to any one of claims 1 to 5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910423175.6A CN110245155A (en) | 2019-05-21 | 2019-05-21 | Data processing method, device, computer readable storage medium and terminal device |
PCT/CN2019/103039 WO2020232880A1 (en) | 2019-05-21 | 2019-08-28 | Data processing method and apparatus, storage medium and terminal device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910423175.6A CN110245155A (en) | 2019-05-21 | 2019-05-21 | Data processing method, device, computer readable storage medium and terminal device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110245155A (en) | 2019-09-17 |
Family
ID=67884683
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910423175.6A Pending CN110245155A (en) | 2019-05-21 | 2019-05-21 | Data processing method, device, computer readable storage medium and terminal device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110245155A (en) |
WO (1) | WO2020232880A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113660530A (en) * | 2021-07-27 | 2021-11-16 | 中央广播电视总台 | Program stream data capturing method and device, computer equipment and readable storage medium |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113656659A (en) * | 2021-08-31 | 2021-11-16 | 上海观安信息技术股份有限公司 | Data extraction method, device and system and computer readable storage medium |
CN115757423B (en) * | 2022-11-29 | 2024-01-30 | 中诚智信工程咨询集团股份有限公司 | Engineering cost data correction method, system, equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107786545A (en) * | 2017-09-29 | 2018-03-09 | 中国平安人寿保险股份有限公司 | A kind of attack detection method and terminal device |
CN109299164A (en) * | 2018-09-03 | 2019-02-01 | 中国平安人寿保险股份有限公司 | A kind of data query method, computer readable storage medium and terminal device |
CN109656487A (en) * | 2018-12-24 | 2019-04-19 | 平安科技(深圳)有限公司 | A kind of data processing method, device, equipment and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8374986B2 (en) * | 2008-05-15 | 2013-02-12 | Exegy Incorporated | Method and system for accelerated stream processing |
CN103078808B (en) * | 2012-12-29 | 2015-09-30 | 大连环宇移动科技有限公司 | The data flow being applicable to multithread matching regular expressions exchanges multiplex system and method |
CN107766466A (en) * | 2017-09-29 | 2018-03-06 | 上海望友信息科技有限公司 | Recognition methods, system, computer-readable recording medium and the equipment of data type |
CN107729475B (en) * | 2017-10-16 | 2021-07-02 | 深圳视界信息技术有限公司 | Webpage element acquisition method, device, terminal and computer-readable storage medium |
2019
- 2019-05-21 CN CN201910423175.6A patent/CN110245155A/en active Pending
- 2019-08-28 WO PCT/CN2019/103039 patent/WO2020232880A1/en active Application Filing
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113660530A (en) * | 2021-07-27 | 2021-11-16 | 中央广播电视总台 | Program stream data capturing method and device, computer equipment and readable storage medium |
CN113660530B (en) * | 2021-07-27 | 2024-03-19 | 中央广播电视总台 | Program stream data grabbing method and device, computer equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2020232880A1 (en) | 2020-11-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |