CN109726284A

CN109726284A - A highly versatile data analysis method

Info

Publication number: CN109726284A
Application number: CN201811497451.5A
Authority: CN
Inventors: 曾小强; 徐滢
Original assignee: Chengdu Pinguo Technology Co Ltd
Current assignee: Chengdu Pinguo Technology Co Ltd
Priority date: 2018-12-07
Filing date: 2018-12-07
Publication date: 2019-05-07
Anticipated expiration: 2038-12-07
Also published as: CN109726284B

Abstract

The present invention discloses a highly versatile data analysis method in the technical field of data analysis, comprising the following steps: S1, splitting the raw data to be processed line by line to obtain at least one text line; S2, setting at least one data configuration table, whose settings include a regular expression, keywords, a sampling number, a value type, and an interval time; S3, sending each text line obtained in step S1, in turn, to all data configuration tables set in step S2 simultaneously for data classification and extraction, obtaining data classification and extraction results; S4, generating charts from the data classification and extraction results obtained in step S3 to obtain the data analysis result. The method is versatile, easy to operate, easy to maintain, and independent of the raw data. It solves the prior-art problems that extracting and analyzing data is cumbersome, time-consuming, and inefficient, and that data analysis lacks generality.

Description

A highly versatile data analysis method

Technical field

The present invention relates to the technical field of data analysis, and in particular to a highly versatile data analysis method.

Background art

During program development, debugging-related log information is often printed, and engineers frequently extract the data in this information and compute statistics on it to obtain reference data for improving the program. However, the formats of the various kinds of information are inconsistent, the data volume is large, and the data structures are complex. In addition, charts often have to be drawn from the data before its trends become visible. As a result, extracting and analyzing the data is cumbersome, time-consuming, and inefficient for engineers. The same problems arise outside program development, for example when processing text data such as financial statements and statistical tables.

The main existing approaches to data analysis are as follows:

First, use a sophisticated text editor, which can apply the same operation to all lines at once, to screen and filter the data; then import the data into office software to tabulate it; and finally compute the required figures, such as the average and the maximum;

Second, write a program tailored to the data format to extract and export the data;

Third, build the statistics code into the program during development and output the statistics through logs or similar means.

Defects of the prior art:

First: much data has a complex format, and information in several different formats may coexist, so operating manually with text editing software is both time-consuming and error-prone;

Second: poor generality. Because the debugging information in a project varies widely, it is impractical to develop a program for every set of information, and maintainability is very poor;

Third: built-in statistics code makes the program less stable, may introduce new problems, and involves a huge workload.

Summary of the invention

To solve the above problems, the present invention provides a highly versatile data analysis method that solves the prior-art problems that extracting and analyzing data is cumbersome, time-consuming, and inefficient, and that data analysis lacks generality.

To this end, the technical solution adopted by the present invention is as follows:

A highly versatile data analysis method is provided, comprising the following steps:

S1: split the raw data to be processed line by line to obtain at least one text line;

S2: set at least one data configuration table, whose settings include: a regular expression, keywords, a sampling number, a value type, and an interval time;

S3: send each text line obtained in step S1, in turn, to all data configuration tables set in step S2 simultaneously for data classification and extraction, obtaining data classification and extraction results;

S4: generate charts from the data classification and extraction results obtained in step S3 to obtain the data analysis result.
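Steps S1 to S3 can be illustrated with a minimal Python sketch. It is not the patented implementation: the dictionary layout of a configuration table, the function name `analyse`, and the sample log text are all assumptions made for illustration, the tables are visited in a plain loop rather than simultaneously, and sampling, interval filtering, and chart generation (S4) are left out.

```python
import re
from collections import defaultdict


def analyse(raw_text, config_tables):
    """Split raw text into lines (S1) and offer every line to every table (S3)."""
    results = defaultdict(list)
    for line in raw_text.splitlines():                 # S1: one unit per line
        for table in config_tables:                    # S3: each line meets all tables
            match = re.search(table["regex"], line)
            if match is None:
                continue                               # line is done for this table
            # Parenthesized groups map one-to-one onto the table's keywords.
            record = dict(zip(table["keywords"], match.groups()))
            results[table["name"]].append(record)
    return dict(results)


# Hypothetical configuration tables (S2) and input, for illustration only.
tables = [
    {"name": "temp", "regex": r"(\d+:\d+) temp=(\d+)", "keywords": ["time", "temp"]},
    {"name": "load", "regex": r"(\d+:\d+) load=(\d+)", "keywords": ["time", "load"]},
]
raw = "10:00 temp=41\n10:01 load=7\n10:02 temp=43"
extracted = analyse(raw, tables)
print(extracted)
```

Adding a new kind of data to the analysis then only requires another entry in `tables`; the loop itself does not change.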

Further, in step S3, each of the data configuration tables simultaneously and separately performs the following steps:

S31: the text lines obtained in step S1 are sent in turn to a given data configuration table for data classification and extraction;

The given data configuration table is referred to as the current data configuration table. The specific steps for data classification and extraction between a single text line and the current data configuration table are as follows:

The text line undergoing data classification and extraction with the current data configuration table is referred to as the current text line;

S311: initialize the sample count and the line count of the current data configuration table to zero;

S312: match the current text line against the regular expression of the current data configuration table. If it does not match, mark the current text line as processed with the current data configuration table and wait for the current text line to finish matching against the remaining data configuration tables. If it matches, extract from the current text line the data corresponding to the parenthesized groups of the regular expression in the current data configuration table, increment the sample count by 1 and the line count by 1, and obtain the sample count and line count of the current data configuration table;

S313: according to the keyword order in the current data configuration table, assign the corresponding values to the keywords of the current data configuration table and generate the data table name and the time axis. If the data table name does not exist, create the corresponding data table with its summary storage unit, temporary storage unit, and chart storage unit; if the data table name already exists, reuse the existing data table. Combine the time axis and the value in the text line corresponding to the name keyword into a key-value pair and add it to the summary storage unit of that data table;

S314: compare the sample count of the current data configuration table obtained in S312 with the sampling number. If the sample count is less than the sampling number, add the value corresponding to the name keyword to the temporary storage unit of the corresponding data table, mark the current text line as processed with the current data configuration table, and wait for the current text line to be marked as processed with the remaining data configuration tables. If the sample count equals the sampling number, add the value corresponding to the name keyword to the temporary storage unit of the corresponding data table, take the corresponding value from the temporary storage unit according to the value type set in the data configuration table, add it to the chart storage unit, reset the sample count to zero, and clear the temporary storage unit;

S32: once a text line has passed completely through steps S311-S314, each subsequent text line undergoes data classification and extraction with the current data configuration table by performing the following steps in turn:

S321: match the current text line against the regular expression of the current data configuration table. If it does not match, mark the current text line as processed with the current data configuration table and wait for the current text line to finish matching against the remaining data configuration tables. If it matches, extract from the current text line the data corresponding to the parenthesized groups of the regular expression in the current data configuration table, increment the sample count by 1 and the line count by 1, and obtain the sample count and line count of the current data configuration table;

S322: compare the timestamp difference with the interval time. If the difference is less than or equal to the interval time, mark the current text line as processed with the current data configuration table and wait for the current text line to be marked as processed with the remaining data configuration tables. If the difference is greater than the interval time, then, according to the keyword order in the current data configuration table, assign the corresponding values to the keywords of the current data configuration table and generate the data table name and the time axis. If the data table name does not exist, create the corresponding data table with its summary storage unit, temporary storage unit, and chart storage unit; if the data table name already exists, reuse the existing data table. Combine the time axis and the value in the text line corresponding to the name keyword into a key-value pair and add it to the summary storage unit of that data table. The timestamp difference is the difference between the current timestamp and the timestamp at which the last fully processed text line in the current data configuration table finished processing;

The last fully processed text line in the current data configuration table is the most recent text line that passed completely through steps S311-S314 or S321-S323 in that table, that is, a text line that was not marked as processed partway through to wait for matching against the remaining data configuration tables.

S323: compare the sample count of the current data configuration table obtained in S321 with the sampling number. If the sample count is less than the sampling number, add the value corresponding to the name keyword to the temporary storage unit of the corresponding data table, mark the current text line as processed with the current data configuration table, and wait for the current text line to be marked as processed with the remaining data configuration tables. If the sample count equals the sampling number, add the value corresponding to the name keyword to the temporary storage unit of the corresponding data table, take the corresponding value from the temporary storage unit according to the value type set in the data configuration table, add it to the chart storage unit, reset the sample count to zero, and clear the temporary storage unit.

Every text line undergoes data classification and extraction with all data configuration tables simultaneously, following steps S311-S314 or steps S321-S323. Equivalently, each data configuration table performs data classification and extraction on all text lines.

Steps S314 and S323 mainly generate the data of the chart dictionary and do not affect the summary dictionary. A chart with too many data points looks cluttered and is hard to analyze, so these steps filter out some of the chart data.
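The sampling behaviour of steps S314 and S323 can be sketched as a small class. This is an illustration under assumptions, not the patent's code: the class name `Sampler` and the reducer names are hypothetical, the sample count is incremented inside `add` (the method increments it in S312/S321), and the reduced value is keyed by the first timestamp in the temporary storage, as the embodiment's example suggests.

```python
from statistics import mean

# Value types named in the method: average, maximum, minimum.
REDUCERS = {"average": mean, "max": max, "min": min}


class Sampler:
    def __init__(self, sampling_n, value_type):
        self.sampling_n = sampling_n
        self.reduce = REDUCERS[value_type]
        self.sample_count = 0
        self.temp = {}    # temporary storage unit: timestamp -> value
        self.chart = {}   # chart storage unit: timestamp -> reduced value

    def add(self, timestamp, value):
        self.sample_count += 1
        self.temp[timestamp] = value
        if self.sample_count == self.sampling_n:
            # Enough samples: reduce them to one chart point, then reset.
            first_ts = next(iter(self.temp))
            self.chart[first_ts] = self.reduce(self.temp.values())
            self.sample_count = 0
            self.temp.clear()


sampler = Sampler(sampling_n=3, value_type="max")
for ts, fps in [("t1", 29), ("t2", 29), ("t3", 31), ("t4", 30)]:
    sampler.add(ts, fps)
print(sampler.chart)
```

With a sampling number of 3, four input values yield a single chart point, and the fourth value waits in the temporary storage for two more samples.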

Further, the settings of the data configuration table in step S2 also include a time period, in which case the following step is performed between steps S313 and S314 and between steps S322 and S323:

Determine whether the time axis falls within the time period. If it does, update the summary dictionary; if it does not, mark the text line as processed with that data configuration table.
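A minimal sketch of the time-period check follows. It assumes the period notation used in the embodiment (start/end pairs separated by commas, timestamps without a year) and treats both bounds as inclusive; the helper names `parse_slots` and `in_any_slot` are hypothetical.

```python
from datetime import datetime

FMT = "%m-%d %H:%M:%S.%f"  # e.g. "09-30 10:45:39.880"; the year is absent


def parse_slots(spec):
    """Turn 'start/end, start/end' into a list of (start, end) datetimes."""
    slots = []
    for part in spec.split(","):
        start, end = part.strip().split("/")
        slots.append((datetime.strptime(start, FMT), datetime.strptime(end, FMT)))
    return slots


def in_any_slot(timestamp, slots):
    """True if the timestamp lies inside at least one period (bounds inclusive)."""
    t = datetime.strptime(timestamp, FMT)
    return any(start <= t <= end for start, end in slots)


slots = parse_slots(
    "09-30 10:45:13.298/09-30 10:45:44.063, 09-30 10:50:17.311/09-30 10:50:47.441"
)
print(in_any_slot("09-30 10:45:39.880", slots))
print(in_any_slot("09-30 10:46:00.000", slots))
```

A line whose time axis falls outside every period is simply marked as processed for that table, so it never reaches the summary dictionary.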

Further, the value type is the average, the maximum, or the minimum.

Further, the keywords include: at least one identifier; one time axis; and at least one name.

Further, the keywords also include an invalid keyword.

Further, in the data configuration table, the regular expression is set according to the text lines in step S1, and parentheses are then added to the regular expression in one-to-one correspondence with the keywords.

A regular expression is a logical formula for operating on character strings, which consist of ordinary characters (for example, the letters a to z) and special characters (called "metacharacters"). It uses predefined specific characters, and combinations of them, to form a "rule string" that expresses a filtering logic for strings. A regular expression is a text pattern that describes one or more strings to match when searching text.
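The one-to-one correspondence between parenthesized groups and keywords can be shown with Python's standard `re` module. The log line, the pattern, and the keyword names below are made up for illustration and do not come from the patent.

```python
import re

# Three parenthesized groups, matched in order to three keywords.
pattern = re.compile(r"(\d{2}:\d{2}:\d{2}) \[(\w+)\] latency=(\d+)ms")
keywords = ["time", "flag", "latency"]

line = "12:00:01 [render] latency=16ms"
match = pattern.search(line)

# groups() returns the captured substrings in the order of the parentheses.
fields = dict(zip(keywords, match.groups()))
print(fields)
```

The number of groups in the compiled pattern (`pattern.groups`) should equal the number of keywords; a mismatch means the configuration is inconsistent.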

All data configuration tables of the method can be used to process either existing data or real-time data; they simply produce different output depending on the configured parameters. For example, if the interval time is set to 500 milliseconds, then for data generated in real time a new valid data item is produced only more than 500 milliseconds after the previous valid data item, and the data arriving in between is filtered out. For existing data, however, the computer processes the data quickly and may finish all of it within 500 milliseconds, so almost no valid data would be produced. The interval time is therefore generally set to 0 for existing data, while for data generated in real time it may be set to 0 or any other value. For this reason the method does not specify whether the raw data is existing data or data generated in real time; the user sets up the configuration tables as needed.
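The effect described above follows from comparing processing-time timestamps with the interval time. The sketch below assumes exactly that reading; the class name `IntervalGate` and the injected fake clock are illustrative devices, and a line that passes the gate is treated as the "last fully processed" line for simplicity.

```python
import time


class IntervalGate:
    """Accept a line only if more than interval_ms has passed since the last accepted one."""

    def __init__(self, interval_ms, clock=time.monotonic):
        self.interval_s = interval_ms / 1000.0
        self.clock = clock            # injectable so the behaviour can be simulated
        self.last_accepted = None

    def accept(self):
        now = self.clock()
        if self.last_accepted is not None and now - self.last_accepted <= self.interval_s:
            return False              # difference <= interval: line is filtered out
        self.last_accepted = now      # first line, or difference > interval
        return True


def fake_clock(times):
    it = iter(times)
    return lambda: next(it)


def run(interval_ms, arrival_times):
    gate = IntervalGate(interval_ms, clock=fake_clock(arrival_times))
    return [gate.accept() for _ in arrival_times]


realtime = run(500, [0.0, 0.51, 1.02])      # lines arriving about every 510 ms
batch = run(500, [0.0, 0.001, 0.002])       # existing data processed within 2 ms
batch_zero = run(0, [0.0, 0.001, 0.002])    # same data with interval time 0
print(realtime, batch, batch_zero)
```

The simulated batch run keeps only the first line when the interval is 500 milliseconds, which is why an interval time of 0 is the usual choice for existing data.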

Beneficial effects of this technical solution:

1. Compared with the prior art, the method of the present invention allows the data configuration tables to be configured flexibly according to actual needs. It can be applied to data files that have already been generated as well as to data generated in real time, and it can produce different outputs simply by configuring different data configuration tables.

2. The method is extremely versatile. Because it matches by regular expression, it can match almost any data that can be converted to text.

3. The method can generate various charts, making it more convenient to store and view data statistics.

4. The method is independent of the raw data and requires no modification to it.

Brief description of the drawings

Fig. 1 is a flow chart of the method of the present invention;

Fig. 2 is a data analysis result chart generated in real time in an embodiment of the method of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described below with reference to the drawings and a specific embodiment.

In this embodiment, as shown in Figs. 1 and 2, a highly versatile data analysis method comprises the following steps:

S1: split the raw data to be processed line by line to obtain N text lines;

The resulting text lines are the following image data:

LINE1: 09-30 10:45:39.880 1481 2124 D HawkeyeD:onGlFpsCalculated, key=Activity_vStudio.Android.Camera360.activity.CameraMainActivity, lastFps=29

LINE2: 09-30 10:45:40.393 1481 2124 D HawkeyeD:onGlFpsCalculated, key=Activity_vStudio.Android.Camera360.activity.CameraMainActivity, lastFps=29

LINE3: 09-30 10:45:40.902 1481 2124 D HawkeyeD:onGlFpsCalculated, key=Activity_vStudio.Android.Camera360.activity.CameraMainActivity, lastFps=31

LINE4: 09-30 10:45:41.410 1481 2124 D HawkeyeD:onGlFpsCalculated, key=Activity_vStudio.Android.Camera360.activity.CameraMainActivity, lastFps=29

LINE5: 09-30 10:45:41.940 1481 2124 D HawkeyeD:onGlFpsCalculated, key=Activity_vStudio.Android.Camera360.activity.CameraMainActivity, lastFps=30

LINE6: 09-30 10:45:42.454 1481 2124 D HawkeyeD:onGlFpsCalculated, key=Activity_vStudio.Android.Camera360.activity.CameraMainActivity, lastFps=29

LINE7: 09-30 10:45:42.995 1481 2124 D HawkeyeD:onGlFpsCalculated, key=Activity_vStudio.Android.Camera360.activity.CameraMainActivity, lastFps=29

...

LINEN: 09-30 10:45:42.995 1481 2124 D HawkeyeD:onGlFpsCalculated, key=Activity_vStudio.Android.Camera360.activity.CameraMainActivity, lastFps=29
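Step S1 amounts to splitting the raw text on line breaks. The sketch below uses three of the log lines above, with the process and thread IDs written as plain integers (an assumption about the original log); skipping blank lines is also an assumption, since the method does not say how they are handled.

```python
raw_log = (
    "09-30 10:45:39.880 1481 2124 D HawkeyeD:onGlFpsCalculated, "
    "key=Activity_vStudio.Android.Camera360.activity.CameraMainActivity, lastFps=29\n"
    "09-30 10:45:40.393 1481 2124 D HawkeyeD:onGlFpsCalculated, "
    "key=Activity_vStudio.Android.Camera360.activity.CameraMainActivity, lastFps=29\n"
    "\n"
    "09-30 10:45:40.902 1481 2124 D HawkeyeD:onGlFpsCalculated, "
    "key=Activity_vStudio.Android.Camera360.activity.CameraMainActivity, lastFps=31\n"
)

# str.splitlines handles \n, \r\n and \r; blank lines are dropped here.
text_lines = [line for line in raw_log.splitlines() if line.strip()]

for number, line in enumerate(text_lines, start=1):
    print(f"LINE{number}: {line}")
```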

S2: set M data configuration tables config_table1, 2, 3, ..., M. The settings in each data configuration table include: a regular expression, keywords, a sampling number, a value type, a time period, and an interval time;

In the data configuration table, the regular expression is set according to the data structure of the text lines in step S1, and parentheses are then added to the regular expression in one-to-one correspondence with the keywords.

The keywords include: identifier flag; time axis time; invalid invalid; name: frame rate.

The regular expression is: ([\d-]+\s[\d:.]+)[\d\s]+D HawkeyeD:(onGlFpsCalculated), key=Activity_([\w.]+), lastFps=(\d+);

The keyword order corresponding to the parentheses of the regular expression is: time; invalid; flag; frame rate;

Set the sampling number sampling_n: 3;

Set the value type sampling_type: average;

Set the interval time sampling_time: 500 milliseconds;

Set the time periods time_slots: 09-30 10:45:13.298/09-30 10:45:44.063, 09-30 10:50:17.311/09-30 10:50:47.441;
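The settings of config_table1 can be gathered into one structure, as in the following sketch. The `ConfigTable` dataclass, its defaults, and the keyword spelling `frame_rate` are hypothetical; the regular expression is written with the backslash escapes (`\d`, `\s`, `\w`) that the pattern needs in order to match the sample log lines, which is an assumption about the original.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ConfigTable:
    regex: str
    keywords: list
    sampling_n: int = 1               # sampling number
    sampling_type: str = "average"    # value type: average, max or min
    sampling_time: int = 0            # interval time in milliseconds
    time_slots: list = field(default_factory=list)

    def __post_init__(self):
        # Parenthesized groups and keywords must correspond one-to-one.
        if re.compile(self.regex).groups != len(self.keywords):
            raise ValueError("number of groups and keywords differ")


config_table1 = ConfigTable(
    regex=(r"([\d-]+\s[\d:.]+)[\d\s]+D HawkeyeD:(onGlFpsCalculated), "
           r"key=Activity_([\w.]+), lastFps=(\d+)"),
    keywords=["time", "invalid", "flag", "frame_rate"],
    sampling_n=3,
    sampling_type="average",
    sampling_time=500,
    time_slots=[
        ("09-30 10:45:13.298", "09-30 10:45:44.063"),
        ("09-30 10:50:17.311", "09-30 10:50:47.441"),
    ],
)
print(config_table1.sampling_n, config_table1.sampling_type)
```

Validating the group count when the table is created catches a mismatched configuration before any data is processed.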

S3: send LINE1, LINE2, ..., LINEN in turn to the M data configuration tables simultaneously for data classification and extraction, obtaining the data processing result;

In step S3, each of the data configuration tables simultaneously and separately performs the following steps:

The text line undergoing data classification and extraction with the current data configuration table config_table1 is referred to as the current text line LINE1;

S311: initialize the sample count sampling_count=0 and the line count line_count=0 of the current data configuration table config_table1;

Specifically, the process of data classification and extraction between LINE1 and the data configuration table config_table1 is as follows:

Step 1: initialize the sample count sampling_count=0 and the line count line_count=0 of config_table1;

Step 2: match LINE1 against the regular expression of config_table1. If it does not match, mark LINE1 as processed with config_table1 and wait for LINE1 to be marked as processed with all remaining data configuration tables. If it matches, extract from LINE1 the data corresponding to the parenthesized groups of the regular expression in config_table1, increment the sample count by 1, and increment the line count by 1. Specifically, the values of the parenthesized groups are extracted as follows: the time axis keyword time corresponds to the group ([\d-]+\s[\d:.]+), and its value is time '09-30 10:45:39.880'; the keyword invalid corresponds to the group (onGlFpsCalculated), and its value is invalid 'onGlFpsCalculated'; the identifier keyword flag corresponds to the group ([\w.]+), and its value is flag 'vStudio.Android.Camera360.activity.CameraMainActivity'; the name keyword frame rate corresponds to the group (\d+), and its value is '29';
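Step 2 can be reproduced with Python's `re` module, as sketched below. The regular expression is the embodiment's pattern written with backslash escapes, and LINE1 is written with plain integer process and thread IDs; both are assumptions about the original text. The keyword spelling `frame_rate` is a hypothetical identifier for the frame rate name keyword.

```python
import re

REGEX = re.compile(
    r"([\d-]+\s[\d:.]+)[\d\s]+D HawkeyeD:(onGlFpsCalculated), "
    r"key=Activity_([\w.]+), lastFps=(\d+)"
)
KEYWORDS = ["time", "invalid", "flag", "frame_rate"]

LINE1 = (
    "09-30 10:45:39.880 1481 2124 D HawkeyeD:onGlFpsCalculated, "
    "key=Activity_vStudio.Android.Camera360.activity.CameraMainActivity, lastFps=29"
)


def extract(line):
    """Return keyword -> captured text, or None when the line does not match."""
    match = REGEX.search(line)
    if match is None:
        return None
    return dict(zip(KEYWORDS, match.groups()))


fields = extract(LINE1)
print(fields)
```

A line that returns `None` is the "does not match" branch of Step 2: it is marked as processed for this table and nothing else happens to it.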

Step 3:

A. Create data_table1=/vStudio.Android.Camera360.activity.CameraMainActivity/09-30 10:45:13.298/09-30 10:45:44.063; data_table1=/vStudio.Android.Camera360.activity.CameraMainActivity/09-30 10:50:17.311/09-30 10:50:47.441;

B. data_time=09-30 10:45:39.880;

C. name=frame rate; create the summary dictionary data_frame_rate_X, the chart dictionary data_chart_frame_rate_X, and the temporary storage dictionary data_chart_frame_rate_X;

Determine whether the time axis falls within the time period. If so, update the summary dictionary; if not, mark LINE1 as processed with config_table1.

Update the summary dictionary: the summary dictionary data_frame_rate_X adds the data {'09-30 10:45:39.880': 29};

Update the chart dictionary: the chart dictionary data_chart_frame_rate_X adds the data {'09-30 10:45:39.880': 29};
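The "create if absent, otherwise reuse" handling of data tables in Step 3 can be sketched as follows. The name layout (identifier, then period start and end, separated by slashes) follows the data_table1 example above; the function names and the three-dictionary layout of a table are assumptions made for illustration.

```python
def table_name(flag, slot):
    """Build a data table name from the identifier and one time period."""
    start, end = slot
    return f"/{flag}/{start}/{end}"


def get_or_create(tables, name):
    """Create the table with its three storage units, or reuse the existing one."""
    if name not in tables:
        tables[name] = {"summary": {}, "temp": {}, "chart": {}}
    return tables[name]


tables = {}
slot = ("09-30 10:45:13.298", "09-30 10:45:44.063")
name = table_name("vStudio.Android.Camera360.activity.CameraMainActivity", slot)

table = get_or_create(tables, name)
table["summary"]["09-30 10:45:39.880"] = 29    # time axis -> frame rate value

same_table = get_or_create(tables, name)       # second call reuses the table
print(name)
print(same_table["summary"])
```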

Step 4: if the temporary storage dictionary data_chart_frame_rate_X contains the data {'09-30 10:45:39.880': 29, '09-30 10:45:40.393': 29, '09-30 10:45:40.902': 31}, then the average = 29.667, and the chart dictionary data_chart_frame_rate_X adds {'09-30 10:45:39.880': 29.667};
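The average in Step 4 can be checked by hand from the three frame rate values held in the temporary storage dictionary; the symbol for the average is notation introduced here, not taken from the source:

```latex
\[
\bar{v} = \frac{29 + 29 + 31}{3} = \frac{89}{3} \approx 29.667
\]
```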

While LINE1 performs steps 1-4 with config_table1, it simultaneously performs steps 1-4 with the remaining data configuration tables. Therefore, when LINE1 has completed steps 1-4 with config_table1, it has also completed steps 1-4 with the remaining data configuration tables.

Similarly, while each remaining text line in turn performs steps 1-4 with config_table1, it simultaneously performs steps 1-4 with the remaining data configuration tables. From the perspective of a data configuration table, each table performs data classification and extraction on all text lines in turn and stores the extracted, classified data.

The basic principles, main features, and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited to the above embodiment; the embodiment and the description merely illustrate the principles of the invention. Various changes and improvements may be made to the invention without departing from its spirit and scope, and all such changes and improvements fall within the scope of the claimed invention. The claimed scope of the invention is defined by the appended claims and their equivalents.

Claims

1. A highly versatile data analysis method, characterized by comprising the following steps:

S1: split the raw data to be processed line by line to obtain at least one text line;

S2: set at least one data configuration table, whose settings include: a regular expression, keywords, a sampling number, a value type, and an interval time;

S3: send each text line obtained in step S1, in turn, to all data configuration tables set in step S2 simultaneously for data classification and extraction, obtaining data classification and extraction results;

S4: generate charts from the data classification and extraction results obtained in step S3 to obtain the data analysis result.

2. The method according to claim 1, characterized in that, in step S3, each of the data configuration tables simultaneously and separately performs the following steps:

S31: the text lines obtained in step S1 are sent in turn to a given data configuration table for data classification and extraction;

The given data configuration table is referred to as the current data configuration table. The specific steps for data classification and extraction between a single text line and the current data configuration table are as follows:

The text line undergoing data classification and extraction with the current data configuration table is referred to as the current text line;

S311: initialize the sample count and the line count of the current data configuration table to zero;

S312: match the current text line against the regular expression of the current data configuration table. If it does not match, mark the current text line as processed with the current data configuration table and wait for the current text line to finish matching against the remaining data configuration tables. If it matches, extract from the current text line the data corresponding to the parenthesized groups of the regular expression in the current data configuration table, increment the sample count by 1 and the line count by 1, and obtain the sample count and line count of the current data configuration table;

S313: according to the keyword order in the current data configuration table, assign the corresponding values to the keywords of the current data configuration table and generate the data table name and the time axis. If the data table name does not exist, create the corresponding data table with its summary storage unit, temporary storage unit, and chart storage unit; if the data table name already exists, reuse the existing data table. Combine the time axis and the value in the text line corresponding to the name keyword into a key-value pair and add it to the summary storage unit of that data table;

S314: compare the sample count of the current data configuration table obtained in S312 with the sampling number. If the sample count is less than the sampling number, add the value corresponding to the name keyword to the temporary storage unit of the corresponding data table, mark the current text line as processed with the current data configuration table, and wait for the current text line to be marked as processed with the remaining data configuration tables. If the sample count equals the sampling number, add the value corresponding to the name keyword to the temporary storage unit of the corresponding data table, take the corresponding value from the temporary storage unit according to the value type set in the data configuration table, add it to the chart storage unit, reset the sample count to zero, and clear the temporary storage unit;

S32: once a text line has passed completely through steps S311-S314, each subsequent text line undergoes data classification and extraction with the current data configuration table by performing the following steps in turn:

S321: match the current text line against the regular expression of the current data configuration table. If it does not match, mark the current text line as processed with the current data configuration table and wait for the current text line to finish matching against the remaining data configuration tables. If it matches, extract from the current text line the data corresponding to the parenthesized groups of the regular expression in the current data configuration table, increment the sample count by 1 and the line count by 1, and obtain the sample count and line count of the current data configuration table;

S322: compare the timestamp difference with the interval time. If the difference is less than or equal to the interval time, mark the current text line as processed with the current data configuration table and wait for the current text line to be marked as processed with the remaining data configuration tables. If the difference is greater than the interval time, then, according to the keyword order in the current data configuration table, assign the corresponding values to the keywords of the current data configuration table and generate the data table name and the time axis. If the data table name does not exist, create the corresponding data table with its summary storage unit, temporary storage unit, and chart storage unit; if the data table name already exists, reuse the existing data table. Combine the time axis and the value in the text line corresponding to the name keyword into a key-value pair and add it to the summary storage unit of that data table. The timestamp difference is the difference between the current timestamp and the timestamp at which the last fully processed text line in the current data configuration table finished processing;

S323: compare the sample count of the current data configuration table obtained in S321 with the sampling number. If the sample count is less than the sampling number, add the value corresponding to the name keyword to the temporary storage unit of the corresponding data table, mark the current text line as processed with the current data configuration table, and wait for the current text line to be marked as processed with the remaining data configuration tables. If the sample count equals the sampling number, add the value corresponding to the name keyword to the temporary storage unit of the corresponding data table, take the corresponding value from the temporary storage unit according to the value type set in the data configuration table, add it to the chart storage unit, reset the sample count to zero, and clear the temporary storage unit.

3. The method according to claim 2, characterized in that the settings of the data configuration table in step S2 also include a time period, in which case the following step is performed between steps S313 and S314 and between steps S322 and S323:

Determine whether the time axis falls within the time period. If it does, update the summary dictionary; if it does not, mark the text line as processed with that data configuration table.

4. The method according to claim 1 or 2, characterized in that the value type is the average, the maximum, or the minimum.

5. The method according to claim 1 or 2, characterized in that the keywords include: at least one identifier; one time axis; and at least one name.

6. The method according to claim 5, characterized in that the keywords also include an invalid keyword.

7. The method according to claim 1, characterized in that, in the data configuration table, the regular expression is set according to the text lines in step S1, and parentheses are then added to the regular expression in one-to-one correspondence with the keywords.