CN110287308A - A kind of computer data formula statistical method - Google Patents
- Publication number
- CN110287308A (application CN201910513867.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- statistical
- computer data
- statistical method
- dimension
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/338—Presentation of query results
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a computer data formula statistical method comprising: Step A, initialization: cleaning and correcting the statistical data; Step B, formatting: adjusting the dimensions of the cleaned and corrected statistical data and filling in missing data dimensions; Step C, sorting and storage: screening the dimension-adjusted statistical data into blocks; Step D, calculation and display: presenting the screened statistical data visually. The invention addresses the need to screen massive data on demand: it extracts from text data, and database data can be displayed directly. Through a new data model it avoids the high complexity and the deployment and operations cost of existing databases, widens the range of applicable calculations, speeds up the exploration of patterns in data, reduces complexity, and is easier to use.
Description
Technical field
The present invention relates to the technical field of data structure computing, and in particular to a computer data formula statistical method.
Background
MongoDB is a database based on distributed document storage, written in C++, that aims to provide a scalable, high-performance data storage solution for web applications.
MongoDB sits between relational and non-relational databases: among non-relational databases it is the most feature-rich and the most like a relational database. The data structures it supports are very loose; it uses BSON, a JSON-like format, so it can store fairly complex data types. MongoDB's most notable feature is its powerful query language, whose syntax somewhat resembles an object-oriented query language. It can perform almost all of the functions of a single-table query in a relational database, and it also supports building indexes on the data.
The existing open-source technology MongoDB is a non-relational database based on distributed document storage, with a loose key-value data structure. However, it is only a solution for distributed data storage. For computation over the data, in particular displaying global patterns over elastic (variable-length) time intervals, and for calculation more flexible than template formulas, it has shortcomings: it is deficient at extracting from text data, and database data cannot be displayed directly.
Summary of the invention
The technical problem to be solved by the invention is to provide a computer data formula statistical method that addresses the need to screen massive data on demand, extracts from text data, displays database data directly, widens the range of applicable calculations, speeds up the exploration of patterns in data, reduces complexity, and is easier to use.
To solve the above technical problem, the invention proposes a computer data formula statistical method comprising:
Step A, initialization: cleaning and correcting the statistical data;
Step B, formatting: adjusting the dimensions of the cleaned and corrected statistical data and filling in missing data dimensions;
Step C, sorting and storage: screening the dimension-adjusted statistical data into blocks;
Step D, calculation and display: presenting the screened statistical data visually.
Further, Step A comprises:
A1. desensitizing the sensitive data in statistical data that does not conform to the structured data format, and replacing keywords in the statistical data;
A2. correcting statistical data that has structural errors.
Further, Step A1 can operate directly on the raw text data rather than obtaining database information, which widens the scope of application.
Further, the structural error is a time format error.
Further, Step B comprises filling insufficiently structured data with a default null value or a specified value.
Further, Step C comprises sorting out the corresponding statistical data from the given text according to a specified time-zone vernier, computing statistics once per time-zone vernier and storing them in the document database CelDB until the calculation is complete.
Further, Step C is implemented by the vernier integration method.
Further, Step D comprises displaying the change process in different charts or dynamically along a time axis.
The above technical solution has at least the following beneficial effects:
1. The invention addresses the need to screen massive data on demand; it extracts from text data, and database data can be displayed directly.
2. Through a new data model, the invention avoids the high complexity and the deployment and operations cost of existing databases.
3. For the highly complex dynamic-behavior problems that exist today, the invention only requires the elastic time zone to be specified in order to compute statistics over complicated, disordered data and display them intuitively.
Specific embodiment
It should be noted that, where no conflict arises, the embodiments of the present application and the features in them may be combined with one another. The invention is described further below.
The computer data formula statistical method of the invention comprises the following steps:
Step A, initialization: clean and correct the statistical data.
A1. Desensitize the sensitive data in statistical data that does not conform to the structured data format, and replace keywords in the statistical data. In actual statistics, the raw text data can be operated on directly, rather than obtaining database information, which widens the scope of application.
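Step A1 can be illustrated with a minimal Python sketch that works on one raw text line at a time. The keyword map and the pattern for a sensitive value are hypothetical; the description does not say which keywords or values are treated as sensitive.

```python
import re

# Hypothetical keyword-to-placeholder map; the description lists no concrete keywords.
REPLACEMENTS = {
    "password": "FIELD_A",
    "id_card": "FIELD_B",
}

# Hypothetical pattern for a sensitive value: any standalone run of 11 digits.
SENSITIVE_VALUE = re.compile(r"\b\d{11}\b")


def desensitize(line: str) -> str:
    """Mask sensitive values, then replace configured keywords, in one raw text line."""
    line = SENSITIVE_VALUE.sub("***", line)
    for keyword, placeholder in REPLACEMENTS.items():
        line = line.replace(keyword, placeholder)
    return line
```

Because the function takes plain strings, it can be applied while streaming a log file, without loading the records into a database first.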
A2. Correct statistical data whose time format is wrong. For example, if the default time format is "2019-03-28 11:59:51.114758" and a time in the statistical data appears as "11:59:51", "2019-03-28", or "2019-03-2811:59:51", it is corrected to the form "year-month-day hour:minute:second.microsecond". Cleaning and correction also include creating the path to be written, checking the files to be sorted, creating the database, and pre-reading by type.
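The time correction in A2 might look like the following Python sketch. The list of accepted input layouts is inferred from the three examples in the text, and the `default_date` parameter is an assumption: the description does not say which date a time-only value such as "11:59:51" should receive.

```python
from datetime import date, datetime

TARGET = "%Y-%m-%d %H:%M:%S.%f"  # year-month-day hour:minute:second.microsecond

# Layouts that carry a date, tried in order (assumed from the examples in the text).
FULL_FORMATS = [
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d%H:%M:%S",  # space between date and time is missing
    "%Y-%m-%d",          # date only; the time defaults to midnight
]
TIME_ONLY_FORMATS = ["%H:%M:%S.%f", "%H:%M:%S"]


def normalize_timestamp(raw: str, default_date: date) -> str:
    """Rewrite a loosely formatted timestamp into the TARGET layout."""
    raw = raw.strip()
    for fmt in FULL_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime(TARGET)
        except ValueError:
            continue
    for fmt in TIME_ONLY_FORMATS:
        try:
            parsed_time = datetime.strptime(raw, fmt).time()
        except ValueError:
            continue
        # A time without a date is attached to the caller-supplied date.
        return datetime.combine(default_date, parsed_time).strftime(TARGET)
    raise ValueError(f"unrecognized time format: {raw!r}")
```

Raising on an unrecognized layout, rather than guessing, keeps malformed records visible so they can be corrected or discarded explicitly.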
Step B, formatting: create structured data and adjust the dimensions of the cleaned and corrected statistical data, filling in missing data dimensions. Insufficiently structured data is filled with a default null value or a specified value.
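As an illustration of the dimension adjustment in Step B, the sketch below pads a record to a fixed number of dimensions. The function name, the `width` parameter, and the use of `None` as the default null value are assumptions made for the example.

```python
from typing import List, Optional


def pad_dimensions(
    fields: List[str], width: int, fill: Optional[str] = None
) -> List[Optional[str]]:
    """Return a copy of `fields` extended to `width` entries using `fill`.

    Records that already have `width` or more entries are returned unchanged.
    """
    missing = max(0, width - len(fields))
    return list(fields) + [fill] * missing
```

For instance, `pad_dimensions(["2019-03-05 00:00:01.288", "ASK"], 4, fill="0")` yields a four-element record whose last two dimensions hold the specified value "0".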
Step C, sorting and storage: screen the dimension-adjusted statistical data into blocks. In the given text, the corresponding statistical data is sorted out according to a specified time-zone vernier. The time-zone vernier (elastic time zone) is the variable time interval that must be specified for the calculation. Its unit is usually ms (milliseconds), the theoretical minimum unit is 1 μs (microsecond), and the usual default is 1000 ms. Preferably, statistics are computed once per elastic time zone and stored in the document database CelDB until the calculation is complete. For example, starting at midnight each day, the frequency of events is counted every 100 microseconds while the elastic time zone moves forward. Preferably, Step C is implemented by the vernier integration method, which meets different requirements more flexibly. The document database (CelDB) stores data in the format { time num1 num2 ... numN } under a specified document path. Screening into blocks means screening data file blocks out of large-scale data by elastic time zone and storing them in the document database CelDB. Sorting and storage include output by type and by dimension (element), creating the document database CelDB, storing the data, and so on.
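The description does not define the vernier integration method in detail. The sketch below therefore substitutes a plain fixed-width (tumbling) window count, which is one possible reading of "count once per elastic time zone while the zone moves forward". The function name and the choice to label each window by its start time are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable


def count_per_window(
    times: Iterable[datetime], start: datetime, width: timedelta
) -> Dict[datetime, int]:
    """Count events in consecutive windows [start + k*width, start + (k+1)*width)."""
    counts: Counter = Counter()
    for t in times:
        if t < start:
            continue  # events before the first window are ignored
        index = (t - start) // width  # timedelta // timedelta gives an int
        counts[start + index * width] += 1
    return dict(sorted(counts.items()))
```

Changing `width` (for example from `timedelta(milliseconds=1000)` to `timedelta(microseconds=100)`) changes the granularity of the statistics without touching the input data, which is the flexibility the elastic time zone is meant to provide.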
Step D, calculation and display: present the screened statistical data visually, showing the change process in different charts or dynamically along a time axis; displaying the change process makes it easier to explore patterns in the data. For example, the data can be output by trend or by proportion (element), and displayed as a static line chart or as a dynamic trend.
Example one: to calculate how many events occur on a device in every 1000 ms, proceed as follows:
1. The event records perceived by a device in some environment have the following form:
2019-03-05 00:00:01.288 ASK … (the ellipsis "…" stands for data in other dimensions)
2019-03-05 00:00:01.290 WRITE …
……
2019-03-05 00:00:03.258 POST …
2019-03-05 00:00:03.288 ERROR …
……
2. Store this data in the document database to be analyzed and specify a time-zone vernier length of 1000 ms.
With the computer data formula statistical method of the invention, two-dimensional data consisting of the vernier time point and the event frequency is obtained and stored in the database:
2019-03-05 00:00:01.288000 1
2019-03-05 00:00:02.288000 13
2019-03-05 00:00:03.288000 68
……
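The stored rows above follow the { time num1 num2 ... numN } layout. A minimal sketch of writing and reading one such row is given below; the space-separated encoding and the function names are assumptions, since the actual file format of CelDB is not specified.

```python
from datetime import datetime
from typing import List, Tuple

TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def format_row(cursor_time: datetime, values: List[int]) -> str:
    """Serialize one result row as 'time num1 num2 ... numN'."""
    return " ".join([cursor_time.strftime(TIME_FORMAT)] + [str(v) for v in values])


def parse_row(line: str) -> Tuple[datetime, List[int]]:
    """Parse a row written by format_row back into a time and its values."""
    parts = line.split()
    # The timestamp itself contains one space, so it spans the first two tokens.
    cursor_time = datetime.strptime(" ".join(parts[:2]), TIME_FORMAT)
    return cursor_time, [int(p) for p in parts[2:]]
```

A row with more dimensions only adds trailing numbers, so the same two functions would cover both the two-dimensional rows of this example and wider rows.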
Example two: to calculate the energy consumption information of certain devices, analyze as follows:
A: Read the records of device (number) and energy consumption rate (mg), which have the following form:
2019-03-15 12:00:00.001 A12345 30 … (the ellipsis "…" stands for data in other dimensions)
2019-03-15 12:00:01.288123 B23456 39 …
2019-03-15 12:00:01.290 A23456 10 …
……
2019-03-15 12:01:01.258 A34567 10 …
2019-03-15 12:01:01.258 A33567 60 …
2019-03-15 12:01:01.258 A35567 30 …
2019-03-15 12:01:01.288 D45678 50 …
2019-03-15 12:01:01.258 A35567 30 …
……
B: Format the timestamps of the above data, call the computer data formula statistical method, and specify a time-zone vernier length of 1 min (minute).
C: Evaluate the data by the vernier integration method. The information has two dimensions: the devices, and the energy consumption rate (the number of readings greater than 40).
D: Data is recorded to CelDB automatically. For example, for the first vernier interval (1 minute: [2019-03-15 12:00:00.001000, 2019-03-15 12:01:00.001000]), from the start time to the 1-minute point, 3 devices were checked and 0 devices had an energy consumption rate above 40:
2019-03-15 12:00:00.001000 3 0 ...
E: Once the data time point 2019-03-15 12:01:01.258000 falls outside the first vernier range, the formula moves automatically to the second vernier range [2019-03-15 12:01:00.001000, 2019-03-15 12:02:00.001000]. The result, three-dimensional information consisting of the vernier time point and the event information, has the following form:
2019-03-15 12:00:00.001000 3 0 ...
2019-03-15 12:01:00.001000 5 2 ...
......
F: Display the above data, either a single dimension or all three dimensions at once, to reveal its patterns; when the data volume is large enough, predictions can be made.
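The aggregation in Example two can be approximated with the sketch below, which groups records into 1-minute windows starting at 12:00:00.001 and reports, per window, the number of records and the number of readings above a threshold. Counting records rather than distinct devices is an assumption; it is chosen because it reproduces the "3 0" and "5 2" rows from the sample records shown.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# (time, device number, energy consumption rate)
Record = Tuple[datetime, str, int]


def summarize(
    records: List[Record], start: datetime, width: timedelta, threshold: int
) -> Dict[datetime, Tuple[int, int]]:
    """Per window: (number of records, number of records with rate > threshold)."""
    summary: Dict[datetime, List[int]] = {}
    for when, _device, rate in records:
        if when < start:
            continue  # records before the first window are ignored
        window_start = start + ((when - start) // width) * width
        row = summary.setdefault(window_start, [0, 0])
        row[0] += 1
        if rate > threshold:
            row[1] += 1
    return {k: (v[0], v[1]) for k, v in sorted(summary.items())}
```

Each entry of the returned mapping corresponds to one stored row: the window start is the vernier time point and the tuple holds the remaining two dimensions.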
In summary, the invention addresses the need to screen massive data on demand: it extracts from text data, and database data can be displayed directly. Through a new data model it avoids the high complexity and the deployment and operations cost of existing databases, widens the range of applicable calculations, speeds up the exploration of patterns in data, reduces complexity, and is easier to use.
The above is a specific embodiment of the invention. It should be noted that those skilled in the art may make various improvements and modifications without departing from the principle of the invention, and these improvements and modifications are also regarded as falling within the protection scope of the invention.
Claims (8)
1. A computer data formula statistical method, characterized by comprising:
Step A, initialization: cleaning and correcting the statistical data;
Step B, formatting: adjusting the dimensions of the cleaned and corrected statistical data and filling in missing data dimensions;
Step C, sorting and storage: screening the dimension-adjusted statistical data into blocks;
Step D, calculation and display: presenting the screened statistical data visually.
2. The computer data formula statistical method of claim 1, characterized in that Step A comprises:
A1. desensitizing the sensitive data in statistical data that does not conform to the structured data format, and replacing keywords in the statistical data;
A2. correcting statistical data that has structural errors.
3. The computer data formula statistical method of claim 2, characterized in that Step A1 can operate directly on the raw text data rather than obtaining database information, which widens the scope of application.
4. The computer data formula statistical method of claim 2, characterized in that the structural error is a time format error.
5. The computer data formula statistical method of claim 1, characterized in that Step D comprises storing the calculation results, in structured form, as a multidimensional data file delimited by spaces.
6. The computer data formula statistical method of claim 1, characterized in that Step C comprises sorting out the corresponding statistical data from the given text according to a specified time-zone vernier, computing statistics once per time-zone vernier and storing them in the document database CelDB until the calculation is complete.
7. The computer data formula statistical method of claim 6, characterized in that Step C is implemented by the vernier integration method.
8. The computer data formula statistical method of claim 1, characterized in that Step D comprises displaying the change process in different charts or dynamically along a time axis.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910513867.XA CN110287308A (en) | 2019-06-13 | 2019-06-13 | A kind of computer data formula statistical method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910513867.XA CN110287308A (en) | 2019-06-13 | 2019-06-13 | A kind of computer data formula statistical method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110287308A true CN110287308A (en) | 2019-09-27 |
Family
ID=68004188
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910513867.XA Pending CN110287308A (en) | 2019-06-13 | 2019-06-13 | A kind of computer data formula statistical method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110287308A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1926546A (en) * | 2004-03-03 | 2007-03-07 | 英国电讯有限公司 | Data handling system |
CN102346750A (en) * | 2010-08-05 | 2012-02-08 | 深圳华强数字动漫有限公司 | Three-dimensional database plug-in management system and method |
CN107908606A (en) * | 2017-10-31 | 2018-04-13 | 上海壹账通金融科技有限公司 | Method and system based on different aforementioned sources automatic report generation |
CN108021664A (en) * | 2017-12-04 | 2018-05-11 | 北京工商大学 | A kind of multidimensional data correlation visual analysis method and system based on dimensional projections |
CN108446391A (en) * | 2018-03-23 | 2018-08-24 | 万帮充电设备有限公司 | Processing method, device, electronic equipment and the computer-readable medium of data |
CN109408549A (en) * | 2018-11-02 | 2019-03-01 | 大连瀚闻资讯有限公司 | Foreign trade big data Visualized Analysis System |
CN111651758A (en) * | 2020-06-08 | 2020-09-11 | 成都安恒信息技术有限公司 | Method for auditing result set of relational database of operation and maintenance auditing system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP2963570A1 (en) | Dynamic selection of source table for db rollup aggregation and query rewrite based on model driven definitions and cardinality estimates | |
US11615076B2 (en) | Monolith database to distributed database transformation | |
CN102667774A (en) | Compensating for unbalanced hierarchies when generating olap queries from report specifications | |
CN109871406B (en) | Design method of general monitoring report platform | |
US20210256063A1 (en) | Ad-hoc graph definition | |
US20190325045A1 (en) | Schema data structure | |
US10394844B2 (en) | Integrating co-deployed databases for data analytics | |
US20190391964A1 (en) | Specifying and applying rules to data | |
US9058215B2 (en) | Integration of a calculation engine with a software component | |
CN110287308A (en) | A kind of computer data formula statistical method | |
US20150134660A1 (en) | Data clustering system and method | |
CN106339293B (en) | A kind of log event extracting method based on signature | |
US11144373B2 (en) | Data pipeline using a pluggable topology connecting components without altering code of the components | |
CN107958046A (en) | Internet finance big data warehouse analysis mining method | |
US20200372019A1 (en) | System and method for automatic completion of queries using natural language processing and an organizational memory | |
CN111159221A (en) | Method for data processing or query through dynamically constructing cube | |
US20210374771A1 (en) | Data analysis support apparatus and data analysis support method | |
JP6346378B2 (en) | Data management apparatus and data management method | |
CN114860819A (en) | Method, device, equipment and storage medium for constructing business intelligent system | |
CN110399396A (en) | Efficient data processing | |
CN107832282A (en) | A kind of implementation method for defining Visual Report Forms | |
US3662402A (en) | Data sort method utilizing finite difference tables | |
US20240176787A1 (en) | Tables time zone adjuster | |
US20220391751A1 (en) | Uncertainty determination | |
US12001710B2 (en) | Dynamic update of consolidated data based on granular data values |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190927 |