CN110287308A - A kind of computer data formula statistical method - Google Patents


Info

Publication number
CN110287308A
Authority
CN
China
Prior art keywords
data
statistical
computer data
statistical method
dimension
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910513867.XA
Other languages
Chinese (zh)
Inventor
薛映杜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201910513867.XA priority Critical patent/CN110287308A/en
Publication of CN110287308A publication Critical patent/CN110287308A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/338Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a computer data formula statistical method, comprising: Step A, initialization: cleaning and correcting the statistical data; Step B, formatting: adjusting the dimensions of the cleaned and corrected statistical data and supplementing missing data dimensions; Step C, sorting and storage: screening the dimension-adjusted statistical data into blocks; Step D, calculation and display: visually presenting the screened statistical data. The invention addresses the need to screen massive data on demand, to extract from text data, and to display database data directly. Through a new data model it overcomes the high complexity and the deployment and operation-and-maintenance costs of existing databases, broadens the range of applicable calculations, speeds up the exploration of patterns in data, reduces complexity, and is easier to use.

Description

A kind of computer data formula statistical method
Technical field
The present invention relates to the technical field of data structure computing, and in particular to a computer data formula statistical method.
Background
MongoDB is a database based on distributed document storage, written in C++, which aims to provide a scalable, high-performance data storage solution for web applications.
MongoDB is a product that sits between relational and non-relational databases; among non-relational databases it is the most feature-rich and the most similar to a relational database. The data structures it supports are very loose: it uses the BSON format, which resembles JSON, so it can store relatively complex data types. MongoDB's most notable feature is its powerful query language, whose syntax somewhat resembles an object-oriented query language. It can implement almost all of the functionality of single-table queries in a relational database, and it also supports indexing of data.
The existing open-source technology MongoDB is a non-relational database based on distributed document storage, with a loose key-value data structure. However, this technology is only a solution for distributed data storage. It falls short when it comes to computing over the data, in particular showing global regularities over (elastic) variable-length time intervals, offering more flexibility than template-style calculation, and extracting from text data; nor can it display database data directly.
Summary of the invention
The technical problem to be solved by the invention is to provide a computer data formula statistical method that meets the need to screen massive data on demand, extracts from text data, displays database data directly, broadens the range of applicable calculations, speeds up the exploration of patterns in data, reduces complexity, and is easier to use.
To solve the above technical problem, the invention proposes a computer data formula statistical method, comprising:
Step A, initialization: cleaning and correcting the statistical data;
Step B, formatting: adjusting the dimensions of the cleaned and corrected statistical data and supplementing missing data dimensions;
Step C, sorting and storage: screening the dimension-adjusted statistical data into blocks;
Step D, calculation and display: visually presenting the screened statistical data.
Further, step A comprises:
A1. desensitizing sensitive data in statistical data that does not conform to the structured data format, and performing a driven replacement operation on keywords in the statistical data;
A2. correcting statistical data that has structural errors.
Further, in step A1 the raw text data can be operated on directly, rather than obtaining db information, which broadens the scope of application.
Further, the structural error is a time statistics format error.
Further, step B includes supplementing insufficiently structured data with a default null value or a specified value.
Further, step C includes sorting out, within the given text, the corresponding statistical data by a specified time zone vernier, computing statistics once per time zone vernier and storing them in the document database CelDB, until the calculation is complete.
Further, step C is implemented by the vernier integration method.
Further, step D includes dynamically displaying the change process in different charts or along a time axis.
The above technical solution has at least the following beneficial effects:
1. The invention addresses the need to screen massive data on demand, to extract from text data, and to display database data directly.
2. Through a new data model, the invention overcomes the high complexity and the deployment and operation-and-maintenance costs of existing databases.
3. For the highly complex dynamic-behavior problems that currently exist, the invention only needs the elastic time zone to be determined in order to compute statistics over complex, chaotic data and display the results intuitively.
Specific embodiments
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another. The invention is described further below.
The computer data formula statistical method of the invention comprises the following steps:
Step A, initialization: clean and correct the statistical data.
A1. Desensitize sensitive data in statistical data that does not conform to the structured data format, and perform a driven replacement operation on keywords in the statistical data. In actual statistics, the raw text data can be operated on directly, rather than obtaining db information, which broadens the scope of application.
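Step A1 can be illustrated with a minimal sketch that works directly on raw text lines. The source does not say which values count as sensitive or which keywords are replaced, so the phone-number pattern and the keyword table below are hypothetical placeholders chosen only for illustration.

```python
import re

# Hypothetical keyword table: the source does not list the keywords it replaces.
KEYWORD_REPLACEMENTS = {"alice": "USER_1", "bob": "USER_2"}

# Hypothetical sensitive-value pattern: an 11-digit phone number.
PHONE_PATTERN = re.compile(r"\b(\d{3})\d{4}(\d{4})\b")

# One alternation built from the table, so the table drives the replacement.
KEYWORD_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in KEYWORD_REPLACEMENTS) + r")\b"
)


def desensitize_line(line: str) -> str:
    """Mask phone-like numbers, then replace the configured keywords."""
    masked = PHONE_PATTERN.sub(r"\1****\2", line)
    return KEYWORD_PATTERN.sub(lambda m: KEYWORD_REPLACEMENTS[m.group(1)], masked)
```

Because the function takes and returns plain strings, it can be applied line by line to a text file without loading anything into a database first.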
A2. Correct statistical data whose time statistics format is wrong. For example, the default time statistics format is "2019-03-28 11:59:51.114758"; when the time format in the statistical data is "11:59:51", "2019-03-28", or "2019-03-2811:59:51", it is corrected to the "year-month-day hour:minute:second.microsecond" format. Cleaning and correction also include creating the path to be written, checking the files to be sorted, creating the database, pre-reading by type, and so on.
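The time correction in A2 could be sketched as follows. The three malformed inputs are the ones named in the text; how missing parts are filled is not stated in the source, so this sketch assumes a missing time becomes midnight and a missing date is taken from a caller-supplied `default_date` (a hypothetical parameter).

```python
from datetime import date, datetime

TARGET_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Formats that already carry a date, tried in order.
_DATED_FORMATS = (
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d%H:%M:%S",   # date and time run together, e.g. "2019-03-2811:59:51"
    "%Y-%m-%d",
)
# Formats with a time but no date.
_TIME_ONLY_FORMATS = ("%H:%M:%S.%f", "%H:%M:%S")


def normalize_timestamp(raw: str, default_date: date) -> str:
    """Return `raw` rewritten as 'YYYY-MM-DD HH:MM:SS.ffffff'."""
    text = raw.strip()
    for fmt in _DATED_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime(TARGET_FORMAT)
        except ValueError:
            continue
    for fmt in _TIME_ONLY_FORMATS:
        try:
            clock = datetime.strptime(text, fmt).time()
        except ValueError:
            continue
        # Assumption: a time without a date belongs to `default_date`.
        return datetime.combine(default_date, clock).strftime(TARGET_FORMAT)
    raise ValueError(f"unrecognized time format: {raw!r}")
```

Inputs that match none of the listed formats raise an error rather than being guessed at, which keeps the correction step from silently inventing timestamps.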
Step B, formatting: create the structured data and adjust the dimensions of the cleaned and corrected statistical data, supplementing missing data dimensions. Insufficiently structured data is supplemented with a default null value or a specified value.
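A minimal sketch of the dimension supplement in step B, assuming each record is a list of fields and the target number of dimensions is known in advance. The function name and the choice to reject over-long records are assumptions, not part of the source.

```python
def pad_dimensions(fields, width, fill=None):
    """Return a record with exactly `width` fields, padding missing ones with `fill`.

    `fill` is None (the default null value) unless a specified value is given.
    """
    if len(fields) > width:
        # Assumption: extra dimensions are treated as an error, not truncated.
        raise ValueError(f"record has {len(fields)} fields, expected at most {width}")
    return list(fields) + [fill] * (width - len(fields))
```

For example, `pad_dimensions(["2019-03-05 00:00:01.288000", "ASK"], 4)` pads two null dimensions, while passing `fill=0` pads with a specified value instead.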
Step C, sorting and storage: screen the dimension-adjusted statistical data into blocks. Within the given text, the corresponding statistical data is sorted out by the specified time zone vernier. The time zone vernier (elastic time zone) is the adjustable time interval that must be specified for the calculation; its unit is generally ms (milliseconds), the theoretical minimum unit is 1 μs (microsecond), and the usual default is 1000 ms. Preferably, statistics are computed once per elastic time zone and stored in the document database CelDB, until the calculation is complete. For example, starting at zero o'clock each day, the frequency of events is counted every 100 microseconds while the elastic time zone moves forward. Preferably, step C is implemented by the vernier integration method, which meets different requirements in a more flexible way. Here, the document database (CelDB) stores data in the format { time num1 num2 ... numN } under a specified document path; screening into blocks means screening data file blocks out of large-scale data by elastic time zone and storing them in the document database CelDB. Sorting input includes output by type and by dimension (element), creating the document database CelDB, storing the data, and so on.
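The { time num1 num2 ... numN } row format can be sketched as plain space-separated text lines in a file. CelDB's actual on-disk layout is not described in the source, so the helper names, the one-row-per-line layout, and the integer values below are assumptions.

```python
from datetime import datetime
from pathlib import Path

TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def format_row(window_time, values):
    """Serialize one row as 'time num1 num2 ... numN'."""
    return " ".join([window_time.strftime(TIME_FORMAT), *map(str, values)])


def parse_row(line):
    """Parse a row back into (time, [num1, ..., numN]); values assumed integral."""
    date_part, time_part, *numbers = line.split()
    stamp = datetime.strptime(f"{date_part} {time_part}", TIME_FORMAT)
    return stamp, [int(n) for n in numbers]


def append_rows(path, rows):
    """Append (time, values) rows to the document at `path`, creating folders."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        for window_time, values in rows:
            handle.write(format_row(window_time, values) + "\n")


def read_rows(path):
    """Read every non-empty row of the document at `path`."""
    with Path(path).open(encoding="utf-8") as handle:
        return [parse_row(line) for line in handle if line.strip()]
```

Appending one row per vernier interval keeps the file readable with ordinary text tools, which fits the stated aim of working on text data without a database deployment.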
Step D, calculation and display: visually present the screened statistical data, including dynamically displaying the change process in different charts or along a time axis; displaying the change process makes it easier to explore patterns in the data. For example, the output can be shown by trend, by proportion (element), as a static line chart, as a dynamic trend, and so on.
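As a rough stand-in for the display step, the sketch below renders per-interval counts as a text bar chart using only the standard library. The source does not name a charting tool; a real deployment would more likely use a plotting library, and the function name and layout here are assumptions.

```python
def render_bars(rows, width=40):
    """Render (label, count) pairs as text bars scaled to the largest count."""
    peak = max((count for _, count in rows), default=0)
    lines = []
    for label, count in rows:
        # Scale each bar relative to the peak; all bars are empty if peak is 0.
        bar_length = round(width * count / peak) if peak else 0
        lines.append(f"{label} | {'#' * bar_length} {count}")
    return lines
```

Printing the returned lines in time order gives a quick view of the trend across vernier intervals.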
Example one: to calculate how many events occur on a device in every 1000 ms, proceed as follows:
1. Suppose the event records sensed by a device in some environment look like the following:
2019-03-05 00:00:01.288 ASK … (the ellipsis … stands for data in other dimensions)
2019-03-05 00:00:01.290 WRITE …
……
2019-03-05 00:00:03.258 POST …
2019-03-05 00:00:03.288 ERROR …
……
2. Store this data in the document database to be analyzed, specifying a time zone vernier length of 1000 ms;
Using the computer data formula statistical method of the invention, two-dimensional data comprising the vernier time point and the event frequency is obtained and stored in the database:
2019-03-05 00:00:01.288000 1
2019-03-05 00:00:02.288000 13
2019-03-05 00:00:03.288000 68
……
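The per-interval event count in Example one can be sketched as below. The source does not spell out how intervals are anchored or labeled, so this sketch assumes intervals start at the earliest event, are half-open, and are labeled by their start time; intervals with no events are skipped. It does not attempt to reproduce the counts listed above, since the source elides most of the records.

```python
from collections import Counter
from datetime import datetime, timedelta


def parse_event_time(line):
    """Read the leading 'YYYY-MM-DD HH:MM:SS.fff' timestamp of a log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")


def count_events_per_window(timestamps, window=timedelta(milliseconds=1000)):
    """Count events in consecutive intervals of length `window`.

    Returns (interval_start, count) pairs in time order.
    """
    if not timestamps:
        return []
    start = min(timestamps)
    # Integer index of the interval each event falls into.
    counts = Counter((stamp - start) // window for stamp in timestamps)
    return [(start + index * window, counts[index]) for index in sorted(counts)]
```

Changing `window` is all that is needed to re-run the statistics at a different vernier length, for instance `timedelta(microseconds=100)`.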
Example two: to compute energy consumption information for certain equipment, the analysis can proceed in the following steps:
A: Read the device (number) and the energy consumption rate (mg) when passing through the device, as follows:
2019-03-15 12:00:00.001 A12345 30 ... (the ellipsis ... stands for data in other dimensions)
2019-03-15 12:00:01.288123 B23456 39 …
2019-03-15 12:00:01.290 A23456 10 …
……
2019-03-15 12:01:01.258 A34567 10 …
2019-03-15 12:01:01.258 A33567 60 …
2019-03-15 12:01:01.258 A35567 30 …
2019-03-15 12:01:01.288 D45678 50 …
2019-03-15 12:01:01.258 A35567 30 …
……
B: Format the timestamps of the above data and invoke the statistical method, specifying a time zone vernier length of 1 min (minute);
C: Evaluate the data by the vernier integration method; the information comprises two dimensions: the device number, and the energy consumption rate (the count of values greater than 40);
D: The data is recorded to CelDB automatically. For example, for the first vernier interval (1 minute: [2019-03-15 12:00:00.001000, 2019-03-15 12:01:00.001000]), from the initial time to the 1-minute mark, 3 devices were checked and 0 devices had an energy consumption rate above 40:
2019-03-15 12:00:00.001000 3 0 ...
E: When a data time point (2019-03-15 12:01:01.258000) falls beyond the first vernier range, the formula automatically moves to the second vernier range [2019-03-15 12:01:00.001000, 2019-03-15 12:02:00.001000]; the stored three-dimensional information, comprising the vernier time point and the event information, looks as follows:
2019-03-15 12:00:00.001000 3 0 ...
2019-03-15 12:01:00.001000 5 2 ...
......
F: Finally, display the above data, either a single dimension or all three dimensions at once, to reveal its patterns; when there is enough data, predictions can also be made.
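The moving-vernier evaluation in Example two can be sketched as a single pass over time-ordered records. This is an illustration under stated assumptions rather than the patent's own implementation: the interval is anchored at the first record, intervals are half-open, the first number counts records (not distinct devices), and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def summarize_windows(records, window=timedelta(minutes=1), threshold=40):
    """Summarize (timestamp, device_id, rate) records per vernier interval.

    Returns (interval_start, record_count, count_over_threshold) rows.
    """
    ordered = sorted(records, key=lambda record: record[0])
    rows = []
    if not ordered:
        return rows
    window_start = ordered[0][0]
    seen = over = 0
    for stamp, _device, rate in ordered:
        # Move the vernier forward until the record falls inside it.
        while stamp >= window_start + window:
            if seen:
                rows.append((window_start, seen, over))
            window_start += window
            seen = over = 0
        seen += 1
        if rate > threshold:
            over += 1
    rows.append((window_start, seen, over))
    return rows
```

Applied to the eight records listed in step A with a 1-minute vernier, this yields 3 records with 0 above 40 in the first interval and 5 records with 2 above 40 in the second.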
In summary, the invention addresses the need to screen massive data on demand, to extract from text data, and to display database data directly. Through a new data model it overcomes the high complexity and the deployment and operation-and-maintenance costs of existing databases, broadens the range of applicable calculations, speeds up the exploration of patterns in data, reduces complexity, and is easier to use.
The foregoing describes specific embodiments of the invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the invention, and such improvements and modifications also fall within the scope of protection of the invention.

Claims (8)

1. A computer data formula statistical method, characterized by comprising:
Step A, initialization: cleaning and correcting the statistical data;
Step B, formatting: adjusting the dimensions of the cleaned and corrected statistical data and supplementing missing data dimensions;
Step C, sorting and storage: screening the dimension-adjusted statistical data into blocks;
Step D, calculation and display: visually presenting the screened statistical data.
2. The computer data formula statistical method according to claim 1, characterized in that step A comprises:
A1. desensitizing sensitive data in statistical data that does not conform to the structured data format, and performing a driven replacement operation on keywords in the statistical data;
A2. correcting statistical data that has structural errors.
3. The computer data formula statistical method according to claim 2, characterized in that in step A1 the raw text data can be operated on directly, rather than obtaining db information, which broadens the scope of application.
4. The computer data formula statistical method according to claim 2, characterized in that the structural error is a time statistics format error.
5. The computer data formula statistical method according to claim 1, characterized in that step D includes structurally storing the data of the calculation result as a multidimensional data file delimited by spaces.
6. The computer data formula statistical method according to claim 1, characterized in that step C includes sorting out, within the given text, the corresponding statistical data by a specified time zone vernier, computing statistics once per time zone vernier and storing them in the document database CelDB, until the calculation is complete.
7. The computer data formula statistical method according to claim 6, characterized in that step C is implemented by the vernier integration method.
8. The computer data formula statistical method according to claim 1, characterized in that step D includes dynamically displaying the change process in different charts or along a time axis.
CN201910513867.XA 2019-06-13 2019-06-13 A kind of computer data formula statistical method Pending CN110287308A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910513867.XA CN110287308A (en) 2019-06-13 2019-06-13 A kind of computer data formula statistical method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910513867.XA CN110287308A (en) 2019-06-13 2019-06-13 A kind of computer data formula statistical method

Publications (1)

Publication Number Publication Date
CN110287308A true CN110287308A (en) 2019-09-27

Family

ID=68004188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910513867.XA Pending CN110287308A (en) 2019-06-13 2019-06-13 A kind of computer data formula statistical method

Country Status (1)

Country Link
CN (1) CN110287308A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1926546A (en) * 2004-03-03 2007-03-07 英国电讯有限公司 Data handling system
CN102346750A (en) * 2010-08-05 2012-02-08 深圳华强数字动漫有限公司 Three-dimensional database plug-in management system and method
CN107908606A (en) * 2017-10-31 2018-04-13 上海壹账通金融科技有限公司 Method and system based on different aforementioned sources automatic report generation
CN108021664A (en) * 2017-12-04 2018-05-11 北京工商大学 A kind of multidimensional data correlation visual analysis method and system based on dimensional projections
CN108446391A (en) * 2018-03-23 2018-08-24 万帮充电设备有限公司 Processing method, device, electronic equipment and the computer-readable medium of data
CN109408549A (en) * 2018-11-02 2019-03-01 大连瀚闻资讯有限公司 Foreign trade big data Visualized Analysis System
CN111651758A (en) * 2020-06-08 2020-09-11 成都安恒信息技术有限公司 Method for auditing result set of relational database of operation and maintenance auditing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190927
