CN111143433A - Method and device for counting data of data bins - Google Patents
- Publication number
- CN111143433A (application number CN201911263035.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- field
- counting
- target
- statistical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Fuzzy Systems (AREA)
- Debugging And Monitoring (AREA)
Abstract
Embodiments of the present application provide a method and a device for counting data in a data warehouse. The method comprises: acquiring table structure information corresponding to a target data table in the data warehouse, wherein the table structure information comprises field information, and the field information comprises a field name of a target field; generating a statistical statement according to the field name of the target field, wherein the statistical statement is used to count the records (rows) that hold each of one or more field values of the target field; and executing the statistical statement to acquire the number of records corresponding to each of the one or more field values. By implementing the embodiments of the present application, data in the data warehouse can be counted quickly and conveniently, and the distribution of field values in the data warehouse can be counted efficiently.
Description
Technical Field
The present application relates to the field of data warehouse technology, and in particular, to a method and an apparatus for counting data of a data warehouse.
Background
In a data warehouse system, changes in the data distribution of the data warehouse often need to be monitored. The resulting monitoring data can be used for metadata display of the data warehouse, for example to let an analyst quickly check the value distribution of a given field, or to detect and promptly handle data anomalies caused by human error, bugs in running scripts, and the like.
However, one data warehouse corresponds to multiple data sources, and as the data in the data warehouse keeps growing, the number of tasks the warehouse runs grows accordingly. A large amount of new data is stored in the data warehouse every day; if an anomaly occurs while the daily tasks are processed, the quality of the data suffers and subsequent data may be affected as well. It is therefore very important for a data warehouse to detect and handle data anomalies during processing in a timely manner.
Therefore, how to count data in a data warehouse quickly and conveniently is a problem to be solved urgently.
Disclosure of Invention
In view of the above, the present application provides a method and an apparatus for counting data of a data warehouse that overcome, or at least partially solve, the above-mentioned problems.
In a first aspect, an embodiment of the present application provides a method for counting data of a data warehouse, which may include:
acquiring table structure information corresponding to a target data table in a data warehouse, wherein the table structure information comprises field information, and the field information comprises a field name of a target field;
generating a statistical statement according to the field name of the target field, wherein the statistical statement is used to count the records (rows) that hold each of one or more field values of the target field;
and executing the statistical statement to acquire the number of records corresponding to each of the one or more field values.
With the method provided by the first aspect, the embodiments of the present application can automatically and efficiently count the distribution of field values in the data warehouse by acquiring the table structure information corresponding to the target data table in the data warehouse, generating the statistical statement according to the target field information in the table structure information, and finally executing the statistical statement to acquire the number of records corresponding to one or more field values of the target field. The statistical statement can be generated automatically by directly splicing together the field names in the field information, the value distributions of multiple fields can be counted at the same time, and running the statistical statement consumes little performance. As a result, an anomaly warning can be raised when a change in a business data source alters the data, and the data in the data warehouse can be counted quickly and conveniently, which is very effective for ensuring the stability and accuracy of the data warehouse.
In one possible implementation, the data warehouse includes M data tables, the target data table being one of the M data tables, and the method further includes: counting the number of records corresponding to one or more field values of all fields in each of the M data tables, and generating a statistical result; comparing the statistical results of each of N preset periods and then determining a change trend of the statistical results over the N preset periods, wherein the change trend comprises one or more of an increase, an accelerating change, a decrease, and a decelerating change in the number of records corresponding to the one or more field values of all the fields, and N is a positive integer greater than 1; and if the change trend exceeds a preset change threshold, generating warning information, wherein the warning information indicates the change trend of the number of records over the N preset periods.
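The period-over-period comparison described above can be illustrated with a minimal Python sketch. The source does not specify how the change trend is measured or how the preset change threshold is expressed, so the relative-change measure, the function names (relative_changes, describe_trend, check_trend), and the wording of the warning are all assumptions made for illustration.

```python
from typing import List, Optional


def relative_changes(counts: List[int]) -> List[float]:
    """Period-over-period relative change of one field value's record count."""
    changes = []
    for prev, cur in zip(counts, counts[1:]):
        if prev == 0:
            # Growth from zero has no finite ratio; treat any growth as unbounded.
            changes.append(0.0 if cur == 0 else float("inf"))
        else:
            changes.append((cur - prev) / prev)
    return changes


def describe_trend(counts: List[int]) -> str:
    """Label the trend as an increase or decrease, speeding up or slowing down."""
    deltas = [cur - prev for prev, cur in zip(counts, counts[1:])]
    if all(d > 0 for d in deltas):
        direction = "increase"
    elif all(d < 0 for d in deltas):
        direction = "decrease"
    else:
        return "mixed"
    sizes = [abs(d) for d in deltas]
    if len(sizes) > 1 and all(b > a for a, b in zip(sizes, sizes[1:])):
        return direction + ", accelerating"
    if len(sizes) > 1 and all(b < a for a, b in zip(sizes, sizes[1:])):
        return direction + ", decelerating"
    return direction


def check_trend(field_value: str, counts: List[int], threshold: float) -> Optional[str]:
    """Return a warning string if any relative change exceeds the threshold (assumed rule)."""
    if len(counts) < 2:
        raise ValueError("N must be greater than 1")
    worst = max(abs(c) for c in relative_changes(counts))
    if worst > threshold:
        return f"WARNING {field_value}: {describe_trend(counts)}, counts={counts}"
    return None
```

For instance, check_trend("sex=male", [100, 110, 150], 0.3) yields a warning because the second period grows by about 36 percent, while a series such as [100, 101, 102] stays below the same threshold and returns None.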
In one possible implementation, the statistics statement comprises a count function statement that returns the statistics result in JSON format.
In one possible implementation, the method further includes: if the number of records corresponding to the target field value exceeds a first preset threshold, stopping counting the target field value, and marking the first preset threshold as the number of records of the target field value and returning the counting result, wherein the target field value is any one of one or more field values corresponding to all fields in each of the M data tables.
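One way to read the first preset threshold is as a cap on the count kept for each field value. The sketch below is an assumption about how such a cap could behave, not the method of the application; the function name capped_value_counts and the choice to skip None values are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Optional


def capped_value_counts(values: Iterable[Optional[Hashable]], cap: int) -> Dict[Hashable, int]:
    """Count records per field value, but stop counting a value once it reaches the cap."""
    counts: Dict[Hashable, int] = {}
    for value in values:
        if value is None:
            continue  # empty values are not counted
        current = counts.get(value, 0)
        if current >= cap:
            continue  # counting stopped for this value; the cap is reported as its count
        counts[value] = current + 1
    return counts
```

With this reading, a value reported at exactly the cap means "at least this many records", which keeps very frequent values from dominating the cost of the statistics.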
In one possible implementation, the method further includes: and if the number of records corresponding to the target field value exceeds a second preset threshold, stopping continuously inserting field data corresponding to the target field value in the data table corresponding to the target field value, wherein the target field value is any one of one or more field values corresponding to all fields in each of the M data tables.
In a second aspect, an embodiment of the present application provides an apparatus for counting data of a data warehouse, where the apparatus includes:
an acquisition unit, configured to acquire table structure information corresponding to a target data table in a data warehouse, wherein the table structure information comprises field information, and the field information comprises a field name of a target field;
a generating unit, configured to generate a statistical statement according to the field name of the target field, wherein the statistical statement is used to count the records (rows) that hold each of one or more field values of the target field;
and an execution unit, configured to execute the statistical statement and acquire the number of records corresponding to each of the one or more field values.
In one possible implementation, the data warehouse includes M data tables, the target data table being one of the M data tables, and the apparatus further includes: a statistical unit, configured to count the number of records corresponding to one or more field values of all fields in each of the M data tables and generate a statistical result; a comparison unit, configured to compare the statistical results of each of N preset periods and then determine a change trend of the statistical results over the N preset periods, wherein the change trend comprises one or more of an increase, an accelerating change, a decrease, and a decelerating change in the number of records corresponding to the one or more field values of all the fields, and N is a positive integer greater than 1; and a warning unit, configured to generate warning information if the change trend exceeds a preset change threshold, wherein the warning information indicates the change trend of the number of records over the N preset periods.
In one possible implementation, the statistics statement comprises a count function statement that returns the statistics result in JSON format.
In one possible implementation, the apparatus further includes: a first stopping unit, configured to stop counting a target field value if the number of records corresponding to the target field value exceeds a first preset threshold, mark the first preset threshold as the number of records of the target field value, and return the statistical result, wherein the target field value is any one of the one or more field values corresponding to all fields in each of the M data tables.
In one possible implementation, the apparatus further includes: a second stopping unit, configured to stop continuing to insert field data corresponding to a target field value in a data table corresponding to the target field value if a number of records corresponding to the target field value exceeds a second preset threshold, where the target field value is any one of one or more field values corresponding to all fields in each of the M data tables.
In a third aspect, an embodiment of the present application provides an apparatus for counting data of a data warehouse, including a storage component, a processing component, and a communication component, where the storage component is used for storing a computer program, the communication component is used for performing information interaction with an external device, and the processing component is configured to invoke the computer program to execute the method according to the first aspect, which is not described herein again.
In a fourth aspect, the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and the computer program is executed by a processor to implement the method of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present application, the drawings required to be used in the embodiments or the background art of the present application will be described below.
FIG. 1 is a schematic diagram of a system architecture for counting data of a data warehouse provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a method for counting data of a data warehouse according to an embodiment of the present application;
FIG. 3A is a schematic flowchart of another method for counting data of a data warehouse according to an embodiment of the present application;
FIG. 3B is a schematic diagram of an interface for determining statistics of a target data table according to an embodiment of the present application;
FIG. 3C is a schematic diagram of an interface of a terminal receiving warning information according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an apparatus for counting data of a data warehouse according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of another apparatus for counting data of a data warehouse according to an embodiment of the present application.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings.
The terms "first," "second," and "third," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, "include" and "have" and any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
As used in this application, the terms "server," "unit," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a server may be, but is not limited to, a processor, a data processing platform, a computing device, a computer, two or more computers, and the like.
First, some terms in the present application are explained so as to be easily understood by those skilled in the art.
(1) Data warehouse (English: Data Warehouse), abbreviated DW or DWH. A data warehouse is a structured data environment that serves as the data source for decision support systems (DSS) and online analytical applications. Data warehousing studies and solves the problem of obtaining information from databases. A data warehouse is characterized as subject-oriented, integrated, stable (non-volatile), and time-variant. It is a strategic collection that provides all types of data support for decision-making at every level of an enterprise. It is a single data store created for analytical reporting and decision support, and, for enterprises that need business intelligence, it guides business process improvement and the monitoring of time, cost, quality, and control.
(2) The table structure is the information that defines a table: the table name, its fields (column names), data types, lengths, whether a field may be empty, and so on. Fields, types, primary keys, foreign keys, and indexes together form the table structure of a database. A data table consists of three parts: the table name, the fields in the table, and the records of the table. Designing a data table structure means defining the file name of the data table, determining which fields the table contains, specifying the name, type, and width of each field, and entering these definitions into the computer.
(3) JSON (JavaScript Object Notation) is a lightweight data exchange format. It is based on a subset of ECMAScript (the JavaScript specification standardized by Ecma International, formerly the European Computer Manufacturers Association) and stores and represents data in a text format that is completely independent of any programming language. Its compact and clear hierarchical structure makes JSON an ideal data exchange language: it is easy for people to read and write, easy for machines to parse and generate, and it effectively improves network transmission efficiency.
(4) The COUNT function can be used in a database (for example SQL Server or Access) to count the number of records that meet a condition. When COUNT counts, numeric values are counted, whereas error values, null values, logical values, dates, and text are ignored. If the argument is an array or a reference, only the numbers in the array or reference are counted; empty cells, logical values, text, and error values in the array or reference are ignored. To count logical values, text, or error values, the COUNTA function is used.
(5) Spark SQL is a Spark module for processing structured data (for example txt and JSON files); it supports a large number of data sources and data analysis algorithms. Spark SQL combines the structured data management capabilities of a traditional relational database with the data processing capabilities of machine learning algorithms. With the addition of the DataFrame (that is, an RDD with schema information), a user can execute SQL statements in Spark SQL on data that comes from an RDD or from external data sources such as Hive, HDFS, Cassandra, or JSON-formatted data. Spark SQL provides a DataFrame API that can perform various relational operations on both internal and external data sources.
Next, the system architecture for counting data of a data warehouse, on which the embodiments of the present application are based, is described.
In the first case, the device 101 for counting data warehouse data may be a server in the cloud, and the server and a local terminal form a system. Referring to FIG. 1, FIG. 1 is a schematic diagram of a system architecture for counting data warehouse data provided in an embodiment of the present application. As shown in FIG. 1, the system architecture may include one or more servers (multiple servers may form a server cluster) and one or more terminals (or devices), and includes a device 101 for counting data warehouse data and a terminal device 102.
The device 101 for counting data warehouse data may include, but is not limited to, a backend server, a component server, a data processing server, and the like. When the device 101 is a server, the server may communicate with multiple terminals over the Internet, and it also needs to run a corresponding server-side program to provide services for counting data warehouse data, such as a database query service, data statistics, and decision execution. For example, the server may acquire table structure information corresponding to a target data table in the data warehouse, where the table structure information includes field information and the field information includes a field name of the target field; generate a statistical statement according to the field name of the target field, where the statistical statement is used to count the records (rows) corresponding to each of one or more field values of the target field; and execute the statistical statement to acquire the number of records corresponding to each of the one or more field values. Optionally, the server may further include a data statistics module as its core function, which provides user data statistics, data modification and storage of statistical results, viewing of existing data tables, online and offline operation, and sharing of statistical results.
The terminal device 102 may install and run the relevant applications. An application is a program that corresponds to the server and provides local services to the client. Here, the local services may include, but are not limited to: sending data statistics result information (for example, the number of records corresponding to each of one or more field values) to the server, and receiving information sent by the server (for example, data statistics information obtained from the data warehouse) and other shared information. The terminal in this embodiment may include, but is not limited to, any electronic product based on a smart operating system that can interact with a user through input devices such as a keyboard, a virtual keyboard, a touch pad, a touch screen, and a voice control device, for example a smart phone, a tablet computer, or a personal computer. Smart operating systems include, but are not limited to, any operating system that enriches device functionality by providing various mobile applications to a mobile device, such as Android, iOS, and Windows Phone.
In the second case, the system architecture may be a single device, which may be a local terminal, and the terminal may install and run the relevant application. An application is a program that corresponds to the server and provides local services to the client. For example, the terminal may acquire table structure information corresponding to a target data table in a data warehouse, where the table structure information includes field information and the field information includes a field name of the target field; generate a statistical statement according to the field name of the target field, where the statistical statement is used to count the records (rows) corresponding to each of one or more field values of the target field; and execute the statistical statement to acquire the number of records corresponding to each of the one or more field values. The terminal in this embodiment may include, but is not limited to, any electronic product based on a smart operating system that can interact with a user through input devices such as a keyboard, a virtual keyboard, a touch pad, a touch screen, and a voice control device, for example a smart phone, a tablet computer, or a personal computer. Smart operating systems include, but are not limited to, any operating system that enriches device functionality by providing various mobile applications to a mobile device, such as Android, iOS, and Windows Phone.
It should further be understood that the system architecture of FIG. 1 is only an exemplary implementation of the embodiments of the present application; the system architecture for counting data warehouse data in the embodiments of the present application includes, but is not limited to, the above architecture.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a method for counting data of a data warehouse according to an embodiment of the present application. The method is applicable to the system of FIG. 1 described above and is described below with reference to FIG. 2 from the side of the device 101 for counting data warehouse data. The method may include the following steps S201 to S203.
Step S201: acquiring the table structure information corresponding to the target data table in the data warehouse.
Specifically, the apparatus for counting data of a data warehouse may acquire table structure information corresponding to a target data table in the data warehouse, where the table structure information includes field information, and the field information includes a field name of the target field. It should be noted that the table structure information is the information that defines a table, such as the table name, the fields (column names), the data types, the lengths, and whether a field may be empty; basic attributes such as fields, types, primary keys, foreign keys, and indexes constitute the table structure of the database. It is understood that the database and the data warehouse in the present application can be considered as referring to the same thing.
Optionally, the apparatus may query the table structure information and store each field in an array, to facilitate the automatic generation of subsequent statistical statements. To query the table structure information, the built-in statement show create table can be used directly; its output is saved to a text file, and the text is then parsed with a scripting language such as shell or Python. Spark SQL can also be used, because Spark SQL can directly read metadata such as the table structure.
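As a minimal sketch of the text-parsing route, the Python below extracts field names from saved show create table output. It assumes Hive-style output in which each column definition sits on its own line with the column name in backquotes; the sample DDL, the constant SAMPLE_DDL, and the function name parse_field_names are hypothetical and not taken from the application.

```python
import re
from typing import List

# Hypothetical example of saved "show create table" output (Hive-style formatting assumed).
SAMPLE_DDL = """CREATE TABLE `my_table`(
  `sex` string COMMENT 'gender',
  `marital` string,
  `age` int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
"""

# A column line starts with optional whitespace, a backquoted name, then a type keyword.
COLUMN_LINE = re.compile(r"^\s*`(\w+)`\s+\w+", re.MULTILINE)


def parse_field_names(ddl_text: str) -> List[str]:
    """Return the field names found in the DDL text, in declaration order."""
    return COLUMN_LINE.findall(ddl_text)
```

The CREATE TABLE line itself is not matched because the backquoted table name does not start the line. Output formatted differently (for example without backquotes) would need a different pattern.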
Step S202: generating a statistical statement according to the field name of the target field.
Specifically, the apparatus for counting data of a data warehouse may generate, according to the field name of the target field, a statistical statement used to count the records (rows) corresponding to each of one or more field values of the target field. It is understood that one target field may correspond to multiple field values. For example, if the target field is male, the field values may be male aged 0-18, male aged 19-36, male aged 37-54, and male aged 55 or over. The statistical statement is used to count the number of records corresponding to each field value of the target field, that is, the number of records for males aged 0-18, 19-36, 37-54, and 55 or over, respectively.
Optionally, the statistical statement generated by the apparatus includes a counting function statement that returns the statistical result in JSON format. It can be understood that the acquired table structure information is spliced to generate a statistical statement such as: create table my_table_cnt as select value_count(field1), value_count(field2), value_count(field3), … from my_table. Here, field1, field2, field3, and so on are field names contained in the table structure; value_count is a counting function used to count the number of records for each field value of a field in the data table; and my_table identifies the corresponding data table. Optionally, when the statistical statement is generated, other functions such as max, min, and avg may also be added, to compute the maximum, minimum, or average value, and the like, of the record counts of each field in the data table. It should further be noted that JSON is a lightweight data exchange format, and returning the statistical result in JSON format makes it more convenient and faster to count the records of each field in the data table.
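The splicing step can be sketched in Python as follows. The statement shape and the value_count function name come from the text above; the helper name build_stat_statement, the identifier check, and the AS aliases on each column are assumptions added for illustration.

```python
import re
from typing import List

# Accept only plain identifiers so spliced names cannot alter the statement (assumed safeguard).
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def build_stat_statement(table: str, fields: List[str]) -> str:
    """Splice field names into one statement that counts value distributions for all fields."""
    if not fields:
        raise ValueError("at least one field name is required")
    for name in [table, *fields]:
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    # One value_count(...) call per field; the alias keeps result columns readable.
    select_list = ", ".join(f"value_count({f}) AS {f}" for f in fields)
    return f"CREATE TABLE {table}_cnt AS SELECT {select_list} FROM {table}"
```

Because every field is handled in the same SELECT, the table only needs to be scanned once regardless of how many fields are counted, which is the performance point the paragraph above makes.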
Step S203: executing the statistical statement to acquire the number of records corresponding to each of the one or more field values.
Specifically, the apparatus for counting data of the data warehouse executes the statistical statement to obtain the number of records corresponding to each of the one or more field values. For example, when field 1 is sex, the apparatus counts, over all data of that field, how many records have the value 'male' and how many have the value 'female' (records with an empty value are not counted), and returns the result in JSON format, e.g. {'male': 10001, 'female': 10000}. This means that, among all data in the data table, 10001 records have the value 'male' and 10000 records have the value 'female'. A single statistical statement can compute the value distribution of one or more field values for every field, which improves statistical performance.
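The source does not give an implementation of value_count. The following Python stand-in is only a sketch of the behavior described above (count each distinct value, skip empty values, return JSON); the function field_distributions and the row representation as dictionaries are assumptions.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List, Optional


def value_count(values: Iterable[Optional[str]]) -> str:
    """Count non-empty values and return the distribution as a JSON string."""
    counts = Counter(v for v in values if v is not None and v != "")
    return json.dumps(dict(counts), ensure_ascii=False, sort_keys=True)


def field_distributions(rows: List[Dict[str, Optional[str]]], fields: List[str]) -> Dict[str, str]:
    """Apply value_count to every requested field of a list of row dictionaries."""
    return {f: value_count(row.get(f) for row in rows) for f in fields}
```

Sorting the keys makes the JSON text stable from run to run, which simplifies comparing results across periods.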
By implementing the embodiment of the application, the distribution of the field values of the data warehouse can be counted automatically with high performance by acquiring the table structure information corresponding to the target data table in the data warehouse, then determining the statistical statement according to the target field information in the table structure information, and finally executing the statistical statement to acquire the number of records corresponding to one or more field values of the target field. The statistical statement can be generated by direct and automatic splicing according to the field names in the field information, the value distribution of a plurality of fields can be simultaneously counted, and the performance consumption for running the statistical statement is low, so that the abnormal early warning and the rapid and convenient statistical data bin data can be further generated by changing the service data source, and the method is very effective for guaranteeing the stability and accuracy of the data warehouse.
Referring to fig. 3A, fig. 3A is a schematic flow chart of another method for counting data of a data bin according to an embodiment of the present application. The method is applicable to the system of fig. 1 described above and is described below, in connection with fig. 3A, from the side of the device 101 for counting data of a data bin. The method may comprise the following steps S301-S306.
Step S301: acquiring the table structure information corresponding to the target data table in the data bin.
Step S302: generating a statistical statement according to the field name of the target field.
Step S303: executing the statistical statement to obtain the number of records corresponding to the one or more field values.
Specifically, for the implementation of steps S301 to S303, reference may be made to the description of steps S201 to S203 in fig. 2, and details are not repeated here.
Step S304: counting the number of records corresponding to one or more field values of all the fields in each of the M data tables, and generating a statistical result.
Specifically, the device for counting data of a data bin may generate a statistical result after counting the number of records corresponding to one or more field values of all fields in each data table in the data bin, where the data bin includes M data tables and the target data table is one of the M data tables. It should be noted that the statistical result may include the record numbers for all fields of every data table in the data bin within one period. Referring to fig. 3B, fig. 3B is a schematic diagram of an interface showing the statistical result of a target data table according to an embodiment of the present application, where the target data table is one of the M data tables. As shown in the figure, for a group of data in the target data table, the field named marital has 5 records whose field value is negative and 4 records whose field value is positive. Likewise, the field named sex has 7 records whose field value is male and 2 records whose field value is female. As a further example, if the data bin contains 3 data tables, each data table has 4 fields, and each field has 3 field values, then the overall statistical result contains record numbers for all 3 × 4 × 3 = 36 field values, and the statistical statements can count the records corresponding to these 36 field values at the same time.
Optionally, the statistical statement uses a counting function implemented as a Hive (a distributed data warehouse) UDAF (user-defined aggregate function). When the statement is executed to generate the statistical result, multiple machines actually count the data files corresponding to different table structures at the same time, and the results of the machines are merged to obtain the final statistical result. The function puts the statistical result of each machine into a HashMap, then merges all the HashMaps, and finally converts the merged result into JSON format and returns it.
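The partial-count-then-merge pattern described here can be sketched in Python, with a Counter standing in for each machine's HashMap. This is an illustrative single-process model, not Hive UDAF code; the names partial_count and merge_counts and the sample shards are hypothetical.

```python
import json
from collections import Counter


def partial_count(values):
    """Counting done independently on one machine's share of the data."""
    return Counter(v for v in values if v is not None and v != "")


def merge_counts(partials):
    """Merge the per-machine maps by summing the counts of equal keys."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Each inner list models the rows of one field as seen by one machine.
shards = [["male", "female", "male"], ["female", None], ["male"]]
merged = merge_counts(partial_count(shard) for shard in shards)
merged_json = json.dumps(dict(merged), sort_keys=True)
print(merged_json)
```

Summing per-key counts is associative and commutative, so the partial maps can be merged in any order, which is what allows the counting to be spread across machines.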
Optionally, the device for counting data of the data warehouse can output the statistical result to a visualization tool to generate a report for analysts. It can also compare changes in the statistical data against a set change threshold and promptly notify the relevant staff to check and repair when a change is abnormal.
Step S305: comparing the statistical results of each of the N preset periods, and determining the change trend of the statistical results over the N preset periods.
Specifically, after comparing the statistical results of each of the N preset periods, the device for counting data of a data bin may determine the change trend of the statistical results over the N preset periods, where the change trend includes one or more of an increment, an acceleration, a decrement, and a deceleration of the number of records corresponding to the one or more field values of all the fields, and N is a positive integer greater than 1. It should be noted that a statistical result is generated in each period, and comparing the statistical results of different periods makes the data changes of each data table in the data bin clearly observable. This achieves the purpose of monitoring the data in the data bin, prevents abnormal conditions during task processing from degrading data quality, and prevents erroneous data from affecting subsequent data.
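One way to compute such a trend for a single field value is sketched below in Python. The text does not define the terms numerically, so this sketch assumes that the increment or decrement is the signed difference between consecutive periods and that the acceleration or deceleration is the relative change; change_trend is a hypothetical name.

```python
def change_trend(counts):
    """Trend of one field value's record count over N >= 2 periods, oldest first."""
    if len(counts) < 2:
        raise ValueError("at least two periods are required")
    pairs = list(zip(counts, counts[1:]))
    # Positive difference = increment, negative difference = decrement.
    increments = [after - before for before, after in pairs]
    # Relative change per period; undefined (None) when the earlier count is 0.
    rates = [(after - before) / before if before else None for before, after in pairs]
    return {"increments": increments, "rates": rates}


trend = change_trend([100, 110, 165])
print(trend)
```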
Optionally, a data statistics table may be determined in each preset period. Accordingly, after the data statistics tables are determined from the statistical results, the method further includes: comparing the change trend of the M data statistics tables over the N preset periods and issuing a warning if the change trend exceeds a preset change threshold, where the change trend includes an increment and an acceleration.
Step S306: generating warning information if the change trend exceeds a preset change threshold.
Specifically, if the device for counting data of a data bin determines that the change trend exceeds the preset change threshold, it generates warning information, where the warning information is used to indicate the change trend of the number of records over the N preset periods. Referring to fig. 3C, fig. 3C is a schematic diagram of an interface on which a terminal receives the warning information according to an embodiment of the present application. It can be understood that the warning information may include the abnormal change trends of the M data statistics tables, that is, the number of records of each field whose change exceeds the preset change threshold, together with the field name of that field and the table structure information to which the field belongs.
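The threshold check of step S306 might look like the following Python sketch. The key layout (table, field, value), the 0.5 default threshold, the use of relative change between two periods, and the name build_warnings are all assumptions for illustration; the text only requires that the warning identify the table, the field, and the abnormal record numbers.

```python
def build_warnings(previous, current, threshold=0.5):
    """Compare two periods and list field values whose relative change exceeds threshold.

    previous/current map (table, field, value) -> record count.
    """
    warnings = []
    for key in sorted(set(previous) | set(current)):
        before = previous.get(key, 0)
        after = current.get(key, 0)
        if before == 0:
            # A value that newly appears is treated as an unbounded change.
            rate = float("inf") if after else 0.0
        else:
            rate = abs(after - before) / before
        if rate > threshold:
            table, field, value = key
            warnings.append({
                "table": table,
                "field": field,
                "value": value,
                "previous": before,
                "current": after,
            })
    return warnings


previous = {("users", "sex", "male"): 100, ("users", "sex", "female"): 100}
current = {("users", "sex", "male"): 105, ("users", "sex", "female"): 10}
alerts = build_warnings(previous, current)
print(alerts)
```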
Optionally, if the number of records corresponding to a target field value exceeds a first preset threshold, counting of the target field value is stopped, the first preset threshold is recorded as the number of records of the target field value, and the statistical result is returned, where the target field value is any one of the one or more field values corresponding to all fields in each of the M data tables. It can be understood that the statistical statement counts the values of a field in a specified data table, but if a field has hundreds of millions of distinct values, the statistical result cannot be stored. The function therefore sacrifices some precision: during counting, once the number of distinct values exceeds 100 (this number is configurable), any new value encountered afterwards is discarded (that is, if there are too many kinds of values, 100 of them are effectively taken at random).
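Both limits mentioned in this paragraph can be illustrated with a small Python sketch: a cap on the number of distinct values that are tracked, and a cap on the count recorded for any one value. The parameter names max_distinct and max_count and the function name are hypothetical, and max_count is assumed to be at least 1 when given.

```python
def capped_value_count(values, max_distinct=100, max_count=None):
    """Count values while bounding both the map size and each individual count."""
    counts = {}
    for value in values:
        if value is None or value == "":
            continue  # empty records are not counted
        if value in counts:
            # Stop incrementing once the per-value cap is reached,
            # so the cap itself is reported as the record number.
            if max_count is None or counts[value] < max_count:
                counts[value] += 1
        elif len(counts) < max_distinct:
            counts[value] = 1
        # Otherwise the map is full and the new value is discarded.
    return counts


few_kinds = capped_value_count(["a", "b", "c", "a", "d"], max_distinct=2)
few_records = capped_value_count(["x"] * 5, max_count=2)
print(few_kinds, few_records)
```

Which values survive depends on the order in which rows are scanned, which corresponds to the text's remark that the retained values are effectively random.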
Optionally, if the number of records corresponding to a target field value exceeds a second preset threshold, insertion of further field data corresponding to the target field value into the data table corresponding to the target field value is stopped, where the target field value is any one of the one or more field values corresponding to all fields in each of the M data tables. It can be understood that when the number of records for a field value in a table exceeds the second preset threshold, that value is sufficient to characterize a type of data in the table, so the device for counting data of a data bin can place new data of the same type in another table, which facilitates later database management. It can also be understood that the second preset threshold may be the same as or different from the first preset threshold.
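A rough Python sketch of this routing decision follows. The table names, the in-memory counts dictionary, and the choice to divert rows once the count has reached the threshold are illustrative assumptions; the text does not specify how the other table is selected.

```python
def choose_table(value, counts, threshold,
                 primary="my_table", overflow="my_table_overflow"):
    """Return the table that should receive a new row carrying this field value."""
    if counts.get(value, 0) >= threshold:
        # The primary table already holds enough rows with this value.
        return overflow
    counts[value] = counts.get(value, 0) + 1
    return primary


counts = {}
targets = [choose_table("male", counts, threshold=2) for _ in range(3)]
print(targets)
```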
By implementing the embodiment of the application, the scheme for automatically monitoring changes in the data distribution of the data warehouse can directly read the table structures of all tables in the data warehouse, automatically splice and generate the statistical statements from those table structures, and then use a function written in Java that sacrifices some precision to compute value distributions efficiently, so that the field value distribution of the data warehouse is counted automatically and with high performance. The statistical result is output to a visualization tool to generate a report for analysts. A change threshold can be set and compared against changes in the statistical data, and the relevant personnel are promptly notified to check and repair when a change is abnormal, which improves the data quality of the whole data warehouse.
The method of the embodiment of the present application is explained in detail above. The following describes a related apparatus for counting data of a data bin according to the embodiment of the present application. The apparatus may be a service device that rapidly acquires, processes, analyzes, and extracts valuable data from interactive data, bringing various conveniences to third-party users. Referring to fig. 4, fig. 4 is a schematic structural diagram of an apparatus for counting data of a data bin according to an embodiment of the present disclosure. The apparatus 10 for counting data of a data bin may include an obtaining unit 401, a generating unit 402, and an executing unit 403, and may further include: a statistical unit 404, a comparison unit 405, an alarm unit 406, a first stopping unit 407, and a second stopping unit 408.
An obtaining unit 401, configured to obtain table structure information corresponding to a target data table in a data bin, where the table structure information includes field information, and the field information includes a field name of a target field;
a generating unit 402, configured to generate a statistical statement according to the field name of the target field, where the statistical statement is used to count record values of each row of one or more field values of the target field;
an executing unit 403, configured to execute the statistical statement, and obtain the number of records corresponding to the one or more field values.
In one possible implementation manner, the data bin includes M data tables, wherein the target data table is one of the M data tables, and the apparatus further includes: a statistical unit 404, configured to count the number of records corresponding to one or more field values of all fields in each of the M data tables, and generate a statistical result; a comparison unit 405, configured to determine a change trend of the statistical result in N preset periods after comparing the statistical result in each of the N preset periods, where the change trend includes one or more of increment, acceleration, decrement, and deceleration of the record number corresponding to one or more field values corresponding to all the fields, where N is a positive integer greater than 1; and an alarm unit 406, configured to generate warning information if the change trend exceeds a preset change threshold, where the warning information is used to indicate the change trend of the record number in the N preset periods.
In one possible implementation, the statistics statement comprises a count function statement that returns the statistics result in JSON format.
In one possible implementation, the apparatus further includes: a first stopping unit 407, configured to stop statistics on a target field value if the number of records corresponding to the target field value exceeds a first preset threshold, and return the statistical result by marking the first preset threshold as the number of records of the target field value, where the target field value is any one of one or more field values corresponding to all fields in each of the M data tables.
In one possible implementation, the apparatus further includes: a second stopping unit 408, configured to stop continuing to insert field data corresponding to a target field value in a data table corresponding to the target field value if the number of records corresponding to the target field value exceeds a second preset threshold, where the target field value is any one of one or more field values corresponding to all fields in each of the M data tables.
It should be noted that implementation of each operation may also correspond to corresponding description of the method embodiments shown in fig. 2 and fig. 3A, and details are not described here again.
As shown in fig. 5, fig. 5 is a schematic structural diagram of another apparatus for counting data of a data bin provided in the embodiment of the present application. The apparatus 20 includes at least one processor 501, at least one memory 502, and at least one communication interface 503. In addition, the apparatus may also include common components such as an antenna, which are not described in detail here.
The processor 501 may be a general purpose Central Processing Unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs according to the above schemes.
The Memory 502 may be, but is not limited to, a Read-Only Memory (ROM) or other type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be self-contained and coupled to the processor via a bus. The memory may also be integral to the processor.
The memory 502 is used for storing application program codes for executing the above scheme, and is controlled by the processor 501 for execution. The processor 501 is used to execute application program code stored in the memory 502.
The code stored in the memory 502 may perform the method for counting data of a data bin as provided in fig. 2 or fig. 3A above, for example, when the apparatus 20 is the apparatus 101 for counting data of a data bin, table structure information corresponding to a target data table in the data bin may be obtained, the table structure information includes field information, wherein the field information includes a field name of the target field; generating a statistical statement according to the field name of the target field, wherein the statistical statement is used for counting record values of each row of one or more field values of the target field; and executing the statistical statement to acquire the number of records corresponding to the one or more field values.
It should be noted that, the functions of the functional units in the device for counting data bins 20 described in the embodiment of the present application can refer to the corresponding descriptions of the method embodiments shown in fig. 2 and fig. 3A, and are not described again here.
In this application, the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiments of the present application.
In addition, functional components in the embodiments of the present application may be integrated into one component, or each component may exist alone physically, or two or more components may be integrated into one component. The integrated components can be realized in a form of hardware or a form of software functional units.
The integrated components, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application. While the present application has been described herein in conjunction with various embodiments, other variations to the disclosed embodiments may be understood and effected by those skilled in the art in practicing the present application as claimed herein.
Claims (10)
1. A method of counting data of a data bin, comprising:
acquiring table structure information corresponding to a target data table in a data bin, wherein the table structure information comprises field information, and the field information comprises a field name of a target field;
generating a statistical statement according to the field name of the target field, wherein the statistical statement is used for counting record values of each row of one or more field values of the target field;
and executing the statistical statement to acquire the number of records corresponding to the one or more field values.
2. The method of claim 1, wherein the data bin includes M data tables, wherein the target data table is one of the M data tables, and wherein the method further comprises:
counting the number of records corresponding to one or more field values of all fields in each of the M data tables, and generating a counting result;
after comparing the statistical results in each of N preset periods, determining a change trend of the statistical results in the N preset periods, wherein the change trend comprises one or more of increment, acceleration, decrement and deceleration of the record number corresponding to one or more field values corresponding to all the fields, and N is a positive integer greater than 1;
and if the change trend exceeds a preset change threshold, generating warning information, wherein the warning information is used for indicating the change trend of the record number in the N preset periods.
3. The method of claim 2, wherein the statistics statement comprises a count function statement that returns the statistics result in JSON format.
4. The method of claim 2, further comprising:
if the number of records corresponding to the target field value exceeds a first preset threshold, stopping counting the target field value, and marking the first preset threshold as the number of records of the target field value and returning the counting result, wherein the target field value is any one of one or more field values corresponding to all fields in each of the M data tables.
5. The method of claim 2, further comprising:
and if the number of records corresponding to the target field value exceeds a second preset threshold, stopping continuously inserting field data corresponding to the target field value in the data table corresponding to the target field value, wherein the target field value is any one of one or more field values corresponding to all fields in each of the M data tables.
6. An apparatus for counting data of a data bin, comprising:
an acquisition unit, configured to acquire table structure information corresponding to a target data table in a data bin, wherein the table structure information comprises field information, and the field information comprises a field name of a target field;
a generating unit, configured to generate a statistical statement according to the field name of the target field, wherein the statistical statement is used for counting record values of each row of one or more field values of the target field;
and an execution unit, configured to execute the statistical statement and acquire the number of records corresponding to the one or more field values.
7. The apparatus of claim 6, wherein the data bin comprises M data tables, and wherein the target data table is one of the M data tables, the apparatus further comprising:
the statistical unit is used for counting the record number corresponding to one or more field values of all the fields in each of the M data tables and generating a statistical result;
the comparison unit is used for determining the change trend of the statistical result in N preset periods after comparing the statistical result in each of the N preset periods, wherein the change trend comprises one or more of increment, acceleration, decrement and deceleration of the record number corresponding to one or more field values corresponding to all the fields, and N is a positive integer greater than 1;
and the warning unit is used for generating warning information if the change trend exceeds a preset change threshold, wherein the warning information is used for indicating the change trend of the record number in the N preset periods.
8. The apparatus of claim 7, wherein the statistics statement comprises a count function statement that returns the statistics result in JSON format.
9. An apparatus for counting data of a data bin, characterized by comprising a processing component, a storage component and a communication module, wherein the processing component, the storage component and the communication module are connected with each other, the storage component is used for storing a computer program, and the communication module is used for carrying out information interaction with external equipment; the processing component is configured for invoking the computer program to perform the method according to any of claims 1-5.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911263035.3A CN111143433B (en) | 2019-12-10 | 2019-12-10 | Method and device for counting data in data bin |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111143433A (en) | 2020-05-12 |
CN111143433B (en) | 2024-07-09 |
Family
ID=70518057
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911263035.3A Active CN111143433B (en) | 2019-12-10 | 2019-12-10 | Method and device for counting data in data bin |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111143433B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111625553A (en) * | 2020-05-27 | 2020-09-04 | 贵州易鲸捷信息技术有限公司 | Statistical information collection optimization method and system |
CN111625553B (en) * | 2020-05-27 | 2023-07-28 | 贵州易鲸捷信息技术有限公司 | Statistical information collection optimization method and system |
CN112015787A (en) * | 2020-08-28 | 2020-12-01 | 支付宝(杭州)信息技术有限公司 | Data query method and device |
Also Published As
Publication number | Publication date |
---|---|
CN111143433B (en) | 2024-07-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||