CN116150200A - Data processing method, device, electronic equipment and storage medium
- Publication number
- CN116150200A CN116150200A CN202211434113.3A CN202211434113A CN116150200A CN 116150200 A CN116150200 A CN 116150200A CN 202211434113 A CN202211434113 A CN 202211434113A CN 116150200 A CN116150200 A CN 116150200A
- Authority
- CN
- China
- Prior art keywords
- field
- aggregation
- operator
- data
- parameter
- Prior art date
- Legal status: Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
- G06F16/24554—Unary operations; Data partitioning operations
- G06F16/24556—Aggregation; Duplicate elimination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
Abstract
The application provides a data processing method, a device, an electronic device and a storage medium, wherein the method comprises the following steps: determining an aggregation condition parameter, a measurement field parameter and an operator parameter in response to a received parameter configuration request for configuring a preset variable index; acquiring a detail data table stored in a detail database, and screening a first number of detail data records matched with the aggregation condition parameter from the detail data table; invoking an operator corresponding to the operator parameter, and performing aggregation calculation, according to a preset aggregation granularity, on the field values of the target measurement field corresponding to the measurement field parameter contained in the first number of detail data records, to obtain a second number of aggregation data records containing an aggregation measurement field; and obtaining a data processing result corresponding to the preset variable index according to the second number of aggregation data records. According to the method and the device, detail data records can be converted into aggregation data records, the number of data records is reduced, and the data processing efficiency is thereby improved.
Description
Technical Field
The present disclosure relates to the field of big data technologies, and in particular, to a data processing method, a device, an electronic device, and a storage medium.
Background
With the development of big data, the variety and quantity of basic data available for analysis in a business system are increasing. In an application scenario of massive data, if data analysis needs to be performed for a certain business index, calculation needs to be performed for massive basic data. Typically, these underlying data are stored as detailed data in which many business fields are recorded in detail, and the granularity of data storage is very fine, with the amount of data that needs to be calculated potentially reaching the order of billions.
With such a huge amount of data, performing an operation on the service data according to a preset field takes a great deal of time, and the operation may even fail because the data volume exceeds the upper limit of the system.
Disclosure of Invention
The application provides a data processing method, a data processing device, electronic equipment and a storage medium, which can convert detailed data records into aggregated data records, reduce the number of the data records and further improve the data processing efficiency.
In a first aspect, the present application provides a data processing method, including the steps of:
determining an aggregation condition parameter, a measurement field parameter and an operator parameter in response to a received parameter configuration request for configuring a preset variable index;
Acquiring a detail data table stored in a detail database, and screening a first number of detail data records matched with the aggregation condition parameters from the detail data table;
invoking an operator corresponding to the operator parameter, and performing aggregation calculation according to a preset aggregation granularity aiming at a field value of a target measurement field corresponding to the measurement field parameter contained in the first quantity of detail data records to obtain a second quantity of aggregation data records containing aggregation measurement fields; wherein the second number is less than the first number and the aggregate metric field is used to characterize an aggregate result of the target metric field;
and obtaining a data processing result corresponding to the parameter configuration request according to the second number of aggregation data records containing aggregation measurement fields.
In a second aspect, the present application provides a data processing apparatus comprising:
the response module is suitable for responding to the received parameter configuration request for configuring the preset variable index, and determining an aggregation condition parameter, a measurement field parameter and an operator parameter;
the acquisition module is suitable for acquiring a detail data table stored in a detail database, and screening a first number of detail data records matched with the aggregation condition parameters from the detail data table;
The aggregation module is suitable for calling operators corresponding to the operator parameters, and carrying out aggregation calculation according to a preset aggregation granularity aiming at field values of target measurement fields corresponding to the measurement field parameters contained in the first quantity of detail data records to obtain a second quantity of aggregation data records containing aggregation measurement fields; wherein the second number is less than the first number and the aggregate metric field is used to characterize an aggregate result of the target metric field;
and the processing module is suitable for obtaining a data processing result corresponding to the parameter configuration request according to the second number of aggregation data records containing the aggregation measurement field.
In a third aspect, the present application provides an electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores one or more computer programs executable by the at least one processor to enable the at least one processor to perform the above-described method.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor/processing core implements the above-described method.
According to the embodiments provided by the application, a first number of detail data records matched with the aggregation condition parameter can be screened from the detail data table; an operator corresponding to the operator parameter is called, and aggregation calculation is carried out, according to the preset aggregation granularity, on the field values of the target measurement field corresponding to the measurement field parameter contained in the first number of detail data records, so as to obtain a second number of aggregation data records containing an aggregation measurement field; the data processing result corresponding to the parameter configuration request is then obtained from the second number of aggregation data records. In this way, the detail data records meeting the aggregation condition are screened out according to the aggregation condition parameter, and aggregation calculation is performed on the target measurement field contained in those records, so that the detail data records are converted into aggregation data records. Because the data volume of the aggregation data records is significantly lower than that of the detail data records, the number of data records is reduced, the problem of the in-memory data volume being too large to compute is avoided, and the data processing efficiency is improved.
It should be understood that the description of this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The accompanying drawings are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification. They illustrate the application and, together with the embodiments of the application, serve to explain it; they do not constitute a limitation of the application. The above and other features and advantages will become more readily apparent to those skilled in the art from the following detailed description of exemplary embodiments with reference to the attached drawings, in which:
FIG. 1 is a flow chart of a data processing method according to one embodiment of the present application;
FIG. 2 is a flow chart of a data processing method according to another embodiment of the present application;
FIG. 3 illustrates an implementation of a variable abstracted from a language description to a variable model description;
FIG. 4 illustrates the manner in which various fields in a variable model are determined;
FIG. 5 shows a schematic diagram of a process for generating intermediate layer data;
FIG. 6 shows a schematic diagram of a process for calculating variable indicators from middle layer data;
FIG. 7 is a block diagram of a data processing apparatus according to an embodiment of the present application;
fig. 8 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For a better understanding of the technical solutions of the present application, the following description of exemplary embodiments of the present application is made with reference to the accompanying drawings, in which various details of embodiments of the present application are included to facilitate understanding, and they should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the absence of conflict, embodiments and features of embodiments herein may be combined with one another.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the application. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Terms such as "connected" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and this application and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The data processing method according to the embodiments of the present application may be performed by an electronic device such as a terminal device or a server, where the terminal device may be a vehicle-mounted device, a user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a wearable device, or the like; the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The method may in particular be implemented by a processor calling a computer program stored in a memory.
In the related art, in an application scenario of massive data, if data analysis needs to be performed for a certain business index, calculation needs to be performed for massive basic data. Typically, these underlying data are stored as detailed data in which many business fields are recorded in detail, and the granularity of data storage is very fine, with the amount of data that needs to be calculated potentially reaching the order of billions. In such a huge amount of data, if operation is required for the service data according to the preset field, a great amount of time is required, and even the operation may fail due to the fact that the data amount exceeds the upper limit of the system. In order to solve the above problem, the embodiment can screen out the detail data records meeting the aggregation conditions according to the aggregation condition parameters, and perform aggregation calculation on the target measurement fields contained in the detail data records, so as to convert the detail data records into the aggregation data records, reduce the number of the data records, and further improve the data processing efficiency.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present application. Referring to fig. 1, the method includes:
step S110: and determining an aggregation condition parameter, a measurement field parameter and an operator parameter in response to the received parameter configuration request for configuring the preset variable index.
Wherein the parameter configuration request is used to set parameters required for the data processing procedure, which are used to indicate the specific implementation of the data processing and the processing purpose. Optionally, the parameter configuration request at least includes an aggregation condition parameter, a measurement field parameter, and an operator parameter. The parameter configuration request is used for configuring a preset variable index, and the preset variable index is used for representing specific characteristics of a variable, for example, the preset variable index can be various forms such as a user login frequency variable index, a user consumption amount variable index and the like.
Wherein the aggregation condition parameter is used to characterize the data screening condition (i.e., to screen the eligible data records) so that the eligible data records can be aggregated. For example, the aggregation condition parameter may be an event class parameter, a product class parameter, or another kind of parameter. When the aggregation condition parameter is an event parameter, the data records conforming to the preset event are screened as the data records to be aggregated; when the aggregation condition parameter is a product parameter, the data records belonging to the preset product are screened as the data records to be aggregated.
The metric field parameter is used to characterize the field to be measured, for example, the metric field parameter may be a field capable of being measured, such as an amount field, an access number field, a login number field, and the like.
The operator parameters are used for representing operators adopted when the field to be measured is measured, and the operators are used for representing corresponding operation logic. For example, operator parameters may be used to indicate various types of operators such as summation operators, averaging operators, deduplication operators, non-deduplication operators, and the like.
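As a minimal, non-normative sketch of how these three parameter types could be represented (all names are illustrative, not taken from the application), a configuration object plus a small operator registry might look like this:

    from dataclasses import dataclass

    # Hypothetical operator registry: operator parameter -> operation logic.
    def avg(values):
        return sum(values) / len(values)

    OPERATORS = {
        "sum": sum,                               # summation operator
        "avg": avg,                               # averaging operator
        "count": len,                             # non-deduplication operator (plain count)
        "count_distinct": lambda v: len(set(v)),  # deduplication operator
    }

    @dataclass
    class ParameterConfigRequest:
        """Parameters carried by a parameter configuration request (illustrative names)."""
        aggregation_condition: dict   # data screening condition, e.g. {"event": "loan"}
        metric_field: str             # field to be measured, e.g. "amount"
        operator: str                 # key into OPERATORS, e.g. "sum"

    request = ParameterConfigRequest({"event": "loan"}, "amount", "sum")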
Step S120: and acquiring a detail data table stored in a detail database, and screening a first number of detail data records matched with the aggregation condition parameters from the detail data table.
The detail database is used to store a detail data table for storing individual detail fields required for a service through a plurality of data records. The detail data table is characterized in that: the number of data records is enormous and the variety of service fields contained in each data record is large.
When screening a first number of detail data records matched with the aggregation condition parameters from the detail data table, firstly, determining a condition field corresponding to the aggregation condition parameters from the detail data table, and then screening the data records with field values of the condition field matched with the parameter values of the aggregation condition parameters as the first number of detail data records. For example, assuming that a condition field corresponding to an aggregate condition parameter is an event field and a parameter value of the aggregate condition parameter is "loan", a data record of the event field having a field value of "loan" is filtered as a first number of detail data records.
Therefore, by means of the aggregation condition parameters, the first number of detail data records can be screened out from mass data records stored in the detail data table, so that data records which are not matched with the aggregation condition parameters are filtered out, and the subsequent processing data quantity is reduced.
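A sketch of this screening step, under the assumption that detail data records are plain dictionaries and that the aggregation condition parameter maps condition fields to required values:

    # Detail data table: each entry is one detail data record (illustrative fields).
    detail_table = [
        {"event": "loan",      "date": "20221101", "amount": 100.0},
        {"event": "loan",      "date": "20221101", "amount": 250.0},
        {"event": "repayment", "date": "20221102", "amount": 80.0},
        {"event": "loan",      "date": "20221102", "amount": 40.0},
    ]

    def screen(records, aggregation_condition):
        """Keep only records whose condition-field values match the aggregation condition."""
        return [r for r in records
                if all(r.get(field) == value
                       for field, value in aggregation_condition.items())]

    first_number_records = screen(detail_table, {"event": "loan"})
    # 3 matching records remain; the repayment record is filtered out.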
Step S130: invoking an operator corresponding to the operator parameter, and performing aggregation calculation according to a preset aggregation granularity aiming at the field value of the target measurement field corresponding to the measurement field parameter contained in the first quantity of detail data records to obtain a second quantity of aggregation data records containing aggregation measurement fields; wherein the second number is less than the first number and the aggregate metric field is used to characterize an aggregate result of the target metric field.
The target measurement field corresponding to the measurement field parameter is a field to be measured, for example, an amount field, an access number field, a login number field, and the like. The preset aggregation granularity can be time granularity, for example, aggregation is carried out according to the time granularity of year, month, day and the like; it may also be a granularity of traffic types, such as aggregation by the kind of traffic type. The aggregated data records are data records obtained after aggregation, and the second number of the aggregated data records is far smaller than the first number of the detail data records. And, an aggregate metric field characterizing an aggregate result of the target metric field is included in the aggregate data record. The aggregate metric field is used to store the aggregated result for the field value of the target metric field.
For example, assuming that the target measurement field is an amount field, the operator corresponding to the operator parameter is a summation operator, the preset aggregation granularity is a daily granularity, and the daily granularity corresponds to a plurality of detail data records, performing summation operation on the amount field in the plurality of detail data records corresponding to each day to obtain an aggregate data record corresponding to each day, where the aggregate data record corresponding to each day includes an amount summation result of the day, and the amount summation result is stored through the aggregate measurement field.
Thus, by the processing of this step, the aggregate data record corresponding to each day is reduced to one, and the data volume is greatly reduced compared with the plurality of detail data records corresponding to each day.
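A sketch of the daily aggregation described here, assuming illustrative record and field names; three screened detail records collapse into two aggregation data records:

    from collections import defaultdict

    # Screened detail data records (output of the previous screening step, illustrative).
    first_number_records = [
        {"date": "20221101", "amount": 100.0},
        {"date": "20221101", "amount": 250.0},
        {"date": "20221102", "amount": 40.0},
    ]

    def aggregate_daily_sum(records, metric_field):
        """One aggregation data record per day; the aggregation metric field stores the daily sum."""
        buckets = defaultdict(list)
        for r in records:
            buckets[r["date"]].append(r[metric_field])
        return [{"date": day, "agg_" + metric_field: sum(values)}
                for day, values in sorted(buckets.items())]

    aggregate_records = aggregate_daily_sum(first_number_records, "amount")
    # [{'date': '20221101', 'agg_amount': 350.0}, {'date': '20221102', 'agg_amount': 40.0}]
    # Three detail records (first number) become two aggregation records (second number).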
Step S140: and obtaining a data processing result corresponding to the parameter configuration request according to the second number of aggregation data records containing the aggregation metric field.
Because the number of the aggregation data records is small, and the aggregation result of the target measurement field is stored in the aggregation measurement field of each aggregation data record, the operator operation logic corresponding to the operator parameter is executed on the aggregation measurement field in the aggregation data records, so that the data processing result corresponding to the parameter configuration request is obtained. For example, in a summation scenario, the previous step has already produced a daily sum; performing a further summation over the daily sums therefore yields a monthly sum or even a yearly sum.
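Because the later operation runs over the aggregation measurement field only, it touches far fewer records; a sketch of rolling the daily sums up to a monthly sum (assuming a yyyyMMdd date format) follows:

    # Daily aggregation records produced by the previous step (illustrative).
    daily_records = [
        {"date": "20221101", "agg_amount": 350.0},
        {"date": "20221102", "agg_amount": 40.0},
    ]

    def monthly_sum(aggregate_records, agg_field="agg_amount"):
        """Roll the daily aggregation metric field up to months (month = yyyyMM prefix of yyyyMMdd)."""
        totals = {}
        for rec in aggregate_records:
            month = rec["date"][:6]
            totals[month] = totals.get(month, 0.0) + rec[agg_field]
        return totals

    monthly_sum(daily_records)   # -> {'202211': 390.0}, computed without rescanning detail records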
Therefore, the method can screen out the detail data records conforming to the aggregation conditions according to the aggregation condition parameters, and aggregate calculation is carried out on the target measurement fields contained in the detail data records, so that the detail data records are converted into the aggregate data records, the number of the data records is reduced, and the data processing efficiency is further improved.
Fig. 2 is a flowchart of a data processing method according to another embodiment of the present application. Referring to fig. 2, the method includes:
step S210: and determining an aggregation condition parameter, a measurement field parameter and an operator parameter in response to the received parameter configuration request for configuring the preset variable index.
Wherein the parameter configuration request is used to set the parameters required for the data processing procedure, which indicate the specific implementation of the data processing and the processing purpose. Specifically, the parameter configuration request at least includes an aggregation condition parameter, a measurement field parameter and an operator parameter. The aggregation condition parameter is used for characterizing the data screening condition, so that the data records meeting the screening condition can be aggregated. For example, the aggregation condition parameter may be an event class parameter, a product class parameter, or another kind of parameter. When the aggregation condition parameter is an event parameter, the data records conforming to the preset event are screened as the data records to be aggregated; when the aggregation condition parameter is a product parameter, the data records belonging to the preset product are screened as the data records to be aggregated. The measurement field parameter is used to characterize the field to be measured, for example, a field capable of being measured such as an amount field, an access number field, a login number field, and the like. The operator parameter is used for representing the operator adopted when the field to be measured is measured, and the operator represents the corresponding operation logic. For example, operator parameters may indicate various types of operators such as summation operators, averaging operators, deduplication operators, non-deduplication operators, and the like.
In an alternative implementation, in order to normalize parameter input and avoid the problem that parameters cannot be successfully parsed because a user has entered values that do not meet the specification, a plurality of configuration entries are set in advance in a configuration interface in this embodiment. Each configuration entry is used for inputting a different configuration parameter, and the parameter format and value range supported by each configuration entry are preset, so that errors caused by non-conforming input can be avoided. Correspondingly, this is realized as follows: determining the aggregation condition parameter in response to a condition parameter configuration request triggered by an aggregation condition configuration entry contained in the configuration interface, the aggregation condition configuration entry accepting input within a preset condition value range; this entry is used for configuring the aggregation condition parameter, and may provide a plurality of candidate fields and candidate field values in a drop-down box for the user to select from. Determining the measurement field parameter in response to a measurement field configuration request triggered by a measurement field configuration entry contained in the configuration interface, the measurement field configuration entry accepting input within a preset field value range; this entry is used for configuring the measurement field parameter, and may likewise provide candidate fields and candidate field values in a drop-down box for the user to select from. Determining the operator parameter in response to an operator configuration request triggered by an operator configuration entry contained in the configuration interface, the operator configuration entry accepting input from a preset operator library; this entry is used for configuring the operator parameter, and may provide the operator parameters of each operator in the preset operator library in a drop-down box for the user to select from.
In this way, by setting a plurality of configuration entries in the configuration interface and setting a permissible parameter format and value range for each configuration entry, the user's input can be standardized, allowing the various parameters to be entered conveniently and quickly.
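A sketch of how such configuration entries might be validated against their preset value ranges and the preset operator library; the entry names and allowed values are assumptions made for illustration:

    CONFIG_ENTRIES = {
        # entry name                  -> allowed values (preset value range / operator library)
        "aggregation_condition_field": {"event", "product"},
        "aggregation_condition_value": {"loan", "repayment", "APP"},
        "metric_field":                {"amount", "access_count", "login_count"},
        "operator":                    {"sum", "avg", "count", "count_distinct"},
    }

    def validate_entry(entry_name: str, value: str) -> bool:
        """Reject values outside the preset range so malformed parameters never enter the flow."""
        allowed = CONFIG_ENTRIES.get(entry_name, set())
        return value in allowed

    assert validate_entry("operator", "sum")
    assert not validate_entry("operator", "median")   # not in the preset operator library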
Step S220: and acquiring a detail data table stored in a detail database, and screening a first number of detail data records matched with the aggregation condition parameters from the detail data table.
The detail database is used to store a detail data table for storing individual detail fields required for a service through a plurality of data records. The detail data table is characterized in that: the number of data records is enormous and the variety of service fields contained in each data record is large. When screening a first number of detail data records matched with the aggregation condition parameters from the detail data table, firstly, determining a condition field corresponding to the aggregation condition parameters from the detail data table, and then screening the data records with field values of the condition field matched with the parameter values of the aggregation condition parameters as the first number of detail data records. For example, assuming that a condition field corresponding to an aggregate condition parameter is an event field and a parameter value of the aggregate condition parameter is "loan", a data record of the event field having a field value of "loan" is filtered as a first number of detail data records. Therefore, by means of the aggregation condition parameters, the first number of detail data records can be screened out from mass data records stored in the detail data table, so that data records which are not matched with the aggregation condition parameters are filtered out, and the subsequent processing data quantity is reduced.
In addition, there may be one or more aggregation condition parameters; when there are several, they can be used together as a group of combined condition parameters, so that the detail data records conforming to the whole group of combined condition parameters are screened out. For example, assume that the condition field corresponding to one aggregation condition parameter is an event field with the parameter value "loan", and the condition field corresponding to another aggregation condition parameter is a product field with the parameter value "APP"; then the data records of loans made through the APP product need to be screened out.
Step S230: determining an operator type corresponding to the operator parameter, querying operator operation logic matched with the operator type from an operator library, and aggregating a field storage format of a metric field matched with the operator type.
The aggregation measurement field is used for storing the aggregated data result. Different operator types correspond to different operator arithmetic logic and field storage formats. For example, if the operator type is a deduplication operation type, the field storage format of the aggregate metric field that matches the operator type is a key value pair format; wherein the key is the index name of the data index to be de-duplicated; the value is the duplicate removal operation result corresponding to the data index. For example, if the data index to be de-duplicated is the login frequency of the IP address, the index name of the data index to be de-duplicated is the IP address, and correspondingly, the key in each key value pair is the IP address, and the key value is the login frequency after the IP address is de-duplicated. For another example, if the operator type is a non-deduplication operation type, the field storage format of the aggregate metric field that matches the operator type is a numeric long format.
Because the operator operation logic and the operation result corresponding to different operator types are different, the field storage format of the aggregation metric field matched with each operator type is preset, and the reliable storage of the aggregation result can be ensured.
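A sketch of the lookup this step describes, pairing each operator type with its operation logic and the field storage format of the aggregation measurement field (a key-value map for the deduplication type, a plain number for the non-deduplication type); the function and key names are illustrative:

    from collections import Counter

    def dedup_logic(values):
        """Deduplication type: result kept as a key-value map (e.g. IP address -> login count)."""
        return dict(Counter(values))

    def non_dedup_logic(values):
        """Non-deduplication type: result kept as a plain numeric long."""
        return len(values)

    OPERATOR_LIBRARY = {
        # operator type: (operation logic, field storage format)
        "dedup":     (dedup_logic,     "key_value_map"),
        "non_dedup": (non_dedup_logic, "numeric_long"),
    }

    logic, storage_format = OPERATOR_LIBRARY["dedup"]
    logic(["1.2.3.4", "1.2.3.4", "5.6.7.8"])   # -> {'1.2.3.4': 2, '5.6.7.8': 1}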
Step S240: aiming at the first number of detail data records, aggregating according to a preset aggregation granularity to obtain a second number of detail data sets; wherein each detail data set contains a plurality of detail data records.
In the implementation process, cluster identification fields contained in a first number of detail data records are obtained; according to the clustering identification field, clustering is carried out on the first number of detail data records, and a second number of detail data sets are obtained; wherein each set of detail data corresponds to the same cluster identity. For example, assuming that the cluster identifier field is a time-class field, and specifically, a date field, the date value of each different date is used as a cluster identifier, and a plurality of detail data records containing the same date value are aggregated into a detail data set, where the cluster identifiers (i.e., date values) of a plurality of detail data records contained in each detail data set are the same.
Step S250: respectively aiming at each detail data set, carrying out aggregation calculation on field values of a target measurement field corresponding to the measurement field parameters in a plurality of detail data records of the detail data set according to operator operation logic to obtain an aggregation calculation result corresponding to the field values of the target measurement field; and storing the aggregation calculation result into an aggregation data record corresponding to the detail data set according to a field storage format to obtain a second number of aggregation data records containing aggregation measurement fields.
Specifically, for a field value of a target measurement field corresponding to a measurement field parameter in at least one detail data record included in each detail data set, performing an operator operation corresponding to an operator operation logic; wherein each detail data set corresponds to an aggregate data record. For example, in the previous step, if the cluster identification field is a time field, clustering is performed on the first number of detail data records according to the time granularity, and the cluster identification corresponding to each detail data set is a time identification. For example, the time granularity is the day granularity, clustering processing is performed on a plurality of detail data records belonging to the same day according to the time field, a detail data set corresponding to the day is obtained, and operator operations (such as summation operations and the like) corresponding to operator operation logic are performed on field values of target measurement fields corresponding to measurement field parameters in all detail data records included in the detail data set corresponding to the day. For another example, if the cluster identification field is a service type field, performing cluster processing according to the service type for the first number of detail data records, and the cluster identification corresponding to each detail data set is the service type identification. For example, the service types include: and the type A, the type B, the type C and the type D are used for correspondingly performing clustering processing on a plurality of detail data records belonging to the same service type to obtain a detail data set corresponding to the service type. And executing operator operation corresponding to operator operation logic aiming at the field value of the target measurement field corresponding to the measurement field parameter in each detail data record contained in the detail data set corresponding to the service type.
And carrying out aggregation calculation on field values of the target measurement fields corresponding to the measurement field parameters in the detail data records of the detail data set according to operator operation logic to obtain an aggregation calculation result corresponding to the field values of the target measurement fields. For example, assume that one detail data set includes 10 detail data records, and the target measurement field corresponding to the measurement field parameter is an amount field, and correspondingly, for the amount value of the amount field included in the 10 detail data records, performing aggregation calculation corresponding to the operator arithmetic logic, to obtain an aggregation calculation result corresponding to the field value of the amount field (i.e., the target measurement field). For example, if the operator is a sum operator, the aggregate calculation result is a summation result obtained by summing up for 10 monetary values; if the operator is an averaging operator, the aggregate calculation result is an average result obtained by averaging the 10 monetary values.
Correspondingly, the aggregation calculation result is stored into the aggregation data records corresponding to the detail data set according to the field storage format, and a second number of aggregation data records containing aggregation measurement fields are obtained. As described in the above example, the detail data set containing 10 detail data records corresponds to one aggregated data record, and the aggregation metric field in the aggregated data record is used to store the obtained aggregation calculation result.
Thus, the first number of detail data records are aggregated into the second number of detail data sets, and the second number of aggregated data records are obtained, so that the number of records is greatly reduced.
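A compact sketch of steps S240 and S250 under the same assumptions: records are clustered by a cluster identification field (here a business-type field), the operator logic runs once per detail data set, and each set yields exactly one aggregation data record:

    from collections import defaultdict

    def aggregate_by_cluster(records, cluster_field, metric_field, logic, storage_format):
        """Steps S240/S250 sketch: one aggregation data record per cluster identification value."""
        detail_sets = defaultdict(list)                    # step S240: second number of detail data sets
        for r in records:
            detail_sets[r[cluster_field]].append(r[metric_field])
        aggregate_records = []
        for cluster_id, values in detail_sets.items():     # step S250: aggregate each detail data set
            aggregate_records.append({
                cluster_field: cluster_id,
                "aggregate_metric": logic(values),         # aggregation measurement field
                "storage_format": storage_format,          # format matched to the operator type
            })
        return aggregate_records

    records = [
        {"business_type": "A", "amount": 10.0},
        {"business_type": "A", "amount": 30.0},
        {"business_type": "B", "amount": 5.0},
    ]
    aggregate_by_cluster(records, "business_type", "amount", sum, "numeric_double")
    # -> [{'business_type': 'A', 'aggregate_metric': 40.0, ...},
    #     {'business_type': 'B', 'aggregate_metric': 5.0, ...}]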
Step S260: and obtaining a data processing result corresponding to the preset variable index according to the second number of aggregation data records containing the aggregation metric field.
Wherein the aggregate data records are in one-to-one correspondence with the detail data sets.
Because the number of the aggregation data records is small, and the aggregation result of the target measurement field is stored in the aggregation measurement field of each aggregation data record, the operator operation logic corresponding to the operator parameter is executed on the aggregation measurement field in the aggregation data records, so that the data processing result corresponding to the parameter configuration request is obtained. For example, in a summation scenario, the previous step has already produced a daily sum; performing a further summation over the daily sums therefore yields a monthly sum or even a yearly sum.
In an alternative implementation, the number of metric field parameters included in the metric field configuration request triggered by the metric field configuration entry included in the configuration interface is multiple, and the number of operator parameters included in the operator configuration request triggered by the operator configuration entry included in the configuration interface is multiple. Correspondingly, the data processing result corresponding to the parameter configuration request comprises: a plurality of data processing results respectively corresponding to different variable indexes; wherein each variable index corresponds to a combination of the metric field parameters and the operator parameters.
Specifically, a variable is a result calculated from the basic data and used for modeling, and a variable index correspondingly describes a preset attribute of the target object from a preset angle. A variable index is determined by a combination of a measurement field parameter and an operator parameter. For example, assume the measurement field parameters include a first measurement field parameter indicating that the field value of the event field is "loan" and a second measurement field parameter indicating that the field value of the event field is "repayment", and the operator parameters include a summation operator parameter and an averaging operator parameter. Correspondingly, the plurality of variable indexes includes: a user's loan total index (from the combination of the first measurement field parameter and the summation operator), a loan average index (from the combination of the first measurement field parameter and the averaging operator), a repayment total index (from the combination of the second measurement field parameter and the summation operator), a repayment average index (from the combination of the second measurement field parameter and the averaging operator), an overdue rate index, a device operation time index, a device operation performance index, and the like. Therefore, by directly entering the plurality of measurement field parameters and the plurality of operator parameters in the configuration interface, the data processing results corresponding to the variable indexes can be combined automatically, so that the data processing for the variable indexes can be executed in batches conveniently and quickly. To improve processing efficiency, the aggregation operation for each variable index may be performed in parallel by multiple background processes. The computation for each variable index proceeds as in the steps above, that is, the detail layer data is first aggregated and the operation is then carried out on the resulting aggregation data records, which increases the processing speed.
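A sketch of batch-generating the variable-index combinations from several measurement field parameters and operator parameters; the parallel dispatch is only indicated in a comment, and all names are illustrative:

    from itertools import product

    metric_fields = ["loan_amount", "repayment_amount"]    # multiple measurement field parameters

    def avg(values):
        return sum(values) / len(values)

    operators = {"sum": sum, "avg": avg}                   # multiple operator parameters

    # Each (measurement field, operator) combination defines one variable index.
    variable_indexes = [f"{field}_{op_name}" for field, op_name in product(metric_fields, operators)]
    # ['loan_amount_sum', 'loan_amount_avg', 'repayment_amount_sum', 'repayment_amount_avg']

    def compute_variable(task):
        """Compute one variable index; suitable for dispatch to a background process pool."""
        field, op_name, values = task
        return f"{field}_{op_name}", operators[op_name](values)

    tasks = [(f, op, [100.0, 40.0]) for f, op in product(metric_fields, operators)]
    results = dict(map(compute_variable, tasks))
    # A multiprocessing.Pool().map(compute_variable, tasks) call could replace map() here
    # to run the variable indexes in parallel, as the description above suggests.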
In an alternative implementation, to facilitate data prediction analysis based on the data processing results, after the data processing results corresponding to the parameter configuration request are obtained, the plurality of data processing results corresponding to the different variable indexes are further used as input parameters to a preset data prediction model, and the security level of the target object corresponding to these variable indexes is predicted from the output of the model; the target object corresponding to a variable index may be a user class object, a device class object, or the like. For example, when the target object is a user class object, the variable indexes may be the user's loan total index, loan average index, repayment total index, repayment average index, overdue rate index, and the like; the user grade assessment result corresponding to these variable indexes is calculated automatically by the data prediction model. If the assessment result is a safe grade, the current user's credit standing is good and new services may be granted to the user; if the assessment result is a risky grade, the current user's credit standing is poor and new services are not granted, so as to avoid possible business risk. Because this embodiment can rapidly process a plurality of variable indexes corresponding to the same target object, the security level of the target object can be predicted quickly, improving prediction efficiency and accuracy.
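The application does not specify the form of the preset data prediction model; purely as an illustrative stand-in, a weighted threshold over the variable-index results could look like the following (the weights, names and threshold are assumptions):

    def predict_security_level(variable_results, weights, threshold=0.5):
        """Hypothetical scoring stand-in for the preset data prediction model (not from the application)."""
        score = sum(weights.get(name, 0.0) * value for name, value in variable_results.items())
        return "risk" if score > threshold else "safe"

    predict_security_level(
        {"overdue_rate": 0.02, "loan_total": 0.30},        # normalized variable-index results (assumed)
        weights={"overdue_rate": 10.0, "loan_total": 0.50},
    )                                                      # 0.2 + 0.15 = 0.35 <= 0.5 -> 'safe'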
In summary, the method screens out the detail data records conforming to the aggregation condition according to the aggregation condition parameter and performs aggregation calculation on the target measurement field contained in those records, so that the detail data records are converted into aggregation data records, the number of data records is reduced, and the data processing efficiency is improved. Because the detail layer contains far more records, processing the detail layer records into aggregation data records greatly reduces the total amount of data and greatly improves computation efficiency in massive data scenarios.
In the following, specific implementation details in the present embodiment are described in detail by taking a specific example as an example for understanding.
With the continuous development of cloud computing, big data and artificial intelligence technology, a large amount of bottom layer original data is accumulated in an enterprise data warehouse. These underlying data need to serve the upper business systems and decision systems that can be implemented with the data prediction model mentioned above, in which variables are needed. Accordingly, in order to calculate the data processing result corresponding to the variable index, the variable value needs to be calculated from the basic data and then used for modeling. The quality of the variable determines the upper limit of the model prediction capability, and in the related art, the variable needs to be calculated based on detail layer basic data, so that the calculation mode is very complicated.
For example, in the related art variables are calculated directly from the detail layer. Assuming roughly 100 million incremental records per day and a backtracking window of one year, the total amount of data is about 36.5 billion detail records, and calculating variables over that many detail records yields poor computation performance. If MapReduce (MR) is used as the calculation engine, the variable calculation logic is complex, multiple IO operations are needed during the calculation, and the performance is poor; if the Spark computing engine is used, although iterative computation is done in memory with only a small amount of IO, the data volume is too large, memory consumption is heavy, the computation takes too long, and OOM errors may even cause the computing task to fail. In addition, if the calculation logic is implemented directly in code, the code becomes too complex: variable calculation is a task with very complex logic, and if it is based directly on detail layer data, all of the calculation logic sinks down to the detail layer, making the code very difficult to implement. In a massive data scenario, computing variables from the underlying detail data may therefore perform very poorly or even be infeasible. To solve the above problems, the present example first aggregates the detail layer data (i.e., the detail data records) to generate intermediate layer data (i.e., the above-mentioned aggregation data records), and then calculates the variables from the intermediate layer data, improving variable calculation performance by tens or even hundreds of times in a massive data scenario. A top-down design method is therefore adopted in this embodiment: the variable calculation model is designed according to the calculation logic of the variable, the middle layer data model is derived backwards from the variable calculation model, and, once the middle layer data has been generated, variable calculation is performed based on the middle layer data.
Specific implementation details of this example are described in detail below:
firstly, after a batch of variable indexes are set, a calculation model of each variable index is confirmed, and the variable index to be calculated of each variable is required to be comprehensively determined according to the variable dimension, the condition field and the measurement field. The variable dimension is used to describe an object attribute of the target object corresponding to the variable index, for example, the variable dimension may be a user dimension (e.g. identified by an identity card or a mobile phone number), a device dimension (e.g. identified by a device ID), or the like. The condition field is the field corresponding to the above mentioned aggregation condition parameter, and the measurement field is the field corresponding to the above mentioned measurement field parameter.
To facilitate normalized characterization of the various parameter fields described above, this example abstracts a general variable computation model as follows:
${dim}_${function}_${measure}_${time_range}_${where},
where ${dim} represents the variable dimension, ${function} represents the calculation function (i.e., the operator), ${measure} represents the metric field, ${time_range} represents the backtracking date range, and ${where} represents the data filtering condition. The variable dimension is determined according to the above-mentioned preset variable index; for example, if the preset variable index relates to the user (such as the user's login frequency), the variable dimension is the user dimension, and if it relates to the product (such as the product amount), the variable dimension is the product dimension. The backtracking date range and the data filtering condition are collectively called the aggregation condition, corresponding to the aggregation condition parameter. FIGS. 3 and 4 illustrate how the engineering computation description of one variable is abstracted into the generic computation model. FIG. 3 illustrates how a variable is abstracted from a language description to a variable model description. As shown in FIG. 3, the language description of a variable index is "for user A, in the last 30 days, on platforms in the financial industry where a loan event occurred, the total number of platforms involved", and the corresponding engineering abstract description is
Select cnt(partner) from table where event='Loan' and industry='finance' and time between 'ds' and 'ds-30day' and user='A'.
After structured abstraction, the user ID is A, the computation function is cnt, the metric is partner, the time slice is 30d, and the condition is "event='Loan' and industry='finance'".
Accordingly, the variable model description (i.e., the variable expression, also called the expression of the variable index) corresponding to this variable index is u_cnt_partner_30d_Loan_finance.
FIG. 4 shows how the various fields in the variable model are determined. The variable dimension is the dimension to which the variable belongs and is a fixed core element describing the object attribute of the target object corresponding to the variable. The computation function describes the calculation logic of the feature. The metric is the data field over which the feature value is calculated. The time range is the time window of the feature calculation, and the filtering condition is the data filtering condition of the feature calculation.
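A sketch of assembling the variable expression from the five model components; the underscore-joining convention is inferred from the u_cnt_partner_30d example above and is not normative:

    def variable_expression(dim, function, measure, time_range, where):
        """${dim}_${function}_${measure}_${time_range}_${where} rendered as a flat name."""
        return "_".join([dim, function, measure, time_range, where])

    # Reproduces the style of the example: dimension u (user), function cnt, measure partner,
    # a 30-day window, and the Loan + finance filtering condition.
    variable_expression("u", "cnt", "partner", "30d", "Loan_finance")
    # -> 'u_cnt_partner_30d_Loan_finance'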
Then, in the case where the variable calculation model is determined, a middle layer data structure including a dimension field, a condition field, and the like is designed according to the variable calculation model. The middle layer is an aggregation layer, and the middle layer data structure is used for setting a storage mode of aggregation data record.
The dimension field is used to represent the dimension to which the variable belongs; in plain terms, it indicates which target object the variable quantifies, such as the user dimension, the identity card dimension, or the commodity dimension. FIG. 3 uses the user dimension as an example.
Condition field: if the variable's calculation logic has a condition, the condition field needs to be brought to the middle layer as it is.
Metric field: grouping aggregation (group aggregation) is carried out according to the dimension field and the condition field, the aggregated data is an array, and the intermediate layer metric value is calculated based on the array. For example, if the number of times of no duplicate removal is required to be calculated, the number of times of no duplicate removal is directly accumulated; if the number of times of de-duplication needs to be calculated, detail data needs to be stored, for example, map storage is used, keys represent attributes, and values represent attribute values; if the maximum value is required to be calculated, taking a max value; if the minimum value is required to be calculated, taking a min value; the number of times is taken as the length of the array; and taking sum value, etc. In the process of calculating from detail layer data to middle layer data, an operator (also called an aggregation operator) is used for calculating the aggregation data, an operator library is needed to be provided for packaging common operators into the operator library, and when new variable calculation demands exist subsequently, new aggregation operator logic is reversely pushed out from top to bottom and is continuously expanded into the operator library. Table 1 shows the relevant content of the function, data storage type, etc. corresponding to the aggregation operator.
TABLE 1
Aggregation operator | Name | Aggregation function | Aggregation description | Data storage type |
cnt | Deduplicated count | key-value | Stores detail data | Key-value map |
freq | Non-deduplicated count | count | Counts occurrences | Numeric long |
max | Maximum value | max | Takes the maximum value | Numeric double |
min | Minimum value | min | Takes the minimum value | Numeric double |
sum | Sum | sum | Sums the values | Numeric double |
Time field: the date of the aggregate time granularity is stored. If the data is daily aggregation, taking the data to generate daily yyyyMMdd; if the data is month aggregation, taking the data to generate month yyyyMM; in the case of the year, the taking of detail data occurs in the year yyyy.
The specific calculation process of the middle layer data is as follows:
First, detail layer data is loaded: if the middle layer to be calculated is a daily table, data is loaded day by day and each day is calculated; if it is a monthly table, data is loaded month by month and each month is calculated; if it is a yearly table, data is loaded year by year and each year is calculated. Then, an aggregation key is formed from the dimension field and the condition field, and grouping aggregation (i.e. a group by operation) is performed on that key. After grouping, each group contains several pieces of data; these are calculated according to the respective metric fields, the calculated values become the middle layer data fields, and the dimension field and the condition field are carried into the middle layer as-is. Finally, the historical data is backfilled, and tables of different granularities such as the yearly table, monthly table and daily table are generated according to the aggregation time granularity. The structure of the finally generated middle layer data table (corresponding to the above-mentioned aggregated data records) is shown in Table 2:
TABLE 2
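As a hedged illustration of the middle layer computation described above (grouping by dimension and condition fields, then applying an aggregation operator to the metric field), the following sketch assumes dictionary-shaped records and reuses the hypothetical operator functions from the earlier sketch; it is not the patent's code.

```python
from collections import defaultdict

def build_intermediate_layer(detail_records, dim_fields, cond_fields,
                             metric_field, operator, time_key):
    """Group detail records by dimension + condition fields and aggregate the metric."""
    groups = defaultdict(list)
    for rec in detail_records:
        key = tuple(rec[f] for f in dim_fields + cond_fields)
        groups[key].append(rec[metric_field])

    intermediate = []
    for key, values in groups.items():
        row = dict(zip(dim_fields + cond_fields, key))   # dims/conditions carried as-is
        row[metric_field] = operator(values)             # e.g. agg_cnt or agg_sum above
        row["ds"] = time_key                             # yyyyMMdd / yyyyMM / yyyy
        intermediate.append(row)
    return intermediate
```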
Fig. 5 shows a schematic diagram of the generation process of the above-described intermediate layer data. As shown in fig. 5, the intermediate layer data is specifically generated by:
Step S501: a backtracking period is determined. For example, data records are to be loaded for a preset period of time.
Step S502: detail layer data is loaded for aggregation.
Step S503: the data is aggregated. Specifically, an aggregation operation is performed for the detail layer data.
Step S504: intermediate layer data is calculated. Intermediate layer data is calculated from the result of the aggregation operation.
Step S505: it is judged whether the historical data has been refreshed. Refreshing of the historical data is complete once all of the historical data has been loaded.
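The loop below is a minimal sketch of the Fig. 5 flow, reusing build_intermediate_layer from the previous sketch; the two in-memory stores are assumptions standing in for the detail database and the middle layer table.

```python
DETAIL_STORE: dict = {}        # assumed stand-in: period -> list of detail records
INTERMEDIATE_STORE: list = []  # assumed stand-in for the middle layer table

def refresh_history(periods, dim_fields, cond_fields, metric_field, operator):
    """Backfill the middle layer one aggregation period at a time (S501-S505)."""
    for period in periods:                        # S501: backtracking periods
        details = DETAIL_STORE.get(period, [])    # S502: load detail layer data
        rows = build_intermediate_layer(          # S503/S504: aggregate and compute
            details, dim_fields, cond_fields, metric_field, operator, period)
        INTERMEDIATE_STORE.extend(rows)           # persist this period's records
    # S505: the loop ends once every historical period has been processed
```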
Fig. 6 shows a schematic diagram of a process for calculating a variable index from intermediate layer data. As shown in fig. 6, when calculating the variable index from the intermediate layer data, this is achieved by:
step S601: a backtracking period is determined.
Step S602: loading the middle layer data.
In a specific implementation, the middle layer data is loaded according to the variable time range (namely the backtracking period): for example, a three-year range loads the yearly table data of the last 3 years, a one-year range loads the monthly table data of the last 12 months, a one-month range loads the daily table data of the last 30 days, and a one-week range loads the daily table data of the last 7 days.
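One possible reading of this rule, as a sketch only (the day-count thresholds are assumptions inferred from the examples above):

```python
def choose_intermediate_table(backtrack_days: int) -> str:
    """Map a variable's backtracking range to the intermediate table granularity."""
    if backtrack_days >= 3 * 365:
        return "year_table"   # e.g. a three-year range loads 3 yearly records
    if backtrack_days >= 365:
        return "month_table"  # e.g. a one-year range loads 12 monthly records
    return "day_table"        # e.g. one month -> 30, one week -> 7 daily records
```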
Step S603: the middle layer data is filtered. After loading is completed, the data satisfying the conditions is selected according to the condition field.
Step S604: aggregation is performed on the filtered middle layer data. The filtered data is grouped by dimension value according to the variable dimension field; after grouping, each dimension value has several pieces of data.
Step S605: variables are calculated from the aggregated middle layer data. Based on the designed middle layer data, variable operators are used for the calculation, for example as shown in Table 3:
TABLE 3 Table 3
Table 3 shows the specific calculation logic and field storage format corresponding to each type of operator. For example, for the deduplicated count operator, its operation logic is described by the calculation function cnt: the grouped data records are combined into one set and the number of keys in the set is counted; the field storage format of the aggregate metric field matching this operator type is "key-value pair" (the "middle layer data" column of Table 3). For another example, for the maximum value operator, the operation logic is described by the calculation function max: the maximum value among the grouped records is taken; the field storage format of the aggregate metric field matching this operator type is "numerical".
It can be seen that the variable results can be calculated directly from the middle layer data generated in the previous step, with different calculation operators corresponding to different calculation logic. When new variable calculation requirements arise, the corresponding variable operators are continuously added to the operator library. After the variables are calculated, the results are stored in a table whose structure is: variable dimension, variable 1, variable 2, variable 3, and so on. The resulting table can be used directly for modeling.
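As a sketch of the Table 3 variable-operator logic applied to intermediate-layer values (key-value maps for cnt, plain numbers for max/min/sum, per Table 1); the function names are assumptions.

```python
from collections import Counter

def var_cnt(kv_maps):
    """Deduplicated count: merge per-period key-value maps, count distinct keys."""
    merged = Counter()
    for m in kv_maps:
        merged.update(m)
    return len(merged)

def var_sum(values):
    """Sum over the backtracking window: add up the per-period sums."""
    return sum(values)

def var_max(values):
    """Maximum over the window: maximum of the per-period maxima."""
    return max(values)

def var_min(values):
    """Minimum over the window: minimum of the per-period minima."""
    return min(values)
```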
For example, assuming the current time is 2022-08-10, there is a detail layer data table that includes the following fields:
user: user
event: event
product: product
ip_addr: IP address
amount: amount of money
ds: the time at which the data occurred
According to the service requirement, the following three variables need to be calculated:
Variable 1: the number of distinct IP addresses from which user A has logged in over the past year, under the condition that the event is login and the product is the payment application;
Variable 2: the sum of the amounts incurred by user A over the past year, under the condition that the event is login and the product is the payment application;
Variable 3: the average amount incurred by user A over the past year, under the condition that the event is login and the product is the payment application.
Wherein, the detail layer data corresponding to the calculation variable is as follows:
TABLE 4 Table 4
First, the variable calculation model is designed: the variable dimension is user A, the condition is that the event is login and the product is the payment application, the backtracking time is the past year (exactly 12 months), and the calculation functions (i.e. operators) are the IP deduplicated count cnt, the amount sum sum, and the amount average avg. The variable names are as follows:
u_cnt_ip_12m_Loan_Payment application
u_sum_current_12m_Loan_Payment application
u_avg_current_12m_Loan_Payment application
Next, the middle layer data model is designed: group by is performed on the dimension field and condition fields (user, event, product), aggregation is carried out at month granularity, and the frequency is stored for each month. The deduplicated count is calculated from the ip_addr field and must be aggregated into a key-value pair type; the sum of amount for each month is needed; and ds represents the month. Table 5 shows the data records before aggregation.
TABLE 5
| User | Event | Product | IP address | Amount | Date |
| --- | --- | --- | --- | --- | --- |
| A | Login | Payment application | 192.168.1.1 | 10000 | 2022-07-01 12:00:00 |
| B | Login | Payment application | 192.168.1.1 | 20000 | 2022-07-01 12:00:00 |
| A | Login | Payment application | 192.168.1.1 | 20000 | 2022-07-02 12:00:00 |
| A | Access | Payment application | 192.168.1.2 | 20000 | 2022-08-01 15:00:00 |
| B | Login | Online banking application | 192.168.1.2 | 10000 | 2022-08-01 11:00:00 |
| A | Login | Payment application | 192.168.1.3 | 30000 | 2022-08-02 16:00:00 |
| A | Login | Payment application | 192.168.1.1 | 40000 | 2022-08-02 12:00:00 |
Table 6 shows the data records obtained by aggregating the IP address login counts stored in the "ip_addr" field, using the month field "ds" as the cluster identification field. The "ip_addr" field in Table 6 is the aggregate metric field included in the aggregated data records, and stores the result of the aggregation operation on the deduplicated IP address logins. The operator in this example is a deduplication operator, so the field data format of the aggregate metric field is the key-value pair format. Taking the first data record in Table 6 as an example, the key is "192.168.1.1" and the value is "2", meaning that the login count for IP address "192.168.1.1" in that month is 2.
TABLE 6
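The following sketch reproduces the Table 5 to Table 6 step for user A's login records on the payment application (only those rows from Table 5 are included); the dictionary layout of the records is an assumption, not the patent's storage format.

```python
from collections import Counter, defaultdict

detail_rows = [
    {"user": "A", "event": "Login", "product": "Payment application",
     "ip_addr": "192.168.1.1", "amount": 10000, "ds": "2022-07-01 12:00:00"},
    {"user": "A", "event": "Login", "product": "Payment application",
     "ip_addr": "192.168.1.1", "amount": 20000, "ds": "2022-07-02 12:00:00"},
    {"user": "A", "event": "Login", "product": "Payment application",
     "ip_addr": "192.168.1.3", "amount": 30000, "ds": "2022-08-02 16:00:00"},
    {"user": "A", "event": "Login", "product": "Payment application",
     "ip_addr": "192.168.1.1", "amount": 40000, "ds": "2022-08-02 12:00:00"},
]

# One intermediate record per (user, event, product, month).
monthly = defaultdict(lambda: {"ip_addr": Counter(), "amount": 0})
for r in detail_rows:
    key = (r["user"], r["event"], r["product"], r["ds"][:7].replace("-", ""))
    monthly[key]["ip_addr"][r["ip_addr"]] += 1   # dedup count kept as a key-value map
    monthly[key]["amount"] += r["amount"]        # monthly sum of amounts

# e.g. key ('A', 'Login', 'Payment application', '202207')
#      -> {'ip_addr': {'192.168.1.1': 2}, 'amount': 30000}
```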
When calculating the variables, the data of the past year is loaded from the middle layer, the data needed by the variables is filtered, group by is then performed according to the dimension, and the resulting data is calculated, as shown in Table 7:
TABLE 7
Table 7 shows the data processing results finally obtained. The amount details are not listed; their handling is similar to that of the deduplicated count and is not described again here.
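Continuing the sketch above, the three variables could be computed from the monthly records as follows; deriving the detail-record count from the key-value map (to obtain the average) is an assumption that the text does not spell out.

```python
a_months = [agg for key, agg in monthly.items() if key[0] == "A"]

merged_ips = Counter()
for agg in a_months:
    merged_ips.update(agg["ip_addr"])            # merge per-month key-value maps

u_cnt_ip = len(merged_ips)                       # variable 1 -> 2 distinct IPs
u_sum = sum(agg["amount"] for agg in a_months)   # variable 2 -> 100000
record_count = sum(sum(agg["ip_addr"].values()) for agg in a_months)  # 4 records
u_avg = u_sum / record_count                     # variable 3 -> 25000.0 (assumed rule)
```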
In summary, in this example the data volume is reduced by tens to hundreds of times from the detail layer to the middle layer. In practical tests, with 1 million records of ID-number-dimension data per day, the detail data volume over the past year is approximately 400 million records. After convergence by main dimension plus condition dimension at day granularity, the middle layer holds just over 30 million records per year; at month granularity, just over 10 million; and at year granularity, just over 4 million. Calculating variables from the middle layer data therefore greatly improves performance. In practical tests with 1 million records per day, the detail layer data over the past year is approximately 400 million records; calculating 500 variables with a one-year backtracking period takes about 8 hours based on the detail layer, but less than half an hour based on the middle layer data. By generating middle layer converged data and calculating variables from it, the scheme solves the technical problem that calculating variables directly from the detail layer involves too large a data volume, and achieves the effect of improving variable calculation performance under large data volumes.
It will be appreciated that the above-mentioned method embodiments of the present application may be combined with each other to form combined embodiments without departing from their principles and logic; for brevity, this is not repeated here. It will also be appreciated by those skilled in the art that, in the methods of the above embodiments, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the application further provides a data processing apparatus, an electronic device, and a computer readable storage medium, each of which can be used to implement any one of the data processing methods provided in the application; for the corresponding technical solutions and descriptions, refer to the method parts, which are not repeated here.
Fig. 7 is a block diagram of a data processing apparatus according to an embodiment of the present application.
Referring to fig. 7, an embodiment of the present application provides a data processing apparatus 70, the data processing apparatus 70 including:
a response module 71 adapted to determine an aggregation condition parameter, a metric field parameter and an operator parameter in response to a received parameter configuration request for configuring a preset variable indicator;
an acquisition module 72 adapted to acquire a list of details stored in a database of details, and to screen from the list of details a first number of detail data records matching the aggregation condition parameter;
An aggregation module 73, adapted to call an operator corresponding to the operator parameter, and perform an aggregation calculation according to a preset aggregation granularity for a field value of a target metric field corresponding to the metric field parameter included in the first number of detail data records, to obtain a second number of aggregated data records including an aggregated metric field; wherein the second number is less than the first number and the aggregate metric field is used to characterize an aggregate result of the target metric field;
the processing module 74 is adapted to obtain a data processing result corresponding to the parameter configuration request according to the second number of aggregated data records containing the aggregated metric field.
Optionally, the aggregation module is specifically adapted to: determining an operator type corresponding to the operator parameter; querying operator operation logic matched with the operator type from an operator library, and a field storage format of an aggregate measurement field matched with the operator type; aiming at the first quantity of detail data records, aggregating according to the preset aggregation granularity to obtain a second quantity of detail data sets; wherein each detail data set comprises a plurality of detail data records; respectively aiming at each detail data set, carrying out aggregation calculation on field values of a target measurement field corresponding to the measurement field parameter in a plurality of detail data records of the detail data set according to the operator arithmetic logic to obtain an aggregation calculation result corresponding to the field values of the target measurement field; storing the aggregation calculation result into an aggregation data record corresponding to the detail data set according to the field storage format to obtain a second number of aggregation data records containing aggregation measurement fields; wherein the aggregate data records are in one-to-one correspondence with the detail data sets.
Optionally, if the operator type is a deduplication operation type, a field storage format of the aggregate metric field matched with the operator type is a key value pair format; wherein the key is the index name of the data index to be de-duplicated; the value is the duplicate removal operation result corresponding to the data index.
Optionally, the aggregation module is specifically adapted to: acquiring cluster identification fields corresponding to the preset aggregation granularity, which are contained in the first number of detail data records; performing clustering processing on the first number of detail data records according to the cluster identification field to obtain the second number of detail data sets; wherein each detail data set corresponds to the same cluster identifier; if the cluster identification field is a time field, performing clustering processing according to time granularity aiming at the first number of detail data records, wherein the cluster identification corresponding to each detail data set is a time identification; and if the cluster identification field is a service type field, performing cluster processing according to the service type aiming at the first number of detail data records, wherein the cluster identification corresponding to each detail data set is a service type identification.
Optionally, the aggregation module is specifically adapted to: if the cluster identification field is a time field, performing clustering processing according to time granularity aiming at the first number of detail data records, wherein the cluster identification corresponding to each detail data set is a time identification; and if the cluster identification field is a service type field, performing cluster processing according to the service type aiming at the first number of detail data records, wherein the cluster identification corresponding to each detail data set is a service type identification.
Optionally, the response module is specifically adapted to: determining an aggregation condition parameter in response to a condition parameter configuration request triggered by an aggregation condition configuration entry contained in a configuration interface; the aggregation condition configuration inlet is used for inputting according to a preset condition value range; determining a metric field parameter in response to a metric field configuration request triggered by a metric field configuration entry contained in the configuration interface; the measurement field configuration entry is used for inputting according to a preset field value range; determining operator parameters in response to an operator configuration request triggered by an operator configuration entry contained in a configuration interface; the operator configuration entry is used for inputting according to a preset operator library.
Optionally, the number of the metric field parameters included in the metric field configuration request triggered by the metric field configuration entry included in the configuration interface is a plurality of the metric field parameters, and the number of the operator parameters included in the operator configuration request triggered by the operator configuration entry included in the configuration interface is a plurality of the operator parameters; the data processing result corresponding to the parameter configuration request includes: a plurality of data processing results respectively corresponding to different variable indexes; wherein each variable index corresponds to a combination of the metric field parameters and the operator parameters.
Optionally, the processing module is specifically adapted to: taking a plurality of data processing results respectively corresponding to different variable indexes as input parameters, and inputting the input parameters into a preset data prediction model; predicting the security level of the target object corresponding to a plurality of different variable indexes according to the output result of the data prediction model; wherein the target object corresponding to the variable index comprises: a user class object and a device class object.
Fig. 8 is a block diagram of an electronic device 50 according to an embodiment of the present application.
Referring to fig. 8, an embodiment of the present application provides an electronic device, including: at least one processor 501; at least one memory 502, and one or more I/O interfaces 503, coupled between the processor 501 and the memory 502; wherein the memory 502 stores one or more computer programs executable by the at least one processor 501, the one or more computer programs being executed by the at least one processor 501 to perform the data processing method described above.
The embodiments of the present application also provide a computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor/processing core, implements the data processing method described above. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium.
Embodiments of the present application also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when executed in a processor of an electronic device, performs the above-described data processing method.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer-readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable program instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, random Access Memory (RAM), read Only Memory (ROM), erasable Programmable Read Only Memory (EPROM), static Random Access Memory (SRAM), flash memory or other memory technology, portable compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable program instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and may include any information delivery media.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present application may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present application are implemented by personalizing electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information for computer readable program instructions, which may execute the computer readable program instructions.
The computer program product described herein may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
Various aspects of the present application are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purpose of limitation. In some instances, it will be apparent to one skilled in the art that features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with other embodiments unless explicitly stated otherwise. It will therefore be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the scope of the present application as set forth in the following claims.
Claims (10)
1. A method of data processing, comprising:
determining an aggregation condition parameter, a measurement field parameter and an operator parameter corresponding to a preset variable index in response to a received parameter configuration request for configuring the preset variable index;
acquiring a detail data table stored in a detail database, and screening a first number of detail data records matched with the aggregation condition parameters from the detail data table;
invoking an operator corresponding to the operator parameter, and performing aggregation calculation according to a preset aggregation granularity aiming at a field value of a target measurement field corresponding to the measurement field parameter contained in the first quantity of detail data records to obtain a second quantity of aggregation data records containing aggregation measurement fields; wherein the second number is less than the first number and the aggregate metric field is used to characterize an aggregate result of the target metric field;
And obtaining a data processing result corresponding to the preset variable index according to the second number of aggregation data records containing the aggregation measurement field.
2. The method of claim 1, wherein invoking the operator corresponding to the operator parameter, performing an aggregation calculation according to a preset aggregation granularity for a field value of a target metric field corresponding to the metric field parameter included in the first number of detail data records, to obtain a second number of aggregate data records including an aggregate metric field includes:
determining an operator type corresponding to the operator parameter;
querying operator operation logic matched with the operator type from an operator library, and a field storage format of an aggregate measurement field matched with the operator type;
aiming at the first quantity of detail data records, aggregating according to the preset aggregation granularity to obtain a second quantity of detail data sets; wherein each detail data set comprises a plurality of detail data records;
respectively aiming at each detail data set, carrying out aggregation calculation on field values of a target measurement field corresponding to the measurement field parameter in a plurality of detail data records of the detail data set according to the operator arithmetic logic to obtain an aggregation calculation result corresponding to the field values of the target measurement field;
Storing the aggregation calculation result into an aggregation data record corresponding to the detail data set according to the field storage format to obtain a second number of aggregation data records containing aggregation measurement fields; wherein the aggregate data records are in one-to-one correspondence with the detail data sets.
3. The method of claim 2, wherein if the operator type is a deduplication operation type, the field storage format of the aggregate metric field that matches the operator type is a key-value pair format;
wherein the key is the index name of the data index to be de-duplicated; the value is the duplicate removal operation result corresponding to the data index.
4. The method of claim 2, wherein aggregating the first number of detail data records according to the preset aggregation granularity to obtain a second number of detail data sets comprises:
acquiring cluster identification fields corresponding to the preset aggregation granularity, which are contained in the first number of detail data records;
performing clustering processing on the first number of detail data records according to the cluster identification field to obtain the second number of detail data sets; wherein each detail data set corresponds to the same cluster identifier;
If the cluster identification field is a time field, performing clustering processing according to time granularity aiming at the first number of detail data records, wherein the cluster identification corresponding to each detail data set is a time identification;
and if the cluster identification field is a service type field, performing cluster processing according to the service type aiming at the first number of detail data records, wherein the cluster identification corresponding to each detail data set is a service type identification.
5. The method of claim 1, wherein determining the aggregate condition parameter, the metric field parameter, and the operator parameter in response to the received parameter configuration request comprises:
determining an aggregation condition parameter in response to a condition parameter configuration request triggered by an aggregation condition configuration entry contained in a configuration interface; the aggregation condition configuration inlet is used for inputting according to a preset condition value range;
determining a metric field parameter in response to a metric field configuration request triggered by a metric field configuration entry contained in the configuration interface; the measurement field configuration entry is used for inputting according to a preset field value range;
Determining operator parameters in response to an operator configuration request triggered by an operator configuration entry contained in a configuration interface; the operator configuration entry is used for inputting according to a preset operator library.
6. The method of claim 5, wherein the number of metric field parameters included in the metric field configuration request triggered by the metric field configuration entry included in the configuration interface is a plurality and the number of operator parameters included in the operator configuration request triggered by the operator configuration entry included in the configuration interface is a plurality;
the data processing result corresponding to the parameter configuration request includes: a plurality of data processing results respectively corresponding to different variable indexes; wherein each variable index corresponds to a combination of the metric field parameters and the operator parameters.
7. The method of claim 6, wherein after the obtaining the data processing result corresponding to the parameter configuration request, the method further comprises:
taking a plurality of data processing results respectively corresponding to different variable indexes as input parameters, and inputting the input parameters into a preset data prediction model;
Predicting the security level of the target object corresponding to a plurality of different variable indexes according to the output result of the data prediction model;
wherein the target object corresponding to the variable index comprises: a user class object and a device class object.
8. A data processing apparatus, comprising:
the response module is suitable for responding to the received parameter configuration request for configuring the preset variable index, and determining an aggregation condition parameter, a measurement field parameter and an operator parameter;
the acquisition module is suitable for acquiring a detail data table stored in a detail database, and screening a first number of detail data records matched with the aggregation condition parameters from the detail data table;
the aggregation module is suitable for calling operators corresponding to the operator parameters, and carrying out aggregation calculation according to a preset aggregation granularity aiming at field values of target measurement fields corresponding to the measurement field parameters contained in the first quantity of detail data records to obtain a second quantity of aggregation data records containing aggregation measurement fields; wherein the second number is less than the first number and the aggregate metric field is used to characterize an aggregate result of the target metric field;
And the processing module is suitable for obtaining a data processing result corresponding to the parameter configuration request according to the second number of aggregation data records containing the aggregation measurement field.
9. An electronic device, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the memory stores one or more computer programs executable by the at least one processor, one or more of the computer programs being executable by the at least one processor to enable the at least one processor to perform the data processing method of any one of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the data processing method according to any of claims 1-7.