CN115438054A - Incremental calculation updating method based on expert statistical characteristics, electronic equipment and medium - Google Patents


Publication number
CN115438054A
Authority
CN
China
Prior art keywords
statistical
expert
information
data
historical data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211017850.3A
Other languages
Chinese (zh)
Inventor
周婷婷
刘智
胡汉一
胡明睿
徐圣源
许浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202211017850.3A
Publication of CN115438054A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The invention discloses an incremental calculation updating method based on expert statistical characteristics, an electronic device, and a medium. The method comprises an offline calculation part and an online calculation part. The offline calculation first reads the configuration information of the expert statistical features, then extracts the user statistical association information of the statistical features of the historical data, and finally persists the statistical association information of the historical data. The online calculation first reads the configuration information of the expert statistical features; then extracts the user statistical association information of the statistical features of the newly added data while reading the persisted user statistical association information of the historical data; then generates updated offline statistical association information from the association information extracted from the historical and newly added data, and persists it; and finally generates the corresponding statistical features from the updated offline statistical association information. The method can still output statistical features efficiently when server resources are limited.

Description

Incremental calculation updating method based on expert statistical characteristics, electronic equipment and medium
Technical Field
The invention relates to the technical fields of artificial intelligence feature engineering, statistical variable calculation, and data storage, and in particular to an incremental calculation updating method based on expert statistical characteristics, an electronic device, and a medium.
Background
In artificial intelligence, data and features determine the upper limit of machine learning, and models and algorithms only approximate this upper limit. Feature engineering therefore plays a significant role in machine learning, and in practice the data and features produced in the feature engineering stage are key to its success. Existing automatic feature generation techniques generate a large number of new features by simply transforming and aggregating low-order features, retain the features of high importance, and retrain the model with the new features. Most users lack the computing resources to support such a heavy computational load. Constructing appropriate sample features for traditional models under limited computing resources therefore poses new challenges for feature engineering, and constructing such features often requires expert experience. Existing expert-experience features can generally be classified into static features, obtained by combining and calculating several simple low-order features, and dynamic features, obtained by performing statistical operations on simple low-order features along the spatio-temporal dimension.
In the conventional feature statistical calculation process, static features are usually obtained by directly executing calculation logic on several low-order features, and dynamic features are obtained by performing statistical calculations on part or all of the historical data. Obtaining these features requires powerful computing resources or a long time, which increases the amount of computation and makes generating expert features very costly.
Therefore, there is a need for an incremental calculation updating method based on expert statistical characteristics that can still output statistical features efficiently when server resources are limited.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides an incremental calculation updating method based on expert statistical characteristics.
In order to achieve the technical purpose, the technical scheme of the invention is as follows: the first aspect of the embodiments of the present invention provides an incremental computation updating method based on expert statistical characteristics, where the method specifically includes the following steps:
(1) Constructing an expert statistical feature configuration file comprising a plurality of fields, wherein the column name of the first field is the name of the statistical expert feature, and the column names of the remaining fields comprise the statistical calculation category, the grouping attribute name, the screening-condition associated attribute name, the screening condition, the attribute name on which the statistical operation is executed, and the statistical operation type corresponding to the statistical expert feature;
(2) Reading the expert statistical feature configuration file constructed in step (1), taking the first field as the primary key, verifying the remaining fields according to the statistical calculation category of the statistical expert feature, and splicing the remaining fields that pass verification into a linked list that serves as the value, thereby forming a hash map;
(3) Generating the associated statistical information of the historical data offline: extracting the expert statistical features to be generated according to the hash map obtained in step (2), obtaining the calculation category of each expert statistical feature from the value obtained in step (2), and storing the associated statistical information of the historical data for each statistical expert feature according to its calculation category;
(4) Persisting the associated statistical information of the historical data stored in step (3);
(5) Generating the associated statistical information of the newly added data online: extracting the expert statistical features to be generated according to the hash map obtained in step (2), obtaining the calculation category of each expert statistical feature from the value obtained in step (2), and storing the statistical association information of the newly added data for each statistical expert feature according to its statistical calculation category;
(6) Updating the historical associated statistical information: updating the corresponding historical associated statistical information according to the associated statistical information of the historical data persisted in step (4) and the statistical association information of the newly added data generated in step (5), taking the time of the current newly added data as the new historical data time node, and persisting the updated historical associated statistical information as the new statistical association information of the historical data;
(7) Deriving the values of the statistical expert features from the statistical association information of the historical data updated in step (6).
A second aspect of embodiments of the present invention provides an electronic device, comprising a memory and a processor, the memory being coupled to the processor; the memory is used for storing program data, and the processor is used for executing the program data to realize the incremental calculation updating method based on the expert statistical characteristics.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the program, when executed by a processor, implements the above incremental computation updating method based on expert statistical features.
Based on the above technical background and business scenario, the invention provides a method for calculating expert statistical features for financial risk control that is general, efficient, highly compatible, and practical.
The beneficial effects of the invention are as follows. The invention provides an incremental calculation updating method based on expert statistical characteristics that obtains the associated statistical information of the historical data through an offline calculation part and that of the newly added data through an online calculation part, and persists the statistical association information of the historical data. Updated offline statistical association information is generated from the association information extracted from the historical and newly added data, and is persisted; the corresponding statistical features are then generated from the updated offline statistical association information. Compared with the existing method of updating statistical features over the full data set, the method achieves a speed improvement of more than 10 times on the test data set. In addition, the method can still output statistical features efficiently when server resources are limited, and can be applied to real-time data inference by AI models in financial risk control scenarios.
Drawings
FIG. 1 is a flow chart of the method for updating expert statistical features for financial risk control based on incremental calculation according to the invention;
FIG. 2 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The present invention will be described in detail below with reference to the accompanying drawings. The features of the following examples and embodiments may be combined with each other without conflict.
The invention provides an incremental calculation updating method based on expert statistical characteristics, which specifically comprises the following steps as shown in figure 1:
(1) Constructing an expert statistical feature configuration file, comprising the following:
Configuring the column name of a first field, wherein the first field holds the statistical expert feature generated by statistical calculation on the original data table, and the name of the statistical expert feature is taken as the column name of the first field.
Configuring the column name of a second field, wherein the second field holds the category of the statistical expert feature named in the first field. In an embodiment of the present invention, the categories of expert statistical features are: numerical statistics over the historical data of a feature; numerical statistics over the historical data of a feature within a specific valid time; statistics of the number of occurrences within a specific valid time; and numerical statistics over the most recent periods of a feature.
Configuring the column name of a third field, wherein the third field holds the name of the column by which the original data table is grouped when the expert statistical feature of the first field is generated; this must be a column name in the original table on which the statistical operation is executed.
Configuring the column name of a fourth field, wherein the fourth field holds the name of the column to which the screening condition applies when the original data table is conditionally screened to generate the expert statistical feature of the first field; this must be a column name in the original table on which the statistical operation is executed.
Configuring the column name of a fifth field, wherein the fifth field holds the condition range values set by the screening condition. The values are of two types: purely numerical, or a numerical value combined with its unit of measurement. When there are multiple parameters, they are separated by "#".
Configuring the column name of a sixth field, wherein the sixth field holds the name of the column from which the statistical result value of the expert statistical feature of the first field is computed; this must be a column name in the original table on which the statistical operation is executed.
Configuring the column name of a seventh field, wherein the seventh field holds the type of statistical operation to be performed on the original table values of the column named in the sixth field in order to generate the expert statistical feature identified by the first field.
Preferably, the fields in the configured field list are comma-separated.
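The seven-field layout described in step (1) can be illustrated with a short sketch. The column names are taken from Example 1 of this description; the sample record (feature name `amount_sum_12_month`, columns `trade_date` and `amount`, condition `12#month`) is invented purely for illustration and is not part of the disclosure.

```python
import csv
import io

# Column names as given in Example 1 of this description.
CONFIG_COLUMNS = [
    "new_generated_col_name",  # field 1: name of the expert statistical feature
    "statistical_type",        # field 2: statistical calculation category
    "col_grouping_name",       # field 3: grouping attribute
    "col_filter",              # field 4: attribute the screening condition applies to
    "filter_condition",        # field 5: condition range values, '#'-separated
    "col_value",               # field 6: attribute the statistic is computed on
    "calc_function",           # field 7: statistical operation type
]

# Hypothetical record: the feature and data column names are invented.
SAMPLE_CONFIG = (
    ",".join(CONFIG_COLUMNS) + "\n"
    "amount_sum_12_month,STA_TYPE_RECENT_DATE_VALUE,ID,trade_date,12#month,amount,sum\n"
)


def read_config(text):
    """Parse the comma-separated configuration into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def split_condition(raw):
    """Split the fifth field into its '#'-separated parameters."""
    return raw.split("#") if raw else []


records = read_config(SAMPLE_CONFIG)
print(records[0]["statistical_type"])
print(split_condition(records[0]["filter_condition"]))
```

Here a value with a unit of measurement is assumed to be written as two "#"-separated parameters; the description does not fix that encoding.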
(2) Reading the configuration information of the expert statistical features, taking the first field as the key (Key) and splicing the subsequent fields into a linked list as the value (Value), to form a HashMap.
As a preferred scheme, the HashMap in step (2) is assembled using an array, specifically as follows:
Reading each record in the expert statistical feature configuration file constructed in step (1), taking the value of the first field as the Key, verifying the values of the second to seventh fields, and constructing a linked list from the values that pass verification; this linked list serves as the Value.
The validity of the configuration of the second to seventh fields is verified against the following conditions:
Condition 1: whether the category configured in the second field of the expert statistical feature configuration file falls within the configured category range (the configured categories are: numerical statistics over the historical data of a feature; numerical statistics over the historical data of a feature within a specific valid time; statistics of the number of occurrences within a specific valid time; and numerical statistics over the most recent periods of a feature).
Condition 2: whether the values of the third to seventh fields meet the non-null requirements of the configured category.
Illustratively, if the category configured in the second field is numerical statistics over the historical data of a feature, it is determined whether the third, fourth, sixth, and seventh fields are empty.
Illustratively, if the category configured in the second field is numerical statistics over the historical data of a feature within a specific valid time, it is determined whether the third, fourth, fifth, sixth, and seventh fields are empty.
If any condition is not met, the expert statistical feature configuration file is faulty, that is, the category is invalid or a field required by the category is null.
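The two verification conditions of step (2) can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: a Python `dict` and `list` stand in for the HashMap and linked list, the required-field sets for the first two categories follow the examples given in this description, and the sets for the other two categories are assumptions. The sample rows are hypothetical.

```python
# 1-based field positions that must be non-empty for each category.
REQUIRED_FIELDS = {
    "STA_TYPE_HISTORY_VALUE": (3, 4, 6, 7),            # per the description
    "STA_TYPE_RECENT_DATE_VALUE": (3, 4, 5, 6, 7),     # per the description
    "STA_TYPE_RECENT_DATE": (3, 4, 5, 7),              # assumption
    "STA_TYPE_RECENT_PERIEDS_VALUE": (3, 4, 5, 6, 7),  # assumption
}


def is_blank(value):
    """Treat None and whitespace-only strings as empty."""
    return value is None or not str(value).strip()


def build_feature_map(rows):
    """Validate 7-field rows and map feature name -> list of fields 2..7."""
    feature_map = {}
    rejected = []
    for row in rows:
        name, category = row[0], row[1]
        required = REQUIRED_FIELDS.get(category)
        if required is None:  # condition 1: category outside the configured range
            rejected.append((name, "unknown category"))
            continue
        missing = [pos for pos in required if is_blank(row[pos - 1])]
        if missing:  # condition 2: a required field is empty
            rejected.append((name, "empty required field(s): %s" % missing))
            continue
        feature_map[name] = list(row[1:])
    return feature_map, rejected


# Hypothetical configuration rows.
rows = [
    ["amount_sum_all", "STA_TYPE_HISTORY_VALUE", "ID", "trade_date", "", "amount", "sum"],
    ["amount_sum_12m", "STA_TYPE_RECENT_DATE_VALUE", "ID", "trade_date", "", "amount", "sum"],
    ["bad_category", "STA_TYPE_UNKNOWN", "ID", "trade_date", "", "amount", "sum"],
]
feature_map, rejected = build_feature_map(rows)
print(sorted(feature_map))
print(rejected)
```

The second row is rejected because its category requires the fifth field, and the third because its category is not one of the four configured categories.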
(3) Generating the associated statistical information of the historical data offline: extracting the expert statistical features to be generated according to the HashMap obtained in step (2), obtaining the calculation category of each expert statistical feature from its Value, and storing the statistical association information of each expert statistical feature according to its calculation category, such that the corresponding expert statistical feature can be derived from the generated statistical association information of the historical data.
Illustratively, to generate the value of each expert feature in step (3), the statistical expert features are extracted according to the hash map (HashMap) of step (2), the statistical calculation category of each expert feature is obtained from its Value (that is, as determined by the second field in step (1)), and different statistical association information is stored depending on whether the category is the historical data statistic of a feature (STA_TYPE_HISTORY_VALUE), the historical data statistic of a feature within a specific valid time (STA_TYPE_RECENT_DATE_VALUE), the statistic of the number of occurrences within a specific valid time (STA_TYPE_RECENT_DATE), or the numerical statistic over the most recent periods of a feature (STA_TYPE_RECENT_PERIEDS_VALUE).
When storing the different statistical association information, the stored information is determined by the following:
Content 1: the filter-condition attribute is determined from the third value of the Value linked list obtained in step (2), that is, the fourth field of step (1) (the statistical screening-condition associated attribute name).
Content 2: the attribute on which the statistical operation is executed is determined from the fifth value of the Value linked list obtained in step (2), that is, the sixth field of step (1) (the attribute name on which the statistical result depends).
The association information required by Content 1 and Content 2 is stored in separate linked-list structures.
Illustratively, when storing statistical association information for the category of historical data statistics of a feature (STA_TYPE_HISTORY_VALUE), the screening condition is determined according to Content 1 and filtering is performed, yielding the stored association information [VALUE1, VALUE2, VALUE3].
Illustratively, when storing statistical association information for the category of numerical statistics on an attribute column within a valid time for a feature (STA_TYPE_RECENT_DATE_VALUE), the stored association information is determined by Content 1 and Content 2, yielding [DATE1, DATE2, DATE3, ...] and [VALUE1, VALUE2, VALUE3, ...].
Illustratively, when storing statistical association information for the category of the number of occurrences within a valid time (STA_TYPE_RECENT_DATE), the stored association information is determined by Content 2, yielding [DATE1, DATE2, DATE3, ...].
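A minimal sketch of how the per-group date and value lists described in step (3) might be collected. It assumes the raw records are dictionaries and uses Python lists in place of linked lists; the column names `ID`, `trade_date`, and `amount` and the sample rows are hypothetical.

```python
from collections import defaultdict


def extract_association(records, group_col, filter_col=None, value_col=None):
    """Collect, per grouping value, the lists needed to derive a feature later.

    filter_col: attribute the screening condition applies to (kept as "dates").
    value_col:  attribute the statistic is computed on (kept as "values").
    Each list is stored independently, one pair of lists per group.
    """
    info = defaultdict(lambda: {"dates": [], "values": []})
    for rec in records:
        entry = info[rec[group_col]]
        if filter_col is not None:
            entry["dates"].append(rec[filter_col])
        if value_col is not None:
            entry["values"].append(rec[value_col])
    return dict(info)


# Hypothetical historical records.
history = [
    {"ID": "u1", "trade_date": "2022-01-05", "amount": 100.0},
    {"ID": "u1", "trade_date": "2022-03-10", "amount": 40.0},
    {"ID": "u2", "trade_date": "2022-02-01", "amount": 7.5},
]

# Dates and values, as for a time-windowed numerical statistic.
date_value_info = extract_association(history, "ID", "trade_date", "amount")
# Dates only, as for an occurrence count within a valid time.
date_only_info = extract_association(history, "ID", filter_col="trade_date")
print(date_value_info["u1"])
print(date_only_info["u2"])
```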
(4) Saving the association information of the historical data generated offline, that is, persisting the associated statistical information of the historical data so that it remains available.
Illustratively, in step (4) the data may be stored in a database such as MySQL or Oracle, or in a file format such as csv, xls, or pkl.
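As one possible illustration of file-based persistence, the sketch below writes the per-group lists to a csv file and reads them back. Encoding each list as a JSON string inside a csv cell is an assumption made here for simplicity, and the file name and sample data are hypothetical.

```python
import csv
import json
import os
import tempfile


def save_association(path, info):
    """Write one csv row per group; each list is stored as a JSON string."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["group", "dates", "values"])
        for group, entry in info.items():
            writer.writerow([group, json.dumps(entry["dates"]), json.dumps(entry["values"])])


def load_association(path):
    """Read the csv back into the same {group: {"dates": [...], "values": [...]}} shape."""
    info = {}
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            info[row["group"]] = {
                "dates": json.loads(row["dates"]),
                "values": json.loads(row["values"]),
            }
    return info


# Hypothetical association information keyed by grouping value.
info = {
    "u1": {"dates": ["2022-01-05"], "values": [100.0]},
    "u2": {"dates": [], "values": [7.5, 2.5]},
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "history_association.csv")
    save_association(path, info)
    restored = load_association(path)
print(restored == info)
```

Grouping values are read back as strings, so non-string group keys would need an explicit conversion on load.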
(5) Generating the associated statistical information of the newly added data online: extracting the expert statistical features to be generated according to the HashMap obtained in step (2), obtaining the calculation category of each expert statistical feature from its Value, and storing the statistical association information of the newly added data for each expert statistical feature according to its calculation category, such that the corresponding expert statistical feature can be derived from the generated statistical association information of the newly added data.
Illustratively, in step (5):
(5.1) To generate the value of each expert feature, the expert statistical features to be generated are extracted according to the HashMap of step (2), the statistical calculation category of each expert feature is obtained from its Value (that is, as determined by the second field in step (1)), and the corresponding statistical association information is stored.
Specifically, the stored association information is determined by:
Content 1: the filter-condition attribute is determined from the third value of the Value linked list obtained in step (2), that is, the fourth field of step (1);
Content 2: the attribute on which the statistical operation is executed is determined from the fifth value of the Value linked list obtained in step (2), that is, the sixth field of step (1).
The association information of the newly added data that satisfies the statistical condition needs to be stored; the association information required by Content 1 and Content 2 is stored in separate linked-list structures.
(6) Updating the historical associated statistical information: updating the corresponding historical associated statistical information according to the historical associated statistical information persisted in step (4) and the associated statistical information of the newly added data generated online in step (5), taking the time of the current newly added data as the new historical data time node, and persisting the updated historical associated statistical information as the new historical associated statistical information.
Illustratively, in step (6):
(6.1) The associated statistical information of the historical data is merged into the statistical association information of the newly added data, and the associated statistical information of the historical data is then updated according to the statistical association information of the newly added data to form the updated historical associated statistical information. The persistence operation is executed, and the next iteration over the associated statistical information of the historical data is executed when further newly added data arrives.
In (6.1), the associated statistical information of the historical data is merged into the statistical association information of the newly added data in the following two modes:
Mode 1: according to the statistical association information of the newly added data, finding in the statistical association information of the historical data the historical entries that need to be merged with it.
Mode 2: according to the filtering condition of the statistic (that is, the screening-condition associated attribute name of the fourth field and the statistical screening condition of the fifth field), finding the data that needs to be updated in the statistical association information of the historical data, executing the update operation on that data, and merging the updated information into the statistical association information of the newly added data.
(6.2) The newly added data updates the historical data in the following two modes:
Mode 1: for the part that overlaps the historical data information, the statistical association information of the newly added data replaces the statistical association information of the historical data.
Mode 2: for the part that does not overlap the historical data information, the statistical association information of the newly added data is incorporated directly.
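The merge in (6.1), mode 1, and the replace-or-incorporate rule in (6.2) can be sketched as two small functions. This is an illustration under simplifying assumptions: each group holds a single list of values, no time window applies, and the sample data are hypothetical.

```python
def merge_history_into_new(history, new):
    """(6.1) mode 1: for every group present in the new data, look up its
    historical list and place it in front of the newly added values."""
    merged_new = {}
    for group, new_values in new.items():
        merged_new[group] = list(history.get(group, [])) + list(new_values)
    return merged_new


def update_history(history, merged_new):
    """(6.2): groups that overlap the history are replaced by the merged
    new-data entry; groups that do not overlap are incorporated directly."""
    updated = dict(history)      # groups absent from the new data stay as they are
    updated.update(merged_new)   # overlapping groups replaced, new groups added
    return updated


# Hypothetical per-group value lists.
history = {"u1": [100.0, 40.0], "u2": [7.5]}
new = {"u1": [5.0], "u3": [1.0]}

merged_new = merge_history_into_new(history, new)
updated = update_history(history, merged_new)
print(updated)
```

Because the historical list is merged into the new-data entry first, replacing the overlapping group does not lose any history.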
(7) Outputting the expert statistical features: outputting the expert statistical features for the data of the next inference period according to the new historical associated statistical information generated in step (6).
Illustratively, in step (7), using the statistical association information of the historical data updated in step (6), statistical operations such as summation (sum), maximum (max), minimum (min), average (avg), counting (count), and unique-value extraction (unique) are executed on the stored linked lists of associated statistical information, so that the values of the expert statistical features of the inference data can be calculated quickly.
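Deriving a feature value from a stored list then reduces to dispatching on the configured operation type. The sketch below is illustrative: returning `None` for an empty list and returning distinct values in first-seen order for `unique` are assumptions, not behavior stated in this description.

```python
def _avg(values):
    return sum(values) / len(values)


def _unique(values):
    # Assumption: "unique" returns the distinct values in first-seen order.
    return list(dict.fromkeys(values))


OPERATIONS = {
    "sum": sum,
    "max": max,
    "min": min,
    "avg": _avg,
    "count": len,
    "unique": _unique,
}


def derive_feature(values, calc_function):
    """Apply the configured statistical operation to a stored list."""
    if calc_function not in OPERATIONS:
        raise ValueError("unsupported operation: %r" % calc_function)
    if not values and calc_function != "count":
        return None  # assumption: no data means no feature value
    return OPERATIONS[calc_function](values)


stored = [100.0, 40.0, 5.0]
print(derive_feature(stored, "sum"))
print(derive_feature(stored, "count"))
```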
Example 1
(1) Constructing an expert statistical characteristic configuration file:
Table 1 shows the table structure of the expert statistical feature configuration file of this embodiment. The business scenario of this embodiment is providing the statistical information associated with expert statistical features in the field of financial risk control.
Table 1: table structure of expert statistical profile
[Table 1 is an image in the original publication (Figure BDA0003812845120000071).]
Configuring the column name of the first field: the name of the statistical expert feature generated from the original data table (new_generated_col_name) is taken as the column name of the first field.
Configuring the column name of the second field as the statistical calculation category (statistical_type). Based on an abstraction of the expert statistical features used in financial risk control systems (covering the banking and insurance fields), there are four statistical calculation categories: numerical statistics over the historical data of a feature (STA_TYPE_HISTORY_VALUE), numerical statistics on an attribute column within a valid time for a feature (STA_TYPE_RECENT_DATE_VALUE), statistics of the number of occurrences within a valid time (STA_TYPE_RECENT_DATE), and numerical statistics over the most recent periods of a feature (STA_TYPE_RECENT_PERIEDS_VALUE).
The column name of the third field is configured as the statistical grouping attribute name (col_grouping_name), the fourth as the statistical screening-condition associated attribute name (col_filter), the fifth as the statistical screening condition (filter_condition), the sixth as the attribute name on which the statistical result depends (col_value), and the seventh as the statistical operation type (calc_function).
(2) Reading each record in the expert statistical feature configuration file constructed in step (1):
The first field is taken as the Key and the subsequent fields are spliced into a List as the Value to form a hash map (HashMap). The validity of each expert statistical feature configuration is checked, and feature generation is cancelled for invalid configurations. The validity check proceeds as follows:
(2.1) checking whether the statistical category is a legal configuration item;
(2.2) checking whether the parameter items that cannot be omitted for the statistical category are configured reasonably and correctly;
(2.3) if the expert statistical feature is not configured correctly, it is not put into the HashMap.
The non-empty parameter configuration items in fig. 2 are the parameter items that cannot be omitted for the corresponding statistical calculation category. Once a statistical calculation category is selected, the corresponding parameter items must be filled in; otherwise the corresponding expert feature cannot be generated.
(3) Generating associated statistical information of historical data off line:
According to the HashMap generated in step (2), the statistical calculation category parameter is obtained from the Value linked list, and the association information of the offline historical data is computed according to that category. The purpose of storing the association information is to make the subsequent iterations over newly added data efficient and to ensure that expert statistical features are generated correctly and quickly. The association information to be stored for each statistical calculation category is shown in Table 2 below.
Table 2: Statistical calculation categories and the associated information stored for each
[Table 2 is available only as an image (BDA0003812845120000081) in the source publication.]
To speed up later updates of the historical associated statistical information with newly added data, whenever the stored associated information includes a Date list, the minimum Date value in that list is recorded alongside it.
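As a minimal sketch of step (3) for a time-window category, assuming the stored associated information consists of a list of dates, a parallel list of values, and the recorded minimum date, the offline generation could look as follows in Python. The record layout `(group_id, event_date, value)`, the dictionary keys, and the 365-day window are assumptions for illustration; the information actually stored per category is given in Table 2.

```python
from datetime import date, timedelta


def build_window_info(records, as_of, window_days=365):
    """Group (group_id, event_date, value) records that fall inside the window."""
    cutoff = as_of - timedelta(days=window_days)
    info = {}
    for group_id, event_date, value in records:
        if event_date < cutoff:
            continue  # outside the time window, so it is not stored
        entry = info.setdefault(group_id, {"dates": [], "values": []})
        entry["dates"].append(event_date)
        entry["values"].append(value)
    for entry in info.values():
        # Recorded once so that a later update can tell, without scanning the
        # date list, whether the entry still lies entirely inside the window.
        entry["min_date"] = min(entry["dates"])
    return info


history_records = [
    ("A", date(2022, 1, 10), 5.0),
    ("A", date(2020, 1, 1), 9.0),   # older than the window, dropped
    ("B", date(2022, 6, 1), 2.0),
]
history_info = build_window_info(history_records, as_of=date(2022, 8, 24))
```

Keeping the raw lists rather than a single aggregate is what allows expired dates to be removed later without rereading the full historical data.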
(4) Persisting the associated statistical information:
The associated statistical information generated offline from the historical data must be saved. In this example it is stored in a file in csv format; the associated statistical information generated offline from the historical data in this embodiment is shown in Table 3 below.
Table 3: Associated statistical information generated offline from the historical data
[Table 3 is available only as images (BDA0003812845120000082, BDA0003812845120000091) in the source publication.]
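The csv persistence of step (4) can be sketched as follows, assuming each stored entry holds a date list, a value list, and a minimum date. The column names other than `ID`, the `|` separator inside a cell, and the function names are hypothetical choices for this sketch and are not taken from Table 3.

```python
import csv
import io
from datetime import date


def dump_info(info, stream):
    """Write one csv row per grouping value; lists are joined with '|'."""
    writer = csv.writer(stream)
    writer.writerow(["ID", "dates", "values", "min_date"])
    for group_id, entry in sorted(info.items()):
        writer.writerow([
            group_id,
            "|".join(d.isoformat() for d in entry["dates"]),
            "|".join(repr(v) for v in entry["values"]),
            entry["min_date"].isoformat(),
        ])


def load_info(stream):
    """Inverse of dump_info; assumes every stored list is non-empty."""
    info = {}
    for row in csv.DictReader(stream):
        info[row["ID"]] = {
            "dates": [date.fromisoformat(s) for s in row["dates"].split("|")],
            "values": [float(s) for s in row["values"].split("|")],
            "min_date": date.fromisoformat(row["min_date"]),
        }
    return info


stored = {
    "A": {"dates": [date(2022, 1, 10), date(2022, 3, 2)],
          "values": [5.0, 1.5], "min_date": date(2022, 1, 10)},
    "B": {"dates": [date(2022, 6, 1)], "values": [2.0],
          "min_date": date(2022, 6, 1)},
}
buffer = io.StringIO()   # stands in for a csv file on disk
dump_info(stored, buffer)
buffer.seek(0)
restored = load_info(buffer)
```

An in-memory buffer is used here so the sketch runs without touching the file system; a real file opened with `newline=""` would be handled the same way.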
(5) Generating associated statistical information of the newly added data online:
This step is similar to the generation process in step (3). Because the volume of data updated online is far smaller than the offline data volume, generating the associated statistical information of the newly added data requires little computation and the result is obtained quickly. Table 4 shows the associated statistical information generated online from the newly added data in this embodiment (it is stored in this example for comparison; no persistence is required in actual online calculation).
Table 4: Associated statistical information generated online from the newly added data
[Table 4 is available only as an image (BDA0003812845120000092) in the source publication.]
(6) Updating historical associated statistical information:
The statistical association information of the update data and that of the historical data are merged in two steps. First, the statistical association information of the historical data is screened for entries affected by the update, namely those whose statistical calculation category carries a time window (STA_TYPE_RECENT_DATE_VALUE and STA_TYPE_RECENT_DATE). The screening uses the value of the minimum-effective-time attribute column of the effective-time list (see Table 4, the offset_Date_count_with_12_month_min attribute column): historical entries whose minimum effective time no longer satisfies the time window are filtered according to the time window and merged into the statistical association information of the update data. Second, the two sets of statistical association information are combined by grouping value (see Table 4, the ID attribute column): where a grouping value appears in both, the statistical association information of the update data prevails; where it appears only in the historical data, the statistical association information of the historical data prevails; where it appears only in the update data, the statistical association information of the update data prevails.
This yields the updated version of the historical associated statistical information, which is then persisted. The method then waits for the next batch of newly added data and performs the next round of iteration on the historical associated statistical information. The information after the update operation is shown in Table 5.
Table 5: Updated associated statistical information of the historical data
[Table 5 is available only as an image (BDA0003812845120000101) in the source publication.]
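The two-step update of step (6) might be sketched as follows for a time-window category. The entry layout (`dates`, `values`, `min_date`), the 365-day window, and the decision to drop a historical group whose dates have all expired are assumptions of this sketch rather than details stated in the description.

```python
from datetime import date, timedelta


def trim_to_window(entry, cutoff):
    """Drop expired dates; the recorded min_date lets valid entries skip the scan."""
    if entry["min_date"] >= cutoff:
        return entry
    kept = [(d, v) for d, v in zip(entry["dates"], entry["values"]) if d >= cutoff]
    if not kept:
        return None
    dates = [d for d, _ in kept]
    return {"dates": dates, "values": [v for _, v in kept], "min_date": min(dates)}


def update_history(history, new_info, as_of, window_days=365):
    cutoff = as_of - timedelta(days=window_days)
    # Step 1: screen the history by the time window and merge what survives
    # into the new-data information for grouping values present in both.
    trimmed = {}
    for group_id, entry in history.items():
        kept = trim_to_window(entry, cutoff)
        if kept is not None:  # assumption: fully expired groups are dropped
            trimmed[group_id] = kept
    merged_new = {}
    for group_id, new_entry in new_info.items():
        old = trimmed.get(group_id)
        dates = (old["dates"] if old else []) + new_entry["dates"]
        values = (old["values"] if old else []) + new_entry["values"]
        merged_new[group_id] = {"dates": dates, "values": values,
                                "min_date": min(dates)}
    # Step 2: combine by grouping value. History-only groups keep their
    # history; for shared and new-only groups the new-data entry prevails.
    updated = dict(trimmed)
    updated.update(merged_new)
    return updated


history = {
    "A": {"dates": [date(2021, 9, 1), date(2022, 1, 10)],
          "values": [1.0, 5.0], "min_date": date(2021, 9, 1)},
    "B": {"dates": [date(2022, 6, 1)], "values": [2.0],
          "min_date": date(2022, 6, 1)},
    "D": {"dates": [date(2021, 9, 5)], "values": [4.0],
          "min_date": date(2021, 9, 5)},
}
new_info = {
    "A": {"dates": [date(2022, 9, 20)], "values": [7.0],
          "min_date": date(2022, 9, 20)},
    "C": {"dates": [date(2022, 9, 21)], "values": [3.0],
          "min_date": date(2022, 9, 21)},
}
updated = update_history(history, new_info, as_of=date(2022, 9, 30))
```

With these sample values, group A loses its expired date and gains the new one, group B is carried over unchanged, group C comes from the new data only, and group D disappears because its single date has left the window.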
(7) Outputting expert statistical characteristics:
Using the historical associated statistical information updated in step (6), the statistical operation specified by the operation type in the Value list of the HashMap generated in step (2), such as summation (sum), maximum (max), minimum (min), average (avg), counting (count), or unique value extraction (unique), is applied to the stored lists of associated statistical information, so that the corresponding expert statistical feature value is obtained quickly.
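A minimal Python sketch of step (7) follows, assuming the stored associated information for a grouping value is a plain list of numbers. The dispatch table and the function name `compute_feature` are hypothetical, and returning the distinct values in first-seen order is one possible reading of unique value extraction.

```python
def _avg(values):
    return sum(values) / len(values)


def _unique(values):
    # Distinct values in first-seen order (an assumed interpretation).
    return list(dict.fromkeys(values))


# Maps the calc_function field of the configuration to an operation.
CALC_FUNCTIONS = {
    "sum": sum,
    "max": max,
    "min": min,
    "avg": _avg,
    "count": len,
    "unique": _unique,
}


def compute_feature(values, calc_function):
    """Apply the configured statistical operation to a stored value list."""
    try:
        func = CALC_FUNCTIONS[calc_function]
    except KeyError:
        raise ValueError(f"unsupported statistical operation: {calc_function}") from None
    return func(values)


stored_values = [3.0, 1.0, 3.0]
total = compute_feature(stored_values, "sum")
distinct = compute_feature(stored_values, "unique")
```

Because the operation runs over the already-filtered lists rather than over the raw historical data, the cost of producing a feature value depends only on the length of the stored list.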
The present application further provides an electronic device comprising: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the incremental calculation updating method based on expert statistical characteristics described above. Fig. 2 is a hardware structure diagram of a device with data processing capability on which the incremental calculation updating method based on expert statistical characteristics according to an embodiment of the present invention runs. In addition to the processor, memory, DMA controller, magnetic disk, and nonvolatile memory shown in fig. 2, such a device may also include other hardware according to its actual function, which is not described further here.
Accordingly, the present application also provides a computer-readable storage medium on which computer instructions are stored; when executed by a processor, the instructions implement the incremental calculation updating method based on expert statistical characteristics described above. The computer-readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any device with data processing capability described in any of the foregoing embodiments. It may also be an external storage device of that device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a flash memory card (Flash Card) provided on the device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of any device with data processing capability. The computer-readable storage medium is used to store the computer program and other programs and data required by the device, and may also be used to temporarily store data that has been output or is to be output.
The foregoing examples are illustrative and are not to be construed as limiting the invention, whose claimed scope includes but is not limited to the specific embodiments described above. Any incremental calculation updating method based on financial risk-control expert statistical characteristics that is consistent with the claims of the present invention falls within its scope of protection, as do the changes, substitutions, and variations that a person skilled in the art makes to the expert statistical characteristic configuration file, database tables, and persistent storage mode for different business scenarios and example data.

Claims (10)

1. An incremental calculation updating method based on expert statistical characteristics, characterized by comprising the following steps:
(1) Constructing an expert statistical characteristic configuration file, wherein the expert statistical characteristic configuration file comprises a plurality of fields, the column name of the first field is the name of the statistical expert characteristic, and the column names of the other fields comprise the statistical calculation category, grouping attribute name, screening condition association attribute name, screening condition, attribute name on which the statistical operation is executed, and statistical operation type corresponding to the statistical expert characteristic;
(2) Reading the expert statistical characteristic configuration file constructed in step (1), taking the first field name as the primary key, verifying the remaining fields according to the statistical calculation category corresponding to the statistical expert characteristic, and concatenating the remaining fields that pass verification into a linked list used as the value, thereby forming a hash map;
(3) Generating associated statistical information of historical data offline: extracting the expert statistical characteristics to be generated according to the hash map obtained in step (2), obtaining the calculation category of each expert statistical characteristic from the values obtained in step (2), and storing, according to that calculation category, the associated statistical information of the historical data corresponding to the statistical expert characteristic;
(4) Persisting the associated statistical information of the historical data stored in step (3);
(5) Generating associated statistical information of the newly added data online: extracting the expert statistical characteristics to be generated according to the hash map obtained in step (2), obtaining the calculation category of each expert statistical characteristic from the values obtained in step (2), and storing, according to that statistical calculation category, the statistical associated information of the newly added data corresponding to the statistical expert characteristic;
(6) Updating historical associated statistical information: updating the corresponding historical associated statistical information according to the associated statistical information of the historical data persisted in step (4) and the statistical associated information of the newly added data generated in step (5), taking the current newly added data time as the new historical data time node, obtaining the updated historical associated statistical information, and persisting it as the new historical data statistical associated information;
(7) Deriving the value of the statistical expert characteristic from the historical data statistical associated information updated in step (6).
2. The expert statistical feature-based incremental computation updating method according to claim 1, wherein the statistical computation categories corresponding to the statistical expert features in step (1) include historical data statistics based on a certain feature, historical data statistics based on a certain feature within a specific effective time, statistics based on the number of occurrences within a specific effective time, and numerical statistics based on the recent period of a certain feature.
3. The expert statistical feature-based incremental computation update method of claim 1 wherein the fields are comma-separated.
4. The expert statistical feature-based incremental computation updating method according to claim 2, wherein the process of verifying the remaining fields according to the statistical calculation category corresponding to the statistical expert characteristic in step (2) specifically comprises:
verifying the validity of the configuration of the remaining fields by judging the following conditions:
condition 1: whether the category configured in the second field of the expert statistical characteristic configuration file is within the configured category range, wherein the configured category range is: historical data statistics based on a certain feature, historical data statistics based on a certain feature within a specific effective time, statistics of the number of occurrences within a specific effective time, and numerical statistics based on the recent period of a certain feature;
condition 2: whether the non-null requirements on the remaining fields are met for the configured category;
if either of the above conditions is not met, the expert statistical characteristic configuration is faulty, and the field that does not meet the category requirement is treated as null.
5. The expert statistical characteristic-based incremental computing updating method according to claim 2, wherein the associated statistical information of the historical data stored in step (3) according to the calculation category of the statistical expert characteristic, and the statistical associated information of the newly added data stored in step (5) according to the calculation category corresponding to the statistical expert characteristic, are further determined by the following:
content 1: determining the executed filtering condition attribute according to the statistical screening condition associated attribute name in the value linked list obtained in the step (2);
content 2: determining the correlation attribute of the executed statistical operation according to the attribute name of the statistical result dependence in the value linked list obtained in the step (2);
wherein, the associated information is stored by adopting independent linked list structures according to the requirements of the content 1 and the content 2.
6. The expert statistical characteristic-based incremental computing updating method according to claim 1, wherein step (4) is specifically: persistently storing the associated statistical information of the historical data saved in step (3) in a MySQL or ORACLE database and/or in the form of csv, xls, or pkl files.
7. The expert statistical feature-based incremental computation updating method according to claim 2, wherein the step (6) specifically comprises:
(6.1) the step of incorporating the associated statistical information of the historical data into the statistical associated information of the newly added data comprises the following steps:
mode 1: according to the statistical association information of the newly added data, finding, in the statistical association information of the historical data, the historical statistical association information that needs to be merged with the newly added data;
mode 2: searching data needing to be updated in the statistical association information of the historical data according to the screening condition association attribute name and the screening condition, executing updating operation on the part of data needing to be updated, and merging the updated information into the statistical association information of the newly added data;
(6.2) taking the current newly added data time as a new historical data time node, and obtaining updated historical associated statistical information comprises the following steps:
mode 1: the part overlapped with the historical data information is processed in a mode of replacing the statistical associated information of the historical data with the statistical associated information of the newly added data;
mode 2: and the part which is not overlapped with the historical data information is processed in a mode of directly incorporating the newly added data statistical association information.
8. The expert statistical feature-based incremental computation updating method according to claim 1, wherein step (7) is specifically: using the historical data statistical associated information updated in step (6), performing statistical operations including summation, maximum value calculation, minimum value calculation, average value calculation, counting, and unique value extraction on the linked list information of the updated historical data statistical associated information to obtain the value of the statistical expert characteristic.
9. An electronic device comprising a memory and a processor, wherein the memory is coupled to the processor; wherein the memory is used for storing program data, and the processor is used for executing the program data to realize the incremental computation updating method based on expert statistical characteristics in any one of the above claims 1-8.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the expert statistical feature-based incremental computation update method according to any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211017850.3A CN115438054A (en) 2022-08-24 2022-08-24 Incremental calculation updating method based on expert statistical characteristics, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211017850.3A CN115438054A (en) 2022-08-24 2022-08-24 Incremental calculation updating method based on expert statistical characteristics, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN115438054A true CN115438054A (en) 2022-12-06

Family

ID=84244619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211017850.3A Pending CN115438054A (en) 2022-08-24 2022-08-24 Incremental calculation updating method based on expert statistical characteristics, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN115438054A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117349345A (en) * 2023-12-05 2024-01-05 南京研利科技有限公司 Data statistics method and device and data statistics acquisition method and device thereof

Similar Documents

Publication Publication Date Title
US11861462B2 (en) Preparing structured data sets for machine learning
CN110929879A (en) Business decision logic updating method based on decision engine and model platform
US11972228B2 (en) Merging database tables by classifying comparison signatures
CN106293891B (en) Multidimensional investment index monitoring method
CN109523117A (en) Risk Forecast Method, device, computer equipment and storage medium
CN111325248A (en) Method and system for reducing pre-loan business risk
CN113449753B (en) Service risk prediction method, device and system
CN111368887A (en) Training method of thunderstorm weather prediction model and thunderstorm weather prediction method
CN115438054A (en) Incremental calculation updating method based on expert statistical characteristics, electronic equipment and medium
CN114647790A (en) Big data mining method and cloud AI (Artificial Intelligence) service system applied to behavior intention analysis
CN112069269B (en) Big data and multidimensional feature-based data tracing method and big data cloud server
CN111833177A (en) Method and device for selecting variable processing logic
CN111368864A (en) Identification method, availability evaluation method and device, electronic equipment and storage medium
CN111859057B (en) Data feature processing method and data feature processing device
CN115168509A (en) Processing method and device of wind control data, storage medium and computer equipment
CN115409104A (en) Method, apparatus, device, medium and program product for identifying object type
CN114139490A (en) Method, device and equipment for automatic data preprocessing
CN110990810B (en) User operation data processing method, device, equipment and storage medium
CN114356712A (en) Data processing method, device, equipment, readable storage medium and program product
CN116012123B (en) Wind control rule engine method and system based on Rete algorithm
CN112396513B (en) Data processing method and device
CN117272123B (en) Sensitive data processing method and device based on large model and storage medium
CN110119406B (en) Method and device for checking real-time task records
CN116912904A (en) Automatic teller machine face recognition optimization acceleration method and device based on CMA-ES algorithm
CN117391841A (en) Wind control strategy evaluation method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination