CN112799919A - Data monitoring method, device, equipment and computer storage medium - Google Patents

Publication number
CN112799919A
Authority
CN
China
Prior art keywords
data
script file
file
alarm information
statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110392180.2A
Other languages
Chinese (zh)
Inventor
杜仕伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Smk Network Technology Co ltd
Original Assignee
Shanghai Smk Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Smk Network Technology Co ltd filed Critical Shanghai Smk Network Technology Co ltd
Priority to CN202110392180.2A priority Critical patent/CN112799919A/en
Publication of CN112799919A publication Critical patent/CN112799919A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3068 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data format conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3495 Performance evaluation by tracing or monitoring for systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages

Abstract

The present disclosure relates to a data monitoring method, apparatus, device and computer storage medium. The method comprises: acquiring tag data corresponding to a target tag through a query statement in a script file; calculating stability index data of the tag data; when the stability index data exceeds a preset threshold range, calling an external program file through a calling statement in the script file to generate alarm information; and outputting the alarm information. The method automatically queries the tag data through program statements, calculates the stability index data, monitors the stability of the tag data, and automatically calls the external program file to send alarm information when the stability index data exceeds the preset threshold range. Automated monitoring greatly reduces the missed reports and false reports of manual monitoring; compared with traditional manual monitoring, the method improves the efficiency of data monitoring and reduces labor and time costs.

Description

Data monitoring method, device, equipment and computer storage medium
Technical Field
The present disclosure belongs to the technical field of financial data monitoring, and in particular, to a data monitoring method, apparatus, device, and computer storage medium.
Background
With the development of enterprise big data business and technology, the volume of data within an enterprise keeps growing. Data is an important asset of the enterprise, so its stability is very important, especially for financial data. At present, data monitoring relies mainly on manual testing: a tester obtains the relevant figures and then evaluates the stability of the data using calculations such as the variance or a period-over-period comparison.
However, enterprises have more and more business scenarios and the data volume keeps growing, while the requirements on data stability become increasingly prominent. Applying the traditional calculations to massive data involves a large amount of computation, consumes considerable computing resources, is difficult to implement, takes a great deal of manpower and time, and is inefficient. In addition, manual testing and monitoring is limited by the experience and ability of the monitoring personnel, so its rates of missed reports and false reports are high.
Disclosure of Invention
The embodiments of the present disclosure provide a data monitoring method, apparatus, device and computer storage medium, which can be applied to monitoring the stability of financial data, improve data monitoring efficiency, and reduce the labor and time costs of system maintenance.
In a first aspect, an embodiment of the present disclosure provides a data monitoring method, where the method includes:
acquiring tag data corresponding to a target tag through a query statement in a script file;
calculating stability index data of the tag data;
when the stability index data exceeds a preset threshold range, calling an external program file through a calling statement in the script file to generate alarm information;
and outputting the alarm information.
In some embodiments, before obtaining the tag data corresponding to the target tag through the query statement in the script file, the method further includes:
acquiring an external program file and acquiring a script file; the script file comprises a query statement and a calling statement; the calling statement is used for calling the external program file when the stability index data exceeds a preset threshold range;
and deploying the script file and operating the script file according to a preset scheduling strategy.
In some embodiments, obtaining the external program file comprises:
writing or copying an external program file; the external program file comprises a generation rule of the alarm information and an output rule of the alarm information, both defined in a programming language;
the generation rule comprises generating a subject and/or content of the alarm information in a preset format and/or a preset style, according to the condition information at the time the stability index data exceeds the preset threshold range.
In some embodiments, the scheduling policy comprises:
setting a scheduling time according to the data partition generation time and the update frequency of the target tag;
and periodically running the script file according to the set scheduling time.
In some embodiments, the stability indicator data is a population stability indicator value.
In a second aspect, an embodiment of the present disclosure provides a data monitoring apparatus, including:
the first acquisition module is used for acquiring the label data corresponding to the target label through the query statement in the script file;
the calculation module is used for calculating stability index data of the label data;
the generating module is used for calling, when the stability index data exceeds a preset threshold range, an external program file through a calling statement in the script file to generate alarm information;
and the output module is used for outputting the alarm information.
In some embodiments, the apparatus further comprises:
the second acquisition module is used for acquiring the external program file and acquiring the script file; the script file comprises a query statement and a calling statement; the calling statement is used for calling the external program file when the stability index data exceeds a preset threshold range;
and the deployment module is used for deploying the script file and operating the script file according to a preset scheduling strategy.
In some embodiments, the second obtaining module is specifically configured to:
writing or copying the external program file; the external program file comprises a generation rule of the alarm information and an output rule of the alarm information, both defined in a programming language;
the generation rule comprises generating the subject and/or the content of the alarm information in a preset format and/or a preset style, according to the condition information at the time the stability index data exceeds the preset threshold range.
In some embodiments, the scheduling policy comprises:
setting scheduling time according to the data partition generation time and the updating frequency of the target label;
and periodically running the script file according to the set scheduling time.
In a third aspect, an embodiment of the present disclosure provides a data monitoring device, where the device includes: a processor, and a memory storing computer program instructions; the processor reads and executes the computer program instructions to implement the data monitoring method according to any of the embodiments described above.
In a fourth aspect, the present disclosure provides a computer storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the data monitoring method according to any one of the above embodiments is implemented.
According to the data monitoring method, apparatus, device and computer storage medium of the embodiments of the present disclosure, the tag data can be queried automatically through program statements, the stability index data calculated, and the stability of the tag data monitored; when the stability index data exceeds the preset threshold range, an external program file is called automatically to send alarm information. Automated monitoring greatly reduces the missed reports and false reports of manual monitoring; compared with traditional manual monitoring, the method improves the efficiency of data monitoring and reduces labor and time costs.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings needed to be used in the embodiments of the present disclosure will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a data monitoring method according to an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of a data monitoring method according to another embodiment of the present disclosure;
Fig. 3 is a schematic flowchart of the periodic running of a script file in a data monitoring method according to an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of a data monitoring apparatus according to an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of a data monitoring device according to an embodiment of the present disclosure.
Detailed Description
Features and exemplary embodiments of various aspects of the present disclosure will be described in detail below, and in order to make objects, technical solutions and advantages of the present disclosure more apparent, the present disclosure will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative only and are not intended to be limiting of the disclosure. It will be apparent to one skilled in the art that the present disclosure may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present disclosure by illustrating examples of the present disclosure.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
At present, most enterprises are still at an early stage technically with respect to data monitoring and rely mainly on manual testing and monitoring, which cannot meet the need for efficient and accurate monitoring of data stability as business scenarios multiply and data volumes grow. For financial data in particular, the traditional approach uses the variance or the period-over-period ratio, and the system is generally considered stable when the fluctuation of an index is small (low variance). However, the variance calculation is computationally heavy, consumes computing resources, is difficult to implement, and is impractical in actual operation.
In order to solve the prior art problem, embodiments of the present disclosure provide a data monitoring method, apparatus, device, and computer storage medium, which can be applied to data stability monitoring, improve computing efficiency, and automatically alarm when data stability is abnormal.
First, a data monitoring method provided by the embodiment of the present disclosure is described below.
Fig. 1 shows a schematic flow chart of a data monitoring method according to an embodiment of the present disclosure. As shown in fig. 1, the method may include the steps of:
S100, acquiring tag data corresponding to a target tag through a query statement in a script file;
S200, calculating stability index data of the tag data;
S300, when the stability index data exceeds a preset threshold range, calling an external program file through a calling statement in the script file to generate alarm information;
and S400, outputting the alarm information.
In this embodiment, a script file containing a query statement and a calling statement is deployed in advance in the business system of an enterprise. In step S100, the script file is run and the tag data corresponding to the target tag is automatically acquired from a data warehouse. In step S200, the stability index data is calculated and monitored. In step S300, when the stability index data exceeds the preset threshold range, the external program file is called to generate alarm information, and in step S400 the alarm information is output to notify the staff. Data acquisition, stability calculation and monitoring, and alarming are thus all automated; compared with traditional manual testing and monitoring, both efficiency and calculation accuracy are higher.
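The flow of steps S100 to S400 can be sketched as follows. This is a minimal illustration rather than the disclosed shell script: the function name `run_monitoring` and its four callable parameters are hypothetical stand-ins for the query statement, the stability calculation, and the external program file, and the default threshold of 0.1 is taken from the example given later in the description.

```python
from typing import Callable, Optional, Sequence


def run_monitoring(
    query_tag_data: Callable[[], tuple[Sequence[float], Sequence[float]]],
    compute_stability: Callable[[Sequence[float], Sequence[float]], float],
    generate_alert: Callable[[float], str],
    output_alert: Callable[[str], None],
    threshold: float = 0.1,
) -> Optional[str]:
    """Run one monitoring pass; return the alert text, or None if stable."""
    # S100: fetch the tag data of the two partitions being compared.
    baseline, current = query_tag_data()
    # S200: compute the stability index (for example a PSI value).
    index = compute_stability(baseline, current)
    if index <= threshold:
        return None  # within the threshold range: no alarm
    # S300: generate the alarm information (stand-in for the external program).
    alert = generate_alert(index)
    # S400: output the alarm information.
    output_alert(alert)
    return alert
```

Passing the four steps in as callables keeps the control flow testable without a data warehouse or a mail server; in the disclosed embodiment the same roles are played by statements in the script file and by the jar package.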
In this embodiment, before step S100, as shown in fig. 2, the data monitoring method of the present disclosure may further include step S500: acquiring an external program file and a script file.
Specifically, the step S500 includes S501, obtaining an external program file.
The manner of acquiring the external program file includes writing or copying. The program file includes a generation rule of the alarm information defined by a programming language, and an output rule of the alarm information.
The external program file may define the generation rule of the alarm information in code, and the generation rule may include: generating the subject and/or content of the alarm information in a preset format and/or style, according to the condition information at the time the stability index data exceeds the preset threshold range.
The output rule may include: sending the subject and/or content of the generated alarm information to a preset address.
In an example provided by the present disclosure, the alarm information is an email; in other optional embodiments, the alarm information may also be a short message, or a message that can be received by a communication program used inside the enterprise. The address information in the output rule is set accordingly to a mailbox address, a mobile phone number, or a personnel account of the communication program used inside the enterprise, and is used to send the generated alarm mail or message to the corresponding staff.
In this example, the generation rule may customize the style and format of the subject and content of the alarm mail. The subject of an alarm mail may generally contain an abbreviated form such as "xx alarm", where "xx" may be a date, the data causing the alarm, or another custom name, without limitation. The content of the alarm mail is a detailed description of how the stability index data exceeds the preset threshold range, and generally may include the stability index data of one or more tags, the extent to which the preset threshold range is exceeded, the table and partition of the database where the calculated tag data is located, and the like.
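As an illustration of such a generation rule, the sketch below builds an alarm mail with Python's standard `email.message.EmailMessage`. The disclosure itself uses a Java program for this step; Python is substituted here only for brevity. The function name `build_alert_mail`, its parameters, and the exact subject and body layout are assumptions, not the format used by the applicant.

```python
from email.message import EmailMessage


def build_alert_mail(tag, psi, threshold, table, partitions,
                     sender, recipient, run_date):
    """Build (but do not send) an alarm mail for one unstable tag."""
    msg = EmailMessage()
    # Subject in the "xx alarm" style, with the run date as "xx".
    msg["Subject"] = f"{run_date} PSI alarm"
    msg["From"] = sender
    msg["To"] = recipient
    # Body: the index value, the threshold it exceeded, and where the
    # underlying tag data lives (table and partitions).
    msg.set_content(
        f"Tag: {tag}\n"
        f"PSI value: {psi:.4f} (threshold {threshold})\n"
        f"Table: {table}\n"
        f"Partitions: {', '.join(partitions)}\n"
    )
    return msg
```

The returned message could then be handed to a mail client such as `smtplib.SMTP.send_message`; the output rule (the preset address) corresponds to the `recipient` argument.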
In this embodiment, the external program file may be written in the Java programming language: code that generates and sends an email is written by calling a communication interface inside the enterprise, so that the alarm is raised by sending the email. After the code is written and tested without errors, it can be packaged into a jar package executable on the Linux system, and the jar package is uploaded to and stored on the enterprise business system equipment for subsequent calls by the script file.
In other embodiments, the jar package of the external program file may instead be obtained by copying: a jar package that has already been written, tested and packaged is copied from another storage device (such as a USB flash drive, a hard disk, or another computer) and uploaded to and stored on the business system equipment of the enterprise for subsequent calls by the script file.
Step S500 may also include S502, obtaining a script file.
The script file may be acquired by writing or by copying. In the example given in the present disclosure, the script file may be a shell script executable on the Linux system; in other alternative examples, the script file may also be a Python script, a Perl script, or the like.
The script file comprises a query statement and a calling statement. The query statement is used for acquiring tag data from a designated table and partition in the database so that the stability index data can be calculated; the calling statement is used for calling the external program file when the stability index data exceeds a preset threshold range.
In this embodiment, the query statement may be a Hive SQL (Structured Query Language) statement, and the tag data of the target tag is obtained from a Hive table of a financial database. A tag is a data form used to describe the characteristics of a business entity in the database, and can describe and define certain attributes or behavior statistics of a user; the target tags are the tags that need to be acquired in order to monitor data stability in the embodiments of the present disclosure. For example, in the financial database of a financial credit assessment business system, a financial risk control business system, or the like, data stability may be assessed from a user's various tags by taking a tag such as "risk score" or "financing preference" as the target tag. A "risk score" tag is of numerical type and its data is numerical; the higher the value, the higher the risk and the larger the possible influence on data stability. A "financing preference" tag is of enumeration type, and its data may be values assigned to the various financing preference types (for example, investor type, growth type, income type, steady type, and the like); a change in the financing preference type (and hence in the assigned value) may also change data stability.
Therefore, in this embodiment, the query statement written in the shell script file obtains the tag data that causes changes in data stability, for use in the subsequent calculation.
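A query statement of this kind might look like the string built below. This is only a sketch: the helper `build_tag_query`, the table name, the tag column, and the partition column `dt` are all hypothetical, since the disclosure does not give the actual schema.

```python
def build_tag_query(table: str, tag_column: str,
                    partition_column: str, partitions: list[str]) -> str:
    """Build a Hive SQL query that reads one tag from the given partitions.

    The identifiers are interpolated directly, so they are assumed to be
    trusted constants from the script, never user input.
    """
    quoted = ", ".join(f"'{p}'" for p in partitions)
    return (
        f"SELECT {partition_column}, {tag_column} "
        f"FROM {table} "
        f"WHERE {partition_column} IN ({quoted})"
    )
```

For instance, `build_tag_query("dw.user_tags", "risk_score", "dt", ["20200630", "20201230"])` yields a single statement that returns the "risk score" values of both partitions, ready to be split by partition for the stability calculation.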
In this example, through the Hive SQL statement written in the acquired shell script file, the tag data of the target tag can be automatically obtained from the Hive table of the financial database of the business system, and the stability index data calculated. In this example, the stability index data may be a PSI (Population Stability Index) value.
PSI is an index suited to financial risk control services and can be used to measure the deviation between a predicted value and an actual value. The smaller the PSI value, the smaller the difference between the two data distributions and the more stable the data. Using PSI to monitor the stability of financial data provides a relatively concise measure of the stability of a financial tag. In its standard form, the PSI value is calculated as:

$$\mathrm{PSI} = \sum_{i} \left(A_i - E_i\right) \ln\frac{A_i}{E_i}$$

where $E$ and $A$ are respectively the predicted (expected) data set and the actual data set. The values of the two data sets are divided into equal-frequency bins, the letter $i$ denotes the i-th bin, and $E_i$ and $A_i$ are the proportions of each data set that fall in the i-th bin.
In the present example, $E$ and $A$ may represent the tag data of the same tag in the financial database at two different times, and the tag data of each time may be used as a data partition. For example, the tag data of a certain feature tag generated in the financial database by the financial risk control business system on June 30, 2020 may be used as a data partition named "20200630", and the data of this partition is taken as the set $E$; the tag data of the same feature tag generated in the financial database on December 30, 2020 may be used as another data partition named "20201230", and the data of this partition is taken as the set $A$. The tag data of the two data partitions is acquired by the Hive SQL query statement, and the PSI value between the 20200630 partition and the 20201230 partition is calculated with the above formula, that is, the data stability over the half year (6 months) from June 30, 2020 to December 30, 2020.
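The PSI calculation between two partitions of a numerical tag can be sketched in Python as follows, using the standard form PSI = Σ (a_i − e_i) · ln(a_i / e_i) over equal-frequency bins. The disclosure computes this in Hive SQL; the function name `psi`, the default of 10 bins, and the small floor `eps` applied to empty bins (to avoid taking the logarithm of zero) are assumptions of this sketch.

```python
import math
from bisect import bisect_right
from statistics import quantiles


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a later sample."""
    # Equal-frequency cut points are derived from the baseline partition.
    cuts = quantiles(expected, n=bins)  # returns bins - 1 cut points

    def proportions(values):
        counts = [0] * bins
        for v in values:
            counts[bisect_right(cuts, v)] += 1
        total = len(values)
        # Floor each proportion at eps so that ln() is always defined.
        return [max(c / total, eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Here `expected` would hold the tag values of the earlier partition (for example "20200630") and `actual` those of the later one ("20201230"). Identical samples give a PSI of zero, and every term of the sum is non-negative, so the value grows as the two distributions drift apart.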
Generally, the threshold range of PSI values is related to data stability, as shown in table 1.
Table 1: Relationship between PSI threshold ranges and data stability (table image not reproduced in this text)
In step S502, based on the relationship between the PSI threshold ranges and data stability, the calling statement of the shell script file may be used to call the external program file, which sends the alarm mail, when the calculated PSI value exceeds the threshold of 0.1.
After acquiring the external program file and the script file, step S500 in this embodiment further includes S503: deploying the script file and running the script file according to a preset scheduling policy.
When the script file is deployed, the script file and the external program file can be uploaded to and configured in an enterprise scheduling system (for example, a financial risk control business system) or a cluster. The operating system underlying the enterprise scheduling system may be a Linux system, and the enterprise scheduling system may have whatever functional modules the enterprise's internal business requires, which is not limited here. The enterprise scheduling system must, however, be able to run tasks on a timer so that the scheduling policy can be executed; that is, the system has a timing function and can automatically run a given task at the configured time. In this embodiment, the scheduling policy may include: setting a scheduling time according to the data partition generation time and the update frequency of the target tag. The update frequency includes daily updates (once per day), weekly updates (once per week), monthly updates (once per month), annual updates (once per year), and so on. The generation time is the specific time at which the data is updated, such as 01:00:00 for a daily update, or a given hour, minute and second on the last day of each month for a monthly update. The scheduling time must be set later than the generation time of the database table partition. Depending on the generation time and the update frequency, the interval between scheduled runs may be, but is not limited to, one day, one week, one month, half a year, or one year.
The scheduling policy may further include: periodically running the script file according to the set scheduling time. Running the script file means executing steps S100 to S300. After the script file is configured in the enterprise scheduling system, the system automatically and periodically runs the script file according to the scheduling policy, so that data monitoring runs at a stable period, correct configuration is ensured, and the reliability of data monitoring is further improved.
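The rule that the scheduling time must fall after the partition generation time and repeat at the update frequency can be sketched as below. The function `next_run_time` and the 30-minute safety delay are hypothetical; the sketch also handles only fixed-length intervals (daily or weekly updates), since calendar months and years are not a constant `timedelta`.

```python
from datetime import datetime, timedelta


def next_run_time(now: datetime,
                  first_partition_at: datetime,
                  update_interval: timedelta,
                  delay: timedelta = timedelta(minutes=30)) -> datetime:
    """Return the earliest scheduled run strictly after `now`.

    Runs happen `delay` after each partition is generated, so the script
    never starts before the data it needs exists.
    """
    first_run = first_partition_at + delay
    if now < first_run:
        return first_run
    # Number of whole intervals already elapsed since the first run.
    elapsed_periods = (now - first_run) // update_interval
    return first_run + (elapsed_periods + 1) * update_interval
```

With a partition generated daily at 01:00:00, the script would run at 01:30:00 each day under these assumptions. In practice an enterprise scheduling system or a timer facility of the operating system would hold this configuration rather than application code.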
After the deployment in step S503 is completed, as shown in fig. 3, the scheduling system of the enterprise periodically runs the shell script file. Step S100 is thus performed periodically: the tag data corresponding to the target tag is obtained through the query statement in the script file. In S100, the Hive SQL statement of the script file obtains tag data from a Hive table of the financial database, and the tag data comprises the data in the data partitions generated for the same target tag at different times.
In this embodiment, the periodic running of the shell script file and the acquisition of tag data by the query statement make data retrieval and parameter passing for the stability calculation automatic. Compared with traditional manual testing, the data retrieval and calculation logic of the monitoring process of the present disclosure is sound and the processing is efficient; and because manual involvement is reduced, errors and omissions in data import caused by manual operation can be avoided.
After the tag data is acquired, step S200 may be performed to calculate the stability index data of the tag data. In this embodiment, for the acquired tag data of at least two data partitions, the PSI value between two data partitions is calculated using the above PSI formula, and the result is stored in a result table. In this embodiment, the stability index data may also be calculated through a Hive SQL statement written in the script file: the logic for acquiring the tag data and calculating the PSI value is defined with Hive SQL statements in the script, so that both are carried out automatically when the script runs. In other optional embodiments, step S200 may implement the corresponding calculation logic through other suitable script statements, which are not described again.
The data monitoring method of this embodiment runs the script file automatically and periodically as the tag data in the database is updated, so the PSI value is also recalculated automatically as the data is updated, and the updated PSI value is written into the database. In step S300, the PSI value written in the result table is read out and compared with the preset PSI threshold range; when the PSI value is greater than 0.1, the external program file is called through the calling statement of the script file to generate the alarm information, and in step S400 the alarm information is sent to the preset mailbox address to notify the staff. After the mail is sent, the current run of the script file ends and the current monitoring task is finished.
The preset threshold of the PSI value that triggers the alarm is not limited to the value 0.1 used in this example and may be set according to the actual situation.
When the PSI value exceeds the preset threshold range, the script file of this embodiment calls the jar package of the external program file on the scheduling system to generate an alarm mail. The alarm mail may include a specific description of how the PSI value exceeds the preset threshold range, such as the PSI value of a certain tag or tags, the unstable range it falls in (for example, 0.1 to 0.25, or greater than 0.25), and the table (for example, a Hive table) and partitions used in the calculation.
If the PSI value in the result table does not exceed the preset threshold range, no alarm mail is sent, the current run of the script file ends, and the current monitoring task is finished.
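The calling statement's decision, alarm above the threshold and otherwise finish quietly, can be sketched as follows. The disclosure does this from a shell script; this Python version is an illustration only. `java -jar` is the standard way to launch an executable jar, but the jar path and the two command-line arguments passed to it (tag name and PSI value) are hypothetical, as the disclosure does not specify the jar's interface.

```python
import subprocess
from typing import Optional


def alert_if_unstable(psi_value: float, tag: str, jar_path: str,
                      threshold: float = 0.1,
                      runner=subprocess.run) -> Optional[list[str]]:
    """Call the external alarm program when the PSI exceeds the threshold.

    Returns the command that was run, or None when no alarm was needed.
    """
    if psi_value <= threshold:
        return None  # stable: the monitoring task ends without an alarm
    # Hypothetical argument layout for the alarm jar: tag name, PSI value.
    command = ["java", "-jar", jar_path, tag, f"{psi_value:.4f}"]
    # check=True raises CalledProcessError if the alarm program fails.
    runner(command, check=True)
    return command
```

The `runner` parameter defaults to `subprocess.run` and exists so the decision logic can be exercised without a Java runtime, for example by passing a function that only records the command.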
In this embodiment, the alarm is raised by calling the jar package of the external program file. Compared with traditional manual monitoring, this is more stable, has a very low error rate, and saves a large amount of labor and time. At the same time, because the script file runs periodically and monitors and alarms on data stability, abnormal data stability is reported to the staff automatically and promptly, which facilitates the maintenance and updating of the enterprise scheduling system (that is, the business system). If the data table needs to be replaced, only the table name in the script file needs to be modified, so the maintenance cost is low. The method of this embodiment can therefore monitor and alarm for the enterprise scheduling system simply by running the script and calling the external program, without extensive changes to the enterprise scheduling system (that is, the business system); it is convenient to maintain and update, and its maintenance cost is low.
Fig. 4 is a schematic structural diagram of a data monitoring apparatus according to an embodiment of the present disclosure. As shown in fig. 4, the apparatus may include:
a first obtaining module 201, configured to obtain, through a query statement in the script file, tag data corresponding to a target tag;
a calculating module 202, configured to calculate stability index data of the tag data;
a generating module 203, configured to call, through a calling statement in the script file, an external program file to generate alarm information when the stability index data exceeds a preset threshold range;
and an output module 204, configured to output the alarm information.
In some embodiments, as shown in fig. 4, the data monitoring apparatus may further include:
a second obtaining module 205, configured to obtain an external program file and a script file; the script file includes a query statement and a calling statement; the calling statement is used to call the external program file when the stability index data exceeds a preset threshold range;
and the deployment module 206 is configured to deploy the script file and run the script file according to a preset scheduling policy.
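The cooperation of modules 201 to 204 can be pictured as a small pipeline. The following Python sketch is illustrative only: the class name `DataMonitor`, its field names, and the use of plain callables for each module are assumptions made for the example and do not describe the actual implementation of the apparatus.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class DataMonitor:
    # Each field stands in for one module of fig. 4 (names are hypothetical).
    fetch: Callable[[str], Sequence[float]]          # first obtaining module 201
    compute: Callable[[Sequence[float], Sequence[float]], float]  # calculating module 202
    make_alarm: Callable[[float], str]               # generating module 203
    send: Callable[[str], None]                      # output module 204
    threshold: float = 0.1

    def run(self, base_partition: str, compare_partition: str) -> Optional[str]:
        """Fetch two partitions, compute the index, and alarm if it is too large."""
        index = self.compute(self.fetch(base_partition),
                             self.fetch(compare_partition))
        if index <= self.threshold:
            return None  # within the threshold range: nothing is output
        alarm = self.make_alarm(index)
        self.send(alarm)
        return alarm
```

Keeping each module behind a callable mirrors the separation described above: the query, the calculation, the alarm generation, and the output can each be replaced without touching the others.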
In this embodiment, the second obtaining module may be configured to implement S501 to S502 in step S500 of the data monitoring method shown in fig. 2, and the obtaining manner of the external program file and the script file may be writing or copying.
The external program file includes a generation rule and an output rule for the alarm information, both defined in a programming language. The generation rule, defined in code, may include: generating the subject and/or content of the alarm information in a preset format and/or style from the condition information recorded when the stability index data exceeds the preset threshold range. The output rule may include: sending the subject and/or content of the generated alarm information to preset address information.
In the example provided by the present disclosure, the alarm information is a mail message. In other optional embodiments, the alarm information may also be a short message or a message that can be received by a communication program used inside the enterprise. The address information in the output rule may correspondingly be set to a mailbox address, a mobile phone number, or a staff account of the communication program used inside the enterprise, so that the generated alarm mail or message is sent to the corresponding staff.
In this example, the generation rule may customize the style and format of the subject and content of the alarm mail. The subject of an alarm mail may generally contain the short form "xx alarm", where "xx" may be a date, the data that caused the alarm, or another custom name, without limitation. The content of the alarm mail is a detailed description of how the stability index data exceeded the preset threshold range, and may generally include the stability index data of the tag or tags concerned, the way in which the preset threshold range was exceeded, and the database table and partition in which the calculated tag data is located.
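A generation rule of this kind can be sketched with Python's standard `email` package. This is an illustration only: in the disclosure the rule is part of the external program file, and the function name `build_alarm_mail`, the exact wording of the subject and body, and the example addresses are assumptions.

```python
from email.message import EmailMessage


def build_alarm_mail(tag, psi, threshold, table, partitions, sender, recipient):
    """Compose an alarm mail whose subject follows the "xx alarm" pattern."""
    msg = EmailMessage()
    msg["Subject"] = f"{tag} alarm"          # "xx" is the tag name in this sketch
    msg["From"] = sender
    msg["To"] = recipient                     # preset address from the output rule
    msg.set_content(
        f"PSI of tag '{tag}' is {psi:.4f}, above the preset threshold {threshold}.\n"
        f"Table: {table}\n"
        f"Partitions: {', '.join(partitions)}\n"
    )
    return msg


mail = build_alarm_mail("age", 0.3, 0.1, "risk.user_tags",
                        ["20200630", "20201230"],
                        "monitor@example.com", "staff@example.com")
```

Sending the message (the output rule) would be a separate step, for instance through an SMTP client or an internal communication interface, and is deliberately left out of the sketch.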
In this embodiment, when the second obtaining module 205 writes the external program file, the Java programming language may be used to call a communication interface inside the enterprise and to write code that generates and sends a mail, so that the alarm is sent by mail. After the code has been written and tested without errors, it can be packaged into a jar package executable on a Linux system, and the jar package is stored in the enterprise's system so that the script file can call it later.
In other embodiments, the second obtaining module 205 may also obtain the jar package of the external program file by copying: a jar package that has already been written, tested, and packaged is copied from another storage device (such as a USB disk, a hard disk, or another computer), uploaded, and stored in the enterprise's system so that the script file can call it later.
In the example given in the present disclosure, the script file may be a shell script executable on a Linux system. In other alternative examples, the script file may also be a Python script, a Perl script, or the like. The script file includes the query statement and the calling statement. The query statement is used to obtain tag data from a table and partition designated in the database so that stability index data can be calculated; the calling statement is used to call the external program file when the stability index data exceeds a preset threshold range.
In this embodiment, the query statement may be a Hive SQL (Structured Query Language) statement, and the tag data of the target tag is obtained from a Hive table of a financial database. A tag is a form of data used in the database to describe the characteristics of a business entity, and can describe and define certain attributes or behavior statistics of a user; the target tags are the tags that need to be obtained in order to monitor data stability in the embodiments of the present disclosure.
In this embodiment, the query statement is written in the shell script file obtained by the second obtaining module 205. It is used to obtain the tag data that causes changes in data stability, and that tag data is used in the subsequent calculation. In this example, based on the obtained shell script file, the first obtaining module 201 may automatically obtain the tag data of the target tag from the Hive table of the financial database of the business system through the Hive SQL statement written in the script file, and the stability index data is then calculated. In this example, the stability index data may be a PSI (Population Stability Index) value.
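As an illustration of what such a query statement might look like, the following Python sketch assembles a simple SQL string for one partition. The table name, the tag column, and the partition column `dt` are hypothetical; the disclosure does not specify the schema of the Hive table, and the embodiment places the statement in a shell script rather than in Python.

```python
import re

# Conservative pattern for table and column names used in the sketch.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")


def build_tag_query(table, tag_column, partition, partition_column="dt"):
    """Return a query that reads one tag from one partition (schema is assumed)."""
    for name in (table, tag_column, partition_column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not partition.isdigit():
        raise ValueError(f"partition name must be numeric, got {partition!r}")
    return (f"SELECT {tag_column} FROM {table} "
            f"WHERE {partition_column} = '{partition}'")
```

For the example partitions used in this embodiment, `build_tag_query("risk.user_tags", "age", "20200630")` and the same call with `"20201230"` would yield the two queries whose results are compared.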
PSI is an index suited to financial risk control services and measures the deviation between a predicted value and an actual value. The smaller the PSI value, the smaller the difference between the two data distributions and the more stable the data. Using it to monitor the stability of financial data gives a relatively concise measure of the stability of financial tags. In its standard form, the PSI value is calculated as:

$$\mathrm{PSI} = \sum_{i} \left(A_i - E_i\right) \ln\frac{A_i}{E_i}$$

where $E$ and $A$ are the predicted values and the actual values, respectively. The values of the two data sets are divided into equal-frequency intervals, $i$ denotes the $i$-th interval, and $E_i$ and $A_i$ are the proportions of each data set that fall in that interval.

In this example, $E$ and $A$ may represent the tag data of the same tag in the financial database at two different times, and the tag data at each time may be treated as one data partition. For example, the tag data of a certain feature tag generated in the financial database by the financial risk control business system on June 30, 2020 may be treated as a data partition named "20200630", and the data of this partition is taken as the set $E$; the tag data of the same feature tag generated on December 30, 2020 may be treated as another data partition named "20201230", and the data of this partition is taken as the set $A$. By obtaining the tag data of the two data partitions through the Hive SQL query statement, the PSI value between the 20200630 partition and the 20201230 partition can be calculated with the formula above; that is, the data stability over the half year (6 months) from June 30, 2020 to December 30, 2020 is calculated.
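The PSI calculation between two partitions can be sketched in Python using the standard form of the index, PSI = sum over i of (A_i - E_i) * ln(A_i / E_i), where E_i and A_i are the shares of the earlier and later partition that fall in interval i. The sketch makes several assumptions that are not stated in the disclosure: the interval edges are taken from the earlier partition, ten intervals are used by default, and empty intervals are replaced by a small constant `eps` to avoid taking the logarithm of zero.

```python
import math
from bisect import bisect_right


def equal_frequency_edges(values, bins):
    """Inner cut points that split `values` into `bins` equal-frequency intervals."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // bins] for k in range(1, bins)]


def bin_shares(values, edges, eps=1e-6):
    """Share of `values` in each interval; `eps` replaces empty shares."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    return [max(c / total, eps) for c in counts]


def psi(expected, actual, bins=10):
    """Population Stability Index of `actual` relative to `expected`."""
    edges = equal_frequency_edges(expected, bins)
    e_shares = bin_shares(expected, edges)
    a_shares = bin_shares(actual, edges)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))
```

Two identical partitions give a PSI of zero, and every term of the sum is non-negative, so the value grows as the two distributions drift apart.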
Generally, the threshold range of PSI values is related to data stability, as shown in table 2.
TABLE 2 (PSI threshold ranges and the corresponding data stability; the table image is not reproduced in this text)
In the shell script file obtained by the second obtaining module 205, the calling statement may, based on the relationship between the PSI threshold range and data stability, call the external program file to send the alarm mail when the calculated PSI value exceeds the threshold of 0.1.
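The mapping from a PSI value to a stability range can be written as a small lookup. The boundaries 0.1 and 0.25 come from the ranges mentioned in this embodiment; the labels returned below are hypothetical, since the wording of Table 2 is not available in this text.

```python
def stability_level(psi_value, low=0.1, high=0.25):
    """Classify a PSI value by the ranges named in this embodiment.

    The label strings are placeholders chosen for the sketch.
    """
    if psi_value <= low:
        return "stable"
    if psi_value <= high:
        return "slightly unstable"
    return "unstable"


def should_alarm(psi_value, threshold=0.1):
    """An alarm is raised only when the PSI value is greater than the threshold."""
    return psi_value > threshold
```

Keeping `threshold` as a parameter reflects the statement that the alarm threshold is not limited to 0.1 and may be set according to the actual situation.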
After the second obtaining module 205 obtains the external program file and the script file, the deployment module 206 of the apparatus in this embodiment may implement step S503 of the data monitoring method shown in fig. 2: it deploys the script file and runs the script file according to the preset scheduling policy.
When the deployment module 206 deploys the script file, both the script file and the external program file may be uploaded to and configured in an enterprise scheduling system (for example, a financial risk control business system) or a cluster. The operating system of the enterprise scheduling system may be Linux. The enterprise scheduling system may have whatever functional modules the enterprise's internal business requires, but it must be able to execute tasks on a timer so that the scheduling policy can be implemented; executing tasks on a timer means that the system has a timing function and can automatically execute a given task at the configured time. In this embodiment, the scheduling policy may include: setting a scheduling time according to the data partition generation time and the update frequency of the target tag. The update frequency includes daily updates (once per day), weekly updates (once per week), monthly updates (once per month), annual updates (once per year), and so on. The generation time is the specific time at which the data is updated, such as 01:00:00 for a daily update, or X hours X minutes X seconds on the last day of each month. The scheduling time must be set after the partition generation time of the database table, and the interval between scheduled runs may be, but is not limited to, one day, one week, one month, half a year, or one year, depending on the generation time and the update frequency.
The scheduling policy of the deployment module 206 may further include: according to the set scheduling time, periodically running the script file through the first obtaining module 201, the calculating module 202, and the generating module 203, and executing steps S100 to S300 of the data monitoring method shown in fig. 1. Because the system runs the script file automatically and periodically, data monitoring is performed at a stable interval, correct configuration is ensured, and the reliability of data monitoring is further improved.
When the first obtaining module 201 is periodically called by the deployment module 206, it may implement step S100 of the data monitoring method shown in fig. 1 and obtain the tag data corresponding to the target tag through the query statement in the script file. Specifically, the first obtaining module 201 uses the Hive SQL statement of the script file to obtain tag data from the Hive table of the financial database, where the tag data includes the data in the data partitions generated for the same target tag at different times. In this embodiment, through the periodic running of the shell script file and the retrieval of tag data by the query statement, the first obtaining module 201 passes the parameters for the data stability calculation automatically. Compared with conventional manual testing, the data monitoring apparatus of the present disclosure processes data more efficiently and accurately, and because manual input is reduced, data import errors and omissions caused by manual operation can be avoided.
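The rule that the scheduling time must fall after the partition generation time can be sketched as follows. The function names, the frequency keywords, and the one-hour safety delay are assumptions made for this example; an actual deployment would rely on the timer facility of the enterprise scheduling system.

```python
import calendar
from datetime import datetime, timedelta


def add_months(moment, months):
    """Shift `moment` by whole months, clamping the day to the month's length."""
    index = moment.month - 1 + months
    year, month = moment.year + index // 12, index % 12 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def next_generation(last_generated, frequency):
    """Expected generation time of the next data partition."""
    if frequency == "daily":
        return last_generated + timedelta(days=1)
    if frequency == "weekly":
        return last_generated + timedelta(weeks=1)
    if frequency == "monthly":
        return add_months(last_generated, 1)
    if frequency == "yearly":
        return add_months(last_generated, 12)
    raise ValueError(f"unknown update frequency: {frequency!r}")


def next_schedule(last_generated, frequency, delay=timedelta(hours=1)):
    """Schedule the script run `delay` after the next partition is generated."""
    return next_generation(last_generated, frequency) + delay
```

For a tag updated daily at 01:00:00, the sketch schedules the script for 02:00:00 on the following day, so the query never runs against a partition that does not exist yet.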
After the first obtaining module 201 obtains the tag data, the calculating module 202 may implement step S200 in the data monitoring method shown in fig. 1, and calculate the stability index data of the tag data. In this embodiment, the calculation module 202 calculates PSI values of the two data partitions by using the above PSI calculation formula for the acquired tag data of the at least two data partitions, and then stores the PSI value calculation results in the result table.
The PSI value calculated by the calculating module 202 is written into the database. The generating module 203 may then implement step S300 of the data monitoring method shown in fig. 1: it reads the PSI value from the result table and compares it with the preset PSI threshold range, and when the PSI value is greater than 0.1, the calling statement of the script file calls the external program file to generate the alarm mail. The output module 204 sends the alarm mail to a preset mailbox address to notify the staff. After the mail is sent, the current run of the script file ends and the current monitoring task is complete.
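Writing the PSI value to a result table and reading it back for comparison can be sketched as follows. SQLite from the Python standard library is used here purely as a stand-in for the Hive result table of the embodiment, and the table name `psi_result` and its columns are hypothetical.

```python
import sqlite3


def save_psi(conn, tag, base_partition, compare_partition, psi_value):
    """Append one PSI result to the (hypothetical) result table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS psi_result ("
        "tag TEXT, base_partition TEXT, compare_partition TEXT, psi REAL)"
    )
    conn.execute(
        "INSERT INTO psi_result VALUES (?, ?, ?, ?)",
        (tag, base_partition, compare_partition, psi_value),
    )
    conn.commit()


def load_alarming(conn, threshold=0.1):
    """Return (tag, psi) rows whose PSI value is greater than the threshold."""
    rows = conn.execute(
        "SELECT tag, psi FROM psi_result WHERE psi > ? ORDER BY tag",
        (threshold,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
save_psi(conn, "age", "20200630", "20201230", 0.04)
save_psi(conn, "income", "20200630", "20201230", 0.3)
alarming = load_alarming(conn)  # only the "income" row exceeds 0.1
```

Separating the write from the read mirrors the description above, in which the calculating module stores the result and the generating module later retrieves it for the threshold comparison.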
The preset threshold that triggers the alarm is not limited to the value of 0.1 used in this example and may be set according to the actual situation.
When the PSI value exceeds the preset threshold range, the script file of this embodiment calls the jar package of the external program file on the scheduling system to generate an alarm mail. The alarm mail may describe the conditions under which the PSI value exceeded the preset threshold range, such as the PSI value of the tag or tags concerned, the unstable range reached (for example, 0.1 to 0.25, or greater than 0.25), and the table (for example, a Hive table) and partitions used in the calculation.
If the PSI value in the result table does not exceed the preset threshold range, no alarm mail is sent, the current run of the script file ends, and the current monitoring task is complete.
In this embodiment, the script file passes the parameters and calculates the PSI value automatically, and a mail alarm is raised for any PSI value that exceeds the threshold. Compared with conventional manual monitoring, the monitoring capability of the apparatus is more stable and less error-prone, and considerable labor and time are saved. Because the script file runs periodically and monitors data stability, abnormal data stability is reported to the staff automatically and promptly, which facilitates maintenance and updating of the enterprise scheduling system (that is, the business system). If the data table needs to be replaced, only the table name in the script file has to be modified, so the maintenance cost is low.
Fig. 5 shows a hardware structure diagram of a data monitoring device provided in an embodiment of the present disclosure.
The data monitoring device may include a processor 301 and a memory 302 having stored computer program instructions.
In particular, the processor 301 may include a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement the embodiments of the present disclosure.
Memory 302 may include mass storage for data or instructions. By way of example, and not limitation, memory 302 may include a Hard Disk Drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. In one example, memory 302 may include removable or non-removable (or fixed) media, or memory 302 may be non-volatile solid-state memory. The memory 302 may be internal or external to the data monitoring device.
The memory 302 may include Read Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions and when the software is executed (e.g., by one or more processors), it is operable to perform operations described with reference to the methods according to an aspect of the present disclosure.
The processor 301 reads and executes the computer program instructions stored in the memory 302 to implement the methods/steps S100 to S500 in the embodiment shown in fig. 2, and achieve the corresponding technical effects achieved by the embodiment shown in fig. 2 executing the methods/steps thereof, which are not described herein again for brevity.
In one example, the data monitoring device may also include a communication interface 303 and a bus 310. As shown in fig. 5, the processor 301, the memory 302, and the communication interface 303 are connected via a bus 310 to complete communication therebetween.
The communication interface 303 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present disclosure.
Bus 310 includes hardware, software, or both, coupling the components of the data monitoring device to each other. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. Bus 310 may include one or more buses, where appropriate. Although this embodiment of the disclosure describes and illustrates a particular bus, this disclosure contemplates any suitable bus or interconnect.
In addition, the data monitoring method of the foregoing embodiments may be implemented by a computer storage medium provided by the embodiments of the present disclosure. The computer storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement any of the data monitoring methods in the above embodiments.
It is to be understood that this disclosure is not limited to the particular configurations and processes described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present disclosure are not limited to the specific steps described and illustrated, and those skilled in the art may make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present disclosure.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic Circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the present disclosure are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this disclosure describe some methods or systems based on a series of steps or devices. However, the present disclosure is not limited to the order of the above-described steps; that is, the steps may be performed in the order mentioned in the embodiments or in a different order, or several steps may be performed at the same time.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As described above, only the specific embodiments of the present disclosure are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope of the present disclosure, and these modifications or substitutions should be covered within the scope of the present disclosure.

Claims (9)

1. A method for monitoring data, the method comprising:
acquiring tag data corresponding to a target tag through a query statement in a script file according to a preset scheduling strategy; the preset scheduling strategy is used for setting scheduling time according to the data partition generation time and the updating frequency of the target tag and regularly running the script file according to the set scheduling time;
calculating stability index data of the tag data;
calling an external program file to generate alarm information when the stability index data exceeds a preset threshold range through a calling statement in the script file;
and outputting the alarm information.
2. The data monitoring method according to claim 1, wherein before the tag data corresponding to the target tag is obtained through the query statement in the script file, the method further comprises:
acquiring the external program file and the script file; the script file comprises the query statement and the calling statement; the calling statement is used for calling the external program file when the stability index data exceeds a preset threshold range;
and deploying the script file, and operating the script file according to a preset scheduling strategy.
3. The data monitoring method of claim 2, wherein the obtaining the external program file comprises:
writing or copying the external program file; the external program file comprises a generation rule of the alarm information defined by a programming language and an output rule of the alarm information;
and generating the subject and/or the content of the alarm information in a preset format and/or a preset style from the condition information when the stability index data exceeds a preset threshold range.
4. A data monitoring method according to any one of claims 1 to 3, wherein the stability indicator data is a population stability indicator value.
5. A data monitoring apparatus, the apparatus comprising:
the first obtaining module is used for obtaining the tag data corresponding to the target tag through the query statement in the script file according to a preset scheduling strategy; the preset scheduling strategy is used for setting scheduling time according to the data partition generation time and the updating frequency of the target tag and regularly running the script file according to the set scheduling time;
the calculation module is used for calculating stability index data of the label data;
the generating module is used for calling an external program file to generate alarm information when the stability index data exceeds a preset threshold range through a calling statement in the script file;
and the output module is used for outputting the alarm information.
6. The data monitoring device of claim 5, further comprising:
the second acquisition module is used for acquiring the external program file and the script file; the script file comprises the query statement and the calling statement; the calling statement is used for calling the external program file when the stability index data exceeds a preset threshold range;
and the deployment module is used for deploying the script file and operating the script file according to a preset scheduling strategy.
7. The data monitoring device of claim 6, wherein the second obtaining module is specifically configured to:
writing or copying the external program file; the external program file comprises a generation rule of the alarm information defined by a programming language and an output rule of the alarm information;
and generating the subject and/or the content of the alarm information in a preset format and/or a preset style from the condition information when the stability index data exceeds a preset threshold range.
8. A data monitoring device, the device comprising: a processor, and a memory storing computer program instructions; the processor reads and executes the computer program instructions to implement the data monitoring method of any one of claims 1-4.
9. A computer storage medium having computer program instructions stored thereon which, when executed by a processor, implement a data monitoring method as claimed in any one of claims 1 to 4.
CN202110392180.2A 2021-04-13 2021-04-13 Data monitoring method, device, equipment and computer storage medium Pending CN112799919A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110392180.2A CN112799919A (en) 2021-04-13 2021-04-13 Data monitoring method, device, equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN112799919A true CN112799919A (en) 2021-05-14

Family

ID=75816887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110392180.2A Pending CN112799919A (en) 2021-04-13 2021-04-13 Data monitoring method, device, equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN112799919A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210514)