CN110472114B - Abnormal data early warning method and device, computer equipment and storage medium

Info

Publication number: CN110472114B
Application number: CN201910595991.5A
Authority: CN
Inventors: 陈小翔; 黄文聪
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-07-03
Filing date: 2019-07-03
Publication date: 2024-01-26
Anticipated expiration: 2039-07-03
Also published as: CN110472114A

Abstract

The invention discloses an abnormal data early warning method and device, computer equipment and a storage medium. The abnormal data early warning method comprises: acquiring a data statistics request, wherein the data statistics request comprises constraint conditions and classification labels; querying a database according to the constraint conditions to obtain data to be evaluated; identifying the data to be evaluated to obtain the information to be evaluated of each piece of data to be evaluated; classifying the information to be evaluated by using the classification labels to obtain an information classification set to be evaluated; integrating the information to be evaluated in each information set to be evaluated to obtain statistical data of each information set to be evaluated, and transversely matching the statistical information of each information set to be evaluated according to different dimensions to obtain abnormal information of each information set to be evaluated; and performing compliance verification on the abnormal information of each information set to be evaluated by using preset reference data, and generating early warning information according to target abnormal information that fails the compliance verification. The efficiency and accuracy of early warning of abnormal data are thereby improved.

Description

Abnormal data early warning method and device, computer equipment and storage medium

Technical Field

The present invention relates to the field of data processing, and in particular, to a method and apparatus for early warning abnormal data, a computer device, and a storage medium.

Background

With the continuous development of internet technology and cloud computing, society has entered the era of big data. Because big data is complex and diverse, massive data sets often contain abnormal data that does not meet requirements, so accurate early warning on data is becoming increasingly important. Traditional abnormal data early warning methods issue warnings only according to early warning conditions preset by the system. For data that is changed or updated frequently, such methods have many limitations, and their early warning results for abnormal data are not accurate enough.

Disclosure of Invention

The embodiment of the invention provides an abnormal data early warning method, an abnormal data early warning device, computer equipment and a storage medium, which are used for solving the problem of low accuracy of early warning of abnormal data.

An abnormal data early warning method comprises the following steps:

acquiring a data statistics request, wherein the data statistics request comprises constraint conditions and classification labels;

querying a database according to the constraint conditions to obtain data to be evaluated;

identifying the data to be evaluated, and obtaining information to be evaluated of each piece of data to be evaluated;

classifying the information to be evaluated by adopting the classification labels to obtain an information classification set to be evaluated, wherein the information classification set to be evaluated comprises N information sets to be evaluated, and N is a positive integer;

integrating the information to be evaluated of each information set to be evaluated to obtain statistical data of each information set to be evaluated, wherein the statistical data comprises statistical information of at least two dimensions;

carrying out transverse matching on the statistical information of each information set to be evaluated according to different dimensions to obtain abnormal information of each information set to be evaluated;

and carrying out compliance verification on the abnormal information of each information set to be evaluated by adopting preset reference data, obtaining target abnormal information failing the compliance verification, and generating early warning information according to the target abnormal information.

An abnormal data early warning device, comprising:

the data statistics request acquisition module is used for acquiring data statistics requests, wherein the data statistics requests comprise constraint conditions and classification labels;

the data to be evaluated acquisition module is used for querying the database according to the constraint conditions to acquire the data to be evaluated;

the identification module is used for identifying the data to be evaluated and acquiring information to be evaluated of each piece of data to be evaluated;

the classification module is used for classifying the information to be evaluated by adopting the classification label to obtain an information classification set to be evaluated, wherein the information classification set to be evaluated comprises N information sets to be evaluated, and N is a positive integer;

the integration module is used for integrating the information to be evaluated of each information set to be evaluated to obtain the statistical data of each information set to be evaluated, wherein the statistical data comprises statistical information of at least two dimensions;

the transverse matching module is used for transversely matching the statistical information of each information set to be evaluated according to different dimensions to obtain abnormal information of each information set to be evaluated;

and the compliance verification module is used for carrying out compliance verification on the abnormal information of each information set to be evaluated by adopting preset reference data, obtaining target abnormal information failing the compliance verification, and generating early warning information according to the target abnormal information.

A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the above-mentioned abnormal data early warning method when executing the computer program.

A computer readable storage medium storing a computer program which when executed by a processor implements the abnormal data early warning method described above.

In the abnormal data early warning method and device, computer equipment and storage medium described above, a data statistics request comprising constraint conditions and classification labels is acquired; the database is queried according to the constraint conditions to obtain data to be evaluated; the data to be evaluated is identified to obtain the information to be evaluated of each piece of data to be evaluated; the information to be evaluated is classified by using the classification labels to obtain an information classification set to be evaluated, which comprises N information sets to be evaluated, N being a positive integer; the information to be evaluated of each information set to be evaluated is integrated to obtain statistical data of each information set to be evaluated, the statistical data comprising statistical information of at least two dimensions; the statistical information of each information set to be evaluated is transversely matched according to different dimensions to obtain the abnormal information of each information set to be evaluated; and preset reference data is used to perform compliance verification on the abnormal information of each information set to be evaluated, target abnormal information that fails the compliance verification is obtained, and early warning information is generated according to the target abnormal information. The efficiency and accuracy of early warning of abnormal data are thereby improved.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art may obtain other drawings from them without inventive effort.

FIG. 1 is a schematic diagram of an application environment of an abnormal data early warning method according to an embodiment of the present invention;

FIG. 2 is an example diagram of an abnormal data early warning method according to an embodiment of the present invention;

FIG. 3 is another example diagram of an abnormal data early warning method according to an embodiment of the present invention;

FIG. 4 is another example diagram of an abnormal data early warning method according to an embodiment of the present invention;

FIG. 5 is another example diagram of an abnormal data early warning method according to an embodiment of the present invention;

FIG. 6 is another example diagram of an abnormal data early warning method according to an embodiment of the present invention;

FIG. 7 is a schematic block diagram of an abnormal data early warning device according to an embodiment of the present invention;

FIG. 8 is another schematic block diagram of an abnormal data early warning device according to an embodiment of the present invention;

FIG. 9 is another schematic block diagram of an abnormal data early warning device according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of a computer device in accordance with an embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.

The abnormal data early warning method provided by the embodiment of the invention can be applied in the application environment shown in FIG. 1. Specifically, the method is applied in an abnormal data early warning system which, as shown in FIG. 1, comprises a client and a server that communicate through a network, and which is used to solve the problem of low accuracy of data early warning. The client, also called the user side, is a program that corresponds to the server and provides local services to the user. The client may be installed on, but is not limited to, personal computers, notebook computers, smartphones, tablet computers and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster formed by a plurality of servers.

In an embodiment, as shown in FIG. 2, an abnormal data early warning method is provided. The method is described by taking its application to the server in FIG. 1 as an example, and includes the following steps:

S10: acquiring a data statistics request, wherein the data statistics request comprises constraint conditions and classification labels.

The data statistics request is a trigger request for performing statistics on data. Optionally, the data statistics request may be sent to the server by the client on demand, or the client may be configured to send it at a fixed time or when a trigger condition is met. The trigger condition may be a data volume, a time, or the like. For example, a trigger period and a trigger time may be set on the client so that the client sends the data statistics request to the server periodically. If the trigger period is one month and the trigger time is the 1st of each month, the client sends a data statistics request to the server on the 1st of each month.
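The periodic trigger described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `next_trigger` and its parameter are hypothetical, the trigger period is fixed at one month, and the trigger day is assumed to be at most 28 so that it exists in every month.

```python
from datetime import date


def next_trigger(today, day_of_month=1):
    """Return the next monthly trigger date strictly after `today`.

    Assumption: day_of_month is between 1 and 28, so the date is valid
    in every month.
    """
    if today.day < day_of_month:
        # The trigger day of the current month has not been reached yet.
        return today.replace(day=day_of_month)
    # Otherwise roll over to the next month (and the next year after December).
    if today.month == 12:
        return date(today.year + 1, 1, day_of_month)
    return date(today.year, today.month + 1, day_of_month)
```

A client-side scheduler could call such a function after each request to decide when the next data statistics request should be sent.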

A constraint condition is a condition for screening or selecting data. Optionally, the constraint conditions may be a data transaction time, data import time, data object, data type, data source, data volume, or the like. Specifically, a constraint condition is generated after the user inputs or selects the corresponding constraint information in the constraint condition filter box of the client page, where the constraint information is the information the user inputs or selects when screening data on the client page. A classification label is identification information for distinguishing different types of data. Optionally, the classification labels may identify data types such as consumption, health, transaction, finance, or testing. A type may be subdivided further; for example, consumption may include travel, business trips, dining, and so on.

In one application scenario, the user inputs or selects the constraint conditions and classification labels in the constraint condition filter box and the classification label filter box of the client page, respectively, and then clicks a confirmation button or enters a corresponding control instruction on the command line. This triggers the data statistics request, which is sent to the server; on receiving it, the server performs the data statistics operation according to the request.

S20: querying the database according to the constraint conditions to obtain the data to be evaluated.

The data to be evaluated is the data screened from the database that meets the constraint conditions. Optionally, the data to be evaluated may be consumption data, business data, health data, or the like. In a specific embodiment, a plurality of data items are stored in the database in advance, each with a corresponding data tag; the data tag may be the time at which the data item was stored in the database, or a data attribute of the data item. Specifically, after obtaining the constraint conditions carried by the data statistics request, the server uses the constraint conditions as query fields, executes the query statement corresponding to those fields, and obtains from the database the data to be evaluated that meets the constraint conditions. For example, if the constraint conditions carried by the data statistics request from the client are data import time: 2018-01-01 to 2018-12-12, data source: xx company, and data type: consumption data, the server screens the data stored in the database according to these constraint conditions, and the data to be evaluated that it obtains is the consumption data of xx company for the period 2018-01-01 to 2018-12-12.
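As a minimal sketch of step S20, the constraint conditions can be bound as parameters of a query statement. The table name `records`, its columns, and the keys of the `constraints` dictionary are hypothetical; the text does not specify a schema or a database engine, and SQLite is used here only because it ships with Python.

```python
import sqlite3


def query_data_to_be_evaluated(conn, constraints):
    """Return rows that satisfy every constraint (hypothetical schema)."""
    sql = (
        "SELECT id, source, data_type, import_date, content FROM records "
        "WHERE import_date BETWEEN ? AND ? AND source = ? AND data_type = ?"
    )
    params = (
        constraints["import_from"],
        constraints["import_to"],
        constraints["source"],
        constraints["data_type"],
    )
    return conn.execute(sql, params).fetchall()


def build_demo_db():
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, source TEXT, "
        "data_type TEXT, import_date TEXT, content TEXT)"
    )
    conn.executemany(
        "INSERT INTO records (source, data_type, import_date, content) "
        "VALUES (?, ?, ?, ?)",
        [
            ("xx company", "consumption", "2018-03-05", "catering invoice"),
            ("xx company", "health", "2018-04-01", "check-up report"),
            ("yy company", "consumption", "2018-06-10", "travel invoice"),
            ("xx company", "consumption", "2019-01-02", "stationery invoice"),
        ],
    )
    return conn
```

Dates are stored as ISO 8601 strings so that `BETWEEN` compares them correctly as text; passing the constraints as bound parameters rather than formatting them into the statement avoids SQL injection.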

S30: identifying the data to be evaluated to obtain the information to be evaluated of each piece of data to be evaluated.

The information to be evaluated is the text information generated by identifying the data to be evaluated. In a specific embodiment, the data to be evaluated may be stored in the database directly as scanned images or pictures. Therefore, to ensure that the data to be evaluated can be classified accurately in subsequent steps, a text recognition technology is used to identify the data to be evaluated and obtain the information to be evaluated of each piece of data to be evaluated. The text recognition technology may be optical character recognition (OCR). For example, if a piece of data to be evaluated is a scanned picture of a catering invoice, recognizing the picture with the text recognition technology yields information to be evaluated that includes: consumption time: October 10, 2018; total consumption: 530 yuan; department: marketing department; consumption type: catering; and so on. It can be understood that if the acquired data to be evaluated is already text, the data to be evaluated is used directly as the information to be evaluated.
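The branching in step S30 (text is used directly, images go through recognition first) can be sketched as follows. The OCR engine is not named in the text, so it is represented here by a hypothetical callable `recognize` that maps image bytes to text; the "key: value" layout of the recognized text is likewise an assumption made for illustration.

```python
def parse_recognized_text(text):
    """Split 'key: value' pairs separated by ';' or newlines into a dict."""
    info = {}
    for part in text.replace("\n", ";").split(";"):
        if ":" not in part:
            continue  # skip fragments that are not key/value pairs
        key, value = part.split(":", 1)
        info[key.strip()] = value.strip()
    return info


def to_information(data, recognize=None):
    """Return the information to be evaluated for one piece of data.

    `data` is either text (used directly) or raw image bytes, in which
    case `recognize` (a hypothetical OCR callable: bytes -> str) is
    applied first.
    """
    if isinstance(data, str):
        text = data
    else:
        if recognize is None:
            raise ValueError("image data requires an OCR callable")
        text = recognize(data)
    return parse_recognized_text(text)
```

Keeping the recognizer as a parameter means any OCR library can be plugged in without changing the parsing logic.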

S40: classifying the information to be evaluated by using a classification label to obtain an information classification set to be evaluated, wherein the information classification set to be evaluated comprises N information sets to be evaluated, and N is a positive integer.

The information to be evaluated of each piece of data to be evaluated is classified according to the classification labels carried by the data statistics request to obtain the information classification set to be evaluated. The information classification set to be evaluated is the data set obtained by classifying the acquired information to be evaluated according to the classification labels, and it comprises N information sets to be evaluated. It can be understood that the information to be evaluated contained in the same information set to be evaluated is of the same data type, and the information to be evaluated contained in different information sets to be evaluated is of different data types. For example, if the classification labels comprise stationery, travel, catering and business trips, classifying the information to be evaluated with these labels yields an information classification set to be evaluated that comprises a stationery information set, a travel information set, a catering information set and a business trip information set to be evaluated.

Specifically, the information to be evaluated may be classified with the classification labels in either of two ways. In the first, a keyword is extracted from each piece of information to be evaluated, the classification labels are matched against the keyword by a string matching algorithm or a regular expression matching method, and the classification label with the highest matching degree is taken as the classification category of that information to be evaluated. In the second, features are extracted from each piece of information to be evaluated to obtain its information features, the classification labels are matched against the information features by a similarity algorithm, and the classification label with the highest similarity is taken as the classification category of that information to be evaluated. In both cases, information to be evaluated of the same classification category is then gathered into the same information set to be evaluated and information to be evaluated of different classification categories into different information sets to be evaluated, which yields the information classification set to be evaluated composed of N information sets to be evaluated.

S50: integrating the information to be evaluated of each information set to be evaluated to obtain the statistical data of each information set to be evaluated, wherein the statistical data comprises statistical information of at least two dimensions.

Specifically, integrating the information to be evaluated of each information set to be evaluated means summarizing that information according to preset statistical factors. Optionally, the statistical factor may be the department, the consumption amount, the consumption time, or the like; that is, the information to be evaluated of each information set to be evaluated may be integrated by department, by consumption amount, or by consumption time. Preferably, to ensure that the statistical data obtained for each information set to be evaluated is intuitive and diverse, in this embodiment the information to be evaluated of each information set to be evaluated is integrated with at least two different statistical factors, so that the statistical data comprises statistical information of at least two dimensions. A dimension is a column with classification meaning, and each dimension corresponds to one column in the data table. For example, integrating the information to be evaluated of each information set to be evaluated both by department and by consumption time yields statistical data comprising statistical information of two dimensions: the statistical information of the first dimension may include the consumption data of different departments at the same consumption time, and the statistical information of the second dimension may include the consumption data of the same department at different consumption times.

In this step, the information to be evaluated of each information set to be evaluated is integrated by at least two different statistical factors, for example, by taking a department as a unit and taking a consumption time as a unit, so that the obtained statistical data of each information set to be evaluated can better represent the consumption data of different departments in the same time period and the consumption data of the same department in different time periods, and the intuitiveness and diversity of the obtained statistical data of each information set to be evaluated are further ensured.
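The two-factor integration of step S50 can be sketched as below. The field names `department`, `month` and `amount`, and the choice of summation as the aggregation, are assumptions made for illustration; the text only requires that at least two statistical factors be used.

```python
from collections import defaultdict


def integrate(info_set):
    """Aggregate one information set along two dimensions.

    by_time[month][department]  -> different departments, same period
    by_department[dept][month]  -> same department, different periods
    """
    by_time = defaultdict(lambda: defaultdict(float))
    by_department = defaultdict(lambda: defaultdict(float))
    for info in info_set:
        dept, month, amount = info["department"], info["month"], info["amount"]
        by_time[month][dept] += amount
        by_department[dept][month] += amount
    # Convert nested defaultdicts to plain dicts for easier comparison.
    return {
        "by_time": {m: dict(d) for m, d in by_time.items()},
        "by_department": {d: dict(m) for d, m in by_department.items()},
    }
```

Each top-level key of the result corresponds to one dimension of the statistical data, so later steps can compare values within a single dimension.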

S60: carrying out transverse matching on the statistical information of each information set to be evaluated according to different dimensions to obtain the abnormal information of each information set to be evaluated.

The abnormal information is the data that, after the statistical information in the same dimension of each information set to be evaluated has been matched item by item, does not meet the preset requirement. Specifically, transverse matching of the statistical information of each information set to be evaluated according to different dimensions may proceed as follows: the statistical information in the same dimension of the information set to be evaluated is summed and averaged to obtain average data; each piece of statistical information in that dimension is then compared with the average data to obtain difference data; finally, it is judged whether the difference data is within a set threshold range. If the difference data is not within the set threshold range, the corresponding statistical information is determined to be abnormal information; if it is within the set threshold range, the corresponding statistical information is determined to be non-abnormal information. The set threshold is used to evaluate whether statistical information in the same dimension of an information set to be evaluated is abnormal information, and the user can customize it according to actual conditions. It can be understood that different dimensions may have different set thresholds.
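The mean-and-threshold comparison described above can be sketched as follows. The function name is hypothetical, and interpreting "within the set threshold range" as "absolute difference from the mean not greater than the threshold" is an assumption; the text does not fix the exact form of the range.

```python
def find_abnormal(stat_by_key, threshold):
    """Flag entries of one dimension that deviate too far from the mean.

    Assumption: an entry is abnormal when the absolute difference between
    its value and the mean of the dimension exceeds `threshold`.
    """
    if not stat_by_key:
        return {}
    mean = sum(stat_by_key.values()) / len(stat_by_key)
    return {
        key: value
        for key, value in stat_by_key.items()
        if abs(value - mean) > threshold
    }
```

For example, with department totals of 600, 200, 250 and 150 the mean is 300, so with a threshold of 200 only the entry with value 600 is flagged. Because thresholds may differ per dimension, the function would be called once per dimension with that dimension's threshold.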

Alternatively, a preset value may be set in advance for each piece of statistical information in each information set to be evaluated; each piece of statistical information is associated with its preset value, an index is established, and the preset values are stored in the database of the server. After the statistical information of each information set to be evaluated is obtained, the preset value corresponding to each piece of statistical information is acquired from the database of the server according to the index, and each piece of statistical information is compared with its preset value. If the preset value is a specific value, statistical information exceeding that value is determined to be abnormal information; if the preset value is a value range, statistical information outside that range is determined to be abnormal information.

S70: carrying out compliance verification on the abnormal information of each information set to be evaluated by adopting preset reference data, acquiring target abnormal information failing the compliance verification, and generating early warning information according to the target abnormal information.

The reference data is the data used to evaluate whether the abnormal information of each information set to be evaluated is non-compliant. In a specific embodiment, the reference data corresponding to each piece of abnormal information may differ. Therefore, when the preset reference data is used to perform compliance verification on the abnormal information of each information set to be evaluated, the corresponding reference data is first acquired from the database according to the target header information of each piece of abnormal information, and it is then judged whether the abnormal information satisfies that reference data. If the abnormal information does not satisfy the corresponding reference data, it is determined to be non-compliant abnormal information, that is, target abnormal information; otherwise, it is determined to be compliant abnormal information. Optionally, the reference data may be a specific value or a value range. If the reference data is a specific value, abnormal information exceeding that value is determined to be target abnormal information that fails the compliance verification; if the reference data is a value range, abnormal information outside that range is determined to be target abnormal information that fails the compliance verification.

Further, after the target abnormal information is determined, the server generates corresponding early warning information for each piece of target abnormal information. The early warning information prompts the user that the statistical information is unreasonable data. Specifically, the corresponding statistical information may be automatically marked in red as an alert and displayed on the client page to prompt the user that it is unreasonable data.

In this step, preset reference data is used to perform compliance verification on the abnormal information of each information set to be evaluated, and the abnormal information that fails the compliance verification is determined to be the target abnormal information, which further improves the accuracy and objectivity of the target abnormal information obtained.
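Step S70 can be sketched as a lookup of reference data by header followed by a value-or-range check. All names here are hypothetical: the reference data is modelled as a dictionary keyed by header, a single number is treated as an upper limit, a tuple as an inclusive range, and abnormal items without reference data are skipped (an assumption, since the text does not cover that case).

```python
def verify_compliance(abnormal, reference):
    """Return the target abnormal items, i.e. those failing their reference."""
    targets = {}
    for header, value in abnormal.items():
        ref = reference.get(header)
        if ref is None:
            continue  # assumption: nothing to verify against
        if isinstance(ref, tuple):
            compliant = ref[0] <= value <= ref[1]  # inclusive value range
        else:
            compliant = value <= ref  # specific value used as upper limit
        if not compliant:
            targets[header] = value
    return targets


def build_warnings(targets):
    """Build one early warning record per target abnormal item."""
    return [
        {
            "header": header,
            "value": value,
            "level": "red",  # marked red for display on the client page
            "message": f"{header}: {value} failed compliance verification",
        }
        for header, value in sorted(targets.items())
    ]
```

Separating verification from warning generation keeps the display format (here a red level flag) independent of how compliance is decided.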

In this embodiment, a data statistics request comprising constraint conditions and classification labels is acquired; the database is queried according to the constraint conditions to obtain data to be evaluated; the data to be evaluated is identified to obtain the information to be evaluated of each piece of data to be evaluated; the information to be evaluated is classified by using the classification labels to obtain an information classification set to be evaluated, which comprises N information sets to be evaluated, N being a positive integer; the information to be evaluated of each information set to be evaluated is integrated to obtain statistical data of each information set to be evaluated, the statistical data comprising statistical information of at least two dimensions; the statistical information of each information set to be evaluated is transversely matched according to different dimensions to obtain the abnormal information of each information set to be evaluated; and preset reference data is used to perform compliance verification on the abnormal information of each information set to be evaluated, target abnormal information that fails the compliance verification is obtained, and early warning information is generated according to the target abnormal information. The efficiency and accuracy of early warning of abnormal data are thereby improved.

In an embodiment, as shown in FIG. 3, classifying the information to be evaluated by using the classification labels to obtain the information classification set to be evaluated specifically includes the following steps:

S401: extracting features from each piece of information to be evaluated to obtain the feature item data of each piece of information to be evaluated.

The feature item data is the data of a specific feature item, extracted from the information to be evaluated, that can represent an attribute of that information. In this embodiment, the feature item data may be data representing a consumption type. Specifically, a specific field capable of representing the consumption type may be predefined; after each piece of information to be evaluated is obtained, a keyword extraction algorithm locates the position of the specific field in each piece of information to be evaluated, and the corresponding feature item data is obtained from it. A keyword extraction algorithm is an algorithm that extracts the keywords of a text from the text. For example, suppose the predefined specific field is the spending item, whose values may be pen, notebook, eraser, calendar, and so on. If a piece of information to be evaluated states that Marketing Department 1 purchased pens for 500 yuan, the keyword extraction algorithm locates the position of the spending item in the information to be evaluated, and the value corresponding to the spending item (pen) is extracted as the feature item data.
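As a simplified stand-in for the keyword extraction algorithm, which the text does not specify, step S401 can be sketched as a whole-word dictionary lookup with a regular expression. The list `SPENDING_ITEMS`, the function name, and the handling of a trailing plural "s" are all assumptions for illustration.

```python
import re

# Hypothetical values of the predefined "spending item" field.
SPENDING_ITEMS = ("pen", "notebook", "eraser", "calendar")


def extract_feature_item(text, items=SPENDING_ITEMS):
    """Return the first predefined spending item found in the text, or None.

    Matching is on whole words, with an optional plural "s", so that
    "pens" yields "pen" while "pencil" does not match "pen".
    """
    alternatives = "|".join(re.escape(item) for item in items)
    pattern = r"\b(" + alternatives + r")s?\b"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(1).lower() if match else None
```

A production system would more likely use a proper keyword extraction or named-entity approach; the lookup above only illustrates locating a predefined field value inside free text.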

S402: obtaining a keyword library corresponding to the classification label from the database based on the classification label.

The keyword library is a preset word library that stores a plurality of keywords. In a specific embodiment, the database of the server stores a plurality of keyword libraries in advance, and different keyword libraries store different types of keywords. Each keyword library has a corresponding word library identifier, so that the server can query the corresponding keyword library quickly and accurately based on the identifier. Specifically, to obtain the keyword library corresponding to a classification label from the database, a string matching algorithm may be used to match each classification label obtained from the data statistics request against the word library identifier of each keyword library in the database one by one, and the keyword library whose identifier has the highest matching degree with the classification label is taken as the keyword library corresponding to that classification label.
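The identifier lookup of step S402 can be sketched with the similarity ratio from Python's standard `difflib` module as the "matching degree". The choice of `SequenceMatcher` is an assumption (the text only says a string matching algorithm is used), and the identifiers in the usage below are hypothetical.

```python
from difflib import SequenceMatcher


def find_keyword_library(label, libraries):
    """Return (identifier, keywords) of the best-matching keyword library.

    `libraries` maps word library identifiers to keyword lists; the
    identifier most similar to the classification label wins.
    """
    def matching_degree(identifier):
        return SequenceMatcher(None, label.lower(), identifier.lower()).ratio()

    best_id = max(libraries, key=matching_degree)
    return best_id, libraries[best_id]
```

For instance, with identifiers such as `stationery_keywords` and `travel_keywords`, the label `stationery` selects the first library. If identifiers are guaranteed to equal the labels exactly, a plain dictionary lookup would be simpler and faster.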

S403: matching the feature item data by using the keyword library to obtain the classification category of each piece of information to be evaluated.

The keyword libraries corresponding to different classification labels contain different types of keywords. For example: keywords contained in the keyword library corresponding to the stationery classification label may be: pencils, erasers, rulers, notebooks, etc. Keywords contained in the keyword library corresponding to the travel classification label may be: spring, autumn, quarter, annual meeting, etc. Specifically, a character string matching algorithm or a regular expression matching method may be adopted: the feature item data of each piece of information to be evaluated obtained in step S401 is matched one by one against the keywords in the keyword library corresponding to each classification label, the keyword library containing the keyword with the highest matching degree with the feature item data is taken as the target keyword library of that information to be evaluated, and the classification label corresponding to the target keyword library is taken as the classification category of that information to be evaluated. For example: if the feature item data of a certain piece of information to be evaluated is pen, matching the feature item data against the keyword libraries finds that it has the highest matching degree with the keyword pencil in the keyword library corresponding to the stationery classification label, so the stationery classification label is taken as the classification category of that information to be evaluated.

S404: Classify each piece of information to be evaluated according to its classification category to obtain the classification set of the information to be evaluated.

Specifically, each piece of information to be evaluated is classified according to classification categories, the information to be evaluated with the same classification category is used as the same type of information set to be evaluated, and the information to be evaluated with different classification categories is used as different types of information sets to be evaluated. Finally, a to-be-evaluated information classification set consisting of a plurality of different types of to-be-evaluated information sets is obtained. It can be understood that the classification categories corresponding to the information to be evaluated contained in the same information set to be evaluated are the same, and the classification categories corresponding to the information to be evaluated contained in different information sets to be evaluated are different.
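The matching and grouping described in steps S403 and S404 can be sketched together as follows. This is a minimal sketch under stated assumptions: the matching degree is taken to be the `difflib.SequenceMatcher` ratio, and the library contents, record layout, and function names are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical keyword libraries, keyed by classification label.
LIBRARIES = {
    "stationery": ["pencil", "eraser", "ruler", "notebook"],
    "travel": ["annual meeting", "flight", "hotel"],
}


def similarity(a, b):
    """Similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify(feature_item, libraries=LIBRARIES):
    """Return the label whose library holds the keyword closest to feature_item."""
    return max(
        libraries,
        key=lambda label: max(
            similarity(feature_item, kw) for kw in libraries[label]
        ),
    )


def group_by_category(records, libraries=LIBRARIES):
    """Group records (dicts with an 'item' key) by classification category."""
    groups = defaultdict(list)
    for rec in records:
        groups[classify(rec["item"], libraries)].append(rec)
    return dict(groups)


records = [
    {"item": "pen", "amount": 500},
    {"item": "hotel", "amount": 1200},
    {"item": "eraser", "amount": 30},
]
print(group_by_category(records))
```

Here "pen" lands in the stationery set because its closest keyword is "pencil", mirroring the example in the text.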

In the embodiment, feature item data of each piece of information to be evaluated is obtained by identifying each piece of information to be evaluated; acquiring a keyword library corresponding to the classification tag from a database based on the classification tag; matching the characteristic item data by using a keyword library to obtain a classification category of each piece of information to be evaluated; classifying each piece of information to be evaluated according to the classification category to obtain a classification set of the information to be evaluated; and classifying the acquired information to be evaluated according to different classification labels, so that the subsequent analysis processing of the information to be evaluated is facilitated.

In an embodiment, as shown in fig. 4, the method integrates the information to be evaluated of each information set to be evaluated to obtain the statistical data of each information set to be evaluated, and specifically includes the following steps:

S501: Obtain a preset reference data table, where the reference data table includes at least two paging data tables, and each paging data table includes a table ID and header information.

The reference data table is a table preset by the server for storing the information to be evaluated of each information set to be evaluated. In this embodiment, the reference data table stores a plurality of paging data tables in advance, and each paging data table includes a table ID and corresponding header information. The table ID is an identifier that uniquely identifies a paging data table; since the table ID of each paging data table in the reference data table is unique, the corresponding paging data table can be acquired from the reference data table based on the table ID. The header information is the pre-configured field information in each paging data table. Optionally, the header information may be classification information preset at the server by an administrator according to a preset classification standard. In one embodiment, the header information corresponding to each paging data table in the reference data table may be the same or different. For example: the header information may be a department name, a consumption period, a consumption amount range, and the like. The department names may include a marketing department, a research and development department, an administrative department, a project department, and the like. The consumption period may include 1-3 months, 4-6 months, 7-9 months, 10-12 months, etc.

S502: Import the information to be evaluated in each information set to be evaluated into the corresponding paging data table of the reference data table according to the table ID and the header information to obtain the classification data table of each information set to be evaluated.

Specifically, the server determines, according to the table ID of each paging data table, the paging data table into which each information set to be evaluated should be imported. In a specific embodiment, the server has associated each information set to be evaluated with the table ID of the corresponding paging data table in advance, so once the table ID of each paging data table is obtained, the paging data table into which each information set to be evaluated should be imported can be determined directly from the table ID. After the target paging data table is determined, the information to be evaluated in each information set to be evaluated is imported into the corresponding paging data table according to the header information of that paging data table. Optionally, the Java reflection mechanism may be used to import the information to be evaluated in each information set to be evaluated into the corresponding paging data table of the reference data table, so as to obtain the classification data table of each information set to be evaluated. With Java reflection, all the attributes and methods of any class can be discovered at run time, and any method or attribute of any object can be invoked; this ability to obtain information dynamically and to invoke an object's methods dynamically is called the reflection mechanism of the Java language. In this embodiment, the server determines, according to the table ID of each paging data table, the paging data table into which each information set to be evaluated should be imported, and then imports the information to be evaluated in that information set into the paging data table by using the Java reflection mechanism, so as to obtain the corresponding classification data table.

In this embodiment, the corresponding information set to be evaluated can be uniquely determined according to the table ID, and the header information defines the contents of each paging data table; based on the table ID and the header information, the information to be evaluated in each information set to be evaluated can be imported into the corresponding paging data table of the reference data table, so as to obtain the classification data table of each information set to be evaluated. The classification data table refers to the table obtained after the information to be evaluated in the information set to be evaluated has been imported. In this step, the user can locate the table of each information set to be evaluated based on the table ID, and can see at a glance how the information to be evaluated is distributed in each paging data table according to the header information, which helps improve query efficiency.
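The import step can be sketched as follows. The source describes Java reflection; this sketch instead uses Python's built-in `getattr` as a rough analogue, reading each header field from a record by name at run time. The `Expense` class, the table IDs, and the mapping `SET_TO_TABLE_ID` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Expense:
    department: str
    period: str
    amount: float


# Hypothetical header information of each paging data table, keyed by table ID.
TABLE_HEADERS = {
    "T_STATIONERY": ["department", "period", "amount"],
    "T_TRAVEL": ["department", "amount"],
}

# Hypothetical pre-association of each information set with a table ID.
SET_TO_TABLE_ID = {"stationery": "T_STATIONERY", "travel": "T_TRAVEL"}


def import_sets(info_sets):
    """Return {table ID: rows}, one row per record, ordered by the table header."""
    tables = {table_id: [] for table_id in TABLE_HEADERS}
    for category, records in info_sets.items():
        table_id = SET_TO_TABLE_ID[category]
        header = TABLE_HEADERS[table_id]
        for rec in records:
            # Dynamic attribute lookup by field name, in the spirit of reflection.
            tables[table_id].append([getattr(rec, field) for field in header])
    return tables


info_sets = {
    "stationery": [Expense("R&D Department 1", "1-3 months", 2000.0)],
    "travel": [Expense("Marketing Department 1", "4-6 months", 1200.0)],
}
print(import_sets(info_sets))
```

Driving the row layout from the header list means a paging data table with different columns needs no change to the import code, which is the benefit the reflection-based approach aims for.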

S503: Calculate the information to be evaluated in the classification data table of each information set to be evaluated by using a preset statistical function to obtain the statistical data of each information set to be evaluated.

The statistical function refers to a function for automatically performing statistical analysis on the data contained in a specific area of a data table. For example: the statistical function may be a SUMIF function, a MAX function, a MIN function, an AVERAGE function, a VARP function, and the like. Specifically, if the preset statistical function is the SUMIF function, the information to be evaluated in the classification data table of each information set to be evaluated is calculated, and the resulting statistical data of each information set to be evaluated is the numerical sum of all the information to be evaluated in the same dimension of that classification data table. Illustratively, if the header information of a certain classification data table includes the department name and the consumption period, then calculating the information to be evaluated in the classification data table with the SUMIF function yields statistical data comprising the total consumption of each department over the whole consumption period and the total consumption of all departments in each consumption period. Preferably, in order to ensure the intuitiveness and diversity of the statistical data obtained for each classification data table, several different types of statistical functions may be used at the same time to calculate the information to be evaluated in the classification data table of each information set to be evaluated.
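The statistics above can be sketched with Python's standard library in place of spreadsheet functions. The row layout (department, period, amount) and the helper names are assumptions; `statistics.pvariance` is used here as the counterpart of VARP, since both compute the population variance.

```python
from collections import defaultdict
from statistics import mean, pvariance

# Hypothetical rows of one classification data table: (department, period, amount).
rows = [
    ("R&D Department 1", "1-3 months", 2000),
    ("R&D Department 1", "4-6 months", 1000),
    ("R&D Department 2", "1-3 months", 3000),
]


def sum_if(rows, column, value):
    """SUMIF-like: sum the amount of every row whose column equals value."""
    return sum(r[-1] for r in rows if r[column] == value)


def totals_by(rows, column):
    """Total amount per distinct value of the given column (one dimension)."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[column]] += r[-1]
    return dict(totals)


def summarize(rows):
    """Apply several statistical functions to the amount column at once."""
    amounts = [r[-1] for r in rows]
    return {
        "max": max(amounts),
        "min": min(amounts),
        "mean": mean(amounts),
        "varp": pvariance(amounts),
    }


print(totals_by(rows, 0))  # totals per department
print(totals_by(rows, 1))  # totals per consumption period
print(summarize(rows))
```

Calling `totals_by` once per header column gives the statistical information for each dimension, which is what the later transverse matching step consumes.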

In this embodiment, by acquiring a preset reference data table, the reference data table includes at least two paging data tables, and each paging data table includes a table ID and header information; according to the table ID and the table header information, importing the information to be evaluated in each information set to be evaluated into a paging data table corresponding to the reference data table to obtain a classification data table of each information set to be evaluated; and calculating the information to be evaluated in the classification data table of each information set to be evaluated by adopting a preset statistical function to obtain the statistical data of each information set to be evaluated, thereby ensuring the intuitiveness and diversity of the obtained statistical data of each information set to be evaluated.

In one embodiment, as shown in fig. 5, the statistical information of each information set to be evaluated is transversely matched according to different dimensions to obtain the abnormal information of each information set to be evaluated, which specifically includes the following steps:

S601: Perform feature vector conversion on each piece of statistical information in each information set to be evaluated to obtain a statistical vector of each piece of statistical information.

Specifically, each piece of statistical information in each information set to be evaluated is extracted and then vectorized to obtain a statistical vector of each piece of statistical information. For example: if the statistical information contained in a certain information set to be evaluated is: stationery consumption of 2000 for R&D Department 1, 3000 for R&D Department 2, 5000 for R&D Department 3, 6000 for R&D Department 4, and 9000 for R&D Department 5, then performing feature vector conversion on each piece of statistical information in the information set to be evaluated yields the statistical vector [2000,3000,5000,6000,9000].

S602: Calculate the statistical vector of each piece of statistical information according to different dimensions to obtain a vector average value in each dimension.

The vector average value refers to a value obtained by summing and averaging statistical vectors of all statistical information in the same dimension. Specifically, the server may sum the statistical vectors of the statistical information in each dimension by using a summing function to obtain a sum value corresponding to each dimension, and then average the obtained sum value corresponding to each dimension by using an averaging function to obtain a vector average value in each dimension. For example: if the statistical vector of the statistical information in a certain dimension is [2000,3000,5000,6000,9000], calculating the statistical vector of the statistical information in the dimension to obtain a vector average value of 5000 in the dimension.

S603: Calculate the vector difference between the statistical vector of each piece of statistical information and the corresponding vector average value, and determine any statistical vector whose vector difference does not conform to the preset vector deviation as abnormal information, so as to obtain the abnormal information of each information set to be evaluated.

The preset vector deviation refers to a value used for evaluating whether each statistical vector is abnormal information. For example: the preset vector deviation may be set to 2000,3000,5000, or the like. The user can perform custom setting according to the overall numerical values of different statistical vectors in different dimensions. It will be appreciated that the corresponding preset vector deviations in different dimensions may be different.

In a specific embodiment, the database of the server pre-stores the preset vector deviations corresponding to different dimensions. After calculating, by calling a difference function, the vector difference between the statistical vector of each piece of statistical information and the corresponding vector average value, the server can directly acquire the preset vector deviation for the corresponding dimension from the database and then compare each vector difference with that preset vector deviation one by one. If a vector difference is larger than the preset vector deviation for the corresponding dimension, the statistical information corresponding to that vector difference is determined to be abnormal information; otherwise, if the vector difference is smaller than or equal to the preset vector deviation, the statistical information corresponding to that vector difference is determined to be non-abnormal information. For example: if the statistical vector of the statistical information in a certain dimension is [2000,3000,5000,6000,9000], the preset vector deviation in that dimension is 3000, and the vector average value is 5000, then the calculated vector differences (as absolute values) are [3000,2000,0,1000,4000]. After each vector difference is compared with the preset vector deviation 3000 one by one, only the vector difference 4000 is larger than 3000, so the statistical value 9000 corresponding to the vector difference 4000 is determined to be abnormal information.
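The worked example for steps S601 to S603 can be reproduced with a short sketch. It assumes that the vector difference is the absolute difference from the average, which is what the example values imply; the function name `find_anomalies` is hypothetical.

```python
def find_anomalies(values, max_deviation):
    """Return (average, absolute differences, values exceeding max_deviation)."""
    average = sum(values) / len(values)
    # Absolute distance of each value from the average of its dimension.
    differences = [abs(v - average) for v in values]
    # Only a difference strictly greater than the preset deviation is abnormal.
    anomalies = [v for v, d in zip(values, differences) if d > max_deviation]
    return average, differences, anomalies


stationery_by_department = [2000, 3000, 5000, 6000, 9000]
average, differences, anomalies = find_anomalies(stationery_by_department, 3000)
print(average, differences, anomalies)
```

With the preset vector deviation of 3000, the value 2000 is not flagged because its difference equals the threshold rather than exceeding it; only 9000 is reported.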

In the embodiment, the feature vector conversion is performed on each piece of statistical information in each piece of information to be evaluated to obtain a statistical vector of each piece of statistical information; calculating the statistical vector of each piece of statistical information according to different dimensions to obtain a vector average value in each dimension; and calculating a vector difference value of the statistical vector of each piece of statistical information and the corresponding vector mean value, determining the statistical vector of which the vector difference value does not accord with the preset vector deviation as abnormal information, and obtaining the abnormal information of each information set to be evaluated, thereby ensuring the accuracy of the obtained abnormal information.

In one embodiment, as shown in fig. 6, the method includes the steps of performing compliance verification on the abnormal information of each information set to be evaluated by using preset reference data, obtaining target abnormal information failing the compliance verification, and generating early warning information according to the target abnormal information, and specifically includes the following steps:

S701: Determine the target header information corresponding to the abnormal information of each information set to be evaluated.

Specifically, in step S502, the information to be evaluated in each information set to be evaluated is imported into the paging data table corresponding to the reference data table based on the table ID and the header information, so that the target header information corresponding to the abnormal information can be determined according to the position of the abnormal information of each information set to be evaluated in the corresponding classification data table. In this embodiment, since the statistics data of each information set to be evaluated includes statistics information of at least two dimensions, there are at least two pieces of target header information corresponding to each anomaly information.

S702: Acquire the corresponding reference data from the database according to the target header information.

In one embodiment, the target header information in each classification data table is pre-associated with corresponding reference data. Therefore, after the target header information corresponding to the abnormal information is determined, the corresponding reference data can be acquired directly from the database according to that target header information. It is understood that each piece of abnormal information corresponds to only one piece of reference data.

S703: Determine whether the abnormal information of each information set to be evaluated is within the corresponding reference data range, determine the abnormal information that is not within the reference data range as target abnormal information, and generate corresponding early warning information according to the target abnormal information.

The reference data refers to data for evaluating whether or not the abnormality information of each information set to be evaluated is the target abnormality information. In this step, the reference data is set to a numerical range in order to ensure the accuracy of the obtained target abnormality information. Specifically, comparing the abnormal information of each information set to be evaluated with corresponding reference data, and judging whether each abnormal information is in the corresponding reference data range; if the abnormal information is not in the corresponding reference data range, determining the abnormal data as target abnormal information, and generating corresponding early warning information according to the target abnormal information.

The early warning information is information for prompting a user that the statistical information is unreasonable data. In particular, the corresponding statistics may be automatically identified as red alerts displayed on the client page to prompt the user that the statistics are unreasonable data. Preferably, the early warning information may further include recommendation data corresponding to each piece of target abnormality information. The recommended data refers to correct reference data recommended for each target anomaly information. The recommended data may be a specific data or a data range. For example: if the target anomaly information is 9000, the corresponding recommended data may be 5000-8000.
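The compliance check and warning generation can be sketched as follows. This is an assumption-laden illustration: the reference data is modelled as a numeric range keyed by a pair of target header values, the recommended data is simply taken to be that same range, and all names and the red level marker are hypothetical.

```python
# Hypothetical reference ranges, keyed by target header information
# (department name, classification category).
REFERENCE_RANGES = {
    ("R&D Department 5", "stationery"): (5000, 8000),
    ("R&D Department 1", "stationery"): (1000, 4000),
}


def check_compliance(anomalies, ranges=REFERENCE_RANGES):
    """Return one warning for every anomaly outside its reference range."""
    warnings = []
    for header, value in anomalies:
        low, high = ranges[header]
        if not (low <= value <= high):
            warnings.append({
                "header": header,
                "value": value,
                "recommended": (low, high),  # assumed equal to the reference range
                "level": "red",
            })
    return warnings


anomalies = [
    (("R&D Department 5", "stationery"), 9000),
    (("R&D Department 1", "stationery"), 2000),
]
for warning in check_compliance(anomalies):
    print(warning)
```

In this sketch the value 9000 fails verification and produces a warning with the recommended range 5000-8000, while 2000 falls inside its range and is dropped, so a statistical outlier becomes a warning only if it also violates the reference data.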

In the embodiment, the target header information corresponding to the abnormal information of each information set to be evaluated is determined; acquiring corresponding reference data from a database according to the target header information; judging whether the abnormal information of each information set to be evaluated is in the corresponding reference data range; determining the abnormal information which is not in the reference data range as target abnormal information, and generating corresponding early warning information according to the target abnormal information; after the abnormal data is acquired, the acquired abnormal data is checked through the reference data, so that the accuracy and objectivity of the acquired target abnormal information are ensured, and the early warning efficiency and accuracy of the abnormal data are further improved.

It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present invention in any way.

In an embodiment, an abnormal data early warning device is provided, which corresponds one-to-one to the abnormal data early warning method in the above embodiment. As shown in fig. 7, the abnormal data early warning device includes a data statistics request acquisition module 10, a data to be evaluated acquisition module 20, an identification module 30, a classification module 40, an integration module 50, a transverse matching module 60, and a compliance verification module 70. The functional modules are described in detail as follows:

a data statistics request acquisition module 10, configured to acquire a data statistics request, where the data statistics request includes constraint conditions and classification labels;

the data to be evaluated acquisition module 20 is used for inquiring in the database according to the constraint condition to acquire the data to be evaluated;

the identifying module 30 is configured to identify data to be evaluated, and obtain information to be evaluated of each data to be evaluated;

the classification module 40 is configured to classify information to be evaluated by using a classification label to obtain an information classification set to be evaluated, where the information classification set to be evaluated includes N information sets to be evaluated, and N is a positive integer;

the integration module 50 is configured to integrate the information to be evaluated of each information set to be evaluated to obtain statistical data of each information set to be evaluated, where the statistical data includes statistical information of at least two dimensions;

the transverse matching module 60 is configured to transversely match the statistical information of each information set to be evaluated according to different dimensions, so as to obtain abnormal information of each information set to be evaluated;

the compliance verification module 70 is configured to perform compliance verification on the anomaly information of each information set to be evaluated by using preset reference data, obtain target anomaly information failing the compliance verification, and generate early warning information according to the target anomaly information.

Preferably, the classification module 40 includes:

a feature extraction unit 401, configured to perform feature extraction on each piece of information to be evaluated, and obtain feature item data of each piece of information to be evaluated;

a keyword library obtaining unit 402, configured to obtain a keyword library corresponding to the classification tag from the database based on the classification tag;

a matching unit 403, configured to match the feature item data by using a keyword library, so as to obtain a classification category of each piece of information to be evaluated;

and the classification unit 404 is configured to classify each piece of information to be evaluated according to the classification category, and obtain a classification set of the information to be evaluated.

Preferably, the integration module 50 includes:

a reference data table obtaining unit 501, configured to obtain a preset reference data table, where the reference data table includes at least two paging data tables, and each paging data table includes a table ID and header information;

an importing unit 502, configured to import the information to be evaluated in each information set to be evaluated into the paging data table corresponding to the reference data table according to the table ID and the header information, so as to obtain a classification data table of each information set to be evaluated;

the first calculating unit 503 is configured to calculate the information to be evaluated in the classification data table of each information set to be evaluated by using a preset statistical function, so as to obtain the statistical data of each information set to be evaluated.

Preferably, the transverse matching module 60 comprises:

the feature vector conversion unit is used for carrying out feature vector conversion on each piece of statistical information in each piece of information set to be evaluated to obtain a statistical vector of each piece of statistical information;

the second calculation unit is used for calculating the statistical vector of each piece of statistical information according to different dimensions to obtain a vector average value in each dimension;

and the third calculation unit is used for calculating a vector difference value of the statistical vector of each piece of statistical information and the corresponding vector average value, determining the statistical vector of which the vector difference value does not accord with the preset vector deviation as the abnormal information, and obtaining the abnormal information of each information set to be evaluated.

Preferably, the compliance verification module 70 includes:

the target header information confirming unit is used for confirming target header information corresponding to the abnormal information of each information set to be evaluated;

the reference data acquisition unit is used for acquiring corresponding reference data from the database according to the target header information;

and the judging unit is used for judging whether the abnormal information of each information set to be evaluated is in the corresponding reference data range, determining the abnormal information which is not in the reference data range as target abnormal information, and generating corresponding early warning information according to the target abnormal information.

For specific limitations of the abnormal data early warning device, reference may be made to the limitations of the abnormal data early warning method above, which are not repeated here. All or part of the modules in the abnormal data early warning device may be implemented by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.

In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data used in the abnormal data early warning method in the embodiment. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements an abnormal data pre-warning method.

In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the abnormal data early warning method in the above embodiments when executing the computer program.

In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the anomaly data early warning method of the embodiment.

Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.

The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims

1. An abnormal data early warning method is characterized by comprising the following steps:

integrating the information to be evaluated of each information set to be evaluated to obtain statistical data of each information set to be evaluated, wherein the statistical data comprises: acquiring a preset reference data table, wherein the reference data table comprises at least two paging data tables, and each paging data table comprises a table ID and table header information; according to the table ID and the header information, importing the information to be evaluated in each information set to be evaluated into the paging data table corresponding to the reference data table to obtain a classification data table of each information set to be evaluated; calculating the information to be evaluated in the classification data table of each information set to be evaluated by adopting a preset statistical function to obtain the statistical data of each information set to be evaluated, wherein the statistical data comprises statistical information of at least two dimensions;

performing transverse matching on the statistical information of each information set to be evaluated according to different dimensions to obtain abnormal information of each information set to be evaluated, wherein the transverse matching comprises: performing feature vector conversion on each piece of statistical information in each information set to be evaluated to obtain a statistical vector of each piece of statistical information; calculating the statistical vectors of the pieces of statistical information according to different dimensions to obtain a vector average value in each dimension; and calculating a vector difference value between the statistical vector of each piece of statistical information and the corresponding vector average value, and determining a statistical vector whose vector difference value does not conform to a preset vector deviation as abnormal information, to obtain the abnormal information of each information set to be evaluated;

performing compliance verification on the abnormal information of each information set to be evaluated by using preset reference data to obtain target abnormal information that fails the compliance verification, and generating early warning information according to the target abnormal information, wherein the compliance verification comprises: determining target header information corresponding to the abnormal information of each information set to be evaluated; acquiring corresponding reference data from a database according to the target header information; and judging whether the abnormal information of each information set to be evaluated falls within the corresponding reference data range, determining abnormal information that does not fall within the reference data range as target abnormal information, and generating corresponding early warning information according to the target abnormal information.
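For illustration only, the transverse matching and compliance verification steps of claim 1 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: each statistical vector is reduced to one scalar per header, the "preset vector deviation" is treated as a single absolute threshold, and the reference data is assumed to be a (low, high) range per header. The names `find_anomalies`, `compliance_warnings`, `stats` and `reference_ranges` are hypothetical.

```python
from statistics import mean

def find_anomalies(stats, max_deviation):
    """Flag values whose distance from the per-dimension average exceeds max_deviation.

    stats: dict mapping set name -> {header: numeric value}; assumed non-empty,
    with every set sharing the same headers (an assumption of this sketch).
    """
    headers = list(next(iter(stats.values())))
    # Average of each dimension (header) across all information sets.
    averages = {h: mean(s[h] for s in stats.values()) for h in headers}
    anomalies = {}
    for set_name, values in stats.items():
        flagged = {h: values[h] for h in headers
                   if abs(values[h] - averages[h]) > max_deviation}
        if flagged:
            anomalies[set_name] = flagged
    return anomalies

def compliance_warnings(anomalies, reference_ranges):
    """Keep only anomalies outside the reference range for their header."""
    warnings = []
    for set_name, flagged in anomalies.items():
        for header, value in flagged.items():
            low, high = reference_ranges[header]  # looked up by target header
            if not low <= value <= high:
                warnings.append(
                    f"{set_name}: {header}={value} outside reference range [{low}, {high}]"
                )
    return warnings

# Hypothetical example data: three information sets, two dimensions each.
stats = {
    "A": {"count": 10, "amount": 100},
    "B": {"count": 12, "amount": 110},
    "C": {"count": 50, "amount": 105},
}
reference_ranges = {"count": (0, 40), "amount": (0, 1000)}

anomalies = find_anomalies(stats, max_deviation=15)
warnings = compliance_warnings(anomalies, reference_ranges)
```

In this example only set "C" deviates from the per-dimension average by more than the threshold, and its value also falls outside the assumed reference range, so it alone yields a warning. An anomaly that still lies within the reference range would be discarded at the second stage.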

2. The abnormal data early warning method of claim 1, wherein classifying the information to be evaluated by using the classification tag to obtain a classification set of the information to be evaluated comprises:

performing feature extraction on each piece of information to be evaluated to obtain feature item data of each piece of information to be evaluated;

acquiring a keyword library corresponding to the classification tag from a database based on the classification tag;

matching the feature item data against the keyword library to obtain the classification category of each piece of information to be evaluated;

and classifying each piece of information to be evaluated according to the classification category to obtain a classification set of the information to be evaluated.
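As a rough illustration of the keyword-library matching in claim 2, the sketch below assigns each piece of information to the category whose keywords it matches most often. It is a sketch under assumptions rather than the claimed method: whitespace tokenization stands in for the unspecified feature extraction, the keyword library is an in-memory dictionary instead of a database lookup, and the names `classify` and `keyword_library` and the "unclassified" bucket are hypothetical.

```python
def classify(items, keyword_library):
    """Group text items by the category whose keyword set they overlap most.

    items: list of strings (information to be evaluated).
    keyword_library: dict mapping category -> set of lowercase keywords.
    """
    classified = {category: [] for category in keyword_library}
    classified["unclassified"] = []  # hypothetical bucket for items with no match
    for text in items:
        # Stand-in feature extraction: lowercase whitespace tokens.
        features = set(text.lower().split())
        hits = {c: len(features & kws) for c, kws in keyword_library.items()}
        best = max(hits, key=hits.get, default=None)
        if best is None or hits[best] == 0:
            classified["unclassified"].append(text)
        else:
            classified[best].append(text)
    return classified

# Hypothetical keyword library and inputs.
keyword_library = {
    "claims": {"claim", "payout"},
    "sales": {"policy", "premium"},
}
result = classify(
    ["Claim payout delayed", "New policy premium", "Office party"],
    keyword_library,
)
```

When two categories tie, this sketch keeps the first one in dictionary order; a real system would need an explicit tie-breaking rule.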

3. An abnormal data early warning device, characterized by comprising:

an integration module, configured to integrate the information to be evaluated of each information set to be evaluated to obtain the statistical data of each information set to be evaluated, wherein the integrating comprises: acquiring a preset reference data table, wherein the reference data table comprises at least two paging data tables, and each paging data table comprises a table ID and header information; importing, according to the table ID and the header information, the information to be evaluated in each information set to be evaluated into the corresponding paging data table of the reference data table to obtain a classification data table of each information set to be evaluated; and calculating the information to be evaluated in the classification data table of each information set to be evaluated by using a preset statistical function to obtain the statistical data of each information set to be evaluated, wherein the statistical data comprises statistical information of at least two dimensions;

a transverse matching module, configured to perform transverse matching on the statistical information of each information set to be evaluated according to different dimensions to obtain abnormal information of each information set to be evaluated, wherein the transverse matching comprises: performing feature vector conversion on each piece of statistical information in each information set to be evaluated to obtain a statistical vector of each piece of statistical information; calculating the statistical vectors of the pieces of statistical information according to different dimensions to obtain a vector average value in each dimension; and calculating a vector difference value between the statistical vector of each piece of statistical information and the corresponding vector average value, and determining a statistical vector whose vector difference value does not conform to a preset vector deviation as abnormal information, to obtain the abnormal information of each information set to be evaluated;

a compliance verification module, configured to perform compliance verification on the abnormal information of each information set to be evaluated by using preset reference data to obtain target abnormal information that fails the compliance verification, and to generate early warning information according to the target abnormal information, wherein the compliance verification comprises: determining target header information corresponding to the abnormal information of each information set to be evaluated; acquiring corresponding reference data from a database according to the target header information; and judging whether the abnormal information of each information set to be evaluated falls within the corresponding reference data range, determining abnormal information that does not fall within the reference data range as target abnormal information, and generating corresponding early warning information according to the target abnormal information.

4. The abnormal data early warning device of claim 3, wherein the classification module comprises:

a feature extraction unit, configured to perform feature extraction on each piece of information to be evaluated to obtain feature item data of each piece of information to be evaluated;

a keyword library obtaining unit, configured to obtain a keyword library corresponding to the classification tag from a database based on the classification tag;

a matching unit, configured to match the feature item data against the keyword library to obtain the classification category of each piece of information to be evaluated;

and a classification unit, configured to classify each piece of information to be evaluated according to the classification category to obtain a classification set of the information to be evaluated.

5. The abnormal data early warning device of claim 3, wherein the integration module comprises:

a reference data table obtaining unit, configured to obtain a preset reference data table, where the reference data table includes at least two paging data tables, and each paging data table includes a table ID and header information;

an importing unit, configured to import, according to the table ID and the header information, the information to be evaluated in each information set to be evaluated into the corresponding paging data table of the reference data table to obtain a classification data table of each information set to be evaluated;

and a first calculation unit, configured to calculate the information to be evaluated in the classification data table of each information set to be evaluated by using a preset statistical function to obtain the statistical data of each information set to be evaluated.
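The three units of the integration module can be pictured with the following minimal sketch. It rests on several assumptions not found in the claims: the reference data table is modelled as a dictionary from table ID to a list of headers, each information set is already keyed by the table ID it belongs to, and `sum` and `mean` stand in for the unspecified preset statistical functions. The names `build_classification_tables` and `summarize` are hypothetical.

```python
from statistics import mean

def build_classification_tables(reference_table, info_sets):
    """Import each information set into the paging table with the matching table ID.

    reference_table: dict table_id -> list of headers (column order).
    info_sets: dict table_id -> list of record dicts keyed by header.
    """
    tables = {}
    for table_id, headers in reference_table.items():
        rows = [[record.get(h) for h in headers]
                for record in info_sets.get(table_id, [])]
        tables[table_id] = {"headers": headers, "rows": rows}
    return tables

def summarize(tables, stat_funcs):
    """Apply each statistical function to every column of every table."""
    stats = {}
    for table_id, table in tables.items():
        stats[table_id] = {}
        for col, header in enumerate(table["headers"]):
            # Skip cells that were missing in the imported records.
            values = [row[col] for row in table["rows"] if row[col] is not None]
            stats[table_id][header] = (
                {name: func(values) for name, func in stat_funcs.items()}
                if values else {}
            )
    return stats

# Hypothetical reference table with two paging tables and sample records.
reference_table = {"T1": ["amount", "count"], "T2": ["amount", "count"]}
info_sets = {
    "T1": [{"amount": 100, "count": 2}, {"amount": 300, "count": 4}],
    "T2": [{"amount": 50, "count": 1}],
}
tables = build_classification_tables(reference_table, info_sets)
stats = summarize(tables, {"sum": sum, "mean": mean})
```

Here each header yields two statistics, which is one possible reading of "statistical information of at least two dimensions".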

6. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the abnormal data early warning method according to any one of claims 1 to 2.

7. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the abnormal data early warning method according to any one of claims 1 to 2.