CN110275878B - Service data detection method and device, computer equipment and storage medium - Google Patents

Info

Publication number
CN110275878B
Authority
CN
China
Prior art keywords
dimension
service data
data
target index
coverage range
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910557452.2A
Other languages
Chinese (zh)
Other versions
CN110275878A (en
Inventor
王方舟
王嘉敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201910557452.2A priority Critical patent/CN110275878B/en
Publication of CN110275878A publication Critical patent/CN110275878A/en
Application granted granted Critical
Publication of CN110275878B publication Critical patent/CN110275878B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/217Database tuning

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure relates to a service data detection method, a service data detection device, a computer device and a storage medium. The method includes: obtaining the dimensions included in the service data of a target time period; determining a target index based on the type of the service data, wherein the target index comprises a plurality of indexes corresponding to the type; acquiring a dimension coverage range corresponding to the target index, wherein the dimension coverage range indicates the dimensions required for computing statistics on the target index; comparing the dimensions included in the service data with the dimension coverage range; and when the dimensions included in the service data do not satisfy the dimension coverage range, determining that the service data is abnormal. By determining the dimensions required by a given index, the embodiments of the disclosure check whether the service data is complete in its dimensions. Approaching the problem from the required dimensions avoids the poor data quality caused by service data that is missing dimensions, avoids erroneous or one-sided analysis conclusions, and improves the efficiency of data detection and the accuracy of analysis.

Description

Service data detection method and device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing, and in particular, to a method and an apparatus for detecting service data, a computer device, and a storage medium.
Background
Currently, when monitoring and analyzing massive business data, a multidimensional root cause analysis method is generally adopted, that is, business data including multiple indexes are analyzed from multiple dimensions. In a multidimensional root cause analysis scene, the quality of data is critical, and if the data analysis is performed based on the data with low quality, an error conclusion is likely to be drawn. Therefore, how to detect low-quality business data from business data is very important in the multidimensional root cause analysis scenario.
At present, when detecting service data, a statistical standard may be set, after the service data is aggregated, it is verified whether the service data meets the statistical standard, and the quality of the service data is detected based on the verification result.
However, in the conventional method above, the statistical standard generally takes the form of checking whether a certain index falls within a certain numerical range, or whether a certain index satisfies a certain data rule. Such statistical standards are narrow and one-sided, so analysis of service data that was checked only against them easily produces erroneous or one-sided conclusions. As a result, the efficiency of data detection is low and the accuracy of analysis is poor.
Disclosure of Invention
The present disclosure provides a method, an apparatus, a computer device and a storage medium for detecting service data, so as to at least solve the problem that errors or one-sided analysis conclusions are easily generated in the related art, and the efficiency of data detection is low, thereby resulting in poor analysis accuracy. The technical scheme of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, a method for detecting service data is provided, including:
obtaining the dimensionality included by the service data of the target time period;
determining a target index based on the type of the service data, wherein the target index comprises a plurality of indexes corresponding to the type;
acquiring a dimension coverage range corresponding to the target index, wherein the dimension coverage range is used for indicating the dimension required by the statistical target index;
comparing the dimensionality included by the service data with the dimensionality coverage range;
determining that the service data is abnormal when the dimensions included in the service data do not satisfy the dimension coverage range.
In a possible implementation, comparing the dimensions included in the service data with the dimension coverage includes:
obtaining the number of dimensions required by the target index from the dimension coverage range corresponding to the target index;
and when the number of the dimensions included in the business data is not matched with the number of the dimensions required by the target index, determining that the dimensions included in the business data do not meet the dimension coverage range.
In a possible implementation, comparing the dimensions included in the service data with the dimension coverage includes:
acquiring enumeration values corresponding to all dimensions from the dimension coverage range corresponding to the target index;
and when any one of the number and the value range of the enumerated values of the dimensions included in the service data is not matched with the number and the value range of the enumerated values corresponding to the dimensions, determining that the dimensions included in the service data do not satisfy the dimension coverage range.
In a possible implementation, comparing the dimensions included in the service data with the dimension coverage includes:
acquiring a life cycle corresponding to each dimension from a dimension coverage range corresponding to the target index, wherein the life cycle is used for representing a time period from creation to termination of the dimension;
and when any dimension in the service data has data missing on the life cycle corresponding to the dimension, determining that the dimension included in the service data does not satisfy the dimension coverage range.
In a possible implementation, comparing the dimensions included in the service data with the dimension coverage includes:
obtaining a dimension distribution range corresponding to each dimension from a dimension coverage range corresponding to the target index;
and when the dimension distribution range included in the service data is not matched with the dimension distribution range corresponding to each dimension, determining that the dimension included in the service data does not satisfy the dimension coverage range.
In a possible implementation manner, after obtaining the dimension included in the service data of the target time period, the method further includes:
acquiring a data format of service data;
matching the data format information with a preset data format rule;
and when the data format of the service data is not matched with the preset data format rule, determining that the service data is abnormal.
In a possible implementation, after determining that the service data is abnormal, the method further includes:
obtaining the corresponding dimensionality of abnormal business data;
determining the position of abnormal business data based on the dimension;
and outputting an abnormal service data report, wherein the abnormal service data report comprises the position of the abnormal service data.
According to a second aspect of the embodiments of the present disclosure, there is provided a service data processing apparatus, including:
a first obtaining unit configured to obtain the dimensions included in the service data of the target time period;
a first determining unit configured to determine a target index based on a type of the service data, the target index including a plurality of indexes corresponding to the type;
a second obtaining unit configured to obtain a dimension coverage corresponding to the target index, the dimension coverage being configured to indicate a dimension required for counting the target index;
the comparison unit is configured to compare the dimensionality included by the service data with the dimensionality coverage range;
and the second determining unit is configured to determine that the service data is abnormal when the dimensionality included in the service data does not meet the dimensionality coverage range.
In a possible implementation manner, the comparing unit is specifically configured to:
obtaining the number of dimensions required by the target index from the dimension coverage range corresponding to the target index;
and when the number of the dimensions included in the business data is not matched with the number of the dimensions required by the target index, determining that the dimensions included in the business data do not meet the dimension coverage range.
In a possible implementation manner, the comparing unit is further specifically configured to:
acquiring enumeration values corresponding to all dimensions from the dimension coverage range corresponding to the target index;
and when any one of the number and the value range of the enumerated values of the dimensions included in the service data is not matched with the number and the value range of the enumerated values corresponding to the dimensions, determining that the dimensions included in the service data do not satisfy the dimension coverage range.
In a possible implementation manner, the comparing unit is further specifically configured to:
acquiring a life cycle corresponding to each dimension from a dimension coverage range corresponding to the target index, wherein the life cycle is configured to represent a time period from creation to termination of the dimension;
and when any dimension in the service data has data missing on the life cycle corresponding to the dimension, determining that the dimension included in the service data does not satisfy the dimension coverage range.
In a possible implementation manner, the comparing unit is further specifically configured to:
obtaining a dimension distribution range corresponding to each dimension from a dimension coverage range corresponding to the target index;
and when the dimension distribution range included in the service data is not matched with the dimension distribution range corresponding to each dimension, determining that the dimension included in the service data does not satisfy the dimension coverage range.
In a possible implementation manner, the apparatus is further configured for:
acquiring a data format of the service data;
matching the data format information with a preset data format rule;
and when the data format of the service data is not matched with the preset data format rule, determining that the service data is abnormal.
In a possible implementation manner, the apparatus further includes:
the output unit is configured to acquire the dimensionality corresponding to the abnormal service data; determining the position of abnormal business data based on the dimension; and outputting an abnormal service data report, wherein the abnormal service data report comprises the position of the abnormal service data.
According to a third aspect of embodiments of the present disclosure, there is provided a computer device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the instructions to implement any one of the service data detection methods described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions of the storage medium, when executed by a processor of a computer device, enable the computer device to perform any one of the above-mentioned service data detection methods.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising executable instructions which, when executed by a processor of a computer device, enable the computer device to perform any one of the service data detection methods described above.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
the embodiment of the disclosure considers whether the service data is complete in dimensionality by determining the dimensionality required by a certain index, and cuts in from the perspective of the required dimensionality, so that the problem of poor data quality caused by the loss of the service data in dimensionality can be avoided, the generation of an error or one-sided analysis conclusion is avoided, and the efficiency of data detection and the accuracy of analysis are improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow diagram illustrating service data detection in accordance with an exemplary embodiment;
FIG. 2 is a flow diagram illustrating an implementation of detecting a number of dimensions of business data according to an example embodiment;
FIG. 3 is a flow diagram illustrating an implementation of detecting dimension enumeration values for business data in accordance with an illustrative embodiment;
FIG. 4 is a flow diagram illustrating an implementation of detecting a dimensional lifecycle of business data in accordance with an exemplary embodiment;
FIG. 5 is a flow diagram illustrating an implementation of detecting a dimension distribution range of business data in accordance with an illustrative embodiment;
FIG. 6 is a flow diagram illustrating detection of a service data format in accordance with an exemplary embodiment;
FIG. 7 is a flow chart illustrating a method for detecting the quality of service data in accordance with an exemplary embodiment;
FIG. 8 is a flow chart illustrating operational service data detection in accordance with an exemplary embodiment;
FIG. 9 illustrates a service data detection device in accordance with an exemplary embodiment;
FIG. 10 is a block diagram illustrating a computer device according to an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Root cause analysis of business data finds and resolves the root cause of a problem step by step. In multidimensional root cause analysis of business data, each step drills down along a dimension to determine how strongly that dimension contributes to the business anomaly. Usually the business data in a certain time period is analyzed. To ensure the accuracy of the analysis result, the integrity and continuity of the business data in that time period are detected, and it is judged whether the business data meets expectations. FIG. 1 is a flow chart of service data detection according to an exemplary embodiment. As shown in FIG. 1, the method is used in a computer device and comprises the following steps.
In step 101, a dimension included in the service data of the target time period is acquired.
In a possible implementation manner, the service data of a certain time period is acquired from the database, and the time information of each service data is also correspondingly recorded in the database, where the time information may be time information when the database receives the service data, time information when the service data is updated, time information when the service data is generated, and the like, and based on this, the service data in the corresponding time period can be acquired according to actual requirements.
In a possible implementation manner, the service data stored in the database may be stored in the form of a data table, where the service data in the data table is composed of dimensions and indexes. A dimension generally refers to an attribute of the service data, and its description may include the number of dimensions, the enumerated values corresponding to a dimension, the life cycle corresponding to a dimension, and the distribution corresponding to a dimension. An index generally refers to a quantitative measure of the service data.
For example, in a session scenario of web browsing, a "city" dimension represents a city from which a session is initiated, a "web" dimension represents a web address of a web page browsed by a user, a "number of sessions" indicator is a total number of sessions, and a "number of web pages browsed per session" indicator is an average amount of web browsing per session.
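To make the distinction between dimensions and indexes concrete, the web-browsing scenario above can be sketched in Python. This is only an illustration: the record layout (`session_id`, `city`, `pages`) and the sample values are assumptions and do not come from the disclosure.

```python
from collections import defaultdict

# Hypothetical session records: "city" is a dimension, while "pages"
# feeds the "number of web pages browsed per session" index.
sessions = [
    {"session_id": 1, "city": "Beijing", "pages": 4},
    {"session_id": 2, "city": "Beijing", "pages": 2},
    {"session_id": 3, "city": "Shanghai", "pages": 6},
]


def indexes_by_city(records):
    """Aggregate the two indexes along the "city" dimension."""
    totals = defaultdict(lambda: {"sessions": 0, "pages": 0})
    for rec in records:
        totals[rec["city"]]["sessions"] += 1
        totals[rec["city"]]["pages"] += rec["pages"]
    return {
        city: {
            "number_of_sessions": t["sessions"],
            "pages_per_session": t["pages"] / t["sessions"],
        }
        for city, t in totals.items()
    }


result = indexes_by_city(sessions)
```

Here the dimension selects the slice of data, and the indexes are the numbers computed on that slice.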
In step 102, a target index is determined based on the type of the service data, wherein the target index comprises a plurality of indexes corresponding to the type.
In an embodiment of the present disclosure, a target index for detecting the integrity and continuity of the service data is determined based on the type of the service data. The target index may include an expected index for verifying the dimensions and dimension values of the service data, a life cycle index of the dimensions, a continuity index of the service data as a whole, a continuity index obtained after dimension reduction of the service data, a format index of the service data, and the like.
In step 103, a dimension coverage corresponding to the target index is obtained, where the dimension coverage is used to indicate a dimension required by the statistical target index.
In a multidimensional root cause analysis scenario, the integrity of the service data and the continuity of the dimension corresponding to the index are generally judged to measure whether the quality of the service data reaches the standard, and a dimension coverage range corresponding to the index in the service data is obtained according to the integrity of the service data and the continuity of the dimension corresponding to the index, wherein the dimension coverage range corresponding to the index is used for judging whether the service data is complete and whether the dimension corresponding to the index is continuous.
In a possible implementation manner, the dimension coverage range corresponding to the target index at least includes one of an expected number of dimension values of the target index, an expected number and an expected range of enumerated values of the dimension, an expected life cycle of the dimension, an expected distribution of the dimension in the life cycle, and a data format definition of the service data, and the subsequent detection of the quality of the service data is implemented based on the dimension coverage range corresponding to the target index.
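One way to picture such a dimension coverage range is as a small record with one optional field per kind of expectation. The sketch below is an assumption made for illustration: the class name, the field names, and the choice to omit the expected distribution are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional, Set, Tuple


@dataclass
class DimensionCoverage:
    """Expected coverage for one target index (all field names hypothetical)."""

    expected_dimension_count: Optional[int] = None
    expected_enum_values: Dict[str, Set[str]] = field(default_factory=dict)
    expected_lifecycle: Dict[str, Tuple[date, date]] = field(default_factory=dict)
    expected_format: Dict[str, type] = field(default_factory=dict)

    def configured_checks(self):
        """Return the names of the checks this coverage actually enables."""
        checks = []
        if self.expected_dimension_count is not None:
            checks.append("dimension_count")
        if self.expected_enum_values:
            checks.append("enum_values")
        if self.expected_lifecycle:
            checks.append("lifecycle")
        if self.expected_format:
            checks.append("format")
        return checks


coverage = DimensionCoverage(
    expected_dimension_count=2,
    expected_enum_values={"city": {"Beijing", "Shanghai"}},
)
```

Because every field is optional, a coverage range that defines only some of the expectations enables only the corresponding checks.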
In step 104, the dimensions included in the service data are compared with the dimension coverage.
In one possible implementation manner, the indexes of each day in the service data and the number of dimensions corresponding to those indexes are counted, and the counted number of dimensions corresponding to the target index is compared with the expected number of dimension values of the target index in the obtained coverage range, to judge whether the service data is missing.
In a possible implementation manner, an enumerated value of a dimension corresponding to a target index in the service data is counted, and the enumerated value is compared with an expected number and an expected range of enumerated values of the dimension in the obtained coverage range to judge whether the service data is normal.
In a possible implementation manner, the life cycle of the dimension corresponding to the target index in the service data and the distribution in the life cycle are counted, and the life cycle of the dimension corresponding to the target index and the distribution in the life cycle are compared with the expected life cycle of the dimension in the obtained coverage range and the expected distribution of the dimension in the life cycle, so as to judge whether the service data has value.
In a possible implementation manner, the format information of each data item in the service data is acquired and checked against the data format definition of the service data in the obtained coverage range, to determine whether the format of the service data is correct.
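A format check of this kind might be sketched as follows. The rule table, the use of regular expressions, and the column names (`p_date` and `task_id`, borrowed from the SQL fragments in this description) are assumptions for illustration; the disclosure does not specify how format rules are expressed.

```python
import re

# Hypothetical format rules: column name -> pattern the value must match.
FORMAT_RULES = {
    "p_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "task_id": re.compile(r"\d+"),
}


def format_violations(rows, rules=FORMAT_RULES):
    """Return (row_number, column) pairs whose value breaks its format rule."""
    violations = []
    for i, row in enumerate(rows):
        for column, pattern in rules.items():
            value = str(row.get(column, ""))
            if not pattern.fullmatch(value):
                violations.append((i, column))
    return violations


rows = [
    {"p_date": "2019-06-25", "task_id": "17"},
    {"p_date": "25/06/2019", "task_id": "17"},
]
violations = format_violations(rows)
```

A non-empty result would correspond to service data whose format does not match the preset rule.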
In step 105, when the dimension included in the service data does not satisfy the dimension coverage, determining that the service data is abnormal.
In a possible implementation manner, after determining that the service data is abnormal, obtaining a dimension corresponding to the abnormal service data, determining a position of the abnormal service data based on the dimension, and outputting an abnormal service data report, where the abnormal service data report includes the position of the abnormal service data.
The embodiment of the disclosure considers whether the service data is complete in dimensionality by determining the dimensionality required by a certain index, and cuts in from the perspective of the required dimensionality, so that the problem of poor data quality caused by the loss of the service data in dimensionality can be avoided, the generation of an error or one-sided analysis conclusion is avoided, and the efficiency of data detection and the accuracy of analysis are improved.
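Steps 101 to 105 can be summarized in a short Python sketch. It assumes the simplest form of coverage, a set of required dimension names per index, and a row layout with a `dimension_key` column as in the SQL fragments of this description. The two registries and the function name are hypothetical.

```python
def detect_service_data(rows, data_type, index_registry, coverage_registry):
    """Steps 101-105: return (is_abnormal, missing_dimensions)."""
    # Step 101: collect the dimensions present in the service data.
    present = {row["dimension_key"] for row in rows}
    # Step 102: determine the target indexes from the data type.
    target_indexes = index_registry[data_type]
    # Step 103: gather the dimensions required by those indexes.
    required = set()
    for index_name in target_indexes:
        required |= coverage_registry[index_name]
    # Step 104: compare the present dimensions with the coverage.
    missing = required - present
    # Step 105: the data is abnormal if any required dimension is missing.
    return bool(missing), sorted(missing)


index_registry = {"web_session": ["number_of_sessions", "pages_per_session"]}
coverage_registry = {
    "number_of_sessions": {"city"},
    "pages_per_session": {"city", "web"},
}
rows = [{"dimension_key": "city", "dimension_value": "Beijing"}]
abnormal, missing = detect_service_data(
    rows, "web_session", index_registry, coverage_registry
)
```

In this sample the "web" dimension is required by one of the indexes but absent from the data, so the data is flagged as abnormal.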
In order to implement the service data detection method, quality detection is performed on the dimension of the service data based on the dimension coverage range corresponding to the target index, and when the dimension of the service data does not meet the dimension coverage range, it is determined that the service data is abnormal, the service data analysis process may be stopped, so as to prevent a one-sided or wrong analysis conclusion from being generated. However, for how to perform quality detection on service data specifically, the embodiment of the present disclosure may be performed by any one of the following possible implementations shown in fig. 2 to fig. 6.
When the dimension coverage range corresponding to the target index is the expected number of the dimension values of the target index, correspondingly, the number of the dimensions included in the service data is compared to detect the quality of the service data, see fig. 2, where fig. 2 is a flowchart illustrating an implementation of detecting the number of the dimensions of the service data according to an exemplary embodiment, and specifically includes the following steps:
in step 201, the number of dimensions required by the target index is obtained from the dimension coverage corresponding to the target index.
In step 202, when the number of dimensions included in the business data does not match the number of dimensions required by the target index, it is determined that the dimensions included in the business data do not satisfy the dimension coverage.
In a possible implementation manner, steps 201 to 202 of detecting the number of dimensions of the service data may be implemented by establishing an index-dimension matrix based on a bus architecture. Specifically, each row of the index-dimension matrix represents an index and each column represents a dimension; a mark at an intersection indicates that the index is related to the corresponding dimension. The index-dimension matrix enables fast dimension query and comparison. In addition, a new index may be added to the existing matrix and either associated with the corresponding existing dimensions or given a corresponding new dimension that is added to the matrix.
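As a rough illustration, an index-dimension matrix can be kept as a mapping from each index (row) to the set of dimensions (columns) marked for it. The class and method names below are hypothetical, and a sparse mapping is only one possible representation of the matrix.

```python
class IndexDimensionMatrix:
    """Rows are indexes, columns are dimensions; a mark means the index uses the dimension."""

    def __init__(self):
        self._marks = {}  # index name -> set of related dimension names

    def associate(self, index_name, dimension):
        """Mark the intersection of an index row and a dimension column."""
        self._marks.setdefault(index_name, set()).add(dimension)

    def dimensions_of(self, index_name):
        """Return the dimensions marked for an index."""
        return set(self._marks.get(index_name, set()))

    def required_count(self, index_name):
        """Return the number of dimensions required by an index."""
        return len(self._marks.get(index_name, set()))


matrix = IndexDimensionMatrix()
matrix.associate("number_of_sessions", "city")
matrix.associate("pages_per_session", "city")
matrix.associate("pages_per_session", "web")
```

Adding a new index or a new dimension is then a single `associate` call, which matches the extensibility described above.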
For example, the foregoing steps may be implemented in Structured Query Language (SQL). Specifically, the number of dimensions corresponding to the target index may be obtained from the database through a SELECT query instruction, and this number is then compared with the expected number of dimensions required by the target index. When the two numbers are the same, that is, the dimensions are complete, it is determined that the service data is not abnormal; when they differ, that is, the dimensions are incomplete, it is determined that the service data is abnormal.
In an embodiment of the present disclosure, the integrity of the service data may be verified by determining whether the number of the dimensionalities corresponding to the target index is the same as the preset number of the dimensionalities or within a normal range interval, and of course, the integrity of the service data may be verified by determining whether the number of the dimensionalities corresponding to the target index and the number of the dimensionalities correspond to the preset dimensionalities and the number of the dimensionalities one to one, which is not specifically limited by the present disclosure.
Obtaining the number of dimensions included in the service data can be implemented in Hive SQL with the following code:
SELECT                                            -- query the records that satisfy the conditions below
    COUNT(DISTINCT dimension_key) AS quota_cnt    -- count the number of distinct dimensions
FROM ${data_table}                                -- query records from the data table
WHERE p_date = ${verify_date}
  AND task_id = ${individual_task_id}             -- records must satisfy both ${verify_date} and ${individual_task_id}
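The count returned by such a query still has to be compared with the expected number. The sketch below runs an equivalent query end to end, with Python's built-in `sqlite3` standing in for Hive (an assumption made only so the example is runnable). The table contents, the sample date, and the function name are hypothetical.

```python
import sqlite3


def dimension_count_matches(conn, verify_date, task_id, expected_count):
    """Run a COUNT(DISTINCT ...) query and compare it with the expected number."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT dimension_key) FROM data_table "
        "WHERE p_date = ? AND task_id = ?",
        (verify_date, task_id),
    ).fetchone()
    return row[0] == expected_count


# In-memory table with the columns used by the query above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_table (p_date TEXT, task_id INTEGER, "
    "dimension_key TEXT, dimension_value TEXT)"
)
conn.executemany(
    "INSERT INTO data_table VALUES (?, ?, ?, ?)",
    [
        ("2019-06-25", 1, "city", "Beijing"),
        ("2019-06-25", 1, "city", "Shanghai"),
        ("2019-06-25", 1, "web", "example.com/home"),
    ],
)
```

With this data, `dimension_count_matches(conn, "2019-06-25", 1, 2)` reports a match, while a date with no records yields a count of zero and therefore a mismatch.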
When the dimension coverage range corresponding to the target index is an enumeration corresponding to a dimension, correspondingly, the number and the value range of the enumerated values of the dimensions included in the service data are compared to detect the quality of the service data, see fig. 3, where fig. 3 is a flowchart illustrating an implementation of detecting the dimension enumerated values of the service data according to an exemplary embodiment, and specifically includes the following steps:
in step 301, an enumerated value corresponding to each dimension is obtained from the dimension coverage corresponding to the target index.
In step 302, when any one of the number and the value range of the enumerated values of the dimensions included in the service data does not match with the number and the value range of the enumerated values corresponding to each dimension, it is determined that the dimensions included in the service data do not satisfy the dimension coverage.
In a possible implementation manner, the enumerated value of the dimension corresponding to the index in a normal service data is usually fixed, and based on this, the number and value range of the enumerated value of the dimension corresponding to the index in the service data can be counted and verified, and the integrity of the service data is judged.
For example, the enumerated values of the business data may be compared in SQL. Specifically, the enumerated values of the dimensions corresponding to the target index may be obtained from the database through a SELECT query instruction, the expected enumerated values corresponding to each dimension are then obtained, and the two are compared. When the number and the value range of the enumerated values both match, the enumerated values of the dimensions are complete and it is determined that the business data is not abnormal; when either the number or the value range does not match, the enumerated values of the dimensions are incomplete and it is determined that the business data is abnormal.
In an embodiment of the present disclosure, whether the number and the value range of the dimension enumerated value corresponding to the target index are matched with the preset dimension enumerated value and the enumerated range may be verified at the same time, and when both the number and the value range of the enumerated value are matched, it is determined that the service data is complete.
Acquiring the number and the value range of the enumerated values of the dimensions included in the service data based on Hive SQL can be implemented by the following code:
SELECT
COUNT(DISTINCT dimension_value) AS dimension_value_cnt -- number of distinct enumerated values of the service data dimension
FROM ${data_table}
WHERE p_date=${verify_date}
AND task_id=${individual_task_id}
AND dimension_key=${verify_dimension} -- restrict the count to the dimension being verified
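The comparison of step 302 could be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function name `check_enumerated_values` and the sample values are hypothetical, and the observed values are assumed to have already been fetched from the database by a query.

```python
def check_enumerated_values(observed_values, expected_values):
    """Compare observed enumerated values of one dimension with the expected ones.

    Returns (ok, missing, unexpected). The check passes only when both the
    number of distinct values and the set of values match.
    """
    observed = set(observed_values)
    expected = set(expected_values)
    missing = expected - observed        # expected values absent from the data
    unexpected = observed - expected     # values outside the expected range
    count_matches = len(observed) == len(expected)
    range_matches = not missing and not unexpected
    return count_matches and range_matches, sorted(missing), sorted(unexpected)


# Hypothetical example: the "platform" dimension is expected to have three values.
ok, missing, unexpected = check_enumerated_values(
    ["android", "ios"], ["android", "ios", "web"]
)
```

Returning the missing and unexpected values, rather than only a flag, makes it easier to report which enumerated value caused the mismatch.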
When the dimension coverage range corresponding to the target index is the life cycle of a dimension, whether data is missing within the life cycle of the dimensions included in the service data is verified, so as to detect the integrity and continuity of the service data. Fig. 4 is a flowchart illustrating an implementation of detecting whether data is missing in the life cycle of a service data dimension according to an exemplary embodiment, which specifically includes the following steps:
In step 401, the life cycle corresponding to each dimension is obtained from the dimension coverage range corresponding to the target index, where the life cycle indicates the time period from creation to termination of the dimension.
In step 402, when any dimension in the service data has data missing within the life cycle corresponding to that dimension, it is determined that the dimensions included in the service data do not satisfy the dimension coverage range.
In a possible implementation manner, the dimensions of the target index may be derived under the influence of business processes. For example, dimensions of the group-segmentation and price-segmentation classes may be adjusted as the distribution changes, new sub-dimensions may be generated under existing dimensions, or new dimensions may be added directly. Service data whose dimension life cycle is too short has no analytical value. Based on this, whether the life cycle of a dimension is reasonable may be used as a criterion for measuring the quality of the service data.
For example, the continuity of data within the life cycle is verified for the service data in SQL. Specifically, the data of the dimension corresponding to the target index within a predetermined time, that is, within the time period from creation to termination of the dimension, may be obtained from the database through a SELECT query instruction; the life cycle corresponding to each dimension is then obtained from the dimension coverage range corresponding to the target index, and the two are compared to determine whether any data is missing within the life cycle.
Acquiring the life cycle of the dimensions included in the service data based on Hive SQL can be implemented by the following code:
SELECT
COUNT(1)
,datediff(pdate2dt(${verify_end_date}),pdate2dt(${verify_begin_date}))+1 AS date_num
-- COUNT(1) counts the records; date_num is the number of days in the dimension life cycle (Hive datediff takes the end date first)
FROM ${data_table}
WHERE p_date>=${verify_begin_date}
AND p_date<=${verify_end_date}
AND task_id=${individual_task_id}
AND dimension_key=${verify_dimension}
AND dimension_value=${verify_dimension_value} -- query conditions for the dimension life cycle
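The missing-data check of step 402 could be sketched in Python as follows. This is only an illustrative sketch: the function name `missing_dates` and the sample dates are hypothetical, and the dates on which the dimension has data are assumed to have been collected beforehand.

```python
from datetime import date, timedelta


def missing_dates(observed_dates, begin, end):
    """Return the dates in [begin, end] for which the dimension has no data."""
    observed = set(observed_dates)
    # Inclusive day count, analogous to the "+1" applied to date_num.
    expected_days = (end - begin).days + 1
    all_days = (begin + timedelta(days=i) for i in range(expected_days))
    return [d for d in all_days if d not in observed]


# Hypothetical life cycle of five days with one day of data missing.
begin, end = date(2019, 6, 1), date(2019, 6, 5)
observed = [date(2019, 6, 1), date(2019, 6, 2), date(2019, 6, 4), date(2019, 6, 5)]
gaps = missing_dates(observed, begin, end)
lifecycle_is_complete = not gaps
```

An empty result means the dimension has data on every day of its life cycle; any returned date identifies where the data is missing.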
When the dimension coverage range corresponding to the target index is the distribution of a dimension within its life cycle, the distribution of the dimensions included in the service data within their life cycles is verified, so as to detect the integrity and continuity of the service data. Fig. 5 is a flowchart illustrating an implementation of detecting the distribution of a service data dimension within its life cycle. As shown in fig. 5, the method includes the following steps:
In step 501, the dimension distribution corresponding to each dimension is obtained from the dimension coverage range corresponding to the target index.
In step 502, when the dimension distribution included in the service data does not match the dimension distribution corresponding to each dimension, it is determined that the dimensions included in the service data do not satisfy the dimension coverage range.
In a possible implementation manner, whether the distribution of a dimension of the target index is relatively fixed over its life cycle may also be used as a criterion for detecting the quality of the service data.
For example, the distribution range of a dimension within its life cycle is checked for the service data in SQL. Specifically, the distribution range, within the dimension life cycle, of the dimension corresponding to the target index in a predetermined time may be obtained from the database through a SELECT query instruction, the dimension distribution corresponding to each dimension is then obtained from the dimension coverage range corresponding to the target index, and the two are compared. When the distribution ranges of the dimensions match, the service data has analytical value, and it is determined that the service data is not abnormal; conversely, when the distribution ranges of the dimensions do not match, the service data has no analytical value, and it is determined that the service data is abnormal.
Acquiring the life cycle of the dimensions included in the service data based on Hive SQL can be implemented by the following code:
SELECT
COUNT(1)
,datediff(pdate2dt(${verify_end_date}),pdate2dt(${verify_begin_date}))+1 AS date_num
-- COUNT(1) counts the records; date_num is the number of days in the dimension life cycle (Hive datediff takes the end date first)
FROM ${data_table}
WHERE p_date>=${verify_begin_date}
AND p_date<=${verify_end_date}
AND task_id=${individual_task_id}
AND dimension_key=${verify_dimension}
AND dimension_value=${verify_dimension_value} -- query conditions for the dimension life cycle
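The text does not specify how two dimension distributions are judged to match. The Python sketch below assumes one possible criterion: each dimension value's share of the records must lie within a tolerance of its expected share. The function name `distribution_matches`, the tolerance, and the sample shares are all hypothetical.

```python
from collections import Counter


def distribution_matches(observed_values, expected_shares, tolerance=0.05):
    """Check whether observed dimension values follow the expected shares.

    expected_shares maps each dimension value to its expected fraction of
    the records; tolerance is an assumed allowance, not taken from the text.
    """
    counts = Counter(observed_values)
    total = sum(counts.values())
    if total == 0 or set(counts) != set(expected_shares):
        return False  # no data, or the set of dimension values differs
    return all(
        abs(counts[value] / total - share) <= tolerance
        for value, share in expected_shares.items()
    )


# Hypothetical price-segment dimension.
expected = {"low": 0.5, "mid": 0.3, "high": 0.2}
stable = ["low"] * 50 + ["mid"] * 30 + ["high"] * 20
skewed = ["low"] * 90 + ["mid"] * 5 + ["high"] * 5
stable_matches = distribution_matches(stable, expected)
skewed_matches = distribution_matches(skewed, expected)
```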
In a possible implementation manner, the quality of the format of the service data is checked against a preset data format rule. Fig. 6 is a flowchart illustrating the process of checking the format of the service data according to an exemplary embodiment. As shown in fig. 6, the method includes the following steps:
In step 601, the data format of the service data is obtained.
In step 602, the data format is matched against a preset data format rule.
In step 603, when the data format of the service data does not match the preset data format rule, it is determined that the service data is abnormal.
In a possible implementation manner, the quality of the service data may also be verified by determining whether the indexes in the service data are continuous and whether the data format conforms to a predefined format.
For example, each index and each data format in the service data are scanned by a custom Python function to verify whether each index is continuous and whether the data format conforms to the predefined format. Further, a difference calculation is performed on the time information in the service data to judge whether the service data is continuous in time. The quality of the service data is detected based on the above process.
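A custom Python scanning function of the kind described above might look like the following sketch. The rule table `FORMAT_RULES`, the field name `metric_value`, and the assumption that `p_date` is a `YYYYMMDD` string are illustrative choices, not details given in the text.

```python
import re
from datetime import datetime, timedelta

# Hypothetical format rules: one regular expression per field.
FORMAT_RULES = {
    "p_date": re.compile(r"\d{8}"),                # assumed YYYYMMDD
    "task_id": re.compile(r"\d+"),
    "metric_value": re.compile(r"-?\d+(\.\d+)?"),  # integer or decimal
}


def format_violations(rows):
    """Return (row_index, field) pairs whose value is missing or malformed."""
    violations = []
    for i, row in enumerate(rows):
        for field, pattern in FORMAT_RULES.items():
            value = row.get(field)
            if value is None or not pattern.fullmatch(str(value)):
                violations.append((i, field))
    return violations


def is_time_continuous(p_dates, step=timedelta(days=1)):
    """Difference consecutive timestamps; every gap must equal one step."""
    times = sorted(datetime.strptime(d, "%Y%m%d") for d in p_dates)
    return all(b - a == step for a, b in zip(times, times[1:]))


rows = [
    {"p_date": "20190601", "task_id": "7", "metric_value": "1.5"},
    {"p_date": "2019-06-02", "task_id": "7", "metric_value": None},
]
violations = format_violations(rows)
continuous = is_time_continuous(["20190601", "20190602", "20190604"])
```

Here a null index value is treated as a format violation, which matches the requirement that an index may not have a null value.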
The embodiments provided by the present disclosure detect the integrity and continuity of service data. Whether the service data meets the requirements of multidimensional root cause analysis is verified by checking whether the dimensions and dimension values meet expectations, whether the enumerated values and enumeration ranges of the dimensions meet the standard, whether the life cycle of each dimension meets expectations (that is, whether the life cycle is too short to have analytical value), whether each index of the service data is continuous, whether time is continuous (that is, no time enumeration may be missing and no index may have a null value), whether the data format meets the definition of the root cause analysis algorithm, and so on. If any one of these conditions is not met, the root cause analysis process is interrupted.
In a possible implementation manner, referring to fig. 7, the integrity and continuity detection of the service data may be further implemented by the following steps:
step 701, verifying the number of dimensions of the service data, and executing step 702 when the number of dimensions included in the service data matches the number of dimensions required by the target index;
step 702, verifying the enumerated values of the dimensions of the service data, and executing step 703 when both the number and the value range of the enumerated values of the dimensions included in the service data match the number and the value range of the enumerated values corresponding to each dimension;
step 703, verifying the continuity of the data in the life cycle of the service data dimensions, and executing step 704 when no dimension in the service data has data missing in the life cycle corresponding to that dimension;
step 704, verifying the distribution of the dimensions in the service data, and executing step 705 when the dimension distribution included in the service data matches the dimension distribution corresponding to each dimension;
step 705, verifying the data format of the service data, and determining that the service data is not abnormal when the data format of the service data matches the preset data format rule.
In an embodiment of the present disclosure, for any of the above steps, the verification process for the current service data ends as soon as the service data is found to be mismatched, discontinuous, or incomplete. The verification order may be set flexibly according to actual needs. For example, the data format of the service data may be verified first, the distribution of the dimensions second, the continuity of data within the life cycle of the dimensions third, the enumerated values of the dimensions fourth, and the number of dimensions last.
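The early-terminating, reorderable verification described above can be pictured with the following Python sketch. The function `run_checks`, the check names, and the dictionary-shaped input are hypothetical stand-ins for the individual verification steps.

```python
def run_checks(service_data, checks):
    """Run the checks in the given order and stop at the first failure.

    checks is a list of (name, predicate) pairs. Returns the name of the
    first failed check, or None when every check passes.
    """
    for name, check in checks:
        if not check(service_data):
            return name  # verification of this data ends here
    return None


# Hypothetical pre-computed facts about one batch of service data.
data = {"dimension_count": 2, "format_ok": False}

checks = [
    ("dimension_count", lambda d: d["dimension_count"] == 3),
    ("data_format", lambda d: d["format_ok"]),
]

first_failure = run_checks(data, checks)
# The order is configurable: reversing the list verifies the format first.
first_failure_reversed = run_checks(data, list(reversed(checks)))
```

Because the order is just the order of the list, cheaper checks can be placed first so that abnormal data is rejected with the least work.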
To deepen understanding of the implementation of the embodiments of the present disclosure, the service data detection method is described below based on the flowchart for running service data detection provided in fig. 8. Referring to fig. 8, when service data analysis starts, problems arising in the analysis process are discovered through multidimensional root cause analysis. Meanwhile, the service data analysis requirements are sorted out, the service data detection rules are collected, and the detection rules are refined based on the analysis requirements. The refined detection rules are verified and, after the verification passes, executed to trace the problems arising in the analysis process. When a problem is determined to exist, the abnormal service data is output and the data quality problem is investigated. After the problem is solved, analysis of the service data continues until an analysis result is obtained, the analysis result is displayed to the user in a preset application, and the service data analysis process ends.
In a possible implementation manner, the service data quality problem may be handled by removing the abnormal service data and performing data analysis on the remaining service data. Alternatively, the abnormal service data may be investigated in order to resolve the problem: when the problem is resolved, data analysis continues on the repaired service data together with the other normal service data; when the problem cannot be resolved, the abnormal service data is removed and data analysis is performed on the remaining service data. The present disclosure does not specifically limit this.
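The repair-or-remove handling described above could be sketched as follows. The function `resolve_abnormal`, the `is_abnormal` and `repair` callables, and the row layout are hypothetical; how a repair is actually performed is left open by the text.

```python
def resolve_abnormal(rows, is_abnormal, repair=None):
    """Split rows into data kept for analysis and data that is removed.

    Abnormal rows are passed to repair (if given); rows that cannot be
    repaired are removed from the data used for analysis.
    """
    kept, removed = [], []
    for row in rows:
        if not is_abnormal(row):
            kept.append(row)
            continue
        fixed = repair(row) if repair else None
        if fixed is not None and not is_abnormal(fixed):
            kept.append(fixed)    # problem solved: analyse the repaired row
        else:
            removed.append(row)   # problem not solved: drop the row
    return kept, removed


# Hypothetical rows: a null value marks abnormal data, "fallback" allows repair.
rows = [{"v": 1}, {"v": None, "fallback": 2}, {"v": None}]
kept, removed = resolve_abnormal(
    rows,
    is_abnormal=lambda r: r["v"] is None,
    repair=lambda r: {**r, "v": r["fallback"]} if "fallback" in r else None,
)
```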
The embodiments of the present disclosure determine the dimensions required by a given index and check whether the service data is complete in those dimensions. Approaching the problem from the perspective of the required dimensions ensures that poor data quality caused by missing dimensions in the service data does not occur, avoids erroneous or one-sided analysis conclusions, and improves the efficiency of data detection and the accuracy of analysis.
According to the service data detection method provided by the present disclosure, first, in a multidimensional root cause analysis scenario, filtering out abnormal service data shields the analysis conclusion from its negative influence, and the probability of producing a one-sided conclusion is reduced by 80%; second, the efficiency of multidimensional root cause analysis can be improved by 40%, and the resources spent tracing service data quality problems are saved; finally, automatic detection of service data quality is realized, the dimension and index in which a service data quality problem occurs are located quickly, and troubleshooting efficiency is greatly improved.
The service data detection method provided by the present disclosure can also be extended to data quality verification in a multidimensional analysis system. Although multidimensional analysis is a data query scenario with strong randomness, its statistical goal is in many cases consistent with that of root cause analysis, so the service data detection method provided by the present disclosure also has derivative use value.
Fig. 9 is a block diagram illustrating a service data detection apparatus according to an exemplary embodiment. Referring to fig. 9, the apparatus includes:
a first obtaining unit 901, configured to obtain service data and a service index parameter and a service dimension parameter of the service data;
a second obtaining unit 902, configured to obtain a dimension coverage range corresponding to the target index, where the dimension coverage range is configured to indicate a dimension standard required for counting the target index;
a comparison unit 903 configured to compare the dimension included in the service data with the dimension coverage;
a determining unit 904 configured to determine that the service data is abnormal when the dimension included in the service data does not satisfy the dimension coverage.
In a possible implementation manner, the comparing unit 903 is specifically configured to:
obtaining the number of dimensions required by the target index from the dimension coverage range corresponding to the target index;
and when the number of the dimensions included in the business data is not matched with the number of the dimensions required by the target index, determining that the dimensions included in the business data do not meet the dimension coverage range.
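The dimension-count comparison can be pictured with the index-dimension matrix recited in the claims, in which rows are indexes, columns are dimensions, and a marked intersection links an index to a dimension. The Python sketch below is illustrative only; the index names, dimension names, and function names are hypothetical.

```python
# Hypothetical index-dimension matrix: rows are indexes, columns are
# dimensions, and 1 marks that the index is related to the dimension.
DIMENSIONS = ["region", "platform", "app_version"]
INDEX_DIMENSION_MATRIX = {
    "play_count": [1, 1, 1],
    "revenue": [1, 1, 0],
}


def required_dimensions(index_name):
    """Read the dimensions required by an index off its matrix row."""
    row = INDEX_DIMENSION_MATRIX[index_name]
    return [dim for dim, marked in zip(DIMENSIONS, row) if marked]


def dimension_count_matches(index_name, data_dimensions):
    """Compare the number of dimensions in the data with the required number."""
    return len(set(data_dimensions)) == len(required_dimensions(index_name))


revenue_dims = required_dimensions("revenue")
play_count_ok = dimension_count_matches("play_count", ["region", "platform"])
```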
In a possible implementation manner, the comparing unit 903 is further specifically configured to:
acquiring a life cycle corresponding to each dimension from a dimension coverage range corresponding to the target index, wherein the life cycle is configured to represent a time period from creation to termination of the dimension;
and when any dimension in the service data has data missing on the life cycle corresponding to the dimension, determining that the dimension included in the service data does not satisfy the dimension coverage range.
In a possible implementation manner, the comparing unit 903 is further specifically configured to:
obtaining dimension distribution corresponding to each dimension from the dimension coverage range corresponding to the target index;
and when the dimension distribution included in the business data is not matched with the dimension distribution corresponding to each dimension, determining that the dimension included in the business data does not satisfy the dimension coverage range.
In a possible implementation manner, the comparing unit 903 is further specifically configured to:
acquiring a continuous time range required by the indexes in the period corresponding to each service dimension from the dimension coverage range corresponding to the target index;
when the continuous time of the indexes included in the service data of the target time period is not matched with the continuous time range required by the indexes in the period corresponding to each service dimension, determining that the service data is abnormal;
and when the continuous time of the indexes included in the service data of the target time period is matched with the continuous time range required by the indexes in the period corresponding to each service dimension, determining that the service data is normal.
In a possible implementation manner, the comparing unit 903 is further specifically configured to:
acquiring a data format of service data;
matching the data format information with a preset data format rule;
and when the data format of the service data is not matched with the preset data format rule, determining that the service data is abnormal.
In a possible implementation manner, the apparatus further includes:
an output unit 905 configured to obtain a dimension corresponding to the abnormal service data; determining the position of abnormal business data based on the dimension; and outputting an abnormal service data report, wherein the abnormal service data report comprises the position of the abnormal service data.
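The behaviour of the output unit could be sketched as follows. The text does not define what the "position" of abnormal data is; this sketch assumes it is the row index grouped by dimension, and the function name `build_abnormal_report` and the field names are hypothetical.

```python
def build_abnormal_report(rows, is_abnormal, dimension_key="dimension_key"):
    """Group the positions (row indexes) of abnormal rows by their dimension."""
    report = {}
    for position, row in enumerate(rows):
        if is_abnormal(row):
            report.setdefault(row[dimension_key], []).append(position)
    return report


# Hypothetical rows: a null metric value marks abnormal service data.
rows = [
    {"dimension_key": "region", "metric_value": 10},
    {"dimension_key": "platform", "metric_value": None},
    {"dimension_key": "region", "metric_value": None},
]
report = build_abnormal_report(rows, lambda r: r["metric_value"] is None)
```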
With regard to the apparatus in the above-described embodiment, the specific manner in which each unit performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
FIG. 10 is a block diagram illustrating a computer device according to an exemplary embodiment. The computer device 1000 may vary considerably depending on its configuration or performance, and may include one or more processors (CPUs) 1001 and one or more memories 1002, where the memory 1002 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 1001 to implement the service data detection method provided by each of the method embodiments. The computer device may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and may further include other components for implementing the functions of the device, which are not described herein again.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (16)

1. A service data detection method is characterized by comprising the following steps:
obtaining dimension information included in service data of a target time period, wherein the dimension information is an attribute of the service data;
determining a target index based on the type of the business data, wherein the target index is used for detecting the integrity and the continuity of the business data, the target index comprises a plurality of indexes corresponding to the type, and the indexes are the standards for business data quantization;
acquiring a dimension coverage range corresponding to the target index, wherein the dimension coverage range is used for indicating the dimension information required for counting the target index, and the dimension coverage range comprises at least one of: the number of dimensions required by the target index, the number and value range of enumerated values of the dimension information, a life cycle corresponding to the dimension, a dimension distribution of the dimension in the life cycle, and a preset data format rule of the service data; wherein the number of dimensions required by the target index is detected by establishing an index-dimension matrix based on a bus architecture, rows of the index-dimension matrix represent the indexes, columns represent the dimensions, and an intersection marks that an index is related to the corresponding dimension; and the preset data format rule of the service data is detected by scanning with a custom Python function;
comparing the dimension information included in the service data with the dimension coverage range;
and when any dimension information included in the service data does not meet the dimension coverage range, determining that the service data is abnormal.
2. The method according to claim 1, wherein the comparing the dimension information included in the service data with the dimension coverage comprises:
obtaining the number of dimensions required by the target index from the dimension coverage range corresponding to the target index;
and when the number of the dimension information included in the service data is not matched with the number of the dimensions required by the target index, determining that the dimension information included in the service data does not satisfy the dimension coverage range.
3. The method according to claim 1, wherein the comparing the dimension information included in the service data with the dimension coverage comprises:
acquiring enumeration values corresponding to all dimensions from the dimension coverage range corresponding to the target index;
and when any one of the number and the value range of the enumeration values of the dimension information included in the service data is not matched with the number and the value range of the enumeration values corresponding to each dimension, determining that the dimension information included in the service data does not satisfy the dimension coverage range.
4. The method according to claim 1, wherein the comparing the dimension information included in the service data with the dimension coverage comprises:
acquiring a life cycle corresponding to each dimension from a dimension coverage range corresponding to the target index, wherein the life cycle is used for representing a time period from creation to termination of the dimension;
and when any dimension in the service data has data missing on the life cycle corresponding to the dimension, determining that the dimension information included in the service data does not satisfy the dimension coverage range.
5. The method according to claim 1, wherein the comparing the dimension information included in the service data with the dimension coverage comprises:
obtaining the dimension distribution corresponding to each dimension from the dimension coverage range corresponding to the target index;
and when the dimension information distribution included in the service data is not matched with the dimension distribution corresponding to each dimension, determining that the dimension information included in the service data does not satisfy the dimension coverage range.
6. The method according to claim 2, wherein after obtaining the dimensional information included in the service data of the target time period, the method further comprises:
acquiring a data format of the service data;
matching the data format information with a preset data format rule;
and when the data format of the service data is not matched with the preset data format rule, determining that the service data is abnormal.
7. The method of claim 1, wherein after determining that the service data is abnormal, the method further comprises:
obtaining the dimension corresponding to the abnormal service data;
determining the position of the abnormal business data based on the dimension;
and outputting an abnormal service data report, wherein the abnormal service data report comprises the position of the abnormal service data.
8. A service data detection apparatus, comprising:
the service data acquisition unit is configured to acquire service data of a target time period, wherein the service data comprises dimension information, and the dimension information is an attribute of the service data;
a first determining unit, configured to determine a target index based on a type of the service data, where the target index is used to detect integrity and continuity of the service data, the target index includes a plurality of indexes corresponding to the type, and the indexes are standards for quantization of the service data;
a second obtaining unit, configured to obtain a dimension coverage range corresponding to the target index, where the dimension coverage range is configured to indicate the dimensions required for counting the target index, and the dimension coverage range comprises at least one of: the number of dimensions required by the target index, the number and value range of enumerated values of the dimension information, a life cycle corresponding to the dimension, a dimension distribution of the dimension in the life cycle, and a preset data format rule of the service data; wherein the number of dimensions required by the target index is detected by establishing an index-dimension matrix based on a bus architecture, rows of the index-dimension matrix represent the indexes, columns represent the dimensions, and an intersection marks that an index is related to the corresponding dimension; and the preset data format rule of the service data is detected by scanning with a custom Python function;
a comparison unit configured to compare dimension information included in the service data with the dimension coverage;
a second determining unit, configured to determine that the service data is abnormal when any one of the dimension information included in the service data does not satisfy the dimension coverage.
9. The apparatus according to claim 8, wherein the comparison unit is specifically configured to:
obtaining the number of dimensions required by the target index from the dimension coverage range corresponding to the target index;
and when the number of the dimensions included in the business data is not matched with the number of the dimensions required by the target index, determining that the dimensions included in the business data do not meet the dimension coverage range.
10. The apparatus of claim 8, wherein the comparison unit is further specifically configured to:
acquiring enumeration values corresponding to all dimensions from the dimension coverage range corresponding to the target index;
and when any one of the number and the value range of the enumerated values of the dimensions included in the service data is not matched with the number and the value range of the enumerated values corresponding to the dimensions, determining that the dimensions included in the service data do not satisfy the dimension coverage range.
11. The apparatus of claim 8, wherein the comparison unit is further specifically configured to:
acquiring a life cycle corresponding to each dimension from a dimension coverage range corresponding to the target index, wherein the life cycle is configured to represent a time period from creation to termination of the dimension;
and when any dimension in the service data has data missing on the life cycle corresponding to the dimension, determining that the dimension included in the service data does not satisfy the dimension coverage range.
12. The apparatus of claim 8, wherein the comparison unit is further specifically configured to:
obtaining a dimension distribution range corresponding to each dimension from a dimension coverage range corresponding to the target index;
and when the dimension distribution range included in the service data is not matched with the dimension distribution range corresponding to each dimension, determining that the dimension included in the service data does not satisfy the dimension coverage range.
13. The apparatus of claim 8, wherein the comparison unit is further specifically configured to:
acquiring a data format of service data;
matching the data format information with a preset data format rule;
and when the data format of the service data is not matched with the preset data format rule, determining that the service data is abnormal.
14. The apparatus of claim 8, further comprising:
the output unit is configured to acquire the dimensionality corresponding to the abnormal service data; determining the position of abnormal business data based on the dimension; and outputting an abnormal service data report, wherein the abnormal service data report comprises the position of the abnormal service data.
15. A computer device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the service data detection method according to any of claims 1 to 7.
16. A storage medium in which instructions, when executed by a processor of a computer device, enable the computer device to perform the service data detection method of any one of claims 1 to 7.
CN201910557452.2A 2019-06-25 2019-06-25 Service data detection method and device, computer equipment and storage medium Active CN110275878B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910557452.2A CN110275878B (en) 2019-06-25 2019-06-25 Service data detection method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910557452.2A CN110275878B (en) 2019-06-25 2019-06-25 Service data detection method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110275878A CN110275878A (en) 2019-09-24
CN110275878B true CN110275878B (en) 2021-08-17

Family

ID=67963197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910557452.2A Active CN110275878B (en) 2019-06-25 2019-06-25 Service data detection method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110275878B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112799903A (en) * 2019-11-14 2021-05-14 北京沃东天骏信息技术有限公司 Method and device for evaluating health state of business system
CN111125194B (en) * 2019-12-25 2023-04-11 中国建筑科学研究院有限公司 Data construction method and device applied to urban-level clean heating
CN113132130B (en) * 2019-12-30 2023-04-07 中国移动通信集团北京有限公司 Network index prediction method, device, equipment and storage medium
CN112486969B (en) * 2020-12-01 2021-08-03 罗嗣扬 Data cleaning method applied to big data and deep learning and cloud server

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107741955A (en) * 2017-09-15 2018-02-27 平安科技(深圳)有限公司 Business datum monitoring method, device, terminal device and storage medium
CN108764705A (en) * 2018-05-24 2018-11-06 国信优易数据有限公司 A kind of data quality accessment platform and method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9292857B2 (en) * 2010-11-23 2016-03-22 Panorama Software Inc. One-click exceptions
JP6720610B2 (en) * 2016-03-22 2020-07-08 日本電気株式会社 Information processing system, information processing method, and program
CN107895003A (en) * 2017-10-31 2018-04-10 山东浪潮云服务信息科技有限公司 A kind of data quality checking method and apparatus
CN108764707A (en) * 2018-05-24 2018-11-06 国信优易数据有限公司 A kind of data assessment system and method

Also Published As

Publication number Publication date
CN110275878A (en) 2019-09-24

Similar Documents

Publication Publication Date Title
CN110275878B (en) Service data detection method and device, computer equipment and storage medium
CN110245078B (en) Software pressure testing method and device, storage medium and server
US10031829B2 (en) Method and system for IT resources performance analysis
CN109002391B (en) Method for automatically detecting embedded software interface test data
CN109934268B (en) Abnormal transaction detection method and system
WO2016008398A1 (en) Program performance test method and device
US10467590B2 (en) Business process optimization and problem resolution
CN111160329A (en) Root cause analysis method and device
CN109639456B (en) Improvement method for automatic alarm and automatic processing platform for alarm data
CN108363024B (en) Method and device for positioning fault point of charging pile
CN106294109B (en) Method and device for acquiring defect code
CN112948262A (en) System test method, device, computer equipment and storage medium
CN111506455B (en) Checking method and device for service release result
CN115545241A (en) Charging pile state identification method and device, electronic equipment and storage medium
CN115904955A (en) Performance index diagnosis method and device, terminal equipment and storage medium
CN114676061A (en) Knowledge graph-based automatic firmware detection method
CN113742213A (en) Method, system, and medium for data analysis
CN113268419A (en) Method, device, equipment and storage medium for generating test case optimization information
CN112541177A (en) Data security-based anomaly detection method and system
CN111538673A (en) Processing method, device, equipment and storage medium based on test case
CN110704326A (en) Test analysis method and device
CN111626586B (en) Data quality detection method, device, computer equipment and storage medium
CN113094265B (en) Analysis method and analysis device for test script and electronic equipment
CN113032227B (en) Abnormal network element detection method and device, electronic equipment and storage medium
CN114860549B (en) Buried data verification method, buried data verification device, buried data verification equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant