CN117493828A - Abnormal data identification method, device, computer equipment and storage medium - Google Patents

Publication number
CN117493828A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311524411.6A
Other languages
Chinese (zh)
Inventor
蒋玉宝
李振雷
王丙新
马建辉
李凯艳
张爔文
翟文亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FAW Jiefang Automotive Co Ltd
Original Assignee
FAW Jiefang Automotive Co Ltd
Application filed by FAW Jiefang Automotive Co Ltd
Priority to CN202311524411.6A
Publication of CN117493828A

Landscapes

  • Testing And Monitoring For Control Systems (AREA)

Abstract

The application relates to an abnormal data identification method and apparatus, a computer device, and a storage medium. The method comprises: acquiring a time sequence corresponding to a target field from historical report data of a target vehicle type; acquiring a fluctuation interval of the target field according to the time sequence; sorting the data in the time sequence in ascending order of value to obtain an incremental sequence; acquiring a numerical interval of the target field according to the incremental sequence; and performing anomaly identification on target data according to the fluctuation interval and the numerical interval, where the target data is the data corresponding to the target field in target report data. The method can improve the efficiency of identifying abnormal values in internet of vehicles data.

Description

Abnormal data identification method, device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of internet of vehicles data technology, and in particular, to a method, an apparatus, a computer device, a storage medium, and a computer program product for identifying abnormal data.
Background
With the rapid development of internet of vehicles big data technology, the potential value of internet of vehicles data is gradually being mined and applied by people. In the process of data acquisition of the internet of vehicles, due to noise and interference of other factors, abnormal values may exist in the acquired data, and the abnormal values may have adverse effects on the subsequent data analysis process.
In the related art, abnormal data in internet of vehicles data is generally identified with a clustering algorithm. However, clustering results are strongly affected by the initial positions of the cluster centroids, and with large data volumes clustering consumes substantial computing resources, so abnormal values are identified inefficiently.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an abnormal data identification method, apparatus, computer device, computer readable storage medium, and computer program product that can improve the efficiency of abnormal value identification.
In a first aspect, the present application provides an abnormal data identification method, including:
acquiring a time sequence corresponding to a target field from historical report data of a target vehicle type;
acquiring a fluctuation interval of a target field according to the time sequence;
sorting the data in the time sequence in ascending order of value to obtain an incremental sequence;
acquiring a numerical interval of the target field according to the incremental sequence;
performing anomaly identification on the target data according to the fluctuation interval and the numerical value interval; the target data is the data corresponding to the target field in the target report data.
In one embodiment, acquiring the fluctuation interval of the target field according to the time sequence includes:
acquiring the difference values of adjacent data in the time sequence to obtain a difference data set;
and acquiring a fluctuation interval according to the normal distribution parameters of the difference data set.
In one embodiment, acquiring the fluctuation interval according to the normal distribution parameters of the difference data set includes:
and acquiring a fluctuation interval according to the normal distribution mean value of the difference data set and the normal distribution standard deviation of the difference data set.
In one embodiment, acquiring the numerical interval of the target field according to the incremental sequence includes:
sequentially obtaining the difference values of adjacent data in the incremental sequence to obtain a difference value sequence;
averaging the data in the difference sequence to obtain an average difference value;
and acquiring a numerical interval according to the average difference value.
In one embodiment, acquiring the numerical interval according to the average difference value includes:
starting from the first data of the difference sequence, sequentially acquiring two adjacent data as a first difference value and a second difference value; when the first difference value or the second difference value is larger than a target threshold, taking the second difference value and the data following it as a new first difference value and a new second difference value, and continuing to compare the new first difference value and the new second difference value with the target threshold until neither the newly acquired first difference value nor the newly acquired second difference value is larger than the target threshold; and taking the data in the incremental sequence corresponding to the newly acquired first difference value as a first boundary value; the target threshold is determined according to a preset coefficient and the average difference value;
starting from the last data of the difference sequence, acquiring two adjacent data in reverse order as a third difference value and a fourth difference value; when the third difference value or the fourth difference value is larger than the target threshold, taking the fourth difference value and the data preceding it as a new third difference value and a new fourth difference value, and continuing to compare the new third difference value and the new fourth difference value with the target threshold until neither the newly acquired third difference value nor the newly acquired fourth difference value is larger than the target threshold; and taking the data in the incremental sequence corresponding to the newly acquired third difference value as a second boundary value;
and determining a numerical interval according to the first boundary value and the second boundary value.
In one embodiment, performing anomaly identification on target data according to a fluctuation interval and a numerical interval includes:
under the condition that the target data belong to a numerical value interval and the target difference value belongs to a fluctuation interval, judging the target data as normal data; the target difference value refers to the difference value between the target data and the data of the previous time point in the same time sequence;
judging that the target data is abnormal data under the condition that the target data does not belong to the numerical interval and the target difference value does not belong to the fluctuation interval;
and under the condition that the target data does not belong to the numerical value interval and the target difference value belongs to the fluctuation interval, judging that the target data is normal data, and updating the numerical value interval according to the target data.
In a second aspect, the present application further provides an abnormal data identification apparatus, including:
the first acquisition module is used for acquiring a time sequence corresponding to the target field from historical report data of the target vehicle type;
the second acquisition module is used for acquiring a fluctuation interval of the target field according to the time sequence;
the sequencing module is used for sorting the data in the time sequence in ascending order of value to obtain an incremental sequence;
the third acquisition module is used for acquiring a numerical interval of the target field according to the incremental sequence;
the identification module is used for performing anomaly identification on the target data according to the fluctuation interval and the numerical interval; the target data is the data corresponding to the target field in the target report data.
In a third aspect, the present application also provides a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a time sequence corresponding to a target field from historical report data of a target vehicle type;
acquiring a fluctuation interval of the target field according to the time sequence;
sorting the data in the time sequence in ascending order of value to obtain an incremental sequence;
acquiring a numerical interval of the target field according to the incremental sequence;
performing anomaly identification on the target data according to the fluctuation interval and the numerical value interval; the target data is the data corresponding to the target field in the target report data.
In a fourth aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring a time sequence corresponding to a target field from historical report data of a target vehicle type;
acquiring a fluctuation interval of a target field according to the time sequence;
sorting the data in the time sequence in ascending order of value to obtain an incremental sequence;
acquiring a numerical interval of the target field according to the incremental sequence;
performing anomaly identification on the target data according to the fluctuation interval and the numerical value interval; the target data is the data corresponding to the target field in the target report data.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of:
acquiring a time sequence corresponding to a target field from historical report data of a target vehicle type;
acquiring a fluctuation interval of a target field according to the time sequence;
sorting the data in the time sequence in ascending order of value to obtain an incremental sequence;
acquiring a numerical interval of the target field according to the incremental sequence;
performing anomaly identification on the target data according to the fluctuation interval and the numerical value interval; the target data is the data corresponding to the target field in the target report data.
With the above abnormal data identification method, apparatus, computer device, storage medium and computer program product, a time sequence corresponding to the target field is acquired from historical report data of the target vehicle type, and a fluctuation interval of the target field is acquired according to the time sequence. The data in the time sequence is then sorted in ascending order of value to obtain an incremental sequence, and a numerical interval of the target field is acquired according to the incremental sequence. Anomaly identification is then performed on the target data according to the fluctuation interval and the numerical interval. Because abnormal values are judged using only these two intervals, no complex clustering calculation is needed, which can improve the efficiency of identifying abnormal values in internet of vehicles data.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings that are required to be used in the embodiments or the related technical descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to the drawings without inventive effort for a person having ordinary skill in the art.
FIG. 1 is a diagram of an application environment for an abnormal data identification method in one embodiment;
FIG. 2 is a flow chart of a method for identifying abnormal data in one embodiment;
FIG. 3 is a flowchart illustrating a process for obtaining a first boundary value according to one embodiment;
FIG. 4 is a flowchart illustrating a second boundary value acquisition process according to an embodiment;
FIG. 5 is a flowchart of a method for identifying abnormal data according to another embodiment;
FIG. 6 is a block diagram of an apparatus for identifying abnormal data in one embodiment;
FIG. 7 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The abnormal data identification method provided by the embodiments of the application can be applied to the application environment shown in FIG. 1. The terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, or tablet computer. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
In an exemplary embodiment, as shown in fig. 2, there is provided an abnormal data identification method, which is described by taking the application of the method to the server 104 in fig. 1 as an example, and includes the following steps:
S202: Acquire a time sequence corresponding to the target field from historical report data of the target vehicle type.
Optionally, in the process of identifying the abnormal data, the server first obtains historical report data of the target vehicle type, where the historical report data includes data messages uploaded by a plurality of vehicles at different time points in a historical time period, and fields in the data messages may be determined according to a specific internet of vehicles communication protocol, for example, may include fields of engine speed, vehicle driving mileage, and the like.
Further, the server acquires, from the historical report data, a time sequence corresponding to the target field in the data messages uploaded by each vehicle. Taking the engine speed as an example, the server may acquire the engine speed time sequence $[V_{11}, V_{12}, V_{13}, \ldots, V_{1a}]$ corresponding to vehicle 1, the engine speed time sequence $[V_{21}, V_{22}, V_{23}, \ldots, V_{2b}]$ corresponding to vehicle 2, and so on.
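A minimal sketch of step S202, assuming each report message is a dictionary with hypothetical keys `vehicle_id` and `timestamp` alongside the target field (the source does not specify a message format); the function name `extract_time_series` is likewise illustrative:

```python
from collections import defaultdict


def extract_time_series(reports, field):
    """Return {vehicle_id: [values of `field` ordered by timestamp]}.

    Assumption: every message carries 'vehicle_id' and 'timestamp' keys;
    these key names are hypothetical and not taken from the source.
    """
    by_vehicle = defaultdict(list)
    for msg in reports:
        if field in msg:  # skip messages that do not report the target field
            by_vehicle[msg["vehicle_id"]].append((msg["timestamp"], msg[field]))
    # Order each vehicle's samples by time, then keep only the values.
    return {
        vid: [value for _, value in sorted(rows, key=lambda r: r[0])]
        for vid, rows in by_vehicle.items()
    }


reports = [
    {"vehicle_id": "veh1", "timestamp": 2, "engine_speed": 1510},
    {"vehicle_id": "veh1", "timestamp": 1, "engine_speed": 1500},
    {"vehicle_id": "veh2", "timestamp": 1, "engine_speed": 900},
    {"vehicle_id": "veh2", "timestamp": 2, "engine_speed": 905},
]
series = extract_time_series(reports, "engine_speed")
```

Keeping one sequence per vehicle matters for the later steps, because adjacent differences are only meaningful within the same vehicle's sequence.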
S204: Acquire a fluctuation interval of the target field according to the time sequence.
The fluctuation interval refers to a normal numerical variation range of adjacent data in a time sequence corresponding to the target field.
Optionally, after obtaining the target field time sequence corresponding to each vehicle, the server obtains the fluctuation interval of the target field according to the target field time sequence corresponding to each vehicle.
S206: Sort the data in the time sequence in ascending order of value to obtain an incremental sequence.
Optionally, after obtaining the fluctuation interval of the target field, the server merges the target-field time sequences of all vehicles and sorts the merged data in ascending order of value to obtain the incremental sequence. The sorting algorithm may be selection sort, insertion sort, bubble sort, Shell sort, merge sort, quicksort, heap sort, or the like, and is not specifically limited here.
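As an illustration of the merge-and-sort step, the sketch below assumes the per-vehicle sequences are held in a dictionary (a hypothetical layout) and uses Python's built-in `sorted`, which is just one acceptable choice since the text leaves the sorting algorithm open:

```python
from itertools import chain


def build_incremental_sequence(series_by_vehicle):
    """Merge every vehicle's values and sort them in ascending order.

    `series_by_vehicle` maps a vehicle id to its list of field values;
    this container shape is an assumption made for the example.
    """
    return sorted(chain.from_iterable(series_by_vehicle.values()))


incremental = build_incremental_sequence({"veh1": [1500, 1510, 1505], "veh2": [900, 905]})
```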
S208: Acquire a numerical interval of the target field according to the incremental sequence.
The numerical interval refers to the normal range of values of the data corresponding to the target field.
Optionally, after obtaining the incremental sequence, the server obtains the numerical interval of the target field according to the incremental sequence.
S210: Perform anomaly identification on the target data according to the fluctuation interval and the numerical interval; the target data is the data corresponding to the target field in the target report data.
Optionally, after obtaining the fluctuation interval and the numerical value interval, the server performs anomaly identification on the data corresponding to the target field in the target report data according to the fluctuation interval and the numerical value interval.
In the above abnormal data identification method, a time sequence corresponding to the target field is first acquired from historical report data of the target vehicle type, and a fluctuation interval of the target field is acquired according to the time sequence. The data in the time sequence is then sorted in ascending order of value to obtain an incremental sequence, and a numerical interval of the target field is acquired according to the incremental sequence. Anomaly identification is then performed on the target data according to the fluctuation interval and the numerical interval. Because abnormal values are judged using only these two intervals, no complex clustering calculation is needed, which can improve the efficiency of identifying abnormal values in internet of vehicles data.
In one embodiment, acquiring the fluctuation interval of the target field according to the time sequence includes: acquiring the difference value of adjacent data in the time sequence to obtain a difference value data set; and acquiring a fluctuation interval according to the normal distribution parameters of the difference data set.
Optionally, in the process of acquiring the fluctuation interval, the server first acquires the differences of adjacent data in the target-field time sequence corresponding to each vehicle and merges them into a difference data set. Taking the engine speed as an example, if the engine speed time sequence corresponding to vehicle 1 is $[V_{11}, V_{12}, V_{13}, \ldots, V_{1a}]$ and the engine speed time sequence corresponding to vehicle 2 is $[V_{21}, V_{22}, V_{23}, \ldots, V_{2b}]$, the resulting difference data set is $\{\Delta V_{12}, \Delta V_{13}, \ldots, \Delta V_{1a}, \Delta V_{22}, \Delta V_{23}, \ldots, \Delta V_{2b}\}$, where $\Delta V_{12} = V_{12} - V_{11}$, $\Delta V_{13} = V_{13} - V_{12}$, and so on.
After obtaining the difference dataset, the server may calculate normal distribution parameters of the data in the difference dataset, for example, normal distribution mean, normal distribution variance, normal distribution standard deviation, and the like. Further, a fluctuation interval is acquired according to the normal distribution parameters of the difference data set.
In this embodiment, the difference data set is obtained by obtaining the difference between adjacent data in the time sequence, and the fluctuation interval is obtained according to the normal distribution parameters of the difference data set, so that the appropriate fluctuation interval can be determined according to the statistical characteristics of the data fluctuation, and the abnormal data can be more effectively determined.
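The construction of the difference data set can be sketched as follows, again assuming a hypothetical dictionary of per-vehicle sequences; differences are taken only within a vehicle's own sequence and never across vehicles:

```python
def difference_dataset(series_by_vehicle):
    """Collect the differences of adjacent samples from every vehicle's sequence.

    For a sequence [v1, v2, v3] the contributed differences are
    [v2 - v1, v3 - v2]; sequences shorter than two samples contribute nothing.
    """
    diffs = []
    for values in series_by_vehicle.values():
        diffs.extend(b - a for a, b in zip(values, values[1:]))
    return diffs


diffs = difference_dataset({"veh1": [1500, 1510, 1505], "veh2": [900, 905]})
```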
In one embodiment, acquiring the fluctuation interval according to the normal distribution parameters of the difference data set includes: and acquiring a fluctuation interval according to the normal distribution mean value of the difference data set and the normal distribution standard deviation of the difference data set.
Alternatively, the server may first calculate the normal distribution mean and normal distribution standard deviation of the difference dataset. The specific calculation mode is as follows:
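The formulas themselves are not reproduced in this text. Under the standard definitions, and assuming the population form of the standard deviation (divisor m rather than m - 1, which the source does not specify), they would read:

```latex
\mu = \frac{1}{m}\sum_{i=1}^{m}\Delta x_i,
\qquad
\sigma = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(\Delta x_i - \mu\right)^{2}}
```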
where μ denotes the normal distribution mean of the difference data set, σ denotes the normal distribution standard deviation of the difference data set, m denotes the number of differences in the difference data set, and $\Delta x_1, \Delta x_2, \ldots, \Delta x_m$ are the m difference values.
Further, the fluctuation interval $[\mu - k\sigma,\ \mu + k\sigma]$ is obtained from the normal distribution mean and the normal distribution standard deviation of the difference data set. The specific value of k can be determined according to the area rule of the normal distribution curve (for example, about 99.7% of normally distributed values lie within three standard deviations of the mean).
In an alternative embodiment, k = 3, and the fluctuation interval is $[\mu - 3\sigma,\ \mu + 3\sigma]$.
In this embodiment, the fluctuation interval is obtained from the normal distribution mean and the normal distribution standard deviation of the difference data set, so a suitable confidence interval for the fluctuation range can be determined quickly using the area rule of the normal distribution curve, which simplifies calculation of the fluctuation interval.
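A short sketch of the fluctuation interval calculation, using Python's `statistics` module. The population standard deviation (`pstdev`) is an assumption, since the text does not say whether the divisor is m or m - 1; the function name is illustrative:

```python
import statistics


def fluctuation_interval(diffs, k=3.0):
    """Return (mu - k*sigma, mu + k*sigma) for the difference data set.

    Assumption: sigma is the population standard deviation (divisor m).
    k = 3 follows the alternative embodiment described in the text.
    """
    mu = statistics.fmean(diffs)
    sigma = statistics.pstdev(diffs, mu)
    return (mu - k * sigma, mu + k * sigma)


lo, hi = fluctuation_interval([-2, 0, 2])
```

With k = 3, a difference between adjacent samples is treated as normal fluctuation when it lies within three standard deviations of the mean difference.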
In one embodiment, obtaining the value interval of the target field according to the increment sequence includes: sequentially obtaining the difference values of adjacent data in the incremental sequence to obtain a difference value sequence; averaging the data in the difference sequence to obtain an average difference value; and acquiring a numerical interval according to the average difference value.
Optionally, in the process of acquiring the numerical interval, the server sequentially acquires the differences of adjacent data in the incremental sequence. For example, if the incremental sequence is $[Y_1, Y_2, Y_3, \ldots, Y_n]$, the corresponding difference sequence is $[\Delta Y_2, \Delta Y_3, \ldots, \Delta Y_n]$, where $\Delta Y_2 = Y_2 - Y_1$, $\Delta Y_3 = Y_3 - Y_2$, and so on.
Then, the server averages the data in the difference sequence to obtain an average difference. The specific calculation mode is as follows:
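The formula is not reproduced in this text. From the definition of the difference sequence, the average of its n - 1 differences would be:

```latex
Y_{mean} = \frac{1}{n-1}\sum_{i=2}^{n}\Delta Y_i = \frac{Y_n - Y_1}{n-1}
```

The second equality holds because the sum of adjacent differences telescopes, so the average difference depends only on the smallest and largest values and on the number of data points.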
where $Y_{mean}$ denotes the average difference, n - 1 is the number of differences in the difference sequence, and $\Delta Y_2, \Delta Y_3, \ldots, \Delta Y_n$ are the n - 1 difference values.
Further, the server obtains a numerical range from the average difference value.
In this embodiment, the difference sequence is obtained by sequentially acquiring the differences of adjacent data in the incremental sequence, the data in the difference sequence is averaged to obtain the average difference, and the numerical interval is then acquired according to the average difference. Because the average difference gives a measure against which the spacing of normal values can be judged, a suitable numerical interval can be determined.
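The difference sequence and its average can be sketched in a few lines; the function name and return shape are illustrative choices, not taken from the source:

```python
def average_difference(incremental):
    """Return (difference sequence, average difference) for a sorted sequence.

    For [Y1, ..., Yn] the difference sequence is [Y2 - Y1, ..., Yn - Y(n-1)]
    and the average is taken over its n - 1 entries.
    """
    diffs = [b - a for a, b in zip(incremental, incremental[1:])]
    if not diffs:
        raise ValueError("at least two values are required")
    return diffs, sum(diffs) / len(diffs)


diff_seq, y_mean = average_difference([1, 2, 4, 10])
```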
In one embodiment, acquiring the numerical interval according to the average difference value includes: starting from the first data of the difference sequence, sequentially acquiring two adjacent data as a first difference value and a second difference value; when the first difference value or the second difference value is larger than a target threshold, taking the second difference value and the data following it as a new first difference value and a new second difference value, and continuing to compare the new first difference value and the new second difference value with the target threshold until neither the newly acquired first difference value nor the newly acquired second difference value is larger than the target threshold; and taking the data in the incremental sequence corresponding to the newly acquired first difference value as a first boundary value, the target threshold being determined according to a preset coefficient and the average difference value; starting from the last data of the difference sequence, acquiring two adjacent data in reverse order as a third difference value and a fourth difference value; when the third difference value or the fourth difference value is larger than the target threshold, taking the fourth difference value and the data preceding it as a new third difference value and a new fourth difference value, and continuing to compare the new third difference value and the new fourth difference value with the target threshold until neither the newly acquired third difference value nor the newly acquired fourth difference value is larger than the target threshold; and taking the data in the incremental sequence corresponding to the newly acquired third difference value as a second boundary value; and determining the numerical interval according to the first boundary value and the second boundary value.
Optionally, FIG. 3 is a flowchart of acquiring the first boundary value. As shown in FIG. 3, starting from the first data $\Delta Y_2$ of the difference sequence, the server sequentially acquires two adjacent data $\Delta Y_2$ and $\Delta Y_3$ as the first difference value and the second difference value. When the first difference value or the second difference value is larger than the target threshold, the server takes the second difference value and the data following it as a new first difference value and a new second difference value, and continues to compare the new first difference value and the new second difference value with the target threshold until neither the newly acquired first difference value nor the newly acquired second difference value is larger than the target threshold. The data in the incremental sequence corresponding to the newly acquired first difference value is then taken as the first boundary value. It should be noted that the target threshold is determined according to a preset coefficient and the average difference value; in an alternative embodiment, the preset coefficient is 2.
FIG. 4 is a flowchart of acquiring the second boundary value. As shown in FIG. 4, starting from the last data $\Delta Y_n$ of the difference sequence, the server acquires two adjacent data $\Delta Y_n$ and $\Delta Y_{n-1}$ in reverse order as the third difference value and the fourth difference value. When the third difference value or the fourth difference value is larger than the target threshold, the server takes the fourth difference value and the data preceding it as a new third difference value and a new fourth difference value, and continues to compare the new third difference value and the new fourth difference value with the target threshold until neither the newly acquired third difference value nor the newly acquired fourth difference value is larger than the target threshold. The data in the incremental sequence corresponding to the newly acquired third difference value is then taken as the second boundary value.
After obtaining the first boundary value and the second boundary value, the server determines the numerical interval [A, B] according to them, where A represents the first boundary value and B represents the second boundary value.
In this embodiment, the effectiveness of the boundary values can be improved by acquiring continuous data from the difference sequence, comparing the continuous data with the target threshold, and determining the first boundary value and the second boundary value according to the comparison result.
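The two boundary scans can be sketched as below. Several points are assumptions rather than statements from the source: the target threshold is taken to be the preset coefficient multiplied by the average difference; a difference ΔY_i = Y_i - Y_(i-1) is taken to "correspond" to Y_i in the incremental sequence, following the index notation of the text (reading it as Y_(i-1) for the lower boundary would keep one more value); and the function name `value_interval` is illustrative.

```python
def value_interval(incremental, coeff=2.0):
    """Return (first boundary, second boundary) of the numerical interval.

    Assumptions (not confirmed by the source):
      * target threshold = coeff * average difference;
      * the difference Y[j+1] - Y[j] corresponds to the value Y[j+1].
    """
    if len(incremental) < 3:
        raise ValueError("at least three values are required")
    # diffs[j] = incremental[j + 1] - incremental[j]
    diffs = [b - a for a, b in zip(incremental, incremental[1:])]
    threshold = coeff * sum(diffs) / len(diffs)

    # Forward scan: slide a window of two adjacent differences from the start
    # until both are within the threshold.
    lower = None
    for j in range(len(diffs) - 1):
        if diffs[j] <= threshold and diffs[j + 1] <= threshold:
            lower = incremental[j + 1]  # value matching the first difference
            break

    # Backward scan: same window, moving from the end toward the start.
    upper = None
    for j in range(len(diffs) - 1, 0, -1):
        if diffs[j] <= threshold and diffs[j - 1] <= threshold:
            upper = incremental[j + 1]  # value matching the third difference
            break

    if lower is None or upper is None:
        raise ValueError("no two adjacent differences are within the threshold")
    return lower, upper


bounds = value_interval([0, 50, 51, 52, 53, 54, 55, 120])
```

In this example the isolated extremes 0 and 120 are separated from their neighbours by gaps larger than the threshold, so the scans stop at the edges of the dense run of values.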
In one embodiment, the anomaly identification for the target data is performed according to the fluctuation interval and the numerical interval, including: under the condition that the target data belong to a numerical value interval and the target difference value belongs to a fluctuation interval, judging the target data as normal data; the target difference value refers to the difference value between the target data and the data of the previous time point in the same time sequence; judging that the target data is abnormal data under the condition that the target data does not belong to a numerical value interval and the target difference value does not belong to a fluctuation interval; and under the condition that the target data does not belong to the numerical value interval and the target difference value belongs to the fluctuation interval, judging that the target data is normal data, and updating the numerical value interval according to the target data.
Optionally, in the case that the target data belongs to a numerical value interval and the target difference value belongs to a fluctuation interval, the server determines that the target data is normal data; and under the condition that the target data does not belong to the numerical value interval and the target difference value does not belong to the fluctuation interval, the server judges that the target data is abnormal data.
When the target data does not belong to the numerical interval but the target difference value belongs to the fluctuation interval, the server judges that the target data is normal data and updates the numerical interval according to the target data. If the target data is Y, the numerical interval is [A, B], and Y < A, the updated numerical interval is [Y, B]; if the target data is Y, the numerical interval is [A, B], and Y > B, the updated numerical interval is [A, Y].
And under the condition that the target data belongs to the numerical value interval and the target difference value does not belong to the fluctuation interval, marking the target data, and generating a corresponding log for subsequent manual judgment.
In the embodiment, the abnormal value in the internet of vehicles data is judged through the fluctuation interval and the numerical interval, and complex clustering calculation is not needed, so that the abnormal value identification efficiency in the internet of vehicles data can be improved.
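The four cases described above can be summarized in a small decision function. This is a sketch only: the names `Verdict` and `classify`, the use of closed intervals, and the tuple representation of intervals are assumptions made for the example.

```python
from enum import Enum


class Verdict(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"
    REVIEW = "review"  # marked and logged for later manual judgment


def classify(value, previous, value_iv, fluct_iv):
    """Return (verdict, possibly updated numerical interval).

    `previous` is the value at the preceding time point of the same time
    sequence; intervals are (low, high) tuples treated as closed (assumption).
    """
    lo, hi = value_iv
    in_value = lo <= value <= hi
    in_fluct = fluct_iv[0] <= (value - previous) <= fluct_iv[1]

    if in_value and in_fluct:
        return Verdict.NORMAL, value_iv
    if not in_value and not in_fluct:
        return Verdict.ABNORMAL, value_iv
    if not in_value and in_fluct:
        # Normal data outside the known range: widen the interval to include it.
        return Verdict.NORMAL, (min(lo, value), max(hi, value))
    # Inside the numerical interval but with an unusual jump: flag for review.
    return Verdict.REVIEW, value_iv


result = classify(1110, 1095, (900, 1100), (-30, 30))
```

Returning the interval together with the verdict lets the caller carry the widened interval forward to the next data point.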
In one embodiment, as shown in fig. 5, there is provided an abnormal data identification method, the method comprising the steps of:
S502: and acquiring a time sequence corresponding to the target field from historical report data of the target vehicle type.
S504: acquiring the difference value of adjacent data in the time sequence to obtain a difference value data set; and acquiring a fluctuation interval according to the normal distribution mean value of the difference data set and the normal distribution standard deviation of the difference data set.
S506: and sorting the data in the time sequence from small to large according to the numerical value to obtain an incremental sequence.
S508: sequentially obtaining the difference values of adjacent data in the incremental sequence to obtain a difference value sequence; and averaging the data in the difference sequence to obtain an average difference value.
S510: sequentially acquiring two adjacent and continuous data from the first data of the difference sequence as a first difference value and a second difference value, sequentially acquiring the two adjacent and continuous data from the second difference value as a new first difference value and a new second difference value when the first difference value is larger than a target threshold value or the second difference value is larger than the target threshold value, continuously comparing the new first difference value with the target threshold value and comparing the new second difference value with the target threshold value until the newly acquired first difference value is not larger than the target threshold value and the newly acquired second difference value is not larger than the target threshold value, and taking the data of the newly acquired first difference value corresponding to the incremental sequence as a first boundary value; the target threshold is determined according to the preset coefficient and the average difference value.
S512: and sequentially acquiring two adjacent and continuous data from the last data of the difference sequence in reverse order as a third difference value and a fourth difference value, sequentially acquiring the two adjacent and continuous data from the fourth difference value as a new third difference value and a new fourth difference value when the third difference value is larger than a target threshold value or the fourth difference value is larger than the target threshold value, continuously comparing the new third difference value with the target threshold value and comparing the new fourth difference value with the target threshold value until the newly acquired third difference value is not larger than the target threshold value and the newly acquired fourth difference value is not larger than the target threshold value, and taking the data of the newly acquired third difference value corresponding to the incremental sequence as a second boundary value.
S514: and determining a numerical interval according to the first boundary value and the second boundary value.
S516: under the condition that the target data belong to a numerical value interval and the target difference value belongs to a fluctuation interval, judging the target data as normal data; the target difference value refers to the difference value between the target data and the data of the previous time point in the same time sequence; judging that the target data is abnormal data under the condition that the target data does not belong to a numerical value interval and the target difference value does not belong to a fluctuation interval; and under the condition that the target data does not belong to the numerical value interval and the target difference value belongs to the fluctuation interval, judging that the target data is normal data, and updating the numerical value interval according to the target data.
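Steps S504 to S508 above can be sketched as follows. The sketch rests on assumptions the text does not fix: the fluctuation interval is taken as the mean plus or minus `k` standard deviations with `k = 3`, the population standard deviation is used, and the function names are hypothetical. The time sequence must contain at least two data points.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def adjacent_differences(values: Sequence[float]) -> List[float]:
    """Differences between each element and its predecessor."""
    return [b - a for a, b in zip(values, values[1:])]


def fluctuation_interval(series: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """S504: interval derived from the mean and standard deviation of adjacent differences.

    Assumption: the interval is [mu - k*sigma, mu + k*sigma]; k is a hypothetical parameter.
    """
    diffs = adjacent_differences(series)
    mu = mean(diffs)
    sigma = pstdev(diffs)  # population standard deviation (an assumption)
    return (mu - k * sigma, mu + k * sigma)


def average_sorted_gap(series: Sequence[float]) -> Tuple[List[float], List[float], float]:
    """S506 and S508: sort ascending, take adjacent differences, and average them."""
    increasing = sorted(series)
    gaps = adjacent_differences(increasing)
    return increasing, gaps, mean(gaps)
```

For example, for the series `[10, 12, 11, 13, 12]` the adjacent differences are `[2, -1, 2, -1]`, with mean 0.5 and population standard deviation 1.5, so the sketch yields a fluctuation interval of about `(-4.0, 5.0)`.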
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the sequence indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially, as they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiments of the application also provide an abnormal data identification device for implementing the abnormal data identification method described above. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations in the one or more embodiments of the abnormal data identification device provided below, reference may be made to the limitations of the abnormal data identification method above, which are not repeated here.
In an exemplary embodiment, as shown in fig. 6, there is provided an abnormal data recognition apparatus including: a first obtaining module 610, a second obtaining module 620, a sorting module 630, a third obtaining module 640, and an identifying module 650, wherein:
the first obtaining module 610 is configured to obtain a time sequence corresponding to the target field from historical report data of the target vehicle type.
The second obtaining module 620 is configured to obtain, according to the time sequence, a fluctuation interval of the target field.
The sorting module 630 is configured to sort the data in the time sequence according to the value from small to large, so as to obtain an incremental sequence.
A third obtaining module 640, configured to obtain a value interval of the target field according to the increment sequence.
The identifying module 650 is configured to identify the abnormality of the target data according to the fluctuation interval and the numerical value interval; the target data is the data corresponding to the target field in the target report data.
In one embodiment, the second obtaining module 620 is further configured to obtain a difference value of adjacent data in the time sequence, so as to obtain a difference value dataset; and acquiring a fluctuation interval according to the normal distribution parameters of the difference data set.
In one embodiment, the second obtaining module 620 is further configured to obtain the fluctuation interval according to a normal distribution mean of the difference data set and a normal distribution standard deviation of the difference data set.
In one embodiment, the third obtaining module 640 is further configured to sequentially obtain differences between adjacent data in the incremental sequence, to obtain a difference sequence; averaging the data in the difference sequence to obtain an average difference value; and acquiring a numerical interval according to the average difference value.
In one embodiment, the third obtaining module 640 is further configured to: sequentially acquire two adjacent and continuous data from the first data of the difference sequence as a first difference value and a second difference value; when the first difference value is larger than a target threshold value or the second difference value is larger than the target threshold value, sequentially acquire two adjacent and continuous data from the second difference value as a new first difference value and a new second difference value, and continue comparing the new first difference value and the new second difference value with the target threshold value until the newly acquired first difference value is not larger than the target threshold value and the newly acquired second difference value is not larger than the target threshold value; and take the data in the incremental sequence corresponding to the newly acquired first difference value as a first boundary value, the target threshold value being determined according to a preset coefficient and the average difference value. The third obtaining module 640 is further configured to: sequentially acquire, in reverse order, two adjacent and continuous data from the last data of the difference sequence as a third difference value and a fourth difference value; when the third difference value is larger than the target threshold value or the fourth difference value is larger than the target threshold value, sequentially acquire two adjacent and continuous data from the fourth difference value as a new third difference value and a new fourth difference value, and continue comparing the new third difference value and the new fourth difference value with the target threshold value until the newly acquired third difference value is not larger than the target threshold value and the newly acquired fourth difference value is not larger than the target threshold value; take the data in the incremental sequence corresponding to the newly acquired third difference value as a second boundary value; and determine the numerical interval according to the first boundary value and the second boundary value.
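The boundary search performed by the third obtaining module can be sketched as a scan for the first pair of consecutive small gaps from each end of the sorted data. Several details below are assumptions rather than statements from the text: the target threshold is taken as the preset coefficient multiplied by the average difference value, the datum "corresponding to" a difference is taken to be its lower element for the first boundary and its upper element for the second boundary, and the names `value_interval` and `first_quiet_pair` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def value_interval(
    series: Sequence[float], coefficient: float
) -> Optional[Tuple[float, float]]:
    """Estimate the value interval of a field from the gaps in its sorted data.

    Returns None when there are too few data points or no pair of consecutive
    gaps falls at or below the threshold.
    """
    s = sorted(series)
    gaps: List[float] = [b - a for a, b in zip(s, s[1:])]
    if len(gaps) < 2:
        return None
    # Assumption: target threshold = preset coefficient * average difference.
    threshold = coefficient * (sum(gaps) / len(gaps))

    def first_quiet_pair(indices: List[int]) -> Optional[int]:
        """Index of the first gap whose successor (in scan order) is also small."""
        for i, j in zip(indices, indices[1:]):
            if gaps[i] <= threshold and gaps[j] <= threshold:
                return i
        return None

    forward = list(range(len(gaps)))
    lo = first_quiet_pair(forward)        # scan from the smallest data
    hi = first_quiet_pair(forward[::-1])  # scan from the largest data
    if lo is None or hi is None:
        return None
    # gaps[i] == s[i + 1] - s[i]; take the lower element at the low end
    # and the upper element at the high end (an assumption).
    return (s[lo], s[hi + 1])
```

With the data `[25, -40, 21, 200, 20, 23, 22, 24]` and a coefficient of 1.0, the isolated values -40 and 200 produce large gaps at both ends of the sorted data, and the sketch returns `(20, 25)`.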
In one embodiment, the identification module 650 is further configured to determine that the target data is normal data when the target data belongs to a numerical range and the target difference value belongs to a fluctuation range; the target difference value refers to the difference value between the target data and the data of the previous time point in the same time sequence; judging that the target data is abnormal data under the condition that the target data does not belong to a numerical value interval and the target difference value does not belong to a fluctuation interval; and under the condition that the target data does not belong to the numerical value interval and the target difference value belongs to the fluctuation interval, judging that the target data is normal data, and updating the numerical value interval according to the target data.
The modules in the above abnormal data recognition apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke them and execute the operations corresponding to each module.
In one exemplary embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data discrimination logs and other business data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a method of anomaly data identification.
It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one exemplary embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of: acquiring a time sequence corresponding to a target field from historical report data of a target vehicle type; acquiring a fluctuation interval of a target field according to the time sequence; sequencing the data in the sequence from small to large according to the value to obtain an incremental sequence; according to the increment sequence, acquiring a numerical interval of the target field; performing anomaly identification on the target data according to the fluctuation interval and the numerical value interval; the target data is the data corresponding to the target field in the target report data.
In one embodiment, the processor when executing the computer program further performs the steps of: acquiring the difference value of adjacent data in the time sequence to obtain a difference value data set; and acquiring a fluctuation interval according to the normal distribution parameters of the difference data set.
In one embodiment, the processor when executing the computer program further performs the steps of: and acquiring a fluctuation interval according to the normal distribution mean value of the difference data set and the normal distribution standard deviation of the difference data set.
In one embodiment, the processor when executing the computer program further performs the steps of: sequentially obtaining the difference values of adjacent data in the incremental sequence to obtain a difference value sequence; averaging the data in the difference sequence to obtain an average difference value; and acquiring a numerical interval according to the average difference value.
In one embodiment, the processor when executing the computer program further performs the steps of: sequentially acquiring two adjacent and continuous data from the first data of the difference sequence as a first difference value and a second difference value, sequentially acquiring the two adjacent and continuous data from the second difference value as a new first difference value and a new second difference value when the first difference value is larger than a target threshold value or the second difference value is larger than the target threshold value, continuously comparing the new first difference value with the target threshold value and comparing the new second difference value with the target threshold value until the newly acquired first difference value is not larger than the target threshold value and the newly acquired second difference value is not larger than the target threshold value, and taking the data of the newly acquired first difference value corresponding to the incremental sequence as a first boundary value; the target threshold is determined according to a preset coefficient and an average difference value; sequentially acquiring two adjacent and continuous data from the last data of the difference sequence in reverse order as a third difference value and a fourth difference value, sequentially acquiring the two adjacent and continuous data from the fourth difference value as a new third difference value and a new fourth difference value when the third difference value is larger than a target threshold value or the fourth difference value is larger than the target threshold value, continuously comparing the new third difference value with the target threshold value and comparing the new fourth difference value with the target threshold value until the newly acquired third difference value is not larger than the target threshold value and the newly acquired fourth difference value is not larger than the target threshold value, and taking the data of the newly acquired third difference value corresponding to the incremental sequence as a second boundary value; and determining a numerical interval according to the first boundary value and the second boundary value.
In one embodiment, the processor when executing the computer program further performs the steps of: under the condition that the target data belong to a numerical value interval and the target difference value belongs to a fluctuation interval, judging the target data as normal data; the target difference value refers to the difference value between the target data and the data of the previous time point in the same time sequence; judging that the target data is abnormal data under the condition that the target data does not belong to a numerical value interval and the target difference value does not belong to a fluctuation interval; and under the condition that the target data does not belong to the numerical value interval and the target difference value belongs to the fluctuation interval, judging that the target data is normal data, and updating the numerical value interval according to the target data.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of: acquiring a time sequence corresponding to a target field from historical report data of a target vehicle type; acquiring a fluctuation interval of a target field according to the time sequence; sequencing the data in the sequence from small to large according to the value to obtain an incremental sequence; according to the increment sequence, acquiring a numerical interval of the target field; performing anomaly identification on the target data according to the fluctuation interval and the numerical value interval; the target data is the data corresponding to the target field in the target report data.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring the difference value of adjacent data in the time sequence to obtain a difference value data set; and acquiring a fluctuation interval according to the normal distribution parameters of the difference data set.
In one embodiment, the computer program when executed by the processor further performs the steps of: and acquiring a fluctuation interval according to the normal distribution mean value of the difference data set and the normal distribution standard deviation of the difference data set.
In one embodiment, the computer program when executed by the processor further performs the steps of: sequentially obtaining the difference values of adjacent data in the incremental sequence to obtain a difference value sequence; averaging the data in the difference sequence to obtain an average difference value; and acquiring a numerical interval according to the average difference value.
In one embodiment, the computer program when executed by the processor further performs the steps of: sequentially acquiring two adjacent and continuous data from the first data of the difference sequence as a first difference value and a second difference value, sequentially acquiring the two adjacent and continuous data from the second difference value as a new first difference value and a new second difference value when the first difference value is larger than a target threshold value or the second difference value is larger than the target threshold value, continuously comparing the new first difference value with the target threshold value and comparing the new second difference value with the target threshold value until the newly acquired first difference value is not larger than the target threshold value and the newly acquired second difference value is not larger than the target threshold value, and taking the data of the newly acquired first difference value corresponding to the incremental sequence as a first boundary value; the target threshold is determined according to a preset coefficient and an average difference value; sequentially acquiring two adjacent and continuous data from the last data of the difference sequence in reverse order as a third difference value and a fourth difference value, sequentially acquiring the two adjacent and continuous data from the fourth difference value as a new third difference value and a new fourth difference value when the third difference value is larger than a target threshold value or the fourth difference value is larger than the target threshold value, continuously comparing the new third difference value with the target threshold value and comparing the new fourth difference value with the target threshold value until the newly acquired third difference value is not larger than the target threshold value and the newly acquired fourth difference value is not larger than the target threshold value, and taking the data of the newly acquired third difference value corresponding to the incremental sequence as a second boundary value; and determining a numerical interval according to the first boundary value and the second boundary value.
In one embodiment, the computer program when executed by the processor further performs the steps of: under the condition that the target data belong to a numerical value interval and the target difference value belongs to a fluctuation interval, judging the target data as normal data; the target difference value refers to the difference value between the target data and the data of the previous time point in the same time sequence; judging that the target data is abnormal data under the condition that the target data does not belong to a numerical value interval and the target difference value does not belong to a fluctuation interval; and under the condition that the target data does not belong to the numerical value interval and the target difference value belongs to the fluctuation interval, judging that the target data is normal data, and updating the numerical value interval according to the target data.
In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of: acquiring a time sequence corresponding to a target field from historical report data of a target vehicle type; acquiring a fluctuation interval of a target field according to the time sequence; sequencing the data in the sequence from small to large according to the value to obtain an incremental sequence; according to the increment sequence, acquiring a numerical interval of the target field; performing anomaly identification on the target data according to the fluctuation interval and the numerical value interval; the target data is the data corresponding to the target field in the target report data.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring the difference value of adjacent data in the time sequence to obtain a difference value data set; and acquiring a fluctuation interval according to the normal distribution parameters of the difference data set.
In one embodiment, the computer program when executed by the processor further performs the steps of: and acquiring a fluctuation interval according to the normal distribution mean value of the difference data set and the normal distribution standard deviation of the difference data set.
In one embodiment, the computer program when executed by the processor further performs the steps of: sequentially obtaining the difference values of adjacent data in the incremental sequence to obtain a difference value sequence; averaging the data in the difference sequence to obtain an average difference value; and acquiring a numerical interval according to the average difference value.
In one embodiment, the computer program when executed by the processor further performs the steps of: sequentially acquiring two adjacent and continuous data from the first data of the difference sequence as a first difference value and a second difference value, sequentially acquiring the two adjacent and continuous data from the second difference value as a new first difference value and a new second difference value when the first difference value is larger than a target threshold value or the second difference value is larger than the target threshold value, continuously comparing the new first difference value with the target threshold value and comparing the new second difference value with the target threshold value until the newly acquired first difference value is not larger than the target threshold value and the newly acquired second difference value is not larger than the target threshold value, and taking the data of the newly acquired first difference value corresponding to the incremental sequence as a first boundary value; the target threshold is determined according to a preset coefficient and an average difference value; sequentially acquiring two adjacent and continuous data from the last data of the difference sequence in reverse order as a third difference value and a fourth difference value, sequentially acquiring the two adjacent and continuous data from the fourth difference value as a new third difference value and a new fourth difference value when the third difference value is larger than a target threshold value or the fourth difference value is larger than the target threshold value, continuously comparing the new third difference value with the target threshold value and comparing the new fourth difference value with the target threshold value until the newly acquired third difference value is not larger than the target threshold value and the newly acquired fourth difference value is not larger than the target threshold value, and taking the data of the newly acquired third difference value corresponding to the incremental sequence as a second boundary value; and determining a numerical interval according to the first boundary value and the second boundary value.
In one embodiment, the computer program when executed by the processor further performs the steps of: under the condition that the target data belong to a numerical value interval and the target difference value belongs to a fluctuation interval, judging the target data as normal data; the target difference value refers to the difference value between the target data and the data of the previous time point in the same time sequence; judging that the target data is abnormal data under the condition that the target data does not belong to a numerical value interval and the target difference value does not belong to a fluctuation interval; and under the condition that the target data does not belong to the numerical value interval and the target difference value belongs to the fluctuation interval, judging that the target data is normal data, and updating the numerical value interval according to the target data.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use, and processing of the related data are required to meet the related regulations.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the computer program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. The volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be, without limitation, general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above examples represent only a few embodiments of the present application. They are described in relative detail, but are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A method of identifying anomalous data, the method comprising:
acquiring a time sequence corresponding to a target field from historical report data of a target vehicle type;
acquiring a fluctuation interval of the target field according to the time sequence;
sequencing the data in the time sequence from small to large according to the value to obtain an incremental sequence;
according to the increment sequence, acquiring a numerical interval of the target field;
performing anomaly identification on the target data according to the fluctuation interval and the numerical value interval; the target data is data corresponding to the target field in target report data.
2. The method according to claim 1, wherein the acquiring the fluctuation interval of the target field according to the time sequence includes:
acquiring the difference value of adjacent data in the time sequence to obtain a difference value data set;
and acquiring the fluctuation interval according to the normal distribution parameters of the difference data set.
3. The method according to claim 2, wherein said obtaining said fluctuation interval from a normal distribution parameter of said difference dataset comprises:
and acquiring the fluctuation interval according to the normal distribution mean value of the difference data set and the normal distribution standard deviation of the difference data set.
4. The method according to claim 1, wherein the acquiring the numerical interval of the target field according to the incremental sequence comprises:
sequentially acquiring the difference values of adjacent data in the incremental sequence to obtain a difference sequence;
averaging the data in the difference sequence to obtain an average difference value;
acquiring the numerical interval according to the average difference value.
5. The method according to claim 4, wherein the acquiring the numerical interval according to the average difference value comprises:
starting from the first data of the difference sequence, acquiring two adjacent data in order as a first difference value and a second difference value; when the first difference value is greater than a target threshold or the second difference value is greater than the target threshold, acquiring two adjacent data in order starting from the second difference value as a new first difference value and a new second difference value, and continuing to compare the new first difference value and the new second difference value with the target threshold until neither the newly acquired first difference value nor the newly acquired second difference value is greater than the target threshold; and taking the data of the incremental sequence corresponding to the newly acquired first difference value as a first boundary value, wherein the target threshold is determined according to a preset coefficient and the average difference value;
starting from the last data of the difference sequence and proceeding in reverse order, acquiring two adjacent data as a third difference value and a fourth difference value; when the third difference value is greater than the target threshold or the fourth difference value is greater than the target threshold, acquiring two adjacent data in reverse order starting from the fourth difference value as a new third difference value and a new fourth difference value, and continuing to compare the new third difference value and the new fourth difference value with the target threshold until neither the newly acquired third difference value nor the newly acquired fourth difference value is greater than the target threshold; and taking the data of the incremental sequence corresponding to the newly acquired third difference value as a second boundary value;
determining the numerical interval according to the first boundary value and the second boundary value.
6. The method according to claim 1, wherein the performing anomaly identification on the target data according to the fluctuation interval and the numerical interval comprises:
determining that the target data is normal data when the target data falls within the numerical interval and a target difference value falls within the fluctuation interval, wherein the target difference value is the difference between the target data and the data at the previous time point in the same time sequence;
determining that the target data is abnormal data when the target data does not fall within the numerical interval and the target difference value does not fall within the fluctuation interval;
when the target data does not fall within the numerical interval and the target difference value falls within the fluctuation interval, determining that the target data is normal data and updating the numerical interval according to the target data.
7. An abnormal data identification apparatus, the apparatus comprising:
a first acquisition module, configured to acquire a time sequence corresponding to a target field from historical report data of a target vehicle type;
a second acquisition module, configured to acquire a fluctuation interval of the target field according to the time sequence;
a sorting module, configured to sort the data in the time sequence in ascending order of value to obtain an incremental sequence;
a third acquisition module, configured to acquire a numerical interval of the target field according to the incremental sequence;
an identification module, configured to perform anomaly identification on target data according to the fluctuation interval and the numerical interval, wherein the target data is the data corresponding to the target field in target report data.
8. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 6.
9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
10. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
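
For illustration only, the steps recited in claims 2 to 6 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the applicant's implementation. The function names, the multiplier `k` applied to the standard deviation, the default value of `coeff`, the use of the population standard deviation, the choice of which element of the incremental sequence "corresponds" to a difference value, the fallback to the minimum and maximum, and the way the numerical interval is widened are all hypothetical choices that the claims do not fix. Claim 6 also does not cover target data that lies inside the numerical interval while its difference lies outside the fluctuation interval, so the sketch reports that case as "unspecified".

```python
from statistics import mean, pstdev


def fluctuation_interval(series, k=3.0):
    """Claims 2-3: interval built from the mean and standard deviation of
    the differences between adjacent data. The factor k is an assumption."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    mu, sigma = mean(diffs), pstdev(diffs)
    return mu - k * sigma, mu + k * sigma


def value_interval(series, coeff=3.0):
    """Claims 4-5: scan the gaps of the sorted data from both ends and stop
    at the first pair of adjacent gaps that are both within the threshold."""
    inc = sorted(series)  # incremental sequence
    if len(inc) < 3:
        raise ValueError("need at least three data points")
    gaps = [b - a for a, b in zip(inc, inc[1:])]  # difference sequence
    threshold = coeff * mean(gaps)  # preset coefficient times average gap

    low, high = inc[0], inc[-1]  # fallback if no pair qualifies (assumption)
    for i in range(len(gaps) - 1):
        if gaps[i] <= threshold and gaps[i + 1] <= threshold:
            low = inc[i]  # lower element of the first gap (assumption)
            break
    for j in range(len(gaps) - 1, 0, -1):
        if gaps[j] <= threshold and gaps[j - 1] <= threshold:
            high = inc[j + 1]  # upper element of the third gap (assumption)
            break
    return low, high


def classify(value, previous, value_iv, fluct_iv):
    """Claim 6: return a label and the (possibly updated) numerical interval."""
    in_value = value_iv[0] <= value <= value_iv[1]
    in_fluct = fluct_iv[0] <= value - previous <= fluct_iv[1]
    if in_value and in_fluct:
        return "normal", value_iv
    if not in_value and not in_fluct:
        return "abnormal", value_iv
    if in_fluct:
        # Outside the numerical interval but a plausible step: treat as
        # normal and widen the interval to include it (assumption).
        return "normal", (min(value_iv[0], value), max(value_iv[1], value))
    return "unspecified", value_iv  # case not covered by claim 6
```

For example, for the data `[5, 1, 2, 3, 4, 100]` with `coeff=3`, the sorted gaps are `[1, 1, 1, 1, 95]`, the average gap is 19.8 and the threshold is 59.4, so `value_interval` returns `(1, 5)` and the isolated value 100 is left outside the numerical interval.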

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311524411.6A CN117493828A (en) 2023-11-15 2023-11-15 Abnormal data identification method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311524411.6A CN117493828A (en) 2023-11-15 2023-11-15 Abnormal data identification method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117493828A (en) 2024-02-02

Family

ID=89674241

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311524411.6A Pending CN117493828A (en) 2023-11-15 2023-11-15 Abnormal data identification method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117493828A (en)

Similar Documents

Publication Publication Date Title
CN115563477B (en) Harmonic data identification method, device, computer equipment and storage medium
CN112215655A (en) Client portrait label management method and system
CN116933035A (en) Data anomaly detection method, device, computer equipment and storage medium
CN116628138A (en) Logistics order text mining method and system applied to deep learning
CN117493828A (en) Abnormal data identification method, device, computer equipment and storage medium
CN115409070A (en) Method, device and equipment for determining critical point of discrete data sequence
US20220066988A1 (en) Hash suppression
JP2019105870A (en) Discrimination program, discrimination method and discrimination device
JP6950505B2 (en) Discrimination program, discrimination method and discrimination device
CN114741673B (en) Behavior risk detection method, clustering model construction method and device
CN116938769B (en) Flow anomaly detection method, electronic device, and computer-readable storage medium
CN117078112B (en) Energy consumption detection method and data analysis system applied to enterprise abnormal electricity management
CN117455695A (en) Resource borrowing default prediction method, device, computer equipment and storage medium
CN117522138A (en) Method, device, equipment and medium for identifying testing risk of financial business system
CN116720935A (en) Account category identification method, apparatus, computer device and storage medium
CN117933414A (en) Federated learning method for coping with Byzantine nodes
CN114185888A (en) Data fetching method and device for business report, computer equipment and storage medium
CN117743960A (en) Method and device for determining running state parameters of robot and computer equipment
CN116993358A (en) Service processing method, device, computer equipment, storage medium and program product
CN116701531A (en) Code block sharing method, device, computer equipment and storage medium
CN117852818A (en) Method, device, computer equipment and storage medium for generating item code
CN116932677A (en) Address information matching method, device, computer equipment and storage medium
CN117314556A (en) Virtual resource recommendation method, device, computer equipment and storage medium
CN117290196A (en) System fault prediction method, model training method, device and computer equipment
CN115496158A (en) Object value prediction method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination