CN112348644B - Abnormal logistics order detection method by establishing monotonic positive correlation filter screen - Google Patents


Info

Publication number
CN112348644B
Authority
CN
China
Prior art keywords
delivery
filter screen
distribution
abnormal
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011282131.5A
Other languages
Chinese (zh)
Other versions
CN112348644A (en)
Inventor
杨云丽
杨雪荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pinjian Intelligent Technology Co ltd
Original Assignee
Shanghai Pinjian Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Pinjian Intelligent Technology Co ltd
Priority to CN202011282131.5A
Publication of CN112348644A
Application granted
Publication of CN112348644B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0633 Lists, e.g. purchase orders, compilation or processing
    • G06Q30/0635 Processing of requisition or of purchase orders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping
    • G06Q10/0834 Choice of carriers
    • G06Q10/08345 Pricing

Abstract

The invention discloses an abnormal logistics order detection method based on establishing a monotonic positive correlation filter screen, comprising the following steps: acquiring a data source that includes basic goods-source data and delivery fees; performing the necessary data cleaning, including missing-value processing and format conversion; calculating the correlation between variables, identifying the independent variables positively correlated with the delivery fee, and determining the important influencing variables; binning each independent variable and dividing it into interval grades; combining the intervals of the variables, analyzing the distribution of delivery fees in each interval combination, and determining the upper and lower limits for each group, which forms a preliminary filter screen; and correcting the filter screen using the positive correlation so that the lower limit of the delivery fee does not decrease as the interval value increases. The method effectively improves the accuracy of abnormal logistics order detection while preserving interpretability and flexibility.

Description

Abnormal logistics order detection method by establishing monotonic positive correlation filter screen
Technical Field
The invention relates to the technical field of abnormal value detection, in particular to an abnormal logistics order detection method by establishing a monotonic positive correlation filter screen.
Background
Anomaly detection, also known as outlier detection, is one of the core problems of data mining. Detecting outliers in a one-dimensional sequence of observations is usually relatively easy with statistical distributions, box plots and the like, but for multi-dimensional data it is often necessary to build a complex model relating the variables in order to detect outliers.
In logistics transportation and delivery, whether the delivery fee of a freight delivery order is reasonable directly affects higher-level analysis and applications of logistics transaction orders. Outlier detection methods are generally used to detect abnormal delivery fees in delivery orders.
Common outlier detection methods, such as the 3σ criterion (a probabilistic-statistical method) and isolation forest (a machine-learning method), detect outliers from the perspective of data distribution or data density. However, within a certain range the freight rate generally increases with important influencing factors such as cargo weight and mileage, so these general-purpose methods cannot effectively detect abnormal logistics orders whose freight rate decreases as those factors increase.
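To illustrate why a global, distribution-based check can miss such orders, the following minimal Python sketch uses hypothetical fee values (not data from this patent) for two mileage grades. A long-distance order priced like a short-distance one passes a global 3σ check, but falls outside the 3σ bounds computed within its own mileage grade.

```python
from statistics import mean, pstdev

# Hypothetical historical delivery fees for two mileage grades (illustrative values only).
short_trip_fees = [190, 195, 200, 205, 210] * 4   # e.g. a short-distance grade
long_trip_fees = [980, 990, 1000, 1010, 1020] * 4  # e.g. a long-distance grade


def three_sigma_bounds(fees):
    """Return (mean - 3*std, mean + 3*std) using the population standard deviation."""
    mu, sigma = mean(fees), pstdev(fees)
    return mu - 3 * sigma, mu + 3 * sigma


# A new long-distance order whose fee looks like a short-distance one.
suspicious_fee = 250

global_lo, global_hi = three_sigma_bounds(short_trip_fees + long_trip_fees)
grade_lo, grade_hi = three_sigma_bounds(long_trip_fees)

passes_global_check = global_lo <= suspicious_fee <= global_hi  # inside roughly [-600, 1800]
passes_grade_check = grade_lo <= suspicious_fee <= grade_hi     # outside roughly [958, 1042]
print(passes_global_check, passes_grade_check)  # True False
```

The global check accepts the order because mixing both grades inflates the standard deviation; conditioning on the grade is what exposes it.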
Disclosure of Invention
The invention aims to provide an abnormal logistics order detection method based on establishing a monotonic positive correlation filter screen, in order to solve the problems described in the background.
In order to achieve the above purpose, the present invention provides the following technical solutions:
An abnormal logistics order detection method based on establishing a monotonic positive correlation filter screen comprises the following steps:
Acquire a data source that includes basic goods-source data and delivery fees.
Perform the necessary data cleaning, including missing-value processing and format conversion.
Calculate the correlation between variables, identify the independent variables positively correlated with the delivery fee, and determine the important influencing variables.
Bin the independent variables and divide each into interval grades.
Combine the intervals of the variables, analyze the distribution of delivery fees in each interval combination, and determine the upper and lower limits for each group to form a preliminary filter screen.
Correct the filter screen using the positive correlation so that the lower limit of the delivery fee does not decrease as the interval value increases and the upper limit does not increase as the interval value decreases; this yields the filter screen model.
Build a delivery fee prediction model and compare model performance to demonstrate the necessity and effectiveness of removing abnormal samples with the filter screen model; check the logistics transaction order data, tune the parameters using transaction orders from which dirty data has been removed, and optimize the detection performance of the filter screen model.
1. Acquire a data source. The main information is the basic goods-source data, such as origin, destination, required vehicle length, cargo weight, cargo volume, cargo type, vehicle type (e.g. unrestricted, standard, flatbed, high-side, box van), delivery mileage, whether expressways are used, charging mode (e.g. per trip, per ton, per cubic meter, per piece) and loading mode; auxiliary fields such as delivery mileage (the distance between origin and destination) and cargo category are derived from it. For each completed transaction, a large wide table is formed and associated with its delivery fee.
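A minimal Python sketch of assembling the wide table, assuming goods-source records and completed-transaction fees share an order id. The field names, the sample records and the `MILEAGE_KM` lookup (a hypothetical stand-in for whatever routing service derives delivery mileage) are all illustrative assumptions.

```python
# Hypothetical goods-source records, keyed by order id.
goods_sources = {
    "A1": {"origin": "Shanghai", "destination": "Suzhou", "vehicle_length_m": 4.2,
           "weight_t": 0.8, "vehicle_type": "van", "charging_mode": "per trip"},
    "A2": {"origin": "Shanghai", "destination": "Hangzhou", "vehicle_length_m": 6.8,
           "weight_t": 2.5, "vehicle_type": "flatbed", "charging_mode": "per ton"},
    "A3": {"origin": "Shanghai", "destination": "Unknown", "vehicle_length_m": 9.6,
           "weight_t": 8.0, "vehicle_type": "high-side", "charging_mode": "per trip"},
}
# Hypothetical fees of completed transactions ("A4" has no goods-source record).
fees = {"A1": 450.0, "A2": 980.0, "A3": 2100.0, "A4": 300.0}
# Hypothetical origin/destination mileage lookup (km), standing in for a routing service.
MILEAGE_KM = {("Shanghai", "Suzhou"): 100.0, ("Shanghai", "Hangzhou"): 180.0}


def build_wide_table(goods_sources, fees, mileage):
    """Join each completed transaction's fee with its goods-source fields
    and derive the delivery-mileage auxiliary field."""
    rows = []
    for order_id, fee in fees.items():
        source = goods_sources.get(order_id)
        if source is None:
            continue  # a fee without goods-source data cannot be analysed
        row = {"order_id": order_id, **source}
        # Derived field; None when the route is unknown (handled during cleaning).
        row["mileage_km"] = mileage.get((source["origin"], source["destination"]))
        row["freight"] = fee
        rows.append(row)
    return rows


wide_table = build_wide_table(goods_sources, fees, MILEAGE_KM)
```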
2. Perform the necessary data cleaning: keep only samples whose cargo weight, required vehicle length and delivery mileage are not missing; remove transactions that occurred on special holidays; apply one-to-one code mapping to discrete fields; and perform the format conversion needed for the correlation analysis. Box plots, histograms and bar charts are also drawn to inspect the distributions of the independent variables and the dependent variable. A. Independent variables with a missing proportion higher than 30% are removed; B. for independent variables with a missing proportion below 30%, missing values of continuous variables are filled with the median and those of discrete variables with the mode.
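The cleaning rules above can be sketched in Python as follows. This is a minimal sketch, assuming rows are dicts with a `freight` key and a hypothetical `is_holiday` flag; the column names and sample rows are illustrative. The text does not say what happens at exactly 30% missing, so the sketch keeps such variables.

```python
from statistics import median, mode

# Fields that must be present for a sample to be kept.
REQUIRED = ("weight_t", "vehicle_length_m", "mileage_km")
MISSING_LIMIT = 0.30  # rule A threshold


def clean(rows, continuous, discrete):
    """Return (cleaned_rows, kept_columns, code_maps)."""
    # Keep samples with the three key fields present; drop special-holiday transactions.
    rows = [r for r in rows
            if all(r.get(c) is not None for c in REQUIRED) and not r.get("is_holiday", False)]
    n = len(rows)
    kept, fill = [], {}
    for col in list(continuous) + list(discrete):
        present = [r[col] for r in rows if r.get(col) is not None]
        if n == 0 or (n - len(present)) / n > MISSING_LIMIT:
            continue  # rule A: reject the variable
        kept.append(col)
        # Rule B: median for continuous variables, mode for discrete ones.
        fill[col] = median(present) if col in continuous else mode(present)
    # One-to-one integer codes for discrete fields, in order of first appearance.
    code_maps = {col: {} for col in kept if col in discrete}
    cleaned = []
    for r in rows:
        out = {"freight": r["freight"]}
        for col in kept:
            value = r.get(col)
            if value is None:
                value = fill[col]
            if col in code_maps:
                value = code_maps[col].setdefault(value, len(code_maps[col]))
            out[col] = value
        cleaned.append(out)
    return cleaned, kept, code_maps


# Hypothetical raw rows.
sample_rows = [
    {"weight_t": 1.0, "vehicle_length_m": 4.2, "mileage_km": 100, "volume_m3": 2.0,
     "vehicle_type": "van", "loading_mode": None, "freight": 400},
    {"weight_t": 2.0, "vehicle_length_m": 6.8, "mileage_km": 200, "volume_m3": None,
     "vehicle_type": "van", "loading_mode": None, "freight": 800},
    {"weight_t": 3.0, "vehicle_length_m": 6.8, "mileage_km": 300, "volume_m3": 4.0,
     "vehicle_type": None, "loading_mode": "one-load", "freight": 1200},
    {"weight_t": 4.0, "vehicle_length_m": 9.6, "mileage_km": 400, "volume_m3": 6.0,
     "vehicle_type": "flatbed", "loading_mode": None, "freight": 1600},
    {"weight_t": None, "vehicle_length_m": 4.2, "mileage_km": 50, "freight": 300},
    {"weight_t": 1.0, "vehicle_length_m": 4.2, "mileage_km": 50, "is_holiday": True, "freight": 900},
]
cleaned, kept, code_maps = clean(
    sample_rows,
    continuous=("weight_t", "vehicle_length_m", "mileage_km", "volume_m3"),
    discrete=("vehicle_type", "loading_mode"),
)
```

Here `loading_mode` (75% missing) is rejected under rule A, while `volume_m3` and `vehicle_type` (25% missing) are filled under rule B.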
3. Calculate the correlation between variables, $\rho_{X,Y} = \operatorname{Cov}(X,Y) / (\sigma_X \sigma_Y)$, where X denotes any independent variable and Y the delivery fee (freight). Identify the three independent variables with the highest positive correlation with the delivery fee, each with a correlation of at least 0.8. If many independent variables are strongly positively correlated with the delivery fee, principal component analysis may be used to reduce dimensionality; if no independent variable meets these conditions, sub-sample sets are screened further and the same analysis is repeated until the important influencing variables can be determined. In this embodiment, the independent variables found to be highly correlated with the delivery fee are cargo weight, required vehicle length and delivery mileage.
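The selection rule can be sketched in Python as follows, assuming the correlation is the Pearson coefficient. The row layout, the `noise` column and the sample values are hypothetical; the PCA and sub-sample re-screening branches are not shown.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient Cov(X, Y) / (sigma_X * sigma_Y)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0  # constant column: treat as uncorrelated


def select_key_variables(rows, candidates, target="freight", min_corr=0.8, top_k=3):
    """Keep up to top_k candidates whose correlation with the fee is at least min_corr."""
    ys = [r[target] for r in rows]
    scores = {c: pearson([r[c] for r in rows], ys) for c in candidates}
    eligible = [c for c in candidates if scores[c] >= min_corr]
    return sorted(eligible, key=scores.get, reverse=True)[:top_k], scores


# Hypothetical cleaned rows.
rows = [
    {"mileage_km": 100, "weight_t": 1, "vehicle_length_m": 4.2, "noise": 3, "freight": 500},
    {"mileage_km": 200, "weight_t": 2, "vehicle_length_m": 4.2, "noise": 1, "freight": 900},
    {"mileage_km": 300, "weight_t": 3, "vehicle_length_m": 6.8, "noise": 4, "freight": 1400},
    {"mileage_km": 400, "weight_t": 4, "vehicle_length_m": 6.8, "noise": 1, "freight": 1800},
    {"mileage_km": 500, "weight_t": 6, "vehicle_length_m": 9.6, "noise": 5, "freight": 2500},
]
selected, scores = select_key_variables(
    rows, ["mileage_km", "weight_t", "vehicle_length_m", "noise"])
```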
4. Bin the independent variables and divide each into interval grades. Delivery mileage is divided into 7 interval grades, cargo weight into 7 and required vehicle length into 6; the cut-off values and the number of grades are determined from the business situation and the distribution of the goods-source data.
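A minimal Python sketch of the binning step. The cut-off values below are hypothetical (they only reproduce the grade counts above and the 0-100 km, 0-1 ton and 0-4.2 m first intervals used in the example of step 6), and treating each interval as right-closed is an assumption.

```python
from bisect import bisect_left

# Hypothetical cut-off values; real ones come from the business and goods-source distribution.
CUTS = {
    "mileage_km": [100, 300, 500, 800, 1200, 2000],   # 6 cuts -> 7 grades
    "weight_t": [1, 3, 5, 10, 20, 30],                # 6 cuts -> 7 grades
    "vehicle_length_m": [4.2, 6.8, 9.6, 13.0, 17.5],  # 5 cuts -> 6 grades
}
SHAPE = tuple(len(CUTS[v]) + 1 for v in ("mileage_km", "weight_t", "vehicle_length_m"))


def grade(value, cuts):
    """Right-closed intervals: value <= cuts[0] is grade 0, cuts[0] < value <= cuts[1] is grade 1, ..."""
    return bisect_left(cuts, value)


def cell_of(row):
    """(i, j, k) = (mileage grade, cargo-weight grade, vehicle-length grade) of an order."""
    return (grade(row["mileage_km"], CUTS["mileage_km"]),
            grade(row["weight_t"], CUTS["weight_t"]),
            grade(row["vehicle_length_m"], CUTS["vehicle_length_m"]))
```

With right-closed intervals, a 4.2 m vehicle falls in the 0-4.2 m grade rather than the next one.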
5. Analyze the distribution of delivery fees over each interval combination and determine the upper and lower limits for each group.
Suppose there are N logistics transaction samples in the combination of delivery-mileage grade i, cargo-weight grade j and required-vehicle-length grade k. Denote the mean of their delivery fees (freight) by $\mu_{ijk}$ and the standard deviation by $\sigma_{ijk}$. Delivery fees within this interval combination are expected to lie in $[\mu_{ijk} - 3\sigma_{ijk},\ \mu_{ijk} + 3\sigma_{ijk}]$; that is, the lower limit of the delivery fee is $\mu_{ijk} - 3\sigma_{ijk}$ and the upper limit is $\mu_{ijk} + 3\sigma_{ijk}$. Samples whose delivery fee deviates from the mean by more than three standard deviations are regarded as abnormal.
On this basis, the upper and lower delivery-fee limits are determined for every grade combination, forming a preliminary filter screen.
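The preliminary screen can be sketched in Python as a mapping from each grade cell to its 3σ interval. This is a minimal sketch: the text does not say whether the population or sample standard deviation is used, so `pstdev` (population) is an assumption, and the sample rows already carry a hypothetical `cell` key.

```python
from collections import defaultdict
from statistics import mean, pstdev


def preliminary_screen(rows, cell_of):
    """Map each (i, j, k) cell to (mu - 3*sigma, mu + 3*sigma) of its delivery fees."""
    fees_by_cell = defaultdict(list)
    for row in rows:
        fees_by_cell[cell_of(row)].append(row["freight"])
    screen = {}
    for cell, fees in fees_by_cell.items():
        mu, sigma = mean(fees), pstdev(fees)  # population std (assumed)
        screen[cell] = (mu - 3 * sigma, mu + 3 * sigma)
    return screen


# Hypothetical rows that already carry their cell key.
rows = [
    {"cell": (0, 0, 0), "freight": 100},
    {"cell": (0, 0, 0), "freight": 200},
    {"cell": (0, 0, 0), "freight": 300},
    {"cell": (1, 0, 0), "freight": 500},
]
screen = preliminary_screen(rows, lambda r: r["cell"])
```

One design consideration: with the population standard deviation, no sample can lie more than $\sqrt{N-1}$ standard deviations from its own cell mean (Samuelson's inequality), so a cell needs more than 10 samples before any of its own training samples can fall strictly outside $\mu \pm 3\sigma$.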
6. Correct the filter screen using the monotonic positive correlation.
The basic idea of the monotonic positive correlation filter screen is as follows. For logistics orders with the same delivery-mileage grade and the same cargo-weight grade, the lower limit of the delivery fee should not decrease as the required-vehicle-length grade increases. Similarly, for orders with the same delivery-mileage grade and the same required-vehicle-length grade, the lower limit should not decrease as the cargo-weight grade increases; and for orders with the same cargo-weight grade and the same required-vehicle-length grade, the lower limit should not decrease as the delivery mileage increases. Conversely, the upper limit of the delivery fee should not increase as any of these grades decreases, and in every grade combination the upper limit must be no smaller than the lower limit.
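Writing $L_{i,j,k}$ and $U_{i,j,k}$ for the lower and upper delivery-fee limits of the cell with delivery-mileage grade $i$, cargo-weight grade $j$ and required-vehicle-length grade $k$ (notation introduced here for illustration), the constraints above can be stated compactly as:

$$
\begin{aligned}
&L_{i,j,k} \le L_{i+1,j,k}, \quad L_{i,j,k} \le L_{i,j+1,k}, \quad L_{i,j,k} \le L_{i,j,k+1},\\
&U_{i,j,k} \le U_{i+1,j,k}, \quad U_{i,j,k} \le U_{i,j+1,k}, \quad U_{i,j,k} \le U_{i,j,k+1},\\
&L_{i,j,k} \le U_{i,j,k} \quad \text{for every } (i,j,k).
\end{aligned}
$$

The second line is the same statement as "the upper limit does not increase as a grade decreases".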
The filter screen formed in the previous step is corrected using the same training data. For example, for logistics orders with a cargo weight of 0-1 ton and a required vehicle length of 0-4.2 meters, the lowest (or highest) delivery fee for a delivery mileage of 0-100 km cannot be higher than the lowest (or highest) delivery fee for 100-300 km. If the lowest delivery fee for 0-100 km is higher than that for 100-300 km, the lowest delivery fee for 100-300 km is raised to the value for 0-100 km; if the highest delivery fee for 0-100 km is higher than that for 100-300 km, the highest delivery fee for 0-100 km is lowered to the value for 100-300 km.
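The correction in this example can be sketched in Python for one line of cells (two grades fixed, the third increasing). Lower limits are propagated upward as a running maximum and upper limits downward as a running minimum, matching the 0-100 km / 100-300 km rule; the fee values are hypothetical.

```python
def correct_along_axis(bounds):
    """Correct (lower, upper) pairs ordered by increasing grade of one variable.

    Lower limits: a higher grade inherits the preceding lower limit if its own is smaller.
    Upper limits: a lower grade is capped by the following upper limit if its own is larger.
    """
    lowers = [lo for lo, _ in bounds]
    uppers = [hi for _, hi in bounds]
    for g in range(1, len(lowers)):
        lowers[g] = max(lowers[g], lowers[g - 1])
    for g in range(len(uppers) - 2, -1, -1):
        uppers[g] = min(uppers[g], uppers[g + 1])
    return list(zip(lowers, uppers))


# Hypothetical bounds for 0-100 km and 100-300 km (cargo 0-1 t, vehicle 0-4.2 m).
corrected = correct_along_axis([(300, 900), (250, 800)])
print(corrected)  # [(300, 800), (300, 800)]
```

The text requires the upper limit to be no smaller than the lower limit in every cell but does not say how to resolve a cell where this correction inverts them; the sketch does not check for that case.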
The grades of the three independent variables are traversed in turn and the upper and lower delivery-fee bounds are corrected successively, which determines the lower and upper bound of the delivery fee in every grid cell. Using the learned monotonic positive correlation filter screen model, a logistics order is judged abnormal if its delivery fee falls outside the bounds of its cell. The filter screen parameters are then re-estimated on the data set with the abnormal samples removed, giving the final optimized filter screen model.
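Extending the one-line correction to the full grid, a minimal Python sketch might traverse mileage, weight and vehicle length in turn and then check orders against the corrected bounds. The dict-of-cells layout, skipping empty cells, and returning `None` for an order whose cell has no learned bounds are assumptions; the 2 x 2 x 1 grid values are hypothetical.

```python
from itertools import product


def correct_screen(screen, shape):
    """Apply the monotonic correction along mileage, weight and vehicle length in turn.

    screen: dict mapping (i, j, k) -> (lower, upper); shape: number of grades per variable.
    Cells with no samples (absent from the dict) are skipped.
    """
    screen = dict(screen)
    for axis in range(3):
        others = [a for a in range(3) if a != axis]
        for fixed in product(*(range(shape[a]) for a in others)):
            line = []
            for g in range(shape[axis]):
                cell = [0, 0, 0]
                cell[axis] = g
                for a, v in zip(others, fixed):
                    cell[a] = v
                if tuple(cell) in screen:
                    line.append(tuple(cell))
            pairs = list(zip(line, line[1:]))
            for prev, cur in pairs:  # running maximum of lower limits, upward
                lo, hi = screen[cur]
                screen[cur] = (max(lo, screen[prev][0]), hi)
            for cur, nxt in reversed(pairs):  # running minimum of upper limits, downward
                lo, hi = screen[cur]
                screen[cur] = (lo, min(hi, screen[nxt][1]))
    return screen


def is_abnormal(row, screen, cell_of):
    """True/False according to the learned bounds; None if the order's cell has no bounds."""
    cell = cell_of(row)
    if cell not in screen:
        return None
    lo, hi = screen[cell]
    return not lo <= row["freight"] <= hi


def inverted_cells(screen):
    """Cells violating lower <= upper; the text does not say how these are resolved."""
    return [cell for cell, (lo, hi) in screen.items() if lo > hi]


# Hypothetical preliminary screen on a 2 x 2 x 1 grid.
preliminary = {(0, 0, 0): (300, 900), (1, 0, 0): (250, 800),
               (0, 1, 0): (350, 1000), (1, 1, 0): (400, 1200)}
screen = correct_screen(preliminary, shape=(2, 2, 1))
```

The re-estimation step in the text then amounts to dropping the rows for which `is_abnormal` returns `True`, rebuilding the preliminary 3σ bounds from the remaining rows, and calling `correct_screen` again.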
7. Build a delivery fee prediction model and compare model performance to demonstrate the necessity and effectiveness of removing abnormal samples with the filter screen model.
The abnormal order samples detected by the monotonic positive correlation filter screen model are checked manually, the causes and plausibility of each anomaly are investigated, and the effect of the filter screen detection method is assessed preliminarily by comparing the coverage rate and the plausibility of the detected abnormal samples. The coverage rate is defined as follows:
where # denotes the count.
Using the controlled-variable method, the same data is used in two ways: 1) a regression model for delivery fee prediction is built directly; 2) a filter screen model is first built from the data and abnormal transaction samples are screened out, after which the delivery fee prediction model is built. The prediction Accuracy of the two regression models is compared to judge the necessity and effectiveness of the filter screen model.
An XGBoost model is used as the regressor of the delivery fee prediction model, and the evaluation index MAPE of the prediction performance is defined as follows:
when α ∈ [0, 0.5), samples with larger predicted values are penalized more heavily than samples with smaller predicted values;
when α = 0.5, samples with smaller and larger predicted values are penalized equally;
when α ∈ (0.5, 1), samples with smaller predicted values are penalized more heavily than samples with larger predicted values.
For sample i, denote its MAPE value by $\text{MAPE}_i$. Accuracy is defined as the proportion of samples whose $\text{MAPE}_i$ is less than a threshold β, i.e. $\text{Accuracy} = \frac{1}{N}\sum_{i=1}^{N} I(\text{MAPE}_i < \beta)$,
where I(·) is the indicator function, N is the sample size, and β is the maximum tolerable relative error, i.e. the threshold for evaluating accuracy. A smaller β means a stricter evaluation of prediction performance; β is typically 5%, 10%, 20% or 30%. The experiments show that a delivery fee prediction model built on data with abnormal transactions removed achieves higher Accuracy, which further demonstrates the necessity and effectiveness of removing abnormal samples with the filter screen model.
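The Accuracy metric can be sketched in Python as follows. The MAPE formula itself is not reproduced in this text, so `weighted_ape` below is an assumed asymmetric relative error chosen only to match the stated behaviour of α (over-predictions weighted by 2(1 - α), under-predictions by 2α, so α = 0.5 gives the plain absolute percentage error); it should not be read as the patent's exact definition. The fee values are hypothetical.

```python
def weighted_ape(y_true, y_pred, alpha=0.5):
    """ASSUMED asymmetric absolute percentage error (not the patent's exact MAPE formula).

    alpha < 0.5 penalizes over-prediction more; alpha > 0.5 penalizes under-prediction more.
    """
    error = abs(y_pred - y_true) / y_true
    weight = 2 * (1 - alpha) if y_pred > y_true else 2 * alpha
    return weight * error


def accuracy(y_true, y_pred, beta=0.10, alpha=0.5):
    """Share of samples whose per-sample error is below beta: (1/N) * sum of I(MAPE_i < beta)."""
    scores = [weighted_ape(t, p, alpha) for t, p in zip(y_true, y_pred)]
    return sum(score < beta for score in scores) / len(scores)


# Hypothetical true and predicted delivery fees.
acc = accuracy([100, 100, 100, 100], [105, 92, 130, 100], beta=0.10)
print(acc)  # 0.75
```

In the controlled-variable comparison, the same `accuracy` call would be applied to the predictions of the model trained on all data and of the model trained on filter-screened data.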
In the filter screen model for detecting abnormal transaction data, the upper and lower delivery-fee limits for each interval-combination grade are determined from the distribution of the transaction samples, consistent with the business fact that the longer the delivery mileage, the heavier the cargo and the longer the required vehicle, the higher the delivery fee. In addition, a comparison experiment in which the delivery fee prediction model is trained with and without screening out abnormal samples provides a way to evaluate the effectiveness of the monotonic positive correlation filter screen detection method, so the method constitutes a complete scheme for detecting abnormal logistics orders.
Compared with the prior art, the invention has the following beneficial effects: based on logistics order business data and combined with business experience, the method bins several important factors affecting the delivery fee, builds a monotonic positive correlation filter screen on that basis, optimizes it iteratively, and then identifies abnormal logistics orders with the learned filter screen.
Drawings
Fig. 1 is a general flow chart of the present invention.
Fig. 2 is a diagram of basic data and fields provided in an embodiment of the present invention.
Fig. 3 shows the independent-variable binning rules provided in an embodiment of the present invention.
Fig. 4 is a diagram of the scheme for determining the upper and lower delivery-fee limits for an interval combination according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of forming a monotonic positive correlation filter screen model according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified and limited, the terms "mounted", "linked" and "connected" are to be understood broadly: for example, a connection may be fixed, detachable or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.
Referring to Fig. 1, in an embodiment of the present invention, a method for detecting abnormal logistics orders by establishing a monotonic positive correlation filter screen includes the following steps:
Acquire a data source that includes basic goods-source data and delivery fees.
Perform the necessary data cleaning, including missing-value processing and format conversion.
Calculate the correlation between variables, identify the independent variables positively correlated with the delivery fee, and determine the important influencing variables.
Bin the independent variables and divide each into interval grades.
Combine the intervals of the variables, analyze the distribution of delivery fees in each interval combination, and determine the upper and lower limits for each group to form a preliminary filter screen.
Correct the filter screen using the positive correlation so that the lower limit of the delivery fee does not decrease as the interval value increases and the upper limit does not increase as the interval value decreases; this yields the filter screen model.
Build a delivery fee prediction model and compare model performance to demonstrate the necessity and effectiveness of removing abnormal samples with the filter screen model; check the logistics transaction order data, tune the parameters using transaction orders from which dirty data has been removed, and optimize the detection performance of the filter screen model.
The abnormality detection performance of the filter screen model is analyzed by comparing it with manual abnormality detection results. In addition, a delivery fee prediction model is built, and the controlled-variable method is used to compare model performance with and without filter-screen anomaly detection, further demonstrating the effectiveness and necessity of the improved outlier detection method in this embodiment.
The abnormal logistics order detection method is inspired by the positive correlation between goods-source variables (such as cargo weight, required vehicle length and delivery mileage) and the delivery fee (freight). The three independent variables are each divided into several intervals, the upper and lower delivery-fee bounds in each specific interval combination are analyzed, and these bounds are corrected using the monotonic positive correlation to form a filter screen. With a strategy of continuous screening and iteration, the accuracy of detecting abnormal samples is improved while interpretability and flexibility are preserved.
In this embodiment, detection of abnormal logistics orders relies on the following constraints:
1. Within a given range of cargo weight, required vehicle length and delivery mileage, the delivery fee is relatively stable and approximately follows a normal distribution.
2. When any two of the three variables (e.g. cargo weight and required vehicle length) are fixed, the delivery fee increases as the third variable (delivery mileage) increases.
3. Among transacted logistics orders, some delivery fees do not match the basic goods-source information because of the data collection process, human factors and the like; that is, abnormal data exists.
4. In the logistics order samples used to learn the filter screen model, there are no other strongly correlated factors besides cargo weight, required vehicle length and delivery mileage; otherwise, the samples are partitioned on those features to extract a data set on which the filter screen model can be trained.
In addition, the independent-variable segmentation scheme in the filter screen model is designed around the logistics delivery business to meet practical application requirements.
In this embodiment, historical transaction order information is used to analyze the variables positively correlated with the delivery fee, and the filter screen model is determined based on the controlled-variable method, the 3σ principle and the monotonic positive correlation. Anomaly detection and screening are then performed on the transaction data so that it is more realistic and reliable in higher-level applications.
The following describes the model in the embodiment of the present invention.
1. Acquire a data source. The main information is the basic goods-source data, such as origin, destination, required vehicle length, cargo weight, cargo volume, cargo type, vehicle type (e.g. unrestricted, standard, flatbed, high-side, box van), delivery mileage, whether expressways are used, charging mode (e.g. per trip, per ton, per cubic meter, per piece) and loading mode; auxiliary fields such as delivery mileage (the distance between origin and destination) and cargo category are derived from it. For each completed transaction, a large wide table is formed and associated with its delivery fee.
2. Perform the necessary data cleaning: keep only samples whose cargo weight, required vehicle length and delivery mileage are not missing; remove transactions that occurred on special holidays; apply one-to-one code mapping to discrete fields; and perform the format conversion needed for the correlation analysis. Box plots, histograms and bar charts are also drawn to inspect the distributions of the independent variables and the dependent variable. A. Independent variables with a missing proportion higher than 30% are removed; B. for independent variables with a missing proportion below 30%, missing values of continuous variables are filled with the median and those of discrete variables with the mode.
3. Calculate the correlation between variables, $\rho_{X,Y} = \operatorname{Cov}(X,Y) / (\sigma_X \sigma_Y)$, where X denotes any independent variable and Y the delivery fee (freight). Identify the three independent variables with the highest positive correlation with the delivery fee, each with a correlation of at least 0.8. If many independent variables are strongly positively correlated with the delivery fee, principal component analysis may be used to reduce dimensionality; if no independent variable meets these conditions, sub-sample sets are screened further and the same analysis is repeated until the important influencing variables can be determined. In this embodiment, the independent variables found to be highly correlated with the delivery fee are cargo weight, required vehicle length and delivery mileage.
4. Bin the independent variables and divide each into interval grades. As shown by the independent-variable binning rules in Fig. 3, delivery mileage is divided into 7 interval grades, cargo weight into 7 and required vehicle length into 6; the cut-off values and the number of grades are determined from the business situation and the distribution of the goods-source data.
5. Analyze the distribution of delivery fees over each interval combination and determine the upper and lower limits for each group.
In anomaly monitoring, the overall level of an indicator can be measured by its mean, and the variance gives the normal range of fluctuation. The probability that a data point lies more than three standard deviations from the overall level is small, so the occurrence of such a "small-probability event" indicates an anomaly. Fig. 4 shows the scheme for determining the upper and lower delivery-fee limits. Suppose there are N logistics transaction samples in the combination of delivery-mileage grade i, cargo-weight grade j and required-vehicle-length grade k. Denote the mean of their delivery fees (freight) by $\mu_{ijk}$ and the standard deviation by $\sigma_{ijk}$. Delivery fees within this interval combination are expected to lie in $[\mu_{ijk} - 3\sigma_{ijk},\ \mu_{ijk} + 3\sigma_{ijk}]$; that is, the lower limit of the delivery fee is $\mu_{ijk} - 3\sigma_{ijk}$ and the upper limit is $\mu_{ijk} + 3\sigma_{ijk}$. Samples whose delivery fee deviates from the mean by more than three standard deviations are regarded as abnormal.
On this basis, the upper and lower delivery-fee limits are determined for every grade combination, forming a preliminary filter screen.
6. Correct the filter screen using the monotonic positive correlation.
As shown in the figure 5 monotonically positive correlation filter model formation schematic. The basic idea of the monotonic positive correlation filter screen is that for logistics orders with the same delivery mileage grade and the same goods weight grade, the higher the required vehicle length grade is, the lower limit value of the delivery cost of the orders should not be reduced correspondingly; similarly, for a logistics order with the same delivery mileage grade and the same required vehicle length grade, the lower limit value of the delivery cost of the order with the larger cargo weight grade is correspondingly not reduced; the lower limit value of the delivery cost of the order with the same goods weight grade and the same required vehicle length grade is correspondingly not reduced when the order with the longer delivery mileage is delivered. On the contrary, the upper limit value of the delivery fee should not be increased correspondingly, and the upper limit value of the delivery fee is not smaller than the lower limit value under any combination level.
And correcting the filter screen preliminarily formed in the previous step based on the same training set data. For example, for a logistics order with a cargo weight of 0-1 ton and a required vehicle length of 0-4.2 meters, the minimum/high delivery cost for a delivery mileage of 0-100 km cannot be higher than the minimum/high delivery cost for a delivery mileage of 100-300 km. If the lowest delivery cost of the mileage between 0 and 100 km is higher than the lowest delivery cost of the mileage between 100 and 300 km, the lowest delivery cost of the mileage between 100 and 300 km is changed into the lowest delivery cost of the mileage between 0 and 100 km; if the highest delivery Fei Gao with the mileage between 0 and 100 kilometers is the highest delivery fee with the mileage between 100 and 300 kilometers, the highest delivery fee with the mileage between 0 and 100 kilometers is changed into the highest delivery fee with the mileage between 100 and 300 kilometers.
The three independent variable levels are traversed in turn and the upper and lower bounds of the delivery fee are corrected successively, which fixes the lower and upper bound of the delivery fee in each grid cell. Using the learned monotonic positive correlation filter screen model, a logistics order is judged abnormal if its delivery fee falls outside the bound interval of its grid cell. Finally, the filter screen parameters are corrected once more using the data set with the abnormal samples removed, yielding the final optimized filter screen model.
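Once corrected bounds are available, the abnormality check and the re-fitting step described above reduce to an interval test per order. The sketch below assumes orders are given as ((i, j, k), fee) pairs and that a fee equal to a bound counts as normal (the text does not say whether the interval is closed); `flag_abnormal` and `split_orders` are hypothetical names. An order whose cell has no bound raises `KeyError`, since the source does not say how such cells are handled.

```python
def flag_abnormal(orders, lower, upper):
    """One flag per order: True if its fee lies outside its cell's bounds.

    orders: list of ((i, j, k), fee) pairs.
    lower, upper: corrected bounds keyed by (i, j, k).
    Bounds are treated as inclusive (assumption).
    """
    return [not (lower[cell] <= fee <= upper[cell]) for cell, fee in orders]


def split_orders(orders, lower, upper):
    """Separate normal from abnormal orders; the normal ones are the data
    on which the filter screen would be refitted."""
    flags = flag_abnormal(orders, lower, upper)
    normal = [o for o, bad in zip(orders, flags) if not bad]
    abnormal = [o for o, bad in zip(orders, flags) if bad]
    return normal, abnormal


# Invented bounds and orders for illustration.
lower = {(0, 0, 0): 300.0, (1, 0, 0): 300.0}
upper = {(0, 0, 0): 800.0, (1, 0, 0): 800.0}
orders = [((0, 0, 0), 500.0), ((0, 0, 0), 50.0), ((1, 0, 0), 1200.0)]
print(flag_abnormal(orders, lower, upper))  # [False, True, True]
```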
7. Establish a delivery fee prediction model and compare model performance to demonstrate the necessity and effectiveness of removing abnormal samples with the filter screen model.
The abnormal order samples detected by the monotonic positive correlation filter screen model are checked manually to find and analyze the causes of each anomaly and whether it is reasonable. The effect of the filter screen detection method is then preliminarily assessed by comparing the coverage rate of the abnormal order samples and their reasonableness, where the coverage rate is defined as follows:
where # denotes the count.
Following the controlled-variable method, the same data are used in two ways: 1) a regression model for delivery fee prediction is built directly; 2) a filter screen model is first built from the data and used to screen out abnormal transaction samples, and the delivery fee prediction model is built afterwards. The prediction Accuracy of the two regression models is compared to judge the necessity and effectiveness of the filter screen model.
An XGBoost model is used as the regressor of the delivery fee prediction model, and the evaluation index MAPE of the prediction effect is defined as follows:
when α ∈ [0, 0.5), samples with larger predicted values are penalized more heavily than samples with smaller predicted values;
when α = 0.5, samples with smaller predicted values and samples with larger predicted values are penalized equally;
when α ∈ (0.5, 1), samples with smaller predicted values are penalized more heavily than samples with larger predicted values.
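The MAPE formula itself does not survive in this text. Purely as an illustration, the sketch below assumes one asymmetric per-sample error that reproduces the three α cases above: over-prediction is weighted by (1 - α), under-prediction by α, and a factor of 2 makes α = 0.5 equal to the ordinary absolute percentage error. This weighting and the name `asymmetric_ape` are hypothetical, not the patent's definition.

```python
def asymmetric_ape(y_true, y_pred, alpha):
    """Hypothetical asymmetric absolute percentage error for one sample.

    Over-prediction (y_pred > y_true) is weighted by (1 - alpha) and
    under-prediction by alpha. The factor 2 makes alpha = 0.5 reduce to the
    ordinary |y_pred - y_true| / y_true. This form is an assumption.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if y_true <= 0:
        raise ValueError("y_true must be positive for a percentage error")
    over = max(y_pred - y_true, 0.0)
    under = max(y_true - y_pred, 0.0)
    return 2.0 * ((1.0 - alpha) * over + alpha * under) / y_true


# With alpha < 0.5, a 10% over-prediction costs more than a 10% under-prediction.
print(asymmetric_ape(100.0, 110.0, 0.2))
print(asymmetric_ape(100.0, 90.0, 0.2))
```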
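For sample i, denote its MAPE value by MAPE_i. Accuracy is defined as the proportion of samples whose MAPE_i is less than a threshold β, i.e.
$$\mathrm{Accuracy} = \frac{1}{N}\sum_{i=1}^{N} I\left(\mathrm{MAPE}_i < \beta\right)$$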
where I(·) is the indicator function, N is the sample size, and β is the maximum tolerable relative error, i.e. the threshold used to evaluate accuracy. A smaller β means a stricter evaluation of the prediction effect; β is typically set to 5%, 10%, 20% or 30%. Experiments show that a delivery fee prediction model built on data from which abnormal transactions have been removed achieves a higher Accuracy, which further demonstrates the necessity and effectiveness of removing abnormal samples with the filter screen model.
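The Accuracy metric is a simple proportion of per-sample errors below β. A minimal Python sketch, assuming the per-sample MAPE_i values have already been computed with whatever MAPE definition is in use; `accuracy` is a hypothetical helper name.

```python
def accuracy(mape_values, beta):
    """Proportion of samples whose MAPE_i is strictly below the tolerance beta."""
    values = list(mape_values)
    if not values:
        raise ValueError("at least one sample is required")
    # Sum of the indicator I(MAPE_i < beta), divided by the sample size N.
    return sum(1 for m in values if m < beta) / len(values)


# Invented per-sample errors, evaluated at beta = 10%.
print(accuracy([0.02, 0.08, 0.15, 0.40], 0.10))  # 0.5
```

In the comparison described above, the same function and the same β would be applied to the test-set MAPE_i of the model trained on the raw data and of the model trained on the screened data.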
In the filter screen model for detecting abnormal transaction data, the upper and lower limits of the delivery fee for each combination of interval levels are determined from the distribution of the transaction samples, and they agree with the business fact that the longer the delivery mileage, the heavier the cargo and the longer the required vehicle, the higher the delivery fee tends to be. In addition, the controlled comparison between delivery fee prediction models built with and without screening out abnormal samples provides an evaluation method for demonstrating the effectiveness of the monotonic positive correlation filter screen detection method, so the method can be regarded as a complete abnormal logistics order detection scheme.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. This manner of description is adopted only for clarity; those skilled in the art should regard the specification as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that will be apparent to those skilled in the art.

Claims (7)

1. An abnormal logistics order detection method by establishing a monotonic positive correlation filter screen is characterized by comprising the following steps:
acquiring a data source, wherein the data source comprises basic cargo-source data and delivery fees;
performing necessary data cleaning, including missing value processing and necessary format conversion;
calculating the correlation among the variables, identifying the independent variables positively correlated with the delivery fee, and determining the important influencing variables;
binning each independent variable and assigning a grade to each interval;
analyzing the distribution of the delivery fee within each interval combination and determining the upper and lower bound values of each group based on the 3σ criterion of probability statistics, thereby forming a preliminary filter screen;
correcting the filter screen using positive correlation, ensuring that the lower limit of the delivery fee does not decrease as the interval value increases and that the upper limit of the delivery fee does not increase as the interval value decreases, thereby forming a filter screen model;
establishing a delivery fee prediction model and comparing model performance to demonstrate the necessity and effectiveness of removing abnormal samples with the filter screen model, which comprises checking the logistics transaction order data, tuning parameters on the transaction orders after dirty data have been screened out, and optimizing the detection effect of the filter screen model;
the step of establishing the delivery fee prediction model comprises: manually checking the abnormal order samples detected by the monotonic positive correlation filter screen model, finding and analyzing the causes of each anomaly and whether it is reasonable, and preliminarily assessing the effect of the filter screen detection method by comparing the coverage rate of the abnormal order samples and their reasonableness, wherein the coverage rate is defined as follows:
following the controlled-variable method, the same data are used in two ways: 1) directly building a regression model for delivery fee prediction; 2) first building a filter screen model from the data and screening out abnormal transaction samples, then building the delivery fee prediction model; the prediction Accuracy of the two regression models is compared to judge the necessity and effectiveness of the filter screen model;
using an XGBoost model as the regressor of the delivery fee prediction model, and defining the evaluation index MAPE of the prediction effect:
when α ∈ [0, 0.5), samples with larger predicted values are penalized more heavily than samples with smaller predicted values;
when α = 0.5, samples with smaller predicted values and samples with larger predicted values are penalized equally;
when α ∈ (0.5, 1), samples with smaller predicted values are penalized more heavily than samples with larger predicted values;
for the i-th sample, the MAPE value is MAPE_i, and Accuracy is defined as the proportion of samples whose MAPE_i is less than a threshold β, namely
$$\mathrm{Accuracy} = \frac{1}{N}\sum_{i=1}^{N} I\left(\mathrm{MAPE}_i < \beta\right)$$
wherein I(·) is the indicator function, N is the sample size, and β is the maximum tolerable relative error, i.e. the threshold used to evaluate accuracy; a smaller β means a stricter evaluation of the prediction effect, and β is typically set to 5%, 10%, 20% or 30%; experiments show that a delivery fee prediction model built on data from which abnormal transactions have been removed achieves a higher Accuracy, which further demonstrates the necessity and effectiveness of removing abnormal samples with the filter screen model.
2. The abnormal logistics order detection method by establishing a monotonic positive correlation filter screen according to claim 1, wherein in the step of acquiring the data source, the basic information of the data source comprises the main information of origin, destination, required vehicle length, cargo weight, cargo volume, cargo type, vehicle type, delivery mileage, whether an expressway is used, charging mode and loading mode; this information is used to derive auxiliary fields such as delivery mileage and cargo major category, and a wide table associated with the delivery fee is formed for each completed transaction.
3. The abnormal logistics order detection method by establishing a monotonic positive correlation filter screen according to claim 2, wherein the step of performing the necessary data cleaning comprises: keeping only samples in which cargo weight, required vehicle length and delivery mileage are all non-missing; removing transaction samples that occurred on special holidays; applying a one-to-one coding mapping to discrete fields and performing the data format conversion needed for the correlation analysis; and drawing box plots, histograms and bar charts to inspect the distribution of each independent variable and of the dependent variable; wherein fields that still contain missing values are handled as follows: A. independent variables whose missing proportion is higher than 30% are removed; B. for independent variables whose missing proportion is below 30%, continuous variables are filled with the median and discrete variables with the mode.
4. The abnormal logistics order detection method by establishing a monotonic positive correlation filter screen according to claim 3, wherein the step of calculating the correlation between variables uses the formula:
wherein X represents any independent variable and Y represents the delivery fee freight; the three independent variables with the highest positive correlation with the delivery fee freight, each with a correlation no lower than 0.8, are identified; if more independent variables show a strong positive correlation with the delivery fee, principal component analysis may be considered for dimensionality reduction; if no independent variable meets the condition, similar analysis continues on sub-sample sets until the important influencing variables can be determined; the independent variables found to be highly correlated with the dependent variable, the delivery fee freight, are cargo weight, required vehicle length and delivery mileage.
5. The abnormal logistics order detection method by establishing a monotonic positive correlation filter screen according to claim 4, wherein in the binning step each independent variable is divided into grades: the delivery mileage into 7 grades, the cargo weight into 7 grades and the required vehicle length into 6 grades, and the cut-off values and the number of grades are determined according to the business and the distribution of cargo sources.
6. The abnormal logistics order detection method by establishing a monotonic positive correlation filter screen according to claim 5, characterized in that the step of analyzing the distribution of the delivery fee within each interval combination and determining the upper and lower limit values of each group comprises: assuming that there are N logistics transaction samples in delivery mileage grade i, cargo weight grade j and required vehicle length grade k, the mean of their delivery fee freight is denoted $\overline{freight}_{ijk}$ and its standard deviation $\sigma_{ijk}$; the delivery fee freight within this interval combination should lie in the interval $[\overline{freight}_{ijk} - 3\sigma_{ijk},\ \overline{freight}_{ijk} + 3\sigma_{ijk}]$, i.e. the lower limit of the delivery fee is $\overline{freight}_{ijk} - 3\sigma_{ijk}$ and the upper limit is $\overline{freight}_{ijk} + 3\sigma_{ijk}$; samples whose delivery fee deviates from the mean by more than three standard deviations are regarded as abnormal samples; on this basis, the upper and lower limits of the delivery fee under each combination level are determined, forming a preliminary filter screen.
7. The abnormal logistics order detection method by establishing a monotonic positive correlation filter screen according to claim 6, wherein the step of correcting the filter screen using positive correlation comprises: for logistics orders with the same delivery mileage grade and the same required vehicle length grade, the lower limit of the delivery fee does not decrease as the cargo weight grade increases; for logistics orders with the same cargo weight grade and the same required vehicle length grade, the lower limit of the delivery fee does not decrease as the delivery mileage grade increases; conversely, the upper limit of the delivery fee does not increase as the grade decreases, and under every combination level the upper limit of the delivery fee is kept no smaller than the lower limit;
the filter screen preliminarily formed in the previous step is corrected based on the same training set data; the three independent variable levels are traversed in turn and the upper and lower bounds of the delivery fee are corrected successively, thereby determining the lower and upper bound values of the delivery fee in each grid cell; according to the learned monotonic positive correlation filter screen model, whether the delivery fee of a logistics order lies within the bound interval of the model is judged, so as to judge whether the logistics order is abnormal; and the filter screen parameters are corrected again using the data set with the abnormal samples removed, forming the final optimized filter screen model.
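The missing-value rules of claim 3 (drop variables missing in more than 30% of samples, otherwise fill with the median or the mode) can be sketched in plain Python. This sketch assumes records are dictionaries with None marking a missing value, that the caller states which variables are continuous and which are discrete, and that a missing share of exactly 30% is filled rather than dropped (the claim does not cover that case); `handle_missing` is a hypothetical name.

```python
from statistics import median, mode


def handle_missing(rows, continuous, discrete, max_share=0.30):
    """Drop or fill independent variables that still have missing values.

    rows: list of dicts, with None marking a missing value.
    continuous / discrete: names of the independent variables of each kind.
    A variable missing in more than max_share of the rows is dropped;
    otherwise gaps are filled with the median (continuous) or mode (discrete).
    """
    n = len(rows)
    fills, dropped = {}, set()
    for col in [*continuous, *discrete]:
        present = [r[col] for r in rows if r.get(col) is not None]
        # '> max_share' treats exactly 30% as fillable (assumption).
        if not present or (n - len(present)) / n > max_share:
            dropped.add(col)
        elif col in continuous:
            fills[col] = median(present)
        else:
            fills[col] = mode(present)
    cleaned = []
    for r in rows:
        out = {k: v for k, v in r.items() if k not in dropped}
        for col, value in fills.items():
            if out.get(col) is None:
                out[col] = value
        cleaned.append(out)
    return cleaned


rows = [
    {"weight": 1.0, "cargo_type": "A", "volume": None},
    {"weight": None, "cargo_type": "A", "volume": None},
    {"weight": 3.0, "cargo_type": None, "volume": 2.0},
    {"weight": 5.0, "cargo_type": "B", "volume": None},
]
# 'volume' is dropped (75% missing); the weight gap gets the median 3.0 and
# the cargo_type gap the mode 'A'.
for r in handle_missing(rows, continuous=["weight", "volume"], discrete=["cargo_type"]):
    print(r)
```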
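The correlation formula in claim 4 did not survive extraction. Assuming it is the Pearson correlation coefficient (an assumption; the claim only names X and Y), selecting at most three variables whose positive correlation with the delivery fee is at least 0.8 could look like this. `pearson` and `select_key_variables` are hypothetical helpers, constant columns are given a correlation of 0 so they are never selected, and the PCA and sub-sample branches of the claim are not shown.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed to be the claim's formula).

    Returns 0.0 if either series is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_key_variables(columns, freight, threshold=0.8, top=3):
    """Return up to `top` variables whose correlation with freight is >= threshold,
    strongest first, together with all computed scores."""
    scores = {name: pearson(values, freight) for name, values in columns.items()}
    passing = [name for name, r in scores.items() if r >= threshold]
    passing.sort(key=lambda name: scores[name], reverse=True)
    return passing[:top], scores


# Invented toy data.
selected, scores = select_key_variables(
    {"mileage": [1, 2, 3, 4], "weight": [2, 4, 6, 8.5], "noise": [3, 1, 4, 1]},
    freight=[10, 20, 30, 40],
)
print(selected)  # ['mileage', 'weight']
```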
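The preliminary screen of claim 6 can be sketched by grouping delivery fees per grade cell (i, j, k) and taking the mean plus or minus three standard deviations. The sketch assumes the population standard deviation (`statistics.pstdev`), since the claim does not say whether the population or the sample estimator is meant; `preliminary_screen` is a hypothetical name. Note that a cell with a single sample gets a zero-width interval under this choice.

```python
from collections import defaultdict
from statistics import mean, pstdev


def preliminary_screen(orders):
    """Per-cell 3-sigma bounds on the delivery fee.

    orders: iterable of ((i, j, k), fee) pairs, where i, j, k are the
    delivery-mileage, cargo-weight and vehicle-length grades.
    Returns (lower, upper) dicts keyed by (i, j, k).
    """
    fees_by_cell = defaultdict(list)
    for cell, fee in orders:
        fees_by_cell[cell].append(fee)
    lower, upper = {}, {}
    for cell, fees in fees_by_cell.items():
        mu, sigma = mean(fees), pstdev(fees)  # population std (assumption)
        lower[cell] = mu - 3 * sigma
        upper[cell] = mu + 3 * sigma
    return lower, upper


# Invented fees: two orders in cell (0, 0, 0), one in cell (1, 0, 0).
lower, upper = preliminary_screen([((0, 0, 0), 90.0), ((0, 0, 0), 110.0), ((1, 0, 0), 200.0)])
print(lower[(0, 0, 0)], upper[(0, 0, 0)])  # 70.0 130.0
```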
CN202011282131.5A 2020-11-16 2020-11-16 Abnormal logistics order detection method by establishing monotonic positive correlation filter screen Active CN112348644B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011282131.5A CN112348644B (en) 2020-11-16 2020-11-16 Abnormal logistics order detection method by establishing monotonic positive correlation filter screen

Publications (2)

Publication Number Publication Date
CN112348644A 2021-02-09
CN112348644B 2024-04-02

Family

ID=74362893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011282131.5A Active CN112348644B (en) 2020-11-16 2020-11-16 Abnormal logistics order detection method by establishing monotonic positive correlation filter screen

Country Status (1)

Country Link
CN (1) CN112348644B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004169989A (en) * 2002-11-20 2004-06-17 Daikin Ind Ltd Abnormality diagnosis system
CN106368813A (en) * 2016-08-30 2017-02-01 北京协同创新智能电网技术有限公司 Abnormal alarm data detection method based on multivariate time series
CN109031374A (en) * 2018-08-06 2018-12-18 北京理工大学 Difference pseudo-range corrections abnormal signal monitoring method suitable for continuous operation of the reference station
CN109086324A (en) * 2018-07-04 2018-12-25 中国科学院地理科学与资源研究所 A kind of Oil/gas Geochemical Anomalies extracting method for dividing shape based on S-A
CN109784668A (en) * 2018-12-21 2019-05-21 国网江苏省电力有限公司南京供电分公司 A kind of sample characteristics dimension-reduction treatment method for electric power monitoring system unusual checking
CN111339297A (en) * 2020-02-21 2020-06-26 广州天懋信息系统股份有限公司 Network asset anomaly detection method, system, medium, and device
CN111507374A (en) * 2020-02-13 2020-08-07 华北电力大学 Power grid mass data anomaly detection method based on random matrix theory
JP2020142353A (en) * 2019-03-08 2020-09-10 ファナック株式会社 Abnormality detection device and abnormality detection method for joint of robot
CN112347230A (en) * 2020-11-16 2021-02-09 上海品见智能科技有限公司 Enterprise public opinion data analysis method based on Word2Vec
CN113807762A (en) * 2021-02-09 2021-12-17 北京京东振世信息技术有限公司 Method and system for assisting logistics abnormity decision
CN116776271A (en) * 2023-06-30 2023-09-19 闽江学院 Polluted time sequence unsupervised anomaly detection method based on negative correlation

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3922375B2 (en) * 2004-01-30 2007-05-30 インターナショナル・ビジネス・マシーンズ・コーポレーション Anomaly detection system and method
WO2020090770A1 (en) * 2018-10-30 2020-05-07 国立研究開発法人宇宙航空研究開発機構 Abnormality detection device, abnormality detection method, and program
JP7466392B2 (en) * 2020-07-16 2024-04-12 コベルコ・コンプレッサ株式会社 Refueling equipment and method for detecting abnormalities therein

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Empirical study on the outliers of compressed nagural gas (CNG) refueling behaviors;Kim, YH;《PROCEEDINGS OF THE 2016 5TH INTERNATIONAL CONFERENCE ON CIVIL, ARCHITECTURAL AND HYDRAULIC ENGINEERING (ICCAHE 2016)》;20170125;第95卷;276-280 *
Intelligent Identification of Electricity Stealing Based on the Correlation of Line Loss;Gaojun Xu;2022 7th Asia Conference on Power and Electrical Engneering;20220601;1-6 *
基于向量升维的农情异常数据实时检测方法;赵 刚;安徽农业大学学报;20211231;第48卷(第2期);304-311 *
基于数据相关性的异常检测算法研究;赵曼;中国优秀硕士学位论文全文数据库信息科技辑;20170615(第06期);I138-935 *
小儿神经疾病患者的脑电图与单胺神经递质代谢;朱凯云, 黄焰, 吕冰清;脑与神经疾病杂志;19970210(第01期);全文 *
超声检测NT值异常与胎儿心脏畸形及染色体异常之间的相关性探讨;王;;影像研究与医学应用;20171101(第15期);全文 *

Also Published As

Publication number Publication date
CN112348644A (en) 2021-02-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant