CN110363483A - Sample expansion and verification method based on shared-platform freight trip data - Google Patents

Sample expansion and verification method based on shared-platform freight trip data

Publication number
CN110363483A
Authority
CN
China
Prior art keywords
sample
data
cargo
province
platform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910662056.6A
Other languages
Chinese (zh)
Other versions
CN110363483B (en)
Inventor
甘蜜
李丹丹
张发东
谢荣惠
李新媛
黄青蓝
刘晓波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Jiaotong University
Original Assignee
Southwest Jiaotong University
Application filed by Southwest Jiaotong University
Priority to CN201910662056.6A
Publication of CN110363483A
Application granted
Publication of CN110363483B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping

Abstract

An embodiment of the present invention provides a sample expansion and verification method based on shared-platform freight trip data, relating to the technical field of freight traffic management statistics. The method uses shared-platform freight trip data, extracts the freight data in a targeted manner, and cleans, corrects and completes the raw data. On this basis it carries out the sample expansion for the freight survey and cross-checks the expanded data against macroscopic quantities to determine the expansion error. This keeps the deviation between the expansion result and the actual freight survey as small as possible, so that the expansion result is scientifically sound and accurately presents freight trip characteristics.

Description

Sample expansion and verification method based on shared-platform freight trip data
Technical field
The present invention relates to the technical field of freight traffic management statistics, and in particular to a sample expansion and verification method based on shared-platform freight trip data.
Background art
In the prior art, sample expansion and verification methods exist for resident trip surveys and for vehicle ownership distribution. The data involved include household survey data (such as family information, personal information and personal trip information) and outdoor survey data (such as vehicle flow and occupancy surveys, bus passenger-flow surveys and rail passenger-flow surveys), most of which must be collected manually.
There is currently no sample expansion and verification method for freight trip data. Because the data involved in freight traffic management differ greatly from those of resident trip surveys and vehicle ownership distribution, the methods used in those two areas are difficult to apply, and a completely new sample expansion and verification method needs to be designed.
Summary of the invention
An embodiment of the present invention provides a sample expansion and verification method based on shared-platform freight trip data that can alleviate the above problem. It expands and analyzes the nationwide distribution of freight volume and freight turnover, and checks and corrects the result using macro data such as provincial and municipal production and sales volumes, the proportion of each cargo class transported in each province, and provincial vehicle ownership.
To alleviate the above problem, the technical solution adopted by the embodiment of the present invention is as follows:
The sample expansion and verification method based on shared-platform freight trip data provided by the embodiment of the present invention comprises:
S1. Taking the shared-platform freight trip data as the target sample, sampling is carried out stage by stage in three successive stages to obtain the platform sample, and the platform sample size is determined;
S2. Data preprocessing: the platform sample is classified, including truck classification and cargo classification;
S3. Difference analysis and correction: missing data in the platform sample are completed, and the full expanded data for the missing-data part are output;
S4. On the basis of the data preprocessing and difference analysis, an expansion coefficient is determined for each OD pair between provinces, and the platform sample is expanded according to that coefficient;
S5. Verification of the expanded data, including:
S51. The expanded freight volume of each province and each cargo class is checked against the macro data on each province's annual production and sales volume by cargo class, and error analysis is performed, where j denotes the j-th cargo type, j = 1, 2, ..., 17; the production and sales macro data come from cargo production and sales statistical agencies, and the expanded freight volume data are the data of the platform sample after expansion;
S52. The expanded truck types of each province are checked against the macro data on each province's annual vehicle ownership, and error analysis is performed, where l denotes the l-th truck type, l = 1, 2, 3, 4; the vehicle ownership macro data come from vehicle ownership statistical agencies, and the expanded freight vehicle data are the data of the platform sample after expansion;
S53. The fuel types are checked against each province's market share data by fuel type, and error analysis is performed, where y denotes the y-th fuel type, y = 1, 2, 3; the macro data on market share by fuel type come from fuel-type market share statistical agencies, and the expanded freight vehicle data by fuel type are the data of the platform sample after expansion;
S54. Drawing on successful experience from sample-expansion verification in comprehensive traffic surveys, when the error is within the acceptable range, i.e. the mean error is below 10%, the complete expanded data are output.
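The error formulas for steps S51 to S53 are not reproduced in this text. A minimal sketch, assuming a relative-error form; the symbols for the expanded value, the macro value and the error are illustrative and are not the patent's own notation:

```latex
\[
e_j = \frac{\lvert Q_j^{\mathrm{exp}} - Q_j^{\mathrm{macro}} \rvert}{Q_j^{\mathrm{macro}}},
\qquad
\bar{e} = \frac{1}{J} \sum_{j=1}^{J} e_j < 10\%
\]
```

Under this assumption the same form would apply with index l over the four truck types (S52) and index y over the three fuel types (S53).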
In the embodiment of the present invention, shared-platform freight trip data are used, the freight data are extracted in a targeted manner, and the raw data are cleaned, corrected and completed. On this basis, the sample expansion for the freight survey is carried out and the expanded data are cross-checked against macroscopic quantities to determine the expansion error. This keeps the deviation between the expansion result and the actual freight survey as small as possible, so that the expansion result is scientifically sound and accurately presents freight trip characteristics.
Optionally, step S1 specifically comprises:
S11. First-stage sampling: using stratified sampling with stratification by time and by region, stratified sampling is performed on the shared-platform freight trip data as the target sample to obtain the first-stage sample, and the first-stage sample size $n_1$ is calculated;
S12. Second-stage sampling: using equal-proportion sampling by cargo type, a number of units are drawn directly from the first-stage sample according to the equal-proportion principle to form the second-stage sample, and the second-stage sample size $n_2$ is calculated;
S13. Third-stage sampling: using random sampling, a number of samples are drawn directly from the second-stage sample at random as the third-stage sample; the third-stage sample is the platform sample, and the third-stage sample size $n_3$ is calculated and taken as the platform sample size.
In the embodiment of the present invention, the three-stage approach of stratified sampling, equal-proportion sampling and random sampling in succession matches the large number of respondents and wide coverage of freight surveys between city OD pairs, balancing scientific rigor and operability.
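The three stages can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the record fields `month`, `province` and `cargo`, the function name and the sizes used in the example are all hypothetical, and the stage sizes are passed in rather than computed.

```python
import random
from collections import defaultdict

def three_stage_sample(records, n1, frac2, n3, seed=0):
    """Draw a platform sample in three stages from a list of dict records."""
    rng = random.Random(seed)

    # Stage 1: stratify by (month, province); allocate n1 proportionally to stratum size.
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["month"], rec["province"])].append(rec)
    total = len(records)
    stage1 = []
    for key in sorted(strata):
        members = strata[key]
        m_i = min(len(members), round(n1 * len(members) / total))
        stage1.extend(rng.sample(members, m_i))

    # Stage 2: draw the same fraction from every cargo type.
    by_cargo = defaultdict(list)
    for rec in stage1:
        by_cargo[rec["cargo"]].append(rec)
    stage2 = []
    for cargo in sorted(by_cargo):
        members = by_cargo[cargo]
        stage2.extend(rng.sample(members, round(frac2 * len(members))))

    # Stage 3: simple random sample from the second-stage sample.
    return rng.sample(stage2, min(n3, len(stage2)))

# Hypothetical data: 2 months x 2 provinces x 2 cargo types x 50 trips = 400 records.
records = [
    {"month": m, "province": p, "cargo": c}
    for m in (1, 2)
    for p in ("Sichuan", "Yunnan")
    for c in ("Grain", "Steel")
    for _ in range(50)
]
sample = three_stage_sample(records, n1=200, frac2=0.5, n3=40)
print(len(sample))
```

In practice `n1`, the second-stage fraction and `n3` would come from the sample-size calculations described for each stage.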
Optionally, the first-stage sample size $n_1$ in step S11 is calculated according to formula (1-1) or formula (1-2).
In the formulas, N is the total sample size over a period of time in the shared-platform freight trip data, t is the probability degree $Z_{\alpha/2}$, and $\Delta$ is the limit error; the remaining two quantities are the average within-stratum variance and the average within-stratum variance of the proportion.
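The two first-stage formulas referred to above are not reproduced in this text. A plausible reconstruction, assuming the usual finite-population sample-size forms and the same structure as formula (1-4), is given below; the symbols chosen for the two average within-stratum variances are assumptions.

```latex
\[
n_1 = \frac{N t^2 \bar{\sigma}^2}{N \Delta^2 + t^2 \bar{\sigma}^2}
\tag{1-1}
\]
\[
n_1 = \frac{N t^2\, \overline{P(1-P)}}{N \Delta^2 + t^2\, \overline{P(1-P)}}
\tag{1-2}
\]
```

Under this reading, (1-1) would be used when the target quantity is a mean and (1-2) when it is a proportion.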
Optionally, in the stratified sampling, the number of units to be drawn from each stratum is allocated by the equal-proportion method, calculated as:
$$m_i = n_1 \frac{N_i}{N} \tag{1-3}$$
where $m_i$ is the number of samples to be drawn from the i-th stratum and $N_i$ is the total number of samples in the i-th stratum.
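Formula (1-3) can be illustrated with a short calculation. The stratum names and sizes below are hypothetical, and rounding the results to whole units is left to the caller.

```python
def proportional_allocation(n1, stratum_sizes):
    """Allocate a first-stage sample of size n1 across strata: m_i = n1 * N_i / N."""
    total = sum(stratum_sizes.values())
    return {name: n1 * size / total for name, size in stratum_sizes.items()}

# Hypothetical stratum sizes N_i (number of platform records per stratum).
stratum_sizes = {"Sichuan": 50000, "Yunnan": 30000, "Shanghai": 20000}
allocation = proportional_allocation(1000, stratum_sizes)
print(allocation)
```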
Optionally, the second-stage sample size $n_2$ in step S12 is calculated according to formula (1-4):
$$n_2 = \frac{n_1 t^2 P(1-P)}{n_1 \Delta^2 + t^2 P(1-P)} \tag{1-4}$$
where $P(1-P)$ is the variance of the proportion.
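A small numerical sketch of formula (1-4) follows. It assumes the flattened expression is a single fraction whose denominator is the sum of the two terms after the slash; the input values are hypothetical.

```python
import math

def second_stage_size(n1, t, p, delta):
    """n2 = n1 * t^2 * P(1-P) / (n1 * delta^2 + t^2 * P(1-P))."""
    variance = p * (1 - p)  # variance of the proportion
    return n1 * t**2 * variance / (n1 * delta**2 + t**2 * variance)

# Hypothetical inputs: n1 = 10000, t = 1.96 (95% confidence), P = 0.5, limit error 5%.
n2 = second_stage_size(n1=10000, t=1.96, p=0.5, delta=0.05)
print(math.ceil(n2))
```

Using P = 0.5 maximizes P(1-P), so it is the conservative choice when the true proportion is unknown.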
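Optionally, the third-stage sample size $n_3$ in step S13 is calculated on the basis of interval estimation theory: given a precision requirement on the estimator specified in advance, the required sample size is derived by working backwards. There are two ways of doing so.
First, determining the sample size from absolute precision:
Assume an absolute precision $\lambda$ is given, i.e. the absolute deviation of the estimate from the true value is required not to exceed $\lambda$ at confidence level $1-\alpha$.
Comparing this requirement with the interval estimate gives the required sample size, where $u_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the $N(0,1)$ distribution, the mean-square deviation of the estimate enters through the interval estimate, and $S^2$ is the population variance;
Second, determining the sample size from relative precision:
Assume a relative precision $\varepsilon$ is given, i.e. the relative deviation of the estimate from the true value is required not to exceed $\varepsilon$ at confidence level $1-\alpha$.
Comparing this requirement with the interval estimate gives the required sample size.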
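The resulting sample-size expressions are not reproduced in this text. As a hedged sketch, the standard simple-random-sampling results are assumed here: n0 = (u·S/λ)² for absolute precision, the same with λ = ε·mean for relative precision, and an optional finite-population correction n = n0 / (1 + n0/N). The function names and input values are hypothetical.

```python
import math
from statistics import NormalDist

def size_by_absolute_precision(s2, lam, alpha=0.05, population=None):
    """Sample size so that |estimate - true mean| <= lam with confidence 1 - alpha."""
    u = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2) quantile of N(0, 1)
    n0 = u**2 * s2 / lam**2
    if population is not None:
        n0 = n0 / (1 + n0 / population)  # finite-population correction
    return math.ceil(n0)

def size_by_relative_precision(s2, mean, eps, alpha=0.05, population=None):
    """Relative precision eps is equivalent to absolute precision eps * mean."""
    return size_by_absolute_precision(s2, eps * mean, alpha, population)

# Hypothetical inputs: population variance 400, precision of 2 units, 95% confidence.
print(size_by_absolute_precision(400, 2))
print(size_by_absolute_precision(400, 2, population=10000))
print(size_by_relative_precision(400, mean=20, eps=0.10))
```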
Optionally, step S3 specifically comprises:
S31. The preprocessed platform sample is analyzed by month and by cargo type at the province-level OD, to identify the cargo types missing between each pair of provinces;
S32. Using each origin province's annual cargo production and sales data and each province's monthly road-freight share by cargo type, it is determined whether the cargo types partly missing in each province in each month are abnormal;
S33. For each abnormal cargo type, the missing cargo is supplemented in the provincial OD: the freight volume of the missing cargo type is decomposed by OD according to the ratio of the road OD freight volume of the province concerned to that province's total freight volume, generating a data list containing origin province, destination province, cargo type and total freight volume;
S34. Using the proportions of vehicle model, truck curb weight and truck load observed for that abnormal cargo type in the platform sample, the total freight volume is decomposed further, generating a data list containing origin province, destination province, cargo type, total freight volume, vehicle model, truck curb weight and truck load;
S35. The city-level OD freight volume is determined using annual inter-city road freight totals, expressway traffic flows and the like, and the total freight volume is decomposed at the city OD level, generating a data list containing origin province, origin city, destination province, destination city, cargo type, total freight volume, vehicle model, truck curb weight and truck load;
S36. Using the shares by vehicle model and by fuel type in the platform sample, the freight volume is decomposed by vehicle model and fuel type, generating a data list containing origin province, origin city, destination province, destination city, cargo type, total freight volume, vehicle model and fuel type;
S37. According to each province's annual average loading rate for each cargo class in the platform sample, the cargo weight is obtained from the truck load and the average loading rate, and the total freight volume is decomposed into cargo weight and number of trips, generating a data list containing origin province, origin city, destination province, destination city, cargo type, total freight volume, vehicle model, fuel type, cargo weight and number of trips;
S38. The full expanded data for the missing-data part are output.
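Steps S31 and S33 can be sketched as below. This is an illustration under assumptions: the row layout, function names and all numbers are hypothetical, and the abnormality test of S32 and the further decompositions of S34 to S37 are not modelled.

```python
from collections import defaultdict

def missing_cargo_by_od(rows, cargo_types):
    """S31: for each (month, origin, destination), list cargo types absent from the sample.

    rows: (month, origin_province, dest_province, cargo_type) tuples.
    OD pairs with no rows at all do not appear in the result.
    """
    seen = defaultdict(set)
    for month, origin, dest, cargo in rows:
        seen[(month, origin, dest)].add(cargo)
    all_types = set(cargo_types)
    return {key: all_types - present for key, present in seen.items() if all_types - present}

def decompose_missing_volume(cargo_total, od_road_volume, province_total):
    """S33: split a missing cargo type's volume over OD pairs by each pair's
    share of the province's total freight volume."""
    return {od: cargo_total * volume / province_total for od, volume in od_road_volume.items()}

rows = [
    (1, "Sichuan", "Yunnan", "Grain"),
    (1, "Sichuan", "Yunnan", "Steel"),
    (1, "Sichuan", "Shanghai", "Grain"),
    (1, "Sichuan", "Shanghai", "Steel"),
    (1, "Sichuan", "Shanghai", "Cement"),
]
missing = missing_cargo_by_od(rows, ["Grain", "Steel", "Cement"])
volumes = decompose_missing_volume(
    cargo_total=1000.0,
    od_road_volume={("Sichuan", "Yunnan"): 30.0, ("Sichuan", "Shanghai"): 10.0},
    province_total=100.0,
)
print(missing)
print(volumes)
```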
Optionally, step S4 specifically comprises:
S41. Determine the expansion coefficient formula:
$$K = k_0 \cdot k_{cargo} \cdot k_{vehicle} \cdot k_{fuel} \tag{4-1}$$
where $k_0$ is the initial expansion coefficient, $k_{cargo}$ is the cargo fluctuation coefficient, $k_{vehicle}$ is the fluctuation coefficient of the vehicle model corresponding to each cargo class, and $k_{fuel}$ is the fluctuation coefficient by vehicle fuel type;
S42. Determine $k_0$:
$$k_0 = Q/q \tag{4-2}$$
where Q is the annual macro freight volume of the OD pair, obtained from cargo production and sales statistical agencies, and q is the annual freight volume in the platform sample data;
S43. Determine $k_{cargo}$:
$$k_{cargo} = q_{cargo}/q_r \tag{4-3}$$
where $q_{cargo}$ is the monthly freight volume of a given cargo class between the OD pair in month r in the platform sample, and $q_r$ is the monthly freight volume between the OD pair in month r in the platform sample;
S44. Determine $k_{vehicle}$:
$$k_{vehicle} = q_{vehicle}/q_r \tag{4-4}$$
where $q_{vehicle}$ is the freight volume between the OD pair carried in month r by the vehicle model corresponding to that cargo class in the platform sample;
S45. Determine $k_{fuel}$:
$$k_{fuel} = q_{fuel}/q_r \tag{4-5}$$
where $q_{fuel}$ is the freight volume between the OD pair for a given fuel type in month r in the platform sample;
S46. The expansion coefficient K is calculated and output according to formula (4-1), and the platform sample is expanded using the corrected expansion coefficient K.
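Formulas (4-1) to (4-5) combine as in the following sketch. The function name and the input volumes are hypothetical and are chosen only to make the arithmetic easy to follow.

```python
def expansion_coefficient(Q, q, q_cargo, q_vehicle, q_fuel, q_r):
    """K = k0 * k_cargo * k_vehicle * k_fuel for one OD pair and month r."""
    k0 = Q / q                   # (4-2) annual macro volume over annual sample volume
    k_cargo = q_cargo / q_r      # (4-3) share of the cargo class in month r
    k_vehicle = q_vehicle / q_r  # (4-4) share of the corresponding vehicle model in month r
    k_fuel = q_fuel / q_r        # (4-5) share of the fuel type in month r
    return k0 * k_cargo * k_vehicle * k_fuel

# Hypothetical volumes (10,000 tons): k0 = 20, k_cargo = 0.5, k_vehicle = 0.25, k_fuel = 0.5.
K = expansion_coefficient(Q=1200, q=60, q_cargo=4, q_vehicle=2, q_fuel=4, q_r=8)
print(K)
```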
To make the above objects, features and advantages of the present invention clearer and easier to understand, embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present invention and therefore should not be regarded as limiting its scope; a person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is a flowchart of the sample expansion and verification method based on shared-platform freight trip data according to the present invention;
Fig. 2 is a flowchart of the derivation of the expansion coefficient in the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are evidently some, but not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings can be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, the sample expansion and verification method based on shared-platform freight trip data provided by the embodiment of the present invention comprises:
S1. Considering survey cost and data precision requirements, freight surveys between city OD pairs mainly use sample surveys. Because such surveys involve a large number of respondents and a wide survey area, the choice of survey method must balance scientific rigor and operability. After weighing these factors and analyzing the relevant attribute information in the shared-platform freight trip data (for example the parent database of an O2O freight platform), a three-stage sampling method is adopted overall. Therefore, taking the shared-platform freight trip data as the target sample, sampling is carried out stage by stage in three successive stages to obtain the platform sample, and the platform sample size is determined;
S2. Data preprocessing: the platform sample is classified, including truck classification and cargo classification;
Cargo classification follows J/T19-2001 "Transportation Commodity Classification and code" and yields 17 cargo types, as shown in Table 1:
Table 1
| Number | Cargo type | Number | Cargo type |
| 1 | Agriculture, forestry, animal husbandry and fishery products | 10 | Mineral construction materials |
| 2 | Light industry and pharmaceutical products | 11 | Coal and coal products |
| 3 | Grain | 12 | Timber |
| 4 | Non-metallic ore | 13 | Cement |
| 5 | Fertilizers and pesticides | 14 | Salt |
| 6 | Steel | 15 | Petroleum, natural gas and their products |
| 7 | Chemical raw materials and products | 16 | Non-ferrous metals |
| 8 | Machinery, equipment and electrical appliances | 17 | Other |
| 9 | Metallic ore | | |
Truck classification uses technical-parameter statistics for trucks of different brands collected from commercial vehicle websites (for example https://www.cn357.com/). Based on the truck-brand field and the collected external data, trucks are classified by vehicle length, load capacity and curb weight. The truck classification is shown in Table 2; data such as vehicle length, load capacity and curb weight are omitted here:
Table 2
| Number | Truck type | Number | Truck type |
| 1 | Dump truck | 4 | Light-duty truck |
| 2 | Ordinary truck | | |
| 3 | Tractor | | |
S3. Difference analysis and correction: missing data in the platform sample are completed, and the full expanded data for the missing-data part are output;
S4. On the basis of the data preprocessing and difference analysis, an expansion coefficient is determined for each OD pair between provinces, and the platform sample is expanded according to that coefficient;
S5. Verification of the expanded data, including:
S51. The expanded freight volume of each province and each cargo class is checked against the macro data on each province's annual production and sales volume by cargo class, and error analysis is performed, where j denotes the j-th cargo type, j = 1, 2, ..., 17; the production and sales macro data come from cargo production and sales statistical agencies, and the expanded freight volume data are the data of the platform sample after expansion;
S52. The expanded truck types of each province are checked against the macro data on each province's annual vehicle ownership, and error analysis is performed, where l denotes the l-th truck type, l = 1, 2, 3, 4; the vehicle ownership macro data come from vehicle ownership statistical agencies, and the expanded freight vehicle data are the data of the platform sample after expansion;
S53. The fuel types are checked against each province's market share data by fuel type, and error analysis is performed, where y denotes the y-th fuel type, y = 1, 2, 3; the macro data on market share by fuel type come from fuel-type market share statistical agencies, and the expanded freight vehicle data by fuel type are the data of the platform sample after expansion. Table 3 lists the three fuel types;
Table 3
| Number | Fuel type |
| 1 | Diesel |
| 2 | Gasoline |
| 3 | Natural gas |
S54. Drawing on successful experience from sample-expansion verification in comprehensive traffic surveys, when the error is within the acceptable range, i.e. the mean error is below 10%, the complete expanded data, including the errors, are output; see the data shown in Tables 5, 6 and 7.
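The acceptance rule of S54 can be illustrated with the error values listed for Shanghai and Yunnan in Table 5 of this document (January to June 2018). Averaging the monthly errors per province is an assumption about how the mean error is formed; the function names are hypothetical.

```python
# Monthly errors for January-June 2018, copied from the Table 5 rows in this document.
table5_errors = {
    "Shanghai": [0.018, 0.052935, 0.065666, 0.075284, 0.077282, 0.153695],
    "Yunnan": [0.025, 0.112415, 0.027709, 0.013858, 0.114475, 0.023075],
}

def mean_error(errors):
    """Arithmetic mean of a list of relative errors."""
    return sum(errors) / len(errors)

def passes_check(errors, threshold=0.10):
    """S54 accepts the expansion when the mean error is below the threshold (10%)."""
    return mean_error(errors) < threshold

for province, errors in table5_errors.items():
    print(province, round(mean_error(errors), 4), passes_check(errors))
```

Under this reading both provinces pass, even though Shanghai's June error (0.153695) on its own exceeds 10%.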
In the verification of the expanded data, the macro data on each province's annual production and sales volume by cargo class come mainly from China Highway index net, the annual statistical communiques on national economic and social development issued by provincial statistics bureaus and statistical information websites, the annual and monthly economic operation reports published on provincial and municipal Development and Reform Commission websites, provincial and municipal industrial and agricultural data release plans, and provincial and municipal statistical yearbooks. The provincial annual vehicle ownership macro data come mainly from International Statistical office and the China Association of Automobile Manufacturers. The provincial market share data by fuel type come mainly from the Chinese fuel oil market annual report.
Optionally, step S1 specifically comprises:
S11. First-stage sampling: using stratified sampling with stratification by time and by region, stratified sampling is performed on the shared-platform freight trip data as the target sample to obtain the first-stage sample, and the first-stage sample size $n_1$ is calculated;
Time stratification divides the year into 12 calendar months, and regional stratification divides the country into 31 regions according to the administrative divisions of mainland China;
S12. Second-stage sampling: using equal-proportion sampling by cargo type, a number of units are drawn directly from the first-stage sample according to the equal-proportion principle to form the second-stage sample, and the second-stage sample size $n_2$ is calculated;
S13. Third-stage sampling: using random sampling, a number of samples are drawn directly from the second-stage sample at random as the third-stage sample; the third-stage sample is the platform sample, and the third-stage sample size $n_3$ is calculated and taken as the platform sample size.
In the embodiment of the present invention, the three-stage approach of stratified sampling, equal-proportion sampling and random sampling in succession matches the large number of respondents and wide coverage of freight surveys between city OD pairs, balancing scientific rigor and operability.
Optionally, the first-stage sample size $n_1$ in step S11 is calculated according to formula (1-1) or formula (1-2).
In the formulas, N is the total sample size over a period of time in the shared-platform freight trip data, t is the probability degree $Z_{\alpha/2}$, and $\Delta$ is the limit error; the remaining two quantities are the average within-stratum variance and the average within-stratum variance of the proportion.
Optionally, in the stratified sampling, the number of units to be drawn from each stratum is allocated by the equal-proportion method, calculated as:
$$m_i = n_1 \frac{N_i}{N} \tag{1-3}$$
where $m_i$ is the number of samples to be drawn from the i-th stratum and $N_i$ is the total number of samples in the i-th stratum.
Optionally, the second-stage sample size $n_2$ in step S12 is calculated according to formula (1-4):
$$n_2 = \frac{n_1 t^2 P(1-P)}{n_1 \Delta^2 + t^2 P(1-P)} \tag{1-4}$$
where $P(1-P)$ is the variance of the proportion.
Optionally, the third-stage sample size $n_3$ in step S13 is calculated on the basis of interval estimation theory: given a precision requirement on the estimator specified in advance, the required sample size is derived by working backwards. There are two ways of doing so.
First, determining the sample size from absolute precision:
Assume an absolute precision $\lambda$ is given, i.e. the absolute deviation of the estimate from the true value is required not to exceed $\lambda$ at confidence level $1-\alpha$.
Comparing this requirement with the interval estimate gives the required sample size, where $u_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the $N(0,1)$ distribution, the mean-square deviation of the estimate enters through the interval estimate, and $S^2$ is the population variance;
Second, determining the sample size from relative precision:
Assume a relative precision $\varepsilon$ is given, i.e. the relative deviation of the estimate from the true value is required not to exceed $\varepsilon$ at confidence level $1-\alpha$.
Comparing this requirement with the interval estimate gives the required sample size.
Drawing on successful experience such as China's urban resident trip survey sampling and road freight volume survey sampling, the method of the present invention specifies that, at a 95% confidence level, the limit relative error of the sampled target estimate is in the range of 10% to 15%. On this basis, the sample-size formulas of the three-stage sampling method above are applied in turn to calculate the sample size required for the freight expansion survey of city OD pairs. The sample size is finally expressed as monthly OD freight volume. Taking 2018 as an example, as shown in Table 4, the sample amounts to about xxx ten thousand tons of freight volume, a sampling rate of about xxx%. This freight survey sampling rate is close to the theoretical value and meets the sampling rate requirement under general sampling conditions.
Table 4
| Month | Sample size (10,000 tons) | Month | Sample size (10,000 tons) |
| January 2018 | xxx ten thousand | July 2018 | xxx ten thousand |
| February 2018 | xxx ten thousand | August 2018 | xxx ten thousand |
| March 2018 | xxx ten thousand | September 2018 | xxx ten thousand |
| April 2018 | xxx ten thousand | October 2018 | xxx ten thousand |
| May 2018 | xxx ten thousand | November 2018 | xxx ten thousand |
| June 2018 | xxx ten thousand | December 2018 | xxx ten thousand |
Optionally, step S3 is specifically included:
S31, the platform sampling samples after data prediction are divided to month and carry out point province OD by cargo type and are analyzed, looked into It sees between the OD of each province and lacks type of merchandize;
S32: Using the annual cargo production-and-sales volume data of each origin province together with each province's monthly road freight share by cargo type, determine whether the cargo types that are partially missing in each month for each province are abnormal;
S33: For each abnormal cargo type, supplement the missing cargo in the provincial OD data. Using the ratio of the road OD freight volume of the provinces involved in the abnormal cargo to the total freight volume of those provinces, decompose the freight volume of the missing cargo type by OD, generating a data list containing origin province, destination province, cargo type, and total freight volume;
S34: Using the vehicle type, truck curb weight, and truck load ratios associated with that abnormal cargo type in the platform sample, further decompose the total freight volume, generating a data list containing origin province, destination province, cargo type, total freight volume, vehicle type, truck curb weight, and truck load;
S35: Determine the city-level OD freight volume from the annual intercity road freight total, expressway traffic flow, and similar data, and decompose the total freight volume at the city OD level, generating a data list containing origin province, origin city, destination province, destination city, cargo type, total freight volume, vehicle type, truck curb weight, and truck load;
S36: Using the shares by vehicle type and by fuel type in the platform sample, decompose the freight volume by vehicle type and fuel type, generating a data list containing origin province, origin city, destination province, destination city, cargo type, total freight volume, vehicle type, and fuel type;
S37: Based on the annual average load factor of each cargo class in each province in the platform sample, obtain the cargo weight from the truck load and the average load factor, and decompose the total freight volume into cargo weight and number of freight trips, generating a data list containing origin province, origin city, destination province, destination city, cargo type, total freight volume, vehicle type, fuel type, cargo weight, and number of freight trips;
S38: Output the expanded sample for the missing-data portion.
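Steps S33 and S37 come down to a proportional split and a weight-to-trips conversion. The Python sketch below shows one possible reading of them and is not the patented implementation. The function names, the province pairs, and all figures are hypothetical. It assumes that the missing cargo volume is split across OD pairs in proportion to a reference road freight volume, and that the cargo weight per trip equals the truck load multiplied by the average load factor.

```python
def decompose_by_od(total_volume, od_reference):
    """Split total_volume across OD pairs in proportion to od_reference.

    od_reference maps (origin, destination) to a reference road freight
    volume; only the relative sizes matter (assumed reading of S33).
    """
    base = sum(od_reference.values())
    if base <= 0:
        raise ValueError("reference volumes must sum to a positive value")
    return {od: total_volume * v / base for od, v in od_reference.items()}


def trips_from_volume(total_volume, truck_load, load_factor):
    """Return (cargo weight per trip, number of trips).

    Assumed reading of S37: weight per trip = truck load * load factor.
    """
    weight_per_trip = truck_load * load_factor
    if weight_per_trip <= 0:
        raise ValueError("truck load and load factor must be positive")
    return weight_per_trip, total_volume / weight_per_trip


# Hypothetical reference volumes for two OD pairs (same unit as the total).
od_reference = {("Sichuan", "Yunnan"): 300.0, ("Sichuan", "Guangxi"): 100.0}

# Distribute 80 units of a missing cargo type over the two OD pairs.
missing_volume = decompose_by_od(80.0, od_reference)

# Convert one OD pair's volume into cargo weight per trip and trip count.
weight, trips = trips_from_volume(
    missing_volume[("Sichuan", "Yunnan")], truck_load=20.0, load_factor=0.75
)
print(missing_volume, weight, trips)
```

The same proportional split can be reused for the city-level decomposition in S35 and for the vehicle and fuel shares in S36 by swapping in the corresponding reference shares.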
Optionally, as shown in Fig. 2, step S4 specifically includes:
S41: Determine the sample expansion coefficient formula:
$K = k_0 \cdot k_{cargo} \cdot k_{vehicle} \cdot k_{fuel}$ (4-1)
where $k_0$ is the initial expansion coefficient, $k_{cargo}$ is the cargo fluctuation coefficient, $k_{vehicle}$ is the fluctuation coefficient of the vehicle type corresponding to each cargo class, and $k_{fuel}$ is the fluctuation coefficient by vehicle fuel type;
S42: Determine $k_0$:
$k_0 = Q / q$ (4-2)
where $Q$ is the annual macro freight volume of the OD pair, obtained from the cargo production-and-sales statistical agency, and $q$ is the annual freight volume in the platform sample data;
S43: Determine $k_{cargo}$:
$k_{cargo} = q_{cargo} / q_r$ (4-3)
where $q_{cargo}$ is the freight volume of the given cargo class between the OD pair in month $r$ of the platform sample, and $q_r$ is the total freight volume between the OD pair in month $r$ of the platform sample;
S44: Determine $k_{vehicle}$:
$k_{vehicle} = q_{vehicle} / q_r$ (4-4)
where $q_{vehicle}$ is the freight volume between the OD pair carried in month $r$ by the vehicle type corresponding to the given cargo class in the platform sample;
S45: Determine $k_{fuel}$:
$k_{fuel} = q_{fuel} / q_r$ (4-5)
where $q_{fuel}$ is the freight volume between the OD pair carried in month $r$ by the given fuel type in the platform sample;
S46: Compute and output the expansion coefficient $K$ according to formula (4-1), and expand the platform sample using the corrected expansion coefficient $K$.
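Formulas (4-1) to (4-5) can be sketched in a few lines of Python. This is a minimal illustration rather than the inventors' code. The function name and the input figures are hypothetical. The final multiplication of a sample volume by K is an assumption about how the coefficient is applied, since the text only states that the sample is expanded using K.

```python
def expansion_coefficient(Q, q, q_cargo, q_vehicle, q_fuel, q_r):
    """Compute K = k0 * k_cargo * k_vehicle * k_fuel for one OD pair.

    Q         annual macro freight volume of the OD pair
    q         annual freight volume of the OD pair in the platform sample
    q_cargo   month-r sample volume of the cargo class
    q_vehicle month-r sample volume of the matching vehicle type
    q_fuel    month-r sample volume of the fuel type
    q_r       month-r total sample volume of the OD pair
    """
    if q <= 0 or q_r <= 0:
        raise ValueError("q and q_r must be positive")
    k0 = Q / q                   # formula (4-2)
    k_cargo = q_cargo / q_r      # formula (4-3)
    k_vehicle = q_vehicle / q_r  # formula (4-4)
    k_fuel = q_fuel / q_r        # formula (4-5)
    return k0 * k_cargo * k_vehicle * k_fuel  # formula (4-1)


# Hypothetical inputs, all in the same unit of freight volume.
K = expansion_coefficient(Q=1000.0, q=10.0, q_cargo=4.0,
                          q_vehicle=2.0, q_fuel=6.0, q_r=8.0)

# Assumed application: scale a sampled volume by K.
sample_volume = 3.0
expanded_volume = K * sample_volume
print(K, expanded_volume)
```

With these inputs the four factors are 100, 0.5, 0.25, and 0.75, so K evaluates to 9.375.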
Tables 5, 6, and 7 show partial examples of the sample expansion results of the invention:
Table 5: Freight volume by province and error statistics, 2018 (unit: 10,000 tons)
Province January Error February Error March Error April Error May Error June Error
Shanghai 3211.3 0.018 2851.268 0.052935 3406.652 0.065666 3124.757 0.075284 3157.018 0.077282 2914.982 0.153695
Yunnan 7915.33 0.025 6899.404 0.112415 11076.1 0.027709 11489.77 0.013858 9718.482 0.114475 9855.58 0.023075
Inner Mongolia 8816.99 0.067 6249.132 0.062227 11307.45 0.025412 10782.97 0.049896 12119.18 0.070617 12651.41 0.060751
Beijing 1140.64 0.013 982.7772 0.142974 1203.705 0.024338 1686.356 0.070355 1525.709 0.180435 2033.351 0.001303
Jilin 2940.86 0.014 1322.213 0.017234 2816.543 0.040637 3883.882 0.006725 4198.772 0.008152 4252.901 0.005431
Sichuan 11657.42 0.003 6726.278 0.09377 11252.39 0.061641 12318.11 0.041312 13566.69 0.08759 12694.13 0.003377
Tianjin 2233.93 0.084 2139.081 0.137404 2690.167 0.04566 2936.785 0.039232 2801.818 0.129624 2983.762 0.073142
Ningxia 1891.34 0.183 1616.389 0.035642 2317.745 0.078203 2981.495 0.034716 3269.576 0.038973 2684.336 0.17571
Anhui 15186.48 0.125 9952.966 0.135541 23917.2 0.004649 22388.61 0.072152 24926.26 0.014793 20659.39 0.118377
Shandong 15621.1 0.065 13830.78 0.029587 22968.77 0.088957 23310.59 0.074576 25443.04 0.051997 24663.48 0.05938
Shanxi 8049.51 0.017 5178.149 0.057907 5700.926 0.090525 8129.013 0.044038 8626.031 0.056569 10005.893 0.012784
Guangdong 17966.8 0.04 21319.08 0.014865 20722.69 0.047451 22610.25 0.006358 24309.85 0.023853 23229.32 0.031842
Guangxi 10044.3 0.05 6242.163 0.039543 10249.41 0.036937 11212.44 0.013428 11729.32 0.048228 10827.56 0.048436
Xinjiang 4404.07 0.08 1981.463 0.075973 6188.494 0.056962 6301.438 0.089434 6249.571 0.01287 5529.859 0.275439
Jiangsu 7079.25 0.042 7199.469 0.098414 10991.54 0.0287 11005.35 0.024 10739.31 0.114504 10739.29 0.035357
Jiangxi 10230.62 0.02 5176.989 0.082482 11025.81 0.014891 11030.74 0.011809 9432.008 0.081424 10568.91 0.010322
Hebei 12444.94 0.081 7360.87 0.102723 15493.69 0.021577 15186.77 0.160154 17524.06 0.13815 17428.63 0.082128
Henan 14331.35 0.097 5863.256 0.066302 12294.71 0.131381 13840.16 0.094857 15149.24 0.079658 16675.88 0.089837
Zhejiang 9544.46 0.023 8501.852 0.008957 9564.737 0.142426 13979.52 0.096819 13599 0.011912 12295.93 0.015377
Hainan 886.63 0.008 846.1853 0.001401 875.5046 0.000566 874.4422 0.002925 883.7024 0.001926 912.6837 0.003634
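To relate the errors in Table 5 to an acceptance criterion, the sketch below averages the six monthly errors listed for two provinces and compares each mean with a 10% threshold, the mean-error limit given in step S54. The function and variable names are illustrative. Averaging over only the months shown (January to June) is an assumption, because the table is a partial excerpt.

```python
# Monthly errors (January to June) copied from Table 5.
table5_errors = {
    "Hainan": [0.008, 0.001401, 0.000566, 0.002925, 0.001926, 0.003634],
    "Xinjiang": [0.08, 0.075973, 0.056962, 0.089434, 0.01287, 0.275439],
}


def mean_error(errors):
    """Arithmetic mean of a list of relative errors."""
    return sum(errors) / len(errors)


def within_tolerance(errors, threshold=0.10):
    """True when the mean error is below the threshold (10% by default)."""
    return mean_error(errors) < threshold


for province, errors in table5_errors.items():
    print(f"{province}: mean error {mean_error(errors):.4f}, "
          f"accepted: {within_tolerance(errors)}")
```

Xinjiang shows why the criterion is stated on the mean: its June error (0.275439) is well above 10%, yet the mean over the six listed months stays just below the threshold.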
Table 6: Freight volume by cargo type, 2018 (unit: 10,000 tons)
Cargo type January February March April May June July August September October November December
0 0 8.5662 6.0251 0 0 0 0.74125 0 0.65214 0.70231 0
Other 5621.562 2324.213 4119.692 4872.472 4258.014 6591.685 4613.553 5998.225 5009.2 5915.412 6194.12 4765.357
Agriculture, forestry, animal husbandry and fishery product 58515.34 49942.1 64051.92 76107.94 85497.55 73686.04 96018.51 90998.1 86677.88 90290.56 96751.7 79125.48
Industrial chemicals and product 8823.129 6698.368 10518.6 11726.56 11782.94 11136.98 11895.59 10556.57 11692.86 10369.53 11355.11 10632.82
Non-ferrous metal 239.0282 2891.939 1884.259 4693.669 7715.893 300.7447 5393.4614 3041.264 448.7136 3085.799 3269.311 433.7968
Timber 332.2003 587.5458 3307.728 2736.642 1110.842 2051.674 1356.406 4561.043 4162.762 4462.865 4771.937 4057.054
Mechanical equipment, electric appliance 23976.07 16399.21 32934.38 31965.93 28900.78 30525.42 39009.09 37576.5 38762.4 36940.86 40209.02 35007.18
Cement 5033.209 130.703 241.2424 256.5611 162.3674 5868.631 286.5022 439.2028 7415.718 448.6254 467.734 7355.141
Coal and its product 8738.482 5368.435 4967.678 8139.926 10047.36 11764.59 7379.881 5434.969 15564.84 5251.552 4389.864 14081.22
Salt 126.0978 148.3826 13.71945 129.9412 238.5026 141.404 46.30471 226.7375 251.2598 227.829 236.8429 237.421
Petroleum, natural gas and its product 858.4272 1595.792 3728.116 2955.872 1786.284 866.9424 1939.271 922.382 982.6431 921.7735 938.9019 946.4748
Mineral construction material 11949.08 4280.233 12968.92 11055.31 7648.643 14717.21 7142.843 22451.94 23877.23 22875.79 24575.53 23314.12
Grain 5730.27 1985.651 3695.262 4332.912 4127.161 7583.869 3383.778 4921.332 8128.476 5051.636 4566.817 7085.13
Fertilizer and pesticide 4948.104 2329.153 4716.197 5059.472 4698.946 6487.782 2571.95 3227.086 4594.529 3280.925 3707.559 4057.353
Light industry, medical product 78051.13 57375.72 91566.03 97409.49 101647.6 95336.08 86198.33 85195.21 96503.58 84940.71 91332.27 89727.94
Metallic ore 10396.89 5277.787 8846.799 10311.16 10932.99 13002.51 12674.9 8978.381 12424.09 9181.156 9159.759 11933.5
Steel 10595.63 11450.27 7866.942 10904.15 22213.63 12853.96 28839.51 30355.3 12483.53 30226.74 32529.56 10988.54
Non-metallic ore 8041.365 6631.87 10270.08 10273.53 10192.45 9578.625 7852.742 14482.96 14868.64 14886.95 15265.02 14172.22
Table 7: Vehicle statistics by truck type, 2018 (trips)
Truck type January February March April May June July August September October November December
Common in-vehicle 66109637 49677369 86821986 64473254 53411392 66872115 8926195 100312902 97588906 100312902 105487347 93493249
Tractor 54238809 39023598 66195318 77147167 82439645 63549596 11273848 76641739 84753024 76641739 83309291 80310032
Dumper 52369848 39014109 62356523 47257096 40854597 52861219 7497167 86608224 75225970 86608224 88230555 72433316
Light-duty Vehicle 13149939 10876550 18457486 13625979 11369862 13246276 1548695 22645435 19230245 22648435 22386589 18532760
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention. For those skilled in the art, the invention may be modified and varied in many ways. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (8)

1. A sample expansion and checking method based on shared-platform freight trip data, characterized by comprising:
S1: Taking the shared-platform freight trip data as the target sample, sampling stage by stage through three successive stages to obtain the final platform sample, and determining the platform sample size;
S2: Data preprocessing: classifying the platform sample, including truck classification and cargo classification;
S3: Difference analysis and correction: completing the missing data in the platform sample, and outputting the expanded sample for the missing-data portion;
S4: On the basis of the data preprocessing and the difference analysis, determining the sample expansion coefficient for each OD pair between provinces, and expanding the platform sample using this coefficient;
S5: Checking the expanded sample data, including:
S51: Using the annual macro production-and-sales volume data by province and cargo class, checking the expanded freight volume of each province and each cargo class, and performing error analysis:
where $j$ denotes the $j$-th cargo type, $j = 1, 2, \ldots, 17$; the macro production-and-sales volume data come from the cargo production-and-sales statistical agency; and the expanded freight volume data are the data in the platform sample after expansion;
S52: Using the annual macro vehicle ownership data of each province, checking the expanded truck types of each province, and performing error analysis:
where $l$ denotes the $l$-th truck type, $l = 1, 2, 3, 4$; the macro vehicle ownership data come from the vehicle ownership statistical agency; and the expanded truck data are the data in the platform sample after expansion;
S53: Using the market share data by fuel type for each province, checking the fuel types, and performing error analysis:
where $y$ denotes the $y$-th fuel type, $y = 1, 2, 3$; the macro market share data by fuel type come from the fuel-type market share statistical agency; and the expanded truck data by fuel type are the data in the platform sample after expansion;
S54: Following established practice in expansion checks for comprehensive traffic surveys, outputting the complete expanded sample data when the error is within the acceptable range, that is, when the mean error is below 10%.
2. The sample expansion and checking method based on shared-platform freight trip data according to claim 1, characterized in that step S1 specifically includes:
S11: First-stage sampling: using stratified sampling, with stratification by time and by region, drawing a stratified sample from the shared-platform freight trip data as the target sample to obtain the first-stage sample, and calculating the first-stage sample size $n_1$;
S12: Second-stage sampling: using equal-proportion sampling by cargo type, drawing a number of units directly from the first-stage sample according to the equal-proportion principle to form the second-stage sample, and calculating the second-stage sample size $n_2$;
S13: Third-stage sampling: using random sampling, drawing a number of units directly from the second-stage sample according to the randomness principle to form the third-stage sample; the third-stage sample is the platform sample; calculating the third-stage sample size $n_3$ and taking it as the platform sample size.
3. The sample expansion and checking method based on shared-platform freight trip data according to claim 2, characterized in that the first-stage sample size $n_1$ in step S11 is calculated according to formula (1-1) or formula (1-2):
In the formulas, $N$ is the total number of records in the shared-platform freight trip data over a given period, $t$ is the probability degree $Z_{\alpha/2}$, $\Delta$ is the limit error, and the two variance terms are the average within-group variance and the average within-group variance of the proportion, respectively.
4. The sample expansion and checking method based on shared-platform freight trip data according to claim 3, characterized in that, in the stratified sampling, the number of units to be drawn from each stratum is allocated by the equal-proportion method, with the formula:
$m_i = n_1 N_i / N$ (1-3)
where $m_i$ is the number of units to be drawn from the $i$-th stratum and $N_i$ is the total number of units in the $i$-th stratum.
5. The sample expansion and checking method based on shared-platform freight trip data according to claim 3, characterized in that the second-stage sample size $n_2$ in step S12 is calculated according to formula (1-4):
$n_2 = \dfrac{n_1 t^2 P(1-P)}{n_1 \Delta^2 + t^2 P(1-P)}$ (1-4)
where $P(1-P)$ is the variance of the proportion.
6. The sample expansion and checking method based on shared-platform freight trip data according to claim 5, characterized in that the third-stage sample size $n_3$ in step S13 is calculated on the basis of interval estimation theory: once the requirement on the estimator is specified in advance, the required sample size is derived by working backward. The calculation method takes two forms:
First, determining the sample size from the absolute precision:
Assume a given absolute precision $\lambda$, that is, require
that, at confidence level $1-\alpha$, the following holds:
that is,
Comparing with the interval estimation result gives:
where $u_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the N(0, 1) distribution, the remaining term is the mean squared deviation of the estimator, and $S^2$ is the population variance;
Second, determining the sample size from the relative precision:
Given a relative precision $\varepsilon$, that is,
at confidence level $1-\alpha$, the following holds:
Comparing with the interval estimation result gives:
7. The sample expansion and checking method based on shared-platform freight trip data according to claim 1, characterized in that step S3 specifically includes:
S31: Analyzing the preprocessed platform sample by provincial OD, month by month and by cargo type, to identify the cargo types missing between each pair of provinces;
S32: Using the annual cargo production-and-sales volume data of each origin province together with each province's monthly road freight share by cargo type, determining whether the cargo types that are partially missing in each month for each province are abnormal;
S33: For each abnormal cargo type, supplementing the missing cargo in the provincial OD data; using the ratio of the road OD freight volume of the provinces involved in the abnormal cargo to the total freight volume of those provinces, decomposing the freight volume of the missing cargo type by OD, and generating a data list containing origin province, destination province, cargo type, and total freight volume;
S34: Using the vehicle type, truck curb weight, and truck load ratios associated with that abnormal cargo type in the platform sample, further decomposing the total freight volume, and generating a data list containing origin province, destination province, cargo type, total freight volume, vehicle type, truck curb weight, and truck load;
S35: Determining the city-level OD freight volume from the annual intercity road freight total, expressway traffic flow, and similar data, decomposing the total freight volume at the city OD level, and generating a data list containing origin province, origin city, destination province, destination city, cargo type, total freight volume, vehicle type, truck curb weight, and truck load;
S36: Using the shares by vehicle type and by fuel type in the platform sample, decomposing the freight volume by vehicle type and fuel type, and generating a data list containing origin province, origin city, destination province, destination city, cargo type, total freight volume, vehicle type, and fuel type;
S37: Based on the annual average load factor of each cargo class in each province in the platform sample, obtaining the cargo weight from the truck load and the average load factor, decomposing the total freight volume into cargo weight and number of freight trips, and generating a data list containing origin province, origin city, destination province, destination city, cargo type, total freight volume, vehicle type, fuel type, cargo weight, and number of freight trips;
S38: Outputting the expanded sample for the missing-data portion.
8. The sample expansion and checking method based on shared-platform freight trip data according to claim 1, characterized in that step S4 specifically includes:
S41: Determining the sample expansion coefficient formula:
$K = k_0 \cdot k_{cargo} \cdot k_{vehicle} \cdot k_{fuel}$ (4-1)
where $k_0$ is the initial expansion coefficient, $k_{cargo}$ is the cargo fluctuation coefficient, $k_{vehicle}$ is the fluctuation coefficient of the vehicle type corresponding to each cargo class, and $k_{fuel}$ is the fluctuation coefficient by vehicle fuel type;
S42: Determining $k_0$:
$k_0 = Q / q$ (4-2)
where $Q$ is the annual macro freight volume of the OD pair, obtained from the cargo production-and-sales statistical agency, and $q$ is the annual freight volume in the platform sample data;
S43: Determining $k_{cargo}$:
$k_{cargo} = q_{cargo} / q_r$ (4-3)
where $q_{cargo}$ is the freight volume of the given cargo class between the OD pair in month $r$ of the platform sample, and $q_r$ is the total freight volume between the OD pair in month $r$ of the platform sample;
S44: Determining $k_{vehicle}$:
$k_{vehicle} = q_{vehicle} / q_r$ (4-4)
where $q_{vehicle}$ is the freight volume between the OD pair carried in month $r$ by the vehicle type corresponding to the given cargo class in the platform sample;
S45: Determining $k_{fuel}$:
$k_{fuel} = q_{fuel} / q_r$ (4-5)
where $q_{fuel}$ is the freight volume between the OD pair carried in month $r$ by the given fuel type in the platform sample;
S46: Computing and outputting the expansion coefficient $K$ according to formula (4-1), and expanding the platform sample using the corrected expansion coefficient $K$.
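The two sample-size formulas that survive in the claims, the proportional allocation (1-3) and the second-stage size (1-4), can be illustrated with the following Python sketch. It is an illustration under stated assumptions and not part of the claims. The function names, the stratum labels, and the numeric inputs (t = 1.96, P = 0.5, Δ = 0.05) are hypothetical, and rounding the second-stage size up to a whole unit is a choice made here.

```python
import math


def proportional_allocation(n1, stratum_sizes):
    """Formula (1-3): m_i = n1 * N_i / N for every stratum i."""
    N = sum(stratum_sizes.values())
    if N <= 0:
        raise ValueError("total stratum size must be positive")
    return {name: n1 * N_i / N for name, N_i in stratum_sizes.items()}


def second_stage_size(n1, t, P, delta):
    """Formula (1-4): n2 = n1 t^2 P(1-P) / (n1 delta^2 + t^2 P(1-P))."""
    variance = P * (1 - P)  # variance of the proportion
    return n1 * t**2 * variance / (n1 * delta**2 + t**2 * variance)


# Hypothetical strata (for example two time periods) and first-stage size.
strata = {"period_A": 4000, "period_B": 6000}
allocation = proportional_allocation(1000, strata)

# Hypothetical second-stage inputs; round up to a whole number of units.
n2_raw = second_stage_size(1000, t=1.96, P=0.5, delta=0.05)
n2 = math.ceil(n2_raw)
print(allocation, n2_raw, n2)
```

Because formula (1-4) has the first-stage size in both numerator and denominator, the second-stage size can never exceed the first-stage size, which matches the stage-by-stage narrowing described in step S1.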
CN201910662056.6A 2019-07-22 2019-07-22 Sample expansion and checking method based on shared platform freight trip data Active CN110363483B (en)


Publications (2)

Publication Number Publication Date
CN110363483A true CN110363483A (en) 2019-10-22
CN110363483B CN110363483B (en) 2022-05-03

Family

ID=68221430




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant