CN108520335A - Spot-check object prediction method, apparatus, device, and storage medium - Google Patents

Spot-check object prediction method, apparatus, device, and storage medium

Info

Publication number
CN108520335A
Authority
CN
China
Prior art keywords
data
waybill
history
model
waybill data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810232272.2A
Other languages
Chinese (zh)
Inventor
杨刚
黄丽诗
胡泽柱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SF Technology Co Ltd
SF Tech Co Ltd
Original Assignee
SF Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SF Technology Co Ltd
Priority to CN201810232272.2A
Publication of CN108520335A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping
    • G06Q10/0832 Special goods or special handling procedures, e.g. handling of hazardous or fragile goods
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/70 Type of the data to be coded, other than image and sound
    • H03M7/705 Unicode
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/80 Management or planning

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

This application discloses a spot-check object prediction method, apparatus, device, and storage medium. The method includes: receiving new waybill data; performing data processing on the waybill data; and inputting the processed waybill data into a pre-established risk prediction model, which outputs a prediction result indicating whether the waybill data belongs to a spot-check object. According to the technical solution of the embodiments of the present application, a pre-established risk prediction model predicts whether newly arrived waybill data needs to be spot-checked. This avoids problems such as aimless manual spot checks and missed inspections, and greatly improves both the accuracy with which the logistics industry determines spot-check objects and the efficiency of handling them.

Description

Spot-check object prediction method, apparatus, device, and storage medium
Technical field
The present application relates generally to the technical field of logistics data, specifically to logistics data mining and processing, and in particular to a spot-check object prediction method, apparatus, device, and storage medium.
Background
With the development of the logistics industry, express waybill volume has grown rapidly to nearly ten million shipments per day, and the hidden risks have grown with it. Opening and inspecting every shipment would impose too great a workload, whereas opening boxes by random sampling has no specific target and yields a low inspection hit rate.
Currently, shipments that may carry risk are checked manually according to the acceptance standards for consigned items. This experience-based spot checking easily misses shipments with hidden risks. The information it relies on is limited and hard to manage, and it cannot improve spot-check accuracy, so safety problems in the delivery industry are becoming increasingly prominent.
To overcome the above problems, a new solution is urgently needed.
Summary of the invention
In view of the above defects or deficiencies in the prior art, it is desirable to provide a scheme for creating business test scenarios based on an interactive interface.
In a first aspect, an embodiment of the present application provides a spot-check object prediction method, the method including:
receiving new waybill data;
performing data processing on the waybill data;
inputting the processed waybill data into a pre-established risk prediction model and outputting a prediction result that indicates whether the waybill data belongs to a spot-check object.
In a second aspect, an embodiment of the present application provides a spot-check object prediction apparatus, the apparatus including:
a receiving unit, configured to receive new waybill data;
a data processing unit, configured to perform data processing on the waybill data;
a prediction unit, configured to input the processed waybill data into a pre-established risk prediction model and output a prediction result that indicates whether the waybill data belongs to a spot-check object.
In a third aspect, an embodiment of the present application provides a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the method described in the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, the computer program being used for:
implementing the method described in the embodiments of the present application when the computer program is executed by a processor.
The embodiments of the present application provide a scheme for predicting spot-check objects. The mapping between the features that influence spot checking and the spot-check result is learned in advance by training a machine learning algorithm, and this mapping is used to predict whether newly arrived waybill data needs to be spot-checked, thereby improving the accuracy of selecting express shipments for spot checking. To further improve the processing efficiency of the model, the embodiments also use data processing methods such as data conversion rules, one-hot encoding, and data compression to improve the efficiency of data processing and thus the speed at which the model processes data.
Description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 shows a flow diagram of the spot-check object prediction method provided by the embodiments of the present application;
Fig. 2 shows a flow diagram of the risk prediction model building method provided by the embodiments of the present application;
Fig. 3 shows a structural diagram of the spot-check object prediction apparatus provided by the embodiments of the present application;
Fig. 4 shows a structural diagram of the risk prediction model building apparatus provided by the embodiments of the present application;
Fig. 5 shows a structural diagram of a computer system of a terminal device suitable for implementing the embodiments of the present application.
Detailed description
The application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the related invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments can be combined with one another. The application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Referring to Fig. 1, Fig. 1 shows a flow diagram of the spot-check object prediction method provided by the embodiments of the present application.
As shown in Fig. 1, the method includes:
Step 110: receive new waybill data.
In the embodiments of the present application, when a new waybill (also called an express waybill or parcel form) is received, the system extracts the new waybill data. The new waybill data includes sender information, consigned-item information, recipient information, sending address information, receiving address information, and so on. The sender information includes: first sending time, last sending time, first receiving time, last receiving time, total number of items sent, total number of items received, consigned-item content, and so on. The sending address information includes: origin address, origin district code, origin city code, origin outlet type, origin outlet longitude and latitude, and so on. The receiving address information includes: destination address, destination district code, destination city code, destination outlet type, destination outlet longitude and latitude, and so on. The consigned-item information includes: item weight, transport mode, payment type, transit time, and so on. This information can be obtained from a Spark cluster or from an information management center.
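The information groups listed above could be held in a small record type before processing. The following is only a sketch: the source names the categories of information but defines no schema, so every class name, field name, and unit below is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass, asdict

# All names and units here are hypothetical; the source defines no schema.
@dataclass
class SenderInfo:
    first_send_time: str
    last_send_time: str
    first_receive_time: str
    last_receive_time: str
    total_sent: int
    total_received: int

@dataclass
class AddressInfo:
    address: str
    district_code: str
    city_code: str
    outlet_type: str
    outlet_lat: float
    outlet_lon: float

@dataclass
class ItemInfo:
    weight_kg: float
    transport_type: str
    payment_type: str
    transit_hours: float

@dataclass
class Waybill:
    sender: SenderInfo
    origin: AddressInfo
    destination: AddressInfo
    item: ItemInfo

def flatten(waybill: Waybill) -> dict:
    """Flatten the nested groups into one feature dict with prefixed keys."""
    flat = {}
    for group, fields in asdict(waybill).items():
        for name, value in fields.items():
            flat[f"{group}_{name}"] = value
    return flat
```

Flattening to prefixed keys such as `origin_city_code` keeps origin and destination fields distinct when they are later passed to the data processing step.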
Step 120: perform data processing on the new waybill data.
After the new waybill data is obtained, it is converted using a pre-generated data conversion dictionary (also called data conversion rules) and then one-hot encoded. The pre-generated data conversion dictionary is a set of data conversion rules produced by algorithm training on the positive sample data in the historical waybill data.
One-hot encoding, also known as one-bit-effective encoding, uses a state register to encode the states: each state has its own independent register bit, and only one bit is active at any time. In practical machine learning tasks, the features of waybill data are not always continuous values; some are categorical. For example, the transport mode can be "land" or "air", written as ["land", "air"]; the sender customer type can be "monthly settlement", "member", or "individual", written as ["monthly settlement", "member", "individual"]; and the origin city code can be "Shenzhen", "Guangzhou", "Shanghai", or "Beijing", written as ["Shenzhen", "Guangzhou", "Shanghai", "Beijing"]. Such features usually need to be digitized. For a sample such as ["land", "member", "Beijing"], the most direct approach is to replace each category with its serial index: [0,1,3]. However, features processed this way cannot be fed directly into a machine learning algorithm. To solve this problem, one-hot encoding can be applied to the sample ["land", "member", "Beijing"]: "land" is encoded as [0001,0000], "member" as [0000,0010,0000], and "Beijing" as [0000,0000,0000,1000].
The complete digitized feature vector of the sample is:
[0001,0000,0000,0010,0000,0000,0000,0000,1000];
This can be abbreviated as [1,0,0,1,0,0,0,0,1]. Data processed in this way becomes very sparse.
To address this sparsity, the one-hot encoded data can be further converted to libsvm format; the sample above then compresses to 1 1:1 4:1 9:1.
One-hot encoded data can be converted to libsvm format as follows:
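The abbreviated vector can be reproduced with a few lines of Python. This is a minimal sketch, assuming the three category lists are ordered as in the text (land before air, monthly settlement before member before individual, Shenzhen through Beijing); the English identifiers and the `one_hot` helper are illustrative names, not part of the source.

```python
# Category vocabularies; the order of values inside each list fixes the
# position of the "1" bit. Identifiers are illustrative assumptions.
VOCAB = {
    "transport_type": ["land", "air"],
    "custom_type": ["pay_monthly", "member", "individual"],
    "district": ["shenzhen", "guangzhou", "shanghai", "beijing"],
}

def one_hot(sample: dict) -> list:
    """Concatenate one indicator vector per feature, in VOCAB order."""
    encoded = []
    for feature, values in VOCAB.items():
        encoded.extend(1 if sample[feature] == v else 0 for v in values)
    return encoded

sample = {"transport_type": "land", "custom_type": "member", "district": "beijing"}
print(one_hot(sample))  # [1, 0, 0, 1, 0, 0, 0, 0, 1]
```

Only three of the nine positions are set, which is the sparsity the text refers to; the share of zeros grows with the number of categories per feature.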
A) The structured data is shown in Table 1 below:
label  transport_type  custom_type  district
1      land            member       beijing
0      land            pay_monthly  shanghai
1      air             individual   guangzhou
1      land            individual   shenzhen
0      air             member       shanghai
0      air             member       shenzhen
Table 1
The parameters in Table 1 are defined as follows:
transport_type = ["land", "air"] (note: transport_type is the transport mode; land: land transport; air: air transport)
custom_type = ["pay_monthly", "member", "individual"] (note: custom_type is the sender customer type; pay_monthly: monthly settlement; member: member; individual: individual customer)
district = ["shenzhen", "guangzhou", "shanghai", "beijing"] (note: district is the origin district; shenzhen: Shenzhen; guangzhou: Guangzhou; shanghai: Shanghai; beijing: Beijing)
B) One-hot encoding is performed, as shown in Table 2.
Table 2
C) The result is compressed into libsvm format. After compression, fields whose value is 0 are dropped and only the non-zero fields remain, each identified by its position among the 1st to 9th fields:
1 1:1 4:1 9:1
0 1:1 3:1 8:1
1 2:1 5:1 7:1
1 1:1 5:1 6:1
0 2:1 4:1 8:1
0 2:1 4:1 6:1
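Steps A) to C) can be sketched in Python as follows. The sketch assumes each libsvm row reads as the label followed by space-separated `index:value` pairs with 1-based indices, which is the standard libsvm text layout; the `to_libsvm` helper is an illustrative name.

```python
# Vocabularies and rows copied from Table 1; list order defines field indices.
VOCAB = [
    ("transport_type", ["land", "air"]),
    ("custom_type", ["pay_monthly", "member", "individual"]),
    ("district", ["shenzhen", "guangzhou", "shanghai", "beijing"]),
]

ROWS = [
    (1, "land", "member", "beijing"),
    (0, "land", "pay_monthly", "shanghai"),
    (1, "air", "individual", "guangzhou"),
    (1, "land", "individual", "shenzhen"),
    (0, "air", "member", "shanghai"),
    (0, "air", "member", "shenzhen"),
]

def to_libsvm(label, values):
    """Emit 'label index:1 ...', listing only the non-zero one-hot fields."""
    parts = [str(label)]
    offset = 0
    for (_, vocab), value in zip(VOCAB, values):
        parts.append(f"{offset + vocab.index(value) + 1}:1")  # 1-based index
        offset += len(vocab)
    return " ".join(parts)

lines = [to_libsvm(row[0], row[1:]) for row in ROWS]
print("\n".join(lines))
```

The first row comes out as `1 1:1 4:1 9:1`: label 1, then field 1 (land), field 4 (member), and field 9 (beijing). Each row stores three index:value pairs instead of nine fields.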
Step 130: input the processed waybill data into the pre-established risk prediction model and output a prediction result that indicates whether the waybill data belongs to a spot-check object.
In the embodiments of the present application, the data processed in step 120 is input into the pre-established risk prediction model, which outputs a prediction result indicating whether the waybill data belongs to a spot-check object.
The pre-established risk prediction model is built by training and testing on historical waybill data with a model training algorithm. It is described by the mapping between the feature parameter set used to build the model and the prediction result, that is, by the influence of that feature parameter set on the prediction result. The model training algorithm can be any machine learning training algorithm, preferably the extreme gradient boosting tree algorithm XGBoost.
By means of the pre-established risk prediction model, the embodiments of the present application improve the efficiency of spot-checking waybills (or express shipments, parcels) and increase the accuracy with which illegal contraband is found through self-inspection, thereby reducing external regulatory risk.
Further, in step 130, the pre-established risk prediction model is obtained by training a machine learning algorithm on a large amount of historical waybill data.
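Steps 110 to 130 can be strung together as in the sketch below. It is illustrative only: `toy_model` is a hypothetical fixed linear score standing in for a trained risk prediction model, the 0.5 decision threshold is an assumption, and the payment type values are made up for the example.

```python
from typing import Callable, Dict, List

def process(waybill: Dict[str, str], vocab: Dict[str, List[str]]) -> List[int]:
    """Step 120 stand-in: one-hot encode the categorical fields."""
    vector = []
    for feature, values in vocab.items():
        vector.extend(1 if waybill.get(feature) == v else 0 for v in values)
    return vector

def predict_spot_check(waybill: Dict[str, str], vocab: Dict[str, List[str]],
                       model: Callable[[List[int]], float],
                       threshold: float = 0.5) -> bool:
    """Steps 110-130: take a waybill, process it, and threshold the risk score."""
    features = process(waybill, vocab)
    return model(features) >= threshold

def toy_model(features: List[int]) -> float:
    """Hypothetical stand-in for the trained model: a fixed linear score."""
    weights = [0.1, 0.6, 0.2, 0.1]
    return sum(w * x for w, x in zip(weights, features))

# Hypothetical vocabulary; only transport_type values come from the text.
vocab = {"transport_type": ["land", "air"], "payment_type": ["prepaid", "collect"]}
flagged = predict_spot_check(
    {"transport_type": "air", "payment_type": "prepaid"}, vocab, toy_model
)
print(flagged)  # True
```

Passing the model as a callable keeps the pipeline independent of the training algorithm, so a trained model's scoring function could be substituted for `toy_model` without changing the other steps.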
With further reference to Fig. 2, Fig. 2 shows a flow diagram of the risk prediction model building method provided by another embodiment of the present application.
In this embodiment, risk prediction is performed on newly received waybill data by the pre-established risk prediction model, and the prediction steps are the same as the method steps described for Fig. 1. The pre-established risk prediction model is obtained by training a model training algorithm on historical waybill data. The model training algorithm can be any machine learning or deep learning algorithm. Machine learning algorithms include random forest, gradient boosting trees, the extreme gradient boosting tree algorithm XGBoost, and so on. Deep learning algorithms include neural network models and so on.
As shown in Fig. 2, which illustrates the process of building the risk prediction model, the method includes:
Step 210: obtain historical waybill data for a predetermined time range and divide it into a training data set and a test data set.
In the embodiments of the present application, the pre-established risk prediction model is obtained by training a machine learning algorithm on a large amount of historical waybill data. Historical waybill data is obtained for a predetermined time range, which can be 1 year or 2 years and is not limited here. The historical waybill data for that range is divided into a training data set and a test data set at a predetermined ratio, such as 7:3, 8:2, or 9:1. The ratio is also not limited here and can be adjusted as needed to obtain the best results.
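A ratio-based split such as 8:2 might look like the sketch below. The source does not say whether the split is random or chronological; a seeded random shuffle is assumed here, and the function name and seed are illustrative.

```python
import random

def split_dataset(records, train_ratio=0.8, seed=42):
    """Shuffle a copy of the records and cut it at the given ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

history = list(range(100))  # stand-in for historical waybill records
train, test = split_dataset(history, train_ratio=0.8)
print(len(train), len(test))  # 80 20
```

Changing `train_ratio` to 0.7 or 0.9 gives the other ratios mentioned in the text.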
Step 220: select, from the feature parameters of the historical waybill data, the feature parameter set for building the risk prediction model.
Before model training, the feature parameter set for building the risk prediction model must be screened. Feature selection discards irrelevant or redundant features to find an optimal feature subset, which reduces the number of features, improves model accuracy, and shortens run time. Feature screening can be performed on the historical waybill data set with a filter approach. Alternatively, a subset search with a random strategy can be performed under the Las Vegas framework, taking the final result as the feature set. Feature selection can also be merged into the machine learning training process, with norm regularization reducing the risk of overfitting.
In the embodiments of the present application, the feature parameter set for building the risk prediction model can be selected from the historical waybill data set by a machine learning algorithm. The feature parameter set includes at least one of the following: a sender feature parameter subset and a waybill feature parameter subset.
The sender feature parameter subset includes at least one of the following: first sending time, last sending time, first receiving time, last receiving time, total number of items sent, total number of items received, historical consigned-item content, number of historical consigned-item content types, and so on. The historical consigned-item content can be further subdivided into historical received content and historical sent content.
The waybill feature parameter subset includes at least one of the following: origin address, origin district code, origin city code, origin outlet type, origin outlet longitude and latitude, destination address, destination district code, destination city code, destination outlet type, destination outlet longitude and latitude, item weight, transport mode, payment type, freight amount, transit time, and so on.
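As one concrete instance of the filter approach, features can be scored independently of any model and dropped below a cutoff. The sketch below uses variance as the score; the source does not name a scoring criterion, so the variance filter, the 0.01 threshold, and the feature names are all assumptions.

```python
def variance(column):
    """Population variance of one feature column."""
    mean = sum(column) / len(column)
    return sum((x - mean) ** 2 for x in column) / len(column)

def filter_features(rows, names, min_variance=0.01):
    """Keep the features whose variance across rows exceeds the threshold."""
    columns = list(zip(*rows))
    return [name for name, col in zip(names, columns) if variance(col) > min_variance]

# Hypothetical feature names and values for illustration.
names = ["weight_kg", "is_air", "constant_flag"]
rows = [
    (1.2, 1, 1),
    (0.4, 0, 1),
    (3.0, 1, 1),
    (2.1, 0, 1),
]
selected = filter_features(rows, names)
print(selected)  # ['weight_kg', 'is_air']
```

A feature that never varies, such as `constant_flag` here, carries no information for separating spot-check objects from other waybills, so removing it reduces the feature count at no cost.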
Step 230: using the training data set and the test data set, learn by training with the model training algorithm the mapping between the feature parameter set for building the risk prediction model and the prediction result; this mapping serves as the risk prediction model.
With the feature parameter set obtained in step 220, the training data set is used to determine the undetermined parameters of the risk prediction model. For example, an initial value for the waybill spot-check prediction and a number of iterations are set, and the first and second derivatives of the loss function are computed. The candidate tree structures are then traversed to find the one that minimizes the objective function, and the optimal weight of each leaf node is calculated. The decision trees built in each iteration are summed, yielding the mapping between the feature parameter set for building the risk prediction model and the prediction result; this mapping serves as the risk prediction model.
Preferably, the model training algorithm is the extreme gradient boosting tree algorithm XGBoost. This algorithm is easy to distribute and parallelize, is suited to large-scale data sets with many data categories and complex relationships among the data, can improve the accuracy of waybill spot checks, and effectively raises the efficiency of express spot checks.
Taking model training with the XGBoost algorithm as an example, the historical waybill data is divided into a training data set (80%) and a test data set (20%). The model is trained on the training data set under tuned model parameters to determine the structure of all decision trees. It is then validated on the test data set, and its overall accuracy is improved according to an optimization criterion, for example root mean square error (RMSE). The main model parameters include the learning rate, the maximum tree depth, the minimum leaf-node sample weight, the minimum loss reduction required for a node split, the sample sampling rate, the feature sampling rate, the number of iterations, and the L2 regularization term. The learning rate and the best number of decision trees can be determined by cross-validation, and overfitting can be effectively prevented by tuning the regularization parameters.
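The role of the first and second derivatives can be made concrete with the closed-form leaf weight used in second-order gradient boosting, w* = -G / (H + λ), where G and H are the sums of the first and second derivatives over the samples in a leaf. The sketch below assumes a logistic loss and λ = 1, neither of which is stated in the source, and the function names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_hess(label, raw_score):
    """First and second derivatives of the logistic loss w.r.t. the raw score."""
    p = sigmoid(raw_score)
    return p - label, p * (1.0 - p)

def leaf_weight(grads, hessians, reg_lambda=1.0):
    """Optimal leaf weight w* = -G / (H + lambda) for the samples in one leaf."""
    return -sum(grads) / (sum(hessians) + reg_lambda)

def leaf_score(grads, hessians, reg_lambda=1.0):
    """Objective contribution of one leaf: -0.5 * G^2 / (H + lambda)."""
    g = sum(grads)
    return -0.5 * g * g / (sum(hessians) + reg_lambda)

labels = [1, 1, 0, 1]   # spot-check labels of the samples falling in one leaf
initial_score = 0.0     # initial prediction before any tree is added
pairs = [grad_hess(y, initial_score) for y in labels]
grads = [g for g, _ in pairs]
hessians = [h for _, h in pairs]
w = leaf_weight(grads, hessians)
print(w)  # 0.5
```

Comparing `leaf_score` totals before and after a candidate split is how a tree structure that lowers the objective can be identified; the more negative the total, the better the structure.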
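The parameters listed above correspond to documented XGBoost parameter names, shown below as a plain dictionary together with an RMSE helper. The values are placeholders chosen for illustration, not settings taken from the source, and the sketch deliberately stops short of calling the library.

```python
import math

# Keys are XGBoost's documented parameter names; values are placeholders.
params = {
    "eta": 0.1,               # learning rate
    "max_depth": 6,           # maximum tree depth
    "min_child_weight": 1,    # minimum sum of sample weights in a leaf
    "gamma": 0.0,             # minimum loss reduction required to split a node
    "subsample": 0.8,         # sample (row) sampling rate
    "colsample_bytree": 0.8,  # feature sampling rate
    "lambda": 1.0,            # L2 regularization term
}
num_boost_round = 100         # number of boosting iterations

def rmse(y_true, y_pred):
    """Root mean square error between labels and predicted scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

error = rmse([1, 0, 1, 0], [1.0, 0.0, 0.0, 0.0])
print(error)  # 0.5
```

In a cross-validation loop, `eta` and `num_boost_round` would be varied and the combination with the lowest validation RMSE kept, which matches the tuning procedure the text describes.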
Optionally, before step 210, the method further includes: performing data processing on the historical waybill data.
In the embodiments of the present application, the data processing may include:
Optionally, converting the above data according to pre-generated data conversion rules;
The pre-generated data conversion rules can be obtained by filtering positive sample data out of the training data set and then converting the positive sample data to numeric form; the rules learned for this numeric conversion serve as the data conversion rules. The above data includes new waybill data and/or historical waybill data.
The new waybill data and/or the historical waybill data, or the training data set and test data set into which the historical waybill data is divided, are converted according to the pre-generated data conversion rules.
Optionally, one-hot encoding the converted data;
Optionally, converting the one-hot encoded data to libsvm format.
One-hot encoding, also known as one-bit-effective encoding, uses a state register to encode the states: each state has its own independent register bit, and only one bit is active at any time. In practical machine learning tasks, the features of waybill data are not always continuous values; some are categorical. For example, the transport mode can be "land" or "air", written as ["land", "air"]; the sender customer type can be "monthly settlement", "member", or "individual", written as ["monthly settlement", "member", "individual"]; and the origin city code can be "Shenzhen", "Guangzhou", "Shanghai", or "Beijing", written as ["Shenzhen", "Guangzhou", "Shanghai", "Beijing"]. Such features usually need to be digitized. For a sample such as ["land", "member", "Beijing"], the most direct approach is to replace each category with its serial index: [0,1,3]. However, features processed this way cannot be fed directly into a machine learning algorithm. To solve this problem, one-hot encoding can be applied to the sample ["land", "member", "Beijing"]: "land" is encoded as [0001,0000], "member" as [0000,0010,0000], and "Beijing" as [0000,0000,0000,1000].
The complete digitized feature vector of the sample is:
[0001,0000,0000,0010,0000,0000,0000,0000,1000];
This can be abbreviated as [1,0,0,1,0,0,0,0,1]. Data processed in this way becomes very sparse.
To address this sparsity, the one-hot encoded data can be further converted to libsvm format; the sample above then compresses to 1 1:1 4:1 9:1.
One-hot encoded data can be converted to libsvm format as follows:
A) The structured data is shown in Table 1 below:
label  transport_type  custom_type  district
1      land            member       beijing
0      land            pay_monthly  shanghai
1      air             individual   guangzhou
1      land            individual   shenzhen
0      air             member       shanghai
0      air             member       shenzhen
Table 1
The parameters in Table 1 are defined as follows:
transport_type = ["land", "air"] (note: transport_type is the transport mode; land: land transport; air: air transport)
custom_type = ["pay_monthly", "member", "individual"] (note: custom_type is the sender customer type; pay_monthly: monthly settlement; member: member; individual: individual customer)
district = ["shenzhen", "guangzhou", "shanghai", "beijing"] (note: district is the origin district; shenzhen: Shenzhen; guangzhou: Guangzhou; shanghai: Shanghai; beijing: Beijing)
B) One-hot encoding is performed, as shown in Table 2.
Table 2
C) The result is compressed into libsvm format. After compression, fields whose value is 0 are dropped and only the non-zero fields remain, each identified by its position among the 1st to 9th fields:
1 1:1 4:1 9:1
0 1:1 3:1 8:1
1 2:1 5:1 7:1
1 1:1 5:1 6:1
0 2:1 4:1 8:1
0 2:1 4:1 6:1
It should be noted that although the operations of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that order, or that all of the illustrated operations must be performed, to achieve the desired result. On the contrary, the steps depicted in the flow charts can be executed in a different order. Additionally or alternatively, certain steps can be omitted, multiple steps can be combined into one step, and/or one step can be split into multiple steps.
With further reference to Fig. 3, Fig. 3 shows a structural diagram of the spot-check object prediction apparatus provided by the embodiments of the present application.
As shown in Fig. 3, the apparatus 300 includes:
A receiving unit 310, configured to receive new waybill data.
In the embodiments of the present application, when a new waybill (also called an express waybill or parcel form) is received, the system extracts the new waybill data. The new waybill data includes sender information, consigned-item information, recipient information, sending address information, receiving address information, and so on. The sender information includes: first sending time, last sending time, first receiving time, last receiving time, total number of items sent, total number of items received, consigned-item content, and so on. The sending address information includes: origin address, origin district code, origin city code, origin outlet type, origin outlet longitude and latitude, and so on. The receiving address information includes: destination address, destination district code, destination city code, destination outlet type, destination outlet longitude and latitude, and so on. The consigned-item information includes: item weight, transport mode, payment type, transit time, and so on. This information can be obtained from a Spark cluster or from an information management center.
A data processing unit 320, configured to perform data processing on the new waybill data.
After the new waybill data is obtained, it is converted using a pre-generated data conversion dictionary (also called data conversion rules) and then one-hot encoded. The pre-generated data conversion dictionary is a set of data conversion rules produced by algorithm training on the positive sample data in the historical waybill data.
One-hot encoding, also known as one-bit-effective encoding, uses a register of state bits to encode each state: each state has its own independent register bit, and only one bit is valid at any time. In practical machine learning tasks, the features of waybill data are not always continuous values; some are categorical. For example, the transport mode can be "land transport" or "air transport", written as ["land transport", "air transport"]; the sender customer type can be "monthly settlement", "member", or "individual", written as ["monthly settlement", "member", "individual"]; the origin city code can be "Shenzhen", "Guangzhou", "Shanghai", or "Beijing", written as ["Shenzhen", "Guangzhou", "Shanghai", "Beijing"]. Such features usually need to be digitized. For a sample such as ["land transport", "member", "Beijing"], the most direct way to digitize the categorical values is to serialize them by their position in each list: [0,1,3]. However, features processed in this way cannot be fed directly into a machine learning algorithm. To solve this problem, the sample ["land transport", "member", "Beijing"] can be one-hot encoded: "land transport" is encoded as [0001,0000], "member" is encoded as [0000,0010,0000], and "Beijing" is encoded as [0000,0000,0000,1000].
The complete digitized feature result for the above sample is:
[0001,0000,0000,0010,0000,0000,0000,0000,1000];
This can be abbreviated as [1,0,0,1,0,0,0,0,1]. Data processed in this way have the problem of becoming very sparse.
To address this sparsity, the one-hot encoded data can be further converted into libsvm format. After compression, the above feature data become 1 1:1 4:1 9:1.
The one-hot encoded data can be converted into libsvm format as follows:
A) The structured data are shown in Table 1 below:
label  transport_type  custom_type  district
1      land            member       beijing
0      land            pay_monthly  shanghai
1      air             individual   guangzhou
1      land            individual   shenzhen
0      air             member       shanghai
0      air             member       shenzhen
Table 1
The parameters in Table 1 take the following values:
transport_type = ["land", "air"] (transport mode: transport_type; land transport: land; air transport: air)
custom_type = ["pay_monthly", "member", "individual"] (sender customer type: custom_type; monthly settlement: pay_monthly; member: member; individual: individual)
district = ["shenzhen", "guangzhou", "shanghai", "beijing"] (origin area: district; Shenzhen: shenzhen; Guangzhou: guangzhou; Shanghai: shanghai; Beijing: beijing)
B) One-hot encoding is performed, as shown in Table 2.
Table 2
C) The above result is compressed into libsvm format. After compression, the fields whose value is 0 have been removed; the retained fields are numbered from the 1st to the 9th field.
1 1:1 4:1 9:1
0 1:1 3:1 8:1
1 2:1 5:1 7:1
1 1:1 5:1 6:1
0 2:1 4:1 8:1
0 2:1 4:1 6:1
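The conversion in steps A) to C) can be sketched in a few lines of Python. This is a minimal illustration rather than the applicant's implementation: the function names `one_hot` and `to_libsvm` are hypothetical, and the feature order (transport_type, then custom_type, then district) is assumed from the parameter lists given for Table 1.

```python
# Category vocabularies, in the order given for Table 1 (assumed feature order).
VOCAB = {
    "transport_type": ["land", "air"],
    "custom_type": ["pay_monthly", "member", "individual"],
    "district": ["shenzhen", "guangzhou", "shanghai", "beijing"],
}
COLUMNS = ["transport_type", "custom_type", "district"]


def one_hot(row):
    """Return the dense 0/1 vector for one record (9 positions here)."""
    vector = []
    for column in COLUMNS:
        values = VOCAB[column]
        index = values.index(row[column].lower())
        vector.extend(1 if i == index else 0 for i in range(len(values)))
    return vector


def to_libsvm(label, vector):
    """Keep only the non-zero positions, using 1-based feature indices."""
    pairs = [f"{i}:{v}" for i, v in enumerate(vector, start=1) if v != 0]
    return " ".join([str(label)] + pairs)


# First row of Table 1: label 1, land, member, beijing.
row = {"transport_type": "land", "custom_type": "member", "district": "beijing"}
dense = one_hot(row)
line = to_libsvm(1, dense)
print(dense)  # [1, 0, 0, 1, 0, 0, 0, 0, 1]
print(line)   # 1 1:1 4:1 9:1
```

Only three of the nine positions are stored per record, which is where the compression comes from.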
A prediction unit 330, configured to input the processed waybill data into a pre-established risk prediction model and output a prediction result indicating whether the waybill data belong to a spot-check target.
In the embodiments of the present application, the data processed by the data processing unit 320 are input into the pre-established risk prediction model, and the output prediction result indicates whether the waybill data belong to a spot-check target.
The pre-established risk prediction model is built by training and testing on historical waybill data with a model training algorithm. It is described by the mapping between the feature parameter set used to build the model and the prediction result, that is, by the influence of that feature parameter set on the prediction result. The model training algorithm may be any machine learning training algorithm, preferably the extreme gradient boosting tree (XGBoost) algorithm.
Through the pre-established risk prediction model, the embodiments of the present application improve the efficiency of spot-checking waybills (or express items, or parcels) and increase the accuracy with which illegal and prohibited items are found through self-inspection, thereby reducing the risk under external supervision.
Further, in the prediction unit 330, the pre-established risk prediction model is obtained by training on a large amount of historical waybill data with a machine learning algorithm, and the prediction unit 330 calls the pre-established risk prediction model to predict whether new waybill data belong to a spot-check target.
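As a rough sketch of what calling such a model involves for a boosted-tree classifier: the raw outputs of the individual trees are summed, passed through a logistic function, and compared with a decision threshold. The function name `predict_inspection`, the example tree outputs, and the threshold of 0.5 are assumptions for illustration; the application does not specify them.

```python
import math


def sigmoid(x):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_inspection(tree_outputs, threshold=0.5):
    """Combine per-tree raw scores for one waybill into a decision.

    tree_outputs: the leaf values that each boosted tree assigns to the
    waybill (hypothetical numbers in this sketch).
    Returns (probability, is_spot_check_target).
    """
    probability = sigmoid(sum(tree_outputs))
    return probability, probability >= threshold


prob, flagged = predict_inspection([0.8, 0.3, -0.2])
print(round(prob, 4), flagged)
```

In practice the threshold would be tuned against the cost of missed contraband versus the cost of unnecessary inspections.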
Referring to Fig. 4, Fig. 4 shows a schematic structural diagram of the apparatus for establishing the risk prediction model provided by the embodiments of the present application.
In the embodiments of the present application, the risk prediction model established according to Fig. 4 is applied in the apparatus 300 to predict whether received new waybill data belong to a spot-check target. The pre-established risk prediction model is obtained by training on historical waybill data with a model training algorithm. The model training algorithm may be any machine learning or deep learning algorithm. Machine learning algorithms include random forest, gradient boosting tree, the extreme gradient boosting tree (XGBoost) algorithm, and the like. Deep learning algorithms include neural network models and the like.
As shown in Fig. 4, which illustrates the internal structure of the apparatus for establishing the risk prediction model, the apparatus 400 includes:
An acquisition and division subunit 410, configured to obtain historical waybill data within a predetermined time range and divide the historical waybill data into a training data set and a test data set.
In the embodiments of the present application, the pre-established risk prediction model is learned from a large amount of historical waybill data using a machine learning algorithm. Historical waybill data within a predetermined time range are obtained; the predetermined time range may be 1 year or 2 years and is not limited here. The historical waybill data within the predetermined time range are divided into a training data set and a test data set according to a predetermined ratio, which may be set to 7:3, 8:2, 9:1, or the like. The ratio is likewise not limited here and may be adjusted as needed to obtain the best result.
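The ratio-based division can be illustrated as follows. This is a minimal sketch under the assumption that records are shuffled before being cut; the application does not say whether the split is random or chronological, and the names `split_waybills` and `waybill_id` are hypothetical.

```python
import random


def split_waybills(records, train_ratio=0.8, seed=42):
    """Shuffle a copy of the records and cut it at the requested ratio."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


history = [{"waybill_id": i} for i in range(10)]
train_set, test_set = split_waybills(history, train_ratio=0.7)
print(len(train_set), len(test_set))  # 7 3
```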
A feature selection subunit 420, configured to select, from the feature parameters of the historical waybill data, the feature parameter set used to establish the risk prediction model.
Before model training, the feature parameter set used to establish the risk prediction model must be screened. Feature selection discards irrelevant or redundant features to find an optimal feature subset, which reduces the number of features, improves model accuracy, and shortens run time. Feature screening of the historical waybill data set may use a filter method; alternatively, a subset search with a randomized strategy under the Las Vegas method framework may be performed, with the final result taken as the feature set; or feature selection may be integrated into the machine learning training process, with the risk of over-fitting reduced by norm regularization.
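A filter-style screen can be sketched as below. The application does not name the scoring criterion, so this example assumes a simple variance threshold, which drops features that are (nearly) constant across the data set; the function name, feature names, and threshold are all hypothetical.

```python
from statistics import pvariance


def filter_features(rows, feature_names, min_variance=0.01):
    """Keep features whose population variance exceeds min_variance."""
    kept = []
    for name in feature_names:
        values = [row[name] for row in rows]
        if pvariance(values) > min_variance:
            kept.append(name)
    return kept


rows = [
    {"weight_kg": 1.2, "is_air": 1, "constant_flag": 1},
    {"weight_kg": 3.5, "is_air": 0, "constant_flag": 1},
    {"weight_kg": 0.8, "is_air": 1, "constant_flag": 1},
]
selected = filter_features(rows, ["weight_kg", "is_air", "constant_flag"])
print(selected)  # ['weight_kg', 'is_air']
```

A filter like this runs independently of the model, which makes it cheap, whereas the embedded approach mentioned above selects features as a side effect of regularized training.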
In the embodiments of the present application, the feature parameter set used to establish the risk prediction model may be selected from the historical waybill data set by a machine learning algorithm. The feature parameter set includes at least one or more of the following: a sender feature parameter subset and a waybill feature parameter subset;
The sender feature parameter subset includes at least one of the following: first sending time, last sending time, first receiving time, last receiving time, total number of items sent, total number of items received, historical consigned-item content, quantity of historical consigned-item content, and the like. The historical consigned-item content can be further subdivided into historical received content and historical sent content.
The waybill feature parameter subset includes at least one of the following: origin address, origin area code, origin city code, origin outlet type, origin outlet longitude and latitude, destination address, destination area code, destination city code, destination outlet type, destination outlet longitude and latitude, consigned-item weight, consigned-item transport mode, payment type, freight amount, transit time, and the like.
A mapping relationship establishing subunit 430, configured to train with the model training algorithm on the training data set and test data set to obtain the mapping between the feature parameter set used to establish the risk prediction model and the prediction result; this mapping serves as the risk prediction model.
The feature parameter set used to establish the risk prediction model is obtained by the feature selection subunit 420, and the undetermined parameters of the risk prediction model are determined by training on the training data set. For example, an initial value for the waybill spot-check prediction and the number of iterations are set. The first and second derivatives of the loss function are then computed, each candidate tree structure is traversed to find the structure that minimizes the objective function, and the optimal weight of each leaf node is calculated. The decision trees built in each iteration are added together to obtain the mapping between the feature parameter set and the prediction result, and this mapping serves as the risk prediction model.
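The role of the first and second derivatives can be made concrete with the standard gradient-boosting formulas: for a leaf holding gradient sum G and second-derivative sum H, the optimal weight is -G / (H + lambda), and the gain of a split is the resulting drop in the regularized objective. The sketch below assumes a logistic loss and hypothetical values for lambda and gamma; the application does not state which loss function is used.

```python
import math


def logistic_grad_hess(labels, raw_scores):
    """First and second derivatives of log loss w.r.t. the raw score."""
    grads, hessians = [], []
    for y, score in zip(labels, raw_scores):
        p = 1.0 / (1.0 + math.exp(-score))
        grads.append(p - y)
        hessians.append(p * (1.0 - p))
    return grads, hessians


def leaf_weight(grads, hessians, reg_lambda=1.0):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -sum(grads) / (sum(hessians) + reg_lambda)


def split_gain(gl, hl, gr, hr, reg_lambda=1.0, gamma=0.0):
    """Objective reduction from splitting one leaf into left and right."""
    def score(g, h):
        return g * g / (h + reg_lambda)
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr)) - gamma


# Four waybills, initial raw score 0 for each (probability 0.5).
labels = [1, 1, 0, 0]
grads, hessians = logistic_grad_hess(labels, [0.0] * 4)

# Candidate split: the two positives go left, the two negatives go right.
gain = split_gain(sum(grads[:2]), sum(hessians[:2]),
                  sum(grads[2:]), sum(hessians[2:]))
left_weight = leaf_weight(grads[:2], hessians[:2])
print(round(gain, 4), round(left_weight, 4))
```

A tree-building routine would evaluate `split_gain` for every candidate split and keep the one with the largest positive gain.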
Preferably, the model training algorithm is the extreme gradient boosting tree (XGBoost) algorithm. This algorithm is easy to distribute and parallelize and is suited to large-scale data sets with many data types and complex relationships among the data. It can improve the accuracy of waybill spot checks and effectively raise the efficiency of express spot checks.
Taking model training with the XGBoost algorithm as an example, the historical waybill data are divided into a training data set (80%) and a test data set (20%). The model is trained on the training data set under optimized model parameters to determine the structure of all decision trees, and is then validated on the test data set, with model accuracy improved according to an optimization criterion such as root-mean-square error (RMSE). The main parameters of the model include the learning rate, the maximum tree depth, the minimum leaf-node sample weight, the minimum loss-function reduction required to split a node, the sample subsampling rate, the feature subsampling rate, the number of iterations, the L2 regularization term, and the like. The learning rate and the best number of decision trees can be determined by cross-validation, and over-fitting can be effectively prevented by tuning the regularization parameters.
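The two evaluation tools mentioned here, RMSE and cross-validation, can be sketched with the standard library alone. The helper names `rmse` and `k_fold_indices` are hypothetical, and the folds are taken as contiguous index ranges for simplicity; a real pipeline would usually shuffle first.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between labels and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, valid
        start += size


error = rmse([1, 0, 1, 0], [1, 0, 0, 0])
folds = list(k_fold_indices(5, 2))
print(error)  # 0.5
print(folds)
```

Each candidate setting (for example a learning rate) would be trained on every `train` index list and scored on the matching `valid` list, and the setting with the lowest average error kept.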
Optionally, before the acquisition and division subunit 410, the apparatus further includes a data processing unit, configured to perform data processing on the historical waybill data.
In the embodiments of the present application, the data processing unit may include:
Optionally, a data conversion subunit, configured to convert the above data according to a pre-generated data conversion rule;
The pre-generated data conversion rule may be obtained by screening the positive sample data out of the training data set, then digitizing the positive sample data and learning the rule by which the positive sample values are digitized; that rule serves as the data conversion rule. The above data include new waybill data and/or historical waybill data. According to the pre-generated data conversion rule, the new waybill data and/or historical waybill data, or the training data set and test data set into which the historical waybill data were divided, are converted.
Optionally, a data encoding subunit, configured to one-hot encode the converted data;
Optionally, a data compression subunit, configured to compress the encoded data. The data compression subunit converts the one-hot encoded data into libsvm format, thereby compressing the data.
After new waybill data are received, the above data processing unit uses the data conversion subunit to convert the new waybill data and/or historical waybill data with the pre-generated data conversion dictionary (also called a data conversion rule). The data encoding subunit then one-hot encodes the conversion result of the data conversion subunit. One-hot encoding, also known as one-bit-effective encoding, uses a register of state bits to encode each state: each state has its own independent register bit, and only one bit is valid at any time. In practical machine learning tasks, the features of waybill data are not always continuous values; some are categorical. For example, the transport mode can be "land transport" or "air transport", written as ["land transport", "air transport"]; the sender customer type can be "monthly settlement", "member", or "individual", written as ["monthly settlement", "member", "individual"]; the origin city code can be "Shenzhen", "Guangzhou", "Shanghai", or "Beijing", written as ["Shenzhen", "Guangzhou", "Shanghai", "Beijing"]. Such features usually need to be digitized. For a sample such as ["land transport", "member", "Beijing"], the most direct way to digitize the categorical values is to serialize them by their position in each list: [0,1,3]. However, features processed in this way cannot be fed directly into a machine learning algorithm. To solve this problem, the sample ["land transport", "member", "Beijing"] can be one-hot encoded: "land transport" is encoded as [0001,0000], "member" is encoded as [0000,0010,0000], and "Beijing" is encoded as [0000,0000,0000,1000].
The complete digitized feature result for the above sample is:
[0001,0000,0000,0010,0000,0000,0000,0000,1000];
This can be abbreviated as [1,0,0,1,0,0,0,0,1]. Data processed in this way have the problem of becoming very sparse.
To address this sparsity, the data compression subunit can further convert the one-hot encoded data into libsvm format. After the data compression subunit compresses the above feature data, the result is 1 1:1 4:1 9:1.
The one-hot encoded data can be converted into libsvm format as follows:
A) The structured data are shown in Table 1 below:
label  transport_type  custom_type  district
1      land            member       beijing
0      land            pay_monthly  shanghai
1      air             individual   guangzhou
1      land            individual   shenzhen
0      air             member       shanghai
0      air             member       shenzhen
Table 1
The parameters in Table 1 take the following values:
transport_type = ["land", "air"] (transport mode: transport_type; land transport: land; air transport: air)
custom_type = ["pay_monthly", "member", "individual"] (sender customer type: custom_type; monthly settlement: pay_monthly; member: member; individual: individual)
district = ["shenzhen", "guangzhou", "shanghai", "beijing"] (origin area: district; Shenzhen: shenzhen; Guangzhou: guangzhou; Shanghai: shanghai; Beijing: beijing)
B) One-hot encoding is performed, as shown in Table 2.
Table 2
C) The above result is compressed into libsvm format. After compression, the fields whose value is 0 have been removed; the retained fields are numbered from the 1st to the 9th field:
1 1:1 4:1 9:1
0 1:1 3:1 8:1
1 2:1 5:1 7:1
1 1:1 5:1 6:1
0 2:1 4:1 8:1
0 2:1 4:1 6:1
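Going the other way, a libsvm record can be expanded back into its dense one-hot vector, which is a quick check that no information is lost by the compression. The sketch below assumes space-separated `index:value` pairs with 1-based indices (the usual libsvm convention) and a known total of 9 features; `parse_libsvm_line` is a hypothetical helper name.

```python
def parse_libsvm_line(line, n_features):
    """Expand one libsvm record into (label, dense_vector)."""
    parts = line.split()
    label = int(parts[0])
    dense = [0] * n_features
    for item in parts[1:]:
        index, value = item.split(":")
        dense[int(index) - 1] = int(value)  # libsvm indices start at 1
    return label, dense


label, dense_vector = parse_libsvm_line("1 1:1 4:1 9:1", n_features=9)
print(label, dense_vector)  # 1 [1, 0, 0, 1, 0, 0, 0, 0, 1]
```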
The apparatuses 300 and 400 are deployed on a Spark cluster, so that each new waybill can be evaluated and the result transmitted to the business front end. For example, if a newly received waybill A belongs to a spot-check target, the system receives the prediction result output by the apparatus 300, then generates an alert or another form of message and pushes it to the business front end. After the business front end obtains the message, it either passes it to staff or automatically pushes it to the regional spot-check point, prompting that the waybill be inspected.
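The hand-off to the business front end can be pictured as building a small message whenever a waybill is flagged. The message layout below (field names `waybill_id`, `action`, `score`) and the threshold are purely illustrative assumptions; the application does not define a message format or transport.

```python
import json


def build_alert(waybill_id, probability, threshold=0.5):
    """Return a JSON alert string for a flagged waybill, else None."""
    if probability < threshold:
        return None
    message = {
        "waybill_id": waybill_id,
        "action": "inspect",
        "score": round(probability, 4),
    }
    return json.dumps(message, sort_keys=True)


alert = build_alert("A", 0.91)
print(alert)
print(build_alert("B", 0.12))  # None: no alert for low-risk waybills
```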
By making predictions on waybill data and then performing targeted spot checks, the embodiments of the present application achieve accurate inspection of waybills throughout the transit process and help safeguard the logistics industry.
It should be understood that the units or modules described in the apparatuses 300 and 400 correspond to the steps of the method described with reference to Figs. 1-2. The operations and features described above for the method therefore apply equally to the apparatuses 300 and 400 and the units they contain, and are not repeated here. The apparatus 400 may be implemented in advance in a browser or other security application of an electronic device, or may be loaded into the browser or security application of the electronic device by downloading or similar means. The corresponding units in the apparatuses 300 and 400 may cooperate with units in the electronic device to implement the solutions of the embodiments of the present application.
Referring now to Fig. 5, it shows a schematic structural diagram of a computer system 500 suitable for implementing a terminal device or server of the embodiments of the present application.
As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores the various programs and data required for the operation of the system 500. The CPU 501, ROM 502, and RAM 503 are connected to one another by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, optical disc, magneto-optical disk, or semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to Figs. 1-2 may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the method of Figs. 1-2. In such embodiments, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the boxes may occur in an order different from that noted in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of such boxes, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented in software or in hardware. The described units or modules may also be provided in a processor; for example, it may be described as: a processor including a receiving unit, a data processing unit, and a prediction unit. The names of these units or modules do not, in certain cases, limit the units or modules themselves; for example, the receiving unit may also be described as "a unit for receiving new waybill data".
In another aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium may be the one included in the apparatus described in the above embodiments, or it may exist separately without being assembled into a device. The computer-readable storage medium stores one or more programs, which are used by one or more processors to execute the spot-check target prediction method described in the present application.
The above description covers only the preferred embodiments of the present application and explains the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features; it also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features of similar function disclosed in (but not limited to) the present application.

Claims (14)

1. A spot-check target prediction method, characterized in that the method comprises:
receiving new waybill data;
performing data processing on the waybill data;
inputting the processed waybill data into a pre-established risk prediction model, and outputting a prediction result indicating whether the waybill data belong to a spot-check target.
2. The method according to claim 1, characterized in that
the pre-established risk prediction model is obtained by training on historical waybill data according to a model training algorithm.
3. The method according to claim 2, characterized in that the training on historical waybill data according to a model training algorithm comprises:
obtaining historical waybill data within a predetermined time range, and dividing the historical waybill data into a training data set and a test data set;
selecting, from the feature parameters of the historical waybill data, a feature parameter set for establishing the risk prediction model;
training according to the model training algorithm on the training data set and the test data set to obtain a mapping between the feature parameter set for establishing the risk prediction model and the prediction result, the mapping serving as the risk prediction model.
4. The method according to claim 3, characterized in that the feature parameter set for establishing the risk prediction model includes at least one or more of the following: a sender feature parameter subset and a waybill feature parameter subset;
wherein the sender feature parameter subset includes at least one of the following: first sending time, last sending time, first receiving time, last receiving time, total number of items sent, total number of items received, historical consigned-item content, and historical consigned-item quantity;
wherein the waybill feature parameter subset includes at least one of the following: origin address, origin area code, origin city code, origin outlet type, origin outlet longitude and latitude, destination address, destination area code, destination city code, destination outlet type, destination outlet longitude and latitude, consigned-item weight, consigned-item transport mode, payment type, freight amount, and transit time.
5. The method according to claim 2 or 3, characterized in that the model training algorithm is the extreme gradient boosting tree (XGBoost) algorithm.
6. The method according to claim 3, characterized in that, after the obtaining of the historical waybill data within the predetermined time range, the method comprises:
performing the data processing on the historical waybill data.
7. The method according to any one of claims 1-6, characterized in that the data processing includes at least one of the following:
converting the data according to a pre-generated data conversion rule;
one-hot encoding the converted data;
compressing the encoded data.
8. A spot-check target prediction apparatus, characterized in that the apparatus comprises:
a receiving unit, configured to receive new waybill data;
a data processing unit, configured to perform data processing on the waybill data;
a prediction unit, configured to input the processed waybill data into a pre-established risk prediction model and output a prediction result indicating whether the waybill data belong to a spot-check target.
9. The apparatus according to claim 8, characterized in that the apparatus comprises:
a model establishing unit, configured to obtain the risk prediction model by training on historical waybill data according to a model training algorithm.
10. The apparatus according to claim 9, characterized in that the model establishing unit comprises:
an acquisition and division subunit, configured to obtain historical waybill data within a predetermined time range and divide the historical waybill data into a training data set and a test data set;
a feature selection subunit, configured to select, from the feature parameters of the historical waybill data, a feature parameter set for establishing the risk prediction model;
a mapping relationship establishing subunit, configured to train according to the model training algorithm on the training data set and the test data set to obtain a mapping between the feature parameter set for establishing the risk prediction model and the prediction result, the mapping serving as the risk prediction model.
11. The apparatus according to claim 10, characterized in that, after the acquisition and division subunit, the apparatus comprises:
a data processing unit, configured to perform the data processing on the historical waybill data.
12. The apparatus according to any one of claims 8-11, characterized in that the data processing unit includes at least one of the following subunits:
a data conversion subunit, configured to convert the data according to a pre-generated data conversion rule;
a data encoding subunit, configured to one-hot encode the converted data;
a data compression subunit, configured to compress the encoded data.
13. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1-7 when executing the program.
14. A computer-readable storage medium storing a computer program, characterized in that:
the computer program, when executed by a processor, implements the method according to any one of claims 1-7.
CN201810232272.2A 2018-03-20 2018-03-20 Inspect object prediction method, apparatus, equipment and its storage medium by random samples Pending CN108520335A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810232272.2A CN108520335A (en) 2018-03-20 2018-03-20 Inspect object prediction method, apparatus, equipment and its storage medium by random samples

Publications (1)

Publication Number Publication Date
CN108520335A true CN108520335A (en) 2018-09-11

Family

ID=63433786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810232272.2A Pending CN108520335A (en) 2018-03-20 2018-03-20 Inspect object prediction method, apparatus, equipment and its storage medium by random samples

Country Status (1)

Country Link
CN (1) CN108520335A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105701681A (en) * 2016-01-06 2016-06-22 北京京东尚科信息技术有限公司 Method and device used for predicting order amount
CN106651232A (en) * 2015-11-02 2017-05-10 阿里巴巴集团控股有限公司 Waybill number data analysis method and waybill number data analysis device
US20170161661A1 (en) * 2015-12-07 2017-06-08 Sap Se Advisor Generating Multi-representations of Time Series Data
CN107194407A (en) * 2017-05-18 2017-09-22 网易(杭州)网络有限公司 A kind of method and apparatus of image understanding
CN107292418A (en) * 2017-05-23 2017-10-24 顺丰科技有限公司 A kind of waybill is detained Forecasting Methodology
CN107657267A (en) * 2017-08-11 2018-02-02 百度在线网络技术(北京)有限公司 Product potential user method for digging and device
CN107766888A (en) * 2017-10-24 2018-03-06 众安信息技术服务有限公司 Data processing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Geng Yun et al.: "FPGA Implementation of a Real-Time LZW Compression Algorithm", Digital Technology and Application (《数字技术与应用》) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110096967A (en) * 2019-04-10 2019-08-06 同济大学 A kind of road anger driver's hazardous act characteristic variable screening technique based on random forests algorithm
CN111985861A (en) * 2019-05-22 2020-11-24 顺丰科技有限公司 Logistics product light polishing coefficient management method and device and storage medium
CN110288142A (en) * 2019-06-18 2019-09-27 国网上海市电力公司 A kind of engineering based on XGBoost algorithm is exceeded the time limit prediction technique
CN110288142B (en) * 2019-06-18 2023-02-28 国网上海市电力公司 XGboost algorithm-based engineering overrun prediction method
CN110569904A (en) * 2019-09-10 2019-12-13 福建榕基软件股份有限公司 method for constructing machine learning model and computer-readable storage medium
CN113222663A (en) * 2021-05-11 2021-08-06 北京京东振世信息技术有限公司 Data generation method and device, terminal equipment and storage medium
CN116307273A (en) * 2023-05-17 2023-06-23 华中科技大学 Ship motion real-time forecasting method and system based on XGBoost algorithm

Similar Documents

Publication Publication Date Title
CN108520335A (en) Inspect object prediction method, apparatus, equipment and its storage medium by random samples
CN106372402B (en) The parallel method of fuzzy region convolutional neural networks under a kind of big data environment
CN109064000A (en) The methods, devices and systems of natural resources audit
CN106485396A (en) A kind of safety in production hidden troubles removing system
CN112288247B (en) Soil heavy metal risk identification method based on space interaction relationship
Wu et al. Land Cover Mapping Based on Multisource Spatial Data Mining Approach for Climate Simulation: A Case Study in the Farming‐Pastoral Ecotone of North China
CN108428188A (en) Claims Resolution Risk Forecast Method, system, equipment and storage medium
KR102560208B1 (en) System for predicting freight rates of optimized import and export cargo transfort routes
CN113344050B (en) Lithology intelligent recognition method and system based on deep learning
CN110019519A (en) Data processing method, device, storage medium and electronic device
CN111797188B (en) Urban functional area quantitative identification method based on open source geospatial vector data
CN109472075A (en) A kind of base station performance analysis method and system
Kissling et al. Laserfarm–a high-throughput workflow for generating geospatial data products of ecosystem structure from airborne laser scanning point clouds
CN105243503A (en) Coastal zone ecological safety assessment method based on space variables and logistic regression
Bednarik et al. Different ways of landslide geometry interpretation in a process of statistical landslide susceptibility and hazard assessment: Horná Súča (western Slovakia) case study
Yan Evaluation method of ecological tourism carrying capacity of popular scenic spots based on set pair analysis method
CN108460690A (en) Claims Resolution Risk Forecast Method, system, equipment and storage medium
Sickmann et al. Fingerprinting construction sand-supply networks for traceable sourcing
Poorzady et al. Spatial and temporal changes of Hyrcanian forest in Iran
CN115271514A (en) Communication enterprise monitoring method and device, electronic equipment and storage medium
CN109410527A (en) Space weather disaster monitoring and pre-alarming method, system, storage medium and server
CN112100165B (en) Traffic data processing method, system, equipment and medium based on quality assessment
Chuangchang et al. Modelling urban growth over time using grid-digitized method with variance inflation factors applied to spatial correlation
CN114154789A (en) Construction land suitability evaluation method and device combining multi-source data
Donnini et al. National and regional-scale landslide indicators and indexes: Applications in Italy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180911