CN107153906A

CN107153906A - Method and system for determining illegal taxi behavior

Info

Publication number: CN107153906A
Application number: CN201710169987.3A
Authority: CN
Inventors: 庞俊彪; 李恺; 黄晶; 黄庆明; 尹宝才
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2017-03-21
Filing date: 2017-03-21
Publication date: 2017-09-12

Abstract

Embodiments of the present invention provide a method and system for determining illegal taxi behavior. The method includes: obtaining the operation information to be determined corresponding to a vehicle to be determined within a preset time period; calculating, according to the operation information to be determined and using a preset decision model, the illegal probability value of the vehicle to be determined; and, if the illegal probability value is judged to be greater than a preset threshold, determining that the vehicle to be determined is an illegal vehicle. The system is used to perform the method. By calculating the illegal probability value of the vehicle to be determined with a preset decision model, and determining the vehicle to be an illegal vehicle when that value is greater than the preset threshold, the embodiments improve the efficiency of investigating illegal taxi behavior.

Description

Method and system for determining illegal taxi behavior

Technical field

Embodiments of the present invention relate to the technical field of intelligent transportation, and in particular to a method and system for determining illegal taxi behavior.

Background art

In the course of urban development, the continuing growth of the urban population and of the number of vehicles has aggravated traffic congestion. To relieve traffic pressure and make travel more convenient, taxis have become a main means of transport for outings and commuting. Improving the operational organization and service level of the taxi industry has therefore become an urgent requirement of modern urban development.

However, owing to non-standard enterprise operation, the uneven quality of practitioners, the impact of ride-hailing software such as "Didi", rising fuel prices and other factors, and driven by profit, violations and illegal behavior across the taxi industry are trending upward, with very serious consequences. Faced with a huge taxi population, stepping up law enforcement and strengthening supervision has therefore become an important task of industry management. Focusing investigation on illegal vehicles and penalizing them can reduce violations and illegal behavior from the taxi drivers' side, and promotes the supervision of the whole industry.

In the prior art, taxis are supervised either by manually inspecting passing vehicles one by one on road sections or by acting on passenger complaints. Most taxis operate as required and only a small number of vehicles commit violations or illegal behavior, yet screening a large number of taxis to find the illegal vehicles consumes a great deal of manpower and material resources. Investigation is therefore very difficult and its efficiency is very low.

Therefore, how to improve the efficiency of investigating illegal taxi behavior is a problem that urgently needs to be solved.

Summary of the invention

In view of the problems in the prior art, embodiments of the present invention provide a method and system for determining illegal taxi behavior.

In one aspect, an embodiment of the present invention provides a method for determining illegal taxi behavior, including:

obtaining the operation information to be determined corresponding to a vehicle to be determined within a preset time period;

calculating, according to the operation information to be determined and using a preset decision model, the illegal probability value of the vehicle to be determined;

if the illegal probability value is judged to be greater than a preset threshold, determining that the vehicle to be determined is an illegal vehicle.

In another aspect, an embodiment of the present invention provides a system for determining illegal taxi behavior, including:

an acquisition module, configured to obtain the operation information to be determined corresponding to the vehicle to be determined within a preset time period;

a computing module, configured to calculate, according to the operation information to be determined and using a preset decision model, the illegal probability value of the vehicle to be determined;

a determination module, configured to determine that the vehicle to be determined is an illegal vehicle if the illegal probability value is judged to be greater than a preset threshold.

The method and system for determining illegal taxi behavior provided by the embodiments of the present invention calculate the illegal probability value of the vehicle to be determined using a preset decision model and, if the illegal probability value is judged to be greater than a preset threshold, determine the vehicle to be an illegal vehicle, which improves the efficiency of investigating illegal taxi behavior.

Brief description of the drawings

In order to explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a method for determining illegal taxi behavior provided by an embodiment of the present invention;

Fig. 2 is a frequency distribution histogram of the normalized passenger-carrying mileage provided by an embodiment of the present invention;

Fig. 3 is a distribution diagram of the illegal probability of the test-set vehicles provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a system for determining illegal taxi behavior provided by an embodiment of the present invention.

Detailed description

To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

Fig. 1 is a schematic flowchart of a method for determining illegal taxi behavior provided by an embodiment of the present invention. As shown in Fig. 1, the method includes:

Step 101: Obtain the operation information to be determined corresponding to the vehicle to be determined within a preset time period;

Specifically, to determine the probability that a given taxi commits illegal behavior, the operation information to be determined for that vehicle within a preset time period is obtained. For example, the vehicle's operation information for the most recent 15 days may be taken as the operation information to be determined. The preset time period can be set according to the actual situation, and the embodiments of the present invention do not specifically limit it.

Step 102: According to the operation information to be determined, calculate the illegal probability value of the vehicle to be determined using a preset decision model;

Specifically, the obtained operation information to be determined is input into the preset decision model, which calculates the illegal probability value corresponding to the vehicle to be determined. The higher the illegal probability value, the more likely the vehicle is to commit illegal behavior. It should be noted that the preset decision model may be a decision tree or a random forest; other models are also applicable, and the embodiments of the present invention do not specifically limit this.

Step 103: If the illegal probability value is judged to be greater than a preset threshold, determine that the vehicle to be determined is an illegal vehicle.

Specifically, the calculated illegal probability value of the vehicle to be determined is compared with the preset threshold. If the illegal probability value is greater than the preset threshold, the vehicle is determined to be an illegal vehicle and is given priority when taxis are inspected.

The embodiment of the present invention calculates the illegal probability value of the vehicle to be determined using a preset decision model and, if the illegal probability value is judged to be greater than the preset threshold, determines the vehicle to be an illegal vehicle, which improves the efficiency of investigating illegal taxi behavior.
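Steps 101 to 103 can be sketched in a few lines of Python. This is a minimal illustration only: the names `determine_vehicle` and `toy_model`, and the field `toy_score`, are hypothetical stand-ins, and the real preset decision model would be a trained decision tree or random forest rather than the placeholder used here.

```python
from typing import Callable, Mapping, Tuple


def determine_vehicle(
    operation_info: Mapping[str, float],
    model: Callable[[Mapping[str, float]], float],
    threshold: float,
) -> Tuple[bool, float]:
    """Steps 102 and 103: compute the probability, then compare it with the threshold."""
    probability = model(operation_info)
    # The comparison is strict: only a value greater than the threshold is flagged.
    return probability > threshold, probability


def toy_model(operation_info: Mapping[str, float]) -> float:
    """Hypothetical placeholder for a trained model; simply echoes a precomputed score."""
    return operation_info["toy_score"]


flagged, probability = determine_vehicle({"toy_score": 0.8}, toy_model, threshold=0.5)
print(flagged, probability)
```

A vehicle whose probability equals the threshold exactly is not flagged, because the method requires the value to be greater than the threshold.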

On the basis of the above embodiment, the method further includes:

obtaining first historical operation information of all vehicles for a first preset number of days, the first historical operation information including: straight-line distance information from the passenger pick-up point to the drop-off point, passenger-carrying mileage information, GPS track mileage information, empty mileage information and income information;

training the preset decision model according to the first historical operation information to obtain a fare-negotiation decision model.

Specifically, among the many kinds of illegal behavior, fare negotiation is one of the typical offenses of taxi drivers: it has a large impact on passengers, attracts considerable attention, and can be identified by law enforcement officers during on-site inspection. Fare negotiation means that the driver does not charge the amount shown on the taximeter but agrees a fare directly with the passenger; when negotiating, the driver usually does not use the taximeter or uses it less. The embodiment of the present invention therefore describes the determination of fare-negotiation behavior in detail.

All first historical operation information of all taxis for the first preset number of days is obtained. The first preset number of days may be 15 days of taxi operation history; the number of days may of course be set according to the actual situation, and the embodiments of the present invention do not specifically limit it. The first historical operation information includes straight-line distance information from the passenger pick-up point to the drop-off point, passenger-carrying mileage information, GPS track mileage information, empty mileage information and income information. It should be noted that the first historical operation information is derived from the taxi data sources obtained. These data sources include GPS data recorded during operation (license plate number, GPS generation time, longitude, latitude, passenger-carrying status), taximeter transaction data (license plate number, income, transaction time, passenger-carrying mileage, empty mileage, pick-up time), approval system data (license plate number, single/double-shift flag), taxi complaint data (license plate number, complaint type, complaint time) and taxi violation data (license plate number, inspection time, violation).

As the above data sources show, the license plate number is a shared attribute. Using the license plate number as the index, the data from the different sources for one trip of one vehicle are therefore associated, abnormal data (for example, longitude and latitude equal to 0) are rejected, and complaint records whose type is not fare negotiation are removed. This yields the valid data for a single trip of a single vehicle, referred to as the first historical operation indicators. The first historical operation indicators include: license plate number, GPS generation time, longitude, latitude, passenger-carrying status, passenger-carrying mileage, empty mileage and income. The same operation is performed for every trip of every vehicle to obtain each vehicle's first historical operation indicators. The first historical operation information is then derived from the first historical operation indicators, and includes straight-line distance information from the passenger pick-up point to the drop-off point, passenger-carrying mileage information, GPS track mileage information, empty mileage information and income information.
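The association by license plate number can be sketched as follows. The record layout (`plate`, `time`, `lon`, `lat`, `pickup_time`, `transaction_time`) and the function name are assumptions made for illustration, as is the rule that a GPS point belongs to a trip when its timestamp lies between the pick-up time and the transaction time; the text does not specify how points are matched to trips. Only the GPS and taximeter sources are shown.

```python
from collections import defaultdict


def associate_by_plate(gps_rows, meter_rows):
    """Attach the valid GPS points of each vehicle to its taximeter trips."""
    gps_by_plate = defaultdict(list)
    for row in gps_rows:
        # Reject abnormal fixes; here "abnormal" means longitude and latitude are both 0.
        if row["lon"] == 0 and row["lat"] == 0:
            continue
        gps_by_plate[row["plate"]].append(row)

    trips = []
    for meter in meter_rows:
        # Assumed matching rule: same plate, timestamp within the trip window.
        points = [
            g for g in gps_by_plate.get(meter["plate"], [])
            if meter["pickup_time"] <= g["time"] <= meter["transaction_time"]
        ]
        points.sort(key=lambda g: g["time"])
        trips.append({**meter, "gps_points": points})
    return trips


gps = [
    {"plate": "A1", "time": 5, "lon": 116.4, "lat": 39.9},
    {"plate": "A1", "time": 6, "lon": 0, "lat": 0},        # abnormal, dropped
    {"plate": "A1", "time": 50, "lon": 116.5, "lat": 39.9},  # outside the trip window
    {"plate": "B2", "time": 5, "lon": 116.3, "lat": 39.8},   # different vehicle
]
meter = [{"plate": "A1", "pickup_time": 0, "transaction_time": 10, "income": 23.0}]
trips = associate_by_plate(gps, meter)
print(len(trips[0]["gps_points"]))
```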

The first historical operation information of one vehicle over 15 days of operation history constitutes one training sample, and the first historical operation information of all vehicles constitutes the whole training data. For the illegal vehicles seized by law enforcement officers, the 15 days before the date of the offense are taken. These 15 days need not be consecutive, but there must be a full 15 days that have first historical operation information. Because illegal behavior is more likely in the period shortly before a seizure, the first historical operation information for the 15 days before the offense date of the illegal vehicles obtained from law enforcement officers is used as the training-set positive samples, with the label value set to 1. The first historical operation information for 15 days of history of all vehicles belonging to the "Beijing Taxi Star" taxi companies is used as the training-set negative samples, with the label value set to 0. The preset decision model is trained according to the first historical operation information of all vehicles.
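The selection of the 15 days preceding an offense and the labelling of samples can be illustrated with a small sketch. The function names are hypothetical, and treating "before the offense date" as strictly earlier than that date is an assumption; the text only requires that the days need not be consecutive and that a full 15 days of data exist.

```python
from datetime import date


def last_days_with_data(days_with_data, offense_date, n=15):
    """Return the n most recent days before offense_date that have data, or None if fewer exist."""
    earlier = sorted(d for d in set(days_with_data) if d < offense_date)
    if len(earlier) < n:
        return None  # the vehicle cannot be used as a positive sample
    return earlier[-n:]


def label_sample(seized_by_law_enforcement):
    """Label value 1 for training-set positive samples, 0 for negative samples."""
    return 1 if seized_by_law_enforcement else 0


# Data exists for 1-20 January 2017 except 10 January; the offense date is 18 January.
days = [date(2017, 1, d) for d in range(1, 21) if d != 10]
window = last_days_with_data(days, date(2017, 1, 18))
print(window[0], window[-1], len(window))
```

In this example the window runs from 2 January to 17 January and skips 10 January, which shows that the 15 days are counted over days with data rather than calendar days.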

The preset decision model may be a decision tree. Each decision node in the tree represents one kind of first historical operation information, and once the decisions at all decision nodes have been made, each leaf node represents a class, which is either illegal vehicle or non-illegal vehicle. Among all leaf nodes, the leaf node that satisfies all split conditions is the illegal vehicle predicted by the decision tree, and the other leaf nodes represent non-illegal vehicles. The fare-negotiation decision model is obtained when training is complete. It should be noted that a random forest may also be chosen as the preset decision model, and the embodiments of the present invention do not specifically limit this.

A certain number of training-set positive samples and training-set negative samples may be randomly selected as validation-set samples. The validation-set samples are used to validate the fare-negotiation decision model and thereby tune its parameters.

The determination result falls into four cases: an illegal vehicle is determined to be an illegal vehicle (true positive, tp); a non-illegal vehicle is determined to be an illegal vehicle (false positive, fp); an illegal vehicle is determined to be a non-illegal vehicle (false negative, fn); and a non-illegal vehicle is determined to be a non-illegal vehicle (true negative, tn). The illegal-vehicle determination accuracy measures the proportion of truly illegal vehicles among the vehicles determined to be illegal; the higher the proportion, the more accurate the model, where illegal-vehicle determination accuracy = tp / (tp + fp). The non-illegal-vehicle determination accuracy measures the proportion of truly non-illegal vehicles among the vehicles determined to be non-illegal; the higher the proportion, the more accurate the model, where non-illegal-vehicle determination accuracy = tn / (tn + fn).
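The two accuracy figures can be computed directly from the four counts. The sketch below assumes labels encoded as 1 (illegal) and 0 (non-illegal), matching the label values used for the training samples; the function name is hypothetical, and returning `None` when a denominator is zero is a choice made for this illustration.

```python
def determination_accuracy(true_labels, predicted_labels):
    """Return (illegal-vehicle accuracy, non-illegal-vehicle accuracy)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # tp / (tp + fp): share of truly illegal vehicles among those judged illegal.
    illegal_accuracy = tp / (tp + fp) if (tp + fp) else None
    # tn / (tn + fn): share of truly non-illegal vehicles among those judged non-illegal.
    non_illegal_accuracy = tn / (tn + fn) if (tn + fn) else None
    return illegal_accuracy, non_illegal_accuracy


truth = [1, 1, 0, 0, 0, 1]
predicted = [1, 0, 0, 1, 0, 1]
print(determination_accuracy(truth, predicted))
```

With these six vehicles, tp = 2, fp = 1, fn = 1 and tn = 2, so both accuracies equal 2/3.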

In addition, the first historical operation information for 15 days of all taxis in Beijing may be selected as test-set samples to test the trained fare-negotiation decision model; the test-set samples are used to test the performance of the model. The illegal vehicles identified by the fare-negotiation decision model form an illegal-vehicle library, and the ratio of the illegal-vehicle library to all taxis in Beijing is computed. The lower this ratio, the better.
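Building the illegal-vehicle library and its share of the fleet amounts to a filter and a division. The sketch below assumes the model's output is available as a mapping from license plate to illegal probability value; the function name and the plates are hypothetical.

```python
def illegal_vehicle_library(probability_by_plate, threshold):
    """Return the flagged plates and their share of all tested vehicles."""
    library = sorted(
        plate for plate, probability in probability_by_plate.items()
        if probability > threshold
    )
    share = len(library) / len(probability_by_plate) if probability_by_plate else 0.0
    return library, share


test_set = {"A": 0.9, "B": 0.2, "C": 0.6, "D": 0.1}
print(illegal_vehicle_library(test_set, threshold=0.5))
```

A lower share means fewer vehicles need on-site inspection, which is why the text treats a low ratio as desirable.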

The embodiment of the present invention obtains the first historical operation information of all vehicles for the first preset number of days as training data, trains the preset decision model, and obtains the fare-negotiation decision model. The trained fare-negotiation decision model can predict the illegal probability value of the vehicle to be determined, which improves the accuracy of prediction, so that law enforcement officers can inspect vehicles according to the illegal probability value, reducing the inspection workload while improving inspection efficiency.

On the basis of the above embodiment, the operation information to be determined includes the straight-line distance information from the passenger pick-up point to the drop-off point, the passenger-carrying mileage information, the GPS track mileage information, the empty mileage information and the income information;

Correspondingly, calculating the illegal probability value of the vehicle to be determined using the preset decision model includes:

calculating the illegal probability value of the vehicle to be determined using the fare-negotiation decision model.

Specifically, to predict the probability that a vehicle to be determined commits fare negotiation, the operation information to be determined that is obtained includes the straight-line distance information from the passenger pick-up point to the drop-off point, the passenger-carrying mileage information, the GPS track mileage information, the empty mileage information and the income information. This information is input into the trained fare-negotiation decision model. The model outputs not only the class of the sample (illegal vehicle or non-illegal vehicle) but also the probability value corresponding to that class. From the class and the probability value, the probability that the vehicle to be determined is an illegal vehicle can be determined, which gives the illegal probability value of the vehicle to be determined.

The embodiment of the present invention calculates the illegal probability value of the vehicle to be determined using the fare-negotiation decision model, and taxis are inspected in a targeted manner according to that value, which improves inspection efficiency.
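One common way for a decision tree to output a probability together with a class is to report the fraction of training samples of each class that reached the selected leaf. The sketch below illustrates that idea on a hand-written tree; it is not the trained model of the embodiment, and the feature names (`empty_ratio`, `income_per_km`), split thresholds and leaf counts are all hypothetical.

```python
def illegal_probability(node, features):
    """Walk the tree to a leaf and return the share of illegal training samples in it."""
    while "leaf" not in node:
        go_left = features[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    illegal_count, non_illegal_count = node["leaf"]
    return illegal_count / (illegal_count + non_illegal_count)


# Hypothetical tree: leaves store (illegal, non-illegal) training sample counts.
toy_tree = {
    "feature": "empty_ratio",
    "threshold": 0.5,
    "left": {"leaf": (1, 9)},
    "right": {
        "feature": "income_per_km",
        "threshold": 2.0,
        "left": {"leaf": (8, 2)},
        "right": {"leaf": (2, 8)},
    },
}

vehicle = {"empty_ratio": 0.7, "income_per_km": 1.5}
probability = illegal_probability(toy_tree, vehicle)
predicted_class = "illegal vehicle" if probability > 0.5 else "non-illegal vehicle"
print(predicted_class, probability)
```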

On the basis of the above embodiment, the method further includes:

obtaining second historical operation information corresponding to all vehicles for a second preset number of days, the second historical operation information including: single/double-shift flag information, driving mileage information, empty-driving ratio information, operating time information, number-of-operations information, average trip distance information, income information and average income information;

performing model training on the preset decision model according to the second historical operation information to obtain a substitute-driving decision model.

Specifically, among the many kinds of illegal behavior, substitute driving is also one of the typical offenses of taxi drivers: it has a large impact on passengers, attracts considerable attention, and can be identified by law enforcement officers during on-site inspection. Substitute driving means handing the vehicle to another person to drive. Each taxi corresponds to one driver, and the situation in which the driver does not match the vehicle is called substitute driving. The embodiment of the present invention describes the determination of this illegal behavior in detail.

The second historical operation information corresponding to all vehicles for the second preset number of days is obtained. It should be noted that the second historical operation information of all vehicles for 30 days of historical data may be taken. The second historical operation information includes: single/double-shift flag information, driving mileage information, empty-driving ratio information, operating time information, number-of-operations information, average trip distance information, income information and average income information.

It should be noted that the above information is derived from the taxi data sources obtained. These data sources include GPS data recorded during operation (license plate number, GPS generation time, longitude, latitude, passenger-carrying status), taximeter transaction data (license plate number, income, transaction time, passenger-carrying mileage, empty mileage, pick-up time), approval system data (license plate number, single/double-shift flag), taxi complaint data (license plate number, complaint type, complaint time) and taxi violation data (license plate number, inspection time, violation).

As the above data sources show, the license plate number is a shared attribute. Using the license plate number as the index, the data from the different sources for one trip of one vehicle are therefore associated, abnormal data (for example, longitude and latitude equal to 0) are rejected, and complaint records whose type is not substitute driving are removed. This yields the valid data for a single trip of a single vehicle, referred to as the second historical operation indicators. The second historical operation indicators include: license plate number, single/double-shift flag, pick-up time, transaction time, passenger-carrying mileage, empty mileage and income. From these indicators, the total passenger-carrying mileage over multiple trips, the total income over multiple trips and the number of operations (the number of trips made by one vehicle within a period of time) can be derived. The second historical operation information is obtained from the second historical operation indicators and includes single/double-shift flag information, driving mileage information, empty-driving ratio information, operating time information, number-of-operations information, average trip distance information, income information and average income information.
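The derivation of per-vehicle aggregates from per-trip indicators can be sketched as below. The text names the derived quantities but does not give their formulas, so the definitions here are assumptions: driving mileage is taken as passenger-carrying plus empty mileage, the empty-driving ratio as empty mileage divided by driving mileage, operating time as the sum of transaction time minus pick-up time, and the averages as totals divided by the number of trips. Field and function names are hypothetical.

```python
def derive_operation_info(trips):
    """Aggregate the trips of one vehicle over a period into summary information."""
    count = len(trips)
    carrying = sum(t["carrying_mileage"] for t in trips)
    empty = sum(t["empty_mileage"] for t in trips)
    income = sum(t["income"] for t in trips)
    driving = carrying + empty  # assumed definition of driving mileage
    return {
        "driving_mileage": driving,
        "empty_ratio": empty / driving if driving else 0.0,
        "operating_time": sum(t["transaction_time"] - t["pickup_time"] for t in trips),
        "operation_count": count,
        "average_trip_distance": carrying / count if count else 0.0,
        "income": income,
        "average_income": income / count if count else 0.0,
    }


trips = [
    {"carrying_mileage": 10.0, "empty_mileage": 2.0, "income": 30.0,
     "pickup_time": 0, "transaction_time": 1200},
    {"carrying_mileage": 6.0, "empty_mileage": 2.0, "income": 20.0,
     "pickup_time": 2000, "transaction_time": 2600},
]
info = derive_operation_info(trips)
print(info)
```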

Similarly, for the illegal vehicles obtained from law enforcement officers, the second historical operation information for the thirty days before the date of the offense is selected as the training-set positive samples, with the label value set to 1. The second historical operation information for 30 days from "Beijing Taxi Star" is used as the training-set negative samples, with the label value set to 0.

Model training is performed on the preset decision model according to the second historical operation information. The preset decision model may be a decision tree. Each decision node in the tree represents one kind of second historical operation information, and once the decisions at all decision nodes have been made, each leaf node represents a class, which is either illegal vehicle or non-illegal vehicle. Among all leaf nodes, the leaf node that satisfies all split conditions is the illegal vehicle predicted by the decision tree, and the other leaf nodes represent non-illegal vehicles. The substitute-driving decision model is obtained when training is complete. It should be noted that a random forest may also be chosen as the preset decision model, and the embodiments of the present invention do not specifically limit this.

A certain number of training-set positive samples and training-set negative samples may be randomly selected as validation-set samples. The validation-set samples are used to validate the substitute-driving decision model and thereby tune its parameters.

The determination result falls into four cases: an illegal vehicle is determined to be an illegal vehicle (true positive, tp); a non-illegal vehicle is determined to be an illegal vehicle (false positive, fp); an illegal vehicle is determined to be a non-illegal vehicle (false negative, fn); and a non-illegal vehicle is determined to be a non-illegal vehicle (true negative, tn). The illegal-vehicle determination accuracy measures the proportion of truly illegal vehicles among the vehicles determined to be illegal; the higher the proportion, the more accurate the model, where illegal-vehicle determination accuracy = tp / (tp + fp). The non-illegal-vehicle determination accuracy measures the proportion of truly non-illegal vehicles among the vehicles determined to be non-illegal; the higher the proportion, the more accurate the model, where non-illegal-vehicle determination accuracy = tn / (tn + fn).

In addition, the second historical operation information for 30 days of all taxis in Beijing may be selected as test-set samples to test the trained substitute-driving decision model; the test-set samples are used to test the performance of the model. The illegal vehicles identified by the substitute-driving decision model form an illegal-vehicle library, and the ratio of the illegal-vehicle library to all taxis in Beijing is computed. The lower this ratio, the better.

The embodiment of the present invention obtains the second historical operation information of all vehicles for the second preset number of days as training data, trains the preset decision model, and obtains the substitute-driving decision model. The trained substitute-driving decision model can predict the illegal probability value of the vehicle to be determined, which improves the accuracy of prediction, so that law enforcement officers can inspect vehicles according to the illegal probability value, reducing the inspection workload while improving inspection efficiency.

On the basis of the above embodiment, the operation information to be determined includes the single/double-shift flag information, the driving mileage information, the empty-driving ratio information, the operating time information, the number-of-operations information, the average trip distance information, the income information and the average income information;

the illegal probability value of the vehicle to be determined is calculated using the substitute-driving decision model.

Specifically, to predict the probability that a vehicle to be determined commits substitute driving, the operation information to be determined that is obtained includes single/double-shift flag information, driving mileage information, empty-driving ratio information, operating time information, number-of-operations information, average trip distance information, income information and average income information. This information is input into the trained substitute-driving decision model. The model outputs not only the class of the sample (illegal vehicle or non-illegal vehicle) but also the probability value corresponding to that class. From the class and the probability value, the probability that the vehicle to be determined is an illegal vehicle can be determined, which gives the illegal probability value of the vehicle to be determined.

The embodiment of the present invention calculates the illegal probability value of the vehicle to be determined using the substitute-driving decision model, and taxis are inspected in a targeted manner according to that value, which improves inspection efficiency.

On the basis of the above embodiment, training the preset decision model according to the first historical operation information to obtain the fare-negotiation decision model includes:

normalizing each item of first historical operation information obtained for each vehicle to obtain normalized first historical operation information; grouping the normalized first historical operation information to obtain the frequency of the first historical operation information in each group;

composing training data from the frequencies corresponding to the first historical operation information of all vehicles;

training the preset decision model according to the training data to obtain the fare-negotiation decision model.

Specifically, the multiple items of first historical operation information of one vehicle for the first preset number of days constitute one training sample. Each item of first historical operation information of the vehicle for the first preset number of days is normalized to obtain normalized first historical operation information. The normalized first historical operation information is then divided into groups. The number of groups can be set according to the actual situation, but it should be the same for every kind of normalized first historical operation information. The frequency of the first historical operation information in each group is thereby obtained.

Taking the straight-line distance information from the passenger pick-up point to the drop-off point as an example, Fig. 2 is a frequency distribution histogram of the normalized passenger-carrying mileage provided by an embodiment of the present invention. As shown in Fig. 2, the first historical operation information of all taxis for 15 days of history is obtained first. One taxi is selected from all the taxis, together with its multiple items of first historical operation information for those 15 days, and the passenger-carrying mileage information is selected from them, that is, the passenger-carrying mileage information of this taxi for 15 days of history. The passenger-carrying mileage information is normalized to obtain normalized first historical operation information: normalized first historical operation information = Value / MaxValue, where Value is the current mileage information and MaxValue is the maximum among the multiple items of passenger-carrying mileage information. Assuming that the normalized first historical operation information is divided into 100 groups, the frequency of the normalized first historical operation information in each group is counted, and the frequency distribution histogram of the normalized passenger-carrying mileage shown in Fig. 2 can be drawn. The same operation is performed on the other first historical operation information of the vehicle, which yields all of the vehicle's first historical operation information and the corresponding frequencies; the frequencies corresponding to all of the vehicle's first historical operation information constitute one group of training data. The other vehicles are processed in the same way to obtain multiple groups of training data. The preset decision model is trained using the multiple groups of training data to obtain the fare-negotiation decision model.
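The normalization by Value / MaxValue followed by grouping and counting can be sketched as a fixed-width histogram. The function names, the feature names in the example, and the rule that a normalized value of exactly 1 falls into the last group are assumptions of this illustration; concatenating the per-feature histograms in sorted name order is one possible way to form a vehicle's training row, not necessarily the one used in the embodiment.

```python
def normalized_histogram(values, groups=100):
    """Normalize non-negative values by their maximum and count how many fall in each group."""
    counts = [0] * groups
    max_value = max(values, default=0)
    if max_value <= 0:
        return counts  # nothing to normalize against
    for value in values:
        normalized = value / max_value            # Value / MaxValue, in [0, 1]
        index = min(int(normalized * groups), groups - 1)  # value 1.0 goes in the last group
        counts[index] += 1
    return counts


def training_row(info_by_type, groups=100):
    """Concatenate the histograms of every kind of information for one vehicle."""
    row = []
    for name in sorted(info_by_type):  # fixed order so all vehicles share one layout
        row.extend(normalized_histogram(info_by_type[name], groups))
    return row


vehicle_info = {
    "carrying_mileage": [2.0, 5.0, 10.0, 10.0],
    "income": [10.0, 20.0],
}
print(training_row(vehicle_info, groups=4))
```

Because every kind of information uses the same number of groups, every vehicle produces a row of the same length regardless of how many trips it made.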

By normalizing the first history operation information and counting the frequency of each item of first history operation information in each group, the embodiment of the present invention obtains multiple groups of training data and trains the default decision model with them to obtain the agreed-upon price decision model, which improves the accuracy of the output of the agreed-upon price decision model.

On the basis of the above embodiment, performing model training on the default decision model according to the second history operation information to obtain the substitute-driving decision model includes:

Grouping the days of the second preset number of days;

Performing Z-Score standardization on each item of the obtained second history operation information, for all vehicles and all groups of days, to obtain training data;

Training the default decision model according to the training data to obtain the substitute-driving decision model.

Specifically, the second preset number of days is grouped as follows. The second history operation information of all vehicles over 30 days is obtained, and the second history operation information of one taxi over those 30 days is selected from it. The second history operation information of all Mondays within the 30 days forms one group; similarly, the second history operation information of Tuesdays, Wednesdays, ..., Sundays each forms one group. Other vehicles are processed in the same way. Z-Score standardization is then applied to the second history operation information (except the single Straight Run flag information) of all weekday groups of all vehicles within the 30 days. Taking the distance travelled information as an example, the standardization formula is (Value - μ) / σ, where Value is the current distance travelled information, μ is the mean of all distance travelled information, and σ is the standard deviation of all distance travelled information. Training data is thereby obtained. The default decision model is trained according to the training data to obtain the substitute-driving decision model. It should be noted that the second preset number of days can be set according to actual conditions, and the embodiment of the present invention does not specifically limit it.
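The weekday grouping and the Z-Score formula above can be illustrated with a short sketch. The function names and the input layout (a mapping from calendar date to one value) are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is meant.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev


def group_by_weekday(daily_values):
    """Group one vehicle's daily values by weekday (0 = Monday ... 6 = Sunday)."""
    groups = defaultdict(list)
    for day in sorted(daily_values):
        groups[day.weekday()].append(daily_values[day])
    return dict(groups)


def z_score(values):
    """Apply (Value - mu) / sigma to every value in the list."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        # All values are identical, so every standardized value is zero.
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]
```

For example, `group_by_weekday({date(2016, 9, 5): 10.0, date(2016, 9, 12): 30.0})` puts both values into the Monday group, and `z_score` is then applied to the pooled values of one item, such as the distance travelled, across all vehicles and groups.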

By applying Z-Score standardization to the second history operation information of all groups of days of all vehicles, the embodiment of the present invention obtains training data and performs model training on the default decision model according to the training data, which improves the accuracy of judging the vehicle to be determined.

On the basis of the above embodiments, training the default decision model according to the first history operation information includes:

Training the default decision model according to the first history operation information, using cross validation and/or bootstrapping.

Specifically, when the default decision model is trained, it can be trained on the first history operation information using cross validation and/or bootstrapping.

The method of cross validation is as follows: a preset amount of data is chosen from the training set as validation-set samples. A common choice is 10-fold cross validation, in which the training data is divided into 10 parts; in turn, 9 parts serve as training-set samples and 1 part serves as validation-set samples, and the mean of the 10 results is taken as the final training result. Sometimes 10-fold cross validation is repeated several times and averaged, for example 10 runs of 10-fold cross validation, to obtain a more stable and reliable default decision model.
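The fold rotation described above can be sketched in a few lines. The function names, the shuffling with a fixed seed, and the `train_fn` / `score_fn` callbacks are illustrative assumptions; any model and scoring rule can be plugged in.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training indices, validation indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield training, validation


def cross_validate(samples, labels, train_fn, score_fn, k=10, seed=0):
    """Train on k-1 parts, score on the remaining part, and average the k scores."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(samples), k, seed):
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        scores.append(score_fn(model,
                               [samples[i] for i in val_idx],
                               [labels[i] for i in val_idx]))
    return sum(scores) / len(scores)
```

Repeated cross validation, such as 10 runs of 10-fold cross validation, corresponds to calling `cross_validate` with different `seed` values and averaging the returned means.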

The specific method of bootstrapping is as follows. First, the training-set positive samples are taken as the initial positive samples and the training-set negative samples as the initial negative samples, and an initial default decision model is trained with them. Then the negative samples misclassified by the initial default decision model (that is, negative samples classified as positive; in the embodiment of the present invention, non-illegal vehicles judged to be illegal vehicles) are collected to form a hard negative sample set. The hard negative sample set is combined with negative samples that have not yet been used in training to form a new negative sample set; the positive sample set is kept unchanged, and a new default decision model is trained. This procedure can be repeated several times to obtain the final default decision model.
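The bootstrapping loop above, which amounts to collecting hard negatives and retraining, can be sketched as follows. All names are hypothetical, and two details are assumptions because the text does not fix them: unused negatives are drawn from a pool in batches of `batch_size`, and the loop stops early when there are neither hard negatives nor unused negatives left.

```python
def bootstrap_train(positives, negatives, negative_pool,
                    train_fn, predict_fn, rounds=3, batch_size=None):
    """Retrain on hard negatives plus not-yet-used negatives; positives stay fixed.

    train_fn(positives, negatives) returns a model.
    predict_fn(model, sample) returns 1 (illegal) or 0 (non-illegal).
    """
    pool = list(negative_pool)
    current = list(negatives)
    model = train_fn(positives, current)  # initial model
    for _ in range(rounds):
        # Hard negatives: negative samples the current model classifies as positive.
        hard = [x for x in current if predict_fn(model, x) == 1]
        if not hard and not pool:
            break
        take = len(pool) if batch_size is None else batch_size
        fresh, pool = pool[:take], pool[take:]
        current = hard + fresh  # new negative sample set
        model = train_fn(positives, current)
    return model
```

Passing the trainer and predictor as callbacks keeps the sketch independent of the model type, so the same loop applies whether the default decision model is a decision tree, a random forest, or another classifier.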

An embodiment of judging the agreed-upon price illegal activity is as follows:

For illegal vehicles seized from January to September 2016, the first history operation information of the fifteen days before the seizure date is used as training-set positive samples, with the label value set to 1. The first history operation information of 15 days taken from Beijing Taxi Star between August 15 and September 15, 2016 is used as training-set negative samples, with the label value set to 0. The first history operation information of 15 days of all taxis in Beijing between August 15 and September 15, 2016 is used as test-set samples. Since each item of first history operation information of each vehicle is divided into 100 groups, the dimension of each normalized item of first history operation information is 100. Because the first history operation information includes five items (the straight-line distance information from the carrying start point to the end point, the carrying kilometres information, the GPS track mileage information, the deadhead kilometres information, and the income information), the dimension of the training data of each vehicle is 500. The training-set positive samples comprise 42 vehicles, the training-set negative samples comprise 507 vehicles, and the test-set samples comprise 51874 vehicles.

The following table lists the first history operation indicators of a single illegal vehicle over multiple days:

The frequency distribution histogram of the normalized carrying kilometres information of a certain illegal vehicle over multiple days is shown in Fig. 2. All first history operation information of the illegal vehicle over the multiple days constitutes one group of training data, which is used to train the default decision model. A random forest can be used for training, with the tree depth set to 2 and the number of trees set to 15. The training-set positive samples are kept unchanged, and the data of 43 vehicles is taken from the training-set negative samples so that the ratio of positive to negative samples is 1:1. Five-fold cross validation is used: the training data is divided into five parts; in turn, four parts serve as training-set samples and one part serves as validation-set samples, and the five results are averaged. The training result on the training-set samples is tp = 32, fp = 5, from which the prediction accuracy for illegal vehicles is 86% and the prediction accuracy for non-illegal vehicles is 77%. When the model is applied to the test set, in order to catch as many illegal vehicles as possible, a vehicle is judged to be an illegal vehicle only when its predicted probability value is greater than 0.6. Fig. 3 is the illegal probability distribution of the test-set vehicles provided by an embodiment of the present invention, from which the proportion of illegal vehicles is 6.7%.

Another embodiment of the present invention for judging the agreed-upon price illegal activity is as follows:

To increase tp and reduce fp, bootstrapping can be used. The ratio of positive to negative samples in the training set is 1:1, and a decision tree is used for training, with a tree depth of 4 and the class weight parameter set to balanced. In each round of the training process, bootstrapping retains the fp obtained when predicting on the training set, as well as the fp obtained when the default decision model judges the negative samples that did not take part in training. Keeping the ratio of positive to negative samples in the training set at 1:1, these two kinds of fp re-form the training-set negative samples for training. The above action can be repeated several times. The final training result on the training set is tp = 43, fp = 1; the judgement accuracy for illegal vehicles is 98%, and the judgement accuracy for non-illegal vehicles is 100%. It should be noted that the training-set positive samples and training-set negative samples are composed in the same way as in the above embodiment, which is not repeated here.

By training the default decision model with cross validation and/or bootstrapping, the embodiment of the present invention obtains a more stable and reliable default decision model and improves the accuracy of the output.

On the basis of the above embodiments, performing model training on the default decision model according to the second history operation information includes:

Training the default decision model according to the second history operation information, using cross validation and/or bootstrapping.

Specifically, when the default decision model is trained, it can be trained on the second history operation information using cross validation and/or bootstrapping. The operation of cross validation and bootstrapping is the same as in the above embodiment and is not repeated here.

An embodiment of judging the substitute-driving illegal activity is as follows:

Training data is built in the manner provided by the above embodiment. For illegal vehicles seized from January 2015 to September 2016, the second history operation information of the thirty days before the seizure date is used to obtain the training-set positive samples of all illegal vehicles, with the label value set to 1. From the second history operation information of 30 days of the Beijing Taxi Star taxi company between January 2015 and September 2016, the training-set negative samples are obtained in the same way, with the label value set to 0. The second history operation information of 30 days of all taxis in Beijing in September 2016 is used as test-set samples. 67 positive samples are selected from the training-set positive samples and 300 negative samples are selected from the training-set negative samples to form the validation-set samples; the remaining 100 positive samples and 400 negative samples form the training data, and the test set has 60186 samples. A decision tree can be used for training, with a tree depth of 3 and a ratio of positive to negative samples in the training set of 1:4. The result of verifying the default decision model with the validation-set samples is tp = 52, fp = 13; the judgement accuracy for illegal vehicles is 80%, and the judgement accuracy for non-illegal vehicles is 95%. The result of testing the default decision model with the test-set samples is that the proportion of illegal vehicles is 6.02%.
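The text does not state how the two accuracy figures are computed. As a worked check under an assumption, the reported 80% and 95% are consistent with tp / (tp + fp) for illegal vehicles and tn / (tn + fn) for non-illegal vehicles on the validation set of 67 positive and 300 negative samples. The function name below is hypothetical.

```python
def validation_rates(tp, fp, n_pos, n_neg):
    """Return (illegal rate, non-illegal rate) under the assumed definitions."""
    fn = n_pos - tp  # illegal vehicles judged non-illegal
    tn = n_neg - fp  # non-illegal vehicles judged non-illegal
    illegal_rate = tp / (tp + fp)      # share of "illegal" judgements that are correct
    non_illegal_rate = tn / (tn + fn)  # share of "non-illegal" judgements that are correct
    return illegal_rate, non_illegal_rate


illegal, non_illegal = validation_rates(tp=52, fp=13, n_pos=67, n_neg=300)
print(f"{illegal:.0%} {non_illegal:.0%}")  # prints "80% 95%"
```

Here 52 / 65 = 0.80 and 287 / 302 ≈ 0.950, which agrees with the reported figures; other definitions may also have been intended.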

Fig. 4 is a schematic structural diagram of a taxi illegal activities decision system provided by an embodiment of the present invention. As shown in Fig. 4, the system includes an acquisition module 401, a computing module 402 and a determination module 403, wherein:

The acquisition module 401 is used to obtain the operation information to be determined corresponding to the vehicle to be determined within a preset time period. The computing module 402 is used to calculate, according to the operation information to be determined, the illegal probability value of the vehicle to be determined using the default decision model. The determination module 403 is used to judge the vehicle to be determined to be an illegal vehicle if the illegal probability value is greater than the predetermined threshold.

Specifically, to judge the illegal probability value of a certain taxi engaging in illegal activities, the acquisition module 401 obtains the operation information to be determined of the vehicle to be determined within the preset time period. It can be understood that the operation information of the vehicle for the most recent day can be taken as the operation information to be determined; the preset time period can be set according to actual conditions, and the embodiment of the present invention does not specifically limit it. The computing module 402 inputs the obtained operation information to be determined into the default decision model and calculates the illegal probability value of the vehicle to be determined; the higher the illegal probability value, the more likely the vehicle to be determined is to engage in illegal activities. It should be noted that the default decision model can be a decision tree or a random forest; other models are also applicable, and the embodiment of the present invention does not specifically limit this. The determination module 403 compares the calculated illegal probability value of the vehicle to be determined with the predetermined threshold. If the illegal probability value is greater than the predetermined threshold, the vehicle to be determined is judged to be an illegal vehicle, and that vehicle is given priority when taxis are investigated.
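The cooperation of the three modules can be sketched as below. The class and method names are hypothetical, the in-memory store stands in for whatever data source the acquisition module actually reads, and the model is assumed to be any callable that maps operation information to a probability.

```python
class AcquisitionModule:
    """Obtains the operation information to be determined for a vehicle."""

    def __init__(self, store):
        self._store = store  # hypothetical mapping: vehicle id -> feature list

    def acquire(self, vehicle_id):
        return self._store[vehicle_id]


class ComputingModule:
    """Computes the illegal probability value with the decision model."""

    def __init__(self, model):
        self._model = model  # callable: features -> probability in [0, 1]

    def illegal_probability(self, features):
        return float(self._model(features))


class DeterminationModule:
    """Judges a vehicle as illegal if its probability exceeds the threshold."""

    def __init__(self, threshold):
        self._threshold = threshold

    def is_illegal(self, probability):
        return probability > self._threshold


def judge(vehicle_id, acquisition, computing, determination):
    """Run acquisition, computation and determination for one vehicle."""
    features = acquisition.acquire(vehicle_id)
    probability = computing.illegal_probability(features)
    return determination.is_illegal(probability)
```

The comparison is strict, matching the wording "greater than the predetermined threshold", so a vehicle whose probability equals the threshold is not judged illegal.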

The system embodiment provided by the present invention can be used to perform the processing flow of each of the above method embodiments. Its functions are not repeated here; refer to the detailed description of the above method embodiments.

One of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be completed by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical discs.

The system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the essence of the above technical solutions, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which can be a personal computer, a server, a network device, or the like) to perform the methods described in each embodiment or in some parts of an embodiment.

Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A taxi illegal activities decision method, characterized by comprising:

Calculating, according to the operation information to be determined, the illegal probability value of the vehicle to be determined using a default decision model;

2. The method according to claim 1, characterized in that the method further comprises:

Obtaining the first history operation information of all vehicles over a first preset number of days, the first history operation information including: straight-line distance information from the carrying start point to the end point, carrying kilometres information, GPS track mileage information, deadhead kilometres information, and income information;

3. The method according to claim 2, characterized in that the operation information to be determined includes the straight-line distance information from the carrying start point to the end point, the carrying kilometres information, the GPS track mileage information, the deadhead kilometres information, and the income information;

4. The method according to claim 1, characterized in that the method further comprises:

Obtaining the second history operation information corresponding to all vehicles over a second preset number of days, the second history operation information including: single Straight Run flag information, distance travelled information, empty driving ratio information, service time information, operation number information, average haul distance information, income information, and average income information;

Performing model training on the default decision model according to the second history operation information to obtain a substitute-driving decision model.

5. The method according to claim 4, characterized in that the operation information to be determined includes the single Straight Run flag information, the distance travelled information, the empty driving ratio information, the service time information, the operation number information, the average haul distance information, the income information, and the average income information;

6. The method according to claim 2, characterized in that training the default decision model according to the first history operation information to obtain the agreed-upon price decision model comprises:

Normalizing each item of the obtained first history operation information of each vehicle to obtain normalized first history operation information; grouping the normalized first history operation information to obtain the frequency of the first history operation information corresponding to each group;

7. The method according to claim 4, characterized in that performing model training on the default decision model according to the second history operation information to obtain the substitute-driving decision model comprises:

Grouping the days of the second preset number of days;

Performing Z-Score standardization on each item of the obtained second history operation information, for all vehicles and all groups of days, to obtain training data;

8. The method according to claim 2, 3 or 6, characterized in that training the default decision model according to the first history operation information comprises:

Training the default decision model according to the first history operation information, using cross validation and/or bootstrapping.

9. The method according to claim 4, 5 or 7, characterized in that performing model training on the default decision model according to the second history operation information comprises:

Training the default decision model according to the second history operation information, using cross validation and/or bootstrapping.

10. A taxi illegal activities decision system, characterized by comprising:

A computing module, used to calculate, according to the operation information to be determined, the illegal probability value of the vehicle to be determined using a default decision model;

A determination module, used to judge the vehicle to be determined to be an illegal vehicle if the illegal probability value is greater than a predetermined threshold.