CN112257869A - Fake-licensed car analysis method and system based on random forest and computer medium

Info

Publication number: CN112257869A
Application number: CN202011052872.4A
Authority: CN
Inventors: 林伊龙; 熊赟; 王鹏飞; 夏曙东
Original assignee: CHINA TRANSINFO TECHNOLOGY CORP
Current assignee: CHINA TRANSINFO TECHNOLOGY CORP
Priority date: 2020-09-29
Filing date: 2020-09-29
Publication date: 2021-01-22

Abstract

The embodiments of the application provide a fake-licensed vehicle analysis method, system and computer medium based on a random forest, in which city vehicle passing data and fake-licensed vehicle passing data in a preset time period are obtained; the city vehicle passing data and the fake-licensed vehicle passing data in the preset time period are preprocessed to obtain a vehicle sample set and a label set; a random forest model is trained according to the vehicle sample set and the label set to obtain a trained random forest model; and vehicle passing data to be analyzed are input to the trained random forest model to obtain a fake-licensed vehicle analysis result. The method extracts features from the massive daily city vehicle passing data held in big-data storage and analyzes them with the random forest algorithm from machine learning, so that a fake-licensed vehicle analysis result is obtained immediately after each record is input into the trained model. This solves the problems that existing fake-licensed vehicle analysis methods need to cyclically compare large amounts of license plate data and are low in efficiency and accuracy.

Description

Fake-licensed car analysis method and system based on random forest and computer medium

Technical Field

The application belongs to the technical field of computer identification, and particularly relates to a fake-licensed car analysis method and system based on random forests and a computer medium.

Background

With the development of cities and traffic, vehicle pictures acquired on roads and in important areas play an important role in maintaining public security. Analyzing and identifying fake-licensed vehicles among vehicle violations is particularly important.

Existing fake-licensed vehicle analysis methods fall mainly into two types. In the first, given certain data, for example the feature data of a fake-licensed vehicle, the stored vehicle passing data are compared cyclically to judge whether spatio-temporally abnormal data exist, until an analysis result is obtained. In the second, license plate image features are extracted from the current vehicle passing picture and compared, by an algorithm or a model, with the license plate image features extracted from other vehicle passing pictures, one picture at a time. This continuous cyclic comparison consumes a large amount of computing power; when fake-licensed vehicle data are scarce, the cost invested is out of proportion to the results obtained, and practical applicability is poor.

Disclosure of Invention

The invention provides a fake-licensed car analysis method, system and computer medium based on a random forest, and aims to solve the problems that existing fake-licensed car analysis methods need to cyclically compare large amounts of license plate data and are low in efficiency and accuracy.

According to a first aspect of the embodiments of the present application, a method for analyzing a fake-licensed vehicle based on a random forest is provided, which specifically includes the following steps:

obtaining city vehicle passing data and fake-licensed vehicle passing data in a preset time period;

preprocessing city vehicle passing data and fake-licensed vehicle passing data in a preset time period to obtain a vehicle sample set and a label set;

carrying out random forest model training according to the vehicle sample set and the label set to obtain a trained random forest model;

and inputting the data of the vehicles passing to be analyzed to the trained random forest model to obtain the analysis result of the fake-licensed vehicles.

In some embodiments of the present application, preprocessing the city vehicle passing data and the fake-licensed vehicle passing data to obtain a vehicle sample set and a label set specifically includes the following steps:

obtaining vehicle attribute characteristics of city passing data in a preset time period to obtain a data set,

wherein the vehicle attribute features include valid data features and meaningless data features;

labeling the vehicles in the data set according to the fake-licensed vehicle passing data to obtain a label set indicating whether each vehicle is a fake-licensed vehicle;

removing invalid vehicle data sets in the data sets to obtain a vehicle sample set;

the invalid vehicle data set refers to a set of vehicle data without valid data characteristics or missing key valid characteristic data.

In some embodiments of the present application, after obtaining the data set, the method further includes performing valid data feature supplementation on the vehicle with the missing part of the valid data feature, so as to obtain a complete valid data feature of the vehicle.

In some embodiments of the present application, the specific steps for performing valid data feature supplementation are as follows: acquiring a first valid data feature of a vehicle whose valid data features are partially missing;

and matching the vehicle archive information and the vehicle annual inspection information according to the first valid data feature, and supplementing the partially missing valid data features.

In some embodiments of the present application, after the data set is acquired, invalid feature information of vehicles in the data set is eliminated, and valid feature information of vehicles in the data set is retained.

In some embodiments of the present application, the valid data features include the license plate number, license plate type, vehicle body color, time of appearance, place of appearance and/or vehicle speed; the meaningless data features include the data ID, equipment manufacturer, lane and/or driving direction.

In some embodiments of the present application, a random forest model is constructed according to a vehicle sample set and a label set, and model training is performed to obtain a trained random forest model, which specifically includes the following steps:

S1: randomly sampling the vehicle sample set and the label set with replacement to obtain a training set and a test set;

S2: constructing a random forest model according to the number of samples and the number of features of the training set;

S3: inputting the training set to the random forest model for training to obtain a trained random forest model;

S4: inputting the test set to the trained random forest model to obtain a model score;

S5: adjusting the parameters of the trained random forest model according to the model score to obtain a new random forest model;

S6: repeating steps S1, S2, S3, S4 and S5 to train the model until the model score is greater than or equal to the score threshold, and/or terminating the training when the model over-fits the training set, to finally obtain the trained random forest model.

According to a second aspect of the embodiments of the present application, there is provided a fake-licensed vehicle analysis system based on a random forest, specifically including:

a vehicle data acquisition module: the system is used for acquiring city vehicle passing data and fake-licensed vehicle passing data in a preset time period;

a data preprocessing module: used for preprocessing the city vehicle passing data and the fake-licensed vehicle passing data in the preset time period to obtain a vehicle sample set and a label set;

a random forest model training module: the system is used for training a random forest model according to a vehicle sample set and a label set to obtain a trained random forest model;

a fake-licensed car analysis module: used for inputting vehicle passing data to be analyzed into the trained random forest model to obtain a fake-licensed vehicle analysis result.

According to a third aspect of the embodiments of the present application, there is provided a fake-licensed vehicle analysis device based on a random forest, including:

a memory: for storing executable instructions; and

a processor: connected with the memory to execute the executable instructions so as to perform the random forest based fake-licensed car analysis method.

According to a fourth aspect of the embodiments of the present application, there is provided a computer-readable storage medium having a computer program stored thereon; the computer program is executed by a processor to implement the random forest based fake-licensed car analysis method.

With the fake-licensed vehicle analysis method, system and computer medium based on a random forest in the embodiments of the application, city vehicle passing data and fake-licensed vehicle passing data in a preset time period are obtained; the city vehicle passing data and the fake-licensed vehicle passing data in the preset time period are preprocessed to obtain a vehicle sample set and a label set; a random forest model is trained according to the vehicle sample set and the label set to obtain a trained random forest model; and vehicle passing data to be analyzed are input to the trained random forest model to obtain a fake-licensed vehicle analysis result. The application exploits the massive daily city vehicle passing data held in big-data storage: features are extracted from the vehicle passing data and analyzed with the random forest algorithm from machine learning, so that a fake-licensed vehicle analysis result is obtained immediately after each record is input into the trained model, without cyclically comparing it against large amounts of other data, which greatly improves the efficiency of fake-licensed vehicle data analysis. This solves the problems that existing fake-licensed vehicle analysis methods need to cyclically compare large amounts of license plate data and are low in efficiency and accuracy.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a schematic diagram illustrating steps of a random forest based fake-licensed vehicle analysis method according to an embodiment of the present application;

FIG. 2 is a schematic diagram illustrating a training step of a random forest model according to an embodiment of the present application;

FIG. 3 is a schematic flow diagram illustrating a random forest based fake-licensed vehicle analysis method according to another embodiment of the present application;

FIG. 4 is a schematic structural diagram of a random forest based fake-licensed vehicle analysis system according to an embodiment of the application;

FIG. 5 is a schematic structural diagram of a random forest based fake-licensed vehicle analysis device according to an embodiment of the application.

Detailed Description

In the process of implementing the application, the inventors found that current fake-licensed vehicle analysis methods fall mainly into two types. In the first, given certain data, for example the feature data of a fake-licensed vehicle, the stored vehicle passing data are compared cyclically to judge whether spatio-temporally abnormal data exist, until an analysis result is obtained. In the second, license plate image features are extracted from the current vehicle passing picture and compared, by an algorithm or a model, with the license plate image features extracted from other vehicle passing pictures, one picture at a time. This continuous cyclic comparison consumes a large amount of computing power; when fake-licensed vehicle data are scarce, the cost invested is out of proportion to the results obtained, and practical applicability is poor.

The application aims to solve the problems that existing fake-licensed vehicle analysis methods need to cyclically compare large amounts of license plate data and are low in efficiency and accuracy. According to the fake-licensed vehicle analysis method, system and computer medium based on a random forest, city vehicle passing data and fake-licensed vehicle passing data in a preset time period are obtained; the city vehicle passing data and the fake-licensed vehicle passing data in the preset time period are preprocessed to obtain a vehicle sample set and a label set; a random forest model is trained according to the vehicle sample set and the label set to obtain a trained random forest model; and vehicle passing data to be analyzed are input to the trained random forest model to obtain a fake-licensed vehicle analysis result. The application exploits the massive daily city vehicle passing data held in big-data storage: features are extracted from the vehicle passing data and analyzed with the random forest algorithm from machine learning, so that a fake-licensed vehicle analysis result is obtained immediately after each record is input into the trained model, without cyclically comparing it against large amounts of other data, which greatly improves the efficiency of fake-licensed vehicle data analysis.

In order to make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application and are not an exhaustive list of all embodiments. It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other where no conflict arises.

Example 1

A schematic step diagram of a random forest based fake-licensed vehicle analysis method according to an embodiment of the present application is shown in fig. 1.

As shown in fig. 1, the method for analyzing a fake-licensed vehicle based on a random forest in the embodiment of the application specifically includes the following steps:

S101: obtaining city vehicle passing data and fake-licensed vehicle passing data in a preset time period;

S102: preprocessing the city vehicle passing data and the fake-licensed vehicle passing data in the preset time period to obtain a vehicle sample set and a label set.

In some embodiments of the present application, S102 specifically includes the following steps:

First, the vehicle passing data are organized into a data set: the vehicle attribute features of the city vehicle passing data in the preset time period are acquired to obtain the data set, wherein the vehicle attribute features include valid data features and meaningless data features.

Second, the vehicle data in the data set are labeled according to the fake-licensed vehicle passing data to obtain a label set indicating whether each vehicle is a fake-licensed vehicle.

Finally, after the data features of the data set are determined, the invalid vehicle data in the data set are removed to obtain the vehicle sample set, wherein the invalid vehicle data are vehicle data that have no valid data features or lack key valid data features.
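As a non-limiting illustration, the labeling and invalid-data removal described above can be sketched as follows. The record layout, the field names in `KEY_FEATURES`, and the function name are assumptions made for this sketch and are not specified by the application; the two operations are also merged into a single pass for brevity.

```python
# Hypothetical field names: the application does not fix a record schema.
KEY_FEATURES = ("plate_number", "plate_type", "time", "place")


def build_sample_and_label_sets(passing_records, fake_plate_numbers):
    """Drop invalid records, then label the rest (1 = fake-licensed, 0 = normal)."""
    samples, labels = [], []
    for record in passing_records:
        # A record is treated as invalid when any key feature is missing or empty.
        if any(not record.get(name) for name in KEY_FEATURES):
            continue
        samples.append(record)
        labels.append(1 if record["plate_number"] in fake_plate_numbers else 0)
    return samples, labels


records = [
    {"plate_number": "A12345", "plate_type": "small", "time": "08:15", "place": "G1"},
    {"plate_number": "B67890", "plate_type": "small", "time": "23:40", "place": "G7"},
    {"plate_number": None, "plate_type": "large", "time": "12:00", "place": "G3"},
]
samples, labels = build_sample_and_label_sets(records, {"B67890"})
```

Here the third record is discarded because its plate number is missing, and the remaining two records form the sample set with a parallel label list.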

Further, after the data set is obtained, the method also includes supplementing the valid data features of vehicles whose valid data features are partially missing, to obtain the complete valid data features of the vehicles.

Further, the step of performing valid data feature supplementation includes:

first, acquiring a first valid data feature of a vehicle whose valid data features are partially missing;

then, matching the vehicle archive information and the vehicle annual inspection information according to the first valid data feature, and supplementing the partially missing valid data features to obtain the complete valid data features of the vehicle. The vehicle archive information and the vehicle annual inspection information may come from a traffic system or other traffic databases.
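A minimal sketch of this supplementation step is given below for illustration only. It assumes that the first valid data feature is the pair (license plate type, license plate number) and that the archive and annual inspection information has been loaded into an in-memory dictionary; the function name, field names and archive layout are hypothetical.

```python
def supplement_features(record, vehicle_archive):
    """Return a copy of `record` with missing features filled from the archive.

    `vehicle_archive` is a hypothetical lookup keyed by (plate_type, plate_number).
    """
    key = (record.get("plate_type"), record.get("plate_number"))
    archive_entry = vehicle_archive.get(key, {})
    completed = dict(record)  # do not modify the caller's record
    for name, value in archive_entry.items():
        # Only fill features that are absent or empty in the passing record.
        if completed.get(name) is None:
            completed[name] = value
    return completed


archive = {("small", "A12345"): {"body_color": "white"}}
partial = {"plate_type": "small", "plate_number": "A12345",
           "body_color": None, "speed": 42}
completed = supplement_features(partial, archive)
```

Features already present in the passing record, such as the speed here, are left untouched; only the missing body color is taken from the archive.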

Further, after the data set is acquired, the invalid feature information of the vehicles in the data set is removed, and the valid feature information of the vehicles in the data set is retained.

Finally, the vehicle data with complete valid data feature information are assembled to obtain the vehicle sample set.

The valid data features include the license plate number, license plate type, vehicle body color, time of appearance, place of appearance and/or vehicle speed; the meaningless data features include the data ID, equipment manufacturer, lane and/or driving direction.
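By way of illustration only, retaining the valid data features and discarding the meaningless ones amounts to a column projection. The field names below are hypothetical English renderings of the features listed above.

```python
# Hypothetical field names for the valid data features listed above.
VALID_FEATURES = ("plate_number", "plate_type", "body_color",
                  "time", "place", "speed")


def keep_valid_features(record):
    """Keep only the valid data features; everything else is discarded."""
    return {name: record[name] for name in VALID_FEATURES if name in record}


raw = {
    "data_id": 981, "manufacturer": "ACME", "lane": 2, "direction": "N",
    "plate_number": "A12345", "plate_type": "small", "body_color": "white",
    "time": "08:15", "place": "G1", "speed": 42,
}
cleaned = keep_valid_features(raw)
```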

S103: constructing a random forest model according to the vehicle sample set and the label set, and performing model training to obtain the trained random forest model.

S104: inputting vehicle passing data to be analyzed to the trained random forest model to obtain a fake-licensed vehicle analysis result.
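For orientation, a random forest classifier is commonly described as combining the outputs of its individual decision trees by majority vote, which is why a single record can be classified without comparing it against other records. The sketch below illustrates only that aggregation step; the three "trees" are invented stand-in rules, not trees learned from data, and the field names are assumptions.

```python
from collections import Counter


def forest_predict(trees, sample):
    """Classify one sample by majority vote over the individual trees."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


# Toy stand-ins for trained decision trees (illustrative rules only).
trees = [
    lambda s: 1 if s["speed"] > 120 else 0,
    lambda s: 1 if s["period"] == "early_morning" else 0,
    lambda s: 1 if not s["color_matches_archive"] else 0,
]

suspect = {"speed": 130, "period": "daytime", "color_matches_archive": False}
normal = {"speed": 60, "period": "daytime", "color_matches_archive": True}
suspect_result = forest_predict(trees, suspect)
normal_result = forest_predict(trees, normal)
```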

A schematic diagram of the training steps of the random forest model according to the embodiment of the present application is shown in fig. 2.

As shown in fig. 2, training the random forest model in S103 specifically includes the following steps:

S6: repeating steps S1, S3, S4 and S5 to train the model until the model score is greater than or equal to the score threshold, and/or terminating the training when the model over-fits the training set, to finally obtain the trained random forest model.
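One possible reading of this stopping rule is sketched below, purely as an illustration. The callback `train_and_score`, the gap-based over-fitting test and the `max_gap` value are assumptions of this sketch; the application does not state how over-fitting is measured.

```python
def train_until_good(train_and_score, candidate_params, score_threshold, max_gap=0.1):
    """Try parameter settings in turn and stop at the first acceptable model.

    `train_and_score(params)` is a hypothetical callback that resamples the data,
    trains a model and returns (model, train_score, test_score).
    """
    best = None
    for params in candidate_params:
        model, train_score, test_score = train_and_score(params)
        # Assumed over-fitting test: training score far above test score.
        overfitted = (train_score - test_score) > max_gap
        if overfitted:
            continue
        if best is None or test_score > best[1]:
            best = (model, test_score, params)
        if test_score >= score_threshold:
            break
    return best


def fake_train_and_score(params):
    """Stand-in for real training, returning fixed scores per max_depth."""
    scores = {2: (0.70, 0.68), 5: (0.93, 0.91), 50: (1.00, 0.75)}
    train_score, test_score = scores[params["max_depth"]]
    return "model(depth=%d)" % params["max_depth"], train_score, test_score


best = train_until_good(
    fake_train_and_score,
    [{"max_depth": 2}, {"max_depth": 50}, {"max_depth": 5}],
    score_threshold=0.9,
)
```

In this toy run the depth-50 setting is rejected as over-fitted and the loop stops at the depth-5 setting, whose test score reaches the threshold.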

In some embodiments of the present application, constructing a random forest model according to the number of samples and the number of features of the training set specifically means: constructing the random forest model using information entropy (Entropy) or the Gini index according to the number of samples and the number of features of the training set.

In some embodiments of the present application, the parameters include the class weights, the number of sub-models, the maximum depth, the minimum number of samples per leaf node, and/or the minimum total sample weight per leaf node.
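For illustration, the candidate parameter settings tried during tuning can be enumerated as a grid. The key names below mirror keyword arguments of scikit-learn's `RandomForestClassifier`; mapping the listed parameters onto those names is an assumption of this sketch (the application does not name a library), and the candidate values are arbitrary.

```python
from itertools import product

# Assumed mapping of the listed parameters to scikit-learn style names.
PARAM_GRID = {
    "class_weight": [None, "balanced"],   # weight of the category
    "n_estimators": [100, 300],           # number of sub-models (trees)
    "max_depth": [8, 16],                 # maximum depth
    "min_samples_leaf": [1, 5],           # minimum samples per leaf node
    "min_weight_fraction_leaf": [0.0],    # minimum total sample weight per leaf
}


def iter_param_settings(grid):
    """Yield one dict per combination of candidate values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


settings = list(iter_param_settings(PARAM_GRID))
```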

Specifically, when S104 is implemented, the vehicle passing data to be analyzed are first preprocessed to obtain a data set with complete valid feature data; the preprocessed data set is then input into the trained random forest model to obtain the fake-licensed vehicle analysis result.

A key flow diagram of a random forest based fake-licensed car analysis method according to another embodiment of the present application is shown in fig. 3.

To further illustrate the method for analyzing the fake-licensed car based on the random forest according to the embodiment, as shown in fig. 3, the flow steps in the specific implementation include:

First, the vehicle passing data of a city for the current day are obtained.

The massive daily vehicle passing data of the city and the known fake-licensed vehicle passing data are obtained, and the vehicle passing data are labeled according to the known fake-licensed vehicle passing data, to obtain a daily vehicle passing data set S and a label set Y indicating whether each record in S belongs to a fake-licensed vehicle.

Secondly, preprocessing the data according to the data template.

Valid data features such as the license plate number, license plate type, vehicle body color, time of appearance, place of appearance and vehicle speed are determined by analyzing the data set S, and features that are meaningless to the analysis result, such as the data ID, equipment manufacturer, lane and driving direction, are removed.

Further, the data are divided into daytime, evening, night and early morning according to the passing time; the vehicle archive information and vehicle annual inspection information are matched according to the license plate type and license plate number to supplement missing data items; records with too many missing items are deleted; and the sample set X is obtained through this processing.
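The division of passing times into four periods can be illustrated as follows. The hour boundaries are assumptions chosen for this sketch; the application names the four periods but does not define where one ends and the next begins.

```python
from datetime import datetime


def time_period(passing_time):
    """Map a passing time to one of four periods (boundaries are assumed)."""
    hour = passing_time.hour
    if 6 <= hour < 18:
        return "daytime"
    if 18 <= hour < 22:
        return "evening"
    if hour >= 22 or hour < 2:
        return "night"
    return "early_morning"  # 02:00 to 05:59 under these assumed boundaries


periods = [time_period(datetime(2020, 9, 29, h, 0)) for h in (8, 19, 23, 1, 3)]
```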

Then, feature analysis is performed based on the data, and a feature value is calculated.

Next, a random forest model is established according to the data set.

The feature values, the number of samples and the number of features are calculated from the training set, and the random forest model is constructed using information entropy (Entropy) or the Gini index.
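For reference, the two split criteria named above have the standard definitions H = -Σ p_k log2(p_k) for information entropy and G = 1 - Σ p_k² for the Gini index, where p_k is the proportion of class k among the labels at a node. A small sketch that computes both for a list of labels (function names are chosen for this illustration):

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Proportion of each class among the labels at a node."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def entropy(labels):
    """Information entropy: H = -sum(p_k * log2(p_k))."""
    return -sum(p * log2(p) for p in class_proportions(labels))


def gini(labels):
    """Gini index: G = 1 - sum(p_k ** 2)."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


mixed = [0, 0, 1, 1]   # half normal, half fake-licensed
pure = [0, 0, 0, 0]    # only normal vehicles
```

A node containing only one class scores zero under both criteria, while an evenly mixed two-class node scores the two-class maximum (entropy 1, Gini 0.5); tree construction favors splits that lower these values.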

Then, training the random forest model, and storing the trained model;

specifically, the sample set X and the label set Y are randomly sampled with replacement, with test_size equal to 0.3, to obtain the training sets Xtrain and Ytrain and the test sets Xtest and Ytest;

training and learning the random forest model according to the training sets Xtrain and Ytrain;

testing and scoring the trained random forest model by using test sets Xtest and Ytest;

the parameters of the random forest model, such as the class weights, the number of sub-models, the maximum depth, the minimum number of samples per leaf node and the minimum total sample weight per leaf node, are continuously adjusted and optimized according to the score of the random forest model;

after tuning, a test set is regenerated from the sample set X and the label set Y and fed to the trained model, the resulting score is compared with the score obtained after tuning, and whether the model is over-fitted on the training set is judged;

whether to execute the model training step again is judged according to the model score and the degree of fitting, finally obtaining a sufficiently good random forest model for fake-licensed vehicle analysis.
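The sampling step at the start of this training procedure (random selection with replacement, 30% of the data held out for testing) admits more than one interpretation. The sketch below shows one of them: 30% of the rows are held out as the test set, and the training set is drawn with replacement from the remaining rows. The function name and the `seed` parameter are assumptions of this sketch.

```python
import random


def split_with_replacement(X, Y, test_size=0.3, seed=None):
    """Hold out a test set, then bootstrap the training set from the rest."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    rng.shuffle(indices)
    n_test = int(round(len(X) * test_size))
    test_idx, pool = indices[:n_test], indices[n_test:]
    # Draw as many training rows as the pool holds, with replacement.
    train_idx = [rng.choice(pool) for _ in pool]

    def pick(data, idx):
        return [data[i] for i in idx]

    return pick(X, train_idx), pick(X, test_idx), pick(Y, train_idx), pick(Y, test_idx)


X = [[i] for i in range(10)]
Y = [i % 2 for i in range(10)]
Xtrain, Xtest, Ytrain, Ytest = split_with_replacement(X, Y, test_size=0.3, seed=0)
```

Calling the function again with a different seed regenerates the training and test sets, which is what the re-testing after tuning requires.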

And finally, inputting the processed data set to the trained model, and outputting the fake-licensed car analysis result.

With the fake-licensed vehicle analysis method based on a random forest in the embodiments of the application, city vehicle passing data and fake-licensed vehicle passing data in a preset time period are obtained; the city vehicle passing data and the fake-licensed vehicle passing data in the preset time period are preprocessed to obtain a vehicle sample set and a label set; a random forest model is trained according to the vehicle sample set and the label set to obtain a trained random forest model; and vehicle passing data to be analyzed are input to the trained random forest model to obtain a fake-licensed vehicle analysis result. The application exploits the massive daily city vehicle passing data held in big-data storage: features are extracted from the vehicle passing data and analyzed with the random forest algorithm from machine learning, so that a fake-licensed vehicle analysis result is obtained immediately after each record is input into the trained model, without cyclically comparing it against large amounts of other data, which greatly improves the efficiency of fake-licensed vehicle data analysis. This solves the problems that existing fake-licensed vehicle analysis methods need to cyclically compare large amounts of license plate data and are low in efficiency and accuracy.

Example 2

For details not disclosed in the system for analyzing a fake-licensed vehicle based on a random forest according to this embodiment, please refer to specific implementation contents of the method for analyzing a fake-licensed vehicle based on a random forest in other embodiments.

A schematic structural diagram of a random forest based fake-licensed vehicle analysis system according to an embodiment of the present application is shown in fig. 4.

As shown in fig. 4, the fake-licensed vehicle analysis system based on random forest according to the embodiment of the present application specifically includes a vehicle data acquisition module 10, a data preprocessing module 20, a random forest model training module 30, and a fake-licensed vehicle analysis module 40.

Specifically:

The vehicle data acquisition module 10 is used for acquiring city vehicle passing data and fake-licensed vehicle passing data in a preset time period.

The data preprocessing module 20 is used for preprocessing the city vehicle passing data and the fake-licensed vehicle passing data in the preset time period to obtain a vehicle sample set and a label set.

Specifically, the data preprocessing module 20 obtains a data set by organizing the vehicle passing data.

Further, after the data set is obtained, the data preprocessing module 20 also supplements the valid data features of vehicles whose valid data features are partially missing, to obtain the complete valid data features of the vehicles.

The random forest model training module 30 is used for training a random forest model according to the vehicle sample set and the label set to obtain the trained random forest model.

Specifically, the random forest model is trained by the training steps described in the method embodiment.

The fake-licensed car analysis module 40 is used for inputting vehicle passing data to be analyzed into the trained random forest model to obtain a fake-licensed vehicle analysis result.
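As a purely illustrative sketch, the cooperation of modules 10, 20, 30 and 40 can be expressed as a small class that wires four interchangeable callables together. The class name, method names and the toy stand-ins below are assumptions; in particular, the stand-in "model" is a simple lookup set used only to exercise the data flow, not a random forest.

```python
class FakePlateAnalysisSystem:
    """Wires together stand-ins for modules 10, 20, 30 and 40."""

    def __init__(self, acquire, preprocess, train, analyze):
        self.acquire = acquire        # vehicle data acquisition module 10
        self.preprocess = preprocess  # data preprocessing module 20
        self.train = train            # random forest model training module 30
        self.analyze = analyze        # fake-licensed car analysis module 40
        self.model = None

    def fit(self, period):
        passing_data, fake_data = self.acquire(period)
        samples, labels = self.preprocess(passing_data, fake_data)
        self.model = self.train(samples, labels)
        return self.model

    def predict(self, records):
        return self.analyze(self.model, records)


# Toy stand-ins: the "model" is a lookup set, NOT a trained random forest.
system = FakePlateAnalysisSystem(
    acquire=lambda period: (["A1", "B2", "C3"], {"B2"}),
    preprocess=lambda passing, fake: (passing, [int(p in fake) for p in passing]),
    train=lambda samples, labels: {s for s, y in zip(samples, labels) if y == 1},
    analyze=lambda model, records: [int(r in model) for r in records],
)
system.fit("2020-09-29")
results = system.predict(["B2", "Z9"])
```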

With the fake-licensed car analysis system based on a random forest in the embodiments of the application, the vehicle data acquisition module 10 obtains city vehicle passing data and fake-licensed vehicle passing data in a preset time period; the data preprocessing module 20 preprocesses the city vehicle passing data and the fake-licensed vehicle passing data in the preset time period to obtain a vehicle sample set and a label set; the random forest model training module 30 trains a random forest model according to the vehicle sample set and the label set to obtain a trained random forest model; and the fake-licensed car analysis module 40 inputs vehicle passing data to be analyzed to the trained random forest model to obtain the fake-licensed vehicle analysis result. The application exploits the massive daily city vehicle passing data held in big-data storage: features are extracted from the vehicle passing data and analyzed with the random forest algorithm from machine learning, so that a fake-licensed vehicle analysis result is obtained immediately after each record is input into the trained model, without cyclically comparing it against large amounts of other data, which greatly improves the efficiency of fake-licensed vehicle data analysis. This solves the problems that existing fake-licensed vehicle analysis methods need to cyclically compare large amounts of license plate data and are low in efficiency and accuracy.

Example 3

For details that are not disclosed in the fake-licensed car analysis device based on the random forest of this embodiment, please refer to specific implementation contents of the fake-licensed car analysis method or system based on the random forest in other embodiments.

A schematic structural diagram of a random forest based fake-licensed vehicle analysis device 400 according to an embodiment of the present application is shown in fig. 5.

As shown in fig. 5, the fake-licensed vehicle analysis device 400 includes:

the memory 402: for storing executable instructions; and

a processor 401 for interfacing with the memory 402 to execute executable instructions to perform a random forest based method of fake-licensed vehicle analysis.

Those skilled in the art will appreciate that FIG. 5 is merely an example of the fake-licensed vehicle analysis device 400 and does not limit it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the fake-licensed vehicle analysis device 400 may also include input-output devices, network access devices, buses, etc.

The processor 401 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor 401 is the control center of the fake-licensed vehicle analysis device 400 and connects the various parts of the entire device through various interfaces and lines.

The memory 402 may be used to store computer-readable instructions, and the processor 401 implements the various functions of the fake-licensed vehicle analysis device 400 by running or executing the computer-readable instructions or modules stored in the memory 402 and invoking data stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created through use of the fake-licensed vehicle analysis device 400, and the like. In addition, the memory 402 may include a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, a Read-Only Memory (ROM), a Random Access Memory (RAM), or other non-volatile/volatile storage devices.

The modules integrated in the fake-licensed vehicle analysis device 400, if implemented in the form of software functional modules and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods in the above embodiments may also be implemented by computer-readable instructions instructing the related hardware; the instructions may be stored in a computer-readable storage medium, and when they are executed by a processor, the steps of the method embodiments are implemented.

Example 4

The present embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the random forest based fake-licensed vehicle analysis method of the other embodiments.

With the random forest based fake-licensed vehicle analysis device and storage medium of the embodiments of the application, city vehicle passing data and fake-licensed vehicle passing data in a preset time period are obtained; the city vehicle passing data and the fake-licensed vehicle passing data in the preset time period are preprocessed to obtain a vehicle sample set and a label set; a random forest model is trained according to the vehicle sample set and the label set to obtain a trained random forest model; and vehicle passing data to be analyzed is input into the trained random forest model to obtain a fake-licensed vehicle analysis result. The application uses the massive daily city vehicle passing data stored on the big data platform, extracts features from that data, and analyzes them with the random forest algorithm from machine learning. Once a record is input into the trained model, a fake-licensed vehicle analysis result is obtained immediately, without looping over and comparing against large amounts of other data, which greatly improves the efficiency of fake-licensed vehicle data analysis. This addresses the low efficiency and low accuracy of existing fake-licensed vehicle analysis methods, which must cyclically compare large amounts of license plate data.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "at the time of …", "when …", or "in response to a determination", depending on the context.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

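1. A fake-licensed vehicle analysis method based on random forest, characterized by comprising the following steps:

acquiring city vehicle passing data and fake-licensed vehicle passing data in a preset time period;

preprocessing the city vehicle passing data and the fake-licensed vehicle passing data in the preset time period to obtain a vehicle sample set and a label set;

training a random forest model according to the vehicle sample set and the label set to obtain a trained random forest model;

and inputting vehicle passing data to be analyzed into the trained random forest model to obtain a fake-licensed vehicle analysis result.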

2. The fake-licensed vehicle analysis method according to claim 1, wherein preprocessing the city vehicle passing data and the fake-licensed vehicle passing data to obtain a vehicle sample set and a label set specifically comprises the following steps:

acquiring vehicle attribute features of the city vehicle passing data in the preset time period to obtain a data set, wherein the vehicle attribute features comprise valid data features and meaningless data features;

labeling the vehicles in the data set according to the fake-licensed vehicle passing data to obtain a label set containing, for each vehicle, a label indicating whether it is a fake-licensed vehicle;

removing invalid vehicle data from the data set to obtain a vehicle sample set; wherein invalid vehicle data refers to vehicle data that has no valid data features or lacks key valid feature data.
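
For illustration only, the preprocessing steps of claim 2 could be sketched in Python as follows. The field names, the choice of which valid features count as "key" features, and the assumption that labels are assigned by matching license plate numbers against the fake-licensed passing data are all hypothetical; the application does not specify them.

```python
# Hypothetical field names; the application does not define a record schema.
VALID_FEATURES = ["plate_number", "plate_type", "body_color", "time", "place", "speed"]
# Assumption: these valid features are treated as the "key" ones.
KEY_FEATURES = ["plate_number", "time", "place"]


def _present(record, name):
    """True if the record carries a non-empty value for the given feature."""
    return record.get(name) not in (None, "")


def preprocess(passing_records, fake_plate_numbers):
    """Build (samples, labels) from raw passing records.

    passing_records: list of dicts, one per passing event.
    fake_plate_numbers: set of plate numbers taken from the fake-licensed
        passing data (assumed labeling key).
    """
    samples, labels = [], []
    for record in passing_records:
        has_any_valid = any(_present(record, n) for n in VALID_FEATURES)
        has_all_key = all(_present(record, n) for n in KEY_FEATURES)
        # Invalid data: no valid features at all, or a key valid feature is missing.
        if not (has_any_valid and has_all_key):
            continue
        samples.append(record)
        # Label 1 = fake-licensed, 0 = not fake-licensed.
        labels.append(1 if record["plate_number"] in fake_plate_numbers else 0)
    return samples, labels
```

The sketch filters before labeling, which yields the same sample set and label set as labeling first and then discarding the labels of removed records.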

3. The fake-licensed vehicle analysis method according to claim 2, further comprising, after the data set is obtained: supplementing the valid data features of vehicles whose valid data features are partially missing, to obtain complete valid data features for those vehicles.

4. The fake-licensed vehicle analysis method according to claim 3, wherein supplementing the valid data features specifically comprises:

acquiring a first valid data feature of the vehicle whose valid data features are partially missing;

and matching vehicle archive information and vehicle annual inspection information according to the first valid data feature, and supplementing the partially missing valid data features.
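
For illustration only, the feature supplementation of claims 3 and 4 could look like the sketch below. It assumes that the "first valid data feature" is the license plate number and that the archive and annual inspection information have been merged into one dictionary keyed by plate number; the function name, field names, and lookup structure are hypothetical.

```python
def supplement_features(record, vehicle_archive, valid_features):
    """Return a copy of `record` with missing valid features filled in.

    record: dict for one vehicle passing event.
    vehicle_archive: dict mapping plate number -> dict of known features
        (assumed to combine archive and annual inspection information).
    valid_features: names of the valid data features to complete.
    """
    completed = dict(record)  # never mutate the caller's record
    # Assumption: the plate number is the first valid feature used for matching.
    archive_entry = vehicle_archive.get(record.get("plate_number"))
    if archive_entry is None:
        return completed  # no match: leave the record as it is
    for name in valid_features:
        missing = completed.get(name) in (None, "")
        known = archive_entry.get(name) not in (None, "")
        if missing and known:
            completed[name] = archive_entry[name]
    return completed
```

Values already present in the record are kept, so the archive only fills gaps and never overrides what the capture device observed.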

5. The fake-licensed vehicle analysis method according to claim 2, further comprising, after the data set is obtained: removing the invalid feature information of the vehicles in the data set and retaining the valid feature information of the vehicles in the data set.

6. The fake-licensed vehicle analysis method according to claim 2, wherein the valid data features include a license plate number, a license plate type, a vehicle body color, a time of appearance, a place of appearance, and/or a vehicle speed; and the meaningless data features include a data ID, an equipment manufacturer, a lane, and/or a direction of travel.

7. The fake-licensed vehicle analysis method according to claim 1, wherein constructing the random forest model according to the vehicle sample set and the label set and performing model training to obtain the trained random forest model specifically comprises the following steps:

s1: performing random selection with replacement on the vehicle sample set and the label set to obtain a training set and a test set;

s2: constructing a random forest model according to the number of samples and the number of features of the training set;

s3: inputting the training set into the random forest model for training to obtain a trained random forest model;

s4: inputting the test set to the trained random forest model to obtain a model score;
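
For illustration only, steps s1 to s4 could be sketched with scikit-learn as follows. The application does not name a library or any hyperparameters, so `RandomForestClassifier`, the use of never-drawn rows as the test set, the value of `n_estimators`, and the `max_features="sqrt"` setting are assumptions. The features are assumed to be already encoded as a numeric array.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def train_and_score(X, y, n_estimators=100, seed=0):
    """Bootstrap-split the data, fit a random forest, and score it.

    X: numeric feature array of shape (n_samples, n_features).
    y: label array of shape (n_samples,), 1 = fake-licensed, 0 = not.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]

    # s1: draw row indices with replacement for the training set;
    # rows that were never drawn form the test set (an assumption).
    train_idx = rng.integers(0, n_samples, size=n_samples)
    test_mask = np.ones(n_samples, dtype=bool)
    test_mask[train_idx] = False

    # s2: configure the forest; "sqrt" ties the number of features tried
    # at each split to the feature count of the training set.
    model = RandomForestClassifier(
        n_estimators=n_estimators, max_features="sqrt", random_state=seed
    )

    # s3: train on the bootstrap sample.
    model.fit(X[train_idx], y[train_idx])

    # s4: the model score is the mean accuracy on the held-out rows.
    score = model.score(X[test_mask], y[test_mask])
    return model, score
```

Calling `model.predict` on a single encoded passing record then returns its label directly, with no comparison against other records.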

8. A fake-licensed vehicle analysis system based on random forest, characterized by specifically comprising:

a data preprocessing module, configured to preprocess city vehicle passing data and fake-licensed vehicle passing data in a preset time period to obtain a vehicle sample set and a label set;

a random forest model training module, configured to train a random forest model according to the vehicle sample set and the label set to obtain a trained random forest model;

and a fake-licensed vehicle analysis module, configured to input vehicle passing data to be analyzed into the trained random forest model to obtain a fake-licensed vehicle analysis result.

9. A fake-licensed vehicle analysis device based on random forest, characterized by comprising:

a memory for storing executable instructions; and

a processor, connected to the memory, for executing the executable instructions to perform the random forest based fake-licensed vehicle analysis method of any one of claims 1 to 6.

10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program is executed by a processor to implement the random forest based fake-licensed vehicle analysis method according to any one of claims 1 to 6.