CN116910662A - Passenger anomaly identification method and device based on random forest algorithm - Google Patents


Info

Publication number
CN116910662A
Authority
CN
China
Prior art keywords
passenger
model
random forest
feature
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310800663.0A
Other languages
Chinese (zh)
Inventor
王驰
苗应亮
胡长柏
李胜南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Maxvision Technology Corp
Original Assignee
Maxvision Technology Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Maxvision Technology Corp
Priority to CN202310800663.0A
Publication of CN116910662A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Abstract

The application discloses a passenger anomaly identification method and device based on a random forest algorithm. The method comprises a random forest model establishing step, a scheduled random forest model updating step, and a passenger anomaly determination step. In the model establishing step, data samples are classified by abnormal person type, and correlation analysis is performed on the feature data of each abnormal person type to obtain the correlation value between each feature and the result. In the passenger anomaly determination step, the features of the passenger under inspection are computed in real time and compared with the correlation values obtained in the model establishing step to determine whether the passenger is abnormal. With this scheme, before a flight arrives at the port, passenger profiles are classified and predicted in advance from the advance passenger information; when an abnormal person is detected, an early warning is issued so that staff can be deployed ahead of time, which gives the method good practicability.

Description

Passenger anomaly identification method and device based on random forest algorithm
Technical Field
The application relates to the technical field of electronic information, in particular to a method and a device for carrying out anomaly identification on port passengers based on a Bagging Boosting random forest algorithm.
Background
At present, border inspection staff rely on two main approaches when screening documents and identifying risks among passengers passing through a port: manual judgment, in which officers decide based on their own experience, and automatic judgment, in which a computer-assisted expert experience library is established.
Manual judgment has the following disadvantages: 1. officers are inevitably subjective when assessing the risk of passengers, so a uniform risk standard cannot be achieved; 2. officers differ in their risk identification experience, so the identification rate for risky passengers cannot be guaranteed; 3. manual screening is inefficient, so clearance throughput cannot be kept consistently high.
In the automatic judgment method, the port identifies risky passengers according to expert rules stored in an expert experience library. This expert experience method has the following defects: (1) its rules are derived from expert experience and differ to some degree from the true distribution of the data, which causes missed or false detections; (2) some rules are derived from specific feature values of historical abnormal passengers without fully considering the overall feature distribution, so judging anomalies only by identical feature values risks missed detections; (3) when a rule contains many features unrelated to the result, the efficiency of passenger anomaly screening suffers. In addition, once the number of historical abnormal passengers reaches a certain order of magnitude, manually searching the data set for rules becomes impractical, whereas training a machine learning model extracts objective rules far more efficiently. A machine learning method can also compute model accuracy, so models with low accuracy can be filtered out by setting a threshold, which improves the accuracy of passenger anomaly identification. At present, however, no passenger anomaly identification method with high identification accuracy exists.
Disclosure of Invention
The following presents a simplified summary of embodiments of the application in order to provide a basic understanding of some aspects of the application. It should be understood that the following summary is not an exhaustive overview of the application. It is not intended to identify key or critical elements of the application or to delineate the scope of the application. Its purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.
According to one aspect of the application, there is provided a method for identifying anomalies in passengers based on a random forest algorithm, comprising: a random forest model establishing step, a random forest model timing updating step and a passenger abnormality judging step; in the step of establishing the random forest model, classifying the data sample into different abnormal personnel types, and carrying out correlation analysis on characteristic data of the different abnormal personnel types to obtain correlation values of different characteristics and results; in the step of judging the abnormality of the passengers, the characteristics of the passengers to be detected are calculated in real time, and the characteristics are compared with the correlation values obtained in the step of establishing the random forest model to judge whether the passengers to be detected are abnormal or not.
Specifically, the step of establishing the random forest model includes:
establishing a data sample, classifying the data sample according to the types of abnormal personnel, and acquiring characteristic data of the data sample to form characteristic data of different types of abnormal personnel;
performing correlation analysis on the feature data of the different abnormal person types using the mutual information method to obtain the correlation value between each feature and the output result; the output result is the output of the random forest model, namely whether the person is classified as abnormal; when constructed, the random forest model has an input and an output, the input being the feature data of the different abnormal person types and the output being whether the person is abnormal;
selecting, according to the correlation values, the feature set [X_0, X_1, ..., X_M] whose correlation values are greater than 0, and splitting the feature set and the label set [Y_0, Y_1, ..., Y_M] into a training set and a test set; in the feature set [X_0, X_1, ..., X_M], X_i (1 ≤ i ≤ M) denotes the i-th input record and contains n features; the label set [Y_0, Y_1, ..., Y_M] has the same length as the feature set [X_0, X_1, ..., X_M]; in the label set, Y_i (1 ≤ i ≤ M) is the i-th output result, which indicates whether the record is abnormal and takes the value 0 (normal) or 1 (abnormal);
substituting the test set into the trained random forest model, and if the prediction accuracy meets a set threshold T, storing the screened feature set and the random forest model;
the step of updating the random forest model regularly comprises the step of updating the feature set and the random forest model regularly;
an abnormality determination step for a traveler, including:
calculating the features h_1, h_2, h_3 of passenger P;
comparing the features h_1, h_2, h_3 of passenger P with the models of the different abnormal person types in the random forest model, respectively;
determining from the comparison result whether the passenger is abnormal;
if the passenger is abnormal, issuing an early-warning signal when the passenger enters the warning area.
Further, for the trained random forest model into which the test set is substituted, the training process comprises the following steps:
denoting the total number of training samples as N, each decision tree randomly draws N training samples from the N-sample training set to serve as the training set of that tree;
denoting the number of input features of a training sample as M, when a node of a decision tree is split, m input features are randomly selected from the M input features and the best split among them is chosen; m is far smaller than M, and m does not change while the decision tree is being constructed;
each tree keeps splitting until all training samples at a node belong to the same class, and no pruning is needed;
result determination:
(1) when the target variable is numeric, the mean of the outputs of the t decision trees is taken as the result;
(2) when the target variable is categorical, majority rule applies, and the class predicted by the most trees is taken as the classification result of the whole random forest.
The random forest method is a random forest based on Bagging Boosting (Bagging draws training samples from the data set multiple times to construct a plurality of weak learners, and Boosting iteratively constructs a strong learner during training), and the training process of the random forest model is implemented in the following steps:
1. first, randomly selecting N samples from the training set as the Bootstrap sample set;
2. randomly selecting m features from all the features as the candidate feature set of the current decision tree;
3. training a decision tree on the Bootstrap sample set and the candidate feature set;
4. predicting with the trained decision tree, and calculating the difference between the predicted result and the true value to serve as the sample weights for the next round of training;
5. performing Bootstrap sampling again according to the sample weights to obtain a new sample set, and training the next decision tree on the new sample set and the candidate feature set;
6. repeating step 4 and step 5 until the specified number of decision trees has been trained;
7. finally, combining the prediction results of the decision trees by voting to obtain the final classification result.
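The seven steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the applicant's implementation: depth-1 trees (stumps) stand in for fully grown decision trees to keep the example short, the sample weight is taken as the absolute prediction error plus a small hypothetical `floor` so that correctly classified samples can still be drawn, and the candidate feature set is re-drawn for every tree. All function names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels, default):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def train_stump(X, y, features):
    """Depth-1 tree: pick the (feature, threshold) with the lowest weighted Gini."""
    fallback = majority(y, 0)
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left, fallback), majority(right, fallback))
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label

def train_forest(X, y, n_trees=15, m=1, floor=0.1, seed=0):
    rng = random.Random(seed)
    n = len(X)
    weights = [1.0] * n                                      # round one: plain Bootstrap
    trees = []
    for _ in range(n_trees):                                 # step 6
        idx = rng.choices(range(n), weights=weights, k=n)    # steps 1 and 5
        feats = rng.sample(range(len(X[0])), m)              # step 2
        tree = train_stump([X[i] for i in idx], [y[i] for i in idx], feats)  # step 3
        trees.append(tree)
        # step 4: the prediction error of each training sample becomes its weight
        weights = [abs(tree(row) - lab) + floor for row, lab in zip(X, y)]
    return trees

def predict(trees, row):
    """Step 7: majority vote over all trees."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]

# Toy data: the label depends on the first feature only.
X = [[i, i % 3] for i in range(20)]
y = [1 if i >= 10 else 0 for i in range(20)]
forest = train_forest(X, y)
print(len(forest), predict(forest, [19, 1]))
```

Drawing the next sample set in proportion to the previous tree's errors is what distinguishes this loop from plain Bagging, where every round would sample uniformly.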
Further, the abnormal person types include persons involved in gambling and fraud, illegal workers, and foreign brides. The features h_1, h_2, h_3 of passenger P are passenger features for the respective abnormal person types. Specifically, feature h_1 of passenger P is the feature data related to persons involved in gambling and fraud, generally a set of correlated features; feature h_2 of passenger P is the feature data related to illegal workers, generally a set of correlated features; feature h_3 of passenger P is the feature data related to foreign brides, generally a set of correlated features. The passenger features for the different abnormal person types include passenger basic information, passenger travel information, passenger violation records, information on the passenger's stay in China, and other feature information.
Further, the test set is substituted into the trained random forest model, and if the prediction accuracy meets the set threshold, the following are obtained: the gambling-and-fraud feature set f_1, the illegal worker feature set f_2, the foreign bride feature set f_3, the gambling-and-fraud model m_1, the illegal worker model m_2, and the foreign bride model m_3.
Further, comparing the features h_1, h_2, h_3 of passenger P with the models of the different abnormal person types in the random forest model specifically comprises:
feeding feature h_1 of passenger P into the corresponding model m_1 in the model library to obtain the predicted value of model m_1, and determining from the predicted value whether the passenger is a person involved in gambling and fraud;
feeding feature h_2 of passenger P into the corresponding model m_2 in the model library to obtain the predicted value of model m_2, and determining from the predicted value whether the passenger is an illegal worker;
feeding feature h_3 of passenger P into the corresponding model m_3 in the model library to obtain the predicted value of model m_3, and determining from the predicted value whether the passenger is a foreign bride.
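The per-type comparison can be pictured as a lookup in a model library followed by one prediction per abnormal person type. The sketch below is a hedged illustration: the type keys, the toy threshold models, and the function names are all hypothetical stand-ins for the stored models m_1, m_2, m_3 and feature vectors h_1, h_2, h_3.

```python
from typing import Callable, Dict, List

# A model maps a feature vector to 1 (abnormal) or 0 (normal).
Model = Callable[[List[float]], int]

def screen_passenger(features_by_type: Dict[str, List[float]],
                     model_library: Dict[str, Model]) -> Dict[str, int]:
    """Feed each type-specific feature vector h_i into its model m_i."""
    results = {}
    for person_type, model in model_library.items():
        h = features_by_type.get(person_type)
        if h is None:          # no features computed for this type: skip it
            continue
        results[person_type] = model(h)
    return results

def is_abnormal(results: Dict[str, int]) -> bool:
    """The passenger is flagged if any type-specific model predicts 1."""
    return any(value == 1 for value in results.values())

# Toy stand-ins for trained models (assumptions, not real classifiers).
model_library = {
    "gambling_fraud": lambda h: 1 if h[0] > 5 else 0,
    "illegal_worker": lambda h: 1 if h[1] > 2 else 0,
    "foreign_bride": lambda h: 0,
}
passenger_features = {
    "gambling_fraud": [7.0, 1.0],
    "illegal_worker": [0.0, 1.0],
    "foreign_bride": [3.0, 3.0],
}
results = screen_passenger(passenger_features, model_library)
print(results, is_abnormal(results))
```

Keeping one model per abnormal person type means a model that fails its accuracy threshold can be left out of the library without affecting the others.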
Further, the step of updating the random forest model on a schedule specifically comprises: traversing each abnormal person type; obtaining the passenger IDs corresponding to that abnormal person type; randomly obtaining the passenger IDs of the same number of normal persons; forming a passenger training set from the passenger IDs of the abnormal person type and the randomly obtained passenger IDs of normal persons; and calculating the features of the passenger training set. The features of the passenger training set comprise four feature modules: passenger basic information, passenger travel information, passenger violation records, and information on the passenger's stay in China.
The passenger basic information comprises nationality (Chinese or not), age group (teenager, young, middle-aged, elderly), document type, gender, person category, visa stay period, and visa category; the passenger travel information comprises the number of entries and exits, average entry interval, average exit interval, whether the entry and exit counts match, average length of stay in China, average length of stay abroad, countries travelled to and from (whether a high-risk country is included), and overseas itinerary (whether a high-risk country is included); the passenger violation records comprise the number of violations, the number of illegal entries, and the number of illegal stays; the information on the passenger's stay in China comprises the area of stay in China (township, first-tier city, key city, etc.), accommodation (hotel, rented housing, private residence, etc.), and family-reunion visa (whether applied for).
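The balanced training set described above (all IDs of one abnormal person type plus an equal number of randomly drawn normal passengers) could be assembled as follows. This is a minimal sketch: the function name, the ID strings, and the choice to raise an error when too few normal passengers are available are assumptions made for illustration.

```python
import random

def build_training_ids(abnormal_ids, normal_ids, seed=0):
    """Pair every abnormal passenger ID with label 1 and an equal number of
    randomly drawn normal passenger IDs with label 0."""
    if len(normal_ids) < len(abnormal_ids):
        raise ValueError("not enough normal passengers to balance the set")
    rng = random.Random(seed)
    sampled_normals = rng.sample(list(normal_ids), len(abnormal_ids))
    labelled = [(pid, 1) for pid in abnormal_ids]
    labelled += [(pid, 0) for pid in sampled_normals]
    rng.shuffle(labelled)      # avoid all abnormal rows coming first
    return labelled

abnormal = ["A001", "A002", "A003"]
normal = ["N%03d" % i for i in range(50)]
training_ids = build_training_ids(abnormal, normal)
print(training_ids)
```

Sampling the same number of normal passengers keeps the two classes balanced, so a model cannot reach high accuracy simply by predicting "normal" for everyone.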
Further, performing correlation analysis on the feature data of the different abnormal person types using the mutual information method specifically comprises:
obtaining the correlation value between each feature and the result from the formula for the mutual information I(X;Y), the correlation values lying in the range [0,1]; the mutual information I(X;Y) is calculated as:
$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$
where p(x) and p(y) are the marginal probability distribution functions of feature X and label Y, and p(x,y) is the joint probability distribution function of feature X and label Y.
According to another aspect of the present application, there is provided a passenger anomaly recognition device based on a random forest algorithm, including: a first module for performing a step of establishing a random forest model, a second module for performing a step of updating the random forest model at regular time, and a third module for performing a step of judging abnormality of the passenger; the first module is used for classifying the data samples into different abnormal personnel types, and carrying out correlation analysis on the characteristic data of the different abnormal personnel types to obtain correlation values of different characteristics and results; the third module is used for calculating the characteristics of the passengers to be detected in real time, and comparing the characteristics with the correlation values obtained in the step of establishing the random forest model to judge whether the passengers to be detected are abnormal or not.
Specifically, in the first module, the step of establishing the random forest model includes:
establishing a data sample, classifying the data sample according to the types of abnormal personnel, and acquiring characteristic data of the data sample to form characteristic data of different types of abnormal personnel;
performing correlation analysis on the feature data of different abnormal personnel types by adopting a mutual information method to obtain correlation values of different features and results;
selecting, according to the correlation values, the feature set [X_0, X_1, ..., X_M] whose correlation values are greater than 0, and splitting the feature set and the label set [Y_0, Y_1, ..., Y_M] into a training set and a test set; in the feature set [X_0, X_1, ..., X_M], X_i (1 ≤ i ≤ M) denotes the i-th input record and contains n features; the label set [Y_0, Y_1, ..., Y_M] has the same length as the feature set [X_0, X_1, ..., X_M]; in the label set, Y_i (1 ≤ i ≤ M) is the i-th output result, which indicates whether the record is abnormal and takes the value 0 (normal) or 1 (abnormal);
substituting the test set into a trained random forest model, and if the prediction precision meets a set threshold T, storing the screened feature set and the random forest model;
in the second module, the step of updating the random forest model at regular time comprises updating the feature set and storing the model at regular time;
in the third module, the step of determining abnormality of the passenger includes:
calculating the features h_1, h_2, h_3 of passenger P;
comparing the features h_1, h_2, h_3 of passenger P with the models of the different abnormal person types in the random forest model, respectively;
determining from the comparison result whether the passenger is abnormal;
if the passenger is abnormal, issuing an early-warning signal when the passenger enters the warning area.
Compared with the prior art, the application performs feature correlation analysis on passenger profile data and label data and, combined with a supervised classification model (the random forest model), provides an objective computational method for classifying passenger profiles. By combining data and business attributes, it provides a feature set related to the profiles of port passengers, which benefits the accuracy of the classification model. Correlation analysis between the feature set and the data of each label type screens out irrelevant features, which reduces the training computation of the machine learning model and improves computational efficiency. The accuracy of each model is computed periodically from existing data, and only when the accuracy meets the threshold requirement is the model used to predict on features from the advance passenger information, so low-accuracy prediction models are filtered out automatically. Before a flight arrives at the port, passenger profiles are classified and predicted in advance from the advance passenger information; when an abnormal person is detected, an early warning is issued so that staff can be deployed ahead of time, which gives the method good practicability.
Drawings
The application may be better understood by referring to the following description in conjunction with the accompanying drawings, in which like or similar reference numerals are used to indicate like or similar elements throughout the several views. The accompanying drawings, together with the detailed description below, are incorporated in and constitute a part of this specification, and serve to further illustrate the preferred embodiments of the application and to explain its principles and advantages.
In the drawings:
FIG. 1 is a schematic diagram of a passenger anomaly identification method of the present application;
FIG. 2 is a schematic diagram of a feature data set of different abnormal person types according to the present application.
Detailed Description
Embodiments of the present application will be described below with reference to the accompanying drawings. Elements and features described in one drawing or embodiment of the application may be combined with elements and features shown in one or more other drawings or embodiments. It should be noted that the illustration and description of components and processes known to those skilled in the art, which are not relevant to the present application, have been omitted in the drawings and description for the sake of clarity.
At present, port staff often judge subjectively, from their own experience, whether a passenger is abnormal, so missed and false detections are possible. The passenger anomaly identification method of the application can be applied directly to the analysis and assessment of border-inspection passenger clearance data. In this scheme, a machine learning model is built from the features and anomaly labels of abnormal persons and used to identify anomalies among passengers about to pass through the port, thereby providing early warning of abnormal persons.
As a specific embodiment, referring to FIG. 1, the passenger anomaly identification method of the application includes a scheduled computation process and a triggered computation process, wherein the scheduled computation process includes the step of establishing the random forest model in advance.
The scheduled computation (an initial computation is started at deployment; thereafter, as data accumulates, the computation runs at a fixed time each week to update the feature set and store the model) comprises the following steps:
Step (1): traversing each abnormal person type i: persons involved in gambling and fraud, illegal workers, foreign brides, and so on;
Step (2): obtaining the passenger IDs corresponding to abnormal person type i, and randomly obtaining the passenger IDs of the same number of normal persons, to form a passenger training set;
Step (3): calculating the features of the passenger training set, which comprise four feature modules: passenger basic information, passenger travel information, passenger violation records, and information on the passenger's stay in China;
(1) passenger basic information: nationality (Chinese or not), age group (teenager, young, middle-aged, elderly), document type, gender, person category, visa stay period, and visa category;
(2) passenger travel information: number of entries and exits, average entry interval, average exit interval, whether the entry and exit counts match, average length of stay in China, average length of stay abroad, countries travelled to and from (whether a high-risk country is included), and overseas itinerary (whether a high-risk country is included);
(3) passenger violation records: number of violations, number of illegal entries, and number of illegal stays;
(4) information on the passenger's stay in China: area of stay in China (township, first-tier city, key city, etc.), accommodation (hotel, rented housing, private residence, etc.), and family-reunion visa (whether applied for);
Step (4): performing correlation analysis on the features using the mutual information method;
Mutual information evaluates the correlation between a qualitative independent variable and a qualitative dependent variable, that is, between one categorical variable and another. The larger the mutual information, the stronger the correlation between the two variables; when the mutual information is 0, the two variables are independent of each other. The mutual information is calculated as:
$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$
where p(x) and p(y) are the marginal probability distribution functions of feature X and label Y, and p(x,y) is the joint probability distribution function of feature X and label Y. Intuitively, mutual information measures the information shared between two random variables; it can also be expressed as the reduction in the uncertainty of Y due to the introduction of X, in which case mutual information is the same as information gain.
Finally, the correlation value between each feature and the result is obtained from the mutual information formula, with values distributed in [0,1].
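For categorical features, the mutual information can be estimated directly from co-occurrence counts. The sketch below is one way to do it; the function name is hypothetical, and the base-2 logarithm is an assumption chosen because it keeps the value within [0,1] when the label is binary.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from paired observations of two categorical variables."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n                       # joint probability p(x, y)
        p_x = count_x[x] / n               # marginal p(x)
        p_y = count_y[y] / n               # marginal p(y)
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi

labels = [0, 0, 1, 1]
informative = ["low", "low", "high", "high"]   # determines the label exactly
irrelevant = ["a", "b", "a", "b"]              # independent of the label
print(mutual_information(informative, labels))
print(mutual_information(irrelevant, labels))
```

In this toy data the first feature determines the label exactly and scores 1 bit, while the second is independent of the label and scores 0, so only the first would survive a "greater than 0" screen.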
Step (5): screening out the features whose correlation is 0 and retaining the feature set [X_0, X_1, ..., X_M] whose correlation is greater than 0; splitting the feature set and the label set [Y_0, Y_1, ..., Y_M] into a training set and a test set; in the feature set [X_0, X_1, ..., X_M], X_i (1 ≤ i ≤ M) denotes the i-th input record and contains n features; the label set [Y_0, Y_1, ..., Y_M] has the same length as the feature set [X_0, X_1, ..., X_M]; in the label set, Y_i (1 ≤ i ≤ M) is the i-th output result, which indicates whether the record is abnormal and takes the value 0 (normal) or 1 (abnormal);
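Step (5) amounts to a filter on the correlation scores followed by a split of the records. The following sketch assumes the scores are already available as a dictionary; the feature names, the 80/20 split ratio, and the function names are hypothetical, since the text does not state a split ratio.

```python
import random

def select_features(mi_scores):
    """Keep only the features whose correlation value is greater than 0."""
    return [name for name, score in mi_scores.items() if score > 0]

def train_test_split(X, Y, test_ratio=0.2, seed=0):
    """Shuffle record indices and split features and labels consistently."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    pick = lambda data, idx: [data[i] for i in idx]
    return pick(X, train_idx), pick(Y, train_idx), pick(X, test_idx), pick(Y, test_idx)

mi_scores = {"age_group": 0.12, "gender": 0.0, "illegal_entries": 0.47}
kept = select_features(mi_scores)

# Records restricted to the kept features, with 0/1 labels.
X = [[i % 4, i % 2] for i in range(10)]
Y = [i % 2 for i in range(10)]
X_train, Y_train, X_test, Y_test = train_test_split(X, Y)
print(kept, len(X_train), len(X_test))
```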
Step (6): training the random forest model;
Regarding the randomization:
(1) when each tree is trained, a subset is drawn from all training samples for training (i.e. bootstrap sampling), and the remaining data are used to evaluate the error;
(2) at each node, a subset of all features is randomly selected and used to compute the best split.
The algorithm flow is as follows:
(1) Let the total number of training samples be N; each decision tree randomly draws N training samples from the N-sample training set (bootstrap sampling with replacement) to serve as the training set of that tree.
(2) Let the number of input features of a training sample be M, with m far smaller than M. When a node of a decision tree is split, m input features are randomly selected from the M input features, and the best of them is chosen for the split. m does not change while the decision tree is being constructed.
Note: m features are randomly selected at each node, and the best one is then chosen for splitting. The split criterion in the decision trees is the Gini index.
(3) Each tree keeps splitting in this way until all training samples at a node belong to the same class; no pruning is needed. Because the two random sampling processes above ensure randomness, overfitting does not occur even without pruning.
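The two sources of randomness in the flow above, bootstrap sampling of records and random selection of m candidate features at a node, can be illustrated briefly. The function names are hypothetical; the samples never drawn into a tree's bootstrap set (the "remaining data") are the ones available for error evaluation.

```python
import random

def bootstrap_with_oob(n, rng):
    """Draw n indices with replacement; return them plus the indices never drawn."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag

def candidate_features(M, m, rng):
    """Pick m distinct feature indices out of M for one node split."""
    return rng.sample(range(M), m)

rng = random.Random(42)
in_bag, out_of_bag = bootstrap_with_oob(10, rng)
node_features = candidate_features(M=16, m=4, rng=rng)
print(in_bag, out_of_bag, node_features)
```

Because the draw is with replacement, some records appear several times in a tree's sample and others not at all, which is why each tree sees a different view of the data.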
Result determination:
(1) when the target variable is numeric, the mean of the outputs of the t decision trees is taken as the result;
(2) when the target variable is categorical, majority rule applies, and the class predicted by the most trees is taken as the classification result of the whole random forest.
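The two result rules can be written as a single aggregation function. This is a small sketch with hypothetical names; the string values used to select the target type are an assumption made for illustration.

```python
from collections import Counter
from statistics import mean

def aggregate(tree_outputs, target_type):
    """Combine the outputs of t decision trees into one forest-level result."""
    if target_type == "numeric":
        return mean(tree_outputs)                          # average of the t outputs
    if target_type == "categorical":
        return Counter(tree_outputs).most_common(1)[0][0]  # majority vote
    raise ValueError("target_type must be 'numeric' or 'categorical'")

print(aggregate([1.0, 2.0, 3.0], "numeric"))
print(aggregate([1, 0, 1, 1, 0], "categorical"))
```

With an even number of trees a vote can tie; `Counter.most_common` then returns the class that was encountered first, so an odd number of trees avoids ambiguity for a binary label.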
The random forest method is a random forest based on Bagging Boosting (training samples are extracted from a data set for multiple times to construct a plurality of weak learners, and Boosting is iterative construction of the strong learners during training), and the training process of the random forest model is specifically realized as follows:
Step 1: randomly select N samples from the training set as the Bootstrap sample set;
Step 2: randomly select m features from all the features as the optional feature set of the current decision tree;
Step 3: train a decision tree based on the Bootstrap sample set and the optional feature set;
Step 4: predict with the trained decision tree, and calculate the difference between the predicted result and the true value to use as the sample weight for the next round of training;
Step 5: perform Bootstrap sampling again according to the sample weights to obtain a new sample set, and train the next decision tree based on the new sample set and the optional feature set;
Step 6: repeat Step 4 and Step 5 until the specified number of decision trees has been trained;
Step 7: finally, integrate the prediction results of the decision trees by voting to obtain the final classification result.
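The seven training steps can be sketched as follows. This is a minimal illustration under several assumptions that the text does not fix: each "decision tree" is reduced to a depth-1 stump to keep the example short, the sample weight is 1.0 for a misclassified sample and 0.1 for a correctly classified one, and a fresh set of m features is drawn for every tree. The names `train_stump`, `train_forest`, and `predict` are hypothetical.

```python
import random
from collections import Counter

def train_stump(rows, labels, features):
    """Depth-1 tree (a stand-in for a full decision tree) with minimal training error."""
    best = None
    for j in features:
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            l_cls = Counter(left).most_common(1)[0][0]
            r_cls = Counter(right).most_common(1)[0][0] if right else l_cls
            errors = sum(y != l_cls for y in left) + sum(y != r_cls for y in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_cls, r_cls)
    _, j, t, l_cls, r_cls = best
    return lambda r: l_cls if r[j] <= t else r_cls

def train_forest(rows, labels, n_trees, m, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    weights = [1.0] * n                  # uniform weights for the first Bootstrap draw
    trees = []
    for _ in range(n_trees):             # Step 6: repeat until n_trees trees exist
        # Steps 1 and 5: (weighted) Bootstrap sampling of N = n samples.
        idx = rng.choices(range(n), weights=weights, k=n)
        # Step 2: random feature subset for the current tree.
        features = rng.sample(range(len(rows[0])), m)
        # Step 3: train a tree on the sample set and the feature subset.
        tree = train_stump([rows[i] for i in idx], [labels[i] for i in idx], features)
        trees.append(tree)
        # Step 4: prediction error becomes the sample weight for the next round
        # (assumed scheme: 1.0 if wrong, 0.1 if right, so no weight is zero).
        weights = [1.0 if tree(r) != y else 0.1 for r, y in zip(rows, labels)]
    return trees

def predict(trees, row):
    """Step 7: majority vote over all trees."""
    return Counter(t(row) for t in trees).most_common(1)[0][0]
```

For example, `trees = train_forest(rows, labels, n_trees=15, m=1)` followed by `predict(trees, row)` returns the voted class for one feature vector.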
The training method of the random forest model is different from the conventional random forest model in that:
1. Feature selection: a conventional random forest randomly selects m features (m = sqrt(p)) to construct the decision trees, whereas the random forest model constructed with the Bagging Boosting method randomly selects m = log2(p) features to construct the decision trees.
2. Weight adjustment: in a conventional random forest, each decision tree is trained on a Bootstrap sample set, whereas the random forest model constructed with the Bagging Boosting method adjusts the sample weights in each training round according to the errors on the current samples, so that the next round focuses more on the misclassified samples and the accuracy of the model improves.
3. Iteration count control: a conventional random forest is usually trained with a number of trees set in advance, whereas the random forest model constructed with the Bagging Boosting method is evaluated in each iteration on its prediction performance to determine whether the number of iterations needs to be increased.
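To make the difference between the two feature-count rules concrete, the small sketch below computes m for a given p. It assumes that p denotes the total number of available features and that m is rounded to the nearest integer with a minimum of 1; neither detail is stated in the text, and the function name is hypothetical.

```python
import math

def n_split_features(p, rule):
    """Number of randomly selected features m for p total features."""
    if rule == "sqrt":    # conventional random forest: m = sqrt(p)
        return max(1, round(math.sqrt(p)))
    if rule == "log2":    # Bagging Boosting variant described above: m = log2(p)
        return max(1, round(math.log2(p)))
    raise ValueError(f"unknown rule: {rule}")
```

With p = 64, the conventional rule gives 8 features per tree and the log2 rule gives 6, so the gap widens as p grows.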
Step (7): substitute the test set into the trained random forest model; if the prediction accuracy meets a set threshold T, store the screened feature set and the random forest model;
Step (8): if the prediction accuracy meets the set threshold, the following are obtained: the betting-involved person feature set f_1, the illegal worker feature set f_2, the foreign bride feature set f_3, the betting-involved person model m_1, the illegal worker model m_2, and the foreign bride model m_3.
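The acceptance check in Step (7) might look like the sketch below. It assumes that "meets the threshold" means accuracy greater than or equal to T and that the model library is a plain dictionary; `accuracy`, `store_if_accurate`, and the dictionary keys are hypothetical names.

```python
def accuracy(model, rows, labels):
    """Fraction of test samples that the model classifies correctly."""
    correct = sum(model(r) == y for r, y in zip(rows, labels))
    return correct / len(labels)

def store_if_accurate(library, name, model, feature_set, test_rows, test_labels, threshold):
    """Store the feature set and model only if test accuracy reaches the threshold T."""
    acc = accuracy(model, test_rows, test_labels)
    if acc >= threshold:
        library[name] = {"features": feature_set, "model": model, "accuracy": acc}
        return True
    return False
```

Calling this once per abnormal personnel type would fill the library with one feature set and one model for each type.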
The triggered calculation (calculation before arrival of the forecasted passenger P) comprises the following steps:
Step (1): calculate the features h_1, h_2, and h_3 of passenger P, where feature h_1 is the feature data related to betting-involved persons, feature h_2 is the feature data related to illegal workers, and feature h_3 is the feature data related to foreign brides;
Step (2): feed feature h_1 of passenger P into the corresponding model m_1 in the model library for prediction to obtain the predicted value of model m_1; from the predicted value it is known whether the passenger is a betting-involved person;
Step (3): feed feature h_2 of passenger P into the corresponding model m_2 in the model library for prediction to obtain the predicted value of model m_2; from the predicted value it is known whether the passenger is an illegal worker;
Step (4): feed feature h_3 of passenger P into the corresponding model m_3 in the model library for prediction to obtain the predicted value of model m_3; from the predicted value it is known whether the passenger is a foreign bride;
Step (5): from the above steps it is known whether passenger P is abnormal; if so, an early warning is issued in advance;
Step (6): if passenger P is predicted to be abnormal, on-site staff carry out focused screening and questioning of passenger P when passenger P passes through customs inspection.
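The triggered calculation can be sketched as a loop over the per-type models. The sketch assumes that each model is a callable returning 1 for "abnormal" and 0 otherwise, and that features and models are keyed by the same type name; `assess_passenger` and the returned dictionary layout are hypothetical.

```python
def assess_passenger(features, models):
    """Run each abnormal-type model on its own feature vector.

    features: dict mapping type name -> feature vector (h_1, h_2, h_3)
    models:   dict mapping type name -> callable returning 1 if abnormal
    """
    hits = [name for name, model in models.items() if model(features[name]) == 1]
    # The passenger is flagged when at least one model predicts "abnormal".
    return {"abnormal": bool(hits), "types": hits}
```

An early-warning component could then act on `result["abnormal"]` before the passenger arrives and pass `result["types"]` to on-site staff.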
As another embodiment, the application also provides a passenger anomaly identification device based on the random forest algorithm, which executes the above passenger anomaly identification method.
The method can be applied directly to the analysis and assessment of passengers passing through border inspection: a machine learning model is built from the features of abnormal persons and their abnormality labels, anomaly identification is performed on passengers passing through border inspection, and early warning of abnormal persons is achieved.
Furthermore, the methods of the present application are not limited to being performed in the time sequence described in the specification, but may be performed in other time sequences, in parallel or independently. Therefore, the order of execution of the methods described in the present specification does not limit the technical scope of the present application.
While the application has been disclosed in the context of specific embodiments, it should be understood that all embodiments and examples described above are illustrative rather than limiting. Various modifications, improvements, or equivalents of the application may occur to persons skilled in the art and are within the spirit and scope of the following claims. Such modifications, improvements, or equivalents are intended to be included within the scope of this application.

Claims (10)

1. A passenger anomaly identification method based on a random forest algorithm, characterized by comprising: a random forest model establishing step, a random forest model timed updating step, and a passenger abnormality judging step;
in the step of establishing the random forest model, classifying the data samples into different abnormal personnel types, and carrying out correlation analysis on characteristic data of the different abnormal personnel types to obtain correlation values of different characteristics and results; in the step of judging the abnormality of the passengers, the characteristics of the passengers to be detected are calculated in real time, and the characteristics are compared with the correlation values obtained in the step of establishing the random forest model to judge whether the passengers to be detected are abnormal or not.
2. The passenger anomaly identification method of claim 1, wherein:
the building step of the random forest model specifically comprises the following steps:
establishing a data sample, classifying the data sample according to the types of abnormal personnel, and acquiring characteristic data of the data sample to form characteristic data of different types of abnormal personnel;
performing correlation analysis on the feature data of different abnormal personnel types by adopting a mutual information method to obtain correlation values of different features and results;
screening out a feature set with a correlation value greater than 0 according to the correlation value, and splitting the feature set and the label set into a training set and a testing set;
substituting the test set into a trained random forest model, and if the prediction precision meets a set threshold, storing the screened feature set and the random forest model;
the step of updating the random forest model regularly comprises updating the feature set and storing the model regularly;
the passenger abnormality judging step comprises:
calculating the features h_1, h_2, and h_3 of passenger P, the features h_1, h_2, and h_3 of passenger P being passenger features based on the different abnormal personnel types;
comparing the features h_1, h_2, and h_3 of passenger P respectively with the models of the different abnormal personnel types in the random forest model;
judging whether the comparison result is abnormal;
if the passenger is abnormal, sending out an early warning signal when the passenger enters the warning area.
3. The passenger anomaly identification method of claim 2, wherein: the abnormal personnel types comprise betting-involved persons, illegal workers, and foreign brides; feature h_1 of passenger P is the feature data related to betting-involved persons, feature h_2 of passenger P is the feature data related to illegal workers, and feature h_3 of passenger P is the feature data related to foreign brides.
4. The passenger anomaly identification method of claim 3, wherein: the test set is substituted into the trained random forest model, and if the prediction accuracy meets the set threshold, the following are obtained: the betting-involved person feature set f_1, the illegal worker feature set f_2, the foreign bride feature set f_3, the betting-involved person model m_1, the illegal worker model m_2, and the foreign bride model m_3.
5. The passenger anomaly identification method of claim 4, wherein: comparing the features h_1, h_2, and h_3 of passenger P respectively with the models of the different abnormal personnel types in the random forest model specifically comprises:
feeding feature h_1 of passenger P into the corresponding model m_1 in the model library for prediction to obtain the predicted value of model m_1, and judging from the predicted value whether the passenger is a betting-involved person;
feeding feature h_2 of passenger P into the corresponding model m_2 in the model library for prediction to obtain the predicted value of model m_2, and judging from the predicted value whether the passenger is an illegal worker;
feeding feature h_3 of passenger P into the corresponding model m_3 in the model library for prediction to obtain the predicted value of model m_3, and judging from the predicted value whether the passenger is a foreign bride.
6. The passenger anomaly identification method of claim 5, wherein: the step of updating the random forest model at regular intervals specifically comprises traversing each abnormal personnel type, obtaining the passenger IDs corresponding to each abnormal personnel type, randomly obtaining the same number of passenger IDs of regular persons, and forming a passenger training set from the passenger IDs corresponding to each abnormal personnel type and the randomly obtained passenger IDs of regular persons; and calculating the features of the passenger training set; the passenger training set comprises the feature modules of passenger basic information, passenger travel information, passenger illegal records, and passenger presence information.
7. The passenger anomaly identification method of claim 6, wherein: the passenger basic information comprises nationality, age group, certificate type, gender, personnel category, visa stay period, and visa category; the passenger travel information comprises the number of outbound times, average inbound time interval, average outbound time interval, whether the outbound times are matched, average domestic residence time, average foreign residence time, arrival and departure places, and overseas tracks; the passenger illegal records comprise the number of violations, the number of illegal entries, and the number of illegal stays; the passenger presence information comprises presence areas, accommodations, and aggregation visas.
8. The passenger anomaly identification method of claim 6, wherein: performing correlation analysis on the feature data of the different abnormal personnel types by the mutual information method specifically comprises:
obtaining the correlation values between the different features and the result according to the formula for the mutual information I(X;Y), the correlation values lying in the range [0,1], wherein the mutual information I(X;Y) is calculated as:
I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}
where p(x) and p(y) are the marginal probability distribution functions of feature X and label Y, and p(x,y) is the joint probability distribution function of feature X and label Y.
9. The passenger anomaly identification method of claim 2, wherein: the test set is substituted into the trained random forest model, and the training process of the random forest model comprises the following processes:
process 1: randomly selecting N samples from the training set as the Bootstrap sample set;
process 2: randomly selecting m features from all the features to be used as a selectable feature set of a current decision tree;
process 3: training a decision tree based on the Bootstrap sample set and the optional feature set;
process 4: predicting according to the trained decision tree, and calculating the difference between the predicted result and the true value to be used as the sample weight of the next training;
process 5: carrying out Bootstrap sampling again according to the sample weight to obtain a new sample set, and training a next decision tree based on the new sample set and the optional feature set;
process 6: repeating the processes 4 and 5 until a specified number of decision trees are trained;
process 7: and finally integrating the prediction results of the decision trees in a voting mode to obtain a final classification result.
10. A passenger anomaly identification device based on a random forest algorithm, characterized by comprising: a first module for performing a step of establishing a random forest model, a second module for performing a step of updating the random forest model at regular intervals, and a third module for performing a step of judging abnormality of the passenger; the first module is used for classifying the data samples into different abnormal personnel types, and carrying out correlation analysis on the feature data of the different abnormal personnel types to obtain correlation values between the different features and the result; the third module is used for calculating the features of the passenger to be detected in real time, and comparing the features with the correlation values obtained in the step of establishing the random forest model to judge whether the passenger to be detected is abnormal.
CN202310800663.0A 2023-07-03 2023-07-03 Passenger anomaly identification method and device based on random forest algorithm Pending CN116910662A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310800663.0A CN116910662A (en) 2023-07-03 2023-07-03 Passenger anomaly identification method and device based on random forest algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310800663.0A CN116910662A (en) 2023-07-03 2023-07-03 Passenger anomaly identification method and device based on random forest algorithm

Publications (1)

Publication Number Publication Date
CN116910662A true CN116910662A (en) 2023-10-20

Family

ID=88355554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310800663.0A Pending CN116910662A (en) 2023-07-03 2023-07-03 Passenger anomaly identification method and device based on random forest algorithm

Country Status (1)

Country Link
CN (1) CN116910662A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117172729A (en) * 2023-11-03 2023-12-05 南通进宝机械制造有限公司 Labor affair subcontracting personnel management system based on big data

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117172729A (en) * 2023-11-03 2023-12-05 南通进宝机械制造有限公司 Labor affair subcontracting personnel management system based on big data
CN117172729B (en) * 2023-11-03 2024-04-05 南通进宝机械制造有限公司 Labor affair subcontracting personnel management system based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination