CN116910662A

CN116910662A - Passenger anomaly identification method and device based on random forest algorithm

Info

Publication number: CN116910662A
Application number: CN202310800663.0A
Authority: CN
Inventors: 王驰; 苗应亮; 胡长柏; 李胜南
Original assignee: Maxvision Technology Corp
Current assignee: Maxvision Technology Corp
Priority date: 2023-07-03
Filing date: 2023-07-03
Publication date: 2023-10-20

Abstract

The application discloses a passenger anomaly identification method and device based on a random forest algorithm. The method comprises a random forest model establishing step, a random forest model periodic updating step and a passenger abnormality judging step. In the model establishing step, data samples are classified into different abnormal person types, and correlation analysis is performed on the feature data of each type to obtain correlation values between the different features and the result. In the passenger abnormality judging step, the features of the passenger to be checked are calculated in real time and compared with the correlation values obtained in the model establishing step to judge whether the passenger is abnormal. With this scheme, passenger profiles are classified and predicted in advance from the forecast information before the flight arrives at the port; when an abnormal person is found, an early warning is given so that staff can be deployed in advance, which makes the method highly practical.

Description

Passenger anomaly identification method and device based on random forest algorithm

Technical Field

The application relates to the technical field of electronic information, and in particular to a method and device for identifying anomalous port passengers based on a Bagging Boosting random forest algorithm.

Background

At present, when border inspection staff screen the documents of passengers passing through a port and identify risks, two main methods are used: manual judgment, in which staff decide based on their own experience, and automatic judgment, in which a computer-assisted expert experience library is built.

Manual judgment has the following disadvantages: 1. staff judge risk subjectively, so no unified, standardized risk assessment can be achieved; 2. staff differ in risk-identification experience, so the identification rate of risky passengers cannot be guaranteed; 3. manual screening is inefficient, so clearance at border inspection cannot be kept consistently efficient.

In the automatic judging method, the port identifies risky passengers according to the rules of an expert experience library, but this approach has the following defects: (1) rules derived from expert experience differ to some degree from the real distribution of the data, causing missed or false detections; (2) some rules are generated from specific feature values of historical abnormal passengers without fully considering the overall feature distribution, so judging anomalies only by matching identical features risks missed detections; (3) when a rule contains several features irrelevant to the result, the efficiency of passenger anomaly checks suffers. In addition, once the number of historical abnormal passengers reaches a certain order of magnitude, manually searching the data set for rules becomes impractical, whereas training a machine learning model extracts objective rules far more efficiently. The accuracy of a machine learning model can also be calculated, so low-accuracy models can be screened out with a threshold, improving the accuracy of passenger anomaly identification. At present, no passenger anomaly identification method with high identification accuracy exists.

Disclosure of Invention

The following presents a simplified summary of embodiments of the application in order to provide a basic understanding of some aspects of the application. It should be understood that the following summary is not an exhaustive overview of the application. It is not intended to identify key or critical elements of the application or to delineate the scope of the application. Its purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.

According to one aspect of the application, a passenger anomaly identification method based on a random forest algorithm is provided, comprising a random forest model establishing step, a random forest model periodic updating step and a passenger abnormality judging step. In the model establishing step, the data samples are classified into different abnormal person types, and correlation analysis is performed on the feature data of each type to obtain correlation values between the different features and the result. In the passenger abnormality judging step, the features of the passenger to be checked are calculated in real time and compared with the correlation values obtained in the model establishing step to judge whether the passenger is abnormal.

Specifically, the step of establishing the random forest model includes:

establishing data samples, classifying them by abnormal person type, and acquiring their feature data to form feature data for each abnormal person type;

performing correlation analysis on the feature data of the different abnormal person types using the mutual information method to obtain correlation values between the different features and the output result. The output result is the output of the random forest model, namely the person's classification and whether the person is abnormal. When the random forest model is constructed, its input is the feature data of the different abnormal person types and its output is whether the person is abnormal;

screening out, according to the correlation values, the feature set $[X_0, X_1, \ldots, X_M]$ of features whose correlation value is greater than 0, and splitting this feature set and the label set $[Y_0, Y_1, \ldots, Y_M]$ into a training set and a test set. In the feature set $[X_0, X_1, \ldots, X_M]$, $X_i$ ($1 \le i \le M$) denotes the i-th input sample, containing n features. The label set $[Y_0, Y_1, \ldots, Y_M]$ has the same length as the feature set; $Y_i$ ($1 \le i \le M$) is the i-th output result, indicating whether the sample is abnormal, with value 0 (not abnormal) or 1 (abnormal);
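As a minimal sketch of this screening-and-splitting step, assuming the mutual information score of each feature has already been computed and that a 70/30 random split is acceptable (the source does not state the split ratio), the logic could look like the following. `screen_features` and `train_test_split` are hypothetical, hand-rolled helpers (not scikit-learn's function of the same name).

```python
import random
from typing import Dict, List, Sequence, Tuple


def screen_features(mi_scores: Dict[str, float]) -> List[str]:
    """Keep only the features whose mutual information with the label is > 0."""
    return [name for name, score in mi_scores.items() if score > 0]


def train_test_split(X: Sequence[Sequence[float]], y: Sequence[int],
                     test_ratio: float = 0.3, seed: int = 0
                     ) -> Tuple[list, list, list, list]:
    """Shuffle the sample indices and split X / y into training and test sets."""
    if len(X) != len(y):
        raise ValueError("feature set and label set must have the same length")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)          # reproducible shuffle
    n_test = int(len(idx) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test
```

For example, `screen_features({"visa_category": 0.12, "gender": 0.0})` keeps only `"visa_category"`; the feature names are illustrative.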

substituting the test set into the trained random forest model and, if the prediction precision meets the set threshold T, storing the screened feature set and the random forest model;

the random forest model periodic updating step comprises periodically updating the feature set and the random forest model;

the passenger abnormality judging step comprises:

calculating the features $h_1$, $h_2$, $h_3$ of passenger P;

comparing the features $h_1$, $h_2$, $h_3$ of passenger P with the models of the corresponding abnormal person types in the random forest model, respectively;

judging from the comparison results whether the passenger is abnormal;

if the passenger is abnormal, issuing an early-warning signal when the passenger enters the warning area.
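The judging step can be sketched as follows, assuming each per-type model is simply a callable that returns 0 or 1 for a feature vector. The type names, `judge_passenger` and `should_warn` are illustrative, not taken from the source.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical: a per-type model is any callable mapping a feature vector to 0/1.
Model = Callable[[Sequence[float]], int]


def judge_passenger(features: Dict[str, Sequence[float]],
                    models: Dict[str, Model]) -> List[str]:
    """Return the abnormal-person types whose model flags the passenger.

    `features` maps a type name to the passenger's features for that type
    (h1, h2, h3 in the text); `models` maps the same names to m1, m2, m3.
    """
    flagged = []
    for person_type, model in models.items():
        if model(features[person_type]) == 1:
            flagged.append(person_type)
    return flagged


def should_warn(flagged: List[str], in_warning_area: bool) -> bool:
    """Warn only for an abnormal passenger who has entered the warning area."""
    return bool(flagged) and in_warning_area
```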

Further, the random forest model into which the test set is substituted is trained as follows:

let the total number of training samples be N; each decision tree randomly draws N samples, with replacement, from the N training samples as its own training set;

let the number of input features of each training sample be M; when each node of each decision tree is split, m input features are randomly selected from the M input features and the best split is chosen among them, where m is much smaller than M and remains unchanged while the decision tree is built;

each tree keeps splitting until all training samples at a node belong to the same class, without pruning;

result judgment:

(1) if the target is numerical, the average of the outputs of the t decision trees is taken as the result;

(2) if the target is categorical, majority voting is used: the class predicted by the most trees is taken as the classification result of the whole random forest.
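The sampling rules and the two aggregation rules above can be sketched with the Python standard library. This is a hedged illustration of generic random forest mechanics, not the patent's implementation; `bootstrap_indices`, `candidate_features` and `aggregate` are hypothetical names.

```python
import random
from collections import Counter
from statistics import mean
from typing import List, Sequence


def bootstrap_indices(n: int, rng: random.Random) -> List[int]:
    """Draw n sample indices with replacement from n training samples."""
    return [rng.randrange(n) for _ in range(n)]


def candidate_features(M: int, m: int, rng: random.Random) -> List[int]:
    """At one node, pick m of the M input features without replacement."""
    return rng.sample(range(M), m)


def aggregate(tree_outputs: Sequence, numeric: bool):
    """Combine the outputs of t trees: mean for a numerical target,
    majority vote for a categorical target."""
    if numeric:
        return mean(tree_outputs)
    return Counter(tree_outputs).most_common(1)[0][0]
```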

The random forest method is a random forest based on Bagging Boosting (Bagging draws training samples from the data set multiple times to build several weak learners; Boosting builds a strong learner iteratively during training). The training process of the random forest model is implemented in the following steps:

1. randomly select N samples from the training set as the Bootstrap sample set;

2. randomly select m of all the features as the candidate feature set of the current decision tree;

3. train a decision tree on the Bootstrap sample set and the candidate feature set;

4. predict with the trained decision tree and use the difference between the predicted and true values as the sample weights for the next round of training;

5. perform Bootstrap sampling again according to the sample weights to obtain a new sample set, and train the next decision tree on the new sample set and the candidate feature set;

6. repeat steps 4 and 5 until the specified number of decision trees has been trained;

7. finally, combine the predictions of the decision trees by voting to obtain the final classification result.
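The seven steps can be illustrated with a compact, self-contained sketch. Two assumptions are made that the source does not specify: depth-1 decision stumps stand in for full decision trees to keep the example short, and the step-4 sample weight is taken as 1 + |y - ŷ| for binary labels, so a misclassified sample is twice as likely to be drawn in the next round. As in step 2, the feature subset is drawn once per tree.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# (feature index, threshold, label if value <= threshold, label if value > threshold)
Stump = Tuple[int, float, int, int]


def fit_stump(X: List[Sequence[float]], y: List[int], features: List[int]) -> Stump:
    """Depth-1 tree: pick the feature/threshold with the fewest training errors."""
    best, best_err = None, float("inf")
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= thr]
            right = [yi for row, yi in zip(X, y) if row[f] > thr]
            l_lab = Counter(left).most_common(1)[0][0] if left else 0
            r_lab = Counter(right).most_common(1)[0][0] if right else 0
            err = sum(yi != l_lab for yi in left) + sum(yi != r_lab for yi in right)
            if err < best_err:
                best, best_err = (f, thr, l_lab, r_lab), err
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    f, thr, l_lab, r_lab = stump
    return l_lab if row[f] <= thr else r_lab


def bagging_boosting(X, y, n_trees: int, m: int, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    weights = [1.0] * n                                   # step 1: plain bootstrap first
    trees = []
    for _ in range(n_trees):                              # step 6: repeat
        idx = rng.choices(range(n), weights=weights, k=n) # steps 1/5: (weighted) bootstrap
        feats = rng.sample(range(n_features), m)          # step 2: m candidate features
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], feats)  # step 3
        trees.append(stump)
        # Step 4 (assumed rule): weight = 1 + |y - y_hat|, so misclassified
        # samples are twice as likely to be drawn in the next round.
        weights = [1.0 + abs(y[i] - predict_stump(stump, X[i])) for i in range(n)]
    return trees


def predict_forest(trees: List[Stump], row: Sequence[float]) -> int:
    votes = [predict_stump(t, row) for t in trees]        # step 7: majority vote
    return Counter(votes).most_common(1)[0][0]
```

A production version would grow full, unpruned trees per the text; the stump only keeps the resampling loop readable.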

Further, the abnormal person types include persons involved in gambling or fraud, illegal workers, and foreign brides. The features $h_1$, $h_2$, $h_3$ of passenger P are passenger features specific to the different abnormal person types: $h_1$ is the feature data relevant to persons involved in gambling or fraud, $h_2$ the feature data relevant to illegal workers, and $h_3$ the feature data relevant to foreign brides, each generally a set of correlated features. The passenger features for the different abnormal person types include passenger basic information, passenger travel information, passenger violation records, passenger in-China information and other feature information.

Further, the test set is substituted into the trained random forest model, and if the prediction accuracy meets the set threshold, the feature set $f_1$ and model $m_1$ for persons involved in gambling or fraud, the feature set $f_2$ and model $m_2$ for illegal workers, and the feature set $f_3$ and model $m_3$ for foreign brides are obtained.

Further, comparing the features $h_1$, $h_2$, $h_3$ of passenger P with the models of the corresponding abnormal person types in the random forest model specifically comprises:

feeding feature $h_1$ of passenger P into the corresponding model $m_1$ in the model library to obtain the $m_1$ prediction, and judging from it whether the passenger is a person involved in gambling or fraud;

feeding feature $h_2$ of passenger P into the corresponding model $m_2$ in the model library to obtain the $m_2$ prediction, and judging from it whether the passenger is an illegal worker;

feeding feature $h_3$ of passenger P into the corresponding model $m_3$ in the model library to obtain the $m_3$ prediction, and judging from it whether the passenger is a foreign bride.

Further, the periodic updating step of the random forest model specifically comprises: traversing each abnormal person type, obtaining the passenger IDs of that type, randomly drawing the same number of passenger IDs of regular persons, and combining the two into a passenger training set; and calculating the features of the passenger training set, which comprise the feature modules of passenger basic information, passenger travel information, passenger violation records and passenger in-China information.

The passenger basic information comprises nationality (Chinese or not), age group (teenager, young, middle-aged, elderly), document type, gender, person category, permitted stay under the visa, and visa category. The passenger travel information comprises the number of entries and exits, the average interval between entries, the average interval between exits, whether the numbers of entries and exits match, the average length of stay in China, the average length of stay abroad, origin and destination (whether high-risk countries are included), and overseas itinerary (whether high-risk countries are included). The passenger violation records comprise the number of violations, the number of illegal entries and the number of illegal stays. The passenger in-China information comprises the area of residence in China (villages and towns, first-tier cities, key cities, etc.), accommodation (hotel, rental, own home, etc.) and aggregated visa (applied for or not).
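A minimal sketch of building the balanced per-type training sets, assuming passenger IDs are plain strings and that the pool of regular passengers already excludes every abnormal passenger; `build_training_sets` and the type keys are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def build_training_sets(abnormal_ids: Dict[str, List[str]],
                        regular_ids: List[str],
                        seed: int = 0) -> Dict[str, List[Tuple[str, int]]]:
    """For each abnormal-person type, pair its passenger IDs (label 1) with the
    same number of randomly drawn regular passenger IDs (label 0).

    rng.sample raises ValueError if a type has more IDs than the regular pool.
    """
    rng = random.Random(seed)
    training_sets = {}
    for person_type, ids in abnormal_ids.items():
        negatives = rng.sample(regular_ids, len(ids))   # without replacement
        training_sets[person_type] = ([(pid, 1) for pid in ids] +
                                      [(pid, 0) for pid in negatives])
    return training_sets
```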

Further, performing correlation analysis on the feature data of the different abnormal person types using the mutual information method specifically comprises:

obtaining the correlation values between the different features and the result from the mutual information $I(X;Y)$, whose values lie in [0, 1]. The mutual information $I(X;Y)$ is calculated as:

$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log_2\frac{p(x,y)}{p(x)\,p(y)}$$

where $p(x)$ and $p(y)$ are the marginal probability distributions of feature X and label Y, and $p(x,y)$ is their joint probability distribution.
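The formula can be estimated from data by replacing the probabilities with empirical frequencies. The sketch below is a plug-in estimator for discrete features using base-2 logarithms (an assumption; with a binary label this keeps the value within [0, 1]). Continuous features such as average length of stay would need to be discretized first.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y) in bits for two discrete variables,
    using empirical frequencies for p(x), p(y) and p(x, y)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (xv, yv), count in pxy.items():      # only pairs with p(x, y) > 0 contribute
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[xv] / n) * (py[yv] / n)))
    return mi
```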

According to another aspect of the application, a passenger anomaly identification device based on a random forest algorithm is provided, comprising a first module for performing the random forest model establishing step, a second module for performing the random forest model periodic updating step, and a third module for performing the passenger abnormality judging step. The first module classifies the data samples into different abnormal person types and performs correlation analysis on the feature data of each type to obtain correlation values between the different features and the result; the third module calculates the features of the passenger to be checked in real time and compares them with the correlation values obtained in the model establishing step to judge whether the passenger is abnormal.

Specifically, in the first module, the random forest model establishing step comprises:

performing correlation analysis on the feature data of the different abnormal person types using the mutual information method to obtain correlation values between the different features and the result;

screening out, according to the correlation values, the feature set $[X_0, X_1, \ldots, X_M]$ of features whose correlation value is greater than 0, and splitting this feature set and the label set $[Y_0, Y_1, \ldots, Y_M]$ into a training set and a test set. In the feature set $[X_0, X_1, \ldots, X_M]$, $X_i$ ($1 \le i \le M$) denotes the i-th input sample, containing n features. The label set $[Y_0, Y_1, \ldots, Y_M]$ has the same length as the feature set; $Y_i$ ($1 \le i \le M$) is the i-th output result, indicating whether the sample is abnormal, with value 0 (not abnormal) or 1 (abnormal);

in the second module, the random forest model periodic updating step comprises periodically updating the feature set and storing the model;

in the third module, the passenger abnormality judging step comprises:

calculating the features $h_1$, $h_2$, $h_3$ of passenger P;

judging whether the comparison result indicates an abnormality;

Compared with the prior art, the application performs feature correlation analysis on passenger profile data and labeled data and combines it with a supervised classification model (the random forest model), providing an objective method for classifying passenger profiles. Combining the data with business attributes yields a feature set relevant to the profiles of port passengers, which helps the accuracy of the classification model. Correlation analysis between the data of each label type and the feature set screens out irrelevant features, reducing the training computation of the machine learning model and improving computational efficiency. The accuracy of each model is recalculated periodically on the existing data, and a model is used to predict from the features of the forecast information only when its accuracy meets the threshold, so low-accuracy models are filtered out automatically. Before a flight arrives at the port, passenger profiles are classified and predicted in advance from the forecast information, and an early warning is given when an abnormal person is found so that staff can be deployed in advance, which makes the method highly practical.

Drawings

The application may be better understood by referring to the following description in conjunction with the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements throughout. The accompanying drawings, together with the following detailed description, are incorporated in and form part of this specification, and serve to further illustrate the preferred embodiments of the application and to explain its principles and advantages.

In the figure:

FIG. 1 is a schematic diagram of a passenger anomaly identification method of the present application;

FIG. 2 is a schematic diagram of a feature data set of different abnormal person types according to the present application.

Detailed Description

Embodiments of the present application will be described below with reference to the accompanying drawings. Elements and features described in one drawing or embodiment of the application may be combined with elements and features shown in one or more other drawings or embodiments. It should be noted that the illustration and description of components and processes known to those skilled in the art, which are not relevant to the present application, have been omitted in the drawings and description for the sake of clarity.

At present, port staff often judge passengers as abnormal subjectively based on their own experience, so missed and false detections are possible. The passenger anomaly identification method can be applied directly to the analysis and assessment of passenger data at border inspection checkpoints: a machine learning model is built from the features and abnormality labels of abnormal persons, anomaly identification is performed on passengers to be checked at the checkpoint, and abnormal persons are flagged in advance.

As a specific embodiment, referring to FIG. 1, the passenger anomaly identification method of the application comprises a scheduled calculation process and a triggered calculation process, the scheduled calculation process including the pre-establishment of the random forest model.

The scheduled calculation (run once at deployment and then, as data accumulate, at a fixed time every week to update the feature set and store the model) comprises the following steps:

step (1): traverse each abnormal person type i: persons involved in gambling or fraud, illegal workers, foreign brides, etc.;

step (2): obtain the passenger IDs of abnormal person type i and randomly draw the same number of passenger IDs of regular persons to form a passenger training set;

step (3): calculate the features of the passenger set, comprising the feature modules of passenger basic information, passenger travel information, passenger violation records and passenger in-China information:

(1) passenger basic information:

nationality (Chinese or not), age group (teenager, young, middle-aged, elderly), document type, gender, person category, permitted stay under the visa, and visa category;

(2) passenger travel information: number of entries and exits, average interval between entries, average interval between exits, whether the numbers of entries and exits match, average length of stay in China, average length of stay abroad, origin and destination (whether high-risk countries are included), and overseas itinerary (whether high-risk countries are included);

(3) passenger violation records: number of violations, number of illegal entries, number of illegal stays;

(4) passenger in-China information: area of residence in China (villages and towns, first-tier cities, key cities, etc.), accommodation (hotel, rental, own home, etc.) and aggregated visa (applied for or not);
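Several travel-information features are aggregates over a passenger's border-crossing history. As an illustration only, assuming each passenger's entry and exit dates are available as lists of `datetime.date` values (a hypothetical record format), two of the listed features could be derived like this:

```python
from datetime import date
from typing import List, Optional


def average_interval_days(dates: List[date]) -> Optional[float]:
    """Average number of days between consecutive crossings (None if fewer than 2)."""
    ds = sorted(dates)
    if len(ds) < 2:
        return None
    gaps = [(later - earlier).days for earlier, later in zip(ds, ds[1:])]
    return sum(gaps) / len(gaps)


def entry_exit_matched(entries: List[date], exits: List[date]) -> bool:
    """Hypothetical reading of 'whether the numbers of entries and exits match'."""
    return len(entries) == len(exits)
```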

step (4): perform correlation analysis on the features using the mutual information method;

Mutual information evaluates the correlation between a qualitative (categorical) independent variable and a qualitative dependent variable. The larger the mutual information, the stronger the correlation between the two variables; when the mutual information is 0, the two variables are independent. Mutual information is calculated as:

$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log_2\frac{p(x,y)}{p(x)\,p(y)}$$

where $p(x)$ and $p(y)$ are the marginal probability distributions of feature X and label Y, and $p(x,y)$ is their joint probability distribution. Intuitively, mutual information measures the information shared by two random variables; it can also be read as the reduction in the uncertainty of Y brought about by knowing X, and in this sense it is the same as the information gain.

Finally, the correlation values between the different features and the result are obtained from the mutual information formula, with values in [0, 1].
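The stated range [0, 1] does not hold for mutual information in general; it follows here because the label Y is binary and, assuming base-2 logarithms, mutual information cannot exceed the entropy of Y. A short derivation, with the upper bound reached when a feature determines a balanced label exactly:

```latex
I(X;Y) = H(Y) - H(Y \mid X) \le H(Y) \le \log_2 2 = 1
% Equality case: X = Y with p(0) = p(1) = 1/2, so p(x,y) = 1/2 on the diagonal
I(X;Y) = 2 \cdot \tfrac{1}{2}\log_2\frac{1/2}{(1/2)(1/2)} = \log_2 2 = 1
```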

Step (5): discard the features whose correlation is 0 and retain the feature set $[X_0, X_1, \ldots, X_M]$ of features whose correlation is greater than 0; split this feature set and the label set $[Y_0, Y_1, \ldots, Y_M]$ into a training set and a test set. In the feature set $[X_0, X_1, \ldots, X_M]$, $X_i$ ($1 \le i \le M$) denotes the i-th input sample, containing n features. The label set $[Y_0, Y_1, \ldots, Y_M]$ has the same length as the feature set; $Y_i$ ($1 \le i \le M$) is the i-th output result, indicating whether the sample is abnormal, with value 0 (not abnormal) or 1 (abnormal);

step (6): training a random forest model;

regarding the randomization:

(1) when each tree is trained, a subset of all training samples is drawn for training (bootstrap sampling), and the remaining (out-of-bag) samples are used to estimate the error;

(2) at each node, a random subset of all features is selected for computing the best split.
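The out-of-bag error estimate mentioned in (1) relies on the samples that a tree's bootstrap draw missed. A minimal sketch, with a hypothetical function name: for large n, roughly (1 - 1/n)^n ≈ 1/e ≈ 36.8% of the samples are out of bag for each tree.

```python
import random
from typing import List, Set, Tuple


def bootstrap_with_oob(n: int, rng: random.Random) -> Tuple[List[int], Set[int]]:
    """Bootstrap-sample n indices with replacement; also return the
    out-of-bag indices, i.e. the samples this tree never saw."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = set(range(n)) - set(in_bag)
    return in_bag, oob
```

Each tree's out-of-bag samples can then be scored by that tree alone to estimate the error without a separate validation set.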

The algorithm flow is as follows:

(1) With N training samples in total, each decision tree randomly draws N samples from the training set (bootstrap sampling, with replacement) as its training set.

(2) Let the number of input features of each training sample be M, with m much smaller than M. When each node of each decision tree is split, m input features are randomly selected from the M input features, and the best of them is chosen for the split. m does not change while the decision tree is built.

Note: m features are randomly selected at each node and the best one is then chosen for splitting. The splitting criterion used in the decision trees is the Gini index.

(3) Each tree keeps splitting in this way until all training samples at a node belong to the same class, without pruning. Because the two random sampling processes above already introduce randomness, overfitting does not readily occur even without pruning.
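A sketch of node splitting with the Gini index, following the flow above: m features are drawn at random and the (feature, threshold) pair with the lowest size-weighted child Gini index is kept. Taking thresholds from the observed feature values is one common choice and an assumption here; the recursive growth to pure leaves is omitted.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini index 1 - sum_k p_k^2 of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X: List[List[float]], y: List[int], m: int,
               rng: random.Random) -> Tuple[int, float, float]:
    """Pick m random features, then return (feature, threshold, score) for the
    split minimising the size-weighted Gini index of the two child nodes."""
    n, n_features = len(X), len(X[0])
    best = (-1, 0.0, float("inf"))
    for f in rng.sample(range(n_features), m):
        for thr in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= thr]
            right = [yi for row, yi in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, thr, score)
    return best
```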

Result judgment:

(1) if the target is numerical, the average of the outputs of the t decision trees is taken as the result.

The random forest method is a random forest based on Bagging Boosting (Bagging draws training samples from the data set multiple times to build several weak learners; Boosting builds a strong learner iteratively during training). The training process of the random forest model is implemented as follows:

1. randomly select N samples from the training set as the Bootstrap sample set;

2. randomly select m of all the features as the candidate feature set of the current decision tree;

3. train a decision tree on the Bootstrap sample set and the candidate feature set;

4. predict with the trained decision tree and use the difference between the predicted and true values as the sample weights for the next round of training;

5. perform Bootstrap sampling again according to the sample weights to obtain a new sample set, and train the next decision tree on the new sample set and the candidate feature set;

6. repeat steps 4 and 5 until the specified number of decision trees has been trained;

7. finally, combine the predictions of the decision trees by voting to obtain the final classification result.

The training method of this random forest model differs from that of a conventional random forest as follows:

1. Feature selection: a conventional random forest randomly selects m features ($m = \sqrt{p}$, where p is the total number of features) to build its decision trees, whereas the random forest model built with the Bagging Boosting method selects features randomly with $m = \log_2 p$.

2. Weight adjustment: in a conventional random forest each decision tree is trained on a Bootstrap sample, whereas the random forest model built with the Bagging Boosting method adjusts the sample weights in each training round according to the current errors, so that the next round focuses more on the misclassified samples, improving model accuracy.

3. Iteration control: a conventional random forest is usually trained with a preset number of trees, whereas the random forest model built with the Bagging Boosting method evaluates the model's predictive performance at each iteration to decide whether more iterations are needed.
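The first and third differences can be made concrete. For p = 64 features, $m = \sqrt{p}$ gives 8 candidate features per split while $m = \log_2 p$ gives 6. The source does not say how the per-iteration evaluation decides whether to keep adding trees, so the patience-based stopping rule below, applied to a validation score, is purely a hypothetical illustration.

```python
import math
from typing import List


def n_candidate_features(p: int, rule: str) -> int:
    """Number of features m tried per split for p input features."""
    if rule == "sqrt":          # conventional random forest: m = sqrt(p)
        m = math.sqrt(p)
    elif rule == "log2":        # Bagging Boosting variant: m = log2(p)
        m = math.log2(p)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return max(1, int(m))       # truncate, but always try at least one feature


def keep_adding_trees(val_scores: List[float], min_gain: float = 1e-3,
                      patience: int = 3) -> bool:
    """Hypothetical stopping rule: continue while the best validation score of the
    last `patience` iterations beats all earlier scores by at least min_gain."""
    if len(val_scores) <= patience:
        return True
    best_before = max(val_scores[:-patience])
    return max(val_scores[-patience:]) - best_before >= min_gain
```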

Step (7): substitute the test set into the trained random forest model and, if the prediction precision meets the set threshold T, store the screened feature set and the random forest model;

step (8): if the prediction accuracy meets the set threshold, the feature set $f_1$ and model $m_1$ for persons involved in gambling or fraud, the feature set $f_2$ and model $m_2$ for illegal workers, and the feature set $f_3$ and model $m_3$ for foreign brides are obtained;
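Steps (7) and (8) amount to an accuracy gate in front of a per-type model store. In this sketch, which is illustrative rather than the source's implementation, "prediction precision" is read as classification accuracy on the test set and "meets the threshold" as accuracy ≥ T; `ModelRegistry` and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of test samples whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need equal-length, non-empty label lists")
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


@dataclass
class ModelRegistry:
    """Stores, per abnormal-person type, the screened feature set f_k and model m_k."""
    threshold: float
    entries: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def maybe_store(self, person_type: str, features: List[str], model: Any,
                    y_test: Sequence[int], y_pred: Sequence[int]) -> bool:
        acc = accuracy(y_test, y_pred)
        if acc >= self.threshold:          # accuracy meets threshold T: keep the model
            self.entries[person_type] = {"features": features, "model": model,
                                         "accuracy": acc}
            return True
        return False                       # low-accuracy model is filtered out
```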

The triggered calculation (performed for a forecast passenger P before P arrives) comprises the following steps:

step (1): calculate the features $h_1$, $h_2$, $h_3$ of passenger P, where $h_1$ is the feature data relevant to persons involved in gambling or fraud, $h_2$ is the feature data relevant to illegal workers, and $h_3$ is the feature data relevant to foreign brides;

step (2): feed feature $h_1$ of passenger P into the corresponding model $m_1$ in the model library to obtain the $m_1$ prediction, which indicates whether the passenger is a person involved in gambling or fraud;

step (3): feed feature $h_2$ of passenger P into the corresponding model $m_2$ in the model library to obtain the $m_2$ prediction, which indicates whether the passenger is an illegal worker;

step (4): feed feature $h_3$ of passenger P into the corresponding model $m_3$ in the model library to obtain the $m_3$ prediction, which indicates whether the passenger is a foreign bride;

step (5): the above steps show whether passenger P is abnormal; if so, an early warning is issued in advance;

step (6): if passenger P is predicted to be abnormal, on-site staff carry out focused screening and questioning of passenger P when P passes through border inspection.

As another embodiment, the application further provides a passenger anomaly identification device based on the random forest algorithm, which performs the above passenger anomaly identification method.

The method can be applied directly to the analysis and assessment of passengers passing through border inspection: a machine learning model is built from the features and abnormality labels of abnormal persons, anomaly identification is performed on passengers passing through border inspection, and abnormal persons are flagged in advance.

Furthermore, the methods of the present application are not limited to being performed in the time sequence described in the specification, but may be performed in other time sequences, in parallel or independently. Therefore, the order of execution of the methods described in the present specification does not limit the technical scope of the present application.

While the application has been disclosed in the context of specific embodiments, it should be understood that all embodiments and examples described above are illustrative rather than limiting. Various modifications, improvements, or equivalents of the application may occur to persons skilled in the art and are within the spirit and scope of the following claims. Such modifications, improvements, or equivalents are intended to be included within the scope of this application.

Claims

1. A passenger anomaly identification method based on a random forest algorithm, characterized by comprising the following steps: a random forest model establishing step, a random forest model periodic updating step, and a passenger abnormality judging step;

in the random forest model establishing step, data samples are classified into different abnormal personnel types, and correlation analysis is performed on the feature data of the different abnormal personnel types to obtain correlation values between the different features and the result; in the passenger abnormality judging step, the features of a passenger to be checked are calculated in real time and compared with the correlation values obtained in the random forest model establishing step to judge whether the passenger is abnormal.

2. The passenger anomaly identification method of claim 1, wherein:

the random forest model establishing step specifically comprises:

selecting, according to the correlation values, the features whose correlation value is greater than 0 as the feature set, and splitting the feature set and the label set into a training set and a test set;

inputting the test set into the trained random forest model and, if the prediction accuracy meets a set threshold, saving the selected feature set and the random forest model;

the random forest model periodic updating step comprises periodically updating the feature set and saving the model;

the passenger abnormality judging step comprises:

calculating features h₁, h₂, h₃ of passenger P, the features h₁, h₂, h₃ being passenger features corresponding to the different abnormal personnel types;

comparing the features h₁, h₂, h₃ of passenger P respectively with the models of the different abnormal personnel types in the random forest model;

judging from the comparison results whether passenger P is abnormal.
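The feature-selection and splitting sub-step of claim 2 can be sketched as below. This is a minimal illustration, assuming discrete (categorical or pre-binned) features stored as dicts and using the plug-in mutual information estimate as the "correlation value", since claim 8 names mutual information as the correlation method; the specific steps of claim 8 are not reproduced in this text, so this is the textbook estimator, not necessarily the applicant's. The function names, `test_ratio` and `seed` are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y) in nats for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed (x, y) of p(x,y) * log(p(x,y) / (p(x) * p(y))).
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def select_features(rows: List[Dict[str, Hashable]], labels: Sequence[int],
                    eps: float = 1e-12) -> List[str]:
    """Keep features whose MI with the label is greater than 0 (eps absorbs float round-off)."""
    return [name for name in rows[0]
            if mutual_information([r[name] for r in rows], labels) > eps]


def split(rows, labels, features, test_ratio=0.25, seed=0):
    """Shuffle, then split the selected feature vectors and labels into train and test sets."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_ratio)
    test, train = idx[:n_test], idx[n_test:]

    def vec(i):
        return [rows[i][f] for f in features]

    return ([vec(i) for i in train], [labels[i] for i in train],
            [vec(i) for i in test], [labels[i] for i in test])
```

A feature that is constant, or statistically independent of the label in the sample, scores exactly 0 and is dropped; continuous features would need binning (or a nearest-neighbour MI estimator) before this step.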

3. The passenger anomaly identification method of claim 2, wherein: the abnormal personnel types comprise gambling-involved persons, illegal workers and foreign brides; feature h₁ of passenger P is the feature data relating to gambling-involved persons, feature h₂ of passenger P is the feature data relating to illegal workers, and feature h₃ of passenger P is the feature data relating to foreign brides.

4. The passenger anomaly identification method of claim 3, wherein: the test set is input into the trained random forest model, and if the prediction accuracy meets the set threshold, a gambling-involved person feature set f₁, an illegal worker feature set f₂ and a foreign bride feature set f₃ are obtained, together with a gambling-involved person model m₁, an illegal worker model m₂ and a foreign bride model m₃.
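The accuracy gate of claim 4 can be sketched as follows: a pair (f_i, m_i) enters the model library only if the model's test accuracy meets the threshold, and the library is then persisted. This is a hypothetical sketch: "prediction accuracy" is read as plain classification accuracy, the 0.9 threshold is an assumed value, `ThresholdModel` is a toy stand-in for a trained random forest, and `pickle` is just one possible storage format.

```python
import pickle
from typing import Dict, List, Sequence


class ThresholdModel:
    """Hypothetical stand-in for a trained classifier: predicts 1 when row[index] > cut."""

    def __init__(self, index: int, cut: float):
        self.index, self.cut = index, cut

    def __call__(self, row: Sequence[float]) -> int:
        return 1 if row[self.index] > self.cut else 0


def accuracy(model, X: List[Sequence[float]], y: List[int]) -> float:
    """Fraction of test rows whose prediction equals the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def accept_model(library: Dict, abnormal_type: str, feature_set: List[str], model,
                 X_test, y_test, threshold: float = 0.9) -> float:
    """Store (f_i, m_i) under abnormal_type only if test accuracy meets the threshold."""
    acc = accuracy(model, X_test, y_test)
    if acc >= threshold:
        library[abnormal_type] = (feature_set, model)
    return acc


def save_library(library: Dict, path: str) -> None:
    """Persist the accepted feature sets and models (models must be picklable)."""
    with open(path, "wb") as fh:
        pickle.dump(library, fh)
```

A model that fails the gate leaves any previously saved model for that type in place, which matters when the same gate is reused by the periodic update step.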

5. The passenger anomaly identification method of claim 4, wherein comparing features h₁, h₂, h₃ of passenger P respectively with the models of the different abnormal personnel types in the random forest model specifically comprises:

substituting feature h₂ of passenger P into the corresponding model m₂ in the model library to obtain the predicted value of model m₂, and judging from the predicted value whether the passenger is an illegal worker;

substituting feature h₃ of passenger P into the corresponding model m₃ in the model library to obtain the predicted value of model m₃, and judging from the predicted value whether the passenger is a foreign bride.

6. The passenger anomaly identification method of claim 5, wherein the step of periodically updating the random forest model specifically comprises: traversing each abnormal personnel type, obtaining the passenger IDs corresponding to that abnormal personnel type, randomly obtaining the same number of regular passenger IDs, and combining the passenger IDs of that abnormal personnel type with the randomly obtained regular passenger IDs to form a passenger training set; and calculating the features of the passenger training set, the features comprising the feature modules passenger basic information, passenger travel information, passenger violation records and passenger in-country information.
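The balanced training-set construction of claim 6 can be sketched as below: each abnormal type's passenger IDs (label 1) are paired with an equal-sized random sample of regular passenger IDs (label 0). The function name, the `(passenger_id, label)` row format and the seed are hypothetical, and the regular-ID pool is assumed to already exclude known abnormal passengers.

```python
import random
from typing import Dict, List, Sequence, Tuple


def build_training_sets(
    abnormal_ids: Dict[str, Sequence[str]],
    regular_ids: Sequence[str],
    seed: int = 0,
) -> Dict[str, List[Tuple[str, int]]]:
    """Per abnormal type: its passenger IDs (label 1) plus as many random regular IDs (label 0)."""
    rng = random.Random(seed)
    pool = list(regular_ids)
    training_sets = {}
    for abnormal_type, ids in abnormal_ids.items():
        # Sampling without replacement; raises ValueError if the pool is smaller than len(ids).
        sampled = rng.sample(pool, len(ids))
        rows = [(pid, 1) for pid in ids] + [(pid, 0) for pid in sampled]
        rng.shuffle(rows)
        training_sets[abnormal_type] = rows
    return training_sets
```

Drawing the same number of regular passengers as abnormal ones gives each per-type model a 50/50 class balance, which keeps a rare abnormal class from being swamped by the much larger population of regular travellers.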

7. The passenger anomaly identification method of claim 6, wherein: the passenger basic information comprises nationality, age group, document type, gender, personnel category, visa permitted stay period and visa category; the passenger travel information comprises the number of exits, average interval between entries, average interval between exits, whether entries and exits match, average domestic stay duration, average overseas stay duration, places of arrival and departure, and overseas tracks; the passenger violation records comprise the number of violations, the number of illegal entries and the number of illegal stays; the passenger in-country information comprises areas of stay, accommodation and aggregation visas.
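Several of the travel features in claim 7 can be derived from a passenger's border-crossing history. The sketch below assumes a hypothetical record format of `("in" | "out", date)` tuples and computes the number of exits, the average interval between entries and between exits, and the average domestic and overseas stay in days; the feature keys and the 0.0 default for missing values are illustrative choices, not the applicant's definitions.

```python
from datetime import date
from statistics import mean
from typing import Dict, List, Tuple

Crossing = Tuple[str, date]  # hypothetical record: ("in" | "out", crossing date)


def _avg_gap_days(days: List[date]) -> float:
    """Mean number of days between consecutive dates (0.0 if fewer than two)."""
    gaps = [(b - a).days for a, b in zip(days, days[1:])]
    return mean(gaps) if gaps else 0.0


def travel_features(crossings: List[Crossing]) -> Dict[str, float]:
    crossings = sorted(crossings, key=lambda c: c[1])
    ins = [d for direction, d in crossings if direction == "in"]
    outs = [d for direction, d in crossings if direction == "out"]
    domestic, foreign = [], []
    for (d1, t1), (d2, t2) in zip(crossings, crossings[1:]):
        if d1 == "in" and d2 == "out":    # entered, then left: time spent in the country
            domestic.append((t2 - t1).days)
        elif d1 == "out" and d2 == "in":  # left, then re-entered: time spent abroad
            foreign.append((t2 - t1).days)
    return {
        "outbound_count": len(outs),
        "avg_inbound_interval": _avg_gap_days(ins),
        "avg_outbound_interval": _avg_gap_days(outs),
        "avg_domestic_stay": mean(domestic) if domestic else 0.0,
        "avg_foreign_stay": mean(foreign) if foreign else 0.0,
    }
```

For example, exits on 1 Jan and 10 Feb with entries on 11 Jan and 15 Feb give two exits, a 40-day exit interval, a 35-day entry interval, one 30-day domestic stay and an average overseas stay of 7.5 days.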

8. The passenger anomaly identification method of claim 6, wherein the correlation analysis on the feature data of the different abnormal personnel types uses a mutual information method, which specifically comprises the following steps:

9. The passenger anomaly identification method of claim 2, wherein, for the step of inputting the test set into the trained random forest model, the random forest model is trained as follows: process 1: randomly selecting N samples from the training set as the Bootstrap sample set;

process 2: randomly selecting m features from all the features as the candidate feature set of the current decision tree;

process 3: training a decision tree on the Bootstrap sample set and the candidate feature set;

process 4: making predictions with the trained decision tree and using the difference between each prediction and the true value as that sample's weight for the next round of training;

process 5: performing Bootstrap sampling again according to the sample weights to obtain a new sample set, and training the next decision tree on the new sample set and the candidate feature set;

process 6: repeating processes 4 and 5 until the specified number of decision trees has been trained;

process 7: combining the predictions of all decision trees by voting to obtain the final classification result.
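Processes 1 to 7 combine bagging (bootstrap samples, random feature subsets) with a boosting-style reweighting of samples. The sketch below is a minimal pure-Python illustration under stated assumptions, not the applicant's implementation: depth-1 decision trees (stumps) stand in for full decision trees to keep it short; the m features are redrawn for every tree, which the claim leaves open; and each sample's next-round weight is the hypothetical choice 1 + |prediction − label|, so correctly classified samples remain drawable instead of receiving weight 0.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Row = Sequence[float]
Tree = Callable[[Row], int]


def train_stump(X: List[Row], y: List[int], features: Sequence[int]) -> Tree:
    """Depth-1 decision tree: the (feature, threshold) split with the fewest errors."""
    overall = Counter(y).most_common(1)[0][0]
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            # Each side predicts its majority label; an empty side falls back to the overall majority.
            l_lab = Counter(left).most_common(1)[0][0] if left else overall
            r_lab = Counter(right).most_common(1)[0][0] if right else overall
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def train_ensemble(X: List[Row], y: List[int], n_trees: int = 15,
                   m_features: int = 2, seed: int = 0) -> List[Tree]:
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    weights = [1.0] * n  # process 1: the first bootstrap draw is uniform
    trees = []
    for _ in range(n_trees):  # process 6: repeat until n_trees trees exist
        # Processes 1 and 5: draw N samples with replacement, proportional to the weights.
        idx = rng.choices(range(n), weights=weights, k=n)
        # Process 2: random subset of m candidate features for this tree.
        feats = rng.sample(range(d), m_features)
        # Process 3: fit the tree on the bootstrap sample.
        tree = train_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        trees.append(tree)
        # Process 4: per-sample error becomes the next round's weight (assumed 1 + |error|).
        weights = [1.0 + abs(tree(X[i]) - y[i]) for i in range(n)]
    return trees


def predict(trees: List[Tree], row: Row) -> int:
    """Process 7: majority vote over all trees."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]
```

With binary labels, an odd number of trees avoids tied votes. Swapping `train_stump` for a deeper tree learner leaves the sampling, reweighting and voting logic unchanged.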

10. A passenger anomaly recognition device based on a random forest algorithm, characterized by comprising: a first module for performing the random forest model establishing step, a second module for performing the random forest model periodic updating step, and a third module for performing the passenger abnormality judging step; the first module is used to classify data samples into different abnormal personnel types and to perform correlation analysis on the feature data of the different abnormal personnel types to obtain correlation values between the different features and the result; the third module is used to calculate the features of a passenger to be checked in real time and to compare them with the correlation values obtained in the random forest model establishing step to judge whether the passenger is abnormal.