CN110929633A - Method for realizing abnormity detection of smoke-involved vehicle based on small data set

Publication number
CN110929633A
Authority
CN
China
Prior art keywords
data set
matrix
label
features
small data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911135269.XA
Other languages
Chinese (zh)
Inventor
王贞
陶春和
王卫
甘小莺
尤梓荃
吴寒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Third Research Institute of the Ministry of Public Security
Original Assignee
Third Research Institute of the Ministry of Public Security
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-11-19
Filing date
2019-11-19
Publication date
2020-03-27
Application filed by Third Research Institute of the Ministry of Public Security
Priority to CN201911135269.XA
Publication of CN110929633A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to a method for detecting anomalous tobacco-related ("smoke-involved") vehicles based on a small data set. The method comprises: collecting data and constructing a data set; preprocessing the original data set and extracting features to construct a feature matrix; performing feature selection on the resulting feature matrix; and performing identification with a classifier to obtain a detection result. With this method, suspect targets can be detected quickly and accurately by the proposed algorithm. The method achieves relatively high accuracy even when the data volume is small and labels are scarce, and it avoids manual screening, which reduces the manpower and material resources required, relieves traffic pressure, and makes the fullest use of the available information.

Description

Method for realizing abnormity detection of smoke-involved vehicle based on small data set
Technical Field
The invention relates to the field of tobacco, in particular to anomaly detection, and specifically to a method for detecting anomalous tobacco-related ("smoke-involved") vehicles, that is, vehicles suspected of illegally transporting cigarettes, based on a small data set.
Background
The tobacco industry is a special industry that is closely tied to government revenue and government regulation and that affects the health of consumers. Counterfeit, smuggled, and otherwise illegal cigarettes are produced by illegal means or enter the market through illegal channels in pursuit of illegal windfall profits. They seriously harm the physical and mental health of consumers, cause heavy losses of national tax revenue, disrupt market order, and seriously damage national interests. In 2016, 3,884 cases each valued at more than 50,000 yuan were investigated nationwide, and 291,500 units of illegal cigarettes were seized; in 2018, 9,100 such cases were investigated and 553,000 units were seized. The crime situation is becoming increasingly serious, so cracking down on illegal cigarettes is urgent.
As law enforcement agencies keep intensifying their crackdown on the transport of illegal cigarettes, criminals are gradually changing how they transport them. The traditional means was mainly logistics consignment, but recent inspection results in the logistics consignment channel show that the proportion of illegal cigarettes moved this way is gradually declining. Dedicated-vehicle transport and convoy (group) transport are gradually becoming the main means of illegal tobacco transport.
Two methods currently exist for inspecting dedicated-vehicle and convoy transport. In the first, law enforcement officers, drawing on case-handling experience, carry out manual checks at expressway toll stations on major, heavily trafficked routes and judge whether a passing vehicle is an illegal tobacco transport vehicle from characteristics such as the vehicle type and whether it is carrying cargo. In the second, a large amount of information on normal vehicles and tobacco-related vehicles is collected, and the two are distinguished using big data and supervised learning.
The first method depends on the professional competence of individual inspectors and is time-consuming and labor-intensive; it also tends to disrupt normal vehicle passage and cause traffic congestion. The second method requires a large amount of vehicle information and label data, which is costly and impractical. In practice, one usually faces small data sets: small in volume, incomplete, and sparsely labeled. New ideas and technical means are therefore urgently needed for detecting tobacco-related vehicles, so that illegal tobacco transport vehicles can be identified quickly among the many ordinary vehicles using only a small amount of data and labels, providing proactive leads for investigating tobacco-related cases and improving the efficiency of intelligence analysis.
Disclosure of Invention
The object of the invention is to overcome the shortcomings of the prior art and to provide a method for detecting anomalous tobacco-related vehicles based on a small data set that has small error, high efficiency, and low time consumption.
To achieve this object, the method of the invention for detecting anomalous tobacco-related vehicles based on a small data set comprises the following steps:
(1) collecting data and constructing a data set;
(2) preprocessing the original data set and extracting features to construct a feature matrix;
(3) performing feature selection on the obtained feature matrix;
(4) performing identification with a classifier to obtain a detection result.
Preferably, step (2) specifically comprises the following steps:
(2.1) completing or deleting missing values in the original data set, and merging duplicate IDs;
(2.2) selecting relevant features and performing feature extraction.
Preferably, step (3) specifically comprises the following steps:
(3.1) constructing a random forest model from a small number of positive samples and negative samples;
(3.2) ranking the importance of the features with the random forest model, and selecting the 9 top-ranked features as the features for the classifier according to the ranking result.
Preferably, step (4) specifically comprises the following steps:
(4.1) calculating an initial probability transition matrix T and a label matrix Y;
(4.2) multiplying the probability transition matrix T by the label matrix Y to obtain a new label matrix;
(4.3) normalizing each row of the label matrix Y, and restoring the label information of the samples that were originally labeled;
(4.4) judging whether the label matrix Y has converged; if so, outputting the label matrix and screening out suspicious targets according to it; otherwise, returning to step (4.2).
With the method of the invention for detecting anomalous tobacco-related vehicles based on a small data set, suspect targets can be detected quickly and accurately by the proposed algorithm. The method achieves relatively high accuracy even when the data volume is small and labels are scarce, and it avoids manual screening, which reduces the manpower and material resources required, relieves traffic pressure, and makes the fullest use of the available information.
Drawings
FIG. 1 is a flow chart of the method of the invention for detecting anomalous tobacco-related vehicles based on a small data set.
FIG. 2 is a flow chart of the label propagation calculation in the method of the invention for detecting anomalous tobacco-related vehicles based on a small data set.
Detailed Description
In order to more clearly describe the technical contents of the present invention, the following further description is given in conjunction with specific embodiments.
The method of the invention for detecting anomalous tobacco-related vehicles based on a small data set comprises the following steps:
(1) collecting data and constructing a data set;
(2) preprocessing the original data set and extracting features to construct a feature matrix;
(2.1) completing or deleting missing values in the original data set, and merging duplicate IDs;
(2.2) selecting relevant features and performing feature extraction;
(3) performing feature selection on the obtained feature matrix;
(3.1) constructing a random forest model from a small number of positive samples and negative samples;
(3.2) ranking the importance of the features with the random forest model, and selecting the 9 top-ranked features as the features for the classifier according to the ranking result;
(4) performing identification with a classifier to obtain a detection result;
(4.1) calculating an initial probability transition matrix T and a label matrix Y;
(4.2) multiplying the probability transition matrix T by the label matrix Y to obtain a new label matrix;
(4.3) normalizing each row of the label matrix Y, and restoring the label information of the samples that were originally labeled;
(4.4) judging whether the label matrix Y has converged; if so, outputting the label matrix and screening out suspicious targets according to it; otherwise, returning to step (4.2).
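Step (2.1) does not specify how missing values are completed or how duplicate IDs are merged. The Python sketch below shows one possible reading under assumptions that are not in the patent: records without an ID are deleted, missing numeric values are filled with the column median, and records sharing an ID are grouped together. The field names `vehicle_id` and `weight` and the function `preprocess` are hypothetical.

```python
from statistics import median


def preprocess(records, id_field="vehicle_id", numeric_fields=("weight",)):
    """Delete rows without an ID, fill numeric gaps, and merge duplicate IDs.

    `records` is a list of dicts, one per toll-station passage (assumed layout).
    Returns a dict mapping each vehicle ID to the list of its passages.
    """
    # Delete records whose ID is missing; copy rows so the input is untouched.
    rows = [dict(r) for r in records if r.get(id_field) is not None]

    # Complete missing numeric values with the median of the observed values.
    for field in numeric_fields:
        observed = [r[field] for r in rows if r.get(field) is not None]
        fill = median(observed) if observed else None
        for r in rows:
            if r.get(field) is None:
                r[field] = fill

    # Merge duplicate IDs: all passages of one vehicle end up under one key.
    merged = {}
    for r in rows:
        merged.setdefault(r[id_field], []).append(r)
    return merged
```

Median filling and grouping by ID are illustrative choices; any other completion rule, or dropping incomplete rows entirely, would fit the same step.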
The main problems addressed by the invention are the time and labor consumed by existing detection of tobacco-related vehicles and the low accuracy caused by small data volume and scarce labels. The invention provides a method for detecting anomalous tobacco-related vehicles based on a small data set, so that illegal tobacco transport vehicles can be identified quickly and accurately, giving scientific decision support to inspection operations and improving efficiency. The detection method proceeds as follows: first, a preliminary feature matrix is constructed from the original data set by feature extraction; second, the features are screened by feature selection; finally, a classifier uses the small number of labeled vehicles to predict the labels of the unlabeled vehicles.
FIG. 1 is a flow chart of a preferred embodiment of the invention. First, in step S1, data is collected to construct a data set. In this embodiment, the data set is built from vehicle information collected by two toll stations in a certain area; vehicles may enter and leave the area through these two toll stations. The data set includes the vehicle model, weight, number of axles, vehicle color, and other information recorded by the toll stations.
In step S2, the raw data set is preprocessed and features are extracted. The raw data set contains many missing values and duplicate IDs, so missing values are first completed or deleted and duplicate IDs are merged. Relevant features are then selected according to the experience and prior knowledge of inspectors, and feature extraction is carried out. In this example, an abnormal vehicle entering the area fully loaded with contraband is expected to travel at night, when inspections are more relaxed, in order to avoid inspection, whereas on leaving the area it carries no contraband and its choice of travel time resembles that of a normal vehicle. The frequency with which vehicle i enters through the toll station in each hour of the day during the observation period is therefore extracted to form a 24-dimensional feature vector [f_i1, f_i2, …, f_i24]. In addition, an abnormal vehicle enters the area fully loaded with contraband and leaves it empty, so its drive-in single-axle weight can be in the interval 800-. Accordingly, the change between the drive-in and drive-out axle weights of an abnormal vehicle differs from that of a normal vehicle.
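The two feature groups described in step S2 can be sketched in a few lines of Python. This is an illustrative sketch rather than the patented implementation: the function names are hypothetical, the hourly feature is taken as a raw count of entries per hour of day, and the axle-weight feature is assumed to be the per-axle difference between drive-in and drive-out weight.

```python
from datetime import datetime


def hourly_entry_frequency(entry_times):
    """Return the 24-dimensional vector [f_i1, ..., f_i24] for one vehicle.

    Element h counts how often the vehicle entered during hour h of the day
    (0 = 00:00-00:59) over the whole observation period.
    """
    freq = [0] * 24
    for t in entry_times:
        freq[t.hour] += 1
    return freq


def axle_weight_change(entry_weight, exit_weight, axle_count):
    """Per-axle weight difference between entering and leaving the area."""
    return (entry_weight - exit_weight) / axle_count


# Example: three night-time entries by one (hypothetical) vehicle.
entries = [
    datetime(2019, 1, 1, 2, 15),
    datetime(2019, 1, 2, 2, 40),
    datetime(2019, 1, 3, 23, 5),
]
features = hourly_entry_frequency(entries) + [axle_weight_change(4000.0, 1600.0, 2)]
```

Concatenating the 24 hourly values with the axle-weight change gives one row of the feature matrix; dividing the counts by the number of observed days would turn them into relative frequencies if that reading is preferred.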
In step S3, feature selection is performed on the feature matrix obtained in step S2. Feature selection effectively reduces the feature dimensionality and the computational complexity. In this example, feature importance is ranked with a random forest algorithm, and the 9 top-ranked features are selected as the final input to the classifier. Specifically, a random forest is built from a small number of positive and negative samples. Because a random forest splits branches on feature values, the contribution of each feature to the branch splits can be computed once the forest has been built; this contribution is taken as the feature importance, from which the importance ranking is obtained.
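The ranking-and-selection part of step S3 can be illustrated as follows. The sketch assumes that per-feature importance scores have already been obtained from a trained random forest (for instance, the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier` would be one possible source); the function name, feature names, and scores below are hypothetical.

```python
def select_top_features(importances, k=9):
    """Return the names of the k features with the highest importance.

    `importances` maps a feature name to its importance score, as computed
    by a random forest trained on the labeled positive and negative samples.
    """
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Made-up scores for the 24 hourly features plus one axle-weight feature.
scores = {f"hour_{h}": h / 100 for h in range(24)}
scores["axle_weight_change"] = 0.5
selected = select_top_features(scores, k=9)
```

Only the columns named in `selected` would then be passed on to the classifier in step S4.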
In step S4, the detection result is obtained by the classifier. Specifically, a label propagation algorithm uses the labeled samples to predict the unlabeled samples: a complete graph is constructed with the vehicles as nodes, so that every pair of nodes is connected by an edge.
The framework of the label propagation algorithm is shown in FIG. 2 and can be divided into the following steps:
In step S41, the Euclidean distance d_{ii′} between the features of every pair of points is first calculated. The weight of each edge is then defined as shown in equation (1):
[Equation (1): image BDA0002279426760000041, not reproduced in the text]
A probability transition matrix T is defined, where T_{ii′} represents the probability that label information propagates from node x_{i′} to node x_i:
[Equation: image BDA0002279426760000042, not reproduced in the text]
A label matrix Y is defined at the same time, each element of which
[Equation: image BDA0002279426760000043, not reproduced in the text]
represents node x_i being labeled as class c_i ∈ {1, 2, …, C}.
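The equation images referenced in step S41 are not available in this text. For orientation only, the widely used formulation of label propagation is given below; it is an assumption, not a transcription of the patent's equations. The symbol w_{ii′} for the edge weight, the bandwidth parameter σ, and the feature index m are introduced here for illustration.

```latex
% Assumed standard form; \sigma is a hypothetical bandwidth parameter.
d_{ii'} = \sqrt{\sum_{m} \left(x_{im} - x_{i'm}\right)^{2}}, \qquad
w_{ii'} = \exp\!\left(-\frac{d_{ii'}^{2}}{\sigma^{2}}\right) \tag{1}

% Probability that label information propagates from node x_{i'} to node x_i.
T_{ii'} = P\left(x_{i'} \to x_{i}\right) = \frac{w_{ii'}}{\sum_{k} w_{ki'}}

% Label matrix: probability that node x_i belongs to class c.
Y_{ic} = P\left(x_i \text{ has class } c\right), \qquad c \in \{1, 2, \dots, C\}
```

In this form, nearby nodes receive larger weights, and each column of T sums to 1, which matches the reading of T_{ii′} as the probability of propagation from x_{i′} to x_i.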
In step S42, the probability transition matrix T is multiplied by the label matrix Y to obtain a new label matrix.
In step S43, each row of the label matrix is normalized, and the label information of the samples that were originally labeled is restored.
In step S44, it is determined whether the label matrix Y has converged. If the convergence condition has been reached, the label matrix is output and suspicious targets are screened out according to it; otherwise, the process returns to step S42.
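Steps S41 to S44 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the Gaussian edge weight exp(-d²/σ²), the parameter names `sigma`, `tol`, and `max_iter`, the uniform initialization of unlabeled rows, and the maximum-change convergence test are all assumptions, since the patent text does not reproduce its equations or its convergence condition.

```python
import math


def label_propagation(X, labels, n_classes, sigma=1.0, tol=1e-6, max_iter=1000):
    """Propagate labels over a complete graph of feature vectors.

    X: list of feature vectors; labels: class index per node, or None if
    the node is unlabeled. Returns the label matrix Y as a list of rows.
    """
    n = len(X)
    # S41: edge weights from pairwise Euclidean distances (assumed Gaussian).
    W = [[math.exp(-math.dist(X[i], X[j]) ** 2 / sigma ** 2) for j in range(n)]
         for i in range(n)]
    # T[i][j]: probability that the label of node j propagates to node i,
    # obtained by normalizing each column of W to sum to 1.
    col_sum = [sum(W[k][j] for k in range(n)) for j in range(n)]
    T = [[W[i][j] / col_sum[j] for j in range(n)] for i in range(n)]

    def clamp(Y):
        # Restore one-hot rows for the originally labeled samples.
        for i, c in enumerate(labels):
            if c is not None:
                Y[i] = [1.0 if k == c else 0.0 for k in range(n_classes)]
        return Y

    # Initial label matrix: uniform rows for unlabeled nodes (an assumption).
    Y = clamp([[1.0 / n_classes] * n_classes for _ in range(n)])
    for _ in range(max_iter):
        # S42: multiply T by Y to obtain a new label matrix.
        Y_new = [[sum(T[i][j] * Y[j][c] for j in range(n))
                  for c in range(n_classes)] for i in range(n)]
        # S43: normalize each row, then restore the known labels.
        Y_new = clamp([[v / sum(row) for v in row] for row in Y_new])
        # S44: converged once the largest entry change drops below tol.
        delta = max(abs(a - b) for r0, r1 in zip(Y, Y_new) for a, b in zip(r0, r1))
        Y = Y_new
        if delta < tol:
            break
    return Y
```

With class 1 standing for "suspicious", the unlabeled vehicles whose row in the returned matrix puts most of its mass on class 1 would be the candidates screened out for inspection.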
With the method of the invention for detecting anomalous tobacco-related vehicles based on a small data set, suspect targets can be detected quickly and accurately by the proposed algorithm. The method achieves relatively high accuracy even when the data volume is small and labels are scarce, and it avoids manual screening, which reduces the manpower and material resources required, relieves traffic pressure, and makes the fullest use of the available information.
In this specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (4)

1. A method for detecting anomalous tobacco-related vehicles based on a small data set, characterized by comprising the following steps:
(1) collecting data and constructing a data set;
(2) preprocessing the original data set and extracting features to construct a feature matrix;
(3) performing feature selection on the obtained feature matrix;
(4) performing identification with a classifier to obtain a detection result.
2. The method for detecting anomalous tobacco-related vehicles based on a small data set according to claim 1, wherein step (2) comprises the following steps:
(2.1) completing or deleting missing values in the original data set, and merging duplicate IDs;
(2.2) selecting relevant features and performing feature extraction.
3. The method for detecting anomalous tobacco-related vehicles based on a small data set according to claim 1, wherein step (3) comprises the following steps:
(3.1) constructing a random forest model from a small number of positive samples and negative samples;
(3.2) ranking the importance of the features with the random forest model, and selecting the 9 top-ranked features as the features for the classifier according to the ranking result.
4. The method for detecting anomalous tobacco-related vehicles based on a small data set according to claim 1, wherein step (4) comprises the following steps:
(4.1) calculating an initial probability transition matrix T and a label matrix Y;
(4.2) multiplying the probability transition matrix T by the label matrix Y to obtain a new label matrix;
(4.3) normalizing each row of the label matrix Y, and restoring the label information of the samples that were originally labeled;
(4.4) judging whether the label matrix Y has converged; if so, outputting the label matrix and screening out suspicious targets according to it; otherwise, returning to step (4.2).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911135269.XA CN110929633A (en) 2019-11-19 2019-11-19 Method for realizing abnormity detection of smoke-involved vehicle based on small data set

Publications (1)

Publication Number Publication Date
CN110929633A (en) 2020-03-27

Family

ID=69850318

Country Status (1)

Country Link
CN (1) CN110929633A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination