CN112464978B - Method and device for identifying abnormal terminals of Internet of vehicles - Google Patents

Info

Publication number
CN112464978B
Authority
CN
China
Prior art keywords
data
point
data point
abnormal
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011134825.4A
Other languages
Chinese (zh)
Other versions
CN112464978A (en)
Inventor
李明春
白天瑜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhilian Anhang Technology Co ltd
Original Assignee
Beijing Zhilian Anhang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhilian Anhang Technology Co ltd filed Critical Beijing Zhilian Anhang Technology Co ltd
Priority to CN202011134825.4A priority Critical patent/CN112464978B/en
Publication of CN112464978A publication Critical patent/CN112464978A/en
Application granted granted Critical
Publication of CN112464978B publication Critical patent/CN112464978B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00Network arrangements, protocols or services for addressing or naming
    • H04L61/09Mapping addresses
    • H04L61/25Mapping addresses of the same type
    • H04L61/2503Translation of Internet protocol [IP] addresses
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/12Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

One or more embodiments of the present disclosure provide a method and an apparatus for identifying abnormal terminals in the Internet of Vehicles. The method includes: collecting original data of Internet of Vehicles terminals; clustering the original data to obtain a plurality of data groups and first abnormal data points which do not belong to any data group; for each data group, obtaining first suspected abnormal data points based on the MSIN value size and the number of occurrences of the IMSI value of the data points in the data group; clustering each data group separately to obtain second suspected abnormal data points; if a data point is both a first suspected abnormal data point and a second suspected abnormal data point, taking that data point as a second abnormal data point; and identifying abnormal terminals based on the first abnormal data points and the second abnormal data points. This effectively reduces the number of false alarms and missed alarms and improves accuracy.

Description

Method and device for identifying abnormal terminals of Internet of vehicles
Technical Field
One or more embodiments of the present disclosure relate to the field of internet of vehicles data processing technologies, and in particular, to a method and an apparatus for identifying an abnormal terminal of an internet of vehicles.
Background
With the development of Internet technology, a large number of vehicles are connected to the Internet to obtain data, services, and the like. At the same time, the associated network security problems keep emerging. As an important link in ensuring the security of the Internet of Vehicles, it is important to promptly identify and discover abnormal terminals accessing the Internet of Vehicles (i.e., terminal devices that deviate from normal terminals in identity information or the like).
In the prior art, whether a terminal is abnormal is identified by detecting the IP login information and the like of the Internet of Vehicles terminal. This detection and identification approach is prone to missed alarms, false alarms, and the like, so its accuracy is low.
The IMSI (International Mobile Subscriber Identity) is an identifier that distinguishes mobile subscribers. It uses the digits 0 to 9, with a total length not exceeding 15 digits. The structure of the IMSI is MCC (Mobile Country Code) + MNC (Mobile Network Code) + MSIN (Mobile Subscriber Identification Number), where the MCC is the code of the country to which the mobile subscriber belongs and occupies 3 digits; the MNC is the mobile network code, consists of two or three digits, and identifies the mobile communication network to which the mobile subscriber belongs; and the MSIN identifies a mobile subscriber within a given mobile communication network.
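The MCC + MNC + MSIN structure described above can be illustrated with a minimal Python sketch. The function name `split_imsi`, the `ImsiParts` type, and the default two-digit MNC length are assumptions made for illustration; the text only states that the MNC has two or three digits.

```python
from typing import NamedTuple


class ImsiParts(NamedTuple):
    mcc: str   # Mobile Country Code, always 3 digits
    mnc: str   # Mobile Network Code, 2 or 3 digits
    msin: str  # Mobile Subscriber Identification Number (the remainder)


def split_imsi(imsi: str, mnc_len: int = 2) -> ImsiParts:
    """Split an IMSI string into MCC, MNC and MSIN.

    mnc_len is an assumption supplied by the caller (2 or 3 digits);
    the IMSI itself does not say how long its MNC is.
    """
    if not (imsi.isascii() and imsi.isdigit()) or len(imsi) > 15:
        raise ValueError("an IMSI consists of at most 15 decimal digits")
    if mnc_len not in (2, 3):
        raise ValueError("the MNC has two or three digits")
    return ImsiParts(imsi[:3], imsi[3:3 + mnc_len], imsi[3 + mnc_len:])


parts = split_imsi("460020111117665")
print(parts.mcc, parts.mnc, parts.msin)  # 460 02 0111117665
```

The sample value is one of the IMSI strings that appears in the example data later in the description.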
Disclosure of Invention
In view of this, an object of one or more embodiments of the present disclosure is to provide a method and a device for identifying abnormal terminals of internet of vehicles, so as to solve the problem of low accuracy of detecting and identifying abnormal terminals of internet of vehicles in the prior art.
Based on the above objects, one or more embodiments of the present disclosure provide a method for identifying abnormal terminals in internet of vehicles, including:
collecting original data of a vehicle networking terminal;
clustering the original data to obtain a plurality of data groups and first abnormal data points which do not belong to the data groups;
for each data group, obtaining a first suspected abnormal data point based on the MSIN value size and the IMSI value occurrence frequency of the data points in the data group;
clustering each data group to obtain a second suspected abnormal data point;
if a certain data point belongs to the first suspected abnormal data point and the second suspected abnormal data point at the same time, the data point is the second abnormal data point;
identifying an abnormal terminal based on the first abnormal data point and the second abnormal data point.
As an optional implementation manner, collecting the original data of the Internet of Vehicles terminal and clustering the original data to obtain a plurality of data groups and first abnormal data points which do not belong to the data groups includes:
collecting original data of a vehicle networking terminal;
preprocessing the original data to obtain data to be processed;
and clustering the data to be processed to obtain a plurality of data groups and first abnormal data points which do not belong to the data groups.
As an alternative embodiment, the preprocessing the raw data includes:
removing the items with the values of the original data being empty;
converting the IP address in the original data into decimal form;
and converting the original data, after the IP address has been converted into decimal form, into a format in which each IP corresponds to an IMSI value.
As an optional implementation manner, the clustering the raw data to obtain a plurality of data groups and a first abnormal data point not belonging to the data groups includes:
calculating the local density of each data point in the data to be processed within a preset range;
comparing the local density of each data point with a preset local density threshold, and if the local density of a certain data point is larger than the local density threshold and other data points with the local density larger than the data point do not exist in a preset range taking the data point as the center, taking the data point as a center point, wherein the center point and all the data points positioned in the preset range of the center point jointly form a data group;
the data points that do not belong to any data group are the first outlier data points.
As an alternative embodiment, the obtaining, for each data group, a first suspected abnormal data point based on the MSIN value size of the data points within the data group includes:
for each data group, arranging all data points in the data group in sequence according to the MSIN value;
calculating the difference between two adjacent data points and the average value of all the difference values;
if the differences between a certain data point and its two adjacent data points are larger than the average value, the data point is a first suspected abnormal data point.
As an optional implementation manner, the obtaining, for each data group, a first suspected abnormal data point based on the occurrence number of IMSI values of the data points in the data group includes:
calculating the occurrence times of each IMSI value for each data group, and determining the median of the occurrence times of the IMSI values;
if the number of occurrences of the IMSI value of a certain data point is lower than the median, the data point is the first suspected abnormal data point.
As an optional implementation manner, the clustering each data group separately to obtain the second suspected abnormal data point includes:
for each data group, selecting one data point from the data groups, judging whether the data point is a core point, and if not, marking the data point as a second suspected abnormal data point; if the data point is a core point, forming a cluster from the data point and other data points with the distance from the data point smaller than a preset radius;
judging whether the data points in the cluster are core points, if so, incorporating other data points outside the cluster, which are smaller than the preset radius, into the cluster, and returning to execute the step of judging whether the data points in the cluster are core points or not until the cluster cannot incorporate new data points;
and selecting one data point from the data points outside the cluster, and returning to the step of judging whether the data point is a core point or not until all the data points in the data group are marked.
As an optional implementation manner, the selecting a data point from the data group to determine whether the data point is a core point includes:
arbitrarily selecting a data point from the data group;
judging whether the number of other data points within a preset radius from the data point is larger than the preset sample number or not by taking the data point as a circle center;
if the number is larger than the preset number of samples, the data point is a core point; if the number is less than the preset number of samples, the data point is not a core point.
As an optional implementation manner, the determining whether the data point in the cluster is a core point, if so, then including other data points outside the cluster with a distance from the core point smaller than a preset radius into the cluster includes:
arbitrarily selecting one data point from the data points in the cluster;
judging whether the number of other data points within a preset radius from the data point is larger than the preset sample number or not by taking the data point as a circle center;
if the number is larger than the preset number of samples, the data point is a core point, and the other data points outside the cluster whose distance from the core point is smaller than the preset radius are included in the cluster; if the number is less than the preset number of samples, the data point is not a core point.
Corresponding to the above method for identifying abnormal terminals of the Internet of Vehicles, an embodiment of the invention also provides a device for identifying abnormal terminals of the Internet of Vehicles, comprising:
the acquisition unit is used for acquiring the original data of the Internet of vehicles terminal;
the first clustering unit is used for clustering the original data to obtain a plurality of data groups and first abnormal data points which do not belong to the data groups;
an obtaining unit, configured to obtain, for each data group, a first suspected abnormal data point based on a MSIN value size and an IMSI value occurrence number of data points in the data group;
the second clustering unit is used for clustering each data group respectively to obtain second suspected abnormal data points;
the judging unit is used for judging whether a certain data point belongs to the first suspected abnormal data point and the second suspected abnormal data point at the same time, and if so, the data point is the second abnormal data point;
and the identification unit is used for identifying the abnormal terminal based on the first abnormal data point and the second abnormal data point.
From the above, it can be seen that, in the method and apparatus for identifying abnormal terminals of internet of vehicles provided in one or more embodiments of the present disclosure, raw data are clustered according to IMSI and IP address, and data are divided into a plurality of data groups by using IP and the like as features, and terminals from abnormal IP are detected, so as to obtain a plurality of data groups and first abnormal data points not belonging to the data groups; then, taking a data group as a unit, and acquiring a first suspected abnormal data point based on the MSIN value size and the IMSI value occurrence frequency of the data points in the data group; clustering each data group again, and further detecting data points with abnormal IMSI values by taking MNC values and MSIN values as characteristics to obtain second suspected abnormal data points; and obtaining a second abnormal data point through the first suspected abnormal data point and the second suspected abnormal data point, and finally identifying an abnormal terminal through the first abnormal data point and the second abnormal data point, thereby improving the accuracy rate.
Drawings
In order to describe one or more embodiments of the present description or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only one or more embodiments of the present description, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a logic diagram of a method for identifying abnormal terminals of the Internet of Vehicles according to one or more embodiments of the present disclosure;
FIG. 2 is a logic diagram of calculating a first abnormal data point according to one or more embodiments of the present disclosure;
FIG. 3 is a logic diagram of calculating a first suspected abnormal data point based on the MSIN value according to one or more embodiments of the present disclosure;
FIG. 4 is a logic diagram of calculating a first suspected abnormal data point based on the IMSI value according to one or more embodiments of the present disclosure;
FIG. 5 is a logic diagram of calculating a second abnormal data point according to one or more embodiments of the present disclosure;
FIG. 6 is a logic diagram of calculating whether a point within a data group is a core point according to one or more embodiments of the present disclosure;
FIG. 7 is a logic diagram of calculating whether a point within a cluster is a core point according to one or more embodiments of the present disclosure;
FIG. 8 is a logic diagram of a device for identifying abnormal terminals of the Internet of Vehicles according to one or more embodiments of the present disclosure.
Detailed Description
For the purposes of promoting an understanding of the principles and advantages of the disclosure, reference will now be made in detail to the following specific examples.
The following description of the embodiments of the present invention, such as the role and working principles of the parts involved, the manufacturing process and the method of operation and use, etc., is provided to assist those skilled in the art in a more complete, accurate and thorough understanding of the inventive concepts and technical solutions of the present invention.
In order to achieve the above objective, the embodiments of the present invention provide a method and an apparatus for identifying abnormal terminals in internet of vehicles, where the method and the apparatus may be applied to a terminal, or a server connected to a terminal, etc., and are not particularly limited. The method for identifying the abnormal terminal of the Internet of vehicles, which is provided by the embodiment of the invention, is firstly described in detail.
The method for identifying the abnormal terminal of the Internet of vehicles provided by the embodiment of the invention comprises the following steps:
S100, acquiring original data of a vehicle networking terminal;
S200, clustering the original data to obtain a plurality of data groups and first abnormal data points which do not belong to the data groups;
S300, obtaining a first suspected abnormal data point according to MSIN value size and IMSI value occurrence frequency of data points in each data group;
S400, clustering each data group to obtain second suspected abnormal data points;
S500, if a certain data point belongs to a first suspected abnormal data point and a second suspected abnormal data point at the same time, the data point is a second abnormal data point;
S600, identifying an abnormal terminal based on the first abnormal data point and the second abnormal data point.
In the embodiment of the invention, the original data are clustered according to IMSI and IP address: using IP and the like as features, the data are divided into a plurality of data groups and terminals from abnormal IPs are detected, yielding a plurality of data groups and first abnormal data points which do not belong to the data groups. Then, taking each data group as a unit, first suspected abnormal data points are obtained based on the MSIN value size and the number of occurrences of the IMSI value of the data points in the data group. Each data group is clustered again, using MNC values and MSIN values as features, to further detect data points with abnormal IMSI values and obtain second suspected abnormal data points. Second abnormal data points are obtained from the first and second suspected abnormal data points, and abnormal terminals are finally identified from the first and second abnormal data points, which effectively reduces the number of false alarms and missed alarms and improves accuracy.
Fig. 1 is a schematic flow chart of a method for identifying abnormal terminals in internet of vehicles, provided by an embodiment of the invention, including:
S100, acquiring original data of the Internet of vehicles terminal.
S200, clustering the original data to obtain a plurality of data groups and first abnormal data points which do not belong to the data groups.
Optionally, as shown in fig. 2, S200 includes:
S201, calculating the local density of each data point in the data to be processed within a preset range;
S202, comparing the local density of each data point with a preset local density threshold; if the local density of a certain data point is larger than the local density threshold and no other data point with a larger local density exists in the preset range centered on the data point, taking the data point as a center point, wherein the center point and all the data points located in the preset range of the center point jointly form a data group;
S203, the data points not belonging to any data group are first abnormal data points.
After this clustering, the data points inside each data group have the same or similar IP addresses and MNC values. A first abnormal data point is either a data sample whose IP address differs from those of the data groups and for which the number of samples from that IP is too small to be regarded as normal, or a data sample whose IMSI value differs too much numerically from the IMSI values of other data samples from the same IP and for which the number of such samples is too small to be regarded as normal.
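Steps S201 to S203 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `density_groups`, the use of Euclidean distance, counting neighbours as the "local density", and assigning a point that lies within range of several center points to the nearest one are all choices made here for illustration. The sample points are toy values, not the patent's data.

```python
import math


def density_groups(points, radius, density_threshold):
    """Group points around local-density centers (sketch of S201 to S203).

    points: list of numeric tuples (feature vectors; the scaling is assumed).
    Returns (groups, outliers): groups maps a center index to the indices
    of its members, outliers lists the indices that belong to no group.
    """
    n = len(points)
    # S201: local density = number of other points within the preset range.
    neighbours = [
        [j for j in range(n)
         if j != i and math.dist(points[i], points[j]) <= radius]
        for i in range(n)
    ]
    density = [len(nb) for nb in neighbours]

    # S202: a center exceeds the threshold and has no denser point in range.
    centers = [
        i for i in range(n)
        if density[i] > density_threshold
        and all(density[j] <= density[i] for j in neighbours[i])
    ]
    groups = {c: [c] for c in centers}

    # S203: points within range of no center are first abnormal data points.
    outliers = []
    for i in range(n):
        if i in groups:
            continue
        near = [c for c in centers if c in neighbours[i]]
        if near:
            # Assumption: ties between centers go to the nearest center.
            best = min(near, key=lambda c: math.dist(points[i], points[c]))
            groups[best].append(i)
        else:
            outliers.append(i)
    return groups, outliers


toy_points = [(0,), (1,), (2,), (10,), (11,), (12,), (50,)]
groups, outliers = density_groups(toy_points, radius=1.0, density_threshold=1)
print(groups)    # {1: [1, 0, 2], 4: [4, 3, 5]}
print(outliers)  # [6]
```

In practice the feature vectors would be built from the decimal IP and the IMSI fields, and the range and threshold would have to be tuned to their scale; the description does not give concrete values for either.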
S300, obtaining a first suspected abnormal data point according to MSIN value size and IMSI value occurrence frequency of the data points in each data group.
Optionally, for each data group, the obtaining a first suspected abnormal data point based on the MSIN value size of the data points in the data group, as shown in fig. 3, includes:
S310, for each data group, arranging all data points in the data group in sequence according to the MSIN value;
S311, calculating the difference value between each two adjacent data points and the average value of all the difference values;
S312, if the differences between a certain data point and its two adjacent data points are larger than the average value, the data point is a first suspected abnormal data point.
The MSIN parts of IMSI values that come from the same IP and have the same MNC increase in segments, and the average value represents the overall trend and average level of the data; that is, the average of the differences can be regarded as a reasonable difference between segments. Therefore, if the differences between a data point and its two adjacent data points are larger than the average value, the data point is a first suspected abnormal data point.
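A minimal Python sketch of the gap test in S310 to S312 is given below. Several details are assumptions, since the description leaves them open: duplicate MSIN values are collapsed before sorting, a point must exceed the average on both sides to be flagged, and an endpoint (which has only one neighbour) is judged on its single gap. The function name `msin_gap_suspects` and the sample values are hypothetical.

```python
def msin_gap_suspects(msins):
    """Return the MSIN values whose gaps to both neighbours exceed the mean gap."""
    ordered = sorted(set(msins))          # S310 (duplicates collapsed: assumption)
    if len(ordered) < 3:
        return set()
    # S311: differences between adjacent values and their average.
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    mean_gap = sum(gaps) / len(gaps)

    suspects = set()
    for k, value in enumerate(ordered):
        # Gaps on the left and right of this value; endpoints have only one.
        adjacent = []
        if k > 0:
            adjacent.append(gaps[k - 1])
        if k < len(gaps):
            adjacent.append(gaps[k])
        # S312: flagged only if every adjacent gap is larger than the average.
        if all(g > mean_gap for g in adjacent):
            suspects.add(value)
    return suspects


print(msin_gap_suspects([100, 101, 102, 103, 500, 900, 901]))  # {500}
```

In the toy input the gaps are 1, 1, 1, 397, 400 and 1, so the average is 133.5 and only the value 500 is isolated on both sides.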
Optionally, for each data group, the obtaining a first suspected abnormal data point based on the occurrence number of IMSI values of the data points in the data group, as shown in fig. 4, includes:
S320, calculating the number of occurrences of each IMSI value for each data group, and determining the median of the numbers of occurrences of the IMSI values;
S321, if the number of occurrences of the IMSI value of a certain data point is lower than the median, the data point is a first suspected abnormal data point.
Because the numbers of occurrences of different IMSI values in the sampled data may differ greatly, with extreme maximum and minimum values, the median of the numbers of occurrences can be taken as their general level. Therefore, if the number of occurrences of the IMSI value of a certain data point is lower than the median, the data point is a first suspected abnormal data point.
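The median test in S320 and S321 can be sketched with the Python standard library. The function name `rare_imsi_suspects` is hypothetical, and taking the median over one count per distinct IMSI value is an assumption about how the median is formed.

```python
from collections import Counter
from statistics import median


def rare_imsi_suspects(imsis):
    """Return the IMSI values that occur fewer times than the median count."""
    counts = Counter(imsis)               # S320: occurrences of each IMSI value
    if not counts:
        return set()
    mid = median(counts.values())         # median over distinct IMSI values
    # S321: values seen less often than the median are suspected.
    return {imsi for imsi, c in counts.items() if c < mid}


sample = (["460020111117665"] * 3
          + ["460020111117668"] * 3
          + ["460020153258933"])
print(rare_imsi_suspects(sample))  # {'460020153258933'}
```

Here the counts are 3, 3 and 1, so the median is 3 and only the IMSI seen once falls below it.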
S400, clustering each data group to obtain second suspected abnormal data points.
Optionally, as shown in fig. 5, S400 includes:
S410, for each data group, selecting one data point from the data group and judging whether the data point is a core point; if not, marking the data point as a second suspected abnormal data point; if the data point is a core point, forming a cluster from the data point and the other data points whose distance from the data point is smaller than a preset radius;
Optionally, the selecting a data point from the data group and judging whether the data point is a core point, as shown in fig. 6, includes:
S411, selecting one data point from the data group;
S412, taking the data point as the center of a circle, judging whether the number of other data points within the preset radius of the data point is larger than a preset number of samples;
S413, if the number is larger than the preset number of samples, the data point is a core point; if the number is less than the preset number of samples, the data point is not a core point.
S420, judging whether the data points in the cluster are core points; if so, incorporating into the cluster the other data points outside the cluster whose distance from the core point is smaller than the preset radius, and returning to the step of judging whether the data points in the cluster are core points, until the cluster cannot incorporate new data points;
Optionally, the judging whether the data points in the cluster are core points and, if so, incorporating into the cluster the other data points outside the cluster whose distance from the core point is smaller than the preset radius, as shown in fig. 7, includes:
S421, selecting one data point from the data points in the cluster;
S422, taking the data point as the center of a circle, judging whether the number of other data points within the preset radius of the data point is larger than the preset number of samples;
S423, if the number is larger than the preset number of samples, the data point is a core point; it is then judged whether the distance between each other data point outside the cluster and the core point is smaller than the preset radius, and the data points for which it is are incorporated into the cluster; if the number is less than the preset number of samples, the data point is not a core point.
S430, selecting one data point from the data points outside the cluster, and returning to the step of judging whether the data point is a core point, until all the data points in the data group are marked.
Because each data group has the characteristic that the MNC values of the internal data points are the same and the individual MSIN values are in a linear increasing trend, the second suspected abnormal data point represents that the MSIN values of some data samples differ too much from the MSIN values of other data samples having the same MNC and that the number of data samples is too small to be considered as normal samples.
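The procedure in S410 to S430 resembles density-based clustering in the style of DBSCAN, and the following Python sketch illustrates it under several assumptions. It works on one-dimensional MSIN values only (the description uses MNC and MSIN values as features), it lets a point first marked as suspected be absorbed later if it lies within the radius of a core point, as DBSCAN does with border points, and the name `second_suspects` and the sample values are hypothetical.

```python
def second_suspects(values, radius, min_samples):
    """Return indices of values left outside every cluster (sketch of S410 to S430)."""
    n = len(values)

    def neighbours(i):
        # Other points strictly closer than the preset radius.
        return [j for j in range(n)
                if j != i and abs(values[i] - values[j]) < radius]

    SUSPECT = -1
    labels = [None] * n        # None = unmarked, -1 = suspected, k >= 0 = cluster
    cluster_id = 0
    for i in range(n):         # S410 / S430: pick the next unmarked point
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) <= min_samples:       # S413: not a core point
            labels[i] = SUSPECT
            continue
        labels[i] = cluster_id
        queue = list(nb)
        while queue:                     # S420: grow until nothing new joins
            j = queue.pop()
            if labels[j] is not None and labels[j] != SUSPECT:
                continue                 # already belongs to a cluster
            labels[j] = cluster_id       # assumption: suspects can be absorbed
            nb_j = neighbours(j)
            if len(nb_j) > min_samples:  # S423: core points pull in neighbours
                queue.extend(nb_j)
        cluster_id += 1
    return [i for i, lab in enumerate(labels) if lab == SUSPECT]


print(second_suspects([100, 101, 102, 103, 104, 500], radius=3, min_samples=1))  # [5]
```

With these toy values the five consecutive numbers form one cluster, and the isolated value at index 5 is the only second suspected abnormal data point.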
S500, if a certain data point belongs to the first suspected abnormal data point and the second suspected abnormal data point at the same time, the data point is the second abnormal data point.
Thus, false alarms caused by insufficient data sampling can be avoided.
S600, identifying an abnormal terminal based on the first abnormal data point and the second abnormal data point.
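Steps S500 and S600 amount to simple set operations, sketched below in Python. S500 is an intersection as stated in the text; treating S600 as the union of the first and second abnormal data points is an assumption, because the description does not spell out how the two sets are combined. The function name and the sample identifiers are hypothetical.

```python
def identify_abnormal(first_abnormal, first_suspected, second_suspected):
    """Combine the intermediate results of S200, S300 and S400."""
    # S500: a point is a second abnormal data point only if both
    # suspicion tests flagged it.
    second_abnormal = set(first_suspected) & set(second_suspected)
    # S600 (assumption): abnormal terminals are those behind either set.
    return set(first_abnormal) | second_abnormal


result = identify_abnormal(
    first_abnormal={"imsi-A"},
    first_suspected={"imsi-B", "imsi-C"},
    second_suspected={"imsi-C", "imsi-D"},
)
print(sorted(result))  # ['imsi-A', 'imsi-C']
```

Requiring agreement between the two suspicion tests is what the description relies on to avoid false alarms caused by insufficient sampling.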
As an optional implementation manner, collecting the original data of the Internet of Vehicles terminal and clustering the original data to obtain a plurality of data groups and first abnormal data points which do not belong to the data groups includes:
collecting original data of a vehicle networking terminal;
preprocessing the original data to obtain data to be processed;
and clustering the data to be processed to obtain a plurality of data groups and first abnormal data points which do not belong to the data groups.
Therefore, by preprocessing the original data, the data with obvious abnormality in the original data can be removed, and the data processing efficiency can be improved.
Optionally, the preprocessing the raw data includes:
removing the items with the values of the original data being empty;
converting the IP address in the original data into decimal form;
and converting the original data, after the IP address has been converted into decimal form, into a format in which each IP corresponds to an IMSI value.
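The preprocessing steps above can be sketched in Python with the standard `ipaddress` module. The input layout (pairs of dotted IPv4 address and IMSI string) and the function name `preprocess` are assumptions, since the raw table is not reproduced in the text; the dotted address in the example was chosen because it converts to the first decimal value shown in the example data.

```python
import ipaddress


def preprocess(records):
    """Drop empty items and convert each record to [decimal IP, IMSI].

    records: iterable of (ip, imsi) pairs, where ip is a dotted IPv4
    string (assumed input layout).
    """
    cleaned = []
    for ip, imsi in records:
        if not ip or not imsi:
            continue                                  # remove empty values
        decimal_ip = int(ipaddress.IPv4Address(ip))   # dotted quad -> integer
        cleaned.append([str(decimal_ip), imsi])
    return cleaned


raw = [
    ("192.168.11.5", "460040000000724"),
    ("192.168.11.5", ""),            # empty IMSI: dropped
    (None, "460040000000725"),       # empty IP: dropped
]
print(preprocess(raw))  # [['3232238341', '460040000000724']]
```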
It is understood that the method may be performed by any apparatus, device, platform, cluster of devices having computing, processing capabilities.
Based on any one of the embodiments of the method for identifying abnormal terminals of internet of vehicles, the invention also provides a device for identifying abnormal terminals of internet of vehicles, as shown in fig. 8, comprising:
the acquisition unit 100 is used for acquiring original data of the internet of vehicles terminal;
a first clustering unit 200, configured to cluster the raw data, and obtain a plurality of data groups and first abnormal data points that do not belong to the data groups;
an obtaining unit 300, configured to obtain, for each data group, a first suspected abnormal data point based on the MSIN value size and the IMSI value occurrence number of the data points in the data group;
a second clustering unit 400, configured to cluster each data group to obtain a second suspected abnormal data point;
the judging unit 500 is configured to determine that a certain data point is a second abnormal data point if the certain data point belongs to both the first suspected abnormal data point and the second suspected abnormal data point;
and an identifying unit 600, configured to identify an abnormal terminal based on the first abnormal data point and the second abnormal data point.
In the embodiment of the invention, the original data are clustered according to IMSI and IP address: using IP and the like as features, the data are divided into a plurality of data groups and terminals from abnormal IPs are detected, yielding a plurality of data groups and first abnormal data points which do not belong to the data groups. Then, taking each data group as a unit, first suspected abnormal data points are obtained based on the MSIN value size and the number of occurrences of the IMSI value of the data points in the data group. Each data group is clustered again, using MNC values and MSIN values as features, to further detect data points with abnormal IMSI values and obtain second suspected abnormal data points. Second abnormal data points are obtained from the first and second suspected abnormal data points, and abnormal terminals are finally identified from the first and second abnormal data points, which effectively reduces the number of false alarms and missed alarms and improves accuracy.
For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, the functions of each module may be implemented in one or more pieces of software and/or hardware when implementing one or more embodiments of the present description.
The device of the foregoing embodiment is configured to implement the corresponding method in the foregoing embodiment and has the beneficial effects of the corresponding method embodiment, which are not repeated here.
Examples
The detection method is described below with reference to the data in Table 1.
Table 1
Data is collected over a period of time, as shown in Table 1, and preprocessed to obtain the data to be processed; that is, the data in Table 1 is converted into the following format:
['3232238341' '460040000000724'],['3232238341' '460040000000725'],
['3232238341' '460070000001023'],['3232238341' '460070000001024'],
['2015070263' '460020111117665'],['2015070263' '460020111117665'],
['2015070263' '460020111117665'],['2015070263' '460020111117667'],
['2015070263' '460020111117667'],['2015070263' '460020111117668'],
['2015070263' '460020111117668'],['2015070263' '460020111117668'],
['2015070263' '460020111117669'],['2015070263' '460020111117669'],
['2015070263' '460020111117669'],['2015070263' '460020153258932'],
['2015070263' '460020153258932'],['2015070263' '460020153258932'],
['2015070263' '460020153258933'],['2015070263' '460020243563286'],
['2015070263' '460021235443258'],['2015070263' '460021235443258'],
['2015070263' '460021235443258'],['2015070263' '460033333333333'],
['882397202' '460002222547542'].
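The preprocessing that produces this format (dropping empty entries, converting the IP address to decimal, pairing it with the IMSI) might be sketched as below. This is an illustration only: the function name `preprocess` is hypothetical, and the dotted-quad addresses are back-computed from the decimal values in the listing rather than taken from Table 1.

```python
import ipaddress


def preprocess(records):
    """Drop records with an empty field and convert each IPv4 address to decimal.

    Returns a list of [decimal_ip, imsi] pairs, both kept as strings to match
    the listing above.
    """
    rows = []
    for ip, imsi in records:
        if not ip or not imsi:
            continue  # remove entries whose values are empty
        rows.append([str(int(ipaddress.IPv4Address(ip))), imsi])
    return rows


raw = [
    ("192.168.11.5", "460040000000724"),
    ("120.27.136.55", "460020111117665"),
    ("52.152.80.18", ""),        # empty IMSI, dropped
    (None, "460002222547542"),   # empty IP, dropped
]
print(preprocess(raw))
# [['3232238341', '460040000000724'], ['2015070263', '460020111117665']]
```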
The data to be processed is clustered to obtain a plurality of data groups and first abnormal data points that do not belong to any data group.
The clustering result is as follows:
data group 1: [['3232238341' '460040000000724'],['3232238341' '460040000000725']];
data group 2: [['3232238341' '460070000001023'],['3232238341' '460070000001024']];
data group 3: [['2015070263' '460020111117665'],['2015070263' '460020111117665'],
['2015070263' '460020111117665'],['2015070263' '460020111117667'],
['2015070263' '460020111117667'],['2015070263' '460020111117668'],
['2015070263' '460020111117668'],['2015070263' '460020111117668'],
['2015070263' '460020111117669'],['2015070263' '460020111117669'],
['2015070263' '460020111117669'],['2015070263' '460020153258932'],
['2015070263' '460020153258932'],['2015070263' '460020153258932'],
['2015070263' '460020153258933'],['2015070263' '460020243563286'],
['2015070263' '460021235443258'],['2015070263' '460021235443258'],
['2015070263' '460021235443258']];
first abnormal data points: ['2015070263' '460033333333333'],['882397202' '460002222547542'].
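A rough sketch of this first, density-based clustering step is given below. The patent does not state the distance measure or thresholds used for the example, so the following are assumptions made for illustration: two points are neighbours when their decimal IP and their MNC differ by at most `ip_range` and `mnc_range`, the local density is the number of such neighbours, and the MNC is taken as digits 4 to 5 of the IMSI. With the default values the sketch reproduces the grouping listed above.

```python
def first_clustering(points, ip_range=0, mnc_range=0, density_threshold=0):
    """Group (ip, imsi) points around local density centres.

    All parameter names and defaults are hypothetical.
    Returns (groups, first_abnormal_points).
    """
    feats = [(int(ip), int(imsi[3:5])) for ip, imsi in points]
    n = len(points)

    def near(i, j):
        return (abs(feats[i][0] - feats[j][0]) <= ip_range
                and abs(feats[i][1] - feats[j][1]) <= mnc_range)

    neigh = [[j for j in range(n) if j != i and near(i, j)] for i in range(n)]
    density = [len(nb) for nb in neigh]

    group_of = [None] * n
    groups = []
    for i in range(n):
        if group_of[i] is not None or density[i] <= density_threshold:
            continue
        if any(density[j] > density[i] for j in neigh[i]):
            continue  # a denser neighbour exists, so i is not a centre point
        members = [i] + [j for j in neigh[i] if group_of[j] is None]
        for j in members:
            group_of[j] = len(groups)
        groups.append([points[j] for j in members])
    outliers = [points[i] for i in range(n) if group_of[i] is None]
    return groups, outliers


# (ip, imsi, number of occurrences) as in the listing above
sample = [
    ("3232238341", "460040000000724", 1), ("3232238341", "460040000000725", 1),
    ("3232238341", "460070000001023", 1), ("3232238341", "460070000001024", 1),
    ("2015070263", "460020111117665", 3), ("2015070263", "460020111117667", 2),
    ("2015070263", "460020111117668", 3), ("2015070263", "460020111117669", 3),
    ("2015070263", "460020153258932", 3), ("2015070263", "460020153258933", 1),
    ("2015070263", "460020243563286", 1), ("2015070263", "460021235443258", 3),
    ("2015070263", "460033333333333", 1), ("882397202", "460002222547542", 1),
]
points = [(ip, imsi) for ip, imsi, k in sample for _ in range(k)]
groups, outliers = first_clustering(points)
print([len(g) for g in groups], outliers)
# [2, 2, 19] [('2015070263', '460033333333333'), ('882397202', '460002222547542')]
```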
For each data group, obtaining a first suspected abnormal data point based on the MSIN value size and the IMSI value occurrence frequency of the data points in the data group;
Taking the third data group obtained in the previous step as an example, the MSIN values of the data points in the data group are sorted; the average of all differences between adjacent sorted values is 62462532, so ['2015070263' '460020243563286'], whose differences from both of its neighbours exceed this average, is taken as a first suspected abnormal data point. The median number of occurrences of the IMSI values is 3, so ['2015070263' '460020111117667'], ['2015070263' '460020153258933'], and ['2015070263' '460020243563286'], whose IMSI values occur fewer than 3 times, are also taken as first suspected abnormal data points.
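The two checks in this step can be reproduced with a short sketch. It assumes a 2-digit MNC, so that the MSIN is everything after the fifth digit, and it only tests points that have two neighbours in the sorted order; the function names are illustrative.

```python
from collections import Counter
from statistics import mean, median


def msin_gap_suspects(imsis, mnc_len=2):
    """Return (average adjacent MSIN difference, IMSIs far from both neighbours)."""
    ordered = sorted(imsis, key=lambda s: int(s[3 + mnc_len:]))
    msins = [int(s[3 + mnc_len:]) for s in ordered]
    gaps = [b - a for a, b in zip(msins, msins[1:])]
    if not gaps:
        return 0, set()
    avg = mean(gaps)
    suspects = {
        ordered[i]
        for i in range(1, len(ordered) - 1)
        if gaps[i - 1] > avg and gaps[i] > avg
    }
    return avg, suspects


def rare_imsi_suspects(imsis):
    """Return (median occurrence count, IMSIs occurring fewer times than it)."""
    counts = Counter(imsis)
    mid = median(counts.values())
    return mid, {imsi for imsi, c in counts.items() if c < mid}


# Third data group: IMSI value and number of occurrences
group3 = (
    ["460020111117665"] * 3 + ["460020111117667"] * 2
    + ["460020111117668"] * 3 + ["460020111117669"] * 3
    + ["460020153258932"] * 3 + ["460020153258933"]
    + ["460020243563286"] + ["460021235443258"] * 3
)

avg, by_gap = msin_gap_suspects(group3)
mid, by_count = rare_imsi_suspects(group3)
print(int(avg), by_gap)       # 62462532 {'460020243563286'}
print(mid, sorted(by_count))
```

Note that the average of 62462532 is only obtained when repeated data points are kept in the sorted sequence (18 differences over 19 points).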
Clustering each data group to obtain a second suspected abnormal data point;
After clustering each data group, the obtained result is:
data group [['3232238341' '460040000000724'],['3232238341' '460040000000725']];
data group [['3232238341' '460070000001023'],['3232238341' '460070000001024']];
data group [['2015070263' '460020111117665'],['2015070263' '460020111117667'],['2015070263' '460020111117668'],['2015070263' '460020111117669']];
data group [['2015070263' '460020153258932'],['2015070263' '460020153258933']];
second suspected abnormal data point ['2015070263' '460020243563286'];
second suspected abnormal data point ['2015070263' '460021235443258'].
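The second clustering follows the core-point expansion described in the claims, which resembles DBSCAN. The sketch below is a simplified one-dimensional version: because all points of the third data group share the same MNC, only the MSIN is used as the feature. The values `radius=10` and `min_others=2` are assumptions chosen for illustration (the patent gives no concrete values), and a point first marked as suspected is re-assigned if a core point later reaches it, as in standard DBSCAN.

```python
def second_clustering(msins, radius, min_others):
    """Label each MSIN with a cluster id, or -1 for a second suspected abnormal point.

    A point is a core point when more than `min_others` other points lie
    within `radius` of it.
    """
    n = len(msins)

    def neighbours(i):
        return [j for j in range(n)
                if j != i and abs(msins[i] - msins[j]) < radius]

    NOISE = -1
    labels = [None] * n
    next_cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) <= min_others:
            labels[i] = NOISE          # not a core point
            continue
        labels[i] = next_cluster       # core point starts a new cluster
        while seeds:
            j = seeds.pop()
            if labels[j] is None:
                nb = neighbours(j)
                if len(nb) > min_others:
                    seeds.extend(nb)   # j is also a core point: keep expanding
                labels[j] = next_cluster
            elif labels[j] == NOISE:
                labels[j] = next_cluster  # border point reached by a core point
        next_cluster += 1
    return labels


# MSIN values of the third data group, duplicates included
msins = (
    [111117665] * 3 + [111117667] * 2 + [111117668] * 3 + [111117669] * 3
    + [153258932] * 3 + [153258933] + [243563286] + [1235443258] * 3
)
labels = second_clustering(msins, radius=10, min_others=2)
suspected = sorted({m for m, lab in zip(msins, labels) if lab == -1})
print(suspected)  # [243563286, 1235443258]
```

Under these assumed parameters the two remaining clusters correspond to the MSIN ranges 111117665 to 111117669 and 153258932 to 153258933, matching the listing above.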
If a certain data point belongs to the first suspected abnormal data point and the second suspected abnormal data point at the same time, the data point is the second abnormal data point;
The second suspected abnormal data points are compared with the first suspected abnormal data points obtained in the previous step; their intersection, ['2015070263' '460020243563286'], is the second abnormal data point.
Abnormal terminals are identified based on the first abnormal data points and the second abnormal data point, so the terminals corresponding to ['2015070263' '460033333333333'], ['882397202' '460002222547542'], and ['2015070263' '460020243563286'] are abnormal terminals.
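The last two steps amount to a set intersection followed by a set union. A minimal sketch, using the values from the example above (the function name `identify_abnormal` is illustrative):

```python
def identify_abnormal(first_abnormal, first_suspected, second_suspected):
    """Combine the intermediate results into the final set of abnormal data points."""
    # A point is a second abnormal data point only if both checks flagged it.
    second_abnormal = first_suspected & second_suspected
    return first_abnormal | second_abnormal


ip = "2015070263"
first_abnormal = {(ip, "460033333333333"), ("882397202", "460002222547542")}
first_suspected = {
    (ip, "460020111117667"), (ip, "460020153258933"), (ip, "460020243563286"),
}
second_suspected = {(ip, "460020243563286"), (ip, "460021235443258")}

abnormal = identify_abnormal(first_abnormal, first_suspected, second_suspected)
for point in sorted(abnormal):
    print(point)
```

Requiring agreement between the two suspected sets is what keeps points such as ['2015070263' '460021235443258'], which is isolated but occurs as often as its peers, from being reported.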
It is noted that, unless otherwise defined, technical or scientific terms used in one or more embodiments of the present disclosure should have the ordinary meaning understood by one of ordinary skill in the art to which the present disclosure pertains. The terms "first," "second," and the like used in one or more embodiments of the present description do not denote any order, quantity, or importance, but are only used to distinguish one element from another. The word "comprising" or "comprises," and the like, means that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected" or "coupled," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
It should be noted that the methods of one or more embodiments of the present description may be performed by a single device, such as a computer or server. The method of the embodiment can also be applied to a distributed scene, and is completed by mutually matching a plurality of devices. In the case of such a distributed scenario, one of the devices may perform only one or more steps of the methods of one or more embodiments of the present description, the devices interacting with each other to accomplish the methods.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples; combinations of features of the above embodiments or in different embodiments are also possible within the spirit of the present disclosure, steps may be implemented in any order, and there are many other variations of the different aspects of one or more embodiments described above which are not provided in detail for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure one or more embodiments of the present description. Furthermore, the apparatus may be shown in block diagram form in order to avoid obscuring the one or more embodiments of the present description, and also in view of the fact that specifics with respect to implementation of such block diagram apparatus are highly dependent upon the platform within which the one or more embodiments of the present description are to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that one or more embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
The present disclosure is intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Any omissions, modifications, equivalents, improvements, and the like, which are within the spirit and principles of the one or more embodiments of the disclosure, are therefore intended to be included within the scope of the disclosure.

Claims (10)

1. The method for identifying the abnormal terminal of the Internet of vehicles is characterized by comprising the following steps of:
collecting original data of a vehicle networking terminal;
clustering the original data to obtain a plurality of data groups and first abnormal data points which do not belong to the data groups; the data points in each data group have the same or similar IP address and MNC value;
for each data group, obtaining a first suspected abnormal data point based on the MSIN value size and the IMSI value occurrence frequency of the data points in the data group;
clustering each data group to obtain a second suspected abnormal data point; the second suspected abnormal data point represents that the difference between the MSIN value of the data point and the MSIN value of other data points with the same MNC is larger than a preset threshold value and the number of other data points in the preset radius of the data point is smaller than the preset sample number;
if a certain data point belongs to the first suspected abnormal data point and the second suspected abnormal data point at the same time, the data point is the second abnormal data point;
an outlier terminal is identified based on the first outlier data point and the second outlier data point.
2. The method for identifying abnormal terminals of the internet of vehicles according to claim 1, wherein collecting the original data of the internet of vehicles terminal and clustering the original data to obtain a plurality of data groups and first abnormal data points which do not belong to the data groups comprises:
collecting original data of a vehicle networking terminal;
preprocessing the original data to obtain data to be processed;
and clustering the data to be processed to obtain a plurality of data groups and first abnormal data points which do not belong to the data groups.
3. The method for identifying abnormal terminals of internet of vehicles according to claim 2, wherein the preprocessing the raw data comprises:
removing entries of the original data whose values are empty;
converting the IP address of the original data into decimal form;
and converting the original data, after the IP address has been converted into decimal form, into the format of an IP address paired with its corresponding IMSI value.
4. The method for identifying abnormal terminals of internet of vehicles according to claim 3, wherein the clustering the raw data to obtain a plurality of data groups and first abnormal data points not belonging to the data groups includes:
calculating the local density of each data point in the data to be processed within a preset range;
comparing the local density of each data point with a preset local density threshold, and if the local density of a certain data point is larger than the local density threshold and other data points with the local density larger than the data point do not exist in a preset range taking the data point as the center, taking the data point as a center point, wherein the center point and all the data points positioned in the preset range of the center point jointly form a data group;
the data points that do not belong to any data group are the first outlier data points.
5. The method for identifying abnormal terminals in the internet of vehicles according to claim 1, wherein the obtaining, for each data group, a first suspected abnormal data point based on the MSIN value size of the data points in the data group comprises:
for each data group, arranging all data points in the data group in sequence according to the MSIN value;
calculating the difference between two adjacent data points and the average value of all the difference values;
if the differences between a certain data point and its two adjacent data points are both larger than the average value, the data point is the first suspected abnormal data point.
6. The method for identifying abnormal terminals in the internet of vehicles according to claim 1, wherein the obtaining, for each data group, a first suspected abnormal data point based on the number of occurrences of the IMSI value of the data point in the data group comprises:
calculating the occurrence times of each IMSI value for each data group, and determining the median of the occurrence times of the IMSI values;
if the number of occurrences of the IMSI value of a certain data point is lower than the median, the data point is the first suspected abnormal data point.
7. The method for identifying abnormal terminals of internet of vehicles according to claim 1, wherein clustering each data group to obtain a second suspected abnormal data point comprises:
for each data group, selecting one data point from the data groups, judging whether the data point is a core point, and if not, marking the data point as a second suspected abnormal data point; if the data point is a core point, forming a cluster from the data point and other data points with the distance from the data point smaller than a preset radius;
judging whether the data points in the cluster are core points, if so, incorporating other data points outside the cluster whose distance from the core point is smaller than the preset radius into the cluster, and returning to execute the step of judging whether the data points in the cluster are core points until the cluster cannot incorporate new data points;
and selecting one data point from the data points outside the cluster, and returning to the step of judging whether the data point is a core point or not until all the data points in the data group are marked.
8. The method for identifying abnormal terminals in the internet of vehicles according to claim 7, wherein selecting a data point from the data group, determining whether the data point is a core point, comprises:
arbitrarily selecting a data point from the data group;
judging whether the number of other data points within a preset radius from the data point is larger than the preset sample number or not by taking the data point as a circle center;
if the number is larger than the preset sample number, the data point is a core point; if the number is smaller than the preset sample number, the data point is not a core point.
9. The method for identifying abnormal terminals of internet of vehicles according to claim 7, wherein judging whether the data points in the cluster are core points and, if so, incorporating other data points outside the cluster whose distance from the core point is smaller than the preset radius into the cluster comprises:
arbitrarily selecting one data point from the data points in the cluster;
judging whether the number of other data points within a preset radius from the data point is larger than the preset sample number or not by taking the data point as a circle center;
if the number is larger than the preset sample number, the data point is a core point, and other data points outside the cluster whose distance from the core point is smaller than the preset radius are incorporated into the cluster; if the number is smaller than the preset sample number, the data point is not a core point.
10. An abnormal terminal identification device of internet of vehicles, which is characterized by comprising:
the acquisition unit is used for acquiring the original data of the Internet of vehicles terminal;
the first clustering unit is used for clustering the original data to obtain a plurality of data groups and first abnormal data points which do not belong to the data groups; the data points in each data group have the same or similar IP address and MNC value;
an obtaining unit, configured to obtain, for each data group, a first suspected abnormal data point based on the MSIN value size and the IMSI value occurrence number of the data points within the data group;
the second clustering unit is used for clustering each data group respectively to obtain second suspected abnormal data points; the second suspected abnormal data point represents that the difference between the MSIN value of the data point and the MSIN value of other data points with the same MNC is larger than a preset threshold value and the number of other data points in the preset radius of the data point is smaller than the preset sample number;
the judging unit is used for judging whether a certain data point belongs to the first suspected abnormal data point and the second suspected abnormal data point at the same time, and if so, the data point is the second abnormal data point;
and the identification unit is used for identifying the abnormal terminal based on the first abnormal data point and the second abnormal data point.
CN202011134825.4A 2021-01-15 2021-01-15 Method and device for identifying abnormal terminals of Internet of vehicles Active CN112464978B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011134825.4A CN112464978B (en) 2021-01-15 2021-01-15 Method and device for identifying abnormal terminals of Internet of vehicles

Publications (2)

Publication Number Publication Date
CN112464978A CN112464978A (en) 2021-03-09
CN112464978B true CN112464978B (en) 2024-03-01

Family

ID=74833935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011134825.4A Active CN112464978B (en) 2021-01-15 2021-01-15 Method and device for identifying abnormal terminals of Internet of vehicles

Country Status (1)

Country Link
CN (1) CN112464978B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20210607
Address after: 100085 room 0106-640, 1st floor, No.26, shangdixinxi Road, Haidian District, Beijing
Applicant after: Beijing Zhilian Anhang Technology Co.,Ltd.
Address before: 100876 No.406, 4th floor, building 21, 10 Xitucheng Road, Haidian District, Beijing
Applicant before: Beijing ruanhui Technology Co.,Ltd.
GR01 Patent grant