CN117478787A - Abnormal number identification method and device and nonvolatile storage medium - Google Patents

Abnormal number identification method and device and nonvolatile storage medium

Info

Publication number
CN117478787A
Authority
CN
China
Prior art keywords
features
feature
target
determining
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311467581.5A
Other languages
Chinese (zh)
Inventor
蒋艳军
赵轶新
王乾
肖楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202311467581.5A
Publication of CN117478787A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H04M3/2281 Call monitoring, e.g. for law enforcement purposes; Call tracing; Detection or prevention of malicious calls
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H04M3/2218 Call detail recording
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Technology Law (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses an abnormal number identification method and device and a nonvolatile storage medium. The method comprises the following steps: performing first feature extraction on call ticket data corresponding to a target number to be identified to obtain a plurality of first features, and determining a first value corresponding to each of the plurality of first features; acquiring a first value interval corresponding to each first feature; performing second feature extraction on the call ticket data to obtain a plurality of second features, and determining a second value corresponding to each of the plurality of second features; and acquiring a second value interval corresponding to each second feature, and determining the target number to be identified as an abnormal number when the number of second target features exceeds a second preset threshold. The method solves the technical problem that related abnormal number identification methods extract only a limited variety of number features and process features at a coarse granularity, which results in low identification accuracy.

Description

Abnormal number identification method and device and nonvolatile storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for identifying an abnormal number, and a nonvolatile storage medium.
Background
Fraud calls adversely affect the lives of an operator's users and can even threaten their property and personal safety, so combating and intercepting fraud calls is very important. Identifying fraud calls accurately is difficult, because a misidentification prevents normal users from using their service normally. However, related abnormal number identification methods extract only a limited variety of number features and process features at a coarse granularity, so their accuracy in identifying abnormal numbers is low.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the application provide an abnormal number identification method and device and a nonvolatile storage medium, which at least solve the technical problem that only a limited variety of number features is extracted and features are processed at a coarse granularity, resulting in low identification accuracy.
According to an aspect of the embodiments of the present application, there is provided a method for identifying an abnormal number, including: performing first feature extraction on call ticket data corresponding to a target number to be identified to obtain a plurality of first features, and determining a first value corresponding to each of the plurality of first features; acquiring a first value interval corresponding to each first feature; when the number of first target features exceeds a first preset threshold, performing second feature extraction on the call ticket data to obtain a plurality of second features, and determining a second value corresponding to each of the plurality of second features, wherein the first target features are those first features, among the plurality of first features, whose first values are not in the first value interval; and acquiring a second value interval corresponding to each second feature, and determining the target number to be identified as an abnormal number when the number of second target features exceeds a second preset threshold, wherein the second target features are those second features, among the plurality of second features, whose second values are in the second value interval.
Optionally, before the first value interval corresponding to each first feature is acquired, the method further includes: determining a history database; acquiring historical ticket data corresponding to the historical abnormal number in the historical database; extracting features of the historical ticket data to obtain a plurality of third features; respectively determining a value interval corresponding to each third feature in the plurality of third features; and determining a first value interval corresponding to each first feature in the value intervals corresponding to each third feature in the plurality of third features.
Optionally, determining the history database includes: acquiring historical abnormal numbers of a plurality of different source channels; converting the historical abnormal number into a first abnormal number in a target format, wherein the first abnormal number in the target format at least comprises: source channel of history abnormal number; classifying the first abnormal numbers according to different source channels of the historical abnormal numbers to obtain a plurality of second abnormal numbers; filtering the plurality of second abnormal numbers according to different filtering rules corresponding to different source channels to obtain a third abnormal number; and determining a historical database according to the third abnormal number.
Optionally, performing second feature extraction on the call ticket data to obtain a plurality of second features when the number of first target features exceeds the first preset threshold includes: determining a plurality of fourth features corresponding to the call ticket data; setting the feature mean of each of the plurality of fourth features to 0 and the feature variance to 1; determining a plurality of principal component features and a maximum principal component feature among the plurality of fourth features according to the identification result of the historical abnormal numbers; and performing second feature extraction on the call ticket data according to the plurality of principal component features and the maximum principal component feature to obtain the plurality of second features.
Optionally, performing second feature extraction on the call ticket data according to the plurality of principal component features and the maximum principal component feature to obtain a plurality of second features includes: determining the features, among the plurality of fourth features, other than the principal component features and the maximum principal component feature as a plurality of fifth features; determining a projection value of each of the plurality of fifth features onto the principal component features; determining a feature mean and a feature variance of each of the plurality of principal component features; determining a feature mean ratio and a feature variance ratio between each of the plurality of principal component features and the maximum principal component feature; and performing dimension reduction on the call ticket data according to the feature mean ratio and the feature variance ratio to obtain the plurality of second features.
Optionally, in a case where the number of the first target features does not exceed the first preset threshold, the target number to be identified is determined as the abnormal number.
Optionally, the plurality of first features includes at least two of: number identification, the region where the number is located, the calling frequency of the number, the complaint times of the number, the answering time of the number, the starting and ending time of the call and the time of the call.
According to still another aspect of the embodiments of the present application, there is further provided an apparatus for identifying an abnormal number, including: a first determining module, configured to perform first feature extraction on call ticket data corresponding to a target number to be identified to obtain a plurality of first features, and to determine a first value corresponding to each of the plurality of first features; an acquisition module, configured to acquire a first value interval corresponding to each first feature; a second determining module, configured to perform second feature extraction on the call ticket data when the number of first target features exceeds a first preset threshold to obtain a plurality of second features, and to determine a second value corresponding to each of the plurality of second features, wherein the first target features are those first features, among the plurality of first features, whose first values are not in the first value interval; and a third determining module, configured to acquire a second value interval corresponding to each second feature, and to determine the target number to be identified as an abnormal number when the number of second target features exceeds a second preset threshold, wherein the second target features are those second features, among the plurality of second features, whose second values are in the second value interval.
According to still another aspect of the embodiments of the present application, there is further provided a non-volatile storage medium, where the storage medium includes a stored program, and when the program runs, the device on which the storage medium is located is controlled to execute the above method for identifying an abnormal number.
According to still another aspect of the embodiments of the present application, there is also provided an electronic device, including a memory and a processor, where the processor is configured to run a program stored in the memory, and the program, when run, executes the above method for identifying an abnormal number.
In the embodiments of the application, first feature extraction is performed on the call ticket data corresponding to the target number to be identified to obtain a plurality of first features, and a first value corresponding to each of the plurality of first features is determined; a first value interval corresponding to each first feature is acquired; when the number of first target features exceeds a first preset threshold, second feature extraction is performed on the call ticket data to obtain a plurality of second features, and a second value corresponding to each of the plurality of second features is determined, wherein the first target features are those first features, among the plurality of first features, whose first values are not in the first value interval; and a second value interval corresponding to each second feature is acquired, and the target number to be identified is determined as an abnormal number when the number of second target features exceeds a second preset threshold, wherein the second target features are those second features, among the plurality of second features, whose second values are in the second value interval. This refines the granularity of feature processing and thereby improves the accuracy of abnormal number identification, solving the technical problem that only a limited variety of number features is extracted and features are processed at a coarse granularity, resulting in low identification accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flowchart of a method for identifying an abnormal number according to an embodiment of the present application;
FIG. 2 is a flowchart of another method for identifying an abnormal number according to an embodiment of the present application;
FIG. 3 is a block diagram of an apparatus for identifying an abnormal number according to an embodiment of the present application;
FIG. 4 is a block diagram of the hardware configuration of a computer terminal for a method for identifying an abnormal number according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will be made in detail and with reference to the accompanying drawings in the embodiments of the present application, it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
FIG. 1 is a flowchart of a method for identifying an abnormal number according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
Step S102, first feature extraction is performed on call ticket data corresponding to a target number to be identified to obtain a plurality of first features, and a first value corresponding to each of the plurality of first features is determined.
According to some optional embodiments of the present application, the call ticket data is also called operator call ticket data. Call ticket data refers to the data recorded for telephone communication and includes information such as calling time, call duration, calling party number, and called party number. The call ticket data can be used for management and operation of the communication network, and also for analyzing user communication behavior, charging, fraud detection, and the like. Call ticket data is typically collected and stored by a telecommunications carrier or communication service provider.
Note that the call ticket data is call ticket data within a first preset time range, for example, the call ticket data for the week preceding the identification of the target number to be identified.
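The time-range restriction above can be sketched as follows. This is a minimal illustration only: the `start_time` field, the `filter_recent_tickets` helper, and the seven-day default are assumptions made for the example and are not specified by the application.

```python
from datetime import datetime, timedelta

def filter_recent_tickets(tickets, identified_at, window=timedelta(days=7)):
    """Keep tickets whose start_time lies in [identified_at - window, identified_at].

    `tickets` is assumed to be a list of dicts with a `start_time` datetime.
    """
    earliest = identified_at - window
    return [t for t in tickets if earliest <= t["start_time"] <= identified_at]

# Hypothetical sample data: one call inside the window, one outside it.
identified_at = datetime(2023, 11, 6, 12, 0)
tickets = [
    {"caller": "A", "start_time": datetime(2023, 11, 5, 9, 30)},
    {"caller": "A", "start_time": datetime(2023, 10, 1, 9, 30)},
]
recent = filter_recent_tickets(tickets, identified_at)
```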
The call ticket data refers to the original communication record information and may also be called a Call Detail Record (CDR). Taking a landline phone as an example, the ticket mainly records the following information: serial number, subscriber identification, calling number, called number, start time, end time, call duration, call nature, rate, cost, discount, and so on. For a mobile phone, in addition to call records, the ticket also records Short Message Service (SMS), Multimedia Messaging Service (MMS), Wireless Application Protocol (WAP) service, General Packet Radio Service (GPRS) usage, and the like, in a record format similar to that of the call ticket.
Optionally, the plurality of first features may include, but is not limited to, the region of the target number to be identified, the calling frequency, the number of complaints, the answering time, and the number identifier, where the number identifier may include: the handset number, the Subscriber Identity Module (SIM) number, and the International Mobile Equipment Identity (IMEI) code.
Step S104, a first value interval corresponding to each first feature is obtained.
According to other optional embodiments of the present application, the first value interval corresponding to the first feature is determined by the following method:
Step S1041, a history database is determined.
In some alternative embodiments of the present application, determining the history database may be accomplished by:
Step S10411, obtaining historical abnormal numbers from a plurality of different source channels. Optionally, the plurality of different source channels include, for example, enterprises and public institutions, government departments, and the like.
Step S10412, converting the historical abnormal number into a first abnormal number in the target format, where the first abnormal number in the target format at least includes: source channel of historical abnormal number.
Optionally, format conversion is performed on the historical abnormal numbers from the plurality of different source channels to convert them into a target format, where the target format is: source | number | problem description | complaint | basic number information. It can be understood that the data format can be adjusted through configuration.
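The conversion into a pipe-delimited target format can be sketched as follows. The field names, the `AbnormalNumberRecord` class, and the `parse_target_format` helper are hypothetical names chosen for illustration; the application only lists the five fields and states that the format is configurable.

```python
from dataclasses import dataclass

FIELD_SEPARATOR = "|"  # assumed delimiter; the format is described as configurable

@dataclass
class AbnormalNumberRecord:
    source: str               # source channel of the historical abnormal number
    number: str
    problem_description: str
    complaint: str
    basic_info: str

    def to_target_format(self) -> str:
        """Serialize the record as source|number|problem description|complaint|basic info."""
        fields = [self.source, self.number, self.problem_description,
                  self.complaint, self.basic_info]
        return FIELD_SEPARATOR.join(fields)

def parse_target_format(line: str) -> AbnormalNumberRecord:
    """Parse one line in the target format back into a record."""
    parts = line.split(FIELD_SEPARATOR)
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    return AbnormalNumberRecord(*parts)

# Hypothetical record for illustration only.
record = AbnormalNumberRecord("government", "13800000000", "suspected fraud", "3", "prepaid")
line = record.to_target_format()
```

A production format would also need an escaping rule for field values that contain the separator, which this sketch omits.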
Step S10413, classifying the first abnormal numbers according to different source channels of the historical abnormal numbers to obtain a plurality of second abnormal numbers. For example, the first abnormal number is divided into an abnormal number whose source channel is an enterprise and public institution and an abnormal number whose source channel is a government department.
Step S10414, filtering the plurality of second abnormal numbers according to different filtering rules corresponding to different source channels to obtain a third abnormal number.
Optionally, different source channels correspond to different number whitelists and/or different filtering rules, and different recognition scenes can be combined according to different filtering rules corresponding to different source channels, so that a plurality of second abnormal numbers are filtered, and a third abnormal number is obtained.
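Steps S10413 and S10414 can be sketched as grouping by source channel and then applying a per-channel whitelist and rule. The record layout, the channel names, and the example rule (a minimum complaint count) are assumptions for illustration; the application does not define the concrete filtering rules.

```python
from collections import defaultdict

def classify_by_source(records):
    """Group records (dicts with a 'source' key) by their source channel."""
    groups = defaultdict(list)
    for record in records:
        groups[record["source"]].append(record)
    return dict(groups)

def filter_groups(groups, whitelists, rules):
    """Drop whitelisted numbers, then apply each channel's own rule.

    Channels without a configured whitelist or rule keep all their records.
    """
    kept = []
    for source, group in groups.items():
        whitelist = whitelists.get(source, set())
        rule = rules.get(source, lambda record: True)
        for record in group:
            if record["number"] not in whitelist and rule(record):
                kept.append(record)
    return kept

# Hypothetical inputs.
records = [
    {"source": "enterprise", "number": "1001", "complaints": 1},
    {"source": "enterprise", "number": "1002", "complaints": 5},
    {"source": "government", "number": "2001", "complaints": 0},
    {"source": "government", "number": "2002", "complaints": 0},
]
whitelists = {"government": {"2002"}}
rules = {"enterprise": lambda r: r["complaints"] >= 2}

groups = classify_by_source(records)
third_abnormal_numbers = filter_groups(groups, whitelists, rules)
```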
Step S10415, determining a history database according to the third abnormal number.
Optionally, the third abnormal numbers are cleaned so that a history database can be established, where data cleaning refers to processing and converting the original data to remove incomplete, inaccurate, inconsistent, duplicate, or irrelevant content, so that the data is more standardized and usable. Data cleaning includes the following aspects: 1. missing value processing: detecting and handling missing values in the data, for example by deleting the records that contain missing values, filling the missing values with a mean or median, or filling them by interpolation; 2. outlier processing: detecting and handling abnormal values in the data, for example by deleting them, replacing them with a mean or median, or replacing them by interpolation; 3. duplicate value processing: detecting and handling duplicate values in the data, for example by deleting or merging them; 4. format conversion: converting data into a uniform format, for example converting dates and times into a specific format, converting text into numbers, and the like; 5. data type conversion: converting data into the correct data type, for example converting a character string into a numeric type, a date type, and the like; 6. data normalization: normalizing the data, for example scaling the data to a specific range, converting the data to percentages, and the like; 7. data consistency check: detecting inconsistencies in the data, for example finding different expressions of the same entity and unifying them into one expression; 8. data integration: integrating and merging the data from multiple data sources, eliminating duplicate data, and ensuring the uniqueness of the data; 9. data screening: screening the data as required, retaining the required data, and deleting the unnecessary data.
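Three of the cleaning aspects listed above (duplicate removal, data type conversion, and median filling of missing values) can be sketched with the Python standard library. The record layout and the `clean_records` helper are hypothetical, and the choice of the median as the fill value is only one of the options the text mentions.

```python
from statistics import median

def clean_records(records, numeric_field="complaints"):
    """Deduplicate by number, cast the numeric field to int, fill gaps with the median."""
    # Duplicate value processing: keep the first record seen for each number.
    seen = set()
    unique = []
    for record in records:
        if record["number"] in seen:
            continue
        seen.add(record["number"])
        unique.append(dict(record))  # copy so the input is not mutated

    # Data type conversion: strings such as "4" become integers; blanks become None.
    for record in unique:
        value = record.get(numeric_field)
        record[numeric_field] = int(value) if value not in (None, "") else None

    # Missing value processing: fill with the median of the known values.
    known = [r[numeric_field] for r in unique if r[numeric_field] is not None]
    fill = median(known) if known else 0
    for record in unique:
        if record[numeric_field] is None:
            record[numeric_field] = fill
    return unique

# Hypothetical raw records: one duplicate and one missing value.
raw = [
    {"number": "1001", "complaints": "4"},
    {"number": "1001", "complaints": "4"},
    {"number": "1002", "complaints": None},
    {"number": "1003", "complaints": "8"},
]
cleaned = clean_records(raw)
```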
Step S1042, obtaining the historical ticket data corresponding to the historical abnormal number in the historical database.
The historical call ticket data is call ticket data within a second preset time range, for example, the call ticket data from three months to one week before the identification of the target number to be identified. As above, the call ticket data is operator call ticket data, that is, the data recorded for telephone communication, including information such as calling time, call duration, calling party number, and called party number.
Step S1043, extracting features of the historical ticket data to obtain a plurality of third features.
Feature extraction on historical call ticket data may yield a plurality of features. Some common features include: 1. call duration: calculating the call duration of each user, obtained by accumulating the duration of each call record; 2. call frequency: counting the number of calls of each user, obtained by counting the call records; 3. call period distribution: converting the timestamp of each call record into a specific time period (such as morning, afternoon, or evening) and counting the number of calls or the call duration in each period; 4. call location distribution: counting the distribution of call locations of each user according to the location information in the call records, where a geographic information system can be used for clustering geographic positions or for heat map analysis; 5. call type distribution: counting the number or proportion of each user's call records of different types, such as voice calls, short messages, and multimedia messages; 6. call object distribution: counting the number or proportion of each user's call objects according to the opposite party's number or contact information in the call records, where call objects can be classified by type, such as family, friends, and colleagues; 7. call duration trend: sorting the call records in chronological order, calculating the call duration in each time period, and analyzing the trend of the call duration, such as whether it is increasing or decreasing; 8. call interval time: calculating the time interval between each user's call records and computing the average, longest, and shortest interval, and the like; 9. call quality: computing the average or distribution of each user's call quality from the call quality information in the call records (such as signal strength and number of call interruptions).
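A few of the features listed above (call frequency, total call duration, call period distribution, and mean call interval) can be computed as in the following sketch. The input layout, the period boundaries at 12:00 and 18:00, and the `extract_features` helper are assumptions for illustration.

```python
from datetime import datetime

def extract_features(calls):
    """Compute simple per-number features from (start, end) datetime pairs."""
    calls = sorted(calls)  # chronological order by start time
    durations = [(end - start).total_seconds() for start, end in calls]
    # Gap between the end of one call and the start of the next.
    gaps = [(calls[i + 1][0] - calls[i][1]).total_seconds()
            for i in range(len(calls) - 1)]

    periods = {"morning": 0, "afternoon": 0, "evening": 0}
    for start, _ in calls:
        if start.hour < 12:
            periods["morning"] += 1
        elif start.hour < 18:
            periods["afternoon"] += 1
        else:
            periods["evening"] += 1

    return {
        "call_count": len(calls),
        "total_duration_s": sum(durations),
        "mean_interval_s": sum(gaps) / len(gaps) if gaps else None,
        "period_counts": periods,
    }

# Hypothetical call records for one number on a single day.
day = (2023, 11, 1)
calls = [
    (datetime(*day, 14, 0), datetime(*day, 14, 10)),
    (datetime(*day, 9, 0), datetime(*day, 9, 5)),
    (datetime(*day, 20, 0), datetime(*day, 20, 1)),
]
features = extract_features(calls)
```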
Step S1044, determining a value interval corresponding to each of the plurality of third features.
For example, the value interval corresponding to the number of complaints is [10 (times/month) - 20 (times/month)], the value interval corresponding to the calling frequency is [100 (times/month) - 120 (times/month)], the value interval corresponding to the call duration is [800 (minutes/month) - 1000 (minutes/month)], and the like.
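The application does not state how each value interval is derived from the historical data. One simple possibility, shown here purely as an assumption, is to take the minimum and maximum of each feature over the historical abnormal numbers; the `value_intervals` helper and the feature names are hypothetical.

```python
def value_intervals(history):
    """Return {feature: (low, high)} using the min and max seen in `history`.

    `history` is a list of dicts, one per historical abnormal number,
    mapping feature names to numeric values.
    """
    intervals = {}
    for row in history:
        for name, value in row.items():
            low, high = intervals.get(name, (value, value))
            intervals[name] = (min(low, value), max(high, value))
    return intervals

# Hypothetical monthly statistics for three historical abnormal numbers.
history = [
    {"complaints_per_month": 10, "calls_per_month": 120},
    {"complaints_per_month": 20, "calls_per_month": 100},
    {"complaints_per_month": 15, "calls_per_month": 110},
]
third_intervals = value_intervals(history)
```

Percentile bounds would be a more outlier-tolerant alternative to the raw minimum and maximum.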
Step S1045, the first value interval corresponding to each first feature is determined among the value intervals corresponding to the plurality of third features.
Illustratively, the first features comprise the number of complaints and the calling frequency, and the value intervals corresponding to the plurality of third features are as follows: the value interval corresponding to the number of complaints is [10 (times/month) - 20 (times/month)], the value interval corresponding to the calling frequency is [100 (times/month) - 120 (times/month)], and the value interval corresponding to the call duration is [800 (minutes/month) - 1000 (minutes/month)]. Therefore, when the first value interval corresponding to each first feature is determined among the value intervals corresponding to the plurality of third features, the value interval of the number of complaints is determined as [10 (times/month) - 20 (times/month)], and the value interval of the calling frequency is determined as [100 (times/month) - 120 (times/month)].
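The selection in step S1045 amounts to a lookup by feature name, as in this sketch. The interval values are the ones used in the example above; the dictionary keys and the `select_intervals` helper are hypothetical.

```python
# Value intervals of the third features, taken from the illustrative example.
third_intervals = {
    "complaints_per_month": (10, 20),
    "calls_per_month": (100, 120),
    "call_minutes_per_month": (800, 1000),
}
first_feature_names = ["complaints_per_month", "calls_per_month"]

def select_intervals(all_intervals, wanted):
    """Pick the interval of each wanted feature; fail loudly if one is missing."""
    missing = [name for name in wanted if name not in all_intervals]
    if missing:
        raise KeyError(f"no historical interval for: {missing}")
    return {name: all_intervals[name] for name in wanted}

first_intervals = select_intervals(third_intervals, first_feature_names)
```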
Step S106, when the number of first target features exceeds a first preset threshold, second feature extraction is performed on the call ticket data to obtain a plurality of second features, and a second value corresponding to each of the plurality of second features is determined, wherein the first target features are those first features, among the plurality of first features, whose first values are not in the first value interval.
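The first-stage decision can be sketched as counting the first features whose values fall outside their intervals and comparing the count with the threshold. Treating the intervals as closed, the returned labels, and the helper names are assumptions; the branch that marks the number as abnormal when the threshold is not exceeded follows the optional embodiment described in the disclosure.

```python
def count_outside(values, intervals):
    """Count features whose value is NOT inside its (low, high) interval."""
    return sum(
        1 for name, value in values.items()
        if not (intervals[name][0] <= value <= intervals[name][1])
    )

def first_stage(values, intervals, first_threshold):
    """Return 'needs_second_extraction' or 'abnormal' for the target number."""
    if count_outside(values, intervals) > first_threshold:
        return "needs_second_extraction"
    # Most values match the intervals learned from historical abnormal numbers.
    return "abnormal"

# Hypothetical intervals and feature values.
first_intervals = {"complaints_per_month": (10, 20), "calls_per_month": (100, 120)}
matching = {"complaints_per_month": 15, "calls_per_month": 110}
deviating = {"complaints_per_month": 1, "calls_per_month": 30}

result_matching = first_stage(matching, first_intervals, first_threshold=1)
result_deviating = first_stage(deviating, first_intervals, first_threshold=1)
```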
It can be understood that, where the plurality of first features includes ten features, if the values of five of the ten features are not in their corresponding value intervals, the number to be identified may be either an abnormal number or a normal number; to identify the number more accurately, second feature extraction is performed on the call ticket data. In some alternative embodiments of the present application, the second feature extraction on the call ticket data may be achieved as follows:
A plurality of fourth features corresponding to the call ticket data are determined.
Optionally, the fourth plurality of features may be consistent with the first plurality of features, e.g., the fourth plurality of features and the first plurality of features each include, but are not limited to: number identification, the region where the number is located, the calling frequency of the number, the complaint times of the number, the answering time of the number, the starting and ending time of the call and the time of the call.
The feature mean of each of the plurality of fourth features is set to 0 and the feature variance is set to 1.
Specifically, this standardization comprises the following steps: 1. calculating the mean and the standard deviation (std) of each feature; 2. for each feature, subtracting the mean from the feature value and dividing the result by the standard deviation; 3. as a result, the mean of each feature becomes 0 and the variance becomes 1.
This process can be implemented using, for example, the NumPy library for Python, with the following code:
import numpy as np

# Suppose data is a matrix of n rows and m columns, where n is the number of samples and m is the number of features
data = ...
# Calculate the mean and standard deviation of each feature
mean = np.mean(data, axis=0)
std = np.std(data, axis=0)
# Normalize each feature
normalized_data = (data - mean) / std
In the above code, 'data' is a matrix of n rows and m columns, where each row represents a sample and each column represents a feature. 'mean' and 'std' are each an m-dimensional vector holding the mean and standard deviation of each feature. 'normalized_data' is the normalized matrix, in which each feature has a mean of 0 and a variance of 1.
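One practical caveat: a feature that takes the same value for every sample has a standard deviation of 0, and dividing by it yields NaN values. The following variant is a sketch of one way to guard against that; the `standardize` helper and the choice to map constant features to 0 are assumptions, not part of the application.

```python
import numpy as np

def standardize(data):
    """Standardize each column to mean 0 and variance 1.

    Constant columns (standard deviation 0) are mapped to all zeros
    instead of producing NaN.
    """
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    safe_std = np.where(std == 0, 1.0, std)  # avoid division by zero
    return (data - mean) / safe_std

# Hypothetical sample: the second feature is constant.
sample = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
normalized = standardize(sample)
```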
Next, a plurality of principal component features and a maximum principal component feature are determined from the fourth features according to the identification result of the historical abnormal numbers.
Further, second feature extraction is performed on the call ticket data according to the plurality of principal component features and the maximum principal component feature to obtain a plurality of second features.
Performing second feature extraction on the call ticket data according to the plurality of principal component features and the maximum principal component feature to obtain a plurality of second features specifically comprises the following steps:
determining the features, among the plurality of fourth features, other than the principal component features and the maximum principal component feature as a plurality of fifth features; determining a projection value of each of the plurality of fifth features onto the principal component features; determining a feature mean and a feature variance of each of the plurality of principal component features; determining a feature mean ratio and a feature variance ratio between each of the plurality of principal component features and the maximum principal component feature; and performing dimension reduction on the call ticket data according to the feature mean ratio and the feature variance ratio to obtain the plurality of second features.
The above method of determining the plurality of second features may be expressed by a formula (not reproduced in this text), wherein μ_i represents the mean of the i-th principal component; the symbol for the variance of the i-th principal component is likewise not reproduced in this text; σ_i represents the variance ratio of the i-th principal component to the maximum principal component and measures the weight of the i-th principal component in the data; and β_i represents the mean ratio of the i-th principal component to the maximum principal component and measures the direction of the i-th principal component in the data.
In some alternative embodiments, when the number of first target features exceeds the first preset threshold, second feature extraction is performed on the call ticket data by principal component analysis. Principal component analysis (PCA) is a commonly used dimensionality reduction technique that maps the call ticket data into a new coordinate system by a linear transformation so that the mapped data has the greatest variance; the aim is to find a low-dimensional representation that retains the most important information in the original data. In the new coordinate system, the greatest variance of any projection of the data lies along the first coordinate (the first principal component), the second greatest variance along the second coordinate (the second principal component), and so on. PCA is often used to reduce the dimensionality of a data set while keeping the features that contribute most to its variance, which is done by retaining the lower-order principal components and discarding the higher-order ones. The lower-order components tend to preserve the most important aspects of the data, although whether they do depends on the particular application.
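The general PCA procedure described above can be sketched with NumPy as follows. This is a generic illustration of principal component analysis via singular value decomposition, not the specific extraction defined by the application; the function name `pca`, the synthetic data and the choice of k = 2 are assumptions.

```python
import numpy as np


def pca(data: np.ndarray, k: int):
    """Project `data` (n samples x m features) onto its first k principal components."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    explained_variance = (singular_values ** 2) / (data.shape[0] - 1)
    projected = centered @ components.T
    return projected, components, explained_variance[:k]


# Synthetic stand-in for a feature matrix (assumed data, not real call tickets).
rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
projected, components, variances = pca(samples, k=2)
```

The first column of `projected` carries the largest variance and the second column the next largest, which matches the ordering of principal components described in the text.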
Step S108, a second value interval corresponding to each second feature is obtained, and the target number to be identified is determined to be an abnormal number under the condition that the number of the second target features exceeds a second preset threshold, wherein the second target features are second features, of the plurality of second features, of which the second values are in the second value interval.
It can be understood that, assuming the plurality of second features includes ten features, if six of the ten features have values within their corresponding value intervals and six exceeds the second preset threshold, the number to be identified is determined to be an abnormal number.
According to the above steps, first feature extraction is performed on the call ticket data corresponding to the target number to be identified to obtain a plurality of first features, and a first value corresponding to each of the plurality of first features is determined; a first value interval corresponding to each first feature is acquired; when the number of first target features exceeds a first preset threshold, second feature extraction is performed on the call ticket data to obtain a plurality of second features, and a second value corresponding to each of the plurality of second features is determined, wherein the first target features are the first features, among the plurality of first features, whose first values are not in the first value interval; and a second value interval corresponding to each second feature is acquired, and the target number to be identified is determined to be an abnormal number when the number of second target features exceeds a second preset threshold, wherein the second target features are the second features, among the plurality of second features, whose second values are in the second value interval. This refines the granularity of feature processing and thereby improves the accuracy with which abnormal numbers are identified.
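The two-stage decision summarized above can be sketched as follows. This is a minimal illustration under stated assumptions: all features are treated as numeric, intervals are closed, the feature names and thresholds are invented for the example, and the branch in which the first threshold is not exceeded is left undecided (the function returns `None`).

```python
from typing import Callable, Dict, Optional, Tuple

Interval = Tuple[float, float]


def in_interval(value: float, interval: Interval) -> bool:
    low, high = interval
    return low <= value <= high


def identify(first_values: Dict[str, float],
             first_intervals: Dict[str, Interval],
             first_threshold: int,
             extract_second_values: Callable[[], Dict[str, float]],
             second_intervals: Dict[str, Interval],
             second_threshold: int) -> Optional[bool]:
    # First target features: first values that fall OUTSIDE their interval.
    outside = sum(not in_interval(v, first_intervals[name])
                  for name, v in first_values.items())
    if outside <= first_threshold:
        return None  # second extraction is not triggered in this sketch
    # Second target features: second values that fall INSIDE their interval.
    second_values = extract_second_values()
    inside = sum(in_interval(v, second_intervals[name])
                 for name, v in second_values.items())
    return inside > second_threshold


first_values = {"call_frequency": 300.0, "complaints": 12.0, "answer_time": 2.0}
first_intervals = {"call_frequency": (0.0, 50.0), "complaints": (0.0, 3.0),
                   "answer_time": (0.0, 10.0)}
second_intervals = {"pc1": (3.0, 10.0), "pc2": (0.0, 1.0)}
result = identify(first_values, first_intervals, 1,
                  lambda: {"pc1": 4.0, "pc2": -1.0}, second_intervals, 0)
```

Passing the second extraction as a callable keeps the more expensive step from running unless the first stage requires it.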
Fig. 2 is a flowchart of another method for identifying an abnormal number according to an embodiment of the present application. As shown in fig. 2, the method includes the following steps:
Step S201, receiving a target number to be identified, and extracting call ticket data of the number.
Step S202, performing feature extraction on the call ticket data.
Step S203, matching the extracted feature data against the feature data in a static model library, and, if no corresponding recognition classification is matched, identifying the feature data through a dynamic model library. The dynamic model library compares the features of the target number to be identified with the feature data of the dynamic model; the target number is an abnormal number if the relevant threshold is reached, and is otherwise not an abnormal number.
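The flow of steps S201 to S203 can be sketched as a lookup in the static library followed by a fallback to the dynamic library. The data layout below (each recognition classification as a dictionary of per-feature thresholds), the parameter `min_hits` and all names and numbers are assumptions chosen for illustration, not the application's actual data model.

```python
from typing import Dict, Optional

Thresholds = Dict[str, float]


def match_static(features: Dict[str, float],
                 static_library: Dict[str, Thresholds]) -> Optional[str]:
    """Return the first classification whose thresholds are all reached, if any."""
    for label, thresholds in static_library.items():
        if all(features.get(name, float("-inf")) >= limit
               for name, limit in thresholds.items()):
            return label
    return None


def match_dynamic(features: Dict[str, float],
                  dynamic_thresholds: Thresholds, min_hits: int) -> bool:
    """Count configured features that reach their threshold (assumed rule)."""
    hits = sum(1 for name, limit in dynamic_thresholds.items()
               if features.get(name, float("-inf")) >= limit)
    return hits >= min_hits


def identify(features: Dict[str, float], static_library: Dict[str, Thresholds],
             dynamic_thresholds: Thresholds, min_hits: int) -> bool:
    if match_static(features, static_library) is not None:
        return True  # matched a classification built from historical abnormal numbers
    return match_dynamic(features, dynamic_thresholds, min_hits)


static_library = {"harassment": {"call_frequency": 200, "complaints": 5}}
dynamic_thresholds = {"call_frequency": 100, "complaints": 3, "short_calls": 50}
```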
According to some alternative embodiments of the present application, the static model library is built by the following method:
Step S301, extracting historical abnormal number data from an original database, and obtaining the call ticket data of the historical abnormal number data. The original database is determined by the following method:
Step S401, obtaining historical abnormal numbers from a plurality of different source channels.
Step S402, converting the data format: the historical abnormal numbers from the plurality of different source channels are converted into a unified format, as follows: source | number | problem description | complaint | basic number information.
Step S403, classifying the first abnormal numbers according to the different sources and the attributes of the historical abnormal numbers to obtain a plurality of second abnormal numbers.
Step S404, filtering the plurality of second abnormal numbers according to the different filtering rules corresponding to the different source channels to obtain third abnormal numbers.
Step S405, extracting configuration rules, and performing data cleaning and verification on the third abnormal numbers to finally obtain the original database.
Step S302, extracting features such as region, calling frequency, number of complaints, answering time and calling time from the historical abnormal number data, wherein the feature types to be extracted can be dynamically configured.
Step S303, extracting thresholds for feature data of the same type: relatively quantifiable data, such as calling frequency and number of complaints, are extracted and used as the matching thresholds for that type of feature.
Step S304, grouping features with the same feature threshold into one static recognition type, which serves as one recognition branch; repeating this forms a plurality of recognition classifications and completes the construction of the static model library.
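Steps S401 to S405 and S302 to S304 can be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the dictionary field names, the cleaning rule (digits only, no duplicate numbers), classification by source only, and the use of fixed bucket widths (`steps`) to derive a shared threshold per feature.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def build_original_database(raw_by_source: Dict[str, List[dict]],
                            filters: Dict[str, Callable[[dict], bool]]) -> List[dict]:
    """S401-S405: unify the format, group by source, filter, then clean."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for source, rows in raw_by_source.items():
        for raw in rows:
            grouped[source].append({          # S402: unified format
                "source": source,
                "number": str(raw.get("number", "")).strip(),
                "description": raw.get("description", ""),
                "complaint": raw.get("complaint", ""),
                "base_info": raw.get("base_info", ""),
            })
    database, seen = [], set()
    for source, records in grouped.items():   # S403: one group per source
        keep = filters.get(source, lambda record: True)
        for record in records:
            number = record["number"]
            # S404: per-source filter; S405: cleaning (digits only, no duplicates)
            if keep(record) and number.isdigit() and number not in seen:
                seen.add(number)
                database.append(record)
    return database


def build_static_library(history: List[dict], steps: Dict[str, int]) -> Dict[str, dict]:
    """S302-S304: records sharing the same bucketed thresholds form one classification."""
    groups: Dict[tuple, List[str]] = defaultdict(list)
    for row in history:
        key = tuple((name, (row[name] // step) * step)
                    for name, step in sorted(steps.items()))
        groups[key].append(row["number"])
    return {f"class_{i}": {"thresholds": dict(key), "numbers": numbers}
            for i, (key, numbers) in enumerate(groups.items())}


raw = {"hotline": [{"number": " 13800000001 ", "complaint": "spam"},
                   {"number": "13800000001"}, {"number": "abc"}],
       "app": [{"number": "13800000002", "complaint": ""}]}
database = build_original_database(raw, {"app": lambda r: r["complaint"] != ""})
history = [{"number": "A", "call_frequency": 120, "complaints": 7},
           {"number": "B", "call_frequency": 150, "complaints": 9},
           {"number": "C", "call_frequency": 30, "complaints": 1}]
library = build_static_library(history, {"call_frequency": 100, "complaints": 5})
```

In this example, numbers A and B fall into the same buckets and therefore share one recognition classification, while C forms its own.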
According to other alternative embodiments of the present application, the dynamic model library is built by:
Step S501, configuring abnormal features and related thresholds as the condition data for identifying an abnormal number, wherein the abnormal features and the related thresholds can be dynamically adjusted.
Step S502, obtaining call ticket data of the target number to be identified, and extracting features such as region, calling frequency and complaint status; optionally, the features are extracted in combination with principal component analysis.
Step S503, comparing the extracted features with the corresponding configured features and thresholds; when the value of a feature reaches or exceeds its threshold, that type of feature can be added to the static model library as a new abnormal feature classification.
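The feedback from the dynamic model into the static model library in step S503 could look like the sketch below. The function name `apply_dynamic_model`, the label scheme for new classifications and the sample thresholds are hypothetical.

```python
from typing import Dict


def apply_dynamic_model(features: Dict[str, float],
                        dynamic_config: Dict[str, float],
                        static_library: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Return the configured thresholds that were reached and record them
    as a new classification in the static library (assumed update rule)."""
    triggered = {name: limit for name, limit in dynamic_config.items()
                 if features.get(name, 0) >= limit}
    if triggered:
        label = "dynamic_" + "_".join(sorted(triggered))  # hypothetical naming
        static_library.setdefault(label, triggered)
    return triggered


static_library: Dict[str, Dict[str, float]] = {}
dynamic_config = {"call_frequency": 100, "complaints": 3}  # adjustable at run time
triggered = apply_dynamic_model({"call_frequency": 150, "complaints": 1},
                                dynamic_config, static_library)
```

Because `dynamic_config` is an ordinary dictionary, thresholds can be adjusted without changing the comparison logic, which reflects the dynamic configurability described in step S501.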
Through the above steps, the number to be identified can be identified efficiently and accurately by combining the static model with the dynamic model. The static model extracts data from all existing sources and, together with the call ticket data, extracts communication features and groups identical features into classifications, forming a plurality of recognition classifications. The dynamic model is an extension of the static model: because the forms of abnormal numbers keep changing, the model library data must be continuously improved, and the dynamic model supplements it so that identification remains efficient and accurate.
In some optional embodiments of the present application, the method for identifying an abnormal number further includes an identification checking procedure, which includes the following steps:
Step S601, acquiring all identification record data produced while identifying the abnormal number. It should be noted that the detailed identification steps and execution data are recorded synchronously during identification, so that problems can be investigated and verified afterwards.
Step S602, checking whether any execution step is missing, whether the execution process was abnormal, and whether the return format of the execution data is abnormal.
Step S603, if the check finds a missing execution step, an abnormal execution process or an abnormal execution data return format, the current identification result is invalid and the number to be identified is identified again; if no execution step is missing, the execution process was not abnormal and the execution data return format is not abnormal, the identification result is output directly.
During identification, problems may arise for various human or data-related reasons, for example identification not being performed according to the procedure or an identification step being abnormal. Through the above steps, the execution steps are checked after identification is completed, and identification is performed again if an identification abnormality or a missing identification step is found, which prevents misidentification and minimizes the error rate.
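The checking procedure of steps S601 to S603 can be sketched as a validation of the recorded steps followed by a retry. The step names, the record keys and the retry limit `max_attempts` are assumptions; the text only states that identification is repeated when the check fails.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical step names for the recorded identification process.
EXPECTED_STEPS = ("receive_number", "extract_ticket_data",
                  "extract_features", "match_models")


def check_records(records: List[Dict],
                  expected_steps: Sequence[str] = EXPECTED_STEPS,
                  required_keys: Sequence[str] = ("step", "status", "data")) -> bool:
    """S602: detect missing steps, abnormal execution, or a malformed record."""
    seen = [record.get("step") for record in records]
    if any(step not in seen for step in expected_steps):
        return False                       # an execution step is missing
    for record in records:
        if not all(key in record for key in required_keys):
            return False                   # return format is abnormal
        if record["status"] != "ok":
            return False                   # execution process was abnormal
    return True


def identify_with_check(run_identification: Callable[[], Tuple[str, List[Dict]]],
                        max_attempts: int = 3) -> str:
    """S603: output the result only if the recorded run passes the check."""
    for _ in range(max_attempts):
        result, records = run_identification()
        if check_records(records):
            return result
    raise RuntimeError("identification did not pass verification")


good = [{"step": step, "status": "ok", "data": {}} for step in EXPECTED_STEPS]
bad = good[:-1]  # last step missing
attempts = iter([("abnormal", bad), ("abnormal", good)])
outcome = identify_with_check(lambda: next(attempts))
```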
Fig. 3 is a structural diagram of an apparatus for identifying an abnormal number according to an embodiment of the present application, as shown in fig. 3, the apparatus includes:
The first determining module 30 is configured to perform first feature extraction on the call ticket data corresponding to the target number to be identified to obtain a plurality of first features, and to determine a first value corresponding to each of the plurality of first features.
According to some optional embodiments of the present application, the call ticket data, also called operator call ticket data, refers to records of telephone communication and includes information such as call time, call duration, calling number and called number. Call ticket data can be used for the management and operation of the communication network, and also for analyzing user communication behavior, charging, fraud detection and the like. Call ticket data is typically collected and stored by a telecommunications carrier or communication service provider.
It should be noted that the call ticket data is call ticket data within a first preset time range, for example, within the week preceding the identification of the target number to be identified.
The call ticket data refers to the original communication record information and may be called a call detail record (CDR). Taking a landline telephone as an example, the call ticket mainly records the following information: serial number, subscriber identification, calling number, called number, start time, end time, call duration, call type, rate, cost, discount and so on. For a mobile phone, in addition to call records, the call ticket also records the Short Message Service (SMS), Multimedia Messaging Service (MMS), Wireless Application Protocol (WAP) service, General Packet Radio Service (GPRS) and the like, in a record format similar to that of the call ticket.
Optionally, the plurality of first features may include, but is not limited to, the region of the target number to be identified, calling frequency, number of complaints, answering time, calling time and number identifier, where the number identifier may include: the handset number, the Subscriber Identity Module (SIM) number and the International Mobile Equipment Identity (IMEI).
The obtaining module 32 is configured to obtain a first value interval corresponding to each first feature.
The second determining module 34 is configured to perform second feature extraction on the call ticket data to obtain a plurality of second features when the number of first target features exceeds a first preset threshold, and to determine a second value corresponding to each of the plurality of second features, where the first target features are the first features, among the plurality of first features, whose first values are not in the first value interval.
The third determining module 36 is configured to obtain a second value interval corresponding to each second feature, and determine, when the number of second target features exceeds a second preset threshold, the target number to be identified as an abnormal number, where the second target feature is a second feature, among the plurality of second features, whose second value is in the second value interval.
Note that each module in fig. 3 may be a program module (for example, a set of program instructions implementing a specific function) or a hardware module. In the latter case, the module may take, but is not limited to, the following forms: each module is implemented as a processor, or the functions of the modules are implemented by a single processor.
It should be noted that, for the preferred implementation of the embodiment shown in fig. 3, reference may be made to the related description of the embodiment shown in fig. 1, which is not repeated here.
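The call ticket record and the preset time window described for the first determining module can be sketched as follows. The class `CallDetailRecord` keeps only a few of the listed fields, and the helper names, the seven-day window and the calls-per-day definition of calling frequency are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class CallDetailRecord:
    calling_number: str
    called_number: str
    start_time: datetime
    end_time: datetime

    @property
    def duration(self) -> timedelta:
        return self.end_time - self.start_time


def records_in_window(records: List[CallDetailRecord], number: str,
                      now: datetime,
                      window: timedelta = timedelta(days=7)) -> List[CallDetailRecord]:
    """Outgoing calls of `number` that started within the window ending at `now`."""
    return [r for r in records
            if r.calling_number == number and now - window <= r.start_time <= now]


def calling_frequency(records: List[CallDetailRecord], number: str, now: datetime,
                      window: timedelta = timedelta(days=7)) -> float:
    """Average outgoing calls per day over the window (assumed definition)."""
    return len(records_in_window(records, number, now, window)) / window.days


now = datetime(2024, 1, 8, 12, 0)
records = [
    CallDetailRecord("A", "B", datetime(2024, 1, 7, 10, 0), datetime(2024, 1, 7, 10, 5)),
    CallDetailRecord("A", "C", datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 9, 1)),
    CallDetailRecord("A", "D", datetime(2023, 12, 25, 9, 0), datetime(2023, 12, 25, 9, 2)),
    CallDetailRecord("X", "A", datetime(2024, 1, 7, 11, 0), datetime(2024, 1, 7, 11, 3)),
]
recent = records_in_window(records, "A", now)
```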
Fig. 4 shows a hardware block diagram of a computer terminal for implementing the method for identifying an abnormal number. As shown in fig. 4, the computer terminal 40 may include one or more processors 402 (shown as 402a, 402b, ..., 402n in the figure; the processor 402 may include, but is not limited to, a processing device such as a microcontroller unit (MCU) or a field-programmable gate array (FPGA)), a memory 404 for storing data, and a transmission module 406 for communication functions. In addition, the computer terminal may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the bus), a network interface, a power supply, and/or a camera. It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 4 is merely illustrative and does not limit the configuration of the electronic device described above. For example, the computer terminal 40 may include more or fewer components than shown in fig. 4, or have a configuration different from that shown in fig. 4.
It should be noted that the one or more processors 402 and/or other data processing circuits described above may be referred to herein generally as "data processing circuits". The data processing circuit may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Furthermore, the data processing circuit may be a single stand-alone processing module, or may be incorporated, in whole or in part, into any of the other elements in the computer terminal 40. As referred to in the embodiments of the present application, the data processing circuit acts as a form of processor control (e.g., selection of a variable resistance termination path connected to the interface).
The memory 404 may be used to store software programs and modules of application software, such as the program instructions/data storage device corresponding to the method for identifying an abnormal number in the embodiments of the present application. The processor 402 runs the software programs and modules stored in the memory 404 to execute various functional applications and data processing, that is, to implement the method for identifying an abnormal number described above. The memory 404 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 404 may further include memory located remotely from the processor 402, which may be connected to the computer terminal 40 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission module 406 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by the communication provider of the computer terminal 40. In one example, the transmission module 406 includes a network interface controller (NIC) that can connect to other network devices through a base station so as to communicate with the internet. In one example, the transmission module 406 may be a radio frequency (RF) module used to communicate with the internet wirelessly.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 40.
It should be noted here that, in some alternative embodiments, the computer terminal shown in fig. 4 may include hardware elements (including circuits), software elements (including computer code stored on a computer-readable medium), or a combination of both. It should be noted that fig. 4 is only one specific example and is intended to illustrate the types of components that may be present in the computer terminal described above.
It should be noted that, since the computer terminal shown in fig. 4 is used to execute the method for identifying an abnormal number shown in fig. 1, the above explanation of the method also applies to the electronic device and is not repeated here.
An embodiment of the present application also provides a nonvolatile storage medium that includes a stored program, wherein the program, when run, controls the device in which the storage medium is located to execute the method for identifying an abnormal number described above.
The nonvolatile storage medium stores a program that performs the following functions: performing first feature extraction on call ticket data corresponding to a target number to be identified to obtain a plurality of first features, and determining a first value corresponding to each of the plurality of first features; acquiring a first value interval corresponding to each first feature; when the number of first target features exceeds a first preset threshold, performing second feature extraction on the call ticket data to obtain a plurality of second features, and determining a second value corresponding to each of the plurality of second features, wherein the first target features are the first features, among the plurality of first features, whose first values are not in the first value interval; and acquiring a second value interval corresponding to each second feature, and determining the target number to be identified to be an abnormal number when the number of second target features exceeds a second preset threshold, wherein the second target features are the second features, among the plurality of second features, whose second values are in the second value interval.
An embodiment of the present application also provides an electronic device, which includes a memory and a processor, the processor being used to run a program stored in the memory, wherein the program, when run, executes the method for identifying an abnormal number described above.
The processor is configured to run a program that performs the following functions: performing first feature extraction on call ticket data corresponding to a target number to be identified to obtain a plurality of first features, and determining a first value corresponding to each of the plurality of first features; acquiring a first value interval corresponding to each first feature; when the number of first target features exceeds a first preset threshold, performing second feature extraction on the call ticket data to obtain a plurality of second features, and determining a second value corresponding to each of the plurality of second features, wherein the first target features are the first features, among the plurality of first features, whose first values are not in the first value interval; and acquiring a second value interval corresponding to each second feature, and determining the target number to be identified to be an abnormal number when the number of second target features exceeds a second preset threshold, wherein the second target features are the second features, among the plurality of second features, whose second values are in the second value interval.
The numbering of the foregoing embodiments of the present application is for description only and does not indicate that any embodiment is better or worse than another.
In the foregoing embodiments of the present application, each embodiment is described with its own emphasis; for a part that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other manners. The apparatus embodiments described above are merely exemplary. For example, the division into units may be a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or take other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the related art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing program code.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present application, and such modifications and adaptations are also intended to fall within the scope of protection of the present application.

Claims (10)

1. A method for identifying an abnormal number, comprising:
performing first feature extraction on call ticket data corresponding to a target number to be identified to obtain a plurality of first features, and respectively determining a first value corresponding to each first feature in the plurality of first features;
acquiring a first value interval corresponding to each first feature;
under the condition that the number of first target features exceeds a first preset threshold, performing second feature extraction on the call ticket data to obtain a plurality of second features, and respectively determining a second value corresponding to each of the plurality of second features, wherein the first target features are first features, among the plurality of first features, whose first values are not in the first value interval;
and acquiring a second value interval corresponding to each second feature, and determining the target number to be identified as an abnormal number under the condition that the number of second target features exceeds a second preset threshold, wherein the second target features are second features, among the plurality of second features, whose second values are in the second value interval.
2. The method of claim 1, wherein before obtaining the first value interval corresponding to each first feature, the method further comprises:
determining a history database;
acquiring historical ticket data corresponding to the historical abnormal number in the historical database;
extracting features of the historical ticket data to obtain a plurality of third features;
respectively determining a value interval corresponding to each third feature in the plurality of third features;
and determining the first value interval corresponding to each first feature from the value intervals corresponding to the plurality of third features.
3. The method of claim 2, wherein determining a history database comprises:
acquiring historical abnormal numbers of a plurality of different source channels;
converting the historical abnormal number into a first abnormal number in a target format, wherein the first abnormal number in the target format at least comprises: the source channel of the history abnormal number;
classifying the first abnormal numbers according to different source channels of the historical abnormal numbers to obtain a plurality of second abnormal numbers;
filtering the plurality of second abnormal numbers according to different filtering rules corresponding to the different source channels to obtain a third abnormal number;
and determining the historical database according to the third abnormal number.
4. The method according to claim 1, wherein, in the case where the number of first target features exceeds a first preset threshold, performing a second feature extraction on the call ticket data to obtain a plurality of second features, including:
determining a plurality of fourth features corresponding to the call ticket data;
setting a feature mean value of each of the plurality of fourth features to 0 and a feature variance to 1;
determining a plurality of principal component features and a maximum principal component feature from the fourth features according to the identification result of the historical abnormal number;
and carrying out second feature extraction on the call ticket data according to the plurality of principal component features and the maximum principal component feature to obtain a plurality of second features.
5. The method of claim 4, wherein performing a second feature extraction on the call ticket data based on the plurality of principal component features and the maximum principal component feature to obtain the plurality of second features comprises:
determining a feature other than the principal component feature and the maximum principal component feature of the plurality of fourth features as a plurality of fifth features;
determining a projection value of each fifth feature of the plurality of fifth features onto the principal component feature;
determining a feature mean and a feature variance between each principal component feature of the plurality of principal component features;
determining a feature mean ratio and a feature variance ratio between each principal component feature of the plurality of principal component features and the maximum principal component feature;
and performing dimension reduction processing on the call ticket data according to the characteristic mean value ratio and the characteristic variance ratio to obtain the plurality of second characteristics.
6. The method according to claim 1, wherein the method further comprises: and under the condition that the number of the first target features does not exceed the first preset threshold, determining the target number to be identified as the abnormal number.
7. The method of claim 1, wherein the plurality of first features comprises at least two of: number identification, the region where the number is located, the calling frequency of the number, the number of complaints against the number, the answering time of the number, the start and end times of the call, and the time of the call.
8. An apparatus for identifying an abnormal number, comprising:
the first determining module is used for performing first feature extraction on the call ticket data corresponding to the target number to be identified to obtain a plurality of first features, and respectively determining a first value corresponding to each of the plurality of first features;
the acquisition module is used for acquiring a first value interval corresponding to each first characteristic;
the second determining module is used for performing second feature extraction on the call ticket data to obtain a plurality of second features under the condition that the number of first target features exceeds a first preset threshold, and respectively determining a second value corresponding to each of the plurality of second features, wherein the first target features are first features, among the plurality of first features, whose first values are not in the first value interval;
the third determining module is used for acquiring a second value interval corresponding to each second feature, and determining the target number to be identified as an abnormal number under the condition that the number of second target features exceeds a second preset threshold, wherein the second target features are second features, among the plurality of second features, whose second values are in the second value interval.
9. A nonvolatile storage medium, characterized in that the nonvolatile storage medium includes a stored program, wherein the program, when run, controls a device in which the nonvolatile storage medium is located to execute the method of identifying an abnormal number according to any one of claims 1 to 7.
10. An electronic device, comprising: a memory and a processor for executing a program stored in the memory, wherein the program is executed to perform the method of identifying an abnormal number according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311467581.5A CN117478787A (en) 2023-11-06 2023-11-06 Abnormal number identification method and device and nonvolatile storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311467581.5A CN117478787A (en) 2023-11-06 2023-11-06 Abnormal number identification method and device and nonvolatile storage medium

Publications (1)

Publication Number Publication Date
CN117478787A true CN117478787A (en) 2024-01-30

Family

ID=89630827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311467581.5A Pending CN117478787A (en) 2023-11-06 2023-11-06 Abnormal number identification method and device and nonvolatile storage medium

Country Status (1)

Country Link
CN (1) CN117478787A (en)

Similar Documents

Publication Publication Date Title
CN110337059B (en) Analysis algorithm, server and network system for family relationship of user
CN102083010B (en) Method and equipment for screening user information
EP3591894B1 (en) Tariff data determination method and device
CN112434039A (en) Data storage method, device, storage medium and electronic device
CN114757639A (en) Data processing method, device, equipment and storage medium
CN108197002B (en) Mobile device non-buried point data statistical method, system, terminal and medium
CN112954626A (en) Mobile phone signaling data analysis method and device, electronic equipment and storage medium
US11050878B1 (en) Methods, systems and computer program products for detecting anomalies in a telecommunications network
CN114169438A (en) Telecommunication network fraud identification method, device, equipment and storage medium
CN109168138A (en) The recognition methods for the number of changing, device and equipment in net
CN101146148B (en) A monitoring system and method for non-standardized inter-network call
CN109189803A (en) Question and answer are to construction method, device and computer readable storage medium
CN114064445A (en) Test method, device, equipment and computer readable storage medium
CN110135190B (en) Data management method, server and computer storage medium
CN117478787A (en) Abnormal number identification method and device and nonvolatile storage medium
US8824459B2 (en) Methods and apparatus to measure market share for voice over internet protocol carriers
CN112752256B (en) Client portrait label determination method, device, equipment and storage medium
CN110460452B (en) Message pushing method and related product
CN110148011B (en) Method, device, equipment and medium for analyzing active amount drop based on big data
CN112307075A (en) User relationship identification method and device
CN110930195A (en) Data processing method and electronic equipment
CN109121137B (en) Method and device for identifying user number use type of double-card terminal
CN107770734B (en) Method and device for identifying mobile subscriber permanent station
CN113452533A (en) Charging self-inspection and self-healing method and device, computer equipment and storage medium
CN114630314B (en) Updating method, device, equipment and storage medium of terminal information base

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination