CN116016771A - Harassment number identification method and device, electronic equipment and storage medium - Google Patents


Publication number
CN116016771A
Authority
CN
China
Prior art keywords
called
area
resident
candidate
numbers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211606125.XA
Other languages
Chinese (zh)
Inventor
伍军
韩晔
郭亚栋
张晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd
Priority to CN202211606125.XA
Publication of CN116016771A

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The application provides a harassment number identification method and device, an electronic device and a storage medium. The method comprises: determining, according to the call parameters of each number, candidate numbers that satisfy a first condition; for each candidate number, determining the resident area of each called number corresponding to the candidate number, and calculating the number of called times of each resident area; ranking the resident areas by number of called times in descending order, and calculating the ratio of the sum of the called times of the top preset number of resident areas to the sum of the called times of all resident areas; if the ratio is greater than a preset second threshold, determining that the candidate number is a non-harassment number; otherwise, determining that the candidate number is a harassment number. The scheme improves the accuracy of harassment number identification.

Description

Harassment number identification method and device, electronic equipment and storage medium
Technical Field
The present invention relates to communications, and in particular, to a method and apparatus for identifying harassment numbers, an electronic device, and a storage medium.
Background
With economic development, telephone marketing and telecommunication fraud keep emerging, and harassment calls seriously affect people's daily lives. Accurately marking harassment numbers can effectively reduce the impact of harassment calls on users.
At present, numbers with high calling frequency, low called frequency, long call time and low called-number repetition rate are marked as harassment numbers. However, this scheme may mislabel numbers: for example, numbers used for community services, express delivery and takeaway may be wrongly marked as harassment numbers. The accuracy of the current harassment number identification scheme is therefore low.
Disclosure of Invention
The application provides a harassment number identification method, device, electronic equipment and storage medium, which are used for solving the problem of low accuracy of harassment number identification.
In a first aspect, the present application provides a harassment number identification method, including: determining, according to the call parameters of each number, candidate numbers that satisfy a first condition, where the call parameters characterize the call frequency of a number and the first condition is that the call frequency exceeds a preset first threshold; for each candidate number, determining the resident area of each called number corresponding to the candidate number, and calculating the number of called times of each resident area, where the number of called times of a resident area is the sum of the called times of all called numbers, among the called numbers corresponding to the candidate number, that belong to that resident area; ranking the resident areas by number of called times in descending order, and calculating the ratio of the sum of the called times of the top preset number of resident areas to the sum of the called times of all resident areas; if the ratio is greater than a preset second threshold, determining that the candidate number is a non-harassment number; otherwise, determining that the candidate number is a harassment number.
In one possible implementation, the call parameters include a calling ratio, a number of calling times and a local called number ratio; determining the candidate numbers that satisfy the first condition according to the call parameters of each number includes: obtaining ticket data of each number in a first period; obtaining, according to the ticket data, the calling ratio, the number of calling times and the local called number ratio of each number, where the local called number ratio of a number is the proportion of local numbers among the called numbers corresponding to that number; and taking each number whose calling ratio exceeds a first sub-threshold, whose number of calling times exceeds a second sub-threshold and whose local called number ratio exceeds a third sub-threshold as a candidate number.
In one possible implementation, obtaining the local called number ratio of each number according to the ticket data includes: for each number, determining the calling area of each call of the number in the first period, where the calling area of a call is the home location of the called number in that call; if the calling area with the largest number of calls in the first period is consistent with the home location of the number, taking the number as a local number; and calculating the proportion of local numbers among the called numbers corresponding to the number, and taking that proportion as the local called number ratio of the number.
In one possible implementation, determining, for each candidate number, the resident area of each called number corresponding to the candidate number includes: determining the resident working area, resident residential area and broadband installation area of each called number corresponding to the candidate number; and, for each called number corresponding to the candidate number, taking the common area of the resident working area, resident residential area and broadband installation area of the called number as the resident area of the called number.
In one possible implementation manner, the determining the resident working area of the called number corresponding to the candidate number includes: determining a region code and a base station identifier of each call of the called number in a second period of time according to each called number corresponding to the candidate number; determining a base station used by the called number in each call in the second period according to the regional code and the base station identifier; and according to the base station used by the called number in each call in the second period, taking the area where the base station with the largest use times in the second period is positioned as a resident working area of the called number.
In one possible implementation manner, the determining the resident residential area of the called number corresponding to the candidate number includes: determining the internet access address of each internet surfing of the called number in a third period of time according to each called number corresponding to the candidate number; and according to the internet access address of the called number on the internet each time in the third period, taking the area where the internet access address with the longest access accumulated duration is located as a resident residential area of the called number.
In one possible implementation manner, the determining the broadband installation area of the called number corresponding to the candidate number includes: acquiring a broadband installation address of each called number corresponding to the candidate number; and taking the area where the broadband installation address is located as a broadband installation area of the called number.
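The two selection rules above (most-used base station, longest cumulative access duration) can be sketched as follows. This is only an illustration under assumed data layouts: calls are given as `(area_code, base_station_id)` pairs, internet sessions as `(access_address, duration_seconds)` pairs, and the lookup tables `station_to_area` and `address_to_area` are hypothetical. None of these names come from the application, and tie-breaking is not specified there.

```python
from collections import Counter, defaultdict

def resident_working_area(call_records, station_to_area):
    """Area of the base station used most often in the second period."""
    usage = Counter(call_records)  # key: (area_code, base_station_id)
    most_used_station, _ = usage.most_common(1)[0]
    return station_to_area[most_used_station]

def resident_residential_area(sessions, address_to_area):
    """Area of the access address with the longest cumulative duration."""
    total = defaultdict(float)
    for address, duration in sessions:
        total[address] += duration
    longest = max(total, key=total.get)
    return address_to_area[longest]

# Hypothetical sample data for one called number.
calls = [("010", "bs1"), ("010", "bs2"), ("010", "bs1")]
stations = {("010", "bs1"): "area A", ("010", "bs2"): "area B"}
sessions = [("addr1", 3600), ("addr2", 5000), ("addr1", 2000)]
addresses = {"addr1": "area A", "addr2": "area C"}

work_area = resident_working_area(calls, stations)
home_area = resident_residential_area(sessions, addresses)
```

Here `addr1` wins for the residential area because its two sessions add up to more time than the single longer session at `addr2`.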
In a second aspect, the present application provides a harassment number identification device, comprising: a screening module, configured to determine, according to the call parameters of each number, candidate numbers that satisfy a first condition, where the call parameters characterize the call frequency of a number and the first condition is that the call frequency exceeds a preset first threshold; a processing module, configured to determine, for each candidate number, the resident area of each called number corresponding to the candidate number; a calculation module, configured to calculate the number of called times of each resident area, where the number of called times of a resident area is the sum of the called times of all called numbers, among the called numbers corresponding to the candidate number, that belong to that resident area; the calculation module is further configured to rank the resident areas by number of called times in descending order and calculate the ratio of the sum of the called times of the top preset number of resident areas to the sum of the called times of all resident areas; and a judging module, configured to determine that the candidate number is a non-harassment number if the ratio is greater than a preset second threshold, and otherwise determine that the candidate number is a harassment number.
In one possible implementation, the call parameters include a calling ratio, a number of calling times and a local called number ratio; the screening module comprises: a first acquisition unit, configured to obtain ticket data of each number in a first period; a second acquisition unit, configured to obtain, according to the ticket data, the calling ratio, the number of calling times and the local called number ratio of each number, where the local called number ratio of a number is the proportion of local numbers among the called numbers corresponding to that number; and a screening unit, configured to take each number whose calling ratio exceeds a first sub-threshold, whose number of calling times exceeds a second sub-threshold and whose local called number ratio exceeds a third sub-threshold as a candidate number.
In one possible implementation, the second acquisition unit is specifically configured to: for each number, determine the calling area of each call of the number in the first period, where the calling area of a call is the home location of the called number in that call; if the calling area with the largest number of calls in the first period is consistent with the home location of the number, take the number as a local number; and calculate the proportion of local numbers among the called numbers corresponding to the number, and take that proportion as the local called number ratio of the number.
In one possible implementation, the processing module is specifically configured to: determine the resident working area, resident residential area and broadband installation area of each called number corresponding to the candidate number; and, for each called number corresponding to the candidate number, take the common area of the resident working area, resident residential area and broadband installation area of the called number as the resident area of the called number.
In one possible embodiment, the processing module includes: a first determination unit; the first determining unit is configured to: determining a region code and a base station identifier of each call of the called number in a second period of time according to each called number corresponding to the candidate number; determining a base station used by the called number in each call in the second period according to the regional code and the base station identifier; and according to the base station used by the called number in each call in the second period, taking the area where the base station with the largest use times in the second period is positioned as a resident working area of the called number.
In one possible embodiment, the processing module includes: a second determination unit; the second determining unit is configured to: determining the internet access address of each internet surfing of the called number in a third period of time according to each called number corresponding to the candidate number; and according to the internet access address of the called number on the internet each time in the third period, taking the area where the internet access address with the longest access accumulated duration is located as a resident residential area of the called number.
In one possible embodiment, the processing module includes: a third determination unit; the third determining unit is configured to: acquiring a broadband installation address of each called number corresponding to the candidate number; and taking the area where the broadband installation address is located as a broadband installation area of the called number.
In a third aspect, the present application provides an electronic device, comprising: a processor, and a memory communicatively coupled to the processor; the memory stores computer-executable instructions; the processor executes computer-executable instructions stored in the memory to implement the method as described above.
In a fourth aspect, the present application provides a computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method as described above.
In the harassment number identification method and device, electronic device and storage medium provided by the application, candidate numbers that satisfy the first condition are determined according to the call parameters of each number; for each candidate number, the resident area of each called number corresponding to the candidate number is determined, and the number of called times of each resident area is calculated; the resident areas are ranked by number of called times in descending order, and the ratio of the sum of the called times of the top preset number of resident areas to the sum of the called times of all resident areas is calculated; if the ratio is greater than a preset second threshold, the candidate number is determined to be a non-harassment number; otherwise, the candidate number is determined to be a harassment number. By judging whether this ratio is greater than the second threshold, the scheme determines whether the called numbers corresponding to a candidate number are concentrated in a preset number of resident areas, and thereby identifies whether the candidate number is a harassment number, which improves the accuracy of harassment number identification.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic view of a harassment number identification scenario provided in the present application;
FIG. 2 is a flow chart of a method for identifying harassment numbers according to an embodiment of the present application;
FIG. 3 is a flow chart of a method for identifying harassment numbers according to a second embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a harassment number recognition device provided in a fourth embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application.
Specific embodiments thereof have been shown by way of example in the drawings and will herein be described in more detail. These drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but to illustrate the concepts of the present application to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
It should be noted that the brief description of the terms in the present application is only for convenience in understanding the embodiments described below, and is not intended to limit the embodiments of the present application. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
The terms "first", "second" and the like in the description, in the claims and in the above figures are used to distinguish between similar objects or entities and do not necessarily describe a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the application are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.
Furthermore, the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to those elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus. The term "module" as used in this application refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the function associated with that element.
The technical solution of the present application is described in detail below with specific examples. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. In the description of the present application, terms are to be construed broadly as understood in the art, unless explicitly stated or defined otherwise. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 is a schematic view of a harassment number identification scenario provided in the present application. As shown in fig. 1, the operator 101 identifies numbers in a set of numbers to be identified 100, stores numbers identified as harassment numbers in a harassment number set 102, and stores numbers identified as non-harassment numbers in a non-harassment number set 103.
In practical application, harassment calls are characterized by high calling frequency, low called frequency, long call time and low called-number repetition rate. Accurately identifying harassment numbers can prevent harassment calls from affecting people's daily lives. In the scenario shown, number 1, number 2 and number 3 have high calling frequency, low called frequency, long call time and low called-number repetition rate, so the operator identifies number 1, number 2 and number 3 as harassment numbers and stores them in the harassment number set. Number 4 and number 5 do not have the characteristics of harassment calls, so they are identified as non-harassment numbers and stored in the non-harassment number set.
However, the current scheme for identifying harassment numbers may mislabel numbers: for example, numbers used for community services, express delivery and takeaway may be wrongly marked as harassment numbers. The accuracy of the current harassment number identification scheme is therefore low.
In the embodiments of the application, whether the called numbers corresponding to a candidate number are concentrated in a preset number of resident areas is determined by judging whether the ratio of the sum of the called times of the top preset number of resident areas to the sum of the called times of all resident areas is greater than a second threshold. Whether the candidate number is a harassment number is identified accordingly, which improves the accuracy of harassment number identification.
Example 1
Fig. 2 is a flow chart of a harassment number identification method provided in an embodiment of the present application, where an implementation subject of the embodiment may be a harassment number identification device, as shown in fig. 2, and the method includes:
S201, determining candidate numbers that satisfy a first condition according to the call parameters of each number;
S202, for each candidate number, determining the resident area of each called number corresponding to the candidate number, and calculating the number of called times of each resident area;
S203, ranking the resident areas by number of called times in descending order, and calculating the ratio of the sum of the called times of the top preset number of resident areas to the sum of the called times of all resident areas;
S204, judging whether the ratio is greater than a preset second threshold;
S205, if the ratio is greater than the preset second threshold, determining that the candidate number is a non-harassment number;
S206, if the ratio is not greater than the preset second threshold, determining that the candidate number is a harassment number.
In this embodiment, the call parameter characterizes the call frequency of the number, and the first condition is that the call frequency exceeds a preset first threshold. In practical application, the frequency of calling the harassment number is high, and candidate numbers which are possibly harassment numbers can be primarily screened out by judging whether the frequency of calling exceeds a preset first threshold value.
In practical application, the method may be executed by a harassment number identification device, which can be implemented in various ways. For example, it may be implemented as a computer program, such as application software; alternatively, it may be implemented as a medium storing the relevant computer program, such as a USB flash drive or a cloud disk; or it may be implemented as a physical device in which the relevant computer program is integrated or installed, such as a chip.
Specifically, in S201, candidate numbers satisfying the first condition are determined according to the call parameters of each number. For example, number 1 calls 10 times per hour, number 2 calls 30 times per hour and number 3 calls 50 times per hour; the preset first threshold is 20 times per hour, and the first condition is that the call frequency exceeds 20 times per hour. Number 2 and number 3 exceed 20 times per hour and satisfy the first condition, so number 2 and number 3 are taken as candidate numbers.
The number of called times of a resident area is the sum of the called times of all called numbers, among the called numbers corresponding to the candidate number, that belong to that resident area. In practical application, each called number corresponds to a unique resident area, and several of the called numbers corresponding to a candidate number may share the same resident area; that is, one resident area may correspond to multiple called numbers. The number of called times of a resident area is therefore the sum of the called times of all called numbers corresponding to that resident area.
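The S201 example can be reproduced with a short sketch. The dictionary layout and the variable names are assumptions made for illustration; only the frequencies and the threshold of 20 calls per hour come from the example.

```python
# Preset first threshold from the example: 20 calls per hour.
FIRST_THRESHOLD = 20

# Call frequency (calls per hour) of each number, as given in the example.
calls_per_hour = {"number 1": 10, "number 2": 30, "number 3": 50}

# A number is a candidate when its call frequency exceeds the threshold.
candidates = [
    number
    for number, frequency in calls_per_hour.items()
    if frequency > FIRST_THRESHOLD
]
```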
Specifically, in S202, the number of called times of each resident area is calculated. For example, candidate number 1 corresponds to called numbers 1 to 6. The resident area of called number 1, called number 2 and called number 3 is resident area A; the resident area of called number 4 and called number 5 is resident area B; the resident area of called number 6 is resident area C. In resident area A, called number 1 is called 15 times, called number 2 is called 20 times and called number 3 is called 35 times; in resident area B, called number 4 is called 10 times and called number 5 is called 15 times; in resident area C, called number 6 is called 20 times. Summing the called times of called numbers 1, 2 and 3 gives 70 called times for resident area A; summing the called times of called numbers 4 and 5 gives 25 called times for resident area B; called number 6 is the only called number belonging to resident area C, so the number of called times of resident area C is 20.
Correspondingly, in S203, the resident areas are ranked by number of called times in descending order, and the ratio of the sum of the called times of the top preset number of resident areas to the sum of the called times of all resident areas is calculated. Continuing the above example, the number of called times is 70 for resident area A, 25 for resident area B and 20 for resident area C. With a preset number of 2, the sum of the called times of the top 2 resident areas is 95, the sum of the called times of all resident areas is 115, and the ratio is 95/115, approximately 0.83.
In practical application, the resident areas of the called numbers of harassment numbers are scattered, whereas the resident areas of the called numbers of civil service numbers are concentrated. It can be understood that, by judging whether the ratio of the sum of the called times of the top preset number of resident areas to the sum of the called times of all resident areas is greater than the second threshold, it can be determined whether the called numbers corresponding to a candidate number are concentrated in a preset number of resident areas, and hence whether the candidate number is a harassment number. For example, the ratio of candidate number 1 is 0.83, the ratio of candidate number 2 is 0.35, and the preset second threshold is 0.5. The ratio of candidate number 1 is greater than the second threshold, so candidate number 1 is determined to be a non-harassment number; the ratio of candidate number 2 is less than the second threshold, so candidate number 2 is determined to be a harassment number.
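The per-area aggregation in this example can be sketched as below. The tuple layout `(called number, resident area, called times)` is an assumption for illustration; the figures are those of the example.

```python
from collections import defaultdict

# (called number, resident area, called times) for candidate number 1.
called_records = [
    ("called number 1", "A", 15),
    ("called number 2", "A", 20),
    ("called number 3", "A", 35),
    ("called number 4", "B", 10),
    ("called number 5", "B", 15),
    ("called number 6", "C", 20),
]

# Sum the called times of all called numbers that share a resident area.
area_called_times = defaultdict(int)
for _called_number, area, times in called_records:
    area_called_times[area] += times
```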
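A minimal sketch of the S203 ratio follows. The function name `top_area_ratio` and the choice to return 0.0 when there are no calls at all are assumptions, not part of the application.

```python
def top_area_ratio(area_called_times, top_n):
    """Share of called times that falls in the top_n most-called resident areas."""
    counts = sorted(area_called_times.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0  # assumption: no calls means no concentration
    return sum(counts[:top_n]) / total

# Figures from the example: areas A, B and C, preset number 2.
ratio = top_area_ratio({"A": 70, "B": 25, "C": 20}, top_n=2)
```

With these figures the ratio is 95/115, which rounds to 0.83 as in the example.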
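The decision in S204 to S206 can be sketched as a simple comparison. The function name and the string labels are illustrative assumptions; the threshold 0.5 and the ratios 0.83 and 0.35 are taken from the example.

```python
SECOND_THRESHOLD = 0.5  # preset second threshold from the example

def classify(ratio, threshold=SECOND_THRESHOLD):
    """Concentrated calls (ratio above threshold) indicate a non-harassment number."""
    return "non-harassment" if ratio > threshold else "harassment"

ratios = {"candidate number 1": 0.83, "candidate number 2": 0.35}
results = {number: classify(ratio) for number, ratio in ratios.items()}
```

Note that a ratio exactly equal to the threshold is classified as harassment, since S206 applies whenever the ratio is not greater than the threshold.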
It should be noted that each operator can obtain call parameters only for numbers on its own network. It can be understood that the called numbers corresponding to a candidate number are therefore called numbers on the operator's own network. Optionally, the resident areas are ranked by number of called times in descending order, and the ratio of the sum of the called times of the top preset number of resident areas to the total number of calls of the candidate number is calculated; if the ratio is greater than a preset third threshold, the candidate number is determined to be a non-harassment number; otherwise, the candidate number is determined to be a harassment number. The third threshold is the second threshold multiplied by the market share of the operator's own network among mobile users.
Optionally, the accuracy of the harassment number identification result is verified. For example, for a candidate number set A, a non-harassment number set B is obtained by the harassment number identification method of this embodiment and a non-harassment number set C is obtained by manual identification; the ratio of the size of the intersection of set B and set C to the size of set C is calculated, and if the ratio is greater than a fourth threshold, the accuracy of the harassment number identification result is determined to meet the requirement.
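The optional variant with the third threshold can be sketched as follows. The market share of 0.3 and the call totals are hypothetical values chosen only to make the sketch runnable; the application gives no figures for them.

```python
SECOND_THRESHOLD = 0.5
MARKET_SHARE = 0.3  # hypothetical home-network share of mobile users

# Third threshold = second threshold multiplied by the market share.
third_threshold = SECOND_THRESHOLD * MARKET_SHARE

def classify_by_total_calls(top_area_called_times, total_calls):
    """Compare top-area called times against all calls of the candidate number."""
    ratio = top_area_called_times / total_calls
    return "non-harassment" if ratio > third_threshold else "harassment"

# Hypothetical candidate: 95 called times in the top areas, 400 calls overall.
concentrated = classify_by_total_calls(95, 400)
scattered = classify_by_total_calls(30, 400)
```

Scaling the threshold by market share compensates for the calls to other operators' numbers, which appear in the denominator but cannot be assigned to a resident area.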
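The verification step amounts to a set-overlap ratio, sketched below. The sample sets and the fourth threshold of 0.9 are hypothetical; the application does not give values for them.

```python
def agreement_ratio(method_set, manual_set):
    """Size of the intersection of B and C divided by the size of C."""
    if not manual_set:
        raise ValueError("manual identification set C must not be empty")
    return len(method_set & manual_set) / len(manual_set)

FOURTH_THRESHOLD = 0.9  # hypothetical value

# B: non-harassment numbers found by the method; C: found manually.
set_b = {"n1", "n2", "n3"}
set_c = {"n2", "n3", "n4", "n5"}

ratio = agreement_ratio(set_b, set_c)
meets_requirement = ratio > FOURTH_THRESHOLD
```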
In addition, the candidate numbers may be determined according to the calling ratio, the number of calling times and the local called number ratio of each number. In one possible implementation, the call parameters include the calling ratio, the number of calling times and the local called number ratio, and S201 includes:
obtaining ticket data (call detail records) of each number in a first period;
obtaining the calling ratio, the number of calling times and the local called number ratio of each number according to the ticket data;
and taking each number whose calling ratio exceeds a first sub-threshold, whose number of calling times exceeds a second sub-threshold and whose local called number ratio exceeds a third sub-threshold as a candidate number.
The calling ratio of a number is the proportion of calling times among all calls of the number; the local called number ratio of a number is the proportion of local numbers among the called numbers corresponding to the number.
In practical application, a harassment number is characterized by a large calling ratio, a large number of calling times and scattered resident areas of the called users. It will be appreciated that a number whose calling ratio exceeds the first sub-threshold, whose number of calling times exceeds the second sub-threshold and whose local called number ratio exceeds the third sub-threshold may be either a harassment number or a non-harassment number.
In this embodiment, each number whose calling ratio exceeds the first sub-threshold, whose number of calling times exceeds the second sub-threshold and whose local called number ratio exceeds the third sub-threshold is taken as a candidate number, so that the number range in which non-harassment numbers may exist can be effectively determined, the candidate numbers can be identified, and the efficiency of harassment number identification can be improved.
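The three-threshold screening can be sketched as follows. This is a minimal Python illustration under stated assumptions: the record layout `(number, call_type, called_is_local)` and the function name are hypothetical, the `'01'` code for an outgoing call and the default thresholds (70%, 50 calls, 70%) are taken from Example III, and the local called number ratio is computed over calls, as the Example III SQL does.

```python
from collections import defaultdict

CALLING = "01"  # call_type code for an outgoing call, as in the Example III SQL


def select_candidates(records, calling_ratio_min=0.7,
                      calling_count_min=50, local_ratio_min=0.7):
    """records: iterable of (number, call_type, called_is_local) tuples,
    one per call in the first period. Returns the set of candidate numbers."""
    total = defaultdict(int)    # all calls per number
    calling = defaultdict(int)  # outgoing calls per number
    local = defaultdict(int)    # outgoing calls to local called numbers
    for number, call_type, called_is_local in records:
        total[number] += 1
        if call_type == CALLING:
            calling[number] += 1
            if called_is_local:
                local[number] += 1

    candidates = set()
    for number, n_total in total.items():
        n_calling = calling[number]
        if n_calling == 0 or n_calling <= calling_count_min:
            continue  # second sub-threshold not exceeded
        if n_calling / n_total <= calling_ratio_min:
            continue  # first sub-threshold not exceeded
        if local[number] / n_calling <= local_ratio_min:
            continue  # third sub-threshold not exceeded
        candidates.add(number)
    return candidates


# Hypothetical sample data.
records = (
    [("A", "01", True)] * 60 + [("A", "02", False)] * 10   # passes all three
    + [("B", "01", True)] * 40                              # too few calls
    + [("C", "01", True)] * 30 + [("C", "01", False)] * 30  # local ratio 0.5
)
candidates = select_candidates(records)
```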
Further, regarding the calculation of the local called number ratio, in one possible implementation, obtaining the local called number ratio of each number according to the ticket data includes:
for each of the numbers, determining the calling area of each call of the number in the first period;
if the calling area with the largest number of calls in the first period is consistent with the home location of the number, taking the number as a local number;
and calculating the proportion of local numbers among the called numbers corresponding to the number, and taking this proportion as the local called number ratio of the number.
The calling area of a call is the home location of the called number in the call. In practical application, if the calling area with the largest number of calls in the first period is consistent with the home location of the number, the called numbers called by the number in the first period are mainly concentrated in the home location of the number, so the number is taken as a local number. For example, in the first period, the home location of number 1 is area A and the home location of number 2 is area B; the calling area with the largest number of calls is area C for number 1 and area B for number 2. The calling area with the largest number of calls of number 1 is inconsistent with the home location of number 1, so number 1 is taken as a foreign number; the calling area with the largest number of calls of number 2 is consistent with the home location of number 2, so number 2 is taken as a local number.
In this embodiment, for each number, whether the number is a local number is determined by judging whether the calling area with the largest number of calls in the first period is consistent with the home location of the number. The local called number ratio of the number is then determined by calculating the proportion of local numbers among the called numbers corresponding to the number. Candidate numbers can thus be determined from the calling ratio, the number of calling times and the local called number ratio, which improves the efficiency of harassment number identification.
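The local number judgment in the example above (number 1 and number 2) can be sketched in Python as follows. The function name and input layout are assumptions made for this sketch, and the source does not say how a tie between calling areas is resolved; here the area encountered first wins.

```python
from collections import Counter


def is_local_number(home_location, calling_areas):
    """calling_areas: the calling area of each call of the number in the
    first period, i.e. the home location of the called number in each call.
    Returns True if the most frequent calling area equals home_location."""
    if not calling_areas:
        return False
    # most_common(1) returns the area with the largest number of calls;
    # ties are resolved by first occurrence (an assumption of this sketch).
    top_area, _ = Counter(calling_areas).most_common(1)[0]
    return top_area == home_location


# Number 1: home location A, most calls go to area C -> foreign number.
number_1_local = is_local_number("A", ["C", "C", "A"])
# Number 2: home location B, most calls go to area B -> local number.
number_2_local = is_local_number("B", ["B", "B", "C"])
```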
In the harassment number identification method provided by this embodiment, candidate numbers meeting the first condition are determined according to the call parameters of each number; for each candidate number, the resident area of each called number corresponding to the candidate number is determined, and the number of called times of each resident area is calculated; with the resident areas ranked by number of called times from most to least, the ratio of the sum of the called times of the top predetermined number of resident areas to the sum of the called times of all resident areas is calculated; if the ratio is greater than a preset second threshold, the candidate number is judged to be a non-harassment number; otherwise, the candidate number is judged to be a harassment number. In this embodiment, judging whether this ratio is greater than the second threshold determines whether the called numbers corresponding to the candidate number are gathered in the predetermined number of resident areas, so that whether the candidate number is a harassment number is identified and the accuracy of harassment number identification is improved.
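The final judgment on a candidate number can be sketched as follows. This is a minimal Python illustration, not the claimed implementation: the function name and sample data are hypothetical, the default of 2 top resident areas follows Example III, and the second threshold of 0.5 is an assumed value.

```python
def classify_candidate(called_times_by_area, top_n=2, second_threshold=0.5):
    """called_times_by_area: dict mapping resident area -> number of called
    times for one candidate number.
    Returns (label, ratio) where ratio is the share of called times that
    falls in the top_n resident areas."""
    total = sum(called_times_by_area.values())
    if total == 0:
        raise ValueError("the candidate number has no called times")
    # Rank resident areas by called times, from most to least.
    ranked = sorted(called_times_by_area.values(), reverse=True)
    ratio = sum(ranked[:top_n]) / total
    label = "non-harassment" if ratio > second_threshold else "harassment"
    return label, ratio


# Called numbers gathered in a few resident areas.
gathered = {"area 1": 40, "area 2": 30, "area 3": 10, "area 4": 20}
# Called numbers scattered evenly over ten resident areas.
scattered = {f"area {i}": 10 for i in range(10)}

gathered_result = classify_candidate(gathered)
scattered_result = classify_candidate(scattered)
```

In the gathered case the top 2 areas hold 70 of 100 called times, so the ratio exceeds the assumed threshold; in the scattered case they hold only 20 of 100.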
Example II
Fig. 3 is a flow chart of a harassment number identification method according to a second embodiment of the present application. As shown in Fig. 3, S202 specifically includes:
S301, determining the resident working area, resident residential area and broadband installation area of each called number corresponding to the candidate number;
S302, for each called number corresponding to the candidate number, taking the common area of the resident working area, resident residential area and broadband installation area of the called number as the resident area of the called number, and calculating the number of called times of each resident area.
The resident working area of a called number is the working area in which the user of the called number makes the most calls in a certain period. The resident residential area of a called number is the residential area with the longest accumulated internet access time in a certain period. The broadband installation area of a called number is the area containing the broadband installation address registered under the same certificate as the called number. For example, the resident working area is the community or town containing the working address at which the user of the called number makes the most calls; the resident residential area is the community or town containing the residential address at which the user's accumulated internet access time is longest; and the broadband installation area is the jurisdiction of the operator branch office that installed the broadband.
Specifically, for each called number corresponding to the candidate number, the common area of the resident working area, resident residential area and broadband installation area of the called number is taken as the resident area of the called number. For example, the resident working area of called number 1 is community A, the resident residential area of called number 1 is community B, and the broadband installation area of called number 1 is jurisdiction C of broadband branch office 1; the common area to which community A, community B and jurisdiction C belong is area D, and area D is taken as the resident area of called number 1.
It can be understood that the common area of the resident working area, resident residential area and broadband installation area of a called number covers the areas in which the user of the called number is active, and can therefore accurately represent the resident area of the called number. Taking this common area as the resident area of the called number thus yields an accurate resident area of the called number.
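One possible way to compute such a common area is sketched below in Python. The source does not say how the common area is derived; this sketch assumes a hypothetical child-to-parent area table (`PARENT`, assumed to be free of cycles) and returns the lowest area that contains all three input areas. All names and data are illustrative.

```python
# Hypothetical administrative hierarchy: child area -> parent area.
PARENT = {
    "community A": "area D",
    "community B": "area D",
    "jurisdiction C": "area D",
    "area D": "city",
}


def ancestors(area, parent):
    """Return the chain [area, parent of area, ...] up to the root."""
    chain = [area]
    while area in parent:
        area = parent[area]
        chain.append(area)
    return chain


def common_area(areas, parent):
    """Return the lowest area that contains every area in `areas`,
    or None if the areas share no common ancestor."""
    chains = [ancestors(a, parent) for a in areas]
    for candidate in chains[0]:
        if all(candidate in chain for chain in chains[1:]):
            return candidate
    return None


# Called number 1: working area, residential area, broadband installation area.
resident_area = common_area(
    ["community A", "community B", "jurisdiction C"], PARENT
)
```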
Optionally, the resident working area of a called number can be determined according to the area containing the base station through which the called number makes the most calls in a second period. In one possible implementation, determining the resident working area of the called number corresponding to the candidate number in S301 includes:
for each called number corresponding to the candidate number, determining the location area code and the base station identifier of each call of the called number in the second period;
determining the base station used by the called number in each call in the second period according to the location area code and the base station identifier;
and according to the base station used by the called number in each call in the second period, taking the area containing the base station used the most times in the second period as the resident working area of the called number.
The location area code (Location Area Code, abbreviated as LAC) is used to divide and identify location areas; the base station identifier (Cell Tower ID, abbreviated as CID) identifies a mobile base station.
In practical application, a base station can be uniquely determined according to the location area code and the base station identifier, and each base station corresponds to a base station address. Therefore, the base station used by the called number in each call, and the address of that base station, can be determined according to the location area code and the base station identifier of each call of the called number. It can be understood that the area containing the address of the base station used by the called number in a call is consistent with the area in which the called number is located during that call.
In this embodiment, according to the address of the base station used by the called number in each call in the second period, the area containing the base station used the most times in the second period is taken as the resident working area of the called number, so that an accurate resident working area of the called number is obtained.
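The three steps above can be sketched in Python as follows. The function name, the `(lac, cid)` input layout and the lookup table `station_area` are assumptions made for this sketch; calls whose base station is not in the table are ignored, which the source does not address.

```python
from collections import Counter


def resident_working_area(calls, station_area):
    """calls: iterable of (lac, cid) pairs, one per call of the called
    number in the second period.
    station_area: dict mapping (lac, cid) -> area of the base station.
    Returns the area of the most frequently used base station, or None."""
    usage = Counter(
        (lac, cid) for lac, cid in calls if (lac, cid) in station_area
    )
    if not usage:
        return None
    # (LAC, CID) uniquely identifies the base station used in a call.
    most_used_station, _ = usage.most_common(1)[0]
    return station_area[most_used_station]


# Hypothetical sample data.
station_area = {(1001, 7): "community A", (1001, 8): "community B"}
calls = [(1001, 7), (1001, 7), (1001, 8), (2002, 1)]
working_area = resident_working_area(calls, station_area)
```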
Optionally, the resident residential area of a called number can be determined according to the area containing the internet access address with the longest accumulated access duration of the called number in a third period. In one possible implementation, determining the resident residential area of the called number corresponding to the candidate number in S301 includes:
for each called number corresponding to the candidate number, determining the internet access address of each internet session of the called number in the third period;
and according to the internet access address of each internet session of the called number in the third period, taking the area containing the internet access address with the longest accumulated access duration as the resident residential area of the called number.
In practical application, the internet access address and the access duration of each internet session of the called number can be determined according to home broadband and wireless network information. It can be understood that the area containing the internet access address with the longest accumulated access duration is the area in which the user of the called number stays for the longest time.
In this embodiment, according to the internet access address of each internet session of the called number in the third period, the area containing the internet access address with the longest accumulated access duration in the third period is taken as the resident residential area of the called number, so that an accurate resident residential area of the called number is obtained.
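A minimal Python sketch of this accumulation follows. The function name, the `(access_address, duration)` session layout and the lookup table `address_area` are assumptions made for illustration only.

```python
from collections import defaultdict


def resident_residential_area(sessions, address_area):
    """sessions: iterable of (access_address, duration_seconds), one per
    internet session of the called number in the third period.
    address_area: dict mapping access address -> area.
    Returns the area of the address with the longest accumulated
    access duration, or None if there are no sessions."""
    accumulated = defaultdict(float)
    for address, duration in sessions:
        accumulated[address] += duration
    if not accumulated:
        return None
    longest_address = max(accumulated, key=accumulated.get)
    return address_area.get(longest_address)


# Hypothetical sample data.
address_area = {"addr-1": "community B", "addr-2": "community E"}
sessions = [("addr-1", 3600), ("addr-2", 5000), ("addr-1", 2400)]
residential_area = resident_residential_area(sessions, address_area)
```

Here `addr-1` accumulates 6000 seconds against 5000 for `addr-2`, so its area is selected even though the single longest session was at `addr-2`.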
Further, regarding the determination of the broadband installation area of a called number, in one possible implementation, determining the broadband installation area of the called number corresponding to the candidate number in S301 includes:
for each called number corresponding to the candidate number, acquiring the broadband installation address of the called number;
and taking the area containing the broadband installation address as the broadband installation area of the called number.
It will be appreciated that the broadband installation address lies within the jurisdiction of the operator branch office that installed the broadband; for example, the jurisdiction of that branch office is taken as the broadband installation area. Optionally, according to the areas divided by the operator, the area containing the broadband installation address is taken as the broadband installation area of the called number.
In the harassment number identification method provided by this embodiment, the resident working area, resident residential area and broadband installation area of each called number corresponding to the candidate number are determined; for each called number corresponding to the candidate number, the common area of the resident working area, resident residential area and broadband installation area of the called number is taken as the resident area of the called number. In this embodiment, taking the common area to which the resident working area, resident residential area and broadband installation area of the called number belong as the resident area of the called number yields an accurate resident area of the called number. Judging whether the ratio of the sum of the called times of the top predetermined number of resident areas, ranked by number of called times, to the sum of the called times of all resident areas is greater than the second threshold then determines whether the called numbers corresponding to the candidate number are gathered in the predetermined number of resident areas, so that whether the candidate number is a harassment number is identified and the accuracy of harassment number identification is improved.
Example III
To facilitate understanding of the solution, the third embodiment of the present application is described below by way of example. In this embodiment, numbers whose calling ratio is more than 70%, whose number of calling times is more than 50 and whose local called number ratio is more than 70% within 7 days are selected as candidate numbers. It is then judged whether the ratio of the sum of the called times of the top 2 resident areas, ranked by number of called times, to the sum of the called times of all resident areas is more than 24%.
The harassment number identification method in the third embodiment of the present application includes the following steps:
Step 1: obtaining the ticket data of each number within 7 days.
By way of example, the program source code is as follows:
CREATE TABLE zhuanli_0601 AS
SELECT /*+ parallel(d,30) */ d.msisdn, d.call_type, COUNT(*) call_num
FROM fz_ods_rh_cb_tg_cdr_d d
WHERE d.acct_month = '202205'
AND d.day_id IN ('25','26','27','28','29','30','31')
AND d.net_type_code = '50'
GROUP BY d.msisdn, d.call_type;
Step 2: obtaining, according to the ticket data, intermediate candidate numbers whose calling ratio exceeds 70% and whose number of calling times exceeds 50.
By way of example, the program source code is as follows:
CREATE TABLE zhuanli_0601_1 AS
SELECT msisdn, SUM(DECODE(call_type, '01', call_num, 0)) calling_num, SUM(call_num) total_num
FROM zhuanli_0601
GROUP BY msisdn
HAVING SUM(DECODE(call_type, '01', call_num, 0)) > 50
AND SUM(DECODE(call_type, '01', call_num, 0)) / SUM(call_num) > 0.7;
Step 3: screening the intermediate candidate numbers, and taking the numbers whose local called number ratio is more than 70% as candidate numbers.
By way of example, the program source code is as follows:
Step 31: for each number, counting the calls made within 7 days according to whether the home location of the called number is consistent with the home location of the number;
CREATE TABLE zhuanli_0602 AS
SELECT /*+ parallel(d,30) */ d.msisdn, DECODE(d.called_home_code, '022', 1, 2) called_code,
COUNT(*) call_num
FROM fz_ods_rh_cb_tg_cdr_d d
WHERE d.acct_month = '202205'
AND d.day_id IN ('25','26','27','28','29','30','31')
AND d.call_type = '01'
AND d.net_type_code = '50'
GROUP BY d.msisdn, DECODE(d.called_home_code, '022', 1, 2);
Step 32: screening the numbers for which calls to local called numbers account for more than 70% of the calls;
CREATE TABLE zhuanli_0602_1 AS
SELECT z.msisdn, SUM(DECODE(z.called_code, 1, call_num, 0)) home_code,
SUM(z.call_num) total_num
FROM zhuanli_0602 z
GROUP BY msisdn
HAVING SUM(DECODE(z.called_code, 1, call_num, 0)) / SUM(z.call_num) > 0.7;
Step 33: taking the intermediate candidate numbers whose local called number ratio is more than 70% as candidate numbers;
CREATE TABLE zhuanli_0603 AS
SELECT msisdn FROM zhuanli_0602_1
INTERSECT
SELECT msisdn FROM zhuanli_0601_1;
Step 4: determining the resident area of each called number corresponding to each candidate number.
By way of example, the program source code is as follows:
Step 41: determining the resident working area of each called number corresponding to the candidate number;
for each called number corresponding to the candidate number, determining the location area code and the base station identifier of each call of the called number within one month;
CREATE TABLE zhuanli_0604 AS
SELECT /*+ parallel(d,30) */ other_party, COUNT(*) call_num
FROM fz_ods_rh_cb_tg_cdr_d d
WHERE d.acct_month = '202205'
AND d.day_id IN ('25','26','27','28','29','30','31')
AND d.call_type = '01'
AND d.msisdn IN
(SELECT z.msisdn FROM zhuanli_0603 z)
GROUP BY other_party;
CREATE TABLE zhuanli_0504_1 AS
SELECT /*+ parallel(d,40) */ d.msisdn, d.lac1, d.cell_id1, COUNT(*) call_num
FROM fz_ods_rh_cb_tg_cdr_d d
WHERE d.acct_month = '202205' AND d.net_type_code = '50'
AND d.msisdn IN
(SELECT z.other_party FROM zhuanli_0504 z)
GROUP BY d.msisdn, d.lac1, d.cell_id1;
determining, according to the location area code and the base station identifier, the area of the base station used by the called number in each call within one month;
CREATE TABLE zhuanli_0505 AS
SELECT /*+ parallel(z,40) */ z.*, d.district
FROM zhuanli_0504_1 z,
(SELECT d.lac, d.cell_id, d.base_station, d.adr_detail, d.district
FROM ods.ods_yd_c_laccell_info_m d
WHERE d.acct_month IN (TO_CHAR(ADD_MONTHS(SYSDATE-1, -1), 'yyyymm'),
TO_CHAR(ADD_MONTHS(SYSDATE-1, -2), 'yyyymm'))
UNION
SELECT d.lac, d.cell_id, d.net_name, d.station_name, d.district
FROM ods.ods_yd_cb_laccell_info_m d
WHERE d.acct_month IN (TO_CHAR(ADD_MONTHS(SYSDATE-1, -1), 'yyyymm'),
TO_CHAR(ADD_MONTHS(SYSDATE-1, -2), 'yyyymm'))
UNION
SELECT t.gnodeb_id lac, t.cell_id cell_id, t.station_name, position_infor, t.district
FROM ods.ods_yd_5g_laccell_info_m1 t
WHERE acct_month IN (TO_CHAR(ADD_MONTHS(SYSDATE-1, -1), 'yyyymm'),
TO_CHAR(ADD_MONTHS(SYSDATE-1, -2), 'yyyymm'))) d
WHERE d.lac = z.lac1 AND d.cell_id = z.cell_id1;
and taking, according to the base station used by the called number in each call within one month, the area with the largest number of calls within one month as the resident working area of the called number.
Step 42: determining the resident residential area of each called number corresponding to the candidate number;
CREATE TABLE zhuanli_0620 AS
SELECT mobile_num, area_name
FROM report.rpt_yw_user_imei_m
WHERE acct_month = '202204'
AND visit_cnt_all_tcp > 0
AND mobile_num IN
(SELECT other_party FROM zhuanli_0604)
AND area_name IS NOT NULL
AND area_name <> 'other';
Step 43: determining the broadband installation area of each called number corresponding to the candidate number;
for each called number corresponding to the candidate number, acquiring the broadband installation address of the called number;
CREATE TABLE tf_f_0621 AS
SELECT * FROM cbkafka.tf_f_user r
WHERE r.serial_number IN (
SELECT other_party FROM zhuanli_0604)
AND r.destroy_time IS NULL;
CREATE TABLE tf_f_kd_0621 AS
SELECT * FROM cbkafka.tf_f_user r
WHERE r.cust_id IN
(SELECT cust_id FROM tf_f_0621)
AND r.destroy_time IS NULL AND r.net_type_code = '40';
CREATE TABLE tf_f_kd_add_0621 AS
SELECT t.cust_id, m.attr_value
FROM tf_f_kd_0621 t, cbkafka.tf_f_user_item m
WHERE t.user_id = m.user_id AND m.attr_code = 'DETAIL_INSTALL_ADDRESS';
and extracting the operator branch office corresponding to the broadband installation address, and taking the jurisdiction of that branch office as the broadband installation area of the called number.
-- The string literals below are English renderings of address fragments in the source listing;
-- the REPLACE/SUBSTR nesting is reconstructed from a garbled line.
CREATE TABLE tf_f_kd_add_r AS
SELECT f.serial_number,
REPLACE(SUBSTR(REPLACE(t.attr_value, 'coastal New region', ''), 1, 3), 'region', 'division') area_name
FROM tf_f_0621 f,
tf_f_kd_add_0621 t
WHERE attr_value LIKE '%area%' AND f.cust_id = t.cust_id;
Step 44: for each called number corresponding to the candidate number, taking the common area of the resident working area, resident residential area and broadband installation area of the called number as the resident area of the called number.
DROP TABLE zhuanli_0607;
CREATE TABLE zhuanli_0607 AS
SELECT /*+ parallel(d,30) */ d.*
FROM fz_ods_rh_cb_tg_cdr_d d
WHERE d.acct_month = '202205'
AND d.day_id IN ('25','26','27','28','29','30','31')
AND d.call_type = '01'
AND d.called_code = '022'
AND d.msisdn IN
(SELECT z.msisdn FROM zhuanli_0603 z);
CREATE TABLE zhuanli_0607_r AS
SELECT r.*, z.district area_name FROM zhuanli_0607 r, zhuanli_0606 z
WHERE r.other_party = z.msisdn
UNION
SELECT r.*, z.area_name FROM zhuanli_0607 r, zhuanli_0620 z
WHERE r.other_party = z.mobile_num
UNION
SELECT r.*, z.area_name FROM zhuanli_0607 r, tf_f_kd_add_r z
WHERE r.other_party = z.serial_number;
Step 5: calculating the ratio of the sum of the called times of the top 2 resident areas, ranked by number of called times, to the total number of calling times of the candidate number, and taking each candidate number whose ratio is larger than 24% as a non-harassment number.
By way of example, the program source code is as follows:
DROP TABLE zhuanli_0608;
CREATE TABLE zhuanli_0608 AS
SELECT msisdn, SUM(call_num) call_num FROM (
SELECT msisdn, area_name, call_num,
ROW_NUMBER() OVER (PARTITION BY msisdn ORDER BY call_num DESC) rn
FROM
(SELECT z.msisdn, area_name, COUNT(*) call_num FROM zhuanli_0607_r z
WHERE area_name IS NOT NULL
GROUP BY z.msisdn, area_name))
WHERE rn <= 2
GROUP BY msisdn;
SELECT COUNT(*) FROM zhuanli_0608;
CREATE TABLE zhuanli_0608_1 AS
SELECT z.msisdn, COUNT(*) call_num FROM zhuanli_0607 z
GROUP BY msisdn;
CREATE TABLE zhuanli_0608_r AS
SELECT z.msisdn, z.call_num call_num_main, zh.call_num call_num_total
FROM zhuanli_0608 z, zhuanli_0608_1 zh
WHERE z.msisdn = zh.msisdn AND z.call_num > 0.5 * 0.48 * zh.call_num;
Step 6: verifying the accuracy of the harassment number identification result.
CREATE TABLE fz_no_stop AS
SELECT /*+ parallel(d,30) */ d.msisdn, COUNT(*) call_num
FROM fz_ods_rh_cb_cdr_d d
WHERE d.acct_month = '202205'
AND d.day_id IN ('25','26','27','28','29','30','31')
AND d.net_type_code = '50'
AND d.call_type = '01'
AND d.msisdn IN
(SELECT serial_number FROM no_stop_list t WHERE t.in_type = 100)
GROUP BY d.msisdn
HAVING COUNT(*) > 50;
In the harassment number identification method provided by this embodiment, candidate numbers meeting the first condition are determined according to the call parameters of each number; for each candidate number, the resident area of each called number corresponding to the candidate number is determined, and the number of called times of each resident area is calculated; with the resident areas ranked by number of called times from most to least, the ratio of the sum of the called times of the top predetermined number of resident areas to the sum of the called times of all resident areas is calculated; if the ratio is greater than a preset second threshold, the candidate number is judged to be a non-harassment number; otherwise, the candidate number is judged to be a harassment number. In this embodiment, judging whether this ratio is greater than the second threshold determines whether the called numbers corresponding to the candidate number are gathered in the predetermined number of resident areas, so that whether the candidate number is a harassment number is identified and the accuracy of harassment number identification is improved.
Example IV
Fig. 4 is a schematic structural diagram of a harassment number identification device according to a fourth embodiment of the present application. As shown in Fig. 4, the device includes:
a screening module 41, configured to determine candidate numbers that satisfy the first condition according to the call parameters of each number;
a processing module 42, configured to determine, for each candidate number, the resident area of each called number corresponding to the candidate number;
a calculating module 43, configured to calculate the number of called times of each resident area;
the calculating module 43 is further configured to calculate, with the resident areas ranked by number of called times from most to least, the ratio of the sum of the called times of the top predetermined number of resident areas to the sum of the called times of all resident areas;
a determining module 44, configured to determine that the candidate number is a non-harassment number if the ratio is greater than a preset second threshold, and otherwise to determine that the candidate number is a harassment number.
In this embodiment, the call parameter characterizes the calling frequency of a number, and the first condition is that the calling frequency exceeds a preset first threshold. In practical application, harassment numbers make calls at a high frequency, so candidate numbers that may be harassment numbers can be preliminarily screened out by judging whether the calling frequency exceeds the preset first threshold.
The number of called times of a resident area is the sum of the called times of all called numbers, among those corresponding to the candidate number, that belong to the resident area. In practical application, each called number corresponds to a unique resident area, while several called numbers corresponding to a candidate number may share the same resident area; that is, one resident area may correspond to multiple called numbers. The number of called times of a resident area is therefore the sum of the called times of all called numbers corresponding to that resident area.
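The aggregation from called numbers to resident areas can be sketched in Python as follows. The function name and both input mappings are hypothetical; called numbers with no known resident area are skipped, which mirrors the `area_name IS NOT NULL` filter in the Example III SQL.

```python
from collections import Counter


def called_times_by_area(called_times, resident_area_of):
    """called_times: dict mapping called number -> times it was called by
    one candidate number.
    resident_area_of: dict mapping called number -> its unique resident area.
    Returns a dict mapping resident area -> summed called times."""
    totals = Counter()
    for called_number, times in called_times.items():
        area = resident_area_of.get(called_number)
        if area is not None:  # skip called numbers with no known area
            totals[area] += times
    return dict(totals)


# Hypothetical sample data: two called numbers share resident area D.
called_times = {"num-1": 5, "num-2": 3, "num-3": 4, "num-4": 2}
resident_area_of = {"num-1": "area D", "num-2": "area D", "num-3": "area E"}
per_area = called_times_by_area(called_times, resident_area_of)
```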
In practical application, the resident areas of the called numbers of a harassment number are scattered, whereas the resident areas of the called numbers of a civil service number are concentrated. It can be understood that judging whether the ratio of the sum of the called times of the top predetermined number of resident areas, ranked by number of called times, to the sum of the called times of all resident areas is greater than the second threshold determines whether the called numbers corresponding to the candidate number are gathered in the predetermined number of resident areas, so that whether the candidate number is a harassment number can be identified.
It should be noted that each operator can acquire the call parameters only of numbers on its own network. It can be understood that the called numbers corresponding to a candidate number are therefore called numbers on the operator's own network. Optionally, with the resident areas ranked by number of called times from most to least, the ratio of the sum of the called times of the top predetermined number of resident areas to the total number of calls of the candidate number is calculated; if the ratio is greater than a preset third threshold, the candidate number is judged to be a non-harassment number; otherwise, the candidate number is judged to be a harassment number. Here the third threshold is the second threshold multiplied by the market share of the operator's own network among mobile users.
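The relation between the second and third thresholds can be illustrated with a short Python sketch. The function names are hypothetical. The values 0.5 and 0.48 mirror the factors that appear in the Example III SQL (`0.5*0.48`), whose product is the 24% used there; reading them as the second threshold and the market share respectively is an assumption of this sketch.

```python
import math


def third_threshold(second_threshold, market_share):
    """Third threshold = second threshold x own-network market share."""
    return second_threshold * market_share


def is_non_harassment(top_area_called_times, total_calls,
                      second_threshold, market_share):
    """Compare the called times of the top resident areas against the
    total calls of the candidate number, scaled by the third threshold."""
    limit = third_threshold(second_threshold, market_share) * total_calls
    return top_area_called_times > limit


# Assumed illustrative values: second threshold 0.5, market share 0.48.
threshold = third_threshold(0.5, 0.48)
gathered = is_non_harassment(30, 100, 0.5, 0.48)   # 30 > 24
scattered = is_non_harassment(20, 100, 0.5, 0.48)  # 20 <= 24
```

Scaling by market share compensates for the called numbers on other operators' networks, whose resident areas the operator cannot observe but whose calls still count toward the candidate number's total.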
Further, in one possible implementation, the call parameters include the calling ratio, the number of calling times and the local called number ratio, and the screening module 41 includes:
a first acquisition unit, configured to acquire the ticket data of each number in the first period;
a second acquisition unit, configured to acquire the calling ratio, the number of calling times and the local called number ratio of each number according to the ticket data;
and a screening unit, configured to take each number whose calling ratio exceeds the first sub-threshold, whose number of calling times exceeds the second sub-threshold and whose local called number ratio exceeds the third sub-threshold as a candidate number.
The calling ratio of a number is the proportion of calling times among all calls of the number; the local called number ratio of a number is the proportion of local numbers among the called numbers corresponding to the number.
In practical application, a harassment number is characterized by a large calling ratio, a large number of calling times and scattered resident areas of the called users. It will be appreciated that a number whose calling ratio exceeds the first sub-threshold, whose number of calling times exceeds the second sub-threshold and whose local called number ratio exceeds the third sub-threshold may be either a harassment number or a non-harassment number.
In this embodiment, each number whose calling ratio exceeds the first sub-threshold, whose number of calling times exceeds the second sub-threshold and whose local called number ratio exceeds the third sub-threshold is taken as a candidate number, so that the number range in which non-harassment numbers may exist can be effectively determined, the candidate numbers can be identified, and the efficiency of harassment number identification can be improved.
Furthermore, in one possible implementation, the second acquisition unit is specifically configured to:
for each of the numbers, determine the calling area of each call of the number in the first period;
if the calling area with the largest number of calls in the first period is consistent with the home location of the number, take the number as a local number;
and calculate the proportion of local numbers among the called numbers corresponding to the number, and take this proportion as the local called number ratio of the number.
The calling area of a call is the home location of the called number in the call. In practical application, if the calling area with the largest number of calls in the first period is consistent with the home location of the number, the called numbers called by the number in the first period are mainly concentrated in the home location of the number, so the number is taken as a local number.
In this embodiment, for each number, whether the number is a local number is determined by judging whether the calling area with the largest number of calls in the first period is consistent with the home location of the number. The local called number ratio of the number is then determined by calculating the proportion of local numbers among the called numbers corresponding to the number. Candidate numbers can thus be determined from the calling ratio, the number of calling times and the local called number ratio, which improves the efficiency of harassment number identification.
In one possible implementation, the processing module 42 is specifically configured to:
determining resident working areas, resident residence areas and broadband installation areas of called numbers corresponding to the candidate numbers;
and aiming at each called number corresponding to the candidate number, taking a common area of a resident working area, a resident residential area and a broadband installation area of the called number as the resident area of the called number.
The resident working area of the called number is the working area with the largest number of calls of the called number user in a certain period of time. The resident residential area of the called number is the residential area with the longest accumulated internet surfing time in a certain period of time. The broadband installation area of the called number is the area where the broadband installation address under the same certificate as the called number is located.
It can be understood that the common area of the resident working area, resident residence area and broadband installation area of the called number, including the active area covering the called number user, can accurately represent the resident area of the called number. Therefore, the common area of the resident working area, resident residence area and broadband installation area of the called number is used as the resident area of the called number, and the accurate resident area of the called number can be obtained.
In this embodiment, the common area to which the resident working area, resident residential area, and broadband installation area of the called number belong is used as the resident area of the called number, so that an accurate resident area of the called number can be obtained. By judging whether the ratio of the sum of the called times of the top preset number of resident areas to the sum of the called times of all resident areas is greater than the second threshold, it is determined whether the called numbers corresponding to the candidate number are concentrated in the preset number of resident areas, and hence whether the candidate number is a harassment number, which improves the accuracy of harassment number identification.
In one possible implementation, the processing module 42 includes a first determination unit, which is configured to:
for each called number corresponding to the candidate number, determining the location area code and the base station identifier of each call of the called number in a second period;
determining the base station used by the called number in each call in the second period according to the location area code and the base station identifier;
and, according to the base station used by the called number in each call in the second period, taking the area in which the base station used the most times in the second period is located as the resident working area of the called number.
The location area code (Location Area Code, abbreviated as LAC) is used to divide and identify location areas; the base station identifier (Cell Tower ID, abbreviated as CID) identifies a mobile base station.
In practical application, a base station can be uniquely determined from the location area code and the base station identifier, and each base station corresponds to an address. Therefore, the base station used by the called number in each call, and the address of that base station, can be determined from the location area code and the base station identifier of each call of the called number. It can be understood that the area in which the address of the base station used in a call is located is consistent with the area in which the called number is located during that call.
In this embodiment, according to the address of the base station used by the called number in each call in the second period, the area in which the base station used the most times in the second period is located is taken as the resident working area of the called number, so that an accurate resident working area of the called number is obtained.
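A minimal sketch of this step is given below, assuming each call record has already been reduced to a `(lac, cid)` pair and that a lookup table from `(lac, cid)` to an area is available. Both the record format and the `station_area` table are assumptions made for illustration.

```python
from collections import Counter


def resident_working_area(calls, station_area):
    """Return the area of the base station used most often.

    calls: iterable of (lac, cid) pairs, one per call in the second period.
    station_area: dict mapping (lac, cid) to the area of that base station.
    Calls whose base station is unknown are ignored (an assumption).
    """
    usage = Counter(key for key in calls if key in station_area)
    if not usage:
        return None
    most_used_station, _ = usage.most_common(1)[0]
    return station_area[most_used_station]


station_area = {(1, 10): "Area A", (2, 20): "Area B"}
calls = [(1, 10), (2, 20), (1, 10), (9, 99)]
working_area = resident_working_area(calls, station_area)
```

Here the base station `(1, 10)` is used twice and `(2, 20)` once, so the resident working area is the area of `(1, 10)`.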
In one possible implementation, the processing module 42 includes a second determination unit, which is configured to:
for each called number corresponding to the candidate number, determining the internet access address of each internet session of the called number in a third period;
and, according to the internet access address of each internet session of the called number in the third period, taking the area in which the internet access address with the longest accumulated access duration is located as the resident residential area of the called number.
In practical application, the internet access address and the access duration of each internet session of the called number can be determined from home broadband and wireless network information. It can be understood that the area in which the internet access address with the longest accumulated access duration is located is the area in which the user of the called number stays the longest.
In this embodiment, according to the internet access address of each internet session of the called number in the third period, the area in which the internet access address with the longest accumulated access duration in the third period is located is taken as the resident residential area of the called number, so that an accurate resident residential area of the called number is obtained.
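The accumulation step can be sketched as follows, assuming each internet session is available as an `(address, duration)` pair and that a table maps each access address to an area. The session format and the `address_area` table are hypothetical.

```python
from collections import defaultdict


def resident_residential_area(sessions, address_area):
    """Return the area of the access address with the longest total duration.

    sessions: iterable of (access_address, duration) pairs in the third period.
    address_area: dict mapping an access address to its area (assumed given).
    """
    totals = defaultdict(float)
    for address, duration in sessions:
        totals[address] += duration
    if not totals:
        return None
    longest_address = max(totals, key=totals.get)
    return address_area.get(longest_address)


address_area = {"addr1": "Area X", "addr2": "Area Y"}
sessions = [("addr1", 30), ("addr2", 50), ("addr1", 40)]
residential_area = resident_residential_area(sessions, address_area)
```

Although the single longest session is at `addr2`, the accumulated duration at `addr1` is larger (70 against 50), so the area of `addr1` is selected.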
In one possible implementation, the processing module 42 includes a third determination unit, which is configured to:
for each called number corresponding to the candidate number, acquiring the broadband installation address of the called number;
and taking the area in which the broadband installation address is located as the broadband installation area of the called number.
It will be appreciated that the broadband installation address lies within the jurisdiction of the operator unit that installed the broadband; for example, that jurisdiction may be taken as the broadband installation area. Optionally, the area in which the broadband installation address is located, according to the areas divided by the operator, is used as the broadband installation area of the called number.
In the harassment number identification device provided by this embodiment, the screening module determines candidate numbers meeting the first condition according to the call parameters of the numbers; the processing module determines, for each candidate number, the resident area of each called number corresponding to the candidate number; the calculation module calculates the called times of each resident area, ranks the resident areas in descending order of called times, and calculates the ratio of the sum of the called times of the top preset number of resident areas to the sum of the called times of all resident areas; if the ratio is greater than a preset second threshold, the judging module judges that the candidate number is a non-harassment number; otherwise, the candidate number is judged to be a harassment number. In this embodiment, by judging whether this ratio is greater than the second threshold, it is determined whether the called numbers corresponding to the candidate number are concentrated in the preset number of resident areas, and hence whether the candidate number is a harassment number, which improves the accuracy of harassment number identification.
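The calculation and judging steps can be illustrated with the short sketch below. It assumes the called times per called number and the resident area of each called number are already known. The function names, the dictionaries, and the example values for the preset number and the second threshold are assumptions chosen only for illustration.

```python
from collections import Counter


def top_area_ratio(called_counts, resident_area, top_n):
    """Ratio of called times in the top_n resident areas to all called times.

    called_counts: dict mapping a called number to its called times.
    resident_area: dict mapping a called number to its resident area.
    """
    area_counts = Counter()
    for number, times in called_counts.items():
        area_counts[resident_area[number]] += times
    total = sum(area_counts.values())
    if total == 0:
        return 0.0
    # most_common ranks resident areas in descending order of called times
    top = sum(times for _, times in area_counts.most_common(top_n))
    return top / total


def is_harassment_number(called_counts, resident_area, top_n, second_threshold):
    """A ratio above the second threshold means non-harassment; otherwise harassment."""
    return not top_area_ratio(called_counts, resident_area, top_n) > second_threshold


concentrated = {"a": 5, "b": 3, "c": 1, "d": 1}
areas = {"a": "X", "b": "X", "c": "Y", "d": "Z"}
ratio = top_area_ratio(concentrated, areas, top_n=1)
```

In this example, area X receives 8 of the 10 calls, so with a preset number of 1 the ratio is 0.8; with a second threshold of 0.6 the candidate would be judged a non-harassment number. A candidate whose calls are spread evenly over many resident areas yields a low ratio and is judged a harassment number.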
Example five
Fig. 5 is a schematic structural diagram of an electronic device provided in a fifth embodiment of the present application. As shown in Fig. 5, the electronic device includes:
a processor 51 and a memory 52; the electronic device may also include a communication interface 53 and a bus 54. The processor 51, the memory 52, and the communication interface 53 may communicate with one another through the bus 54. The communication interface 53 may be used for information transfer. The processor 51 may call logic instructions in the memory 52 to perform the methods of the above-described embodiments.
Further, the logic instructions in the memory 52 described above may be implemented in the form of software functional units and stored in a computer readable storage medium when sold or used as a stand alone product.
The memory 52 is a computer readable storage medium that can be used to store a software program, a computer executable program, and program instructions/modules corresponding to the methods in the embodiments of the present application. The processor 51 executes functional applications and data processing by running software programs, instructions and modules stored in the memory 52, i.e. implements the methods of the method embodiments described above.
The memory 52 may include a storage program area and a storage data area. The storage program area may store an operating system and at least one application program required for a function; the storage data area may store data created according to the use of the terminal device, etc. In addition, the memory 52 may include high-speed random access memory, and may also include nonvolatile memory.
An embodiment of the present application provides a non-transitory computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method of the previous embodiments.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method for identifying harassment numbers, comprising:
determining candidate numbers meeting a first condition according to call parameters of numbers; wherein the call parameters represent the call frequency of a number, and the first condition is that the call frequency exceeds a preset first threshold;
for each candidate number, determining the resident area of each called number corresponding to the candidate number, and calculating the called times of each resident area; wherein the called times of a resident area are the sum of the called times of all called numbers, among the called numbers corresponding to the candidate number, that belong to the resident area;
ranking the resident areas in descending order of called times, and calculating the ratio of the sum of the called times of the top preset number of resident areas to the sum of the called times of all the resident areas;
if the ratio is greater than a preset second threshold, judging that the candidate number is a non-harassment number; otherwise, judging that the candidate number is a harassment number.
2. The method of claim 1, wherein the call parameters include a calling ratio, a number of calls made, and a local called number ratio; and the determining candidate numbers meeting a first condition according to call parameters of numbers comprises:
obtaining call ticket data of the numbers in a first period;
obtaining the calling ratio, the number of calls made, and the local called number ratio of each number according to the call ticket data; wherein the local called number ratio of a number is the proportion of local numbers among the called numbers corresponding to the number;
and taking a number whose calling ratio exceeds a first sub-threshold, whose number of calls made exceeds a second sub-threshold, and whose local called number ratio exceeds a third sub-threshold as a candidate number.
3. The method of claim 2, wherein obtaining the local called number ratio of each number according to the call ticket data comprises:
for each of the numbers, determining the calling area of each call of the number in the first period; wherein the calling area of a call is the home location of the called number in the call;
if the calling area with the largest number of calls in the first period is consistent with the home location of the number, taking the number as a local number;
and calculating the proportion of local numbers among the called numbers corresponding to the number, and taking this proportion as the local called number ratio of the number.
4. The method according to any one of claims 1-3, wherein, for each candidate number, determining the resident area of each called number corresponding to the candidate number comprises:
determining the resident working area, resident residential area, and broadband installation area of each called number corresponding to the candidate number;
and, for each called number corresponding to the candidate number, taking the common area to which the resident working area, resident residential area, and broadband installation area of the called number belong as the resident area of the called number.
5. The method of claim 4, wherein determining the resident working area of the called number corresponding to the candidate number comprises:
for each called number corresponding to the candidate number, determining the location area code and the base station identifier of each call of the called number in a second period;
determining the base station used by the called number in each call in the second period according to the location area code and the base station identifier;
and, according to the base station used by the called number in each call in the second period, taking the area in which the base station used the most times in the second period is located as the resident working area of the called number.
6. The method of claim 4, wherein determining the resident residential area of the called number corresponding to the candidate number comprises:
for each called number corresponding to the candidate number, determining the internet access address of each internet session of the called number in a third period;
and, according to the internet access address of each internet session of the called number in the third period, taking the area in which the internet access address with the longest accumulated access duration is located as the resident residential area of the called number.
7. The method of claim 4, wherein the determining the broadband installation area of the called number corresponding to the candidate number comprises:
acquiring a broadband installation address of each called number corresponding to the candidate number;
and taking the area where the broadband installation address is located as a broadband installation area of the called number.
8. A harassment number identification device, comprising:
a screening module, configured to determine candidate numbers meeting a first condition according to call parameters of numbers; wherein the call parameters represent the call frequency of a number, and the first condition is that the call frequency exceeds a preset first threshold;
a processing module, configured to determine, for each candidate number, the resident area of each called number corresponding to the candidate number;
a calculation module, configured to calculate the called times of each resident area; wherein the called times of a resident area are the sum of the called times of all called numbers, among the called numbers corresponding to the candidate number, that belong to the resident area;
the calculation module being further configured to rank the resident areas in descending order of called times and calculate the ratio of the sum of the called times of the top preset number of resident areas to the sum of the called times of all the resident areas;
and a judging module, configured to judge that the candidate number is a non-harassment number if the ratio is greater than a preset second threshold, and otherwise to judge that the candidate number is a harassment number.
9. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1-7.
10. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any one of claims 1-7.
CN202211606125.XA 2022-12-12 2022-12-12 Harassment number identification method and device, electronic equipment and storage medium Pending CN116016771A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211606125.XA CN116016771A (en) 2022-12-12 2022-12-12 Harassment number identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211606125.XA CN116016771A (en) 2022-12-12 2022-12-12 Harassment number identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116016771A true CN116016771A (en) 2023-04-25

Family

ID=86024014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211606125.XA Pending CN116016771A (en) 2022-12-12 2022-12-12 Harassment number identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116016771A (en)

Similar Documents

Publication Publication Date Title
CN105007171A (en) User data analysis system and method based on big data in communication field
CN104683537B (en) A kind of number mark method and device
CN102469460B (en) Method for identifying invalid international mobile equipment identity number and apparatus thereof
CN110337059B (en) Analysis algorithm, server and network system for family relationship of user
CN104683538B (en) Harassing call number banking process and system
CN102469435A (en) Method for raising terminal model identification accuracy of mobile terminal and apparatus thereof
CN108111320B (en) Local service charging method, server and charging gateway
US20120185540A1 (en) Method and Arrangement for Supporting Analysis of Social Networks in a Communication Network
CN103906027A (en) User value evaluation method and system based on mobile user internet surfing behaviors
CN109640312A (en) "Black card" recognition methods, electronic equipment and computer program product
CN105451234A (en) Signaling interactive data-based suspicious number analyzing method and device
CN105824818A (en) Informationized management method, platform and system
CN106161389B (en) Cheating identification method and device and terminal
CN104853357A (en) Method and system for automatically identifying and triggering fraud number
CN105959306A (en) IP address identification method and device
CN111428197B (en) Data processing method, device and equipment
CN106255132B (en) A kind of method and device that tracking area is set
CN105554785B (en) A kind of group technology and device
CN116016771A (en) Harassment number identification method and device, electronic equipment and storage medium
CN109121137B (en) Method and device for identifying user number use type of double-card terminal
CN116016769A (en) Identification method and device for fraudulent party and readable storage medium
CN110166964A (en) A kind of determination method and device of base station to be expanded
CN107154875B (en) Method for node sensitivity sequencing in telephone communication network
CN106993309B (en) User value evaluation method and device
CN110708706B (en) Area evaluation method, apparatus and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination