CN112492534B - Message processing method, device and equipment - Google Patents

Publication number
CN112492534B
Authority
CN
China
Prior art keywords
message
account
messages
determining
message set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011338960.0A
Other languages
Chinese (zh)
Other versions
CN112492534A (en)
Inventor
李�根
王扬
郭超
王科峰
贲卫国
于波
宋微
刘佳
吴金吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd filed Critical China United Network Communications Group Co Ltd
Priority to CN202011338960.0A priority Critical patent/CN112492534B/en
Publication of CN112492534A publication Critical patent/CN112492534A/en
Application granted granted Critical
Publication of CN112492534B publication Critical patent/CN112492534B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/12 Messaging; Mailboxes; Announcements
    • H04W4/14 Short messaging services, e.g. short message services [SMS] or unstructured supplementary service data [USSD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiments of the application provide a message processing method, device and equipment. The method comprises the following steps: determining a second message set in a first message set, wherein messages in the second message set are of a preset type; dividing the second message set into at least one third message set according to the similarity among the messages in the second message set, wherein the similarity among the messages in the third message set is greater than or equal to a first threshold value; acquiring a first account corresponding to each message in the third message set; and determining a target account in an account set according to a first message quantity corresponding to each first account in the first message set, a second message quantity corresponding to the first account in the third message set, and a third message quantity included in the third message set, wherein the account set includes accounts corresponding to messages in the second message set. This improves the precision with which an operator identifies mobile phone numbers that send spam short messages.

Description

Message processing method, device and equipment
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method, an apparatus, and a device for processing a message.
Background
At present, a spammer usually holds a large number of mobile phone cards and, to improve sending efficiency, usually uses the same mobile phone card to send spam messages with similar text content. An operator needs to block the mobile phone numbers that send spam messages so that users' mobile phones do not receive large numbers of spam messages.
In the prior art, an operator usually determines which mobile phone number to block according to the number of spam messages sent by that number within a period of time. For example, when the number of spam messages sent by the same mobile phone number within one hour is greater than a preset value, the operator may block that mobile phone number. However, to avoid blocking mobile phone numbers by mistake, the preset value set by the operator is usually large, and a spammer can keep the number of spam messages sent by the same mobile phone number within one hour below the preset value. As a result, the operator cannot identify the sending number of the spam messages within a short time, and the precision with which the operator identifies mobile phone numbers that send spam messages is low.
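The weakness of the fixed-limit approach can be illustrated with a minimal Python sketch. The function name `accounts_over_fixed_limit`, the sender labels, and the limit of 50 are hypothetical values chosen only for illustration; they do not come from the application.

```python
from collections import Counter

def accounts_over_fixed_limit(spam_senders, limit):
    """Return senders whose spam count in the window exceeds the fixed limit."""
    counts = Counter(spam_senders)
    return {sender for sender, n in counts.items() if n > limit}

# Hypothetical one-hour window: number_a deliberately stays under the limit.
window = ["number_a"] * 49 + ["number_b"] * 60
blocked = accounts_over_fixed_limit(window, limit=50)
print(blocked)
```

In this sketch, number_a sends 49 spam messages and is not blocked, which is the evasion the background describes.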
Disclosure of Invention
The embodiments of the application provide a message processing method, device and equipment, which are used to solve the technical problem in the prior art that mobile phone numbers sending spam messages are identified with low precision.
In a first aspect, an embodiment of the present application provides a message processing method, where the method includes:
determining a second message set in a first message set, wherein the first message set comprises messages received by at least one terminal device within a preset time period, and the messages in the second message set are of preset types;
dividing the second message set into at least one third message set according to the similarity among the messages in the second message set, wherein the similarity among the messages in the third message set is greater than or equal to a first threshold value;
acquiring a first account corresponding to each message in the third message set, wherein the first account is an account used by terminal equipment for sending the message;
determining a first message quantity corresponding to each first account in the first message set, a second message quantity corresponding to the first account in the third message set, and a third message quantity included in the third message set, and determining a target account in the account set, wherein the account set includes accounts corresponding to messages in the second message set.
In a possible implementation manner, dividing the second message set into at least one third message set according to the similarity between the messages in the second message set includes:
acquiring the similarity between the messages in the second message set;
and clustering the messages in the second message set according to the similarity so as to divide the second message set into at least one third message set.
In a possible implementation manner, obtaining the similarity between the messages in the second message set includes:
acquiring character attributes of each message in the second message set, wherein the character attributes comprise at least one of the following: a number of characters included in the message or a stroke number of characters included in the message;
and determining the similarity among the messages in the second message set according to the character attributes.
In a possible implementation manner, determining a similarity between the messages in the second message set according to the character attribute includes:
determining the editing distance between the messages in the second message set according to the character attributes;
and determining the similarity between the messages in the second message set according to the editing distance.
In a possible implementation manner, determining, in the first message set, a first message quantity corresponding to each first account, a second message quantity corresponding to the first account in the third message set, and a third message quantity included in the third message set, and determining, in the account set, a target account includes:
determining a second threshold value according to the first message quantity and the third message quantity;
and determining a target account in the account set according to the second message quantity and the second threshold value.
In a possible implementation, determining the second threshold according to the first message quantity and the third message quantity includes:
acquiring a first preset relation, wherein the first preset relation comprises a plurality of message quantities and a proportional coefficient corresponding to each message quantity;
determining a proportionality coefficient according to the first message quantity and the first preset relation;
and determining a second threshold value according to the first message quantity, the proportionality coefficient and the third message quantity.
In a possible implementation manner, determining a target account in the account set according to the second message number and the second threshold includes:
judging whether the second message quantity is larger than a second threshold value;
if yes, determining the first account as the target account in the account set.
In a second aspect, an embodiment of the present application provides a message processing apparatus, which includes a first determining module, a dividing module, an obtaining module, and a second determining module, where:
the first determining module is configured to determine a second message set in a first message set, where the first message set includes messages received by at least one terminal device within a preset time period, and the messages in the second message set are of a preset type;
the dividing module is used for dividing the second message set into at least one third message set according to the similarity among the messages in the second message set, wherein the similarity among the messages in the third message set is greater than or equal to a first threshold;
the acquisition module is used for acquiring a first account corresponding to each message in the third message set, wherein the first account is an account used by the terminal device to send the message;
the second determining module is configured to determine, in the first message set, a first message quantity corresponding to each first account, a second message quantity corresponding to the first account in the third message set, and a third message quantity included in the third message set, and determine, in the account set, a target account, where the account set includes accounts corresponding to messages in the second message set.
In a possible implementation, the dividing module is specifically configured to:
acquiring the similarity between the messages in the second message set;
and clustering the messages in the second message set according to the similarity so as to divide the second message set into at least one third message set.
In a possible implementation, the dividing module is specifically configured to:
acquiring character attributes of each message in the second message set, wherein the character attributes comprise at least one of the following: a number of characters included in the message or a stroke number of characters included in the message;
and determining the similarity among the messages in the second message set according to the character attributes.
In a possible implementation, the dividing module is specifically configured to:
determining the editing distance between the messages in the second message set according to the character attributes;
and determining the similarity between the messages in the second message set according to the editing distance.
In a possible implementation manner, the second determining module is specifically configured to:
determining a second threshold value according to the first message quantity and the third message quantity;
and determining a target account in the account set according to the second message quantity and the second threshold value.
In a possible implementation manner, the second determining module is specifically configured to:
acquiring a first preset relation, wherein the first preset relation comprises a plurality of message quantities and a proportional coefficient corresponding to each message quantity;
determining a proportionality coefficient according to the first message quantity and the first preset relation;
and determining a second threshold value according to the first message quantity, the proportionality coefficient and the third message quantity.
In a possible implementation manner, the second determining module is specifically configured to:
judging whether the second message quantity is larger than a second threshold value;
if yes, determining the first account as the target account in the account set.
In a third aspect, an embodiment of the present application provides a message processing apparatus, including: a memory for storing program instructions, a processor for invoking the program instructions in the memory to perform a message processing method according to any one of the first aspect, and a communication interface.
In a fourth aspect, an embodiment of the present application provides a readable storage medium, on which a computer program is stored; the computer program is for implementing a message processing method as claimed in any one of the first aspect.
The embodiments of the application provide a message processing method, device and equipment. A second message set is determined in a first message set, where the first message set comprises messages received by at least one terminal device within a preset time period, and the messages in the second message set are of a preset type. The second message set is divided into at least one third message set according to the similarity between the messages in the second message set, where the similarity between the messages in the third message set is greater than or equal to a first threshold value. A first account corresponding to each message in the third message set is acquired, where the first account is the account used by the terminal device that sent the message. A first message quantity corresponding to each first account in the first message set, a second message quantity corresponding to the first account in the third message set, and a third message quantity included in the third message set are determined, and a target account is determined in the account set, where the account set includes accounts corresponding to messages in the second message set. In this method, messages with similar text content can be accurately divided into the same third message set according to the similarity among the messages in the second message set, and whether the first account is the target account can be accurately determined according to the number of messages sent by the first account in the first message set, the number of messages included in the third message set, and the number of messages sent by the first account in the third message set. This improves the precision with which an operator identifies mobile phone numbers that send spam messages.
Drawings
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application;
Fig. 2 is a schematic flowchart of a message processing method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a process of obtaining a third message set according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of another message processing method according to an embodiment of the present application;
Fig. 5 is a schematic process diagram of a message processing method according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a message processing apparatus according to an embodiment of the present application;
Fig. 7 is a schematic hardware structure diagram of a message processing device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For ease of understanding, an application scenario of the embodiment of the present application is first described.
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application. Referring to Fig. 1, the scenario includes a mobile phone A, a mobile phone B and an operator. Mobile phone A can send a short message to mobile phone B, mobile phone B can receive the short message sent by mobile phone A, and both mobile phone A and mobile phone B can exchange data with the operator.
When mobile phone A sends a spam message to mobile phone B, the operator may obtain the sender of the spam message received by mobile phone B. For example, the operator may determine the sender of the spam message according to the mobile phone number corresponding to the spam message. When the operator determines that mobile phone A is the sender of the spam message, the operator can block the number of mobile phone A, so that mobile phone A can no longer send spam messages to mobile phone B.
In the related art, the sender of spam messages generally transforms the text content of spam messages on the premise of not affecting reading, and the number of spam messages sent by using the same mobile phone number in a period of time is smaller than the upper limit of spam message sending set by an operator, so that the operator cannot identify the mobile phone number sending spam messages in a short time, and the identification precision of the operator on the mobile phone number sending spam messages is low.
In order to solve the technical problem that an operator has low recognition accuracy of mobile phone numbers for sending spam messages in the related art, in the embodiment of the application, a first spam message set in all messages sent by a terminal device within a period of time is determined, and the spam messages are divided into second spam message sets of different types according to the similarity between the spam messages in the first spam message set. And determining the mobile phone number corresponding to the spam messages in the second spam message set, and determining whether the mobile phone number is the mobile phone number for sending the spam messages or not according to the number of all messages sent by the mobile phone number in a period of time, the number of the sent spam messages and the total number of the spam messages in the second spam message set. In the method, according to the similarity among the spam messages in the spam message set, the spam messages with similar text contents can be accurately divided into the same second spam message set, if a plurality of spam messages in the same second spam message set are spam messages sent by the same mobile phone number, the mobile phone number is the sending number of the spam messages, and the mobile phone number can be blocked, so that the identification precision of an operator on the mobile phone number sending the spam messages can be improved.
The technical solutions shown in the examples of the present application will be described in detail below with specific examples. It should be noted that the following embodiments may exist alone or in combination with each other, and description of the same or similar contents is not repeated in different embodiments.
Fig. 2 is a flowchart illustrating a message processing method according to an embodiment of the present application. Referring to fig. 2, the method may include:
S201, determining a second message set in the first message set.
The execution subject of the embodiment of the present application may be a server, or may be a message processing apparatus provided in the server. Alternatively, the message processing apparatus may be implemented by software, or may be implemented by a combination of software and hardware.
The first message set comprises messages received by at least one terminal device within a preset time period. The terminal device may be any device having a message receiving function. For example, the terminal device may be a mobile phone, a computer, or the like. For example, when the terminal device is a mobile phone, the first message set may be a set of messages, such as short messages and WeChat messages, received by at least one mobile phone within a preset time period.
The messages in the second message set are of a preset type. Optionally, the preset type may be spam. A spam message is a message that is sent to a user without the user's consent and that the user is unwilling to receive. For example, spam messages may include spam short messages, spam emails, and the like.
Optionally, the second message set may be a set of spam short messages. For example, when the first message set consists of short messages received by at least one mobile phone within a preset time period, the spam short messages in the first message set form the second message set. Optionally, the spam messages may also include spam WeChat messages, spam notification messages, and the like received by the mobile phone.
Optionally, the second message set may be a set of spam emails. For example, when the first message set consists of emails received by at least one mailbox within a preset time period, the spam emails in the first message set form the second message set.
Optionally, the second message set may be determined in the first message set in the following feasible manner: keywords of the messages in the second message set are determined, and the second message set is determined in the first message set according to the keywords. For example, when the second message set is a spam message set, keywords of spam messages are determined, and messages matching those keywords are screened out of the first message set by a keyword filtering algorithm to form the second message set.
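A keyword filter of this kind might look like the following Python sketch. The function name `filter_by_keywords`, the sample messages, and the keyword list are hypothetical, and case-insensitive substring matching is an assumption; the application does not specify a particular keyword filtering algorithm.

```python
def filter_by_keywords(first_message_set, keywords):
    """Return the messages whose text contains at least one keyword.

    Matching is case-insensitive substring matching (an assumption).
    """
    lowered = [k.lower() for k in keywords]
    return [m for m in first_message_set
            if any(k in m.lower() for k in lowered)]

# Hypothetical first message set and spam keywords.
first_message_set = [
    "Win a FREE prize now, reply YES",
    "Meeting moved to 3 pm",
    "Cheap loans approved instantly",
]
spam_keywords = ["free prize", "cheap loans"]
second_message_set = filter_by_keywords(first_message_set, spam_keywords)
print(second_message_set)
```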
Alternatively, the second message set may be determined in the first message set according to a machine learning algorithm. For example, when the second message set is a spam message set, a model can be trained on sample spam messages and sample normal messages, spam messages can be identified with the trained model, and the second message set is thereby determined in the first message set.
S202, dividing the second message set into at least one third message set according to the similarity among the messages in the second message set.
The third message set is a preset type of message set. For example, when the preset type is a spam message, the second message set is a spam message set, and the third message set divided according to the second message set is also a spam message set.
Optionally, the similarity between the messages in the third message set is greater than or equal to the first threshold. For example, when the messages in the third message set are spam messages, the similarity of text content between spam messages is greater than or equal to the first threshold.
The second message set may be divided into at least one third message set in the following feasible manner. First, the similarity between the messages in the second message set is acquired, where the similarity refers to the degree to which the text content of the messages is alike. For example, if the text contents of short message A and short message B are the same, the similarity between short message A and short message B is one hundred percent. Then, the messages in the second message set are clustered according to the similarity, so that the second message set is divided into at least one third message set.
Optionally, the similarity between the messages in the second message set may be obtained according to the following feasible manner: acquiring character attributes of each message in the second message set, wherein the character attributes comprise at least one of the following: the number of characters included in the message or the number of strokes of the characters included in the message.
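As a rough illustration, the two character attributes could be computed as in the Python sketch below. The `STROKES` table is a deliberately tiny, hypothetical lookup covering only a few characters, and treating unknown characters as having zero strokes is an assumption; a practical system would need a complete stroke dictionary.

```python
# Hypothetical, deliberately tiny stroke table for illustration only.
STROKES = {"一": 1, "二": 2, "三": 3, "十": 2, "人": 2, "大": 3}

def character_attributes(text):
    """Return (number of characters, total stroke count) for a message text.

    Characters missing from the table contribute 0 strokes (an assumption).
    """
    return len(text), sum(STROKES.get(ch, 0) for ch in text)

attrs = character_attributes("一二三")
print(attrs)
```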
The similarity between the messages in the second message set is then determined according to the character attributes. For example, the edit distance between the messages in the second message set is determined according to the character attributes, and the similarity between the messages is determined according to the edit distance. The edit distance is the minimum number of editing operations required to convert one text into another text. For example, the editing operations may include replacing a character, inserting a character, deleting a character, and the like. Optionally, the fewer editing operations required to change one text into another text, the greater the similarity between the two texts. For example, if the number of editing operations required to edit the text "one" into the text "two" is one, the similarity between the text "one" and the text "two" is high, and if the number of editing operations required to edit the text "one" into the text "delete" is six, the similarity between the text "one" and the text "delete" is low. As another example, if the number of strokes of the text content in spam message A is 100 and the number of strokes of the text content in spam message B is 100, the similarity between spam message A and spam message B is one hundred percent, and spam message A and spam message B are in the same third message set.
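The edit distance described here corresponds to the standard Levenshtein distance, which can be sketched in Python as follows. The `similarity` function normalizes the distance by the length of the longer text; that normalization is an assumption for illustration, since the application does not state how the distance is converted into a similarity.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute ca with cb
        previous = current
    return previous[-1]

def similarity(a, b):
    """Map the distance to [0, 1]; 1.0 means identical texts.
    Dividing by the longer length is an assumed normalization."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

print(edit_distance("kitten", "sitting"))
print(similarity("same text", "same text"))
```

Identical texts have distance 0 and therefore similarity 1.0, which matches the one-hundred-percent case described for short message A and short message B.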
Optionally, according to the similarity between the messages in the second message set, a Mean-Shift clustering algorithm may be used to cluster the messages in the second message set to obtain at least one third message set. Since the number of message types cannot be predicted, the sliding window radius of the Mean-Shift clustering algorithm is expressed in units of Levenshtein distance, which allows the variation amplitude within a message type to be controlled. For example, when a message varies by more than the sliding window radius, it is considered to be a message of another type.
Next, a process of dividing the second message set into at least one third message set will be described in detail with reference to fig. 3.
Fig. 3 is a schematic process diagram of obtaining a third message set according to an embodiment of the present application. Please refer to fig. 3, which includes a second message set, a third message set a, a third message set B and a third message set C. The second message set includes 8 messages (each circle represents one message), the third message set a includes 3 messages, the third message set B includes 2 messages, and the third message set C includes 4 messages.
Referring to Fig. 3, the character attributes of the messages in the second message set are obtained, and the messages in the second message set are divided by similarity according to their character attributes. The higher the similarity between two messages, the closer they are to each other; the lower the similarity, the farther apart they are.
Clustering the messages after the similarity division to obtain a third message set A, a third message set B and a third message set C, wherein the similarity among the messages in the third message set A is greater than or equal to a first threshold, the similarity among the messages in the third message set B is greater than or equal to the first threshold, and the similarity among the messages in the third message set C is greater than or equal to the first threshold.
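The idea of grouping messages whose edit distance stays within a radius can be sketched in Python as below. Note that this is a simplified greedy grouping, not the Mean-Shift algorithm named in the description: each message joins the first group whose representative is within the radius. The function names, sample messages, and the radius of 3 are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def group_by_radius(messages, radius):
    """Greedy grouping (a stand-in for Mean-Shift): a message joins the
    first group whose representative, the group's first message, is within
    `radius` edits; otherwise it starts a new group."""
    groups = []
    for msg in messages:
        for group in groups:
            if edit_distance(msg, group[0]) <= radius:
                group.append(msg)
                break
        else:
            groups.append([msg])
    return groups

# Hypothetical second message set with two families of near-duplicates.
second_message_set = [
    "Win a free prize now",
    "Win a free prise now",
    "W1n a free prize now",
    "Cheap loans approved today",
    "Cheap l0ans approved today",
]
third_message_sets = group_by_radius(second_message_set, radius=3)
for group in third_message_sets:
    print(len(group), group[0])
```

Like Mean-Shift, this grouping does not need the number of groups in advance; the radius alone decides how much variation is tolerated within one group.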
S203, acquiring a first account corresponding to each message in the third message set.
The first account is an account used by the terminal device for sending the message. For example, when the terminal device is a mobile phone, the first account for sending the short message by the mobile phone is a mobile phone number, the first account for sending the WeChat message by the mobile phone is a WeChat number, and the first account for sending the email by the mobile phone is a mailbox account.
Optionally, the first account may be determined according to a corresponding relationship between each message in the third message set and the account. For example, the correspondence between the message and the account may be as shown in table 1:
TABLE 1
Message      Account
Message 1    Account 1
Message 2    Account 2
Message 3    Account 3
……           ……
It should be noted that table 1 illustrates the correspondence between the message and the account by way of example only, and does not limit the correspondence between the message and the account.
For example, when the message in the third message set is message 1, the first account is account 1; when the message in the third message set is message 2, the first account is account 2; and when the message in the third message set is the message 3, the first account is the account 3.
Optionally, when the third message set is a spam short message set, the first account corresponding to each message is the mobile phone number that sent the spam short message; when the third message set is a spam WeChat message set, the first account corresponding to each message is the WeChat account that sent the spam WeChat message; and when the third message set is a spam email set, the first account corresponding to each message is the mailbox account that sent the spam email.
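The lookup of first accounts from a message-to-account correspondence can be sketched in Python as follows. The dictionary `message_to_account` and the function `first_accounts` are hypothetical names; the entries simply mirror the example correspondence in Table 1.

```python
# Hypothetical correspondence between messages and sending accounts,
# mirroring the example rows of Table 1.
message_to_account = {
    "message 1": "account 1",
    "message 2": "account 2",
    "message 3": "account 3",
}

def first_accounts(third_message_set, correspondence):
    """Return the sending account of each message, in message order."""
    return [correspondence[m] for m in third_message_set]

accounts = first_accounts(["message 1", "message 3"], message_to_account)
print(accounts)
```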
S204, determining a first message quantity corresponding to each first account in the first message set, a second message quantity corresponding to the first account in the third message set, and a third message quantity included in the third message set, and determining a target account in the account set.
The first message number is the number of messages sent by the first account in the first message set. For example, 100 messages are included in the first message set, where 10 messages are messages sent by the first account, and the first message number is 10.
Optionally, the number of messages sent by the first account in the first message set may be determined according to the correspondence between each message in the first message set and the account. For example, when the account corresponding to the message in the first message set is the same as the first account, the message is a message sent by the first account in the first message set.
The second message number is the number of messages sent by the first account in the third message set. For example, 10 messages are included in the third message set, where 3 messages are messages sent by the first account, and the second message number is 3.
Optionally, the number of messages sent by the first account in the third message set may be determined according to the corresponding relationship between each message in the third message set and the account. For example, when the account corresponding to the message in the third message set is the same as the first account, the message is a message sent by the first account in the third message set.
The third message number is a number of messages included in the third message set. For example, when the third message set is the spam message set, the total number of spam messages in the third message set is the third message number.
Optionally, the third number of messages may be obtained according to the result of the clustering process. For example, when clustering the messages in the second message set, the number of messages in each of the divided third message sets may be determined.
The account set comprises accounts corresponding to the messages in the second message set. Optionally, the account corresponding to each message in the second message set may form an account set. For example, if the messages in the second message set are spam messages, all the mobile phone numbers corresponding to the spam messages in the second message set are account numbers included in the account number set.
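The three quantities and the account set described above can be sketched as simple counting operations. This is only an illustrative sketch: representing each message as an `(account, text)` tuple and the function names `message_quantities` and `account_set` are assumptions, not part of the application.

```python
from collections import Counter


def message_quantities(first_set, third_set):
    """Return (first_counts, second_counts, third_quantity).

    Each message is modeled as an (account, text) tuple; this layout is an
    assumption made for illustration only.
    """
    # First message quantity: messages sent by each account in the first message set.
    first_counts = Counter(account for account, _ in first_set)
    # Second message quantity: messages sent by each account in the third message set.
    second_counts = Counter(account for account, _ in third_set)
    # Third message quantity: total number of messages in the third message set.
    third_quantity = len(third_set)
    return first_counts, second_counts, third_quantity


def account_set(second_set):
    """Collect the accounts corresponding to the messages in the second message set."""
    return {account for account, _ in second_set}
```

With 100 messages in the first message set, 10 of them from one account, and 3 of that account's messages in a third message set of 10, the sketch yields the quantities 10, 3 and 10 used in the examples above.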
The target account number may be determined in the account number set according to two possible implementations as follows:
One possible implementation is:
A second threshold is determined according to the first message quantity and the third message quantity, and a target account is determined in the account set according to the second message quantity and the second threshold. Optionally, the second threshold may be a preset threshold. If the second message quantity corresponding to the first account in the third message set is greater than or equal to the second threshold, the first account is determined as the target account in the account set.
Optionally, when the messages in the third message set are spam short messages, the first account is a mobile phone number that sends spam short messages. Once the target mobile phone number is determined in the account set, the operator can take measures such as suspending the number for a period of time or blocking it, which reduces the probability that users' mobile phones receive spam short messages and improves the user experience.
Another possible implementation:
A suspicious value of the first account is determined according to the first message quantity corresponding to the first account in the first message set and the second message quantity corresponding to the first account in the third message set. When the suspicious value is greater than or equal to a preset threshold, the first account is determined as a target account in the account set. For example, the suspicious value may be the ratio of the second message quantity to the first message quantity. If the first message quantity corresponding to the first account in the first message set is 100 and the second message quantity corresponding to the first account in the third message set is 30, the suspicious value of the first account is 30/100 = 0.3; if the preset threshold is 0.1, the first account is a target account.
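The second implementation can be sketched as follows. The function names are hypothetical, and the ratio is taken as the second message quantity divided by the first message quantity, which is the reading that matches the worked numbers (100, 30 and 0.3).

```python
def suspicious_value(first_quantity, second_quantity):
    """Share of an account's messages that fall in the third message set."""
    if first_quantity <= 0:
        raise ValueError("first_quantity must be positive")
    return second_quantity / first_quantity


def is_target_by_suspicious_value(first_quantity, second_quantity, preset_threshold):
    """Mark the account as a target when its suspicious value reaches the threshold."""
    return suspicious_value(first_quantity, second_quantity) >= preset_threshold
```

For the example in the text, `is_target_by_suspicious_value(100, 30, 0.1)` evaluates the ratio 0.3 against the preset threshold 0.1 and reports the account as a target.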
The message processing method provided by the embodiment of the application determines the second message set in the first message set, obtains the character attributes of the messages in the second message set, and determines the similarity between the messages in the second message set according to the character attributes. According to the similarity, the messages in the second message set are clustered to divide the second message set into at least one third message set. A first account corresponding to each message in the third message set is acquired, and the first message quantity and the second message quantity corresponding to the first account, as well as the third message quantity included in the third message set, are determined. A second threshold is determined according to the first message quantity and the third message quantity, and a target account is determined in the account set according to the second message quantity and the second threshold. With this method, the similarity between the messages in the second message set can be accurately obtained from their character attributes, and a plurality of third message sets can then be accurately obtained according to the similarity.
Based on the embodiment shown in fig. 2, the following describes the message processing method in detail with reference to fig. 4.
Fig. 4 is a flowchart illustrating another message processing method according to an embodiment of the present application. Referring to fig. 4, the method may include:
S401, determining a second message set in the first message set.
It should be noted that the execution process of S401 may refer to the execution process of S201, and is not described herein again.
S402, dividing the second message set into at least one third message set according to the similarity among the messages in the second message set.
It should be noted that the execution process of S402 may refer to the execution process of S202, and details are not described here.
S403, acquiring a first account corresponding to each message in the third message set.
It should be noted that the execution process of S403 may refer to the execution process of S203, and details are not described here.
S404, determining a first message quantity corresponding to each first account in the first message set, a second message quantity corresponding to the first account in the third message set, and a third message quantity included in the third message set.
It should be noted that the execution process of S404 may refer to the execution process of S204, and is not described herein again.
S405, acquiring a first preset relation, and determining a proportionality coefficient according to the first message quantity and the first preset relation.
The first preset relationship comprises a plurality of message quantities and a proportionality coefficient corresponding to each message quantity. For example, the first preset relationship may be as shown in table 2:
TABLE 2
Number of messages      Proportionality coefficient
Number of messages 1    Proportionality coefficient 1
Number of messages 2    Proportionality coefficient 2
Number of messages 3    Proportionality coefficient 3
……                      ……
It should be noted that table 2 illustrates the first preset relationship by way of example only, and does not limit the first preset relationship.
Optionally, the scaling factor may be determined according to the first message quantity and the first preset relationship. For example, when the first message number corresponding to the first account in the first message set is message number 1, the corresponding scaling factor is scaling factor 1; when the first message quantity corresponding to the first account in the first message set is the message quantity 2, the corresponding proportionality coefficient is the proportionality coefficient 2; when the first message quantity corresponding to the first account in the first message set is the message quantity 3, the corresponding proportionality coefficient is the proportionality coefficient 3.
Optionally, the value of the proportionality coefficient is between 0 and 1.
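The lookup in S405 can be sketched as a small table search. The table values below are invented for illustration, and treating each listed message quantity as the lower bound of a range (so that unlisted quantities still resolve to a coefficient) is an assumption; the application only states that each message quantity has a corresponding proportionality coefficient between 0 and 1.

```python
from bisect import bisect_right

# Hypothetical first preset relation, sorted by message quantity:
# (message quantity, proportionality coefficient).
FIRST_PRESET_RELATION = [(1, 1.0), (10, 0.8), (100, 0.3), (1000, 0.1)]


def proportionality_coefficient(first_quantity, relation=FIRST_PRESET_RELATION):
    """Look up the coefficient for the largest listed quantity <= first_quantity."""
    quantities = [quantity for quantity, _ in relation]
    index = bisect_right(quantities, first_quantity) - 1
    if index < 0:
        raise ValueError("first_quantity is below the smallest listed quantity")
    return relation[index][1]
```

A plain dictionary lookup would be enough if every possible first message quantity were listed explicitly; the range-based fallback is used here only so the sketch handles arbitrary counts.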
S406, determining a second threshold value according to the first message quantity, the proportionality coefficient and the third message quantity.
The second threshold may be determined according to the first message quantity, the proportionality coefficient and the third message quantity. For example, the second threshold is obtained by multiplying the ratio of the first message quantity to the third message quantity by the proportionality coefficient. If the first message quantity corresponding to the first account in the first message set is 100, the third message quantity of the third message set in which the first account is located is 10, and the proportionality coefficient is 0.3, then the second threshold is (100 / 10) × 0.3 = 3.
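The example calculation in S406 can be written as a one-line function. The function name is hypothetical, and the formula is the example given in the text (ratio of the first message quantity to the third message quantity, multiplied by the proportionality coefficient), not necessarily the only way the threshold may be determined.

```python
def second_threshold(first_quantity, third_quantity, coefficient):
    """Second threshold = (first quantity / third quantity) * coefficient."""
    if third_quantity <= 0:
        raise ValueError("third_quantity must be positive")
    return first_quantity / third_quantity * coefficient
```

With the numbers from the text, `second_threshold(100, 10, 0.3)` gives the threshold of 3.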
S407, judging whether the second message quantity is larger than or equal to a second threshold value.
If not, go to step S408.
If yes, S409 is performed.
S408, determining that the first account is not the target account.
S409, determining the first account as a target account in the account set.
Optionally, an account identical to the first account is determined as a target account in the account set.
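Steps S404 to S409 can be put together for one third message set as in the sketch below. The function name, the dictionary-based counts and the `coefficient_for` callable (standing in for the first preset relation) are assumptions made for illustration.

```python
def target_accounts(first_counts, second_counts, third_quantity, coefficient_for):
    """Return the accounts in one third message set that qualify as targets.

    first_counts:    account -> first message quantity (in the first message set)
    second_counts:   account -> second message quantity (in this third message set)
    third_quantity:  number of messages in this third message set
    coefficient_for: callable mapping a first message quantity to a coefficient
    """
    targets = set()
    for account, second_quantity in second_counts.items():
        first_quantity = first_counts[account]
        # S405/S406: coefficient lookup, then the second threshold.
        threshold = first_quantity / third_quantity * coefficient_for(first_quantity)
        # S407: compare the second message quantity with the second threshold.
        if second_quantity >= threshold:
            targets.add(account)  # S409: the account is a target account.
        # Otherwise S408: the account is not a target account.
    return targets
```

Passing the coefficient lookup as a callable keeps the decision logic independent of how the first preset relation is stored.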
The message processing method provided by the embodiment of the application determines a second message set in the first message set, divides the second message set into at least one third message set according to the similarity between the messages in the second message set, acquires a first account corresponding to each message in the third message set, and determines the first message quantity corresponding to each first account in the first message set, the second message quantity corresponding to the first account in the third message set, and the third message quantity included in the third message set. A first preset relation is obtained, a proportionality coefficient is determined according to the first message quantity and the first preset relation, a second threshold is determined according to the first message quantity, the proportionality coefficient and the third message quantity, and the first account is determined as a target account in the account set when the second message quantity is greater than or equal to the second threshold. In this method, the second message set can be accurately divided into a plurality of third message sets according to the similarity among its messages. Because the second threshold is determined from the first message quantity, the proportionality coefficient and the third message quantity, the probability that the operator blocks an account by mistake can be reduced. Meanwhile, when the second message quantity is greater than or equal to the second threshold, the first account has sent a plurality of messages with high similarity, so the first account can be determined as a target account in the account set and then processed, which improves the precision with which the operator identifies mobile phone numbers that send spam short messages.
On the basis of any of the above embodiments, the following describes in detail a message processing method by a specific example with reference to fig. 5.
Fig. 5 is a process diagram of a message processing method according to an embodiment of the present application. Please refer to fig. 5, which includes a first message set, a second message set, a third message set a, and a third message set B, where the first message set includes a plurality of messages, the second message set includes 8 messages of a preset type (a circle in fig. 5 represents one message), the third message set a includes 4 messages of the preset type, the third message set B includes 4 messages of the preset type, and the message 1 is a message in the third message set a. The similarity of character attributes among the messages in the third message set A is larger than or equal to a first threshold, and the similarity of character attributes among the messages in the third message set B is larger than or equal to the first threshold.
Referring to fig. 5, messages conforming to a preset type are screened out from the first message set to form a second message set. Similarity processing is performed on the messages in the second message set according to their character attributes; the higher the similarity between two messages, the smaller the distance between the corresponding circles in fig. 5. The messages in the second message set are then clustered to obtain a third message set A and a third message set B, where the messages in the third message set A have low similarity with the messages in the third message set B.
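The division of the second message set into third message sets can be illustrated with a simple greedy grouping. This is a sketch under stated assumptions: the application does not prescribe this particular clustering procedure here, and both `cluster_messages` and the toy `char_jaccard` similarity are hypothetical stand-ins. The grouping rule is chosen so that every pair of messages in a resulting set has similarity greater than or equal to the first threshold.

```python
def char_jaccard(a, b):
    """Toy similarity: Jaccard overlap of the character sets of two messages."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def cluster_messages(messages, similarity, first_threshold):
    """Greedily group messages so all pairs in a group meet the first threshold."""
    clusters = []
    for message in messages:
        for cluster in clusters:
            # Join the first group whose every member is similar enough.
            if all(similarity(message, member) >= first_threshold for member in cluster):
                cluster.append(message)
                break
        else:
            # No suitable group exists, so the message starts a new one.
            clusters.append([message])
    return clusters
```

Given two near-duplicate promotional texts and two near-duplicate meeting reminders, the sketch separates them into two groups, analogous to the third message sets A and B in fig. 5.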
A first account corresponding to message 1 in the third message set A is determined, and the first message quantity corresponding to the first account in the first message set is determined. The number of messages in the third message set A is 4. The proportionality coefficient is determined according to the first preset relation and the first message quantity, and the second threshold can be determined according to the first message quantity, the proportionality coefficient and the number of messages in the third message set A. The second message quantity corresponding to the first account in the third message set A is then determined, and the first account is determined as a target account in the account set when the second message quantity is greater than or equal to the second threshold.
When the number of messages corresponding to the first account in the third message set a is greater than or equal to the second threshold, it is indicated that the first account sends a plurality of messages of preset types with similar character attributes, and the first account can be determined as a target account in the account set, so as to process the target account, thereby improving the identification precision of an operator on a mobile phone number for sending spam messages.
Fig. 6 is a schematic structural diagram of a message processing apparatus according to an embodiment of the present application. The message processing apparatus may be provided in a terminal device. Referring to fig. 6, the message processing apparatus 10 includes: a first determining module 11, a dividing module 12, an obtaining module 13 and a second determining module 14, wherein:
the first determining module 11 is configured to determine a second message set in a first message set, where the first message set includes messages received by at least one terminal device within a preset time period, and the messages in the second message set are of a preset type;
the dividing module 12 is configured to divide the second message set into at least one third message set according to the similarity between the messages in the second message set, where the similarity between the messages in the third message set is greater than or equal to a first threshold;
the obtaining module 13 is configured to obtain a first account corresponding to each message in the third message set, where the first account is an account used by the terminal device to send the message;
the second determining module 14 is configured to determine, in the first message set, a first message quantity corresponding to each first account, a second message quantity corresponding to the first account in the third message set, and a third message quantity included in the third message set, and determine, in an account set, a target account, where the account set includes accounts corresponding to messages in the second message set.
In a possible implementation, the dividing module 12 is specifically configured to:
acquiring the similarity between the messages in the second message set;
and clustering the messages in the second message set according to the similarity so as to divide the second message set into at least one third message set.
In a possible implementation, the dividing module 12 is specifically configured to:
acquiring character attributes of each message in the second message set, wherein the character attributes comprise at least one of the following: a number of characters included in the message or a stroke number of characters included in the message;
and determining the similarity among the messages in the second message set according to the character attributes.
In a possible implementation, the dividing module 12 is specifically configured to:
determining the editing distance between the messages in the second message set according to the character attributes;
and determining the similarity between the messages in the second message set according to the editing distance.
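Deriving a similarity from an editing distance can be sketched as follows. The sketch computes the classic Levenshtein distance over raw message text and normalizes it by the longer length; both choices are assumptions for illustration, since the application derives the editing distance from character attributes (such as the number of characters or stroke numbers) and does not fix a normalization here.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, using row-by-row dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Map the editing distance to a similarity in [0, 1]; 1 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

Because `edit_distance` works on any sequences, the same function could be applied to a list of per-character stroke numbers instead of a string.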
In a possible implementation, the second determining module 14 is specifically configured to:
determining a second threshold value according to the first message quantity and the third message quantity;
and determining a target account in the account set according to the second message quantity and the second threshold value.
In a possible implementation, the second determining module 14 is specifically configured to:
acquiring a first preset relation, wherein the first preset relation comprises a plurality of message quantities and a proportional coefficient corresponding to each message quantity;
determining a proportionality coefficient according to the first message quantity and the first preset relation;
and determining a second threshold value according to the first message quantity, the proportionality coefficient and the third message quantity.
In a possible implementation, the second determining module 14 is specifically configured to:
judging whether the second message quantity is larger than a second threshold value;
if yes, determining the first account as the target account in the account set.
The message processing apparatus provided in the embodiment of the present application may execute the technical solutions shown in the foregoing method embodiments, and the implementation principles and beneficial effects thereof are similar, and are not described herein again.
Fig. 7 is a schematic hardware structure diagram of a message processing device according to an embodiment of the present application. Referring to fig. 7, the message processing apparatus 20 may include: a processor 21 and a memory 22, wherein the processor 21 and the memory 22 may communicate; illustratively, the processor 21 and the memory 22 communicate via a communication bus 23, the memory 22 being configured to store program instructions, and the processor 21 being configured to call the program instructions in the memory to perform the message processing method shown in any of the above-described method embodiments.
Optionally, the message processing device 20 may further comprise a communication interface, which may comprise a transmitter and/or a receiver.
Optionally, the Processor may be a Central Processing Unit (CPU), or may be another general-purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
An embodiment of the present application provides a readable storage medium, on which a computer program is stored; the computer program is for implementing a message processing method as described in any of the embodiments above.
The embodiment of the application provides a computer program product, which comprises instructions, and when the instructions are executed, the instructions cause a computer to execute the message processing method.
All or a portion of the steps of the above-described method embodiments may be performed by hardware associated with program instructions. The aforementioned program may be stored in a readable memory. When executed, the program performs the steps of the method embodiments described above. The aforementioned memory (storage medium) includes: read-only memory (ROM), RAM, flash memory, hard disk, solid state disk, magnetic tape, floppy disk, optical disk, and any combination thereof.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to encompass such modifications and variations.
In the present application, the term "include" and variations thereof may refer to non-limiting inclusions; the term "or" and variations thereof may mean "and/or". The terms "first," "second," and the like in this application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. In the present application, "a plurality" means two or more. "And/or" describes the association relationship of the associated objects, meaning that there may be three relationships; e.g., A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.

Claims (9)

1. A message processing method, comprising:
determining a second message set in a first message set, wherein the first message set comprises messages received by at least one terminal device within a preset time period, and the messages in the second message set are of preset types;
dividing the second message set into at least one third message set according to the similarity among the messages in the second message set, wherein the similarity among the messages in the third message set is greater than or equal to a first threshold value;
acquiring a first account corresponding to each message in the third message set, wherein the first account is an account used by terminal equipment for sending the message;
determining a first message quantity corresponding to each first account in the first message set, a second message quantity corresponding to the first account in the third message set, and a third message quantity included in the third message set, and determining a target account in the account set, wherein the account set includes accounts corresponding to messages in the second message set;
the determining, in the first message set, a first message quantity corresponding to each first account, a second message quantity corresponding to the first account in the third message set, and a third message quantity included in the third message set, and determining, in an account set, a target account includes:
determining a second threshold value according to the first message quantity and the third message quantity; determining a target account in an account set according to the second message quantity and the second threshold value;
or, determining a suspicious value of the first account according to a first message quantity corresponding to the first account in the first message set and a second message quantity corresponding to the first account in the third message set; when the suspicious value is larger than or equal to a preset threshold value, determining a first account as a target account in the account set; the suspicious value is a ratio of the first message quantity and the second message quantity.
2. The method of claim 1, wherein dividing the second set of messages into at least one third set of messages according to similarities between the messages in the second set of messages comprises:
acquiring the similarity between the messages in the second message set;
and clustering the messages in the second message set according to the similarity so as to divide the second message set into at least one third message set.
3. The method of claim 2, wherein obtaining the similarity between the messages in the second set of messages comprises:
acquiring character attributes of each message in the second message set, wherein the character attributes comprise at least one of the following: a number of characters included in the message or a stroke number of characters included in the message;
and determining the similarity among the messages in the second message set according to the character attributes.
4. The method of claim 3, wherein determining the similarity between the messages in the second set of messages according to the character attributes comprises:
determining the editing distance between the messages in the second message set according to the character attributes;
and determining the similarity between the messages in the second message set according to the editing distance.
5. The method of claim 1, wherein determining a second threshold according to the first message quantity and the third message quantity comprises:
acquiring a first preset relation, wherein the first preset relation comprises a plurality of message quantities and a proportional coefficient corresponding to each message quantity;
determining a proportionality coefficient according to the first message quantity and the first preset relation;
and determining a second threshold value according to the first message quantity, the proportionality coefficient and the third message quantity.
6. The method of claim 1, wherein determining a target account number in the set of account numbers according to the second number of messages and the second threshold comprises:
judging whether the second message quantity is larger than a second threshold value;
if yes, determining the first account as the target account in the account set.
7. A message processing apparatus, comprising a first determining module, a dividing module, an obtaining module, and a second determining module, wherein:
the first determining module is configured to determine a second message set in a first message set, where the first message set includes messages received by at least one terminal device within a preset time period, and the messages in the second message set are of a preset type;
the dividing module is used for dividing the second message set into at least one third message set according to the similarity among the messages in the second message set, wherein the similarity among the messages in the third message set is greater than or equal to a first threshold;
the acquisition module is used for acquiring a first account corresponding to each message in the third message set, wherein the first account is an account used by the terminal device to send the message;
the second determining module is configured to determine, in the first message set, a first message quantity corresponding to each first account, a second message quantity corresponding to the first account in the third message set, and a third message quantity included in the third message set, and determine, in an account set, a target account, where the account set includes accounts corresponding to messages in the second message set;
the second determining module is specifically configured to:
determining a second threshold value according to the first message quantity and the third message quantity; determining a target account number in the account number set according to the second message quantity and the second threshold value;
The second determining module is further specifically configured to:
determining a suspicious value of the first account according to a first message quantity corresponding to the first account in the first message set and a second message quantity corresponding to the first account in the third message set; when the suspicious value is larger than or equal to a preset threshold value, determining a first account as a target account in the account set; the suspicious value is a ratio of the first message quantity and the second message quantity.
8. A message processing device, comprising: a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program stored in the memory to implement the message processing method according to any one of claims 1 to 6.
9. A readable storage medium, on which a device control program is stored, which, when executed by a processor, implements the message processing method according to any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011338960.0A CN112492534B (en) 2020-11-25 2020-11-25 Message processing method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011338960.0A CN112492534B (en) 2020-11-25 2020-11-25 Message processing method, device and equipment

Publications (2)

Publication Number Publication Date
CN112492534A CN112492534A (en) 2021-03-12
CN112492534B true CN112492534B (en) 2022-04-15

Family

ID=74935003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011338960.0A Active CN112492534B (en) 2020-11-25 2020-11-25 Message processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN112492534B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012079452A1 (en) * 2010-12-15 2012-06-21 成都市华为赛门铁克科技有限公司 Method, device and terminal for classifying short messages
CN105323763A (en) * 2014-06-27 2016-02-10 中国移动通信集团湖南有限公司 Method and apparatus for identifying spam messages
CN105447028A (en) * 2014-08-27 2016-03-30 阿里巴巴集团控股有限公司 Method and device for identifying characteristic account
CN110119860A (en) * 2018-02-05 2019-08-13 阿里巴巴集团控股有限公司 A kind of rubbish account detection method, device and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8868663B2 (en) * 2008-09-19 2014-10-21 Yahoo! Inc. Detection of outbound sending of spam

Also Published As

Publication number Publication date
CN112492534A (en) 2021-03-12

Similar Documents

Publication Publication Date Title
US11671434B2 (en) Abnormal user identification
CN108337153B (en) Method, system and device for monitoring mails
CN108256721B (en) Task scheduling method, terminal device and medium
CN104717674A (en) Number attribute recognition method and device, terminal and server
CN106156105A (en) Email polymerization sorting technique and device
CN103812826A (en) Identification method, identification system, and filter system of spam mail
CN112733639A (en) Text information structured extraction method and device
CN110011898B (en) Reply method and device of e-mail, storage medium and computer equipment
CN107241505B (en) Incoming call reminding method and device and user terminal thereof
CN113408281A (en) Mailbox account abnormity detection method and device, electronic equipment and storage medium
CN114448922A (en) Message grading processing method, device, equipment and storage medium
CN112492534B (en) Message processing method, device and equipment
CN109889432B (en) Information processing method, information processing apparatus, computer apparatus, and computer-readable storage medium
CN108462624A (en) A kind of recognition methods of spam, device and electronic equipment
CN109800432A (en) Assess method, apparatus, storage medium and the electronic equipment of semantic understanding accuracy rate
CN104065617B (en) A kind of harassing and wrecking email processing method, device and system
CN104348712B (en) A kind of rubbish mail filtering method and device
Sharma et al. Identifying spam patterns in sms using genetic programming approach
CN107770738B (en) Method and user terminal for realizing automatic short message classification
CN109547336A (en) Acquisition methods, device and the storage medium of message reading state
CN115130577A (en) Method and device for identifying fraudulent number and electronic equipment
CN112307075B (en) User relationship identification method and device
CN113554062A (en) Training method, device and storage medium of multi-classification model
CN108512744B (en) Advertisement short message identification method, electronic device, terminal equipment and storage medium
CN111626881A (en) Annuity combination risk management system, method, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant