CN111431791B - Instant communication message identification method and system - Google Patents

Instant communication message identification method and system

Publication number
CN111431791B
Authority
CN
China
Prior art keywords
message
layer
neural network
convolutional
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010082692.4A
Other languages
Chinese (zh)
Other versions
CN111431791A (en
Inventor
张鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Seashell Housing Beijing Technology Co Ltd
Original Assignee
Seashell Housing Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Seashell Housing Beijing Technology Co Ltd filed Critical Seashell Housing Beijing Technology Co Ltd
Priority to CN202010082692.4A priority Critical patent/CN111431791B/en
Publication of CN111431791A publication Critical patent/CN111431791A/en
Application granted granted Critical
Publication of CN111431791B publication Critical patent/CN111431791B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21Monitoring or handling of messages
    • H04L51/212Monitoring or handling of messages using filtering or selective blocking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/04Real-time or near real-time messaging, e.g. instant messaging [IM]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiment of the invention provides an instant messaging message identification method and system. The method comprises: first obtaining an IM message; then inputting the IM message into a neural network model and determining the category of the IM message based on the neural network model; and finally determining whether to block the IM message based on its category. The embodiment of the invention introduces a neural network model to identify the category of the IM message, and extracts the local feature information and the global feature information of the IM message through the convolutional layer group with multi-scale convolution kernels and the Bi-GRU neural network layer, respectively, so that the category determined by the neural network model is more accurate.

Description

Instant communication message identification method and system
Technical Field
The present invention relates to the field of information identification technologies, and in particular, to a method and a system for identifying an instant messaging message.
Background
At present, information has penetrated every corner of daily life, and users receive more and more of it, for example the instant messaging messages delivered by common instant messaging software such as QQ, SMS and WeChat, and by commercial instant messaging software.
Some abnormal users (such as WeChat users, black-market users, and other malicious users) send spam messages such as advertisements or harassment through instant messaging (IM) software, which greatly disturbs the receiving users. For example, for a house broker handling house entrustments, regularly receiving spam messages interferes with normal work, and the spam messages may well induce the broker to cheat, violate rules, leak internal data, and so on. In addition, because house brokers generally ignore spam messages outright and neither reply to nor process them, performance indicators of the broker (such as the one-minute response rate and the business opportunity conversion rate) become difficult to assess. It is therefore important to identify whether an instant messaging message is a spam message and to intercept those that are.
In the prior art, instant messaging messages are generally identified by one of the following three methods: 1) based on spam message samples stored in a sample library, using a distance formula or the simhash algorithm to determine the degree of difference between the instant messaging message and the stored samples, and identifying whether the message is spam according to that degree of difference; 2) determining whether the instant messaging message is spam by simple fuzzy matching of sensitive words; 3) using methods such as online behavior feature analysis to directly judge the messages sent by a user who frequently triggers the message sending action as spam and intercept them.
All of these prior-art methods suffer from inaccurate identification. Method 1) depends entirely on the spam samples stored in the sample library; since the samples cannot exhaust all spam messages, spam that is not in the sample library cannot be identified. Method 2) cannot match every sensitive word, so sensitive words that are not on the list go unrecognized. For method 3), frequently triggering the message sending action does not mean that spam is being sent, so directly judging such messages as spam carries a risk of misidentification. For example, an owner in a house entrustment who actively promotes a house online behaves much like a WeChat user or black-market user posting advertisements in bulk, and may therefore be intercepted by mistake.
Therefore, it is urgently needed to provide an instant messaging message identification method and system.
Disclosure of Invention
To overcome the above problems or at least partially solve the above problems, embodiments of the present invention provide an instant messaging message identification method and system.
In a first aspect, an embodiment of the present invention provides an instant messaging message identification method, including:
acquiring an Instant Messaging (IM) message;
inputting the IM message into a neural network model, and determining the class of the IM message based on the neural network model;
determining whether to block the IM message based on the class of the IM message;
the neural network model is constructed on the basis of an embedded layer, a convolutional layer group with multi-scale convolution kernels and a bidirectional gated recurrent unit (Bi-GRU) neural network layer, wherein each convolutional layer with a convolution kernel of a different scale in the convolutional layer group and the Bi-GRU neural network layer are each connected with the embedded layer; the neural network model is obtained by training based on IM message samples in a sample library and the categories of the IM message samples.
Preferably, the convolutional layers with convolution kernels of different scales in the convolutional layer group include a target convolutional layer, and the branch on which the target convolutional layer is located further includes a first convolutional layer whose convolution kernel scale differs from that of the target convolutional layer.
Preferably, the neural network model further comprises: a plurality of smoothing layers;
the branch on which the first convolutional layer is located, and each branch on which a convolutional layer of the group other than the target convolutional layer is located, each include one of the smoothing layers.
Preferably, the neural network model further comprises: a plurality of first pooling layers;
a first pooling layer is connected between the target convolutional layer and the first convolutional layer, and between each convolutional layer of the group other than the target convolutional layer and its corresponding smoothing layer.
Preferably, the neural network model further comprises: a splicing layer; and the Bi-GRU neural network layer and all the smoothing layers are each connected with the splicing layer.
Preferably, the neural network model further comprises: a second convolutional layer, a second pooling layer, and an output layer; the splicing layer, the second convolutional layer, the second pooling layer and the output layer are connected in sequence.
Preferably, the determining whether to block the IM message based on the category to which the IM message belongs specifically includes:
if the type of the IM message is judged to be abnormal, sending reminding information to a receiving end of the IM message;
and if the reply information corresponding to the reminding information is received, determining whether to block the IM message or not based on the reply information.
In a second aspect, an embodiment of the present invention provides an instant messaging message identification system, including: an IM message acquisition module, a category identification module and a blocking module, wherein:
the IM message acquisition module is used for acquiring an instant messaging (IM) message;
the category identification module is used for inputting the IM message into a neural network model and determining the category of the IM message based on the neural network model;
the blocking module is used for determining whether to block the IM message based on the category of the IM message;
the neural network model is constructed on the basis of an embedded layer, a convolutional layer group with multi-scale convolution kernels and a bidirectional gated recurrent unit (Bi-GRU) neural network layer, wherein each convolutional layer with a convolution kernel of a different scale in the convolutional layer group and the Bi-GRU neural network layer are each connected with the embedded layer; the neural network model is obtained by training based on IM message samples in a sample library and the categories of the IM message samples.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the instant messaging message identification method according to the first aspect when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the instant messaging message identification method according to the first aspect.
The embodiment of the invention provides a method and a system for identifying instant messaging messages. The method comprises: first obtaining an IM message; then inputting the IM message into a neural network model and determining the category of the IM message based on the neural network model; and finally determining whether to block the IM message based on its category. The embodiment of the invention introduces a neural network model to identify the category of the IM message, and extracts the local feature information and the global feature information of the IM message through the convolutional layer group with multi-scale convolution kernels and the Bi-GRU neural network layer, respectively, so that the category determined by the neural network model is more accurate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a spam message identification method according to an embodiment of the present invention;
Fig. 2 is a schematic partial structural diagram of a neural network model in a spam message identification method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a neural network model in a spam message identification method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a complete flow of a spam message identification method according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a spam message identification system according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the embodiments of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience in describing the embodiments of the present invention and simplifying the description, but do not indicate or imply that the referred devices or elements must have specific orientations, be configured in specific orientations, and operate, and thus, should not be construed as limiting the embodiments of the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the embodiments of the present invention, it should be noted that, unless explicitly stated or limited otherwise, the terms "mounted," "coupled," and "connected" are to be construed broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or an internal communication between two elements. The specific meanings of the above terms in the embodiments of the present invention can be understood by those of ordinary skill in the art according to the specific case.
As shown in fig. 1, an embodiment of the present invention provides an instant messaging message identification method, including:
S1, obtaining an instant messaging (IM) message;
S2, inputting the IM message into a neural network model, and determining the category of the IM message based on the neural network model;
S3, determining whether to block the IM message based on the category of the IM message;
the neural network model is constructed on the basis of an embedded layer, a convolutional layer group with multi-scale convolution kernels and a bidirectional gated recurrent unit (Bi-GRU) neural network layer, wherein each convolutional layer with a convolution kernel of a different scale in the convolutional layer group and the Bi-GRU neural network layer are each connected with the embedded layer; the neural network model is obtained by training based on IM message samples in a sample library and the categories of the IM message samples.
Specifically, in the instant messaging message identification method provided in the embodiment of the present invention, the execution subject is a server, and the server may be a computer server or a cloud server.
Step S1 is performed first. Instant messaging (IM) messages are transmitted between a sending end and a receiving end as an IM real-time message stream: each IM message is edited and sent by the sending end and received by the receiving end. After the receiving end receives the IM message, the server can access the IM real-time message stream through Kafka to acquire the IM message in real time. The acquired IM message may be in text form.
Then, step S2 is executed. The obtained IM message is input into the neural network model, which analyzes the IM message and outputs the probability value that the IM message belongs to each category, i.e. the likelihood that the IM message belongs to that category, from which the category of the IM message is determined. The greater the probability value for a category, the more likely the IM message belongs to it. In the embodiment of the invention, the neural network model can take the category with the maximum probability value as the category of the IM message.
The categories of IM messages can be divided into normal messages and abnormal messages. A normal message is a message whose content is useful, and an abnormal message is a message whose content may be useless or may involve illegal content. Abnormal messages may specifically include spam messages of different categories, such as WeChat, black-market, pornography-related and terrorism-related messages. Accordingly, the neural network model outputs a probability value for each category, for example the probability value that the IM message is a normal message and the probability values that it is a WeChat, black-market, pornography-related or terrorism-related message. The probability value that the IM message is an abnormal message is 1 minus the probability value that it is a normal message, that is, the sum of the probability values of the different abnormal categories.
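The relationship between the per-category probability values, the predicted category and the abnormal-message probability can be sketched as follows. This is a minimal illustration and not the patented implementation: the use of softmax to obtain the probabilities, the category labels and the raw scores are all assumptions introduced here.

```python
import math

# Hypothetical labels: one normal category plus several abnormal ones.
CATEGORIES = ["normal", "wechat", "black_market", "pornography", "terrorism"]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (assumed output activation)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores):
    """Return (predicted category, P(abnormal), per-category probabilities)."""
    probs = softmax(scores)
    by_name = dict(zip(CATEGORIES, probs))
    predicted = max(by_name, key=by_name.get)  # category with the maximum probability
    p_abnormal = 1.0 - by_name["normal"]       # 1 minus P(normal)
    return predicted, p_abnormal, by_name

# Made-up raw scores for a single message.
predicted, p_abnormal, by_name = classify([0.2, 2.5, 0.1, -1.0, -2.0])
```

Because the probabilities sum to 1, `p_abnormal` equals the sum of the probabilities of the abnormal categories, which matches the two equivalent definitions given in the text.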
In the embodiment of the invention, the neural network model is constructed in advance based on an embedded layer (Embedding), a convolutional layer group with multi-scale convolution kernels and a bidirectional gated recurrent unit (Bi-GRU) neural network layer; that is, the neural network model comprises the embedded layer, the convolutional layer group with multi-scale convolution kernels and the Bi-GRU neural network layer.
The embedded layer is connected with the input layer of the neural network model. The input layer feeds the IM message into the neural network model, specifically into the embedded layer. The embedded layer converts the index of each word in the IM message from a positive integer into a dense vector of fixed size, so that every word in the IM message is processed in vector form in the subsequent layers, which greatly reduces the amount of computation the neural network model needs to process the IM message.
It should be noted that, in the embodiment of the present invention, the input layer of the neural network model has a function of adding an index to each word in the IM message, so that the embedded layer can perform vector transformation on indexes of different words in the IM message.
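The indexing and embedding steps can be illustrated with a small lookup-table sketch. Everything named here is hypothetical: whitespace tokenization, the reserved index 0 for padding and unknown words, the fixed message length and the randomly initialised table (which a real model would learn during training) are assumptions, not details given in the patent.

```python
import random

def build_vocab(messages):
    """Assign each distinct word a positive integer index (0 is reserved)."""
    vocab = {}
    for message in messages:
        for word in message.split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab

def to_indexes(message, vocab, length):
    """Map words to indexes (unknown words become 0), then pad or truncate to a fixed length."""
    ids = [vocab.get(word, 0) for word in message.split()][:length]
    return ids + [0] * (length - len(ids))

def make_embedding(vocab_size, dim, seed=0):
    """Create one fixed-size vector per index; random here, learned in practice."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size + 1)]

def embed(ids, table):
    """Replace each index with its vector, giving a length x dim matrix."""
    return [table[i] for i in ids]

vocab = build_vocab(["cheap loans call now", "is the flat still available"])
ids = to_indexes("call now for cheap flat", vocab, length=6)
table = make_embedding(len(vocab), dim=4)
matrix = embed(ids, table)
```

The resulting `matrix` has one row per word position and is the kind of input that the convolutional branches and the Bi-GRU branch would both receive.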
The convolutional layer group comprises a plurality of convolutional layers, each with its own convolution kernel. The kernel scales of different convolutional layers may be the same or different; that is, the convolutional layer group may contain several convolutional layers whose kernels have the same scale as well as several convolutional layers whose kernels have different scales. A multi-scale convolution kernel means that the convolutional layer group contains at least two kernel scales, for example 1 × 1, 2 × 2, 3 × 3, 5 × 5, 7 × 7, etc. Each convolutional layer of the group is used for extracting local feature information of the IM message. The local feature information may be represented by a local feature matrix; that is, the output of each convolutional layer is a local feature matrix. Because convolutional layers with kernels of different scales extract different local feature information, including such layers in the convolutional layer group allows the local feature information of the IM message to be determined more comprehensively.
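To make the idea of multi-scale kernels concrete, the sketch below applies kernels of three scales to the same embedded message. It assumes the convention common in text models, where a kernel of scale k spans k consecutive word vectors across the full embedding width; the patent writes the scales as 1 × 1, 2 × 2 and 3 × 3 and may intend a different layout. The all-ones weights, the ReLU activation and the toy input are illustrative only.

```python
def conv1d(matrix, kernel, bias=0.0):
    """Slide a k x dim kernel over a length x dim matrix; one ReLU activation per window."""
    k = len(kernel)
    out = []
    for start in range(len(matrix) - k + 1):
        total = bias
        for offset in range(k):
            row, weights = matrix[start + offset], kernel[offset]
            total += sum(x * w for x, w in zip(row, weights))
        out.append(max(0.0, total))  # ReLU (assumed activation)
    return out

def multi_scale_features(matrix, kernels):
    """Apply every kernel to the same input, as parallel branches would."""
    return {len(kernel): conv1d(matrix, kernel) for kernel in kernels}

# Toy 5-word message with 2-dimensional embeddings.
message = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 1.0]]
# One illustrative kernel per scale; all-ones weights make each output a window sum.
kernels = [[[1.0, 1.0]] * k for k in (1, 2, 3)]
features = multi_scale_features(message, kernels)
```

Each scale yields a feature sequence of a different length that summarises windows of 1, 2 or 3 words, which is why combining several scales captures local patterns of different extents.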
The Bi-GRU neural network layer comprises two parallel processing channels, a first GRU and a second GRU. The first GRU builds a text semantic representation from the beginning of the IM message to its end, and the second GRU builds one from the end to the beginning. The hidden states of the first GRU and the second GRU are concatenated as the text semantic representation of each position t in the IM message. The output of the Bi-GRU neural network layer at the current moment therefore depends on both the preceding state and the following state, so the contexts before and after each position are considered together; in effect, the Bi-GRU neural network layer extracts the global feature information of the IM message.
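The two-channel processing can be sketched in plain Python as below. This is a simplified illustration under stated assumptions: it uses one common GRU formulation (update gate z, reset gate r, candidate state n), omits bias terms, and uses random weights where a trained model would have learned ones. The function names and dimensions are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def make_gru(input_dim, hidden_dim, rng):
    """Random input and recurrent weights for the z, r and n gates (no biases)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {g: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)) for g in "zrn"}

def gru_step(params, x, h):
    """One GRU update: the gates decide how much of the old state to keep."""
    (wz, uz), (wr, ur), (wn, un) = params["z"], params["r"], params["n"]
    z = [sigmoid(a + b) for a, b in zip(matvec(wz, x), matvec(uz, h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(wr, x), matvec(ur, h))]
    gated = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b) for a, b in zip(matvec(wn, x), matvec(un, gated))]
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def run_gru(params, sequence, hidden_dim):
    """Run one GRU over the sequence and collect the hidden state at every step."""
    h, states = [0.0] * hidden_dim, []
    for x in sequence:
        h = gru_step(params, x, h)
        states.append(h)
    return states

def bi_gru(forward, backward, sequence, hidden_dim):
    """Concatenate forward and backward hidden states at every position t."""
    fwd = run_gru(forward, sequence, hidden_dim)
    bwd = run_gru(backward, sequence[::-1], hidden_dim)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]

rng = random.Random(0)
sequence = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2], [0.3, 0.0, 0.1]]
forward, backward = make_gru(3, 2, rng), make_gru(3, 2, rng)
states = bi_gru(forward, backward, sequence, hidden_dim=2)
```

Each entry of `states` has twice the hidden size: the first half summarises the words up to position t, and the second half summarises the words from position t to the end.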
In the embodiment of the invention, the convolutional layers with convolution kernels of different scales in the convolutional layer group and the Bi-GRU neural network layer are each connected with the embedded layer. For example, suppose the group contains three convolutional layers with kernel scales of 1 × 1, 2 × 2 and 3 × 3 respectively. The three convolutional layers and the Bi-GRU neural network layer are then parallel, each connected with the embedded layer: the Bi-GRU neural network layer extracts global feature information from the output of the embedded layer, and the three convolutional layers extract local feature information from it. By combining the convolutional layer group with the Bi-GRU neural network layer, the neural network model extracts the local and global feature information of the IM message at the same time, which improves the accuracy of the output classification result.
In the embodiment of the invention, the neural network model is trained by specifically adopting the IM message samples in the sample library, and the classes of the IM message samples in the sample library are known.
Finally, step S3 is performed. The categories of IM messages can be divided into normal messages and abnormal messages. If the neural network model outputs a probability value of the IM message being an abnormal message that is larger than a preset threshold, the category of the IM message determined by the neural network model is abnormal. Reminding information can then be sent to the receiving end of the IM message to alert the receiving-end user that the received IM message may be abnormal. Whether the IM message needs to be blocked can be judged in combination with the reply information that the receiving end sends in response to the reminding information. Combining the reply information introduces manual judgment, which makes the blocking action more accurate and reasonable.
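The decision flow of step S3 can be sketched as a small function. The threshold value of 0.5 and the policy of blocking only when the receiver confirms the message as abnormal are assumptions made for illustration; the text only specifies a preset threshold and that the reply information is taken into account.

```python
from dataclasses import dataclass
from typing import Optional

ABNORMAL_THRESHOLD = 0.5  # assumed value; the text only says "preset threshold"

@dataclass
class Decision:
    remind_receiver: bool
    block: bool

def decide(p_abnormal: float, receiver_confirms_abnormal: Optional[bool] = None) -> Decision:
    """Remind the receiver when the message looks abnormal; block only after confirmation."""
    if p_abnormal <= ABNORMAL_THRESHOLD:
        return Decision(remind_receiver=False, block=False)
    # No reply yet (None) or the receiver says the message is fine (False): do not block.
    return Decision(remind_receiver=True, block=receiver_confirms_abnormal is True)
```

For example, `decide(0.9)` requests a reminder without blocking, while `decide(0.9, True)` blocks the message after the receiver's confirmation. Keeping the receiver in the loop is what guards against the false positives described for the prior-art behavior analysis method.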
The instant messaging message identification method provided by the embodiment of the invention comprises: first obtaining an IM message; then inputting the IM message into a neural network model and determining the category of the IM message based on the neural network model; and finally determining whether to block the IM message based on its category. The embodiment of the invention introduces a neural network model to identify the category of the IM message, and extracts the local feature information and the global feature information of the IM message through the convolutional layer group with multi-scale convolution kernels and the Bi-GRU neural network layer, respectively, so that the category determined by the neural network model is more accurate.
On the basis of the foregoing embodiment, in the instant messaging message identification method provided in the embodiment of the present invention, the convolutional layers with convolution kernels of different scales in the convolutional layer group include a target convolutional layer, and the branch on which the target convolutional layer is located further includes a first convolutional layer whose convolution kernel scale differs from that of the target convolutional layer.
Specifically, in the embodiment of the present invention, the target convolutional layer may be any one of the convolutional layers with convolution kernels of different scales in the convolutional layer group. Fig. 2 is a partial structural schematic diagram of the neural network model in an embodiment of the present invention. As shown in Fig. 2, the neural network model includes an embedded layer 20, a convolutional layer group, and a Bi-GRU neural network layer 24, and the convolutional layers with convolution kernels of different scales in the group include a convolutional layer 21, a convolutional layer 22, and a convolutional layer 23, whose kernel scales are 1 × 1, 2 × 2 and 3 × 3 respectively. Taking the convolutional layer 21 as the target convolutional layer, the branch on which the convolutional layer 21 is located further includes a first convolutional layer 25 whose kernel scale differs from that of the convolutional layer 21; together, the convolutional layer 21 and the first convolutional layer 25 can be regarded as one convolutional layer with an additional kernel scale. The kernel scale of the first convolutional layer 25 may be the same as that of the convolutional layer 22 or that of the convolutional layer 23. In this case, the convolutional layer group contains convolutional layers whose kernels have the same scale in addition to those whose kernels have different scales.
On the basis of the above embodiment, in the instant messaging message identification method provided in the embodiment of the present invention, the neural network model further includes: a plurality of smoothing layers;
the branch on which the first convolutional layer is located, and each branch on which a convolutional layer of the group other than the target convolutional layer is located, each include one of the smoothing layers.
Specifically, the neural network model in the embodiment of the present invention further includes a plurality of smoothing layers. As shown in Fig. 2, these are a smoothing layer 29, a smoothing layer 210 and a smoothing layer 211, located on the branches of the first convolutional layer 25, the convolutional layer 22 and the convolutional layer 23 respectively. The smoothing layers 29, 210 and 211 are used to flatten their input data into one dimension.
On the basis of the above embodiment, in the instant messaging message identification method provided in the embodiment of the present invention, the neural network model further includes: a plurality of first pooling layers;
and the first pooling layers are connected between the target convolution layer and the first convolution layer and between convolution layers with convolution kernels of different scales except the target convolution layer in the convolution layer group and the corresponding smooth layers.
Specifically, in the embodiment of the present invention, a pooling layer is connected between two convolution layers that are connected to each other, so as to compress the amount of data and the number of parameters and thereby reduce overfitting. A convolution layer that is not connected to another convolution layer is likewise followed by a pooling layer for the same purpose. As shown in fig. 2, when the convolution kernel scale of the first convolution layer 25 is the same as the convolution kernel scale of the convolution layer 23, the first pooling layer 26 is connected between the convolution layer 21 and the first convolution layer 25, the first pooling layer 27 is connected between the convolution layer 22 and the smoothing layer 210, and the first pooling layer 28 is connected between the convolution layer 23 and the smoothing layer 211.
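The text does not state which pooling operation is used. Assuming max pooling with a window of 2 and a stride of 2 (both values are assumptions for this sketch, as is the function name `max_pool_1d`), the data compression performed by a pooling layer might look like this:

```python
from typing import List


def max_pool_1d(values: List[float], window: int = 2, stride: int = 2) -> List[float]:
    """Keep the largest value of each window; trailing values that do not
    fill a whole window are dropped."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [
        max(values[start:start + window])
        for start in range(0, len(values) - window + 1, stride)
    ]


# Hypothetical output of one convolution layer.
conv_output = [0.2, 0.9, 0.4, 0.1, 0.7, 0.3]
pooled = max_pool_1d(conv_output)
```

Here six values are reduced to three, which halves the amount of data passed to the next layer.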
On the basis of the above embodiment, in the instant messaging message identification method provided in the embodiment of the present invention, the neural network model further includes: a splicing layer;
and the Bi-GRU neural network layer and all the smoothing layers are respectively connected with the splicing layer.
Specifically, in the embodiment of the present invention, as shown in fig. 3, the Bi-GRU neural network layer 24, the smoothing layer 29, the smoothing layer 210, and the smoothing layer 211 are respectively connected to the splicing layer 31, and the splicing layer 31 is configured to splice and fuse the local feature information and the global feature information.
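The splicing and fusing step can be sketched as a simple concatenation of the flattened branch outputs (local features) with the Bi-GRU output (global features). The function name `splice` and all vector values below are hypothetical.

```python
from typing import List, Sequence


def splice(feature_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate feature vectors end to end, preserving their order."""
    spliced: List[float] = []
    for vector in feature_vectors:
        spliced.extend(vector)
    return spliced


# Hypothetical outputs of smoothing layers 29, 210, 211 (local features)
# and of the Bi-GRU neural network layer 24 (global features).
local_features = [[0.1, 0.2], [0.3], [0.4, 0.5]]
global_features = [0.9, 0.8]
fused = splice(local_features + [global_features])
```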
On the basis of the above embodiment, in the instant messaging message identification method provided in the embodiment of the present invention, the neural network model further includes: a second convolutional layer, a second pooling layer, and an output layer;
the splicing layer, the second convolution layer, the second pooling layer and the output layer are connected in sequence.
Specifically, in the embodiment of the present invention, as shown in fig. 3, the splicing layer 31 is sequentially connected to a second convolution layer 32, a second pooling layer 33, and an output layer 34. The second convolution layer 32 is used to extract feature information from the spliced and fused features, the second pooling layer 33 is used to compress the amount of data and the number of parameters so as to reduce overfitting, and the output layer 34 may specifically be a combination of a fully connected layer and softmax, and is used to determine and output the probability values of the IM message belonging to different categories.
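The connections described for fig. 2 and fig. 3 can be summarized as a small directed graph. The sketch below only records the topology as read from the description, for the case where the convolution layer 21 is the target convolutional layer; the node names (reference numerals appended to short labels) and the helper `paths` are assumptions made for illustration, and no layer computation is performed.

```python
from typing import Dict, List

# Each key feeds every layer in its list.
LAYER_GRAPH: Dict[str, List[str]] = {
    "embedding_20": ["conv_21", "conv_22", "conv_23", "bigru_24"],
    "conv_21": ["pool_26"],
    "pool_26": ["first_conv_25"],
    "first_conv_25": ["smooth_29"],
    "conv_22": ["pool_27"],
    "pool_27": ["smooth_210"],
    "conv_23": ["pool_28"],
    "pool_28": ["smooth_211"],
    "smooth_29": ["splice_31"],
    "smooth_210": ["splice_31"],
    "smooth_211": ["splice_31"],
    "bigru_24": ["splice_31"],
    "splice_31": ["second_conv_32"],
    "second_conv_32": ["second_pool_33"],
    "second_pool_33": ["output_34"],
}


def paths(graph: Dict[str, List[str]], start: str, end: str) -> List[List[str]]:
    """Enumerate every path from start to end in an acyclic graph."""
    if start == end:
        return [[end]]
    found: List[List[str]] = []
    for successor in graph.get(start, []):
        for tail in paths(graph, successor, end):
            found.append([start] + tail)
    return found


# Three convolution branches plus the Bi-GRU branch meet at the splicing layer.
branches = paths(LAYER_GRAPH, "embedding_20", "splice_31")
```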
On the basis of the foregoing embodiment, the method for identifying an instant messaging message according to an embodiment of the present invention, where determining whether to block the IM message based on the category to which the IM message belongs specifically includes:
if the type of the IM message is judged to be abnormal, sending reminding information to a receiving end of the IM message;
and if the reply information corresponding to the reminding information is received, determining whether to block the IM message or not based on the reply information.
Specifically, in the embodiment of the present invention, if the output result of the neural network model is that the probability value of the IM message being an abnormal message is greater than a preset threshold, the category of the IM message is determined to be abnormal message. Here, the probability value of the IM message being an abnormal message is the difference between 1 and the probability value of the IM message being a normal message. The specific value of the preset threshold may be set as required, and the preset threshold may be greater than or equal to 50%, for example, 50%, 70%, or 80%. The reminding information sent to the receiving end of the IM message may specifically prompt the receiving end to note that the received IM message may be an abnormal message. For example, an interface may be popped up at the receiving end for the receiving end user to select whether the IM message is a normal message, and reply information corresponding to the reminding information is then returned to the server. The reply information may be confirmation information or denial information: the confirmation information indicates that the IM message is confirmed to be an abnormal message, and the denial information indicates that the IM message is denied to be an abnormal message, that is, the IM message is considered to be a normal message.
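The threshold test described above can be written out as follows. This is a minimal sketch; the function names and the default threshold of 0.5 are assumptions chosen from the example values given in the text.

```python
def abnormal_probability(p_normal: float) -> float:
    """Probability of being an abnormal message: 1 minus the normal probability."""
    if not 0.0 <= p_normal <= 1.0:
        raise ValueError("p_normal must lie in [0, 1]")
    return 1.0 - p_normal


def is_abnormal(p_normal: float, threshold: float = 0.5) -> bool:
    """True when the abnormal probability is strictly greater than the threshold."""
    return abnormal_probability(p_normal) > threshold


# Hypothetical model output: the message is normal with probability 0.01331.
flagged = is_abnormal(0.01331)
```

A higher threshold such as 0.8 flags fewer messages, so fewer reminders are sent to receiving ends.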
Whether the IM message needs to be blocked can be determined in combination with the reply information. The abnormal messages specifically include various categories of spam messages, such as micro-commerce (WeChat business), underground-industry, harassment, pornography-related, politics-related, terrorism-related, and other spam messages. Correspondingly, the neural network model outputs the probability value of the IM message belonging to each category of spam message, for example as shown in Table 1.
Table 1. Probability values of an IM message belonging to different categories
Category               Probability value
Micro-commerce         0.55893
Underground industry   0.00023
Harassment             0.42752
Pornography-related    0.00001
Terrorism-related      0.00000
Normal                 0.01331
If the server receives confirmation information corresponding to the reminding information from the receiving end of the IM message, the server arranges the spam message categories in order of their probability values, specifically in descending order, and sends the arrangement result to the receiving end so that the receiving end user can further feed back the category of the spam message. If the server receives denial information corresponding to the reminding information from the receiving end of the IM message, the server allows the sending end of the IM message to send the IM message to the receiving end normally.
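The arrangement step can be sketched as a descending sort over the spam categories only. The probability values are those of Table 1; the English category labels and the function name `rank_spam_categories` are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def rank_spam_categories(
    probabilities: Dict[str, float], normal_label: str = "normal"
) -> List[Tuple[str, float]]:
    """Drop the normal category and sort the rest by probability, largest first."""
    spam_only = [
        (label, p) for label, p in probabilities.items() if label != normal_label
    ]
    return sorted(spam_only, key=lambda item: item[1], reverse=True)


# Probability values taken from Table 1.
table_1 = {
    "micro-commerce": 0.55893,
    "underground industry": 0.00023,
    "harassment": 0.42752,
    "pornography-related": 0.00001,
    "terrorism-related": 0.00000,
    "normal": 0.01331,
}
ranking = rank_spam_categories(table_1)
```

Presenting the most probable category first lets the receiving end user confirm the category with the fewest steps.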
If feedback information corresponding to the arrangement result is received from the receiving end, the sending end of the IM message is blocked. The feedback information is the spam message category fed back by the receiving end. Regardless of which spam message category is fed back, the server blocks the sending end of the IM message, and the sending end is no longer allowed to send IM messages to the receiving end.
On the basis of identifying the category to which the IM message belongs, the instant messaging message identification method provided by the embodiment of the present invention blocks the sending end of the IM message by combining that category with the reply information and the feedback information from the receiving end. This avoids the erroneous blocking that could occur if the server blocked the sending end directly whenever the neural network model output a probability value of the abnormal message greater than the preset threshold, and makes the blocking action more objective.
Fig. 4 is a schematic diagram of the complete flow of the spam message identification method provided in the embodiment of the present invention. First, an IM message is obtained. Then, the IM message is input into the neural network model, which outputs the probability values of the IM message belonging to different categories. Finally, it is judged whether the probability value of the IM message being an abnormal message is greater than the preset threshold. If so, reminding information is sent to the receiving end of the IM message; otherwise, the sending end of the IM message is allowed to send the IM message to the receiving end normally. If confirmation information corresponding to the reminding information is received from the receiving end, the spam message categories are arranged in order of their probability values, and the arrangement result is sent to the receiving end. If feedback information corresponding to the arrangement result is received from the receiving end, the sending end of the IM message is blocked.
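The flow of fig. 4 can be sketched as a single decision function that returns the server's next action given what it has received so far. This is an illustrative reading of the description, not the patented implementation: the `Action` names, the reply strings `"confirm"` and `"deny"`, and the function `next_action` are all hypothetical.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    DELIVER = "deliver the IM message normally"
    SEND_REMINDER = "send reminding information and wait for a reply"
    SEND_RANKING = "send the arrangement result and wait for feedback"
    BLOCK_SENDER = "block the sending end"


def next_action(
    p_abnormal: float,
    threshold: float = 0.5,
    reply: Optional[str] = None,
    feedback: Optional[str] = None,
) -> Action:
    """Return the server's next step for one IM message."""
    if p_abnormal <= threshold:
        return Action.DELIVER
    if reply is None:
        return Action.SEND_REMINDER
    if reply == "deny":
        return Action.DELIVER
    if reply != "confirm":
        raise ValueError("reply must be 'confirm' or 'deny'")
    if feedback is None:
        return Action.SEND_RANKING
    # Any fed-back spam category leads to blocking the sending end.
    return Action.BLOCK_SENDER


step = next_action(0.98669, reply="confirm", feedback="harassment")
```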
On the basis of the above embodiment, the instant messaging message identification method provided in the embodiment of the present invention further includes:
and if feedback information corresponding to the arrangement result and sent by the receiving end is received, taking the feedback information as the type of the IM message, and updating the sample library based on the IM message and the feedback information.
Specifically, in the embodiment of the present invention, if the server receives the feedback information corresponding to the arrangement result sent by the receiving end, the server uses the feedback information as the belonging category of the IM message, then uses the IM message as the IM message sample, uses the feedback information as the belonging category of the IM message sample, and updates the sample library. The updating method may specifically be: if the IM message exists in the sample library in advance, replacing the type of the IM message existing in the sample library in advance with the feedback information; and if the IM message does not exist in the sample library in advance, storing the IM message into the sample library to increase the sample amount in the sample library.
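The updating method amounts to an insert-or-replace on the sample library. In the sketch below the library is modelled as a dictionary from message text to category; that representation, the function name `update_sample_library`, and the sample entries are assumptions made for illustration.

```python
from typing import Dict


def update_sample_library(
    library: Dict[str, str], message: str, feedback_category: str
) -> bool:
    """Store the fed-back category for the message.

    Returns True when the message is a new sample and False when an
    existing sample's category was replaced.
    """
    is_new = message not in library
    library[message] = feedback_category
    return is_new


# Hypothetical sample library with one pre-existing sample.
sample_library = {"buy cheap goods now": "normal"}
replaced = update_sample_library(sample_library, "buy cheap goods now", "micro-commerce")
added = update_sample_library(sample_library, "click this link", "harassment")
```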
The neural network model can be trained by adopting the updated IM message samples in the sample library and the belonged categories of the IM message samples at intervals of a preset time period, so that the accuracy of the output result of the neural network model is ensured. The preset time period may be 1 month, half year, 1 year, or the like.
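A periodic retraining check might be expressed as follows. The function name `retraining_due` and the default period of 30 days (standing in for "1 month") are assumptions for this sketch.

```python
from datetime import datetime, timedelta


def retraining_due(
    last_trained: datetime,
    now: datetime,
    period: timedelta = timedelta(days=30),
) -> bool:
    """True once at least one full preset period has passed since the last training."""
    return now - last_trained >= period


due = retraining_due(datetime(2020, 1, 1), datetime(2020, 2, 1))
```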
As shown in fig. 5, on the basis of the above embodiment, an instant messaging message identification system provided in the embodiment of the present invention includes: an IM message acquisition module 51, a category identification module 52, and a block module 53, wherein:
the IM message obtaining module 51 is configured to obtain an instant messaging IM message;
the category identification module 52 is configured to input the IM message into a neural network model, and determine a category to which the IM message belongs based on the neural network model;
the block module 53 is configured to determine whether to block the IM message based on the category to which the IM message belongs;
the neural network model is constructed on the basis of an embedded layer, a convolutional layer group with multi-scale convolutional kernels and a bidirectional gated cyclic Bi-GRU neural network layer, wherein each convolutional layer with convolutional kernels of different scales in the convolutional layer group and the Bi-GRU neural network layer are respectively connected with the embedded layer; the neural network model is obtained by training based on IM message samples in a sample library and the belonged classes of the IM message samples.
Specifically, the functions of the modules in the instant messaging message identification system provided in the embodiment of the present invention correspond to the operation flows of the steps in the above method embodiments one to one, and the achieved effects are also consistent.
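The one-to-one correspondence between modules and method steps can be sketched by wiring three callables together. This is a simplified illustration: the class and field names are hypothetical, the stub classifier returns fixed probabilities instead of running a neural network model, and the block module is reduced to the threshold test that triggers the reminder flow.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class IMIdentificationSystem:
    acquire: Callable[[], str]                        # IM message acquisition module 51
    classify: Callable[[str], Dict[str, float]]       # category identification module 52
    needs_review: Callable[[Dict[str, float]], bool]  # entry point of block module 53

    def run_once(self) -> bool:
        """Acquire one message, classify it, and report whether it is flagged."""
        message = self.acquire()
        probabilities = self.classify(message)
        return self.needs_review(probabilities)


# Stub modules standing in for the real implementations.
system = IMIdentificationSystem(
    acquire=lambda: "hypothetical message text",
    classify=lambda message: {"normal": 0.2, "harassment": 0.8},
    needs_review=lambda probs: 1.0 - probs["normal"] > 0.5,
)
flagged = system.run_once()
```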
On the basis of the above embodiment, in the instant messaging message identification system provided in the embodiment of the present invention, a target convolutional layer exists in convolutional layers having convolutional kernels with different sizes in a convolutional layer group, and a branch where the target convolutional layer is located further includes a first convolutional layer with a size different from that of the convolutional kernel of the target convolutional layer in the convolutional layer group.
On the basis of the above embodiment, in the instant messaging message identification system provided in the embodiment of the present invention, the neural network model further includes: a plurality of smoothing layers;
the branch where the first convolution layer is located and the branch where the convolution layer with convolution kernels of different scales except the target convolution layer in the convolution layer group is located respectively comprise the smoothing layer.
On the basis of the above embodiment, in the instant messaging message identification system provided in the embodiment of the present invention, the neural network model further includes: a plurality of first pooling layers;
and the first pooling layers are connected between the target convolution layer and the first convolution layer and between convolution layers with convolution kernels of different scales except the target convolution layer in the convolution layer group and the corresponding smooth layers.
On the basis of the above embodiment, in the instant messaging message identification system provided in the embodiment of the present invention, the neural network model further includes: a splicing layer;
and the Bi-GRU neural network layer and all the smoothing layers are respectively connected with the splicing layer.
On the basis of the above embodiment, in the instant messaging message identification system provided in the embodiment of the present invention, the neural network model further includes: a second convolutional layer, a second pooling layer, and an output layer;
the splicing layer, the second convolution layer, the second pooling layer and the output layer are connected in sequence.
On the basis of the foregoing embodiment, in the instant messaging message identification system provided in the embodiment of the present invention, the block module is specifically configured to:
if the type of the IM message is judged to be abnormal, sending reminding information to a receiving end of the IM message;
and if the reply information corresponding to the reminding information is received, determining whether to block the IM message or not based on the reply information.
As shown in fig. 6, on the basis of the above embodiment, an embodiment of the present invention provides an electronic device, including: a processor 601, a memory 602, a communication interface 603, and a communication bus 604, wherein:
the processor 601, the memory 602, and the communication interface 603 complete communication with each other through the communication bus 604. The memory 602 stores program instructions executable by the processor 601, and the processor 601 is configured to call the program instructions in the memory 602 to perform the methods provided by the above-mentioned method embodiments, for example, including: acquiring an Instant Messaging (IM) message; inputting the IM message into a neural network model, and determining the class of the IM message based on the neural network model; determining whether to block the IM message based on the class of the IM message; the neural network model is constructed on the basis of an embedded layer, a convolutional layer group with multi-scale convolutional kernels and a bidirectional gated cyclic Bi-GRU neural network layer, wherein each convolutional layer with convolutional kernels of different scales in the convolutional layer group and the Bi-GRU neural network layer are respectively connected with the embedded layer; the neural network model is obtained by training based on IM message samples in a sample library and the belonged classes of the IM message samples.
It should be noted that, when being implemented specifically, the electronic device in this embodiment may be a server, a PC, or another device, as long as the structure includes the processor 601, the communication interface 603, the memory 602, and the communication bus 604 shown in fig. 6, where the processor 601, the communication interface 603, and the memory 602 complete mutual communication through the communication bus 604, and the processor 601 may call a logic instruction in the memory 602 to execute the above method. The embodiment does not limit the specific implementation form of the electronic device.
The logic instructions in memory 602 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone article of manufacture. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Further, embodiments of the present invention disclose a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions, which when executed by a computer, the computer is capable of performing the methods provided by the above-mentioned method embodiments, for example, comprising: acquiring an Instant Messaging (IM) message; inputting the IM message into a neural network model, and determining the class of the IM message based on the neural network model; determining whether to block the IM message based on the class of the IM message; the neural network model is constructed on the basis of an embedded layer, a convolutional layer group with multi-scale convolutional kernels and a bidirectional gated cyclic Bi-GRU neural network layer, wherein each convolutional layer with convolutional kernels of different scales in the convolutional layer group and the Bi-GRU neural network layer are respectively connected with the embedded layer; the neural network model is obtained by training based on IM message samples in a sample library and the belonged classes of the IM message samples.
On the basis of the foregoing embodiments, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, performing the instant messaging message identification method provided by the foregoing embodiments, the method including: acquiring an Instant Messaging (IM) message; inputting the IM message into a neural network model, and determining the class of the IM message based on the neural network model; determining whether to block the IM message based on the class of the IM message; the neural network model is constructed on the basis of an embedded layer, a convolutional layer group with multi-scale convolutional kernels and a bidirectional gated cyclic Bi-GRU neural network layer, wherein each convolutional layer with convolutional kernels of different scales in the convolutional layer group and the Bi-GRU neural network layer are respectively connected with the embedded layer; the neural network model is obtained by training based on IM message samples in a sample library and the belonged classes of the IM message samples.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. An instant messaging message identification method, comprising:
acquiring an Instant Messaging (IM) message;
inputting the IM message into a neural network model, and determining the class of the IM message based on the neural network model;
determining whether to block the IM message based on the class of the IM message;
the neural network model is constructed on the basis of an embedded layer, a convolution layer group with multi-scale convolution kernels and a bidirectional gated cyclic Bi-GRU neural network layer, wherein each convolution layer with convolution kernels of different scales in the convolution layer group and the Bi-GRU neural network layer are respectively and directly connected with the embedded layer; the neural network model is obtained by training based on IM message samples in a sample library and the classes of the IM message samples;
a target convolutional layer exists in convolutional layers with convolutional kernels of different scales in the convolutional layer group, a branch where the target convolutional layer is located also comprises a first convolutional layer with a different scale from the convolutional kernels of the target convolutional layer in the convolutional layer group, and the scale of the convolutional kernel of the first convolutional layer is the same as the scale of the convolutional kernel of one convolutional layer other than the target convolutional layer in the convolutional layer group;
the determining whether to block the IM message based on the category to which the IM message belongs specifically includes:
if the type of the IM message is judged to be abnormal, sending reminding information to a receiving end of the IM message;
if a reply message corresponding to the reminding message is received, determining whether to block the IM message or not based on the reply message;
if the reply information is confirmation information, arranging according to the probability value sequence of the spam messages of different types, and sending the arrangement result to a receiving end so that the receiving end can further feed back the type of the IM message;
if feedback information which is sent by the receiving end and corresponds to the arrangement result is received, the sending end of the IM message is blocked, and the sending end of the IM message is not allowed to send the IM message to the receiving end any more; taking the feedback information as the category of the IM message, and updating the sample library based on the IM message and the feedback information;
and if the reply information is denial information, allowing the sending end of the IM message to normally send the IM message to the receiving end.
2. The instant messaging message identification method of claim 1, wherein the neural network model further comprises: a plurality of smoothing layers;
the branch where the first convolution layer is located and the branch where the convolution layer with convolution kernels of different scales except the target convolution layer in the convolution layer group is located respectively comprise the smoothing layer.
3. The instant messaging message identification method of claim 2, wherein the neural network model further comprises: a plurality of first pooling layers;
and the first pooling layers are connected between the target convolution layer and the first convolution layer and between convolution layers with convolution kernels of different scales except the target convolution layer in the convolution layer group and the corresponding smooth layers.
4. The instant messaging message identification method of claim 2, wherein the neural network model further comprises: a splicing layer;
and the Bi-GRU neural network layer and all the smoothing layers are respectively connected with the splicing layer.
5. The instant messaging message identification method of claim 4, wherein the neural network model further comprises: a second convolutional layer, a second pooling layer, and an output layer;
the splicing layer, the second convolution layer, the second pooling layer and the output layer are connected in sequence.
6. An instant messaging message identification system, comprising:
the IM message acquisition module is used for acquiring the instant messaging IM message;
the class identification module is used for inputting the IM message into a neural network model and determining the class of the IM message based on the neural network model;
the block module is used for determining whether to block the IM message based on the class of the IM message;
the neural network model is constructed on the basis of an embedded layer, a convolution layer group with multi-scale convolution kernels and a bidirectional gated cyclic Bi-GRU neural network layer, wherein each convolution layer with convolution kernels of different scales in the convolution layer group and the Bi-GRU neural network layer are respectively and directly connected with the embedded layer; the neural network model is obtained by training based on IM message samples in a sample library and the classes of the IM message samples;
a target convolutional layer exists in convolutional layers with convolutional kernels of different scales in the convolutional layer group, a branch where the target convolutional layer is located also comprises a first convolutional layer with a different scale from the convolutional kernels of the target convolutional layer in the convolutional layer group, and the scale of the convolutional kernel of the first convolutional layer is the same as the scale of the convolutional kernel of one convolutional layer other than the target convolutional layer in the convolutional layer group;
the block module is specifically configured to:
if the type of the IM message is judged to be abnormal, sending reminding information to a receiving end of the IM message;
if a reply message corresponding to the reminding message is received, determining whether to block the IM message or not based on the reply message;
if the reply information is confirmation information, arranging according to the probability value sequence of the spam messages of different types, and sending the arrangement result to a receiving end so that the receiving end can further feed back the type of the IM message;
if feedback information which is sent by the receiving end and corresponds to the arrangement result is received, the sending end of the IM message is blocked, and the sending end of the IM message is not allowed to send the IM message to the receiving end any more; taking the feedback information as the category of the IM message, and updating the sample library based on the IM message and the feedback information;
and if the reply information is denial information, allowing the sending end of the IM message to normally send the IM message to the receiving end.
7. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor when executing the program performs the steps of the instant messaging message identification method according to any of claims 1-5.
8. A non-transitory computer readable storage medium, having stored thereon a computer program, which, when being executed by a processor, carries out the steps of the instant messaging message identification method according to any one of claims 1 to 5.
CN202010082692.4A 2020-02-07 2020-02-07 Instant communication message identification method and system Active CN111431791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010082692.4A CN111431791B (en) 2020-02-07 2020-02-07 Instant communication message identification method and system


Publications (2)

Publication Number Publication Date
CN111431791A CN111431791A (en) 2020-07-17
CN111431791B true CN111431791B (en) 2021-06-18

Family

ID=71547633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010082692.4A Active CN111431791B (en) 2020-02-07 2020-02-07 Instant communication message identification method and system

Country Status (1)

Country Link
CN (1) CN111431791B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113219943B (en) * 2021-04-24 2022-12-20 浙江大学 Fault diagnosis method without mathematical modeling of underwater robot

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107835496A (en) * 2017-11-24 2018-03-23 北京奇虎科技有限公司 A kind of recognition methods of refuse messages, device and server
CN110267272A (en) * 2019-06-28 2019-09-20 国家计算机网络与信息安全管理中心 A kind of fraud text message recognition methods and identifying system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10382367B2 (en) * 2016-11-23 2019-08-13 Oath Inc. Commentary generation
CN107256357B (en) * 2017-04-18 2020-05-15 北京交通大学 Detection and analysis method for android malicious application based on deep learning
CN108966158B (en) * 2018-08-21 2022-04-12 平安科技(深圳)有限公司 Short message sending method, system, computer equipment and storage medium

Non-Patent Citations (1)

Title
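Research on machine-learning-based sentiment classification methods for online public opinion text; Fan Wenhui (范文慧); China Master's Theses Full-text Database; 2020-01-31; Chapter 3 *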

Also Published As

Publication number Publication date
CN111431791A (en) 2020-07-17

Similar Documents

Publication Publication Date Title
CN108874777B (en) Text anti-spam method and device
CN107533557B (en) Identifying network fraud communications using templates
CN109858248B (en) Malicious Word document detection method and device
CN105005594A (en) Abnormal Weibo user identification method
CN103064987A (en) Bogus transaction information identification method
CN113408281B (en) Mailbox account anomaly detection method and device, electronic equipment and storage medium
CN111045847A (en) Event auditing method and device, terminal equipment and storage medium
US20240048514A1 (en) Method for electronic impersonation detection and remediation
CN108932669A (en) A kind of abnormal account detection method based on supervised analytic hierarchy process (AHP)
Debnath et al. Email spam detection using deep learning approach
CN111835622B (en) Information interception method, device, computer equipment and storage medium
CN113836128A (en) Abnormal data identification method, system, equipment and storage medium
CN111431791B (en) Instant communication message identification method and system
Agarwal et al. SMS spam detection for Indian messages
Abinaya et al. Spam detection on social media platforms
Anitha et al. Email spam filtering using machine learning based XGBoost classifier method
Liubchenko et al. Research of Antispam Bot Algorithms for Social Networks.
Permana et al. Perception analysis of the Indonesian society on twitter social media on the increase in BPJS kesehatan contribution in the Covid 19 pandemic era
Raihan et al. Human behavior analysis using association rule mining techniques
CN111191239B (en) Process detection method and system for application program
CN116383742B (en) Rule chain setting processing method, system and medium based on feature classification
CN111178718B (en) Fair competition auditing method, server, system and storage medium
Gupta et al. Spam filter using Naïve Bayesian technique
Pham et al. Content-based approach for Vietnamese spam SMS filtering
CN110611655A (en) Blacklist screening method and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200827

Address after: 100085 Floor 102-1, Building No. 35, West Second Banner Road, Haidian District, Beijing

Applicant after: Seashell Housing (Beijing) Technology Co.,Ltd.

Address before: 300457 Unit 5, Room 1, 112, Room 1, Office Building C, Nangang Industrial Zone, Binhai New Area Economic and Technological Development Zone, Tianjin

Applicant before: BEIKE TECHNOLOGY Co.,Ltd.

GR01 Patent grant