CN113850283A - Method and device for identifying violation of RCS (Rich Communication Suite) message - Google Patents


Info

Publication number
CN113850283A
Authority
CN
China
Legal status
Pending
Application number
CN202110665929.6A
Other languages
Chinese (zh)
Inventor
胡雅坤
王光全
韩赛
高杰复
Current Assignee
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Application filed by China United Network Communications Group Co Ltd


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention provides a method and a device for identifying violations in RCS (Rich Communication Suite) messages. The method comprises: acquiring an RCS message containing a picture to be identified; inputting the picture to be identified into a pre-trained classification model for preliminary violation identification to obtain a preliminary violation identification result; determining whether the preliminary violation identification result is a violation picture; if not, inputting the picture to be identified into a pre-trained detection model for secondary violation identification to obtain a secondary violation identification result; and determining, according to the secondary violation identification result, whether the RCS message violates the rules. The method and device address two problems in the prior art: content auditing of RCS messages usually relies on manual review, which is time-consuming and labor-intensive, and traditional spam-message identification methods are not suited to RCS messages.

Description

Method and device for identifying violation of RCS (Rich Communication Suite) message
Technical Field
The present invention relates to the field of communications, and in particular, to a method and an apparatus for identifying an RCS message violation.
Background
With the advent of the 5G era, the traditional short message services offered by operators, with their simple functions and limited user experience, can no longer meet users' varied needs, and they need to be upgraded to rich media messaging services, known as RCS (Rich Communication Suite). RCS not only supports multimedia message exchange between individual users but also lets business customers offer their users new digital interactive services based on rich media. A business customer interacts with individual users through a chatbot (chat robot) over the operator network, and the message content may include text, pictures, emoticons, locations, and the like.
However, content auditing of existing RCS messages usually relies on manual review, which is time-consuming and labor-intensive. Traditional spam-message identification takes one of two approaches. The first is passive defense through a communication trust mechanism: two users may exchange short messages only after each has confirmed the other's identity and obtained trust permission. This suits only interactions between individual users, not between enterprise users and individual users, and it makes communication more complex. The second is keyword filtering: the operator defines keywords and identifies spam messages by keyword-matching rules. This works only for text, not for rich media, and it cannot catch prohibited text that a business customer has converted into a picture.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide, in view of the deficiencies of the prior art, a method and a device for identifying RCS message violations, so as to solve the problems in the related art that content auditing of RCS messages usually relies on time-consuming, labor-intensive manual review and that conventional spam-message identification methods are not suitable for RCS messages.
In a first aspect, an embodiment of the present invention provides a method for identifying an RCS message violation, including:
acquiring an RCS message containing a picture to be identified;
inputting the picture to be identified in the RCS message into a pre-trained classification model for preliminary violation identification to obtain a preliminary violation identification result;
determining whether the preliminary violation identification result is a violation picture;
if not, inputting the picture to be identified into a pre-trained detection model for secondary violation identification to obtain a secondary violation identification result;
and determining, according to the secondary violation identification result, whether the RCS message violates the rules.
Preferably, before the picture to be identified in the RCS message is input into the pre-trained classification model for preliminary violation identification, the method further includes:
acquiring pre-collected violation pictures;
inputting the pre-collected violation pictures into a pre-established GAN model for training to obtain new violation pictures;
generating violation picture training samples from the pre-collected violation pictures and the new violation pictures;
and training the classification model and the detection model separately on the violation picture training samples to obtain the trained classification model and the trained detection model.
Preferably, before inputting the pre-collected violation pictures into a pre-established GAN model for training and obtaining new violation pictures, the method further includes:
the GAN model is built according to the following formula:
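$$\min_{G}\max_{D} V(D,G) = \mathbb{E}_{x \sim P_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim P_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]$$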
where D is the discriminator of the GAN model and G is its generator; x is a pre-collected violation picture and P_data(x) is the distribution of the pre-collected violation pictures; z is the noise input to the generator G, G(z) is the new violation picture generated by G, and P_z(z) is the distribution of the noise; D(x) is the probability that the discriminator D judges the pre-collected violation picture x to be real, and D(G(z)) is the probability that D judges the new violation picture G(z) to be real; E denotes the expectation.
Preferably, the classification model is an EfficientNet model, and inputting the picture to be identified in the RCS message into the pre-trained classification model for preliminary violation identification to obtain the preliminary violation identification result specifically includes:
inputting the picture to be identified in the RCS message into the pre-trained EfficientNet model for preliminary violation identification to obtain the preliminary violation identification result;
where the loss function of the EfficientNet model is:
L=-[y·log(p)+(1-y)·log(1-p)]
where y is the label of a violation picture training sample (1 for a violation picture, 0 for a normal picture), p is the predicted probability that the sample is a violation picture, and 1-p is the predicted probability that it is a normal picture.
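The loss above is the standard per-sample binary cross-entropy. A minimal Python sketch of it follows; the function name and the `eps` clamp (which only prevents `log(0)`) are illustrative additions, not part of the patent.

```python
import math


def binary_cross_entropy(y: int, p: float, eps: float = 1e-12) -> float:
    """Per-sample loss L = -[y*log(p) + (1-y)*log(1-p)].

    y:   label, 1 = violation picture, 0 = normal picture
    p:   predicted probability that the picture is a violation picture
    eps: clamp keeping p away from 0 and 1 (an added assumption)
    """
    if y not in (0, 1):
        raise ValueError("label must be 0 or 1")
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# A confident correct prediction costs little; a confident wrong one costs a lot.
print(round(binary_cross_entropy(1, 0.9), 4))  # 0.1054
print(round(binary_cross_entropy(1, 0.1), 4))  # 2.3026
```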
Preferably, the detection model is a yolo-v2 model, and inputting the picture to be identified into the pre-trained detection model for secondary violation identification to obtain the secondary violation identification result specifically includes:
inputting the picture to be identified into the pre-trained yolo-v2 model for secondary violation identification to obtain the secondary violation identification result.
Preferably, the classification model is configured to determine a first probability that the picture to be identified is a violation picture, compare the first probability with a preset first threshold, and determine that the picture to be identified is a violation picture if the first probability is greater than the first threshold;
the detection model is used for determining a second probability that the picture to be identified is a violation picture, comparing the second probability with a preset second threshold value, and if the second probability is greater than the second threshold value, judging that the picture to be identified is the violation picture;
wherein the first threshold is smaller than the second threshold.
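The two-stage decision of the first aspect, combined with the threshold rules above, can be sketched as follows. This is a minimal sketch assuming each model is wrapped as a callable that returns the probability that a picture is a violation picture; `is_violation_message`, `classify`, and `detect` are hypothetical names, and the defaults 0.6 and 0.9 are the example thresholds given in Embodiment 1.

```python
from typing import Callable

Picture = bytes  # hypothetical: raw image payload carried in the RCS message


def is_violation_message(
    picture: Picture,
    classify: Callable[[Picture], float],  # first probability (classification model)
    detect: Callable[[Picture], float],    # second probability (detection model)
    first_threshold: float = 0.6,
    second_threshold: float = 0.9,
) -> bool:
    """Return True if the RCS message carrying `picture` violates the rules."""
    if not first_threshold < second_threshold:
        raise ValueError("the first threshold must be smaller than the second")
    # Preliminary identification: flag when the first probability exceeds T1.
    if classify(picture) > first_threshold:
        return True
    # Secondary identification runs only on pictures the classifier let through.
    return detect(picture) > second_threshold


# Passes the classifier (0.3 <= 0.6) but is caught by the detector (0.95 > 0.9).
print(is_violation_message(b"img", classify=lambda p: 0.3, detect=lambda p: 0.95))  # True
```

Short-circuiting after the first stage keeps the more expensive detection model off pictures that are already flagged.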
Preferably, the determining whether the RCS message violates the rule according to the secondary violation identification result specifically includes:
if the secondary violation identification result is a violation picture, determining that the RCS message violates the rules; otherwise, determining that it does not.
Preferably, the acquiring the RCS message including the picture to be recognized specifically includes:
receiving, from the MaaP platform, an RCS message carrying a Chatbot identifier, a terminal identifier, and the picture to be identified, thereby obtaining the RCS message containing the picture to be identified;
after determining, according to the secondary violation identification result, whether the RCS message violates the rules, the method further includes:
if the RCS message violates the rules, adding the Chatbot identifier to a blacklist;
and if the RCS message does not violate the rules, returning it to the MaaP platform so that the MaaP platform sends it to the terminal corresponding to the terminal identifier.
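The post-decision handling can be sketched as below, assuming a hypothetical `RcsMessage` record holding the Chatbot identifier, terminal identifier, and picture, and a hypothetical in-memory `SecurityPlatform`. A real deployment would persist the blacklist and call back into the MaaP platform rather than append to a list.

```python
from dataclasses import dataclass, field


@dataclass
class RcsMessage:  # hypothetical shape of what the MaaP platform forwards
    chatbot_id: str
    terminal_id: str
    picture: bytes


@dataclass
class SecurityPlatform:  # hypothetical stand-in for the message security control platform
    blacklist: set = field(default_factory=set)
    returned_to_maap: list = field(default_factory=list)

    def handle(self, msg: RcsMessage, violates: bool) -> str:
        if violates:
            # Intercept: the message is not returned, and its sender is blacklisted.
            self.blacklist.add(msg.chatbot_id)
            return "intercepted"
        # Release: hand back to MaaP, which delivers it to msg.terminal_id.
        self.returned_to_maap.append(msg)
        return "released"
```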
In a second aspect, an embodiment of the present invention provides an apparatus for identifying an RCS message violation, including:
an RCS message acquisition module, configured to acquire an RCS message containing a picture to be identified;
a preliminary violation identification module, connected to the RCS message acquisition module and configured to input the picture to be identified in the RCS message into a pre-trained classification model for preliminary violation identification to obtain a preliminary violation identification result;
a first judgment module, connected to the preliminary violation identification module and configured to determine whether the preliminary violation identification result is a violation picture;
a secondary violation identification module, connected to the first judgment module and configured to input the picture to be identified into a pre-trained detection model for secondary violation identification when the first judgment module determines that the preliminary violation identification result is not a violation picture, so as to obtain a secondary violation identification result;
and a second judgment module, connected to the secondary violation identification module and configured to determine, according to the secondary violation identification result, whether the RCS message violates the rules.
In a third aspect, an embodiment of the present invention provides an apparatus for identifying an RCS message violation, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to implement the method for identifying an RCS message violation according to the first aspect.
The method and device for identifying RCS message violations are based on deep learning. By acquiring an RCS message containing a picture to be identified and inputting that picture into a pre-trained classification model for preliminary violation identification, they obtain a preliminary violation identification result and thus identify violation pictures automatically, without manual intervention. To further improve accuracy and avoid missed and false judgments, when the preliminary result is not a violation picture, the picture is input into a pre-trained detection model for secondary violation identification, and whether the RCS message violates the rules is determined from the secondary result. This solves the problems in the related art that content auditing of RCS messages usually relies on time-consuming, labor-intensive manual review and that conventional spam-message identification methods are not suitable for RCS messages.
Drawings
FIG. 1: a scenario diagram of the RCS message violation identification method according to the present invention;
FIG. 2: a flowchart of the RCS message violation identification method in Embodiment 1 of the present invention;
FIG. 3: a schematic structural diagram of the apparatus for identifying RCS message violations in Embodiment 2 of the present invention;
FIG. 4: a schematic structural diagram of the apparatus for identifying RCS message violations in Embodiment 3 of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the following detailed description will be made with reference to the accompanying drawings.
It is to be understood that the specific embodiments and figures described herein are merely illustrative of the invention and are not limiting of the invention.
It is to be understood that the embodiments, and the features within them, may be combined with each other where no conflict arises.
It is to be understood that, for the convenience of description, only parts related to the present invention are shown in the drawings of the present invention, and parts not related to the present invention are not shown in the drawings.
It should be understood that each unit and module related in the embodiments of the present invention may correspond to only one physical structure, may also be composed of multiple physical structures, or multiple units and modules may also be integrated into one physical structure.
It will be understood that, without conflict, the functions, steps, etc. noted in the flowchart and block diagrams of the present invention may occur in an order different from that noted in the figures.
It is to be understood that the flowcharts and block diagrams of the present invention illustrate the architecture, functionality, and operation of possible implementations of systems, apparatuses, devices, and methods according to various embodiments of the present invention. Each block in a flowchart or block diagram may represent a unit, module, segment, or portion of code comprising executable instructions for implementing the specified function(s). Furthermore, each block, or combination of blocks, in the block diagrams and flowcharts can be implemented by a hardware-based system that performs the specified functions or by a combination of hardware and computer instructions.
It is to be understood that the units and modules involved in the embodiments of the present invention may be implemented by software, and may also be implemented by hardware, for example, the units and modules may be located in a processor.
It should be noted that the scenario diagram described in the embodiments of the present application is intended to illustrate their technical solutions more clearly and does not limit them. A person of ordinary skill in the art knows that, as network architectures evolve and new service scenarios emerge, the technical solutions provided in the embodiments of the present application remain applicable to similar technical problems.
FIG. 1 shows a scenario diagram of the RCS message violation identification method provided in the embodiments of the present application. Its components are as follows:
(1) Chatbot: under the configuration of a business customer's administrator, the chatbot chats with mobile terminal users automatically or manually, with functions including message sending, receiving, parsing, and processing. Because RCS messaging is primarily B2C (Business-to-Customer), every business that uses a Chatbot may be called a business customer.
(2) MaaP (Messaging as a Platform): aims to build an open, standards-based operator messaging platform ecosystem, upgrading the industry's existing short messages and multimedia messages to RCS messages so that users can search, interact, pay, and so on in a one-stop experience within the message window.
(3) Message security control platform: performs violation identification on RCS messages that chatbots send to mobile terminal users, releases normal RCS messages directly, intercepts violating RCS messages, and applies measures such as violation freezing to chatbots that send violating RCS messages.
(4) RCS message center: receives normal RCS messages sent by the MaaP platform and forwards them to the IMS (IP Multimedia Subsystem).
(5) IMS: delivers the normal RCS messages forwarded by the RCS message center to mobile terminals.
Based on the scenario shown in FIG. 1, embodiments of the RCS message violation identification method of the present application are described below.
Example 1:
This embodiment provides a method for identifying RCS message violations. As shown in FIG. 2, the method includes:
Step S102: acquiring the RCS message containing the picture to be identified.
It should be noted that the RCS message violation identification method provided in this embodiment is mainly applied to the message security control platform in FIG. 1. After receiving an RCS message that a business customer sends through a chatbot to a mobile terminal user, the MaaP platform forwards it to the message security control platform for violation identification. Specifically, the message security control platform receives from the MaaP platform an RCS message carrying a Chatbot identifier, a terminal identifier, and the picture to be identified, thereby obtaining the RCS message containing the picture to be identified.
Step S104: inputting the picture to be identified in the RCS message into a classification model trained in advance for preliminary violation identification to obtain a preliminary violation identification result;
step S106: judging whether the preliminary violation identification result is a violation picture;
Step S108: if not, inputting the picture to be identified into a pre-trained detection model for secondary violation identification to obtain a secondary violation identification result.
Optionally, before the picture to be identified in the RCS message is input into the pre-trained classification model for preliminary violation identification, the method may further include:
acquiring pre-collected violation pictures;
inputting the pre-collected violation pictures into a pre-established GAN (Generative Adversarial Network) model for training to obtain new violation pictures;
generating violation picture training samples from the pre-collected violation pictures and the new violation pictures;
and training the classification model and the detection model separately on the violation picture training samples to obtain the trained classification model and the trained detection model.
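Assembling the violation picture training samples can be sketched as below, assuming the trained generator is available as a hypothetical callable `generator(z)` that maps a noise vector to a picture. The noise dimension of 100 and the uniform range [-1, 1] are assumptions; the embodiment says only that the noise follows a distribution such as uniform or Gaussian.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

VIOLATION = 1  # label for a violation picture (normal pictures would be 0)


def sample_noise(rng: random.Random, dim: int = 100, dist: str = "gaussian") -> List[float]:
    """Draw one noise vector z for the generator (dim=100 is an assumption)."""
    if dist == "gaussian":
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]
    if dist == "uniform":
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    raise ValueError(f"unknown noise distribution: {dist}")


def build_violation_samples(
    collected: Sequence[Any],
    generator: Callable[[List[float]], Any],  # hypothetical: trained G, maps z -> picture
    n_new: int,
    seed: int = 0,
) -> List[Tuple[Any, int]]:
    """Union of pre-collected and GAN-generated pictures, all labelled as violations."""
    rng = random.Random(seed)
    generated = [generator(sample_noise(rng)) for _ in range(n_new)]
    return [(pic, VIOLATION) for pic in list(collected) + generated]
```

Seeding the random generator makes a given expansion reproducible, which helps when comparing models trained on the same sample set.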
In this embodiment, violation pictures are special: samples are few and hard to obtain, yet enough samples are needed to ensure effective and accurate model training. Generating extra violation pictures with traditional data augmentation has drawbacks such as a large computational load and low accuracy. A GAN is a deep learning model consisting of a generator and a discriminator. The generator estimates the distribution of the real data: it takes as input data drawn from some probability distribution and uses this random sample to generate fake data, which is fed to the discriminator. The discriminator receives both real data and the generator's output and judges whether each input is real or fake. When a GAN is used to generate images, the generator and the discriminator play a game within the framework until they reach a dynamic equilibrium in which the discriminator can no longer tell whether the generator's output comes from the real data, and the GAN then outputs new images.
In this embodiment, the GAN model may be built according to the following formula:
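$$\min_{G}\max_{D} V(D,G) = \mathbb{E}_{x \sim P_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim P_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]$$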
where D is the discriminator of the GAN model and G is its generator; the discriminator D comprises convolutional, pooling, and dropout layers, and the generator G comprises fully connected and deconvolution layers; x is an original violation picture, i.e., a pre-collected violation picture, and P_data(x) is the distribution of the original violation pictures; z is the noise input to the generator G, G(z) is the new violation picture generated by G, and P_z(z) is the distribution of the noise; D(x) is the probability that the discriminator D judges the original violation picture x to be real, and D(G(z)) is the probability that D judges the new violation picture G(z) to be real; E denotes the expectation.
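The value function V(D, G) can be estimated from a batch of discriminator outputs by replacing each expectation with a sample mean. A minimal sketch, assuming D(x) and D(G(z)) have already been computed and lie strictly between 0 and 1; D is trained to increase this value and G to decrease it. When D outputs 0.5 everywhere, meaning it can no longer tell real from generated pictures, the estimate equals -log 4.

```python
import math
from typing import Sequence


def gan_value(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Sample-mean estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on pre-collected violation pictures
    d_fake: discriminator outputs D(G(z)) on generated pictures
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


print(round(gan_value([0.5, 0.5], [0.5, 0.5]), 4))  # -1.3863, i.e. -log 4
```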
In this embodiment, the training process of the GAN model is as follows:
(a) The generator uses randomly generated noise z that follows some distribution (uniform, Gaussian, etc.) to generate samples resembling the real training data, aiming to make them as close to real samples as possible. The generated new violation pictures and the real original violation pictures are both fed to the discriminator as samples.
(b) The discriminator is a binary classifier that estimates the probability that an input sample comes from the real violation pictures rather than from the generator: it outputs a high probability for a real violation picture and a low probability otherwise.
(c) If the discriminator judges correctly, it wins the round and the generator needs to be trained; if it judges incorrectly, the generator wins and the discriminator needs to be trained.
(d) GAN model training ends when the model converges, i.e., when the generator and the discriminator no longer change.
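Steps (c) and (d) can be sketched as two small helpers. This follows the win/lose wording of the embodiment rather than the common practice of alternating discriminator and generator updates on every iteration; the 0.5 decision boundary, the window of 3 recorded values, and the tolerance of 1e-3 are assumptions.

```python
from typing import Sequence


def next_to_train(d_real: float, d_fake: float, boundary: float = 0.5) -> str:
    """Step (c): pick which network to update next.

    The discriminator "wins" when it judges both samples correctly: D(x) above
    the boundary for the real picture and D(G(z)) below it for the generated one.
    """
    discriminator_correct = d_real > boundary and d_fake < boundary
    return "generator" if discriminator_correct else "discriminator"


def has_converged(values: Sequence[float], window: int = 3, tol: float = 1e-3) -> bool:
    """Step (d): treat training as converged once a tracked value stops changing."""
    if len(values) < window:
        return False
    recent = list(values)[-window:]
    return max(recent) - min(recent) < tol
```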
In this embodiment, the classification model classifies pictures as violation pictures or normal pictures. Because RCS messages must be handled in real time, a classification model that is both accurate and efficient may be used. An EfficientNet model is preferred because it balances speed and precision; it is built from mobile inverted bottleneck convolution (MBConv) modules. For example, the classification model may specifically be the EfficientNet-B0 model.
In this embodiment, the loss function of the classification model, i.e., the EfficientNet model, may be:
L=-[y·log(p)+(1-y)·log(1-p)]
where y is the label of a violation picture training sample (1 for a violation picture, 0 for a normal picture), p is the predicted probability that the sample is a violation picture, and 1-p is the predicted probability that it is a normal picture. Note that each sample picture in the violation picture training samples carries one violation label.
In this embodiment, to further improve the accuracy of violation picture identification and avoid missed and false judgments, when the preliminary violation identification result is not a violation picture, the picture to be identified may be input into a pre-trained detection model for secondary violation identification. The detection model performs further feature detection on a picture that was judged normal, checking whether it contains violation features (such as exposed body parts or graphic gore); if it does, those features are marked and the picture is judged to be a violation picture.
In this embodiment, to improve localization accuracy while maintaining classification accuracy, the detection model may be a YOLO model, preferably a yolo-v2 model.
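A YOLO-style detector returns a list of boxes, each with a class-specific confidence, rather than a single probability. One way to derive the picture-level second probability, sketched here as an assumption because the embodiment does not specify the aggregation, is to take the maximum box confidence; the boxes above the second threshold are then the regions marked as containing violation features. `Detection` and its label strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:  # hypothetical: one box emitted by a YOLO-style detector
    label: str                               # violation-feature class name
    confidence: float                        # class-specific confidence in [0, 1]
    box: Tuple[float, float, float, float]   # (x, y, w, h), normalised


def second_probability(detections: List[Detection]) -> float:
    """Collapse per-box scores into one picture-level probability (max)."""
    return max((d.confidence for d in detections), default=0.0)


def flagged_regions(detections: List[Detection], threshold: float = 0.9) -> List[Detection]:
    """Boxes whose confidence exceeds the second threshold (the marked regions)."""
    return [d for d in detections if d.confidence > threshold]
```

With max aggregation, the picture exceeds the second threshold exactly when at least one region is flagged, so the picture-level decision and the marked regions always agree.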
In this embodiment, the classification model and the detection model work on the same principle: using its trained parameters, each model determines the probability that the picture to be identified is a violation picture and compares it with a preset threshold. If the probability exceeds the threshold, the picture is judged to be a violation picture; otherwise, it is not. Taking the classification model's threshold as the first threshold and the detection model's threshold as the second threshold, the first threshold is smaller than the second; for example, the first threshold may be set to 0.6 and the second threshold to 0.9.
Step S110: determining, according to the secondary violation identification result, whether the RCS message violates the rules.
Specifically, if the secondary violation identification result is a violation picture, the RCS message is determined to violate the rules; otherwise, it is determined not to violate them.
In this embodiment, if the RCS message violates the rules, it is prohibited from being sent to the terminal user, and the Chatbot identifier may be added to the blacklist for processing such as "violation freezing". If the RCS message does not violate the rules, processing such as "normal release" is performed and the message is returned to the MaaP platform, which sends it to the terminal corresponding to the terminal identifier. Note that this assumes the RCS message contains only picture content, or that all of its non-picture content has already been checked and found compliant.
In a specific embodiment, to keep the RCS messaging ecosystem secure and clean, RCS messages sent by business customers urgently need security control. Specifically, violation identification may be performed on an RCS message through the following steps:
1) Create a violation picture database. Violation pictures are special: samples are few and hard to obtain, and too few violation pictures directly degrade the training of the classification and detection models. To keep training effective and accurate, a GAN model is therefore used to generate additional violation pictures and expand the database, specifically as follows:
a) collect existing violation pictures into a violation picture database, each picture carrying a violation label indicating that it is a violation picture;
b) to speed up convergence of model training, preprocess and normalize the database: preprocessing crops every picture to 224 × 224, and normalization scales pixel values into [0, 1) by dividing every pixel value by 256;
c) train the GAN model with the processed pictures;
d) generate more violation pictures with the GAN model to expand the violation picture database.
2) Use the violation pictures in the violation picture database as training samples to train the classification model and the detection model separately, obtaining the trained classification model and detection model.
3) And the Maap platform sends the RCS message issued by the chatbot to the message security control platform for violation judgment.
4) And the message security control platform takes the pictures in the RCS messages as input and sends the input into the classification model, the threshold value is set to be 0.6, and the pictures are roughly divided into illegal pictures and normal pictures.
5) If the classification model judges that the picture is an illegal picture, the message security control platform can return the message which is judged to be the illegal picture to the Maap platform, intercept the illegal picture and not send the illegal picture to the Maap platform, the Maap platform and the message security control platform can both perform corresponding processing such as illegal freezing, for example, adding a Chatbot identifier into a blacklist, warning and the like, and the Maap platform prohibits sending the RCS message to a terminal user.
6) If the classification model judges that the picture is a normal picture, the message security control platform takes the picture in the RCS message as input and sends the input to the detection model, and the threshold value is set to be 0.9, wherein the picture containing the illegal picture characteristics is judged to be the illegal picture, and the picture not containing the illegal picture characteristics is judged to be the normal picture.
7) And if the picture is judged to be the illegal picture by the detection model, the processing is the same as the step 5).
8) And the RCS message which is judged to be a normal picture by the detection model is returned to the Maap platform, corresponding processing such as 'normal release' is carried out, and the RCS message is sent to the terminal user by the Maap platform.
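Step b) can be sketched in plain Python as follows. This is only an illustration: the source says pictures are cropped to 224 × 224 but not how, so the centre crop, the nested-list image layout and the names `center_crop`, `normalize` and `preprocess` are assumptions. The divisor of 256 is taken from the source; it maps 8-bit values 0 to 255 into [0, 1).

```python
from typing import List

Pixel = List[int]            # one pixel, e.g. [R, G, B] with channels in 0..255
Image = List[List[Pixel]]    # image[row][col] -> pixel (assumed layout)

TARGET_SIZE = 224            # crop size stated in step b)
SCALE = 256.0                # divisor stated in step b)


def center_crop(image: Image, size: int = TARGET_SIZE) -> Image:
    """Keep the central size x size window (assumption: centre crop, not resize)."""
    height, width = len(image), len(image[0])
    if height < size or width < size:
        raise ValueError(f"image is {height}x{width}, smaller than {size}x{size}")
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def normalize(image: Image, scale: float = SCALE) -> List[List[List[float]]]:
    """Divide every channel value by `scale`, so 0..255 maps into [0, 1)."""
    return [[[channel / scale for channel in pixel] for pixel in row] for row in image]


def preprocess(image: Image) -> List[List[List[float]]]:
    """Step b): crop to 224 x 224, then normalize."""
    return normalize(center_crop(image))


# Usage: a synthetic 300 x 260 picture in which every pixel is [255, 0, 128].
picture = [[[255, 0, 128] for _ in range(260)] for _ in range(300)]
processed = preprocess(picture)
print(len(processed), len(processed[0]), processed[0][0])  # prints: 224 224 [0.99609375, 0.0, 0.5]
```

Dividing by 256 rather than 255 means a fully saturated channel maps to 0.99609375 rather than exactly 1.0; either choice keeps inputs in a small, fixed range.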
The RCS message violation identification method is based on deep learning. It obtains an RCS message containing a picture to be identified and inputs that picture into a pre-trained classification model for preliminary violation identification, so violation pictures are identified automatically without manual intervention. To further improve identification accuracy and reduce missed and false judgments, a picture whose preliminary result is not a violation picture is input into a pre-trained detection model for secondary violation identification, and whether the RCS message violates the rules is judged from the secondary result. This addresses two problems in the related art: manual auditing of RCS message content is time-consuming and labor-intensive, and existing spam short message recognition methods are not suited to RCS messages.
Example 2:
As shown in Fig. 3, this embodiment provides an apparatus for identifying RCS message violations, comprising:
an RCS message acquisition module 12, configured to acquire an RCS message containing a picture to be identified;
a preliminary violation identification module 14, connected to the RCS message acquisition module 12 and configured to input the picture to be identified in the RCS message into a pre-trained classification model for preliminary violation identification, so as to obtain a preliminary violation identification result;
a first judgment module 16, connected to the preliminary violation identification module 14 and configured to judge whether the preliminary violation identification result is a violation picture;
a secondary violation identification module 18, connected to the first judgment module 16 and configured to, when the judgment result of the first judgment module 16 is negative, input the picture to be identified into a pre-trained detection model for secondary violation identification, so as to obtain a secondary violation identification result;
and a second judgment module 20, connected to the secondary violation identification module 18 and configured to judge, according to the secondary violation identification result, whether the RCS message violates the rules.
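The data flow between modules 12 to 20 can be sketched as below. This is a hypothetical sketch, not the patented implementation: both models are replaced by plain callables that return True for a violation picture, and the names `RcsMessage`, `ViolationIdentifier` and `identify` are illustrative. A positive preliminary result is treated as a violation, following step 5) of the method above.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a pre-trained model: True means "violation picture".
Model = Callable[[bytes], bool]


@dataclass
class RcsMessage:
    chatbot_id: str
    terminal_id: str
    picture: bytes


@dataclass
class ViolationIdentifier:
    classification_model: Model   # used by preliminary violation identification module 14
    detection_model: Model        # used by secondary violation identification module 18

    def identify(self, message: RcsMessage) -> bool:
        """Return True if the RCS message is judged to violate the rules."""
        # Module 12 is modelled by the caller passing in an already acquired message.
        # Module 14: preliminary identification with the classification model.
        preliminary_is_violation = self.classification_model(message.picture)
        # Module 16: first judgment; a positive preliminary result ends the check.
        if preliminary_is_violation:
            return True
        # Module 18: only pictures the classifier passed reach the detector.
        secondary_is_violation = self.detection_model(message.picture)
        # Module 20: second judgment, based solely on the secondary result.
        return secondary_is_violation


# Usage with toy models that recognise fixed byte strings.
identifier = ViolationIdentifier(
    classification_model=lambda pic: pic == b"obvious",
    detection_model=lambda pic: pic == b"subtle",
)
print(identifier.identify(RcsMessage("bot-1", "term-1", b"subtle")))  # prints: True
```

Skipping the detection model whenever the classifier already flags a picture keeps the more expensive second stage off the common path.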
Optionally, the apparatus may further include:
a violation picture acquisition module, configured to acquire pre-collected violation pictures;
a first training module, configured to input the pre-collected violation pictures into a pre-established GAN model for training to obtain new violation pictures;
a training sample generation module, configured to generate violation picture training samples from the pre-collected violation pictures and the new violation pictures;
and a second training module, configured to train the classification model and the detection model separately on the violation picture training samples to obtain the trained classification model and detection model.
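The training sample generation step can be sketched as follows, assuming pictures are opaque objects and the trained GAN generator is available as a zero-argument callable (`generate_violation`, hypothetical). The source only names violation pictures as training samples; the optional `normal_pictures` argument with label 0 is an assumption, included because the loss function given for the EfficientNet model uses label 0 for normal pictures.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[object, int]  # (picture, label): 1 = violation picture, 0 = normal picture


def build_training_samples(
    collected_violations: Sequence[object],
    generate_violation: Callable[[], object],   # hypothetical GAN generator hook
    num_generated: int,
    normal_pictures: Sequence[object] = (),     # assumption: negatives supplied separately
    seed: int = 0,
) -> List[Sample]:
    """Merge pre-collected and GAN-generated violation pictures into labelled samples."""
    generated = [generate_violation() for _ in range(num_generated)]
    samples: List[Sample] = [(p, 1) for p in list(collected_violations) + generated]
    samples += [(p, 0) for p in normal_pictures]
    random.Random(seed).shuffle(samples)  # deterministic shuffle for reproducibility
    return samples


# Usage: two collected pictures, three generated ones and one normal picture.
ids = iter(range(100))
samples = build_training_samples(["v1", "v2"], lambda: f"g{next(ids)}", 3, ["n1"])
print(len(samples))  # prints: 6
```

The same sample list would then feed both the classification model and the detection model, as the second training module describes.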
Optionally, the apparatus may further include:
a GAN model establishing module, configured to establish the GAN model according to the following formula:
$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim P_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$
where D is the discriminator of the GAN model and G is its generator; x is a pre-collected violation picture and P_data(x) is the distribution of the pre-collected violation pictures; z is the noise input to the generator G, G(z) is a new violation picture generated by G, and P_z(z) is the noise distribution; D(x) is the probability that the discriminator D judges the pre-collected violation picture x to be real, and D(G(z)) is the probability that D judges the generated picture G(z) to be real; E denotes the expectation.
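This is the standard GAN minimax objective with value function V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))], which D maximizes and G minimizes. The sketch below only estimates V from a batch of discriminator outputs; it trains neither network, and the name `gan_value` is hypothetical.

```python
import math
from typing import Sequence


def gan_value(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Batch estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) for pre-collected violation pictures x.
    d_fake: discriminator outputs D(G(z)) for pictures G(z) made by the generator.
    All outputs must lie strictly between 0 and 1.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that cannot tell real from generated pictures (D = 0.5 everywhere)
# gives V = 2 * log(0.5) = -log(4), about -1.386.
print(gan_value([0.5, 0.5], [0.5, 0.5]), -math.log(4))
# A discriminator that separates them well scores higher, which is what D maximizes.
print(gan_value([0.9, 0.8], [0.1, 0.2]))
```

The value -log(4) is the known optimum of this objective, reached when the generated distribution matches P_data and the discriminator can do no better than 0.5.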
Optionally, the classification model is an EfficientNet model, and the preliminary violation identification module 14 is specifically configured to input the picture to be identified in the RCS message into a pre-trained EfficientNet model for preliminary violation identification to obtain the preliminary violation identification result;
wherein the loss function of the EfficientNet model is:
L=-[y·log(p)+(1-y)·log(1-p)]
where y is the label of a training sample (1 for a violation picture, 0 for a normal picture), p is the predicted probability that the training sample is a violation picture, and 1 - p is the predicted probability that it is a normal picture.
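As a worked check of this loss, take a violation picture (y = 1): a prediction p = 0.9 costs -log 0.9 ≈ 0.105, while p = 0.1 costs -log 0.1 ≈ 2.303, so confident wrong predictions are penalized heavily. The sketch below evaluates the formula directly; the clipping constant `eps` is an assumption added to avoid log(0) and is not part of the source.

```python
import math


def bce_loss(y: int, p: float, eps: float = 1e-7) -> float:
    """L = -[y*log(p) + (1-y)*log(1-p)]; y = 1 for violation, 0 for normal picture."""
    p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0 (assumed safeguard)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


for label, prob in [(1, 0.9), (1, 0.1), (0, 0.1), (0, 0.9)]:
    print(f"y={label} p={prob}: L={bce_loss(label, prob):.3f}")
```

The loss is symmetric in the two classes: predicting 0.1 for a normal picture costs the same as predicting 0.9 for a violation picture.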
Optionally, the detection model is a yolo-v2 model, and the secondary violation identification module 18 is specifically configured to, when the judgment result of the first judgment module 16 is negative, input the picture to be identified into a pre-trained yolo-v2 model for secondary violation identification to obtain the secondary violation identification result.
Optionally, the classification model is configured to determine a first probability that the picture to be identified is a violation picture and compare it with a preset first threshold; if the first probability is greater than the first threshold, the picture to be identified is judged to be a violation picture;
the detection model is configured to determine a second probability that the picture to be identified is a violation picture and compare it with a preset second threshold; if the second probability is greater than the second threshold, the picture to be identified is judged to be a violation picture;
wherein the first threshold is less than the second threshold.
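Using the threshold values from steps 4) and 6) of the method above (0.6 for the classification model, 0.9 for the detection model), the two comparisons can be sketched as below. The function names are hypothetical, and the detection probability is passed as a callable so that, as in the method, the detection model runs only for pictures the classification model did not flag. The source does not say why the first threshold is lower; one plausible reading is that the first stage flags aggressively, while the second stage overturns a "normal" verdict only when highly confident.

```python
from typing import Callable

FIRST_THRESHOLD = 0.6   # classification model (value from step 4)
SECOND_THRESHOLD = 0.9  # detection model (value from step 6)


def exceeds(probability: float, threshold: float) -> bool:
    """A picture is judged a violation only if its probability is strictly greater."""
    return probability > threshold


def two_stage_verdict(first_probability: float,
                      second_probability: Callable[[], float]) -> str:
    """Return 'violation (classification)', 'violation (detection)' or 'normal'.

    second_probability is called lazily, so the detection model is evaluated
    only when the classification model did not already flag the picture.
    """
    if exceeds(first_probability, FIRST_THRESHOLD):
        return "violation (classification)"
    if exceeds(second_probability(), SECOND_THRESHOLD):
        return "violation (detection)"
    return "normal"


print(two_stage_verdict(0.7, lambda: 0.0))   # prints: violation (classification)
print(two_stage_verdict(0.6, lambda: 0.95))  # prints: violation (detection)
print(two_stage_verdict(0.3, lambda: 0.9))   # prints: normal
```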
Optionally, the second judgment module 20 is specifically configured to judge that the RCS message violates the rules if the secondary violation identification result is a violation picture, and otherwise to judge that the RCS message does not violate the rules.
Optionally, the RCS message acquisition module 12 is specifically configured to obtain the RCS message containing the picture to be identified by receiving an RCS message, sent by the Maap platform, that carries a Chatbot identifier, a terminal identifier and the picture to be identified;
optionally, the apparatus may further include:
a blacklist module, configured to add the Chatbot identifier to a blacklist if the RCS message violates the rules;
and a return module, configured to return the RCS message to the Maap platform if the RCS message does not violate the rules, so that the Maap platform sends the RCS message to the terminal corresponding to the terminal identifier.
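The post-judgment handling by the blacklist module and the return module can be sketched as below. The `return_to_maap` callback stands in for whatever interface the message security control platform uses to reach the Maap platform; it, like the class and field names, is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class RcsMessage:
    chatbot_id: str
    terminal_id: str
    picture: bytes


@dataclass
class ResultHandler:
    return_to_maap: Callable[[RcsMessage], None]   # hypothetical hook to the Maap platform
    blacklist: Set[str] = field(default_factory=set)

    def handle(self, message: RcsMessage, violates: bool) -> None:
        if violates:
            # Blacklist module: record the sender; the message is not returned for delivery.
            self.blacklist.add(message.chatbot_id)
        else:
            # Return module: Maap then delivers to the terminal named by terminal_id.
            self.return_to_maap(message)


# Usage: the callback records which terminals Maap would deliver to.
delivered = []
handler = ResultHandler(return_to_maap=lambda m: delivered.append(m.terminal_id))
handler.handle(RcsMessage("bot-1", "term-9", b""), violates=True)
handler.handle(RcsMessage("bot-2", "term-7", b""), violates=False)
print(handler.blacklist, delivered)  # prints: {'bot-1'} ['term-7']
```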
Example 3:
As shown in Fig. 4, this embodiment provides an apparatus for identifying RCS message violations, comprising a memory 21 and a processor 22, where the memory 21 stores a computer program and the processor 22 is configured to run the computer program to perform the RCS message violation identification method of embodiment 1.
The memory 21 is connected to the processor 22. The memory 21 may be a flash memory, a read-only memory or another type of memory, and the processor 22 may be a central processing unit or a single-chip microcomputer.
The RCS message violation identification apparatuses provided in embodiments 2 and 3 are likewise based on deep learning. They obtain an RCS message containing a picture to be identified and input that picture into a pre-trained classification model for preliminary violation identification, so violation pictures are identified automatically without manual intervention. To further improve identification accuracy and reduce missed and false judgments, a picture whose preliminary result is not a violation picture is input into a pre-trained detection model for secondary violation identification, and whether the RCS message violates the rules is judged from the secondary result. This addresses the problems in the related art that manual auditing of RCS message content is time-consuming and labor-intensive, and that existing spam short message recognition methods are not suited to RCS messages.
It will be understood that the above embodiments are merely exemplary embodiments used to illustrate the principles of the present invention, and the invention is not limited thereto. Those skilled in the art can make various modifications and improvements without departing from the spirit and substance of the invention, and such modifications and improvements are also considered to fall within the scope of the invention.

Claims (10)

1. A violation identification method for RCS messages, characterized by comprising the following steps:
acquiring an RCS message containing a picture to be identified;
inputting the picture to be identified in the RCS message into a pre-trained classification model for preliminary violation identification to obtain a preliminary violation identification result;
judging whether the preliminary violation identification result is a violation picture;
if not, inputting the picture to be identified into a pre-trained detection model for secondary violation identification to obtain a secondary violation identification result;
and judging, according to the secondary violation identification result, whether the RCS message violates the rules.
2. The RCS message violation identification method according to claim 1, wherein before the picture to be identified in the RCS message is input into the pre-trained classification model for preliminary violation identification, the method further comprises:
acquiring pre-collected violation pictures;
inputting the pre-collected violation pictures into a pre-established GAN model for training to obtain new violation pictures;
generating violation picture training samples from the pre-collected violation pictures and the new violation pictures;
and training the classification model and the detection model separately on the violation picture training samples to obtain the trained classification model and the trained detection model.
3. The RCS message violation identification method according to claim 2, wherein before the pre-collected violation pictures are input into the pre-established GAN model for training to obtain new violation pictures, the method further comprises:
establishing the GAN model according to the following formula:
$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim P_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$
wherein D is the discriminator of the GAN model and G is its generator; x is a pre-collected violation picture and P_data(x) is the distribution of the pre-collected violation pictures; z is the noise input to the generator G, G(z) is a new violation picture generated by G, and P_z(z) is the noise distribution; D(x) is the probability that the discriminator D judges the pre-collected violation picture x to be real, and D(G(z)) is the probability that D judges the new violation picture G(z) generated by G to be real; E denotes the expectation.
4. The RCS message violation identification method according to claim 2, wherein the classification model is an EfficientNet model, and inputting the picture to be identified in the RCS message into the pre-trained classification model for preliminary violation identification to obtain the preliminary violation identification result specifically comprises:
inputting the picture to be identified in the RCS message into the pre-trained EfficientNet model for preliminary violation identification to obtain the preliminary violation identification result;
wherein the loss function of the EfficientNet model is:
L=-[y·log(p)+(1-y)·log(1-p)]
where y is the label of a training sample (1 for a violation picture, 0 for a normal picture), p is the predicted probability that the training sample is a violation picture, and 1 - p is the predicted probability that it is a normal picture.
5. The RCS message violation identification method according to claim 2, wherein the detection model is a yolo-v2 model, and inputting the picture to be identified into the pre-trained detection model for secondary violation identification to obtain the secondary violation identification result specifically comprises:
inputting the picture to be identified into the pre-trained yolo-v2 model for secondary violation identification to obtain the secondary violation identification result.
6. The RCS message violation identification method according to claim 1, wherein the classification model is configured to determine a first probability that the picture to be identified is a violation picture and compare it with a preset first threshold; if the first probability is greater than the first threshold, the picture to be identified is judged to be a violation picture;
the detection model is configured to determine a second probability that the picture to be identified is a violation picture and compare it with a preset second threshold; if the second probability is greater than the second threshold, the picture to be identified is judged to be a violation picture;
wherein the first threshold is smaller than the second threshold.
7. The RCS message violation identification method according to claim 1, wherein judging, according to the secondary violation identification result, whether the RCS message violates the rules specifically comprises:
if the secondary violation identification result is a violation picture, judging that the RCS message violates the rules; otherwise, judging that the RCS message does not violate the rules.
8. The RCS message violation identification method according to claim 7, wherein acquiring the RCS message containing the picture to be identified specifically comprises:
receiving an RCS message that is sent by the Maap platform and carries a Chatbot identifier, a terminal identifier and the picture to be identified, so as to obtain the RCS message containing the picture to be identified;
and after judging, according to the secondary violation identification result, whether the RCS message violates the rules, the method further comprises:
if the RCS message violates the rules, adding the Chatbot identifier to a blacklist;
and if the RCS message does not violate the rules, returning the RCS message to the Maap platform so that the Maap platform sends the RCS message to the terminal corresponding to the terminal identifier.
9. An apparatus for identifying violations of RCS messages, comprising:
an RCS message acquisition module, configured to acquire an RCS message containing a picture to be identified;
a preliminary violation identification module, connected to the RCS message acquisition module and configured to input the picture to be identified in the RCS message into a pre-trained classification model for preliminary violation identification to obtain a preliminary violation identification result;
a first judgment module, connected to the preliminary violation identification module and configured to judge whether the preliminary violation identification result is a violation picture;
a secondary violation identification module, connected to the first judgment module and configured to, when the judgment result of the first judgment module is negative, input the picture to be identified into a pre-trained detection model for secondary violation identification to obtain a secondary violation identification result;
and a second judgment module, connected to the secondary violation identification module and configured to judge, according to the secondary violation identification result, whether the RCS message violates the rules.
10. An apparatus for identifying violations of RCS messages, comprising a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to implement the RCS message violation identification method according to any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110665929.6A CN113850283A (en) 2021-06-16 2021-06-16 Method and device for identifying violation of RCS (Rich client System) message

Publications (1)

Publication Number Publication Date
CN113850283A true CN113850283A (en) 2021-12-28

Family

ID=78973068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110665929.6A Pending CN113850283A (en) 2021-06-16 2021-06-16 Method and device for identifying violation of RCS (Rich client System) message

Country Status (1)

Country Link
CN (1) CN113850283A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116996470A (en) * 2023-09-27 2023-11-03 创瑞技术有限公司 Rich media information sending system
CN116996470B (en) * 2023-09-27 2024-02-06 创瑞技术有限公司 Rich media information sending system

Similar Documents

Publication Publication Date Title
CN101730903B (en) Multi-dimensional reputation scoring
US10178115B2 (en) Systems and methods for categorizing network traffic content
US8561167B2 (en) Web reputation scoring
US8763114B2 (en) Detecting image spam
US7779156B2 (en) Reputation based load balancing
CN105704005B (en) Malicious user reporting method and device, and reported information processing method and device
CN110149266B (en) Junk mail identification method and device
US20200287936A1 (en) Message Management Platform for Performing Impersonation Analysis & Detection
CN110519150B (en) Mail detection method, device, equipment, system and computer readable storage medium
CN108347374B (en) Message pushing method and device for preventing illegal messages
CN101389074B (en) Short message monitoring method ensuring identity of sender based social network mechanism
AU2008207924A1 (en) Web reputation scoring
US11956196B2 (en) Bulk messaging detection and enforcement
CN108123933B (en) Information leakage automatic monitoring method and system based on internet big data
CN111932427B (en) Method and system for detecting emergent public security incident based on multi-mode data
CN113850283A (en) Method and device for identifying violation of RCS (Rich client System) message
US20130145289A1 (en) Real-time duplication of a chat transcript between a person of interest and a correspondent of the person of interest for use by a law enforcement agent
Revar et al. A Review on Different types of Spam Filtering Techniques.
US20220182347A1 (en) Methods for managing spam communication and devices thereof
CN116015925A (en) Data transmission method, device, equipment and medium
CN106911660B (en) Information management method and device
Morovati et al. Detection of Phishing Emails with Email Forensic Analysis and Machine Learning Techniques.
CN115599345A (en) Application security requirement analysis recommendation method based on knowledge graph
US11257090B2 (en) Message processing platform for automated phish detection
CN113489677B (en) Zero rule attack detection method and device based on semantic context

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination