CN111950268A - Method, device and storage medium for detecting junk information

Method, device and storage medium for detecting junk information

Info

Publication number
CN111950268A
CN111950268A (application CN202010829688.XA)
Authority
CN
China
Prior art keywords
model
data
groups
information
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010829688.XA
Other languages
Chinese (zh)
Inventor
彭丁聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gree Electric Appliances Inc of Zhuhai
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai filed Critical Gree Electric Appliances Inc of Zhuhai
Priority to CN202010829688.XA
Publication of CN111950268A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Abstract

The invention discloses a method, a device, and a storage medium for detecting junk information, wherein the method comprises the following steps: inputting information to be detected into a coding layer of a pre-trained bert recognition model for coding to obtain a coding feature vector corresponding to the information to be detected; inputting the coding feature vector corresponding to the information to be detected into a decoding layer of the pre-trained bert recognition model for decoding, and calculating the probability corresponding to the coding feature vector of the information to be detected using a decision model corresponding to the decoding layer; and if the probability is greater than or equal to a preset probability, determining that the information to be detected is junk information. With this technical scheme, each single character or word is coded in close connection with the context of the information to be detected, improving the accuracy of the detection result for single characters or words and thereby reducing the missed-recognition rate and the false-recognition rate of junk information.

Description

Method, device and storage medium for detecting junk information
Technical Field
The invention belongs to the technical field of information security, and particularly relates to a method, a device, and a storage medium for detecting junk information.
Background
Spam text messages, spam emails, and other junk information plague most groups of internet users. In the prior art, keyword matching or naive Bayes classification is usually used to detect junk information. Although these detection methods have a high recognition rate, they only consider the influence of a single character or word on the detection result and do not involve any understanding of the sentence or its context. As a result, for financial fraud, phishing mails, and similar information that deliberately avoids specific keywords, the interception effect is poor and recognitions are missed; and for legitimate information containing specific keywords, such as operation promotion, customer marketing, and bill notifications, false recognitions occur.
Therefore, how to reduce the missed-recognition rate and the false-recognition rate of junk information is a technical problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
The invention mainly aims to provide a method, a device, and a storage medium for detecting junk information, so as to solve the problems of the high missed-recognition rate and high false-recognition rate of junk information in the prior art.
In view of the above problem, the present invention provides a method for detecting spam, including:
inputting information to be detected into a coding layer of a pre-trained bert recognition model for coding to obtain a coding feature vector corresponding to the information to be detected;
inputting the coding feature vector corresponding to the information to be detected into a decoding layer of a pre-trained bert recognition model for decoding, and calculating the probability corresponding to the coding feature vector corresponding to the information to be detected by using a judgment model corresponding to the decoding layer;
and if the probability is greater than or equal to a preset probability, determining that the information to be detected is junk information.
Further, in the above method for detecting junk information, before inputting the information to be detected into the coding layer of the pre-trained bert recognition model for coding to obtain the coding feature vector corresponding to the information to be detected, the method further includes:
inputting the acquired original sample data into a coding layer in a preset bert pre-training model for coding to obtain a coding feature vector corresponding to the original sample data;
performing K-fold cross training on a judgment model corresponding to a decoding layer in a bert pre-training model by using the coding feature vector corresponding to the original sample data to obtain K sets of model parameters corresponding to K sets of verification data;
calculating K groups of error data corresponding to the K groups of verification data according to the K groups of model parameters and the K groups of verification data;
determining optimization parameters of the judgment model according to the K groups of error data;
and optimizing the decision model according to the optimization parameters of the decision model to obtain an optimized bert pre-training model as the bert recognition model.
Further, in the method for detecting spam, K-fold cross training is performed on a decision model corresponding to a decoding layer in a bert pre-training model by using the coding feature vector corresponding to the original sample data, so as to obtain K sets of model parameters corresponding to K sets of verification data, including:
dividing the coding feature vectors corresponding to the original sample data into K groups to obtain K groups of sample vector data;
and taking each group of sample vector data in the K groups in turn as verification data and the remaining K-1 groups of sample vector data as training data, and training the decision model with the training data to obtain the model parameters corresponding to each group of verification data.
Further, in the method for detecting spam, calculating K sets of error data corresponding to K sets of verification data according to the K sets of model parameters and the K sets of verification data includes:
predicting the K groups of verification data by using a judgment model under K groups of model parameters to obtain K groups of probabilities of the K groups of verification data;
determining K groups of prediction results corresponding to the K groups of verification data according to the K groups of probabilities of the K groups of verification data;
and determining K groups of error data according to the K groups of verification data and the K groups of prediction results.
Further, in the method for detecting spam, after inputting the acquired original sample data into a coding layer in a preset bert pre-training model for coding to obtain a coding feature vector corresponding to the original sample data, the method further includes:
and locking the parameters of the coding layer.
Further, in the method for detecting spam, determining the optimization parameters of the decision model according to the K sets of error data includes:
calculating the weight of each group of model parameters according to the K groups of error data;
and carrying out weighted average on the K groups of model parameters according to the weight to obtain average model parameters which are used as optimization parameters of the decision model.
Further, in the above method for detecting spam, the decision model includes a loss function of a fully-connected neural network or a loss function of a support vector machine with a kernel function.
Further, in the above method for detecting junk information, if the decision model includes a loss function of a fully-connected neural network, before performing K-fold cross training on the decision model corresponding to the decoding layer in the bert pre-training model to obtain K sets of model parameters corresponding to K sets of verification data, the method further includes:
and if the number of the original sample data is less than a preset threshold, adding a regularization term to the loss function of the fully-connected neural network so as to update the loss function of the fully-connected neural network.
The invention also provides a device for detecting the junk information, which comprises a memory and a processor;
the memory has stored thereon a computer program which, when being executed by the processor, carries out the steps of the method of detecting spam as described above.
The present invention also provides a storage medium having stored thereon a computer program which, when executed by a controller, implements the steps of the method for detecting spam as described above.
Compared with the prior art, one or more embodiments in the above scheme can have the following advantages or beneficial effects:
by applying this method for detecting junk information, the information to be detected is input into the coding layer of a pre-trained bert recognition model, which has a strong semantic expression capacity for words and sentences, to extract the coding feature vector corresponding to the information to be detected; each single character or word is thus coded in close connection with the context of the information to be detected, improving the accuracy of the detection result for single characters or words. By adopting the technical scheme of the invention, the missed-recognition rate and the false-recognition rate of junk information can be reduced.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart of an embodiment of a method for detecting spam according to the present invention;
fig. 2 is a schematic structural diagram of an embodiment of a spam detection apparatus according to the present invention.
Detailed Description
The following detailed description of the embodiments of the present invention will be provided with reference to the drawings and examples, so that how to apply the technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented. It should be noted that, as long as there is no conflict, the embodiments and the features of the embodiments of the present invention may be combined with each other, and the technical solutions formed are within the scope of the present invention.
BERT (Bidirectional Encoder Representations from Transformers) pre-training model: the BERT pre-training model achieves bidirectionality in the language model by using a masked language model, and demonstrates the importance of bidirectionality for language-representation pre-training. The BERT pre-training model is a bidirectional language model in the true sense, in which each word can simultaneously use its left and right context. BERT is the first fine-tuning-based representation model to achieve the best results on both sentence-level and token-level natural language tasks, showing that pre-trained representations can relieve different tasks of the need for specially designed model structures. BERT achieved the best results on 11 natural language processing tasks, and extensive ablation experiments show that the bidirectionality of BERT is an important innovation. The BERT pre-training model converts text into dynamic word vectors, enriching the semantic information of the text vector, and has a strong semantic expression capacity for words and sentences.
Example one
In order to solve the technical problems in the prior art, an embodiment of the present invention provides a method for detecting spam.
Fig. 1 is a flowchart of an embodiment of a method for detecting spam according to the present invention, and as shown in fig. 1, the method for detecting spam according to this embodiment may specifically include the following steps:
100. inputting information to be detected into a coding layer of a pre-trained bert recognition model for coding to obtain a coding feature vector corresponding to the information to be detected;
in a specific implementation process, since the BERT pre-training model has a strong semantic expression capacity for words and sentences, in this embodiment the BERT pre-training model can be combined with deep-learning techniques to pre-train a bert recognition model for recognizing junk information.
Specifically, the training process of the bert recognition model is as follows:
a. inputting the acquired original sample data into a coding layer in a preset bert pre-training model for coding to obtain a coding feature vector corresponding to the original sample data;
in this embodiment, mass data may be collected, and spam or non-spam tagging may be performed on the mass data manually to obtain original sample data, where data corresponding to spam may be tagged as 0, and data corresponding to non-spam may be tagged as 1. And inputting the obtained original sample data into a coding layer in a preset bert pre-training model for coding to obtain a coding feature vector corresponding to the original sample data, wherein the coding feature vector can be recorded as a vector V with the dimension of 1 x d. In this embodiment, parameters of the BERT pre-training model may be preset, and the BERT pre-training model is initialized, where the parameters of the BERT pre-training model at least include iteration times, learning step length, weight attenuation, selected activation function, whether dropout is used, and the like. According to the actual application requirement, parameters such as maximum sequence length, batch processing data size, random inactivation and the like can be set. The effect of BERT model classification is more sensitive to the choice of model parameters when the data set is smaller. In the technical scheme of the invention, the learning rate is preferably 2e-5, the random inactivation is 0.1, and the iteration number is 6.
For example, suppose the sentence s = [w1, w2, ..., wn] consists of n words and the target word t = [wi, wi+1, ..., wi+m-1] consists of m words, with t a subset of s. The sentence obtained after word-segmentation of s is denoted Sr = [x0, x1, x2, ..., xi', ..., xn', xn'+1], and the target word within it is denoted Tr = [xi', xi'+1, ..., xi'+m'-1], where Tr ⊆ Sr. Here x0 and xn'+1 are the word vectors corresponding to the [CLS] and [SEP] tags respectively: the [CLS] tag is the classification label that the BERT pre-training model prepends to the segmented sequence, and the [SEP] tag is the end-of-sentence tag that the BERT model appends at the end of a sentence. After Sr passes through the multi-layer transformer network of the BERT pre-training model, the last-layer encoding of the target word Tr is obtained as TrVec = [Vi', Vi'+1, ..., Vi'+m'-1], with TrVec ∈ R^(m'×d) a vector representation of the target word Tr, where R denotes the vector space, m' the length, and d the vector dimension. Max pooling over TrVec then gives the coding feature vector V = max{TrVec, dim=0}, with V ∈ R^(1×d).
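A minimal sketch of this encoding step, assuming the PyTorch encoder initialized above; the target-span start i_ and length m_ are passed in by the caller, and the max pooling mirrors V = max{TrVec, dim=0}.

```python
# Hedged sketch of extracting the coding feature vector V for a target span.
import torch

def encode_target(encoder, tokenizer, sentence: str, i_: int, m_: int) -> torch.Tensor:
    inputs = tokenizer(sentence, return_tensors="pt")      # adds [CLS] ... [SEP]
    with torch.no_grad():
        last_hidden = encoder(**inputs).last_hidden_state  # shape (1, n'+2, d)
    tr_vec = last_hidden[0, i_ : i_ + m_]                  # TrVec, shape (m', d)
    v, _ = tr_vec.max(dim=0)                               # V = max{TrVec, dim=0}
    return v.unsqueeze(0)                                  # V, shape (1, d)
```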
It should be noted that after the coding feature vector corresponding to the original sample data is obtained, the parameters of the coding layer may be locked, so that when the coding feature vector of the information to be detected is extracted later, it is extracted under the same parameters, ensuring consistency between the coding feature vector of the information to be detected and that of the original sample data.
In practical applications, the data received by users can be acquired in real time as the mass data, and can be labeled automatically according to user behavior, reducing the cost of manual labeling. Specifically, after a user receives a piece of data, the user's browsing duration for that data can be acquired as a measure of the user's interest in it; data with a low interest level can default to junk data, or another type can be assigned. For example, if a user always marks a certain type of short message as read, or closes it immediately after opening it, this indicates low interest, and the message can be labeled as junk data or another type; if the user opens and views another type of short message for longer than a preset time, this indicates interest in its content, and the message can be labeled as non-junk information. Automatic labeling can thus be realized, and junk-information detection and interception can be personalized to each user's preferences.
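The behaviour-based auto-labelling could be sketched roughly as follows; the browsing-time threshold and the event fields are illustrative assumptions rather than values given by the patent.

```python
# Hedged sketch of behaviour-based automatic labelling (0 = junk, 1 = non-junk,
# matching the tagging convention above); the threshold value is an assumption.
from dataclasses import dataclass

PRESET_READ_SECONDS = 5.0  # hypothetical "preset time" indicating real interest

@dataclass
class MessageEvent:
    text: str
    opened: bool
    read_seconds: float

def auto_label(event: MessageEvent) -> int:
    if event.opened and event.read_seconds >= PRESET_READ_SECONDS:
        return 1  # user engaged with the content: label as non-junk
    return 0      # closed quickly or marked read without engaging: label as junk
```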
b. Performing K-fold cross training on a judgment model corresponding to a decoding layer in a bert pre-training model by using a coding feature vector corresponding to original sample data to obtain K sets of model parameters corresponding to K sets of verification data;
in a specific implementation process, a decoding layer in the bert pre-training model may be set as a function corresponding to a deep learning algorithm, which may be referred to as a decision model corresponding to the decoding layer, so that a coding feature vector corresponding to original sample data may be used as input information of the decision model corresponding to the decoding layer in the bert pre-training model, and K-fold cross training may be performed to obtain K sets of model parameters corresponding to K sets of verification data. For example, the decision model includes a loss function of a fully-connected neural network or a loss function of a support vector machine with a kernel function.
Specifically, the coding feature vectors corresponding to the original sample data may be divided into K groups to obtain K groups of sample vector data; each group of sample vector data is then taken in turn as verification data, with the remaining K-1 groups as training data, and the decision model is trained on the training data to obtain the model parameters corresponding to each group of verification data.
For example, K may be 4: the first 3 groups of sample vector data are used as training data and the 4th group as verification data, the parameters of the decision model are initialized to random numbers, and the decision model is trained on the training data to obtain the set of model parameters corresponding to that group of verification data. In the same way, each group of data is used as verification data in turn, yielding the model parameters corresponding to each split, that is, 4 sets of model parameters for the 4 groups of verification data.
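A compact sketch of this K-fold procedure (K = 4 in the example); scikit-learn's KFold is an assumed splitting utility, and train_decision_model is a hypothetical stand-in for training the decision model on one fold's training data.

```python
# Hedged sketch of K-fold cross training over the coding feature vectors.
import numpy as np
from sklearn.model_selection import KFold

def k_fold_parameters(vectors: np.ndarray, labels: np.ndarray, k: int = 4):
    """Return (model parameters, validation indices) for each of the K folds."""
    folds = []
    for train_idx, val_idx in KFold(n_splits=k).split(vectors):
        # train_decision_model is hypothetical: it initializes the decision
        # model with random parameters and fits it on the K-1 training groups.
        params = train_decision_model(vectors[train_idx], labels[train_idx])
        folds.append((params, val_idx))
    return folds
```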
In this embodiment, if the decision model is the loss function of a fully-connected neural network, the number of input neurons of the fully-connected neural network is the second dimension d of the vector V, and a bias is added; correspondingly, the dimension of the weight parameters of the neural network is (d+1) × 1. When training the loss function of the fully-connected neural network, the loss can be computed with cross entropy (CrossEntropy), back-propagation is performed, the weight parameters of the model are updated, and the operation is iterated. If the decision model is the loss function of a support vector machine (SVM) with a kernel function, the training objective becomes finding the optimal separating hyperplane of the SVM, which reduces to solving a quadratic programming problem.
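For the fully-connected case, the decision head and one training step might look like the sketch below; the hidden size d = 768 is an assumed BERT default (the patent only calls it d), and AdamW is an assumed optimizer choice.

```python
# Hedged sketch of the fully-connected decision model: d inputs plus a bias,
# i.e. (d+1) x 1 weight parameters, trained with cross-entropy and backprop.
import torch
import torch.nn as nn

d = 768                                   # assumed vector dimension
decision_model = nn.Linear(d, 1)          # d weights + 1 bias = (d+1) x 1
loss_fn = nn.BCEWithLogitsLoss()          # binary cross-entropy on the logit
optimizer = torch.optim.AdamW(decision_model.parameters(), lr=2e-5)

def train_step(v: torch.Tensor, label: torch.Tensor) -> float:
    logit = decision_model(v).squeeze(-1)  # forward pass on V, shape (1, d)
    loss = loss_fn(logit, label)           # cross-entropy loss
    optimizer.zero_grad()
    loss.backward()                        # back-propagation
    optimizer.step()                       # update the weight parameters
    return loss.item()
```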
It should be noted that, in this embodiment, if the decision model includes a loss function of the fully-connected neural network and the number of the original sample data is less than the preset threshold, a regularization term may be added to the loss function of the fully-connected neural network to update the loss function of the fully-connected neural network, so as to prevent overfitting of the model.
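That safeguard might be sketched as an L2 penalty added to the base loss when the sample count is small; both the sample threshold and the regularization coefficient below are illustrative assumptions.

```python
# Hedged sketch of adding a regularization term to the loss for small datasets.
SAMPLE_THRESHOLD = 10_000   # hypothetical "preset threshold" on sample count
L2_COEFF = 1e-4             # hypothetical regularization strength

def regularized_loss(base_loss, model, n_samples: int):
    if n_samples < SAMPLE_THRESHOLD:
        l2 = sum(p.pow(2).sum() for p in model.parameters())
        return base_loss + L2_COEFF * l2  # updated loss with regularization term
    return base_loss
```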
c. Calculating K groups of error data corresponding to the K groups of verification data according to the K groups of model parameters and the K groups of verification data;
specifically, the K groups of verification data can be predicted with the decision model under the K sets of model parameters, obtaining K groups of probabilities for the K groups of verification data; from these probabilities, K groups of prediction results corresponding to the K groups of verification data are determined, where a prediction result is either junk information or non-junk information. In this way, K groups of error data can be determined from the K groups of verification data and the K groups of prediction results. For example, if a piece of verification data is labeled as junk information but predicted to be non-junk information, an error is recorded, and the K groups of error data are obtained by accumulating such errors.
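Step c might be sketched as follows; predict_proba is a hypothetical helper that scores verification vectors with the decision model under one set of parameters, and the mapping of probability to class index is a simplifying assumption.

```python
# Hedged sketch of computing the K groups of error data from the K folds.
import numpy as np

def fold_errors(folds, vectors: np.ndarray, labels: np.ndarray,
                threshold: float = 0.5):
    errors = []
    for params, val_idx in folds:
        probs = predict_proba(params, vectors[val_idx])   # fold probabilities
        preds = (probs >= threshold).astype(int)          # prediction results
        errors.append(float(np.mean(preds != labels[val_idx])))  # fold error
    return errors                                         # K groups of error data
```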
d. Determining optimization parameters of a decision model according to the K groups of error data;
specifically, the weight of each set of model parameters can be calculated from the K groups of error data, and the K sets of model parameters are weighted-averaged by these weights to obtain average model parameters, which serve as the optimization parameters of the decision model. For example, the inverses of the K groups of error data may be normalized to compute the weight of each group of verification data: the smaller the error of a group of verification data, the greater its weight.
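Steps d and e could then be sketched as below: normalize the inverses of the K errors into weights and take the weighted average of the K parameter sets; the small epsilon guarding against a zero error is an added assumption.

```python
# Hedged sketch of inverse-error weighting and weighted parameter averaging.
import numpy as np

def weighted_average_params(param_sets, errors, eps: float = 1e-12):
    inv = 1.0 / (np.asarray(errors) + eps)   # smaller error -> larger weight
    weights = inv / inv.sum()                # normalize the inverses
    stacked = np.stack(param_sets)           # shape (K, ...) parameter arrays
    return np.tensordot(weights, stacked, axes=1)  # average model parameters
```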
e. And optimizing the decision model according to the optimization parameters of the decision model to obtain an optimized bert pre-training model as the bert recognition model.
After the optimization parameters of the decision model are obtained, the decision model can be optimized, and an optimized bert pre-training model is obtained and used as the bert recognition model.
In a specific implementation process, when the spam information is detected, the information to be detected can be input into a coding layer of a pre-trained bert recognition model for coding, so as to obtain a coding feature vector corresponding to the information to be detected. Specifically, the process of obtaining the coding feature vector corresponding to the information to be detected is the same as the above process, and please refer to the above related records for details, which is not described herein again.
101. Inputting the coding feature vector corresponding to the information to be detected into a decoding layer of a pre-trained bert recognition model for decoding, and calculating the probability corresponding to the coding feature vector corresponding to the information to be detected by using a judgment model corresponding to the decoding layer;
in this embodiment, the coding feature vector corresponding to the information to be detected may be input to a decoding layer of a pre-trained bert recognition model for decoding, and the probability corresponding to the coding feature vector corresponding to the information to be detected is calculated by using a decision model corresponding to the decoding layer.
102. And if the probability corresponding to the coding feature vector corresponding to the information to be detected is greater than or equal to the preset probability, determining that the information to be detected is junk information.
Specifically, the preset probability is preferably 0.5. In this embodiment, if the probability corresponding to the coding feature vector corresponding to the information to be detected is greater than or equal to 0.5, it is determined that the information to be detected is spam, otherwise, if the probability corresponding to the coding feature vector corresponding to the information to be detected is less than 0.5, it is determined that the information to be detected is non-spam.
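Putting the pieces together, detection might look like the sketch below, reusing the hypothetical encoder, tokenizer, and decision head from the earlier sketches; treating the whole message as the target span is an assumption about how the span is chosen.

```python
# Hedged end-to-end sketch: encode the message, score it, apply the 0.5 cut.
import torch

PRESET_PROBABILITY = 0.5

def is_junk(message: str) -> bool:
    inputs = tokenizer(message, return_tensors="pt")
    n_tokens = inputs["input_ids"].shape[1]
    v = encode_target(encoder, tokenizer, message,
                      i_=1, m_=n_tokens - 2)            # skip [CLS] and [SEP]
    prob = torch.sigmoid(decision_model(v)).item()      # decoding-layer probability
    return prob >= PRESET_PROBABILITY                   # >= 0.5 -> junk information
```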
According to the method for detecting junk information of this embodiment, the information to be detected is input into the coding layer of a pre-trained bert recognition model, which has a strong semantic expression capacity for words and sentences, to extract the coding feature vector corresponding to the information to be detected; each single character or word is thus coded in close connection with the context of the information to be detected, improving the accuracy of the detection result for single characters or words. By adopting the technical scheme of the invention, the missed-recognition rate and the false-recognition rate of junk information can be reduced.
In a specific implementation process, after the information to be detected is determined to be junk information, it can be intercepted and the user notified. On seeing the notification, the user can confirm whether it is junk information: if it really is, no operation is needed; if the user reports it as non-junk information, the bert recognition model can be further optimized according to the user's feedback.
It should be noted that the method of the embodiment of the present invention may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In the case of such a distributed scenario, one of the multiple devices may only perform one or more steps of the method according to the embodiment of the present invention, and the multiple devices interact with each other to complete the method.
Example two
In order to solve the above technical problems in the prior art, an embodiment of the present invention provides a device for detecting spam.
Fig. 2 is a schematic structural diagram of an embodiment of the apparatus for detecting spam according to the present invention, and as shown in fig. 2, the apparatus for detecting spam of the present embodiment may include an encoding module 20, a calculating module 21, and a determining module 22.
The encoding module 20 is configured to input the information to be detected into an encoding layer of a pre-trained bert recognition model for encoding, so as to obtain an encoding feature vector corresponding to the information to be detected;
specifically, the training process of the bert recognition model is as follows:
a. inputting the acquired original sample data into a coding layer in a preset bert pre-training model for coding to obtain a coding feature vector corresponding to the original sample data, and locking parameters of the coding layer;
b. performing K-fold cross training on a judgment model corresponding to a decoding layer in a bert pre-training model by using a coding feature vector corresponding to original sample data to obtain K sets of model parameters corresponding to K sets of verification data;
specifically, the coding feature vectors corresponding to the original sample data are divided into K groups to obtain K groups of sample vector data; each group of sample vector data is taken in turn as verification data, with the remaining K-1 groups as training data, and the decision model is trained with the training data to obtain the model parameters corresponding to each group of verification data. The decision model includes a loss function of a fully-connected neural network or a loss function of a support vector machine with a kernel function. If the number of original sample data is less than a preset threshold, a regularization term is added to the loss function of the fully-connected neural network to update it.
c. Calculating K groups of error data corresponding to the K groups of verification data according to the K groups of model parameters and the K groups of verification data;
specifically, K groups of verification data are predicted by using a judgment model under K groups of model parameters, and K groups of probabilities of the K groups of verification data are obtained; determining K groups of prediction results corresponding to the K groups of verification data according to the K groups of probabilities of the K groups of verification data; and determining K groups of error data according to the K groups of verification data and the K groups of prediction results.
d. Determining optimization parameters of a decision model according to the K groups of error data;
specifically, calculating the weight of each group of model parameters according to K groups of error data; and carrying out weighted average on the K groups of model parameters according to the weight to obtain average model parameters which are used as optimization parameters of the decision model.
e. And optimizing the decision model according to the optimization parameters of the decision model to obtain an optimized bert pre-training model as the bert recognition model.
The calculation module 21 is configured to input the coding feature vector corresponding to the information to be detected into a decoding layer of a pre-trained bert recognition model for decoding, and calculate, by using a decision model corresponding to the decoding layer, a probability corresponding to the coding feature vector corresponding to the information to be detected;
and the determining module 22 is configured to determine that the information to be detected is spam information if the probability is greater than or equal to the preset probability.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
EXAMPLE III
In order to solve the technical problems in the prior art, an embodiment of the present invention provides a device for detecting spam.
The device for detecting junk information in this embodiment comprises a memory and a processor;
the memory stores a computer program, and the computer program realizes the steps of the spam detection method of the above embodiment when executed by the processor.
Example four
In order to solve the above technical problems in the prior art, embodiments of the present invention provide a storage medium.
The storage medium provided by the embodiment of the present invention stores a computer program thereon, and the computer program, when executed by a processor, implements the steps of the spam detection method of the above embodiment.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although the embodiments of the present invention have been described above, the above description is only for the convenience of understanding the present invention, and is not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method for detecting spam messages is characterized by comprising the following steps:
inputting information to be detected into a coding layer of a pre-trained bert recognition model for coding to obtain a coding feature vector corresponding to the information to be detected;
inputting the coding feature vector corresponding to the information to be detected into a decoding layer of a pre-trained bert recognition model for decoding, and calculating the probability corresponding to the coding feature vector corresponding to the information to be detected by using a judgment model corresponding to the decoding layer;
and if the probability is greater than or equal to a preset probability, determining that the information to be detected is junk information.
2. The method for detecting spam information according to claim 1, wherein before encoding information to be detected by using an encoding layer of a pre-trained bert recognition model to obtain an encoded feature vector corresponding to the information to be detected, the method further comprises:
inputting the acquired original sample data into a coding layer in a preset bert pre-training model for coding to obtain a coding feature vector corresponding to the original sample data;
performing K-fold cross training on a judgment model corresponding to a decoding layer in a bert pre-training model by using the coding feature vector corresponding to the original sample data to obtain K sets of model parameters corresponding to K sets of verification data;
calculating K groups of error data corresponding to the K groups of verification data according to the K groups of model parameters and the K groups of verification data;
determining optimization parameters of the judgment model according to the K groups of error data;
and optimizing the decision model according to the optimization parameters of the decision model to obtain an optimized bert pre-training model as the bert recognition model.
3. The method for detecting spam information according to claim 2, wherein performing K-fold cross training on a decision model corresponding to a decoding layer in a bert pre-training model by using the coding feature vector corresponding to the original sample data to obtain K sets of model parameters corresponding to K sets of verification data, comprises:
dividing the coding feature vectors corresponding to the original sample data into K groups to obtain K groups of sample vector data;
and taking each group of sample vector data in the K groups in turn as verification data and the remaining K-1 groups of sample vector data as training data, and training the decision model with the training data to obtain the model parameters corresponding to each group of verification data.
4. The method of claim 3, wherein calculating K sets of error data corresponding to K sets of validation data based on the K sets of model parameters and K sets of validation data comprises:
predicting the K groups of verification data by using a judgment model under K groups of model parameters to obtain K groups of probabilities of the K groups of verification data;
determining K groups of prediction results corresponding to the K groups of verification data according to the K groups of probabilities of the K groups of verification data;
and determining K groups of error data according to the K groups of verification data and the K groups of prediction results.
5. The method according to claim 2, wherein after the obtained original sample data is input to a coding layer in a preset bert pre-training model for coding to obtain a coding feature vector corresponding to the original sample data, the method further comprises:
and locking the parameters of the coding layer.
6. The method for detecting spam according to claim 2, wherein determining the optimized parameters of the decision model according to the K sets of error data comprises:
calculating the weight of each group of model parameters according to the K groups of error data;
and carrying out weighted average on the K groups of model parameters according to the weight to obtain average model parameters which are used as optimization parameters of the decision model.
7. The method of claim 2, wherein the decision model comprises a loss function of a fully-connected neural network or a loss function of a support vector machine with a kernel function.
8. The method for detecting spam information according to claim 2, wherein if the decision model includes a loss function of a fully-connected neural network, before performing K-fold cross training on the decision model corresponding to the decoding layer in the bert pre-training model to obtain K sets of model parameters corresponding to K sets of verification data, the method further comprises:
and if the number of the original sample data is less than a preset threshold, adding a regularization term to the loss function of the fully-connected neural network so as to update the loss function of the fully-connected neural network.
9. A device for detecting junk information, characterized by comprising a memory and a processor;
the memory has stored thereon a computer program which, when being executed by the processor, carries out the steps of the method of detecting spam as claimed in any of claims 1 to 8.
10. A storage medium having stored thereon a computer program, wherein the computer program, when executed by a controller, implements the steps of the method for detecting spam according to any of claims 1 to 8.
CN202010829688.XA 2020-08-17 2020-08-17 Method, device and storage medium for detecting junk information Pending CN111950268A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010829688.XA CN111950268A (en) 2020-08-17 2020-08-17 Method, device and storage medium for detecting junk information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010829688.XA CN111950268A (en) 2020-08-17 2020-08-17 Method, device and storage medium for detecting junk information

Publications (1)

Publication Number Publication Date
CN111950268A true CN111950268A (en) 2020-11-17

Family

ID=73343617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010829688.XA Pending CN111950268A (en) 2020-08-17 2020-08-17 Method, device and storage medium for detecting junk information

Country Status (1)

Country Link
CN (1) CN111950268A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240480A (en) * 2021-01-25 2021-08-10 天津五八到家货运服务有限公司 Order processing method and device, electronic terminal and storage medium
CN113869431A (en) * 2021-09-30 2021-12-31 平安科技(深圳)有限公司 False information detection method, system, computer device and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111522958A (en) * 2020-05-28 2020-08-11 泰康保险集团股份有限公司 Text classification method and device
CN111540470A (en) * 2020-04-20 2020-08-14 北京世相科技文化有限公司 Social network depression tendency detection model based on BERT transfer learning and training method thereof
CN111538827A (en) * 2020-04-28 2020-08-14 清华大学 Case recommendation method and device based on content and graph neural network and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111540470A (en) * 2020-04-20 2020-08-14 北京世相科技文化有限公司 Social network depression tendency detection model based on BERT transfer learning and training method thereof
CN111538827A (en) * 2020-04-28 2020-08-14 清华大学 Case recommendation method and device based on content and graph neural network and storage medium
CN111522958A (en) * 2020-05-28 2020-08-11 泰康保险集团股份有限公司 Text classification method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHI SUN et al.: "How to Fine-Tune BERT for Text Classification?", CCL 2019: Chinese Computational Linguistics *
高宝丽: "Binary classification: a practical BERT-based text classification, with complete code", https://cloud.tencent.com/developer/article/1601710 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240480A (en) * 2021-01-25 2021-08-10 天津五八到家货运服务有限公司 Order processing method and device, electronic terminal and storage medium
CN113869431A (en) * 2021-09-30 2021-12-31 平安科技(深圳)有限公司 False information detection method, system, computer device and readable storage medium
WO2023050670A1 (en) * 2021-09-30 2023-04-06 平安科技(深圳)有限公司 False information detection method and system, computer device, and readable storage medium
CN113869431B (en) * 2021-09-30 2024-05-07 平安科技(深圳)有限公司 False information detection method, system, computer equipment and readable storage medium

Similar Documents

Publication Publication Date Title
US11238211B2 (en) Automatic hyperlinking of documents
CN108717408B (en) Sensitive word real-time monitoring method, electronic equipment, storage medium and system
US11562203B2 (en) Method of and server for training a machine learning algorithm for estimating uncertainty of a sequence of models
US20230376695A1 (en) Automated content tagging with latent dirichlet allocation of contextual word embeddings
US11436267B2 (en) Contextually sensitive document summarization based on long short-term memory networks
WO2022081269A1 (en) Weakly supervised multi-task learning for concept-based explainability
CN110347830B (en) Public opinion early warning implementation method and device
US20220180242A1 (en) Dynamic Gradient Deception Against Adversarial Examples in Machine Learning Models
Zhai et al. Classification of high-dimensional evolving data streams via a resource-efficient online ensemble
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
CN111950268A (en) Method, device and storage medium for detecting junk information
CN116018621A (en) System and method for training multi-class object classification model using partially labeled training data
US11341358B2 (en) Multiclassification approach for enhancing natural language classifiers
CN112131345A (en) Text quality identification method, device, equipment and storage medium
WO2022156822A1 (en) Classification model training method and system
CN112380427B (en) User interest prediction method based on iterative graph attention network and electronic device
US20230162518A1 (en) Systems for Generating Indications of Relationships between Electronic Documents
CA3066337A1 (en) Method of and server for training a machine learning algorithm for estimating uncertainty of a sequence of models
Chen et al. Multi-label text classification based on sequence model
Wang et al. A novel feature-based text classification improving the accuracy of twitter sentiment analysis
US20230153341A1 (en) Using neural networks to detect incongruence between headlines and body text of documents
Borra et al. OECNet: Optimal feature selection-based email classification network using unsupervised learning with deep CNN model
CN117558270B (en) Voice recognition method and device and keyword detection model training method and device
CN117235629B (en) Intention recognition method, system and computer equipment based on knowledge domain detection
US11763544B2 (en) Denoising autoencoder image captioning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201117)