CN109947936A - Method for dynamically detecting spam based on machine learning - Google Patents

Method for dynamically detecting spam based on machine learning

Info

Publication number
CN109947936A
CN109947936A (application number CN201810952482.9A; granted as CN109947936B)
Authority
CN
China
Prior art keywords
topic
model
spam
text
mail
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810952482.9A
Other languages
Chinese (zh)
Other versions
CN109947936B (en)
Inventor
Wen Weiping (文伟平)
Feng Chao (冯超)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN201810952482.9A
Publication of CN109947936A
Application granted
Publication of CN109947936B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Character Discrimination (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method for dynamically identifying spam based on machine learning, relating to spam-filtering technology. By using an LDA topic model, an Autoencoder, and a custom linear model, the method realizes dynamic spam detection and achieves the goal of efficient spam identification. It includes: preprocessing the template mails and sample mails; training on the text with the LDA topic model; compressing the one-hot bag-of-words vectors with the autoencoder and converting them into word vectors; and creating a linear model through which the mails to be identified are predicted. The technical solution of the invention can dynamically detect and efficiently identify spam.

Description

Method for dynamically detecting spam based on machine learning
Technical field
The present invention relates to spam-filtering technology, and more particularly to a method for dynamically detecting spam based on machine learning.
Background art
With the development of the big-data era, user-data leaks occur more and more often, and large-scale leaks of e-mail addresses emerge one after another. Profit-seeking groups often use the harvested mailboxes to batch-send spam such as commercial advertisements, which seriously reduces the working efficiency of e-mail, occupies mailbox storage space, and directly degrades the mailbox user experience.
Existing spam-detection methods mainly classify mail with conventional statistical models over manually extracted text features. Spam-sending techniques, however, keep upgrading, and such detection methods based on manually extracted text features cannot efficiently intercept new types of spam.
Summary of the invention
To overcome the above deficiencies of the prior art, the present invention provides a method for dynamically detecting spam based on machine learning, namely AAS (Autoencoder Anti-Spam System). By using an LDA topic model, an Autoencoder, and a custom linear model, it can identify spam more accurately, ensuring that the mailbox operates normally and efficiently and that users are not harassed by spam.
The core of the invention is: by using a Latent Dirichlet Allocation (LDA) topic model, an Autoencoder, and a custom linear model, spam is detected dynamically and identified more accurately. The LDA model is an unsupervised machine-learning method involving the gamma function, the beta distribution, the Dirichlet distribution, conjugate priors, and Gibbs sampling; by learning from a massive document collection, it infers the topic distribution of a specific document. The Autoencoder is an unsupervised neural-network model that can compress information effectively; the present invention uses it to generate word vectors. The linear part of AAS comprises two stages, voting and prediction. In the voting stage, weights over the topic distribution are selected through a Softmax function; in the prediction stage, a sample mail is matched against the template mails (a set containing multiple representative spam and normal mails), and a Sigmoid function predicts whether the result is spam.
The technical solution provided by the invention is as follows:
A method for dynamically identifying spam based on machine learning, which uses an LDA topic model, an Autoencoder, and a custom linear model to detect spam dynamically and achieve the goal of efficient spam identification; it includes the following steps:
A. Preprocess the template mails and sample mails, performing the following operations:
A1. Segment Chinese template and sample mails with the jieba toolkit; segment English template and sample mails with the spaCy toolkit;
A2. Remove Chinese and English stop words, and strip the punctuation from the text with Python's re module;
A3. Convert the segmented sentences into one-hot encoding vectors with the sklearn toolkit.
B. Train on the text with the open-source GibbsLDA++:
B1. Set the parameters of the Dirichlet distributions and the number of topics;
B2. Set the number of iterations;
B3. Save the training results and observe at which topic count the text clusters best by topic; use that topic count as the number of units of the Autoencoder's topic layer.
C. Compress the one-hot encoding vectors obtained in step A3 with the autoencoder (AutoEncoder) model, converting them into word vectors (embeddings):
C1. Build the model with the TensorFlow deep-learning framework, setting the numbers of units of the input layer, hidden layer, topic layer, and output layer. The training process is expressed as:
p(z_l | v) = softmax(-F(v, z_l)) (formula 1)
where v is the one-hot encoding generated in step A3; p(z_l | v) is the probability of the l-th topic of the topic set z given v; z is the set of topics; l indexes a topic; z_l denotes the l-th topic of z;
d_l is a scalar randomly initialized by the computer for topic l; v_k is the generated one-hot encoding vector over a vocabulary of size k; W_l^{jk} is the j-th scalar parameter randomly initialized by the computer for topic l under vocabulary size k;
p(h_j | v, z_l) is the probability of the j-th coordinate of the word vector generated after dimensionality reduction, h_j being the j-th coordinate of the hidden layer h; σ is the sigmoid function (given explicitly in the specific embodiment).
C2. Use the squared loss function as the model's loss function and optimize the model with the AdamOptimizer method;
C3. Use cross-validation to select the best model (i.e., the model trained by the AutoEncoder method); with this model, take the one-hot encoding vectors as input and obtain the reduced-dimension word vectors;
D. Create the linear model and use it to predict mails:
The linear model comprises two stages, voting and prediction. In the voting stage, weights over the topic distribution are selected through the Softmax function; in the prediction stage, the sample mail is matched against the template mails and the result is predicted through the Sigmoid function. Specifically, perform the following operations:
D1. Apply the softmax function to the topic distribution returned by the autoencoder; this serves as the voting model.
The topic distribution returned by the autoencoder is one part of the input of formula 3, which, in the form implied by the definitions of this section, is
ŷ = σ( Σ_{l∈L} p(z_l | q) · s_l(q, r) · p_l(yes | q, r) ) (formula 3)
where:
σ: the sigmoid function;
p_l(yes | q, r): the template-match probability of step D3;
s_l(q, r) = cos(z_q, z_r), where z_q and z_r denote the probability distributions of sample mail q and template mail r over the topic set z, and s_l(q, r) is the cosine similarity of sample mail q and template mail r under the l-th topic of topic set z.
D2. Compute the cosine similarity between each sample mail and the labeled template mails as the feature s_l(q, r);
Labeled template mails: spam templates are labeled 0 and normal-mail templates are labeled 1 (the label is the value y of step D4);
D3. Take the dot product of the sample word vectors and template word vectors generated in step C3, and pass the result through the sigmoid function to produce a group of probability values between 0 and 1 whose dimension equals the number of template mails (each template mail being represented by its word vector);
D4. Take the weighted average of the probability from the voting model of step D1 and the sigmoid probabilities of step D3 to obtain the predicted value ŷ (a decimal between 0 and 1); the true label of a template mail is y (0 for spam, 1 for normal mail);
D5. Finally, use the cross-entropy formula as the model's loss function (a function of y and ŷ), optimize the model with the AdamOptimizer method, train for 200-500 rounds, and save the model.
D6. Using the model and template mails saved in step D5, whenever a new mail enters the trained AAS system, a value of 0 or 1 is obtained (0 means the mail is not spam, 1 means it is spam).
Compared with the prior art, the beneficial effects of the present invention are:
The present invention provides a method for dynamically detecting spam based on machine learning; by using an LDA topic model, an Autoencoder, and a custom linear model, it detects spam dynamically and identifies it efficiently.
The technical advantages of the invention include:
First, the invention uses deep-learning methods, which improves the accuracy of spam screening;
Second, the linear model has low computational complexity, which speeds up mail screening;
Finally, the technical solution is an extensible method/system: if a new expression form of spam appears, one only needs to add the new spam to the template-mail set and train a new model again following the steps above.
Description of the drawings
Fig. 1 is a flow diagram of the method of the present invention.
Specific embodiment
With reference to the accompanying drawing, the present invention is further described below through an embodiment, without limiting the scope of the invention in any way.
A specific embodiment of the invention is as follows:
A. When preprocessing the sample and template mails (the latter including multiple spam and normal mails), perform the following operations (a short code sketch follows the list):
A1. Find the jieba segmentation package on GitHub and install it;
A2. Use the jieba.cut method in precise mode to split the strings to be segmented in the text, and remove the punctuation from the text with the re module of Python;
A3. Import the Chinese stop-word vocabulary in the preprocessing code, remove the stop words remaining after segmentation, and generate a string S;
A4. Install the sklearn toolkit and process string S with the CountVectorizer class of sklearn.feature_extraction.text to generate the one-hot encoding of the string, which serves as the input data of step C.
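By way of illustration only, steps A1-A4 might be sketched as follows; the mail texts and the stop-word list are assumed placeholders, not part of the patent:

```python
# Minimal sketch of preprocessing steps A1-A4.
# The sample mails and stop-word list below are assumed for the example.
import re
import jieba
from sklearn.feature_extraction.text import CountVectorizer

stop_words = {"的", "了", "和"}                       # assumed stop words (A3)
mails = ["这是一封测试邮件的正文!", "再来一封广告邮件的正文。"]  # assumed inputs

def preprocess(text):
    text = re.sub(r"[^\w\s]", "", text)               # A2: strip punctuation via re
    tokens = jieba.cut(text, cut_all=False)           # A1: jieba precise-mode cut
    kept = [t for t in tokens if t.strip() and t not in stop_words]  # A3
    return " ".join(kept)                             # string S

corpus = [preprocess(m) for m in mails]
# A4: CountVectorizer over S; binary=True yields presence (one-hot style) vectors,
# and the relaxed token_pattern keeps single-character Chinese tokens.
vectorizer = CountVectorizer(binary=True, token_pattern=r"(?u)\b\w+\b")
one_hot = vectorizer.fit_transform(corpus)            # input data for step C
print(vectorizer.get_feature_names_out())
print(one_hot.toarray())
```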
B. When training on the text with GibbsLDA++, perform the following operations (an invocation sketch follows the list):
B1. Download and compile the open-source GibbsLDA++ source code;
B2. Prepare the training corpus as a dat file whose content is: the first line is the total number of sample mails, and every following line through the last is the text of one sample mail, fully segmented and with stop words removed;
B3. Set the hyperparameter alpha (the document-topic distribution hyperparameter, defaulting to 50/number of topics) and beta (the topic-word distribution hyperparameter, defaulting to 0.1);
B4. Set the number of topics (default: 100) and the number of iterations (default: 1000);
B5. Set the number of keywords to retain under each topic (default: 20) and the storage path of the generated files; the parameters of steps B4 and B5 are obtained from the data distribution of the existing template samples and from experience, and several attempts are usually needed before training yields a good model;
B6. Train on the text with GibbsLDA++ using the above parameters. After the configured number of training iterations, the file model-final.twords is generated, recording the distribution of the keywords clustered under each topic (the number of keywords per topic being the parameter of step B5). Finally, the GibbsLDA++ result is evaluated manually; the passing standard depends on the concrete requirements, and in this patent the standard is that the keyword clusters under the topic distribution match 70% of the sample distribution. If the standard is met, the topic count (a scalar) set in step B4 is used as the dimension of the topic layer in step C2; otherwise step B is repeated until the training result matches the data distribution of the template-mail topics.
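As a sketch, the estimation run of steps B3-B6 could be driven from Python as below; the binary location and corpus path are assumptions, while the command-line flags (-est, -alpha, -beta, -ntopics, -niters, -twords, -dfile) are those of the GibbsLDA++ lda tool:

```python
# Sketch of a GibbsLDA++ estimation run (steps B3-B6).
# The binary path and corpus file are assumed placeholders.
import subprocess

ntopics = 100                                    # B4 default
subprocess.run([
    "./GibbsLDA++/src/lda", "-est",
    "-alpha", str(50.0 / ntopics),               # B3: doc-topic hyperparameter
    "-beta", "0.1",                              # B3: topic-word hyperparameter
    "-ntopics", str(ntopics),                    # B4: number of topics
    "-niters", "1000",                           # B4: Gibbs sampling iterations
    "-twords", "20",                             # B5: keywords kept per topic
    "-dfile", "mails/trndocs.dat",               # B2: first line = mail count
], check=True)

# B6: model-final.twords (written next to the corpus) lists the top
# keywords per topic for the manual evaluation described above.
with open("mails/model-final.twords", encoding="utf-8") as f:
    print(f.read()[:400])
```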
C. Use the autoencoder to compress the one-hot (one_hot) encodings into word vectors. Perform the following operations:
C1. Install the TensorFlow deep-learning framework;
C2. Set the numbers of neurons of the four layers: input, hidden, topic, and output, where input layer = output layer = one_hot dimensionality (v: the input-layer vector, K = |v|: the input dimensionality, k: the index of the k-th input component), topic layer = the GibbsLDA topic count (z: the topic-layer representation, l: the index of the l-th topic), and the hidden layer is set between 50 and 200 units (h: the hidden-layer representation, J = |h|: the hidden dimensionality, j: the index of the j-th hidden component); W is a randomly initialized three-dimensional tensor indexed by vocabulary entry k, hidden unit j, and topic l. The result of formula 2 is the word vector generated by the autoencoder, which serves as the input of step D;
The specific formulas are:
p(z_l | v) = softmax(-F(v, z_l)) (formula 1)
where:
p(z_l | v): the probability of the l-th topic of topic set z given v (the one_hot encoding generated in step A3; v_k is equivalent to v, likewise below);
z: the set of topics;
l: the index of a topic;
z_l: the l-th topic of topic set z;
softmax function: f(x) = e^x / Σ e^x;
d_l: a scalar randomly initialized by the computer for topic l;
v_k: the generated one_hot encoding vector over a vocabulary of size k;
W_l^{jk}: the j-th scalar parameter randomly initialized by the computer for topic l under vocabulary size k;
σ(x) = 1/[1 + exp(-x)], i.e. the sigmoid function;
h_j: the j-th coordinate of the hidden layer h.
In the form implied by these definitions, the energy is F(v, z_l) = -d_l - Σ_j log(1 + exp(Σ_k W_l^{jk} v_k)), and formula 2, giving the hidden representation, is
p(h_j | v, z_l) = σ( Σ_k W_l^{jk} v_k ) (formula 2)
C3. Use the squared-error loss function as the model's loss function and optimize the model with the AdamOptimizer method;
C4. Select the optimal model by cross-validation and generate the word vectors (embeddings) of the mails with it; a minimal TensorFlow sketch of steps C1-C4 follows.
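A minimal sketch of such an autoencoder in the TensorFlow 1.x style API (the style of tf.train.AdamOptimizer cited in step D5) is given below; the layer sizes and toy batch are assumptions, and a plain squared-error autoencoder is shown rather than the exact energy-based formulation of formulas 1-2:

```python
# Sketch of the input / hidden / topic / hidden / output autoencoder (step C).
# Sizes and data are assumed; this is not the patent's exact model.
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

K, J, L = 1000, 100, 100   # vocab, hidden, topic sizes (assumed; L from step B)

v = tf.placeholder(tf.float32, [None, K])               # one_hot input (step A4)
h = tf.layers.dense(v, J, tf.nn.sigmoid)                # hidden layer: word vector
z = tf.layers.dense(h, L, tf.nn.softmax)                # topic layer: p(z_l | v)
h_dec = tf.layers.dense(z, J, tf.nn.sigmoid)
v_hat = tf.layers.dense(h_dec, K)                       # reconstruction

loss = tf.reduce_mean(tf.square(v_hat - v))             # C3: squared-error loss
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)  # C3: AdamOptimizer

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.randint(0, 2, (32, K)).astype(np.float32)  # toy one-hots
    for _ in range(200):
        sess.run(train_op, {v: batch})
    emb, topics = sess.run([h, z], {v: batch})   # word vectors + topic distribution
    print(emb.shape, topics.shape)               # inputs for the linear model (D)
```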
D. Create the linear model and make predictions with it; the specific method is as follows.
D1. Pass a sample mail through formula 1 to obtain its topic distribution, which serves as the first part of the input of formula 3, namely:
Σ_{l∈L} p(z_l | q);
where q is a sample mail; r is a template mail; R is the template-mail set; z is the topic set; l is a topic in the set; L is equivalent to z and also denotes the topic set.
M matrix: a randomly initialized matrix of size h×h.
In the form implied by these definitions, formula 3 is
ŷ = σ( Σ_{l∈L} p(z_l | q) · s_l(q, r) · p_l(yes | q, r) ) (formula 3)
where:
σ: the sigmoid function, see formula 2;
s_l(q, r) = cos(z_q, z_r), where z_q and z_r denote the probability distributions of sample mail q and template mail r over the topic set z, and s_l(q, r) is the cosine similarity of sample mail q and template mail r under the l-th topic of topic set z.
D2. Use the sklearn toolkit to compute the cosine similarity between the topic distribution of a template mail and the topic distribution of a sample mail, normalized; the specific formula is s_l(q, r);
D3. Take the sample word vector e_q obtained in step C and the template word vector e_r and multiply them through the matrix M; the specific formula is p_l(yes | q, r), i.e., in the form implied, p_l(yes | q, r) = σ(e_q^T M e_r);
D4. Combine the results of D1-D3 through the product of formula 3;
D5. Use the cross-entropy loss function as the model's loss function and optimize the model with the AdamOptimizer method;
where the cross-entropy function is H(y | ŷ) = -Σ y · log ŷ, and the AdamOptimizer method can be realized with the AdamOptimizer class provided by the TensorFlow software, i.e. tf.train.AdamOptimizer.
D6. Finally, use the cross-entropy formula as the model's loss function (a function of y and ŷ), optimize the model with the AdamOptimizer method, train for 200-500 rounds, and save the trained model parameters and the existing template mails.
When a new mail enters the AAS system, the computation of formulas 1-3 yields the score of the newly input mail, i.e. the value 0 or 1; 0 means it is not spam, and 1 means it is spam. A toy sketch of this scoring computation follows.
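Putting steps D1-D4 together, the scoring computation can be sketched as follows; all inputs, the 0.5/0.5 averaging weights, and the reduction of the voting model to a single probability are assumptions made for the example:

```python
# Sketch of the voting + prediction stages (steps D1-D4) on toy inputs.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
J, L = 100, 50                              # hidden and topic sizes (assumed)
M = rng.normal(size=(J, J))                 # D1: random J x J matrix M

z_q = rng.dirichlet(np.ones(L))             # topic distribution of sample q
z_r = rng.dirichlet(np.ones(L))             # topic distribution of template r
e_q = rng.normal(size=J)                    # word vector of q (from step C)
e_r = rng.normal(size=J)                    # word vector of template r

vote = float(z_q.max())                     # D1: voting probability (assumed:
                                            # the top softmax weight)
s_qr = cosine_similarity(z_q[None, :], z_r[None, :])[0, 0]  # D2: s_l(q, r)
p_match = sigmoid(e_q @ M @ e_r)            # D3: sigmoid of the bilinear form

y_hat = 0.5 * vote + 0.5 * (s_qr * p_match) # D4: weighted average -> ŷ
label = int(round(y_hat))                   # thresholded to the final 0/1 output
print(y_hat, label)
```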
It should be noted that the purpose of publishing the embodiment is to help further understand the present invention, but those skilled in the art will understand that various substitutions and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the invention should not be limited to what the embodiment discloses, and the scope of protection of the invention is defined by the claims.

Claims (4)

1. A method for dynamically identifying spam based on machine learning, which uses an LDA topic model and an Autoencoder and creates a linear model, realizing dynamic spam detection and achieving the goal of efficient spam identification; comprising the following steps:
A. Preprocess the template mails and sample mails, performing the following operations:
A1. Segment the template mails and sample mails;
A2. Remove the stop words in the text and strip the punctuation from the Chinese text;
A3. Convert the segmented sentences into one-hot encoding vectors;
B. Train on the text with the LDA topic model, performing the following operations:
B1. Set the parameters of the LDA topic model and the number of topics;
B2. Set the number of iterations and train on the text;
B3. Save the training results and obtain the topic count at which the text clusters best by topic; use that topic count as the number of units of the Autoencoder's topic layer;
C. Compress the one_hot encoding vectors obtained in step A3 with the AutoEncoder model, converting them into word vectors (embeddings); perform the following operations:
C1. Build the model with the TensorFlow deep-learning framework, setting the numbers of units of the input layer, hidden layer, topic layer, and output layer; the training process is expressed as formula 1:
p(z_l | v) = softmax(-F(v, z_l)) (formula 1)
where v is the one_hot encoding generated in step A3; p(z_l | v) is the probability of the l-th topic of topic set z given v; z is the set of topics; l indexes a topic; z_l denotes the l-th topic of topic set z;
where d_l is a scalar randomly initialized by the computer for topic l; v_k is the generated one_hot encoding vector over a vocabulary of size k; W_l^{jk} is the j-th scalar parameter randomly initialized by the computer for topic l under vocabulary size k;
where p(h_j | v, z_l) is the probability of the j-th coordinate of the word vector generated after dimensionality reduction; h_j is the j-th coordinate of the hidden layer h; σ is the sigmoid function;
C2. Use the squared loss function as the model's loss function and optimize the model through the AdamOptimizer method;
C3. Use cross-validation to select the model trained by the AutoEncoder method, take the one_hot encoding vectors as input, and obtain the reduced-dimension word vectors;
D. Create the linear model and predict the mails to be identified through it:
the linear model comprises a voting stage and a prediction stage; in the voting stage, the weights over the topic distribution are selected through the Softmax function; in the prediction stage, the sample mail is matched against the template mails and the result is predicted through the Sigmoid function;
specifically perform the following operations:
D1. Apply the softmax function to the topic distribution returned by the AutoEncoder as the voting model;
the topic distribution returned by the autoencoder serves as one part of the input of formula 3, in which σ is the sigmoid function and s_l(q, r) = cos(z_q, z_r), where z_q and z_r denote the probability distributions of sample mail q and template mail r over the topic set z, and s_l(q, r) is the cosine similarity of sample mail q and template mail r under the l-th topic of topic set z;
D2. Compute the cosine similarity between each sample mail and each labeled template mail as the feature s_l(q, r); wherein spam templates are labeled 0 and normal-mail templates are labeled 1;
D3. Perform the dot-product operation on the sample word vectors generated in step C3 and the template word vectors, and pass the result through the sigmoid function to produce a group of probability values between 0 and 1 whose dimension equals the number of template mails;
D4. Take the weighted average of the probability of the voting model in step D1 and the sigmoid probabilities of step D3 to obtain the predicted value ŷ;
D5. Use the cross-entropy formula as the model's loss function, expressed as H(y | ŷ) = -Σ y · log ŷ; optimize the model with the AdamOptimizer method over multiple rounds of training to obtain the trained model and save the model parameters;
D6. Using the trained model parameters and the template mails, predict each mail to be identified and obtain its predicted value, thereby identifying whether it is spam;
Through the above steps, dynamic spam identification based on machine learning is realized.
2. the method for the Dynamic Recognition spam based on machine learning as described in claim 1, characterized in that in step A1, Chinese template and sample post are segmented with jieba software, English template and sample post are divided using spacy software Word;The re module of the specifically used python of step A2 rejects the punctuate in text;The specifically used sklearn software of step A3 will divide The sentence of good word is converted into one-hot coding vector.
3. the method for the Dynamic Recognition spam based on machine learning as described in claim 1, characterized in that step B is specific Text is trained using open source GibbsLDA++;Include the following steps:
B11. download and compile the GibbsLDA++ source code of open source
B12. training corpus file format is bat;The content of training corpus file are as follows: the first row is that sample post is total, second A line is all participles and the sample post text for removing stop words to row to the end;
B13. setting determines that text-theme distribution hyper parameter alpha, preferably 50/ number of topics determine theme-word distribution Hyper parameter beta, preferably 0.1;
B14. setting theme number, preferably 100;Setting iteration number of run, preferably 1000;
B15. be arranged wish under each theme topic retain keyword number, preferably 20;The storage road for generating file is set Diameter;
B16. using the parameter set, text is trained using GibbsLDA++;After the wheel number of training iteration, File model_final.towords is generated, wherein recording the distribution situation for the keyword being polymerize under each theme;
B17. whether meet the substantially distribution of sample by artificially evaluating and testing the cluster situation of keyword under each theme;Such as meet, The number of topics that step B14 is arranged is as the dimension of subject layer in step C1;Otherwise, step B is repeated, until the result of training is full The data distribution of sufficient template mail matter topics.
4. the method for the Dynamic Recognition spam based on machine learning as described in claim 1, characterized in that step D training Wheel number be 200-500.
CN201810952482.9A 2018-08-21 2018-08-21 Method for dynamically detecting junk mails based on machine learning Active CN109947936B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810952482.9A CN109947936B (en) 2018-08-21 2018-08-21 Method for dynamically detecting junk mails based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810952482.9A CN109947936B (en) 2018-08-21 2018-08-21 Method for dynamically detecting junk mails based on machine learning

Publications (2)

Publication Number Publication Date
CN109947936A true CN109947936A (en) 2019-06-28
CN109947936B CN109947936B (en) 2021-03-02

Family

ID=67005803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810952482.9A Active CN109947936B (en) 2018-08-21 2018-08-21 Method for dynamically detecting junk mails based on machine learning

Country Status (1)

Country Link
CN (1) CN109947936B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649715A (en) * 2016-12-21 2017-05-10 中国人民解放军国防科学技术大学 Cross-media retrieval method based on local sensitive hash algorithm and neural network
US10002129B1 (en) * 2017-02-15 2018-06-19 Wipro Limited System and method for extracting information from unstructured text
CN107423282A (en) * 2017-05-24 2017-12-01 南京大学 Semantic Coherence Sexual Themes and the concurrent extracting method of term vector in text based on composite character
CN107171944A (en) * 2017-06-27 2017-09-15 北京二六三企业通信有限公司 The recognition methods of spam and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG Fan, "Application of Weighted LDA Model and SVM in Spam Filtering", Modern Computer *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113630302A (en) * 2020-05-09 2021-11-09 阿里巴巴集团控股有限公司 Junk mail identification method and device and computer readable storage medium
CN112182226A (en) * 2020-10-16 2021-01-05 温州职业技术学院 Junk mail detection method based on principal component analysis and density peak clustering
CN112182226B (en) * 2020-10-16 2022-09-30 温州职业技术学院 Junk mail detection method based on principal component analysis and density peak clustering
CN113609295A (en) * 2021-08-11 2021-11-05 平安科技(深圳)有限公司 Text classification method and device and related equipment
CN115730237A (en) * 2022-11-28 2023-03-03 智慧眼科技股份有限公司 Junk mail detection method and device, computer equipment and storage medium
CN115730237B (en) * 2022-11-28 2024-04-23 智慧眼科技股份有限公司 Junk mail detection method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN109947936B (en) 2021-03-02

Similar Documents

Publication Publication Date Title
CN107992597B (en) Text structuring method for power grid fault case
CN107861951A (en) Session subject identifying method in intelligent customer service
CN109947936A (en) A method of based on machine learning dynamic detection spam
KR20180125905A (en) Method and apparatus for classifying a class to which a sentence belongs by using deep neural network
CN106709754A (en) Power user grouping method based on text mining
CN109977199B (en) Reading understanding method based on attention pooling mechanism
CN107944014A (en) A kind of Chinese text sentiment analysis method based on deep learning
CN107480688B (en) Fine-grained image identification method based on zero sample learning
CN109992668A (en) A kind of enterprise's the analysis of public opinion method and apparatus based on from attention
CN110750640A (en) Text data classification method and device based on neural network model and storage medium
CN113268974B (en) Method, device and equipment for marking pronunciations of polyphones and storage medium
CN109872162A (en) A kind of air control classifying identification method and system handling customer complaint information
CN110717330A (en) Word-sentence level short text classification method based on deep learning
CN111597328B (en) New event theme extraction method
CN111274817A (en) Intelligent software cost measurement method based on natural language processing technology
CN106682089A (en) RNNs-based method for automatic safety checking of short message
CN112306494A (en) Code classification and clustering method based on convolution and cyclic neural network
CN110472245B (en) Multi-label emotion intensity prediction method based on hierarchical convolutional neural network
CN111462752B (en) Attention mechanism, feature embedding and BI-LSTM (business-to-business) based customer intention recognition method
CN115408525B (en) Letters and interviews text classification method, device, equipment and medium based on multi-level label
CN110659367A (en) Text classification number determination method and device and electronic equipment
CN111859983A (en) Natural language labeling method based on artificial intelligence and related equipment
CN113886562A (en) AI resume screening method, system, equipment and storage medium
CN111177010B (en) Software defect severity identification method
CN115952292A (en) Multi-label classification method, device and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant