CN107835496A - Spam SMS identification method, device and server - Google Patents
Spam SMS identification method, device and server
- Publication number
- CN107835496A CN107835496A CN201711191431.0A CN201711191431A CN107835496A CN 107835496 A CN107835496 A CN 107835496A CN 201711191431 A CN201711191431 A CN 201711191431A CN 107835496 A CN107835496 A CN 107835496A
- Authority
- CN
- China
- Prior art keywords
- training
- word segmentation
- short message
- word vector
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/12—Detection or prevention of fraud
- H04W12/128—Anti-malware arrangements, e.g. protection against SMS fraud or mobile malware
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/12—Messaging; Mailboxes; Announcements
- H04W4/14—Short messaging services, e.g. short message services [SMS] or unstructured supplementary service data [USSD]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/12—Detection or prevention of fraud
Abstract
The invention discloses a spam SMS identification method, device and server. The method includes: upon receiving an instruction to identify a message to be examined, performing word segmentation on the message; mapping the tokens obtained by segmentation to word vectors; converting the format of the word vectors to obtain input data matching the input format of a pre-configured neural-network recognition model; and feeding the input data into the model for analysis to determine whether the message is spam. With this technical solution, the model can comprehensively recognize all the words of a received message and the semantic relations between them, and judge from the full content of the message whether it is spam. This effectively improves the accuracy of spam identification and enables thorough interception of spam messages.
Description
Technical field
The present invention relates to the field of information processing, and in particular to a spam SMS identification method, device and server.
Background technology
Mobile phones now receive more and more spam messages, including promotional content from all kinds of companies and websites. When checking messages, a user has to find the content he actually wants among numerous spam messages; and to delete the spam, he must identify each message one by one before deleting it, which is inconvenient for the user.
To address this, various spam-interception applications have been developed. These applications judge whether a message is spam based on sensitive words in the message or on the sender, and once a message is found to be spam, they intercept it.
However, current interception software can only intercept messages that contain sensitive words or whose sender is on a blacklist. As a result, a useful message that happens to contain a sensitive word may be intercepted as spam, so the user cannot see its content in time; conversely, spam that contains no sensitive word, or whose sender is not on the blacklist, cannot be intercepted and is delivered to the user as a normal message. Interception software therefore cannot identify all spam messages, which lowers interception accuracy and inconveniences the user.
Summary of the invention
In view of this, the invention provides a spam SMS identification method, device and server, whose main purpose is to solve the problem that interception software cannot identify all spam messages, resulting in low interception accuracy.
According to a first aspect of the invention, there is provided a spam SMS identification method, including:
upon receiving an instruction to identify a message to be examined, performing word segmentation on the message;
mapping the tokens obtained by segmentation to word vectors;
converting the format of the word vectors to obtain input data that matches the input format of a pre-configured neural-network recognition model;
feeding the input data into the neural-network recognition model for analysis, and determining whether the message is spam.
According to a second aspect of the invention, there is provided a spam SMS identification device, including:
a segmentation unit, configured to perform word segmentation on a message to be examined upon receiving an instruction to identify it;
a mapping unit, configured to map the tokens obtained by segmentation to word vectors;
a format-conversion unit, configured to convert the format of the word vectors to obtain input data matching the input format of a pre-configured neural-network recognition model;
an analysis unit, configured to feed the input data into the neural-network recognition model for analysis and determine whether the message is spam.
According to a third aspect of the invention, there is provided a storage device storing a computer program which, when executed by a processor, implements the spam SMS identification method of the first aspect.
According to a fourth aspect of the invention, there is provided a server comprising a storage device and a processor;
the storage device stores a computer program;
the processor executes the computer program to implement the spam SMS identification method of the first aspect.
With the spam SMS identification method, device and server provided by the invention, when an instruction to identify a message to be examined is received, the message is first segmented into words, each token is mapped to a corresponding word vector, the word vectors are then converted into input data matching the input format of the pre-configured neural-network recognition model, and finally the input data is fed into the trained model. The model comprehensively recognizes all the words of the message and the semantic relations between them, and judges from the full content of the message whether it is spam. This effectively improves the accuracy of spam identification and enables thorough interception of spam messages.
The above is only an overview of the technical solution of the invention. To make the technical means of the invention clearer so that it can be practiced according to the specification, and to make the above and other objects, features and advantages of the invention more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be taken as limiting the invention. Throughout the drawings, identical parts are denoted by the same reference numerals. In the drawings:
Fig. 1 shows a flowchart of one embodiment of the spam SMS identification method of the invention;
Fig. 2 shows a flowchart of another embodiment of the spam SMS identification method of the invention;
Fig. 3 is a schematic diagram of the training process of the neural-network recognition model of the invention;
Fig. 4 shows the structure of one embodiment of the spam SMS identification device of the invention;
Fig. 5 shows the structure of another embodiment of the spam SMS identification device of the invention;
Fig. 6 shows the physical structure of one embodiment of the server of the invention.
Detailed description of embodiments
Exemplary embodiments of the disclosure are described more fully below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be embodied in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope conveyed completely to those skilled in the art.
Embodiments of the invention provide a spam SMS identification method, preferably applied on the server side but also applicable on the client side. Using a neural-network recognition model, the full content of a received message can be recognized comprehensively, effectively improving the accuracy of spam identification.
As shown in Fig. 1, the spam SMS identification method of this embodiment includes the following steps:
Step 101: upon receiving an instruction to identify a message to be examined, perform word segmentation on the message.
The identification instruction may be triggered when the server receives a message to be examined forwarded by a client; it may be entered manually by a user according to the practical business need of spam identification; or the neural-network recognition model may be stored locally on the terminal in advance, and the instruction triggered when a client on the terminal receives an unknown message that needs to be checked for spam. Word segmentation may be performed with a conditional random field (CRF) algorithm, with a maximum-matching segmentation algorithm, or with a minimum-cut segmentation algorithm.
In the above technical solution, when the identification instruction is received, the message to be examined is first signed (for example, each message is assigned a matching identification code or identifier), then segmented, and the tokens obtained by segmentation are associated one-to-one with the corresponding signature. When a processing result is obtained later from the tokens, the message it belongs to can thus be determined, avoiding situations where results cannot be told apart when multiple messages are processed at the same time.
Step 102: map the tokens obtained by segmentation to word vectors.
Because a neural-network model cannot analyze tokens directly, each token must be mapped to a corresponding word vector. The word vector may be a specific numerical value or a vector matrix.
Step 103: convert the format of the word vectors to obtain input data matching the input format of the pre-configured neural-network recognition model.
In this solution, every value of the word vectors must be fed into the recognition model simultaneously through multiple input ports, but the format of the values in the word vectors does not match the input format pre-configured for each input port of the model. The word vectors therefore need format conversion, which includes changing the dimensionality of the word vectors and weighting the values in them.
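The dimensionality change of step 103 can be sketched as a plain reshape of the 1-D vector into rows; the row length here is illustrative:

```python
def reshape_2d(flat, row_len):
    """Convert a 1-D input matrix into a 2-D one with rows of row_len
    values, as required by the CNN input format described above."""
    if len(flat) % row_len != 0:
        raise ValueError("length must be a multiple of row_len")
    return [flat[i:i + row_len] for i in range(0, len(flat), row_len)]

print(reshape_2d([1, 2, 3, 4], 2))  # -> [[1, 2], [3, 4]]
```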
Step 104: feed the input data into the neural-network recognition model for analysis, and determine whether the message is spam.
The neural-network recognition model is obtained after a neural network undergoes multiple rounds of learning and training. Once the input data is fed into the model, the model analyzes it in a way that imitates the neurons of the human brain, and judges whether the message corresponding to the input data is spam. The neural network used to train the model is preferably a convolutional neural network. In addition, while identifying messages, the model can keep learning from them, constantly improving its accuracy in identifying spam.
After a received message is determined to be spam, it is placed in an intercepted-messages folder, placed in the recycle bin of the user terminal, or deleted entirely.
Moreover, the neural-network recognition model is not limited to SMS messages: it can also identify instant messages sent in instant-messaging software, or advertisements pushed by websites.
Compared with traditional spam-recognition algorithms, the neural-network recognition model of this embodiment improves the F1 score for spam identification (the harmonic mean of precision and recall) by at least 1%. Here precision is the proportion of messages flagged as spam that really are spam, and recall is the proportion of actual spam messages that are flagged.
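The precision, recall and F1 figures above can be computed as follows (the labels are invented for illustration):

```python
def f1_score(true_labels, predicted):
    """Precision: flagged spam that really is spam / all flagged.
    Recall: spam caught / all actual spam. F1: their harmonic mean."""
    tp = sum(1 for t, p in zip(true_labels, predicted) if t and p)
    fp = sum(1 for t, p in zip(true_labels, predicted) if not t and p)
    fn = sum(1 for t, p in zip(true_labels, predicted) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# 1 = spam, 0 = normal (invented example: 2 hits, 1 miss, 1 false alarm).
print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```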
On the server side, several main recognition mechanisms are used, including intelligent spam identification, fraud-message identification, and pseudo-base-station message identification. With these mechanisms, 99% of fraud messages, 95% of pseudo-base-station messages and 95% of spam messages can be identified effectively.
Among all spam, fraud messages do the greatest harm to users. Sampling analysis of fraud-type spam shows that messages impersonating banks are the most common, at 44.7%; messages impersonating e-commerce merchants to mislead consumers account for 24.3%; and messages impersonating telecom operators rank third at 13.1%. Fraud messages are also the kind that changes most frequently: the phone number, landline number or URL (Uniform Resource Locator) left in a fraud message has a life cycle as short as one day and no longer than about a week. Methods for identifying fraud messages must therefore respond quickly, which requires applying a "not-white means black" policy to them. For common fraud types — points-redemption messages, aviation fraud messages, bank-card credit-limit messages, part-time-job messages and the like — the not-white-means-black policy applies. A message submitted by the user is first handled by the URL policy and the number policy, and a safety grade is produced jointly with the server's extensive whitelist library and a machine-learning algorithm. The not-white-means-black policy is not a simple comparison of URLs against a whitelist, which could not meet the high-accuracy requirement of fraud-message identification. A machine-learning algorithm (the neural-network algorithm) is therefore used to generate a model for fraud messages (the neural-network recognition model); the model judges fraud messages and further raises the accuracy of the not-white-means-black policy to 99%. With this technical solution, fraud messages can be intercepted accurately and efficiently, countering ever-changing scams with a constant method.
With the above technical solution, the neural-network recognition model comprehensively recognizes all the words of a received message and the semantic relations between them, judging from the full content of the message whether it is spam. This effectively improves the accuracy of spam identification and enables thorough interception of spam messages.
Before the steps of Fig. 1 can use the neural-network recognition model, the neural network must first undergo learning and training to produce it.
As shown in Fig. 2, obtaining the neural-network recognition model includes the following steps:
Step 111: perform word segmentation on the acquired training messages to obtain training tokens.
When there are multiple training messages, each is first signed and then segmented. Segmentation may use a CRF algorithm, a maximum-matching algorithm, or a minimum-cut algorithm. The training tokens obtained by segmentation are associated one-to-one with the corresponding signatures, avoiding confusion among multiple training messages.
Step 112: map the training tokens to corresponding training word vectors.
Step 113: convert the format of the training word vectors, and feed the converted data into a convolutional neural network to obtain a training function.
The neural-network recognition model is obtained by putting a convolutional neural network (CNN) through learning and training. The CNN convolves the format-converted data and outputs a training function.
Step 114: train the convolutional neural network according to the training function to obtain the neural-network recognition model.
With this technical solution, after the convolutional neural network is trained with the training messages, it can imitate the deep learning of neurons in the human brain, yielding a neural-network recognition model capable of intelligently identifying spam messages.
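The overall shape of steps 111–114 (feed training samples in, update the model, repeat) can be illustrated with a toy stand-in for the CNN: a single logistic unit trained by gradient descent. The data and model below are invented, not the patent's actual network:

```python
import math

def train(samples, labels, epochs=200, lr=0.5):
    """Toy training loop standing in for CNN training: each epoch feeds
    every (vector, label) pair and updates the weights by gradient descent
    on the cross-entropy loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted spam probability
            g = p - y                        # gradient of the loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(model, x):
    """Spam probability for a new vector under the trained toy model."""
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Invented training vectors: spam-like vs. normal-like (label 1 = spam).
model = train([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], [1, 1, 0, 0])
print(predict(model, [0.95, 0.05]) > 0.5)
```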
In a particular embodiment, step 112 specifically includes:
substituting the training tokens into a preset matrix-decomposition model to obtain the corresponding training word vectors, the preset matrix-decomposition model storing each token together with its word vector.
In this solution, the preset matrix-decomposition model (word2vec) assigns each token a corresponding word vector and stores the tokens and their word vectors in a list. Each token of the training segmentation is substituted into the model one by one, its word vector is looked up in the stored list, and the vectors are then assembled to obtain the training word vectors.
In a particular embodiment, before step 112 the method further includes:
Step 112': collecting an SMS corpus and segmenting it to obtain a token lexicon.
Step 112'': using a preset matrix-decomposition function to set a corresponding word vector for each token in the lexicon.
Step 112''': storing each token and its word vector in the preset matrix-decomposition model.
A large SMS corpus is collected; each message is segmented and the tokens are gathered into the lexicon. The preset matrix-decomposition function computes the frequency or probability of each token in the lexicon and sets a corresponding word vector for each token accordingly. The many messages of the corpus are fed into the function one after another, and the lexicon and its word vectors are adjusted in turn, until the whole corpus has been processed; the final tokens and their word vectors are stored as a list to form the matrix-decomposition model.
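Steps 112'–112''' can be sketched as follows. For brevity, a token's relative corpus frequency stands in for the matrix-decomposition (word2vec) function, which would in practice produce dense trained vectors:

```python
from collections import Counter

def build_vector_table(corpus_tokens):
    """Build a token -> word-vector table from a segmented corpus.
    The 'vector' here is just (relative frequency,), a crude stand-in
    for word2vec-style embeddings."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {tok: (n / total,) for tok, n in counts.items()}

# Invented segmented corpus: tokens from several messages pooled together.
table = build_vector_table(["free", "prize", "free", "hello"])
print(table["free"])  # -> (0.5,)
```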
In a particular embodiment, step 113 specifically includes:
Step 1131: converting the format of the training word vectors (an embedding step), turning the one-dimensional input matrix of the training word vectors into a two-dimensional input matrix.
Because the convolutional neural network can only process a two-dimensional input matrix, while the initial training word vectors form a one-dimensional input matrix, the training word vectors in the one-dimensional input matrix must be format-converted for ease of CNN processing.
For example, if the training word vector is the one-dimensional input matrix (1, 2, 3, 4), format conversion turns it into the two-dimensional input matrix [[1, 2], [3, 4]].
Step 1132: feed the two-dimensional input matrix of the training word vectors into the convolutional neural network for convolution.
Step 1133: feed the convolved word vectors into a normalized exponential function for normalization, and determine the training function from the normalized result.
For ease of calculation, the convolved word vectors are fed into the normalized exponential function (softmax), which normalizes them so that the resulting values lie in (0, 1]; the training function is then determined from the normalized values, and the convolutional neural network can be trained with it.
In a particular embodiment, step 1132 specifically includes:
Step 11321: feeding the two-dimensional input matrix into the convolutional neural network to obtain multiple vector matrices.
Step 11322: determining the maximum value in each vector matrix and concatenating the maxima.
Step 11323: adding a fully connected layer to the concatenated vector and classifying the concatenated vector through the fully connected layer.
The multiple vector matrices produced by CNN convolution are one-dimensional. The maximum value of each one-dimensional vector matrix is selected, and all the maxima are concatenated in the order of the convolved matrices. The concatenated vector is then classified according to the size of its values: the fully connected layer corresponding to the category is looked up in a stored list and applied to the concatenated vector. The category of the concatenated vector, and hence the category of the corresponding training message, can thus be judged from the added fully connected layer.
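Steps 11321–11323 (max-over-each-feature-map pooling, concatenation, then a fully connected layer) can be sketched directly; the feature maps and weights below are invented:

```python
def max_pool_concat(feature_maps):
    """Take the maximum of each 1-D vector matrix produced by the
    convolution and concatenate the maxima in order (steps 11321-11322)."""
    return [max(fm) for fm in feature_maps]

def fully_connected(vec, weights, bias):
    """Step 11323: a fully connected output scoring the concatenated
    vector; a positive score meaning 'spam' in this sketch."""
    return sum(w * v for w, v in zip(weights, vec)) + bias

# Invented feature maps and weights, for illustration only.
maps = [[0.1, 0.7, 0.3], [0.2, 0.2, 0.9]]
pooled = max_pool_concat(maps)               # -> [0.7, 0.9]
score = fully_connected(pooled, [1.0, 1.0], -1.0)
print(pooled, score > 0)
```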
In a particular embodiment, step 1133 specifically includes:
Step 11331: feeding the fully connected vector into the normalized exponential function for normalization.
Step 11332: obtaining a maximum-likelihood function from the normalized result, and minimizing its negative logarithm to obtain a cross-entropy loss function, which serves as the training function.
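The softmax normalization and cross-entropy loss of steps 11331–11332 can be written out directly (the scores are invented; class 0 stands for spam):

```python
import math

def softmax(scores):
    """Normalized exponential function: maps raw scores to
    probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # subtract max for stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, true_class):
    """Negative log-likelihood of the true class under softmax:
    the loss minimized during training."""
    return -math.log(softmax(scores)[true_class])

probs = softmax([2.0, 0.5])   # invented raw scores for (spam, normal)
print(round(probs[0], 3))
print(round(cross_entropy([2.0, 0.5], 0), 3))
```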
In a particular embodiment, step 111 specifically includes:
Step 1111: counting the number of tokens obtained by segmenting the training message.
Step 1112: comparing the token count with a preset maximum; when the count equals the maximum, using the segmented tokens as the training tokens.
Step 1113: when the count is less than the maximum, padding the tokens with filler data up to the maximum, and using the padded tokens as the training tokens.
Step 1114: when the count exceeds the maximum, truncating the tokens beyond the maximum, and using the remaining tokens as the training tokens.
For example, with the preset maximum set to 120: if segmenting a training message yields 20 tokens, short of 120, 100 "0"s are appended after the 20th token; if segmentation yields 232 tokens, the first 120 are kept and the last 112 are deleted.
This ensures that every training message yields the same number of training tokens, which simplifies subsequent processing.
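Steps 1112–1114 (fixing every message to the same token count) can be sketched as:

```python
def fix_length(tokens, max_len=120, pad="0"):
    """Pad with filler tokens up to max_len, or truncate beyond it,
    so every training message yields exactly max_len tokens."""
    if len(tokens) < max_len:
        return tokens + [pad] * (max_len - len(tokens))
    return tokens[:max_len]

print(len(fix_length(["a"] * 20)))   # padded: 20 real tokens + 100 fillers
print(len(fix_length(["a"] * 232)))  # truncated to the first 120 tokens
```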
In a particular embodiment, there are multiple training messages and correspondingly multiple training functions, in which case step 114 specifically includes:
training the convolutional neural network repeatedly with the values of the multiple training functions in turn, to obtain the neural-network recognition model.
The values of the training function obtained from the first training message are fed into each neuron of the convolutional neural network to complete one round of training; the network obtained from that round is then trained a second time with the training function from the second training message, and so on, until all training functions have been fed into the network. The network obtained from the final round of training serves as the neural-network recognition model, which comprehensively analyzes the full content of received messages and thereby determines whether they are spam.
In a particular embodiment, step 102 specifically includes:
substituting the tokens obtained by segmentation into the preset matrix-decomposition model to obtain the corresponding word vectors, the preset matrix-decomposition model storing each token together with its word vector.
Using the matrix-decomposition model obtained in steps 112', 112'' and 112''', a corresponding word vector is matched to each token produced by segmenting the message to be examined; the matched word vectors are then assembled in token order to obtain the word vectors as a one-dimensional input matrix.
According to this scheme, the corresponding step 103 specifically includes:
converting the one-dimensional input matrix of word vectors into a two-dimensional input matrix that matches the input format of the neural-network recognition model.
One-dimensional word vectors do not match the input format of the neural-network recognition model, so the model cannot process them. The model's input format is two-dimensional, so the one-dimensional word vectors must be format-converted into input data in the form of a two-dimensional vector.
According to this scheme, the corresponding step 104 specifically includes:
Step 1041: feeding the two-dimensional input matrix of word vectors into the neural-network recognition model for analysis to obtain numerical information.
Step 1042: detecting whether the numerical information lies within a predetermined range.
Step 1043: if not, determining that the message is spam.
For example, obtained numerical information is the probability that the short message to be detected is refuse messages, if predetermined value scope is 0-
50%, when short message by neural network recognization model to be handled the probability obtained afterwards be 55% when, it is determined that the short message is
Refuse messages.
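Steps 1041-1043 reduce to a range check on the model's output. A minimal sketch of that check, using the 0-50% example range from the text:

```python
def is_spam(probability, upper_bound=0.50):
    """Steps 1042-1043: flag the message when the model's spam
    probability falls outside the predetermined 0-50% range."""
    return not (0.0 <= probability <= upper_bound)

print(is_spam(0.55))  # True  - outside 0-50%, determined to be spam
print(is_spam(0.30))  # False - within range, treated as normal
```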
Alternatively, corresponding processing grades may be set for the numerical information, with different grades corresponding to different value ranges and each grade handling the short message in a different way.
For example, grade one: the value range is 0-20%, and the processing method is to push the short message and notify the user;
grade two: the value range is 21%-50%, and the processing method is to delete the fraudulent sentences in the short message, push the short message, and notify the user;
grade three: the value range is 51%-70%, and the processing method is to place the short message into the short-message interception box;
grade four: the value range is 71%-100%, and the processing method is to delete the short message entirely.
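The four-grade scheme above can be sketched as a lookup table mapping the model's probability to a processing method. The action strings are paraphrases of the example grades, not wording from the patent:

```python
# Illustrative grade table from the example above:
# (lower %, upper %, processing method)
GRADES = [
    (0, 20, "push message and notify user"),
    (21, 50, "delete fraudulent sentences, push message, notify user"),
    (51, 70, "move message to interception box"),
    (71, 100, "delete message entirely"),
]

def processing_method(probability_pct):
    """Map the spam probability (as an integer percentage) to a grade."""
    for low, high, action in GRADES:
        if low <= probability_pct <= high:
            return action
    raise ValueError("probability out of range")

print(processing_method(55))  # move message to interception box
```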
Alternatively, after the short message is processed by the neural network recognition model, multiple one-dimensional vector matrices are obtained. The maximum value in each one-dimensional vector matrix is selected, and all maximum values are spliced in the order of the post-convolution vector matrices. A full link (fully connected layer) is then added to the spliced vector according to the specific method of the above step 1132. The category of the spliced vector value can then be judged from the added full link, and in turn the category of the corresponding training short message. A lookup table stores the category corresponding to each vector value, the categories including the spam-message class and the normal-message class. The post-convolution vector value can thus be used to judge whether a short message is a spam message.
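The "select each maximum, then splice" operation is max-pooling followed by concatenation. A minimal sketch with made-up feature maps:

```python
import numpy as np

# Hypothetical post-convolution one-dimensional vector matrices: one per
# convolution filter (lengths may differ with filter width).
feature_maps = [np.array([0.1, 0.9, 0.4]),
                np.array([0.7, 0.2]),
                np.array([0.3, 0.5, 0.8, 0.6])]

# Max-pooling: keep the single largest value from each vector matrix,
# then splice the maxima in the order of the matrices.
spliced = np.array([fm.max() for fm in feature_maps])
print(spliced.tolist())  # [0.9, 0.7, 0.8]
```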
As shown in Figure 3, in another embodiment of the present invention, the neural network recognition model is obtained as follows:
First, a large number of short message corpora are collected and segmented to obtain a participle corpus. The participle corpus is then used to train word2vec, obtaining the word vector (index vector, index word) corresponding to each participle and storing it.
Then, multiple training short messages are obtained and segmented, and the stored word vectors are used to match a corresponding training word vector for each participle. Because the resulting training word vectors belong to the one-dimensional input space and do not meet the network's input format requirement, embedding (embedded processing) is applied to the training matrix, converting the training word vectors from the one-dimensional input space into the two-dimensional input space for convenient input into the convolutional neural network.
Next, convolution is performed using the convolutional neural network, and a max-pooling operation (taking the maximum within a vector) is applied to each group of post-convolution vector values. As shown in Figure 3, the two-dimensional matrix yields six one-dimensional matrices after convolution, and the maximum of each of the six one-dimensional matrices is selected. The six extracted maxima are spliced (concat), and finally a full link is added. The fully linked values are input into a softmax (normalized exponential function) layer for normalization, obtaining the cross entropy, which serves as the final cross entropy loss function.
Finally, the convolutional neural network is trained using this cross entropy loss function. The above steps are repeated over the multiple training short messages to train the network repeatedly; once training is complete, the neural network recognition model capable of identifying spam messages is obtained.
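The forward pass of Figure 3 (convolution over the two-dimensional matrix, max-pooling to six maxima, concat, full link, softmax, cross entropy) can be sketched as follows. The dimensions, random weights and the "spam" label are illustrative assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: 8 participles per message, 5-dim word vectors, 6 convolution
# filters of width 3, two output classes (spam / normal).
seq_len, dim, n_filters, width, n_classes = 8, 5, 6, 3, 2

X = rng.normal(size=(seq_len, dim))               # 2-D input matrix
filters = rng.normal(size=(n_filters, width, dim))
W_full = rng.normal(size=(n_classes, n_filters))  # the added "full link"

def forward(X):
    # Convolution: each filter slides along the participle axis and
    # yields one one-dimensional matrix (six of them, as in Fig. 3).
    maps = [np.array([(X[i:i + width] * f).sum()
                      for i in range(seq_len - width + 1)])
            for f in filters]
    # Max-pooling then concat: one maximum per map, spliced in order.
    pooled = np.array([m.max() for m in maps])
    # Full link followed by softmax normalization.
    logits = W_full @ pooled
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = forward(X)
label = 1                      # hypothetical "spam" label
loss = -np.log(probs[label])   # cross entropy loss used for training
print(probs.shape, float(probs.sum()))
```

In an actual training loop this loss would be minimized by gradient descent over the filters and the full-link weights; only the forward computation is shown here.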
Further, as a specific implementation of the method of Figure 1, this embodiment provides an identification device for spam messages, as shown in Figure 4, including: a participle unit 21, a mapping unit 22, a format conversion unit 23 and an analysis unit 24.
The participle unit 21 is configured to perform word segmentation on a short message to be detected when an instruction to identify the short message is received;
the mapping unit 22 is configured to map the participles obtained by word segmentation to word vectors;
the format conversion unit 23 is configured to perform format conversion on the word vectors to obtain input data that conforms to the input format of a pre-configured neural network recognition model;
the analysis unit 24 is configured to determine whether the short message to be detected is a spam message by inputting the input data into the neural network recognition model for analysis.
As shown in Figure 5, in order to obtain the neural network recognition model, the device further includes:
the participle unit 21, further configured to perform word segmentation on acquired training short messages to obtain training participles;
the mapping unit 22, further configured to map the training participles to corresponding training word vectors;
a processing unit 25, configured to perform format conversion on the training word vectors and input the format-converted data into the convolutional neural network for processing to obtain training functions;
a training unit 26, configured to train the convolutional neural network according to the training functions to obtain the neural network recognition model.
In a particular embodiment, the mapping unit 22 is further configured to substitute the training participles into a preset matrix decomposition model to obtain the corresponding training word vectors, where each participle and its corresponding word vector are stored in the preset matrix decomposition model.
In a particular embodiment, the device further includes:
the participle unit 21, further configured to collect short message corpora and perform word segmentation on them to obtain a participle library;
a setting unit, configured to set a corresponding word vector for each participle in the participle library using a preset matrix decomposition function;
a storage unit, configured to store each participle and its corresponding word vector in the preset matrix decomposition model.
In a particular embodiment, the processing unit 25 specifically includes:
a format conversion module, configured to perform format conversion on the training word vectors, converting the one-dimensional input matrix of the training word vectors into a two-dimensional input matrix;
an input module, configured to input the two-dimensional input matrix of the training word vectors into the convolutional neural network for convolution;
a normalization module, configured to input the post-convolution word vectors into the normalized exponential function for normalization, and to determine the training function according to the normalization result.
In a particular embodiment, the input module specifically includes:
a convolution module, configured to input the two-dimensional input matrix into the convolutional neural network to obtain multiple vector matrices;
a splicing module, configured to determine the maximum value in each vector matrix and splice the maximum values;
a classification module, configured to add a full link to the spliced vector values and classify the spliced vector values through the full link.
In a particular embodiment, the normalization module is further configured to input the fully linked vector values into the normalized exponential function for normalization; a maximum likelihood function is obtained according to the normalization result, the negative logarithm of the maximum likelihood function is minimized to obtain a cross entropy loss function, and the cross entropy loss function is used as the training function.
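The normalization module's computation, as described, amounts to a softmax over the fully linked values followed by a negative log-likelihood. A minimal sketch under that reading (the logits and label are made-up values, not from the patent):

```python
import numpy as np

def cross_entropy_from_full_link(logits, label):
    """Softmax (normalized exponential function) over the fully linked
    vector values, then the negative log of the labeled class."""
    e = np.exp(logits - np.max(logits))  # shift for numerical stability
    probs = e / e.sum()
    return -np.log(probs[label])

loss = cross_entropy_from_full_link(np.array([2.0, 0.5]), 0)
print(round(loss, 4))  # 0.2014
```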
In a particular embodiment, the participle unit 21 specifically includes:
a statistics module, configured to count the number of participles obtained by segmenting a training short message;
a comparison module, configured to compare the participle count with a predetermined maximum: when the participle count equals the predetermined maximum, the participles after word segmentation are used as the training participles; when the participle count is less than the predetermined maximum, supplementary data is used to pad the participle count up to the predetermined maximum, and the padded participles are used as the training participles; when the participle count exceeds the predetermined maximum, the participles beyond the predetermined maximum are cut off, and the remaining participles are used as the training participles.
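The pad-or-truncate rule of the comparison module can be sketched in a few lines. The `<pad>` token stands in for the patent's unspecified "supplementary data":

```python
PAD = "<pad>"  # hypothetical supplementary token

def fix_length(participles, max_len):
    """Pad with supplementary data, or cut off excess participles, so that
    every training short message has exactly max_len participles."""
    if len(participles) < max_len:
        return participles + [PAD] * (max_len - len(participles))
    return participles[:max_len]

print(fix_length(["win", "big", "prize"], 5))
# ['win', 'big', 'prize', '<pad>', '<pad>']
print(fix_length(["a", "b", "c", "d", "e", "f"], 5))
# ['a', 'b', 'c', 'd', 'e']
```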
In a particular embodiment, there are multiple training short messages, each yielding a corresponding training function. The training unit 26 is then further configured to repeatedly train the convolutional neural network according to the values of the multiple training functions in turn, obtaining the neural network recognition model.
In a particular embodiment, the mapping unit 22 is further configured to substitute the participles obtained by word segmentation into the preset matrix decomposition model to obtain the corresponding word vectors, where each participle and its corresponding word vector are stored in the preset matrix decomposition model;
the format conversion unit 23 is further configured to perform format conversion on the one-dimensional input matrix of the word vectors, converting it into a two-dimensional input matrix that conforms to the input format of the neural network recognition model.
In a particular embodiment, the analysis unit 24 specifically includes:
an analysis module, configured to input the two-dimensional input matrix of the word vectors into the neural network recognition model for analysis to obtain numerical information;
a detection module, configured to detect whether the numerical information falls within a predetermined value range, and if not, to determine that the short message to be detected is a spam message.
Based on the methods shown in Figures 1-3, an embodiment of the present application correspondingly further provides a storage device on which a computer program is stored; when the program is executed by a processor, the steps corresponding to the methods shown in Figures 1-3 are realized.
Based on the methods shown in Figures 1-3 and the device embodiments shown in Figures 4 and 5, this embodiment further provides a server, as shown in Figure 6, including a storage device 32 and a processor 31, both arranged on a bus 33. The storage device 32 is configured to store a computer program; the processor 31 is configured to execute the computer program to realize the steps corresponding to the methods shown in Figures 1-3.
Through the above technical solution of the present invention, the neural network recognition model can comprehensively identify all the vocabulary of a received short message and the semantic relations between the vocabulary, judging whether the short message is a spam message according to its entire content. This effectively improves the accuracy of spam message identification and in turn enables comprehensive interception processing of spam messages.
The embodiments of the invention disclose:
A1. A method for identifying spam messages, including:
performing word segmentation on a short message to be detected when an instruction to identify the short message is received;
mapping the participles obtained by word segmentation to word vectors;
performing format conversion on the word vectors to obtain input data that conforms to the input format of a pre-configured neural network recognition model;
determining whether the short message to be detected is a spam message by inputting the input data into the neural network recognition model for analysis.
A2. The method of A1, wherein the neural network recognition model is obtained by:
performing word segmentation on acquired training short messages to obtain training participles;
mapping the training participles to corresponding training word vectors;
performing format conversion on the training word vectors and inputting the format-converted data into a convolutional neural network for processing to obtain training functions;
training the convolutional neural network according to the training functions to obtain the neural network recognition model.
A3. The method of A2, wherein mapping the training participles to corresponding training word vectors specifically includes:
substituting the training participles into a preset matrix decomposition model to obtain the corresponding training word vectors, where each participle and its corresponding word vector are stored in the preset matrix decomposition model.
A4. The method of A3, wherein before the training participles are substituted into the preset matrix decomposition model to obtain the corresponding training word vectors, the method further includes:
collecting short message corpora and performing word segmentation on them to obtain a participle library;
setting a corresponding word vector for each participle in the participle library using a preset matrix decomposition function;
storing each participle and its corresponding word vector in the preset matrix decomposition model.
A5. The method of A2, wherein performing format conversion on the training word vectors and inputting the format-converted data into the convolutional neural network for processing to obtain the training functions specifically includes:
performing format conversion on the training word vectors, converting the one-dimensional input matrix of the training word vectors into a two-dimensional input matrix;
inputting the two-dimensional input matrix of the training word vectors into the convolutional neural network for convolution;
inputting the post-convolution word vectors into a normalized exponential function for normalization, and determining the training function according to the normalization result.
A6. The method of A5, wherein inputting the two-dimensional input matrix of the training word vectors into the convolutional neural network for convolution specifically includes:
inputting the two-dimensional input matrix into the convolutional neural network to obtain multiple vector matrices;
determining the maximum value in each vector matrix and splicing the maximum values;
adding a full link to the spliced vector values and classifying the spliced vector values through the full link.
A7. The method of A6, wherein inputting the post-convolution word vectors into the normalized exponential function for normalization and determining the training function according to the normalization result specifically includes:
inputting the fully linked vector values into the normalized exponential function for normalization;
obtaining a maximum likelihood function according to the normalization result, minimizing the negative logarithm of the maximum likelihood function to obtain a cross entropy loss function, and using the cross entropy loss function as the training function.
A8. The method of A2, wherein performing word segmentation on the acquired training short messages to obtain the training participles specifically includes:
counting the number of participles obtained by segmenting a training short message;
comparing the participle count with a predetermined maximum, and when the participle count equals the predetermined maximum, using the participles after word segmentation as the training participles;
when the participle count is less than the predetermined maximum, padding the participle count up to the predetermined maximum with supplementary data and using the padded participles as the training participles;
when the participle count exceeds the predetermined maximum, cutting off the participles beyond the predetermined maximum and using the remaining participles as the training participles.
A9. The method of A2, wherein there are multiple training short messages, each yielding a corresponding training function, and wherein training the convolutional neural network according to the training functions to obtain the neural network recognition model specifically includes:
repeatedly training the convolutional neural network according to the values of the multiple training functions in turn, obtaining the neural network recognition model.
A10. The method of any one of A1 to A9, wherein mapping the participles obtained by word segmentation to word vectors specifically includes:
substituting the participles obtained by word segmentation into the preset matrix decomposition model to obtain the corresponding word vectors, where each participle and its corresponding word vector are stored in the preset matrix decomposition model;
and wherein performing format conversion on the word vectors to obtain input data conforming to the input format of the pre-configured neural network recognition model specifically includes:
performing format conversion on the one-dimensional input matrix of the word vectors, converting it into a two-dimensional input matrix that conforms to the input format of the neural network recognition model.
A11. The method of A10, wherein determining whether the short message to be detected is a spam message by inputting the input data into the neural network recognition model for analysis specifically includes:
inputting the two-dimensional input matrix of the word vectors into the neural network recognition model for analysis to obtain numerical information;
detecting whether the numerical information falls within a predetermined value range;
if not, determining that the short message to be detected is a spam message.
B12. An identification device for spam messages, including:
a participle unit, configured to perform word segmentation on a short message to be detected when an instruction to identify the short message is received;
a mapping unit, configured to map the participles obtained by word segmentation to word vectors;
a format conversion unit, configured to perform format conversion on the word vectors to obtain input data that conforms to the input format of a pre-configured neural network recognition model;
an analysis unit, configured to determine whether the short message to be detected is a spam message by inputting the input data into the neural network recognition model for analysis.
B13. The device of B12, further including a processing unit and a training unit, wherein:
the participle unit is further configured to perform word segmentation on acquired training short messages to obtain training participles;
the mapping unit is further configured to map the training participles to corresponding training word vectors;
the processing unit is configured to perform format conversion on the training word vectors and input the format-converted data into a convolutional neural network for processing to obtain training functions;
the training unit is configured to train the convolutional neural network according to the training functions to obtain the neural network recognition model.
B14. The device of B13, wherein the mapping unit is further configured to substitute the training participles into a preset matrix decomposition model to obtain the corresponding training word vectors, where each participle and its corresponding word vector are stored in the preset matrix decomposition model.
B15. The device of B14, further including a setting unit and a storage unit, wherein:
the participle unit is further configured to collect short message corpora and perform word segmentation on them to obtain a participle library;
the setting unit is configured to set a corresponding word vector for each participle in the participle library using a preset matrix decomposition function;
the storage unit is configured to store each participle and its corresponding word vector in the preset matrix decomposition model.
B16. The device of B13, wherein the processing unit specifically includes:
a format conversion module, configured to perform format conversion on the training word vectors, converting the one-dimensional input matrix of the training word vectors into a two-dimensional input matrix;
an input module, configured to input the two-dimensional input matrix of the training word vectors into the convolutional neural network for convolution;
a normalization module, configured to input the post-convolution word vectors into the normalized exponential function for normalization and to determine the training function according to the normalization result.
B17. The device of B16, wherein the input module specifically includes:
a convolution module, configured to input the two-dimensional input matrix into the convolutional neural network to obtain multiple vector matrices;
a splicing module, configured to determine the maximum value in each vector matrix and splice the maximum values;
a classification module, configured to add a full link to the spliced vector values and classify the spliced vector values through the full link.
B18. The device of B17, wherein the normalization module is further configured to input the fully linked vector values into the normalized exponential function for normalization, obtain a maximum likelihood function according to the normalization result, minimize the negative logarithm of the maximum likelihood function to obtain a cross entropy loss function, and use the cross entropy loss function as the training function.
B19. The device of B13, wherein the participle unit specifically includes:
a statistics module, configured to count the number of participles obtained by segmenting a training short message;
a comparison module, configured to compare the participle count with a predetermined maximum: when the participle count equals the predetermined maximum, using the participles after word segmentation as the training participles; when the participle count is less than the predetermined maximum, padding the participle count up to the predetermined maximum with supplementary data and using the padded participles as the training participles; when the participle count exceeds the predetermined maximum, cutting off the participles beyond the predetermined maximum and using the remaining participles as the training participles.
B20. The device of B13, wherein there are multiple training short messages, each yielding a corresponding training function, and the training unit is further configured to repeatedly train the convolutional neural network according to the values of the multiple training functions in turn, obtaining the neural network recognition model.
B21. The device of any one of B12 to B20, wherein:
the mapping unit is further configured to substitute the participles obtained by word segmentation into the preset matrix decomposition model to obtain the corresponding word vectors, where each participle and its corresponding word vector are stored in the preset matrix decomposition model;
the format conversion unit is further configured to perform format conversion on the one-dimensional input matrix of the word vectors, converting it into a two-dimensional input matrix that conforms to the input format of the neural network recognition model.
B22. The device of B21, wherein the analysis unit specifically includes:
an analysis module, configured to input the two-dimensional input matrix of the word vectors into the neural network recognition model for analysis to obtain numerical information;
a detection module, configured to detect whether the numerical information falls within a predetermined value range, and if not, to determine that the short message to be detected is a spam message.
C23. A storage device on which a computer program is stored, wherein when the program is executed by a processor, the method for identifying spam messages of any one of A1 to A11 is realized.
D24. A server including a storage device and a processor, wherein:
the storage device is configured to store a computer program;
the processor is configured to execute the computer program to realize the method for identifying spam messages of any one of A1 to A11.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
It will be understood that related features in the above methods and devices may reference each other. In addition, "first", "second" and the like in the above embodiments are used to distinguish embodiments and do not indicate that one embodiment is better or worse than another.
Those skilled in the art will clearly appreciate that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may reference the corresponding processes in the foregoing method embodiments and are not repeated here.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other equipment. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such a system is apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be realized using a variety of programming languages, and the above description of a specific language is given to disclose the best mode of the invention.
The specification provided here sets forth numerous specific details. It will be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it will be appreciated that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into that description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the devices of an embodiment may be adaptively changed and arranged in one or more devices different from the embodiment. Modules, units or components in an embodiment may be combined into one module, unit or component, and may furthermore be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, an equivalent or a similar purpose.
In addition, those skilled in the art will understand that, although some embodiments described herein include certain features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be realized in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to realize some or all of the functions of some or all of the components in the spam message identification method, device and server according to embodiments of the present invention. The present invention may also be implemented as equipment or device programs (for example, computer programs and computer program products) for performing part or all of the methods described herein. Such programs realizing the present invention may be stored on computer-readable media, or may take the form of one or more signals; such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of multiple such elements. The invention may be realized by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any ordering; these words may be interpreted as names.
Claims (10)
- 1. A method for identifying spam messages, characterized by including: performing word segmentation on a short message to be detected when an instruction to identify the short message is received; mapping the participles obtained by word segmentation to word vectors; performing format conversion on the word vectors to obtain input data that conforms to the input format of a pre-configured neural network recognition model; and determining whether the short message to be detected is a spam message by inputting the input data into the neural network recognition model for analysis.
- 2. The method according to claim 1, characterized in that the step of obtaining the neural network identification model comprises: performing word segmentation on acquired training short messages to obtain training participles; mapping the training participles to corresponding training word vectors; performing format conversion on the training word vectors, and inputting the format-converted data into a convolutional neural network for processing to obtain a training function; and training the convolutional neural network according to the training function to obtain the neural network identification model.
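The training flow of claim 2 can be sketched with a stand-in model: here a single logistic unit replaces the patent's convolutional network, and the log loss plays the role of the "training function". The sample vectors, labels and hyperparameters are all illustrative.

```python
# Sketch of the claim-2 training flow with a stand-in model: a single
# logistic unit instead of a CNN; the log loss acts as the training
# function. Data and hyperparameters are made up.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient descent on the log loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y                          # gradient of the log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5

# Word-vector summaries of training short messages (made-up numbers):
# the first component is high for the spam-like examples.
model = train([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
              [1, 1, 0, 0])
```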
- 3. The method according to claim 2, characterized in that mapping the training participles to the corresponding training word vectors specifically comprises: substituting the training participles into a preset matrix decomposition model to obtain the corresponding training word vectors, the preset matrix decomposition model storing each participle and the word vector corresponding to each participle.
- 4. The method according to claim 3, characterized in that, before the training participles are substituted into the preset matrix decomposition model to obtain the corresponding training word vectors, the method further comprises: collecting a short message corpus and performing word segmentation on it to obtain a participle library; setting a corresponding word vector for each participle in the participle library by means of a preset matrix decomposition function; and storing each participle and its corresponding word vector in the preset matrix decomposition model.
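The corpus-preparation steps of claims 3 and 4 can be sketched with a toy stand-in for the matrix decomposition function: build a participle co-occurrence matrix from a segmented corpus and extract its dominant direction by power iteration, storing one (here one-dimensional) vector per participle. The corpus, window size and dimensionality are assumptions; a real system would apply a proper factorization to a large SMS corpus.

```python
# Sketch of the claim-4 preparation: segment a corpus into a participle
# library, build a co-occurrence matrix, and "decompose" it (here: one
# power-iteration direction) so each participle gets a stored vector.
# Corpus, window size and 1-dim vectors are toy assumptions.

def build_cooccurrence(corpus):
    """Count co-occurrences within a +/-2 word window."""
    vocab = sorted({w for line in corpus for w in line.split()})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    M = [[0.0] * n for _ in range(n)]
    for line in corpus:
        words = line.split()
        for i, w in enumerate(words):
            for c in words[max(0, i - 2):i + 3]:
                if c != w:
                    M[index[w]][index[c]] += 1.0
    return vocab, M

def power_iteration(M, steps=50):
    """Dominant direction of the symmetric co-occurrence matrix: a
    minimal stand-in for the preset matrix decomposition function."""
    v = [1.0] * len(M)
    for _ in range(steps):
        v = [sum(row[j] * v[j] for j in range(len(v))) for row in M]
        norm = sum(x * x for x in v) ** 0.5 or 1.0
        v = [x / norm for x in v]
    return v

def build_embedding_model(corpus):
    """Store participle -> word vector, as in the preset model."""
    vocab, M = build_cooccurrence(corpus)
    v = power_iteration(M)
    return {w: [v[i]] for i, w in enumerate(vocab)}

model = build_embedding_model(
    ["win a free prize", "free prize now", "see you at lunch"])
```

Words that co-occur heavily (the "free prize" cluster) end up with larger components than isolated ones, which is the intuition behind decomposition-based word vectors.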
- 5. The method according to claim 2, characterized in that performing format conversion on the training word vectors and inputting the format-converted data into the convolutional neural network for processing to obtain the training function specifically comprises: performing format conversion on the training word vectors, converting the one-dimensional input matrix of the training word vectors into a two-dimensional input matrix; inputting the two-dimensional input matrix of the training word vectors into the convolutional neural network for convolution; and inputting the convolved word vectors into a normalized exponential (softmax) function for normalization, and determining the training function according to the result of the normalization.
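The format conversion and convolution of claim 5 can be sketched as follows (the dimensions, filter and flat input are illustrative): the one-dimensional list of concatenated word-vector components is reshaped into a two-dimensional (words × dimensions) matrix, a filter is slid down its rows, and the resulting scores are normalized with the exponential (softmax) function.

```python
# Sketch of claim 5 (dimensions and the filter are illustrative):
# 1-D input matrix -> 2-D input matrix -> convolution -> softmax.
import math

EMBED_DIM = 3   # assumed word-vector dimensionality

def to_2d(flat, dim=EMBED_DIM):
    """Format conversion: 1-D input matrix -> 2-D input matrix."""
    assert len(flat) % dim == 0
    return [flat[i:i + dim] for i in range(0, len(flat), dim)]

def conv_rows(matrix, kernel):
    """Slide an (h x dim) filter down the rows of the 2-D matrix."""
    h = len(kernel)
    return [sum(matrix[r + i][j] * kernel[i][j]
                for i in range(h) for j in range(EMBED_DIM))
            for r in range(len(matrix) - h + 1)]

def softmax(scores):
    """Normalized exponential function."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

flat = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
matrix = to_2d(flat)                                   # 3 words x 3 dims
scores = conv_rows(matrix, [[1.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0]])          # height-2 filter
probs = softmax(scores)
```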
- 6. The method according to claim 5, characterized in that inputting the two-dimensional input matrix of the training word vectors into the convolutional neural network for convolution specifically comprises: inputting the two-dimensional input matrix into the convolutional neural network to obtain a plurality of vector matrices; determining the maximum value in each vector matrix and splicing the maximum values together; and adding a fully connected layer to the spliced vector values and classifying the spliced vector values through the fully connected layer.
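The pooling and classification of claim 6 can be sketched as follows. The feature maps, weights and biases are made-up numbers; the point is the shape of the computation: one maximum per vector matrix (max pooling), spliced into a single vector, then scored by a fully connected layer.

```python
# Sketch of claim 6 with made-up numbers: max pooling over each vector
# matrix, splicing, then a fully connected classification layer.

def max_pool_concat(feature_maps):
    """The maximum of each vector matrix, spliced together."""
    return [max(fm) for fm in feature_maps]

def fully_connected(vec, weights, bias):
    """Dense layer: one score per class."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

pooled = max_pool_concat([[0.2, 0.9, 0.1], [0.5, 0.4], [0.0, 0.3, 0.8]])
scores = fully_connected(pooled,
                         [[1.0, -1.0, 0.5],   # illustrative weights
                          [-0.5, 1.0, 0.2]],
                         [0.0, 0.1])          # illustrative biases
```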
- 7. The method according to claim 6, characterized in that inputting the convolved word vectors into the normalized exponential function for normalization and determining the training function according to the result of the normalization specifically comprises: inputting the vector values from the fully connected layer into the normalized exponential function for normalization; obtaining a maximum likelihood function according to the normalization result, and taking the negative logarithm of the maximum likelihood function and minimizing it to obtain a cross-entropy loss function; and using the cross-entropy loss function as the training function.
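The training-function construction of claim 7 amounts to the standard softmax / cross-entropy pairing: normalize the class scores with the exponential function, then take the negative logarithm of the probability assigned to the true class. The scores below are illustrative.

```python
# Sketch of claim 7: the training function is the cross-entropy loss,
# i.e. the negative logarithm of the softmax probability assigned to
# the true class. Scores are illustrative.
import math

def softmax(scores):
    """Normalized exponential function."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, true_class):
    """Negative log-likelihood of the true class; minimizing this is
    the 'minimize the negative logarithm' step of the claim."""
    return -math.log(softmax(scores)[true_class])
```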
- 8. A device for identifying spam short messages, characterized in that it comprises: a word segmentation unit, configured to perform word segmentation on a short message to be detected when an instruction to identify the short message is received; a mapping unit, configured to map the participles obtained by the word segmentation to word vectors; a format conversion unit, configured to perform format conversion on the word vectors to obtain data to be input that conforms to the input format of a pre-configured neural network identification model; and an analysis unit, configured to input the data to be input into the neural network identification model for analysis, to determine whether the short message to be detected is a spam short message.
- 9. A storage device on which a computer program is stored, characterized in that, when the program is executed by a processor, the spam short message identification method according to any one of claims 1 to 7 is implemented.
- 10. A server, characterized in that the server comprises a storage device and a processor, the storage device being configured to store a computer program, and the processor being configured to execute the computer program so as to implement the spam short message identification method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711191431.0A CN107835496B (en) | 2017-11-24 | 2017-11-24 | Spam short message identification method and device and server |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711191431.0A CN107835496B (en) | 2017-11-24 | 2017-11-24 | Spam short message identification method and device and server |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107835496A true CN107835496A (en) | 2018-03-23 |
CN107835496B CN107835496B (en) | 2021-09-07 |
Family
ID=61652575
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711191431.0A Active CN107835496B (en) | 2017-11-24 | 2017-11-24 | Spam short message identification method and device and server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107835496B (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108197337A (en) * | 2018-03-28 | 2018-06-22 | 北京搜狐新媒体信息技术有限公司 | A kind of file classification method and device |
CN108810829A (en) * | 2018-04-19 | 2018-11-13 | 北京奇安信科技有限公司 | A kind of multimedia message intercepting processing method and device |
CN108874776A (en) * | 2018-06-11 | 2018-11-23 | 北京奇艺世纪科技有限公司 | A kind of recognition methods of rubbish text and device |
CN109525950A (en) * | 2018-11-01 | 2019-03-26 | 北京小米移动软件有限公司 | Pseudo-base station note ceases processing method, equipment and storage medium |
CN110177179A (en) * | 2019-05-16 | 2019-08-27 | 国家计算机网络与信息安全管理中心 | A kind of swindle number identification method based on figure insertion |
CN110401779A (en) * | 2018-04-24 | 2019-11-01 | 中国移动通信集团有限公司 | A kind of method, apparatus and computer readable storage medium identifying telephone number |
CN110633466A (en) * | 2019-08-26 | 2019-12-31 | 深圳安巽科技有限公司 | Short message crime identification method and system based on semantic analysis and readable storage medium |
CN110913353A (en) * | 2018-09-17 | 2020-03-24 | 阿里巴巴集团控股有限公司 | Short message classification method and device |
CN111107552A (en) * | 2018-10-25 | 2020-05-05 | 中国移动通信有限公司研究院 | Method and system for identifying pseudo base station |
CN111198947A (en) * | 2020-01-06 | 2020-05-26 | 南京中新赛克科技有限责任公司 | Convolutional neural network fraud short message classification method and system based on naive Bayes optimization |
CN111209391A (en) * | 2018-11-02 | 2020-05-29 | 北京京东尚科信息技术有限公司 | Information identification model establishing method and system and interception method and system |
CN111241269A (en) * | 2018-11-09 | 2020-06-05 | 中移(杭州)信息技术有限公司 | Short message text classification method and device, electronic equipment and storage medium |
CN111259116A (en) * | 2020-01-16 | 2020-06-09 | 北京珞安科技有限责任公司 | Sensitive file detection method based on convolutional neural network |
CN111431791A (en) * | 2020-02-07 | 2020-07-17 | 贝壳技术有限公司 | Instant communication message identification method and system |
CN111581959A (en) * | 2019-01-30 | 2020-08-25 | 北京京东尚科信息技术有限公司 | Information analysis method, terminal and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8131655B1 (en) * | 2008-05-30 | 2012-03-06 | Bitdefender IPR Management Ltd. | Spam filtering using feature relevance assignment in neural networks |
CN105516499A (en) * | 2015-12-14 | 2016-04-20 | 北京奇虎科技有限公司 | Method and device for classifying short messages, communication terminal and server |
CN106202330A (en) * | 2016-07-01 | 2016-12-07 | 北京小米移动软件有限公司 | The determination methods of junk information and device |
CN106506327A (en) * | 2016-10-11 | 2017-03-15 | 东软集团股份有限公司 | A kind of spam filtering method and device |
- 2017-11-24 CN CN201711191431.0A patent/CN107835496B/en active Active
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108197337A (en) * | 2018-03-28 | 2018-06-22 | 北京搜狐新媒体信息技术有限公司 | A kind of file classification method and device |
CN108810829A (en) * | 2018-04-19 | 2018-11-13 | 北京奇安信科技有限公司 | A kind of multimedia message intercepting processing method and device |
CN110401779A (en) * | 2018-04-24 | 2019-11-01 | 中国移动通信集团有限公司 | A kind of method, apparatus and computer readable storage medium identifying telephone number |
CN110401779B (en) * | 2018-04-24 | 2022-02-01 | 中国移动通信集团有限公司 | Method and device for identifying telephone number and computer readable storage medium |
CN108874776A (en) * | 2018-06-11 | 2018-11-23 | 北京奇艺世纪科技有限公司 | A kind of recognition methods of rubbish text and device |
CN108874776B (en) * | 2018-06-11 | 2022-06-03 | 北京奇艺世纪科技有限公司 | Junk text recognition method and device |
CN110913353A (en) * | 2018-09-17 | 2020-03-24 | 阿里巴巴集团控股有限公司 | Short message classification method and device |
CN110913353B (en) * | 2018-09-17 | 2022-01-18 | 阿里巴巴集团控股有限公司 | Short message classification method and device |
CN111107552A (en) * | 2018-10-25 | 2020-05-05 | 中国移动通信有限公司研究院 | Method and system for identifying pseudo base station |
CN109525950A (en) * | 2018-11-01 | 2019-03-26 | 北京小米移动软件有限公司 | Pseudo-base station note ceases processing method, equipment and storage medium |
CN111209391A (en) * | 2018-11-02 | 2020-05-29 | 北京京东尚科信息技术有限公司 | Information identification model establishing method and system and interception method and system |
CN111241269A (en) * | 2018-11-09 | 2020-06-05 | 中移(杭州)信息技术有限公司 | Short message text classification method and device, electronic equipment and storage medium |
CN111241269B (en) * | 2018-11-09 | 2024-02-23 | 中移(杭州)信息技术有限公司 | Short message text classification method and device, electronic equipment and storage medium |
CN111581959A (en) * | 2019-01-30 | 2020-08-25 | 北京京东尚科信息技术有限公司 | Information analysis method, terminal and storage medium |
CN110177179B (en) * | 2019-05-16 | 2020-12-29 | 国家计算机网络与信息安全管理中心 | Fraud number identification method based on graph embedding |
CN110177179A (en) * | 2019-05-16 | 2019-08-27 | 国家计算机网络与信息安全管理中心 | A kind of swindle number identification method based on figure insertion |
CN110633466B (en) * | 2019-08-26 | 2021-01-19 | 深圳安巽科技有限公司 | Short message crime identification method and system based on semantic analysis and readable storage medium |
CN110633466A (en) * | 2019-08-26 | 2019-12-31 | 深圳安巽科技有限公司 | Short message crime identification method and system based on semantic analysis and readable storage medium |
CN111198947A (en) * | 2020-01-06 | 2020-05-26 | 南京中新赛克科技有限责任公司 | Convolutional neural network fraud short message classification method and system based on naive Bayes optimization |
CN111198947B (en) * | 2020-01-06 | 2024-02-13 | 南京中新赛克科技有限责任公司 | Convolutional neural network fraud short message classification method and system based on naive Bayes optimization |
CN111259116A (en) * | 2020-01-16 | 2020-06-09 | 北京珞安科技有限责任公司 | Sensitive file detection method based on convolutional neural network |
CN111431791A (en) * | 2020-02-07 | 2020-07-17 | 贝壳技术有限公司 | Instant communication message identification method and system |
CN111431791B (en) * | 2020-02-07 | 2021-06-18 | 贝壳找房(北京)科技有限公司 | Instant communication message identification method and system |
Also Published As
Publication number | Publication date |
---|---|
CN107835496B (en) | 2021-09-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107835496A (en) | A kind of recognition methods of refuse messages, device and server | |
CN109522304B (en) | Abnormal object identification method and device and storage medium | |
CN108629413B (en) | Neural network model training and transaction behavior risk identification method and device | |
CN110399490A (en) | A kind of barrage file classification method, device, equipment and storage medium | |
CN108364028A (en) | A kind of internet site automatic classification method based on deep learning | |
CN110309840A (en) | Risk trade recognition methods, device, server and storage medium | |
CN107943791A (en) | A kind of recognition methods of refuse messages, device and mobile terminal | |
CN102419777B (en) | System and method for filtering internet image advertisements | |
CN108021806A (en) | A kind of recognition methods of malice installation kit and device | |
CN108427708A (en) | Data processing method, device, storage medium and electronic device | |
CN107679997A (en) | Method, apparatus, terminal device and storage medium are refused to pay in medical treatment Claims Resolution | |
CN107808358A (en) | Image watermark automatic testing method | |
CN109872162A (en) | A kind of air control classifying identification method and system handling customer complaint information | |
CN107729492A (en) | A kind of method for pushing of exercise, system and terminal device | |
CN103092975A (en) | Detection and filter method of network community garbage information based on topic consensus coverage rate | |
CN108229170B (en) | Software analysis method and apparatus using big data and neural network | |
CN111159404B (en) | Text classification method and device | |
CN111970400B (en) | Crank call identification method and device | |
CN110909224B (en) | Sensitive data automatic classification and identification method and system based on artificial intelligence | |
CN109992781A (en) | Processing, device, storage medium and the processor of text feature | |
CN106203103A (en) | The method for detecting virus of file and device | |
CN111652318A (en) | Currency identification method, currency identification device and electronic equipment | |
CN111046949A (en) | Image classification method, device and equipment | |
CN116226785A (en) | Target object recognition method, multi-mode recognition model training method and device | |
CN113407644A (en) | Enterprise industry secondary industry multi-label classifier based on deep learning algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||