CN107835496A - Spam SMS identification method, device and server - Google Patents
Spam SMS identification method, device and server
- Publication number
- CN107835496A CN107835496A CN201711191431.0A CN201711191431A CN107835496A CN 107835496 A CN107835496 A CN 107835496A CN 201711191431 A CN201711191431 A CN 201711191431A CN 107835496 A CN107835496 A CN 107835496A
- Authority
- CN
- China
- Prior art keywords
- training
- word segmentation
- short message
- word vector
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/12—Detection or prevention of fraud
- H04W12/128—Anti-malware arrangements, e.g. protection against SMS fraud or mobile malware
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/12—Messaging; Mailboxes; Announcements
- H04W4/14—Short messaging services, e.g. short message services [SMS] or unstructured supplementary service data [USSD]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/12—Detection or prevention of fraud
Abstract
The invention discloses a spam SMS identification method, device and server. The method includes: upon receiving an instruction to identify a message to be examined, performing word segmentation on the message; mapping the tokens obtained by segmentation to word vectors; converting the format of the word vectors to obtain input data matching the input format of a pre-configured neural-network recognition model; and feeding the input data into the model for analysis to determine whether the message is spam. With this technical solution, the model can comprehensively recognize all the words of a received message and the semantic relations between them, and judge from the full content of the message whether it is spam. This effectively improves the accuracy of spam identification and enables thorough interception of spam messages.
Description
Technical field
The present invention relates to the field of information processing, and in particular to a spam SMS identification method, device and server.
Background technology
Mobile phones now receive more and more spam messages, including promotional content from all kinds of companies and websites. When checking messages, a user has to find the content he actually wants among numerous spam messages; and to delete the spam, he must identify each message one by one before deleting it, which is inconvenient for the user.
To address this, various spam-interception applications have been developed. These applications judge whether a message is spam based on sensitive words in the message or on the sender, and once a message is found to be spam, they intercept it.
However, current interception software can only intercept messages that contain sensitive words or whose sender is on a blacklist. As a result, a useful message that happens to contain a sensitive word may be intercepted as spam, so the user cannot see its content in time; conversely, spam that contains no sensitive word, or whose sender is not on the blacklist, cannot be intercepted and is delivered to the user as a normal message. Interception software therefore cannot identify all spam messages, which lowers interception accuracy and inconveniences the user.
Summary of the invention
In view of this, the invention provides a spam SMS identification method, device and server, whose main purpose is to solve the problem that interception software cannot identify all spam messages, resulting in low interception accuracy.
According to a first aspect of the invention, there is provided a spam SMS identification method, including:
upon receiving an instruction to identify a message to be examined, performing word segmentation on the message;
mapping the tokens obtained by segmentation to word vectors;
converting the format of the word vectors to obtain input data that matches the input format of a pre-configured neural-network recognition model;
feeding the input data into the neural-network recognition model for analysis, and determining whether the message is spam.
According to a second aspect of the invention, there is provided a spam SMS identification device, including:
a segmentation unit, configured to perform word segmentation on a message to be examined upon receiving an instruction to identify it;
a mapping unit, configured to map the tokens obtained by segmentation to word vectors;
a format-conversion unit, configured to convert the format of the word vectors to obtain input data matching the input format of a pre-configured neural-network recognition model;
an analysis unit, configured to feed the input data into the neural-network recognition model for analysis and determine whether the message is spam.
According to a third aspect of the invention, there is provided a storage device storing a computer program which, when executed by a processor, implements the spam SMS identification method of the first aspect.
According to a fourth aspect of the invention, there is provided a server comprising a storage device and a processor;
the storage device stores a computer program;
the processor executes the computer program to implement the spam SMS identification method of the first aspect.
With the spam SMS identification method, device and server provided by the invention, when an instruction to identify a message to be examined is received, the message is first segmented into words, each token is mapped to a corresponding word vector, the word vectors are then converted into input data matching the input format of the pre-configured neural-network recognition model, and finally the input data is fed into the trained model. The model comprehensively recognizes all the words of the message and the semantic relations between them, and judges from the full content of the message whether it is spam. This effectively improves the accuracy of spam identification and enables thorough interception of spam messages.
The above is only an overview of the technical solution of the invention. To make the technical means of the invention clearer so that it can be practiced according to the specification, and to make the above and other objects, features and advantages of the invention more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be taken as limiting the invention. Throughout the drawings, identical parts are denoted by the same reference numerals. In the drawings:
Fig. 1 shows a flowchart of one embodiment of the spam SMS identification method of the invention;
Fig. 2 shows a flowchart of another embodiment of the spam SMS identification method of the invention;
Fig. 3 is a schematic diagram of the training process of the neural-network recognition model of the invention;
Fig. 4 shows the structure of one embodiment of the spam SMS identification device of the invention;
Fig. 5 shows the structure of another embodiment of the spam SMS identification device of the invention;
Fig. 6 shows the physical structure of one embodiment of the server of the invention.
Detailed description of embodiments
Exemplary embodiments of the disclosure are described more fully below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be embodied in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope conveyed completely to those skilled in the art.
Embodiments of the invention provide a spam SMS identification method, preferably applied on the server side but also applicable on the client side. Using a neural-network recognition model, the full content of a received message can be recognized comprehensively, effectively improving the accuracy of spam identification.
As shown in Fig. 1, the spam SMS identification method of this embodiment includes the following steps:
Step 101: upon receiving an instruction to identify a message to be examined, perform word segmentation on the message.
The identification instruction may be triggered when the server receives a message to be examined forwarded by a client; it may be entered manually by a user according to the practical business need of spam identification; or the neural-network recognition model may be stored locally on the terminal in advance, and the instruction triggered when a client on the terminal receives an unknown message that needs to be checked for spam. Word segmentation may be performed with a conditional random field (CRF) algorithm, with a maximum-matching segmentation algorithm, or with a minimum-cut segmentation algorithm.
In the above technical solution, when the identification instruction is received, the message to be examined is first signed (for example, each message is assigned a matching identification code or identifier), then segmented, and the tokens obtained by segmentation are associated one-to-one with the corresponding signature. When a processing result is obtained later from the tokens, the message it belongs to can thus be determined, avoiding situations where results cannot be told apart when multiple messages are processed at the same time.
Step 102: map the tokens obtained by segmentation to word vectors.
Because a neural-network model cannot analyze tokens directly, each token must be mapped to a corresponding word vector. The word vector may be a specific numerical value or a vector matrix.
Step 103: convert the format of the word vectors to obtain input data matching the input format of the pre-configured neural-network recognition model.
In this solution, every value of the word vectors must be fed into the recognition model simultaneously through multiple input ports, but the format of the values in the word vectors does not match the input format pre-configured for each input port of the model. The word vectors therefore need format conversion, which includes changing the dimensionality of the word vectors and weighting the values in them.
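The dimensionality change of step 103 can be sketched as a plain reshape of the 1-D vector into rows; the row length here is illustrative:

```python
def reshape_2d(flat, row_len):
    """Convert a 1-D input matrix into a 2-D one with rows of row_len
    values, as required by the CNN input format described above."""
    if len(flat) % row_len != 0:
        raise ValueError("length must be a multiple of row_len")
    return [flat[i:i + row_len] for i in range(0, len(flat), row_len)]

print(reshape_2d([1, 2, 3, 4], 2))  # -> [[1, 2], [3, 4]]
```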
Step 104: feed the input data into the neural-network recognition model for analysis, and determine whether the message is spam.
The neural-network recognition model is obtained after a neural network undergoes multiple rounds of learning and training. Once the input data is fed into the model, the model analyzes it in a way that imitates the neurons of the human brain, and judges whether the message corresponding to the input data is spam. The neural network used to train the model is preferably a convolutional neural network. In addition, while identifying messages, the model can keep learning from them, constantly improving its accuracy in identifying spam.
After a received message is determined to be spam, it is placed in an intercepted-messages folder, placed in the recycle bin of the user terminal, or deleted entirely.
Moreover, the neural-network recognition model is not limited to SMS messages: it can also identify instant messages sent in instant-messaging software, or advertisements pushed by websites.
Compared with traditional spam-recognition algorithms, the neural-network recognition model of this embodiment improves the F1 score for spam identification (the harmonic mean of precision and recall) by at least 1%. Here precision is the proportion of messages flagged as spam that really are spam, and recall is the proportion of actual spam messages that are flagged.
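The precision, recall and F1 figures above can be computed as follows (the labels are invented for illustration):

```python
def f1_score(true_labels, predicted):
    """Precision: flagged spam that really is spam / all flagged.
    Recall: spam caught / all actual spam. F1: their harmonic mean."""
    tp = sum(1 for t, p in zip(true_labels, predicted) if t and p)
    fp = sum(1 for t, p in zip(true_labels, predicted) if not t and p)
    fn = sum(1 for t, p in zip(true_labels, predicted) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# 1 = spam, 0 = normal (invented example: 2 hits, 1 miss, 1 false alarm).
print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```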
On the server side, several main recognition mechanisms are used, including intelligent spam identification, fraud-message identification, and pseudo-base-station message identification. With these mechanisms, 99% of fraud messages, 95% of pseudo-base-station messages and 95% of spam messages can be identified effectively.
Among all spam, fraud messages do the greatest harm to users. Sampling analysis of fraud-type spam shows that messages impersonating banks are the most common, at 44.7%; messages impersonating e-commerce merchants to mislead consumers account for 24.3%; and messages impersonating telecom operators rank third at 13.1%. Fraud messages are also the kind that changes most frequently: the phone number, landline number or URL (Uniform Resource Locator) left in a fraud message has a life cycle as short as one day and no longer than about a week. Methods for identifying fraud messages must therefore respond quickly, which requires applying a "not-white means black" policy to them. For common fraud types — points-redemption messages, aviation fraud messages, bank-card credit-limit messages, part-time-job messages and the like — the not-white-means-black policy applies. A message submitted by the user is first handled by the URL policy and the number policy, and a safety grade is produced jointly with the server's extensive whitelist library and a machine-learning algorithm. The not-white-means-black policy is not a simple comparison of URLs against a whitelist, which could not meet the high-accuracy requirement of fraud-message identification. A machine-learning algorithm (the neural-network algorithm) is therefore used to generate a model for fraud messages (the neural-network recognition model); the model judges fraud messages and further raises the accuracy of the not-white-means-black policy to 99%. With this technical solution, fraud messages can be intercepted accurately and efficiently, countering ever-changing scams with a constant method.
With the above technical solution, the neural-network recognition model comprehensively recognizes all the words of a received message and the semantic relations between them, judging from the full content of the message whether it is spam. This effectively improves the accuracy of spam identification and enables thorough interception of spam messages.
Before the steps of Fig. 1 can use the neural-network recognition model, the neural network must first undergo learning and training to produce it.
As shown in Fig. 2, obtaining the neural-network recognition model includes the following steps:
Step 111: perform word segmentation on the acquired training messages to obtain training tokens.
When there are multiple training messages, each is first signed and then segmented. Segmentation may use a CRF algorithm, a maximum-matching algorithm, or a minimum-cut algorithm. The training tokens obtained by segmentation are associated one-to-one with the corresponding signatures, avoiding confusion among multiple training messages.
Step 112: map the training tokens to corresponding training word vectors.
Step 113: convert the format of the training word vectors, and feed the converted data into a convolutional neural network to obtain a training function.
The neural-network recognition model is obtained by putting a convolutional neural network (CNN) through learning and training. The CNN convolves the format-converted data and outputs a training function.
Step 114: train the convolutional neural network according to the training function to obtain the neural-network recognition model.
With this technical solution, after the convolutional neural network is trained with the training messages, it can imitate the deep learning of neurons in the human brain, yielding a neural-network recognition model capable of intelligently identifying spam messages.
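The overall shape of steps 111–114 (feed training samples in, update the model, repeat) can be illustrated with a toy stand-in for the CNN: a single logistic unit trained by gradient descent. The data and model below are invented, not the patent's actual network:

```python
import math

def train(samples, labels, epochs=200, lr=0.5):
    """Toy training loop standing in for CNN training: each epoch feeds
    every (vector, label) pair and updates the weights by gradient descent
    on the cross-entropy loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted spam probability
            g = p - y                        # gradient of the loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(model, x):
    """Spam probability for a new vector under the trained toy model."""
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Invented training vectors: spam-like vs. normal-like (label 1 = spam).
model = train([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], [1, 1, 0, 0])
print(predict(model, [0.95, 0.05]) > 0.5)
```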
In a particular embodiment, step 112 specifically includes:
substituting the training tokens into a preset matrix-decomposition model to obtain the corresponding training word vectors, the preset matrix-decomposition model storing each token together with its word vector.
In this solution, the preset matrix-decomposition model (word2vec) assigns each token a corresponding word vector and stores the tokens and their word vectors in a list. Each token of the training segmentation is substituted into the model one by one, its word vector is looked up in the stored list, and the vectors are then assembled to obtain the training word vectors.
In a particular embodiment, before step 112 the method further includes:
Step 112': collecting an SMS corpus and segmenting it to obtain a token lexicon.
Step 112'': using a preset matrix-decomposition function to set a corresponding word vector for each token in the lexicon.
Step 112''': storing each token and its word vector in the preset matrix-decomposition model.
A large SMS corpus is collected; each message is segmented and the tokens are gathered into the lexicon. The preset matrix-decomposition function computes the frequency or probability of each token in the lexicon and sets a corresponding word vector for each token accordingly. The many messages of the corpus are fed into the function one after another, and the lexicon and its word vectors are adjusted in turn, until the whole corpus has been processed; the final tokens and their word vectors are stored as a list to form the matrix-decomposition model.
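Steps 112'–112''' can be sketched as follows. For brevity, a token's relative corpus frequency stands in for the matrix-decomposition (word2vec) function, which would in practice produce dense trained vectors:

```python
from collections import Counter

def build_vector_table(corpus_tokens):
    """Build a token -> word-vector table from a segmented corpus.
    The 'vector' here is just (relative frequency,), a crude stand-in
    for word2vec-style embeddings."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {tok: (n / total,) for tok, n in counts.items()}

# Invented segmented corpus: tokens from several messages pooled together.
table = build_vector_table(["free", "prize", "free", "hello"])
print(table["free"])  # -> (0.5,)
```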
In a particular embodiment, step 113 specifically includes:
Step 1131: converting the format of the training word vectors (an embedding step), turning the one-dimensional input matrix of the training word vectors into a two-dimensional input matrix.
Because the convolutional neural network can only process a two-dimensional input matrix, while the initial training word vectors form a one-dimensional input matrix, the training word vectors in the one-dimensional input matrix must be format-converted for ease of CNN processing.
For example, if the training word vector is the one-dimensional input matrix (1, 2, 3, 4), format conversion turns it into the two-dimensional input matrix [[1, 2], [3, 4]].
Step 1132: feed the two-dimensional input matrix of the training word vectors into the convolutional neural network for convolution.
Step 1133: feed the convolved word vectors into a normalized exponential function for normalization, and determine the training function from the normalized result.
For ease of calculation, the convolved word vectors are fed into the normalized exponential function (softmax), which normalizes them so that the resulting values lie in (0, 1]; the training function is then determined from the normalized values, and the convolutional neural network can be trained with it.
In a particular embodiment, step 1132 specifically includes:
Step 11321: feeding the two-dimensional input matrix into the convolutional neural network to obtain multiple vector matrices.
Step 11322: determining the maximum value in each vector matrix and concatenating the maxima.
Step 11323: adding a fully connected layer to the concatenated vector and classifying the concatenated vector through the fully connected layer.
The multiple vector matrices produced by CNN convolution are one-dimensional. The maximum value of each one-dimensional vector matrix is selected, and all the maxima are concatenated in the order of the convolved matrices. The concatenated vector is then classified according to the size of its values: the fully connected layer corresponding to the category is looked up in a stored list and applied to the concatenated vector. The category of the concatenated vector, and hence the category of the corresponding training message, can thus be judged from the added fully connected layer.
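Steps 11321–11323 (max-over-each-feature-map pooling, concatenation, then a fully connected layer) can be sketched directly; the feature maps and weights below are invented:

```python
def max_pool_concat(feature_maps):
    """Take the maximum of each 1-D vector matrix produced by the
    convolution and concatenate the maxima in order (steps 11321-11322)."""
    return [max(fm) for fm in feature_maps]

def fully_connected(vec, weights, bias):
    """Step 11323: a fully connected output scoring the concatenated
    vector; a positive score meaning 'spam' in this sketch."""
    return sum(w * v for w, v in zip(weights, vec)) + bias

# Invented feature maps and weights, for illustration only.
maps = [[0.1, 0.7, 0.3], [0.2, 0.2, 0.9]]
pooled = max_pool_concat(maps)               # -> [0.7, 0.9]
score = fully_connected(pooled, [1.0, 1.0], -1.0)
print(pooled, score > 0)
```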
In a particular embodiment, step 1133 specifically includes:
Step 11331: feeding the fully connected vector into the normalized exponential function for normalization.
Step 11332: obtaining a maximum-likelihood function from the normalized result, and minimizing its negative logarithm to obtain a cross-entropy loss function, which serves as the training function.
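The softmax normalization and cross-entropy loss of steps 11331–11332 can be written out directly (the scores are invented; class 0 stands for spam):

```python
import math

def softmax(scores):
    """Normalized exponential function: maps raw scores to
    probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # subtract max for stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, true_class):
    """Negative log-likelihood of the true class under softmax:
    the loss minimized during training."""
    return -math.log(softmax(scores)[true_class])

probs = softmax([2.0, 0.5])   # invented raw scores for (spam, normal)
print(round(probs[0], 3))
print(round(cross_entropy([2.0, 0.5], 0), 3))
```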
In a particular embodiment, step 111 specifically includes:
Step 1111: counting the number of tokens obtained by segmenting the training message.
Step 1112: comparing the token count with a preset maximum; when the count equals the maximum, using the segmented tokens as the training tokens.
Step 1113: when the count is less than the maximum, padding the tokens with filler data up to the maximum, and using the padded tokens as the training tokens.
Step 1114: when the count exceeds the maximum, truncating the tokens beyond the maximum, and using the remaining tokens as the training tokens.
For example, with the preset maximum set to 120: if segmenting a training message yields 20 tokens, short of 120, 100 "0"s are appended after the 20th token; if segmentation yields 232 tokens, the first 120 are kept and the last 112 are deleted.
This ensures that every training message yields the same number of training tokens, which simplifies subsequent processing.
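Steps 1112–1114 (fixing every message to the same token count) can be sketched as:

```python
def fix_length(tokens, max_len=120, pad="0"):
    """Pad with filler tokens up to max_len, or truncate beyond it,
    so every training message yields exactly max_len tokens."""
    if len(tokens) < max_len:
        return tokens + [pad] * (max_len - len(tokens))
    return tokens[:max_len]

print(len(fix_length(["a"] * 20)))   # padded: 20 real tokens + 100 fillers
print(len(fix_length(["a"] * 232)))  # truncated to the first 120 tokens
```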
In a particular embodiment, there are multiple training messages and correspondingly multiple training functions, in which case step 114 specifically includes:
training the convolutional neural network repeatedly with the values of the multiple training functions in turn, to obtain the neural-network recognition model.
The values of the training function obtained from the first training message are fed into each neuron of the convolutional neural network to complete one round of training; the network obtained from that round is then trained a second time with the training function from the second training message, and so on, until all training functions have been fed into the network. The network obtained from the final round of training serves as the neural-network recognition model, which comprehensively analyzes the full content of received messages and thereby determines whether they are spam.
In a particular embodiment, step 102 specifically includes:
substituting the tokens obtained by segmentation into the preset matrix-decomposition model to obtain the corresponding word vectors, the preset matrix-decomposition model storing each token together with its word vector.
Using the matrix-decomposition model obtained in steps 112', 112'' and 112''', a corresponding word vector is matched to each token produced by segmenting the message to be examined; the matched word vectors are then assembled in token order to obtain the word vectors as a one-dimensional input matrix.
According to this scheme, the corresponding step 103 specifically includes:
converting the one-dimensional input matrix of word vectors into a two-dimensional input matrix that matches the input format of the neural-network recognition model.
One-dimensional word vectors do not match the input format of the neural-network recognition model, so the model cannot process them. The model's input format is two-dimensional, so the one-dimensional word vectors must be format-converted into input data in the form of a two-dimensional vector.
According to this scheme, the corresponding step 104 specifically includes:
Step 1041: feeding the two-dimensional input matrix of word vectors into the neural-network recognition model for analysis to obtain numerical information.
Step 1042: detecting whether the numerical information lies within a predetermined range.
Step 1043: if not, determining that the message is spam.
For example, obtained numerical information is the probability that the short message to be detected is refuse messages, if predetermined value scope is 0-
50%, when short message by neural network recognization model to be handled the probability obtained afterwards be 55% when, it is determined that the short message is
Refuse messages.
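Steps 1041-1043 reduce to a range check on the model's output. A minimal sketch of that check, using the 0-50% example range from the text:

```python
def is_spam(probability, upper_bound=0.50):
    """Steps 1042-1043: flag the message when the model's spam
    probability falls outside the predetermined 0-50% range."""
    return not (0.0 <= probability <= upper_bound)

print(is_spam(0.55))  # True  - outside 0-50%, determined to be spam
print(is_spam(0.30))  # False - within range, treated as normal
```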
Alternatively, corresponding processing grades may be set for the numerical information, with different grades corresponding to different value ranges and each grade handling the short message in a different way.
For example, grade one: the value range is 0-20%, and the processing method is to push the short message and notify the user;
grade two: the value range is 21%-50%, and the processing method is to delete the fraudulent sentences in the short message, push the short message, and notify the user;
grade three: the value range is 51%-70%, and the processing method is to place the short message into the short-message interception box;
grade four: the value range is 71%-100%, and the processing method is to delete the short message entirely.
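The four-grade scheme above can be sketched as a lookup table mapping the model's probability to a processing method. The action strings are paraphrases of the example grades, not wording from the patent:

```python
# Illustrative grade table from the example above:
# (lower %, upper %, processing method)
GRADES = [
    (0, 20, "push message and notify user"),
    (21, 50, "delete fraudulent sentences, push message, notify user"),
    (51, 70, "move message to interception box"),
    (71, 100, "delete message entirely"),
]

def processing_method(probability_pct):
    """Map the spam probability (as an integer percentage) to a grade."""
    for low, high, action in GRADES:
        if low <= probability_pct <= high:
            return action
    raise ValueError("probability out of range")

print(processing_method(55))  # move message to interception box
```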
Alternatively, after the short message is processed by the neural network recognition model, multiple one-dimensional vector matrices are obtained. The maximum value in each one-dimensional vector matrix is selected, and all maximum values are spliced in the order of the post-convolution vector matrices. A full link (fully connected layer) is then added to the spliced vector according to the specific method of the above step 1132. The category of the spliced vector value can then be judged from the added full link, and in turn the category of the corresponding training short message. A lookup table stores the category corresponding to each vector value, the categories including the spam-message class and the normal-message class. The post-convolution vector value can thus be used to judge whether a short message is a spam message.
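The "select each maximum, then splice" operation is max-pooling followed by concatenation. A minimal sketch with made-up feature maps:

```python
import numpy as np

# Hypothetical post-convolution one-dimensional vector matrices: one per
# convolution filter (lengths may differ with filter width).
feature_maps = [np.array([0.1, 0.9, 0.4]),
                np.array([0.7, 0.2]),
                np.array([0.3, 0.5, 0.8, 0.6])]

# Max-pooling: keep the single largest value from each vector matrix,
# then splice the maxima in the order of the matrices.
spliced = np.array([fm.max() for fm in feature_maps])
print(spliced.tolist())  # [0.9, 0.7, 0.8]
```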
As shown in Figure 3, in another embodiment of the present invention, the neural network recognition model is obtained as follows:
First, a large number of short message corpora are collected and segmented to obtain a participle corpus. The participle corpus is then used to train word2vec, obtaining the word vector (index vector, index word) corresponding to each participle and storing it.
Then, multiple training short messages are obtained and segmented, and the stored word vectors are used to match a corresponding training word vector for each participle. Because the resulting training word vectors belong to the one-dimensional input space and do not meet the network's input format requirement, embedding (embedded processing) is applied to the training matrix, converting the training word vectors from the one-dimensional input space into the two-dimensional input space for convenient input into the convolutional neural network.
Next, convolution is performed using the convolutional neural network, and a max-pooling operation (taking the maximum within a vector) is applied to each group of post-convolution vector values. As shown in Figure 3, the two-dimensional matrix yields six one-dimensional matrices after convolution, and the maximum of each of the six one-dimensional matrices is selected. The six extracted maxima are spliced (concat), and finally a full link is added. The fully linked values are input into a softmax (normalized exponential function) layer for normalization, obtaining the cross entropy, which serves as the final cross entropy loss function.
Finally, the convolutional neural network is trained using this cross entropy loss function. The above steps are repeated over the multiple training short messages to train the network repeatedly; once training is complete, the neural network recognition model capable of identifying spam messages is obtained.
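The forward pass of Figure 3 (convolution over the two-dimensional matrix, max-pooling to six maxima, concat, full link, softmax, cross entropy) can be sketched as follows. The dimensions, random weights and the "spam" label are illustrative assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: 8 participles per message, 5-dim word vectors, 6 convolution
# filters of width 3, two output classes (spam / normal).
seq_len, dim, n_filters, width, n_classes = 8, 5, 6, 3, 2

X = rng.normal(size=(seq_len, dim))               # 2-D input matrix
filters = rng.normal(size=(n_filters, width, dim))
W_full = rng.normal(size=(n_classes, n_filters))  # the added "full link"

def forward(X):
    # Convolution: each filter slides along the participle axis and
    # yields one one-dimensional matrix (six of them, as in Fig. 3).
    maps = [np.array([(X[i:i + width] * f).sum()
                      for i in range(seq_len - width + 1)])
            for f in filters]
    # Max-pooling then concat: one maximum per map, spliced in order.
    pooled = np.array([m.max() for m in maps])
    # Full link followed by softmax normalization.
    logits = W_full @ pooled
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = forward(X)
label = 1                      # hypothetical "spam" label
loss = -np.log(probs[label])   # cross entropy loss used for training
print(probs.shape, float(probs.sum()))
```

In an actual training loop this loss would be minimized by gradient descent over the filters and the full-link weights; only the forward computation is shown here.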
Further, as a specific implementation of the method of Figure 1, this embodiment provides an identification device for spam messages, as shown in Figure 4, including: a participle unit 21, a mapping unit 22, a format conversion unit 23 and an analysis unit 24.
The participle unit 21 is configured to perform word segmentation on a short message to be detected when an instruction to identify the short message is received;
the mapping unit 22 is configured to map the participles obtained by word segmentation to word vectors;
the format conversion unit 23 is configured to perform format conversion on the word vectors to obtain input data that conforms to the input format of a pre-configured neural network recognition model;
the analysis unit 24 is configured to determine whether the short message to be detected is a spam message by inputting the input data into the neural network recognition model for analysis.
As shown in Figure 5, in order to obtain the neural network recognition model, the device further includes:
the participle unit 21, further configured to perform word segmentation on acquired training short messages to obtain training participles;
the mapping unit 22, further configured to map the training participles to corresponding training word vectors;
a processing unit 25, configured to perform format conversion on the training word vectors and input the format-converted data into the convolutional neural network for processing to obtain training functions;
a training unit 26, configured to train the convolutional neural network according to the training functions to obtain the neural network recognition model.
In a particular embodiment, the mapping unit 22 is further configured to substitute the training participles into a preset matrix decomposition model to obtain the corresponding training word vectors, where each participle and its corresponding word vector are stored in the preset matrix decomposition model.
In a particular embodiment, the device further includes:
the participle unit 21, further configured to collect short message corpora and perform word segmentation on them to obtain a participle library;
a setting unit, configured to set a corresponding word vector for each participle in the participle library using a preset matrix decomposition function;
a storage unit, configured to store each participle and its corresponding word vector in the preset matrix decomposition model.
In a particular embodiment, the processing unit 25 specifically includes:
a format conversion module, configured to perform format conversion on the training word vectors, converting the one-dimensional input matrix of the training word vectors into a two-dimensional input matrix;
an input module, configured to input the two-dimensional input matrix of the training word vectors into the convolutional neural network for convolution;
a normalization module, configured to input the post-convolution word vectors into the normalized exponential function for normalization, and to determine the training function according to the normalization result.
In a particular embodiment, the input module specifically includes:
a convolution module, configured to input the two-dimensional input matrix into the convolutional neural network to obtain multiple vector matrices;
a splicing module, configured to determine the maximum value in each vector matrix and splice the maximum values;
a classification module, configured to add a full link to the spliced vector values and classify the spliced vector values through the full link.
In a particular embodiment, the normalization module is further configured to input the fully linked vector values into the normalized exponential function for normalization; a maximum likelihood function is obtained according to the normalization result, the negative logarithm of the maximum likelihood function is minimized to obtain a cross entropy loss function, and the cross entropy loss function is used as the training function.
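The normalization module's computation, as described, amounts to a softmax over the fully linked values followed by a negative log-likelihood. A minimal sketch under that reading (the logits and label are made-up values, not from the patent):

```python
import numpy as np

def cross_entropy_from_full_link(logits, label):
    """Softmax (normalized exponential function) over the fully linked
    vector values, then the negative log of the labeled class."""
    e = np.exp(logits - np.max(logits))  # shift for numerical stability
    probs = e / e.sum()
    return -np.log(probs[label])

loss = cross_entropy_from_full_link(np.array([2.0, 0.5]), 0)
print(round(loss, 4))  # 0.2014
```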
In a particular embodiment, the participle unit 21 specifically includes:
a statistics module, configured to count the number of participles obtained by segmenting a training short message;
a comparison module, configured to compare the participle count with a predetermined maximum: when the participle count equals the predetermined maximum, the participles after word segmentation are used as the training participles; when the participle count is less than the predetermined maximum, supplementary data is used to pad the participle count up to the predetermined maximum, and the padded participles are used as the training participles; when the participle count exceeds the predetermined maximum, the participles beyond the predetermined maximum are cut off, and the remaining participles are used as the training participles.
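The pad-or-truncate rule of the comparison module can be sketched in a few lines. The `<pad>` token stands in for the patent's unspecified "supplementary data":

```python
PAD = "<pad>"  # hypothetical supplementary token

def fix_length(participles, max_len):
    """Pad with supplementary data, or cut off excess participles, so that
    every training short message has exactly max_len participles."""
    if len(participles) < max_len:
        return participles + [PAD] * (max_len - len(participles))
    return participles[:max_len]

print(fix_length(["win", "big", "prize"], 5))
# ['win', 'big', 'prize', '<pad>', '<pad>']
print(fix_length(["a", "b", "c", "d", "e", "f"], 5))
# ['a', 'b', 'c', 'd', 'e']
```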
In a particular embodiment, there are multiple training short messages, each yielding a corresponding training function. The training unit 26 is then further configured to repeatedly train the convolutional neural network according to the values of the multiple training functions in turn, obtaining the neural network recognition model.
In a particular embodiment, the mapping unit 22 is further configured to substitute the participles obtained by word segmentation into the preset matrix decomposition model to obtain the corresponding word vectors, where each participle and its corresponding word vector are stored in the preset matrix decomposition model;
the format conversion unit 23 is further configured to perform format conversion on the one-dimensional input matrix of the word vectors, converting it into a two-dimensional input matrix that conforms to the input format of the neural network recognition model.
In a particular embodiment, the analysis unit 24 specifically includes:
an analysis module, configured to input the two-dimensional input matrix of the word vectors into the neural network recognition model for analysis to obtain numerical information;
a detection module, configured to detect whether the numerical information falls within a predetermined value range, and if not, to determine that the short message to be detected is a spam message.
Based on the methods shown in Figures 1-3, an embodiment of the present application correspondingly further provides a storage device on which a computer program is stored; when the program is executed by a processor, the steps corresponding to the methods shown in Figures 1-3 are realized.
Based on the methods shown in Figures 1-3 and the device embodiments shown in Figures 4 and 5, this embodiment further provides a server, as shown in Figure 6, including a storage device 32 and a processor 31, both arranged on a bus 33. The storage device 32 is configured to store a computer program; the processor 31 is configured to execute the computer program to realize the steps corresponding to the methods shown in Figures 1-3.
Through the above technical solution of the present invention, the neural network recognition model can comprehensively identify all the vocabulary of a received short message and the semantic relations between the vocabulary, judging whether the short message is a spam message according to its entire content. This effectively improves the accuracy of spam message identification and in turn enables comprehensive interception processing of spam messages.
The embodiments of the invention disclose:
A1. A method for identifying spam messages, including:
performing word segmentation on a short message to be detected when an instruction to identify the short message is received;
mapping the participles obtained by word segmentation to word vectors;
performing format conversion on the word vectors to obtain input data that conforms to the input format of a pre-configured neural network recognition model;
determining whether the short message to be detected is a spam message by inputting the input data into the neural network recognition model for analysis.
A2. The method of A1, wherein the neural network recognition model is obtained by:
performing word segmentation on acquired training short messages to obtain training participles;
mapping the training participles to corresponding training word vectors;
performing format conversion on the training word vectors and inputting the format-converted data into a convolutional neural network for processing to obtain training functions;
training the convolutional neural network according to the training functions to obtain the neural network recognition model.
A3. The method of A2, wherein mapping the training participles to corresponding training word vectors specifically includes:
substituting the training participles into a preset matrix decomposition model to obtain the corresponding training word vectors, where each participle and its corresponding word vector are stored in the preset matrix decomposition model.
A4. The method of A3, wherein before the training participles are substituted into the preset matrix decomposition model to obtain the corresponding training word vectors, the method further includes:
collecting short message corpora and performing word segmentation on them to obtain a participle library;
setting a corresponding word vector for each participle in the participle library using a preset matrix decomposition function;
storing each participle and its corresponding word vector in the preset matrix decomposition model.
A5. The method of A2, wherein performing format conversion on the training word vectors and inputting the format-converted data into the convolutional neural network for processing to obtain the training functions specifically includes:
performing format conversion on the training word vectors, converting the one-dimensional input matrix of the training word vectors into a two-dimensional input matrix;
inputting the two-dimensional input matrix of the training word vectors into the convolutional neural network for convolution;
inputting the post-convolution word vectors into a normalized exponential function for normalization, and determining the training function according to the normalization result.
A6. The method of A5, wherein inputting the two-dimensional input matrix of the training word vectors into the convolutional neural network for convolution specifically includes:
inputting the two-dimensional input matrix into the convolutional neural network to obtain multiple vector matrices;
determining the maximum value in each vector matrix and splicing the maximum values;
adding a full link to the spliced vector values and classifying the spliced vector values through the full link.
A7. The method of A6, wherein inputting the post-convolution word vectors into the normalized exponential function for normalization and determining the training function according to the normalization result specifically includes:
inputting the fully linked vector values into the normalized exponential function for normalization;
obtaining a maximum likelihood function according to the normalization result, minimizing the negative logarithm of the maximum likelihood function to obtain a cross entropy loss function, and using the cross entropy loss function as the training function.
A8. The method of A2, wherein performing word segmentation on the acquired training short messages to obtain the training participles specifically includes:
counting the number of participles obtained by segmenting a training short message;
comparing the participle count with a predetermined maximum, and when the participle count equals the predetermined maximum, using the participles after word segmentation as the training participles;
when the participle count is less than the predetermined maximum, padding the participle count up to the predetermined maximum with supplementary data and using the padded participles as the training participles;
when the participle count exceeds the predetermined maximum, cutting off the participles beyond the predetermined maximum and using the remaining participles as the training participles.
A9. The method of A2, wherein there are multiple training short messages, each yielding a corresponding training function, and wherein training the convolutional neural network according to the training functions to obtain the neural network recognition model specifically includes:
repeatedly training the convolutional neural network according to the values of the multiple training functions in turn, obtaining the neural network recognition model.
A10. The method of any one of A1 to A9, wherein mapping the participles obtained by word segmentation to word vectors specifically includes:
substituting the participles obtained by word segmentation into the preset matrix decomposition model to obtain the corresponding word vectors, where each participle and its corresponding word vector are stored in the preset matrix decomposition model;
and wherein performing format conversion on the word vectors to obtain input data conforming to the input format of the pre-configured neural network recognition model specifically includes:
performing format conversion on the one-dimensional input matrix of the word vectors, converting it into a two-dimensional input matrix that conforms to the input format of the neural network recognition model.
A11. The method of A10, wherein determining whether the short message to be detected is a spam message by inputting the input data into the neural network recognition model for analysis specifically includes:
inputting the two-dimensional input matrix of the word vectors into the neural network recognition model for analysis to obtain numerical information;
detecting whether the numerical information falls within a predetermined value range;
if not, determining that the short message to be detected is a spam message.
B12. An identification device for spam messages, including:
a participle unit, configured to perform word segmentation on a short message to be detected when an instruction to identify the short message is received;
a mapping unit, configured to map the participles obtained by word segmentation to word vectors;
a format conversion unit, configured to perform format conversion on the word vectors to obtain input data that conforms to the input format of a pre-configured neural network recognition model;
an analysis unit, configured to determine whether the short message to be detected is a spam message by inputting the input data into the neural network recognition model for analysis.
B13. The device of B12, further including a processing unit and a training unit, wherein:
the participle unit is further configured to perform word segmentation on acquired training short messages to obtain training participles;
the mapping unit is further configured to map the training participles to corresponding training word vectors;
the processing unit is configured to perform format conversion on the training word vectors and input the format-converted data into a convolutional neural network for processing to obtain training functions;
the training unit is configured to train the convolutional neural network according to the training functions to obtain the neural network recognition model.
B14. The device of B13, wherein the mapping unit is further configured to substitute the training participles into a preset matrix decomposition model to obtain the corresponding training word vectors, where each participle and its corresponding word vector are stored in the preset matrix decomposition model.
B15. The device of B14, further including a setting unit and a storage unit, wherein:
the participle unit is further configured to collect short message corpora and perform word segmentation on them to obtain a participle library;
the setting unit is configured to set a corresponding word vector for each participle in the participle library using a preset matrix decomposition function;
the storage unit is configured to store each participle and its corresponding word vector in the preset matrix decomposition model.
B16. The device of B13, wherein the processing unit specifically includes:
a format conversion module, configured to perform format conversion on the training word vectors, converting the one-dimensional input matrix of the training word vectors into a two-dimensional input matrix;
an input module, configured to input the two-dimensional input matrix of the training word vectors into the convolutional neural network for convolution;
a normalization module, configured to input the post-convolution word vectors into the normalized exponential function for normalization and to determine the training function according to the normalization result.
B17. The device of B16, wherein the input module specifically includes:
a convolution module, configured to input the two-dimensional input matrix into the convolutional neural network to obtain multiple vector matrices;
a splicing module, configured to determine the maximum value in each vector matrix and splice the maximum values;
a classification module, configured to add a full link to the spliced vector values and classify the spliced vector values through the full link.
B18. The device of B17, wherein the normalization module is further configured to input the fully linked vector values into the normalized exponential function for normalization, obtain a maximum likelihood function according to the normalization result, minimize the negative logarithm of the maximum likelihood function to obtain a cross entropy loss function, and use the cross entropy loss function as the training function.
B19. The device of B13, wherein the participle unit specifically includes:
a statistics module, configured to count the number of participles obtained by segmenting a training short message;
a comparison module, configured to compare the participle count with a predetermined maximum: when the participle count equals the predetermined maximum, using the participles after word segmentation as the training participles; when the participle count is less than the predetermined maximum, padding the participle count up to the predetermined maximum with supplementary data and using the padded participles as the training participles; when the participle count exceeds the predetermined maximum, cutting off the participles beyond the predetermined maximum and using the remaining participles as the training participles.
B20. The device of B13, wherein there are multiple training short messages, each yielding a corresponding training function, and the training unit is further configured to repeatedly train the convolutional neural network according to the values of the multiple training functions in turn, obtaining the neural network recognition model.
B21. The device of any one of B12 to B20, wherein:
the mapping unit is further configured to substitute the participles obtained by word segmentation into the preset matrix decomposition model to obtain the corresponding word vectors, where each participle and its corresponding word vector are stored in the preset matrix decomposition model;
the format conversion unit is further configured to perform format conversion on the one-dimensional input matrix of the word vectors, converting it into a two-dimensional input matrix that conforms to the input format of the neural network recognition model.
B22. The device of B21, wherein the analysis unit specifically includes:
an analysis module, configured to input the two-dimensional input matrix of the word vectors into the neural network recognition model for analysis to obtain numerical information;
a detection module, configured to detect whether the numerical information falls within a predetermined value range, and if not, to determine that the short message to be detected is a spam message.
C23. A storage device on which a computer program is stored, wherein when the program is executed by a processor, the method for identifying spam messages of any one of A1 to A11 is realized.
D24. A server including a storage device and a processor, wherein:
the storage device is configured to store a computer program;
the processor is configured to execute the computer program to realize the method for identifying spam messages of any one of A1 to A11.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
It will be understood that related features in the above methods and devices may reference each other. In addition, "first", "second" and the like in the above embodiments are used to distinguish embodiments and do not indicate that one embodiment is better or worse than another.
Those skilled in the art will clearly appreciate that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may reference the corresponding processes in the foregoing method embodiments and are not repeated here.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other equipment. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such a system is apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be realized using a variety of programming languages, and the above description of a specific language is given to disclose the best mode of the invention.
The specification provided here sets forth numerous specific details. It will be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it will be appreciated that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into that description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the devices of an embodiment may be adaptively changed and arranged in one or more devices different from the embodiment. Modules, units or components in an embodiment may be combined into one module, unit or component, and may furthermore be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, an equivalent or a similar purpose.
In addition, those skilled in the art will understand that, although some embodiments described herein include certain features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be realized in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to realize some or all of the functions of some or all of the components in the spam message identification method, device and server according to embodiments of the present invention. The present invention may also be implemented as equipment or device programs (for example, computer programs and computer program products) for performing part or all of the methods described herein. Such programs realizing the present invention may be stored on computer-readable media, or may take the form of one or more signals; such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of multiple such elements. The invention may be realized by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any ordering; these words may be interpreted as names.
Claims (10)
- 1. A method for identifying spam messages, characterized by including: performing word segmentation on a short message to be detected when an instruction to identify the short message is received; mapping the participles obtained by word segmentation to word vectors; performing format conversion on the word vectors to obtain input data that conforms to the input format of a pre-configured neural network recognition model; and determining whether the short message to be detected is a spam message by inputting the input data into the neural network recognition model for analysis.
- 2. The method according to claim 1, characterized in that the step of obtaining the neural network identification model comprises: performing word segmentation on acquired training short messages to obtain training participles; mapping the training participles to corresponding training word vectors; performing format conversion on the training word vectors, and inputting the format-converted data into a convolutional neural network for processing to obtain a training function; and training the convolutional neural network according to the training function to obtain the neural network identification model.
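The training flow of claim 2 can be sketched with a stand-in model: here a single logistic unit replaces the patent's convolutional network, and the log loss plays the role of the "training function". The sample vectors, labels and hyperparameters are all illustrative.

```python
# Sketch of the claim-2 training flow with a stand-in model: a single
# logistic unit instead of a CNN; the log loss acts as the training
# function. Data and hyperparameters are made up.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient descent on the log loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y                          # gradient of the log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5

# Word-vector summaries of training short messages (made-up numbers):
# the first component is high for the spam-like examples.
model = train([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
              [1, 1, 0, 0])
```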
- 3. The method according to claim 2, characterized in that mapping the training participles to the corresponding training word vectors specifically comprises: substituting the training participles into a preset matrix decomposition model to obtain the corresponding training word vectors, the preset matrix decomposition model storing each participle and the word vector corresponding to each participle.
- 4. The method according to claim 3, characterized in that, before the training participles are substituted into the preset matrix decomposition model to obtain the corresponding training word vectors, the method further comprises: collecting a short message corpus and performing word segmentation on it to obtain a participle library; setting a corresponding word vector for each participle in the participle library by means of a preset matrix decomposition function; and storing each participle and its corresponding word vector in the preset matrix decomposition model.
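The corpus-preparation steps of claims 3 and 4 can be sketched with a toy stand-in for the matrix decomposition function: build a participle co-occurrence matrix from a segmented corpus and extract its dominant direction by power iteration, storing one (here one-dimensional) vector per participle. The corpus, window size and dimensionality are assumptions; a real system would apply a proper factorization to a large SMS corpus.

```python
# Sketch of the claim-4 preparation: segment a corpus into a participle
# library, build a co-occurrence matrix, and "decompose" it (here: one
# power-iteration direction) so each participle gets a stored vector.
# Corpus, window size and 1-dim vectors are toy assumptions.

def build_cooccurrence(corpus):
    """Count co-occurrences within a +/-2 word window."""
    vocab = sorted({w for line in corpus for w in line.split()})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    M = [[0.0] * n for _ in range(n)]
    for line in corpus:
        words = line.split()
        for i, w in enumerate(words):
            for c in words[max(0, i - 2):i + 3]:
                if c != w:
                    M[index[w]][index[c]] += 1.0
    return vocab, M

def power_iteration(M, steps=50):
    """Dominant direction of the symmetric co-occurrence matrix: a
    minimal stand-in for the preset matrix decomposition function."""
    v = [1.0] * len(M)
    for _ in range(steps):
        v = [sum(row[j] * v[j] for j in range(len(v))) for row in M]
        norm = sum(x * x for x in v) ** 0.5 or 1.0
        v = [x / norm for x in v]
    return v

def build_embedding_model(corpus):
    """Store participle -> word vector, as in the preset model."""
    vocab, M = build_cooccurrence(corpus)
    v = power_iteration(M)
    return {w: [v[i]] for i, w in enumerate(vocab)}

model = build_embedding_model(
    ["win a free prize", "free prize now", "see you at lunch"])
```

Words that co-occur heavily (the "free prize" cluster) end up with larger components than isolated ones, which is the intuition behind decomposition-based word vectors.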
- 5. The method according to claim 2, characterized in that performing format conversion on the training word vectors and inputting the format-converted data into the convolutional neural network for processing to obtain the training function specifically comprises: performing format conversion on the training word vectors, converting the one-dimensional input matrix of the training word vectors into a two-dimensional input matrix; inputting the two-dimensional input matrix of the training word vectors into the convolutional neural network for convolution; and inputting the convolved word vectors into a normalized exponential (softmax) function for normalization, and determining the training function according to the result of the normalization.
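The format conversion and convolution of claim 5 can be sketched as follows (the dimensions, filter and flat input are illustrative): the one-dimensional list of concatenated word-vector components is reshaped into a two-dimensional (words × dimensions) matrix, a filter is slid down its rows, and the resulting scores are normalized with the exponential (softmax) function.

```python
# Sketch of claim 5 (dimensions and the filter are illustrative):
# 1-D input matrix -> 2-D input matrix -> convolution -> softmax.
import math

EMBED_DIM = 3   # assumed word-vector dimensionality

def to_2d(flat, dim=EMBED_DIM):
    """Format conversion: 1-D input matrix -> 2-D input matrix."""
    assert len(flat) % dim == 0
    return [flat[i:i + dim] for i in range(0, len(flat), dim)]

def conv_rows(matrix, kernel):
    """Slide an (h x dim) filter down the rows of the 2-D matrix."""
    h = len(kernel)
    return [sum(matrix[r + i][j] * kernel[i][j]
                for i in range(h) for j in range(EMBED_DIM))
            for r in range(len(matrix) - h + 1)]

def softmax(scores):
    """Normalized exponential function."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

flat = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
matrix = to_2d(flat)                                   # 3 words x 3 dims
scores = conv_rows(matrix, [[1.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0]])          # height-2 filter
probs = softmax(scores)
```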
- 6. The method according to claim 5, characterized in that inputting the two-dimensional input matrix of the training word vectors into the convolutional neural network for convolution specifically comprises: inputting the two-dimensional input matrix into the convolutional neural network to obtain a plurality of vector matrices; determining the maximum value in each vector matrix and splicing the maximum values together; and adding a fully connected layer to the spliced vector values and classifying the spliced vector values through the fully connected layer.
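The pooling and classification of claim 6 can be sketched as follows. The feature maps, weights and biases are made-up numbers; the point is the shape of the computation: one maximum per vector matrix (max pooling), spliced into a single vector, then scored by a fully connected layer.

```python
# Sketch of claim 6 with made-up numbers: max pooling over each vector
# matrix, splicing, then a fully connected classification layer.

def max_pool_concat(feature_maps):
    """The maximum of each vector matrix, spliced together."""
    return [max(fm) for fm in feature_maps]

def fully_connected(vec, weights, bias):
    """Dense layer: one score per class."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

pooled = max_pool_concat([[0.2, 0.9, 0.1], [0.5, 0.4], [0.0, 0.3, 0.8]])
scores = fully_connected(pooled,
                         [[1.0, -1.0, 0.5],   # illustrative weights
                          [-0.5, 1.0, 0.2]],
                         [0.0, 0.1])          # illustrative biases
```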
- 7. The method according to claim 6, characterized in that inputting the convolved word vectors into the normalized exponential function for normalization and determining the training function according to the result of the normalization specifically comprises: inputting the vector values from the fully connected layer into the normalized exponential function for normalization; obtaining a maximum likelihood function according to the normalization result, and taking the negative logarithm of the maximum likelihood function and minimizing it to obtain a cross-entropy loss function; and using the cross-entropy loss function as the training function.
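The training-function construction of claim 7 amounts to the standard softmax / cross-entropy pairing: normalize the class scores with the exponential function, then take the negative logarithm of the probability assigned to the true class. The scores below are illustrative.

```python
# Sketch of claim 7: the training function is the cross-entropy loss,
# i.e. the negative logarithm of the softmax probability assigned to
# the true class. Scores are illustrative.
import math

def softmax(scores):
    """Normalized exponential function."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, true_class):
    """Negative log-likelihood of the true class; minimizing this is
    the 'minimize the negative logarithm' step of the claim."""
    return -math.log(softmax(scores)[true_class])
```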
- 8. A device for identifying spam short messages, characterized in that it comprises: a word segmentation unit, configured to perform word segmentation on a short message to be detected when an instruction to identify the short message is received; a mapping unit, configured to map the participles obtained by the word segmentation to word vectors; a format conversion unit, configured to perform format conversion on the word vectors to obtain data to be input that conforms to the input format of a pre-configured neural network identification model; and an analysis unit, configured to input the data to be input into the neural network identification model for analysis, to determine whether the short message to be detected is a spam short message.
- 9. A storage device on which a computer program is stored, characterized in that, when the program is executed by a processor, the spam short message identification method according to any one of claims 1 to 7 is implemented.
- 10. A server, characterized in that the server comprises a storage device and a processor, the storage device being configured to store a computer program, and the processor being configured to execute the computer program so as to implement the spam short message identification method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711191431.0A CN107835496B (en) | 2017-11-24 | 2017-11-24 | Spam short message identification method and device and server |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711191431.0A CN107835496B (en) | 2017-11-24 | 2017-11-24 | Spam short message identification method and device and server |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107835496A true CN107835496A (en) | 2018-03-23 |
CN107835496B CN107835496B (en) | 2021-09-07 |
Family
ID=61652575
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711191431.0A Active CN107835496B (en) | 2017-11-24 | 2017-11-24 | Spam short message identification method and device and server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107835496B (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108197337A (en) * | 2018-03-28 | 2018-06-22 | 北京搜狐新媒体信息技术有限公司 | A kind of file classification method and device |
CN108810829A (en) * | 2018-04-19 | 2018-11-13 | 北京奇安信科技有限公司 | A kind of multimedia message intercepting processing method and device |
CN108874776A (en) * | 2018-06-11 | 2018-11-23 | 北京奇艺世纪科技有限公司 | A kind of recognition methods of rubbish text and device |
CN109525950A (en) * | 2018-11-01 | 2019-03-26 | 北京小米移动软件有限公司 | Pseudo-base station note ceases processing method, equipment and storage medium |
CN110177179A (en) * | 2019-05-16 | 2019-08-27 | 国家计算机网络与信息安全管理中心 | A kind of swindle number identification method based on figure insertion |
CN110401779A (en) * | 2018-04-24 | 2019-11-01 | 中国移动通信集团有限公司 | A kind of method, apparatus and computer readable storage medium identifying telephone number |
CN110633466A (en) * | 2019-08-26 | 2019-12-31 | 深圳安巽科技有限公司 | Short message crime identification method and system based on semantic analysis and readable storage medium |
CN110913353A (en) * | 2018-09-17 | 2020-03-24 | 阿里巴巴集团控股有限公司 | Short message classification method and device |
CN111107552A (en) * | 2018-10-25 | 2020-05-05 | 中国移动通信有限公司研究院 | Method and system for identifying pseudo base station |
CN111198947A (en) * | 2020-01-06 | 2020-05-26 | 南京中新赛克科技有限责任公司 | Convolutional neural network fraud short message classification method and system based on naive Bayes optimization |
CN111209391A (en) * | 2018-11-02 | 2020-05-29 | 北京京东尚科信息技术有限公司 | Information identification model establishing method and system and interception method and system |
CN111241269A (en) * | 2018-11-09 | 2020-06-05 | 中移(杭州)信息技术有限公司 | Short message text classification method and device, electronic equipment and storage medium |
CN111259116A (en) * | 2020-01-16 | 2020-06-09 | 北京珞安科技有限责任公司 | Sensitive file detection method based on convolutional neural network |
CN111431791A (en) * | 2020-02-07 | 2020-07-17 | 贝壳技术有限公司 | Instant communication message identification method and system |
CN111581959A (en) * | 2019-01-30 | 2020-08-25 | 北京京东尚科信息技术有限公司 | Information analysis method, terminal and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8131655B1 (en) * | 2008-05-30 | 2012-03-06 | Bitdefender IPR Management Ltd. | Spam filtering using feature relevance assignment in neural networks |
CN105516499A (en) * | 2015-12-14 | 2016-04-20 | 北京奇虎科技有限公司 | Method and device for classifying short messages, communication terminal and server |
CN106202330A (en) * | 2016-07-01 | 2016-12-07 | 北京小米移动软件有限公司 | The determination methods of junk information and device |
CN106506327A (en) * | 2016-10-11 | 2017-03-15 | 东软集团股份有限公司 | A kind of spam filtering method and device |
- 2017-11-24 CN CN201711191431.0A patent/CN107835496B/en active Active
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108197337A (en) * | 2018-03-28 | 2018-06-22 | 北京搜狐新媒体信息技术有限公司 | A kind of file classification method and device |
CN108810829A (en) * | 2018-04-19 | 2018-11-13 | 北京奇安信科技有限公司 | A kind of multimedia message intercepting processing method and device |
CN110401779A (en) * | 2018-04-24 | 2019-11-01 | 中国移动通信集团有限公司 | A kind of method, apparatus and computer readable storage medium identifying telephone number |
CN110401779B (en) * | 2018-04-24 | 2022-02-01 | 中国移动通信集团有限公司 | Method and device for identifying telephone number and computer readable storage medium |
CN108874776A (en) * | 2018-06-11 | 2018-11-23 | 北京奇艺世纪科技有限公司 | A kind of recognition methods of rubbish text and device |
CN108874776B (en) * | 2018-06-11 | 2022-06-03 | 北京奇艺世纪科技有限公司 | Junk text recognition method and device |
CN110913353A (en) * | 2018-09-17 | 2020-03-24 | 阿里巴巴集团控股有限公司 | Short message classification method and device |
CN110913353B (en) * | 2018-09-17 | 2022-01-18 | 阿里巴巴集团控股有限公司 | Short message classification method and device |
CN111107552A (en) * | 2018-10-25 | 2020-05-05 | 中国移动通信有限公司研究院 | Method and system for identifying pseudo base station |
CN109525950A (en) * | 2018-11-01 | 2019-03-26 | 北京小米移动软件有限公司 | Pseudo-base station note ceases processing method, equipment and storage medium |
CN111209391A (en) * | 2018-11-02 | 2020-05-29 | 北京京东尚科信息技术有限公司 | Information identification model establishing method and system and interception method and system |
CN111241269A (en) * | 2018-11-09 | 2020-06-05 | 中移(杭州)信息技术有限公司 | Short message text classification method and device, electronic equipment and storage medium |
CN111241269B (en) * | 2018-11-09 | 2024-02-23 | 中移(杭州)信息技术有限公司 | Short message text classification method and device, electronic equipment and storage medium |
CN111581959A (en) * | 2019-01-30 | 2020-08-25 | 北京京东尚科信息技术有限公司 | Information analysis method, terminal and storage medium |
CN110177179B (en) * | 2019-05-16 | 2020-12-29 | 国家计算机网络与信息安全管理中心 | Fraud number identification method based on graph embedding |
CN110177179A (en) * | 2019-05-16 | 2019-08-27 | 国家计算机网络与信息安全管理中心 | A kind of swindle number identification method based on figure insertion |
CN110633466B (en) * | 2019-08-26 | 2021-01-19 | 深圳安巽科技有限公司 | Short message crime identification method and system based on semantic analysis and readable storage medium |
CN110633466A (en) * | 2019-08-26 | 2019-12-31 | 深圳安巽科技有限公司 | Short message crime identification method and system based on semantic analysis and readable storage medium |
CN111198947A (en) * | 2020-01-06 | 2020-05-26 | 南京中新赛克科技有限责任公司 | Convolutional neural network fraud short message classification method and system based on naive Bayes optimization |
CN111198947B (en) * | 2020-01-06 | 2024-02-13 | 南京中新赛克科技有限责任公司 | Convolutional neural network fraud short message classification method and system based on naive Bayes optimization |
CN111259116A (en) * | 2020-01-16 | 2020-06-09 | 北京珞安科技有限责任公司 | Sensitive file detection method based on convolutional neural network |
CN111431791A (en) * | 2020-02-07 | 2020-07-17 | 贝壳技术有限公司 | Instant communication message identification method and system |
CN111431791B (en) * | 2020-02-07 | 2021-06-18 | 贝壳找房(北京)科技有限公司 | Instant communication message identification method and system |
Also Published As
Publication number | Publication date |
---|---|
CN107835496B (en) | 2021-09-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107835496A (en) | A kind of recognition methods of refuse messages, device and server | |
CN109522304B (en) | Abnormal object identification method and device and storage medium | |
CN108629413B (en) | Neural network model training and transaction behavior risk identification method and device | |
CN110399490A (en) | A kind of barrage file classification method, device, equipment and storage medium | |
CN108364028A (en) | A kind of internet site automatic classification method based on deep learning | |
CN110309840A (en) | Risk trade recognition methods, device, server and storage medium | |
CN107943791A (en) | A kind of recognition methods of refuse messages, device and mobile terminal | |
CN102419777B (en) | System and method for filtering internet image advertisements | |
CN108021806A (en) | A kind of recognition methods of malice installation kit and device | |
CN108427708A (en) | Data processing method, device, storage medium and electronic device | |
CN107679997A (en) | Method, apparatus, terminal device and storage medium are refused to pay in medical treatment Claims Resolution | |
CN107808358A (en) | Image watermark automatic testing method | |
CN109872162A (en) | A kind of air control classifying identification method and system handling customer complaint information | |
CN107729492A (en) | A kind of method for pushing of exercise, system and terminal device | |
CN103092975A (en) | Detection and filter method of network community garbage information based on topic consensus coverage rate | |
CN108229170B (en) | Software analysis method and apparatus using big data and neural network | |
CN111159404B (en) | Text classification method and device | |
CN111970400B (en) | Crank call identification method and device | |
CN110909224B (en) | Sensitive data automatic classification and identification method and system based on artificial intelligence | |
CN109992781A (en) | Processing, device, storage medium and the processor of text feature | |
CN106203103A (en) | The method for detecting virus of file and device | |
CN111652318A (en) | Currency identification method, currency identification device and electronic equipment | |
CN111046949A (en) | Image classification method, device and equipment | |
CN116226785A (en) | Target object recognition method, multi-mode recognition model training method and device | |
CN113407644A (en) | Enterprise industry secondary industry multi-label classifier based on deep learning algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||