CN110377699A - SMS recognition methods and relevant device based on NLP - Google Patents
- Publication number
- CN110377699A (application CN201910540582.5A)
- Authority
- CN
- China
- Prior art keywords
- short message
- sms
- nlp
- word
- keyword
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/12—Messaging; Mailboxes; Announcements
- H04W4/14—Short messaging services, e.g. short message services [SMS] or unstructured supplementary service data [USSD]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Data Mining & Analysis (AREA)
- Telephonic Communication Services (AREA)
- Telephone Function (AREA)
Abstract
The present invention relates to the field of semantic analysis technology, and in particular to an NLP-based SMS recognition method and related device. The method comprises: obtaining the content of any message in the phone's SMS inbox, segmenting it into words using natural language processing (NLP), performing a relevance judgment on the segments, and determining the keywords of the message content from the judgment result; judging from the keywords whether the message is spam, marking the recognized message, and blocking or deleting it when it is identified as spam; and calling an automatic voice playback function to read aloud messages identified as non-spam. By extracting keywords from the message content and judging their relevance before deciding whether the content is spam, the invention achieves broad spam recognition coverage, a recognition method that is hard to bypass, and automatic playback of valid message content.
Description
Technical field
The present invention relates to the field of semantic analysis technology, and in particular to an NLP-based SMS recognition method and related device.
Background art
SMS is one of the most widely used digital communication services today. Through mobile phones or other telecom mobile terminals, at least two users send and receive short text messages and thereby exchange information. SMS is simple and easy to operate, and is popular with all kinds of users. However, as information technology has spread and developed, SMS traffic has become mixed with advertising and fraudulent messages. These messages are sent to users freely and arbitrarily without their consent, and in most scenarios users cannot reject them according to their own wishes. Such messages are called spam messages. The flood of spam seriously affects people's normal lives and damages the public image of operators. In response, mobile service providers have gradually introduced real-name registration of phone numbers, and phone manufacturers have used technical means to let the phone's built-in SMS function distinguish spam.
However, current SMS functions identify spam in a fairly simple way; most judge only by the source of the message. In addition, existing SMS functions remain purely textual: to read a received message, the user must open the SMS interface and read the text message by message. Most people today work and live at a fast pace and have far less free time than before. Therefore, even when spam has been filtered out, requiring the user to open the SMS interface and read normal messages is outdated and time-consuming. In some cases it indirectly reduces the efficiency of life or work and degrades the SMS user experience.
In summary, the current SMS function has the following defects:
1) Its spam recognition coverage is narrow;
2) It is easily bypassed. For example, an illegal advertiser or telecom fraudster need only change phone numbers, or use hardware such as a pseudo base station, to get around this kind of spam screening and carry on unchecked. Moreover, if a number is added to the blacklist by mistake, normal phone use is easily affected;
3) Message content is displayed only as text, which reduces the ease of use of the SMS function.
Summary of the invention
The present invention provides an NLP-based SMS recognition method and related device. By judging message content through keywords in order to mark spam, it recognizes spam from the content itself, which broadens recognition coverage and makes the recognition hard to bypass. Automatic voice playback additionally improves the SMS user experience.
In a first aspect, the present invention provides an NLP-based SMS recognition method, comprising:
obtaining the content of any message in the phone's SMS inbox, segmenting it into words using natural language processing (NLP), then performing a relevance judgment, and determining the keywords of the message content from the judgment result;
judging from the keywords whether the message is spam, marking the recognized message, and blocking or deleting it when it is identified as spam;
calling an automatic voice playback function to read aloud messages identified as non-spam.
In some possible embodiments, obtaining the content of any message in the SMS inbox, segmenting it using natural language processing (NLP), performing the relevance judgment, and determining the keywords of the message content from the judgment result comprises:
obtaining permission to take over the SMS function after the user grants authorization, retrieving the SMS list, and reading the content of each message in list order;
preprocessing the message content;
segmenting the preprocessed message content with an NLP-based word segmentation tool, performing relevance analysis on the result, and determining the keywords of each message.
In some possible embodiments, segmenting the preprocessed message content with the NLP-based word segmentation tool, performing relevance analysis on the result, and determining the keywords of each message comprises:
calling preset preprocessing rules to preprocess the message content;
segmenting the preprocessed message content with the NLP-based word segmentation tool so that it is divided into several word segments;
calling a preset word-sense relevance judgment rule table and screening each word segment against the table's rules for low-relevance words, and recording a word segment when it does not fall among the low-relevance words defined by those rules;
collecting the recorded word segments to obtain the keywords of the message.
In some possible embodiments, judging from the keywords whether the message is spam, marking the recognized message, and blocking or deleting it when it is identified as spam comprises:
obtaining all keywords of the same message and setting a status mark for the message, the status mark indicating whether the message is spam;
performing semantic analysis on the obtained keywords with an NLP-based semantic analysis tool and, according to the analysis result, writing a Boolean value into the message's status mark, where True marks the message as spam and False marks it as a valid message;
calling the block or delete function of the phone's SMS function to clear messages whose status mark is True.
In some possible embodiments, after calling the block or delete function of the phone's SMS function to clear messages whose status mark is True, the method comprises:
obtaining the sender of a message whose status mark is True;
calling the phone's address book to check whether the sender is in it and, if so, deleting or blocking that sender's spam messages;
when the sender is not in the address book, counting the number of that sender's messages whose status mark is True and comparing the count with a preset count threshold; when the count is not less than the threshold, adding the sender to the blacklist of the SMS function.
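The sender-handling steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `Message`, `SmsStore`, `handle_spam_sender`, the returned action strings, and the default threshold of 3 are all hypothetical, and the source does not say what happens to an unknown sender below the threshold, so the sketch simply reports that case.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    text: str
    is_spam: bool = False  # status mark: True = spam, False = valid


@dataclass
class SmsStore:
    contacts: set[str]                       # senders found in the address book
    blacklist: set[str] = field(default_factory=set)
    count_threshold: int = 3                 # hypothetical preset count threshold

    def handle_spam_sender(self, sender: str, inbox: list[Message]) -> str:
        """Decide what to do with the sender of a message marked True."""
        if sender in self.contacts:
            # Known contact: only that sender's spam messages are deleted or blocked.
            return "delete_or_block_messages"
        # Unknown sender: count that sender's messages whose status mark is True.
        spam_count = sum(1 for m in inbox if m.sender == sender and m.is_spam)
        if spam_count >= self.count_threshold:   # "not less than" the threshold
            self.blacklist.add(sender)
            return "blacklisted"
        return "below_threshold"                 # assumption: no further action
```

For example, with `count_threshold=2` and two spam messages from an unknown number, `handle_spam_sender` adds that number to `blacklist`, while a sender present in `contacts` is never blacklisted.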
In some possible embodiments, calling the automatic voice playback function to read aloud messages identified as non-spam comprises:
obtaining any valid message whose status mark is False;
establishing an association between the valid message and a text-to-speech engine;
calling the text-to-speech engine to convert the content of the valid message to speech and play it.
In some possible embodiments, after establishing the association between the valid message and the text-to-speech engine, the method comprises:
setting a corresponding trigger switch for each valid message whose status mark is False;
after receiving the trigger signal of the trigger switch, calling the text-to-speech engine to play the valid message.
In a second aspect, in some possible embodiments the present invention provides an NLP-based SMS recognition device, comprising a message judgment module, a message marking module, and a message playback module, wherein:
the message judgment module is configured to obtain the content of any message in the phone's SMS inbox, segment it into words using natural language processing (NLP), then perform a relevance judgment, and determine the keywords of the message content from the judgment result;
the message marking module is configured to judge from the keywords whether the message is spam, mark the recognized message, and block or delete it when it is identified as spam;
the message playback module is configured to call an automatic voice playback function to read aloud messages identified as non-spam.
Based on the same inventive concept, in some possible embodiments the present invention provides a computer device comprising a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, implement the steps of the NLP-based SMS recognition method described above.
Based on the same inventive concept, in some possible embodiments the present invention provides a computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, implement the steps of the NLP-based SMS recognition method described above.
Beneficial effects: by extracting keywords from the message content and judging their relevance before deciding whether the content is spam, the invention achieves broad spam recognition coverage, a recognition method that is hard to bypass, and automatic playback of valid message content. Specifically, it has the following advantages:
1) Semantic analysis of message content gives high coverage in recognizing spam content;
2) Automatic voice playback improves the ease of use and user experience of the SMS function.
Brief description of the drawings
Fig. 1 is the main flow chart of an NLP-based SMS recognition method according to an embodiment of the present invention;
Fig. 2 is a flow chart of extracting keywords from message content and then performing the relevance judgment in an NLP-based SMS recognition method according to an embodiment of the present invention;
Fig. 3 is a flow chart of the relevance judgment on keywords in an NLP-based SMS recognition method according to an embodiment of the present invention;
Fig. 4 is a flow chart of marking message content in an NLP-based SMS recognition method according to an embodiment of the present invention;
Fig. 5 is a flow chart of processing spam messages in an NLP-based SMS recognition method according to an embodiment of the present invention;
Fig. 6 is a flow chart of processing valid messages in an NLP-based SMS recognition method according to an embodiment of the present invention;
Fig. 7 is another flow chart of processing valid messages in an NLP-based SMS recognition method according to an embodiment of the present invention;
Fig. 8 is a functional block diagram of an NLP-based SMS recognition device according to an embodiment of the present invention.
Detailed description of the embodiments
The embodiments of the present invention provide an NLP-based SMS recognition method and related device for recognizing and marking SMS content. According to the marking result, spam messages are blocked or otherwise processed and valid messages are played automatically by voice, which prevents the recognition from being bypassed by technical means and also improves the user experience.
To enable those skilled in the art to better understand the solution of the present invention, the embodiments are described below with reference to the accompanying drawings.
The terms "first", "second", "third", "fourth", and the like (if present) in the description, the claims, and the above drawings are used to distinguish similar objects, not to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in an order other than the one illustrated or described. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
In some embodiments, the present invention provides the main flow chart of an NLP-based SMS recognition method; as shown in Fig. 1, it includes steps S1 to S3:
S1: obtain the content of any message in the phone's SMS inbox, segment it into words using natural language processing (NLP), then perform a relevance judgment, and determine the keywords of the message content from the judgment result.
Specifically, the phone's SMS function is first taken over, for example through user authorization, and the content of each message is obtained from the message list. The content of each message is then analyzed. For example, NLP is used to split the content into paragraphs or phrases, and phrases or paragraphs of low relevance are removed according to preset spam screening rules; the remaining phrases or paragraphs are the keywords used for the subsequent word-sense or semantic analysis. Here, NLP refers to natural language processing (Natural Language Processing), the general name for a family of techniques that study language problems in human-computer interaction. It is both a branch of language information processing and an application of artificial intelligence.
S2: judge from the keywords whether the message is spam, mark the recognized message, and block or delete it when it is identified as spam.
Specifically, word-sense analysis of the keywords determines whether the message is spam or a valid message. Recognition can be carried out by calling a common semantic analysis engine, such as Datamate Text Parser Lite, Word Art, Tegxedo, niuparser, or other semantic analysis tools available on the market. After the recognition result is marked, the spam messages are blocked or deleted according to the mark.
S3: call an automatic voice playback function to read aloud messages identified as non-spam.
Specifically, automatic voice playback can be achieved by associating the phone's built-in text-to-speech engine with the SMS function and then playing messages automatically. For example, Microsoft's TTS speech engine can be called with the message content to convert the text to speech and play it.
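Step S3 can be sketched as follows. The sketch deliberately does not call any real text-to-speech API: the engine is passed in as a plain callable named `speak`, which is an assumption made so the example stays self-contained, and `play_valid_messages` is a hypothetical helper name. A real integration would pass the engine's own speak function instead.

```python
from __future__ import annotations

from typing import Callable, Iterable


def play_valid_messages(
    messages: Iterable[tuple[str, bool]],
    speak: Callable[[str], None],
) -> int:
    """Read aloud every message whose status mark is False (valid).

    `messages` yields (text, is_spam) pairs; `speak` stands in for the
    text-to-speech engine associated with the SMS function.
    Returns the number of messages that were played.
    """
    played = 0
    for text, is_spam in messages:
        if not is_spam:          # status mark False = valid message
            speak(text)
            played += 1
    return played


# Usage with a stand-in engine that just records what it was asked to say.
spoken: list[str] = []
count = play_valid_messages(
    [("Meeting at 3pm", False), ("Win a prize now", True)],
    spoken.append,
)
```

Passing the engine as a parameter keeps the filtering logic independent of any particular speech library, so the same function could be driven by a trigger switch or called automatically on arrival.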
The above embodiment extracts keywords from message content, performs semantic analysis and recognition on the keywords, and marks and processes messages according to the recognition result. Compared with traditional techniques, this fundamentally improves recognition accuracy and cannot easily be evaded by conventional tricks such as changing numbers.
In some embodiments, the present invention provides a flow chart of extracting keywords from message content and then performing the relevance judgment in an NLP-based SMS recognition method; as shown in Fig. 2, it includes steps S101 to S103:
S101: obtain permission to take over the SMS function after the user grants authorization, retrieve the SMS list, and read the content of each message in list order.
Specifically, an authorization request is first sent to the user. Once the user's authorization is received, the SMS function is taken over, the message list stored on the phone is retrieved according to the read and write permissions, and each message is read in turn, ordered by the time or serial number in the existing list, to obtain its content.
S102: preprocess the message content.
Specifically, to ensure segmentation accuracy and efficiency, the original message content needs to be preprocessed before word segmentation and semantic analysis. Preprocessing covers formatting and punctuation, including correcting wrong characters, correcting punctuation, deduplication, and disambiguation. For example, when replacing abbreviations, "can't" is replaced with "can not"; when replacing symbols, the unit symbol kg is replaced with kilogram.
In addition, a stop-word list can be introduced during preprocessing: each message is traversed and the common stop words in the list are filtered out. The stop-word list can be generated in advance from a manually entered list of common stop words. Excluding these words from information retrieval improves retrieval efficiency and saves text storage space.
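Stop-word filtering as described above might look like the sketch below. The contents of `STOP_WORDS` and the whitespace tokenization are assumptions made for illustration; the patent only says the list is entered manually in advance and does not specify how messages are traversed.

```python
# Hypothetical, manually entered stop-word list (the patent does not give one).
STOP_WORDS = {"the", "a", "an", "of", "to", "is"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Traverse the message and drop every token found in the stop-word list.

    Tokens are split on whitespace and lowercased, which is a simplification
    that only suits space-delimited languages.
    """
    tokens = text.lower().split()
    return [t for t in tokens if t not in stop_words]


print(remove_stop_words("The price of a ticket is 20"))
# ['price', 'ticket', '20']
```

For Chinese message text, the same filter would be applied to the output of a word segmentation tool rather than to whitespace-separated tokens.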
S103: segment the preprocessed message content with an NLP-based word segmentation tool, perform relevance analysis on the result, and determine the keywords of each message.
Specifically, any mainstream word segmentation tool is called to segment the preprocessed message content, dividing the whole message into several word segments. Relevance analysis is then performed on the segments that carry independent meaning to decide whether each is worth keeping as a keyword. The relevance analysis can follow preset relevance analysis rules, or it can use a relevance analysis model trained in advance. In this application, relevance analysis means comparing the sense of a word segment with preset judgment reference words to decide whether the two have the same meaning. The preset judgment reference words are feature words directly associated with the key characteristics of spam, for example words suggesting fraud or advertising.
In this embodiment, the keywords used to judge spam are determined by relevance analysis of the segmentation result, which gives higher judgment accuracy than traditional means.
In some embodiments, the present invention provides a flow chart of the relevance judgment on keywords in an NLP-based SMS recognition method; as shown in Fig. 3, it includes steps S10301 to S10304:
S10301: call preset preprocessing rules to preprocess the message content.
Specifically, when preprocessing message content, the preprocessing rules are prepared first, and a script or rule engine then drives the rule instances: the message content is the input, and the output is the new, preprocessed message content. The preprocessing rules may include replacement rules for symbols, replacement rules for abbreviations, replacement rules for repeated words, and so on. When a symbol replacement rule is executed, the rule condition defines the symbols to search for, typically abbreviated unit symbols such as kg and km; on a hit, the rule body replaces kg or km with kilogram or kilometer. When an abbreviation replacement rule is executed, the rule condition holds a list of common abbreviations; when one of them is hit, it is replaced with the corresponding full term. For example, when the abbreviation NLP is found in the message content, the rule body replaces NLP with Natural Language Processing.
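The rule-driven preprocessing described above can be sketched as an ordered table of (condition, replacement) pairs applied by a small loop. Everything here is illustrative: the `RULES` table, the regular expressions, and the function name `preprocess` are assumptions, since the patent names the rule types (symbols, abbreviations, repeated words) but not their concrete form.

```python
import re

# Each rule pairs a condition (compiled pattern) with a rule body (replacement).
# The table is a hypothetical example, not taken from the patent.
RULES = [
    (re.compile(r"\bcan't\b", re.IGNORECASE), "can not"),       # abbreviation
    (re.compile(r"(?<=\d)\s*kg\b"), " kilogram"),                # unit symbol
    (re.compile(r"(?<=\d)\s*km\b"), " kilometer"),               # unit symbol
    (re.compile(r"\bNLP\b"), "Natural Language Processing"),     # abbreviation
    (re.compile(r"\b(\w+)(\s+\1\b)+", re.IGNORECASE), r"\1"),    # repeated words
]


def preprocess(text):
    """Apply every rule in order; the output is the preprocessed message."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


print(preprocess("I can't carry 5kg for 3 km, free free gift from NLP"))
# I can not carry 5 kilogram for 3 kilometer, free gift from Natural Language Processing
```

Keeping the rules in a data table rather than in code means new symbols or abbreviations can be added without touching the loop, which matches the idea of a script or rule engine driving rule instances.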
S10302: segment the preprocessed message content with an NLP-based word segmentation tool so that it is divided into several word segments;
S10303: call a preset word-sense relevance judgment rule table and screen each word segment against the table's rules for low-relevance words; when a word segment does not fall among the low-relevance words defined by those rules, record it;
S10304: collect the recorded word segments to obtain the keywords of the message.
Specifically, after preprocessing, the wording and semantics of the message content are more concise and precise than in the original message, so word segmentation at this point is more efficient. For segmentation, a common NLP-based word segmentation tool is called and the segmentation results are collected. The tool may be any of Paoding, mmseg4j, IKAnalyzer, Imdict-chinese-analyzer, Ansj, Httpcws, jieba, and the like. Segmentation yields multiple word segments with independent meaning, which are then analyzed against the preset judgment rules to decide whether they qualify as keywords. In this application, a keyword is a feature word, obtained through this processing and analysis, that is used to judge whether the message content is spam. The word-sense relevance judgment rule table can be set up in advance and stores the reference words used for relevance analysis. These include valid reference words, used to judge whether a word segment belongs to valid message content, and non-valid reference words, used to judge whether a word segment belongs to non-valid message content. A relevance analysis rule can be defined as follows: under a preset condition, when the matching degree between a word segment and a reference word exceeds a threshold, the two are considered to have the same meaning. After the analysis, the results are aggregated; these results are the keywords of the corresponding message.
This embodiment uses preprocessing to improve segmentation efficiency, uses an NLP-based word segmentation tool to split the preprocessed message content into independent word segments, and, after matching-degree analysis against the preset reference words, determines which word segments serve as the keywords that decide whether the message is valid or spam.
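The threshold rule above can be sketched as follows. Note the substitution: the patent speaks of a word-sense matching degree, whereas this sketch uses character-level similarity from Python's standard `difflib.SequenceMatcher` purely as a stand-in, because no semantic model is specified. The reference words, the threshold of 0.8, and the function names are likewise hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical non-valid (spam) reference words and matching threshold.
REFERENCE_WORDS = ["lottery", "loan", "discount"]
THRESHOLD = 0.8


def matching_degree(segment, reference):
    """Character-level similarity in [0, 1]; a stand-in for word-sense matching."""
    return SequenceMatcher(None, segment.lower(), reference.lower()).ratio()


def extract_keywords(segments, references=REFERENCE_WORDS, threshold=THRESHOLD):
    """Record each word segment whose best match with a reference word exceeds the threshold."""
    keywords = []
    for segment in segments:
        best = max(matching_degree(segment, ref) for ref in references)
        if best > threshold:
            keywords.append(segment)
    return keywords


print(extract_keywords(["hello", "Lottery", "loans", "tomorrow"]))
# ['Lottery', 'loans']
```

In practice the segments would come from a word segmentation tool such as jieba, and `matching_degree` would be replaced by whatever semantic similarity measure the rule table or trained model provides.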
In some embodiments, the present invention provides a flow chart of marking message content in the NLP-based SMS recognition method. As shown in Fig. 4, it includes steps S201 to S203:
S201, obtain all keywords of the same SMS message and set a status flag for that message, the status flag being used to mark whether the message is spam;
Specifically, each SMS message that has undergone word segmentation and relevance judgment is given a corresponding status flag. The value recorded in the flag is a Boolean: True indicates that the message is an invalid message, that is, spam; False indicates that the message is a valid message.
S202, perform semantic analysis on the obtained keywords using an NLP-based semantic analysis tool, and write a Boolean value into the status flag of the SMS message according to the analysis result to identify whether the message is spam: Boolean True marks the message as spam, and Boolean False marks it as a valid message;
S203, call the blocking or deletion function of the phone's SMS function to clean up the SMS messages whose status flag is Boolean True.
Specifically, when the NLP-based semantic analysis tool analyzes the keywords, relevance can again serve as the basis of judgment. For example, the semantic analysis tool calculates the similarity between each single keyword and a class of reference words; once the similarity has been calculated for all keywords in the same message, the similarity of the text as a whole indicates whether the message leans toward spam. Suppose a message is divided into n word segments in total, and the text similarities between the word segments and the invalid reference words are w1, w2, …, wn. The overall similarity of the message is then the joint similarity p(W) = p(w1, w2, …, wn), which the chain rule of the Bayesian formula expands into a product of conditional terms, each word segment being conditioned on all the segments before it.
Simplifying this chain-rule expansion with an n-gram model, which conditions each term only on a fixed number of immediately preceding word segments, gives the similarity formula for the message.
The similarity of the entire message text is calculated according to this formula; when the similarity exceeds a given threshold, the message is considered spam.
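For reference, the chain-rule expansion and its n-gram simplification described above can be written out as follows. The first line is the standard chain rule. The second line assumes a bigram (order-2) model, which is only one possible choice because the text does not state the n-gram order; the threshold symbol \theta is likewise introduced here for illustration.

```latex
\begin{align}
p(W) &= p(w_1, w_2, \ldots, w_n)
      = \prod_{i=1}^{n} p(w_i \mid w_1, \ldots, w_{i-1}) \\
p(W) &\approx p(w_1) \prod_{i=2}^{n} p(w_i \mid w_{i-1})
      \qquad \text{(bigram assumption)} \\
\text{message is spam} &\iff p(W) > \theta
\end{align}
```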
After the calculation and judgment, the status flag of a message judged to be spam is assigned Boolean True; conversely, the status flag of a valid message is assigned False. The phone's SMS blocking or deletion function is then called to process the messages flagged True.
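The flag assignment and cleanup described above might look like the sketch below. The `Sms` class, the 0.3 threshold, and the toy word-ratio score are assumptions made for illustration; in particular, `toy_similarity` is a stand-in for the similarity calculation and `clean_up` is a stand-in for the phone's blocking or deletion call.

```python
from dataclasses import dataclass


@dataclass
class Sms:
    sender: str
    text: str
    is_spam: bool = False  # status flag: True = spam, False = valid


def mark_messages(messages, similarity, threshold=0.3):
    """Write the Boolean status flag of each message from its similarity score."""
    for message in messages:
        message.is_spam = similarity(message.text) > threshold
    return messages


def clean_up(messages):
    """Stand-in for blocking/deletion: keep only messages flagged False."""
    return [message for message in messages if not message.is_spam]


def toy_similarity(text):
    """Hypothetical score: share of words found in a small spam word set."""
    spam_words = {"prize", "loan", "winner"}
    words = text.lower().split()
    return sum(word in spam_words for word in words) / len(words) if words else 0.0


inbox = [
    Sms("10690000", "winner claim your prize loan now"),
    Sms("Alice", "see you at lunch tomorrow"),
]
kept = clean_up(mark_messages(inbox, toy_similarity))
print([message.sender for message in kept])
```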
In this embodiment, the bias of each word segment of the message is calculated to obtain the degree to which the segment leans toward a class of reference words. The results for all word segments are then combined using the similarity formula to determine whether the entire message text falls within the scope of spam, and the message is handled accordingly when it is judged to be spam. This approach improves the recognition of spam messages.
In some embodiments, the present invention provides a flow chart of processing spam messages in the NLP-based SMS recognition method. As shown in Fig. 5, it includes steps S20301 to S20303:
S20301, obtain the sender of an SMS message whose status flag is Boolean True;
S20302, call the phone's address book to check whether the sender is in the address book; if the sender is in the address book, delete or block the spam message from that sender.
Specifically, in some scenarios the sender of a message may unknowingly act as a relay for spam. For example, a vulnerability in an APP may be exploited to send the APP's promotional information to all contacts in the phone holder's address book without informing the holder. Blocking the sender solely on the basis of spam content would not solve the problem once and for all; instead, mistakenly deleting or blocking such a sender could damage the user's normal social network. For this reason, this application reads the status flag of any message A. When the status flag is True, the sender's number is extracted and looked up in the phone's address book. On a hit, the number is considered a valid number approved by the user: message A is deleted or blocked so that content judged to be spam does not disturb the user, but the sender is retained and left unprocessed, which prevents a valid number from being deleted by mistake.
S20303, when the sender is not in the address book, count the SMS messages from that sender whose status flag is True and compare the count with a preset count threshold; when the count is not less than the count threshold, add the sender to the blacklist of the SMS function.
Specifically, if traversing the address book shows that the sender is not in it, and the message content sent by that sender has been judged to be spam, the sender can be suspected of SMS promotion or harassment. The sender can then be added to the SMS blacklist and blocked, which prevents such spam from being received continuously. In addition, in everyday use the new phone number of a normal contact might be exploited as a springboard for spam the first time it makes contact. To avoid blocking such a number, the method can first count how many messages with status flag True the same unknown number has sent, consecutively or cumulatively, and compare this count with the preset value. When the count exceeds the preset count threshold, the number is considered an invalid number and is then blacklisted and blocked.
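The sender handling in steps S20301 to S20303 can be sketched as below. The function name, the returned action labels, the sample numbers, and the default threshold of 3 are hypothetical; the comparison uses "not less than", following the wording of step S20303.

```python
from collections import Counter


def handle_flagged_sender(sender, contacts, spam_counts, blacklist, count_threshold=3):
    """Decide what to do with the sender of a message whose status flag is True."""
    if sender in contacts:
        # Known contact: only the message is removed, the number is kept.
        return "message_removed_sender_kept"
    spam_counts[sender] += 1  # cumulative count of flagged messages per unknown number
    if spam_counts[sender] >= count_threshold:
        blacklist.add(sender)
        return "sender_blacklisted"
    return "message_removed_sender_counted"


contacts = {"+8613800000001"}
spam_counts = Counter()
blacklist = set()
flagged_senders = ["+8613800000001", "10690000", "10690000", "10690000"]
actions = [
    handle_flagged_sender(sender, contacts, spam_counts, blacklist)
    for sender in flagged_senders
]
print(actions, blacklist)
```

Keeping the counter separate from the blacklist lets an unknown number send a few flagged messages before it is blocked, which is the false-positive safeguard the paragraph describes.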
In this embodiment, the address book check guards against mistakenly deleting numbers during blocking, and the count threshold enables cumulative counting and judgment of the number of spam messages received, which reduces the probability of false positives and improves recognition accuracy.
In some embodiments, the present invention provides a flow chart of processing valid messages in the NLP-based SMS recognition method. As shown in Fig. 6, it includes steps S301 to S303:
S301, obtain any valid message whose status flag is Boolean False;
S302, establish an association between the valid message and a text-to-speech engine;
S303, call the text-to-speech engine to convert the content of the valid message into speech and play it.
Specifically, the status flag of any message is read. When the status flag is Boolean False, the message is determined to be a valid message, and its text can be automatically converted to speech and played. To do this, the text-to-speech engine can be called directly to convert the text to speech and play it: the text content of the message is extracted and stored temporarily, the storage address is sent to the text-to-speech engine, and a play instruction is sent at the same time. The text-to-speech engine may be, for example, the Microsoft TTS speech engine or the iFlytek TTS speech engine.
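The hand-off to the text-to-speech engine (store the text temporarily, pass its storage address, then play) can be sketched as follows. `engine_play` is a hypothetical callable that stands in for a real TTS engine; no actual engine API is assumed, and the fake engine below only records what it would have spoken.

```python
import os
import tempfile


def play_if_valid(text, is_spam, engine_play):
    """Play a message through the engine only when its status flag is False."""
    if is_spam:
        return False
    # Store the text temporarily and hand the storage path to the engine.
    fd, path = tempfile.mkstemp(suffix=".txt", text=True)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        engine_play(path)  # hypothetical engine entry point
    finally:
        os.remove(path)  # the temporary copy is discarded after playback
    return True


spoken = []


def fake_engine_play(path):
    """Test double: reads the stored text instead of synthesizing speech."""
    with open(path, encoding="utf-8") as handle:
        spoken.append(handle.read())


play_if_valid("Your parcel has arrived", False, fake_engine_play)
play_if_valid("Win a prize now", True, fake_engine_play)
print(spoken)
```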
In this embodiment, an automatic playback function is provided for valid messages, so that after spam screening the content of messages judged to be valid is played by voice in real time. This reduces the time the user spends operating the SMS function directly, lowers the barrier to use, and improves the user experience when manual operation is inconvenient.
In some embodiments, the present invention provides another flow chart of processing valid messages in the NLP-based SMS recognition method. As shown in Fig. 7, it includes steps S30201 to S30202:
S30201, set a corresponding trigger switch for each valid message whose status flag is False.
Specifically, in the SMS interface, a conspicuous play button is placed in the area where each valid message is displayed; the play button is the trigger switch that triggers voice playback of the valid message.
S30202, after the trigger signal of the trigger switch is received, call the text-to-speech engine to play the valid message.
Specifically, after the user's click on the play button is received, the text-to-speech engine is called to convert the text content of the valid message into the corresponding speech signal and play it in real time.
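The trigger switch in steps S30201 and S30202 amounts to binding a playback callback to each valid message. A minimal sketch, assuming a hypothetical `PlayButton` class and a `speak` callable in place of a real user interface and TTS engine:

```python
class PlayButton:
    """Trigger switch bound to one valid message."""

    def __init__(self, text, speak):
        self._text = text
        self._speak = speak

    def click(self):
        # The trigger signal: hand the message text to the speech callback.
        self._speak(self._text)


def attach_play_buttons(messages, speak):
    """messages is a list of (text, is_spam); only valid messages get a button."""
    return {
        index: PlayButton(text, speak)
        for index, (text, is_spam) in enumerate(messages)
        if not is_spam
    }


played = []
buttons = attach_play_buttons(
    [("Lunch at noon?", False), ("Cheap loan offer", True)],
    played.append,  # stand-in for the text-to-speech call
)
buttons[0].click()
print(played)
```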
In this embodiment, a voice play button is provided for each valid message in the SMS interface, giving the user manual control over voice playback. In situations where text is hard to read, for example when the user is visually impaired or the lighting is weak, the message can be received through voice playback, which effectively improves the practicality of the traditional SMS function.
In some embodiments, the present invention provides a functional block diagram of an NLP-based SMS recognition device. As shown in Fig. 8, it includes a short message judgment module 11, a short message marking module 12, and a short message playing module 13, wherein:
The short message judgment module 11 is configured to obtain the content of any SMS message on the phone, perform word segmentation using natural language processing (NLP) technology, then perform relevance judgment, and determine the keywords of the message content from the judgment result;
The short message marking module 12 is configured to judge, according to the keywords, whether the message is spam, to mark the recognized SMS message, and to block or delete the message when it is recognized as spam;
The short message playing module 13 is configured to call the automatic voice playback function to play by voice the SMS messages recognized as non-spam.
In the above embodiment, the SMS recognition device uses the short message judgment module 11 to evaluate SMS messages and determine which message content is spam and which is normal. The short message marking module 12 then marks the spam messages and processes them accordingly, either blocking or deleting them. After the spam has been handled, the short message playing module 13 adds voice playback for normal messages: as required, a normal message can either be played automatically by voice once it has been judged normal, or be played by voice in response to a trigger signal input by the user.
In some embodiments, the present invention proposes a computer device including a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, implement the steps of the above NLP-based SMS recognition method.
In some embodiments, the present invention proposes a computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, implement the steps of the above NLP-based SMS recognition method; the storage medium may be a non-volatile storage medium.
Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
The technical features of the embodiments described above can be combined arbitrarily. For brevity, not every possible combination of the technical features in the above embodiments has been described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.
The embodiments described above express only some exemplary embodiments of this application. Their description is relatively specific and detailed, but it should not be understood as limiting the patent scope of this application. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. The protection scope of this patent application shall therefore be subject to the appended claims.
Claims (10)
1. An NLP-based SMS recognition method, characterized by comprising:
obtaining the content of any SMS message on the phone, performing word segmentation using natural language processing (NLP) technology, then performing relevance judgment, and determining the keywords of the message content from the judgment result;
judging, according to the keywords, whether the message is spam, marking the recognized SMS message, and blocking or deleting the message when it is recognized as spam;
calling an automatic voice playback function to play by voice the SMS messages recognized as non-spam.
2. The NLP-based SMS recognition method according to claim 1, characterized in that obtaining the content of any SMS message on the phone, performing word segmentation using natural language processing (NLP) technology, then performing relevance judgment, and determining the keywords of the message content from the judgment result comprises:
obtaining, after user authorization, permission to take over the phone's SMS messages, retrieving the SMS list, and reading the content of each SMS message in the order of the SMS list;
preprocessing the message content;
performing word segmentation on the preprocessed message content using an NLP-based word segmentation tool, and determining the keywords of each SMS message after performing relevance judgment on the processing result.
3. The NLP-based SMS recognition method according to claim 2, characterized in that performing word segmentation on the preprocessed message content using an NLP-based word segmentation tool, and determining the keywords of each SMS message after performing relevance judgment on the processing result comprises:
calling a preset preprocessing rule to preprocess the message content;
performing word segmentation on the preprocessed message content using the NLP-based word segmentation tool, so that the message content is divided into several word segments;
calling a preset word-sense relevance judgment rule table, screening each word segment according to the judgment rule for low-relevance word senses in the table, and recording a word segment when it does not belong to the low-relevance words defined by the judgment rule in the table;
collecting the recorded word segments to obtain the keywords of the SMS message.
4. The NLP-based SMS recognition method according to claim 2, characterized in that judging, according to the keywords, whether the message is spam, marking the recognized SMS message, and blocking or deleting the message when it is recognized as spam comprises:
obtaining all keywords of the same SMS message and setting a status flag for the message, the status flag being used to mark whether the message is spam;
performing semantic analysis on the obtained keywords using an NLP-based semantic analysis tool, and writing a Boolean value into the status flag of the SMS message according to the analysis result to identify whether the message is spam, Boolean True marking the message as spam and Boolean False marking it as a valid message;
calling the blocking or deletion function of the phone's SMS function to clean up the SMS messages whose status flag is Boolean True.
5. The NLP-based SMS recognition method according to claim 4, characterized in that, after calling the blocking or deletion function of the phone's SMS function to clean up the SMS messages whose status flag is Boolean True, the method comprises:
obtaining the sender of an SMS message whose status flag is Boolean True;
calling the phone's address book to check whether the sender is in the address book, and, if the sender is in the address book, deleting or blocking the spam message from that sender;
when the sender is not in the address book, counting the SMS messages from that sender whose status flag is True, comparing the count with a preset count threshold, and, when the count is not less than the count threshold, adding the sender to the blacklist of the SMS function.
6. The NLP-based SMS recognition method according to claim 4, characterized in that calling an automatic voice playback function to play by voice the SMS messages recognized as non-spam comprises:
obtaining any valid message whose status flag is Boolean False;
establishing an association between the valid message and a text-to-speech engine;
calling the text-to-speech engine to convert the content of the valid message into speech and play it.
7. The NLP-based SMS recognition method according to claim 6, characterized in that, after establishing the association between the valid message and the text-to-speech engine, the method comprises:
setting a corresponding trigger switch for the valid message whose status flag is False;
after the trigger signal of the trigger switch is received, calling the text-to-speech engine to play the valid message.
8. An NLP-based SMS recognition device, characterized by comprising:
a short message judgment module, configured to obtain the content of any SMS message on the phone, perform word segmentation using natural language processing (NLP) technology, then perform relevance judgment, and determine the keywords of the message content from the judgment result;
a short message marking module, configured to judge, according to the keywords, whether the message is spam, to mark the recognized SMS message, and to block or delete the message when it is recognized as spam;
a short message playing module, configured to call an automatic voice playback function to play by voice the SMS messages recognized as non-spam.
9. A computer device, including a memory and a processor, the memory storing computer-readable instructions, characterized in that, when the computer-readable instructions are executed by the processor, the NLP-based SMS recognition method according to any one of claims 1 to 7 is implemented.
10. A computer-readable storage medium storing computer-readable instructions, characterized in that, when the computer-readable instructions are executed by one or more processors, the NLP-based SMS recognition method according to any one of claims 1 to 7 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910540582.5A CN110377699A (en) | 2019-06-21 | 2019-06-21 | SMS recognition methods and relevant device based on NLP |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910540582.5A CN110377699A (en) | 2019-06-21 | 2019-06-21 | SMS recognition methods and relevant device based on NLP |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110377699A true CN110377699A (en) | 2019-10-25 |
Family
ID=68250547
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910540582.5A Pending CN110377699A (en) | 2019-06-21 | 2019-06-21 | SMS recognition methods and relevant device based on NLP |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110377699A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111385423A (en) * | 2020-03-12 | 2020-07-07 | 北京小米移动软件有限公司 | Voice broadcasting method, voice broadcasting device and computer storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101778154A (en) * | 2009-12-28 | 2010-07-14 | 中兴通讯股份有限公司 | Method and device for shielding voice broadcasting of short messages |
CN104168548A (en) * | 2014-08-21 | 2014-11-26 | 北京奇虎科技有限公司 | Short message intercepting method and device and cloud server |
CN106681980A (en) * | 2015-11-05 | 2017-05-17 | 中国移动通信集团公司 | Method and device for analyzing junk short messages |
CN107943791A (en) * | 2017-11-24 | 2018-04-20 | 北京奇虎科技有限公司 | A kind of recognition methods of refuse messages, device and mobile terminal |
CN108664473A (en) * | 2018-05-11 | 2018-10-16 | 平安科技(深圳)有限公司 | Recognition methods, electronic device and the readable storage medium storing program for executing of text key message |
CN109525951A (en) * | 2018-12-03 | 2019-03-26 | 中国联合网络通信集团有限公司 | Junk short message processing method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||