CN110377699A - SMS recognition methods and relevant device based on NLP - Google Patents

Publication number
CN110377699A
Authority
CN
China
Prior art keywords
short message
sms
nlp
word
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910540582.5A
Other languages
Chinese (zh)
Inventor
张仁娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN201910540582.5A
Publication of CN110377699A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/12 Messaging; Mailboxes; Announcements
    • H04W4/14 Short messaging services, e.g. short message services [SMS] or unstructured supplementary service data [USSD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Mining & Analysis (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)

Abstract

The present invention relates to the field of semantic analysis technology, and in particular to an NLP-based SMS recognition method and related device. The method includes: obtaining the content of any one of the phone's SMS messages, segmenting it into words using natural language processing (NLP) technology, performing a relevance judgment on the result, and determining the keywords of that message content from the judgment result; judging from the keywords whether the message is spam, marking the recognized message, and blocking or deleting it when it is identified as spam; and calling an automatic voice playback function to play by voice the messages identified as non-spam. By extracting keywords from the message content and judging their relevance before deciding whether the content falls within the scope of spam, the invention achieves broad spam recognition coverage, a recognition method that is hard to bypass, and automatic playback of valid message content.

Description

SMS recognition methods and relevant device based on NLP
Technical field
The present invention relates to the field of semantic analysis technology, and in particular to an NLP-based SMS recognition method and related device.
Background art
SMS is a highly popular digital communication service in today's society. Short text messages are sent and received between at least two users through mobile phones or other telecommunication mobile terminals, enabling information exchange between users. SMS is simple and easy to operate, and is well liked by all kinds of users. However, with the spread and development of information technology, SMS traffic has become mixed with all kinds of advertising and fraud messages. These messages can be sent to users freely and arbitrarily without their consent, and in most scenarios users cannot reject them at will. Such messages are called spam messages. The flood of spam messages seriously affects people's normal lives and damages the public image of operators. To cope with this situation, mobile service providers have begun to phase in real-name authentication of phone numbers, and mobile phone manufacturers use technical means to let the phone's built-in SMS function identify spam messages.
However, current SMS functions identify spam in a rather simplistic way, mostly judging only by the message source. In addition, existing SMS functions are limited to plain text: to read a received message, the user has to open the SMS interface and read the text one message at a time. People today mostly live and work at a fast pace and have far less free time than before. Therefore, even after spam has been screened out, requiring the user to open the SMS interface and read normal messages as text is outdated and time-consuming, in some cases indirectly reduces the efficiency of life or work, and degrades the user experience of the SMS function.
In summary, the current SMS function has the following defects:
1) Spam recognition covers only a narrow range of cases;
2) It is easy to bypass. For example, illegal advertisers or telecom fraudsters only need to change phone numbers, or use hardware such as pseudo base stations, to get around this kind of spam screening mechanism and carry out illegal acts unchecked. Furthermore, if a number is added to the blacklist by mistake, normal phone use is easily affected;
3) Message content is displayed only as text, which reduces the ease of use of the SMS function.
Summary of the invention
The present invention provides an NLP-based SMS recognition method and related device. By judging message content through keywords in order to mark spam, it recognizes spam from the content itself, which broadens recognition coverage and makes the recognition hard to bypass; at the same time, automatic voice playback improves the user experience of the SMS function.
In a first aspect, the present invention provides an NLP-based SMS recognition method, comprising:
obtaining the content of any one of the phone's SMS messages, segmenting it into words using natural language processing (NLP) technology, then performing a relevance judgment, and determining the keywords of that message content from the judgment result;
judging from the keywords whether the message is a spam message, marking the recognized message, and blocking or deleting it when it is identified as spam;
calling an automatic voice playback function to play by voice the messages identified as non-spam.
In some possible embodiments, obtaining the content of any one of the phone's SMS messages, segmenting it into words using NLP technology, then performing a relevance judgment, and determining the keywords of that message content from the judgment result comprises:
obtaining takeover permission for the SMS function after the user grants authorization, retrieving the SMS list, and reading the content of each message in the order of the list;
preprocessing the message content;
segmenting the preprocessed message content with an NLP-based word segmentation tool, performing relevance analysis on the result, and determining the keywords corresponding to each message.
In some possible embodiments, segmenting the preprocessed message content with an NLP-based word segmentation tool, performing relevance analysis on the result, and determining the keywords corresponding to each message comprises:
calling preset preprocessing rules to preprocess the message content;
segmenting the preprocessed message content with the NLP-based word segmentation tool, so that the message content is divided into several word segments;
calling a preset word-sense relevance judgment rule table and screening each word segment according to the table's rules for low-relevance words, and recording a word segment when it does not belong to the low-relevance words defined by those rules;
collecting the recorded word segments to obtain the keywords corresponding to the message.
In some possible embodiments, judging from the keywords whether the message is a spam message, marking the recognized message, and blocking or deleting it when it is identified as spam comprises:
obtaining all keywords of the same message and setting a status mark for the message, the status mark indicating whether the message is spam;
performing semantic analysis on the obtained keywords with an NLP-based semantic analysis tool and, according to the analysis result, writing a Boolean value into the message's status mark to identify whether the message is spam, where the Boolean value True marks the message as spam and the Boolean value False marks it as a valid message;
calling the blocking or deletion function of the phone's SMS function to clean up the messages whose status mark is the Boolean value True.
In some possible embodiments, after calling the blocking or deletion function of the phone's SMS function to clean up the messages whose status mark is the Boolean value True, the method further comprises:
obtaining the sender of a message whose status mark is the Boolean value True;
calling the phone's address book to check whether the sender is in it and, if so, deleting or blocking that sender's spam messages;
when the sender is not in the address book, counting the messages from that sender whose status mark is True, comparing the count with a preset count threshold, and, when the count is not less than the threshold, adding the sender to the blacklist of the SMS function.
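The sender-handling steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `SenderPolicy` class, its field names, the return labels, and the default threshold of 3 are all assumptions, since the source only speaks of a "preset" count threshold.

```python
from dataclasses import dataclass, field


@dataclass
class SenderPolicy:
    """Hypothetical holder for the address book, blacklist, and threshold."""
    address_book: set
    count_threshold: int = 3          # assumed value; the source only says "preset"
    blacklist: set = field(default_factory=set)

    def handle_spam(self, sender, messages):
        """Decide what happens to `sender` after one of their messages is marked spam.

        `messages` is a list of (sender, is_spam) pairs for the whole inbox.
        """
        if sender in self.address_book:
            # Known contact: only the spam message itself is deleted or blocked.
            return "keep_sender"
        spam_count = sum(1 for s, is_spam in messages if s == sender and is_spam)
        if spam_count >= self.count_threshold:   # "not less than" the threshold
            self.blacklist.add(sender)
            return "blacklisted"
        return "below_threshold"
```

Keeping the address-book check ahead of the count means a contact is never blacklisted, no matter how many of their messages were marked True.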
In some possible embodiments, calling the automatic voice playback function to play by voice the messages identified as non-spam comprises:
obtaining any valid message whose status mark is the Boolean value False;
establishing an association between the valid message and a text-to-speech engine;
calling the text-to-speech engine to convert the content of the valid message to speech and play it.
In some possible embodiments, after establishing the association between the valid message and the text-to-speech engine, the method further comprises:
setting a corresponding trigger switch for each valid message whose status mark is False;
after receiving the trigger signal of the trigger switch, calling the text-to-speech engine to play the valid message.
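As a rough sketch of the trigger-switch idea, the association between a valid message and a text-to-speech engine can be modeled as a callable bound to the message. The `PlayableMessage` class and the `speak` parameter are hypothetical; a real phone would pass in its own engine's speak function, which the source does not specify.

```python
class PlayableMessage:
    """A message bound to a text-to-speech callable (hypothetical design)."""

    def __init__(self, text, is_spam, speak):
        self.text = text
        self.is_spam = is_spam       # status mark: True = spam, False = valid
        self._speak = speak          # assumed TTS hook supplied by the caller

    def on_trigger(self):
        """Called when the message's trigger switch fires; returns True if played."""
        if self.is_spam:
            return False             # only messages marked False are played
        self._speak(self.text)
        return True
```

Injecting the speak function keeps the sketch independent of any particular speech engine and makes the behavior easy to check with a stub.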
In a second aspect, in some possible embodiments, the present invention provides an NLP-based SMS recognition device, comprising a message judgment module, a message marking module, and a message playback module, wherein:
the message judgment module is configured to obtain the content of any one of the phone's SMS messages, segment it into words using natural language processing (NLP) technology, then perform a relevance judgment, and determine the keywords of that message content from the judgment result;
the message marking module is configured to judge from the keywords whether the message is spam, mark the recognized message, and block or delete it when it is identified as spam;
the message playback module is configured to call the automatic voice playback function to play by voice the messages identified as non-spam.
Based on the same inventive concept, in some possible embodiments the present invention provides a computer device including a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, implement the steps of the above NLP-based SMS recognition method.
Based on the same inventive concept, in some possible embodiments the present invention provides a computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, implement the steps of the above NLP-based SMS recognition method.
Beneficial effects: by extracting keywords from the message content and performing relevance judgment on them before deciding whether the content falls within the scope of spam, the present invention achieves broad spam recognition coverage, a recognition method that is hard to bypass, and automatic playback of valid message content. Specifically, it offers the following advantages:
1) high coverage of spam content recognition, achieved through semantic analysis of the message content;
2) improved ease of use and user experience of the SMS function, achieved through automatic voice playback.
Brief description of the drawings
Fig. 1 is the main flow chart of an NLP-based SMS recognition method according to an embodiment of the present invention;
Fig. 2 is a flow chart of extracting keywords from message content and then performing relevance judgment in an NLP-based SMS recognition method according to an embodiment of the present invention;
Fig. 3 is a flow chart of performing relevance judgment on keywords in an NLP-based SMS recognition method according to an embodiment of the present invention;
Fig. 4 is a flow chart of marking message content in an NLP-based SMS recognition method according to an embodiment of the present invention;
Fig. 5 is a flow chart of processing spam messages in an NLP-based SMS recognition method according to an embodiment of the present invention;
Fig. 6 is a flow chart of processing valid messages in an NLP-based SMS recognition method according to an embodiment of the present invention;
Fig. 7 is another flow chart of processing valid messages in an NLP-based SMS recognition method according to an embodiment of the present invention;
Fig. 8 is a functional block diagram of an NLP-based SMS recognition device according to an embodiment of the present invention.
Detailed description
The embodiments of the present invention provide an NLP-based SMS recognition method and related device for recognizing and marking SMS message content, blocking or otherwise handling spam according to the marking result, and automatically playing valid messages by voice, which prevents the recognition from being bypassed by technical means and also improves the user experience.
To help those skilled in the art better understand the solution of the present invention, the embodiments are described below with reference to the accompanying drawings.
The terms "first", "second", "third", "fourth" and the like (if present) in the description, the claims, and the above drawings are used to distinguish similar objects, and not to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "include" and "have" and any variations of them are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
In some embodiments, the present invention provides the main flow of an NLP-based SMS recognition method which, as shown in Fig. 1, includes steps S1 to S3:
S1. Obtain the content of any one of the phone's SMS messages, segment it into words using natural language processing (NLP) technology, then perform a relevance judgment, and determine the keywords of that message content from the judgment result.
Specifically, the phone's SMS function is first taken over, for example through user authorization, and the content of each message is obtained from the message list. The content of each message is then analyzed: for example, NLP technology divides the message content into passages or phrases, words with low relevance are rejected according to preset word-screening rules for spam messages, and the remaining phrases or passages are the keywords available for subsequent word-sense or semantic analysis. Here, NLP refers to natural language processing (Natural Language Processing), a general term for the techniques that study language problems in human-computer interaction; it is a branch of language information processing and also an application of artificial intelligence.
S2. Judge from the keywords whether the message is a spam message, mark the recognized message, and block or delete it when it is identified as spam.
Specifically, word-sense analysis is performed on the keywords to decide whether the message is spam or a valid message. The recognition can be carried out by calling a common semantic analysis engine, such as Datamate Text Parser Lite, Word Art, Tegxedo, niuparser, or other semantic analysis tools commonly available on the market. After the recognition result is marked, the spam messages are blocked or deleted according to the mark.
S3. Call the automatic voice playback function to play by voice the messages identified as non-spam.
Specifically, automatic voice playback can be implemented by associating the phone's built-in text-to-speech engine with the SMS function, after which playback is executed automatically; for example, Microsoft's TTS speech engine is called with the message content, so that the text is converted to speech and played.
The above embodiment extracts keywords from message content, performs semantic analysis and recognition on the keywords, and marks and processes messages according to the recognition results. Compared with traditional techniques, this fundamentally improves recognition accuracy and is not easily evaded by means such as changing numbers.
In some embodiments, the present invention provides a flow of extracting keywords from message content and then performing relevance judgment which, as shown in Fig. 2, includes steps S101 to S103:
S101. Obtain takeover permission for the SMS function after the user grants authorization, retrieve the SMS list, and read the content of each message in the order of the list.
Specifically, an authorization request is first sent to the user. After the user's authorization is received, the SMS function is taken over, the message list stored on the phone is called according to the granted read/write permission, and each message is read in turn, ordered by the time or serial number in the existing list, to obtain its content.
S102. Preprocess the message content.
Specifically, before word segmentation and semantic analysis are performed on the message content, the original content needs to be preprocessed to ensure segmentation accuracy and efficiency. Preprocessing covers format and punctuation handling, correcting wrong characters, correcting punctuation, deduplication, disambiguation, and so on. For example, when replacing abbreviations, "can't" is replaced with "can not"; when replacing symbols, the unit symbol kg is replaced with kilogram.
In addition, a stop-word list can be introduced during preprocessing: the content of each message is traversed and the common stop words in the list are filtered out. The stop-word list can be compiled manually in advance and records common stop words. Skipping these words during information retrieval improves retrieval efficiency and saves text storage space.
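Stop-word filtering of the kind described above can be sketched in a few lines. The word list here is purely illustrative, and the sketch assumes the text has already been split into tokens (whitespace splitting in the usage example), which the source does not prescribe.

```python
# Illustrative stop-word list; a real list would be compiled manually in advance.
STOP_WORDS = {"the", "a", "an", "of", "to", "is"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens with stop words removed, preserving their order."""
    return [t for t in tokens if t.lower() not in stop_words]
```

For example, `remove_stop_words("The price of the loan is low".split())` keeps only `price`, `loan`, and `low`.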
S103. Segment the preprocessed message content with an NLP-based word segmentation tool, perform relevance analysis on the result, and determine the keywords corresponding to each message.
Specifically, any mainstream word segmentation tool is called to segment the preprocessed message content, dividing the whole message into several word segments. Relevance analysis is then performed on these semantically independent word segments to decide whether each one qualifies as a keyword. The relevance analysis can follow preset relevance analysis rules, or a relevance analysis model can be trained in advance. In this application, relevance analysis means comparing the word sense of a word segment with preset reference words and judging whether the two have the same meaning. The preset reference words are feature words directly tied to the key characteristics of spam messages, for example words indicating suspected fraud or suspected advertising.
In this embodiment, relevance analysis is performed on the segmentation result of the message content to determine the keywords used for judging spam, which gives higher judgment accuracy than traditional means.
In some embodiments, the present invention provides a flow of performing relevance judgment on keywords which, as shown in Fig. 3, includes steps S10301 to S10304:
S10301. Call preset preprocessing rules to preprocess the message content.
Specifically, to preprocess message content, the preprocessing rules are prepared first, and a script or rule engine then executes the rule instances, taking the message content as input and producing the preprocessed message content as output. The preprocessing rules may include replacement rules for symbols, replacement rules for abbreviations, replacement rules for repeated words, and so on. When a symbol replacement rule is executed, the rule condition defines the symbols to search for, generally abbreviated symbols such as kg and km; on a hit, the rule body replaces kg or km with kilogram or kilometer. When an abbreviation replacement rule is executed, the rule condition holds a list of common abbreviations; when one of them is hit, it is replaced with the corresponding full term. For example, when the abbreviation NLP is found in the message content, the rule body replaces NLP with Natural Language Processing.
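The three rule types named above (symbol, abbreviation, and repeated-word replacement) can be sketched as a small script. The rule tables reuse the examples given in the text; the function names and the whole-word matching strategy are assumptions made for illustration, not the patent's rule engine.

```python
import re

# Rule conditions (keys) and rule bodies (values), taken from the examples in the text.
SYMBOL_RULES = {"kg": "kilogram", "km": "kilometer"}
ABBREVIATION_RULES = {"can't": "can not", "NLP": "Natural Language Processing"}


def apply_rules(text, rules):
    """Replace each whole-word occurrence of a rule key with its value."""
    for pattern, replacement in rules.items():
        # Lookarounds keep "kg" from matching inside longer words.
        text = re.sub(r"(?<!\w)" + re.escape(pattern) + r"(?!\w)", replacement, text)
    return text


def preprocess(text):
    """Apply symbol rules, then abbreviation rules, then collapse repeated words."""
    text = apply_rules(text, SYMBOL_RULES)
    text = apply_rules(text, ABBREVIATION_RULES)
    # Collapse immediately repeated words ("free free" -> "free").
    return re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text)
```

Matching on whole words is a deliberate choice here: a plain substring replacement would also rewrite the letters "km" or "kg" inside unrelated words.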
S10302. Segment the preprocessed message content with the NLP-based word segmentation tool, so that the message content is divided into several word segments;
S10303. Call a preset word-sense relevance judgment rule table and screen each word segment according to the table's rules for low-relevance words; when a word segment does not belong to the low-relevance words defined by those rules, record it;
S10304. Collect the recorded word segments to obtain the keywords corresponding to the message.
Specifically, after preprocessing, the wording and semantics of the message content are more concise and accurate than the original, so performing word segmentation at this point improves segmentation efficiency. For segmentation, a common NLP-based word segmentation tool is called and the results are collected. Any of Paoding, mmseg4j, IKAnalyzer, Imdict-chinese-analyzer, Ansj, Httpcws, jieba, and similar tools may be chosen. Segmentation yields multiple semantically independent word segments, which are then analyzed according to preset judgment rules to decide whether they count as keywords. In this application, keywords are the feature words obtained through this processing and analysis and used to judge whether the message content is spam. The word-sense relevance judgment rule table can be set up in advance and stores the reference words used for relevance analysis. These include valid reference words, used to judge whether a word segment belongs to valid message content, and non-valid reference words, used to judge whether a word segment belongs to non-valid message content. A relevance analysis rule may be defined as follows: under a preset condition, when the matching degree between a word segment and a reference word exceeds a threshold, the two are considered to have the same word sense. After the analysis is completed, the results are collected; these results are the keywords of the corresponding message.
In this embodiment, preprocessing improves segmentation efficiency, the NLP-based word segmentation tool divides the preprocessed message content into independent morphemes, and matching-degree analysis against preset reference words determines which word segments are the keywords that decide whether the message is valid or spam.
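A minimal sketch of the screening in S10303 and S10304 follows. The source does not say how the matching degree is computed, so this example substitutes the character-level ratio from Python's standard `difflib.SequenceMatcher` as a stand-in; the rule table contents and the 0.8 threshold are likewise assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical rule table of low-relevance words.
LOW_RELEVANCE_WORDS = {"hello", "please", "today"}


def is_low_relevance(segment, table=LOW_RELEVANCE_WORDS, threshold=0.8):
    """A segment is low-relevance if it closely matches any table entry."""
    return any(
        SequenceMatcher(None, segment.lower(), word).ratio() >= threshold
        for word in table
    )


def extract_keywords(segments):
    """Record every segment that the rule table does not mark as low-relevance."""
    return [s for s in segments if not is_low_relevance(s)]
```

A production system would more likely compare word senses with embeddings or a trained model, as the text allows; the string ratio only illustrates the threshold logic.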
In some embodiments, the present invention provides a flow of marking message content which, as shown in Fig. 4, includes steps S201 to S203:
S201. Obtain all keywords of the same message and set a status mark for the message, the status mark indicating whether the message is spam;
Specifically, a corresponding status mark is set for each message that has gone through word segmentation and relevance analysis. The value recorded in the mark is a Boolean: True indicates that the message is a non-valid message, that is, spam; False indicates that the message is a valid message.
S202. Perform semantic analysis on the obtained keywords with an NLP-based semantic analysis tool and, according to the analysis result, write a Boolean value into the message's status mark to identify whether the message is spam, where the Boolean value True marks the message as spam and the Boolean value False marks it as a valid message;
S203. Call the blocking or deletion function of the phone's SMS function to clean up the messages whose status mark is the Boolean value True.
Specifically, can still be carried out using the degree of correlation when carrying out key word analysis by the semantic analysis tool based on NLP Judge, for example calculate the similarity of single keyword Yu certain class reference word by semantic analysis tool, to the institute in same short message After the completion of having the similarity calculation of keyword, the similarity of comprehensive entirety text fragment is whether the short message is refuse messages It is biased to.For example, certain short message is total to be divided into n word segment, the text of corresponding each word segment and non-effective reference word Similarity is w1、w2、…、wn, then the overall similarity of the short message, that is, combine similarity p (W)=p (w1、w2、…、wn), according to The chain algorithm of Bayesian formula obtains p (W) are as follows:
The chain-rule expression is then simplified with an n-gram model, giving the following similarity formula for the message:
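As a sketch of this simplification, assuming a first-order (bigram) model in which each word segment depends only on the segment immediately before it, the chain-rule product reduces to the form below. The order of the n-gram model is an assumption made here for illustration; the text names only the model family.

$$p(W) \approx p(w_1)\prod_{i=2}^{n} p(w_i \mid w_{i-1})$$

For a higher-order model, each factor would condition on correspondingly more of the preceding segments.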
The similarity of the entire message text is computed with the above formula; when it exceeds a certain threshold, the message is considered spam.
After the computation and judgment, the status flag of a message that is spam is assigned the Boolean value True; conversely, the status flag of a valid message is assigned False. The phone's SMS blocking or deletion function is then called to handle the messages flagged True.
In this embodiment, the bias of each word segment of a message is computed to obtain the degree to which that segment leans toward a class of reference words. The results for all word segments are then combined through the similarity formula to judge whether the entire message text is spam, and a message judged to be spam is handled accordingly. This approach improves the recognition of spam messages.
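Steps S201 to S203 can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: it assumes a bigram form of the similarity formula, and the names `Sms`, `joint_similarity`, `mark_messages`, `clean_up`, the `FLOOR` fallback, and the probability tables are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FLOOR = 0.01  # hypothetical fallback for words or pairs missing from the tables


@dataclass
class Sms:
    sender: str
    keywords: List[str]
    is_spam: bool = False  # status flag: True = spam, False = valid


def joint_similarity(keywords: List[str],
                     first: Dict[str, float],
                     cond: Dict[Tuple[str, str], float]) -> float:
    """Bigram product p(w1) * p(w2|w1) * ... * p(wn|wn-1) (assumed form)."""
    if not keywords:
        return 0.0
    score = first.get(keywords[0], FLOOR)
    for prev, cur in zip(keywords, keywords[1:]):
        score *= cond.get((prev, cur), FLOOR)
    return score


def mark_messages(messages: List[Sms], first, cond, threshold: float) -> None:
    """S202: write True into the flag when the similarity exceeds the threshold."""
    for msg in messages:
        msg.is_spam = joint_similarity(msg.keywords, first, cond) > threshold


def clean_up(messages: List[Sms]) -> List[Sms]:
    """S203: drop every message flagged True and keep the valid ones."""
    return [msg for msg in messages if not msg.is_spam]
```

Because a product of probabilities shrinks as the message gets longer, a practical variant might normalize the score by length before comparing it with the threshold; that refinement is not part of the text.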
In some embodiments, the present invention provides a flow for handling spam messages in the NLP-based SMS recognition method. As shown in FIG. 5, it includes steps S20301 to S20303:
S20301: obtain the sender of an SMS message whose status flag is True.
S20302: call the phone's address book to check whether the sender is in it; if so, delete or block the spam message from that sender.
Specifically, in some scenarios the sender of a message may unknowingly act as a relay for spam. For example, an app vulnerability may be exploited to send the app's promotional information to every contact in the phone holder's address book without informing the holder. Blocking the sender purely on the basis of the spam content would not settle the problem once and for all; on the contrary, mistakenly deleting or blocking such a sender could harm the user's normal social relationships. Therefore, in this application the status flag of any message A is read; when the flag is True, the sender's number is extracted and looked up in the phone's address book. A hit means the number is considered a valid number recognized by the user. Message A is then deleted or blocked so that content judged to be spam does not disturb the user, while the sender is retained and left untouched, which prevents valid numbers from being deleted by mistake.
S20303: when the sender is not in the address book, count the SMS messages from that sender whose status flag is True and compare the count with a preset count threshold; when the count is not less than the threshold, append the sender to the blacklist of the SMS feature.
Specifically, if traversing the address book shows that the sender is not in it and the content sent by that sender has been judged to be spam, the sender can be suspected of SMS promotion or harassment. The sender can then be added to the SMS blacklist and blocked so that such spam is no longer received. In addition, in everyday use the new phone number of a normal contact might be exploited as a relay for spam the first time it is used for contact. To avoid misjudging such a number, the number of messages flagged True that the same unknown number has sent, consecutively or cumulatively, can first be counted and then compared with the preset value. When the count reaches the preset count threshold, the number is considered invalid and is then blacklisted and blocked.
In this embodiment, the address book check guards against numbers being deleted by mistake when blocking, and the count threshold allows the number of spam messages received to be accumulated and judged. Together these reduce the probability of false positives and improve recognition accuracy.
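The sender handling in steps S20301 to S20303 might look like the sketch below. The class name `SpamSenderPolicy`, its attributes, and the sample numbers are hypothetical and chosen only for illustration; the "not less than" comparison follows step S20303.

```python
from collections import Counter
from typing import Iterable


class SpamSenderPolicy:
    """Decides what happens to the sender of a message flagged True."""

    def __init__(self, contacts: Iterable[str], count_threshold: int):
        self.contacts = set(contacts)           # numbers in the address book
        self.count_threshold = count_threshold  # preset count threshold
        self.spam_counts = Counter()            # spam messages per unknown sender
        self.blacklist = set()

    def on_spam(self, sender: str) -> bool:
        """Handle one spam message; return True if the sender is now blacklisted."""
        if sender in self.contacts:
            # S20302: only the message is deleted or blocked; the contact is kept.
            return False
        # S20303: accumulate spam from unknown numbers before blacklisting.
        self.spam_counts[sender] += 1
        if self.spam_counts[sender] >= self.count_threshold:
            self.blacklist.add(sender)
        return sender in self.blacklist
```

With a threshold of 3, an unknown number is blacklisted on its third flagged message, while a number in the address book is never blacklisted regardless of how many flagged messages it sends.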
In some embodiments, the present invention provides a flow for handling valid messages in the NLP-based SMS recognition method. As shown in FIG. 6, it includes steps S301 to S303:
S301: obtain any valid message whose status flag is False.
S302: establish an association between the valid message and a text-to-speech engine.
S303: call the text-to-speech engine to convert the content of the valid message into speech and play it.
Specifically, the status flag of any message is read; when the flag is False, the message is determined to be valid, and its text can automatically be converted to speech and played. To this end, the text-to-speech engine can be called directly to convert the text into speech and play it: the text content of the message is extracted and stored temporarily, the storage address is sent to the text-to-speech engine, and a play instruction is sent at the same time. The text-to-speech engine may be, for example, the Microsoft TTS speech engine or the iFLYTEK TTS speech engine.
In this embodiment, an automatic playback function is provided for valid messages, so that after spam screening the content of messages judged to be valid is played as speech in real time. This reduces the time the user spends operating the SMS feature directly, lowers the barrier to use, and improves the user experience when manual operation is inconvenient.
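The hand-off described above (store the text temporarily, pass the storage address to the engine, issue the play instruction) can be sketched as follows. No real TTS library is called: `TtsEngine` is a hypothetical interface, and `RecordingEngine` is a stand-in that only records the text it is asked to speak.

```python
import os
import tempfile
from typing import Iterable, List, Protocol, Tuple


class TtsEngine(Protocol):
    """Hypothetical engine interface: speak the text stored at text_path."""

    def play(self, text_path: str) -> None: ...


class RecordingEngine:
    """Stand-in engine that records what it was asked to speak."""

    def __init__(self) -> None:
        self.spoken: List[str] = []

    def play(self, text_path: str) -> None:
        with open(text_path, encoding="utf-8") as handle:
            self.spoken.append(handle.read())


def play_valid_messages(messages: Iterable[Tuple[str, bool]], engine: TtsEngine) -> None:
    """messages holds (text, is_spam) pairs; only flag False is played."""
    for text, is_spam in messages:
        if is_spam:
            continue
        # Temporarily store the text, then hand its address to the engine.
        fd, path = tempfile.mkstemp(suffix=".txt")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                handle.write(text)
            engine.play(path)
        finally:
            os.remove(path)
```

A real deployment would replace `RecordingEngine` with an adapter around whichever speech engine is actually used.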
In some embodiments, the present invention provides another flow for handling valid messages in the NLP-based SMS recognition method. As shown in FIG. 7, it includes steps S30201 to S30202:
S30201: set a corresponding trigger switch for each valid message whose status flag is False.
Specifically, in the SMS interface, a prominent play button is placed in the area where each valid message is displayed; the play button is the trigger switch that triggers voice playback of that message.
S30202: after the trigger signal of the trigger switch is received, call the text-to-speech engine to play the valid message.
Specifically, after the user's click on the play button is received, the text-to-speech engine is called to convert the text content of the valid message into the corresponding speech signal and play it in real time.
In this embodiment, a voice play button is provided for each valid message in the SMS feature, giving the user manual control over voice playback. In situations where text is hard to read, for example for visually impaired users or in low ambient light, the message can be obtained through voice playback, which makes the traditional SMS feature more practical.
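The trigger switch of steps S30201 and S30202 can be modeled as a callback registry, sketched below. `PlayButtons`, `attach`, and `on_click` are hypothetical names, and `speak` stands in for whatever call the text-to-speech engine exposes.

```python
from typing import Callable, Dict


class PlayButtons:
    """Keeps one play button (trigger switch) per valid message."""

    def __init__(self, speak: Callable[[str], None]):
        self._speak = speak               # stand-in for the TTS engine call
        self._texts: Dict[int, str] = {}  # message id -> text behind its button

    def attach(self, message_id: int, text: str, is_spam: bool) -> bool:
        """S30201: add a button only when the status flag is False."""
        if is_spam:
            return False
        self._texts[message_id] = text
        return True

    def on_click(self, message_id: int) -> bool:
        """S30202: on a trigger signal, play the message if it has a button."""
        text = self._texts.get(message_id)
        if text is None:
            return False
        self._speak(text)
        return True
```

Keeping the engine behind a plain callable lets the same registry drive either automatic or user-triggered playback.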
In some embodiments, the present invention provides a functional block diagram of an NLP-based SMS recognition apparatus. As shown in FIG. 8, the apparatus includes a message judgment module 11, a message marking module 12, and a message playing module 13, wherein:
The message judgment module 11 is configured to obtain the content of any SMS message, perform word segmentation on it using natural language processing (NLP), then perform a relevance judgment, and determine the keywords of the message content from the judgment result;
The message marking module 12 is configured to judge, according to the keywords, whether the message is spam, to mark the recognized message, and to block or delete the message when it is recognized as spam;
The message playing module 13 is configured to call an automatic voice playback function to play, as speech, the SMS messages recognized as non-spam.
In the above embodiment, the SMS recognition apparatus uses the message judgment module 11 to process SMS messages and determine which message content is spam and which is normal. The message marking module 12 then marks the spam messages and handles them accordingly, either blocking or deleting them. After the spam has been handled, the message playing module 13 adds voice playback for the normal messages: when needed, a message judged to be normal is either played automatically or played as speech in response to a trigger signal input by the user.
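The division of work among the three modules can be illustrated with the toy pipeline below. Everything in it is an assumption made for illustration: whitespace splitting stands in for an NLP word segmentation tool, a keyword lookup stands in for the semantic analysis, and the word lists and class names are hypothetical.

```python
from typing import List

LOW_RELEVANCE = {"the", "a", "to", "you", "is", "at", "are"}  # hypothetical rule table
SPAM_WORDS = {"prize", "winner", "loan"}                      # hypothetical reference words


class JudgmentModule:
    """Module 11: segment the text and keep words that are not low-relevance."""

    def keywords(self, text: str) -> List[str]:
        return [w for w in text.lower().split() if w not in LOW_RELEVANCE]


class MarkingModule:
    """Module 12: decide from the keywords whether the message is spam."""

    def is_spam(self, keywords: List[str]) -> bool:
        return any(w in SPAM_WORDS for w in keywords)


class PlayingModule:
    """Module 13: stand-in for voice playback; records what would be played."""

    def __init__(self) -> None:
        self.played: List[str] = []

    def play(self, text: str) -> None:
        self.played.append(text)


class SmsRecognizer:
    def __init__(self) -> None:
        self.judge = JudgmentModule()
        self.marker = MarkingModule()
        self.player = PlayingModule()

    def process(self, inbox: List[str]) -> List[str]:
        """Return the messages kept after spam is removed; play each kept one."""
        kept = []
        for text in inbox:
            if self.marker.is_spam(self.judge.keywords(text)):
                continue  # spam: blocked or deleted
            kept.append(text)
            self.player.play(text)
        return kept
```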
In some embodiments, the present invention provides a computer device including a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, implement the steps of the above NLP-based SMS recognition method.
In some embodiments, the present invention provides a computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, implement the steps of the above NLP-based SMS recognition method. The storage medium may be a non-volatile storage medium.
Those of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only some exemplary embodiments of the application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the application, and all of these fall within its scope of protection. The scope of protection of this patent is therefore defined by the appended claims.

Claims (10)

1. An NLP-based SMS recognition method, characterized by comprising:
obtaining the content of any SMS message, performing word segmentation on it using natural language processing (NLP), then performing a relevance judgment, and determining the keywords of the message content from the judgment result;
judging, according to the keywords, whether the message is spam, marking the recognized message, and blocking or deleting the message when it is recognized as spam;
calling an automatic voice playback function to play, as speech, the SMS messages recognized as non-spam.
2. The NLP-based SMS recognition method according to claim 1, characterized in that obtaining the content of any SMS message, performing word segmentation on it using natural language processing (NLP), then performing a relevance judgment, and determining the keywords of the message content from the judgment result comprises:
obtaining, after authorization by the user, permission to take over the SMS feature, retrieving the SMS list, and reading the content of each SMS message in the order of the list;
preprocessing the message content;
performing word segmentation on the preprocessed message content with an NLP-based word segmentation tool, performing relevance processing on the result, and then determining the keywords of each SMS message.
3. The NLP-based SMS recognition method according to claim 2, characterized in that performing word segmentation on the preprocessed message content with an NLP-based word segmentation tool, performing relevance processing on the result, and then determining the keywords of each SMS message comprises:
calling a preset preprocessing rule to preprocess the message content;
performing word segmentation on the preprocessed message content with the NLP-based word segmentation tool, so that the message content is divided into several word segments;
calling a preset word-sense relevance judgment rule table and screening each word segment according to the table's judgment rules for low-relevance word senses; when a word segment does not belong to the low-relevance words defined by the judgment rules in the table, recording that word segment;
collecting the recorded word segments to obtain the keywords of the SMS message.
4. The NLP-based SMS recognition method according to claim 2, characterized in that judging, according to the keywords, whether the message is spam, marking the recognized message, and blocking or deleting the message when it is recognized as spam comprises:
obtaining all keywords of the same SMS message and setting a status flag for that message, the status flag marking whether the message is spam;
performing semantic analysis on the obtained keywords with an NLP-based semantic analysis tool and, according to the analysis result, writing a Boolean value into the message's status flag to identify whether the message is spam, True marking the message as spam and False marking it as valid;
calling the blocking or deletion function of the phone's SMS feature to clean up the messages whose status flag is True.
5. The NLP-based SMS recognition method according to claim 4, characterized in that, after calling the blocking or deletion function of the phone's SMS feature to clean up the messages whose status flag is True, the method comprises:
obtaining the sender of an SMS message whose status flag is True;
calling the phone's address book to check whether the sender is in it and, if so, deleting or blocking the spam message from that sender;
when the sender is not in the address book, counting the SMS messages from that sender whose status flag is True, comparing the count with a preset count threshold, and, when the count is not less than the threshold, appending the sender to the blacklist of the SMS feature.
6. The NLP-based SMS recognition method according to claim 4, characterized in that calling an automatic voice playback function to play, as speech, the SMS messages recognized as non-spam comprises:
obtaining any valid message whose status flag is False;
establishing an association between the valid message and a text-to-speech engine;
calling the text-to-speech engine to convert the content of the valid message into speech and play it.
7. The NLP-based SMS recognition method according to claim 6, characterized in that, after establishing the association between the valid message and the text-to-speech engine, the method comprises:
setting a corresponding trigger switch for each valid message whose status flag is False;
after the trigger signal of the trigger switch is received, calling the text-to-speech engine to play the valid message.
8. An NLP-based SMS recognition apparatus, characterized by comprising:
a message judgment module, configured to obtain the content of any SMS message, perform word segmentation on it using natural language processing (NLP), then perform a relevance judgment, and determine the keywords of the message content from the judgment result;
a message marking module, configured to judge, according to the keywords, whether the message is spam, to mark the recognized message, and to block or delete the message when it is recognized as spam;
a message playing module, configured to call an automatic voice playback function to play, as speech, the SMS messages recognized as non-spam.
9. A computer device comprising a memory and a processor, the memory storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by the processor, implement the NLP-based SMS recognition method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by one or more processors, implement the NLP-based SMS recognition method according to any one of claims 1 to 7.
CN201910540582.5A 2019-06-21 2019-06-21 SMS recognition methods and relevant device based on NLP Pending CN110377699A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910540582.5A CN110377699A (en) 2019-06-21 2019-06-21 SMS recognition methods and relevant device based on NLP

Publications (1)

Publication Number Publication Date
CN110377699A true CN110377699A (en) 2019-10-25

Family

ID=68250547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910540582.5A Pending CN110377699A (en) 2019-06-21 2019-06-21 SMS recognition methods and relevant device based on NLP

Country Status (1)

Country Link
CN (1) CN110377699A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111385423A (en) * 2020-03-12 2020-07-07 北京小米移动软件有限公司 Voice broadcasting method, voice broadcasting device and computer storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101778154A (en) * 2009-12-28 2010-07-14 中兴通讯股份有限公司 Method and device for shielding voice broadcasting of short messages
CN104168548A (en) * 2014-08-21 2014-11-26 北京奇虎科技有限公司 Short message intercepting method and device and cloud server
CN106681980A (en) * 2015-11-05 2017-05-17 中国移动通信集团公司 Method and device for analyzing junk short messages
CN107943791A (en) * 2017-11-24 2018-04-20 北京奇虎科技有限公司 A kind of recognition methods of refuse messages, device and mobile terminal
CN108664473A (en) * 2018-05-11 2018-10-16 平安科技(深圳)有限公司 Recognition methods, electronic device and the readable storage medium storing program for executing of text key message
CN109525951A (en) * 2018-12-03 2019-03-26 中国联合网络通信集团有限公司 Junk short message processing method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination