CN106331354B - A kind of short message information extracting and analysis method - Google Patents
A short message information extraction and analysis method
- Publication number
- CN106331354B (application CN201610744099.5A)
- Authority
- CN
- China
- Prior art keywords
- short message
- new
- sender
- information
- server
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72403—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
- H04M1/7243—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
- H04M1/72436—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for text messaging, e.g. SMS or e-mail
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
Abstract
The present invention relates to a short message information extraction and analysis method. The method matches the sender number of an obtained short message against stored numbers to determine its sender, matches the message against the regular expressions associated with that sender, and extracts its key information for subsequent analysis and display. The invention also proposes a method for securely updating the regular expressions, numbers, and senders by short message. The method is reliable and secure, its message matching is highly extensible and broadly applicable, and the whole flow requires no manual participation, making it intelligent and modern.
Description
[technical field]
The invention belongs to the fields of smart phones and information processing, and in particular relates to a method for extracting and analyzing smart phone short messages.
[background technique]
In recent years, with the development of smart phones and the mobile Internet, the mobile phone has become an important tool in people's lives. Because a mobile phone number is unique, the phone has also become a means of authenticating personal identity. Short messages are therefore widely used as an accurate channel for reaching an individual, for example verification code messages, bank consumption notifications, and express delivery notices. Although most of these messages are not spam, their sheer volume floods the user's phone, so that the user overlooks the messages that really matter. For this reason, users now commonly need short messages to be extracted and organized.
At present, in the field of short message extraction and analysis, only a small number of customized mobile phone ROMs offer simple extraction functions such as verification code recognition, and these work only on phones from specific manufacturers, so they are not universal. Moreover, existing processing of short messages mostly stops at simple message extraction; it does not classify messages by type, source, and function in detail or treat each class specially. Users therefore remain troubled by large numbers of customized messages of particular types.
[summary of the invention]
To solve the above problems in the prior art, the invention proposes a short message information extraction and analysis method that uses regular expressions to match and identify short messages.
The technical solution adopted by the invention is as follows:
A short message information extraction and analysis method, comprising the following steps:
Step 100: obtain a short message record to be analyzed;
Step 200: match the sender number of the short message record against the number records in a database to determine the sender of the record; if no number matches, do not analyze the record and return to step 100;
Step 300: according to the sender of the short message record, query the database for the regular expressions of that sender's short messages;
Step 400: match the short message to be analyzed against the regular expressions obtained in step 300, one by one; if no regular expression matches the short message, abandon the analysis of the short message and return to step 100; if a regular expression matches the short message, extract the key information in the short message according to that regular expression;
Step 500: store the extracted key information in the database in association with the sender;
Step 600: perform statistical analysis on the key information stored in the database, and display the results.
Further, the regular expressions in the database and the associated sender and number information are updated according to the following steps:
(1) an update server is set up; every mobile phone that needs updates is registered with the update server in advance, the registration information including the phone number, and every such phone stores the digital certificate of the update server;
(2) when information needs to be updated, the update server packs the update information and the current date according to a predefined format to obtain an update packet;
(3) for a phone number P registered with the update server, the update server calculates P_M = P mod 256 and XORs each byte of the update packet with P_M to obtain a second update packet;
(4) the update server digitally signs the second update packet using the private key of its digital certificate;
(5) the update server Base64-encodes the second update packet and sends the encoded result to phone number P as a first short message;
(6) the update server Base64-encodes the digital signature obtained in step 4 and sends the encoded result to phone number P as a second short message;
(7) the update server repeats steps 3 to 6 for each registered phone number, so as to send the first short message and the second short message to each registered phone number;
(8) after a mobile phone receives the first short message and the second short message, if it finds that the sender number of both messages is the number of the update server, it Base64-decodes the two messages to obtain the second update packet and the digital signature;
(9) the mobile phone uses the digital certificate of the update server to verify the digital signature against the second update packet obtained in step 8; if verification fails, the two short messages are ignored; otherwise the subsequent steps continue;
(10) the mobile phone calculates P_M = P mod 256 from its own phone number P, then XORs each byte of the second update packet obtained in step 8 with P_M to recover the original update packet;
(11) the mobile phone parses the update packet according to the predefined format to obtain the update information and the date information in the packet, and checks whether the difference between that date and the current date exceeds a predetermined threshold; if it does, the update packet is ignored; if it does not, the database is updated with the obtained update information.
Further, the short message record to be analyzed in step 100 is obtained in one of the following ways:
reading a short message record from the SMS database;
intercepting a new short message received by the phone, using a short message interception mechanism.
Further, the update packet includes one of the following three kinds of information:
adding a new regular expression for an existing sender;
adding a new number for an existing sender;
adding a new sender together with its numbers and regular expressions.
Further, each update packet contains only one piece of update information.
Further, P_M is precomputed by the update server.
The beneficial effects of the invention are as follows: it provides a reliable, secure, versatile, and complete method for extracting and analyzing short message information; its message matching is highly extensible and broadly applicable; and the whole flow requires no manual participation, making it intelligent and modern.
[brief description of the drawings]
The drawings described here provide a further understanding of the invention and form part of this application, but do not unduly limit the invention. In the drawings:
Fig. 1 is a flow chart of the short message information extraction and analysis method of the invention.
Fig. 2 is an example of a statistical chart from the short message analysis of the invention.
[specific embodiment]
The invention is described in detail below with reference to the drawings and specific embodiments. The illustrative examples and explanations serve only to explain the invention and do not limit it.
The embodiment of the invention runs on the Android phone platform. Its main idea is to extract key information from short message text automatically according to agreed regular expressions, optimize how the information is presented, and perform the associated statistics, all without manual participation by the user.
Referring to Fig. 1, which shows the basic flow of the short message information extraction and analysis method of the invention, the method includes the following steps:
Step 100: obtain a short message record to be analyzed.
The method can handle both old and new short messages, so step 100 actually has two data sources: one is reading a short message record from the phone's SMS database; the other is intercepting each new short message the phone receives, using a short message interception mechanism. Because the Android platform is open, a phone system implementing the method can read the phone's SMS database and can also run an SMS interception program in the background to capture newly received messages. With these two data sources, step 100 can obtain every old and new short message for the subsequent steps.
Step 200: match the sender number of the short message record against the number records in the database to determine the sender of the record; if no number matches, do not analyze the record and return to step 100.
Every short message record carries the number that sent it, and this number corresponds to the sender of the message. Fig. 1 gives several examples: the number of Construction Bank is 95533, the number of Industrial and Commercial Bank is 95588, and so on. One sender may have several numbers; these numbers are stored in the number records of the database in association with the sender. By examining the sender number of a short message record, the sender can therefore be identified, and this is the first piece of information extracted from the message. The invention analyzes only the messages of specific senders, such as banks, express delivery companies, and train ticket websites, and does not process ordinary everyday messages. The number records in the database therefore contain only the numbers whose messages need analysis; if the sender number of a message is not in the number records, the message is not analyzed, and the flow returns to step 100 to handle the next message.
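The number matching of step 200 can be sketched as a simple lookup. This is only a minimal illustration: the in-memory dictionary stands in for the database's number records, and the names `NUMBER_RECORDS`, `match_sender`, and `filter_messages` are hypothetical, not taken from the patent. Only the two numbers 95533 and 95588 come from the description.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical stand-in for the number records in the database:
# sender number -> sender. The two entries are the examples from Fig. 1.
NUMBER_RECORDS = {
    "95533": "Construction Bank",
    "95588": "Industrial and Commercial Bank",
}


def match_sender(sender_number: str) -> Optional[str]:
    """Return the sender for a number, or None if the number is not tracked."""
    return NUMBER_RECORDS.get(sender_number.strip())


def filter_messages(
    messages: Iterable[Tuple[str, str]]
) -> Iterator[Tuple[str, str]]:
    """Yield (sender, body) only for messages whose number is in the records."""
    for number, body in messages:
        sender = match_sender(number)
        if sender is None:
            # Step 200: unmatched number, skip and move on to the next record.
            continue
        yield sender, body
```

Because several numbers may map to one sender, a mapping from number to sender (rather than the reverse) keeps the lookup a single step.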
Step 300: according to the sender of the short message record, query the database for the regular expressions of that sender's short messages.
The messages sent by one sender may be of different types. A bank, for example, may send consumption messages as well as account balance change messages. The format of each message type, however, is usually fixed, so a regular expression matching that format can be defined for each type. These regular expressions are stored in the database in advance, in association with the sender, so step 300 can retrieve all regular expressions for a given sender.
Step 400: match the short message to be analyzed against the regular expressions obtained in step 300, one by one; if no regular expression matches the short message, abandon the analysis of the short message and return to step 100; if a regular expression matches the short message, extract the key information in the short message according to that regular expression.
Fig. 1 shows three examples of key information. The first is the card number, transaction time, transaction amount, and transaction details extracted from a bank consumption message. The second is the card number, transaction time, transaction amount, and account balance extracted from a bank balance inquiry reply message. The third is the order number, delivery time, and pickup number extracted from an express delivery message. In each case the key information is extracted by matching the message format with a regular expression.
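Steps 300 and 400 can be illustrated with named capture groups. The sketch below is an assumption-laden example: the message wordings, the patterns, and the names `PATTERNS` and `extract_key_info` are invented for illustration, since the patent does not give concrete message formats or expressions.

```python
import re
from typing import Dict, List, Optional, Pattern

# Hypothetical per-sender regular expressions (step 300). The message
# formats below are invented; real formats depend on each sender.
PATTERNS: Dict[str, List[Pattern[str]]] = {
    "Construction Bank": [
        re.compile(
            r"Card ending (?P<card>\d{4}) spent (?P<amount>\d+\.\d{2}) CNY "
            r"on (?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}) at (?P<detail>.+)\."
        ),
        re.compile(
            r"Card ending (?P<card>\d{4}) balance is (?P<balance>\d+\.\d{2}) CNY\."
        ),
    ],
}


def extract_key_info(sender: str, body: str) -> Optional[Dict[str, str]]:
    """Try each of the sender's patterns in turn (step 400).

    Returns the named groups of the first pattern that matches the whole
    message, or None when no pattern matches and the message is skipped.
    """
    for pattern in PATTERNS.get(sender, []):
        match = pattern.fullmatch(body)
        if match:
            return match.groupdict()
    return None
```

Using `fullmatch` rather than `search` is a design choice made here so that a pattern only fires when the whole message has the expected format; the patent itself does not specify the matching mode.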
Step 500: store the extracted key information in the database in association with the sender.
Once extracted, the key information is stored in the database so that further statistical analysis can later be performed on it.
Step 600: perform statistical analysis on the key information stored in the database, and display the results.
Because the extracted key information is uniformly structured, statistical analysis is straightforward. Fig. 2 shows an example of a statistical chart: monthly bank consumption details are analyzed, and the chart is displayed in the software interface. The invention does not limit the specific statistical analysis or chart presentation methods; those skilled in the art will appreciate that various statistics and analysis procedures known in the art can be applied to the key information obtained above.
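As one possible illustration of steps 500 and 600, the sketch below stores extracted consumption records in SQLite and totals them per month, in the spirit of the monthly chart of Fig. 2. The table layout, column names, and function names are assumptions; the patent does not prescribe a schema or a particular statistic.

```python
import sqlite3
from typing import Dict, List, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical key-information table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE key_info (sender TEXT, card TEXT, time TEXT, amount REAL)"
    )
    return conn


def store(conn: sqlite3.Connection, sender: str, info: Dict[str, str]) -> None:
    """Step 500: store key information in association with its sender."""
    conn.execute(
        "INSERT INTO key_info VALUES (?, ?, ?, ?)",
        (sender, info["card"], info["time"], float(info["amount"])),
    )


def monthly_totals(conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    """Step 600: total consumption per month (the YYYY-MM prefix of the time)."""
    rows = conn.execute(
        "SELECT substr(time, 1, 7) AS month, SUM(amount) "
        "FROM key_info GROUP BY month ORDER BY month"
    )
    return [(month, round(total, 2)) for month, total in rows]
```

The list returned by `monthly_totals` could then be handed to whatever charting component the application uses.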
In the extraction process described above, the regular expression is the core tool for extracting information from short messages, and it reflects the correct format of a message. Message formats, however, may change over time. On the one hand, the format of the messages sent from a given sender number may change. On the other hand, new sender numbers may appear (numbers of new senders, or new numbers of existing senders), and a new number usually brings a new message format. The regular expressions and their associated senders and numbers therefore need to be updated.
A common update method in the prior art is network update: when the phone is online, it queries an update server on the network and downloads new regular expressions. However, network update consumes data traffic, and the phone may sometimes have no network access. The invention therefore proposes a new method for updating the regular expressions and the associated senders and numbers. The method updates by short message, so any phone that can receive short messages can be updated.
The specific update steps are as follows:
(1) An update server is set up. Every mobile phone that needs updates is registered with the update server in advance, the registration information including the phone number, and every such phone stores the digital certificate of the update server.
(2) When information needs to be updated, the update server packs the update information and the current date according to a predefined format to obtain an update packet.
The update information may take several forms: adding a new regular expression for an existing sender; adding a new number for an existing sender; or adding a new sender together with its numbers and regular expressions. Whatever the kind of update information, a packing format can be defined in advance and the update information packed accordingly. So that the update can be delivered by short message, each update packet carries only one piece of update information. The current date is included in the update packet to ensure its timeliness and security, preventing a piece of update information from being maliciously resent.
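The patent leaves the "predefined format" of step 2 open. Purely as an assumption, the sketch below uses compact JSON with three hypothetical fields (`date`, `kind`, `payload`) and three hypothetical kind labels for the three update types; any other agreed format would serve equally well.

```python
import json
from datetime import date

# Hypothetical labels for the three kinds of update information.
UPDATE_KINDS = ("add_regex", "add_number", "add_sender")


def pack_update(kind: str, payload: dict, today: date) -> bytes:
    """Pack one piece of update information with the current date (step 2)."""
    if kind not in UPDATE_KINDS:
        raise ValueError(f"unknown update kind: {kind!r}")
    record = {"date": today.isoformat(), "kind": kind, "payload": payload}
    # Compact separators keep the packet small, which matters for SMS delivery.
    return json.dumps(record, separators=(",", ":"), sort_keys=True).encode("utf-8")


def unpack_update(packet: bytes) -> dict:
    """Parse an update packet back into its fields (used by the phone in step 11)."""
    return json.loads(packet.decode("utf-8"))
```

For example, `pack_update("add_number", {"sender": "Construction Bank", "number": "95533"}, date(2016, 8, 26))` produces one packet carrying exactly one update, matching the one-update-per-packet rule above.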
(3) For a phone number P registered with the update server, the update server calculates its value modulo 256, that is, P_M = P mod 256. P_M therefore has no more than 8 bits, i.e. it fits in one byte. Each byte of the update packet is XORed with P_M to obtain the second update packet.
The XOR in step 3 conceals the data of the update packet, and the second update packet obtained after the XOR differs between phone numbers whose values of P_M differ, which further improves security. In addition, the P_M used in step 3 is fixed for each phone number, so in practice each P_M can be computed in advance.
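The masking of step 3 is small enough to show in full. In this sketch the function names are hypothetical and the phone number is an arbitrary example; the arithmetic (P mod 256, then a byte-wise XOR) follows the step as described. Note that a single-byte XOR only obscures the data; in the scheme it is the digital signature of step 4 that protects against forgery.

```python
def mask_byte(phone_number: str) -> int:
    """Compute P mod 256 for phone number P; the result fits in one byte."""
    return int(phone_number) % 256


def xor_packet(packet: bytes, phone_number: str) -> bytes:
    """XOR every byte of the packet with P mod 256 (step 3).

    XOR is its own inverse, so the same function also recovers the
    original packet on the phone side (step 10).
    """
    key = mask_byte(phone_number)
    return bytes(b ^ key for b in packet)
```

Since the key depends only on the phone number, a server could compute `mask_byte` once per registered number and cache it.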
(4) The update server digitally signs the second update packet using the private key of its own digital certificate.
The digital signature guarantees the security of the update information and prevents anyone from impersonating the update server to compromise the phone. Because the second update packet differs for each phone number, its digital signature differs as well.
(5) The update server Base64-encodes the second update packet and sends the encoded result to phone number P as the first short message.
(6) The update server likewise Base64-encodes the digital signature obtained in step 4 and sends the encoded result to phone number P as the second short message.
(7) The update server repeats steps 3 to 6 for each registered phone number, so that every registered phone number receives the first short message and the second short message.
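Steps 5 and 6 turn binary data into text that can travel in a short message. The sketch below shows only that encoding, using Python's standard `base64` module; the signature is treated as opaque bytes produced elsewhere, and the function names are hypothetical.

```python
import base64
from typing import Tuple


def to_sms_text(data: bytes) -> str:
    """Base64-encode binary data into ASCII text suitable for a short message."""
    return base64.b64encode(data).decode("ascii")


def from_sms_text(text: str) -> bytes:
    """Decode a received short message body back into bytes (step 8)."""
    return base64.b64decode(text, validate=True)


def build_sms_pair(second_packet: bytes, signature: bytes) -> Tuple[str, str]:
    """Return the bodies of the first and second short messages (steps 5 and 6)."""
    return to_sms_text(second_packet), to_sms_text(signature)
```

Passing `validate=True` makes decoding reject text containing characters outside the Base64 alphabet, which is a cautious choice for input arriving over SMS.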
(8) After a mobile phone receives the first short message and the second short message, if it finds that the sender number of both messages is the number of the update server, it Base64-decodes the two messages to obtain the second update packet and the digital signature.
(9) The mobile phone uses the digital certificate of the update server to verify the digital signature against the second update packet obtained in step 8. If verification fails, the two short messages are ignored; otherwise the subsequent steps continue.
Given attack techniques such as fake base stations, judging the update server by its number alone is unreliable, so the invention relies on digital signature verification to guarantee the reliability of the update.
(10) The mobile phone calculates P_M = P mod 256 from its own phone number P, then XORs each byte of the second update packet obtained in step 8 with P_M to recover the original update packet. This step is the inverse of step 3.
(11) The mobile phone parses the update packet according to the predefined format to obtain the update information and the date information in the packet, and checks whether the difference between that date and the current date exceeds a predetermined threshold. If it does, the update packet has expired and is ignored; if it does not, the database is updated with the obtained update information.
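The phone-side handling in steps 10 and 11 can be sketched as follows, assuming signature verification (step 9) has already succeeded. As before, the JSON packet layout with a `date` field, the threshold expressed in days, and the function names are assumptions made for illustration.

```python
import json
from datetime import date
from typing import Optional


def unmask(second_packet: bytes, phone_number: str) -> bytes:
    """Step 10: XOR each byte with P mod 256 to recover the original packet."""
    key = int(phone_number) % 256
    return bytes(b ^ key for b in second_packet)


def accept_update(
    second_packet: bytes, phone_number: str, today: date, max_age_days: int
) -> Optional[dict]:
    """Step 11: parse the packet and reject it if its date is too far from today.

    Returns the parsed update record, or None when the packet has expired.
    """
    record = json.loads(unmask(second_packet, phone_number).decode("utf-8"))
    age_days = abs((today - date.fromisoformat(record["date"])).days)
    if age_days > max_age_days:
        return None  # expired packet: ignore it
    return record
```

Taking the absolute difference also rejects packets dated far in the future, which is one reasonable reading of "the difference between that date and the current date".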
On the one hand, the update method above ensures the security of the update information and prevents impersonation of the update server; on the other hand, it requires no Internet access and can update the phone while it is offline. This makes the short message extraction and analysis method of the invention more secure, reliable, and timely.
The above is only a preferred embodiment of the invention; all equivalent changes or modifications made according to the structures, features, and principles described in the scope of the patent application are included in the scope of the patent application of the invention.
Claims (5)
1. A short message information extraction and analysis method, characterized in that the method comprises the following steps:
Step 100: obtain a short message record to be analyzed;
Step 200: match the sender number of the short message record against the number records in a database to determine the sender of the record; if no number matches, do not analyze the record and return to step 100;
Step 300: according to the sender of the short message record, query the database for the regular expressions of that sender's short messages;
Step 400: match the short message to be analyzed against the regular expressions obtained in step 300, one by one; if no regular expression matches the short message, abandon the analysis of the short message and return to step 100; if a regular expression matches the short message, extract the key information in the short message according to that regular expression;
Step 500: store the extracted key information in the database in association with the sender;
Step 600: perform statistical analysis on the key information stored in the database, and display the results;
wherein the regular expressions in the database and the associated sender and number information are further updated according to the following steps:
(1) an update server is set up; every mobile phone that needs updates is registered with the update server in advance, the registration information including the phone number, and every such phone stores the digital certificate of the update server;
(2) when information needs to be updated, the update server packs the update information and the current date according to a predefined format to obtain an update packet;
(3) for a phone number P registered with the update server, the update server calculates P_M = P mod 256 and XORs each byte of the update packet with P_M to obtain a second update packet;
(4) the update server digitally signs the second update packet using the private key of its digital certificate;
(5) the update server Base64-encodes the second update packet and sends the encoded result to phone number P as a first short message;
(6) the update server Base64-encodes the digital signature obtained in step 4 and sends the encoded result to phone number P as a second short message;
(7) the update server repeats steps 3 to 6 for each registered phone number, so as to send the first short message and the second short message to each registered phone number;
(8) after a mobile phone receives the first short message and the second short message, if it finds that the sender number of both messages is the number of the update server, it Base64-decodes the two messages to obtain the second update packet and the digital signature;
(9) the mobile phone uses the digital certificate of the update server to verify the digital signature against the second update packet obtained in step 8; if verification fails, the two short messages are ignored; otherwise the subsequent steps continue;
(10) the mobile phone calculates P_M = P mod 256 from its own phone number P, then XORs each byte of the second update packet obtained in step 8 with P_M to recover the original update packet;
(11) the mobile phone parses the update packet according to the predefined format to obtain the update information and the date information in the packet, and checks whether the difference between that date and the current date exceeds a predetermined threshold; if it does, the update packet is ignored; if it does not, the database is updated with the obtained update information.
2. The short message information extraction and analysis method according to claim 1, characterized in that the short message record to be analyzed in step 100 is obtained in one of the following ways:
reading a short message record from the SMS database;
intercepting a new short message received by the phone, using a short message interception mechanism.
3. The short message information extraction and analysis method according to claim 1, characterized in that the update packet includes one of the following three kinds of information:
adding a new regular expression for an existing sender;
adding a new number for an existing sender;
adding a new sender together with its numbers and regular expressions.
4. The short message information extraction and analysis method according to claim 1, characterized in that each update packet contains only one piece of update information.
5. The short message information extraction and analysis method according to claim 1, characterized in that P_M is precomputed by the update server.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610744099.5A CN106331354B (en) | 2016-08-26 | 2016-08-26 | A kind of short message information extracting and analysis method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610744099.5A CN106331354B (en) | 2016-08-26 | 2016-08-26 | A kind of short message information extracting and analysis method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106331354A CN106331354A (en) | 2017-01-11 |
CN106331354B true CN106331354B (en) | 2019-06-04 |
Family
ID=57791443
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610744099.5A Active CN106331354B (en) | 2016-08-26 | 2016-08-26 | A kind of short message information extracting and analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106331354B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106878155B (en) * | 2017-04-25 | 2020-07-10 | Oppo广东移动通信有限公司 | Method, system and mobile terminal for short message delay notification |
CN108616827B (en) * | 2018-04-27 | 2021-05-18 | 中国联合网络通信集团有限公司 | Short message data management method and device |
CN110267222A (en) * | 2019-05-24 | 2019-09-20 | 深圳壹账通智能科技有限公司 | The methods of exhibiting and device of short message bill |
CN111027285B (en) * | 2019-12-17 | 2023-06-16 | 南京上游软件有限公司 | Method and system for automatically extracting order information from pdf format order |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101807273A (en) * | 2010-03-25 | 2010-08-18 | 上海合合信息科技发展有限公司 | Method and system for performing financial management by extracting consumption information in credit card short message |
KR20120002703A (en) * | 2010-07-01 | 2012-01-09 | 삼성전자주식회사 | Method and device for searching data |
Also Published As
Publication number | Publication date |
---|---|
CN106331354A (en) | 2017-01-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106331354B (en) | A kind of short message information extracting and analysis method | |
CN103067156B (en) | The URL encryption of mobile Internet user resources access, verification method and device | |
CN104516918B (en) | Maintaining method, device, server and the system of subscriber identity information | |
CN103618794B (en) | Method, terminal and the server of automated log on | |
CN103428163A (en) | Verification code based on image content | |
CN102323933A (en) | Information embedding and interaction system facing real-time communication and method | |
CN105631688A (en) | Anti-fake and anti-commodity-fleeing query method and system based on public platform | |
WO2016008413A1 (en) | Information providing method and client | |
TWI651013B (en) | Method and system for remotely processing SIM card | |
CN106341313A (en) | Method and apparatus for obtaining billing information | |
CN101674318A (en) | Method for pushing data to mobile equipment at regular time | |
CN103139761B (en) | The method and communication terminal of a kind of information real-time show | |
JP5973808B2 (en) | Information processing device, terminal device, information processing system, information processing method, and computer program | |
CN108694168A (en) | A kind of address processing method and processing device, computer installation and readable storage medium storing program for executing | |
CN103067892B (en) | Short message transmission method using watermark | |
CN107635028A (en) | A kind of naming method of resource, device, block scm cluster and electronic equipment | |
FI114425B (en) | Method and arrangement to verify the authenticity of a utility of value distributed as a digital message | |
CN106936807A (en) | A kind of recognition methods of malicious operation and device | |
CN102804732B (en) | The method protecting individual privacy information in the audience measurement of digit broadcasting system | |
US20080052155A1 (en) | Method and system of campaign management with code | |
CN107786661A (en) | Information synchronization method | |
US8712885B2 (en) | Method for assisting in the checking of transaction records, transaction device, server, mobile terminal, and corresponding computer programs | |
CN102611998A (en) | Method for realizing mass texting through transfer of mass texting platform client | |
CN109039995A (en) | A kind of invalid information recognition methods, device and equipment | |
CN109670763B (en) | Data processing method and system, terminal and server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||