CN105227737B - Method and device for recognizing telephone numbers
- Publication number
- CN105227737B (application number CN201510643027.7A)
- Authority
- CN
- China
- Prior art keywords
- telephone number
- identified
- digit
- strings
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention provides a method and a device for recognizing telephone numbers. The method includes: starting from an initial position, dividing a target telephone number string to be identified according to a division rule that conforms to a telephone number format, to obtain a number string with a first specified number of digits; judging whether the number string with the first specified number of digits matches the attribute features of a first category of telephone number; if so, determining at least two detection digit counts according to the attribute features of the first category of telephone number; segmenting the target telephone number string to be identified with each detection digit count in turn, to obtain segmentation results; and, according to the segmentation results, selecting an optimal detection digit count from the at least two detection digit counts and using it to complete the number string with the first specified number of digits. By judging with backward detection digit counts, embodiments of the invention detect and identify the target telephone number string to be identified and improve the accuracy of telephone number recognition.
Description
Technical field
The present invention relates to the technical field of Internet applications, and in particular to a method and a device for recognizing telephone numbers.
Background
A POI (Point of Interest) is the cornerstone of the entire digital map and navigation industry, and in the mobile Internet era map information data has become even more indispensable. Web pages contain a large amount of POI information, and each POI includes information such as a name, an address, longitude and latitude, and telephone numbers. The quality of POI data varies widely from one web page to another, and because the telephone is an important way of contacting a point of interest, the accuracy of the telephone number is an important indicator of POI data quality.
Web pages at this scale contain hundreds of millions of POIs, and telephone numbers are presented in complex and varied ways: a single POI may include several landline or mobile numbers, interleaved and run together. In addition, POI information extracted from the Internet may contain a large amount of erroneous data, POI telephone numbers included, and a wrong telephone number harms the user experience in an application. How to accurately identify the telephone numbers in web-page POI information has therefore become a technical problem that urgently needs to be solved.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a method for recognizing telephone numbers, and a corresponding device, that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a method for recognizing telephone numbers is provided, including:
Starting from an initial position, dividing a target telephone number string to be identified according to a division rule that conforms to a telephone number format, to obtain a number string with a first specified number of digits;
Judging whether the number string with the first specified number of digits matches the attribute features of a first category of telephone number;
If so, determining at least two detection digit counts according to the attribute features of the first category of telephone number;
Segmenting the target telephone number string to be identified with each detection digit count in turn, to obtain segmentation results;
According to the segmentation results, selecting an optimal detection digit count from the at least two detection digit counts and using it to complete the number string with the first specified number of digits.
Optionally, segmenting the target telephone number string to be identified with each detection digit count in turn, to obtain segmentation results, includes:
For each detection digit count, using that detection digit count to segment the part of the target telephone number string to be identified that follows the number string with the first specified number of digits, to obtain a first segmented number and a second segmented number;
Comparing the first segmented number with the second segmented number and determining the number of corresponding positions at which the two have the same digit, as the segmentation result for that detection digit count.
Optionally, selecting, according to the segmentation results, an optimal detection digit count from the at least two detection digit counts and using it to complete the number string with the first specified number of digits includes:
Comparing the numbers of identical-digit positions corresponding to the detection digit counts;
Selecting, from the detection digit counts, the one with the largest number of identical-digit positions as the optimal detection digit count;
Completing the number string with the first specified number of digits according to the optimal detection digit count.
Optionally, after judging whether the number string with the first specified number of digits matches the attribute features of the first category of telephone number, the method further includes:
If the number string with the first specified number of digits does not match the attribute features of the first category of telephone number, selecting a new division rule that conforms to a telephone number format and dividing the target telephone number string to be identified again, to obtain a number string with a second specified number of digits;
Judging whether the number string with the second specified number of digits matches the attribute features of a second category of telephone number;
If so, completing the number string with the second specified number of digits according to the attribute features of the second category of telephone number.
Optionally, dividing, starting from an initial position, the target telephone number string to be identified according to a division rule that conforms to a telephone number format includes:
Performing preprocessing operations related to the telephone number format on the target telephone number string to be identified, to obtain a processed target telephone number string to be identified;
Starting from the initial position, dividing the processed target telephone number string to be identified according to the division rule that conforms to the telephone number format.
Optionally, performing preprocessing operations related to the telephone number format on the target telephone number string to be identified, to obtain a processed target telephone number string to be identified, includes:
Determining whether the target telephone number string to be identified contains a specified separator;
If the target telephone number string to be identified contains a specified separator, splitting the string at the separator, to obtain at least two target telephone number strings to be identified after splitting.
Optionally, the specified separator includes at least one of the following: an enumeration comma, a comma, a semicolon, a slash, a backslash, and a vertical bar.
Optionally, after obtaining the at least two target telephone number strings to be identified after splitting, the method further includes:
For each target telephone number string to be identified, determining whether the head of the string has a country code;
If so, removing the country code from the head of the target telephone number string to be identified.
Optionally, after removing the country code from the head of the target telephone number string to be identified, the method further includes:
Analyzing the target telephone number string to be identified from which the country code has been removed;
If the head of the target telephone number string to be identified has a regional area code and the regional area code is incomplete, supplementing the regional area code to make it complete;
If the head of the target telephone number string to be identified has a regional area code and the regional area code is repeated, de-duplicating the regional area code.
Optionally, the target telephone number string to be identified is obtained by the following steps:
Obtaining point of interest (POI) information from a web page;
Extracting the target telephone number string to be identified from the POI information.
Optionally, after completing the number string with the first specified number of digits or with the second specified number of digits, the method further includes:
If telephone number strings to be identified remain, performing the preprocessing operation, the division operation, the judging operation, the determining operation, the segmentation operation, and the completion operation again, until all remaining telephone number strings to be identified have been identified.
According to another aspect of the present invention, a device for recognizing telephone numbers is also provided, including:
A division module, adapted to divide, starting from an initial position, a target telephone number string to be identified according to a division rule that conforms to a telephone number format, to obtain a number string with a first specified number of digits;
A judging module, adapted to judge whether the number string with the first specified number of digits matches the attribute features of a first category of telephone number;
A determining module, adapted to determine at least two detection digit counts according to the attribute features of the first category of telephone number if the judging module judges that the number string with the first specified number of digits matches those attribute features;
A segmentation module, adapted to segment the target telephone number string to be identified with each detection digit count in turn, to obtain segmentation results;
A completion module, adapted to select, according to the segmentation results, an optimal detection digit count from the at least two detection digit counts and use it to complete the number string with the first specified number of digits.
Optionally, the segmentation module is further adapted to:
For each detection digit count, use that detection digit count to segment the part of the target telephone number string to be identified that follows the number string with the first specified number of digits, to obtain a first segmented number and a second segmented number;
Compare the first segmented number with the second segmented number and determine the number of corresponding positions at which the two have the same digit, as the segmentation result for that detection digit count.
Optionally, the completion module is further adapted to:
Compare the numbers of identical-digit positions corresponding to the detection digit counts;
Select, from the detection digit counts, the one with the largest number of identical-digit positions as the optimal detection digit count;
Complete the number string with the first specified number of digits according to the optimal detection digit count.
Optionally, the division module is further adapted to select, if the judging module judges that the number string with the first specified number of digits does not match the attribute features of the first category of telephone number, a new division rule that conforms to a telephone number format and divide the target telephone number string to be identified again, to obtain a number string with a second specified number of digits;
The judging module is further adapted to judge whether the number string with the second specified number of digits matches the attribute features of a second category of telephone number;
The completion module is further adapted to complete the number string with the second specified number of digits according to the attribute features of the second category of telephone number if the judging module judges that the number string matches those attribute features.
Optionally, the division module includes:
A preprocessing unit, adapted to perform preprocessing operations related to the telephone number format on the target telephone number string to be identified, to obtain a processed target telephone number string to be identified;
A division unit, adapted to divide, starting from the initial position, the processed target telephone number string to be identified according to the division rule that conforms to the telephone number format.
Optionally, the preprocessing unit is further adapted to:
Determine whether the target telephone number string to be identified contains a specified separator;
If the target telephone number string to be identified contains a specified separator, split the original telephone number string to be identified at the separator, to obtain at least two target telephone number strings to be identified after splitting.
Optionally, the specified separator includes at least one of the following: an enumeration comma, a comma, a semicolon, a slash, a backslash, and a vertical bar.
Optionally, the preprocessing unit is further adapted to:
After obtaining the at least two target telephone number strings to be identified after splitting, determine, for each target telephone number string to be identified, whether the head of the string has a country code;
If so, remove the country code from the head of the target telephone number string to be identified.
Optionally, the preprocessing unit is further adapted to:
After removing the country code from the head of the target telephone number string to be identified, analyze the string from which the country code has been removed;
If the head of the target telephone number string to be identified has a regional area code and the regional area code is incomplete, supplement the regional area code to make it complete;
If the head of the target telephone number string to be identified has a regional area code and the regional area code is repeated, de-duplicate the regional area code.
Optionally, the device further includes an acquisition module, adapted to obtain the target telephone number string to be identified by the following steps:
Obtaining point of interest (POI) information from a web page;
Extracting the target telephone number string to be identified from the POI information.
Optionally, the device further includes:
A recursion module, adapted to trigger, if telephone number strings to be identified remain, the preprocessing unit to perform the preprocessing operation again, the division module to perform the division operation again, the judging module to perform the judging operation again, the determining module to perform the determining operation again, the segmentation module to perform the segmentation operation again, and the completion module to perform the completion operation again, until all remaining telephone number strings to be identified have been identified.
In embodiments of the present invention, the target telephone number string to be identified is divided, starting from an initial position, according to a division rule that conforms to a telephone number format. That is, taking into account the features of different categories of telephone number (such as landline or mobile numbers), the string is divided with the division rule of the telephone number format corresponding to each category, and the category of the telephone number is identified from the number string with the first specified number of digits obtained by the division, so that different categories of telephone number are identified effectively. Further, embodiments of the invention exploit the fact that two landline or mobile numbers in the same telephone unit tend to be very similar: at least two detection digit counts are determined according to the attribute features of the first category of telephone number, and the target telephone number string to be identified is then detected and identified by judging with backward detection digit counts, which further improves the accuracy of telephone number recognition.
In addition, before dividing the target telephone number string to be identified according to the division rule that conforms to the telephone number format, embodiments of the invention may also perform preprocessing operations related to the telephone number format on the string, so that the preprocessed string conforms to the telephone number format. Subsequent recognition is then carried out on the preprocessed string, which improves the telephone number recognition rate.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
From the following detailed description of specific embodiments of the invention in conjunction with the accompanying drawings, those skilled in the art will better understand the above and other objects, advantages, and features of the invention.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flow chart of a method for recognizing telephone numbers according to an embodiment of the present invention;
Fig. 2 shows a flow chart of a method for recognizing telephone numbers according to another embodiment of the present invention;
Fig. 3 shows a structural diagram of a device for recognizing telephone numbers according to an embodiment of the present invention; and
Fig. 4 shows a structural diagram of a device for recognizing telephone numbers according to another embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
To solve the above technical problem, embodiments of the present invention provide a method for recognizing telephone numbers. Fig. 1 shows a flow chart of the method according to an embodiment of the present invention. Referring to Fig. 1, the method may include at least steps S102 to S110.
Step S102: starting from an initial position, divide the target telephone number string to be identified according to a division rule that conforms to a telephone number format, to obtain a number string with a first specified number of digits.
Step S104: judge whether the number string with the first specified number of digits matches the attribute features of a first category of telephone number; if so, continue with step S106.
Step S106: determine at least two detection digit counts according to the attribute features of the first category of telephone number.
Step S108: segment the target telephone number string to be identified with each detection digit count in turn, to obtain segmentation results.
Step S110: according to the segmentation results, select an optimal detection digit count from the at least two detection digit counts and use it to complete the number string with the first specified number of digits.
In embodiments of the present invention, the target telephone number string to be identified is divided, starting from an initial position, according to a division rule that conforms to a telephone number format. That is, taking into account the features of different categories of telephone number (such as landline or mobile numbers), the string is divided with the division rule of the telephone number format corresponding to each category, and the category of the telephone number is identified from the number string with the first specified number of digits obtained by the division, so that different categories of telephone number are identified effectively. Further, embodiments of the invention exploit the fact that two landline or mobile numbers in the same telephone unit tend to be very similar: at least two detection digit counts are determined according to the attribute features of the first category of telephone number, and the target telephone number string to be identified is then detected and identified by judging with backward detection digit counts, which further improves the accuracy of telephone number recognition.
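The selection of an optimal detection digit count can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the candidate counts 7 and 8 are taken from the local-number lengths mentioned for regional landlines, and the second segmented number is assumed to be the run of digits immediately after the first one.

```python
def count_matching_positions(a: str, b: str) -> int:
    """Count aligned positions at which both strings hold the same digit."""
    return sum(1 for x, y in zip(a, b) if x == y)


def choose_detection_length(remainder: str, candidates=(7, 8)) -> int:
    """Pick the candidate length whose two consecutive cuts agree most.

    Assumption: the first cut is remainder[:d] and the second cut is the
    next d digits, remainder[d:2*d] (shorter if the string runs out).
    Ties go to the earlier candidate.
    """
    best, best_score = candidates[0], -1
    for d in candidates:
        score = count_matching_positions(remainder[:d], remainder[d:2 * d])
        if score > best_score:
            best, best_score = d, score
    return best


def complete_first_number(prefix: str, remainder: str, candidates=(7, 8)):
    """Append the best-scoring number of digits to the prefix (e.g. an area code).

    Returns the completed number and the digits left for further processing.
    """
    d = choose_detection_length(remainder, candidates)
    return prefix + remainder[:d], remainder[d:]
```

With the area code "0877" and the remaining digits "70104577010457", the two 7-digit cuts are identical while the 8-digit cuts share no aligned digit, so the sketch selects 7 and returns "08777010457" with "7010457" left over.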
The method for recognizing telephone numbers provided by embodiments of the present invention can effectively identify the telephone numbers in POI information. That is, before step S102, the target telephone number string to be identified can first be obtained; specifically, POI information can be obtained from a web page, and the target telephone number string to be identified is then extracted from the POI information.
Telephone information on web pages falls mainly into mobile numbers and landline numbers. Taking telephone numbers in Chinese cities, districts, and counties as an example, a mobile number has 11 digits, and its correctness and home region can be judged from its first 7 digits. Mobile numbers generally begin with 13, 14, 15, 17, 18, or 19, and a mobile number attribution table can be used to judge the correctness and home region of the first 7 digits. Landline numbers are divided into official 10-digit numbers beginning with 400 or 800, ordinary regional numbers of 7 or 8 digits preceded by a 3- or 4-digit area code, special official 5-digit numbers (such as 10086 and 95522), and special 3-digit numbers (such as 110, 119, and 114); a landline number may also include an extension number.
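The categories just described can be expressed as a small classifier. The sketch below is a simplification under stated assumptions: the function and label names are hypothetical, only prefixes and lengths are checked, and the lookup of the first 7 digits in a mobile number attribution table is not modeled.

```python
MOBILE_PREFIXES = ("13", "14", "15", "17", "18", "19")
SERVICE_PREFIXES = ("400", "800")


def classify_number(digits: str) -> str:
    """Classify a digits-only string using the length and prefix rules above.

    Labels are illustrative; extension numbers are assumed to have been
    removed before this function is called.
    """
    if not digits.isdigit():
        return "unknown"
    n = len(digits)
    if n == 11 and digits.startswith(MOBILE_PREFIXES):
        return "mobile"
    if n == 10 and digits.startswith(SERVICE_PREFIXES):
        return "service_400_800"
    # A 3- or 4-digit area code (leading 0) plus a 7- or 8-digit local number.
    if digits.startswith("0") and n in (10, 11, 12):
        return "landline"
    if n == 5:
        return "special_5_digit"
    if n == 3:
        return "special_3_digit"
    return "unknown"
```

A 10-digit string such as "1380233318" falls through to "unknown", which matches the "mobile number is incomplete" case listed in Table 1.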
Web pages at this scale contain hundreds of millions of POIs, and telephone numbers are presented in complex and varied ways: a single POI may include several landline or mobile numbers, interleaved and run together. Table 1 lists some of the ways in which Chinese city, district, and county telephone numbers are presented on web pages. Embodiments of the invention then identify the disordered telephone numbers on web pages according to the characteristics of Chinese city, district, and county telephone numbers described above.
It should be noted that the method for recognizing telephone numbers provided by embodiments of the present invention can also be combined with the characteristics of telephone numbers in other countries to identify those numbers effectively.
Table 1
Telephone number | Description |
400-890-0000转805530 | The extension number is introduced by a Chinese character (转, "transfer to") |
86-0877-70104577010457 | 86 precedes the number, and multiple telephone numbers have no separator |
0852-8719889 86 8719669 | The country code 86 appears in the middle of the telephone numbers |
028-84876877,1380233318 | Mobile and landline numbers are combined, and the mobile number is incomplete |
0771 0771 324579718602365784 | The regional area code is repeated |
286990619869906199 | The regional area code lacks its leading 0 |
0755-13651464541 | A regional area code precedes the mobile number |
As Table 1 shows, telephone numbers on web pages are presented in complex and varied ways. To improve the telephone number recognition rate, embodiments of the invention may, in step S102, first perform preprocessing operations related to the telephone number format on the target telephone number string to be identified, to obtain a processed string that conforms to the telephone number format as far as possible. Then, starting from the initial position, the processed target telephone number string to be identified is divided according to the division rule that conforms to the telephone number format.
In embodiments of the present invention, the preprocessing operations related to the telephone number format that are performed on the target telephone number string to be identified may include pre-splitting by separator, identifying and removing the country code, and supplementing and de-duplicating the regional area code.
First, when pre-splitting by separator, it can be determined whether the target telephone number string to be identified contains a specified separator. If it does, the string is split at the separator, to obtain at least two target telephone number strings to be identified after splitting. Conversely, if the string does not contain a specified separator, no pre-splitting is performed. Here, the specified separator may be an enumeration comma "、", a comma ",", a semicolon ";", a slash "/", a backslash "\", a vertical bar "|", and so on; the invention is not limited to these.
For example, for the target telephone number string to be identified "028-84876877,1380233318" in Table 1 above, it is determined that the string contains a specified separator (that is, the comma ","). The string is split at the separator ",", and the target telephone number strings to be identified after splitting are "028-84876877" and "1380233318".
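A minimal sketch of pre-splitting by separator follows. The function name is hypothetical, and the full-width comma and semicolon are included as an assumption about how these separators appear on Chinese web pages; the text itself only names the separator types.

```python
import re

# Enumeration comma, comma, semicolon, slash, backslash, vertical bar.
# The full-width "," and ";" are an added assumption.
_SEPARATORS = re.compile(r"[、,,;;/\\|]")


def pre_split(raw: str) -> list[str]:
    """Split a raw phone string at the specified separators.

    A string without any separator comes back as a single-element list,
    which corresponds to skipping the pre-splitting step.
    """
    parts = (part.strip() for part in _SEPARATORS.split(raw))
    return [part for part in parts if part]
```

Applied to the example string above, the sketch returns the two candidate strings "028-84876877" and "1380233318"; hyphens and spaces are left alone because they are not among the specified separators.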
Second, the identification and removal of the country code. In existing telephone numbering, a country code is usually added in front of a telephone number to distinguish the numbers of different countries. Taking Chinese telephone numbers as an example, 86 is usually added in front of the number. When no international call is involved, however, the country code serves no practical purpose and can therefore be removed.
In embodiments of the present invention, after the at least two target telephone number strings to be identified are obtained by splitting, it is determined for each of them whether the head of the string has a country code. If so, the country code at the head of the string is removed. Conversely, if the head of the string has no country code, no removal is performed.
For a target telephone number string to be identified that did not need pre-splitting in the separator step, it is likewise determined whether the head of the string has a country code. If so, the country code at the head of the string is removed. Conversely, if the head of the string has no country code, no removal is performed.
In embodiments of the present invention, taking the Chinese country code 86 as an example, common forms of 86 include +86, 086, 0086, and 86. Embodiments of the invention can judge whether 86 is the Chinese country code from the number of remaining telephone digits. For example, for the target telephone number string to be identified "86-0877-70104577010457" in Table 1 above, 86 is judged from the number of remaining digits to be the Chinese country code and is removed, giving the processed string "0877-70104577010457"; the "-" following 86 is removed as well.
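The country-code step might look roughly like this. The function name and the threshold of 7 remaining digits are assumptions made for illustration; the text says only that the decision depends on the number of remaining digits and does not give the exact rule.

```python
import re

# Common written forms of the Chinese country code at the head of a string,
# followed by optional spaces or hyphens.
_COUNTRY_CODE = re.compile(r"^\s*(?:\+86|0086|086|86)[\s\-]*")

# Hypothetical threshold: a full local number has at least 7 digits.
MIN_REMAINING_DIGITS = 7


def strip_country_code(raw: str) -> str:
    """Remove a leading 86 prefix only if enough digits remain afterwards."""
    match = _COUNTRY_CODE.match(raw)
    if match is None:
        return raw
    rest = raw[match.end():]
    remaining_digits = sum(ch.isdigit() for ch in rest)
    if remaining_digits >= MIN_REMAINING_DIGITS:
        return rest
    # Too few digits left: the leading "86" is probably part of the number.
    return raw
```

A country code in the middle of a string, as in the third row of Table 1, is outside the scope of this sketch, which only inspects the head of the string.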
Third, when supplementing and de-duplicating the regional area code, the target telephone number string to be identified from which the country code has been removed can be analyzed. If the analysis shows that the head of the string has a regional area code and the regional area code is incomplete, the regional area code is supplemented to make it complete; if the analysis shows that the head of the string has a regional area code and the regional area code is repeated, the regional area code is de-duplicated.
A target telephone number string to be identified that did not need pre-splitting in the separator step, or that did not need removal in the country code step, is analyzed in the same way. If the analysis shows that the head of the string has a regional area code and the regional area code is incomplete, the regional area code is supplemented to make it complete; if the analysis shows that the head of the string has a regional area code and the regional area code is repeated, the regional area code is de-duplicated.
For example, analysis of the target telephone number string to be identified "286990619869906199" in Table 1 above shows that its head carries a regional area code that is incomplete. The area code is therefore supplemented, giving the target telephone number string to be identified with a complete area code, "0286990619869906199".
As another example, analysis of the target telephone number string to be identified "0771 0771 324579718602365784" in Table 1 above shows that its head carries a regional area code that is repeated. The area code is therefore deduplicated, giving the target telephone number string to be identified with the repeated area code removed, "0771 324579718602365784".
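The two area code examples above can be reproduced with a short sketch. The table `AREA_CODES` and the function `normalize_area_code` are hypothetical; a real system would need the full list of area codes and a safer test than "the string starts with the code minus its leading 0".

```python
import re

# Hypothetical lookup table; only the codes used in the examples are listed.
AREA_CODES = ("028", "0771", "0877")

def normalize_area_code(s):
    """Supplement an incomplete area code and drop a repeated one."""
    # Supplement: restore a dropped leading "0" when that yields a known code.
    for code in AREA_CODES:
        if s.startswith(code[1:]):
            s = "0" + s
            break
    # Deduplicate: drop a copy of the code that is immediately repeated,
    # allowing spaces or hyphens between the two copies.
    for code in AREA_CODES:
        repeated = re.compile(rf"{code}[\s-]*(?={code})")
        m = repeated.match(s)
        while m:
            s = s[m.end():]
            m = repeated.match(s)
    return s
```

A mobile number such as "13651464541" passes through unchanged, since it matches no entry in the table.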
In the embodiments of the present invention, after the Chinese city, district and county telephone numbers shown in Table 1 above undergo the preprocessing operations described above, the processed target telephone number strings to be identified are obtained, as shown in Table 2. The preprocessing operations mentioned above include pre-splitting by separator, identification and removal of the country code, and supplementing and deduplication of the regional area code. The present invention does not limit the order in which they are performed; in practice, the order can be set according to actual requirements. For example, pre-splitting by separator may be performed first, then identification and removal of the country code, then supplementing and deduplication of the regional area code. Alternatively, identification and removal of the country code may be performed first, then supplementing and deduplication of the regional area code, then pre-splitting by separator. As a further alternative, identification and removal of the country code may be performed first, then pre-splitting by separator, then supplementing and deduplication of the regional area code, and so on.
Table 2
It should be noted that the preprocessing operations related to telephone number format that are performed on the target telephone number string to be identified in the embodiments of the present invention are not limited to the modes described above. In practice, corresponding preprocessing operations can be designed around the characteristics of the telephone numbers of different countries, so that after preprocessing the target telephone number string to be identified conforms to the telephone number format as closely as possible, thereby improving the recognition rate of telephone numbers.
Further, starting from the initial position, the processed target telephone number string to be identified is divided according to a division rule that conforms to the telephone number format, giving the number string of the first specified digit count. Here, a division rule can be chosen to suit the characteristics of the different categories of telephone number (such as landline telephone or mobile phone).
Taking Chinese city, district and county telephone numbers as an example, consider the case where a division rule conforming to the mobile phone number format is chosen. Because a mobile phone number has 11 digits, and its correctness and home region can be judged from its first 7 digits, the target telephone number string to be identified can be divided according to the division rule conforming to the mobile phone number format, giving a number string whose first specified digit count is 7.
In addition, consider the case where a division rule conforming to the landline number format is chosen. Landline numbers fall into official 10-digit numbers beginning with 400 or 800, ordinary regional numbers of 7 or 8 digits preceded by a 3- or 4-digit area code, and special official 5-digit numbers. The target telephone number string to be identified can therefore be divided according to the division rule conforming to the landline number format, giving a number string whose first specified digit count is 3, 4 or 5.
For example, the target telephone number string to be identified extracted from a POI is "+8613651464541,28-84876877". The preprocessing operations related to telephone number format are performed on it in turn: pre-splitting by separator, identification and removal of the country code, and identification and supplementing of the regional area code. The processed target telephone number strings to be identified are "13651464541" and "028-84876877". Further, starting from the initial position, "13651464541" is divided according to the division rule conforming to the mobile phone number format, giving the number string "1365146", whose first specified digit count is 7. Alternatively, starting from the initial position, "028-84876877" is divided according to the division rule conforming to the landline number format, giving the number string "028", whose first specified digit count is 3.
As another example, in Table 2 above, the target telephone number string to be identified after the preprocessing operations related to telephone number format is "0286990619869906199". Starting from the initial position, it is divided according to the division rule conforming to the landline number format, giving the number string "028", whose first specified digit count is 3, and this number string meets the attribute features of the first-category telephone number (that is, a landline telephone).
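The division step in these examples amounts to taking a fixed-length prefix of the digits. A minimal sketch, in which the names `SPECIFIED_DIGITS` and `divide` are illustrative and only the digit counts (7 for mobile, 3, 4 or 5 for landline) come from the text:

```python
import re

# Digit counts stated in the text for Chinese numbers.
SPECIFIED_DIGITS = {"mobile": (7,), "landline": (3, 4, 5)}

def divide(number_string, category):
    """Return the candidate leading number strings for the given category."""
    digits = re.sub(r"\D", "", number_string)  # ignore "-" and other separators
    return [digits[:n] for n in SPECIFIED_DIGITS[category] if len(digits) >= n]
```

For "13651464541" with the mobile rule this yields `["1365146"]`, and for "028-84876877" with the landline rule it yields `["028", "0288", "02884"]`, of which only "028" would pass a check against known area codes.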
It should be noted that the settings listed above, namely a first specified digit count of 7 with the first-category telephone number being a mobile phone, or a first specified digit count of 3, 4 or 5 with the first-category telephone number being a landline telephone, are chosen according to the characteristics of Chinese city, district and county telephone numbers. To recognize telephone numbers of other countries, the first specified digit count and the first-category telephone number can be set according to the characteristics of those countries' telephone numbers.
In step S106, if the number string of the first specified digit count meets the attribute features of the first-category telephone number, at least two detection digit counts are determined according to those attribute features. Then, in step S108, each detection digit count is used in turn to split the target telephone number string to be identified, giving a split result. The embodiments of the present invention provide an optional scheme for this: for each detection digit count, the part of the target telephone number string to be identified that follows the number string of the first specified digit count is split using that detection digit count, giving a first split number and a second split number; the first and second split numbers are compared, and the number of positions at which the two carry the same digit is taken as the split result for that detection digit count. Then, in step S110, the numbers of matching positions for the detection digit counts are compared, the detection digit count with the largest number of matching positions is chosen as the optimal detection digit count, and the number string of the first specified digit count is completed with that many digits.
In the example above, the telephone number corresponding to the number string "028", whose first specified digit count is 3, is recognized as a landline telephone. Because this landline number does not begin with 400 or 800, two detection digit counts, 7 and 8, are determined.
For the detection digit count of 7, the part of the target telephone number string to be identified that follows the number string of the first specified digit count (that is, 6990619869906199) is split using this detection digit count, giving the first split number "6990619" and the second split number "8699061". The two carry the same digit at 1 position.
For the detection digit count of 8, the part of the target telephone number string to be identified that follows the number string of the first specified digit count (that is, 6990619869906199) is split using this detection digit count, giving the first split number "69906198" and the second split number "69906199". The two carry the same digit at 7 positions.
Then, of the detection digit counts 7 and 8, the one with the largest number of matching positions is chosen as the optimal detection digit count; that is, 8 is chosen. Completing the number string "028" of the first specified digit count with the optimal detection digit count gives the landline number "02869906198". This calculation is chosen because two landline or mobile numbers appearing in the same telephone unit are highly similar.
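The comparison of detection digit counts in this example can be sketched in a few lines. The function names are illustrative, and the handling of ties (the first candidate wins) is an assumption the text does not address.

```python
def matching_positions(a, b):
    """Count positions at which the two split numbers carry the same digit."""
    return sum(x == y for x, y in zip(a, b))

def best_detection_length(rest, candidates=(7, 8)):
    """Score each detection digit count and return the best one with all scores."""
    scores = {}
    for n in candidates:
        first, second = rest[:n], rest[n:2 * n]
        scores[n] = matching_positions(first, second)
    # Assumption: on a tie, max() keeps the first candidate in insertion order.
    best = max(scores, key=scores.get)
    return best, scores

rest = "6990619869906199"          # digits following the area code "028"
best, scores = best_detection_length(rest)
number = "028" + rest[:best]       # completed landline number
```

For the string in the text, `scores` is `{7: 1, 8: 7}`, so `best` is 8 and `number` is "02869906198", in line with the worked example.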
Further, after step S104 judges whether the number string of the first specified digit count meets the attribute features of the first-category telephone number, if it does not, a new division rule conforming to the telephone number format can be chosen to divide the target telephone number string to be identified again, giving the number string of the second specified digit count. It is then judged whether the number string of the second specified digit count meets the attribute features of the second-category telephone number. If so, the number string of the second specified digit count is completed according to the attribute features of the second-category telephone number.
For example, the target telephone number string to be identified extracted from a POI is "+8613651464541,28-84876877". Preprocessing related to telephone number format, such as deleting the country code, is performed on it, and the processed target telephone number string to be identified is "13651464541,28-84876877". Further, starting from the initial position, the string is divided according to the division rule conforming to the landline number format, giving the number string "136", whose first specified digit count is 3. The number string "136" does not meet the attribute features of the first-category telephone number (that is, a landline telephone). The division rule of the mobile phone number format can therefore be chosen to divide the target telephone number string to be identified again, giving the number string "1365146", whose second specified digit count is 7. This number string meets the attribute features of the second-category telephone number (that is, a mobile phone), so it is completed according to those attribute features, giving the corresponding telephone number "13651464541".
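The fallback from one category to the other can be sketched as below. The attribute checks are assumptions made for illustration (the text does not define them): a landline prefix is taken to start with "0" or to equal 400 or 800, and a mobile prefix is taken to be 7 digits starting with 1 followed by a digit from 3 to 9.

```python
import re

def is_landline_prefix(p):
    # Assumed feature: area codes start with "0"; service numbers with 400/800.
    return p.startswith("0") or p in ("400", "800")

def is_mobile_prefix(p):
    # Assumed feature: 7 digits, a leading "1", then a digit from 3 to 9.
    return re.fullmatch(r"1[3-9]\d{5}", p) is not None

def recognize_head(s):
    """Try the landline rule first, then fall back to the mobile rule."""
    digits = re.sub(r"\D", "", s)
    if is_landline_prefix(digits[:3]):
        return "landline", digits[:3]      # completion is handled separately
    if is_mobile_prefix(digits[:7]):
        return "mobile", digits[:11]       # complete the prefix to 11 digits
    return None, ""
```

Applied to "13651464541,28-84876877", the landline check on "136" fails, the mobile check on "1365146" succeeds, and the completed number "13651464541" is returned.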
The settings listed above are chosen according to the characteristics of Chinese city, district and county telephone numbers: either a first specified digit count of 7 with the first-category telephone number being a mobile phone and a second specified digit count of 3, 4 or 5 with the second-category telephone number being a landline telephone, or a first specified digit count of 3, 4 or 5 with the first-category telephone number being a landline telephone and a second specified digit count of 7 with the second-category telephone number being a mobile phone. It should be noted that, to recognize telephone numbers of other countries, the first specified digit count, the first-category telephone number, the second specified digit count and the second-category telephone number can be set according to the characteristics of those countries' telephone numbers.
In the embodiments of the present invention, once the telephone number corresponding to the number string of the first or second specified digit count has been obtained by completion, that telephone number can be output.
For example, after the landline number "02869906198" is recognized from the target telephone number string to be identified "0286990619869906199", the landline number "02869906198" can be output.
Further, for the remaining telephone number string to be identified, "69906199", the preprocessing, judging, determining, splitting and completion operations need to be performed again, until the remaining telephone number strings to be identified have all been recognized. That is, the regional area code "028" is first supplemented, giving the target telephone number string to be identified "02869906199". Then, starting from the initial position, "02869906199" is divided according to the division rule conforming to the landline number format, giving the number string "028", whose first specified digit count is 3, and the telephone number corresponding to it is recognized as the landline number "02869906199".
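The repeated processing of the remainder can be sketched as a loop that peels one landline number at a time off the string and re-attaches the area code to what is left. The function `split_landlines` is hypothetical, and preferring the longer detection digit count when scores tie is an assumption made so that a final 8-digit remainder is consumed whole.

```python
def matching_positions(a, b):
    """Count positions at which the two split numbers carry the same digit."""
    return sum(x == y for x, y in zip(a, b))

def split_landlines(s, area_code, lengths=(7, 8)):
    """Recognize consecutive landline numbers that share one area code."""
    results = []
    rest = s[len(area_code):] if s.startswith(area_code) else s
    while rest:
        usable = [n for n in lengths if len(rest) >= n]
        if not usable:
            break  # leftover is too short to be a subscriber number
        # Score each detection digit count; ties go to the longer count
        # (assumption, not stated in the source).
        best = max(
            usable,
            key=lambda n: (matching_positions(rest[:n], rest[n:2 * n]), n),
        )
        results.append(area_code + rest[:best])  # supplement the area code
        rest = rest[best:]
    return results
```

For "0286990619869906199" with area code "028", the loop first emits "02869906198" and then, from the remainder "69906199", emits "02869906199".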
As another example, in Table 2 above the target telephone number string to be identified is "400-890-0000 turns 805530". Starting from the initial position, it is divided according to the division rule conforming to the landline number format, giving the number string "400", whose first specified digit count is 3, and according to step S108 the telephone number corresponding to it can be recognized as the landline number "400-890-0000". The remaining telephone number string to be identified, "turns 805530", is recognized as an extension number and is appended to the end of the landline number "400-890-0000", giving "400-890-0000 turns 805530".
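Handling of an extension suffix like the one in this example can be sketched with a regular expression. The marker words in `EXT_MARKERS` are assumptions: "turns" is the word that appears in this translated text, while "转" and "ext." are included only as plausible variants.

```python
import re

# Assumed marker words separating the main number from its extension.
EXT_MARKERS = r"(?:转|turns|ext\.?)"

def split_extension(s):
    """Return (main_number, extension), with extension None if absent."""
    m = re.match(rf"^\s*([\d-]+?)\s*{EXT_MARKERS}\s*(\d+)\s*$", s)
    if m:
        return m.group(1), m.group(2)
    return s.strip(), None
```

The main number can then be recognized with the landline rule, and the extension appended to the recognized number afterwards.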
The implementation of the telephone number recognition method provided by the present invention is described in detail below through a specific embodiment. In this embodiment, taking Chinese city, district and county telephone numbers as an example, POIs are obtained from web pages and the target telephone number strings to be identified are extracted from the POIs. Fig. 2 shows a flowchart of the telephone number recognition method according to another embodiment of the present invention. Referring to Fig. 2, the method can include at least steps S202 to S216.
Step S202: pre-split the target telephone number string to be identified according to separators.
In this step, it can be determined whether the target telephone number string to be identified contains a specified separator. If it does, the string is split at that separator, giving at least two target telephone number strings to be identified. Conversely, if it contains no specified separator, no pre-splitting is performed. Here, the specified separator can be the enumeration comma "、", the comma ",", the semicolon ";", the slash "/", the backslash "\", the vertical bar "|" and so on; the invention is not limited to these.
For example, for the target telephone number string to be identified "028-84876877,1380233318" in Table 1 above, it is determined that the string contains a specified separator (that is, the comma ","). The string is split at the separator ",", giving the target telephone number strings to be identified "028-84876877" and "1380233318".
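Step S202 can be sketched as a single regular-expression split. The constant `SEPARATORS` and the function `pre_split` are illustrative names; the separator set follows the list given for this step, and the hyphen is deliberately excluded because it occurs inside numbers.

```python
import re

# Enumeration comma, comma, semicolon, slash, backslash, vertical bar.
SEPARATORS = "、,;/\\|"

def pre_split(s):
    """Split a raw string into candidate number strings at the separators."""
    pattern = "[" + re.escape(SEPARATORS) + "]"
    return [part.strip() for part in re.split(pattern, s) if part.strip()]
```

A string without any separator comes back as a one-element list, which corresponds to the case where no pre-splitting is performed.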
Step S204: remove the leading 86.
In this step, after the at least two target telephone number strings to be identified are obtained by splitting, it is determined for each of them whether its head carries a country code. If so, the country code at the head of the string is removed. Conversely, if the head of the target telephone number string to be identified carries no country code, no removal is performed.
For a target telephone number string to be identified that required no pre-splitting in the separator-based pre-splitting step, it is likewise determined whether its head carries a country code. If so, the country code at the head of the string is removed. Conversely, if the head of the target telephone number string to be identified carries no country code, no removal is performed.
Taking the Chinese country code 86 as an example, the common forms of 86 include +86, 086, 0086 and 86, and the embodiment of the present invention can judge from the remaining telephone digits whether a leading 86 is the Chinese country code. For example, for the target telephone number string to be identified "86-0877-70104577010457" in Table 1 above, the remaining digits indicate that 86 is the Chinese country code, so 86 is removed and the processed target telephone number string to be identified is "0877-70104577010457". The symbol "-" following 86 is removed as well.
Step S206: supplement and deduplicate the regional area code.
In this step, the target telephone number string to be identified from which the country code has been removed can be analyzed. If the analysis shows that the head of the string carries a regional area code and that area code is incomplete, the area code is supplemented so that it is complete. If the analysis shows that the head of the string carries a regional area code and that area code is repeated, the area code is deduplicated.
For a target telephone number string to be identified that required no pre-splitting in the separator-based pre-splitting step, or that required no removal in the country code identification and removal step, the string is likewise analyzed further. If the analysis shows that the head of the string carries a regional area code and that area code is incomplete, the area code is supplemented so that it is complete. If the analysis shows that the head of the string carries a regional area code and that area code is repeated, the area code is deduplicated.
For example, analysis of the target telephone number string to be identified "286990619869906199" in Table 1 above shows that its head carries a regional area code that is incomplete. The area code is therefore supplemented, giving the target telephone number string to be identified with a complete area code, "0286990619869906199".
As another example, analysis of the target telephone number string to be identified "07710771324579718602365784" in Table 1 above shows that its head carries a regional area code that is repeated. The area code is therefore deduplicated, giving the target telephone number string to be identified with the repeated area code removed, "0771324579718602365784".
Step S208: judge from the first 7 digits of the target telephone number string to be identified whether it is a mobile phone number; if not, continue to step S210; if so, continue to step S212.
In this step, a division rule conforming to the mobile phone number format is chosen to divide the target telephone number string to be identified, giving the number string whose first specified digit count is 7. It is judged whether this number string meets the attribute features of the first-category telephone number (that is, a mobile phone). If so, the number string is completed according to the attribute features of the first-category telephone number (that is, a mobile phone), giving the corresponding telephone number (that is, a mobile phone number).
Step S210: perform backward detection digit judgment.
In this step, if the number string whose first specified digit count is 7 does not meet the attribute features of the first-category telephone number (that is, a mobile phone) in step S208, a division rule conforming to the landline number format is chosen to divide the target telephone number string to be identified again, giving the number string whose second specified digit count is 3, 4 or 5. It is then judged whether this number string meets the attribute features of the second-category telephone number (that is, a landline telephone). If so, the number string is completed according to the attribute features of the second-category telephone number (that is, a landline telephone), giving the corresponding telephone number (that is, a landline number).
For example, in Table 2 above, after the target telephone number string to be identified "286990619869906199" is preprocessed, the target telephone number string to be identified "0286990619869906199" is obtained. Starting from the initial position, it is divided according to the division rule conforming to the mobile phone number format, giving the number string "0286990", whose first specified digit count is 7. This number string does not meet the attribute features of the first-category telephone number (that is, a mobile phone), so a division rule conforming to the landline number format is chosen to divide the target telephone number string to be identified again, giving the number string "028", whose second specified digit count is 3. The telephone number corresponding to "028" is recognized as a landline number, which is either "0286990619" with a 7-digit subscriber number or "02869906198" with an 8-digit subscriber number.
In the example above, the telephone number corresponding to the number string whose second specified digit count is 3, recognized from the target telephone number string to be identified "0286990619869906199", is a landline number, either "0286990619" with a 7-digit subscriber number or "02869906198" with an 8-digit subscriber number. To choose a suitable completion length and improve the recognition rate of telephone numbers, the embodiment of the present invention provides a backward detection digit judgment scheme for use when the number string of the second specified digit count is completed according to the attribute features of the second-category telephone number. That is, at least two detection digit counts can be determined according to the attribute features of the second-category telephone number, and each detection digit count is then used in turn to split the target telephone number string to be identified, giving a split result. Afterwards, according to the split results, the optimal detection digit count is chosen from the at least two detection digit counts to complete the number string of the second specified digit count.
Further, for each detection digit count, the part of the target telephone number string to be identified that follows the number string of the second specified digit count is split using that detection digit count, giving a first split number and a second split number. The first split number and the second split number are compared, and the number of positions at which the two carry the same digit is taken as the split result for that detection digit count. Then the numbers of matching positions for the detection digit counts are compared, the detection digit count with the largest number of matching positions is chosen as the optimal detection digit count, and the number string of the second specified digit count is completed with that many digits.
In the example above, the telephone number corresponding to the number string "028", whose second specified digit count is 3, is recognized as a landline number, either "0286990619" with a 7-digit subscriber number or "02869906198" with an 8-digit subscriber number. To choose a suitable completion length, two detection digit counts, 7 and 8, are determined.
For the detection digit count of 7, the part of the target telephone number string to be identified that follows the number string of the second specified digit count (that is, 6990619869906199) is split using this detection digit count, giving the first split number "6990619" and the second split number "8699061". The two carry the same digit at 1 position.
For the detection digit count of 8, the part of the target telephone number string to be identified that follows the number string of the second specified digit count (that is, 6990619869906199) is split using this detection digit count, giving the first split number "69906198" and the second split number "69906199". The two carry the same digit at 7 positions.
Then, of the detection digit counts 7 and 8, the one with the largest number of matching positions is chosen as the optimal detection digit count; that is, 8 is chosen. Completing the number string "028" of the second specified digit count with the optimal detection digit count gives the landline number "02869906198". This calculation is chosen because two landline or mobile numbers appearing in the same telephone unit are highly similar.
Step S212: judge whether the telephone number is erroneous; if not, continue to step S214; if so, end this flow.
In this step, it can be judged whether the telephone number corresponding to the number string whose first specified digit count is 7 is accurate, for example whether digits are missing or whether it is a vacant number. It can likewise be judged whether the telephone number obtained by the backward detection digit judgment in step S210 is accurate.
Step S214: output the telephone number.
Step S216: judge whether the length of the remaining telephone number string is greater than 0; if so, return to step S204; if not, end this flow.
In the embodiments of the present invention, the preprocessing operations related to telephone number format are first performed on the target telephone number string to be identified (in turn: pre-splitting by separator, identification and removal of the country code, and supplementing and deduplication of the regional area code), so that the target telephone number string to be identified obtained after preprocessing conforms to the telephone number format. Telephone numbers are then recognized on the basis of the preprocessed string, which improves the recognition rate of telephone numbers. Further, the embodiments of the present invention take into account the features of the different categories of telephone number (landline telephone and mobile phone): the target telephone number string to be identified is divided using the division rule of the telephone number format corresponding to each category, and the category of the corresponding telephone number is recognized from the number string of the first specified digit count obtained by the division, so that telephone numbers of different categories are recognized effectively. Further, the embodiments of the present invention exploit the fact that two landline or mobile numbers in the same telephone unit are highly similar, and use the backward detection digit judgment scheme to detect and recognize the target telephone number string to be identified, which further improves the accuracy of telephone number recognition. Further, the embodiments of the present invention recognize the remaining telephone number string recursively, until the remaining telephone number strings have all been recognized.
Based on the same inventive concept as the telephone number recognition methods provided by the embodiments above, the embodiments of the present invention also provide a telephone number recognition device. Fig. 3 shows a schematic structural diagram of the telephone number recognition device according to an embodiment of the present invention. As shown in Fig. 3, the device can include at least a division module 310, a judging module 320, a determining module 330, a splitting module 340 and a completion module 350.
The functions of the components of the telephone number recognition device of the embodiment of the present invention, and the connections between them, are now described:
The division module 310 is adapted to divide, starting from the initial position, the target telephone number string to be identified according to a division rule conforming to the telephone number format, giving the number string of the first specified digit count;
The judging module 320 is coupled to the division module 310 and is adapted to judge whether the number string of the first specified digit count meets the attribute features of the first-category telephone number;
The determining module 330 is coupled to the judging module 320 and is adapted to determine at least two detection digit counts according to the attribute features of the first-category telephone number if the judging module judges that the number string of the first specified digit count meets those attribute features;
The splitting module 340 is coupled to the determining module 330 and is adapted to split the target telephone number string to be identified using each detection digit count in turn, giving a split result;
The completion module 350 is coupled to the splitting module 340 and is adapted to choose, according to the split result, the optimal detection digit count from the at least two detection digit counts and to complete the number string of the first specified digit count with it.
In an embodiment of the present invention, the segmentation module 340 is further adapted to:
For each detection digit count, use that detection digit count to segment the part of the target telephone number string to be identified that follows the number string of the first specified number of digits, to obtain a first segmented number and a second segmented number;
Compare the first segmented number with the second segmented number and determine the number of positions at which the two carry the same digit, as the segmentation result corresponding to that detection digit count.
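The comparison step can be illustrated with a minimal sketch. It assumes, as one possible reading of the text, that the remainder of the string is cut into two consecutive pieces of the detection digit count each; the function name and the sample digits are hypothetical and do not come from the patent.

```python
def count_matching_positions(rest: str, detection_digits: int) -> int:
    """Cut `rest` into two consecutive pieces of `detection_digits` digits
    and count the positions at which both pieces carry the same digit.

    Assumption: `rest` is the part of the string that follows the number
    string of the first specified number of digits.
    """
    first = rest[:detection_digits]
    second = rest[detection_digits:2 * detection_digits]
    # zip stops at the shorter piece, so a short tail is simply ignored.
    return sum(1 for a, b in zip(first, second) if a == b)


# Hypothetical remainder: two 8-digit local numbers written back to back.
rest = "8765432187654322"
print(count_matching_positions(rest, 8))
print(count_matching_positions(rest, 7))
```

With this sample, cutting at 8 digits aligns the two numbers and most positions agree, while cutting at 7 digits misaligns them, which is the signal the segmentation result is meant to capture.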
In an embodiment of the present invention, the completion module 350 is further adapted to:
Compare the numbers of matching positions corresponding to the detection digit counts;
Select, from the detection digit counts, the one with the largest number of matching positions as the optimal detection digit count;
Complete the number string of the first specified number of digits with the optimal detection digit count.
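A minimal sketch of the selection and completion steps follows. It assumes that "completing" means appending the first segmented number, cut with the optimal detection digit count, to the number string of the first specified number of digits. The names, the prefix length of 4, and the candidate counts 7 and 8 are illustrative assumptions only.

```python
from typing import Sequence


def matching_positions(first: str, second: str) -> int:
    """Number of positions at which the two strings carry the same digit."""
    return sum(1 for a, b in zip(first, second) if a == b)


def complete_prefix(number_string: str, prefix_len: int,
                    candidates: Sequence[int]) -> str:
    """Pick the candidate detection digit count with the most matching
    positions and append that many digits to the prefix.

    Assumption: on a tie, the first candidate in `candidates` wins.
    """
    prefix, rest = number_string[:prefix_len], number_string[prefix_len:]
    best = max(candidates,
               key=lambda k: matching_positions(rest[:k], rest[k:2 * k]))
    return prefix + rest[:best]


# Hypothetical input: a 4-digit prefix followed by two similar 8-digit numbers.
print(complete_prefix("05718765432187654322", 4, [7, 8]))
```

The tie-breaking rule is a design choice of this sketch; the source does not say how equal counts are resolved.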
In an embodiment of the present invention, the dividing module 310 is further adapted to, if the judging module judges that the number string of the first specified number of digits does not match the attribute features of the first category of telephone number, select a new division rule that conforms to a telephone number format and divide the target telephone number string to be identified again, to obtain a number string of a second specified number of digits;
The judging module 320 is further adapted to judge whether the number string of the second specified number of digits matches the attribute features of a second category of telephone number;
The completion module 350 is further adapted to, if the judging module judges that the number string of the second specified number of digits matches the attribute features of the second category of telephone number, complete the number string of the second specified number of digits according to those attribute features.
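The fallback from one division rule to the next can be sketched as a loop over an ordered rule table. Everything in the table below is an assumption made for illustration: the category names, the digit counts, the attribute checks, and which category plays the role of the "first" one are not given in this passage.

```python
from typing import Callable, List, Optional, Tuple

# (category name, number of leading digits to take, attribute check).
# All three entries of every rule are hypothetical.
Rule = Tuple[str, int, Callable[[str], bool]]

RULES: List[Rule] = [
    ("landline", 3, lambda head: head.startswith("0")),
    ("mobile", 11, lambda head: head.startswith("1")),
]


def divide_and_classify(number_string: str) -> Optional[Tuple[str, str]]:
    """Try each division rule in order; return (category, divided head)
    for the first rule whose attribute check passes, else None."""
    for category, length, matches in RULES:
        head = number_string[:length]
        if len(head) == length and head.isdigit() and matches(head):
            return category, head
    return None


print(divide_and_classify("01087654321"))
print(divide_and_classify("13812345678"))
```

Keeping the rules in a table makes the "choose a new division rule and divide again" step a matter of moving to the next entry.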
In an embodiment of the present invention, the dividing module 310 includes:
A preprocessing unit, adapted to perform a preprocessing operation related to the telephone number format on the target telephone number string to be identified, to obtain a processed target telephone number string to be identified;
A dividing unit, adapted to divide, starting from the start position, the processed target telephone number string to be identified according to the division rule that conforms to the telephone number format.
In an embodiment of the present invention, the preprocessing unit is further adapted to:
Determine whether the target telephone number string to be identified contains a specified separator;
If the target telephone number string to be identified contains a specified separator, split the original telephone number string to be identified at the separator, to obtain at least two target telephone number strings to be identified.
In an embodiment of the present invention, the specified separator includes at least one of the following: enumeration comma, comma, semicolon, slash, backslash, vertical bar.
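The separator-based splitting can be sketched with a regular expression. The character class below covers the separators listed above; including the full-width comma and semicolon is an assumption of this sketch, and the function name and sample input are hypothetical.

```python
import re
from typing import List

# Enumeration comma, comma, semicolon, slash, backslash, vertical bar.
# The full-width comma and semicolon are added as an assumption.
SEPARATORS = re.compile(r"[、,，;；/\\|]")


def split_on_separators(raw: str) -> List[str]:
    """Split a raw telephone number string at any specified separator
    and drop empty pieces. A string without separators is returned as
    a single-element list."""
    parts = [part.strip() for part in SEPARATORS.split(raw)]
    return [part for part in parts if part]


print(split_on_separators("010-87654321、010-87654322/13800000000"))
```

The hyphen is deliberately not treated as a separator here, since it commonly appears inside a single number between the area code and the local number.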
In an embodiment of the present invention, the preprocessing unit is further adapted to:
After the at least two target telephone number strings to be identified have been obtained by splitting, determine, for each target telephone number string to be identified, whether its head carries a country code;
If so, remove the country code from the head of the target telephone number string to be identified.
In an embodiment of the present invention, the preprocessing unit is further adapted to:
After the country code has been removed from the head of the target telephone number string to be identified, analyze the target telephone number string to be identified from which the country code has been removed;
If the head of the target telephone number string to be identified carries an area code and the area code is incomplete, supplement the area code to make it complete;
If the head of the target telephone number string to be identified carries an area code and the area code is repeated, deduplicate the area code.
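Two of these preprocessing steps, removing the country code and deduplicating a repeated area code, can be sketched as follows. The sketch assumes the Chinese country code 86 written as "+86" or "0086", and assumes the area code is already known to the caller; neither detail is specified in this passage. Supplementing an incomplete area code is not shown.

```python
import re

# Assumption: the country code is China's, written as "+86" or "0086",
# optionally followed by a hyphen or a space.
COUNTRY_CODE = re.compile(r"^(?:\+|00)86[-\s]?")


def strip_country_code(number_string: str) -> str:
    """Remove a leading country code if one is present."""
    return COUNTRY_CODE.sub("", number_string)


def dedupe_area_code(number_string: str, area_code: str) -> str:
    """Collapse a repeated leading area code into a single occurrence.

    Caveat: a local number that happens to begin with the same digits as
    the area code would be shortened too, so a real implementation would
    need an additional length check.
    """
    while number_string.startswith(area_code + area_code):
        number_string = number_string[len(area_code):]
    return number_string


print(strip_country_code("+86-010-87654321"))
print(dedupe_area_code("01001087654321", "010"))
```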
In an embodiment of the present invention, as shown in Fig. 4, the device shown in Fig. 3 may further include an acquisition module 360, coupled to the dividing module 310 and adapted to obtain the target telephone number string to be identified through the following steps:
Obtaining a point of interest (POI) from a web page;
Extracting the target telephone number string to be identified from the POI.
In an embodiment of the present invention, as shown in Fig. 4, the device shown in Fig. 3 may further include:
A recursion module 370, coupled to the completion module 350 and adapted to, if a remaining telephone number string to be identified exists, trigger the preprocessing unit to perform the preprocessing operation again, the dividing module to perform the division operation again, the judging module to perform the judgment operation again, the determining module to perform the determination operation again, the segmentation module to perform the segmentation operation again, and the completion module to perform the completion operation again, until all remaining telephone number strings to be identified have been identified.
According to any one of the preferred embodiments above, or a combination of several of them, the embodiments of the present invention can achieve the following beneficial effects:
In the embodiments of the present invention, the target telephone number string to be identified is divided, starting from the start position, according to a division rule that conforms to a telephone number format. That is, taking into account the features of different categories of telephone number (such as landline or mobile numbers), the target telephone number string to be identified is divided using the division rule of the telephone number format corresponding to each category, and the number string of the first specified number of digits obtained by the division is used to identify the category of the corresponding telephone number, so that different categories of telephone number are identified effectively. Further, the embodiments of the present invention exploit the fact that two landline or mobile numbers belonging to the same unit are highly similar: at least two detection digit counts are determined according to the attribute features of the first category of telephone number, and the target telephone number string to be identified is then checked and identified with a scheme that judges by detecting backward with each detection digit count, which further improves the accuracy of telephone number recognition.
In addition, before dividing the target telephone number string to be identified according to the division rule that conforms to the telephone number format, the embodiments of the present invention may also perform a preprocessing operation related to the telephone number format on the target telephone number string to be identified, so that the preprocessed string is consistent with the telephone number format. This facilitates subsequent telephone number recognition based on the preprocessed string and improves the recognition rate.
Numerous specific details are set forth in the description provided here. It is to be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail, so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include some features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any one of the claimed embodiments may be used in any combination.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the telephone number recognition device according to embodiments of the present invention. The invention may also be implemented as device or apparatus programs (for example, computer programs and computer program products) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.
Thus, those skilled in the art will recognize that, although a number of exemplary embodiments of the invention have been shown and described in detail herein, many other variations or modifications consistent with the principles of the invention can be directly determined or derived from the present disclosure without departing from the spirit and scope of the invention. Accordingly, the scope of the invention should be understood and deemed to cover all such other variations or modifications.
The embodiments of the present invention also disclose: A1, a method for recognizing a telephone number, including:
Starting from a start position, dividing a target telephone number string to be identified according to a division rule that conforms to a telephone number format, to obtain a number string of a first specified number of digits;
Judging whether the number string of the first specified number of digits matches the attribute features of a first category of telephone number;
If so, determining at least two detection digit counts according to the attribute features of the first category of telephone number;
Segmenting the target telephone number string to be identified using each detection digit count separately, to obtain segmentation results;
According to the segmentation results, selecting an optimal detection digit count from the at least two detection digit counts and using it to complete the number string of the first specified number of digits.
A2, the method according to A1, wherein segmenting the target telephone number string to be identified using each detection digit count separately, to obtain segmentation results, includes:
For each detection digit count, using that detection digit count to segment the part of the target telephone number string to be identified that follows the number string of the first specified number of digits, to obtain a first segmented number and a second segmented number;
Comparing the first segmented number with the second segmented number and determining the number of positions at which the two carry the same digit, as the segmentation result corresponding to that detection digit count.
A3, the method according to A1 or A2, wherein selecting, according to the segmentation results, an optimal detection digit count from the at least two detection digit counts and using it to complete the number string of the first specified number of digits includes:
Comparing the numbers of matching positions corresponding to the detection digit counts;
Selecting, from the detection digit counts, the one with the largest number of matching positions as the optimal detection digit count;
Completing the number string of the first specified number of digits with the optimal detection digit count.
A4, the method according to any one of A1-A3, wherein, after judging whether the number string of the first specified number of digits matches the attribute features of the first category of telephone number, the method further includes:
If the number string of the first specified number of digits does not match the attribute features of the first category of telephone number, selecting a new division rule that conforms to a telephone number format and dividing the target telephone number string to be identified again, to obtain a number string of a second specified number of digits;
Judging whether the number string of the second specified number of digits matches the attribute features of a second category of telephone number;
If so, completing the number string of the second specified number of digits according to the attribute features of the second category of telephone number.
A5, the method according to any one of A1-A4, wherein dividing, starting from the start position, the target telephone number string to be identified according to the division rule that conforms to the telephone number format includes:
Performing a preprocessing operation related to the telephone number format on the target telephone number string to be identified, to obtain a processed target telephone number string to be identified;
Starting from the start position, dividing the processed target telephone number string to be identified according to the division rule that conforms to the telephone number format.
A6, the method according to any one of A1-A5, wherein performing the preprocessing operation related to the telephone number format on the target telephone number string to be identified, to obtain the processed target telephone number string to be identified, includes:
Determining whether the target telephone number string to be identified contains a specified separator;
If the target telephone number string to be identified contains a specified separator, splitting the target telephone number string to be identified at the separator, to obtain at least two target telephone number strings to be identified.
A7, the method according to any one of A1-A6, wherein the specified separator includes at least one of the following: enumeration comma, comma, semicolon, slash, backslash, vertical bar.
A8, the method according to any one of A1-A7, wherein, after the at least two target telephone number strings to be identified have been obtained by splitting, the method further includes:
For each target telephone number string to be identified, determining whether its head carries a country code;
If so, removing the country code from the head of the target telephone number string to be identified.
A9, the method according to any one of A1-A8, wherein, after the country code has been removed from the head of the target telephone number string to be identified, the method further includes:
Analyzing the target telephone number string to be identified from which the country code has been removed;
If the head of the target telephone number string to be identified carries an area code and the area code is incomplete, supplementing the area code to make it complete;
If the head of the target telephone number string to be identified carries an area code and the area code is repeated, deduplicating the area code.
A10, the method according to any one of A1-A9, wherein the target telephone number string to be identified is obtained through the following steps:
Obtaining a point of interest (POI) from a web page;
Extracting the target telephone number string to be identified from the POI.
A11, the method according to any one of A1-A10, wherein, after the number string of the first specified number of digits or of the second specified number of digits has been completed, the method further includes:
If a remaining telephone number string to be identified exists, performing the preprocessing operation, division operation, judgment operation, determination operation, segmentation operation, and completion operation again, until all remaining telephone number strings to be identified have been identified.
B12, a device for recognizing a telephone number, including:
A dividing module, adapted to divide, starting from a start position, a target telephone number string to be identified according to a division rule that conforms to a telephone number format, to obtain a number string of a first specified number of digits;
A judging module, adapted to judge whether the number string of the first specified number of digits matches the attribute features of a first category of telephone number;
A determining module, adapted to determine at least two detection digit counts according to the attribute features of the first category of telephone number, if the judging module judges that the number string of the first specified number of digits matches those attribute features;
A segmentation module, adapted to segment the target telephone number string to be identified using each detection digit count separately, to obtain segmentation results;
A completion module, adapted to select, according to the segmentation results, an optimal detection digit count from the at least two detection digit counts and use it to complete the number string of the first specified number of digits.
B13, the device according to B12, wherein the segmentation module is further adapted to:
For each detection digit count, use that detection digit count to segment the part of the target telephone number string to be identified that follows the number string of the first specified number of digits, to obtain a first segmented number and a second segmented number;
Compare the first segmented number with the second segmented number and determine the number of positions at which the two carry the same digit, as the segmentation result corresponding to that detection digit count.
B14, the device according to B12 or B13, wherein the completion module is further adapted to:
Compare the numbers of matching positions corresponding to the detection digit counts;
Select, from the detection digit counts, the one with the largest number of matching positions as the optimal detection digit count;
Complete the number string of the first specified number of digits with the optimal detection digit count.
B15, the device according to any one of B12-B14, wherein:
The dividing module is further adapted to, if the judging module judges that the number string of the first specified number of digits does not match the attribute features of the first category of telephone number, select a new division rule that conforms to a telephone number format and divide the target telephone number string to be identified again, to obtain a number string of a second specified number of digits;
The judging module is further adapted to judge whether the number string of the second specified number of digits matches the attribute features of a second category of telephone number;
The completion module is further adapted to, if the judging module judges that the number string of the second specified number of digits matches the attribute features of the second category of telephone number, complete the number string of the second specified number of digits according to those attribute features.
B16, the device according to any one of B12-B15, wherein the dividing module includes:
A preprocessing unit, adapted to perform a preprocessing operation related to the telephone number format on the target telephone number string to be identified, to obtain a processed target telephone number string to be identified;
A dividing unit, adapted to divide, starting from the start position, the processed target telephone number string to be identified according to the division rule that conforms to the telephone number format.
B17, the device according to any one of B12-B16, wherein the preprocessing unit is further adapted to:
Determine whether the target telephone number string to be identified contains a specified separator;
If the target telephone number string to be identified contains a specified separator, split the original telephone number string to be identified at the separator, to obtain at least two target telephone number strings to be identified.
B18, the device according to any one of B12-B17, wherein the specified separator includes at least one of the following: enumeration comma, comma, semicolon, slash, backslash, vertical bar.
B19, the device according to any one of B12-B18, wherein the preprocessing unit is further adapted to:
After the at least two target telephone number strings to be identified have been obtained by splitting, determine, for each target telephone number string to be identified, whether its head carries a country code;
If so, remove the country code from the head of the target telephone number string to be identified.
B20, the device according to any one of B12-B19, wherein the preprocessing unit is further adapted to:
After the country code has been removed from the head of the target telephone number string to be identified, analyze the target telephone number string to be identified from which the country code has been removed;
If the head of the target telephone number string to be identified carries an area code and the area code is incomplete, supplement the area code to make it complete;
If the head of the target telephone number string to be identified carries an area code and the area code is repeated, deduplicate the area code.
B21, the device according to any one of B12-B20, further including an acquisition module, adapted to obtain the target telephone number string to be identified through the following steps:
Obtaining a point of interest (POI) from a web page;
Extracting the target telephone number string to be identified from the POI.
B22, the device according to any one of B12-B21, further including:
A recursion module, adapted to, if a remaining telephone number string to be identified exists, trigger the preprocessing unit to perform the preprocessing operation again, the dividing module to perform the division operation again, the judging module to perform the judgment operation again, the determining module to perform the determination operation again, the segmentation module to perform the segmentation operation again, and the completion module to perform the completion operation again, until all remaining telephone number strings to be identified have been identified.
Claims (22)
1. A method for recognizing a telephone number, comprising:
Starting from a start position, dividing a target telephone number string to be identified according to a division rule that conforms to a telephone number format, to obtain a number string of a first specified number of digits;
Judging whether the number string of the first specified number of digits matches the attribute features of a first category of telephone number;
If so, determining at least two detection digit counts according to the attribute features of the first category of telephone number;
Segmenting the target telephone number string to be identified using each detection digit count separately, to obtain segmentation results;
According to the segmentation results, selecting an optimal detection digit count from the at least two detection digit counts and using it to complete the number string of the first specified number of digits.
2. The method according to claim 1, wherein segmenting the target telephone number string to be identified using each detection digit count separately, to obtain segmentation results, comprises:
For each detection digit count, using that detection digit count to segment the part of the target telephone number string to be identified that follows the number string of the first specified number of digits, to obtain a first segmented number and a second segmented number;
Comparing the first segmented number with the second segmented number and determining the number of positions at which the two carry the same digit, as the segmentation result corresponding to that detection digit count.
3. The method according to claim 1 or 2, wherein selecting, according to the segmentation results, an optimal detection digit count from the at least two detection digit counts and using it to complete the number string of the first specified number of digits comprises:
Comparing the numbers of matching positions corresponding to the detection digit counts;
Selecting, from the detection digit counts, the one with the largest number of matching positions as the optimal detection digit count;
Completing the number string of the first specified number of digits with the optimal detection digit count.
4. The method according to claim 1 or 2, wherein, after judging whether the number string of the first specified number of digits matches the attribute features of the first category of telephone number, the method further comprises:
If the number string of the first specified number of digits does not match the attribute features of the first category of telephone number, selecting a new division rule that conforms to a telephone number format and dividing the target telephone number string to be identified again, to obtain a number string of a second specified number of digits;
Judging whether the number string of the second specified number of digits matches the attribute features of a second category of telephone number;
If so, completing the number string of the second specified number of digits according to the attribute features of the second category of telephone number.
5. The method according to claim 1 or 2, wherein dividing, starting from the start position, the target telephone number string to be identified according to the division rule that conforms to the telephone number format comprises:
Performing a preprocessing operation related to the telephone number format on the target telephone number string to be identified, to obtain a processed target telephone number string to be identified;
Starting from the start position, dividing the processed target telephone number string to be identified according to the division rule that conforms to the telephone number format.
6. The method according to claim 5, wherein performing the preprocessing operation related to the telephone number format on the target telephone number string to be identified, to obtain the processed target telephone number string to be identified, comprises:
Determining whether the target telephone number string to be identified contains a specified separator;
If the target telephone number string to be identified contains a specified separator, splitting the target telephone number string to be identified at the separator, to obtain at least two target telephone number strings to be identified.
7. The method according to claim 6, wherein the specified separator includes at least one of the following: enumeration comma, comma, semicolon, slash, backslash, vertical bar.
8. The method according to claim 6 or 7, wherein, after the at least two target telephone number strings to be identified have been obtained by splitting, the method further comprises:
For each target telephone number string to be identified, determining whether its head carries a country code;
If so, removing the country code from the head of the target telephone number string to be identified.
9. The method according to claim 8, wherein, after the country code has been removed from the head of the target telephone number string to be identified, the method further comprises:
Analyzing the target telephone number string to be identified from which the country code has been removed;
If the head of the target telephone number string to be identified carries an area code and the area code is incomplete, supplementing the area code to make it complete;
If the head of the target telephone number string to be identified carries an area code and the area code is repeated, deduplicating the area code.
10. The method according to claim 1 or 2, wherein the target telephone number string to be identified is obtained through the following steps:
Obtaining a point of interest (POI) from a web page;
Extracting the target telephone number string to be identified from the POI.
11. The method according to claim 4, wherein, after the number string of the first specified number of digits or of the second specified number of digits has been completed, the method further comprises:
If a remaining telephone number string to be identified exists, performing the preprocessing operation, division operation, judgment operation, determination operation, segmentation operation, and completion operation again, until all remaining telephone number strings to be identified have been identified.
12. A device for recognizing a telephone number, comprising:
A dividing module, adapted to divide, starting from a start position, a target telephone number string to be identified according to a division rule that conforms to a telephone number format, to obtain a number string of a first specified number of digits;
A judging module, adapted to judge whether the number string of the first specified number of digits matches the attribute features of a first category of telephone number;
A determining module, adapted to determine at least two detection digit counts according to the attribute features of the first category of telephone number, if the judging module judges that the number string of the first specified number of digits matches those attribute features;
A segmentation module, adapted to segment the target telephone number string to be identified using each detection digit count separately, to obtain segmentation results;
A completion module, adapted to select, according to the segmentation results, an optimal detection digit count from the at least two detection digit counts and use it to complete the number string of the first specified number of digits.
13. The device according to claim 12, wherein the cutting module is further adapted to:
For each detection digit count, use that detection digit count to cut the portion of the target telephone number string to be identified that follows the number string of the first specified number of digits, to obtain a first cut number and a second cut number;
Compare the first cut number with the second cut number and determine the number of corresponding positions at which the two carry identical digits, as the cutting result corresponding to that detection digit count.
14. The device according to claim 12 or 13, wherein the completion module is further adapted to:
Compare the counts of identical-digit positions corresponding to the respective detection digit counts;
Select, from the detection digit counts, the one with the largest count of identical-digit positions as the optimal detection digit count;
Complete the number string of the first specified number of digits by the optimal detection digit count.
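The cutting and selection steps of claims 13 and 14 can be sketched as follows. This is one reading of the claims, not a definitive implementation: it assumes the leading number string is a regional area code, that the candidate detection digit counts are local-number lengths such as 7 and 8, and that "completing" means appending the first cut number of the chosen length to the leading string. All function names are hypothetical.

```python
def count_matching_positions(a: str, b: str) -> int:
    """Count positions at which two cut numbers carry the same digit."""
    return sum(x == y for x, y in zip(a, b))


def choose_detection_length(rest: str, candidates: list[int]) -> int:
    """Pick the candidate length whose first two cuts of `rest` agree most."""
    best_len, best_score = candidates[0], -1
    for n in candidates:
        # Cut the remainder into a first and a second number of n digits each.
        first, second = rest[:n], rest[n:2 * n]
        score = count_matching_positions(first, second)
        if score > best_score:
            best_len, best_score = n, score
    return best_len


def complete_first_number(digits: str, head_len: int, candidates: list[int]) -> str:
    """Append the best-fitting local number to the leading `head_len` digits."""
    head, rest = digits[:head_len], digits[head_len:]
    n = choose_detection_length(rest, candidates)
    return head + rest[:n]
```

For the concatenated string "0106278123462781235" with a 3-digit head and candidates 7 and 8, cutting by 7 gives "6278123" and "4627812" (no matching positions), while cutting by 8 gives "62781234" and "62781235" (seven matching positions), so 8 is chosen. The heuristic rests on the observation that consecutive numbers listed for one place often share a long common prefix.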
15. The device according to claim 12 or 13, wherein:
The division module is further adapted to, if the judging module judges that the number string of the first specified number of digits does not meet the attribute feature of the first category of telephone number, select a new division rule that conforms to a telephone number format and divide the target telephone number string to be identified again, to obtain a number string of a second specified number of digits;
The judging module is further adapted to judge whether the number string of the second specified number of digits meets an attribute feature of a second category of telephone number;
The completion module is further adapted to, if the judging module judges that the number string of the second specified number of digits meets the attribute feature of the second category of telephone number, complete the number string of the second specified number of digits according to the attribute feature of the second category of telephone number.
16. The device according to claim 12 or 13, wherein the division module comprises:
A preprocessing unit, adapted to perform preprocessing operations related to the telephone number format on the target telephone number string to be identified, to obtain a processed target telephone number string to be identified;
A division unit, adapted to divide the processed target telephone number string to be identified, starting from its initial position, according to a division rule that conforms to a telephone number format.
17. The device according to claim 16, wherein the preprocessing unit is further adapted to:
Determine whether the target telephone number string to be identified contains a specified separator;
If the target telephone number string to be identified contains a specified separator, cut the target telephone number string to be identified at the separator, to obtain at least two cut target telephone number strings to be identified.
18. The device according to claim 17, wherein the specified separator includes at least one of the following: an enumeration comma (、), a comma, a semicolon, a slash, a backslash, and a vertical bar.
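The separator-based cutting of claims 17 and 18 can be sketched with a regular expression. The character class below covers the separators listed in the claim; including the full-width comma and semicolon, and discarding empty fragments, are added assumptions. The names `SEPARATORS` and `split_on_separators` are hypothetical.

```python
import re

# Enumeration comma, comma, semicolon, slash, backslash and vertical bar.
# The full-width comma and semicolon are an assumption beyond the claim text.
SEPARATORS = re.compile(r"[、,，;；/\\|]")


def split_on_separators(raw: str) -> list[str]:
    """Cut a raw phone field into candidate number strings at the separators."""
    return [part.strip() for part in SEPARATORS.split(raw) if part.strip()]
```

Hyphens are deliberately left out of the separator set here, since they commonly join an area code to a local number rather than separate two numbers.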
19. The device according to claim 17 or 18, wherein the preprocessing unit is further adapted to:
After obtaining the at least two cut target telephone number strings to be identified, determine, for each target telephone number string to be identified, whether its head has a national area code;
If so, remove the national area code at the head of the target telephone number string to be identified.
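Removing a national area code from the head of a string might look like the sketch below. The claim does not say which national area codes are handled or how they are written; this example assumes China's code 86 written as "+86" or "0086", optionally followed by spaces or hyphens. The names `COUNTRY_PREFIX` and `strip_country_code` are hypothetical.

```python
import re

# Assumed spellings of the national area code: "+86" or "0086", then optional
# spaces or hyphens. A bare "86" is not matched because it is ambiguous.
COUNTRY_PREFIX = re.compile(r"^\s*(?:\+|00)86[\s-]*")


def strip_country_code(number: str) -> str:
    """Remove a leading national area code, if present."""
    return COUNTRY_PREFIX.sub("", number, count=1)
```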
20. The device according to claim 19, wherein the preprocessing unit is further adapted to:
After the national area code at the head of the target telephone number string to be identified is removed, analyze the target telephone number string to be identified from which the national area code has been removed;
If the head of the target telephone number string to be identified has a regional area code and the regional area code is incomplete, supplement the regional area code to make it complete;
If the head of the target telephone number string to be identified has a regional area code and the regional area code is repeated, de-duplicate the regional area code.
21. The device according to claim 12 or 13, further comprising an acquisition module, adapted to obtain the target telephone number string to be identified by the following steps:
Obtaining a point of interest (POI) from a web page;
Extracting the target telephone number string to be identified from the POI.
22. The device according to claim 16, further comprising:
A recursion module, adapted to, if remaining telephone number strings to be identified exist, trigger the preprocessing unit to perform the preprocessing operation again, the division module to perform the division operation again, the judging module to perform the judging operation again, the determining module to perform the determining operation again, the cutting module to perform the cutting operation again and the completion module to perform the completion operation again, until all remaining telephone number strings to be identified have been identified.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510643027.7A CN105227737B (en) | 2015-09-30 | 2015-09-30 | The recognition methods of telephone number and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510643027.7A CN105227737B (en) | 2015-09-30 | 2015-09-30 | The recognition methods of telephone number and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105227737A | 2016-01-06 |
CN105227737B | 2018-01-05 |
Family
ID=54996405
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510643027.7A (Active; CN105227737B) | The recognition methods of telephone number and device | 2015-09-30 | 2015-09-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105227737B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109246623B (en) * | 2018-08-31 | 2020-05-22 | 长沙炫笔记通信科技有限公司 | Communication number completion method, device and storage medium |
CN111866207B (en) * | 2020-06-29 | 2022-11-22 | 厦门亿联网络技术股份有限公司 | Audio and video conference system number distribution method and system |
CN112003988A (en) * | 2020-08-05 | 2020-11-27 | 云南电网有限责任公司红河供电局 | Device and method for identifying number accuracy |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102088697A (en) * | 2010-12-17 | 2011-06-08 | 北京华中融合科技有限公司 | Method and system for processing spam |
CN104731977A (en) * | 2015-04-14 | 2015-06-24 | 海量云图(北京)数据技术有限公司 | Phone number data search and classification method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4163138B2 (en) * | 2004-04-05 | 2008-10-08 | 松下電器産業株式会社 | Mobile phone equipment |
Also Published As
Publication number | Publication date |
---|---|
CN105227737A (en) | 2016-01-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102021057B1 (en) | Apparatus and method for extracting paragraph in document | |
CN103902702B (en) | A kind of data-storage system and storage method | |
CN105227737B (en) | The recognition methods of telephone number and device | |
CN108491388B (en) | Data set acquisition method, classification method, device, equipment and storage medium | |
RU2016107443A (en) | METHOD AND DEVICE FOR RECOMMENDING REFERENCE DOCUMENTS | |
CN103559313B (en) | Searching method and device | |
CN105095381B (en) | New word identification method and device | |
CN108170293A (en) | Input the personalized recommendation method and device of association | |
Termritthikun et al. | NU-InNet: Thai food image recognition using convolutional neural networks on smartphone | |
CN109147769B (en) | Language identification method, language identification device, translation machine, medium and equipment | |
CN104778159B (en) | Word segmenting method and device based on word weights | |
WO2016034062A1 (en) | Information lookup method and device | |
CN105260440B (en) | Identify the method and device of telephone number | |
CN112364014A (en) | Data query method, device, server and storage medium | |
CN108780047A (en) | The detection method and relevant apparatus and computer readable storage medium of material composition | |
CN105187600B (en) | Recognition methods based on recursive telephone number and device | |
CN107608965B (en) | Extracting method, electronic equipment and the storage medium of books the names of protagonists | |
CN109670153A (en) | A kind of determination method, apparatus, storage medium and the terminal of similar model | |
CN106569734B (en) | The restorative procedure and device that memory overflows when data are shuffled | |
JP2010020530A (en) | Document classification providing device, document classification providing method and program | |
CN104317903B (en) | The recognition methods of the chapters and sections integrality of chapters and sections formula text and device | |
CN106919601B (en) | Method and device for extracting interest points from query words | |
CN109033210A (en) | A kind of method and apparatus for excavating map point of interest POI | |
CN107577667A (en) | A kind of entity word treating method and apparatus | |
CN108154177B (en) | Service identification method, device, terminal equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 2022-07-15. Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015. Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd. Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park). Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.; Qizhi software (Beijing) Co.,Ltd. |