CN105227737B - Method and device for recognizing telephone numbers - Google Patents

Method and device for recognizing telephone numbers

Info

Publication number
CN105227737B
Authority
CN
China
Prior art keywords
telephone number
identified
digit
strings
target
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510643027.7A
Other languages
Chinese (zh)
Other versions
CN105227737A (en)
Inventor
马健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201510643027.7A
Publication of CN105227737A
Application granted
Publication of CN105227737B

Abstract

The invention provides a method and a device for recognizing telephone numbers. The method includes: starting from an initial position, dividing a target telephone number string to be identified according to a division rule that conforms to a telephone number format, to obtain a first number string of a specified number of digits; determining whether the first number string of the specified number of digits matches the attribute features of a first category of telephone number; if so, determining at least two detection digit counts according to the attribute features of the first category of telephone number; segmenting the target telephone number string to be identified with each detection digit count, to obtain segmentation results; and, according to the segmentation results, selecting the optimal detection digit count from the at least two detection digit counts and using it to complete the first number string of the specified number of digits. By probing the digits that follow with each detection digit count, the embodiments of the invention detect and recognize the target telephone number string to be identified and improve the accuracy of telephone number recognition.

Description

Method and device for recognizing telephone numbers
Technical field
The present invention relates to the technical field of Internet applications, and in particular to a method and a device for recognizing telephone numbers.
Background
A POI (Point of Interest) is the cornerstone of the entire map navigation industry, and with the arrival of the mobile Internet era map information data has become even more indispensable. The vast number of web pages contains a large amount of POI information. Each POI includes information such as a name, an address, latitude and longitude, and a telephone number, and the data quality of POIs varies widely from one web page to another. Because the telephone is an important way of contacting a point of interest, the accuracy of the telephone number is an important indicator of POI data quality.
These web pages contain hundreds of millions of POI records, and telephone numbers are presented in complex and varied ways: a single POI may include several landline or mobile numbers that are interleaved and run together. In addition, POIs extracted from the Internet may contain a large amount of erroneous data, telephone numbers included, and a wrong telephone number harms the user experience in applications. How to accurately recognize the telephone numbers in web-page POIs has therefore become a technical problem that urgently needs to be solved.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a method for recognizing telephone numbers, and a corresponding device, that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a method for recognizing telephone numbers is provided, including:
starting from an initial position, dividing a target telephone number string to be identified according to a division rule that conforms to a telephone number format, to obtain a first number string of a specified number of digits;
determining whether the first number string of the specified number of digits matches the attribute features of a first category of telephone number;
if so, determining at least two detection digit counts according to the attribute features of the first category of telephone number;
segmenting the target telephone number string to be identified with each detection digit count, to obtain segmentation results;
according to the segmentation results, selecting the optimal detection digit count from the at least two detection digit counts and using it to complete the first number string of the specified number of digits.
Optionally, segmenting the target telephone number string to be identified with each detection digit count to obtain segmentation results includes:
for each detection digit count, using that count to segment the part of the target telephone number string to be identified that follows the first number string of the specified number of digits, to obtain a first segmented number and a second segmented number;
comparing the first segmented number with the second segmented number and determining the number of corresponding positions at which the two have identical digits, as the segmentation result for that detection digit count.
Optionally, selecting the optimal detection digit count from the at least two detection digit counts according to the segmentation results and using it to complete the first number string of the specified number of digits includes:
comparing the numbers of identical-digit positions corresponding to the detection digit counts;
selecting, from the detection digit counts, the one with the largest number of identical-digit positions as the optimal detection digit count;
completing the first number string of the specified number of digits with as many further digits as the optimal detection digit count.
Optionally, after determining whether the first number string of the specified number of digits matches the attribute features of the first category of telephone number, the method further includes:
if the first number string of the specified number of digits does not match the attribute features of the first category of telephone number, selecting a new division rule that conforms to a telephone number format and dividing the target telephone number string to be identified again, to obtain a second number string of a specified number of digits;
determining whether the second number string of the specified number of digits matches the attribute features of a second category of telephone number;
if so, completing the second number string of the specified number of digits according to the attribute features of the second category of telephone number.
Optionally, dividing the target telephone number string to be identified from the initial position according to the division rule that conforms to a telephone number format includes:
performing a pre-processing operation related to the telephone number format on the target telephone number string to be identified, to obtain a processed target telephone number string to be identified;
starting from the initial position, dividing the processed target telephone number string to be identified according to the division rule that conforms to a telephone number format.
Optionally, performing the pre-processing operation related to the telephone number format on the target telephone number string to be identified, to obtain the processed target telephone number string to be identified, includes:
determining whether the target telephone number string to be identified contains a specified separator;
if the target telephone number string to be identified contains a specified separator, splitting the string at the separator to obtain at least two target telephone number strings to be identified.
Optionally, the specified separator includes at least one of the following: enumeration comma (、), comma, semicolon, slash, backslash, vertical bar.
Optionally, after the at least two target telephone number strings to be identified are obtained by splitting, the method further includes:
for each target telephone number string to be identified, determining whether the head of the string has a country code;
if so, removing the country code from the head of the target telephone number string to be identified.
Optionally, after the country code is removed from the head of the target telephone number string to be identified, the method further includes:
analyzing the target telephone number string to be identified from which the country code has been removed;
if the head of the target telephone number string to be identified has an area code and the area code is incomplete, supplementing the area code to make it complete;
if the head of the target telephone number string to be identified has an area code and the area code is repeated, deduplicating the area code.
Optionally, the target telephone number string to be identified is obtained by the following steps:
obtaining a point of interest (POI) from a web page;
extracting the target telephone number string to be identified from the POI.
Optionally, after the first number string of the specified number of digits or the second number string of the specified number of digits is completed, the method further includes:
if a remaining telephone number string to be identified exists, performing the pre-processing, division, judgment, determination, segmentation and completion operations again, until all remaining telephone number strings to be identified have been recognized.
According to another aspect of the present invention, a device for recognizing telephone numbers is also provided, including:
a division module, adapted to divide, starting from an initial position, a target telephone number string to be identified according to a division rule that conforms to a telephone number format, to obtain a first number string of a specified number of digits;
a judgment module, adapted to determine whether the first number string of the specified number of digits matches the attribute features of a first category of telephone number;
a determination module, adapted to determine at least two detection digit counts according to the attribute features of the first category of telephone number if the judgment module determines that the first number string of the specified number of digits matches those attribute features;
a segmentation module, adapted to segment the target telephone number string to be identified with each detection digit count, to obtain segmentation results;
a completion module, adapted to select, according to the segmentation results, the optimal detection digit count from the at least two detection digit counts and use it to complete the first number string of the specified number of digits.
Optionally, the segmentation module is further adapted to:
for each detection digit count, use that count to segment the part of the target telephone number string to be identified that follows the first number string of the specified number of digits, to obtain a first segmented number and a second segmented number;
compare the first segmented number with the second segmented number and determine the number of corresponding positions at which the two have identical digits, as the segmentation result for that detection digit count.
Optionally, the completion module is further adapted to:
compare the numbers of identical-digit positions corresponding to the detection digit counts;
select, from the detection digit counts, the one with the largest number of identical-digit positions as the optimal detection digit count;
complete the first number string of the specified number of digits with as many further digits as the optimal detection digit count.
Optionally, the division module is further adapted to, if the judgment module determines that the first number string of the specified number of digits does not match the attribute features of the first category of telephone number, select a new division rule that conforms to a telephone number format and divide the target telephone number string to be identified again, to obtain a second number string of a specified number of digits;
the judgment module is further adapted to determine whether the second number string of the specified number of digits matches the attribute features of a second category of telephone number;
the completion module is further adapted to, if the judgment module determines that the second number string of the specified number of digits matches the attribute features of the second category of telephone number, complete the second number string of the specified number of digits according to those attribute features.
Optionally, the division module includes:
a pre-processing unit, adapted to perform a pre-processing operation related to the telephone number format on the target telephone number string to be identified, to obtain a processed target telephone number string to be identified;
a division unit, adapted to divide, starting from the initial position, the processed target telephone number string to be identified according to the division rule that conforms to a telephone number format.
Optionally, the pre-processing unit is further adapted to:
determine whether the target telephone number string to be identified contains a specified separator;
if the target telephone number string to be identified contains a specified separator, split the original telephone number string to be identified at the separator to obtain at least two target telephone number strings to be identified.
Optionally, the specified separator includes at least one of the following: enumeration comma (、), comma, semicolon, slash, backslash, vertical bar.
Optionally, the pre-processing unit is further adapted to:
after the at least two target telephone number strings to be identified are obtained by splitting, determine, for each target telephone number string to be identified, whether the head of the string has a country code;
if so, remove the country code from the head of the target telephone number string to be identified.
Optionally, the pre-processing unit is further adapted to:
after the country code is removed from the head of the target telephone number string to be identified, analyze the target telephone number string to be identified from which the country code has been removed;
if the head of the target telephone number string to be identified has an area code and the area code is incomplete, supplement the area code to make it complete;
if the head of the target telephone number string to be identified has an area code and the area code is repeated, deduplicate the area code.
Optionally, the device further includes an acquisition module, adapted to obtain the target telephone number string to be identified by the following steps:
obtaining a point of interest (POI) from a web page;
extracting the target telephone number string to be identified from the POI.
Optionally, the device further includes:
a recursion module, adapted to, if a remaining telephone number string to be identified exists, trigger the pre-processing unit to perform the pre-processing operation again, the division module to perform the division operation again, the judgment module to perform the judgment operation again, the determination module to perform the determination operation again, the segmentation module to perform the segmentation operation again and the completion module to perform the completion operation again, until all remaining telephone number strings to be identified have been recognized.
In the embodiments of the present invention, starting from an initial position, the target telephone number string to be identified is divided according to a division rule that conforms to a telephone number format. That is, taking into account the characteristics of different categories of telephone numbers (such as landline or mobile numbers), the string is divided using the division rule of the telephone number format corresponding to each category, and the category of the corresponding telephone number is identified from the first number string of the specified number of digits obtained by the division, so that different categories of telephone numbers are recognized effectively. Further, because two landline or mobile numbers within the same telephone unit tend to be highly similar, the embodiments determine at least two detection digit counts according to the attribute features of the first category of telephone number and then detect and recognize the target telephone number string to be identified by probing the digits that follow with each detection digit count, which further improves the accuracy of telephone number recognition.
In addition, before dividing the target telephone number string to be identified according to the division rule that conforms to a telephone number format, the embodiments may also perform a pre-processing operation related to the telephone number format on the string, so that the pre-processed string conforms to the telephone number format. This makes subsequent recognition based on the pre-processed string easier and improves the recognition rate of telephone numbers.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.
From the following detailed description of specific embodiments of the invention in conjunction with the accompanying drawings, those skilled in the art will better understand the above and other objects, advantages and features of the invention.
Brief description of the drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flowchart of a method for recognizing telephone numbers according to an embodiment of the present invention;
Fig. 2 shows a flowchart of a method for recognizing telephone numbers according to another embodiment of the present invention;
Fig. 3 shows a schematic structural diagram of a device for recognizing telephone numbers according to an embodiment of the present invention; and
Fig. 4 shows a schematic structural diagram of a device for recognizing telephone numbers according to another embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
To solve the above technical problem, an embodiment of the present invention provides a method for recognizing telephone numbers. Fig. 1 shows a flowchart of the method according to an embodiment of the present invention. Referring to Fig. 1, the method may include at least steps S102 to S110.
Step S102: starting from an initial position, divide the target telephone number string to be identified according to a division rule that conforms to a telephone number format, to obtain a first number string of a specified number of digits.
Step S104: determine whether the first number string of the specified number of digits matches the attribute features of a first category of telephone number; if so, continue with step S106.
Step S106: determine at least two detection digit counts according to the attribute features of the first category of telephone number.
Step S108: segment the target telephone number string to be identified with each detection digit count, to obtain segmentation results.
Step S110: according to the segmentation results, select the optimal detection digit count from the at least two detection digit counts and use it to complete the first number string of the specified number of digits.
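Steps S102 to S110 can be sketched for the landline case as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the area-code set `AREA_CODES`, the function names, the prefix lengths (4, then 3) and the detection digit counts (7 and 8) are hypothetical choices drawn from the number formats described in this document, and the second segmented number is assumed to be the block of the same length that immediately follows the first.

```python
# Hypothetical sample of area codes; a real system would load a full table.
AREA_CODES = {"010", "028", "0755", "0771", "0852", "0877"}


def match_count(rest, d):
    """S108: cut `rest` into two consecutive blocks of d digits and count
    the positions at which the two blocks carry the same digit."""
    first, second = rest[:d], rest[d:2 * d]
    return sum(a == b for a, b in zip(first, second))


def recognize_first_landline(digits, prefix_lengths=(4, 3), detection_lengths=(7, 8)):
    """Return (recognized number, remaining digits) for a digits-only string."""
    for n in prefix_lengths:
        prefix, rest = digits[:n], digits[n:]      # S102: divide off a prefix
        if prefix in AREA_CODES:                   # S104: attribute check
            # S106-S110: try each detection digit count, keep the best match
            best = max(detection_lengths, key=lambda d: match_count(rest, d))
            return prefix + rest[:best], rest[best:]
    return None, digits


print(recognize_first_landline("087770104577010457"))
print(recognize_first_landline("0286990619869906199"))
```

When the string holds only one number, both counts score zero and the sketch falls back to the first detection digit count; a fuller version would need a separate rule for that case.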
In the embodiments of the present invention, starting from an initial position, the target telephone number string to be identified is divided according to a division rule that conforms to a telephone number format. That is, taking into account the characteristics of different categories of telephone numbers (such as landline or mobile numbers), the string is divided using the division rule of the telephone number format corresponding to each category, and the category of the corresponding telephone number is identified from the first number string of the specified number of digits obtained by the division, so that different categories of telephone numbers are recognized effectively. Further, because two landline or mobile numbers within the same telephone unit tend to be highly similar, the embodiments determine at least two detection digit counts according to the attribute features of the first category of telephone number and then detect and recognize the target telephone number string to be identified by probing the digits that follow with each detection digit count, which further improves the accuracy of telephone number recognition.
The method provided by the embodiments can effectively recognize the telephone numbers in POIs. That is, before step S102 the target telephone number string to be identified may first be obtained; specifically, a POI may be obtained from a web page and the target telephone number string to be identified then extracted from the POI.
Telephone information in web pages falls mainly into mobile numbers and landline numbers. Taking telephone numbers of Chinese cities, districts and counties as an example, a mobile number has 11 digits, and its validity and home region can be determined from its first 7 digits. Mobile numbers generally begin with 13, 14, 15, 17, 18 or 19, and a mobile number attribution table can be used to check the validity and home region of the first 7 digits. Landline numbers are divided into official 10-digit numbers beginning with 400 or 800, ordinary regional numbers of 7 or 8 digits preceded by a 3- or 4-digit area code, special official 5-digit numbers (such as 10086 and 95522) and special 3-digit numbers (such as 110, 119 and 114); a landline number may also include an extension number.
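The number categories described above can be written down as patterns. The sketch below is only an illustration: the category names and the `classify` function are hypothetical, the patterns check length and leading digits only (no attribution-table or area-code lookup), extensions are not handled, and the 5-digit and 3-digit patterns are deliberately loose.

```python
import re

# (category name, pattern) pairs, checked in order; names are illustrative.
PATTERNS = [
    ("mobile", re.compile(r"1[345789]\d{9}")),           # 11 digits, 13/14/15/17/18/19
    ("service_400_800", re.compile(r"[48]00\d{7}")),     # 10 digits, 400 or 800
    ("landline", re.compile(r"0\d{2,3}\d{7,8}")),        # area code + 7 or 8 digits
    ("special_5_digit", re.compile(r"\d{5}")),           # e.g. 10086, 95522
    ("special_3_digit", re.compile(r"\d{3}")),           # e.g. 110, 119, 114
]


def classify(digits):
    """Return the first category whose pattern matches the whole string."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(digits):
            return name
    return None


for sample in ["13651464541", "4008900000", "02884876877", "10086", "110"]:
    print(sample, classify(sample))
```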
Web pages contain hundreds of millions of POI records, and telephone numbers are presented in complex and varied ways: a single POI may include several landline or mobile numbers that are interleaved and run together. Table 1 lists some of the ways in which telephone numbers of Chinese cities, districts and counties are presented in web pages. The embodiments subsequently recognize the disordered telephone numbers in web pages according to the characteristics of such telephone numbers described above.
It should be noted that the method provided by the embodiments can also be combined with the characteristics of other countries' telephone numbers to recognize those numbers effectively.
Table 1
Telephone number | Description
400-890-0000转805530 | Extension number is indicated by a Chinese character (转, "transfer to")
86-0877-70104577010457 | Country code 86 precedes the number; multiple telephone numbers without a separator
0852-8719889 86 8719669 | Country code 86 appears in the middle of the telephone numbers
028-84876877,1380233318 | Landline and mobile numbers combined; the mobile number is incomplete
0771 0771 324579718602365784 | Area code is repeated
286990619869906199 | Area code is missing its leading 0
0755-13651464541 | Area code precedes a mobile number
As Table 1 shows, telephone numbers are presented in web pages in complex and varied ways. To improve the recognition rate, in step S102 the embodiments may first perform a pre-processing operation related to the telephone number format on the target telephone number string to be identified, to obtain a processed string that conforms to the telephone number format as closely as possible. Then, starting from the initial position, the processed target telephone number string to be identified is divided according to the division rule that conforms to a telephone number format.
In the embodiments, the pre-processing operation related to the telephone number format may include pre-splitting by separator, identifying and removing the country code, and supplementing and deduplicating the area code.
First, when pre-splitting by separator, it is determined whether the target telephone number string to be identified contains a specified separator. If it does, the string is split at the separator to obtain at least two target telephone number strings to be identified. If it does not, no pre-splitting is performed. Here, the specified separator may be the enumeration comma "、", the comma ",", the semicolon ";", the slash "/", the backslash "\", the vertical bar "|", and so on; the invention is not limited to these.
For example, for the target telephone number string to be identified "028-84876877,1380233318" in Table 1 above, it is determined that the string contains a specified separator (namely the comma ","). The string is split at the separator "," to obtain the target telephone number strings to be identified "028-84876877" and "1380233318".
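The pre-splitting step can be sketched with a single regular expression. The function name `pre_split` is hypothetical, and including the full-width comma and semicolon alongside the ASCII ones is an added assumption about how the separators appear in web pages.

```python
import re

# Separators named in the text; the full-width variants are an assumption.
SEPARATORS = re.compile(r"[、,,;;/\\|]")


def pre_split(raw):
    """Split a raw string at any specified separator and drop empty pieces."""
    return [part.strip() for part in SEPARATORS.split(raw) if part.strip()]


print(pre_split("028-84876877,1380233318"))
print(pre_split("0755-13651464541"))
```

A string without any specified separator comes back as a single-element list, which corresponds to the case where no pre-splitting is performed.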
Second, identification and removal of the country code. To distinguish the telephone numbers of different countries, a country code is usually added before the telephone number; for Chinese telephone numbers, 86 is usually added. For calls that are not international, however, the country code serves no real purpose and can therefore be removed.
In the embodiments, after the at least two target telephone number strings to be identified are obtained by splitting, it is determined for each of them whether the head of the string has a country code; if so, the country code is removed from the head of the string. If the head of the string has no country code, no removal is performed.
For a target telephone number string to be identified that did not need pre-splitting in the separator step, it is likewise determined whether the head of the string has a country code; if so, the country code is removed from the head of the string. If the head of the string has no country code, no removal is performed.
In the embodiments, taking the Chinese country code 86 as an example, its common forms include +86, 086, 0086 and 86. Whether a leading 86 is the Chinese country code can be judged from the number of digits that remain. For example, for the target telephone number string to be identified "86-0877-70104577010457" in Table 1 above, 86 is judged from the remaining digits to be the Chinese country code and is removed, giving the processed string "0877-70104577010457"; the "-" that follows 86 is removed as well.
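A minimal sketch of the country-code step is shown below. The text says only that the decision depends on the remaining digits, so the concrete rule used here (strip the prefix when at least 7 digits remain) and the names `strip_country_code` and `MIN_REMAINING_DIGITS` are assumptions made for illustration.

```python
import re

# Common written forms of the Chinese country code, plus a trailing "-" or space.
COUNTRY_PREFIX = re.compile(r"^(?:\+86|0086|086|86)[\s-]*")
MIN_REMAINING_DIGITS = 7  # assumed threshold, not taken from the source


def strip_country_code(s):
    """Remove a leading country code if enough digits remain afterwards."""
    m = COUNTRY_PREFIX.match(s)
    if not m:
        return s
    rest = s[m.end():]
    remaining = sum(ch.isdigit() for ch in rest)
    return rest if remaining >= MIN_REMAINING_DIGITS else s


print(strip_country_code("86-0877-70104577010457"))
print(strip_country_code("86990619"))
```

The second call leaves the string untouched because only six digits would remain, so the leading 86 is treated as part of a local number.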
Furthermore, when the regional area code is supplemented and de-duplicated, the target string from which the country code has been removed can be analyzed. If the analysis shows that the head of the target string carries a regional area code that is incomplete, the area code is supplemented to make it complete; if it shows that the head carries a regional area code that is repeated, the area code is de-duplicated.
For a target string that needed no pre-cutting in the separator-based pre-cutting step, or needed no removal in the country code identification and removal step, the target string is analyzed directly. If the analysis shows that the head of the target string carries a regional area code that is incomplete, the area code is supplemented to make it complete; if it shows that the head carries a regional area code that is repeated, the area code is de-duplicated.
For example, the target telephone number string to be identified "286990619869906199" in Table 1 above is analyzed, and its head is found to carry a regional area code that is incomplete. The area code is therefore supplemented, giving the target string with a complete regional area code, "0286990619869906199".
As another example, the target telephone number string to be identified "07710771324579718602365784" in Table 1 above is analyzed, and its head is found to carry a regional area code that is repeated. The area code is therefore de-duplicated, giving the target string "0771324579718602365784".
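A minimal sketch of the area code supplement and de-duplication, assuming a lookup table of known area codes is available. The table `KNOWN_AREA_CODES` below is a tiny illustrative subset and the function name `normalize_area_code` is hypothetical; the source does not describe how area codes are looked up.

```python
# Tiny illustrative subset; a real system would load the full area code list.
KNOWN_AREA_CODES = ("010", "028", "0771", "0877")


def normalize_area_code(s: str) -> str:
    """Add a missing leading "0" to an area code, or drop a repeated one."""
    for code in KNOWN_AREA_CODES:
        # Repeated area code, e.g. "0771" + "0771" + ...: keep one copy.
        if s.startswith(code + code):
            return s[len(code):]
        # Already complete and not repeated: nothing to do.
        if s.startswith(code):
            return s
    for code in KNOWN_AREA_CODES:
        # Area code written without its leading "0", e.g. "28..." for "028".
        if s.startswith(code[1:]):
            return "0" + s
    return s


print(normalize_area_code("286990619869906199"))
print(normalize_area_code("07710771324579718602365784"))
```

This simple prefix test would also prepend "0" to a local number that merely begins with "28", so a real implementation would need additional length or context checks.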
In the embodiments of the present invention, after the Chinese city, district, and county telephone numbers shown in Table 1 above go through the pre-processing operations above, the processed target telephone number strings to be identified are as shown in Table 2. The pre-processing operations mentioned above include pre-cutting by separator, identification and removal of the country code, and supplementing and de-duplication of the regional area code. The present invention does not limit the order in which they are performed; in practice, the order can be set according to actual requirements. For example, pre-cutting by separator may be performed first, then identification and removal of the country code, then supplementing and de-duplication of the regional area code. Alternatively, identification and removal of the country code may be performed first, then supplementing and de-duplication of the regional area code, then pre-cutting by separator. As a further alternative, identification and removal of the country code may be performed first, then pre-cutting by separator, then supplementing and de-duplication of the regional area code, and so on.
Table 2
It should be noted that, in the embodiments of the present invention, the pre-processing operations related to the telephone number format that are applied to the target telephone number string to be identified are not limited to the pre-processing modes above. In practice, corresponding pre-processing operations can be designed around the characteristics of the telephone numbers of different countries, so that the pre-processed target string conforms to the telephone number format as closely as possible, which improves the recognition rate of telephone numbers.
Further, starting from the initial position, the processed target string is divided according to a division rule conforming to the telephone number format, giving a number string of the first specified digit count. Here, the division rule can be chosen according to the characteristics of the different categories of telephone number (such as landline or mobile).
Taking Chinese city, district, and county telephone numbers as an example, consider the case where the division rule conforming to the mobile phone number format is chosen. A mobile phone number contains 11 digits, and its correctness and home region can be judged from its first 7 digits. The target string can therefore be divided according to the division rule conforming to the mobile phone number format, giving a number string whose first specified digit count is 7.
In addition, consider the case where the division rule conforming to the landline number format is chosen. Landline numbers are divided into 10-digit official numbers beginning with 400 or 800, ordinary 7-digit or 8-digit regional numbers preceded by a 3-digit or 4-digit area code, and special 5-digit official numbers. The target string can therefore be divided according to the division rule conforming to the landline number format, giving a number string whose first specified digit count is 3, 4, or 5.
For example, the target telephone number string to be identified extracted from a POI is "+8613651464541,28-84876877". The pre-processing operations related to the telephone number format are applied to it in this order: pre-cutting by separator, identification and removal of the country code, and identification and supplementing of the regional area code. The processed target strings are "13651464541" and "028-84876877". Further, starting from the initial position, the target string "13651464541" is divided according to the division rule conforming to the mobile phone number format, giving the number string "1365146", whose first specified digit count is 7. Alternatively, starting from the initial position, the target string "028-84876877" is divided according to the division rule conforming to the landline number format, giving the number string "028", whose first specified digit count is 3.
As another example, in Table 2 above, the target string after the pre-processing operations related to the telephone number format is "0286990619869906199". Starting from the initial position, the target string is divided according to the division rule conforming to the landline number format, giving the number string "028", whose first specified digit count is 3. This number string meets the attribute features of the first category of telephone number (that is, landline).
It should be noted that the settings listed above, in which the first specified digit count is 7 and the first category of telephone number is mobile, or the first specified digit count is 3, 4, or 5 and the first category of telephone number is landline, are made according to the characteristics of Chinese city, district, and county telephone numbers. For telephone numbers of other countries, the first specified digit count and the first category of telephone number can be set according to the characteristics of those countries' telephone numbers.
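The division step above amounts to taking a fixed number of leading digits per category. The sketch below is an illustration only: the function name `leading_candidates` and the category labels "mobile" and "landline" are hypothetical, while the prefix lengths (7, or 3, 4, and 5) come from the text.

```python
from __future__ import annotations


def leading_candidates(s: str, category: str) -> list[str]:
    """Return the leading number strings that the division rule would produce."""
    # Ignore separators such as "-" and keep only the digits.
    digits = "".join(ch for ch in s if ch.isdigit())
    if category == "mobile":
        # Mobile rule: the first 7 of the 11 digits.
        return [digits[:7]] if len(digits) >= 7 else []
    # Landline rule: a 3-, 4-, or 5-digit leading string.
    return [digits[:n] for n in (3, 4, 5) if len(digits) >= n]


print(leading_candidates("13651464541", "mobile"))
print(leading_candidates("028-84876877", "landline"))
```

Each candidate would then be checked against the attribute features of its category, for example against a table of known area codes.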
In step S106, if the number string of the first specified digit count meets the attribute features of the first category of telephone number, at least two detection digit counts are determined according to those attribute features. For step S108, in which each detection digit count is used to cut the target telephone number string to be identified and obtain a cutting result, the embodiments of the present invention provide an optional scheme: for each detection digit count, the portion of the target string that follows the number string of the first specified digit count is cut using that detection digit count, giving a first cut number and a second cut number; the first cut number and the second cut number are compared, and the number of positions at which the two carry the same digit is taken as the cutting result for that detection digit count. Then, in step S110, the numbers of matching positions for the detection digit counts are compared, the detection digit count with the largest number of matching positions is chosen as the optimal detection digit count, and the number string of the first specified digit count is completed with that many digits.
In the example above, the telephone number corresponding to the number string "028", whose first specified digit count is 3, is identified as a landline number. Because this landline number does not begin with 400 or 800, the two detection digit counts 7 and 8 are determined.
For the detection digit count of 7, the portion of the target string that follows the number string of the first specified digit count (that is, "6990619869906199") is cut using this detection digit count, giving the first cut number "6990619" and the second cut number "8699061". The number of positions at which the two carry the same digit is 1.
For the detection digit count of 8, the portion of the target string that follows the number string of the first specified digit count (that is, "6990619869906199") is cut using this detection digit count, giving the first cut number "69906198" and the second cut number "69906199". The number of positions at which the two carry the same digit is 7.
Then, of the detection digit counts 7 and 8, the one with the largest number of matching positions is chosen as the optimal detection digit count. That is, the detection digit count of 8 is chosen, and completing the number string "028" of the first specified digit count with 8 digits gives the landline number "02869906198". The rationale for this calculation is that two landline or mobile numbers belonging to the same telephone unit are highly similar.
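The comparison of the two cut numbers can be sketched as below. The function names `count_matches` and `best_detection_length` are hypothetical, and the tie-breaking rule (keep the earlier candidate) is an assumption, since the source does not say what happens when two detection digit counts score equally.

```python
def count_matches(a: str, b: str) -> int:
    """Count the positions at which two cut numbers carry the same digit."""
    return sum(1 for x, y in zip(a, b) if x == y)


def best_detection_length(rest: str, candidates=(7, 8)) -> int:
    """Pick the candidate length whose first two cuts of `rest` agree most."""
    best_len, best_score = candidates[0], -1
    for n in candidates:
        first, second = rest[:n], rest[n:2 * n]
        score = count_matches(first, second)
        # Strict ">" keeps the earlier candidate on a tie (assumed rule).
        if score > best_score:
            best_len, best_score = n, score
    return best_len


rest = "6990619869906199"  # target string with the prefix "028" removed
n = best_detection_length(rest)
print("028" + rest[:n])
```

With the example string, the 7-digit cut scores 1 and the 8-digit cut scores 7, so 8 digits are appended to "028".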
Further, after step S104 judges whether the number string of the first specified digit count meets the attribute features of the first category of telephone number, if it does not, a new division rule conforming to the telephone number format can be chosen to divide the target telephone number string to be identified again, giving a number string of the second specified digit count. It is then judged whether the number string of the second specified digit count meets the attribute features of the second category of telephone number; if so, the number string of the second specified digit count is completed according to the attribute features of the second category of telephone number.
For example, the target string extracted from a POI is "+8613651464541,28-84876877". A pre-processing operation related to the telephone number format, such as deleting the country code, is applied to it, and the processed target string is "13651464541,28-84876877". Further, starting from the initial position, the target string is divided according to the division rule conforming to the landline number format, giving the number string "136", whose first specified digit count is 3. The number string "136" does not meet the attribute features of the first category of telephone number (that is, landline), so the division rule of the mobile phone number format can be chosen to divide the target string again, giving the number string "1365146", whose second specified digit count is 7. The number string "1365146" meets the attribute features of the second category of telephone number (that is, mobile). According to those attribute features, the number string of the second specified digit count is completed, giving the corresponding telephone number "13651464541".
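A minimal sketch of trying one division rule and falling back to the other. The area code table, the mobile prefix rule (a leading "1" followed by a digit from 3 to 9), and all function names are assumptions for illustration; the source only says that the leading string must meet the "attribute features" of its category.

```python
from __future__ import annotations

# Illustrative subset only; a real system would load the full area code table.
AREA_CODES = {"010", "028", "0771", "0877"}


def is_landline_prefix(prefix: str) -> bool:
    return prefix in AREA_CODES or prefix in ("400", "800")


def is_mobile_prefix(prefix: str) -> bool:
    # Assumed simplification: 7 digits, "1" followed by a digit from 3 to 9.
    return (len(prefix) == 7 and prefix.isdigit()
            and prefix[0] == "1" and prefix[1] in "3456789")


def classify_prefix(digits: str) -> tuple[str, str]:
    """Try the landline rule first, then fall back to the mobile rule."""
    for n in (3, 4, 5):
        prefix = digits[:n]
        if len(prefix) == n and is_landline_prefix(prefix):
            return "landline", prefix
    prefix = digits[:7]
    if is_mobile_prefix(prefix):
        return "mobile", prefix
    return "unknown", ""


digits = "13651464541"
category, prefix = classify_prefix(digits)
if category == "mobile":
    # Completing a mobile prefix means taking all 11 digits.
    print(digits[:11])
```

The order of the two rules can be swapped, matching the statement that either category may serve as the first category.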
The settings listed above, in which the first specified digit count is 7, the first category of telephone number is mobile, the second specified digit count is 3, 4, or 5, and the second category of telephone number is landline, or in which the first specified digit count is 3, 4, or 5, the first category of telephone number is landline, the second specified digit count is 7, and the second category of telephone number is mobile, are made according to the characteristics of Chinese city, district, and county telephone numbers. It should be noted that, for telephone numbers of other countries, the first specified digit count, the first category of telephone number, the second specified digit count, and the second category of telephone number can be set according to the characteristics of those countries' telephone numbers.
In the embodiments of the present invention, after the telephone number corresponding to the number string of the first or second specified digit count is obtained by completion, that telephone number can be output. For example, after the landline number "02869906198" is identified from the target telephone number string to be identified "0286990619869906199", the landline number "02869906198" can be output.
Further, for the remaining telephone number string to be identified, "69906199", the pre-processing, judging, determining, cutting, and completing operations need to be performed again, until the remaining string has been fully identified. That is, the regional area code "028" is supplemented first, giving the target string "02869906199". Then, starting from the initial position, the target string "02869906199" is divided according to the division rule conforming to the landline number format, giving the number string "028", whose first specified digit count is 3, and the corresponding telephone number is identified as the landline number "02869906199".
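The repeated processing of the remainder can be sketched as a loop. This is a simplification under stated assumptions: the area code is passed in and reused for every number instead of re-running the full pre-processing, and the function name `split_landlines` is hypothetical.

```python
from __future__ import annotations


def split_landlines(digits: str, area_code: str, lengths=(7, 8)) -> list[str]:
    """Peel landline numbers that share one area code off a concatenated string."""
    rest = digits[len(area_code):] if digits.startswith(area_code) else digits
    results = []
    while rest:
        if len(rest) in lengths:
            # Only one number is left, so its length is already known.
            n = len(rest)
        else:
            # Choose the length whose first two cuts agree in the most positions.
            n = max(lengths,
                    key=lambda k: sum(a == b for a, b in zip(rest[:k], rest[k:2 * k])))
        # Supplement the area code for each number, as described for "69906199".
        results.append(area_code + rest[:n])
        rest = rest[n:]
    return results


print(split_landlines("0286990619869906199", "028"))
```

The loop always consumes at least 7 characters per pass, so it terminates; a trailing fragment shorter than 7 digits would still be emitted and would need a separate validity check.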
As another example, in Table 2 above, the target telephone number string to be identified is "400-890-0000转805530" (where "转" marks an extension). Starting from the initial position, this target string is divided according to the division rule conforming to the landline number format, giving the number string "400", whose first specified digit count is 3, and the corresponding telephone number can then be identified according to step S108 as the landline number "400-890-0000". The remaining string to be identified, "转805530", is recognized as an extension number and is appended to the end of the landline number "400-890-0000", giving "400-890-0000转805530".
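Separating an extension from the main number can be sketched with a regular expression. This assumes that the extension marker in the raw string is the character "转" ("transfer to"); the pattern and the function name `split_extension` are illustrative and not taken from the source.

```python
from __future__ import annotations

import re

# Main number (digits and hyphens), the assumed marker "转", then the extension.
EXT_PATTERN = re.compile(r"^(?P<main>[\d-]+?)\s*转\s*(?P<ext>\d+)$")


def split_extension(s: str) -> tuple[str, str | None]:
    """Return (main number, extension), with None when there is no extension."""
    m = EXT_PATTERN.match(s.strip())
    if m:
        return m.group("main"), m.group("ext")
    return s.strip(), None


main, ext = split_extension("400-890-0000转805530")
print(main, ext)
```

After the main number has been recognized, the extension can be appended to it again in whatever output format is required.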
The implementation of the telephone number recognition method provided by the present invention is described in detail below through a specific embodiment. In this embodiment, taking Chinese city, district, and county telephone numbers as an example, POIs are obtained from web pages and the target telephone number strings to be identified are extracted from the POIs. Fig. 2 shows a flowchart of a telephone number recognition method according to another embodiment of the present invention. Referring to Fig. 2, the method may include at least steps S202 to S216.
Step S202: pre-cut the target telephone number string to be identified according to separators.
In this step, it can be determined whether the target telephone number string to be identified contains a specified separator. If it does, the target string is cut at the separator, giving at least two target strings after cutting. Conversely, if the target string contains no specified separator, no pre-cutting is performed. Here, the specified separator may be the enumeration comma "、", the comma ",", the semicolon ";", the slash "/", the backslash "\", the vertical bar "|", and so on; the invention is not limited to these.
For example, for the target telephone number string to be identified "028-84876877,1380233318" in Table 1 above, it is determined that the target string contains a specified separator (that is, the comma ","). The target string is cut at the separator ",", giving the target strings "028-84876877" and "1380233318" after cutting.
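Pre-cutting by separator is a plain string split. In the sketch below, the separator set follows the list in the text; treating both the ASCII and the full-width forms of the comma and semicolon as separators is an assumption, as is the function name `pre_cut`.

```python
from __future__ import annotations

import re

# Enumeration comma, comma, semicolon (ASCII and full-width forms, assumed),
# slash, backslash, and vertical bar.
SEPARATORS = "、,，;；/\\|"
SPLIT_PATTERN = re.compile("[" + re.escape(SEPARATORS) + "]")


def pre_cut(s: str) -> list[str]:
    """Split a raw string at the specified separators, dropping empty pieces."""
    return [part.strip() for part in SPLIT_PATTERN.split(s) if part.strip()]


print(pre_cut("028-84876877,1380233318"))
```

The hyphen is deliberately left out of the separator set, because it appears inside a single number such as "028-84876877".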
Step S204: remove the leading 86.
In this step, after the at least two target telephone number strings to be identified have been obtained by cutting, it is determined for each target string whether its head carries a country code. If so, the country code at the head of the target string is removed. Conversely, if the head of the target string carries no country code, no removal is performed.
For a target string that needed no pre-cutting in the separator-based pre-cutting step, it is likewise determined whether its head carries a country code. If so, the country code at the head of the target string is removed. Conversely, if the head of the target string carries no country code, no removal is performed.
Taking the Chinese country code 86 as an example, common forms of 86 include +86, 086, 0086, and 86, and the embodiments of the present invention can judge from the remaining digits whether 86 is in fact the Chinese country code. For example, for the target string "86-0877-70104577010457" in Table 1 above, the remaining digits indicate that 86 is the Chinese country code, so 86 is removed and the processed target string is "0877-70104577010457". The symbol "-" following 86 is removed as well.
Step S206: supplement and de-duplicate the regional area code.
In this step, the target telephone number string to be identified from which the country code has been removed can be analyzed. If the analysis shows that the head of the target string carries a regional area code that is incomplete, the area code is supplemented to make it complete; if it shows that the head carries a regional area code that is repeated, the area code is de-duplicated.
For a target string that needed no pre-cutting in the separator-based pre-cutting step, or needed no removal in the country code identification and removal step, the target string is analyzed directly. If the analysis shows that the head of the target string carries a regional area code that is incomplete, the area code is supplemented to make it complete; if it shows that the head carries a regional area code that is repeated, the area code is de-duplicated.
For example, the target string "286990619869906199" in Table 1 above is analyzed, and its head is found to carry a regional area code that is incomplete. The area code is therefore supplemented, giving the target string with a complete regional area code, "0286990619869906199".
As another example, the target string "07710771324579718602365784" in Table 1 above is analyzed, and its head is found to carry a regional area code that is repeated. The area code is therefore de-duplicated, giving the target string "0771324579718602365784".
Step S208: judge from the first 7 digits of the target telephone number string to be identified whether it is a mobile number; if not, continue with step S210; if so, continue with step S212.
In this step, the division rule conforming to the mobile phone number format is chosen to divide the target string, giving a number string whose first specified digit count is 7. It is judged whether this number string meets the attribute features of the first category of telephone number (that is, mobile). If so, the number string is completed according to the attribute features of the first category of telephone number (that is, mobile), giving the corresponding telephone number (that is, a mobile number).
Step S210: backward detection digit judgment.
In this step, if the number string whose first specified digit count is 7 does not meet the attribute features of the first category of telephone number (that is, mobile) in step S208, the division rule conforming to the landline number format is chosen to divide the target string again, giving a number string whose second specified digit count is 3, 4, or 5. It is then judged whether this number string meets the attribute features of the second category of telephone number (that is, landline). If so, the number string is completed according to the attribute features of the second category of telephone number (that is, landline), giving the corresponding telephone number (that is, a landline number).
For example, in Table 2 above, after the target telephone number string to be identified "286990619869906199" is pre-processed, the target string "0286990619869906199" is obtained. Starting from the initial position, the target string is divided according to the division rule conforming to the mobile phone number format, giving the number string "0286990", whose first specified digit count is 7. This number string does not meet the attribute features of the first category of telephone number (that is, mobile), so the division rule conforming to the landline number format is chosen to divide the target string again, giving the number string "028", whose second specified digit count is 3. The telephone number corresponding to the number string "028" is identified as a landline number, which is either the 7-digit "0286990619" or the 8-digit "02869906198".
In the example above, the telephone number identified from the target string "0286990619869906199" for the number string whose second specified digit count is 3 is a landline number, either the 7-digit "0286990619" or the 8-digit "02869906198". To choose a suitable completion length and improve the recognition rate of telephone numbers, the embodiments of the present invention provide a backward detection digit judgment scheme for completing the number string of the second specified digit count according to the attribute features of the second category of telephone number. That is, at least two detection digit counts can be determined according to the attribute features of the second category of telephone number, and each detection digit count is then used to cut the target string, giving a cutting result. Afterwards, according to the cutting results, the optimal detection digit count is chosen from the at least two detection digit counts and used to complete the number string of the second specified digit count.
Further, for each detection digit count, the portion of the target string that follows the number string of the second specified digit count is cut using that detection digit count, giving a first cut number and a second cut number. The first cut number and the second cut number are compared, and the number of positions at which the two carry the same digit is taken as the cutting result for that detection digit count. Then the numbers of matching positions for the detection digit counts are compared, the detection digit count with the largest number of matching positions is chosen as the optimal detection digit count, and the number string of the second specified digit count is completed with that many digits.
In the example above, the telephone number corresponding to the number string "028", whose second specified digit count is 3, is identified as a landline number, either the 7-digit "0286990619" or the 8-digit "02869906198". To choose a suitable completion length, the two detection digit counts 7 and 8 are determined.
For the detection digit count of 7, the portion of the target string that follows the number string of the second specified digit count (that is, "6990619869906199") is cut using this detection digit count, giving the first cut number "6990619" and the second cut number "8699061". The number of positions at which the two carry the same digit is 1.
For the detection digit count of 8, the portion of the target string that follows the number string of the second specified digit count (that is, "6990619869906199") is cut using this detection digit count, giving the first cut number "69906198" and the second cut number "69906199". The number of positions at which the two carry the same digit is 7.
Then, of the detection digit counts 7 and 8, the one with the largest number of matching positions is chosen as the optimal detection digit count. That is, the detection digit count of 8 is chosen, and completing the number string "028" of the second specified digit count with 8 digits gives the landline number "02869906198". The rationale for this calculation is that two landline or mobile numbers belonging to the same telephone unit are highly similar.
Step S212: judge whether there is an error; if not, continue with step S214; if so, end this flow.
In this step, it can be judged whether the telephone number corresponding to the number string whose first specified digit count is 7 is accurate, for example whether digits are missing or whether the number is unassigned. It can also be judged whether the telephone number obtained by the backward detection digit judgment in step S210 is accurate.
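The "missing digits" part of this check can be sketched as a length test. The expected lengths follow the formats described earlier (11 digits for mobile, 10 for numbers beginning with 400 or 800, 7 or 8 after the area code otherwise); the function name `looks_complete` is hypothetical, and special 5-digit numbers and the unassigned-number check are not covered.

```python
def looks_complete(number: str, category: str, area_code_len: int = 0) -> bool:
    """Check that a recognized number has the expected count of digits."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if category == "mobile":
        return len(digits) == 11
    if digits.startswith(("400", "800")):
        return len(digits) == 10
    # Ordinary landline: 7 or 8 digits after the area code.
    return len(digits) - area_code_len in (7, 8)


print(looks_complete("13651464541", "mobile"))
print(looks_complete("1380233318", "mobile"))
```

Whether a number is actually assigned cannot be decided from its length and would need an external data source.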
Step S214: output the telephone number.
Step S216: judge whether the length of the remaining telephone number string is greater than 0; if so, return to step S204; if not, end this flow.
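The control flow of steps S202 to S216 can be sketched as a small driver in which each step is passed in as a function. The driver name `run_flow` and its parameters are hypothetical, and the stand-in functions in the usage lines are deliberately trivial; they are not the recognition logic of the source.

```python
from __future__ import annotations

from typing import Callable, Optional


def run_flow(
    raw: str,
    pre_cut: Callable[[str], list[str]],
    strip_country: Callable[[str], str],
    fix_area: Callable[[str], str],
    recognize: Callable[[str], Optional[tuple[str, str]]],
) -> list[str]:
    """Run S202 once, then loop S204 to S216 over each piece."""
    numbers = []
    for piece in pre_cut(raw):                    # S202: pre-cut by separator
        rest = piece
        while rest:                               # S216: remainder length > 0
            rest = fix_area(strip_country(rest))  # S204 and S206
            result = recognize(rest)              # S208 or S210, then S212
            if result is None:                    # S212: error, end this flow
                break
            number, rest = result
            numbers.append(number)                # S214: output the number
    return numbers


# Trivial stand-ins, for illustration only.
found = run_flow(
    "+8613651464541,028",
    pre_cut=lambda s: s.split(","),
    strip_country=lambda s: s[3:] if s.startswith("+86") else s,
    fix_area=lambda s: s,
    recognize=lambda s: (s[:11], s[11:]) if len(s) >= 11 else None,
)
print(found)
```

A `recognize` function must return a strictly shorter remainder on each pass; otherwise the loop would not terminate.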
In the embodiments of the present invention, the target telephone number string to be identified first undergoes the pre-processing operations related to the telephone number format (in order: pre-cutting by separator, identification and removal of the country code, and supplementing and de-duplication of the regional area code), so that the pre-processed target string conforms to the telephone number format. Telephone numbers can then be recognized on the basis of the pre-processed target string, which improves the recognition rate. Further, the embodiments take into account the features of the different categories of telephone number (landline and mobile): the target string is divided using the division rule of the telephone number format corresponding to each category, and the category of the corresponding telephone number is identified from the number string of the first specified digit count obtained by the division, so that telephone numbers of different categories are recognized effectively. Further, relying on the fact that two landline or mobile numbers belonging to the same telephone unit are highly similar, the embodiments use the backward detection digit judgment scheme to detect and identify the target string, which further improves the accuracy of telephone number recognition. Further, the embodiments identify the remaining telephone number string recursively, until it has been fully identified.
Based on the same inventive concept as the telephone number recognition methods provided by the embodiments above, an embodiment of the present invention further provides a telephone number recognition device. Fig. 3 shows a structural diagram of a telephone number recognition device according to an embodiment of the present invention. As shown in Fig. 3, the device may include at least a division module 310, a judgment module 320, a determination module 330, a cutting module 340, and a completion module 350.
The functions of the components of the telephone number recognition device of the embodiment of the present invention and the connections among them are now described:
The division module 310 is adapted to divide, starting from the initial position, the target telephone number string to be identified according to a division rule conforming to the telephone number format, giving a number string of the first specified digit count;
The judgment module 320 is coupled with the division module 310 and is adapted to judge whether the number string of the first specified digit count meets the attribute features of the first category of telephone number;
The determination module 330 is coupled with the judgment module 320 and is adapted to determine, if the judgment module judges that the number string of the first specified digit count meets the attribute features of the first category of telephone number, at least two detection digit counts according to those attribute features;
The cutting module 340 is coupled with the determination module 330 and is adapted to cut the target string using each detection digit count, giving a cutting result;
The completion module 350 is coupled with the cutting module 340 and is adapted to choose, according to the cutting results, the optimal detection digit count from the at least two detection digit counts and use it to complete the number string of the first specified digit count.
In an embodiment of the present invention, the cutting module 340 is further adapted to:
For each detection digit count, use that detection digit count to cut the portion of the target telephone number string to be identified that follows the number string of the first specified number of digits, obtaining a first cut number and a second cut number;
Compare the first cut number with the second cut number and determine the number of corresponding positions at which the two carry the same digit, as the cutting result for that detection digit count.
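The comparison performed by the cutting module can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `count_matching_digits`, the sample digits, and the choice to return 0 when too few digits remain for a second cut number are all assumptions not stated in the source.

```python
def count_matching_digits(rest: str, n: int) -> int:
    """Cut two consecutive n-digit numbers from `rest` and count the
    positions at which both carry the same digit."""
    first, second = rest[:n], rest[n:2 * n]
    if len(second) < n:
        # Assumption: without a full second number there is nothing to compare.
        return 0
    return sum(a == b for a, b in zip(first, second))


# Hypothetical input: the digits that follow an already recognized prefix.
rest = "8234567882345679"
print(count_matching_digits(rest, 7))  # compares "8234567" with "8823456"
print(count_matching_digits(rest, 8))  # compares "82345678" with "82345679"
```

With this sample, cutting by 8 digits yields far more matching positions than cutting by 7, which is the signal the later selection step relies on.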
In an embodiment of the present invention, the completion module 350 is further adapted to:
Compare the numbers of matching positions corresponding to the detection digit counts;
Select, from the detection digit counts, the one with the largest number of matching positions as the optimal detection digit count;
Complete the number string of the first specified number of digits by the optimal detection digit count.
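The selection and completion steps above might look like the sketch below. The prefix "010", the candidate detection digit counts 7 and 8, and the reading of "completion" as appending that many of the following digits are illustrative assumptions; the source also does not say how ties are resolved, so this sketch simply keeps the first candidate.

```python
def complete_number(prefix: str, rest: str, candidates=(7, 8)) -> str:
    """Append to `prefix` as many digits of `rest` as the best-scoring
    detection digit count indicates."""

    def matches(n: int) -> int:
        # Number of positions at which two consecutive n-digit cuts agree.
        first, second = rest[:n], rest[n:2 * n]
        if len(second) < n:
            return 0
        return sum(a == b for a, b in zip(first, second))

    # max() returns the first candidate on ties (an assumption of this sketch).
    best = max(candidates, key=matches)
    return prefix + rest[:best]


# Hypothetical example: an area code followed by two similar local numbers.
print(complete_number("010", "8234567882345679"))
```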
In an embodiment of the present invention, the division module 310 is further adapted to, if the judgment module judges that the number string of the first specified number of digits does not conform to the attribute features of the first category of telephone number, select a new division rule that conforms to a telephone number format and divide the target telephone number string to be identified again, obtaining a number string of a second specified number of digits;
The judgment module 320 is further adapted to judge whether the number string of the second specified number of digits conforms to the attribute features of a second category of telephone number;
The completion module 350 is further adapted to, if the judgment module judges that the number string of the second specified number of digits conforms to the attribute features of the second category of telephone number, complete the number string of the second specified number of digits according to those attribute features.
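A rough sketch of this fallback between division rules is given below. The source does not say which categories or rules are used, so everything concrete here is hypothetical: treating landline as the first category with a 3-digit area code starting with "0", treating mobile as the second category with a 2-digit prefix and a fixed length of 11, and the names `recognize_category`, `MOBILE_PREFIXES` and `MOBILE_LENGTH`.

```python
MOBILE_LENGTH = 11                    # assumed length of a mobile number
MOBILE_PREFIXES = {"13", "15", "18"}  # hypothetical sample of mobile prefixes


def recognize_category(digits: str):
    """Try the first (landline) rule, then fall back to the second (mobile) rule."""
    # First division rule (assumed): take 3 digits and test for an area code.
    first = digits[:3]
    if len(first) == 3 and first.startswith("0"):
        return "landline", first

    # New division rule (assumed): take 2 digits and test for a mobile prefix.
    second = digits[:2]
    if second in MOBILE_PREFIXES and len(digits) >= MOBILE_LENGTH:
        # Completion for this category: extend to the fixed mobile length.
        return "mobile", digits[:MOBILE_LENGTH]

    return None, ""


print(recognize_category("01082345678"))
print(recognize_category("1381234567813912345678"))
```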
In an embodiment of the present invention, the division module 310 includes:
A preprocessing unit, adapted to perform preprocessing related to the telephone number format on the target telephone number string to be identified, obtaining a processed target telephone number string to be identified;
A division unit, adapted to divide, from the starting position, the processed target telephone number string to be identified according to the division rule that conforms to a telephone number format.
In an embodiment of the present invention, the preprocessing unit is further adapted to:
Determine whether the target telephone number string to be identified contains a specified separator;
If it does, cut the original telephone number string to be identified at the separator, obtaining at least two target telephone number strings to be identified.
In an embodiment of the present invention, the specified separator includes at least one of the following: enumeration comma (、), comma, semicolon, slash, backslash, vertical bar.
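The separator step can be illustrated with a small sketch. The function name `split_on_separators` is hypothetical, and including the full-width comma and semicolon alongside the listed separators is an added assumption.

```python
import re

# Enumeration comma, comma, semicolon, slash, backslash, vertical bar.
# The full-width comma and semicolon are an assumption of this sketch.
SEPARATORS = re.compile(r"[、,，;；/\\|]")


def split_on_separators(raw: str) -> list:
    """Cut the raw string at any specified separator and drop empty pieces."""
    parts = [piece.strip() for piece in SEPARATORS.split(raw)]
    return [piece for piece in parts if piece]


print(split_on_separators("010-82345678、82345679/13812345678"))
```

A string without any separator comes back as a single-element list, so later steps can treat both cases uniformly.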
In an embodiment of the present invention, the preprocessing unit is further adapted to:
After the at least two target telephone number strings to be identified are obtained by cutting, determine, for each target telephone number string to be identified, whether its head carries a country code;
If so, remove the country code from the head of the target telephone number string to be identified.
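Removing a leading country code could be sketched as below. The source does not name a country code or its written forms, so the use of China's code 86 and the "+86" and "0086" spellings are assumptions, as is the name `strip_country_code`.

```python
import re

# Assumed forms of the country code at the head of the string.
COUNTRY_CODE = re.compile(r"^(?:\+86|0086)[\s-]*")


def strip_country_code(number: str) -> str:
    """Remove an assumed country code, plus any separator after it, from the head."""
    return COUNTRY_CODE.sub("", number.strip(), count=1)


print(strip_country_code("+86-10-82345678"))
print(strip_country_code("0086 13812345678"))
```

A bare leading "86" is deliberately left alone in this sketch because it cannot be told apart from the first digits of an ordinary number.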
In an embodiment of the present invention, the preprocessing unit is further adapted to:
After the country code is removed from the head of the target telephone number string to be identified, analyze the resulting target telephone number string;
If the head of the target telephone number string to be identified carries an area code and the area code is incomplete, supplement the area code so that it is complete;
If the head of the target telephone number string to be identified carries an area code and the area code is repeated, remove the duplicate area code.
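One way to picture the area code repair is the sketch below. The source does not define "incomplete" or give a list of area codes; this sketch assumes that incomplete means a missing leading "0", uses a tiny hypothetical table `KNOWN_AREA_CODES`, and requires at least 7 further digits before adding the "0" so that short local numbers are not altered.

```python
KNOWN_AREA_CODES = ("010", "021", "0755")  # hypothetical sample table
MIN_LOCAL_DIGITS = 7                        # assumed shortest local number


def normalize_area_code(digits: str) -> str:
    """Drop a repeated area code or restore a missing leading '0'."""
    for code in KNOWN_AREA_CODES:
        # Repeated: the same area code appears twice in a row at the head.
        while digits.startswith(code + code):
            digits = digits[len(code):]
        if digits.startswith(code):
            return digits
    for code in KNOWN_AREA_CODES:
        short = code[1:]  # the area code without its leading '0'
        if digits.startswith(short) and len(digits) >= len(short) + MIN_LOCAL_DIGITS:
            return "0" + digits
    return digits


print(normalize_area_code("01001082345678"))
print(normalize_area_code("1082345678"))
```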
In an embodiment of the present invention, as shown in Fig. 4, the device shown in Fig. 3 may further include an acquisition module 360, coupled with the division module 310, adapted to obtain the target telephone number string to be identified through the following steps:
Obtain a point of interest (POI) from a web page;
Extract the target telephone number string to be identified from the POI.
In an embodiment of the present invention, as shown in Fig. 4, the device shown in Fig. 3 may further include:
A recursion module 370, coupled with the completion module 350, adapted to, if remaining telephone number strings to be identified exist, trigger the preprocessing unit to perform the preprocessing operation again, the division module to perform the division operation again, the judgment module to perform the judgment operation again, the determination module to perform the determination operation again, the cutting module to perform the cutting operation again and the completion module to perform the completion operation again, until all remaining telephone number strings to be identified have been identified.
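The repeat-until-done behaviour of the recursion module can be sketched as a simple loop. The names `recognize_all` and `fixed_length_stub` are hypothetical; the stub merely stands in for the whole preprocess, divide, judge, determine, cut and complete sequence, and the guard against non-progress is an addition of this sketch.

```python
def recognize_all(remaining: str, recognize_one) -> list:
    """Call `recognize_one` repeatedly until no unidentified digits remain."""
    numbers = []
    while remaining:
        hit = recognize_one(remaining)
        if hit is None:
            break  # nothing recognizable is left
        number, rest = hit
        if len(rest) >= len(remaining):
            break  # no progress was made; stop instead of looping forever
        numbers.append(number)
        remaining = rest
    return numbers


def fixed_length_stub(digits: str):
    """Hypothetical stand-in for one full recognition pass: take 11 digits."""
    if len(digits) < 11:
        return None
    return digits[:11], digits[11:]


print(recognize_all("1381234567813912345678", fixed_length_stub))
```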
According to any one of the above preferred embodiments, or a combination of several of them, embodiments of the present invention can achieve the following beneficial effects:
In embodiments of the present invention, the target telephone number string to be identified is divided from the starting position according to a division rule that conforms to a telephone number format. That is, taking into account the features of different categories of telephone number (such as landline or mobile numbers), the target string is divided using the division rule of the telephone number format corresponding to each category, and the number string of the first specified number of digits obtained by the division is used to identify the category of the corresponding telephone number, so that different categories of telephone number are identified effectively. Further, embodiments of the present invention exploit the fact that two landline or mobile numbers belonging to the same organization tend to be highly similar: at least two detection digit counts are determined according to the attribute features of the first category of telephone number, and the target string is then checked and identified by judging with backward detection digit counts, which further improves the accuracy of telephone number recognition.
In addition, before dividing the target telephone number string to be identified according to the division rule that conforms to a telephone number format, embodiments of the present invention may also perform preprocessing related to the telephone number format on the target string, so that the preprocessed string is consistent with the telephone number format. Subsequent recognition is then based on the preprocessed string, which improves the recognition rate of telephone numbers.
Numerous specific details are set forth in the description provided here. It should be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of the embodiments may be combined into one module, unit or component, and may furthermore be divided into a plurality of sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the telephone number recognition device according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any order. These words may be interpreted as names.
Those skilled in the art will thus recognize that, although a number of exemplary embodiments of the invention have been shown and described in detail herein, many other variations or modifications consistent with the principles of the invention can still be directly determined or derived from the disclosure without departing from the spirit and scope of the invention. Accordingly, the scope of the invention should be understood and deemed to cover all such other variations or modifications.
Embodiments of the present invention also disclose: A1. A telephone number recognition method, comprising:
Dividing, from the starting position, a target telephone number string to be identified according to a division rule that conforms to a telephone number format, obtaining a number string of a first specified number of digits;
Judging whether the number string of the first specified number of digits conforms to the attribute features of a first category of telephone number;
If so, determining at least two detection digit counts according to the attribute features of the first category of telephone number;
Cutting the target telephone number string to be identified using each detection digit count in turn, obtaining cutting results;
Selecting, according to the cutting results, an optimal detection digit count from the at least two detection digit counts and using it to complete the number string of the first specified number of digits.
A2. The method according to A1, wherein cutting the target telephone number string to be identified using each detection digit count in turn, obtaining cutting results, comprises:
For each detection digit count, using that detection digit count to cut the portion of the target telephone number string to be identified that follows the number string of the first specified number of digits, obtaining a first cut number and a second cut number;
Comparing the first cut number with the second cut number and determining the number of corresponding positions at which the two carry the same digit, as the cutting result for that detection digit count.
A3. The method according to A1 or A2, wherein selecting, according to the cutting results, an optimal detection digit count from the at least two detection digit counts and using it to complete the number string of the first specified number of digits comprises:
Comparing the numbers of matching positions corresponding to the detection digit counts;
Selecting, from the detection digit counts, the one with the largest number of matching positions as the optimal detection digit count;
Completing the number string of the first specified number of digits by the optimal detection digit count.
A4. The method according to any one of A1-A3, wherein after judging whether the number string of the first specified number of digits conforms to the attribute features of the first category of telephone number, the method further comprises:
If the number string of the first specified number of digits does not conform to the attribute features of the first category of telephone number, selecting a new division rule that conforms to a telephone number format and dividing the target telephone number string to be identified again, obtaining a number string of a second specified number of digits;
Judging whether the number string of the second specified number of digits conforms to the attribute features of a second category of telephone number;
If so, completing the number string of the second specified number of digits according to the attribute features of the second category of telephone number.
A5. The method according to any one of A1-A4, wherein dividing, from the starting position, the target telephone number string to be identified according to a division rule that conforms to a telephone number format comprises:
Performing preprocessing related to the telephone number format on the target telephone number string to be identified, obtaining a processed target telephone number string to be identified;
Dividing, from the starting position, the processed target telephone number string to be identified according to the division rule that conforms to a telephone number format.
A6. The method according to any one of A1-A5, wherein performing preprocessing related to the telephone number format on the target telephone number string to be identified, obtaining a processed target telephone number string to be identified, comprises:
Determining whether the target telephone number string to be identified contains a specified separator;
If the target telephone number string to be identified contains a specified separator, cutting the target telephone number string to be identified at the separator, obtaining at least two target telephone number strings to be identified.
A7. The method according to any one of A1-A6, wherein the specified separator includes at least one of the following: enumeration comma (、), comma, semicolon, slash, backslash, vertical bar.
A8. The method according to any one of A1-A7, wherein after the at least two target telephone number strings to be identified are obtained by cutting, the method further comprises:
Determining, for each target telephone number string to be identified, whether its head carries a country code;
If so, removing the country code from the head of the target telephone number string to be identified.
A9. The method according to any one of A1-A8, wherein after the country code is removed from the head of the target telephone number string to be identified, the method further comprises:
Analyzing the target telephone number string to be identified from which the country code has been removed;
If the head of the target telephone number string to be identified carries an area code and the area code is incomplete, supplementing the area code so that it is complete;
If the head of the target telephone number string to be identified carries an area code and the area code is repeated, removing the duplicate area code.
A10. The method according to any one of A1-A9, wherein the target telephone number string to be identified is obtained through the following steps:
Obtaining a point of interest (POI) from a web page;
Extracting the target telephone number string to be identified from the POI.
A11. The method according to any one of A1-A10, wherein after completing the number string of the first specified number of digits or of the second specified number of digits, the method further comprises:
If remaining telephone number strings to be identified exist, performing the preprocessing operation, division operation, judgment operation, determination operation, cutting operation and completion operation again, until all remaining telephone number strings to be identified have been identified.
B12. A telephone number recognition device, comprising:
A division module, adapted to divide, from the starting position, a target telephone number string to be identified according to a division rule that conforms to a telephone number format, obtaining a number string of a first specified number of digits;
A judgment module, adapted to judge whether the number string of the first specified number of digits conforms to the attribute features of a first category of telephone number;
A determination module, adapted to determine at least two detection digit counts according to the attribute features of the first category of telephone number if the judgment module judges that the number string of the first specified number of digits conforms to those attribute features;
A cutting module, adapted to cut the target telephone number string to be identified using each detection digit count in turn, obtaining cutting results;
A completion module, adapted to select, according to the cutting results, an optimal detection digit count from the at least two detection digit counts and use it to complete the number string of the first specified number of digits.
B13. The device according to B12, wherein the cutting module is further adapted to:
For each detection digit count, use that detection digit count to cut the portion of the target telephone number string to be identified that follows the number string of the first specified number of digits, obtaining a first cut number and a second cut number;
Compare the first cut number with the second cut number and determine the number of corresponding positions at which the two carry the same digit, as the cutting result for that detection digit count.
B14. The device according to B12 or B13, wherein the completion module is further adapted to:
Compare the numbers of matching positions corresponding to the detection digit counts;
Select, from the detection digit counts, the one with the largest number of matching positions as the optimal detection digit count;
Complete the number string of the first specified number of digits by the optimal detection digit count.
B15. The device according to any one of B12-B14, wherein:
The division module is further adapted to, if the judgment module judges that the number string of the first specified number of digits does not conform to the attribute features of the first category of telephone number, select a new division rule that conforms to a telephone number format and divide the target telephone number string to be identified again, obtaining a number string of a second specified number of digits;
The judgment module is further adapted to judge whether the number string of the second specified number of digits conforms to the attribute features of a second category of telephone number;
The completion module is further adapted to, if the judgment module judges that the number string of the second specified number of digits conforms to the attribute features of the second category of telephone number, complete the number string of the second specified number of digits according to those attribute features.
B16. The device according to any one of B12-B15, wherein the division module includes:
A preprocessing unit, adapted to perform preprocessing related to the telephone number format on the target telephone number string to be identified, obtaining a processed target telephone number string to be identified;
A division unit, adapted to divide, from the starting position, the processed target telephone number string to be identified according to the division rule that conforms to a telephone number format.
B17. The device according to any one of B12-B16, wherein the preprocessing unit is further adapted to:
Determine whether the target telephone number string to be identified contains a specified separator;
If the target telephone number string to be identified contains a specified separator, cut the original telephone number string to be identified at the separator, obtaining at least two target telephone number strings to be identified.
B18. The device according to any one of B12-B17, wherein the specified separator includes at least one of the following: enumeration comma (、), comma, semicolon, slash, backslash, vertical bar.
B19. The device according to any one of B12-B18, wherein the preprocessing unit is further adapted to:
After the at least two target telephone number strings to be identified are obtained by cutting, determine, for each target telephone number string to be identified, whether its head carries a country code;
If so, remove the country code from the head of the target telephone number string to be identified.
B20. The device according to any one of B12-B19, wherein the preprocessing unit is further adapted to:
After the country code is removed from the head of the target telephone number string to be identified, analyze the resulting target telephone number string;
If the head of the target telephone number string to be identified carries an area code and the area code is incomplete, supplement the area code so that it is complete;
If the head of the target telephone number string to be identified carries an area code and the area code is repeated, remove the duplicate area code.
B21. The device according to any one of B12-B20, further comprising an acquisition module, adapted to obtain the target telephone number string to be identified through the following steps:
Obtain a point of interest (POI) from a web page;
Extract the target telephone number string to be identified from the POI.
B22. The device according to any one of B12-B21, further comprising:
A recursion module, adapted to, if remaining telephone number strings to be identified exist, trigger the preprocessing unit to perform the preprocessing operation again, the division module to perform the division operation again, the judgment module to perform the judgment operation again, the determination module to perform the determination operation again, the cutting module to perform the cutting operation again and the completion module to perform the completion operation again, until all remaining telephone number strings to be identified have been identified.

Claims (22)

1. A telephone number recognition method, comprising:
Dividing, from the starting position, a target telephone number string to be identified according to a division rule that conforms to a telephone number format, obtaining a number string of a first specified number of digits;
Judging whether the number string of the first specified number of digits conforms to the attribute features of a first category of telephone number;
If so, determining at least two detection digit counts according to the attribute features of the first category of telephone number;
Cutting the target telephone number string to be identified using each detection digit count in turn, obtaining cutting results;
Selecting, according to the cutting results, an optimal detection digit count from the at least two detection digit counts and using it to complete the number string of the first specified number of digits.
2. The method according to claim 1, wherein cutting the target telephone number string to be identified using each detection digit count in turn, obtaining cutting results, comprises:
For each detection digit count, using that detection digit count to cut the portion of the target telephone number string to be identified that follows the number string of the first specified number of digits, obtaining a first cut number and a second cut number;
Comparing the first cut number with the second cut number and determining the number of corresponding positions at which the two carry the same digit, as the cutting result for that detection digit count.
3. The method according to claim 1 or 2, wherein selecting, according to the cutting results, an optimal detection digit count from the at least two detection digit counts and using it to complete the number string of the first specified number of digits comprises:
Comparing the numbers of matching positions corresponding to the detection digit counts;
Selecting, from the detection digit counts, the one with the largest number of matching positions as the optimal detection digit count;
Completing the number string of the first specified number of digits by the optimal detection digit count.
4. The method according to claim 1 or 2, wherein after judging whether the number string of the first specified number of digits conforms to the attribute features of the first category of telephone number, the method further comprises:
If the number string of the first specified number of digits does not conform to the attribute features of the first category of telephone number, selecting a new division rule that conforms to a telephone number format and dividing the target telephone number string to be identified again, obtaining a number string of a second specified number of digits;
Judging whether the number string of the second specified number of digits conforms to the attribute features of a second category of telephone number;
If so, completing the number string of the second specified number of digits according to the attribute features of the second category of telephone number.
5. The method according to claim 1 or 2, wherein dividing, from the starting position, the target telephone number string to be identified according to a division rule that conforms to a telephone number format comprises:
Performing preprocessing related to the telephone number format on the target telephone number string to be identified, obtaining a processed target telephone number string to be identified;
Dividing, from the starting position, the processed target telephone number string to be identified according to the division rule that conforms to a telephone number format.
6. The method according to claim 5, wherein performing preprocessing related to the telephone number format on the target telephone number string to be identified, obtaining a processed target telephone number string to be identified, comprises:
Determining whether the target telephone number string to be identified contains a specified separator;
If the target telephone number string to be identified contains a specified separator, cutting the target telephone number string to be identified at the separator, obtaining at least two target telephone number strings to be identified.
7. The method according to claim 6, wherein the specified separator includes at least one of the following: enumeration comma (、), comma, semicolon, slash, backslash, vertical bar.
8. The method according to claim 6 or 7, wherein after the at least two target telephone number strings to be identified are obtained by cutting, the method further comprises:
Determining, for each target telephone number string to be identified, whether its head carries a country code;
If so, removing the country code from the head of the target telephone number string to be identified.
9. The method according to claim 8, wherein after the country code is removed from the head of the target telephone number string to be identified, the method further comprises:
Analyzing the target telephone number string to be identified from which the country code has been removed;
If the head of the target telephone number string to be identified carries an area code and the area code is incomplete, supplementing the area code so that it is complete;
If the head of the target telephone number string to be identified carries an area code and the area code is repeated, removing the duplicate area code.
10. The method according to claim 1 or 2, wherein the target telephone number string to be identified is obtained through the following steps:
Obtaining a point of interest (POI) from a web page;
Extracting the target telephone number string to be identified from the POI.
11. The method according to claim 4, wherein after completing the number string of the first specified number of digits or the number string of the second specified number of digits, the method further comprises:
If remaining telephone number strings to be identified exist, performing the preprocessing operation, division operation, judgment operation, determination operation, cutting operation and completion operation again, until all remaining telephone number strings to be identified have been identified.
12. A telephone number recognition device, comprising:
A division module, adapted to divide, from the starting position, a target telephone number string to be identified according to a division rule that conforms to a telephone number format, obtaining a number string of a first specified number of digits;
A judgment module, adapted to judge whether the number string of the first specified number of digits conforms to the attribute features of a first category of telephone number;
A determination module, adapted to determine at least two detection digit counts according to the attribute features of the first category of telephone number if the judgment module judges that the number string of the first specified number of digits conforms to those attribute features;
A cutting module, adapted to cut the target telephone number string to be identified using each detection digit count in turn, obtaining cutting results;
A completion module, adapted to select, according to the cutting results, an optimal detection digit count from the at least two detection digit counts and use it to complete the number string of the first specified number of digits.
13. The device according to claim 12, wherein the cutting module is further adapted to:
For each detection digit count, use that detection digit count to cut the telephone number string that follows the number string of the first specified digit count in the target telephone number string to be identified, obtaining a first cut number and a second cut number;
Compare the first cut number with the second cut number and determine the number of identical digits at corresponding positions of the two, as the cutting result corresponding to that detection digit count.
14. The device according to claim 12 or 13, wherein the completion module is further adapted to:
Compare the numbers of identical digits corresponding to the respective detection digit counts;
Select, from the detection digit counts, the one with the largest number of identical digits as the optimal detection digit count;
Complete the number string of the first specified digit count with the optimal detection digit count of digits.
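As a rough illustration of the comparison and selection described in claims 13 and 14, the following Python sketch counts the identical digits for each candidate detection digit count and keeps the best one. The function names, the candidate counts 7 and 8, and the sample digits are assumptions made for the example; the claims do not fix any of them.

```python
from typing import Sequence


def identical_digit_count(rest: str, width: int) -> int:
    """Cut `rest` into two consecutive pieces of `width` digits and count the
    positions at which both pieces hold the same digit."""
    first_cut = rest[:width]
    second_cut = rest[width:2 * width]
    # zip stops at the shorter piece, so a truncated second cut is tolerated.
    return sum(a == b for a, b in zip(first_cut, second_cut))


def choose_detection_width(rest: str, widths: Sequence[int] = (7, 8)) -> int:
    """Return the candidate width whose two cuts agree at the most positions.
    Ties go to the earlier candidate (an arbitrary choice for this sketch)."""
    return max(widths, key=lambda w: identical_digit_count(rest, w))


def complete_number(prefix: str, rest: str, widths: Sequence[int] = (7, 8)) -> str:
    """Append the optimal number of digits from `rest` to the already
    identified `prefix` (for example an area code)."""
    best = choose_detection_width(rest, widths)
    return prefix + rest[:best]


if __name__ == "__main__":
    # Hypothetical input: area code "010" followed by two 8-digit numbers.
    print(complete_number("010", "8765432187654322"))
```

In this hypothetical input, cutting by 8 yields `87654321` and `87654322`, which agree at 7 positions, while cutting by 7 yields two pieces that agree nowhere, so 8 is selected.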
15. The device according to claim 12 or 13, wherein:
The division module is further adapted to, if the judging module judges that the number string of the first specified digit count does not match the attribute features of the first category of telephone number, select a new division rule that conforms to a telephone number format and divide the target telephone number string to be identified again, obtaining a number string of a second specified digit count;
The judging module is further adapted to judge whether the number string of the second specified digit count matches the attribute features of a second category of telephone number;
The completion module is further adapted to, if the judging module judges that the number string of the second specified digit count matches the attribute features of the second category of telephone number, complete the number string of the second specified digit count according to the attribute features of the second category of telephone number.
16. The device according to claim 12 or 13, wherein the division module includes:
A preprocessing unit, adapted to perform preprocessing operations related to telephone number format on the target telephone number string to be identified, obtaining a processed target telephone number string to be identified;
A division unit, adapted to divide, starting from the initial position, the processed target telephone number string to be identified according to a division rule that conforms to a telephone number format.
17. The device according to claim 16, wherein the preprocessing unit is further adapted to:
Determine whether the target telephone number string to be identified contains a specified separator;
If the target telephone number string to be identified contains a specified separator, cut the target telephone number string to be identified at the separator, obtaining at least two cut target telephone number strings to be identified.
18. The device according to claim 17, wherein the specified separator includes at least one of the following: enumeration comma (、), comma, semicolon, slash, backslash, vertical bar.
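The separator-based cutting of claims 17 and 18 can be sketched in Python as follows. The constant `SEPARATORS`, the function name `split_candidates`, and the sample input are assumptions for illustration; only the ASCII forms of comma, semicolon, slash, backslash, and vertical bar are covered here, plus the enumeration comma.

```python
import re
from typing import List

# Separators named in claim 18: enumeration comma, comma, semicolon,
# slash, backslash, vertical bar.
SEPARATORS = "、,;/\\|"

# Build a character class; re.escape keeps backslash and "|" literal.
_SEPARATOR_PATTERN = re.compile("[" + re.escape(SEPARATORS) + "]")


def split_candidates(raw: str) -> List[str]:
    """Cut a raw string at any specified separator and drop empty pieces."""
    pieces = _SEPARATOR_PATTERN.split(raw)
    return [piece.strip() for piece in pieces if piece.strip()]


if __name__ == "__main__":
    print(split_candidates("010-12345678、010-87654321;13800000000"))
```

A string with no separator comes back as a single-element list, so later steps can treat both cases uniformly.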
19. The device according to claim 17 or 18, wherein the preprocessing unit is further adapted to:
After the at least two cut target telephone number strings to be identified are obtained, determine, for each target telephone number string to be identified, whether its head has a country code;
If so, remove the country code from the head of that target telephone number string to be identified.
20. The device according to claim 19, wherein the preprocessing unit is further adapted to:
After the country code is removed from the head of the target telephone number string to be identified, analyze the target telephone number string to be identified with the country code removed;
If the head of the target telephone number string to be identified has a regional area code and that area code is incomplete, supplement the regional area code to make it complete;
If the head of the target telephone number string to be identified has a regional area code and that area code is repeated, perform deduplication on the regional area code.
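The head clean-up of claims 19 and 20 might look like the Python sketch below. Everything concrete in it is an assumption: the country code forms `+86` and `0086`, the two sample area codes, and in particular the reading of "incomplete" as "missing its leading zero", which the claims do not spell out.

```python
from typing import Sequence


def strip_country_code(number: str,
                       country_codes: Sequence[str] = ("+86", "0086")) -> str:
    """Remove an assumed country code from the head, plus any space or
    hyphen that directly follows it."""
    for code in country_codes:
        if number.startswith(code):
            return number[len(code):].lstrip(" -")
    return number


def normalize_area_code(number: str,
                        area_codes: Sequence[str] = ("010", "021")) -> str:
    """Deduplicate a repeated area code, or restore a missing leading zero.
    `area_codes` is a hypothetical lookup table for this sketch."""
    for code in area_codes:
        # Repeated area code, e.g. "010010..." becomes "010...".
        while number.startswith(code + code):
            number = number[len(code):]
        if number.startswith(code):
            return number
        # Assumed form of "incomplete": the leading zero was dropped.
        if number.startswith(code[1:]):
            return "0" + number
    return number


if __name__ == "__main__":
    print(normalize_area_code(strip_country_code("+86-10-12345678")))
    print(normalize_area_code("01001012345678"))
```

A local number that itself begins with the area code digits would be shortened by the deduplication step, so a real implementation would need an extra length check.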
21. The device according to claim 12 or 13, further comprising an acquisition module, adapted to obtain the target telephone number string to be identified by the following steps:
Obtaining a point of interest (POI) from a web page;
Extracting the target telephone number string to be identified from the POI.
22. The device according to claim 16, further comprising:
A recursion module, adapted to, if any telephone number string remains to be identified, trigger the preprocessing unit to perform the preprocessing operation again, the division module to perform the dividing operation again, the judging module to perform the judging operation again, the determining module to perform the determining operation again, the cutting module to perform the cutting operation again, and the completion module to perform the completion operation again, until all remaining telephone number strings to be identified have been identified.
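The repeat-until-exhausted behaviour of the recursion module in claim 22 can be sketched as a simple loop in Python. The callable `recognize_one` stands in for the whole preprocess, divide, judge, determine, cut, and complete sequence; it and the fixed 11-digit toy recognizer are hypothetical placeholders, not the patented procedure.

```python
from typing import Callable, List, Tuple

# A recognizer takes the remaining string and returns
# (recognized number, string still to be identified).
Recognizer = Callable[[str], Tuple[str, str]]


def recognize_all(digits: str, recognize_one: Recognizer) -> List[str]:
    """Apply `recognize_one` repeatedly until nothing remains."""
    numbers: List[str] = []
    remaining = digits
    while remaining:
        number, rest = recognize_one(remaining)
        # Stop if the recognizer found nothing or consumed nothing,
        # otherwise the loop could run forever.
        if not number or len(rest) >= len(remaining):
            break
        numbers.append(number)
        remaining = rest
    return numbers


if __name__ == "__main__":
    # Toy recognizer: always takes the first 11 characters.
    take_eleven = lambda s: (s[:11], s[11:])
    print(recognize_all("0101234567801012345679", take_eleven))
```

The progress check is a design choice of this sketch: it guards against a recognizer that returns its input unchanged.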
CN201510643027.7A 2015-09-30 2015-09-30 The recognition methods of telephone number and device Active CN105227737B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510643027.7A CN105227737B (en) 2015-09-30 2015-09-30 The recognition methods of telephone number and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510643027.7A CN105227737B (en) 2015-09-30 2015-09-30 The recognition methods of telephone number and device

Publications (2)

Publication Number Publication Date
CN105227737A CN105227737A (en) 2016-01-06
CN105227737B true CN105227737B (en) 2018-01-05

Family

ID=54996405

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510643027.7A Active CN105227737B (en) 2015-09-30 2015-09-30 The recognition methods of telephone number and device

Country Status (1)

Country Link
CN (1) CN105227737B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109246623B (en) * 2018-08-31 2020-05-22 长沙炫笔记通信科技有限公司 Communication number completion method, device and storage medium
CN111866207B (en) * 2020-06-29 2022-11-22 厦门亿联网络技术股份有限公司 Audio and video conference system number distribution method and system
CN112003988A (en) * 2020-08-05 2020-11-27 云南电网有限责任公司红河供电局 Device and method for identifying number accuracy

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102088697A (en) * 2010-12-17 2011-06-08 北京华中融合科技有限公司 Method and system for processing spam
CN104731977A (en) * 2015-04-14 2015-06-24 海量云图(北京)数据技术有限公司 Phone number data search and classification method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4163138B2 (en) * 2004-04-05 2008-10-08 松下電器産業株式会社 Mobile phone equipment

Also Published As

Publication number Publication date
CN105227737A (en) 2016-01-06

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220715

Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015

Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi software (Beijing) Co.,Ltd.
