CN104951543A - Information processing method and device realized through computer - Google Patents

Information processing method and device realized through computer

Info

Publication number
CN104951543A
Authority
CN
China
Prior art keywords
geographical
term
classification
geographical term
location information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510347745.XA
Other languages
Chinese (zh)
Other versions
CN104951543B (en
Inventor
邵睿
沈剑平
李炫�
莫洋
宋元峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510347745.XA
Publication of CN104951543A
Application granted
Publication of CN104951543B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Remote Sensing (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a computer-implemented information processing method and device. The method comprises: acquiring text information of an article; extracting at least one original geographical term from the text information; extracting, from a pre-built classification geographic information knowledge base, classification geographical location information corresponding to each original geographical term, wherein the classification geographical location information comprises the same-level geographical term matching the original geographical term and the superior geographical terms of each administrative division level to which the same-level geographical term belongs; scoring the confidence of the extracted classification geographical location information according to the extracted original geographical terms and the classification geographical location information; and labeling the classification geographical location information whose confidence score exceeds a preset confidence threshold as geographical information related to the article. With this method and device, the geographical information related to the article content can be acquired accurately, and the acquired geographical information is complete in that it covers every administrative level.

Description

Computer-implemented information processing method and device
Technical field
The present invention relates to the field of computer technology, and in particular to an information processing method and device for text content.
Background
At present, the amount of information on the network is huge and its sources are abundant. Taking news articles as an example, a user can obtain a large amount of news from various information sources. When the user cares strongly about news related to certain regions, how to recommend the news related to those regions to the user is a technical problem that urgently needs to be solved.
In the prior art, keyword matching against a pre-built geographic information knowledge base is used to extract at least one piece of candidate geographic information from an article together with a corresponding count, and the candidate geographic information with the higher counts is taken as the geographic information of the article. Articles related to a region are then recommended to the user according to the geographic information of the articles.
In the above process of extracting the geographic information of an article, higher-level and lower-level geographic information (for example, Guangdong Province and Shenzhen) may be counted together, which makes the extracted geographic information inaccurate. In addition, geographic information implied in the article cannot be extracted. For example, when "Haidian District" appears in the article, the implied geographic information "Beijing" cannot be extracted, which also makes the extracted geographic information inaccurate.
Summary of the invention
The object of the present invention is to provide a computer-implemented information processing method and device that extract the classification geographic information related to article content more accurately.
According to one aspect of the present invention, a computer-implemented information processing method is provided. The method comprises: obtaining text information of an article; extracting at least one original geographical term from the text information; extracting, from a pre-built classification geographic information knowledge base, the classification geographical location information corresponding to each original geographical term, the classification geographical location information comprising the same-level geographical term that matches the original geographical term and the superior geographical term of each administrative division level to which it belongs; scoring the confidence of each piece of extracted classification geographical location information according to the extracted original geographical terms and the classification geographical location information; and labeling the classification geographical location information whose confidence score exceeds a predetermined confidence threshold as the geographic information related to the article.
Preferably, scoring the confidence of the extracted classification geographical location information according to the extracted original geographical terms and the classification geographical location information comprises: obtaining the values of at least two geographical term evaluation indicators for the extracted classification geographical location information according to the extracted original geographical terms and the classification geographical location information; and scoring the confidence of the extracted classification geographical location information according to the obtained values of the at least two geographical term evaluation indicators.
Further, the geographical term evaluation indicators comprise: the administrative division level of the extracted same-level geographical term, and the degree of coincidence of the superior geographical terms of each administrative division level to which multiple same-level geographical terms belong.
Preferably, extracting the classification geographical location information corresponding to each original geographical term from the pre-built classification geographic information knowledge base comprises: extracting that information from the pre-built classification geographic information knowledge base according to a pre-built geographic information abbreviation/full-name mapping table.
Optionally, the geographical term evaluation indicators further comprise the abbreviation/full-name completeness of the original geographical term corresponding to the extracted same-level geographical term.
Optionally, the geographical term evaluation indicators further comprise the extraction position of the original geographical term corresponding to the extracted same-level geographical term.
Further, the text information comprises the title and the body of the article, and the extraction position of the original geographical term comprises at least one of the following positions: the title, the beginning of the body, the end of the body, and the remainder of the body other than its beginning and end.
Optionally, the geographical term evaluation indicators further comprise the number of occurrences of the original geographical term corresponding to the extracted same-level geographical term.
Preferably, scoring the confidence of the extracted classification geographical location information according to the obtained values of the at least two geographical term evaluation indicators comprises: assigning weights to each extracted same-level geographical term according to the values of the at least two geographical term evaluation indicators; and scoring the confidence of the classification geographical location information corresponding to each same-level geographical term according to the weights assigned to that term.
According to another aspect of the present invention, a device for information processing is also provided. The device comprises: a text information acquiring unit for obtaining text information of an article; an original geographical term extraction unit for extracting at least one original geographical term from the text information; a classification geographical location information extraction unit for extracting, from a pre-built classification geographic information knowledge base, the classification geographical location information corresponding to each original geographical term, the classification geographical location information comprising the same-level geographical term that matches the original geographical term and the superior geographical term of each administrative division level to which it belongs; a confidence scoring unit for scoring the confidence of each piece of extracted classification geographical location information according to the extracted original geographical terms and the classification geographical location information; and a geographic information labeling unit for labeling the classification geographical location information whose confidence score exceeds a predetermined confidence threshold as the geographic information related to the article.
Preferably, the confidence scoring unit comprises: a geographical term evaluation indicator value acquiring unit for obtaining the values of at least two geographical term evaluation indicators for the extracted classification geographical location information according to the extracted original geographical terms and the classification geographical location information; and a confidence scoring subunit for scoring the confidence of the extracted classification geographical location information according to the obtained values of the at least two geographical term evaluation indicators.
Further, the geographical term evaluation indicators comprise: the administrative division level of the extracted same-level geographical term, and the degree of coincidence of the superior geographical terms of each administrative division level to which multiple same-level geographical terms belong.
Preferably, the classification geographical location information extraction unit comprises: a classification geographical location information extraction subunit for extracting the classification geographical location information corresponding to each original geographical term from the pre-built classification geographic information knowledge base according to a pre-built geographic information abbreviation/full-name mapping table.
Optionally, the geographical term evaluation indicators further comprise the abbreviation/full-name completeness of the original geographical term corresponding to the extracted same-level geographical term.
Optionally, the geographical term evaluation indicators further comprise the extraction position of the original geographical term corresponding to the extracted same-level geographical term.
Further, the text information comprises the title and the body of the article, and the extraction position of the original geographical term comprises at least one of the following positions: the title, the beginning of the body, the end of the body, and the remainder of the body other than its beginning and end.
Optionally, the geographical term evaluation indicators further comprise the number of occurrences of the original geographical term corresponding to the extracted same-level geographical term.
Preferably, the confidence scoring subunit comprises: a weight assignment module for assigning weights to each extracted same-level geographical term according to the values of the at least two geographical term evaluation indicators; and a confidence scoring module for scoring the confidence of the classification geographical location information corresponding to each same-level geographical term according to the weights assigned to that term.
The computer-implemented information processing method and device provided by the present invention extract, from the classification geographic information knowledge base, the classification geographical location information corresponding to the original geographical terms in the text information of an article, score the confidence of that classification geographical location information, and label the classification geographical location information whose confidence score exceeds the predetermined confidence threshold as the geographic information related to the article. Because the administrative division level of the geographical location information is taken into account during processing, and both the same-level geographical term corresponding to each extracted original geographical term and each of its superior geographical terms are obtained, the geographic information related to the article content can be obtained more accurately, and the geographic information obtained is more complete in that it covers every administrative level.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the computer-implemented information processing method according to an exemplary embodiment of the present invention;
Fig. 2 is a schematic diagram of a location-based article recommendation page in an information system;
Fig. 3 is a schematic diagram of a page on which a user configures geographic locations of interest in an information system;
Fig. 4 is a schematic structural diagram of the device for information processing according to an exemplary embodiment of the present invention.
Detailed description
The basic idea of the present invention is to treat the administrative division level of geographical location information as one of the factors when extracting the classification geographical location information corresponding to the original geographical terms in the text information of an article and when scoring the confidence of that classification geographical location information. The geographic information related to the article is then determined from the confidence scores, so that the geographic information related to the article content is extracted more accurately and articles related to the regions a user cares about can be recommended to that user.
The computer-implemented information processing method and device of exemplary embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Embodiment 1
Fig. 1 is a schematic flowchart of the computer-implemented information processing method according to an exemplary embodiment of the present invention.
Referring to Fig. 1, in step S110, the text information of an article is obtained.
Specifically, the text information may comprise the title and the body of the article.
In step S120, at least one original geographical term is extracted from the text information.
Specifically, keyword matching against a knowledge base containing relatively complete geographical location information may be used to extract the at least one original geographical term from the text information.
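As an illustration of the keyword matching in step S120, the following minimal Python sketch scans the text for entries of a small gazetteer and prefers the longest match at each position. The function name extract_original_terms, the longest-match rule and the sample gazetteer are assumptions made for this example; the patent only states that keyword matching is used.

```python
def extract_original_terms(text, gazetteer):
    """Return (term, offset) pairs for every gazetteer entry found in text."""
    # Try longer names first so "Nanshan District" wins over "Nanshan".
    names = sorted(gazetteer, key=len, reverse=True)
    hits = []
    i = 0
    while i < len(text):
        for name in names:
            if text.startswith(name, i):
                hits.append((name, i))
                i += len(name)  # skip past the matched term
                break
        else:
            i += 1  # no gazetteer entry starts here
    return hits


# Hypothetical gazetteer and article text, for illustration only.
gazetteer = {"Nanshan", "Nanshan District", "Shenzhen", "Guangdong Province"}
text = "Shenzhen opens a new park in Nanshan District."
print(extract_original_terms(text, gazetteer))
```

The offsets returned with each term could later be used to derive the extraction position and the number of occurrences of a term.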
In step S130, the classification geographical location information corresponding to each original geographical term is extracted from a pre-built classification geographic information knowledge base. The classification geographical location information comprises the same-level geographical term that matches the original geographical term and the superior geographical term of each administrative division level to which it belongs.
Specifically, the classification geographic information knowledge base may be as shown in Table 1.
Table 1: Classification geographic information knowledge base
Province | City | District/county/town/township | Street/road
Guangdong Province | Shenzhen | Nanshan District | Nanshan Street
Guangdong Province | Shenzhen | Yantian District | Nanhai Avenue
Fujian Province | Longyan | Nanshan Town |
Sichuan Province | Deyang City | Nanshan Town | Chinese Scholartree East Road
Hunan Province | Yueyang | Nanshan Township |
Jiangxi Province | Shangrao City | Nanshan Township |
Suppose the original geographical term extracted from the text information in step S120 is "Nanshan District". Then the same-level (district, county, town or township level) geographical term "Nanshan District" that matches the original geographical term "Nanshan District", together with the superior (province-level and city-level) geographical terms "Guangdong Province" and "Shenzhen", is extracted from the classification geographic information knowledge base shown in Table 1.
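The lookup in step S130 can be sketched as follows. This is a minimal illustration that assumes the knowledge base is stored as rows ordered from province down to street, with place names romanized; the names LEVELS, KNOWLEDGE_BASE and lookup are hypothetical and not part of the patent.

```python
# Column order of each knowledge-base row, from highest to lowest level.
LEVELS = ("province", "city", "district", "street")

# Rows modelled on Table 1; None marks an empty cell.
KNOWLEDGE_BASE = [
    ("Guangdong Province", "Shenzhen", "Nanshan District", "Nanshan Street"),
    ("Guangdong Province", "Shenzhen", "Yantian District", "Nanhai Avenue"),
    ("Fujian Province", "Longyan", "Nanshan Town", None),
    ("Sichuan Province", "Deyang City", "Nanshan Town", "Chinese Scholartree East Road"),
    ("Hunan Province", "Yueyang", "Nanshan Township", None),
    ("Jiangxi Province", "Shangrao City", "Nanshan Township", None),
]


def lookup(term):
    """Return the same-level term and its superior terms for every match."""
    results = []
    for row in KNOWLEDGE_BASE:
        for idx, name in enumerate(row):
            if name == term:
                record = {
                    "same_level": name,
                    "level": LEVELS[idx],
                    "superiors": row[:idx],  # everything above the match
                }
                if record not in results:  # the same place can span rows
                    results.append(record)
    return results


print(lookup("Nanshan District"))
print(lookup("Nanshan Town"))  # two candidates, in Fujian and Sichuan
```

An ambiguous name such as "Nanshan Town" yields several candidate records, which is why the later confidence scoring is needed.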
In step S140, the confidence of each piece of extracted classification geographical location information is scored according to the extracted original geographical terms and the classification geographical location information.
Specifically, again taking "Nanshan District" as the original geographical term extracted in step S120, the same-level geographical term "Nanshan District" and the superior geographical terms "Guangdong Province" and "Shenzhen" are each given a confidence score according to the original geographical term "Nanshan District" extracted from the text information and the same-level geographical term "Nanshan District" and superior geographical terms "Guangdong Province" and "Shenzhen" extracted from the classification geographic information knowledge base.
In step S150, the classification geographical location information whose confidence score exceeds a predetermined confidence threshold is labeled as the geographic information related to the article.
Specifically, the confidence score may range from 0 to 1; the closer the score is to 1, the more credible the same-level or superior geographical term. The classification geographical location information whose confidence score exceeds the predetermined confidence threshold is taken from the extracted classification geographical location information and labeled as the geographic information related to the article.
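The thresholding in step S150 amounts to a simple filter over scored terms. The sketch below assumes scores in the range 0 to 1 as described above; the function name label_article, the threshold value 0.5 and the sample scores are illustrative assumptions, not values given in the patent.

```python
def label_article(scores, threshold=0.5):
    """Keep the terms whose confidence score exceeds the threshold."""
    return [term for term, score in scores.items() if score > threshold]


# Hypothetical confidence scores for one article.
scores = {
    "Guangdong Province": 0.9,
    "Shenzhen": 0.8,
    "Nanshan District": 0.7,
    "Nanshan Town": 0.1,
}
print(label_article(scores))
```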
The computer-implemented information processing method of this embodiment extracts, from the classification geographic information knowledge base, the classification geographical location information corresponding to the original geographical terms in the text information of an article, scores the confidence of that classification geographical location information, and labels the classification geographical location information whose confidence score exceeds the predetermined confidence threshold as the geographic information related to the article. Because the administrative division level of the geographical location information is taken into account during processing, and both the same-level geographical term corresponding to each extracted original geographical term and each of its superior geographical terms are obtained, the geographic information related to the article content can be obtained more accurately, and the geographic information obtained is more complete in that it covers every administrative level.
Preferably, step S140 comprises: obtaining the values of at least two geographical term evaluation indicators for the extracted classification geographical location information according to the extracted original geographical terms and the classification geographical location information; and scoring the confidence of the extracted classification geographical location information according to the obtained values of the at least two geographical term evaluation indicators.
Optionally, the geographical term evaluation indicators comprise: the administrative division level of the extracted same-level geographical term, and the degree of coincidence of the superior geographical terms of each administrative division level to which multiple same-level geographical terms belong.
Specifically, the degree of coincidence of the superior geographical terms of each administrative division level to which multiple same-level geographical terms belong is used to evaluate whether those same-level geographical terms correspond to the same superior geographical term. Suppose the same-level geographical terms extracted in step S130 are "Shenzhen", "Nanshan District" and "Yantian District". Because these three same-level geographical terms correspond to the same superior geographical term "Guangdong Province", their superior geographical terms coincide.
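One way to turn the coincidence of superior geographical terms into a number is sketched below. The patent does not give a formula, so the measure used here (the share of the other candidate terms that have at least one superior term in common with the given term) and the name superior_coincidence are assumptions for illustration.

```python
def superior_coincidence(superiors_by_term):
    """Map each term to the share of other terms that share a superior with it."""
    result = {}
    for term, superiors in superiors_by_term.items():
        others = [s for t, s in superiors_by_term.items() if t != term]
        if not others:
            result[term] = 0.0  # nothing to compare against
            continue
        shared = sum(1 for s in others if set(s) & set(superiors))
        result[term] = shared / len(others)
    return result


# The three terms from the example above all sit under Guangdong Province.
candidates = {
    "Shenzhen": ("Guangdong Province",),
    "Nanshan District": ("Guangdong Province", "Shenzhen"),
    "Yantian District": ("Guangdong Province", "Shenzhen"),
}
print(superior_coincidence(candidates))
```

Replacing "Yantian District" with a candidate under another province, such as "Nanshan Town" in Fujian Province, would lower the value for every term and give that candidate a value of zero.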
Preferably, step S130 comprises: extracting the classification geographical location information corresponding to each original geographical term from the pre-built classification geographic information knowledge base according to a pre-built geographic information abbreviation/full-name mapping table.
Specifically, the geographic information abbreviation/full-name mapping table may be as shown in Table 2.
Table 2: Geographic information abbreviation/full-name mapping table
Abbreviation | Full name
Nanshan | Nanshan District / Nanshan Town / Nanshan Township
Guangdong | Guangdong Province
Guangdong | Guangdong Province
Suppose the original geographical term extracted from the text information in step S120 is "Nanshan". The full names "Nanshan District", "Nanshan Town" and "Nanshan Township" that match the original geographical term "Nanshan" are first extracted from the geographic information abbreviation/full-name mapping table shown in Table 2. According to these three full names, the same-level (district, county, town or township level) geographical terms "Nanshan District", "Nanshan Town" and "Nanshan Township" that match the original geographical term "Nanshan" are then extracted from the classification geographic information knowledge base shown in Table 1, together with the superior (province-level and city-level) geographical terms "Guangdong Province", "Shenzhen", "Fujian Province", "Longyan", "Sichuan Province", "Deyang City", "Hunan Province", "Yueyang", "Jiangxi Province" and "Shangrao City".
Optionally, the geographical term evaluation indicators further comprise the abbreviation/full-name completeness of the original geographical term corresponding to the extracted same-level geographical term.
Specifically, the abbreviation/full-name completeness of the original geographical term corresponding to the extracted same-level geographical term is used to evaluate whether that original geographical term is a full name. Suppose the original geographical term extracted from the text information in step S120 is "Nanshan". According to the geographic information abbreviation/full-name mapping table shown in Table 2, the original geographical term "Nanshan" is known to be an abbreviation (that is, incomplete).
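The abbreviation lookup and the completeness check described above can be sketched together. The dictionary ABBREVIATIONS mirrors Table 2 with romanized names, and the function name expand is an assumption made for this example.

```python
# Abbreviation -> candidate full names, modelled on Table 2.
ABBREVIATIONS = {
    "Nanshan": ["Nanshan District", "Nanshan Town", "Nanshan Township"],
    "Guangdong": ["Guangdong Province"],
}


def expand(original_term):
    """Return (candidate full names, whether the original term was a full name)."""
    if original_term in ABBREVIATIONS:
        return list(ABBREVIATIONS[original_term]), False
    # Not listed as an abbreviation, so treat it as already complete.
    return [original_term], True


print(expand("Nanshan"))
print(expand("Guangdong Province"))
```

Each candidate full name returned here would then be looked up in the classification geographic information knowledge base, and the boolean flag would feed the completeness indicator.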
Optionally, the geographical term evaluation indicators further comprise the extraction position of the original geographical term corresponding to the extracted same-level geographical term.
Specifically, the extraction position of the original geographical term may comprise at least one of the following positions: the title, the beginning of the body, the end of the body, and the remainder of the body other than its beginning and end.
Optionally, the geographical term evaluation indicators further comprise the number of occurrences of the original geographical term corresponding to the extracted same-level geographical term.
Preferably, scoring the confidence of the extracted classification geographical location information according to the obtained values of the at least two geographical term evaluation indicators comprises: assigning weights to each extracted same-level geographical term according to the values of the at least two geographical term evaluation indicators; and scoring the confidence of the classification geographical location information corresponding to each same-level geographical term according to the weights assigned to that term.
Specifically, a first weight may be assigned to the same-level geographical term according to its administrative division level; the higher the administrative division level, the larger the first weight. For example, the first weight of the same-level geographical term "Guangdong Province" is greater than that of the same-level geographical term "Shenzhen".
A second weight may be assigned to multiple same-level geographical terms according to the degree of coincidence of the superior geographical terms of each administrative division level to which they belong; the higher the degree of coincidence, the larger the second weight. For example, suppose the same-level geographical terms extracted in step S130 are "Shenzhen", "Nanshan District" and "Nanshan Town". The superior geographical terms of "Shenzhen" and "Nanshan District" coincide, since both are in "Guangdong Province", whereas the superior geographical terms of "Nanshan Town" do not coincide with those of "Shenzhen" and "Nanshan District". The second weights of the same-level geographical terms "Shenzhen" and "Nanshan District" are therefore greater than the second weight of the same-level geographical term "Nanshan Town".
A third weight may be assigned to the same-level geographical term according to the abbreviation/full-name completeness of its corresponding original geographical term; the higher the completeness, the larger the third weight. For example, the third weight of the same-level geographical term "Guangdong Province" corresponding to the original geographical term "Guangdong Province" is greater than the third weight of the same-level geographical term "Guangdong Province" corresponding to the original geographical term "Guangdong".
A fourth weight may be assigned to the same-level geographical term according to the extraction position of its corresponding original geographical term, with the positions ranked as follows: "title" > "beginning/end of the body" > "remainder of the body other than its beginning and end".
A fifth weight may be assigned to the same-level geographical term according to the number of occurrences of its corresponding original geographical term; the more occurrences, the larger the fifth weight.
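The five weights can be sketched as a single function. The patent specifies only the ordering of the weights (for example, province above city, title above body), so every numeric value below, the cap on the occurrence count, and the names LEVEL_WEIGHT, POSITION_WEIGHT and indicator_weights are assumptions chosen for illustration.

```python
# Assumed numeric values; only their ordering follows the description.
LEVEL_WEIGHT = {"province": 4, "city": 3, "district": 2, "street": 1}
POSITION_WEIGHT = {"title": 3, "body_start": 2, "body_end": 2, "body_middle": 1}


def indicator_weights(level, coincidence, is_full_name, positions, occurrences):
    """Return the (first, ..., fifth) weights for one same-level term.

    positions must be a non-empty list of extraction positions.
    """
    w1 = LEVEL_WEIGHT[level]                         # administrative level
    w2 = 2.0 * coincidence                           # coincidence of superiors, 0..1
    w3 = 1.0 if is_full_name else 0.5                # abbreviation/full-name completeness
    w4 = max(POSITION_WEIGHT[p] for p in positions)  # best extraction position
    w5 = min(occurrences, 5)                         # occurrence count, capped
    return (w1, w2, w3, w4, w5)


print(indicator_weights("province", 1.0, True, ["title"], 3))
print(indicator_weights("city", 1.0, False, ["body_middle"], 1))
```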
The final weight corresponding to the same-level geographical term is determined from the first, second, third, fourth and fifth weights assigned to it. The confidence of the classification geographical location information corresponding to the same-level geographical term is then scored according to the difference between its final weight and the final weights of the other same-level geographical terms of the same administrative division level; the larger the difference, the higher the confidence score. For example, if the final weight of the same-level geographical term "Guangdong Province" is 20 and that of the same-level geographical term "Fujian Province" is 21, the difference between the final weights is small, so the confidence scores corresponding to "Guangdong Province" and "Fujian Province" are low. If instead the final weight of "Guangdong Province" is 20 and that of "Fujian Province" is 3, the difference between the final weights is large, so the confidence scores corresponding to "Guangdong Province" and "Fujian Province" are higher.
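The margin-based scoring can be sketched as follows. The patent does not state how the five weights are combined or how the difference is mapped to a score, so the sum in final_weight and the normalized margin in confidence_scores are assumptions. In particular, this sketch credits the positive margin only to the leading term and gives the trailing term a score of zero, which is one possible reading of the example above.

```python
def final_weight(weights):
    """Combine the individual weights; a plain sum is assumed here."""
    return sum(weights)


def confidence_scores(final_weights):
    """Score terms of one administrative level by their lead over the best rival.

    final_weights maps each same-level term to its final weight.
    Scores fall in the range 0 to 1.
    """
    scores = {}
    for term, weight in final_weights.items():
        rivals = [w for t, w in final_weights.items() if t != term]
        best_rival = max(rivals, default=0.0)
        total = weight + best_rival
        margin = weight - best_rival
        scores[term] = max(0.0, margin / total) if total > 0 else 0.0
    return scores


close = confidence_scores({"Guangdong Province": 20, "Fujian Province": 21})
clear = confidence_scores({"Guangdong Province": 20, "Fujian Province": 3})
print(close)
print(clear)
```

With final weights of 20 and 21 neither province receives a meaningful score, whereas with 20 and 3 the leading province does, matching the direction of the two examples.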
The hierarchical geographical location information whose confidence score exceeds a predetermined confidence threshold is labeled as geographic information related to the article. For example, if the original geographical terms extracted in step S120 are "Shenzhen" and "Nanshan", the hierarchical geographical location information finally obtained whose confidence score exceeds the predetermined confidence threshold may be "Guangdong Province", "Shenzhen" and "Nanshan District", and does not include the lower-scoring "Nanshan Town" and "Nanshan Village".
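The thresholding step can be illustrated with a short Python sketch. The scores and the threshold value below are invented for the illustration, and `label_article` is a hypothetical name; only the rule "keep what exceeds the threshold" comes from the description.

```python
def label_article(scored_candidates, threshold):
    """Keep the candidates whose confidence score exceeds the threshold."""
    return [term for term, score in scored_candidates if score > threshold]


# Made-up scores for the Shenzhen example; only their relative order matters.
candidates = [
    ("Guangdong Province", 0.9),
    ("Shenzhen", 0.9),
    ("Nanshan District", 0.8),
    ("Nanshan Town", 0.2),
]
labels = label_article(candidates, threshold=0.5)
```

Here `labels` keeps the province, city and district and drops the lower-scoring town-level candidate.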
The computer-implemented information processing method of the embodiment of the present invention can be applied to any application program (APP), search engine, website or the like that provides article recommendation, and can also be applied to a standalone module or to an application programming interface (API) of an open platform, for example in an information system. Fig. 2 is a schematic diagram of a location-based article recommendation page in the information system; it shows the articles recommended when the geographic location keyword is "whole nation". Fig. 3 is a schematic diagram of the page on which a user configures geographic locations of interest in the information system. As shown in Fig. 3, the user selects a geographic location of interest, for example "Guangdong", through "region screening" on the page, so that articles related to "Guangdong" are recommended to the user.
Embodiment Two
Fig. 4 is a schematic structural diagram of a device for information processing according to an exemplary embodiment of the present invention.
With reference to Fig. 4, the device for information processing of the embodiment of the present invention can perform the computer-implemented information processing method of Embodiment One. The device comprises: a text information acquiring unit 410, an original geographical term extraction unit 420, a hierarchical geographical location information extraction unit 430, a confidence scoring unit 440 and a geographic information labeling unit 450.
The text information acquiring unit 410 is configured to obtain the text information of an article.
The original geographical term extraction unit 420 is configured to extract at least one original geographical term from the text information.
The hierarchical geographical location information extraction unit 430 is configured to extract, from a hierarchical geographic knowledge base established in advance, the hierarchical geographical location information corresponding to each original geographical term. The hierarchical geographical location information comprises the same-level geographical term matching the original geographical term and the superior geographical term at each administrative division level to which it belongs.
The confidence scoring unit 440 is configured to give a confidence score to each piece of extracted hierarchical geographical location information according to the extracted original geographical terms and the hierarchical geographical location information.
The geographic information labeling unit 450 is configured to label the hierarchical geographical location information whose confidence score exceeds a predetermined confidence threshold as geographic information related to the article.
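The way the five units cooperate can be sketched in Python as a small pipeline. This is an assumption-laden illustration, not the patented implementation: the class name `GeoTaggingDevice`, the field names and the callable signatures are all hypothetical, and each unit is reduced to a function supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class GeoTaggingDevice:
    acquire_text: Callable[[str], str]                          # unit 410
    extract_terms: Callable[[str], List[str]]                   # unit 420
    extract_hierarchy: Callable[[List[str]], List[str]]         # unit 430
    score: Callable[[List[str], List[str]], Dict[str, float]]   # unit 440
    threshold: float                                            # used by unit 450

    def run(self, article_id: str) -> List[str]:
        text = self.acquire_text(article_id)
        terms = self.extract_terms(text)
        candidates = self.extract_hierarchy(terms)
        scores = self.score(terms, candidates)
        # Unit 450: label the article with candidates above the threshold.
        return [c for c in candidates if scores[c] > self.threshold]
```

Passing the units in as callables keeps the sketch independent of how extraction and scoring are actually done.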
Preferably, the confidence scoring unit 440 comprises: a geographical term evaluation index value acquiring unit, configured to obtain the values of at least two geographical term evaluation indices from the extracted hierarchical geographical location information according to the extracted original geographical terms and the hierarchical geographical location information; and a confidence scoring subunit, configured to give a confidence score to each piece of extracted hierarchical geographical location information according to the obtained values of the at least two geographical term evaluation indices.
Further, the geographical term evaluation indices comprise: the administrative division level of the extracted same-level geographical term, and the degree of overlap among the superior geographical terms, at each administrative division level, to which multiple same-level geographical terms belong.
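One possible reading of the overlap index is sketched below in Python. The exact definition is not given in this passage, so counting how many terms of a candidate's chain (the same-level term plus its superiors) also appear in the chains of candidates from other original terms is an assumption, as are the function name and the placeholder entries "County Z", "City X" and "Province Y".

```python
def overlap_degree(candidate_chain, other_chains):
    """Count terms of a candidate's chain that occur in any other chain."""
    seen_elsewhere = set().union(*other_chains)
    return sum(1 for term in candidate_chain if term in seen_elsewhere)


# Chains are (same-level term, superior terms...); placeholders are not real data.
shenzhen = ("Shenzhen", "Guangdong Province")
nanshan_district = ("Nanshan District", "Shenzhen", "Guangdong Province")
nanshan_town = ("Nanshan Town", "County Z", "City X", "Province Y")

district_overlap = overlap_degree(nanshan_district, [shenzhen])
town_overlap = overlap_degree(nanshan_town, [shenzhen])
```

Under this reading, the district candidate shares two terms with the "Shenzhen" chain and the town candidate shares none, which is why the district would be favored when an article mentions both "Shenzhen" and "Nanshan".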
Preferably, the hierarchical geographical location information extraction unit 430 comprises: a hierarchical geographical location information extraction subunit, configured to extract the hierarchical geographical location information corresponding to each original geographical term from the hierarchical geographic knowledge base established in advance, according to a geographic abbreviation/full-name mapping table established in advance.
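The lookup through the abbreviation/full-name mapping table can be sketched in Python as follows. Both tables are toy data built for the illustration: `SHORT_TO_FULL`, `KNOWLEDGE_BASE` and `extract_hierarchy` are hypothetical names, and "County Z", "City X" and "Province Y" are placeholders rather than real places.

```python
# Hypothetical abbreviation -> full-name table; one short form may be ambiguous.
SHORT_TO_FULL = {
    "Guangdong": ["Guangdong Province"],
    "Nanshan": ["Nanshan District", "Nanshan Town"],
}

# Toy hierarchical knowledge base: same-level term -> superior terms, nearest first.
KNOWLEDGE_BASE = {
    "Guangdong Province": [],
    "Shenzhen": ["Guangdong Province"],
    "Nanshan District": ["Shenzhen", "Guangdong Province"],
    "Nanshan Town": ["County Z", "City X", "Province Y"],
}


def extract_hierarchy(original_term):
    """Return {same-level term: superior terms} for each name the term may denote."""
    full_names = SHORT_TO_FULL.get(original_term, [original_term])
    return {name: KNOWLEDGE_BASE[name] for name in full_names if name in KNOWLEDGE_BASE}
```

An ambiguous short form such as "Nanshan" yields several candidates, which is what makes the later confidence scoring necessary.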
Optionally, the geographical term evaluation indices further comprise the abbreviation/full-name completeness of the original geographical term corresponding to the extracted same-level geographical term.
Optionally, the geographical term evaluation indices further comprise the extraction position of the original geographical term corresponding to the extracted same-level geographical term.
Further, the text information comprises the title and the body text of the article; the extraction position of the original geographical term comprises at least one of the following positions: the title, the beginning of the body text, the end of the body text, and the remainder of the body text other than the beginning and end.
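Determining the extraction position of a term can be sketched in Python as below. The passage does not define "beginning" and "end" of the body text, so this sketch assumes the body is supplied as a list of paragraphs and treats the first and last paragraph as beginning and end; the function name, the position labels and the sample article are likewise made up.

```python
def extraction_positions(term, title, paragraphs):
    """Return the set of positions at which term occurs."""
    positions = set()
    if term in title:
        positions.add("title")
    last = len(paragraphs) - 1
    for i, paragraph in enumerate(paragraphs):
        if term not in paragraph:
            continue
        if i == 0:
            positions.add("body_beginning")
        if i == last:
            positions.add("body_end")
        if 0 < i < last:
            positions.add("body_rest")
    return positions


title = "Shenzhen opens new metro line"
body = [
    "The line runs through Nanshan.",
    "Tickets go on sale Monday.",
    "Shenzhen officials attended the opening.",
]
```

For this sample, "Shenzhen" is found in the title and at the end of the body text, while "Nanshan" is found only at the beginning.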
Optionally, the geographical term evaluation indices further comprise the number of occurrences of the original geographical term corresponding to the extracted same-level geographical term.
Preferably, the confidence scoring subunit comprises: a weight assignment module, configured to assign weights to each extracted same-level geographical term according to the values of the at least two geographical term evaluation indices; and a confidence scoring module, configured to give a confidence score to the hierarchical geographical location information corresponding to each same-level geographical term according to the weights assigned to that extracted same-level geographical term.
The device for information processing of the embodiment of the present invention extracts, from the hierarchical geographic knowledge base, the hierarchical geographical location information corresponding to the original geographical terms in the text information of an article, gives a confidence score to that hierarchical geographical location information, and labels the hierarchical geographical location information whose confidence score exceeds a predetermined confidence threshold as geographic information related to the article. Because the administrative division level of the geographical location information is taken into account during processing, and the same-level geographical term and each superior geographical term corresponding to the extracted original geographical term are obtained, the geographic information related to the article content can be obtained more accurately, and the geographic information obtained is more complete in that it covers each administrative level.
It should be noted that, according to implementation needs, each step described in this application may be split into more steps, and two or more steps or partial operations of steps may be combined into new steps, to achieve the object of the present invention.
The above method according to the present invention may be implemented in hardware or firmware, or implemented as software or computer code that can be stored in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk or magneto-optical disk), or implemented as computer code that is originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded over a network, and stored in a local recording medium, so that the method described herein can be processed by such software stored on a recording medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It will be understood that a computer, processor, microprocessor controller or programmable hardware includes a storage component (for example, RAM, ROM, flash memory, etc.) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor or hardware, the processing method described herein is implemented. In addition, when a general-purpose computer accesses code for implementing the processing shown herein, execution of the code converts the general-purpose computer into a special-purpose computer for performing that processing.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (18)

1. A computer-implemented information processing method, characterized in that the method comprises:
obtaining the text information of an article;
extracting at least one original geographical term from the text information;
extracting, from a hierarchical geographic knowledge base established in advance, the hierarchical geographical location information corresponding to each original geographical term, the hierarchical geographical location information comprising the same-level geographical term matching the original geographical term and the superior geographical term at each administrative division level to which it belongs;
giving a confidence score to each piece of extracted hierarchical geographical location information according to the extracted original geographical terms and the hierarchical geographical location information;
labeling the hierarchical geographical location information whose confidence score exceeds a predetermined confidence threshold as geographic information related to the article.
2. The method according to claim 1, characterized in that giving a confidence score to each piece of extracted hierarchical geographical location information according to the extracted original geographical terms and the hierarchical geographical location information comprises:
obtaining the values of at least two geographical term evaluation indices from the extracted hierarchical geographical location information according to the extracted original geographical terms and the hierarchical geographical location information;
giving a confidence score to each piece of extracted hierarchical geographical location information according to the obtained values of the at least two geographical term evaluation indices.
3. The method according to claim 2, characterized in that the geographical term evaluation indices comprise: the administrative division level of the extracted same-level geographical term, and the degree of overlap among the superior geographical terms, at each administrative division level, to which multiple same-level geographical terms belong.
4. The method according to claim 3, characterized in that extracting, from the hierarchical geographic knowledge base established in advance, the hierarchical geographical location information corresponding to each original geographical term comprises:
extracting the hierarchical geographical location information corresponding to each original geographical term from the hierarchical geographic knowledge base established in advance, according to a geographic abbreviation/full-name mapping table established in advance.
5. The method according to claim 4, characterized in that the geographical term evaluation indices further comprise the abbreviation/full-name completeness of the original geographical term corresponding to the extracted same-level geographical term.
6. The method according to claim 2, characterized in that the geographical term evaluation indices further comprise the extraction position of the original geographical term corresponding to the extracted same-level geographical term.
7. The method according to claim 6, characterized in that the text information comprises the title and the body text of the article;
the extraction position of the original geographical term comprises at least one of the following positions: the title, the beginning of the body text, the end of the body text, and the remainder of the body text other than the beginning and end.
8. The method according to claim 2, characterized in that the geographical term evaluation indices further comprise the number of occurrences of the original geographical term corresponding to the extracted same-level geographical term.
9. The method according to any one of claims 2 to 8, characterized in that giving a confidence score to each piece of extracted hierarchical geographical location information according to the obtained values of the at least two geographical term evaluation indices comprises:
assigning weights to each extracted same-level geographical term according to the values of the at least two geographical term evaluation indices;
giving a confidence score to the hierarchical geographical location information corresponding to each same-level geographical term according to the weights assigned to that extracted same-level geographical term.
10. A device for information processing, characterized in that the device comprises:
a text information acquiring unit, configured to obtain the text information of an article;
an original geographical term extraction unit, configured to extract at least one original geographical term from the text information;
a hierarchical geographical location information extraction unit, configured to extract, from a hierarchical geographic knowledge base established in advance, the hierarchical geographical location information corresponding to each original geographical term, the hierarchical geographical location information comprising the same-level geographical term matching the original geographical term and the superior geographical term at each administrative division level to which it belongs;
a confidence scoring unit, configured to give a confidence score to each piece of extracted hierarchical geographical location information according to the extracted original geographical terms and the hierarchical geographical location information;
a geographic information labeling unit, configured to label the hierarchical geographical location information whose confidence score exceeds a predetermined confidence threshold as geographic information related to the article.
11. The device according to claim 10, characterized in that the confidence scoring unit comprises:
a geographical term evaluation index value acquiring unit, configured to obtain the values of at least two geographical term evaluation indices from the extracted hierarchical geographical location information according to the extracted original geographical terms and the hierarchical geographical location information;
a confidence scoring subunit, configured to give a confidence score to each piece of extracted hierarchical geographical location information according to the obtained values of the at least two geographical term evaluation indices.
12. The device according to claim 11, characterized in that the geographical term evaluation indices comprise: the administrative division level of the extracted same-level geographical term, and the degree of overlap among the superior geographical terms, at each administrative division level, to which multiple same-level geographical terms belong.
13. The device according to claim 12, characterized in that the hierarchical geographical location information extraction unit comprises:
a hierarchical geographical location information extraction subunit, configured to extract the hierarchical geographical location information corresponding to each original geographical term from the hierarchical geographic knowledge base established in advance, according to a geographic abbreviation/full-name mapping table established in advance.
14. The device according to claim 13, characterized in that the geographical term evaluation indices further comprise the abbreviation/full-name completeness of the original geographical term corresponding to the extracted same-level geographical term.
15. The device according to claim 11, characterized in that the geographical term evaluation indices further comprise the extraction position of the original geographical term corresponding to the extracted same-level geographical term.
16. The device according to claim 15, characterized in that the text information comprises the title and the body text of the article;
the extraction position of the original geographical term comprises at least one of the following positions: the title, the beginning of the body text, the end of the body text, and the remainder of the body text other than the beginning and end.
17. The device according to claim 11, characterized in that the geographical term evaluation indices further comprise the number of occurrences of the original geographical term corresponding to the extracted same-level geographical term.
18. The device according to any one of claims 11 to 17, characterized in that the confidence scoring subunit comprises:
a weight assignment module, configured to assign weights to each extracted same-level geographical term according to the values of the at least two geographical term evaluation indices;
a confidence scoring module, configured to give a confidence score to the hierarchical geographical location information corresponding to each same-level geographical term according to the weights assigned to that extracted same-level geographical term.
CN201510347745.XA 2015-06-19 2015-06-19 Information processing method and device realized through computer Active CN104951543B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510347745.XA CN104951543B (en) 2015-06-19 2015-06-19 Information processing method and device realized through computer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510347745.XA CN104951543B (en) 2015-06-19 2015-06-19 Information processing method and device realized through computer

Publications (2)

Publication Number Publication Date
CN104951543A true CN104951543A (en) 2015-09-30
CN104951543B CN104951543B (en) 2019-02-22

Family

ID=54166201

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510347745.XA Active CN104951543B (en) 2015-06-19 2015-06-19 Information processing method and device realized through computer

Country Status (1)

Country Link
CN (1) CN104951543B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106408489A (en) * 2016-11-01 2017-02-15 航天恒星科技有限公司 Targeted poverty alleviation information processing method and device
WO2018161719A1 (en) * 2017-03-07 2018-09-13 广州优视网络科技有限公司 Method and apparatus for recommending articles to users on basis of regional characteristics
CN113360742A (en) * 2021-05-19 2021-09-07 维沃移动通信有限公司 Recommendation information determination method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101651634A (en) * 2008-08-13 2010-02-17 阿里巴巴集团控股有限公司 Method and system for providing regional information
CN101661461A (en) * 2008-08-29 2010-03-03 阿里巴巴集团控股有限公司 Method and system for determining core geographic information in document
CN101777082A (en) * 2010-03-01 2010-07-14 苏州数字地图网络科技有限公司 Correlation method of text information and geological information and system
CN103853738A (en) * 2012-11-29 2014-06-11 中国科学院计算机网络信息中心 Identification method for webpage information related region

Also Published As

Publication number Publication date
CN104951543B (en) 2019-02-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant