CN113626536B - News geocoding method based on deep learning - Google Patents
- Publication number
- CN113626536B (application CN202110747499.2A)
- Authority
- CN
- China
- Prior art keywords
- news
- text
- name
- place
- geocoding
- Prior art date
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Remote Sensing (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a deep-learning-based news geocoding method for geocoding the content of news articles. It combines a deep learning model with a place name database to geocode news within different provinces, cities and counties, thereby obtaining the longitude and latitude information contained in the news. This gives readers a more intuitive sense of where the news takes place, and the results can support functions such as geographic search and filtering, distance sorting and regional recommendation of news. The invention performs named entity recognition with a deep learning model based on ERNIE and Bi-GRU-CRF, which makes the recognition of place names and organization names in news text more accurate and efficient. By applying different extraction methods to different news sources, it is compatible with the news text of the major news portals, which widens the range of news that can be geocoded.
Description
Technical Field
The invention relates to text mining, natural language processing and geocoding in computer science and technology, and in particular to a deep-learning-based news geocoding method.
Background
With the development of science and network technology, we have entered an era of information explosion, and obtaining information efficiently is increasingly important. News is an important way for people to obtain information: it is a brief report of recent events that have social significance and attract public interest, and it is one of the most widely used genres in newspapers, radio and television.
However, news lacks presentation in the spatio-temporal dimension. In traditional news reading, people learn where a news event took place, or which places it concerns, only by reading the text or viewing pictures; they cannot intuitively see its geographic location. Readers therefore lack an understanding of where the news happened and of the surroundings of the news location, and the news cannot be perceived and read intuitively. Meanwhile, the spatial attributes of news cannot be fully mined, which makes functions such as geographic search and filtering, distance sorting and regional recommendation of news difficult to implement.
Disclosure of Invention
The invention provides a deep-learning-based news geocoding method that uses deep learning to extract the places mentioned in news text and geocode them, obtaining the longitude and latitude information contained in the news. This lets people recognize news locations intuitively, and the results can be used for functions such as geographic search and filtering, distance sorting and regional recommendation of news.
The technical scheme provided by the invention is a news geocoding method based on deep learning, which comprises the following steps:
step S10, constructing a database of provinces, cities, counties and place names of China;
step S20, extracting news text from given news links and news contents by using a text extractor;
step S30, searching the place name database according to the news text extracted in step S20 to obtain the province, city and county names that may appear in the news;
step S40, performing named entity recognition with a deep learning model based on ERNIE and Bi-GRU-CRF on the news text extracted in step S20 to obtain a place name candidate list for the news text;
step S50, calling a local or national geocoding service to geocode the place names in the place name candidate list according to the county, city and province name candidate lists;
step S60, structuring the coding result candidate list and organizing the geocoding results into national-place, province-place, city-place and county-place listings. In this way the news is geocoded, its geographic coordinates are determined, and the longitude and latitude data it contains are obtained.
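Step S60 does not fix an output format. As a minimal sketch, assuming each coding result is a dict with the hypothetical keys `level` (national, province, city or county), `scope` (the range entry it was coded under), `name`, `lng` and `lat`, the results can be grouped first by level and then by scope:

```python
from collections import defaultdict

LEVELS = ("national", "province", "city", "county")


def organize_results(results):
    """Group coding results as {level: {scope: [(name, lng, lat), ...]}}.

    The record keys used here are hypothetical; the patent does not define a schema.
    """
    grouped = {level: defaultdict(list) for level in LEVELS}
    for r in results:
        grouped[r["level"]][r["scope"]].append((r["name"], r["lng"], r["lat"]))
    # Convert to plain dicts so the structure is easy to serialize (e.g. as JSON).
    return {level: dict(groups) for level, groups in grouped.items()}
```

Levels with no results map to an empty dict, so a consumer can always index all four levels.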
Further, the step S10 specifically includes:
step S101, obtaining the provinces, cities and counties of all of China and classifying them into provinces, municipalities directly under the central government, other (non-municipal) cities, and counties;
Step S102, building the subordinate relations between place names, i.e. the one-to-one parent-child relations from each province to its cities and from each city to its counties, and storing these relations in the database.
Step S103, establishing a query service that, given a query keyword, checks whether the corresponding place name exists in the database and returns its place name type and parent-child relations.
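Steps S101 to S103 describe a hierarchical place name table plus a lookup service. A minimal sketch using Python's built-in `sqlite3` follows; the table name, columns and the `build_db`/`query_place` helpers are hypothetical, and it assumes place names are unique, which does not hold for every county name nationwide:

```python
import sqlite3

# Hypothetical schema: one row per administrative unit with a link to its parent
# (province -> city -> county). Municipalities are top-level, like provinces.
SCHEMA = """
CREATE TABLE place (
    name   TEXT PRIMARY KEY,
    kind   TEXT NOT NULL CHECK (kind IN ('province', 'municipality', 'city', 'county')),
    parent TEXT REFERENCES place(name)
)
"""


def build_db(rows):
    """Steps S101-S102: load (name, kind, parent) tuples into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO place VALUES (?, ?, ?)", rows)
    return conn


def query_place(conn, keyword):
    """Step S103: return (kind, [parent, grandparent, ...]) or None if absent."""
    row = conn.execute("SELECT kind, parent FROM place WHERE name = ?", (keyword,)).fetchone()
    if row is None:
        return None
    kind, parent = row
    ancestors = []
    while parent is not None:
        ancestors.append(parent)
        up = conn.execute("SELECT parent FROM place WHERE name = ?", (parent,)).fetchone()
        parent = up[0] if up else None  # stop if a parent row is missing
    return kind, ancestors
```

For example, after loading 湖北省 → 武汉市 → 洪山区, querying 洪山区 returns its type and both ancestors.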
Further, the step S20 specifically includes:
In step S201, the web page's HTML code is first parsed into a DOM tree in which each HTML tag is a node and all text forms leaf nodes. Each node in the DOM tree is traversed, and its text density is computed as the total number of characters in all text leaf nodes under the node divided by the total number of child nodes it contains. The node with the highest text density in the DOM tree is selected as the news body DOM node, and the news text is obtained from the text content of the text leaf nodes under that node.
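The text-density rule of step S201 can be sketched with the standard-library `html.parser`. This is an illustration, not the patent's implementation: reading "child nodes" as all element descendants, skipping `script`/`style` text, and the helper names are assumptions.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not become the "current" node.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
SKIP_TAGS = {"script", "style"}  # text inside these is not article text


class Node:
    """A DOM element; children holds child Nodes and text leaves (str) in order."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class DomBuilder(HTMLParser):
    """Builds a simple DOM tree from HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text and self.current.tag not in SKIP_TAGS:
            self.current.children.append(text)


def iter_elements(node):
    """Yield every element below node in document (pre-)order."""
    for child in node.children:
        if isinstance(child, Node):
            yield child
            yield from iter_elements(child)


def text_leaves(node):
    """Yield the text leaves under node in document order."""
    for child in node.children:
        if isinstance(child, str):
            yield child
        else:
            yield from text_leaves(child)


def text_density(node):
    # Assumption: "child nodes" = all element descendants; max(1, ...) avoids division by zero.
    chars = sum(len(t) for t in text_leaves(node))
    return chars / max(1, sum(1 for _ in iter_elements(node)))


def extract_main_text(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    # max() keeps the first node on ties, i.e. the outermost one in pre-order.
    best = max(iter_elements(builder.root), key=text_density, default=None)
    return "" if best is None else "\n".join(text_leaves(best))
```

Under this literal reading, one long paragraph can score higher than the element wrapping the whole article, so a production extractor would likely need extra heuristics that the patent does not describe.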
Step S202, determining whether the given content is a link or text. Text is used directly as the news text in the subsequent steps.
Step S203, for a link, identifying the news website. If the link points to a news portal other than WeChat and Sina Weibo, it is processed as in step S201 to extract the news text. If the link points to Sina Weibo or WeChat, CSS and XPath selectors are used: based on the DOM structure of Weibo and WeChat pages, the class, id and data attributes of the DOM nodes containing the news text are determined, and the news text is obtained with the CSS and XPath rules targeting those attributes.
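The source dispatch in steps S202 and S203 can be sketched as below. The host lists are assumptions (the patent names Sina Weibo and WeChat but not their domains), the strategy labels are hypothetical, and the site-specific CSS/XPath rules themselves are not given in the patent, so they are left out:

```python
from urllib.parse import urlparse

# Assumed host lists for the two platforms handled with site-specific rules.
WEIBO_HOSTS = ("weibo.com", "weibo.cn")
WECHAT_HOSTS = ("mp.weixin.qq.com",)


def is_link(content):
    """Step S202: treat content as a link only if it parses as an http(s) URL."""
    parsed = urlparse(content.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def host_matches(host, domains):
    """True if host equals one of the domains or is a subdomain of it."""
    return any(host == d or host.endswith("." + d) for d in domains)


def choose_strategy(content):
    """Return a hypothetical label for the extractor to use."""
    if not is_link(content):
        return "raw_text"        # S202: plain text is used as the news text
    host = (urlparse(content.strip()).hostname or "").lower()
    if host_matches(host, WEIBO_HOSTS):
        return "weibo_rules"     # S203: CSS/XPath rules for Weibo pages
    if host_matches(host, WECHAT_HOSTS):
        return "wechat_rules"    # S203: CSS/XPath rules for WeChat pages
    return "text_density"        # S203 -> S201: generic text-density extraction
```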
Further, the step S30 specifically includes:
step S301, searching for province names: if the text contains a province name that exists in the database, it is added to the province name candidate list.
Step S302, searching for city names: if the text contains a city name in the database, the city name is added to the city name candidate list, and the province to which that city belongs is added to the province name candidate list.
Step S303, searching for county names: if the text contains a county name in the database, the county name is added to the county name candidate list, and the city to which the county belongs and the province to which that city belongs are added to the city name and province name candidate lists respectively.
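Steps S301 to S303 amount to substring matching with upward propagation through the hierarchy. A minimal sketch, assuming the database has been loaded into a province list and two parent maps (these in-memory structures are hypothetical); note that plain substring matching can produce false hits for short or ambiguous names:

```python
def find_admin_candidates(text, provinces, city_to_province, county_to_city):
    """Return (province_list, city_list, county_list) found in text, in first-seen order."""
    prov, city, county = [], [], []

    def add(lst, name):
        if name is not None and name not in lst:
            lst.append(name)

    for p in provinces:                                # S301
        if p in text:
            add(prov, p)
    for c, p in city_to_province.items():              # S302: city and its province
        if c in text:
            add(city, c)
            add(prov, p)
    for k, c in county_to_city.items():                # S303: county, its city and province
        if k in text:
            add(county, k)
            add(city, c)
            add(prov, city_to_province.get(c))
    return prov, city, county
```

Municipalities such as 北京市 could be modelled as both a province and a city so that their districts propagate correctly; that choice is also an assumption.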
Further, the step S40 specifically includes:
step S401, constructing a deep learning model based on ERNIE and Bi-GRU-CRF. The ERNIE model uses the ERNIE Base structure, a stack of 12 Encoder layers whose inputs and outputs each consist of 768 hidden units. Each Encoder layer stacks a self-attention layer, a normalization layer, a fully connected layer and a normalization layer, and each self-attention layer contains 12 attention heads.
The Bi-GRU-CRF model stacks 2 bidirectional GRU layers and a fully connected layer; the output of the fully connected layer is fed into the CRF layer, which finds the most probable label for each character in the sentence and thus outputs the named entity recognition result.
The ERNIE model takes the character string as input and outputs a 768-dimensional text embedding matrix, which is fed into the bottom bidirectional GRU layer of the Bi-GRU-CRF model; the CRF layer finally outputs the most likely label for each character.
The ERNIE model was trained on the MSRA-NER (SIGHAN 2006) dataset to obtain a pre-trained model; feeding text into it yields the corresponding 768-dimensional text embedding matrix. The Bi-GRU-CRF model is trained with the LAC corpus as the training set; during training only the parameters of the bidirectional GRU layers and the CRF layer are updated, while the parameters of the ERNIE model are frozen and do not take part in training. With this ERNIE and Bi-GRU-CRF model, named entity recognition yields the place names and organization names contained in a text. Place names include provinces, cities, counties, roads, landmarks, etc., and organization names include government agencies, educational institutions, leisure and entertainment venues, etc.
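The CRF layer turns per-character label scores into the single most probable label sequence. For a linear-chain CRF this is standardly done with Viterbi decoding; the patent does not spell out its decoder, so the sketch below is an illustration of that standard step only. It omits ERNIE, the Bi-GRU layers, training and start/end transitions, and the BIO label set and scores are hypothetical:

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions[t][j]: score of label j for character t (e.g. from the layer before the CRF).
    transitions[i][j]: score of moving from label i to label j.
    """
    n = len(labels)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, back = [], []
        for j in range(n):
            # Best previous label for ending in label j at this position.
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            back.append(best_i)
        score = new_score
        backpointers.append(back)
    # Trace the best path backwards from the best final label.
    path = [max(range(n), key=lambda j: score[j])]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    path.reverse()
    return [labels[j] for j in path]
```

With a large negative score on the O to I-LOC transition, the decoder rejects the invalid sequence B-LOC, O, I-LOC that a per-character argmax could produce and returns a well-formed entity span instead.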
Step S402, merging the place names and organization names obtained by named entity recognition into a place name candidate list.
Step S403, traversing the place name candidate list and deleting any place names that already appear in the province, city or county name candidate lists.
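Steps S402 and S403 reduce to a merge followed by a filter. A minimal sketch with a hypothetical helper name; dropping repeated names within the merged list is an added assumption, since the patent only says the two entity types are merged:

```python
def build_place_candidates(ner_places, ner_orgs, provinces, cities, counties):
    """Merge NER place and organization names, dropping administrative names found in S30."""
    admin_names = set(provinces) | set(cities) | set(counties)
    merged = []
    for name in list(ner_places) + list(ner_orgs):      # S402: merge both entity types
        if name not in admin_names and name not in merged:  # S403: remove admin duplicates
            merged.append(name)
    return merged
```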
Further, the step S50 specifically includes:
step S501, determining the geocoding range list. If the county name candidate list has more than 1 entry, the geocoding range list is the county name candidate list; otherwise, if the city name candidate list has more than 1 entry, it is the city name candidate list; otherwise, if the province name candidate list has more than 1 entry, it is the province name candidate list; otherwise, the geocoding range is national.
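The cascade in step S501 can be written as a short function. This sketch reproduces the "more than 1 entry" comparison exactly as the text states it, so a list with a single entry falls through to the next level; the function name and return shape are hypothetical:

```python
def geocode_scope(county_list, city_list, province_list):
    """Return (scope_level, scope_list) following the S501 cascade as written."""
    if len(county_list) > 1:
        return "county", county_list
    if len(city_list) > 1:
        return "city", city_list
    if len(province_list) > 1:
        return "province", province_list
    return "national", []
```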
Step S502, traversing the place name candidate list. If the geocoding range is not national, the local geocoding service is called for the current place name, using each entry in the geocoding range list as the city restriction. The local geocoding service searches the nationwide POI (point of interest) database of Baidu Maps using the specified query keyword and the geographic-range, city and category restriction parameters, returns the longitude and latitude of the query keyword in the BD09 coordinate system, and returns a comprehension score and a confidence score for the result. The comprehension score indicates whether the query keyword exists in the POI database and how similar it is to the stored data: the higher it is, the more correctly formatted the place name is and the more likely its coordinates are found in the database. The confidence score indicates the extent of the area the queried place name refers to: the higher it is, the more specific the location. To ensure that place names are resolved correctly and to filter out wrong results, the comprehension score threshold is set to 70 and the confidence score threshold to 20: a result from the Baidu geocoding service is stored in the coding result candidate list only if its comprehension score is greater than 70 and its confidence score is greater than 20.
If the geocoding range is national, the national geocoding service is called for the place name. It searches the nationwide POI database of Amap (Gaode Maps) using the specified query keyword and returns the longitude and latitude in the GCJ-02 ("Mars") coordinate system. Without a province or city range, the national geocoding service is more accurate, and it returns a result only when the place name is correct and unique, in which case the coordinates are accurate GCJ-02 coordinates. If the returned result is not null, it is stored in the coding result candidate list.
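The two branches of step S502 can be sketched with the geocoding services passed in as callables, so no real HTTP API is called. The callables, their result keys (`lng`, `lat`, `comprehension`, `confidence`) and the added `scope`/`crs` fields are assumptions modelled on the scores described above, not the actual Baidu or Amap response formats:

```python
def geocode_candidates(places, scope_level, scope_list,
                       local_geocode, national_geocode, p1=70, p2=20):
    """Step S502 sketch.

    local_geocode(place, region) -> dict or None   (stands in for the Baidu-based service)
    national_geocode(place) -> dict or None        (stands in for the Amap-based service)
    """
    results = []
    for place in places:
        if scope_level != "national":
            for region in scope_list:  # each range entry is used as the city restriction
                r = local_geocode(place, region)
                if r and r["comprehension"] > p1 and r["confidence"] > p2:
                    results.append({**r, "name": place, "scope": region, "crs": "BD09"})
        else:
            r = national_geocode(place)
            if r is not None:  # keep any non-null national result
                results.append({**r, "name": place, "scope": "national", "crs": "GCJ-02"})
    return results
```

Because the two services return coordinates in different systems (BD09 and GCJ-02), the sketch tags each result with its coordinate system so later steps do not mix them silently.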
Step S503, traversing the coding result candidate list; for entries with the same place name, keeping the one with the higher comprehension and confidence scores and deleting the others.
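Step S503 keeps one result per place name. The text does not say how the two scores are combined, so this sketch assumes a lexicographic ranking (comprehension first, then confidence) and treats missing scores as 0; the helper name is hypothetical:

```python
def keep_best_per_name(results):
    """Keep, for each place name, the result with the highest (comprehension, confidence)."""
    def score(r):
        return (r.get("comprehension", 0), r.get("confidence", 0))

    best = {}
    for r in results:
        current = best.get(r["name"])
        if current is None or score(r) > score(current):
            best[r["name"]] = r  # replacing a value keeps the key's original position
    return list(best.values())
```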
Compared with the prior art, the invention has the advantages that:
the invention uses a deep learning model based on ERNIE and Bi-GRU-CRF for named entity recognition, making the recognition of place names and organization names in news text more accurate and efficient; a place name database is built from the provinces, cities and counties of the whole country and matched against the news content, which limits the range within which the geocoding service is called and makes the geocoding results more accurate; and by applying different extraction methods to different news sources, the method is compatible with the news text of the major news portals, widening the range of news that can be geocoded.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a flowchart of an algorithm embodying the present invention.
Detailed Description
For a further understanding of the invention, preferred embodiments are described below with reference to examples. These descriptions only illustrate further features and advantages of the invention and do not limit its claims.
The flow of the deep-learning-based news geocoding method is shown in FIG. 1 and comprises the following steps:
step S10, constructing a database of the provinces, cities, counties and place names of China.
Step S20, for a given news link, extracting the news text with a general news website text extractor.
Step S30, searching the place name database according to the news text extracted in step S20 to obtain the province, city and county names that may appear in the news.
Step S40, performing named entity recognition with a deep learning model based on ERNIE and Bi-GRU-CRF on the news text extracted in step S20 to obtain a place name candidate list for the news text.
Step S50, calling a local or national geocoding service to geocode the place names in the place name candidate list according to the county, city and province name candidate lists.
Step S60, structuring the coding result candidate list and organizing the geocoding results into national-place, province-place, city-place and county-place listings. In this way the news is geocoded, its geographic coordinates are determined, and the longitude and latitude data it contains are obtained.
In one illustrated embodiment, step S10 specifically includes:
step S101, obtaining the provinces, cities and counties of all of China and classifying them into provinces, municipalities directly under the central government, other (non-municipal) cities, and counties.
Step S102, building the subordinate relations between place names, i.e. the one-to-one parent-child relations from each province to its cities and from each city to its counties, and storing these relations in the database.
Step S103, establishing a query service that, given a query keyword, checks whether the corresponding place name exists in the database and returns its place name type and parent-child relations.
In one illustrated embodiment, step S20 specifically includes:
In step S201, the web page's HTML code is first parsed into a DOM tree in which each HTML tag is a node and all text forms leaf nodes. Each node in the DOM tree is traversed, and its text density is computed as the total number of characters in all text leaf nodes under the node divided by the total number of child nodes it contains. The node with the highest text density in the DOM tree is selected as the news body DOM node, and the news text is obtained from the text content of the text leaf nodes under that node.
Step S202, determining whether the given content is a link or text. Text is used directly as the news text in the subsequent steps.
Step S203, for a link, identifying the news website. If the link points to Sina Weibo or WeChat, CSS and XPath selectors are used: based on the DOM structure of Weibo and WeChat pages, the class, id and data attributes of the DOM nodes containing the news text are determined, and the news text is obtained with the CSS and XPath rules targeting those attributes.
If the link points to any other news portal, it is processed as in step S201 to extract the news text.
In one illustrated embodiment, step S30 specifically includes:
step S301, searching for province names: if the text contains a province name that exists in the database, it is added to the province name candidate list.
Step S302, searching for city names: if the text contains a city name in the database, the city name is added to the city name candidate list, and the province to which that city belongs is added to the province name candidate list.
Step S303, searching for county names: if the text contains a county name in the database, the county name is added to the county name candidate list, and the city to which the county belongs and the province to which that city belongs are added to the city name and province name candidate lists respectively.
In one illustrated embodiment, step S40 specifically includes:
step S401, merging the place names and organization names obtained by named entity recognition into a place name candidate list.
Step S402, traversing the place name candidate list and deleting any place names that already appear in the province, city or county name candidate lists.
In one illustrated embodiment, step S50 specifically includes:
step S501, determining the geocoding range list. If the county name candidate list has more than 1 entry, the geocoding range list is the county name candidate list; otherwise, if the city name candidate list has more than 1 entry, it is the city name candidate list; otherwise, if the province name candidate list has more than 1 entry, it is the province name candidate list; otherwise, the geocoding range is national.
Step S502, traversing the place name candidate list. If the geocoding range is not national, the local geocoding service is called for the current place name according to the entries of the geocoding range list, and the result is stored in the coding result candidate list if its comprehension score is greater than 70 and its confidence score is greater than 20. If the geocoding range is national, the national geocoding service is called for the place name, and the result is stored in the coding result candidate list if it is not null.
Step S503, traversing the coding result candidate list; for entries with the same place name, keeping the one with the higher comprehension and confidence scores and deleting the others.
The above description of the embodiments is only intended to help in understanding the method of the invention and its core idea. It should be noted that those skilled in the art can make various improvements and modifications to the invention without departing from its principles, and such improvements and modifications also fall within the scope of protection of the claims.
Claims (5)
1. The news geocoding method based on deep learning is characterized by comprising the following steps of:
step S10, constructing a database of provinces, cities, counties and place names of China;
step S20, extracting news text from given news links and news contents by using a text extractor;
the step S20 specifically includes the steps of,
step S201, first parsing the web page into a DOM tree by reading its HTML code, wherein each HTML tag is a node and all texts are leaf nodes of the DOM tree; traversing each node of the DOM tree and computing the text density of the node as the total number of characters of all text leaf nodes in the node divided by the total number of child nodes contained in the node; selecting the node with the highest text density in the DOM tree as the news text DOM node; and obtaining the news text from the text content of the text leaf nodes in that DOM node;
step S202, judging whether the given content is a link or a text and, for a text, directly using the text as the news text in the subsequent steps;
step S203, for a link, identifying the news website: if the link points to a news portal other than WeChat and Sina Weibo, processing it as in step S201 to extract the news text; if the link points to Sina Weibo or WeChat, using CSS and XPath selectors to obtain, according to the DOM structures of the Weibo and WeChat pages, the class, id and data attributes of the DOM nodes containing the news text, and obtaining the news text with the CSS and XPath rules corresponding to those attributes;
step S30, searching the place name database according to the news text extracted in step S20 to obtain the province, city and county names that may appear in the news;
step S40, performing named entity recognition with a deep learning model based on the ERNIE pre-trained model and Bi-GRU-CRF on the news text extracted in step S20 to obtain a place name candidate list for the news text;
step S50, calling a local or national geocoding service to geocode the place names in the place name candidate list according to the county, city and province name candidate lists;
the step S50 specifically includes the steps of,
step S501, determining a geocoding range list: if the county name candidate list has more than 1 entry, the geocoding range list is the county name candidate list; otherwise, if the city name candidate list has more than 1 entry, it is the city name candidate list; otherwise, if the province name candidate list has more than 1 entry, it is the province name candidate list; otherwise, the geocoding range is national;
step S502, traversing the place name candidate list; if the geocoding range is not national, calling the Baidu geocoding service for the current place name with each entry of the geocoding range list as the city restriction, wherein the Baidu geocoding service searches the nationwide POI database of Baidu Maps according to the specified query keyword and the geographic-range, city and category restriction parameters, returns the longitude and latitude of the query keyword in the BD09 coordinate system, and returns a comprehension score and a confidence score for the result; the comprehension score indicates whether the query keyword exists in the POI database and how similar it is to the stored data, a higher comprehension score meaning a more correctly formatted place name and a higher likelihood that its coordinates are found in the database; the confidence score indicates the extent of the area the queried place name refers to, a higher confidence score meaning a more specific location; and if the comprehension score of the result is greater than P1 and the confidence score is greater than P2, storing the result in the coding result candidate list;
if the geocoding range is national, calling the national geocoding service for the place name, wherein the national geocoding service searches the nationwide POI database of Amap (Gaode Maps) according to the specified query keyword and returns the longitude and latitude in the GCJ-02 ("Mars") coordinate system;
step S503, traversing the coding result candidate list and, for entries with the same place name, keeping the entry with the higher comprehension and confidence scores and deleting the others;
step S60, structuring the coding result candidate list by organizing the geocoding results into national-place, province-place, city-place and county-place listings, thereby geocoding the news, determining its geographic coordinates and obtaining the longitude and latitude data it contains.
2. The news geocoding method based on deep learning of claim 1, wherein: the step S10 specifically includes the steps of,
step S101, obtaining the provinces, cities and counties of all of China and classifying them into provinces, municipalities directly under the central government, other (non-municipal) cities, and counties;
step S102, building the subordinate relations between place names, i.e. the one-to-one parent-child relations from each province to its cities and from each city to its counties, and storing these relations in the database;
step S103, establishing a query service that, given a query keyword, checks whether the corresponding place name exists in the database and returns its place name type and parent-child relations.
3. The news geocoding method based on deep learning of claim 1, wherein: the step S30 specifically includes the steps of,
step S301, searching for province names and, if the text contains a province name that exists in the database, adding it to the province name candidate list;
step S302, searching for city names and, if the text contains a city name in the database, adding it to the city name candidate list and adding the province to which that city belongs to the province name candidate list;
step S303, searching for county names and, if the text contains a county name in the database, adding it to the county name candidate list and adding the city to which the county belongs and the province to which that city belongs to the city name and province name candidate lists respectively.
4. The news geocoding method based on deep learning of claim 1, wherein: the step S40 specifically includes the steps of,
s401, constructing a deep learning model based on ERNIE and Bi-GRU-CRF, wherein the ERNIE model is formed by superposing 12 layers of Encoder layers by using an ERNIE Base structure, wherein the input and output of each Encoder layer are composed of 768 Hidden Units, each Encoder layer is formed by stacking a self-Attention layer, a standardization layer, a full connection layer and a standardization layer, and each self-Attention layer contains 12 Attention Heads;
the Bi-GRU-CRF model structure is formed by stacking 2 layers of bidirectional GRU layers, the output of the top layer bidirectional GRU layer is input into the CRF layer to obtain the maximum possible label of each word in the sentence, and therefore a named entity recognition result is output;
the ERNIE model takes a text string as input and outputs a matrix of 768-dimensional text embedding vectors; this matrix is fed into the bottom bidirectional GRU layer of the Bi-GRU-CRF model, and the CRF layer finally outputs the maximum likelihood label of each word;
training the ERNIE model on the MSRA-NER dataset to obtain a pre-trained model, so that inputting a text into this model yields the corresponding matrix of 768-dimensional text embedding vectors; then training the Bi-GRU-CRF model with the LAC corpus as the training set, wherein only the parameters of the bidirectional GRU layers and the CRF layer are updated during training, and the parameters of the ERNIE model are frozen so that they do not participate in training;
with the deep learning model based on ERNIE and Bi-GRU-CRF, the named entity recognition task is performed to obtain the place names and organization names contained in the text, wherein the place names include province, city, district/county, road and landmark names, and the organization names include government institutions, educational institutions and leisure and entertainment venues;
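The CRF layer in step S401 selects the highest-scoring label sequence, which for a linear-chain CRF is normally done with Viterbi decoding. The sketch below shows that decoding step in plain Python and then reads place (LOC) and organization (ORG) spans out of the decoded labels. It assumes BIO-style tags (`B-LOC`, `I-LOC`, `B-ORG`, ...), which may differ from the tag set of the corpus actually used, and it omits the start/end transition scores a full CRF usually has. The emission scores would come from the top bidirectional GRU layer and the transition matrix from the trained CRF; the function names are hypothetical.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence of a linear-chain CRF.

    emissions[t][j]   -- score of tag j at position t (e.g. from the top Bi-GRU layer)
    transitions[i][j] -- score of tag i being followed by tag j (learned by the CRF)
    Start/end transition scores are omitted for brevity.
    """
    if not emissions:
        return []
    n = len(tags)
    score = list(emissions[0])          # best score of any path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n):
            # best previous tag i for current tag j
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    best = max(range(n), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):   # trace the best path backwards
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [tags[j] for j in path]


def extract_entities(chars, tag_seq, wanted=("LOC", "ORG")):
    """Collect (text, type) spans from BIO tags, keeping only the wanted types."""
    entities, current, current_type = [], [], None

    def flush():
        if current:
            entities.append(("".join(current), current_type))

    for ch, tag in zip(chars, tag_seq):
        if tag.startswith("B-"):
            flush()
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(ch)
        else:                                   # "O" or an inconsistent I- tag
            flush()
            current, current_type = [], None
    flush()
    return [(text, kind) for text, kind in entities if kind in wanted]
```

The transition matrix is what lets the CRF veto label sequences that per-position argmax would allow, such as an `I-LOC` directly after `O`.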
step S402, merging the place names and organization names obtained by named entity recognition into a place name candidate list;
step S403, traversing the place name candidate list and deleting any place names that also appear in the province name candidate list, the city name candidate list or the county name candidate list.
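Steps S402 and S403 amount to an order-preserving merge followed by a filter. In this minimal sketch, which assumes every name list is a plain Python list of strings and uses the hypothetical function name `build_place_candidates`, duplicates inside the merged list are also dropped; the claim itself only requires removing names already present in the province, city and county candidate lists.

```python
def build_place_candidates(place_names, org_names, provinces, cities, counties):
    """Merge NER results (S402) and drop administrative names already found (S403)."""
    # S402: merge place names and organization names, keeping first-seen order
    merged = []
    for name in list(place_names) + list(org_names):
        if name not in merged:
            merged.append(name)

    # S403: remove names that are already province/city/county candidates
    administrative = set(provinces) | set(cities) | set(counties)
    return [name for name in merged if name not in administrative]
```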
5. The news geocoding method based on deep learning of claim 1, wherein the understanding score threshold P1 is set to 70 and the confidence score threshold P2 is set to 20.
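Claim 1, which defines how P1 and P2 are applied, is not reproduced in this section. Assuming each geocoding result carries an understanding score and a confidence score and is kept only when both reach their thresholds, the check reduces to the sketch below; the dict keys `understanding` and `confidence`, the function names and the comparison direction are assumptions.

```python
P1_UNDERSTANDING = 70  # understanding score threshold (claim 5)
P2_CONFIDENCE = 20     # confidence score threshold (claim 5)


def accept_geocode(result, p1=P1_UNDERSTANDING, p2=P2_CONFIDENCE):
    """Assumed acceptance rule: both scores must reach their thresholds."""
    return result["understanding"] >= p1 and result["confidence"] >= p2


def filter_geocodes(results, p1=P1_UNDERSTANDING, p2=P2_CONFIDENCE):
    """Keep only the geocoding results that pass accept_geocode."""
    return [r for r in results if accept_geocode(r, p1, p2)]
```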
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110747499.2A (CN113626536B) | 2021-07-02 | 2021-07-02 | News geocoding method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113626536A | 2021-11-09 |
CN113626536B | 2023-08-15 |
Family
ID=78378968
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113626536B |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114580412B * | 2021-12-29 | 2024-06-04 | 西安工程大学 | Clothing entity identification method based on field adaptation |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009134464A * | 2007-11-29 | 2009-06-18 | Nippon Telegr & Teleph Corp <Ntt> | Generation device, generation method and generation program of retrieval result snippet considering range meant by place name, and recording medium recording the generation program |
CN109033358A * | 2018-07-26 | 2018-12-18 | 李辰洋 | News Aggreagation and the associated method of intelligent entity |
CN110472066A * | 2019-08-07 | 2019-11-19 | 北京大学 | A kind of construction method of urban geography semantic knowledge map |
WO2020215793A1 * | 2019-04-23 | 2020-10-29 | 深圳先进技术研究院 | Urban aggregation event prediction and positioning method and device |
CN112307364A * | 2020-11-25 | 2021-02-02 | 哈尔滨工业大学 | Character representation-oriented news text place extraction method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100250562A1 * | 2009-03-24 | 2010-09-30 | Mireo d.o.o. | Recognition of addresses from the body of arbitrary text |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |