CN110298039B - Event place identification method, system, equipment and computer readable storage medium - Google Patents

Event place identification method, system, equipment and computer readable storage medium

Info

Publication number
CN110298039B
Authority
CN
China
Prior art keywords
place
words
place words
candidate
administrative division
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910539293.3A
Other languages
Chinese (zh)
Other versions
CN110298039A (en)
Inventor
韩翠云
陈玉光
刘远圳
潘禄
施茜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Abstract

Embodiments of the invention provide a method, a system, equipment and a computer readable storage medium for identifying an event place. The method comprises the following steps: extracting candidate place words from event information, wherein the event information comprises a title and a text; and inputting the candidate place words, together with the corresponding titles and place sentences, into a pre-trained recognition model so that the recognition model recognizes whether each candidate place word is the event occurrence place in its place sentence, wherein a place sentence is the sentence in which a place word is located. Embodiments of the invention can accurately identify the place where an event occurs.

Description

Event place identification method, system, equipment and computer readable storage medium
Technical Field
The embodiment of the invention relates to the technical field of communication, in particular to a method, a system, equipment and a computer readable storage medium for identifying an event place.
Background
An event map is a network graph whose nodes are events and whose edges are the relationships among events. Each event node is composed of the attribute features of the event, and the place is one of the important attributes of an event, so identifying the place where an event occurs is important for constructing the event map.
At present, some existing technologies can identify the occurrence place of some events. However, when the occurrence place of an event and event-related places appear together, these technologies cannot distinguish the occurrence place from the related places, so the accuracy of identifying the occurrence place of the event is low.
Disclosure of Invention
The embodiment of the invention provides a method, a system, equipment and a computer readable storage medium for identifying an event place, so as to improve the identification precision of the event place.
In a first aspect, an embodiment of the present invention provides a method for identifying an event venue, including: extracting candidate place words in event information, wherein the event information comprises a title and a text; and inputting the candidate place words and the corresponding titles and place sentences into a pre-trained recognition model so that the pre-trained recognition model recognizes whether the candidate place words are event places in the place sentences, wherein the place sentences are sentences in which the place words are located.
In a second aspect, an embodiment of the present invention provides an identification system for an event venue, including: the extraction module is used for extracting candidate place words in the event information, wherein the event information comprises a title and a text; and the input and recognition module is used for inputting the candidate place words and the corresponding titles and place sentences into a pre-trained recognition model so that the pre-trained recognition model can recognize whether the candidate place words are event places in the place sentences or not, and the place sentences are sentences in which the place words are located.
In a third aspect, an embodiment of the present invention provides an apparatus for identifying an event venue, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the first aspect.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium having stored thereon a computer program for execution by a processor to implement the method of the first aspect.
In the method, system, equipment and computer readable storage medium for identifying an event place provided by the embodiments of the invention, candidate place words are extracted from event information, wherein the event information comprises a title and a text; the candidate place words and the corresponding titles and place sentences are then input into a pre-trained recognition model, so that the pre-trained recognition model recognizes whether the candidate place words are event occurrence places in the place sentences, wherein the place sentences are the sentences in which the place words are located. Because the recognition model takes the title into account when identifying the event occurrence place, it can distinguish the event occurrence place from event-related places and thereby identify the event occurrence place accurately.
Drawings
FIG. 1 is a flowchart of a method for identifying an event venue according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for identifying an event venue according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of a system for identifying an event venue according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an event location identification device according to an embodiment of the present invention.
Specific embodiments of the present disclosure have been shown by way of the above drawings and will be described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the disclosed concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
The event place identification method provided by the embodiment of the invention can be applied to equipment such as terminal equipment, intelligent watches, tablet computers and the like.
The method for identifying the event places aims to solve the technical problems in the prior art.
The following describes the technical scheme of the present invention and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for identifying an event venue according to an embodiment of the present invention. Aiming at the technical problems in the prior art, the embodiment of the invention provides an event place identification method, which comprises the following specific steps:
Step 101, extracting candidate place words in event information, wherein the event information comprises a title and a text.
In this embodiment, the event information may be a piece of information, for example a news item comprising a title and a body. Extracting candidate place words in the event information means extracting all place words from the title and the text of the event information. For example, a news item has the title: "A magnitude 4.1 earthquake occurs in Zhaotong, Yunnan; it is clearly felt in Leshan, Yaan and other places in Sichuan." The text is: "At 15:00 on June 5, a magnitude 4.1 earthquake occurred in Yongshan county, Zhaotong city, Yunnan (28.11 degrees north latitude, 103.63 degrees east longitude), with a focal depth of 8 kilometers. After the earthquake, netizens in Sichuan reported that it was clearly felt in Leshan, Yibin, Yaan and other places." The candidate place words extracted from the title and body include: "Yunnan", "Zhaotong city", "Yongshan county", "Sichuan", "Leshan" and "Yaan".
Step 102, inputting the candidate place words and the corresponding titles and place sentences into a pre-trained recognition model, so that the pre-trained recognition model recognizes whether the candidate place words are event places in the place sentences, wherein the place sentences are sentences in which the place words are located.
Optionally, before inputting the candidate place words and the corresponding titles and place sentences into a pre-trained recognition model, so that the pre-trained recognition model recognizes whether the candidate place words are event places in the place sentences, the method of the embodiment of the present invention further includes: constructing an identification model to be trained; acquiring a training sample, wherein the training sample comprises a title and a text; extracting candidate place words in the title and the text of the training sample, and labeling the candidate place words to obtain labeling results, wherein the labeling results comprise whether the candidate place words are place words, whether the candidate place words are event occurrence places and whether the candidate place words are event-related places; inputting the labeling result of the candidate place words, the place sentences corresponding to the candidate place words and the titles corresponding to the candidate place words into the recognition model to be trained; and training the recognition model to be trained until a preset training index is reached.
Specifically, the pre-trained recognition model may be obtained by training a deep-learning-based classification model, for example a fine-tuned model based on BERT. Training the deep-learning-based classification model mainly includes acquiring training samples and labeling the samples. The training samples may be obtained by retrieving, from an event map resource library, news information from various fields over a period of time, for example news information from the last year. A preset number of events, for example 2000 events, are randomly drawn from the retrieved samples and organized into the format of event ID number, news link, title and text. All place words contained in the title and the text are then extracted for subsequent manual labeling.
Optionally, manually labeling a training sample includes: determining whether the title and the text describe a target event; if so, determining whether the title and the text contain place words; and if place words are contained, classifying and labeling the place words. Optionally, a manual check can also be made for place words that the extraction missed, and any missed place words are then extracted manually.
Further, the labeled data is input into a deep-learning-based binary classification model in the format "title + place sentence and place word", so as to train the model to learn whether a labeled place word is the event occurrence place in its place sentence. During learning, the model outputs a score indicating the probability that the place word is the event occurrence place in the place sentence; the larger the score, the greater that probability. Training ends when the training result reaches the training index. Optionally, the training index may be that the probability value output by the model reaches a probability threshold.
The title may be used to distinguish between an event occurrence location and an event correlation location, where the location sentence refers to a sentence in which the extracted location word is located.
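The input format described above can be illustrated with a minimal Python sketch. The separator string, the `PlaceExample` and `build_model_input` names and the 0.5 cut-off are assumptions made for illustration only; the source specifies only the field order "title + place sentence and place word" and that a larger score means a higher probability.

```python
from dataclasses import dataclass

# Assumed separator; the source only gives the field order, not the delimiter.
SEP = " [SEP] "


@dataclass
class PlaceExample:
    """One classifier input: a candidate place word with its context."""
    title: str
    place_sentence: str  # the sentence in which the place word is located
    place_word: str


def build_model_input(example: PlaceExample) -> str:
    """Join the fields in the order 'title + place sentence and place word'."""
    return example.title + SEP + example.place_sentence + SEP + example.place_word


def is_event_place(score: float, threshold: float = 0.5) -> bool:
    """Interpret the model score; the threshold value is a hypothetical choice."""
    return score > threshold


example = PlaceExample(
    title="A magnitude 4.1 earthquake occurs in Zhaotong, Yunnan",
    place_sentence="A magnitude 4.1 earthquake occurred in Yongshan county.",
    place_word="Yongshan county",
)
print(build_model_input(example))
```

The string produced here would be passed to whatever classifier is used; the classifier itself is outside the scope of this sketch.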
After the recognition model has been trained through the above steps, it can be used to identify the occurrence place of an event. In use, a place word, its place sentence and the title are input in the same way, and the model identifies whether the place word is the event occurrence place in the place sentence.
The embodiment of the invention extracts candidate place words in the event information, wherein the event information comprises a title and a text; and inputting the candidate place words and the corresponding titles and place sentences into a pre-trained recognition model so that the recognition model recognizes whether the candidate place words are event places in the place sentences, wherein the place sentences are sentences in which the place words are located. Since the identification model considers the title when identifying the event occurrence place, it is possible to distinguish the event occurrence place from the event-related place, thereby accurately identifying the event occurrence place.
Optionally, extracting candidate place words in the event information or extracting candidate place words in the title and the text of the training sample includes at least one processing method of:
First: extracting the geographic nouns in the title and the text as candidate place words. Specifically, it is judged whether the title and the text contain geographic nouns, and if so, the geographic nouns are extracted as place words.
Second: segmenting the title and the text into words, and performing part-of-speech analysis on the segmentation result to obtain candidate place words. Optionally, an existing word segmentation tool may be used to segment the title and the text into a plurality of words, which are then tagged with their parts of speech. Specifically, the part-of-speech tagging includes labeling the words as general nouns, modifiers, noun phrases, verb phrases and so on. Finally, function words, quantifiers, personification words, stop words and the like are removed from the words.
Third: extracting the administrative-division-type place words in the title and the text as candidate place words according to an administrative division dictionary file. Specifically, the administrative division dictionary file is parsed to obtain place words at the level of country, province, state, city, county, district and so on; it is then judged whether the title and the text contain any of the parsed place words, and if so, those place words are extracted as candidate place words.
Fourth: performing regular-expression matching on the title and the text with regular-expression templates to obtain candidate place words. Specifically, sentences in the title and the text are matched against the templates so as to mine potential candidate place words. Example templates are "held at (.*)" and "attended at (.*) [meeting|activity|forum]".
The title and the text in the above four processing methods refer to the title and the text in the event information or the training sample.
Optionally, embodiments of the invention may use any one of the four methods above to extract place words, or a combination of two or three of them; the title and the text may of course also be processed by all four methods in turn to extract candidate place words. Processing the title and the text with all four methods ensures that potential candidate place words in the title and the text are mined to the greatest extent.
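As a rough illustration of combining extraction methods, the following Python sketch applies a dictionary lookup (the third method) and a regular-expression template (the fourth method) to the title and the text and merges the results. The `ADMIN_WORDS` set, the English template and all function names are hypothetical stand-ins; the source's dictionary file and templates are not reproduced here.

```python
import re

# Hypothetical stand-in for place words parsed from an administrative division dictionary file.
ADMIN_WORDS = {"Yunnan", "Zhaotong", "Sichuan", "Leshan"}

# Hypothetical English adaptation of a template such as "held at (.*)".
TEMPLATES = [re.compile(r"held (?:at|in) ([A-Z]\w+)")]


def extract_by_dictionary(text, dictionary):
    """Return dictionary words found in text, ordered by first occurrence."""
    hits = [(text.find(word), word) for word in dictionary if word in text]
    return [word for _, word in sorted(hits)]


def extract_by_templates(text, templates):
    """Return the captured group of every template match in text."""
    found = []
    for template in templates:
        found.extend(match.group(1) for match in template.finditer(text))
    return found


def extract_candidates(title, body):
    """Apply both methods to title and body; keep each candidate once, in order."""
    candidates = []
    for text in (title, body):
        words = extract_by_dictionary(text, ADMIN_WORDS) + extract_by_templates(text, TEMPLATES)
        for word in words:
            if word not in candidates:
                candidates.append(word)
    return candidates


print(extract_candidates(
    "Magnitude 4.1 earthquake in Zhaotong, Yunnan",
    "The forum was held in Kunming.",
))
# ['Zhaotong', 'Yunnan', 'Kunming']
```

Taking the union of the methods, rather than only one, reflects the stated goal of mining as many potential candidates as possible; the recognition model is then responsible for rejecting places that are not the occurrence place.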
Optionally, after inputting the candidate place words and the corresponding titles and place sentences into a pre-trained recognition model, so that the recognition model recognizes whether the candidate place words are event occurrence places in the place sentences, the method of the embodiment of the present invention further includes: processing the candidate place words corresponding to the identified event occurrence places into addresses in a preset format. For example, the candidate place words corresponding to the event occurrence place identified in the above embodiment are mapped to administrative units such as province, city and county/district, and are processed into places in the five-level format "country-province/state-city-county/district-address". If the administrative division directly above a city is a province or state, the result is a "country-province/state-city" address. If the level directly above the city is the country, the result is a "country-city" address.
Fig. 2 is a flowchart of a method for identifying an event venue according to another embodiment of the present invention. On the basis of the above embodiment, processing the candidate place words corresponding to the identified event places into addresses in a preset format includes:
Step 201, performing word segmentation on the candidate place words;
Step 202, performing part-of-speech analysis on the word segmentation result to obtain fine-grained place words;
The candidate place words extracted in the foregoing embodiments may be coarse-grained, for example in the form "Changsha city, Hunan province". The extracted candidate place words can therefore be further segmented with an existing word segmentation tool to obtain finer-grained place words; in this example, further segmentation yields "Hunan province" and "Changsha city".
Optionally, performing part-of-speech analysis on the segmentation result includes: labeling the words as general nouns, modifiers, noun phrases, verb phrases and so on, and finally removing function words, quantifiers, personification words, stop words and the like from the words.
Step 203, when a fine-grained place word is an administrative-division-type place word, processing the fine-grained place word into an address in the preset format using an administrative division dictionary. Optionally, the administrative-division-type place words include place words such as xx province, xx city, xx county, xx district and xx town. For example, if a fine-grained place word is "Changsha city", the administrative division dictionary shows that the administrative division one level above Changsha city is Hunan province, and the resulting address in the preset format is "Hunan province-Changsha city".
Optionally, in the case that the fine-grained place word belongs to the administrative division type place word, processing the fine-grained place word into the address in the preset format by adopting the administrative division dictionary includes: under the condition that the fine-granularity place words belong to administrative division place words, acquiring the upper-level administrative division place words corresponding to the administrative division place words according to an administrative division dictionary until the highest-level administrative division place words are acquired; and processing the administrative division type place words into addresses which comprise the place words of the administrative division level step by step up to the highest level according to the administrative division level.
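The level-by-level lookup described above can be sketched in Python as follows. The `ADMIN_PARENT` table is a hypothetical, tiny stand-in for the administrative division dictionary, and `to_preset_address` is an illustrative name, not one taken from the source.

```python
# Hypothetical child -> parent table standing in for the administrative division dictionary.
ADMIN_PARENT = {
    "Changsha city": "Hunan province",
    "Hunan province": "China",
}


def to_preset_address(place_word, parent_of=ADMIN_PARENT):
    """Climb from place_word to the highest-level division, then join top-down."""
    chain = [place_word]
    # Stop at the top level; the second check guards against a cyclic table.
    while chain[-1] in parent_of and parent_of[chain[-1]] not in chain:
        chain.append(parent_of[chain[-1]])
    return "-".join(reversed(chain))


print(to_preset_address("Changsha city"))  # China-Hunan province-Changsha city
```

A place word that is already at the highest level, or that is missing from the table, is returned unchanged.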
Optionally, as shown in fig. 2, after performing part-of-speech analysis on the word segmentation result to obtain the place word with fine granularity, the method in the embodiment of the present invention further includes:
Step 204, when a fine-grained place word is an organization-type place word, processing the fine-grained place word into an address in the preset format using a preset mapping relationship between entities and places.
Optionally, when the fine-grained place word is an organization-type place word, processing the fine-grained place word into the address in the preset format using the preset mapping relationship between entities and places includes: sequentially acquiring, according to the preset mapping relationship between entities and places, the upper-level place words corresponding to the organization-type place word until the highest-level place word is reached; and processing the organization-type place word into an address that runs level by level from the organization-type place word up to the highest-level place word.
Optionally, the organization-type place words include entities such as xx street, xx community, xx residential quarter, xx school and xx building. For example, if a fine-grained place word is xx university, it may be processed, according to the preset mapping relationship between entities and places, into a detailed address in the preset format "country-province/state-city-county/district-xx university". The entity may be a house, a shop, a mailbox or a bus station. Because entities with the same name may exist, one entity may ultimately yield multiple addresses, resulting in a list of addresses.
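The handling of same-name entities can be illustrated with a short Python sketch. All place and entity names below are invented placeholders, and the two tables are hypothetical stand-ins for the preset entity-to-place mapping and the administrative hierarchy.

```python
# Hypothetical mapping from an entity name to every place where such an entity exists.
ENTITY_TO_PLACES = {"Example University": ["District A", "District B"]}

# Hypothetical child -> parent table for the places above.
PLACE_PARENT = {
    "District A": "City A",
    "City A": "Province A",
    "District B": "City B",
    "City B": "Province A",
    "Province A": "Country X",
}


def climb(place, parent_of):
    """Return the chain from place up to the highest level, lowest level first."""
    chain = [place]
    while chain[-1] in parent_of and parent_of[chain[-1]] not in chain:
        chain.append(parent_of[chain[-1]])
    return chain


def entity_addresses(entity, entity_to_places=ENTITY_TO_PLACES, parent_of=PLACE_PARENT):
    """Return one top-down address per place in which the entity name is registered."""
    addresses = []
    for place in entity_to_places.get(entity, []):
        levels = list(reversed(climb(place, parent_of)))
        addresses.append("-".join(levels + [entity]))
    return addresses


for address in entity_addresses("Example University"):
    print(address)
```

Returning a list rather than a single string keeps every candidate address available for later disambiguation.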
In addition, in order to increase the robustness of the recognition model, candidate place words with scores exceeding a score threshold are determined according to the scoring of the recognition model on the extracted candidate place words, and the candidate place words with scores exceeding the score threshold are processed into addresses in a preset format.
For example, the recognition model scores the recognized place word "Xinhua district" as 0.9, "Shijiazhuang Xinhua district" as 0.8 and "Beijing" as 0.4. With the score threshold set to 0.5, the place word "Beijing" is filtered out, and "Xinhua district" and "Shijiazhuang Xinhua district" are processed into addresses in the preset format.
Further, taking "Xinhua district" as an example, word segmentation and part-of-speech analysis first yield "Xinhua district", which is then looked up in the administrative division dictionary. Two results are found, namely Xinhua district -> [Shijiazhuang, Cangzhou], so the score of 0.9 is divided between them and the processed addresses are: "China-Hebei province-Shijiazhuang city-Xinhua district-0.45, China-Hebei province-Cangzhou city-Xinhua district-0.45".
Similarly, looking up "Shijiazhuang Xinhua district" in the administrative division dictionary gives the processed address: "China-Hebei province-Shijiazhuang city-Xinhua district-0.8".
Combining the two results gives "China-Hebei province-Shijiazhuang city-Xinhua district-1.25, China-Hebei province-Cangzhou city-Xinhua district-0.45". During combination, merging proceeds from the highest level down to the lowest level in the order of the administrative divisions, for example in the order country, province, city, county.
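The filtering and merging in this example can be reproduced with a small Python sketch. Splitting a score evenly across ambiguous addresses is inferred from the figures in the example (0.9 becoming two scores of 0.45) and is an assumption, as are the `LOOKUP` table and the function name.

```python
from collections import defaultdict

# Scores taken from the example; LOOKUP is a hypothetical dictionary search result.
SCORES = {"Xinhua district": 0.9, "Shijiazhuang Xinhua district": 0.8, "Beijing": 0.4}
SHIJIAZHUANG = "China-Hebei province-Shijiazhuang city-Xinhua district"
CANGZHOU = "China-Hebei province-Cangzhou city-Xinhua district"
LOOKUP = {
    "Xinhua district": [SHIJIAZHUANG, CANGZHOU],
    "Shijiazhuang Xinhua district": [SHIJIAZHUANG],
    "Beijing": ["China-Beijing"],
}


def merge_scored_addresses(scores, lookup, threshold=0.5):
    """Drop low-scoring place words, split each score over its addresses, sum per address."""
    totals = defaultdict(float)
    for place_word, score in scores.items():
        if score <= threshold:
            continue  # e.g. "Beijing" at 0.4 is filtered out
        addresses = lookup.get(place_word, [])
        for address in addresses:
            # Assumed rule: an ambiguous place word shares its score equally.
            totals[address] += score / len(addresses)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for address, total in merge_scored_addresses(SCORES, LOOKUP):
    print(address, round(total, 2))
```

With these inputs the Shijiazhuang address accumulates the highest total, because it is supported by both the ambiguous and the unambiguous place word.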
Fig. 3 is a schematic structural diagram of an event place identification system according to an embodiment of the present invention. The event place identification system provided in this embodiment may execute the processing flow provided in the embodiments of the event place identification method. As shown in Fig. 3, the event place identification system 30 includes an extraction module 31 and an input and recognition module 32. The extraction module 31 is configured to extract candidate place words in event information, where the event information includes a title and a text; the input and recognition module 32 is configured to input the candidate place words and the corresponding titles and place sentences into a pre-trained recognition model, so that the recognition model recognizes whether the candidate place words are event occurrence places in the place sentences, where the place sentences are the sentences in which the place words are located.
Optionally, the system 30 of the embodiment of the present invention further includes: a construction module 33, an acquisition module 34, an input module 35 and a training module 36; wherein, the construction module 33 is configured to construct an identification model to be trained; an acquisition module 34, configured to acquire a training sample, where the training sample includes a title and a text; the extracting module 31 is further configured to extract candidate place words in the title and the text of the training sample, and label the candidate place words to obtain a labeling result, where the labeling result includes whether the candidate place words are place words, whether the candidate place words are event places and whether the candidate place words are event-related places; the input module 35 is configured to input the labeling result of the candidate place word, the place sentence corresponding to the candidate place word, and the title corresponding to the candidate place word into the recognition model to be trained; and the training module 36 is configured to train the recognition model to be trained until a preset training index is reached.
Optionally, when the extraction module 31 extracts candidate place words in the event information or extracts candidate place words in the title and the text of the training sample, at least one of the following processes is included: extracting the geographic nouns in the title and the text as the candidate place words; performing word segmentation on the title and the text, and performing part-of-speech analysis on the word segmentation result to obtain the candidate place words; extracting the administrative-division-type place words in the title and the text as candidate place words according to the administrative division dictionary file; and performing regular-expression matching on the title and the text with regular-expression templates to obtain candidate place words.
Optionally, the system 30 of the embodiment of the present invention further comprises a processing module 37; the processing module 37 is configured to process the candidate place words corresponding to the identified event occurrence place into an address in a preset format.
Optionally, when the processing module 37 processes the candidate place words corresponding to the identified event place into an address in a preset format, the processing module is specifically configured to: word segmentation is carried out on the candidate place words; part-of-speech analysis is carried out on the word segmentation result to obtain place words with fine granularity; and under the condition that the fine-granularity place words belong to administrative division type place words, adopting an administrative division dictionary to process the fine-granularity place words into addresses in a preset format.
Optionally, the processing module 37 is further configured to process the fine-grained place word into the address in the preset format using a preset mapping relationship between entities and places when the fine-grained place word is an organization-type place word.
Optionally, when the fine-grained place word belongs to an administrative division type place word, the processing module 37 is specifically configured to, when using an administrative division dictionary to process the fine-grained place word into an address in a preset format: under the condition that the fine-granularity place words belong to administrative division place words, acquiring the upper-level administrative division place words corresponding to the administrative division place words according to an administrative division dictionary until the highest-level administrative division place words are acquired; and processing the administrative division type place words into addresses which comprise the place words of the administrative division level step by step up to the highest level according to the administrative division level.
Optionally, when the fine-grained place word is an organization-type place word and the processing module 37 uses the preset mapping relationship between entities and places to process the fine-grained place word into an address in the preset format, the processing module 37 is specifically configured to: sequentially acquire, according to the preset mapping relationship between entities and places, the upper-level place words corresponding to the organization-type place word until the highest-level place word is reached; and process the organization-type place word into an address that runs level by level from the organization-type place word up to the highest-level place word. The event place identification system of the embodiment shown in Fig. 3 may be used to implement the technical solution of the above method embodiments; its implementation principle and technical effects are similar and are not described again here.
According to the method, the device and the computer readable storage medium for identifying the event places, provided by the embodiment of the invention, candidate place words in event information are extracted, the event information comprises a title and a text, the candidate place words and the corresponding title and place sentences are input into a pre-trained identification model, so that whether the candidate place words are event places in the place sentences or not is identified by the identification model, and the place sentences are sentences in which the place words are located. Since the identification model considers the title when identifying the event occurrence place, it is possible to distinguish the event occurrence place from the event-related place, thereby accurately identifying the event occurrence place.
Fig. 4 is a schematic structural diagram of an event location identification device according to an embodiment of the present invention. The device may execute the processing flow provided by the embodiment of the event location identification method. As shown in fig. 4, the event location identification device 40 includes: a memory 41, a processor 42, a computer program, and a communication interface 43; the computer program is stored in the memory 41 and is configured to be executed by the processor 42 to perform the steps of the above method embodiments.
The event location identification device in the embodiment shown in fig. 4 may be used to implement the technical solution of the above method embodiment; its implementation principle and technical effects are similar and are not described herein again.
In addition, an embodiment of the present invention also provides a computer-readable storage medium having stored thereon a computer program that is executed by a processor to implement the method for identifying an event venue described in the above embodiment.
According to the method, device, and computer-readable storage medium for identifying event places provided by the embodiments of the present invention, at least one place word is extracted from event information that comprises a title and a text; the at least one place word, together with the corresponding title and place sentence, is input into a pre-trained recognition model, so that the recognition model identifies the event occurrence place among the at least one place word, a place sentence being the sentence in which the place word is located. Because the recognition model takes the title into account when identifying the event occurrence place, it can distinguish the event occurrence place from places that are merely related to the event, and can therefore identify the event occurrence place accurately.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of the units is merely a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed between components may be an indirect coupling or communication connection via some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above. The specific working process of the above-described device may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (16)

1. A method for identifying an event venue, comprising:
extracting candidate place words in event information, wherein the event information comprises a title and a text;
inputting the candidate place words and the corresponding titles and place sentences into a pre-trained recognition model so that the pre-trained recognition model recognizes whether the candidate place words are event places in the place sentences or not, wherein the place sentences are sentences in which the place words are located;
before the step of inputting the candidate place words and the corresponding titles and place sentences into a pre-trained recognition model so that the pre-trained recognition model recognizes whether the candidate place words are event places in the place sentences, the method further comprises:
constructing an identification model to be trained;
acquiring a training sample, wherein the training sample comprises a title and a text;
extracting candidate place words in the title and the text of the training sample, and labeling the candidate place words to obtain labeling results, wherein the labeling results comprise whether the candidate place words are place words, whether the candidate place words are event occurrence places and whether the candidate place words are event-related places;
inputting the labeling result of the candidate place words, the place sentences corresponding to the candidate place words and the titles corresponding to the candidate place words into the recognition model to be trained;
and training the recognition model to be trained until a preset training index is reached.
2. The method of claim 1, wherein the extracting candidate place words in event information or extracting candidate place words in the title and body of the training sample comprises at least one of:
extracting geographic nouns in the title and the body as the candidate place words;
performing word segmentation on the title and the body, and performing part-of-speech analysis on the word segmentation result to obtain the candidate place words;
extracting, according to an administrative division dictionary file, administrative division type place words in the title and the body as the candidate place words;
and performing regular-expression matching on the title and the body by means of a regular-expression matching template to obtain the candidate place words.
3. The method of claim 1 or 2, wherein after the inputting of the candidate place words and the corresponding titles and place sentences into a pre-trained recognition model, so that the pre-trained recognition model recognizes whether the candidate place words are event occurrence places in the place sentences, the method further comprises:
and processing the candidate place words corresponding to the identified event occurrence places into addresses in a preset format.
4. The method according to claim 3, wherein the processing of the candidate place words corresponding to the identified event occurrence places into addresses in a preset format comprises:
performing word segmentation on the candidate place words;
performing part-of-speech analysis on the word segmentation result to obtain fine-grained place words;
and in the case that the fine-grained place words belong to administrative division type place words, processing the fine-grained place words into addresses in a preset format by using an administrative division dictionary.
5. The method of claim 4, wherein after the word segmentation result is subjected to part-of-speech analysis to obtain fine-grained place words, the method further comprises:
and under the condition that the fine-grained place words belong to organization place words, processing the fine-grained place words into addresses in a preset format by adopting a preset mapping relation between entities and places.
6. The method according to claim 4 or 5, wherein, in the case that the fine-grained place words belong to administrative division type place words, processing the fine-grained place words into addresses in a preset format by using an administrative division dictionary comprises:
in the case that the fine-grained place words belong to administrative division type place words, acquiring, according to the administrative division dictionary, the upper-level administrative division place words corresponding to the administrative division type place words, level by level, until the highest-level administrative division place words are acquired;
and processing the administrative division type place words into addresses that run level by level, in order of administrative division level, from the administrative division type place words up to the highest-level administrative division place words.
7. The method according to claim 5, wherein, in the case that the fine-grained place words belong to organization place words, processing the fine-grained place words into addresses in a preset format by using a preset mapping relationship between entities and places comprises:
in the case that the fine-grained place words belong to organization place words, acquiring in sequence, according to the preset mapping relationship between entities and places, the upper-level place words corresponding to the organization place words until the highest-level place words are acquired;
and processing the organization place words into addresses that run level by level from the organization place words up to the highest-level place words.
8. An identification system for an event venue, comprising:
the extraction module is used for extracting candidate place words in the event information, wherein the event information comprises a title and a text;
the input and recognition module is used for inputting the candidate place words and the corresponding titles and place sentences into a pre-trained recognition model so that the pre-trained recognition model can recognize whether the candidate place words are event places in the place sentences or not, and the place sentences are sentences in which the place words are located;
the system further comprises:
the construction module is used for constructing an identification model to be trained;
the acquisition module is used for acquiring a training sample, wherein the training sample comprises a title and a text;
the extraction module is further used for extracting candidate place words in the title and the text of the training sample, and labeling the candidate place words to obtain labeling results, wherein the labeling results comprise whether the candidate place words are place words, whether the candidate place words are event places and whether the candidate place words are event-related places;
the input module is used for inputting the labeling result of the candidate place words, the place sentences corresponding to the candidate place words and the titles corresponding to the candidate place words into the recognition model to be trained;
and the training module is used for training the recognition model to be trained until a preset training index is reached.
9. The system of claim 8, wherein the extraction module, when extracting candidate place words in event information or candidate place words in the title and body of the training sample, is configured to perform at least one of the following:
extracting geographic nouns in the title and the body as the candidate place words;
performing word segmentation on the title and the body, and performing part-of-speech analysis on the word segmentation result to obtain the candidate place words;
extracting, according to an administrative division dictionary file, administrative division type place words in the title and the body as the candidate place words;
and performing regular-expression matching on the title and the body by means of a regular-expression matching template to obtain the candidate place words.
10. The system according to claim 8 or 9, characterized in that the system further comprises:
and the processing module is used for processing the candidate place words corresponding to the identified event occurrence places into addresses in a preset format.
11. The system of claim 10, wherein the processing module, when processing the candidate place words corresponding to the identified event occurrence places into addresses in a preset format, is configured to:
perform word segmentation on the candidate place words;
perform part-of-speech analysis on the word segmentation result to obtain fine-grained place words;
and in the case that the fine-grained place words belong to administrative division type place words, process the fine-grained place words into addresses in a preset format by using an administrative division dictionary.
12. The system of claim 11, wherein the processing module is further configured to, in the case that the fine-grained place words belong to organization place words, process the fine-grained place words into addresses in a preset format by using a preset mapping relationship between entities and places.
13. The system according to claim 11 or 12, wherein the processing module, when processing the fine-grained place words into addresses in a preset format by using an administrative division dictionary in the case that the fine-grained place words belong to administrative division type place words, is specifically configured to:
in the case that the fine-grained place words belong to administrative division type place words, acquire, according to the administrative division dictionary, the upper-level administrative division place words corresponding to the administrative division type place words, level by level, until the highest-level administrative division place words are acquired;
and process the administrative division type place words into addresses that run level by level, in order of administrative division level, from the administrative division type place words up to the highest-level administrative division place words.
14. The system of claim 12, wherein the processing module, when processing the fine-grained place words into addresses in a preset format by using a preset mapping relationship between entities and places in the case that the fine-grained place words belong to organization place words, is specifically configured to:
in the case that the fine-grained place words belong to organization place words, acquire in sequence, according to the preset mapping relationship between entities and places, the upper-level place words corresponding to the organization place words until the highest-level place words are acquired;
and process the organization place words into addresses that run level by level from the organization place words up to the highest-level place words.
15. An apparatus for identifying an event venue, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-7.
16. A computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the method according to any of claims 1-7.
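For illustration, two of the candidate extraction options recited in claims 2 and 9 (lookup in an administrative division dictionary and regular-expression matching) might be combined as in the following Python sketch. The dictionary entries, the pattern, and the names `ADMIN_DIVISIONS`, `PLACE_PATTERN` and `extract_candidates` are hypothetical; the claims do not prescribe a dictionary format or a matching template, and a production system for Chinese text would need different patterns.

```python
import re
from typing import List, Set

# Hypothetical administrative division dictionary (normally loaded from a file).
ADMIN_DIVISIONS: Set[str] = {"Beijing", "Haidian District", "Shanghai"}

# Hypothetical regular-expression template: capitalized words followed by a
# division suffix, e.g. "Chaoyang District".
PLACE_PATTERN = re.compile(r"\b(?:[A-Z][a-z]+ )+(?:District|County|Province)\b")


def extract_candidates(title: str, body: str) -> List[str]:
    """Collect candidate place words from the title and body, without duplicates."""
    text = f"{title}\n{body}"
    found: List[str] = []
    # Dictionary lookup, longest names first so the order is deterministic.
    for name in sorted(ADMIN_DIVISIONS, key=lambda s: (-len(s), s)):
        if name in text:
            found.append(name)
    # Regular-expression matching against the template.
    for match in PLACE_PATTERN.finditer(text):
        found.append(match.group(0))
    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(found))
```

Combining strategies in this way trades precision for recall: the dictionary catches known divisions, the pattern catches names the dictionary lacks, and the recognition model is then left to decide which candidates are event occurrence places.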

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910539293.3A CN110298039B (en) 2019-06-20 2019-06-20 Event place identification method, system, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910539293.3A CN110298039B (en) 2019-06-20 2019-06-20 Event place identification method, system, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110298039A (en) 2019-10-01
CN110298039B (en) 2023-05-30

Family

ID=68028381

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910539293.3A Active CN110298039B (en) 2019-06-20 2019-06-20 Event place identification method, system, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110298039B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111090994A (en) * 2019-11-12 2020-05-01 北京信息科技大学 Chinese-internet-forum-text-oriented event place attribution province identification method
CN111309861B (en) * 2020-02-07 2023-08-22 鼎富智能科技有限公司 Site extraction method, apparatus, electronic device, and computer-readable storage medium
CN112329469B (en) * 2020-11-05 2023-12-19 新华智云科技有限公司 Administrative region entity identification method and system
CN113837472B (en) * 2021-09-26 2024-03-12 杭州海康威视系统技术有限公司 Method and equipment for predicting event executives

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5778402A (en) * 1995-06-07 1998-07-07 Microsoft Corporation Method and system for auto-formatting a document using an event-based rule engine to format a document as the user types
CN102298635A (en) * 2011-09-13 2011-12-28 苏州大学 Method and system for fusing event information
CN103020286A (en) * 2012-12-27 2013-04-03 上海交通大学 Internet ranking list grasping system based on ranking website
CN104572958A (en) * 2014-12-29 2015-04-29 中国科学院计算机网络信息中心 Event extraction based sensitive information monitoring method
CN104731768A (en) * 2015-03-05 2015-06-24 西安交通大学城市学院 Incident location extraction method oriented to Chinese news texts
CN105630884A (en) * 2015-12-18 2016-06-01 中国科学院信息工程研究所 Geographic position discovery method for microblog hot event
CN106464706A (en) * 2014-04-18 2017-02-22 意大利电信股份公司 Method and system for identifying significant locations through data obtainable from telecommunication network
CN108153860A (en) * 2017-12-25 2018-06-12 中译语通科技(青岛)有限公司 A kind of geolocation analysis method based on multilingual news
CN108415902A (en) * 2018-02-10 2018-08-17 合肥工业大学 A kind of name entity link method based on search engine
CN108563655A (en) * 2017-12-28 2018-09-21 北京百度网讯科技有限公司 Text based event recognition method and device
CN109740150A (en) * 2018-12-20 2019-05-10 出门问问信息科技有限公司 Address resolution method, device, computer equipment and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10235683B2 (en) * 2014-07-18 2019-03-19 PlaceIQ, Inc. Analyzing mobile-device location histories to characterize consumer behavior

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
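Title
"Predicting the Occurrence of Life Events from User's Tweet History"; Shun Abe et al.; 2018 IEEE 12th International Conference on Semantic Computing (ICSC); 20180412; pp. 219-225 *
"Research on Identifying the Same News Event" (同一新闻事件识别研究); Zhang Song (张松); China Master's Theses Full-text Database (Information Science and Technology); 20180115; pp. I138-1928 *
"Research on Location-Based News Event Collection and Analysis Technology" (基于地理位置的新闻事件收集与分析技术的研究); Li Zhenhao (李贞昊); China Master's Theses Full-text Database (Information Science and Technology); 20160315; pp. I138-7755 *
"Extracting Spatial Information of Web Earthquake Events Using Place-Name Semantics" (利用地名语义实现Web地震事件空间信息提取); Yang Jiwen (杨继文) et al.; Journal of Geomatics (测绘地理信息); 20131205 (No. 06); pp. 16-19 *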

Also Published As

Publication number Publication date
CN110298039A (en) 2019-10-01

Similar Documents

Publication Publication Date Title
CN110298039B (en) Event place identification method, system, equipment and computer readable storage medium
CN109189942B (en) Construction method and device of patent data knowledge graph
CN106649818B (en) Application search intention identification method and device, application search method and server
CN104408093B (en) A kind of media event key element abstracting method and device
CN110276023B (en) POI transition event discovery method, device, computing equipment and medium
CN106570180A (en) Artificial intelligence based voice searching method and device
CN104102721A (en) Method and device for recommending information
CN103678684A (en) Chinese word segmentation method based on navigation information retrieval
CN111488468B (en) Geographic information knowledge point extraction method and device, storage medium and computer equipment
WO2019227581A1 (en) Interest point recognition method, apparatus, terminal device, and storage medium
CN111522901B (en) Method and device for processing address information in text
CN111274239A (en) Test paper structuralization processing method, device and equipment
CN111291566A (en) Event subject identification method and device and storage medium
CN103577989A (en) Method and system for information classification based on product identification
CN109299469A (en) A method of identifying complicated address in long text
CN109597892A (en) Classification method, device, equipment and the storage medium of data in a kind of database
CN107357765A (en) Word document flaking method and device
CN106897274B (en) Cross-language comment replying method
CN106485525A (en) Information processing method and device
CN111199151A (en) Data processing method and data processing device
CN110232160B (en) Method and device for detecting interest point transition event and storage medium
JP6942759B2 (en) Information processing equipment, programs and information processing methods
CN113761137A (en) Method and device for extracting address information
KR20160067473A (en) Method for spam classfication, recording medium and device for performing the method
CN113807102A (en) Method, device, equipment and computer storage medium for establishing semantic representation model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant