CN113626215B - Meteorological scene service map construction method and system based on phrase identification - Google Patents
- Publication number
- CN113626215B (application CN202110830708.XA)
- Authority
- CN
- China
- Prior art keywords
- service
- weather
- scene
- weather service
- entity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The application discloses a method and system for constructing a meteorological scene service map based on phrase identification, and relates to the technical field of artificial intelligence. The method comprises: determining the weather service scene corresponding to each question; extracting, based on the correspondence between questions and answers and using a phrase recognition model, the service sentences contained in the answers to weather service related questions and in weather service related articles under the same weather service scene; classifying similar service sentences under the same weather service scene into the same service sentence entity, based on the determined weather service scene and the extracted service sentences; and establishing related triples based on the OWL standard from the determined weather service related entity types, the determined weather service scenes and the classified service sentence entities, thereby obtaining atomic suggestion groups and completing construction of the weather scene service map. The application can reduce the repetitive labor involved in weather services.
Description
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a weather scene service map construction method and system based on phrase identification.
Background
A knowledge graph is a technology that stores the expertise of one or more domains in a structured representation. Knowledge stored in a knowledge graph can be applied in fields such as interactive question answering and intelligent recommendation. At present, knowledge graph technology is widely used in everyday applications such as search engine ranking, shopping recommendation and general question answering.
Existing knowledge graphs in the meteorological field mostly focus on encyclopedic weather knowledge, such as explanations of weather concepts and their properties (for example, the definition of rain). Content related to weather services is often ignored because it is difficult to collect and organize and difficult to represent structurally. In practical applications such as weather question answering and recommendation, however, people often care most about service-related content, such as travel advice for rainy days. As a result, existing meteorological knowledge graphs struggle to provide sufficient knowledge support for downstream applications. Moreover, existing weather services are mostly delivered through rule sentences written manually by professionals, for example reminding people to take an umbrella on rainy days. Manual authoring is time-consuming and laborious, and the resulting service sentences cover few service scenes and are monotonous.
Disclosure of Invention
Aiming at the defects in the prior art, the application aims to provide a weather scene service map construction method and system based on phrase identification, which can reduce the repeated labor of weather service.
In order to achieve the above purpose, the application provides a weather scene service map construction method based on phrase identification, which specifically comprises the following steps:
acquiring and storing known weather service related questions and articles according to the determined weather service related entity type and based on a message queue and a multithreading technology, and constructing an original data set;
preprocessing the data in the original data set, and performing keyword recognition, based on a keyword recognition technology, on the questions of the weather service related question-answer pairs and on the titles of the weather service related articles, to determine the weather service scene corresponding to each question;
based on the corresponding relation between the questions and the answers, extracting answers of weather service related questions and service sentences in weather service related articles under the same weather service scene based on the phrase recognition model;
classifying similar service sentences under the same weather service scene into the same service sentence entity based on the determined weather service scene and the extracted service sentences;
and establishing a relevant triplet based on the OWL standard according to the determined weather service related entity type, the determined weather service scene and the classified service statement entity to obtain an atomic suggestion group, and completing the construction of the weather scene service map.
On the basis of the technical scheme, the weather service related entity types comprise weather service users, weather types and event types.
Based on the technical scheme, the method further comprises the following steps of:
based on word segmentation tools and combining with semantic similarity algorithms, obtaining weather service related entity types corresponding to new weather service scenes;
according to the weather service related entity type of the new weather service scene, determining an atomic suggestion group corresponding to the new weather service scene;
and combining the weather type of the new weather service scene with the weather type of the existing weather scene service map based on the Jaro edit distance algorithm to finish the fusion of the new weather service scene and the existing weather scene service map.
Based on the technical scheme, the word segmentation tool is combined with a semantic similarity algorithm to obtain the weather service related entity type corresponding to the new weather service scene, and the specific steps comprise:
word segmentation is carried out on the related description of the new weather service scene based on the word segmentation tool, so that candidate entities of weather service users, weather types and event types of the new weather service scene are obtained;
vectorizing the obtained candidate entities based on the Word2vec algorithm to obtain vector representations of the abstract words;
determining the 64-dimensional word vector corresponding to each candidate entity according to a Word2Vec word vector model;
calculating the cosine similarity between each candidate entity and the canonical entities of the determined weather service related entity types, and taking the canonical entity with the maximum cosine similarity as the standard entity representation of the candidate entity, to obtain the weather service related entity types corresponding to the new weather service scene;
the cosine similarity between a candidate entity and a canonical entity of the determined weather service related entity types is calculated as follows:
$$\mathrm{COS}(X_i, Y_i) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^{2}}\,\sqrt{\sum_{i=1}^{n} Y_i^{2}}}$$
wherein COS(X_i, Y_i) represents the cosine similarity between the candidate entity and the canonical entity of the determined weather service related entity type, X_i represents the i-th component of the word vector of the candidate entity, Y_i represents the i-th component of the word vector of the canonical entity, and n represents the dimension of the word vectors.
On the basis of the above technical scheme, merging the weather type of the new weather service scene with the weather types of the existing weather scene service map based on the Jaro edit distance algorithm specifically comprises the following steps:
calculating the Jaro distance between the weather type of the new weather service scene and each weather type of the existing weather scene service map, the calculation formula being as follows:
wherein Jaro(X, Y) represents the Jaro distance between the weather type of the new weather service scene and a weather type of the existing weather scene service map, m represents the number of matched characters, X represents the string length of the weather type term of the new weather service scene, and Y represents the string length of the weather type term of the existing weather scene service map;
and, based on the Jaro distance calculation results, merging the weather type of the new weather service scene with the weather type of the existing weather scene service map that yields the maximum Jaro distance.
On the basis of the technical scheme, the answers of the weather service related questions and the service sentences in the weather service related articles are extracted based on the corresponding relation between the questions and the answers and on the phrase recognition model under the same weather service scene, and the specific steps comprise:
realizing an LDA topic model based on a ckpe tool library, and extracting answers of weather service related questions and answers and keyword phrases in weather service related articles;
performing dependency syntax analysis on answers to weather service related questions and weather service related articles by using a hanlp tool library;
judging whether an extracted keyword phrase lies within adjacent phrases linked by a verb-object relation or an attributive (modifier-head) relation, and whether it lies within a clause containing the determined weather service related entity types; if both conditions hold, the clause is taken as an extracted service sentence.
Based on the technical scheme, the classifying the similar service sentences in the same weather service scene as the same service sentence entity comprises the following specific steps:
obtaining word vector representation of each service statement based on the pre-training model BERT;
calculating the cosine similarity between service sentences based on the word vector representation of each service sentence, wherein the cosine similarity between two service sentences is calculated as follows:
$$\mathrm{COS}(A_i, B_i) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$
wherein COS(A_i, B_i) represents the cosine similarity between the two service sentences, A_i represents the i-th component of the word vector of one of the two service sentences, B_i represents the i-th component of the word vector of the other service sentence, and n represents the dimension of the word vectors;
and, according to the cosine similarity calculation results, classifying two service sentences as the same service sentence entity when the cosine similarity between them is greater than a preset value.
Based on the technical scheme, the related triples are built based on the OWL standard, wherein the built related triples are as follows:
t={subject,predicate,object}
where t represents an established triple, subject represents the weather service related entity type and the weather service scene, object represents the atomic suggestion group, and predicate represents the weather service user.
Based on the technical scheme, the preprocessing comprises error checking, sensitive vocabulary filtering and illegal character rejecting.
The application also provides a weather scene service map construction system based on phrase identification, comprising:
the construction module is used for acquiring and storing the known weather service related questions and articles according to the determined weather service related entity type and based on the message queue and the multithreading technology, and constructing to obtain an original data set;
the recognition module is used for preprocessing data in the original data set, recognizing keywords for questions related to weather service and article titles of weather service related articles based on a keyword recognition technology, and determining weather service scenes corresponding to the questions;
the extraction module is used for extracting answers of the weather service related questions and service sentences in the weather service related articles under the same weather service scene based on the corresponding relation between the questions and the answers and based on the phrase identification model;
the classifying module is used for classifying similar service sentences in the same weather service scene into the same service sentence entity based on the determined weather service scene and the extracted service sentences;
the construction module is used for establishing a relevant triplet based on the OWL standard according to the determined weather service related entity type, the determined weather service scene and the classified service statement entity to obtain an atomic suggestion group and complete weather scene service map construction.
Compared with the prior art, the application has the following advantages. It addresses the difficulty of extracting service sentences and of structurally storing weather scene service knowledge in current knowledge graph construction for the meteorological field, and it provides a structured knowledge base for weather service applications such as question answering and recommendation. In addition, by decomposing a weather service scene into its component factors (service population, weather type and event type) and establishing the atomic suggestion groups related to each type of factor, it enables automatic updating of the knowledge for new service scenes and reduces the repetitive labor involved in weather services.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for constructing a weather scene service map based on phrase identification in an embodiment of the application;
FIG. 2 is a flowchart of determining a weather service scenario according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a weather scene service map established in an embodiment of the present application;
FIG. 4 is a schematic diagram of a weather scene service map established for a primary school student weather service in an embodiment of the application.
Detailed Description
The embodiment of the application provides a weather scene service map construction method based on phrase identification, which builds a knowledge graph for weather service scenes. It addresses the difficulty of extracting service sentences and of structurally storing weather scene service knowledge in current knowledge graph construction for the meteorological field, and it provides a structured knowledge base for weather service applications such as question answering and recommendation. In addition, by decomposing a weather service scene into its component factors (service population, weather type and event type) and establishing the atomic suggestion groups related to each type of factor, it enables automatic updating of the knowledge for new service scenes and reduces the repetitive labor involved in weather services. The embodiment of the application correspondingly provides a weather scene service map construction system based on phrase identification.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1, the method for constructing a weather scene service map based on phrase identification provided by the embodiment of the application specifically includes the following steps:
S1: acquiring and storing the known weather service related questions and articles according to the determined weather service related entity types and based on the message queue and the multithreading technology, to construct an original data set.
In the embodiment of the application, the weather service related entity types comprise weather service users, weather types and event types. The weather service users comprise students, schools and parents; weather types include sand, hail, ice, low temperature, high temperature, thunderstorms, typhoons, haze, rain, and snow; event types include school, outing, pick-up students, work, school lessons, holidays at home, weekend outing, daily at home, inter-class activities, and daily living.
In the embodiment of the application, the acquisition of the known weather service related questions and articles is carried out on the Internet, and the weather service related questions and articles under the condition of combining the types of the weather service related entities can be obtained specifically based on the crawler technology.
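As an illustration of step S1, the following is a minimal sketch of acquisition with a message queue and worker threads, using only the Python standard library. The `fetch` function is a hypothetical stand-in for a real crawler request, and the function name `build_raw_dataset` and the record layout are assumptions made for this example, not details given in the application.

```python
import queue
import threading


def fetch(url):
    """Hypothetical stand-in for a real HTTP request made by a crawler."""
    return f"content of {url}"


def build_raw_dataset(urls, num_workers=4):
    """Fetch every URL with a pool of worker threads fed by a queue."""
    tasks = queue.Queue()      # message queue of pending URLs
    results = []               # shared raw data set
    lock = threading.Lock()    # protects `results`

    def worker():
        while True:
            url = tasks.get()
            if url is None:    # sentinel: no more work for this worker
                tasks.task_done()
                break
            page = fetch(url)
            with lock:
                results.append({"url": url, "text": page})
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)        # one sentinel per worker
    tasks.join()
    for t in threads:
        t.join()
    return results
```

The queue decouples producing URLs from fetching them, so the number of workers can be tuned independently of how the URL list is generated.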
S2: preprocessing data in an original data set, and carrying out keyword recognition on questions and article titles of weather service related questions and weather service related articles based on a keyword recognition technology to determine weather service scenes corresponding to the questions.
In the embodiment of the application, the preprocessing comprises error checking, sensitive vocabulary filtering and illegal character removing, and specifically, the error checking, the sensitive vocabulary filtering and the illegal character removing in the preprocessing can be performed by using regular expressions of Python language.
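A minimal sketch of regular-expression preprocessing along these lines is given below. The sensitive word list, the set of characters treated as legal, and the whitespace normalization used as a simple form of error checking are all assumptions made for illustration; the application does not specify them.

```python
import re

# Hypothetical sensitive vocabulary; a real list would be maintained separately.
SENSITIVE_WORDS = ["badword"]
SENSITIVE_RE = re.compile("|".join(re.escape(w) for w in SENSITIVE_WORDS))

# Assumption: anything that is not a word character, whitespace or common
# punctuation counts as an illegal character.
ILLEGAL_RE = re.compile(r"[^\w\s,.!?;:，。！？；：、]")


def preprocess(text):
    """Remove illegal characters and sensitive words, then tidy whitespace."""
    text = ILLEGAL_RE.sub("", text)
    text = SENSITIVE_RE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()
```

For example, `preprocess("Take an  umbrella\x00 \u2605 badword today!")` yields `"Take an umbrella today!"`; samples that become empty after cleaning can simply be dropped.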
In the embodiment of the application, for the questions of the weather service related question-answer pairs and the titles of the weather service related articles, the weather service user, weather type and event type (the weather service related entity types determined in step S1) are extracted by character matching in Python; if any one of the weather service user, the weather type and the event type is missing, the sample is discarded.
S3: based on the corresponding relation between the questions and the answers, extracting answers of weather service related questions and service sentences in weather service related articles under the same weather service scene based on the phrase recognition model;
S4: classifying similar service sentences under the same weather service scene into the same service sentence entity based on the determined weather service scene and the extracted service sentences;
S5: establishing related triples based on the OWL (Web Ontology Language) standard according to the determined weather service related entity types, the determined weather service scenes and the classified service sentence entities, to obtain atomic suggestion groups and complete the construction of the weather scene service map.
In a possible implementation, in step S2, keyword recognition is performed on the questions of the weather service related question-answer pairs and on the titles of the weather service related articles based on the keyword recognition technology, and the weather service scene corresponding to each question is determined. As shown in fig. 2, each question or article title in the original data set is treated as a sample, and the specific flow includes:
S201: judging whether all samples in the original data set have been processed; if yes, going to S3 to extract service sentences, and if not, going to S202;
S202: performing keyword recognition on the sample, extracting entities, and going to S203; the entities comprise the three entity types determined in step S1, namely weather service user, weather type and event type;
S203: judging whether all three types of entities are present; if yes, going to S204, and if not, going to S205;
S204: determining the weather service scene, and going to S201;
S205: discarding the current sample, and going to S201.
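The flow S201 to S205 can be sketched as a simple loop with character matching. The English vocabularies below are a small illustrative subset chosen for this example, and plain substring matching is an assumption; it is not the only way to perform keyword recognition.

```python
# Illustrative subsets of the three entity types (assumed vocabularies).
USERS = ["students", "schools", "parents"]
WEATHER = ["hail", "typhoon", "haze", "rain", "snow"]
EVENTS = ["outing", "holiday", "work"]


def find_entity(text, vocabulary):
    """Return the first vocabulary term contained in the text, else None."""
    for term in vocabulary:
        if term in text:
            return term
    return None


def determine_scenes(samples):
    """Keep only samples that mention a user, a weather type and an event."""
    scenes = []
    for sample in samples:                      # S201: iterate over samples
        text = sample.lower()
        user = find_entity(text, USERS)         # S202: extract entities
        weather = find_entity(text, WEATHER)
        event = find_entity(text, EVENTS)
        if user and weather and event:          # S203: all three present?
            scenes.append((sample, user, weather, event))  # S204
        # S205: otherwise the sample is discarded
    return scenes
```

Substring matching can over-match (for instance, "rain" inside "training"), so a production system would likely match on segmented words instead.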
In the embodiment of the application, based on the corresponding relation between questions and answers and on the phrase recognition model, the answers of the questions and answers related to the weather service and service sentences in the articles related to the weather service are extracted under the same weather service scene, and the specific steps comprise:
S301: implementing an LDA topic model based on the ckpe tool library, and extracting keyword phrases from the answers to weather service related questions and from the weather service related articles;
S302: performing dependency syntax analysis on the answers to weather service related questions and on the weather service related articles using the hanlp tool library;
S303: judging whether an extracted keyword phrase lies within adjacent phrases linked by a verb-object relation or an attributive (modifier-head) relation, and whether it lies within a clause containing the determined weather service related entity types; if both conditions hold, the clause is taken as an extracted service sentence.
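The filtering rule in S303 can be sketched over precomputed inputs. The code below does not call ckpe or hanlp; it assumes that keyword phrases and dependency arcs have already been produced by those tools and are passed in as plain Python data. The arc format `(head, dependent, relation)`, the relation labels and the overlap test are hypothetical choices representing one possible reading of the rule.

```python
# Relations that qualify a clause (assumed label names).
WANTED_RELATIONS = {"verb-object", "attributive"}


def extract_service_sentences(clauses, keyphrases, entities):
    """Select clauses that contain an entity and a qualifying keyword phrase.

    clauses:    list of dicts {"text": str, "arcs": [(head, dependent, relation)]}
    keyphrases: keyword phrases extracted beforehand
    entities:   terms of the determined weather service related entity types
    """
    selected = []
    for clause in clauses:
        has_entity = any(e in clause["text"] for e in entities)
        has_phrase = any(
            rel in WANTED_RELATIONS
            and any(head in kp or dep in kp for kp in keyphrases)
            for head, dep, rel in clause["arcs"]
        )
        if has_entity and has_phrase:
            selected.append(clause["text"])
    return selected
```

Keeping the parser output as an explicit input makes the rule itself easy to test independently of the NLP libraries.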
In the embodiment of the application, similar service sentences in the same weather service scene are classified as the same service sentence entity, and the specific steps comprise:
S401: obtaining word vector representations of the service sentences based on the pre-trained model BERT. BERT is a large deep network built on the Transformer structure, which is based on the self-attention mechanism. The self-attention mechanism obtains the representation of each word by adjusting a weight coefficient matrix according to the degree of association between the words in the same sentence:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
wherein Q, K and V are all word vector matrices, d_k represents the embedding dimension, Attention(Q, K, V) represents dot-product attention, T denotes the matrix transpose, and softmax represents the Softmax activation function.
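To make the dot-product attention computation concrete, here is a minimal pure-Python sketch that operates on small matrices represented as lists of lists. It is for illustration only; the helper names are assumptions, and a real BERT implementation would use a tensor library rather than this code.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]          # transpose of K
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)
```

With a query of all zeros every key receives the same weight, so the output is the average of the rows of V, which is a quick sanity check for the implementation.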
The multi-head attention mechanism projects Q, K and V through several different linear transformations and finally concatenates the resulting attention outputs. The formula of the multi-head attention mechanism is as follows:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^{O}$$
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$
wherein MultiHead(Q, K, V) represents multi-head attention, Concat represents concatenation of the heads, head_i represents the i-th dot-product attention, and h represents the number of dot-product attention heads.
The fully connected feed-forward network in the Transformer structure has two dense layers: the first layer uses a ReLU activation function and the second layer uses a linear activation function. If the output of the multi-head attention mechanism is denoted as Z, the FFN (fully connected feed-forward network) can be expressed as:
$$\mathrm{FFN}(Z) = \max(0, ZW_1 + b_1)W_2 + b_2$$
wherein max represents the element-wise maximum, W_1 represents the weight matrix of the first layer, b_1 represents the bias vector of the first layer, W_2 represents the weight matrix of the second layer, and b_2 represents the bias vector of the second layer.
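The feed-forward computation can be sketched in a few lines of pure Python. The matrix shapes and the helper name `matmul` are assumptions for this toy example, which is meant only to show the order of operations (linear layer, ReLU, linear layer).

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def ffn(z, w1, b1, w2, b2):
    """Two-layer feed-forward network: max(0, Z W1 + b1) W2 + b2."""
    # First dense layer followed by ReLU (element-wise max with zero).
    hidden = [[max(0.0, h + b) for h, b in zip(row, b1)]
              for row in matmul(z, w1)]
    # Second dense layer with a linear activation.
    return [[o + b for o, b in zip(row, b2)]
            for row in matmul(hidden, w2)]
```

Negative pre-activations are zeroed by the ReLU before the second layer, which is what makes the network non-linear.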
S402: calculating the cosine similarity between service sentences based on the word vector representation of each service sentence, wherein the cosine similarity between two service sentences is calculated as follows:
$$\mathrm{COS}(A_i, B_i) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$
wherein COS(A_i, B_i) represents the cosine similarity between the two service sentences, A_i represents the i-th component of the word vector of one of the two service sentences, B_i represents the i-th component of the word vector of the other service sentence, and n represents the dimension of the word vectors;
S403: according to the cosine similarity calculation results, classifying two service sentences as the same service sentence entity when the cosine similarity between them is greater than a preset value. Specifically, the preset value is 0.8.
The application uses cosine distance algorithm to calculate the word vector similarity of the service sentence. Word vectors are a method for expressing words as real vectors in a low-dimensional space, namely, high-dimensional word vectors are embedded into the low-dimensional vector space, storage pressure is reduced, and semantic features of words in text are extracted. In the word vector space, word vectors close to each other are semantically closer than word vector pairs farther away, so that service sentences with consistent meaning expression can be categorized by measuring the similarity between the word vectors.
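The grouping of service sentences by cosine similarity can be sketched as follows. The vectors are assumed to be sentence embeddings computed beforehand (for example by BERT); the toy two-dimensional vectors and the greedy strategy of comparing each sentence with the first member of each existing group are assumptions made for this example, since the application only states the 0.8 threshold.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_sentences(vectors, threshold=0.8):
    """Greedily group sentences whose similarity exceeds the threshold.

    vectors: dict mapping each service sentence to its embedding.
    Returns a list of groups, each a list of sentences.
    """
    groups = []
    for sentence, vec in vectors.items():
        for group in groups:
            # Compare against the first member of the group (its representative).
            if cosine(vec, vectors[group[0]]) > threshold:
                group.append(sentence)
                break
        else:
            groups.append([sentence])
    return groups
```

Each resulting group corresponds to one service sentence entity; other clustering strategies would also satisfy the threshold rule.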
In the embodiment of the application, the related triples are established based on the OWL standard, wherein the established related triples are as follows:
t={subject,predicate,object}
where t represents an established triple, subject represents the weather service related entity type and the weather service scene, object represents the atomic suggestion group, and predicate represents the weather service user.
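As a rough illustration of the triple structure, the sketch below stores triples as named tuples. It assumes one reading of the roles (subject for the scene or entity type, predicate for the weather service user, object for a suggestion in the atomic suggestion group) and does not produce OWL syntax; serializing to OWL would require an ontology library and is outside this example. The function and scene names are hypothetical.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])


def build_triples(scene, user, suggestions):
    """Create one triple per suggestion for the given scene and user."""
    return [Triple(scene, user, suggestion) for suggestion in suggestions]


# Usage with hypothetical values.
triples = build_triples(
    "hail/outing",
    "students",
    ["protect your head", "do not pick up hailstones"],
)
```

A list of such triples can be loaded into any graph store that accepts subject, predicate and object fields.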
An atomic suggestion group may be, for example: if caught in hail in the open, protect the head with whatever is available nearby; if there is no shelter nearby, squat on the ground and hold the head with both hands, protecting the head, chest and abdomen from being hit by hail; if carrying a bag or similar article, hold it temporarily over the head; when outdoors, protect the head with rain gear or other substitutes; do not go outside to pick up hailstones, to avoid being injured by hail; apply sunscreen when going out and avoid exposing bare skin to the sun for long periods; and the like.
In the embodiment of the application, once the atomic suggestion groups are obtained, the construction of the weather scene service map is complete. The established weather scene service map may be as shown in fig. 3; it comprises scenes, weather, events, suggestions and users, and gives corresponding suggestions for different scenes, weather, events and users.
In one possible implementation, since existing weather knowledge graphs are mostly static graphs that cannot be updated automatically when new knowledge appears, the application further comprises, for a new weather service scene:
S01: obtaining the weather service related entity types corresponding to the new weather service scene based on a word segmentation tool combined with a semantic similarity algorithm;
S02: determining the atomic suggestion group corresponding to the new weather service scene according to the weather service related entity types of the new weather service scene;
S03: merging the weather type of the new weather service scene with the weather types of the existing weather scene service map based on the Jaro edit distance algorithm, to complete the fusion of the new weather service scene with the existing weather scene service map.
In the embodiment of the application, obtaining the weather service related entity types corresponding to the new weather service scene, based on a word segmentation tool combined with a semantic similarity algorithm, specifically comprises:
S011: segmenting the description of the new weather service scene with the word segmentation tool to obtain candidate entities for the weather service user, weather type and event type of the new weather service scene;
S012: vectorizing the obtained candidate entities with the Word2Vec algorithm to obtain vector representations of the abstract words;
S013: determining the 64-dimensional word vector corresponding to each candidate entity according to a Word2Vec word vector model. The Word2Vec word vector model in the embodiment of the application was trained on more than 8 million Baidu Baike (Baidu Encyclopedia) entries and exceeds 26 GB in size. The 64-dimensional word vector corresponding to each candidate entity is found by table lookup.
S014: calculating the cosine similarity between each candidate entity and the canonical entities of the determined weather service related entity types, and taking the canonical entity with the maximum cosine similarity as the standard entity representation of the candidate entity, thereby obtaining the weather service related entity types corresponding to the new weather service scene;
wherein the cosine similarity between a candidate entity and a canonical entity of the determined weather service related entity types is calculated as follows:
$$\mathrm{COS}(X_i, Y_i) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^{2}}\,\sqrt{\sum_{i=1}^{n} Y_i^{2}}}$$
where $\mathrm{COS}(X_i, Y_i)$ denotes the cosine similarity between the candidate entity and the canonical entity of the determined weather service related entity type, $X_i$ denotes the $i$-th component of the word vector of the candidate entity, $Y_i$ denotes the $i$-th component of the word vector of the canonical entity, and $n$ denotes the dimension of the word vectors.
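Steps S013 and S014 can be illustrated with a minimal sketch. It assumes the word vectors have already been looked up; the function names, the entity names and the toy 3-dimensional vectors are hypothetical and stand in for the 64-dimensional Word2Vec vectors described above.

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # convention chosen for this sketch, not stated in the text
    return dot / (norm_x * norm_y)

def align_to_canonical(candidate_vec, canonical_vecs):
    """Return (name, similarity) of the canonical entity closest to the candidate."""
    scored = (
        (name, cosine_similarity(candidate_vec, vec))
        for name, vec in canonical_vecs.items()
    )
    return max(scored, key=lambda pair: pair[1])

# Toy vectors for illustration only (hypothetical values).
canonical = {
    "rainstorm": [0.9, 0.1, 0.0],
    "heatwave": [0.0, 0.2, 0.9],
}
best_name, best_score = align_to_canonical([0.8, 0.2, 0.1], canonical)
```

Here `best_name` is the canonical entity that would be used as the standard representation of the candidate.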
The atomic suggestion group applicable to the new weather service scene is the union of the atomic suggestion groups applicable to its weather service user, weather type and event type.
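The union described above can be sketched as a set operation. The dictionary layout, the factor names and the suggestion texts below are hypothetical; the text does not specify how atomic suggestion groups are stored.

```python
def suggestions_for_scene(user, weather, event, groups):
    """Union of the atomic suggestion groups attached to each scene factor."""
    result = set()
    for factor in (user, weather, event):
        # A factor with no known group contributes nothing.
        result |= groups.get(factor, set())
    return result

# Hypothetical atomic suggestion groups keyed by factor name.
groups = {
    "student": {"carry an umbrella", "avoid outdoor sports"},
    "rainstorm": {"carry an umbrella", "stay away from low-lying roads"},
    "commute to school": {"leave home earlier"},
}
scene_suggestions = suggestions_for_scene(
    "student", "rainstorm", "commute to school", groups
)
```

Because the result is a set, a suggestion shared by several factors appears only once.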
In the embodiment of the application, merging the weather type of the new weather service scene with a weather type of the existing weather scene service map based on the Jaro edit distance algorithm specifically comprises:
S031: calculating the Jaro distance between the weather type of the new weather service scene and each weather type of the existing weather scene service map, the calculation formula being as follows:
where Jaro(X, Y) denotes the Jaro distance between the weather type of the new weather service scene and a weather type of the existing weather scene service map, m denotes the number of matched characters, X denotes the string length of the weather type term of the new weather service scene, and Y denotes the string length of the weather type term of the existing weather scene service map;
S032: based on the Jaro distance results, merging the weather type of the new weather service scene with the weather type of the existing weather scene service map that yields the maximum Jaro distance. If the calculated maximum Jaro distance is less than 0.5, the weather types are not merged.
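Steps S031 and S032 can be sketched as follows. The sketch assumes the standard Jaro similarity, (m/|s1| + m/|s2| + (m - t)/m) / 3, where t is the number of transpositions; the text above defines only m and the two string lengths, so the transposition term and the matching window are assumptions taken from the common definition. The function names and example strings are hypothetical.

```python
def jaro_similarity(s1, s2):
    """Standard Jaro similarity in [0, 1]; higher means more similar."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    m = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (m / len1 + m / len2 + (m - t) / m) / 3.0

def merge_target(new_type, existing_types, threshold=0.5):
    """Existing weather type to merge with, or None if the best score is below threshold."""
    if not existing_types:
        return None
    best = max(existing_types, key=lambda t: jaro_similarity(new_type, t))
    return best if jaro_similarity(new_type, best) >= threshold else None

target = merge_target("heavy rain", ["heavy rainfall", "sandstorm"])
```

With these inputs the new type is merged into "heavy rainfall", while a type whose best score falls below 0.5 is kept as a separate entry.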
In practical application of the weather scene service map construction method, the collected original data set is used to construct the weather scene service map, and a new weather service user instance, 'student', is used to complete the map update and fuse it with the existing weather knowledge map. The completed weather scene service map comprises more than 20,000 weather entities, more than 30,000 weather relations, 400 weather scenes and 21 kinds of atomic suggestion groups, which demonstrates the feasibility and practicability of the application. A weather scene service map built around weather services for primary and secondary school students may be as shown in Fig. 4; it provides primary and secondary school students with different countermeasures for different user scenes and severe weather.
The weather scene service map construction method based on phrase identification realizes weather map construction for weather service scenes. It addresses two difficulties in current knowledge map construction for the weather domain, namely that service sentences are hard to extract and that weather scene service knowledge is hard to store in structured form, and it provides a structured knowledge base for weather service applications such as question answering and recommendation. At the same time, a weather service scene is decomposed into its component factors (service user, weather type and event type) and an atomic suggestion group is established for each type of factor, so that knowledge related to a new service scene can be updated automatically and repetitive work in weather services is reduced. In addition, the application can align entities with the existing knowledge map and complete the fusion of newly added weather service knowledge without losing the knowledge already in the existing weather map.
The embodiment of the application provides a weather scene service map construction system based on phrase recognition, which comprises a construction module, a recognition module, an extraction module, a classification module and a construction module.
The construction module is used for acquiring and storing known weather service related questions and articles, according to the determined weather service related entity types and based on message queue and multithreading technology, to construct an original data set;
the recognition module is used for preprocessing the data in the original data set and, based on a keyword recognition technology, performing keyword recognition on the questions of the weather service related questions and answers and on the titles of the weather service related articles, to determine the weather service scene corresponding to each question;
the extraction module is used for extracting, based on the correspondence between questions and answers and on the phrase recognition model, service sentences from the answers to the weather service related questions and from the weather service related articles under the same weather service scene;
the classification module is used for classifying similar service sentences under the same weather service scene into the same service sentence entity, based on the determined weather service scene and the extracted service sentences;
the construction module is used for establishing related triples based on the OWL standard, according to the determined weather service related entity types, the determined weather service scene and the classified service sentence entities, to obtain the atomic suggestion groups and complete construction of the weather scene service map.
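The acquisition step of the construction module, which relies on a message queue and multithreading, might look like the following sketch. It uses only the Python standard library; the function names, the task list and the `fake_fetch` stand-in for a real crawler call are hypothetical, since the text does not name a queue implementation or a data source.

```python
import queue
import threading

def build_raw_dataset(tasks, fetch, num_workers=4):
    """Drain a task queue with worker threads and collect the fetched records."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    records = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = task_queue.get_nowait()
            except queue.Empty:
                return  # no work left for this thread
            try:
                record = fetch(task)
                with lock:  # protect the shared result list
                    records.append(record)
            finally:
                task_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return records

def fake_fetch(keyword):
    """Stand-in for a real crawler call; returns a canned record."""
    return {"keyword": keyword, "question": f"What should I do during a {keyword}?"}

dataset = build_raw_dataset(["rainstorm", "typhoon", "heatwave"], fake_fetch)
```

The order of `dataset` depends on thread scheduling, so downstream code should not rely on it.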
The foregoing is only a specific embodiment of the application to enable those skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Claims (9)
1. A weather scene service map construction method based on phrase identification is characterized by comprising the following steps:
acquiring and storing known weather service related questions and articles according to the determined weather service related entity type and based on a message queue and a multithreading technology, and constructing an original data set;
preprocessing data in the original data set, and performing keyword recognition, based on a keyword recognition technology, on the questions of the weather service related questions and answers and on the article titles of the weather service related articles, to determine the weather service scene corresponding to each question;
based on the correspondence between questions and answers and on a phrase recognition model, extracting service sentences from the answers to the weather service related questions and from the weather service related articles under the same weather service scene;
classifying similar service sentences under the same weather service scene into the same service sentence entity based on the determined weather service scene and the extracted service sentences;
according to the determined weather service related entity type, the determined weather service scene and the classified service statement entity, establishing a related triplet based on an OWL standard to obtain an atomic suggestion group, and completing the construction of a weather scene service map;
wherein extracting, based on the correspondence between questions and answers and on the phrase recognition model, service sentences from the answers to the weather service related questions and from the weather service related articles under the same weather service scene specifically comprises:
implementing an LDA topic model based on the ckpe tool library, and extracting keyword phrases from the answers to the weather service related questions and from the weather service related articles;
performing dependency syntax analysis on the answers to the weather service related questions and on the weather service related articles by using the hanlp tool library;
judging whether an extracted keyword phrase lies in an adjacent phrase having a verb-object relation or an attributive-head relation, and judging whether the extracted keyword phrase lies in a clause containing a determined weather service related entity type; if so, determining that the clause is an extracted service sentence.
2. The weather scene service map construction method based on phrase identification as claimed in claim 1, wherein: the weather service related entity types include weather service users, weather types, and event types.
3. The method for constructing a weather scene service map based on phrase identification as claimed in claim 2, further comprising, for a new weather service scene:
based on word segmentation tools and combining with semantic similarity algorithms, obtaining weather service related entity types corresponding to new weather service scenes;
according to the weather service related entity type of the new weather service scene, determining an atomic suggestion group corresponding to the new weather service scene;
and combining the weather type of the new weather service scene with the weather type of the existing weather scene service map based on the Jaro edit distance algorithm to finish the fusion of the new weather service scene and the existing weather scene service map.
4. The method for constructing a weather scene service map based on phrase identification as claimed in claim 3, wherein the steps of obtaining the weather service related entity type corresponding to the new weather service scene based on word segmentation tool and combining with semantic similarity algorithm include:
word segmentation is carried out on the related description of the new weather service scene based on the word segmentation tool, so that candidate entities of weather service users, weather types and event types of the new weather service scene are obtained;
vectorizing the obtained candidate entity based on a Word2vec algorithm to obtain vector representation of abstract words;
according to Word2Vec Word vector model, determining 64-dimensional Word vectors corresponding to each candidate entity;
calculating the cosine similarity between each candidate entity and the canonical entities of the determined weather service related entity types, and taking the canonical entity with the maximum cosine similarity as the standard entity representation of the candidate entity, to obtain the weather service related entity types corresponding to the new weather service scene;
wherein the cosine similarity between a candidate entity and a canonical entity of the determined weather service related entity types is calculated as follows:
$$\mathrm{COS}(X_i, Y_i) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^{2}}\,\sqrt{\sum_{i=1}^{n} Y_i^{2}}}$$
where $\mathrm{COS}(X_i, Y_i)$ denotes the cosine similarity between the candidate entity and the canonical entity of the determined weather service related entity type, $X_i$ denotes the $i$-th component of the word vector of the candidate entity, $Y_i$ denotes the $i$-th component of the word vector of the canonical entity, and $n$ denotes the dimension of the word vectors.
5. The method for constructing a weather scene service map based on phrase identification according to claim 4, wherein merging the weather type of the new weather service scene with a weather type of the existing weather scene service map based on the Jaro edit distance algorithm specifically comprises:
calculating the Jaro distance between the weather type of the new weather service scene and each weather type of the existing weather scene service map, the calculation formula being as follows:
where Jaro(X, Y) denotes the Jaro distance between the weather type of the new weather service scene and a weather type of the existing weather scene service map, m denotes the number of matched characters, X denotes the string length of the weather type term of the new weather service scene, and Y denotes the string length of the weather type term of the existing weather scene service map;
and, based on the Jaro distance results, merging the weather type of the new weather service scene with the weather type of the existing weather scene service map that yields the maximum Jaro distance.
6. The method for constructing a weather scene service map based on phrase identification as claimed in claim 1, wherein classifying similar service sentences under the same weather service scene into the same service sentence entity specifically comprises:
obtaining a word vector representation of each service sentence based on the pre-trained model BERT;
calculating, based on the word vector representation of each service sentence, the cosine similarity between service sentences, wherein the cosine similarity between two service sentences is calculated as follows:
$$\mathrm{COS}(A_i, B_i) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$
where $\mathrm{COS}(A_i, B_i)$ denotes the cosine similarity between the two service sentences, $A_i$ denotes the $i$-th component of the word vector of one of the two service sentences, $B_i$ denotes the $i$-th component of the word vector of the other service sentence, and $n$ denotes the dimension of the word vectors;
and, according to the cosine similarity results, classifying two service sentences into the same service sentence entity when the cosine similarity between them is larger than a preset value.
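The grouping rule of claim 6 can be illustrated with a minimal sketch. It assumes the sentence vectors are already available (the claim obtains them from BERT; toy 3-dimensional vectors are used here), and it uses a greedy strategy that compares each sentence with the first member of each existing group. The greedy strategy, the threshold value 0.9 and all names are assumptions; the claim states only that two sentences whose cosine similarity exceeds a preset value belong to the same entity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def group_sentences(vectors, threshold=0.9):
    """Greedy grouping: a sentence joins the first group whose representative is similar enough."""
    groups = []  # each group is a list of sentences; the first one is the representative
    for sentence, vec in vectors.items():
        for group in groups:
            if cosine(vec, vectors[group[0]]) > threshold:
                group.append(sentence)
                break
        else:
            groups.append([sentence])  # no similar group found, start a new entity
    return groups

# Hypothetical sentence vectors for illustration only.
vectors = {
    "bring an umbrella": [0.9, 0.1, 0.0],
    "carry an umbrella": [0.88, 0.12, 0.02],
    "drink more water": [0.0, 0.1, 0.95],
}
entities = group_sentences(vectors)
```

Each inner list of `entities` corresponds to one service sentence entity.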
7. The method for constructing a weather scene service map based on phrase identification as claimed in claim 1, wherein the related triples established based on the OWL standard are:
t={subject,predicate,object}
where t denotes an established related triple, subject denotes the weather service related entity types and the weather service scene, object denotes the atomic suggestion group, and predicate denotes the weather service user.
8. The method for constructing a weather scene service map based on phrase identification as claimed in claim 1, wherein said preprocessing includes error checking, sensitive vocabulary filtering and illegal character rejection.
9. A weather scene service map construction system based on phrase identification, comprising:
the construction module is used for acquiring and storing the known weather service related questions and articles according to the determined weather service related entity type and based on the message queue and the multithreading technology, and constructing to obtain an original data set;
the recognition module is used for preprocessing data in the original data set, recognizing keywords for questions related to weather service and article titles of weather service related articles based on a keyword recognition technology, and determining weather service scenes corresponding to the questions;
the extraction module is used for extracting, based on the correspondence between questions and answers and on the phrase recognition model, service sentences from the answers to the weather service related questions and from the weather service related articles under the same weather service scene;
the classifying module is used for classifying similar service sentences in the same weather service scene into the same service sentence entity based on the determined weather service scene and the extracted service sentences;
the construction module is used for establishing a relevant triplet based on the OWL standard according to the determined weather service related entity type, the determined weather service scene and the classified service statement entity to obtain an atomic suggestion group and complete weather scene service map construction;
wherein extracting, based on the correspondence between questions and answers and on the phrase recognition model, service sentences from the answers to the weather service related questions and from the weather service related articles under the same weather service scene specifically comprises:
implementing an LDA topic model based on the ckpe tool library, and extracting keyword phrases from the answers to the weather service related questions and from the weather service related articles;
performing dependency syntax analysis on the answers to the weather service related questions and on the weather service related articles by using the hanlp tool library;
judging whether an extracted keyword phrase lies in an adjacent phrase having a verb-object relation or an attributive-head relation, and judging whether the extracted keyword phrase lies in a clause containing a determined weather service related entity type; if so, determining that the clause is an extracted service sentence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110830708.XA CN113626215B (en) | 2021-07-22 | 2021-07-22 | Meteorological scene service map construction method and system based on phrase identification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113626215A (en) | 2021-11-09 |
CN113626215B (en) | 2023-08-18 |
Family
ID=78380558
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110830708.XA Active CN113626215B (en) | 2021-07-22 | 2021-07-22 | Meteorological scene service map construction method and system based on phrase identification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113626215B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20050032937A (en) * | 2003-10-02 | 2005-04-08 | 한국전자통신연구원 | Method for automatically creating a question and indexing the question-answer by language-analysis and the question-answering method and system |
CN110955764A (en) * | 2019-11-19 | 2020-04-03 | 百度在线网络技术(北京)有限公司 | Scene knowledge graph generation method, man-machine conversation method and related equipment |
CN111353030A (en) * | 2020-02-26 | 2020-06-30 | 陕西师范大学 | Knowledge question and answer retrieval method and device based on travel field knowledge graph |
CN111444305A (en) * | 2020-03-19 | 2020-07-24 | 浙江大学 | Multi-triple combined extraction method based on knowledge graph embedding |
CN111506722A (en) * | 2020-06-16 | 2020-08-07 | 平安科技(深圳)有限公司 | Knowledge graph question-answering method, device and equipment based on deep learning technology |
CN112883175A (en) * | 2021-02-10 | 2021-06-01 | 武汉大学 | Meteorological service interaction method and system combining pre-training model and template generation |
Non-Patent Citations (1)
Title |
---|
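K-VQA: A visual question answering method assisted by a knowledge graph; Gao Hongbin; Mao Jinying; Wang Huiyong; Journal of Hebei University of Science and Technology (04); 315-326 * |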
Also Published As
Publication number | Publication date |
---|---|
CN113626215A (en) | 2021-11-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107908671B (en) | Knowledge graph construction method and system based on legal data | |
CN110298037B (en) | Convolutional neural network matching text recognition method based on enhanced attention mechanism | |
CN108595708A (en) | A kind of exception information file classification method of knowledge based collection of illustrative plates | |
CN111931506B (en) | Entity relationship extraction method based on graph information enhancement | |
CN110309331A (en) | A kind of cross-module state depth Hash search method based on self-supervisory | |
CN107818164A (en) | A kind of intelligent answer method and its system | |
CN111026842A (en) | Natural language processing method, natural language processing device and intelligent question-answering system | |
CN110750656A (en) | Multimedia detection method based on knowledge graph | |
CN111400469A (en) | Intelligent generation system and method for voice question answering | |
CN113505586A (en) | Seat-assisted question-answering method and system integrating semantic classification and knowledge graph | |
CN110188346A (en) | A kind of network security bill part intelligence analysis method based on information extraction | |
CN110287298A (en) | A kind of automatic question answering answer selection method based on question sentence theme | |
Wu et al. | Scene attention mechanism for remote sensing image caption generation | |
CN112541347A (en) | Machine reading understanding method based on pre-training model | |
CN117076693A (en) | Method for constructing digital human teacher multi-mode large language model pre-training discipline corpus | |
CN113051922A (en) | Triple extraction method and system based on deep learning | |
CN112434164A (en) | Network public opinion analysis method and system considering topic discovery and emotion analysis | |
CN112988970A (en) | Text matching algorithm serving intelligent question-answering system | |
CN117196042B (en) | Semantic reasoning method and terminal for learning target in education universe | |
CN117236338B (en) | Named entity recognition model of dense entity text and training method thereof | |
CN113626215B (en) | Meteorological scene service map construction method and system based on phrase identification | |
CN117033661A (en) | Construction method and device of multi-domain knowledge graph, electronic equipment and storage medium | |
CN108763487B (en) | Mean Shift-based word representation method fusing part-of-speech and sentence information | |
CN114898775B (en) | Voice emotion recognition method and system based on cross-layer cross fusion | |
CN116257618A (en) | Multi-source intelligent travel recommendation method based on fine granularity emotion analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||