CN113626215B - Meteorological scene service map construction method and system based on phrase identification - Google Patents


Info

Publication number
CN113626215B
Authority
CN
China
Prior art keywords
service
weather
scene
weather service
entity
Legal status
Active
Application number
CN202110830708.XA
Other languages
Chinese (zh)
Other versions
CN113626215A (en)
Inventor
彭敏
张鼎
潘佳鑫
谢烁圻
罗娟
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Application filed by Wuhan University WHU
Priority to CN202110830708.XA
Publication of CN113626215A
Application granted
Publication of CN113626215B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a meteorological scene service map construction method and system based on phrase identification, in the technical field of artificial intelligence. The method determines the weather service scene corresponding to each question; using the correspondence between questions and answers and a phrase recognition model, it extracts service sentences from the answers to weather-service questions and from weather-service articles under the same weather service scene; based on the determined weather service scenes and the extracted service sentences, it groups similar service sentences under the same scene into a single service sentence entity; and, from the determined weather-service-related entity types, the weather service scenes and the grouped service sentence entities, it builds triples according to the OWL standard to obtain atomic suggestion groups, completing the construction of the weather scene service map. The application reduces repetitive manual work in weather services.

Description

Meteorological scene service map construction method and system based on phrase identification
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to a method and system for constructing a weather scene service map (a knowledge graph of weather service scenes) based on phrase identification.
Background
A knowledge graph stores expert knowledge of one or more domains in a structured representation. The knowledge it holds can support applications such as interactive question answering and intelligent recommendation, and knowledge graph technology is now widely used in everyday applications such as search engine ranking, shopping recommendation and general question answering.
Existing knowledge graphs in the meteorological field mostly cover encyclopedic weather knowledge, such as explanations of weather concepts and their properties (for example, the definition of rain). Weather service content is usually left out because it is hard to collect and organize and hard to represent in structured form. In practical applications such as weather question answering and recommendation, however, users are mostly interested in weather service content, such as travel advice for rainy days. As a result, existing meteorological knowledge graphs cannot provide enough knowledge to support downstream applications. Moreover, most current weather services rely on rule sentences written manually by professionals, for example a reminder to take an umbrella on rainy days. Writing these by hand is time-consuming and labor-intensive, and the resulting service sentences cover only a narrow range of service scenes and are monotonous.
Disclosure of Invention
To address these shortcomings of the prior art, the application provides a weather scene service map construction method and system based on phrase identification that reduces repetitive manual work in weather services.
In order to achieve the above purpose, the application provides a weather scene service map construction method based on phrase identification, which specifically comprises the following steps:
acquiring and storing known weather-service-related questions (with their answers) and articles, guided by the determined weather-service-related entity types and using a message queue and multithreading, to construct an original data set;
preprocessing the data in the original data set, and performing keyword recognition on the questions of the weather-service question-answer pairs and on the titles of the weather-service articles to determine the weather service scene corresponding to each question;
using the correspondence between questions and answers and a phrase recognition model, extracting service sentences from the answers to weather-service questions and from weather-service articles under the same weather service scene;
classifying similar service sentences under the same weather service scene into the same service sentence entity based on the determined weather service scene and the extracted service sentences;
and establishing triples according to the OWL standard from the determined weather-service-related entity types, the determined weather service scenes and the grouped service sentence entities to obtain atomic suggestion groups, thereby completing the construction of the weather scene service map.
On the basis of the technical scheme, the weather service related entity types comprise weather service users, weather types and event types.
On the basis of the technical scheme, for a new weather service scene the method further comprises the following steps:
obtaining the weather-service-related entity types corresponding to the new weather service scene by using a word segmentation tool combined with a semantic similarity algorithm;
determining the atomic suggestion group corresponding to the new weather service scene according to its weather-service-related entity types;
and merging the weather type of the new weather service scene into the weather types of the existing weather scene service map based on the Jaro distance algorithm, completing the fusion of the new weather service scene with the existing weather scene service map.
On the basis of the technical scheme, obtaining the weather-service-related entity types of the new weather service scene with the word segmentation tool and the semantic similarity algorithm specifically comprises:
segmenting the description of the new weather service scene with the word segmentation tool to obtain candidate entities for its weather service user, weather type and event type;
vectorizing the candidate entities with the Word2vec algorithm to obtain vector representations of the abstract words;
determining the 64-dimensional word vector of each candidate entity with the Word2Vec word vector model;
computing the cosine similarity between each candidate entity and the canonical entities of the determined entity types, and taking the canonical entity with the highest cosine similarity as the standard representation of the candidate entity, which yields the weather-service-related entity types of the new weather service scene;
the cosine similarity between a candidate entity and a canonical entity of the determined weather-service-related entity types is computed as:
$$\cos(X, Y) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^{2}}\,\sqrt{\sum_{i=1}^{n} Y_i^{2}}}$$
where cos(X, Y) is the cosine similarity between the candidate entity and the canonical entity, X = (X_1, …, X_n) is the word vector of the candidate entity, Y = (Y_1, …, Y_n) is the word vector of the canonical entity, and n is the dimension of the word vectors.
On the basis of the technical scheme, merging the weather type of the new weather service scene into the weather types of the existing weather scene service map with the Jaro distance algorithm specifically comprises:
calculating the Jaro distance between the weather type of the new weather service scene and each weather type of the existing weather scene service map, using the formula:
$$\mathrm{Jaro}(X, Y) = \frac{1}{3}\left(\frac{m}{|X|} + \frac{m}{|Y|} + \frac{m - t}{m}\right)$$
where Jaro(X, Y) is the Jaro distance between the weather type of the new weather service scene and a weather type of the existing weather scene service map, m is the number of matching characters, |X| is the string length of the weather type of the new scene, |Y| is the string length of the weather type in the existing map, and t is half the number of transpositions among the matching characters (as in the standard Jaro definition); Jaro(X, Y) = 0 when m = 0;
and, based on the results, merging the weather type of the new weather service scene with the existing weather type that yields the largest Jaro distance (larger values indicate more similar strings).
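The merging rule above can be sketched in Python as follows. This is a minimal illustration assuming the standard Jaro definition (match window of max(|X|, |Y|) // 2 - 1 characters, t equal to half the number of out-of-order matches); the helper names `jaro` and `merge_weather_type` are hypothetical, and no minimum-similarity threshold is applied because the text does not specify one.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity: 1.0 for identical strings, 0.0 when nothing matches."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    m = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Walk both match sequences in order and count positions that disagree.
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    t = out_of_order / 2  # t is half the number of out-of-order matches
    return (m / len1 + m / len2 + (m - t) / m) / 3


def merge_weather_type(new_type: str, existing_types) -> str:
    """Return the existing weather type with the highest Jaro score (hypothetical helper)."""
    return max(existing_types, key=lambda existing: jaro(new_type, existing))


# "冰雹天气" (hail weather) merges into the existing type "冰雹" (hail).
print(merge_weather_type("冰雹天气", ["雨", "雪", "冰雹"]))  # 冰雹
```

Note that Jaro only rewards characters that match within a small positional window, so it works best when the new weather type shares a prefix-like core with an existing one, as in the example.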
On the basis of the technical scheme, extracting, under the same weather service scene and based on the question-answer correspondence and the phrase recognition model, the service sentences in the answers to weather-service questions and in weather-service articles specifically comprises:
implementing an LDA topic model with the ckpe tool library and extracting keyword phrases from the answers to weather-service questions and from weather-service articles;
performing dependency parsing on these answers and articles with the HanLP tool library;
checking whether each extracted keyword phrase takes part in an adjacent verb-object or attribute-head (modifier-noun) relation, and whether it lies in a clause containing one of the determined weather-service-related entity types; if both hold, the clause is taken as an extracted service sentence.
On the basis of the technical scheme, grouping similar service sentences under the same weather service scene into the same service sentence entity specifically comprises:
obtaining a vector representation of each service sentence with the pre-trained model BERT;
based on the vector representation of each service sentence, computing the cosine similarity between every pair of service sentences:
$$\cos(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$
where cos(A, B) is the cosine similarity between the two service sentences, A = (A_1, …, A_n) and B = (B_1, …, B_n) are their vector representations, and n is the vector dimension;
and, according to the computed similarities, classifying two service sentences as the same service sentence entity when their cosine similarity exceeds a preset value.
On the basis of the technical scheme, the triples established according to the OWL standard are:
t={subject,predicate,object}
where t is an established triple, the subject is a weather-service-related entity type or weather service scene, the object is an atomic suggestion group, and the predicate is the weather service user.
On the basis of the technical scheme, the preprocessing comprises error checking, sensitive-word filtering and illegal-character removal.
The application further provides a weather scene service map construction system based on phrase identification, which comprises the following modules:
a data set construction module, configured to acquire and store known weather-service-related questions and articles according to the determined weather-service-related entity types, using a message queue and multithreading, to build the original data set;
a recognition module, configured to preprocess the data in the original data set, perform keyword recognition on the questions of weather-service question-answer pairs and the titles of weather-service articles, and determine the weather service scene corresponding to each question;
an extraction module, configured to extract, under the same weather service scene and based on the question-answer correspondence and the phrase recognition model, service sentences from the answers to weather-service questions and from weather-service articles;
a classifying module, configured to group similar service sentences under the same weather service scene into the same service sentence entity, based on the determined weather service scenes and the extracted service sentences;
a map construction module, configured to establish triples according to the OWL standard from the determined weather-service-related entity types, weather service scenes and grouped service sentence entities, obtaining atomic suggestion groups and completing the construction of the weather scene service map.
Compared with the prior art, the application has the following advantages: it addresses the difficulty of extracting service sentences and of storing weather scene service knowledge in structured form when building meteorological knowledge graphs for weather service scenes, and provides a structured knowledge base for weather-service applications such as question answering and recommendation. In addition, by decomposing a weather service scene into its component factors (service population, weather type and event type) and building atomic suggestion groups for each factor, it enables knowledge for new service scenes to be updated automatically and reduces repetitive manual work in weather services.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for constructing a weather scene service map based on phrase identification in an embodiment of the application;
FIG. 2 is a flowchart of determining a weather service scenario according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a weather scene service map established in an embodiment of the present application;
FIG. 4 is a schematic diagram of a weather scene service map established for a primary school student weather service in an embodiment of the application.
Detailed Description
The embodiment of the application provides a weather scene service map construction method based on phrase identification that builds a knowledge map for weather service scenes. It addresses the difficulty of extracting service sentences and of storing weather scene service knowledge in structured form in current meteorological knowledge graph construction, and provides a structured knowledge base for weather-service applications such as question answering and recommendation. By decomposing a weather service scene into its component factors (service population, weather type and event type) and building atomic suggestion groups for each factor, it also enables knowledge for new service scenes to be updated automatically and reduces repetitive manual work in weather services. The embodiment of the application correspondingly provides a weather scene service map construction system based on phrase identification.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1, the method for constructing a weather scene service map based on phrase identification provided by the embodiment of the application specifically includes the following steps:
S1: acquiring and storing known weather-service-related questions and articles according to the determined weather-service-related entity types, using a message queue and multithreading, and constructing an original data set.
In the embodiment of the application, the weather-service-related entity types comprise weather service users, weather types and event types. Weather service users include students, schools and parents; weather types include sand and dust, hail, ice, low temperature, high temperature, thunderstorms, typhoons, haze, rain and snow; event types include going to school, outings, picking up students, work, classes at school, holidays at home, weekend outings, daily life at home, activities between classes, and daily living.
In the embodiment of the application, the known weather-service-related questions and articles are collected from the Internet; specifically, a web crawler can retrieve questions and articles that combine the weather-service-related entity types.
S2: preprocessing the data in the original data set, and performing keyword recognition on the questions of the weather-service question-answer pairs and on the titles of the weather-service articles to determine the weather service scene corresponding to each question.
In the embodiment of the application, the preprocessing comprises error checking, sensitive-word filtering and illegal-character removal; specifically, these operations can be implemented with regular expressions in Python.
In the embodiment of the application, the weather service user, weather type and event type (the weather-service-related entity types determined in step S1) are extracted from each question and article title by character matching in Python; if any one of the weather service user, weather type or event type is missing, the sample is discarded.
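The collection step of S1 can be pictured as a producer/consumer pipeline. The sketch below is an assumption-laden illustration rather than the embodiment's crawler: query strings are formed by combining the entity types, Python's standard `queue.Queue` serves as the message queue, a pool of `threading.Thread` workers consumes it, and `fake_fetch` is a hypothetical stand-in for the real HTTP request and page parsing.

```python
import queue
import threading

# A small subset of the S1 entity types (English glosses, illustrative only).
USERS = ["student", "parent"]
WEATHER_TYPES = ["rain", "hail"]
EVENT_TYPES = ["outing", "going to school"]


def fake_fetch(query):
    """Hypothetical stand-in for an HTTP request plus page parsing."""
    return [{"query": query, "text": f"Q&A and articles about: {query}"}]


def crawl(queries, fetch=fake_fetch, n_workers=4):
    """Put queries on a message queue and let worker threads fetch them."""
    tasks = queue.Queue()
    for q in queries:
        tasks.put(q)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                q = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            records = fetch(q)
            with lock:  # several threads append to the shared result list
                results.extend(records)
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# One query per user/weather/event combination: 2 * 2 * 2 = 8 queries.
queries = [f"{u} {w} {e}" for u in USERS for w in WEATHER_TYPES for e in EVENT_TYPES]
dataset = crawl(queries)
print(len(dataset))  # 8
```

All tasks are enqueued before the workers start, so a worker can safely exit as soon as it finds the queue empty; a long-running crawler that keeps adding tasks would instead block on `get()` and use a sentinel value to stop.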
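A minimal sketch of the regex-based preprocessing, assuming that "illegal characters" means control characters and the Unicode replacement character left by faulty decoding, that sensitive words are masked with `*` rather than dropped, and that the error check simply rejects non-string or empty samples. The word list and the function name `preprocess` are hypothetical.

```python
import re

# Hypothetical sensitive-word list; a real deployment would load a curated lexicon.
SENSITIVE_WORDS = ["spamword", "scamlink"]
SENSITIVE_RE = re.compile("|".join(map(re.escape, SENSITIVE_WORDS)))
# "Illegal" characters (assumed): C0 control characters other than tab/newline/CR,
# DEL, and U+FFFD, the replacement character left behind by faulty decoding.
ILLEGAL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f\ufffd]")
WHITESPACE_RE = re.compile(r"\s+")


def preprocess(text):
    """Return the cleaned text, or None if the sample fails the error check."""
    if not isinstance(text, str):
        return None  # error check: not a text sample
    text = ILLEGAL_RE.sub("", text)
    text = SENSITIVE_RE.sub("*", text)  # mask each sensitive word
    text = WHITESPACE_RE.sub(" ", text).strip()
    return text or None  # error check: nothing left after cleaning


print(preprocess("Take\x00 an  umbrella\ufffd on rainy days"))  # Take an umbrella on rainy days
```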
S3: using the question-answer correspondence and the phrase recognition model, extracting service sentences from the answers to weather-service questions and from weather-service articles under the same weather service scene;
S4: grouping similar service sentences under the same weather service scene into the same service sentence entity, based on the determined weather service scenes and the extracted service sentences;
S5: establishing triples according to the OWL (Web Ontology Language) standard from the determined weather-service-related entity types, weather service scenes and grouped service sentence entities to obtain atomic suggestion groups, completing the construction of the weather scene service map.
In a possible implementation, step S2 performs keyword recognition on the questions of the weather-service question-answer pairs and the titles of the weather-service articles to determine the weather service scene of each question. As shown in FIG. 2, the scene is determined sample by sample over the original data set, and the specific flow comprises:
S201: judge whether all samples in the original data set have been processed; if so, go to S3 to extract service sentences; otherwise, go to S202;
S202: perform keyword recognition on the sample to extract entities, then go to S203; the entities are of the three types determined in step S1: weather service user, weather type and event type;
S203: judge whether all three entity types are present; if so, go to S204; otherwise, go to S205;
S204: determine the weather service scene, then go to S201;
S205: discard the current sample and go to S201.
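The S201 to S205 loop can be sketched with plain substring matching. The lexicons below are a small English-glossed subset of the entity types listed in S1 (plus "primary school student" from FIG. 4), and picking the longest matching word is an assumption that prefers, for example, "primary school student" over "student"; all names are hypothetical.

```python
# Small English-glossed subset of the S1 lexicons (illustrative only).
LEXICON = {
    "user": ["primary school student", "student", "parent"],
    "weather": ["hail", "rain", "snow", "typhoon", "haze"],
    "event": ["outing", "picking up students", "going to school", "work"],
}


def match_entity(text, words):
    """Character matching: return the longest lexicon word in text, or None."""
    hits = [w for w in words if w in text]
    return max(hits, key=len) if hits else None


def determine_scenes(samples):
    scenes, discarded = [], []
    for text in samples:  # S201: process samples until none are left
        found = {etype: match_entity(text, words) for etype, words in LEXICON.items()}  # S202
        if all(found.values()):  # S203: user, weather type and event type all present?
            scenes.append((text, found))  # S204: record the weather service scene
        else:
            discarded.append(text)  # S205: discard the sample
    return scenes, discarded


scenes, discarded = determine_scenes([
    "What should a student wear for an outing in snow?",
    "Is hail dangerous?",
])
print(scenes[0][1], discarded)
```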
In the embodiment of the application, extracting service sentences from the answers to weather-service questions and from weather-service articles under the same weather service scene, based on the question-answer correspondence and the phrase recognition model, specifically comprises:
S301: implementing an LDA topic model with the ckpe tool library and extracting keyword phrases from the answers to weather-service questions and from weather-service articles;
S302: performing dependency parsing on these answers and articles with the HanLP tool library;
S303: checking whether each extracted keyword phrase takes part in an adjacent verb-object or attribute-head (modifier-noun) relation, and whether it lies in a clause containing one of the determined weather-service-related entity types; if both hold, the clause is taken as an extracted service sentence.
In the embodiment of the application, grouping similar service sentences under the same weather service scene into the same service sentence entity specifically comprises:
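The S303 filter can be sketched on top of an already-parsed clause. This assumes the parser output has been converted to CoNLL-style tokens (1-based head indices, 0 for the root) and that verb-object and attribute-head arcs carry the LTP-style labels `VOB` and `ATT`; the label set actually produced depends on which HanLP model is used. Keyword phrases are simplified to single tokens, and "adjacent" is read as the keyword taking part in such an arc, either as dependent or as head. The `Token` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int      # 1-based position in the clause
    word: str
    head: int     # index of the head token, 0 for the root
    deprel: str   # dependency label (LTP-style labels assumed)


TARGET_RELATIONS = {"VOB", "ATT"}  # verb-object, attribute-head


def in_target_relation(tokens, keyword):
    """True if keyword is the dependent or the head of a VOB/ATT arc."""
    for tok in tokens:
        if tok.deprel in TARGET_RELATIONS and tok.head > 0:
            head_word = tokens[tok.head - 1].word
            if keyword in (tok.word, head_word):
                return True
    return False


def is_service_sentence(tokens, keywords, entities):
    """Keep a clause if it mentions an entity and a keyword sits in a VOB/ATT arc."""
    clause = "".join(tok.word for tok in tokens)  # Chinese text has no spaces
    has_entity = any(entity in clause for entity in entities)
    has_keyword = any(in_target_relation(tokens, kw) for kw in keywords)
    return has_entity and has_keyword


# "雨天带雨伞" (take an umbrella on rainy days): 带 (take) governs 雨伞 (umbrella) via VOB.
clause = [
    Token(1, "雨天", 2, "ADV"),
    Token(2, "带", 0, "HED"),
    Token(3, "雨伞", 2, "VOB"),
]
print(is_service_sentence(clause, keywords=["雨伞"], entities=["雨"]))  # True
```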
S401: obtain the vector representation of each service sentence with the pre-trained model BERT. BERT is a large deep network built on the Transformer architecture, whose core is the self-attention mechanism. Self-attention computes the representation of each word by weighting the other words in the same sentence according to how strongly they are associated with it:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where Q, K and V are the query, key and value matrices built from the word vectors, d_k is the embedding dimension, Attention(Q, K, V) is scaled dot-product attention, T denotes the matrix transpose, and softmax is the Softmax activation function.
The multi-head attention mechanism projects Q, K and V with several different linear transformations and concatenates the resulting attention outputs:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^{O}$$
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$
where MultiHead(Q, K, V) denotes multi-head attention, Concat denotes concatenation of the heads, head_i is the i-th dot-product attention head, h is the number of heads, W_i^Q, W_i^K and W_i^V are the projection matrices of head i, and W^O is the output projection matrix.
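To make the two formulas concrete, here is a dependency-free Python sketch of scaled dot-product attention and multi-head attention on small list-of-lists matrices. It is for illustration only (real BERT implementations use batched tensors, masking and learned weights); the helper names and the toy weights in the usage line are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V"""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head(q, k, v, w_q, w_k, w_v, w_o):
    """Concat(head_1, ..., head_h) W^O, with head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)."""
    heads = [
        attention(matmul(q, wq), matmul(k, wk), matmul(v, wv))
        for wq, wk, wv in zip(w_q, w_k, w_v)
    ]
    # Concatenate the heads row by row (along the feature dimension).
    concat = [sum((head[r] for head in heads), []) for r in range(len(q))]
    return matmul(concat, w_o)


# Equal keys give uniform weights, so the output is the mean of the value rows.
print(attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[2.0, 0.0], [0.0, 4.0]]))  # [[1.0, 2.0]]
```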
The fully connected feed-forward network in the Transformer has two dense layers: the first uses a ReLU activation and the second a linear activation. Denoting the output of the multi-head attention mechanism by Z, the feed-forward network (FFN) can be expressed as:
$$\mathrm{FFN}(Z) = \max(0, ZW_1 + b_1)W_2 + b_2$$
where max is the element-wise maximum (i.e., the ReLU), W_1 and b_1 are the weight matrix and bias vector of the first layer, and W_2 and b_2 are the weight matrix and bias vector of the second layer.
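The same toy list-of-lists representation extends to the feed-forward network. In this sketch the function name `ffn` is hypothetical and the weights in the example are made up purely to show the ReLU zeroing a negative hidden unit.

```python
def ffn(z, w1, b1, w2, b2):
    """Feed-forward network max(0, Z W1 + b1) W2 + b2, applied to each row of Z."""

    def dense(x, w, b):
        # x: input vector, w: weight matrix with len(x) rows, b: one bias per output unit
        return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]

    out = []
    for row in z:
        hidden = [max(0.0, h) for h in dense(row, w1, b1)]  # first layer with ReLU
        out.append(dense(hidden, w2, b2))                    # second layer, linear
    return out


# The second hidden unit is negative before ReLU and is zeroed out.
print(ffn([[1.0, -2.0]], [[1, 0], [0, 1]], [0, 0], [[2], [3]], [1]))  # [[3.0]]
```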
S402: based on the vector representation of each service sentence, compute the cosine similarity between every pair of service sentences:
$$\cos(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$
where cos(A, B) is the cosine similarity between the two service sentences, A = (A_1, …, A_n) and B = (B_1, …, B_n) are their vector representations, and n is the vector dimension;
S403: according to the computed similarities, classify two service sentences as the same service sentence entity when their cosine similarity exceeds a preset value; specifically, the preset value is 0.8.
The application uses cosine similarity between vector representations to measure how similar service sentences are. Word vectors represent words as real-valued vectors in a low-dimensional space: sparse, high-dimensional word representations are embedded into a dense low-dimensional space, which reduces storage requirements and captures the semantic features of words in text. In this space, vectors that lie close together are semantically closer than vectors that lie far apart, so service sentences that express the same meaning can be grouped by measuring the similarity between their vectors.
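A minimal sketch of S402 and S403 with the 0.8 threshold, assuming sentence vectors have already been produced by BERT (for example by pooling token vectors, which the text does not specify). The text only states the pairwise rule; resolving chains of similar pairs into one entity with a union-find structure is an added assumption, and `group_sentences` is a hypothetical name.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_sentences(vectors, threshold=0.8):
    """Merge sentences i and j into one entity when cosine(i, j) > threshold.

    Transitive merging via union-find is an assumption, not stated in the source.
    """
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Sentences 0 and 1 point in almost the same direction; sentence 2 is orthogonal.
print(group_sentences([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]))  # [[0, 1], [2]]
```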
In the embodiment of the application, the triples established according to the OWL standard are:
t={subject,predicate,object}
where t is an established triple, the subject is a weather-service-related entity type or weather service scene, the object is an atomic suggestion group, and the predicate is the weather service user.
An atomic suggestion group might contain, for example: if caught in hail in the open, protect your head with whatever is available nearby; if there is no shelter nearby, squat down and cover your head with both hands, protecting the head, chest and abdomen from hailstones; if you are carrying a bag or similar item, hold it over your head temporarily; outdoors, protect your head with rain gear or another substitute; do not go outside to pick up hailstones, to avoid being injured by hail; apply sunscreen before going out and avoid exposing bare skin to the sun for long periods; and so on.
In the embodiment of the application, once the atomic suggestion groups are obtained, the construction of the weather scene service map is complete. The resulting map, shown in FIG. 3, contains scenes, weather, events, suggestions and users, and gives corresponding suggestions for different scenes, weather, events and users.
In one possible implementation, because most existing weather knowledge graphs are static and cannot be updated automatically when new knowledge appears, the application further comprises the following steps for a new weather service scene:
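The triple pattern t = {subject, predicate, object} can be serialized as RDF N-Triples, one of the RDF syntaxes in which OWL ontologies can be exchanged. This sketch follows the role assignment stated above (subject: entity type or scene, predicate: weather service user, object: atomic suggestion group); the namespace `http://example.org/weather#`, the group-naming scheme and the `hasSuggestion` property that attaches suggestion texts to a group are hypothetical, and no OWL class or property declarations are emitted.

```python
import json
from urllib.parse import quote

NS = "http://example.org/weather#"  # hypothetical namespace


def iri(name):
    """Wrap a local name as an absolute IRI, percent-encoding spaces and non-ASCII."""
    return f"<{NS}{quote(name)}>"


def build_triples(scene, weather, event, user, suggestions):
    """Return (subject, predicate, object) triples for one scene and one user."""
    group = iri(f"group_{weather}_{event}_{user}")
    triples = [(iri(subject), iri(user), group) for subject in (weather, event, scene)]
    # Hypothetical property linking the group to its suggestion texts; JSON string
    # escaping (with ensure_ascii=False) yields a valid N-Triples literal.
    triples += [(group, iri("hasSuggestion"), json.dumps(text, ensure_ascii=False))
                for text in suggestions]
    return triples


def to_ntriples(triples):
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = build_triples("hail outing", "hail", "outing", "student",
                        ["Protect your head with a bag or rain gear."])
print(to_ntriples(triples))
```

A production system would more likely use an RDF library and declare the classes and properties in an OWL ontology; the plain-string version only shows the shape of the data.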
s01: based on word segmentation tools and combining with semantic similarity algorithms, obtaining weather service related entity types corresponding to new weather service scenes;
s02: according to the weather service related entity type of the new weather service scene, determining an atomic suggestion group corresponding to the new weather service scene;
s03: and combining the weather type of the new weather service scene with the weather type of the existing weather scene service map based on the Jaro edit distance algorithm to finish the fusion of the new weather service scene and the existing weather scene service map.
In this embodiment, obtaining the weather-service-related entity types corresponding to the new weather service scene with a word segmentation tool and a semantic similarity algorithm comprises the following specific steps:
S011: segment the description of the new weather service scene with the word segmentation tool to obtain candidate entities for the weather service user, weather type and event type of the new scene;
S012: vectorize the candidate entities with the Word2vec algorithm to obtain vector representations of the abstract words;
S013: determine the 64-dimensional word vector of each candidate entity from a Word2Vec word vector model. In this embodiment, the model was trained on more than 8 million entries from Baidu Baike (Baidu Encyclopedia) and is larger than 26 GB; the 64-dimensional vector of each candidate entity is retrieved with a lookup table.
S014: compute the cosine similarity between each candidate entity and the canonical entities of the determined weather-service-related entity types, and take the canonical entity with the highest cosine similarity as the standard representation of the candidate entity; this yields the weather-service-related entity types of the new weather service scene.
The cosine similarity between a candidate entity and a canonical entity of a determined weather-service-related entity type is calculated as follows:

$$\cos(X, Y) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^{2}}\,\sqrt{\sum_{i=1}^{n} Y_i^{2}}}$$

where X = (X_1, ..., X_n) is the word vector of the candidate entity, Y = (Y_1, ..., Y_n) is the word vector of the canonical entity, X_i and Y_i are their i-th components, and n is the dimension of the word vectors.
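Steps S013 and S014 (lookup table plus maximum cosine similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: a plain Python dictionary stands in for the trained Word2Vec model, and the 4-dimensional vectors, the candidate "pupil" and the canonical entities are hypothetical toy data (the embodiment uses 64-dimensional vectors).

```python
import math

def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)

def normalize_entity(candidate, vectors, canonical):
    """Return (canonical entity, entity type) with the highest cosine similarity.

    vectors:   lookup table word -> vector (stands in for the trained Word2Vec model)
    canonical: canonical entity -> weather-service-related entity type
    """
    x = vectors[candidate]
    best = max(canonical, key=lambda c: cosine(x, vectors[c]))
    return best, canonical[best]

# Hypothetical 4-dimensional toy vectors; the embodiment uses 64-dimensional ones.
VECTORS = {
    "pupil":            [0.8, 0.2, 0.1, 0.3],
    "student":          [0.9, 0.1, 0.0, 0.2],
    "hail":             [0.0, 0.9, 0.3, 0.1],
    "outdoor activity": [0.1, 0.0, 0.9, 0.4],
}
CANONICAL = {
    "student": "weather service user",
    "hail": "weather type",
    "outdoor activity": "event type",
}

print(normalize_entity("pupil", VECTORS, CANONICAL))
# ('student', 'weather service user')
```

Because the canonical entity carries its entity type, normalizing a candidate also classifies it as a user, weather type or event type.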
The atomic suggestion groups applicable to the new weather service scene are the union of the atomic suggestion groups applicable to its weather service user group, its weather type instance and its event type.
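The union rule of step S02 can be sketched with Python sets. The three lookup tables and the suggestion texts below are hypothetical examples chosen for illustration; a factor that has no recorded suggestions simply contributes an empty set.

```python
# Hypothetical atomic suggestion groups indexed by each scene factor.
USER_SUGGESTIONS = {"student": {"follow school weather notices"}}
WEATHER_SUGGESTIONS = {"hail": {"protect your head", "do not pick up hailstones"}}
EVENT_SUGGESTIONS = {"commuting": {"postpone travel if possible"}}

def suggestions_for_scene(user, weather, event):
    """Union of the atomic suggestion groups of the scene's three factors.

    A factor with no known suggestions contributes an empty set; the stored
    sets are not modified because `|` builds a new set.
    """
    return (
        USER_SUGGESTIONS.get(user, set())
        | WEATHER_SUGGESTIONS.get(weather, set())
        | EVENT_SUGGESTIONS.get(event, set())
    )

scene = suggestions_for_scene("student", "hail", "commuting")
print(sorted(scene))
# ['do not pick up hailstones', 'follow school weather notices',
#  'postpone travel if possible', 'protect your head']
```

Since the groups are keyed by factor rather than by full scene, a new scene only needs its factors to be recognized; no new suggestions have to be written for it.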
In this embodiment, merging the weather type of the new weather service scene with the weather types of the existing weather scene service map based on the Jaro edit distance algorithm comprises the following specific steps:
S031: calculate the Jaro distance between the weather type of the new weather service scene and each weather type in the existing weather scene service map as follows:

$$\mathrm{Jaro}(X, Y) = \frac{1}{3}\left(\frac{m}{|X|} + \frac{m}{|Y|} + \frac{m - t}{m}\right)$$

where Jaro(X, Y) is the Jaro distance between the weather type X of the new weather service scene and a weather type Y of the existing weather scene service map, m is the number of matching characters, |X| and |Y| are the lengths of the two weather type strings, and t is half the number of transpositions (matching characters that appear in a different order); Jaro(X, Y) is taken as 0 when m = 0. The Jaro distance is a similarity score in [0, 1], where larger values indicate more similar strings;
S032: based on the results, merge the weather type of the new weather service scene with the existing weather type that has the largest Jaro distance. If this maximum Jaro distance is less than 0.5, the weather types are not merged.
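Steps S031 and S032 can be sketched as below, assuming the standard Jaro definition (characters match only when equal and no more than floor(max(|X|, |Y|) / 2) - 1 positions apart; t is half the number of out-of-order matches). The function names and the example weather types are hypothetical; the 0.5 threshold is the one stated in S032.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    used1 = [False] * len1
    used2 = [False] * len2
    m = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Count matched characters that appear in a different order; t is half of that.
    out_of_order, k = 0, 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (m / len1 + m / len2 + (m - t) / m) / 3

def merge_weather_type(new_type, existing_types, threshold=0.5):
    """Existing type to merge new_type into, or None if the best score is below threshold."""
    if not existing_types:
        return None
    best = max(existing_types, key=lambda e: jaro(new_type, e))
    return best if jaro(new_type, best) >= threshold else None

print(merge_weather_type("rainstorms", ["rainstorm", "hail", "fog"]))  # rainstorm
print(merge_weather_type("fog", ["rainstorm"]))                        # None
```

One design consequence worth noting: with the standard match window, strings of at most three characters can only match characters at identical positions, so `jaro("大暴雨", "暴雨")` is 0.0 even though the two share two characters. Short Chinese weather terms may therefore be left unmerged more often than their overlap suggests.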
In practical use of the method, the collected original data set was used to build the weather scene service map, and a new weather service user instance, "student", was used to carry out a map update and fuse the result with the existing weather knowledge graph. The completed map contains more than 20,000 weather entities, more than 30,000 weather relations, 400 weather scenes and 21 kinds of atomic suggestion groups, which indicates the feasibility and practicality of the application. Fig. 4 shows a weather scene service map built around weather services for primary and secondary school students; it provides these students with different countermeasures for different user scenarios and severe weather.
The phrase-recognition-based weather scene service map construction method builds a weather knowledge graph oriented to weather service scenes. It addresses two problems in current weather-domain knowledge graph construction: service sentences are difficult to extract, and weather scene service knowledge is difficult to store in structured form. It thereby provides a structured knowledge base for weather-service applications such as question answering and recommendation. In addition, the method decomposes a weather service scene into its component factors (service user group, weather type and event type) and establishes the atomic suggestion groups associated with each factor, which enables automatic updating of knowledge related to new service scenes and reduces repetitive work in weather services. The method also performs entity alignment with the existing knowledge graph, so that newly added weather service knowledge can be fused without losing knowledge already present in the existing weather graph.
An embodiment of the application provides a phrase-recognition-based weather scene service map construction system, comprising a data set construction module, a recognition module, an extraction module, a classification module and a map construction module.
The data set construction module acquires and stores known weather-service-related questions and articles according to the determined weather-service-related entity types, using a message queue and multithreading, to build the original data set. The recognition module preprocesses the data in the original data set, performs keyword recognition on the questions of the weather-service Q&A pairs and on the titles of the weather-service articles, and determines the weather service scene corresponding to each question. The extraction module uses the correspondence between questions and answers, together with the phrase recognition model, to extract the answers to weather-service questions and the service sentences in weather-service articles under the same weather service scene. The classification module groups similar service sentences under the same weather service scene into the same service sentence entity, based on the determined weather service scenes and the extracted service sentences. The map construction module establishes the related triples based on the OWL standard from the determined weather-service-related entity types, weather service scenes and classified service sentence entities to obtain the atomic suggestion groups, completing the construction of the weather scene service map.
The foregoing is only a specific embodiment of the application to enable those skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims (9)

1. A weather scene service map construction method based on phrase identification is characterized by comprising the following steps:
acquiring and storing known weather service related questions and articles according to the determined weather service related entity type and based on a message queue and a multithreading technology, and constructing an original data set;
preprocessing the data in the original data set, and performing keyword recognition, based on a keyword recognition technique, on the questions of the weather-service Q&A pairs and the titles of the weather-service articles, to determine the weather service scene corresponding to each question;
extracting, based on the correspondence between questions and answers and on the phrase recognition model, the answers to weather-service questions and the service sentences in weather-service articles under the same weather service scene;
classifying similar service sentences under the same weather service scene into the same service sentence entity based on the determined weather service scene and the extracted service sentences;
according to the determined weather-service-related entity types, the determined weather service scenes and the classified service sentence entities, establishing related triples based on the OWL standard to obtain the atomic suggestion groups, and completing the construction of the weather scene service map;
wherein extracting, based on the correspondence between questions and answers and on the phrase recognition model, the answers to weather-service questions and the service sentences in weather-service articles under the same weather service scene comprises the following specific steps:
implementing an LDA topic model with the ckpe tool library, and extracting keyword phrases from the answers to weather-service questions and from the weather-service articles;
performing dependency parsing on the answers to weather-service questions and on the weather-service articles with the hanlp tool library;
determining whether an extracted keyword phrase lies in a pair of adjacent words connected by a verb-object relation or an attribute-head (modifier-head) relation, and whether it lies in a clause containing one of the determined weather-service-related entity types; if both conditions hold, the clause is taken as an extracted service sentence.
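The service-sentence test in the last step of claim 1 can be sketched on an already parsed clause. This sketch makes several assumptions: the parse is given as hypothetical `Token` records in the CoNLL style (1-based head index, 0 for the root); the labels `VOB` (verb-object) and `ATT` (attribute-head) follow the LTP-style Chinese dependency scheme; and "the keyword phrase lies in the pair" is read as a substring test on the joined pair. The hanlp parsing call itself is not shown.

```python
from dataclasses import dataclass

@dataclass
class Token:
    word: str
    head: int     # 1-based index of the head token, 0 for the root (CoNLL convention)
    deprel: str   # dependency label, e.g. "VOB" (verb-object) or "ATT" (attribute-head)

LINKED = {"VOB", "ATT"}

def is_service_sentence(tokens, keyphrases, entity_terms, sep=""):
    """Apply the two-part test to one parsed clause.

    sep joins tokens back into text: "" for Chinese, " " for space-delimited text.
    """
    clause = sep.join(t.word for t in tokens)
    # Condition 2: the clause mentions a weather-service-related entity.
    if not any(term in clause for term in entity_terms):
        return False
    # Condition 1: a keyword phrase falls inside an adjacent VOB/ATT word pair.
    for idx, tok in enumerate(tokens, start=1):
        if tok.deprel in LINKED and tok.head > 0 and abs(tok.head - idx) == 1:
            left, right = sorted((idx, tok.head))
            pair = sep.join((tokens[left - 1].word, tokens[right - 1].word))
            if any(k in pair for k in keyphrases):
                return True
    return False

# Hypothetical parse of "cover head during hail".
clause = [
    Token("cover", 0, "HED"),
    Token("head", 1, "VOB"),
    Token("during", 1, "ADV"),
    Token("hail", 3, "POB"),
]
print(is_service_sentence(clause, {"cover head"}, {"hail"}, sep=" "))  # True
```

Checking the entity condition first is a cheap filter: most clauses that mention no user, weather type or event can be rejected before the dependency arcs are scanned.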
2. The weather scene service map construction method based on phrase identification as claimed in claim 1, wherein: the weather service related entity types include weather service users, weather types, and event types.
3. The method for constructing a weather scene service map based on phrase identification as claimed in claim 2, further comprising, for a new weather service scene:
obtaining the weather-service-related entity types corresponding to the new weather service scene using a word segmentation tool combined with a semantic similarity algorithm;
determining the atomic suggestion groups corresponding to the new weather service scene according to its weather-service-related entity types;
merging the weather type of the new weather service scene with the weather types of the existing weather scene service map based on the Jaro edit distance algorithm, completing the fusion of the new weather service scene with the existing weather scene service map.
4. The method for constructing a weather scene service map based on phrase identification as claimed in claim 3, wherein obtaining the weather-service-related entity types corresponding to the new weather service scene using a word segmentation tool combined with a semantic similarity algorithm comprises:
segmenting the description of the new weather service scene with the word segmentation tool to obtain candidate entities for the weather service user, weather type and event type of the new weather service scene;
vectorizing the candidate entities with the Word2vec algorithm to obtain vector representations of the abstract words;
determining the 64-dimensional word vector of each candidate entity from a Word2Vec word vector model;
computing the cosine similarity between each candidate entity and the canonical entities, and taking the canonical entity with the highest cosine similarity as the standard representation of the candidate entity, to obtain the weather-service-related entity types of the new weather service scene;
wherein the cosine similarity between a candidate entity and a canonical entity of a determined weather-service-related entity type is calculated as follows:

$$\cos(X, Y) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^{2}}\,\sqrt{\sum_{i=1}^{n} Y_i^{2}}}$$

where X = (X_1, ..., X_n) is the word vector of the candidate entity, Y = (Y_1, ..., Y_n) is the word vector of the canonical entity, X_i and Y_i are their i-th components, and n is the dimension of the word vectors.
5. The method for constructing a weather scene service map based on phrase recognition according to claim 4, wherein merging the weather type of the new weather service scene with the weather types of the existing weather scene service map based on the Jaro edit distance algorithm comprises the following specific steps:
calculating the Jaro distance between the weather type of the new weather service scene and each weather type in the existing weather scene service map as follows:

$$\mathrm{Jaro}(X, Y) = \frac{1}{3}\left(\frac{m}{|X|} + \frac{m}{|Y|} + \frac{m - t}{m}\right)$$

where Jaro(X, Y) is the Jaro distance between the weather type X of the new weather service scene and a weather type Y of the existing weather scene service map, m is the number of matching characters, |X| and |Y| are the lengths of the two weather type strings, and t is half the number of transpositions; Jaro(X, Y) is taken as 0 when m = 0;
merging, according to the results, the weather type of the new weather service scene with the existing weather type that has the largest Jaro distance.
6. The method for constructing a weather scene service map based on phrase identification as claimed in claim 1, wherein classifying similar service sentences under the same weather service scene as the same service sentence entity comprises the following specific steps:
obtaining a vector representation of each service sentence with the pre-trained BERT model;
calculating the cosine similarity between service sentences from their vector representations, where the cosine similarity between two service sentences is:

$$\cos(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

where A = (A_1, ..., A_n) and B = (B_1, ..., B_n) are the vectors of the two service sentences, A_i and B_i are their i-th components, and n is the vector dimension;
classifying two service sentences as the same service sentence entity when the cosine similarity between them is greater than a preset threshold.
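The grouping step of claim 6 can be sketched as follows, assuming the BERT sentence vectors have already been computed (the model call is not shown) and that the pairwise rule is applied transitively, so that sentences linked through a chain of similar pairs form one service sentence entity. Transitive merging via union-find is an interpretation chosen for this sketch; the claim itself only states the pairwise test. The vectors and threshold in the example are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def group_service_sentences(vectors, threshold):
    """Group sentence indices whose pairwise cosine similarity exceeds threshold.

    Pairs above the threshold are linked; linked pairs are merged transitively
    with union-find, and each resulting group is one service sentence entity.
    """
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical 2-dimensional sentence vectors: the first two are near-duplicates.
print(group_service_sentences([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]], 0.9))
# [[0, 1], [2]]
```

The pairwise loop is quadratic in the number of sentences, which is acceptable when comparisons are restricted to a single weather service scene, as the claim does.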
7. The method for constructing a weather scene service map based on phrase identification as claimed in claim 1, wherein the related triples established based on the OWL standard are:
t={subject,predicate,object}
where t is an established triple, the subject is a weather-service-related entity type or a weather service scene, the object is an atomic suggestion group, and the predicate is a weather service user.
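The triple layout of claim 7 can be sketched in plain Python and serialized as RDF N-Triples, one common interchange format for OWL/RDF data. The namespace `http://example.org/weather#`, the function names and the example identifiers are hypothetical; following the claim, the weather service user is used as the predicate.

```python
from typing import NamedTuple
from urllib.parse import quote

BASE = "http://example.org/weather#"  # hypothetical namespace IRI

class Triple(NamedTuple):
    subject: str    # weather-service-related entity type or weather service scene
    predicate: str  # weather service user (per claim 7)
    obj: str        # atomic suggestion group

def build_triples(scene, user, suggestion_groups):
    """One triple per atomic suggestion group, sorted for deterministic output."""
    return [Triple(scene, user, group) for group in sorted(suggestion_groups)]

def to_ntriples(triples):
    """Serialize triples as N-Triples lines with percent-encoded local names."""
    def iri(local):
        return f"<{BASE}{quote(local)}>"
    return "\n".join(f"{iri(t.subject)} {iri(t.predicate)} {iri(t.obj)} ." for t in triples)

print(to_ntriples(build_triples("hail outdoors", "student", {"g2", "g1"})))
```

Percent-encoding the local names keeps spaces and Chinese characters valid inside IRIs; a production system would more likely use an RDF library and a proper ontology namespace.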
8. The method for constructing a weather scene service map based on phrase identification as claimed in claim 1, wherein said preprocessing includes error checking, sensitive vocabulary filtering and illegal character rejection.
9. A weather scene service map construction system based on phrase identification, comprising:
the data set construction module is used for acquiring and storing known weather-service-related questions and articles according to the determined weather-service-related entity types, using a message queue and multithreading, to build an original data set;
the recognition module is used for preprocessing the data in the original data set, performing keyword recognition on the questions of the weather-service Q&A pairs and the titles of the weather-service articles based on a keyword recognition technique, and determining the weather service scene corresponding to each question;
the extraction module is used for extracting, based on the correspondence between questions and answers and on the phrase recognition model, the answers to weather-service questions and the service sentences in weather-service articles under the same weather service scene;
the classification module is used for classifying similar service sentences under the same weather service scene into the same service sentence entity, based on the determined weather service scene and the extracted service sentences;
the map construction module is used for establishing related triples based on the OWL standard according to the determined weather-service-related entity types, the determined weather service scenes and the classified service sentence entities, to obtain the atomic suggestion groups and complete the construction of the weather scene service map;
wherein extracting, based on the correspondence between questions and answers and on the phrase recognition model, the answers to weather-service questions and the service sentences in weather-service articles under the same weather service scene comprises the following specific steps:
implementing an LDA topic model with the ckpe tool library, and extracting keyword phrases from the answers to weather-service questions and from the weather-service articles;
performing dependency parsing on the answers to weather-service questions and on the weather-service articles with the hanlp tool library;
determining whether an extracted keyword phrase lies in a pair of adjacent words connected by a verb-object relation or an attribute-head (modifier-head) relation, and whether it lies in a clause containing one of the determined weather-service-related entity types; if both conditions hold, the clause is taken as an extracted service sentence.
CN202110830708.XA 2021-07-22 2021-07-22 Meteorological scene service map construction method and system based on phrase identification Active CN113626215B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110830708.XA CN113626215B (en) 2021-07-22 2021-07-22 Meteorological scene service map construction method and system based on phrase identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110830708.XA CN113626215B (en) 2021-07-22 2021-07-22 Meteorological scene service map construction method and system based on phrase identification

Publications (2)

Publication Number Publication Date
CN113626215A CN113626215A (en) 2021-11-09
CN113626215B true CN113626215B (en) 2023-08-18

Family

ID=78380558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110830708.XA Active CN113626215B (en) 2021-07-22 2021-07-22 Meteorological scene service map construction method and system based on phrase identification

Country Status (1)

Country Link
CN (1) CN113626215B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20050032937A (en) * 2003-10-02 2005-04-08 한국전자통신연구원 Method for automatically creating a question and indexing the question-answer by language-analysis and the question-answering method and system
CN110955764A (en) * 2019-11-19 2020-04-03 百度在线网络技术(北京)有限公司 Scene knowledge graph generation method, man-machine conversation method and related equipment
CN111353030A (en) * 2020-02-26 2020-06-30 陕西师范大学 Knowledge question and answer retrieval method and device based on travel field knowledge graph
CN111444305A (en) * 2020-03-19 2020-07-24 浙江大学 Multi-triple combined extraction method based on knowledge graph embedding
CN111506722A (en) * 2020-06-16 2020-08-07 平安科技(深圳)有限公司 Knowledge graph question-answering method, device and equipment based on deep learning technology
CN112883175A (en) * 2021-02-10 2021-06-01 武汉大学 Meteorological service interaction method and system combining pre-training model and template generation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
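K-VQA: a knowledge-graph-assisted visual question answering method (K-VQA:一种知识图谱辅助下的视觉问答方法); 高鸿斌; 毛金莹; 王会勇; Journal of Hebei University of Science and Technology (河北科技大学学报) (04); 315-326 *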

Also Published As

Publication number Publication date
CN113626215A (en) 2021-11-09

Similar Documents

Publication Publication Date Title
CN107908671B (en) Knowledge graph construction method and system based on legal data
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN108595708A (en) A kind of exception information file classification method of knowledge based collection of illustrative plates
CN111931506B (en) Entity relationship extraction method based on graph information enhancement
CN110309331A (en) A kind of cross-module state depth Hash search method based on self-supervisory
CN107818164A (en) A kind of intelligent answer method and its system
CN111026842A (en) Natural language processing method, natural language processing device and intelligent question-answering system
CN110750656A (en) Multimedia detection method based on knowledge graph
CN111400469A (en) Intelligent generation system and method for voice question answering
CN113505586A (en) Seat-assisted question-answering method and system integrating semantic classification and knowledge graph
CN110188346A (en) A kind of network security bill part intelligence analysis method based on information extraction
CN110287298A (en) A kind of automatic question answering answer selection method based on question sentence theme
Wu et al. Scene attention mechanism for remote sensing image caption generation
CN112541347A (en) Machine reading understanding method based on pre-training model
CN117076693A (en) Method for constructing digital human teacher multi-mode large language model pre-training discipline corpus
CN113051922A (en) Triple extraction method and system based on deep learning
CN112434164A (en) Network public opinion analysis method and system considering topic discovery and emotion analysis
CN112988970A (en) Text matching algorithm serving intelligent question-answering system
CN117196042B (en) Semantic reasoning method and terminal for learning target in education universe
CN117236338B (en) Named entity recognition model of dense entity text and training method thereof
CN113626215B (en) Meteorological scene service map construction method and system based on phrase identification
CN117033661A (en) Construction method and device of multi-domain knowledge graph, electronic equipment and storage medium
CN108763487B (en) Mean Shift-based word representation method fusing part-of-speech and sentence information
CN114898775B (en) Voice emotion recognition method and system based on cross-layer cross fusion
CN116257618A (en) Multi-source intelligent travel recommendation method based on fine granularity emotion analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant