CN118260717A - Internet low-orbit satellite information mining method, system, device and medium
- Publication number: CN118260717A
- Application number: CN202410256745.8A
- Authority: CN (China)
- Prior art keywords
- orbit satellite
- information
- internet
- internet low
- low
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/26—Discovering frequent patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides an Internet low-orbit satellite information mining method, system, device and medium, relating to the technical field of low-orbit satellite data processing. The method comprises the following steps: taking Internet websites as information sources and automatically extracting their information data through regular-expression matching templates; enhancing the weights of the corresponding keywords through an SVM combined with keyword frequency, and extracting Internet low-orbit satellite entities; extracting Internet low-orbit satellite relationships based on the entities; extracting Internet low-orbit satellite events through weight processing, topic mining and topic clustering; constructing a triplet set of Internet low-orbit satellite entities, relationships and events, and importing it into a KGAT recommendation model to construct an Internet low-orbit satellite knowledge graph; and carrying out information analysis according to interaction information with the user. The method provides a deep and accurate information processing and analysis solution for Internet low-orbit satellites and meets the field's professional requirements for information.
Description
Technical Field
The invention relates to the technical field of low-orbit satellite data processing, in particular to an Internet low-orbit satellite information mining method, system, device and medium.
Background
Fields related to low-orbit satellite information mining, such as aerospace technology, military applications and geographic information systems, are often highly specialized. Information data in these fields usually contains a large number of technical terms, complex associations and specific analysis logic.
In the information mining methods known to the inventor, information mining and analysis are realized by constructing a general-domain knowledge graph, of which there are two kinds. The first is the scientific knowledge map, which takes scientific knowledge as the object of bibliometric study and builds a visual network graph through co-word analysis, citation analysis and similar means to display the development, research hotspots and future trends of scientific knowledge, as well as the cooperative relationships between researchers and institutions. The second is the semantic-network knowledge graph, which mines and processes text data using techniques from artificial intelligence and natural language processing and acquires entity-relation-entity or entity-attribute-value triplets to construct a complex graph network that reflects the associations among pieces of knowledge.
A large number of data samples can be analyzed and calculated by utilizing a scientific knowledge graph and displayed in a visual mode. The construction of the knowledge graph mainly comprises the following three aspects:
(1) Acquisition module: obtains the corresponding information with web crawlers, selecting a suitable crawling method to obtain structured or semi-structured data, and stores the data in a database;
(2) Information extraction module: cleans and screens the data. For semi-structured and unstructured data, a large number of fact triplets are extracted, mainly with a deep-learning-based relation extraction model, and then screened manually. Structured data is identified and extracted manually and converted directly into the triplet form required by the Neo4j graph database;
(3) Knowledge graph drawing module: constructs and stores the knowledge graph using the open-source graph database Neo4j.
Although a general knowledge graph can represent general entities and relationships, it lacks domain-specific knowledge and logic, so it often fails to reflect the professional characteristics and complexity of the low-orbit satellite information field; the technical terms, specific associations and complex relationships contained in low-orbit satellite information data may not be effectively represented or inferred. In addition, general-domain information mining algorithms stop at basic graph construction and querying and do not deeply mine or use the information contained in the graph. Users need more than simple query results: they need a system that can further analyze the information in the graph and provide deeper association, prediction or interpretation to support decision making and analysis.
Disclosure of Invention
To overcome the defects of the above information mining methods, the invention provides an Internet low-orbit satellite information mining method, system, device and medium that represent and analyze Internet low-orbit satellite information data more accurately by establishing a knowledge graph.
The Internet low-orbit satellite information mining method of the invention, which solves the above technical problems, comprises the following steps:
taking Internet websites as information sources, setting corresponding regular-expression matching templates for the data organization forms of the different information source websites, automatically extracting the information data of the information sources based on Selenium and ChromeDriver, cleaning and screening the data, and entering it into an information database;
enhancing the weights of the corresponding keywords in the information data of the information database through an SVM combined with keyword frequency, labeling the data through a word embedding technique and a BiLSTM-CRF model, and extracting Internet low-orbit satellite entities;
based on the Internet low-orbit satellite entities, extracting the relationships between the entities through a word embedding technique and a BiLSTM model, and generalizing them into Internet low-orbit satellite relationships;
configuring the weights of the keywords corresponding to key Internet low-orbit satellite events based on TF-IDF, performing topic mining through an LDA model to obtain topic probability distributions, mapping the topic probability distributions to semantic vectors, calculating the semantic distances between documents based on the Jensen-Shannon divergence (JSD), performing topic clustering through the K-means algorithm, and extracting Internet low-orbit satellite events;
constructing a triplet set of Internet low-orbit satellite entities, relationships and events, and importing it into a KGAT recommendation model to construct an Internet low-orbit satellite knowledge graph;
and carrying out information analysis based on the Internet low-orbit satellite knowledge graph according to interaction information with the user.
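The distance and clustering part of the event-extraction step can be illustrated with a small sketch. It assumes that an LDA model has already produced one topic probability distribution per document; the function names (`jsd`, `kmeans_jsd`), the toy distributions and the fixed initial centroids are illustrative assumptions, not details taken from the patent.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions (0..1 in bits)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def kmeans_jsd(docs, centroids, iterations=10):
    """K-means-style clustering of topic distributions with JSD as the distance."""
    labels = [0] * len(docs)
    for _ in range(iterations):
        # Assignment step: each document goes to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: jsd(d, centroids[k]))
                  for d in docs]
        # Update step: each centroid becomes the mean distribution of its members.
        for k in range(len(centroids)):
            members = [d for d, lab in zip(docs, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy topic distributions over three topics (hypothetical values).
docs = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
]
labels = kmeans_jsd(docs, centroids=[docs[0][:], docs[2][:]])
print(labels)
```

Averaging member distributions keeps each centroid a valid probability distribution, which is why the plain K-means update step can be reused with JSD here.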
As an improvement of the Internet low-orbit satellite information mining method, the following step is carried out before the step of extracting the relationships between the entities through the word embedding technique and the BiLSTM model:
adding entity relationships specific to Internet low-orbit satellites by extracting subject-object relationships.
As an improvement of the Internet low-orbit satellite information mining method, the following step is carried out before the step of generalizing the relationships into Internet low-orbit satellite relationships:
determining the strength of each entity relationship according to the word frequency of the professional vocabulary specific to Internet low-orbit satellites.
As an improvement of the Internet low-orbit satellite information mining method, the KGAT recommendation model is configured with the improved attention mechanism of the KGAT-I algorithm;
The KGAT-I algorithm uses the Euclidean distance to measure the similarity between two nodes and takes the distance between the nodes into account. The attention mechanism considers both the difference between the nodes and the offset of the vector, so that two nodes that are closer to each other in the relation space r receive a higher attention score. The attention score π(h, r, t) is given by the following formula:
where $W_r$ represents the projection matrix from the entity space to the relation space,
and the distance term represents the Euclidean distance between the two nodes under relation r.
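Because the KGAT-I formula itself is not reproduced in this text, the following is only an illustrative sketch of how a distance-aware attention score could behave. `kgat_score` follows the attention score of the original KGAT model, (W_r e_t)^T tanh(W_r e_h + e_r); `distance_score` is a hypothetical variant, assumed here for illustration and not the patent's formula, that scores a triplet by the negative Euclidean distance between the translated head W_r e_h + e_r and the projected tail W_r e_t. All vectors and the identity projection matrix are toy values.

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def kgat_score(W_r, e_h, e_r, e_t):
    """Original KGAT attention score: (W_r e_t)^T tanh(W_r e_h + e_r)."""
    ph, pt = matvec(W_r, e_h), matvec(W_r, e_t)
    return sum(t * math.tanh(h + r) for t, h, r in zip(pt, ph, e_r))

def distance_score(W_r, e_h, e_r, e_t):
    """Hypothetical distance-aware score (an assumption, not the patent's formula):
    negative Euclidean distance between W_r e_h + e_r and W_r e_t."""
    ph, pt = matvec(W_r, e_h), matvec(W_r, e_t)
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(ph, e_r, pt)))

def softmax(scores):
    """Normalize the scores over the neighbours of one head entity."""
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

W_r = [[1.0, 0.0], [0.0, 1.0]]   # toy projection matrix (identity)
e_h, e_r = [1.0, 0.0], [0.0, 1.0]
near_tail, far_tail = [1.0, 1.0], [-2.0, 3.0]

weights = softmax([distance_score(W_r, e_h, e_r, t) for t in (near_tail, far_tail)])
print(weights)  # the nearer tail receives the larger attention weight
```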
As an improvement of the Internet low-orbit satellite information mining method, an information filtering layer is configured after the attentive embedding propagation layer; the information filtering layer filters out the Internet low-orbit satellite entity, relationship and event triplets whose attention scores are lower than a set threshold.
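The filtering rule can be sketched in a few lines. The `ScoredTriple` type, the example triplets, their scores and the threshold value are all hypothetical; the sketch only shows the thresholding behaviour described above, not how the layer is wired into the KGAT model.

```python
from typing import NamedTuple

class ScoredTriple(NamedTuple):
    head: str
    relation: str
    tail: str
    attention: float  # attention score assigned by the propagation layer

def filter_triples(triples, threshold):
    """Drop triplets whose attention score is lower than the threshold."""
    return [t for t in triples if t.attention >= threshold]

# Hypothetical scored triplets.
triples = [
    ScoredTriple("operator A", "launches", "satellite X", 0.62),
    ScoredTriple("operator A", "mentions", "weather report", 0.03),
    ScoredTriple("satellite X", "links", "ground station Y", 0.35),
]
kept = filter_triples(triples, threshold=0.10)
print([t.relation for t in kept])
```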
Compared with the technology known to the inventor, the Internet low-orbit satellite information mining method provides a full-pipeline information processing scheme for the Internet low-orbit satellite field. It combines a knowledge graph with a recommendation algorithm, improves the efficiency of information analysis and assessment, and enables efficient retrieval. In data acquisition, important keywords are screened using the SVM decision function together with keyword-frequency statistics, which ensures the accuracy and pertinence of the information. In relationship extraction, a relationship-strength frequency function is introduced to improve precision and pertinence. In addition, the KGAT algorithm is improved: the attention mechanism is refined and an information filtering layer is added, so that useful information is better obtained from the knowledge graph. Overall, the information mining method of the embodiments of the invention provides a deep and accurate information processing and analysis solution for the Internet low-orbit satellite field and meets the field's requirements for professional and real-time information.
In terms of the Internet low-orbit satellite information mining system, the system of the invention, which solves the above technical problems, comprises:
an acquisition module, used for taking Internet websites as information sources, setting corresponding regular-expression matching templates for the data organization forms of the different information source websites, automatically extracting the information data of the information sources based on Selenium and ChromeDriver, cleaning and screening the data, and entering it into the information database;
a knowledge graph construction module, used for enhancing the weights of the corresponding keywords in the information data of the information database through an SVM (support vector machine) combined with keyword frequency, labeling the data through a word embedding technique and a BiLSTM-CRF model, and extracting Internet low-orbit satellite entities; based on the Internet low-orbit satellite entities, extracting the relationships between the entities through a word embedding technique and a BiLSTM model, and generalizing them into Internet low-orbit satellite relationships; configuring the weights of the keywords corresponding to key Internet low-orbit satellite events based on TF-IDF, performing topic mining through an LDA model to obtain topic probability distributions, mapping the topic probability distributions to semantic vectors, calculating the semantic distances between documents based on the Jensen-Shannon divergence (JSD), performing topic clustering through the K-means algorithm, and extracting Internet low-orbit satellite events; and constructing a triplet set of Internet low-orbit satellite entities, relationships and events, and importing it into a KGAT recommendation model to construct an Internet low-orbit satellite knowledge graph;
and a user interaction module, used for providing a user interface that allows the user to interact with the system, receiving the user's input and displaying the analyzed information results to the user.
As an improvement of the Internet low-orbit satellite information mining system, the knowledge graph construction module comprises a KGAT-I module, which is configured with the improved attention mechanism of the KGAT-I algorithm;
the KGAT-I algorithm uses the Euclidean distance to measure the similarity between two nodes and takes the distance between the nodes into account. The attention mechanism considers both the difference between the nodes and the offset of the vector, so that two nodes that are closer to each other in the relation space r receive a higher attention score. The attention score is given by the following formula:
where $W_r$ represents the projection matrix from the entity space to the relation space,
and the distance term represents the Euclidean distance between the two nodes under relation r.
As an improvement of the Internet low-orbit satellite information mining system, the KGAT-I module comprises an information filtering layer configured after the attentive embedding propagation layer; the information filtering layer filters out the Internet low-orbit satellite entity, relationship and event triplets whose attention scores are lower than a set threshold.
The internet low-orbit satellite information mining system realizes the flow of the internet low-orbit satellite information mining method, has the beneficial effects the same as those of the internet low-orbit satellite information mining method, and is not described in detail herein.
In terms of the Internet low-orbit satellite information mining device, the device of the invention, which solves the above technical problems, comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the computer program, when executed by the processor, implements the steps of the Internet low-orbit satellite information mining method.
The internet low-orbit satellite information mining device realizes the flow of the internet low-orbit satellite information mining method, has the beneficial effects the same as those of the internet low-orbit satellite information mining method, and is not described in detail herein.
In terms of the computer storage medium, the computer storage medium of the invention stores a computer program which, when executed by a processor, implements the steps of the Internet low-orbit satellite information mining method described above.
The computer storage medium of the invention realizes the flow of the Internet low-orbit satellite information mining method, and has the beneficial effects the same as those of the Internet low-orbit satellite information mining method, and the description is omitted here.
Drawings
FIG. 1 is a schematic diagram of an Internet low-orbit satellite information mining system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the Word2Vec+BiLSTM-CRF model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the Word2Vec+BiLSTM-Attention model according to an embodiment of the present invention;
FIG. 4 shows the overall architecture of the modified KGAT according to an embodiment of the present invention;
FIG. 5 is a first flowchart of an Internet low-orbit satellite information mining method according to an embodiment of the present invention;
Fig. 6 is a first flowchart of an internet low-orbit satellite information mining method according to an embodiment of the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention for achieving the intended purpose, the following detailed description of the present invention is given with reference to the accompanying drawings and preferred embodiments.
The steps of the method flow described in the specification and the flow chart shown in the drawings of the specification are not necessarily strictly executed according to step numbers, and the execution order of the steps of the method may be changed. Moreover, some steps may be omitted, multiple steps may be combined into one step to be performed, and/or one step may be decomposed into multiple steps to be performed.
Based on the professional characteristics and complexity of the low-orbit satellite information field, the invention combines a knowledge graph with a recommendation algorithm, and provides a novel intelligent satellite information mining method and system. The information processing capability of the information mining system is optimized by applying the knowledge graph and the recommendation algorithm to the field of Internet satellite information mining.
The first embodiment of the present invention provides an Internet low-orbit satellite information mining system. As shown in FIG. 1, the system includes an acquisition module, a knowledge graph construction module, a KGAT-I module and a user interaction module.
The acquisition module automatically acquires information data, taking Internet websites as the information sources. It is configured with a multi-element information autonomous acquisition strategy that adapts to various information sources through extensible, user-defined regular-expression matching templates. First, a different matching template is set for each information source; when a new information source is added, a corresponding extraction template is designed and imported together with the website. Next, Selenium is used with ChromeDriver to simulate browser operation automatically and acquire the webpage information, and the information related to Internet low-orbit satellites is extracted with the matching templates. Finally, the acquired content, such as HTML, text and tables, is preliminarily cleaned and screened according to its tags and entered into the information database.
Specifically, the information sources are the various kinds of low-orbit-satellite-related information data contained in websites on the Internet, such as the Starlink, OneWeb and Telesat websites, Baidu and Google. Because each information source website organizes its data differently, the information is difficult to extract in a uniform way, so regular expressions are adopted as matching templates for extracting the required information. Regular expressions are a powerful text processing tool that can search for, match and replace specific patterns in text. The key feature of the strategy is its extensibility: users can design different matching templates for different information sources, and when a new information source is added, a user can design a new extraction template for it and import the template into the system together with the website.
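The extensible template idea can be sketched as a small registry that maps each information source to its own set of regular expressions. The source names, field names, patterns and sample page below are all hypothetical; a real template would be written against the actual markup of each website.

```python
import re

# Registry of matching templates: source name -> {field name: compiled pattern}.
TEMPLATES = {}

def register_template(source, patterns):
    """Add (or replace) the matching template for one information source."""
    TEMPLATES[source] = {field: re.compile(p, re.S) for field, p in patterns.items()}

def extract(source, html):
    """Apply the source's template; fields that do not match are set to None."""
    record = {}
    for field, pattern in TEMPLATES[source].items():
        match = pattern.search(html)
        record[field] = match.group(1).strip() if match else None
    return record

# Two hypothetical sources with different data organization forms.
register_template("news.example", {
    "title": r"<h1[^>]*>(.*?)</h1>",
    "date": r'<time datetime="(\d{4}-\d{2}-\d{2})"',
})
register_template("blog.example", {
    "title": r'<div class="post-title">(.*?)</div>',
    "date": r"Published:\s*(\d{4}/\d{2}/\d{2})",
})

page = '<h1 class="t"> Starlink launch </h1><time datetime="2024-03-01">'
record = extract("news.example", page)
print(record)
```

Adding a new information source then only requires one more `register_template` call, which mirrors the "design a template and import it with the website" workflow described above.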
Selenium is a tool for automated testing of web applications, and ChromeDriver is the WebDriver implementation for the Chrome browser. With Selenium and ChromeDriver, the system can simulate the browser operations of a real user, such as opening a web page, clicking a button or filling in a form, to obtain dynamically generated webpage information.
The multi-element information autonomous acquisition strategy combines regular expression matching, automatic browser operation and data cleaning and screening, and efficiently and accurately extracts information data related to the Internet low-orbit satellite from various information sources.
The knowledge graph construction module identifies key entities in the collected information data and analyzes and extracts the relationships among the entities to construct a knowledge graph. It is configured with a multi-element automatic extraction strategy comprising an Internet low-orbit satellite entity identification strategy, a low-orbit satellite relationship extraction strategy and an Internet low-orbit satellite event extraction strategy.
The Internet low-orbit satellite entity identification strategy aims to accurately identify named entities related to low-orbit satellites in the multi-element information. The automatically collected information is first screened with an SVM, which enhances the weights of keywords such as "Starlink"; part of the data is then labeled manually to ensure the accuracy of the keywords. After the keywords are extracted, the data is labeled using a word embedding technique and the BiLSTM-CRF data labeling model. Finally, the named entities most probably related to Internet low-orbit satellites are computed and extracted.
Specifically, the SVM is a supervised learning model used to filter out information unrelated to low-orbit satellites, or to classify the information preliminarily, and to enhance the weights of keywords such as "Starlink" so that entities related to low-orbit satellites can be identified more accurately in the subsequent entity recognition. Because of the special nature of the satellite Internet, the data must be screened automatically once the crawler has finished collecting it. An SVM is used for this screening and, because the crawled data may contain noise, slack variables (relaxation factors) are added to the objective function, which in the standard soft-margin form reads:
$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_{i} \quad \text{s.t.}\quad y_{i}\,(w^{\top}x_{i} + b) \ge 1 - \xi_{i},\quad \xi_{i} \ge 0,$$
where $\xi_i$ are the slack variables and $C$ is the penalty coefficient.
The value of C is set according to the keyword frequency of the corpus obtained by the crawler. Important keywords can be screened by frequency by combining the decision function of the SVM classifier with keyword-frequency statistics: the keyword frequency is added to the feature vector as an additional feature, and the decision function of the SVM classifier is used to calculate a score or probability value for each data point.
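As a minimal sketch of the screening idea, the snippet below appends the keyword frequency to a bag-of-words feature vector and scores each document with a linear decision function f(x) = w·x + b. The vocabulary, keyword set, weights and bias are hypothetical stand-ins for parameters that a trained SVM would supply; no training is performed here.

```python
def keyword_frequency(text, keywords):
    """Fraction of tokens in the text that are domain keywords."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(tok in keywords for tok in tokens) / len(tokens)

def features(text, vocabulary, keywords):
    """Bag-of-words counts over a small vocabulary plus the keyword frequency."""
    tokens = text.lower().split()
    counts = [float(tokens.count(word)) for word in vocabulary]
    return counts + [keyword_frequency(text, keywords)]

def decision_function(x, w, b):
    """Linear SVM decision value f(x) = w . x + b; positive means 'relevant'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

VOCABULARY = ["satellite", "orbit", "recipe"]   # hypothetical vocabulary
KEYWORDS = {"starlink", "oneweb", "telesat"}    # hypothetical domain keywords
W, B = [0.8, 0.6, -1.5, 4.0], -1.0              # stand-ins for trained parameters

docs = [
    "starlink satellite reaches low orbit",
    "a recipe for tomato soup",
]
scores = [decision_function(features(d, VOCABULARY, KEYWORDS), W, B) for d in docs]
relevant = [d for d, s in zip(docs, scores) if s > 0]
print(relevant)
```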
To ensure the quality and accuracy of the data, the selection and weight setting of keywords can be verified and corrected by manual annotation.
Referring to FIG. 2, which shows the Word2Vec+BiLSTM-CRF model, the word embedding technique converts each word in the text into a fixed-dimension vector representation, so that the semantic relationships between words can be captured in the vector space.
The BiLSTM-CRF data labeling model is a data labeling model combining a two-way long-short-term memory network (BiLSTM) and a Conditional Random Field (CRF), the BiLSTM (two-way long-term memory network) can capture context information in a text, the CRF (conditional random field) is used for a sequence labeling task, the dependency relationship between prediction labels can be ensured, and the BiLSTM-CRF data labeling model is used for labeling in combination of the two, so that the feature vector of embedding is coded. The formula of the CRF model is as follows:
Here f_j(·) is a feature function and λ_j is the weight of the feature function f_j. f_j(·) takes four arguments: s, the sequence to be labeled (a sentence); i, the position of the word in the sequence; l_i, the label at position i of the sequence to be labeled; and l_{i-1}, the label at position i-1 of the sequence to be labeled.
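The CRF formula is not reproduced in this text. Assuming the usual linear-chain form, which is consistent with a feature function of the four arguments s, i, l_i and l_{i-1} and a weight λ_j per feature function, it could be written as below. The sequence length n, the number of feature functions m and the name score are assumptions introduced for illustration.

```latex
\mathrm{score}(l \mid s) = \sum_{j=1}^{m}\sum_{i=1}^{n} \lambda_j\, f_j(s, i, l_i, l_{i-1}),
\qquad
p(l \mid s) = \frac{\exp\bigl(\mathrm{score}(l \mid s)\bigr)}{\sum_{l'} \exp\bigl(\mathrm{score}(l' \mid s)\bigr)}
```

The denominator sums over every candidate label sequence l' for the sentence s, so p(l | s) is a proper probability over label sequences.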
Finally, the label sequence with the highest probability is computed and taken as the output.
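Finding the most probable label sequence is typically done with Viterbi decoding. The sketch below is a minimal illustration under assumptions: the tag set, the emission scores (which would come from the BiLSTM) and the transition scores (which would come from the CRF) are made-up numbers, and the function name `viterbi` is hypothetical.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions[i][tag]        score of `tag` at position i
    transitions[(prev, cur)] score of moving from prev to cur;
                             pairs that are not listed score 0.0
    """
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []
    for i in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[i][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


TAGS = ["B-SAT", "I-SAT", "O"]
# Illustrative scores for the tokens "Starlink", "V2", "launched".
EMISSIONS = [
    {"B-SAT": 2.0, "I-SAT": 1.0, "O": 0.5},
    {"B-SAT": 0.5, "I-SAT": 1.5, "O": 1.6},
    {"B-SAT": 0.1, "I-SAT": 0.2, "O": 2.0},
]
TRANSITIONS = {("B-SAT", "I-SAT"): 1.0, ("O", "I-SAT"): -10.0}
labels = viterbi(EMISSIONS, TRANSITIONS, TAGS)
```

With these numbers the transition scores change the second label: position-by-position selection would pick "O" for the second token, while the sequence-level decoding keeps it inside the entity, which is the label dependency the CRF layer is meant to enforce.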
The Internet low-orbit satellite entity identification strategy adds an SVM mechanism on top of named entity recognition and improves the corresponding functions, which makes it better suited to the high level of domain expertise that Internet constellations demand.
The low-orbit satellite relationship extraction strategy automatically discovers the relationships between low-orbit-satellite-related entities in the text and generalizes them into the entity relationship graph. After the low-orbit satellite entity extraction task is completed, the relationships among the entities are found in the text by technical means, and the semantic relationships are generalized into entity relationships. In the relationship extraction model, relationships specific to the satellite Internet are added: words such as "launch", "construction" and "link" are given focused extraction, subject-object relationships are extracted, and the relationship strength is determined by frequency. In the design of the loss function, frequency weights are added as correlation coefficients. Word2Vec is used as the word embedding tool and BiLSTM as the main model.
Specifically, relationships specific to the satellite Internet are added by extracting the subject-object relationships in the text, that is, the active and passive relationships between entities, such as "a certain organization launches a certain satellite". The frequency with which a relationship appears in the text is taken into account to determine its strength; for example, if the word "launch" appears frequently in the text, the relationship may be regarded as a strong one. On this basis, a loss function must be designed when the relationship extraction model is trained, and the frequency weight can be added to it as a correlation coefficient so that the model better learns the connection between frequency and relationship strength.
For Internet low-orbit satellite relationship extraction, in addition to conventional extraction, a relationship-strength frequency function is added for terms such as satellite launch, base station construction and link. The loss function is designed as:
loss=w1*sum(match_scores)+w2*sum(keywords_importances)
Here match_scores is an array holding, for each keyword, a score for how well it matches the information content, which can be calculated from indicators such as keyword similarity and relevance; keywords_importances is an array holding the importance score of each keyword, which can be determined from factors such as the frequency with which keywords like "launch", "carry" and "build" occur in the information and their semantic importance; the weights w1 and w2 can be adjusted to the specific application scenario and goal to balance the matching degree against the keyword importance, and the weights of keywords such as "launch", "carry" and "build" are set correspondingly higher.
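The formula above translates directly into code. The sketch below mirrors it term by term; the boost table, the helper `keyword_importances` and the way importance is derived from relative frequency are assumptions made only for illustration, since the text does not fix how the scores are computed.

```python
from collections import Counter

# Hypothetical manual boosts for domain keywords named in the text.
BOOST = {"launch": 2.0, "carry": 1.5, "build": 1.5}


def keyword_importances(tokens, keywords, boost=BOOST):
    """Importance of each keyword: boosted relative frequency."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [boost.get(k, 1.0) * counts[k] / total for k in keywords]


def relation_loss(match_scores, keywords_importances, w1=1.0, w2=1.0):
    """loss = w1*sum(match_scores) + w2*sum(keywords_importances)."""
    return w1 * sum(match_scores) + w2 * sum(keywords_importances)


tokens = ["spacex", "launch", "starlink", "launch"]
importances = keyword_importances(tokens, ["launch", "build"])
loss = relation_loss([0.5, 0.25], importances, w1=2.0, w2=3.0)
```

Raising w2 relative to w1 makes the keyword-importance term dominate, which is how the weighting toward words such as "launch" would be tuned for a given scenario.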
Referring to FIG. 3, which shows the Word2Vec+BiLSTM-Attention model, in the specific implementation Word2Vec is used as the word embedding tool and BiLSTM as the main model. Word2Vec is a commonly used word embedding tool that converts the words in a text into fixed-dimension vector representations so that a model can capture the semantic relationships between words. The relationship extraction model differs from the named entity recognition model in that the CRF layer following the BiLSTM network layer is replaced by an Attention layer.
Referring also to FIGS. 2-3, in the named entity recognition task BiLSTM is often used together with a CRF layer: the BiLSTM captures the context information in the sequence and generates a context representation for each word, and the CRF layer then uses these representations to predict the label of each word while ensuring that the label sequence is globally optimal. In the relationship extraction task, BiLSTM is still used to capture context information, but the emphasis of the model is on discovering relationships between entities rather than predicting one label per word. The attention mechanism can be seen as a weight distribution mechanism that lets the model pay more attention to the words related to a particular relationship when predicting that relationship. Using an attention layer after the BiLSTM instead of the CRF layer lets the model focus on discovering relationships between entities; the attention mechanism allows the model to process input sequences more flexibly and to give higher weight to the relevant words when predicting relationships.
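The "weight distribution" view of attention can be made concrete with a small sketch. It assumes dot-product scoring against a single query vector, which is a simplification chosen for illustration; the text does not specify the scoring function, and the hidden states here are made-up stand-ins for BiLSTM outputs.

```python
import math


def attention_pool(hidden_states, query):
    """Softmax-weighted sum of per-word hidden states.

    hidden_states: one vector per word (BiLSTM outputs in the text)
    query:         vector expressing what the relation classifier seeks
    Returns (weights, pooled_vector).
    """
    scores = [
        sum(h * q for h, q in zip(state, query)) for state in hidden_states
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [
        sum(w * state[k] for w, state in zip(weights, hidden_states))
        for k in range(dim)
    ]
    return weights, pooled


# Three words; the third is most aligned with the query direction.
states = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
weights, sentence_vector = attention_pool(states, [1.0, 0.0])
```

The pooled vector would then feed a relation classifier, so words that carry the relation (for example "launch") contribute more than filler words.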
The Internet low-orbit satellite event extraction strategy is used to highlight and associate important events. Key events such as satellite launches and base station construction are extracted: keywords are extracted with TF-IDF, and the weights of these keywords are set manually during extraction, so that important Internet satellite events are highlighted and associated. The strategy comprises weight processing, topic mining and topic clustering, and strengthens the association of topic events related to Internet constellations.
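TF-IDF scoring with manually set keyword weights can be sketched as follows. This is a minimal illustration under assumptions: the weight table `MANUAL_WEIGHTS`, the function names and the unsmoothed IDF variant log(N/df) are choices made for the example, not details given in the text.

```python
import math
from collections import Counter

# Hypothetical manual weights for key-event vocabulary.
MANUAL_WEIGHTS = {"launch": 2.0, "station": 1.5}


def tfidf(docs, manual_weights=MANUAL_WEIGHTS):
    """Per-document TF-IDF scores, scaled by manual keyword weights.

    docs is a list of token lists; returns one {term: score} per doc.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scored = []
    for doc in docs:
        term_freq = Counter(doc)
        scored.append({
            term: (count / len(doc))
            * math.log(n_docs / doc_freq[term])
            * manual_weights.get(term, 1.0)
            for term, count in term_freq.items()
        })
    return scored


def top_keywords(scores, k=3):
    """The k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


corpus = [
    ["starlink", "launch", "today"],
    ["oneweb", "station", "today"],
    ["weather", "today", "sunny"],
]
scores = tfidf(corpus)
```

A term that occurs in every document, such as "today" here, gets an IDF of zero, while a manually boosted term outranks an equally rare unboosted one.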
Specifically, a user may, for example, want to use the knowledge graph to quickly learn what actions a target has taken with respect to a certain event, or to give business personnel a degree of decision support in view of the complexity of the business process. Keywords are extracted with TF-IDF, topics are mined with the LDA (latent Dirichlet allocation) topic model, the resulting topic probability distributions are converted into semantic vectors, the semantic distances between documents are computed with JSD, and topics are clustered with the K-means algorithm.
TF-IDF is a common information retrieval and text mining technique used to evaluate the importance of a word in a document or corpus. LDA (latent dirichlet allocation) is a topic model for finding hidden topics from a collection of documents. JSD (Jensen-Shannon Divergence) is an index for comparing the similarity or difference of two probability distributions, and in this embodiment, JSD is used to calculate the semantic distance between two documents, i.e. the similarity or difference of their topic probability distributions, the smaller the JSD value, the more similar the semantics of the two documents; the larger the JSD value, the greater the semantic difference representing the two documents. The K-means algorithm is a common clustering algorithm used to divide a data set into K clusters, and similar documents are divided into the same cluster to form different topic categories.
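The distance and clustering steps can be sketched in plain Python. The example assumes the topic probability distributions have already been produced (for instance by an LDA model) and uses made-up distributions; running K-means with JSD as the assignment distance, with mean distributions as centroids and the first k documents as the initial centroids, is one possible reading of the text and is an assumption.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 vanish."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence: 0 for identical, 1 for disjoint."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def kmeans_jsd(docs, k, iterations=10):
    """Cluster topic distributions, assigning by smallest JSD."""
    centroids = [list(d) for d in docs[:k]]  # deterministic start
    assignment = [0] * len(docs)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: jsd(d, centroids[c])) for d in docs
        ]
        for c in range(k):
            members = [d for d, a in zip(docs, assignment) if a == c]
            if members:  # the mean of distributions is a distribution
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assignment


# Hypothetical topic distributions over three topics for four documents.
topic_dists = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
]
clusters = kmeans_jsd(topic_dists, k=2)
```

Documents dominated by the same topic end up in the same cluster, which corresponds to the topic categories described above.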
Through weight processing, topic mining and topic clustering, the strategy strengthens the association between related topic events of Internet low-orbit satellites. This gives users a more comprehensive and deeper analysis, and ensures that information related to key events such as satellite launches and base station construction is highlighted and associated, so that users receive more accurate and useful information.
The KGAT-I module is used to analyze in depth the entities and relationships in the knowledge graph through a recommendation model that fuses the knowledge graph with a graph attention network, so as to extract triples of Internet low-orbit satellite entities, Internet low-orbit satellite relationships and Internet low-orbit satellite events and to construct the Internet low-orbit satellite knowledge graph. To make the knowledge graph easier for users to use, it is combined with a recommendation algorithm, and the recommendation model KGAT, which fuses a knowledge graph with a graph attention network, is improved upon.
Specifically, referring to FIG. 4, which shows the overall architecture of the improved KGAT, a KGAT-I recommendation algorithm with information filtering is proposed on the basis of the KGAT algorithm and applied to the Internet low-orbit satellite information mining system. The conventional attention mechanism is modified and a relationship filtering layer is added; implementing the attention mechanism with an added information filtering layer allows the information in the knowledge graph to be acquired more effectively.
The KGAT-I algorithm measures the similarity between two nodes with an improved attention score strategy, for which the Euclidean distance can be used. Because of the differences between nodes and the offsets of the vectors between them, the distance between nodes must be taken into account. The attention score π(h, r, t) is given by the following formula:
Here W_r denotes the projection matrix from the entity space to the relationship space, and the distance term of the formula represents the Euclidean distance between the two nodes under relationship r. The improved attentive embedding mechanism considers both the differences between nodes and the offsets of the vectors, so that two nodes that are closer to each other in the relationship space r receive a higher attention score. This optimizes the node embeddings and effectively reduces the loss of information contained in the entities.
The KGAT-I recommendation algorithm adds an information filtering layer after the attentive embedding propagation layer. Nodes with low attention scores are filtered out by a threshold, and the attention scores of the filtered triple set are normalized with a softmax function, namely:
Here π(h, r, t) denotes the attention score of the conventional scheme, in which the similarity between two nodes is measured by the Euclidean distance, and π'(h, r, t) is the corrected score: π(h, r, t) is corrected using only the triples whose attention score is not lower than the threshold, in order to obtain a better feature representation.
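The threshold filtering and softmax normalization described above can be sketched as follows. This is a minimal illustration under assumptions: the raw attention scores for the neighbours of one head entity are taken as given inputs, because the scoring formula is not reproduced in this text, and the function name `filtered_attention`, the threshold value and the sample triples are hypothetical.

```python
import math


def filtered_attention(scores, alpha):
    """Drop triples scoring below alpha, then softmax the remainder.

    scores: {(head, relation, tail): raw attention score} for the
            neighbourhood of a single head entity
    alpha:  threshold; triples with a score >= alpha are kept
    Returns {(head, relation, tail): normalised attention weight}.
    """
    kept = {triple: s for triple, s in scores.items() if s >= alpha}
    if not kept:
        return {}
    top = max(kept.values())  # subtract the maximum for stability
    exps = {triple: math.exp(s - top) for triple, s in kept.items()}
    total = sum(exps.values())
    return {triple: e / total for triple, e in exps.items()}


raw_scores = {
    ("Starlink", "launched_by", "SpaceX"): 2.0,
    ("Starlink", "mentioned_with", "bakery"): -1.0,
    ("Starlink", "uses", "Falcon 9"): 2.0,
}
attention = filtered_attention(raw_scores, alpha=0.0)
```

Because the low-scoring triple is removed before normalization, it receives no weight at all instead of a small one, so noisy neighbours do not take part in the embedding propagation.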
Meanwhile, because the Internet low-orbit satellite information mining system places high demands on specialized terminology, the scoring of entities that are specialized terms in the knowledge graph should be strengthened when the attention mechanism is introduced. The attention score mechanism is thereby implemented more effectively and the influence of noise is reduced, so that the semantic information of the nodes in the knowledge graph is captured better and the propagation behavior of the model is easier to explain.
The user interaction module provides a user interface that allows a user to interact with the system, receives the user's input, and presents the analyzed information results to the user visually or in other ways.
The Internet low-orbit satellite information mining system of this embodiment of the invention provides a complete end-to-end information processing scheme for the specialized field of Internet low-orbit satellites, with a processing mechanism and information processing flow tailored to that field.
In the information processing flow, a multi-element information acquisition strategy, a multi-element automatic extraction strategy and a knowledge graph construction auxiliary strategy are provided, which meet the high demands that Internet satellite information places on domain expertise and timeliness. Combining the knowledge graph with a recommendation algorithm serves the user's needs better, improves the efficiency of information analysis and assessment on demand, and enables efficient retrieval.
On the data-acquisition side, the keyword frequency is used for adjustment: the SVM decision function is combined with keyword-frequency statistics, the keyword frequency is appended to the feature vector as an additional feature, and a score or probability value is computed for each data point, so that important keyword frequencies are screened.
For Internet low-orbit satellite relationship extraction, a relationship-strength frequency function is introduced into the design of the loss function. The function dynamically adjusts how the loss is calculated according to the frequency and importance of the keywords, so that during training the model pays more attention to relationships involving core activities such as satellite launches, base station construction and links. This improves the precision and focus of the extraction, adapts the model to the specific needs of the Internet low-orbit satellite field, and provides more accurate and useful information to downstream applications.
The KGAT algorithm is improved, and the attention mechanism is realized by adding an information filter layer, so that the information in the knowledge graph can be better acquired.
The second embodiment of the present invention provides an Internet low-orbit satellite information mining method. Referring to FIGS. 5-6, which show the flow of the method, the method comprises the following steps:
S100, taking Internet websites as information sources, setting corresponding regular-expression matching templates according to the data organization of each information source website, automatically extracting the information data of the information sources based on Selenium and Chromedriver, and, after cleaning and screening, entering the data into an information database.
Specifically, different regular-expression matching templates are set for different information sources; when a new information source is added, a corresponding extraction template is designed and imported together with the website. Selenium is then used with Chromedriver to simulate automated browser operation and obtain the web page information, and the information related to Internet low-orbit satellites is extracted with the regular-expression templates. Finally, the obtained content, such as HTML, text and tables, is given a preliminary cleaning and screening by tag and entered into the information database.
The information sources are sites carrying information related to low-orbit satellites, such as the official Starlink, OneWeb and Telesat websites, Baidu and Google. The data of these websites is organized differently and is difficult to extract in a single uniform way, so regular expressions are adopted as matching templates to extract the required information. Regular expressions are a powerful text processing tool that can search for, match and replace specific patterns in text. The key feature of the strategy is its extensibility: users can design different regular-expression matching templates for different information sources, and when a new information source is added, the user can design a new extraction template for it and import the template into the system together with the website.
Selenium is a tool for automated testing of web applications, and Chromedriver is the WebDriver implementation for the Chrome browser. With Selenium and Chromedriver, the system can simulate the browser operations of a real user, such as opening a web page, clicking a button or filling in a form, and thereby obtain dynamically generated web page information.
The multi-element information collection step combines regular expression matching, automatic browser operation and data cleaning and screening, and effectively and accurately extracts information data related to the Internet low-orbit satellite from various information sources.
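The per-source template idea can be sketched with the standard `re` module. The sketch assumes the page HTML has already been fetched (in the described system it would come from the Selenium-driven browser, for example from the driver's `page_source` attribute); the source names, the HTML layouts, the relevance terms and the helper names `extract` and `register_template` are all hypothetical.

```python
import re

# One template per information source; named groups mark the fields.
TEMPLATES = {
    "example-news": re.compile(
        r'<h2 class="title">(?P<title>.*?)</h2>\s*<p>(?P<body>.*?)</p>',
        re.S,
    ),
}
TAG = re.compile(r"<[^>]+>")  # used to strip leftover markup
LEO_TERMS = re.compile(r"starlink|oneweb|telesat|low[- ]orbit", re.I)


def register_template(source, pattern):
    """Add a template when a new information source is introduced."""
    TEMPLATES[source] = re.compile(pattern, re.S)


def extract(source, html):
    """Apply the source's template, clean the fields, keep LEO items."""
    records = []
    for match in TEMPLATES[source].finditer(html):
        title = TAG.sub("", match.group("title")).strip()
        body = " ".join(TAG.sub("", match.group("body")).split())
        if LEO_TERMS.search(title + " " + body):
            records.append({"source": source, "title": title, "body": body})
    return records


page = (
    '<h2 class="title">Starlink <b>launch</b></h2>\n'
    "<p>Sixty  satellites\n deployed.</p>"
    '<h2 class="title">Bakery news</h2><p>Fresh bread.</p>'
)
items = extract("example-news", page)
```

Adding a source then only requires a call to `register_template` with a pattern that exposes the same named groups, which reflects the extensibility described above.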
S200, enhancing, through the SVM combined with keyword frequency, the weight of the corresponding keywords in the information data of the information database, and extracting Internet low-orbit satellite entities by labeling with the word embedding technique and the BiLSTM-CRF model.
Specifically, the Internet low-orbit satellite entity identification step applies SVM screening to the automatically collected multi-element information and boosts the weight of keywords such as "Starlink". Part of the data is then labeled manually to ensure the accuracy of the keywords; manual labeling is used to verify and correct the selection and weight setting of keywords, which ensures the quality and accuracy of the data.
After the different keywords have been extracted, the data is labeled using the word embedding technique and the BiLSTM-CRF data labeling model. Finally, the named entities with the highest probability of being related to Internet low-orbit satellites are computed and extracted.
The SVM is a supervised learning model used to filter out information unrelated to low-orbit satellites, or to perform a preliminary classification of the information, and to boost the weight of keywords such as "Starlink" so that entities related to low-orbit satellites can be identified more accurately in the subsequent entity recognition. On the data-acquisition side, because of the particular nature of the satellite Internet, the collected data must be screened automatically once the crawler has finished its work. An SVM is used for this screening, and because the crawled data may contain noise, a slack variable is added to the objective function:
The value of C is adjusted according to the keyword frequencies of the corpus obtained by the crawler. Important keyword frequencies are screened by combining the decision function of the SVM classifier with keyword-frequency statistics: the keyword frequency is appended to the feature vector as an additional feature, and the decision function of the SVM classifier is then used to compute a score or probability value for each data point.
According to the keyword frequency adjustment, the keyword frequency is added into the feature vector as an additional feature by combining an SVM decision function and a keyword frequency statistical method, and the score or probability value of each data point is calculated, so that the screening of the important keyword frequency is realized.
Word embedding techniques are techniques that convert each word in text into a vector representation of a fixed dimension, such that semantic relationships between words can be captured in vector space.
The BiLSTM-CRF data labeling model combines a bidirectional long short-term memory network (BiLSTM) with a conditional random field (CRF). The BiLSTM captures the context information in the text; the CRF handles the sequence labeling task and ensures that the dependencies between predicted labels are respected. The two are combined to encode the embedded feature vectors and label the data. The formula of the CRF model is as follows:
Here f_j(·) is a feature function and λ_j is the weight of the feature function f_j. f_j(·) takes four arguments: s, the sequence to be labeled (a sentence); i, the position of the word in the sequence; l_i, the label at position i of the sequence to be labeled; and l_{i-1}, the label at position i-1 of the sequence to be labeled.
Finally, the label sequence with the highest probability is computed and taken as the output.
The Internet low-orbit satellite entity identification step adds an SVM mechanism on top of named entity recognition and improves the corresponding functions, which makes it better suited to the high level of domain expertise that Internet constellations demand.
S300, extracting relations among named entities through word embedding technology and BiLSTM model based on the Internet low-orbit satellite entities and summarizing the Internet low-orbit satellite relations. The method further comprises the following steps:
S301, adding entity relationships specific to Internet low-orbit satellites by extracting subject-object relationships.
S302, determining the entity relationship strength according to the word frequency of the professional vocabulary suitable for the Internet low-orbit satellite.
Specifically, the low-orbit satellite relationship extraction step is carried out after the low-orbit satellite entity extraction task is completed: the relationships among the entities are found in the text by technical means, and the semantic relationships are generalized into entity relationships. In the relationship extraction model, relationships specific to the satellite Internet are added: words such as "launch", "construction" and "link" are given focused extraction, subject-object relationships are extracted, and the relationship strength is determined by frequency. In the design of the loss function, frequency weights are added as correlation coefficients. Word2Vec is used as the word embedding tool and BiLSTM as the main model.
Specifically, relationships specific to the satellite Internet are added by extracting the subject-object relationships in the text, that is, the active and passive relationships between entities, such as "a certain organization launches a certain satellite". The frequency with which a relationship appears in the text is taken into account to determine its strength; for example, if the word "launch" appears frequently in the text, the relationship may be regarded as a strong one. On this basis, a loss function must be designed when the relationship extraction model is trained, and the frequency weight can be added to it as a correlation coefficient so that the model better learns the connection between frequency and relationship strength.
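Deriving relationship strength from frequency, as in steps S301 and S302, can be illustrated with a short sketch. It assumes the subject-predicate-object triples have already been extracted from the text, and it normalizes counts by the most frequent triple; both the normalization and the function name `relation_strength` are assumptions made for illustration.

```python
from collections import Counter


def relation_strength(triples):
    """Map each (subject, predicate, object) triple to a 0-1 strength.

    Strength is the triple's count divided by the count of the most
    frequent triple, so the most frequent relation scores 1.0.
    """
    counts = Counter(triples)
    if not counts:
        return {}
    top = max(counts.values())
    return {triple: n / top for triple, n in counts.items()}


extracted = [
    ("SpaceX", "launch", "Starlink"),
    ("SpaceX", "launch", "Starlink"),
    ("OneWeb", "build", "ground station"),
    ("SpaceX", "launch", "Starlink"),
    ("OneWeb", "build", "ground station"),
    ("Telesat", "link", "Lightspeed"),
]
strengths = relation_strength(extracted)
```

The resulting values could be stored as edge weights in the entity relationship graph or used as the frequency weights that enter the loss function.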
For Internet low-orbit satellite relationship extraction, in addition to conventional extraction, a relationship-strength frequency function is added for terms such as satellite launch, base station construction and link. The loss function is designed as:
loss=w1*sum(match_scores)+w2*sum(keywords_importances)
Here match_scores is an array holding, for each keyword, a score for how well it matches the information content, which can be calculated from indicators such as keyword similarity and relevance; keywords_importances is an array holding the importance score of each keyword, which can be determined from factors such as the frequency with which keywords like "launch", "carry" and "build" occur in the information and their semantic importance; the weights w1 and w2 can be adjusted to the specific application scenario and goal to balance the matching degree against the keyword importance, and the weights of keywords such as "launch", "carry" and "build" are set correspondingly higher.
In a specific implementation, Word2Vec is used as the word embedding tool and BiLSTM as the main model. Word2Vec is a commonly used word embedding tool that converts the words in a text into fixed-dimension vector representations so that a model can capture the semantic relationships between words. The relationship extraction model differs from the named entity recognition model in that the CRF layer following the BiLSTM network layer is replaced by an Attention layer. In the named entity recognition task BiLSTM is often used together with a CRF layer: the BiLSTM captures the context information in the sequence and generates a context representation for each word, and the CRF layer then uses these representations to predict the label of each word while ensuring that the label sequence is globally optimal. In the relationship extraction task, BiLSTM is still used to capture context information, but the emphasis of the model is on discovering relationships between entities rather than predicting one label per word. The attention mechanism can be seen as a weight distribution mechanism that lets the model pay more attention to the words related to a particular relationship when predicting that relationship. Using an attention layer after the BiLSTM instead of the CRF layer lets the model focus on discovering relationships between entities; the attention mechanism allows the model to process input sequences more flexibly and to give higher weight to the relevant words when predicting relationships.
S400, configuring, based on TF-IDF, the weights of the keywords corresponding to key Internet low-orbit satellite events, performing topic mining with an LDA model to obtain topic probability distributions, converting the topic probability distributions into semantic vectors, computing the semantic distances between documents based on JSD, performing topic clustering with the K-means algorithm, and extracting Internet low-orbit satellite events.
Specifically, during TF-IDF keyword extraction the weights of such keywords are set manually, which ensures that important events are highlighted. The association of related topic events is strengthened through weight processing, topic mining and topic clustering. TF-IDF is a common information retrieval and text mining technique used to evaluate the importance of a word in a document or corpus. LDA (latent Dirichlet allocation) is a topic model for finding hidden topics in a collection of documents. JSD (Jensen-Shannon divergence) is a measure for comparing the similarity or difference of two probability distributions; in this embodiment it is used to compute the semantic distance between two documents, that is, the similarity or difference of their topic probability distributions. The smaller the JSD value, the more similar the semantics of the two documents; the larger the JSD value, the greater their semantic difference. The K-means algorithm is a common clustering algorithm that divides a data set into K clusters; similar documents are placed in the same cluster, forming different topic categories.
Through weight processing, topic mining and topic clustering, this step strengthens the association between related topic events of Internet low-orbit satellites. This gives users a more comprehensive and deeper analysis, and ensures that information related to key events such as satellite launches and base station construction is highlighted and associated, so that users receive more accurate and useful information.
S500, constructing a triple set of Internet low-orbit satellite entities, Internet low-orbit satellite relationships and Internet low-orbit satellite events, and importing it into a KGAT recommendation model to construct the Internet low-orbit satellite knowledge graph.
In the Internet low-orbit satellite knowledge graph construction step, the KGAT (Knowledge Graph Attention Network) recommendation model is a graph attention network model based on a knowledge graph and is used for recommender systems and other knowledge-graph-related tasks. It captures the importance of the entities and relationships in the knowledge graph through an attention mechanism and makes recommendations or predictions accordingly.
S600, carrying out information analysis according to the interaction information with the user based on the Internet low-orbit satellite knowledge graph.
The information analysis step can be visual information analysis, and a graphical interface can be used for displaying entities and relations in the knowledge graph, so that a user is allowed to browse and inquire the knowledge graph in a dragging, clicking and other modes; the system can also provide functions of searching, filtering and the like, and help users to quickly find out accurate information.
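The search and filter functions mentioned above can be illustrated on a knowledge graph stored as plain triples. This is a minimal sketch under assumptions: the in-memory list of triples, the function names `query` and `neighbours` and the sample facts are hypothetical, and a deployed system would more likely sit on a graph database behind the graphical interface.

```python
def query(triples, head=None, relation=None, tail=None):
    """Return the triples matching every field that is not None."""
    return [
        (h, r, t)
        for h, r, t in triples
        if head in (None, h) and relation in (None, r) and tail in (None, t)
    ]


def neighbours(triples, entity):
    """Entities directly linked to `entity`, in either direction."""
    linked = {t for h, _, t in triples if h == entity}
    linked |= {h for h, _, t in triples if t == entity}
    return sorted(linked)


# Hypothetical sample facts for illustration only.
graph = [
    ("SpaceX", "launch", "Starlink"),
    ("OneWeb", "build", "ground station"),
    ("Starlink", "provide", "broadband"),
]
launches = query(graph, relation="launch")
around_starlink = neighbours(graph, "Starlink")
```

A click on a node in the interface would map to a call such as `neighbours`, and a search box with filters would map to `query` with the chosen fields filled in.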
The Internet low-orbit satellite information mining method provided by the embodiment of the invention can automatically collect and process the low-orbit satellite data, establish a knowledge graph and provide intelligent information analysis and utilization functions. The technical terms, specific association and complex relations contained in the low-orbit satellite information data can be better represented and inferred in the general knowledge graph. In addition, the method can deeply mine and utilize the information contained in the map, so that the information in the map can be further analyzed for a user, and deeper association, prediction or interpretation is provided to meet the requirements of the user in decision making and analysis.
The third embodiment of the invention provides another Internet low-orbit satellite information mining method which, on the basis of the method of the second embodiment, further improves the KGAT recommendation model in the Internet low-orbit satellite knowledge graph construction step so as to better suit the specialized and particular nature of the Internet low-orbit satellite field.
The method for mining information of the internet low-orbit satellite in the embodiment is the same as the method in the second embodiment of the present invention, and the differences are that:
A triple set of Internet low-orbit satellite entities, Internet low-orbit satellite relationships and Internet low-orbit satellite events is constructed and imported into a KGAT recommendation model equipped with the improved attention mechanism and relationship filtering layer of the KGAT-I algorithm, to construct the Internet low-orbit satellite knowledge graph.
Specifically, in this step a KGAT-I recommendation algorithm with information filtering is proposed on the basis of the KGAT algorithm and applied to the Internet low-orbit satellite information mining system. The conventional attention mechanism is modified and a relationship filtering layer is added; implementing the attention mechanism with an added information filtering layer allows the information in the knowledge graph to be acquired more effectively.
The KGAT-I algorithm measures the similarity between two nodes by improving an attention score strategy, and can use euclidean distance, and due to the difference between the nodes and the offset of vectors between the nodes, the distance factor between the nodes needs to be considered, and the attention score pi (h, r, t) is specifically realized by the following formula:
Where w r denotes the projection matrix from entity space to relationship space, Representing the euclidean distance between two nodes under the relationship. The improved attention embedding mechanism simultaneously considers the difference between the nodes and the offset of the vector, so that a higher attention score is provided between two nodes which are closer to each other in the relation space r, node embedding is optimized, and the loss of information contained in the entity is effectively reduced.
KGAT-I recommended algorithm adds an information filtering layer after the attention is embedded into the propagation layer, nodes with lower attention scores can be filtered out through the limitation of a threshold value, and the normalization of the attention scores of the filtered triplet sets adopts a softmax function, namely:
wherein, pi (h, r, t) represents the original attention scoring mechanism, Alpha is a set threshold value, and the new scoring mechanism corrects the original scoring mechanism to obtain better characteristic representation through filtering.
Meanwhile, because the Internet low-orbit satellite information mining system places high demands on proper nouns, the scores of proper-noun entities in the knowledge graph should be boosted when the attention mechanism is introduced. This makes the attention scoring more effective and less sensitive to noise, so that the semantic information of the nodes in the knowledge graph is captured better and the propagation behavior of the model is easier to explain.
The Internet low-orbit satellite information mining method provides a full-link information processing scheme for the Internet low-orbit satellite field. It combines a knowledge graph with a recommendation algorithm, which improves the efficiency of information analysis and judgment and enables efficient retrieval. For data acquisition, the system screens out important keywords using an SVM decision function together with keyword frequency statistics, ensuring the accuracy and relevance of the information. For relation extraction, the system introduces a relation-strength frequency function that dynamically adjusts how the loss is computed, improving the precision and relevance of the extracted relations. In addition, the system improves the KGAT algorithm by adding an information filtering layer to the attention mechanism, so that useful information is better obtained from the knowledge graph. Overall, the information mining method of this embodiment provides an in-depth and accurate information processing and analysis solution for the Internet low-orbit satellite field and meets the field's requirements for professional and timely information.
To solve the above technical problems, a fourth embodiment of the present invention provides an Internet low-orbit satellite information mining device comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the Internet low-orbit satellite information mining method described above.
The Internet low-orbit satellite information mining device of this embodiment implements the flow of the Internet low-orbit satellite information mining method and has the same beneficial effects as that method, which are not repeated here.
A fifth embodiment of the present invention provides a computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the Internet low-orbit satellite information mining method described above.
The computer storage medium of this embodiment implements the flow of the Internet low-orbit satellite information mining method and has the same beneficial effects as that method, which are not repeated here.
While the invention has been described in connection with specific embodiments and the accompanying drawings, it is to be understood that the invention is not limited thereto, and that variations within its spirit and scope are included.
Claims (10)
1. An Internet low-orbit satellite information mining method, characterized by comprising the following steps:
taking Internet websites as information sources, setting corresponding regular-expression matching templates for the data organization forms of the different information source websites, automatically extracting the information data of the information sources based on Selenium and ChromeDriver, cleaning and screening the data, and entering it into an information database;
reinforcing the weights of the corresponding keywords in the information data of the information database by means of an SVM combined with keyword frequency, performing labeling through a word embedding technique and a BiLSTM-CRF model, and extracting Internet low-orbit satellite entities;
based on the Internet low-orbit satellite entities, extracting the relationships between entities through a word embedding technique and a BiLSTM model, and generalizing them into Internet low-orbit satellite relationships;
configuring, based on TF-IDF, the weights of the keywords corresponding to key Internet low-orbit satellite events, performing topic mining with an LDA model to obtain topic probability distributions, converting the topic probability distributions into semantic vectors, calculating the semantic distances between documents based on JSD, performing topic clustering with the K-means algorithm, and extracting Internet low-orbit satellite events;
constructing a triplet set of Internet low-orbit satellite entities, Internet low-orbit satellite relationships and Internet low-orbit satellite events, and importing it into a KGAT recommendation model to construct an Internet low-orbit satellite knowledge graph;
and performing information analysis, based on the Internet low-orbit satellite knowledge graph, according to the interaction information with the user.
2. The method of claim 1, further comprising, before the step of extracting the relationships between entities through a word embedding technique and a BiLSTM model, the step of:
adding entity relationships suited to Internet low-orbit satellites by extracting subject-object relationships.
3. The method of claim 1, further comprising, before the step of generalizing into Internet low-orbit satellite relationships, the step of:
determining the entity relationship strength according to the word frequency of the specialized vocabulary suited to Internet low-orbit satellites.
4. The Internet low-orbit satellite information mining method according to any one of claims 1 to 3, characterized in that
the KGAT recommendation model is configured with an attention mechanism improved by the KGAT-I algorithm;
the KGAT-I algorithm uses the Euclidean distance to measure the similarity between two nodes and takes the distance between the nodes into account; the attention mechanism considers both the difference between the nodes and the offset of the vectors, so that two nodes that are closer to each other in the relation space r receive a higher attention score, and the attention score π(h, r, t) is computed by the following formula:
where W_r represents the projection matrix from the entity space to the relation space;
and the distance term represents the Euclidean distance between the two nodes under relation r.
5. The Internet low-orbit satellite information mining method according to claim 4, characterized in that
an information filtering layer is configured after the attentive embedding propagation layer, and the information filtering layer filters out the Internet low-orbit satellite entity, Internet low-orbit satellite relationship and Internet low-orbit satellite event triplets whose attention scores are lower than a set threshold.
6. An Internet low-orbit satellite information mining system, comprising:
an acquisition module, configured to take Internet websites as information sources, set corresponding regular-expression matching templates for the data organization forms of the different information source websites, automatically extract the information data of the information sources based on Selenium and ChromeDriver, clean and screen the data, and enter it into an information database;
a knowledge graph construction module, configured to: reinforce the weights of the corresponding keywords in the information data of the information database by means of an SVM (support vector machine) combined with keyword frequency, perform labeling through a word embedding technique and a BiLSTM-CRF model, and extract Internet low-orbit satellite entities; based on the Internet low-orbit satellite entities, extract the relationships between entities through a word embedding technique and a BiLSTM model, and generalize them into Internet low-orbit satellite relationships; configure, based on TF-IDF, the weights of the keywords corresponding to key Internet low-orbit satellite events, perform topic mining with an LDA model to obtain topic probability distributions, convert the topic probability distributions into semantic vectors, calculate the semantic distances between documents based on JSD, perform topic clustering with the K-means algorithm, and extract Internet low-orbit satellite events; and construct a triplet set of Internet low-orbit satellite entities, Internet low-orbit satellite relationships and Internet low-orbit satellite events, and import it into a KGAT recommendation model to construct an Internet low-orbit satellite knowledge graph;
and a user interaction module, configured to provide a user interface that allows a user to interact with the system, receive the user's input, and display the analyzed information results to the user.
7. The Internet low-orbit satellite information mining system according to claim 6, characterized in that
the knowledge graph construction module comprises a KGAT-I module, and the KGAT-I module is configured with an attention mechanism improved by the KGAT-I algorithm;
the KGAT-I algorithm uses the Euclidean distance to measure the similarity between two nodes and takes the distance between the nodes into account; the attention mechanism considers both the difference between the nodes and the offset of the vectors, so that two nodes that are closer to each other in the relation space r receive a higher attention score, and the attention score is computed by the following formula:
where W_r represents the projection matrix from the entity space to the relation space;
and the distance term represents the Euclidean distance between the two nodes under relation r.
8. The Internet low-orbit satellite information mining system according to claim 7, characterized in that
the KGAT-I module comprises an information filtering layer configured after the attentive embedding propagation layer, wherein the information filtering layer filters out the Internet low-orbit satellite entity, Internet low-orbit satellite relationship and Internet low-orbit satellite event triplets whose attention scores are lower than a set threshold.
9. An Internet low-orbit satellite information mining device, characterized in that the device comprises: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the Internet low-orbit satellite information mining method according to any one of claims 1 to 5.
10. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the Internet low-orbit satellite information mining method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410256745.8A CN118260717A (en) | 2024-03-07 | 2024-03-07 | Internet low-orbit satellite information mining method, system, device and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410256745.8A CN118260717A (en) | 2024-03-07 | 2024-03-07 | Internet low-orbit satellite information mining method, system, device and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118260717A (en) | 2024-06-28 |
Family
ID=91612117
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410256745.8A Pending CN118260717A (en) | 2024-03-07 | 2024-03-07 | Internet low-orbit satellite information mining method, system, device and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118260717A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118585657A (en) * | 2024-08-01 | 2024-09-03 | 之江实验室 | Event prediction method based on satellite orbit threat domain knowledge graph |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111737495B (en) | Middle-high-end talent intelligent recommendation system and method based on domain self-classification | |
US11500818B2 (en) | Method and system for large scale data curation | |
CN117271767B (en) | Operation and maintenance knowledge base establishing method based on multiple intelligent agents | |
CN112966091B (en) | Knowledge map recommendation system fusing entity information and heat | |
CN102663129A (en) | Medical field deep question and answer method and medical retrieval system | |
Zhu et al. | A learning to rank framework for developer recommendation in software crowdsourcing | |
Miao et al. | A dynamic financial knowledge graph based on reinforcement learning and transfer learning | |
Gao et al. | Preference-based interactive multi-document summarisation | |
CN118260717A (en) | Internet low-orbit satellite information mining method, system, device and medium | |
CN112508743B (en) | Technology transfer office general information interaction method, terminal and medium | |
Das et al. | A CV parser model using entity extraction process and big data tools | |
Quan et al. | An improved accurate classification method for online education resources based on support vector machine (SVM): Algorithm and experiment | |
CN113157859A (en) | Event detection method based on upper concept information | |
CN117909466A (en) | Domain question-answering system, construction method, electronic device and storage medium | |
CN116186381A (en) | Intelligent retrieval recommendation method and system | |
CN115269816A (en) | Core personnel mining method and device based on information processing method and storage medium | |
Khatter et al. | Content curation algorithm on blog posts using hybrid computing | |
Tallapragada et al. | Improved Resume Parsing based on Contextual Meaning Extraction using BERT | |
Li | Research on extraction of useful tourism online reviews based on multimodal feature fusion | |
Pichiyan et al. | Web scraping using natural language processing: exploiting unstructured text for data extraction and analysis | |
Viegas et al. | Semantic Academic Profiler (SAP): a framework for researcher assessment based on semantic topic modeling | |
CN113610626A (en) | Bank credit risk identification knowledge graph construction method and device, computer equipment and computer readable storage medium | |
CN112925983A (en) | Recommendation method and system for power grid information | |
Jafari Sadr et al. | Popular tag recommendation by neural network in social media | |
Rybak et al. | Machine learning-enhanced text mining as a support tool for research on climate change: theoretical and technical considerations |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||