WO2024065952A1 - Remote sensing satellite information recommendation method, system and device - Google Patents

A remote sensing satellite information recommendation method, system and device

Info

Publication number
WO2024065952A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
matching
satellite
label
text
Application number
PCT/CN2022/129937
Other languages
English (en)
French (fr)
Inventor
玉龙飞雪
万伟
王冠珠
唐珂
黄涛
王浩天
李辉
刘国栋
乔亦实
闫大鹏
张帅
Original Assignee
中国四维测绘技术有限公司
Application filed by 中国四维测绘技术有限公司
Publication of WO2024065952A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present invention relates to the field of intelligent recommendation technology, and specifically to a method, system architecture, device, equipment and storage medium for recommending remote sensing satellite information.
  • the remote sensing satellite industry is broad, and related information involves multiple links such as remote sensing satellite manufacturing, satellite launch services, ground equipment manufacturing, remote sensing satellite operations, and remote sensing satellite application services.
  • remote sensing satellite information is highly specialized, and obtaining remote sensing satellite intelligence requires professional knowledge.
  • the commonly used recommendation methods are keyword-based recommendation and user behavior-based recommendation.
  • keyword-based retrieval recommendation lacks contextual semantic information and breaks the associations within the data, resulting in poor recall of remote sensing satellite information and low information utilization; recommendation algorithms based on user behavior, meanwhile, are often hampered by the sparsity of user-item behavior data.
  • the cold start problem when recommending for new users or new items also leads to poor recommendation results.
  • the knowledge graph is essentially a semantic network that contains a large number of relationships between entities.
  • the present invention provides an intelligent recommendation method, system and device for remote sensing satellite information.
  • the solution of the present invention is: a remote sensing satellite information recommendation method, comprising:
  • the stored satellite information is preliminarily recalled according to the input query content;
  • information related to the recalled results is mined, based on the constructed satellite knowledge graph, to form the recommendation candidate information set;
  • the candidate information in the above-mentioned recommended candidate information set is evaluated for relevance, and a final recommendation is made based on the evaluation results.
  • the preliminary processing of the system includes:
  • multi-threaded crawling of news websites, remote sensing satellite operator websites, and the websites of national government space agencies is performed to obtain the raw text of satellite information;
  • the original text information is processed through extraction, conversion, cleaning and loading processes, and the information text is organized into a preset data storage structure and stored in the above-mentioned retrieval engine.
  • the system uses the ES (Elasticsearch) full-text search engine as its search engine, deployed as a cluster.
  • obtaining satellite information with labels and confidence levels includes:
  • the intelligent label matching model is used to perform intelligent label matching on the stored information text to obtain satellite information with labels and confidence levels.
  • the configurable intelligent tag matching model of the system includes three parts: tag matching strategy, tag matching mode, and result confidence calculation mode;
  • the tag matching strategy stores a plurality of pre-designed matching strategies
  • the tag matching mode is used to select different matching strategies from the tag matching strategies to form a single strategy mode or a combined matching mode according to the requirements;
  • the result confidence calculation mode stores different confidence calculation modes.
  • the matching strategies of the system include regular label matching strategies, text similarity label matching strategies, and deep learning label matching strategies;
  • the regular label matching strategy is applicable to label matching consisting of professional phrases or fixed expressions in the field of remote sensing satellites;
  • the text similarity tag matching strategy uses the text similarity between the information and a standard expression to determine whether the tag is matched.
  • the standard expression serves as a reference standard: a generalized description designed from the tag's features.
  • the deep learning label matching strategy uses information text as input and label matching score as output for network training; if the score output by the network exceeds a set threshold, the label is considered to be matched and the score is used as the confidence level.
  • the system stores the configuration of the smart tag matching model in the MySQL database using a maximally general table design, which preserves the open-closed principle of the model and lets matching strategies be iterated flexibly and quickly.
  • the general-purpose table design of the system covers the following stages:
  • design verification stage: fewer than 500 satellite information texts; the matching strategy and label matching mode is the single strategy mode, using either the regular matching strategy or the text similarity matching strategy; the result confidence calculation mode uses the confidence of the single strategy directly as the model's final confidence;
  • development and implementation stage: 500-2000 satellite information texts; the matching strategy and label matching mode is a combined strategy mode of regular matching + deep learning label matching, or of text similarity + deep learning label matching; the result confidence calculation mode takes the average of the strategies' confidences as the model's final confidence;
  • trial operation stage: the matching strategy and label matching mode is a combined strategy mode of regular matching + deep learning label matching, or of text similarity + deep learning label matching; the result confidence calculation mode takes the weighted average of the strategies' confidences as the model's final confidence;
  • operation stage: the matching strategy and label matching mode is the single strategy mode of the deep learning label matching strategy; the result confidence calculation mode uses the confidence of that single strategy directly as the model's final confidence.
  • the system performs manual verification on the matching results, wherein the design verification stage and the development implementation stage are both assisted by manual full verification; the trial operation stage is assisted by manual regular sampling verification; and the operation stage is assisted by manual irregular sampling verification.
  • the deep learning network is iteratively updated each time the amount of information grows by a preset increment, with 10-20 updates over the entire development and implementation stage.
  • the system deep learning network consists of an embedding layer, 2 bidirectional LSTM layers, and 3 fully connected layers; the network is trained with information text as input and label matching scores as output; if the score output by the network exceeds a set threshold, the label is considered to be matched and the score is used as the confidence level.
  • constructing the satellite knowledge graph includes:
  • each label in the satellite information label library is stored in the graph database as a label entity;
  • the entities, relationships, and attributes extracted and integrated above are stored in batches into the graph database to complete the construction of the knowledge graph in the field of remote sensing satellite knowledge.
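The batch storage step can be sketched for Neo4j, the graph database listed for the data layer. This is a minimal sketch under assumptions: the node labels Information and Label, the relationship type HAS_LABEL, the confidence property and the batch size are illustrative names rather than the patent's schema, and only information-label edges are shown (other extracted entities would be merged the same way).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical graph schema: (:Information)-[:HAS_LABEL {confidence}]->(:Label)
BATCH_MERGE_QUERY = """
UNWIND $rows AS row
MERGE (a:Information {id: row.info_id})
MERGE (t:Label {name: row.label})
MERGE (a)-[r:HAS_LABEL]->(t)
SET r.confidence = row.confidence
"""


@dataclass
class LabeledInfo:
    info_id: str
    labels: Dict[str, float]  # label name -> matching confidence


def build_batches(items: List[LabeledInfo], batch_size: int = 1000) -> List[Tuple[str, dict]]:
    """Flatten labeled information into rows and split them into UNWIND batches."""
    rows = [
        {"info_id": item.info_id, "label": label, "confidence": conf}
        for item in items
        for label, conf in item.labels.items()
    ]
    return [
        (BATCH_MERGE_QUERY, {"rows": rows[i:i + batch_size]})
        for i in range(0, len(rows), batch_size)
    ]
```

With the official neo4j Python driver, each (query, parameters) pair can be executed with session.run(query, parameters) inside a session; UNWIND keeps it to one round trip per batch, and MERGE makes reloading the same information idempotent.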
  • the system performs relevance evaluation on the above-mentioned candidate information by constructing a graph-enhanced semantic analysis network.
  • the graph-enhanced semantic analysis network of the system includes a context feature analysis network, a label feature analysis network, and a graph feature analysis network;
  • the context feature analysis network is used to abstract and analyze the semantic feature information of the information text itself
  • the label feature analysis network is used to extract, abstract and analyze label feature information
  • the graph feature analysis network is used to extract, abstract and analyze graph feature information.
  • the relevance evaluation of the system includes:
  • the context feature analysis network randomly samples sentences from the information text to generate a set of texts to be analyzed, maps them to sentence vectors in a low-dimensional space to form a context feature matrix, and further abstracts text context features from that matrix;
  • the label entities of the information text are vectorized and multiplied by the labels' confidences to form a label feature matrix, from which features are further abstracted;
  • the non-label entities of the information text serve as carriers of deeper, hidden associated information: they are mapped to a low-dimensional space as entity/relation vectors to form a graph embedding feature matrix, from which graph features are further extracted and abstracted;
  • the two texts compared are the current information text and each candidate information text in the recommendation candidate information set, where the current information text is the information text the user clicked among the initially recalled satellite information.
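The three input matrices described above can be assembled roughly as follows. This NumPy sketch only builds the inputs; the networks that abstract features from them are not specified in enough detail to reproduce. The sentence encoder (embed_sentence), the label vectors and the pre-trained entity embeddings (for example from a TransE-style model) are assumptions supplied by the caller.

```python
import numpy as np


def context_feature_matrix(sentences, embed_sentence, n_samples, rng):
    """Randomly sample sentences and stack their vectors, one row per sentence."""
    k = min(n_samples, len(sentences))
    picked = rng.choice(len(sentences), size=k, replace=False)
    return np.stack([embed_sentence(sentences[i]) for i in picked])


def label_feature_matrix(label_vectors, confidences):
    """Scale each label-entity vector (one row per label) by its matching confidence."""
    return np.asarray(label_vectors) * np.asarray(confidences)[:, None]


def graph_feature_matrix(entities, entity_embeddings):
    """Stack pre-trained embeddings of the text's non-label entities."""
    return np.stack([entity_embeddings[e] for e in entities])
```

The same three functions would be applied to the current information text and to each candidate text before the pair is scored.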
  • a remote sensing satellite information recommendation system including a satellite information collection and storage module, a satellite information label intelligent matching module, a remote sensing satellite field knowledge graph construction module, and a satellite information query recommendation module;
  • the satellite information collection and storage module is used to collect satellite information text and perform preliminary processing, organize the information text into a preset data storage structure and store it;
  • the satellite information label intelligent matching module is used to perform intelligent label matching on the stored information text to obtain satellite information with labels and confidence levels;
  • the remote sensing satellite field knowledge graph construction module is used to construct a satellite knowledge graph using the satellite information with labels and confidence levels.
  • the satellite information query and recommendation module preliminarily recalls the satellite information stored in the satellite information collection and storage module according to the input query content; mines information related to the recalled results, based on the constructed satellite knowledge graph, as the recommendation candidate information set; evaluates the relevance of the candidates in that set; and makes final recommendations based on the evaluation results.
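The query and recommendation flow of this module can be outlined in plain Python. This is a toy sketch under stated assumptions: recall is a substring match standing in for the ES query, candidate mining follows shared label entities one hop through the graph, and the ranking function is a simple confidence-weighted label overlap used as a placeholder for the graph-enhanced semantic analysis network, not a reproduction of it.

```python
from collections import defaultdict


def recall(query, docs):
    """Preliminary recall: ids of documents containing every query keyword (ES stand-in)."""
    terms = query.lower().split()
    return [doc_id for doc_id, text in docs.items()
            if all(term in text.lower() for term in terms)]


def mine_candidates(recalled, doc_labels):
    """One-hop expansion: documents sharing a label entity with any recalled document."""
    label_to_docs = defaultdict(set)
    for doc_id, labels in doc_labels.items():
        for label in labels:
            label_to_docs[label].add(doc_id)
    candidates = set()
    for doc_id in recalled:
        for label in doc_labels.get(doc_id, {}):
            candidates |= label_to_docs[label]
    # Recalled documents are already shown to the user, so they are not re-recommended.
    return candidates - set(recalled)


def rank(current, candidates, doc_labels, top_k=5):
    """Placeholder relevance: confidence-weighted overlap of label entities."""
    cur = doc_labels.get(current, {})

    def score(cand):
        other = doc_labels.get(cand, {})
        return sum(cur[label] * other[label] for label in cur.keys() & other.keys())

    return sorted(candidates, key=lambda c: (-score(c), c))[:top_k]
```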
  • the system further comprises an information retrieval interface, through which the user performs tag screening, keyword search or time range search, and the search content is input into the satellite information query recommendation module.
  • the system further comprises an information details interface, through which the user views the information details, the information viewed by the user serves as the current information text, and the information details interface simultaneously returns the recommendation results of the current information document.
  • the system further comprises a satellite information statistical report generation module and a statistical report management interface;
  • the user uploads, modifies, deletes, or queries the customized statistical report templates and historically generated statistical reports through the statistical report management interface; the satellite information statistical report generation module generates statistical reports according to the statistical report templates.
  • a remote sensing satellite information recommendation device comprising a client and a server
  • the client is provided with an information retrieval interface, an information details interface, and a statistical report management interface; and interacts with the server through the interface;
  • the user performs tag screening, keyword search or time range search through the information retrieval interface, and the query content is sent to the server as a search request form;
  • the user views the information details through the information details interface and sends an information details request form to the server; the information details interface also returns the recommendation results of the current information document sent by the server.
  • the server runs the method described above and responds to the interface input of the client.
  • the client is a WEB browser
  • the server is composed of a service layer and a data layer.
  • the service layer runs the method described above;
  • the data layer includes a data storage component and a server where the component is deployed.
  • the present invention proposes a remote sensing satellite information recommendation method and system architecture based on tags and knowledge graphs, providing a solution for users to accurately obtain satellite remote sensing information from massive Internet information.
  • the method does not require a model of users' historical behavioral preference data.
  • by constructing a tag library and an intelligent tag matching strategy, a knowledge graph of the satellite remote sensing field is constructed at the same time.
  • information items and information tags are each stored as entity types in the graph, strengthening the latent semantic associations between tags. Users perform preliminary filtering of information of interest by searching or filtering by tags.
  • the system finds information that users may be interested in through the entity association information in the graph and recommends it, and evaluates the relevance of the information through a graph-enhanced semantic analysis network, making the recommendation results richer and more accurate.
  • the present invention provides a full-process system architecture for information query and recommendation, and information association analysis.
  • FIG. 1 is a schematic diagram of the framework of the system of the present invention.
  • FIG. 2 is a flow chart of the smart label matching module of the present invention.
  • FIG. 3 is a flow chart of knowledge graph creation of the present invention.
  • FIG. 4 is a schematic diagram of a remote sensing satellite knowledge graph model of the present invention.
  • FIG. 5 is a schematic diagram of the graph-enhanced semantic analysis network structure of the present invention.
  • FIG. 6 is a schematic diagram of the interaction flow of the remote sensing satellite information recommendation system of the present invention.
  • a remote sensing satellite information recommendation method comprising:
  • the first aspect is efficient satellite information collection and storage, including: collecting public or subscribed satellite information and preprocessing it, including translating the information content, cleaning redundant HTML tags from it, and correcting the text; selecting the information storage component and designing the information storage structure and index structure; and, after cleaning and processing, organizing the information into the designed data storage structure and storing it in batches.
  • the second aspect is intelligent label matching on the satellite information obtained in the first aspect, including: extracting keywords of the remote sensing satellite field from the search engine as labels to form a satellite information label library; and designing and optimizing a flexibly configurable intelligent label matching model to handle information at different stages of project implementation.
  • step (2) implementing smart tag matching with satellite information includes the following steps:
  • the present invention abstracts three label matching strategies, including regular label matching strategy, text similarity label matching strategy, and deep learning label matching strategy.
  • Different matching strategies summarize the matching logic of labels with different characteristics to information text, and can calculate the matching confidence of labels to information text.
  • the present invention uses a database to design a configuration-based label matching model, which realizes flexible plug-in, combination, switching and confidence calculation of different strategies through configuration.
  • at each stage, the label matching model is adjusted to support the differing needs arising from changes in data scale, cost-control considerations, and similar factors.
  • constructing the satellite knowledge graph includes:
  • the entities, relationships, and attributes extracted and integrated above are stored in batches into the graph database to complete the construction of the knowledge graph in the field of remote sensing satellite knowledge.
  • a satellite information recommendation method is created, comprising the following steps:
  • information related to the recalled results is mined, based on the constructed satellite knowledge graph, to form the recommendation candidate information set;
  • the fifth aspect is the interactive design of satellite information query, recommendation and statistical report generation, including but not limited to:
  • information retrieval interface: users can search for relevant information documents by tag filtering, keyword search, or time range;
  • information details interface: users view information details, and the interface also returns recommendation results based on the current information document;
  • statistical report management interface: users upload, modify, delete, and query customized statistical report templates and historically generated statistical reports; users add documents of interest to a statistical report template through the interface, and the generated report is sent asynchronously to the user's mailbox.
  • the present invention also provides a remote sensing satellite information recommendation system, which includes an efficient satellite information collection and storage module, a satellite information label intelligent matching module, a remote sensing satellite field knowledge graph construction module, a satellite information query recommendation module, and a satellite information statistical report generation module.
  • the present invention also provides a remote sensing satellite information recommendation device, which includes a client and a server.
  • functional descriptions shared by the method, system, and device of the present invention apply equally to each, so they are not repeated.
  • the present invention is described in detail below with an example.
  • a remote sensing satellite information recommendation device includes a client and a server.
  • the client is a WEB browser. Users interact with the server through WEB pages using interfaces.
  • the server consists of a service layer and a data layer.
  • the service layer includes WEB backend applications, algorithm service applications, and program-deployed servers.
  • the data layer includes data storage components such as ES (Elasticsearch, full-text retrieval engine), MySQL (relational database), Neo4j (graph database) and Redis (key-value storage system), together with the servers on which these components are deployed.
  • the service layer mainly implements the remote sensing satellite information recommendation system/method described above.
  • the device architecture is shown in Figure 1. The details are as follows:
  • 1. Efficient satellite information collection and storage module:
  • the ES full-text search engine serves as the storage component for satellite information, providing full-text search and fast query capabilities.
  • ES is deployed in a cluster, using an organizational structure of 3 Master nodes and 3 Data nodes to provide fault tolerance and load balancing capabilities, improving system availability and retrieval performance.
  • the original collected data (original information text) is processed through the ETL (extraction, transformation, cleaning, loading) process, which includes cleaning redundant tags, special characters, etc. in the original documents; connecting to translation API interfaces, such as Baidu and Youdao, to translate non-Chinese information and save the translation results; connecting to the error correction API to correct the information content; organizing information into a designed document structure, and calling the ES batch operation API to batch store information data into the database.
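The cleaning and loading part of this ETL flow might look like the following. It is a sketch, not the patent's code: the translate and correct callables stand in for the Baidu/Youdao translation and error-correction APIs, and the index name and document fields (title, content, content_original, source, publish_time) are hypothetical rather than the patent's index design.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")
SPACE_RE = re.compile(r"\s+")


def clean_text(raw):
    """Remove HTML tags, decode entities, drop control characters, collapse whitespace."""
    text = html.unescape(TAG_RE.sub(" ", raw))
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    return SPACE_RE.sub(" ", text).strip()


def to_bulk_actions(records, index, translate=None, correct=None):
    """Yield Elasticsearch bulk actions; translate/correct are caller-supplied API wrappers."""
    for rec in records:
        original = clean_text(rec["content"])
        content = original
        if translate is not None and rec.get("lang", "zh") != "zh":
            content = translate(original)  # non-Chinese text is translated
        if correct is not None:
            content = correct(content)     # error-correction pass
        yield {
            "_index": index,
            "_id": rec["url"],  # URL as document id avoids duplicates on re-crawl
            "_source": {
                "title": clean_text(rec["title"]),
                "content": content,
                "content_original": original,  # translation saved alongside the original
                "source": rec.get("source"),
                "publish_time": rec.get("publish_time"),
            },
        }
```

The generator can be passed directly to helpers.bulk(client, actions) from the elasticsearch Python client, which sends the actions to the cluster in batches.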
  • the main index design of satellite information in ES is as follows:
  • results processed by the above method take the following form (shown as stored in ES):
  • 2. Satellite information intelligent label matching module: the flow chart is shown in Figure 2.
  • (2) Tag library: first, construct a tag library. Using keyword mining tools, extract keywords of the remote sensing satellite field from Baidu Encyclopedia and Wikipedia as tags. Based on satellite domain-specific terms (such as "NASA", indicating organizations and satellite equipment associated with the information), domain expertise (such as "Earth observation" and "ground station"), and information feature words (such as "rocket launch", indicating events associated with the information), construct a tag library suited to satellite information. For each tag, design a regular expression or a normative text expression that characterizes the tag, for use in building the subsequent tag matching strategies.
  • guided by the strategy pattern and following the open-closed principle (closed for modification, open for extension), the present invention proposes a database-based, flexible, ready-to-use label matching model design to support information text label matching requirements that change from situation to situation.
  • the label matching model proposed in the present invention mainly includes three parts: label matching strategy, label matching mode, and result confidence calculation mode.
  • a tag matching strategy is an abstract, structured representation of the logic of a matching algorithm.
  • the implementation logic of a matching strategy is not coupled with the main application logic of the solution; instead it is exposed as an interface or service in the form of a third-party application.
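The strategy-pattern structure described above can be illustrated with a small registry: each strategy implements one interface, and new strategies are added by registration rather than by editing existing code, which is the open-closed property the design aims for. The class and function names here are hypothetical, and ServiceStrategy simply wraps a caller-supplied function in place of a third-party similarity or deep learning service.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class LabelMatchingStrategy(ABC):
    """Interface shared by all matching strategies."""

    @abstractmethod
    def confidence(self, text: str) -> float:
        """Return the matching confidence of one label for `text`, in [0, 1]."""


STRATEGIES: Dict[str, Type[LabelMatchingStrategy]] = {}


def register(name: str):
    """Class decorator: add a strategy to the registry without editing existing code."""
    def decorator(cls):
        STRATEGIES[name] = cls
        return cls
    return decorator


@register("service")
class ServiceStrategy(LabelMatchingStrategy):
    """Delegates scoring to an external service (e.g. a similarity or DL API)."""

    def __init__(self, call: Callable[[str], float]):
        self.call = call

    def confidence(self, text: str) -> float:
        # Clamp whatever the service returns into [0, 1].
        return min(1.0, max(0.0, float(self.call(text))))


def build_strategy(name: str, **kwargs) -> LabelMatchingStrategy:
    """Instantiate a registered strategy from a configuration name."""
    return STRATEGIES[name](**kwargs)
```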
  • This invention abstracts the following three basic strategies:
  • the regular expression matching strategy is applicable to the matching of labels composed of professional phrases or fixed expressions in the field of remote sensing satellites. This type of label often appears in the information text in the form of plain text.
  • the present invention configures a set of (at least one) regular expressions for each label, describing the regular features that text matching the label should exhibit, such as "Remote sensing satellite [image] image" and "[Ee](arth)?[Oo](bservation)?[Ss](atellite)?". If a tag (tagA) has n regular expressions and a piece of information (articalA) matches m of them, then the confidence of articalA for tagA is m/n.
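With n configured expressions and m of them matched, the confidence works out to m/n, which agrees with the worked example at the end of this section (two of three expressions matched gives 0.67). A minimal sketch, assuming case-insensitive matching and a strict "greater than threshold" test as in that example; the function names are hypothetical:

```python
import re


def regex_confidence(text, patterns):
    """Confidence of one tag = matched patterns / configured patterns (m / n)."""
    if not patterns:
        raise ValueError("each tag needs at least one regular expression")
    m = sum(1 for p in patterns if re.search(p, text, flags=re.IGNORECASE))
    return m / len(patterns)


def match_tag(text, patterns, threshold=0.6):
    """Return the confidence if it exceeds the threshold, otherwise None."""
    conf = regex_confidence(text, patterns)
    return conf if conf > threshold else None
```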
  • the text similarity matching strategy suits labels that describe a characteristic of the satellite information.
  • the representation of this label in the information is relatively flexible, and it is difficult to enumerate the regular expression of its representation.
  • a general description can be designed for this label feature as a reference standard. For each information, the text similarity between the information and the standard description is used to determine whether the label matching is satisfied.
  • the present invention combines the BM25 algorithm with the vector space model to calculate text similarity:
  • let q be the standard expression of a tag and d the information text.
  • BM25 is used to compute a relevance score for each term of q and of d, which serves as that term's weight; weighted vectors are built for q and d, and the cosine between the two vectors is taken as the similarity score. If the similarity score exceeds the set threshold, the information is judged to match the tag; the similarity score becomes the tag's confidence, and the information is bound to the tag and updated in ES.
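The similarity computation described above can be sketched as follows. Assumptions: texts are already segmented into tokens (Chinese text would need a word segmenter first), k1 = 1.5 and b = 0.75 are common BM25 defaults rather than values from the patent, the IDF uses the non-negative log(1 + ...) variant, and document statistics come from whatever corpus the caller passes in.

```python
import math
from collections import Counter


def bm25_weights(tokens, corpus, k1=1.5, b=0.75):
    """BM25 weight of each term in `tokens`, using statistics from `corpus` (lists of tokens)."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    df = Counter(t for d in corpus for t in set(d))
    weights = {}
    for term, f in Counter(tokens).items():
        idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        norm = f + k1 * (1 - b + b * len(tokens) / avgdl)
        weights[term] = idf * f * (k1 + 1) / norm
    return weights


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_confidence(q_tokens, d_tokens, corpus, threshold):
    """Similarity score used as the tag confidence when it exceeds the threshold."""
    score = cosine(bm25_weights(q_tokens, corpus), bm25_weights(d_tokens, corpus))
    return score if score > threshold else None
```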
  • the confidence calculation method is as follows:
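Assuming the confidence is exactly the cosine score described above, with w_{x,t} the BM25 weight of term t in text x (x being q or d), f(t, x) the frequency of t in x, |x| the length of x, avgdl the average text length, and k_1 and b the usual BM25 parameters, it can be written as:

```latex
w_{x,t} = \mathrm{IDF}(t)\,\frac{f(t,x)\,(k_1+1)}{f(t,x) + k_1\left(1 - b + b\,\frac{|x|}{\mathrm{avgdl}}\right)},
\qquad x \in \{q, d\}

\mathrm{conf}(q,d) = \cos(\mathbf{w}_q, \mathbf{w}_d)
  = \frac{\sum_t w_{q,t}\, w_{d,t}}{\sqrt{\sum_t w_{q,t}^2}\;\sqrt{\sum_t w_{d,t}^2}}
```

The tag is bound to the information when conf(q, d) exceeds the configured threshold.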
  • the present invention builds a deep learning network consisting of an embedding layer, two bidirectional LSTM layers, and three fully connected layers.
  • the network is trained with information documents as input and label matching scores as output. If the score output by the network exceeds the set threshold, the label is considered to match, and the score is used as the confidence, and is updated and stored in ES after being bound to the information document together with the corresponding label.
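The layer stack described above (embedding, two bidirectional LSTM layers, three fully connected layers) can be sketched in PyTorch. The layer sizes, the ReLU activations, the use of the final hidden states as the text representation and the multi-label sigmoid head (one score per label) are assumptions; the patent does not state them. Training would typically pair this output with nn.BCELoss against 0/1 label targets.

```python
import torch
import torch.nn as nn


class LabelMatchingNet(nn.Module):
    """Embedding -> 2-layer BiLSTM -> 3 fully connected layers -> per-label scores."""

    def __init__(self, vocab_size, num_labels, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim // 2), nn.ReLU(),
            nn.Linear(hidden_dim // 2, num_labels),
        )

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer tensor, 0 reserved for padding
        embedded = self.embedding(token_ids)
        _, (h_n, _) = self.lstm(embedded)
        # h_n: (num_layers * 2, batch, hidden); the last two rows are the
        # forward and backward final states of the top LSTM layer
        features = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        return torch.sigmoid(self.fc(features))  # scores in (0, 1)


def matched_labels(model, token_ids, label_names, threshold=0.6):
    """Labels whose score exceeds the threshold, with the score kept as confidence."""
    model.eval()
    with torch.no_grad():
        scores = model(token_ids.unsqueeze(0))[0]
    return {name: float(s) for name, s in zip(label_names, scores) if float(s) > threshold}
```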
  • the label matching model can select different matching modes (single strategy mode, combined matching mode) and different confidence calculation modes (average, weighted, maximum).
  • the matching model is stored in the MySQL database in the form of configuration records.
  • the present invention preserves the open-closed principle of the matching model through a maximally general table design (as shown in Table 3), so that matching strategies can be iterated flexibly and quickly.
  • the rationale for the proposed intelligent tag matching model is that tags and information with different characteristics, or different stages of project implementation, often call for different, targeted technical solutions during tag matching.
  • the configuration-based, flexible, pluggable, combined, and switchable tag matching model design proposed in the present invention can provide support for the tag matching needs in the above different situations.
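The confidence calculation modes named above (single result, average, weighted, maximum) reduce to a small dispatch function. The mode strings are hypothetical stand-ins for whatever codes the configuration table stores; dividing by the sum of the weights makes the weighted mode a weighted average, which equals a weighted sum when the weights add up to 1, as with the 0.4/0.6 split mentioned in this section.

```python
def combine_confidences(scores, mode, weights=None):
    """Merge per-strategy confidences using the configured calculation mode."""
    if not scores:
        raise ValueError("no strategy produced a score")
    if mode == "single":
        if len(scores) != 1:
            raise ValueError("single mode expects exactly one strategy score")
        return scores[0]
    if mode == "average":
        return sum(scores) / len(scores)
    if mode == "weighted":
        if weights is None or len(weights) != len(scores):
            raise ValueError("weighted mode needs one weight per strategy")
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    if mode == "maximum":
        return max(scores)
    raise ValueError(f"unknown confidence mode: {mode}")
```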
  • in the design verification stage, the label matching model is set to single strategy mode. Most labels have relatively clear features that can be represented by a finite set of regular expressions, and these use the regular label matching strategy; the few labels with more complex feature representations use the text similarity label matching strategy. The confidence calculated by the single strategy is used directly as the model's output confidence. This approach lets the project get off the ground quickly in its early stage and start producing data. Note that in this early stage, because data samples are limited and the model has not yet been iterated to maturity, the matching results are not fully accurate and must be supplemented by manual verification.
  • the label matching model is set to the combined strategy mode, and the label strategy uses the regular label matching strategy + deep learning matching strategy or the text similarity label matching strategy + deep learning matching strategy.
  • the confidence calculation mode uses the average.
  • because the model is not yet mature, manual verification is still needed. The deep learning network is iteratively updated for every 500 additional articles.
  • the training set has reached a certain scale.
  • the generalization performance of the deep learning network is improved, and the confidence calculation mode is adjusted to weighted sum.
  • once the deep neural network's accuracy reaches 0.8, the weights are set to 0.4/0.6.
  • manual verification is adjusted to sampling verification.
  • the accuracy rate is improved to 0.95.
  • the model performance is considered to be stable, and the deep learning single matching strategy is adopted.
  • the network output result is directly used as the matching confidence, and the manual verification is adjusted to irregular sampling verification.
  • the amount of information exceeds 10000, the same method can be used for processing at this stage.
  • The tag matching strategy configuration is defined as follows and is stored as structured data in the MySQL database.
  • The specific design is as follows:
  • The core code of the tag matching process is as follows:
  • The present invention uses the following example to describe the tagging process above:
  • "Commercial remote sensing satellite" and "satellite launch" are commonly used keywords in the remote sensing satellite field and are included in the tag library as tags.
  • The tag matching configuration designed for these two tags is as follows (regular-expression matching strategy):
  • Field | Tag1 | Tag2
    id | 1 | 2
    tag | Commercial remote sensing satellite | Satellite launch
    match_mode | 4 | 4
    regex | commercial, remote sensing satellite, remote sensing data | satellite, rocket, launch
    similarity_scentence | - | -
    similarity_api | - | -
    dl_api | - | -
    confidence_calcu_mode | - | -
    confidence_threshold | 0.6 | 0.6
  • The first document matches the regular expressions satellite, rocket, launch, remote sensing data, commercial, and remote sensing satellite.
  • This document should therefore be tagged "commercial remote sensing satellite" with a confidence of 1 (1 > threshold 0.6), and also tagged "satellite launch" with a confidence of 1 (1 > threshold 0.6).
  • The second document matches the regular expressions commercial, remote sensing satellite, satellite, rocket, and launch, and should likewise be tagged "commercial remote sensing satellite" and "satellite launch", with confidences of 0.67 (2 of 3 expressions) and 1, respectively.
  • After manual review, the tags are bound to the documents and updated in ES, for example:
  • DeepKE, an open-source deep-learning-based knowledge graph extraction toolkit, extracts entities, relations, and attributes from satellite news.
  • OpenEA, an open-source knowledge graph fusion tool, is then used to align the entities, and finally the entities, relations, etc. are imported into the graph database NEO4J to build the knowledge graph.
  • The specific steps are:
  • Entities are objectively existing things, generally nouns, such as satellite designations, organization names, orbits, and equipment models; relations are the ways in which entities are connected, generally the verbs between two entities, such as launch, operate, and carry; attributes are characteristics of an entity, such as resolution or speed.
  • The trained knowledge extraction model is used to extract knowledge from the news text, and the extraction results are imported into the NEO4J graph database.
  • The results are shown in Figure 4.
  • The present invention proposes a satellite information recommendation method that obtains related articles through the knowledge graph as recommendation candidates and evaluates those candidates with the graph-enhanced semantic analysis network proposed in the present invention. Articles that meet the evaluation requirements are finally recommended to the user.
  • The current document and a candidate document are taken as the network inputs. The network computes an evaluation score characterizing the relevance of the candidate document to the current document. Candidates whose scores exceed a threshold are sorted by score and returned to the user as the final recommendation result.
  • The graph-enhanced semantic analysis network mentioned in step 3) above consists mainly of three parts: a context feature analysis network, a tag feature analysis network, and a graph feature analysis network.
  • Context feature analysis is mainly used to abstract and analyze the semantic features of the article text itself.
  • Tag feature analysis is mainly used to extract, abstract, and analyze tag entity features.
  • Graph feature analysis is mainly used to extract, abstract, and analyze graph features.
  • The specific network computation steps are as follows:
  • In the context feature analysis network, the current article and the specified candidate article each undergo random sentence sampling to generate a set of texts to be analyzed.
  • The texts in this set are mapped by the PV-DBOW model to sentence vectors in a low-dimensional space, forming a context feature matrix.
  • The matrix is further abstracted into text context features by a transformer layer and a bidirectional LSTM layer.
  • A tag entity is itself a representation that generalizes certain characteristics of the article.
  • The present invention processes tag entities with the tag feature analysis network: each tag is vectorized by PV-DBOW and multiplied by its confidence to form a tag feature matrix, which is further abstracted by a multilayer perceptron.
  • Satellite knowledge graph information serves as the carrier of relatively deep, hidden associations between satellite articles.
  • The present invention processes it with the graph feature analysis network: graph entity/relation information is mapped to a low-dimensional space by the TransE and TransH networks to form entity/relation vectors, which make up a graph embedding feature matrix; this matrix is further processed by a KGCNN network to extract and abstract graph features.
  • The graph-enhanced semantic analysis network proposed in the present invention and the constructed knowledge graph complement each other. Once the graph is built, associations between articles can be obtained quickly through graph computation; these results are intuitive and highly interpretable, but the degree of relevance cannot be measured quantitatively.
  • The graph-enhanced semantic analysis network can quantify the relevance between articles, but it is computationally expensive and its results are poorly interpretable.
  • The present invention therefore first uses graph computation to obtain recommendation candidates and then uses the graph-enhanced semantic analysis network to compute the relevance between each candidate and the current article, achieving the best balance between efficiency and effectiveness in the recommendation process.
  • The client submits a search request form based on tags or keywords and a time range.
  • The server receives the request and constructs a query statement.
  • The ES full-text search engine performs a full-text search based on the inverted index and returns the results, which are organized into a response data structure and returned to the client to complete the response.
  • The client submits an article details request form.
  • The server receives the request, obtains the article Document details through an information query, and at the same time constructs a Cypher query statement that recalls related articles based on first-degree (and second-degree) associated knowledge in the knowledge graph. The recalled documents and the current document are passed as parameters to the graph-enhanced semantic analysis network, which computes relevance evaluation results; articles scoring above the threshold are added to the recommended document set. Finally, the current document details and the recommended document set are organized into a response data structure and returned to the client to complete the response.
  • The client submits a request form for generating an information statistics report.
  • The server receives the request, presets statistical indicators based on the specified template, and builds statistical query statements: query (ES), SQL (MYSQL), and Cypher (NEO4J). Using a template engine framework based on Word file splitting and reassembly, JFreeChart, and Thymeleaf, it fills in the content, sets the styles, and draws the charts of the report template, generating a customized report that is saved to the database and sent asynchronously to the user's mailbox.
  • This embodiment also includes, but is not limited to: multi-dimensional retrieval of satellite information, client-side custom creation of report templates, template management, report management, report preview, download, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

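The present invention relates to a remote sensing satellite information recommendation method, system and device, comprising: collecting satellite information texts and performing preliminary processing on them, organizing the information texts into a preset data storage structure and storing them; performing intelligent tag matching on the stored information texts to obtain satellite information with tags and confidences; constructing a satellite knowledge graph from the satellite information with tags and confidences; performing preliminary recall of the stored satellite information according to the user's query; mining, based on the constructed satellite knowledge graph, information associated with the recalled results as a recommendation candidate set; and performing relevance evaluation on the candidate information in the recommendation candidate set and making the final recommendation according to the evaluation results.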

Description

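Remote sensing satellite information recommendation method, system and device
This application claims priority to Chinese patent application No. 202211216697.7, entitled "Remote sensing satellite information recommendation method, system and device", filed with the China Patent Office on September 30, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of intelligent recommendation technology, and specifically to a recommendation method, system architecture, apparatus, device and storage medium for remote sensing satellite information.
Background
With the rapid development of satellite remote sensing technology, the world has entered a new era of Earth observation. The number of remote sensing satellites launched keeps increasing, the proportion of commercial remote sensing satellites is gradually rising, and commercial applications based on satellite remote sensing data keep emerging, yielding enormous social and economic benefits. At the same time, with the vigorous growth of the Internet, the amount of information related to remote sensing satellites has increased sharply. Quickly locating useful information that meets user needs within this mass of news, and then performing in-depth mining, association analysis and trend research on that information, is of great significance for applications in the satellite remote sensing field such as national strategy, industry research and business analysis.
On the one hand, the remote sensing satellite industry is broad, and related information covers many links, such as remote sensing satellite manufacturing, satellite launch services, ground equipment manufacturing, remote sensing satellite operation, and remote sensing satellite application services. On the other hand, remote sensing satellite information is highly specialized, and obtaining remote sensing satellite intelligence requires domain expertise. Commonly used recommendation methods are keyword-based recommendation and user-behavior-based recommendation. Keyword-based retrieval and recommendation lacks contextual semantic information and severs the associations within the data, so the recalled remote sensing satellite information is of poor quality and information utilization is low. User-behavior-based recommendation algorithms are often hampered by the sparsity of behavioral data between users and items, and the cold-start problem when recommending for new users or new items also degrades recommendation quality.
A knowledge graph is essentially a semantic network containing a large number of relationships between entities. By constructing graph information about commercial remote sensing satellite news and mapping news tags to knowledge graph entities, the loss of latent semantic associations between tags is compensated to a certain extent, which can effectively alleviate the data sparsity problem and improve system performance and recommendation quality.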
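Summary of the Invention
Technical problem solved by the present invention: to solve the above problems, the present invention provides an intelligent recommendation method, system and device for remote sensing satellite information.
Technical solution of the present invention: a remote sensing satellite information recommendation method, comprising:
collecting satellite information texts and performing preliminary processing on them, organizing the information texts into a preset data storage structure and storing them;
performing intelligent tag matching on the stored information texts to obtain satellite information with tags and confidences;
constructing a satellite knowledge graph from the satellite information with tags and confidences;
performing preliminary recall of the stored satellite information according to the user's query;
mining, based on the constructed satellite knowledge graph, information associated with the recalled results as a recommendation candidate set;
performing relevance evaluation on the candidate information in the recommendation candidate set, and making the final recommendation according to the evaluation results.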
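Preferably, the preliminary processing comprises:
performing multi-threaded crawling of news websites, official websites of remote sensing satellite operators, and websites of national government space agencies to obtain the raw text of satellite information;
selecting a search engine as the storage component for satellite information;
processing the raw text through an extract, transform/clean, and load process, organizing the information texts into a preset data storage structure and storing them in the search engine.
Preferably, the search engine is the ES full-text search engine, and the full-text search engine is deployed as a cluster.
Preferably, obtaining satellite information with tags and confidences comprises:
extracting keywords in the remote sensing satellite domain from a search engine as tags to form a satellite information tag library;
designing a configurable intelligent tag matching model, wherein the matching model encapsulates the matching logic of tags with different characteristics against the information text and can compute the matching confidence of a tag for the information text;
performing intelligent tag matching on the stored information texts using the intelligent tag matching model to obtain satellite information with tags and confidences.
Preferably, the configurable intelligent tag matching model comprises three parts: tag matching strategies, a tag matching mode, and result confidence calculation modes;
the tag matching strategies store a plurality of pre-designed matching strategies;
the tag matching mode is used to select different matching strategies from the tag matching strategies as required, forming a single-strategy mode or a combined matching mode;
the result confidence calculation modes store different confidence calculation modes.
Preferably, the matching strategies comprise a regular-expression tag matching strategy, a text similarity tag matching strategy, and a deep learning tag matching strategy;
the regular-expression tag matching strategy is suitable for matching tags consisting of specialized phrases or fixed expressions in the remote sensing satellite domain;
the text similarity tag matching strategy uses the text similarity between the information and a standard statement to determine whether a tag matches, the standard statement being a reference standard, namely a generalized statement designed to capture the tag's characteristics;
the deep learning tag matching strategy trains a network with the information text as input and the tag matching score as output; if the score output by the network exceeds a set threshold, the tag is considered matched, and the score is used as the confidence.
Preferably, the configuration of the intelligent tag matching model is stored in a MySQL database using a maximally generic table design, which ensures that the intelligent tag matching model follows the open-closed principle and allows the matching strategies to be iterated flexibly and quickly.
Preferably, the contents of the maximally generic table design comprise:
Design verification stage: the number of satellite information texts is less than 500; the matching strategy and tag matching mode is a single-strategy mode using the regular-expression matching strategy or the text similarity matching strategy; the result confidence calculation mode directly uses the confidence of the single strategy as the final confidence of the model;
Development and implementation stage: the number of satellite information texts is 500-2000; the matching strategy and tag matching mode is a combined-strategy mode of regular-expression matching + deep learning tag matching, or text similarity + deep learning tag matching; the result confidence calculation mode takes the average of the confidences of the different strategies as the final confidence of the model;
Trial operation stage: when the number of satellite information texts is 2000-5000, the matching strategy and tag matching mode is a combined-strategy mode of regular-expression matching + deep learning tag matching, or text similarity + deep learning tag matching; the result confidence calculation mode takes the weighted average of the confidences of the different strategies as the final confidence of the model;
Operation stage: when the number of satellite information texts exceeds 5000, the matching strategy and tag matching mode is a single-strategy mode using the deep learning tag matching strategy, and the result confidence calculation mode directly uses the confidence of the single strategy as the final confidence of the model.
Preferably, the matching results are manually verified, wherein the design verification stage and the development and implementation stage are both supplemented by full manual verification, the trial operation stage is supplemented by regular manual sampling verification, and the operation stage is supplemented by irregular manual sampling verification.
Preferably, in the development and implementation stage, the deep learning network is iteratively updated once each time the information grows by a preset number of articles, and the number of updates over the entire development and implementation stage is 10-20.
Preferably, the deep learning network consists of an embedding layer, 2 bidirectional LSTM layers, and 3 fully connected layers; the network is trained with the information text as input and the tag matching score as output; if the score output by the network exceeds a set threshold, the tag is considered matched, and the score is used as the confidence.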
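Preferably, constructing the satellite knowledge graph comprises:
storing each tag of the satellite information tag library in a graph database in the form of a tag entity;
storing the information texts in the graph database in the form of information entities;
establishing relationships, namely matching relationships, between the information entities and the tag entities;
using a knowledge graph extraction tool to extract knowledge entities, relationships, and attributes related to remote sensing satellites from the text attributes of the information entities;
using a knowledge graph fusion tool to fuse and align the resulting entities;
storing the extracted and fused entities, relationships, and attributes in the graph database in batches, completing the construction of the knowledge graph for the remote sensing satellite knowledge domain.
Preferably, relevance evaluation is performed on the candidate information by constructing a graph-enhanced semantic analysis network.
Preferably, the graph-enhanced semantic analysis network comprises a context feature analysis network, a tag feature analysis network, and a graph feature analysis network;
the context feature analysis network is used to abstract and analyze the semantic feature information of the information text itself;
the tag feature analysis network is used to extract, abstract, and analyze tag feature information;
the graph feature analysis network is used to extract, abstract, and analyze graph feature information.
Preferably, the relevance evaluation comprises:
S1. Performing the following processing on the current information text and on each candidate information text in the recommendation candidate set:
using the context feature analysis network, randomly sampling sentences of the information text to randomly generate a set of texts to be analyzed, mapping them to sentence vectors in a low-dimensional space to form a context feature matrix, and further abstracting text context features from the context feature matrix;
using the tag feature analysis network, vectorizing the tag entities of the information text and multiplying them by the tag confidences to form a tag feature matrix, and performing further feature abstraction on the tag feature matrix;
using the graph feature analysis network, taking the non-tag entities of the information text as carriers of relatively deep hidden association information, mapping them to a low-dimensional space to form entity/relationship vectors that make up a graph embedding feature matrix, and further extracting and abstracting graph features from the graph embedding feature matrix;
S2. Concatenating the processing result for each candidate information text in S1 with the processing result for the current information text, passing the result through a fully connected network, and finally outputting an evaluation result characterizing the degree of association between the two texts;
the two texts are the current information text and each candidate information text in the recommendation candidate set, the current information text being the information text that the user opens from the preliminarily recalled satellite information.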
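A remote sensing satellite information recommendation system, comprising a satellite information collection and storage module, a satellite information intelligent tag matching module, a remote sensing satellite domain knowledge graph construction module, and a satellite information query and recommendation module;
the satellite information collection and storage module is used to collect satellite information texts and perform preliminary processing on them, and to organize the information texts into a preset data storage structure and store them;
the satellite information intelligent tag matching module is used to perform intelligent tag matching on the stored information texts to obtain satellite information with tags and confidences;
the remote sensing satellite domain knowledge graph construction module is used to construct a satellite knowledge graph from the satellite information with tags and confidences;
the satellite information query and recommendation module performs preliminary recall of the satellite information stored by the satellite information collection and storage module according to the input query; mines, based on the constructed satellite knowledge graph, information associated with the recalled results as a recommendation candidate set; and performs relevance evaluation on the candidate information in the recommendation candidate set and makes the final recommendation according to the evaluation results.
Preferably, the system further comprises an information retrieval interface, through which the user performs tag filtering, keyword queries, or time-range queries; the query is input to the satellite information query and recommendation module.
Preferably, the system further comprises an information details interface, through which the user views information details; the information viewed by the user serves as the current information text, and the information details interface also returns the recommendation results for the current information document.
Preferably, the system further comprises a satellite information statistical report generation module and a statistical report management interface;
the user uploads, modifies, deletes, or queries custom statistical report templates and historically generated statistical reports through the statistical report management interface; the satellite information statistical report generation module generates statistical reports according to the statistical report templates.
A remote sensing satellite information recommendation device, comprising a client and a server;
the client is provided with an information retrieval interface, an information details interface, and a statistical report management interface, and interacts with the server through these interfaces;
further, the user performs tag filtering, keyword queries, or time-range queries through the information retrieval interface, and the query is sent to the server as a search request form;
the user views information details through the information details interface, which sends an information details request form to the server; the information details interface also returns the recommendation results for the current information document sent by the server;
the user uploads, modifies, deletes, or queries custom statistical report templates and historically generated statistical reports through the statistical report management interface;
the server runs the method described above and responds to the interface inputs from the client.
Preferably, the client is a web browser, and the server consists of a service layer and a data layer, the service layer running the method described above and the data layer comprising data storage components and the servers on which the components are deployed.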
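Beneficial effects of the present invention compared with the prior art:
The present invention proposes a tag- and knowledge-graph-based method and system architecture for recommending remote sensing satellite information, providing a way for users to obtain satellite remote sensing information accurately from massive Internet content. The method does not require a model of the user's historical behavior preferences. It builds a tag library and intelligent tag matching strategies, and constructs a knowledge graph of the satellite remote sensing domain in which both articles and their tags become classes of graph entities, strengthening the latent semantic associations between tags. The user performs preliminary filtering of content of interest by searching or by filtering on tags; the system uses entity associations in the graph to find articles the user may be interested in and recommends them, and evaluates their relevance with the graph-enhanced semantic analysis network, making the recommendation results richer and more accurate. The present invention provides a complete end-to-end system architecture for information query and recommendation and for association analysis of information.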
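Brief Description of the Drawings
Figure 1 is a schematic diagram of the framework of the system of the present invention;
Figure 2 is a flowchart of the intelligent tag matching module of the present invention;
Figure 3 is a flowchart of knowledge graph creation in the present invention;
Figure 4 is a schematic diagram of the remote sensing satellite knowledge graph model of the present invention;
Figure 5 is a schematic structural diagram of the graph-enhanced semantic analysis network of the present invention;
Figure 6 is a schematic diagram of the interaction flow of the remote sensing satellite information recommendation system of the present invention.
Detailed Description of the Embodiments
The present invention is further described below with reference to embodiments.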
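A remote sensing satellite information recommendation method, comprising:
In a first aspect, an efficient method for collecting and storing satellite information, comprising: collecting public or subscribed satellite information and performing preliminary processing on it, including translating the content, cleaning redundant HTML tags from the content, and correcting the text; selecting an information storage component and designing the storage structure and index structure for the information; and, once the information has been cleaned and processed, organizing it into the designed data storage structure and loading it in batches.
In a second aspect, intelligent tag matching is performed on the satellite information obtained in the first aspect, comprising: extracting keywords in the remote sensing satellite domain from search engines as tags to form a satellite information tag library; and designing and optimizing a flexibly configurable intelligent tag matching model so that information can be tagged in a targeted way at different stages of project implementation.
Further, the intelligent tag matching of satellite information in step (2) comprises the following steps:
First, the present invention designs and abstracts three tag matching strategies: a regular-expression tag matching strategy, a text similarity tag matching strategy, and a deep learning tag matching strategy. The different matching strategies encapsulate the matching logic of tags with different characteristics against the information text, and can compute the matching confidence of a tag for the information text.
Then, on the basis of these tag matching strategies, the present invention uses a database to design a configuration-based tag matching model. Through configuration, the model supports flexibly plugging in, combining, and switching strategies, and calculating confidence.
At different stages of project implementation, the tag matching model is adjusted to meet the specific requirements of each stage arising from changes in data scale, different cost-control considerations, and so on.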
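In a third aspect, a satellite knowledge graph is constructed from the satellite information with tags and confidences obtained in the second aspect, comprising:
storing each tag of the tag library in a graph database in the form of a tag entity;
storing the information documents in the graph database in the form of information entities;
establishing relationships, namely matching relationships, between the information entities and the tag entities;
using a knowledge graph extraction tool to extract finer-grained knowledge entities, relationships, and attributes related to remote sensing satellites from the text attributes of the information entities;
using a knowledge graph fusion tool to fuse and align the resulting entities;
storing the extracted and fused entities, relationships, and attributes in the graph database in batches, completing the construction of the knowledge graph for the remote sensing satellite knowledge domain.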
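The loading steps above (tag entities, information entities, and the matching relationship between them) can be sketched as parameterized Cypher statements. This is a minimal sketch, not the patent's implementation: the node labels `Tag` and `Article`, the relationship type `MATCHES`, and storing the match confidence as a relationship property are all assumptions made for illustration, and the extraction and fusion steps (DeepKE, OpenEA) are not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical graph schema: (:Article)-[:MATCHES {confidence}]->(:Tag)
MERGE_ARTICLE = "MERGE (a:Article {id: $id}) SET a.title = $title"
MERGE_TAG = "MERGE (t:Tag {name: $name})"
MERGE_MATCH = (
    "MATCH (a:Article {id: $article_id}), (t:Tag {name: $tag}) "
    "MERGE (a)-[r:MATCHES]->(t) SET r.confidence = $confidence"
)


def graph_statements(article: Dict, tags: Dict[str, float]) -> List[Tuple[str, Dict]]:
    """Return (query, parameters) pairs that load one tagged article into the graph."""
    statements = [(MERGE_ARTICLE, {"id": article["id"], "title": article["title"]})]
    for tag, confidence in tags.items():
        statements.append((MERGE_TAG, {"name": tag}))  # tag entity
        statements.append((MERGE_MATCH, {              # matching relationship
            "article_id": article["id"], "tag": tag, "confidence": confidence,
        }))
    return statements


if __name__ == "__main__":
    for query, params in graph_statements({"id": "1", "title": "卫星发射新闻"}, {"卫星发射": 1.0}):
        print(query, params)
```

Each (query, parameters) pair could then be executed with, for example, `session.run(query, params)` from the official Neo4j Python driver. Because every statement uses `MERGE`, re-running the load after a tag's confidence changes updates the existing relationship instead of creating duplicate nodes or edges.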
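In a fourth aspect, a satellite information recommendation method is created based on the tagged information of the second aspect and the satellite knowledge graph constructed in the third aspect, comprising the following steps:
performing preliminary recall of satellite information using the search content entered by the user (including tag filtering, keyword queries, etc.);
mining, based on the constructed satellite knowledge graph, information associated with the recalled results as a recommendation candidate set;
constructing a graph-enhanced semantic analysis network to perform relevance evaluation on the candidate information, and making the final recommendation according to the evaluation results.
In a fifth aspect, the interaction design for satellite information query, recommendation, and statistical report generation, including but not limited to:
an information retrieval interface, through which the user queries related information documents by tag filtering, keyword query, or time range;
an information details interface, through which the user views information details, the interface also returning recommendation results based on the current information document;
a statistical report management interface, through which the user uploads, modifies, deletes, and queries custom statistical report templates and historically generated statistical reports; through this interface the user adds documents of interest to a statistical report template, and once the report is generated it is sent asynchronously to the user's mailbox.
The present invention further provides a remote sensing satellite information recommendation system, comprising an efficient satellite information collection and storage module, a satellite information intelligent tag matching module, a remote sensing satellite domain knowledge graph construction module, a satellite information query and recommendation module, and a satellite information statistical report generation module.
The present invention further provides a remote sensing satellite information recommendation device, comprising a client and a server. Descriptions of the same functions in the method, system, and device of the present invention can be handled in the same way, so repeated content is not described again. The present invention is described in detail below by way of an example.
A remote sensing satellite information recommendation device comprises a client and a server. The client is a web browser, through whose web pages the user interacts with the server via interfaces. The server consists of a service layer and a data layer. The service layer includes the web back-end application, the algorithm service application, and the servers on which these programs are deployed; the data layer includes data storage components such as ES (Elasticsearch, a full-text search engine), MySQL (a relational database), Neo4J (a graph database), and Redis (a key-value store), together with the servers on which these components are deployed. The service layer mainly implements the remote sensing satellite information recommendation system/method. The device architecture is shown in Figure 1. Details are as follows: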
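1. Efficient satellite information collection and storage module:
Using Python's Scrapy and Newspaper frameworks, multi-threaded crawling is performed on aerospace news websites such as SpaceNews and 防务新闻网 (a defense news website), official websites of remote sensing satellite operators such as Maxar and Airbus Defence and Space (ADS), intergovernmental international organizations such as the European Space Agency (ESA) and the Group on Earth Observations (GEO), and national government space agencies such as the US National Aeronautics and Space Administration (NASA) and the China National Space Administration (CNSA). This raw text is generally stored as HTML web pages and PDF documents;
The ES full-text search engine is used as the storage component for satellite information, providing full-text search and fast query capabilities. The index structure of satellite information documents is designed, including fields, field data types, field tokenization behavior, and so on. ES is deployed as a cluster of 3 master nodes and 3 data nodes, providing fault tolerance and load balancing and improving system availability and search performance.
The raw collected material (raw information text) is processed through an ETL (extract, transform/clean, load) process, which includes: cleaning redundant tags, special characters, etc. from the original documents; calling translation APIs such as Baidu and Youdao to translate non-Chinese articles and saving the translations; calling a text-correction API to correct the content; and organizing the information into the designed document structure and calling the ES batch-operation (bulk) API to load the data in batches.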
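The cleaning and batch-loading steps of the ETL process can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: the index name `satellite_news` and the document fields are hypothetical (the actual index design is Table 1, which is only available as an image), the translation and correction API calls are omitted, and regex-based tag stripping is a simplification of real HTML cleaning. The output follows the newline-delimited format that the Elasticsearch `_bulk` endpoint expects: an action line followed by a source line for each document, with a trailing newline.

```python
import html
import json
import re
from typing import Dict, List

INDEX = "satellite_news"  # hypothetical index name

_SCRIPT_STYLE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.S | re.I)
_TAG = re.compile(r"<[^>]+>")
_SPACES = re.compile(r"\s+")


def clean_html(raw: str) -> str:
    """Remove script/style blocks and HTML tags, decode entities, collapse whitespace."""
    text = _SCRIPT_STYLE.sub(" ", raw)
    text = _TAG.sub(" ", text)
    text = html.unescape(text)
    return _SPACES.sub(" ", text).strip()


def to_bulk_body(docs: List[Dict]) -> str:
    """Build an NDJSON body for Elasticsearch's _bulk endpoint (one action + one source line per doc)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": INDEX, "_id": doc["id"]}}))
        source = {k: v for k, v in doc.items() if k != "id"}
        lines.append(json.dumps(source, ensure_ascii=False))  # keep Chinese text readable
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    raw = "<p>Maxar &amp; <b>WorldView</b></p><script>var x = 1;</script>"
    doc = {"id": "1", "title": "卫星发射", "content": clean_html(raw)}
    print(to_bulk_body([doc]), end="")
```

The resulting string would be sent as the body of a `POST /_bulk` request with `Content-Type: application/x-ndjson`; the official Elasticsearch clients also offer bulk helpers that build this body for you.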
The main index design for satellite information in ES is as follows:
Table 1
Figure PCTCN2022129937-appb-000001
Figure PCTCN2022129937-appb-000002
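In this embodiment, taking two crawled articles as examples, the results obtained by the above process are as follows (shown mainly in the form in which they are stored in ES):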
Figure PCTCN2022129937-appb-000003
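2. Satellite information intelligent tag matching module; the detailed flowchart is shown in Figure 2:
(1) First, the tag library is constructed. Using a keyword mining tool, keywords in the remote sensing satellite domain are extracted from Baidu Baike and Wikipedia as tags. A tag library suited to satellite news is built from proper nouns in the satellite domain (e.g. "NASA", denoting organizations, satellite equipment, etc. associated with an article), domain expertise (e.g. "Earth observation", "ground station"), and characteristic news terms (e.g. "rocket launch", denoting events associated with an article). For each tag, a regular expression or a normative text statement that characterizes the tag is summarized and designed, for use in building the tag matching strategies later.
(2) Construction of the intelligent tag matching model.
Taking into account the type, source, and topic relevance of the information, as well as the scale and maintenance cost of the information at different stages of project implementation, the present invention, guided by the strategy pattern and following the design principle of being closed for modification and open for extension, proposes a novel database-based, flexible, configure-and-use tag matching model design that supports the changing tag matching requirements for information texts under different circumstances.
The tag matching model proposed in the present invention mainly comprises three parts: tag matching strategies, tag matching modes, and result confidence calculation modes.
A tag matching strategy is an abstract, structured representation of a matching algorithm's logic. The implementation logic of a matching strategy is not coupled to the main application logic of the solution; instead, it is provided as an interface or service in the manner of a third-party application. The present invention abstracts the following three basic strategies:
① Regular-expression matching strategy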
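The regular-expression matching strategy is suitable for matching tags consisting of specialized phrases or fixed expressions in the remote sensing satellite domain; tags of this type usually appear verbatim in the news text. The present invention sets a group of (at least one) regular expressions for each tag, describing the regular features that text matching the tag should satisfy, e.g. "遥感卫星[影图]像" (remote sensing satellite image) and "[Ee](arth)?[Oo](bservation)?[Ss](atellite)?". If n regular expressions are set for a tag (tagA) and an article (articalA) matches m of them, then
Figure PCTCN2022129937-appb-000004: $\mathrm{confidence} = m / n$
is used as the confidence that articalA matches tagA (the confidence characterizes how likely it is to be correct that a document matches a tag). Only when confidence > confidence_threshold (a user-defined confidence threshold) is articalA given the tag tagA, and tagA and its confidence are then bound to articalA and updated in ES.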
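The m/n rule and the strict threshold comparison described above fit in a few lines of Python. This is a minimal sketch: `regex_confidence` and `match_tag` are illustrative names, not from the patent, and using `re.search` (a match anywhere in the text) is an assumption about what it means for an article to "match" a pattern.

```python
import re
from typing import List, Optional


def regex_confidence(text: str, patterns: List[str]) -> float:
    """Confidence = m / n, where m of the tag's n regular expressions occur in the text."""
    if not patterns:
        return 0.0
    m = sum(1 for pattern in patterns if re.search(pattern, text))
    return m / len(patterns)


def match_tag(text: str, patterns: List[str], threshold: float) -> Optional[float]:
    """Return the confidence if it is strictly greater than the threshold, else None (tag not applied)."""
    confidence = regex_confidence(text, patterns)
    return confidence if confidence > threshold else None


if __name__ == "__main__":
    eo_patterns = ["遥感卫星[影图]像", "[Ee](arth)?[Oo](bservation)?[Ss](atellite)?"]
    print(match_tag("New EOS release of 遥感卫星影像 products", eo_patterns, 0.6))
```

Note that the second example pattern requires only the three initials, so it also matches the substring "eos" inside ordinary words such as "videos"; wrapping such patterns in `\b` word boundaries is one way to tighten them.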
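② Text similarity matching strategy
The text similarity matching strategy applies to satellite news with tags of a certain kind whose expression in the news is relatively flexible, so that enumerating regular expressions for them is difficult. In this case, a generalized statement can be designed for the tag's characteristics as a reference standard, and for each article, the text similarity between the article and this standard statement is used to decide whether the tag matches.
The present invention computes text similarity using the BM25 algorithm combined with the vector space model:
Let the standard statement of the tag be q and the information text be d. After q is tokenized, the BM25 algorithm is used to compute the relevance score of each term with respect to q and with respect to d, which serves as the term's weight; these weights are used to build the vector space representations of q and d, and the cosine of the angle between the two vectors is taken as the similarity score. If the similarity score exceeds the set threshold, the article is judged to match the tag. The similarity score is used as the tag's confidence and, together with the tag, is bound to the article and updated in ES. The confidence is computed as follows:
Inverse document frequency:
Figure PCTCN2022129937-appb-000005
BM25 score:
Figure PCTCN2022129937-appb-000006
Cosine similarity:
Figure PCTCN2022129937-appb-000007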
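The BM25-weighted vector space similarity described above can be sketched as follows. The exact IDF, BM25, and cosine formulas are given only as images (appb-000005 to appb-000007), so this sketch assumes the common Lucene-style variants, IDF(t) = ln(1 + (N - n(t) + 0.5) / (n(t) + 0.5)) with the usual saturation term and k1 = 1.2, b = 0.75; inputs are assumed to be pre-tokenized (Chinese text would first need a word segmenter such as jieba). Following the description, each distinct term of the tag's standard statement q is weighted by its BM25 score against q and against the article d, and the cosine of the two weight vectors is the similarity.

```python
import math
from collections import Counter
from typing import List, Sequence

Tokens = Sequence[str]


def idf(term: str, corpus: Sequence[Tokens]) -> float:
    """Lucene-style BM25 IDF; the '1 +' keeps the value positive for very common terms."""
    n = sum(1 for doc in corpus if term in doc)
    return math.log(1.0 + (len(corpus) - n + 0.5) / (n + 0.5))


def bm25_weight(term: str, tokens: Tokens, corpus: Sequence[Tokens],
                avgdl: float, k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 contribution of one term to one token sequence (q or d)."""
    tf = Counter(tokens)[term]
    if tf == 0:
        return 0.0
    length_norm = k1 * (1.0 - b + b * len(tokens) / avgdl)
    return idf(term, corpus) * tf * (k1 + 1.0) / (tf + length_norm)


def bm25_cosine(q: Tokens, d: Tokens, corpus: Sequence[Tokens],
                k1: float = 1.2, b: float = 0.75) -> float:
    """Cosine between the BM25 weight vectors of q and d over the distinct terms of q."""
    if not corpus:
        raise ValueError("corpus must contain at least one document")
    avgdl = sum(len(doc) for doc in corpus) / len(corpus)
    terms: List[str] = list(dict.fromkeys(q))  # distinct terms of q, original order
    vq = [bm25_weight(t, q, corpus, avgdl, k1, b) for t in terms]
    vd = [bm25_weight(t, d, corpus, avgdl, k1, b) for t in terms]
    dot = sum(x * y for x, y in zip(vq, vd))
    norm_q = math.sqrt(sum(x * x for x in vq))
    norm_d = math.sqrt(sum(y * y for y in vd))
    return dot / (norm_q * norm_d) if norm_q > 0 and norm_d > 0 else 0.0
```

Because all weights are non-negative, the score falls in [0, 1]: an article containing none of the statement's terms scores 0, and one whose weights are proportional to the statement's own scores 1, so the score can be compared directly against `confidence_threshold`.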
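③ Deep learning tag matching strategy
The present invention builds a deep learning network consisting of an embedding layer, 2 bidirectional LSTM layers, and 3 fully connected layers. The network is trained with the information document as input and the tag matching score as output. If the score output by the network exceeds the set threshold, the tag is considered matched, and the score is used as the confidence, which, together with the corresponding tag, is bound to the information document and updated in ES.
Based on these strategies, the tag matching model can select different matching modes (single-strategy mode, combined matching mode) and different confidence calculation modes (average, weighted, maximum). The matching model is stored as configuration records in the MySQL database; through a maximally generic table design (shown in Table 3), the present invention ensures that the matching model follows the open-closed principle, so that matching strategies can be iterated flexibly and quickly.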
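The three confidence calculation modes listed above (average, weighted, maximum), plus the single-strategy pass-through, reduce to one small function. This is a sketch: the mode names are hypothetical strings, since the patent does not list the values stored in `confidence_calcu_mode`, and normalizing the weighted sum by the total weight is an assumption that makes no difference for a 0.4/0.6 split, because those weights already sum to 1.

```python
from typing import Optional, Sequence


def combine_confidence(scores: Sequence[float], mode: str = "single",
                       weights: Optional[Sequence[float]] = None) -> float:
    """Combine per-strategy confidences into the model's final confidence."""
    if not scores:
        raise ValueError("at least one strategy score is required")
    if mode == "single":
        if len(scores) != 1:
            raise ValueError("single mode expects exactly one score")
        return scores[0]
    if mode == "average":
        return sum(scores) / len(scores)
    if mode == "weighted":
        if weights is None or len(weights) != len(scores):
            raise ValueError("weighted mode needs one weight per score")
        total = sum(weights)
        if total <= 0:
            raise ValueError("weights must sum to a positive value")
        return sum(w * s for w, s in zip(weights, scores)) / total
    if mode == "max":
        return max(scores)
    raise ValueError(f"unknown confidence mode: {mode}")
```

For example, a regex score of 0.6 and a deep learning score of 0.8 combine to about 0.7 in average mode, about 0.72 in weighted mode with weights 0.4/0.6, and 0.8 in max mode.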
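The benefit of the intelligent tag matching model design proposed in the present invention is that tags and information with different characteristics, or different stages of project implementation, often call for different, targeted technical approaches in the tag matching process. The configuration-based tag matching model design proposed in the present invention lets strategies be flexibly plugged in, combined, and switched, so it can support the tag matching requirements of all these situations.
In the early stage of the project there are few tags and documents: roughly 300 new articles per month, with 20 tags in use. The present invention defines the early stage as the period with fewer than 500 articles. At this point the tag matching model is set to single-strategy mode: for most tags, whose features are relatively clear and can conveniently be expressed by a finite set of regular expressions, the regular-expression tag matching strategy is used; for the few tags whose features are more complex to express, the text similarity tag matching strategy is used. The confidence computed by the single strategy is used directly as the model's output confidence. This lets the project be deployed quickly in the early stage and produce data results. Note that in the early stage, because data samples are limited and model iteration is immature, the matching results are not fully accurate and must be supplemented by manual verification.
As the volume of information grows to 500-2000 articles, a deep neural network can be trained on the small sample set. The tag matching model is then set to combined-strategy mode, using regular-expression tag matching + deep learning matching, or text similarity tag matching + deep learning matching. In this stage the confidence calculation mode takes the average. Again, because model iteration is immature, manual verification is required. The deep learning network is iteratively updated for every 500 additional articles.
When the number of articles reaches 2000-5000, the training set has reached a certain scale and the generalization performance of the deep learning network improves, so the confidence calculation mode is changed to a weighted sum; the accuracy of the deep neural network reaches 0.8, and the weights are assigned as 0.4/0.6. At the same time, considering the cost of manual verification and maintenance, manual verification is changed to sampling verification.
When the number of articles reaches 5000-10000, iterative optimization of the learning network raises its accuracy to 0.95, and the model's performance is considered stable. The single deep learning matching strategy is then adopted, the network output is used directly as the matching confidence, and manual verification is changed to irregular sampling verification. When the number of articles exceeds 10000, the same approach as in this stage can be used.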
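The stage-by-stage configuration above can be summarized as a lookup on the current article count. This sketch is illustrative only: the stage and strategy names are hypothetical labels, the boundaries are treated as lower-inclusive (the text gives ranges such as 500-2000 without saying which end is inclusive), and the manual verification levels follow the stage descriptions in claim 9.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StageConfig:
    stage: str
    strategies: Tuple[str, ...]
    confidence_mode: str
    verification: str


def config_for_volume(n_articles: int, regex_friendly: bool = True) -> StageConfig:
    """Pick the tag matching configuration for the current number of articles.

    regex_friendly: True for tags whose features suit a finite set of regular
    expressions, False for tags that need the text similarity strategy.
    """
    if n_articles < 0:
        raise ValueError("article count cannot be negative")
    rule_based = "regex" if regex_friendly else "text_similarity"
    if n_articles < 500:
        return StageConfig("design_verification", (rule_based,), "single", "full manual check")
    if n_articles < 2000:
        return StageConfig("development", (rule_based, "deep_learning"), "average", "full manual check")
    if n_articles < 5000:
        return StageConfig("trial_operation", (rule_based, "deep_learning"), "weighted", "regular sampling")
    return StageConfig("operation", ("deep_learning",), "single", "irregular sampling")
```

Keeping this decision in data rather than code is the point of the patent's configuration table: moving to the next stage means changing a configuration row, not the matching code.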
Table 2 Tag matching model
Figure PCTCN2022129937-appb-000008
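The configuration related to tag matching strategies is defined as follows and is stored as structured data in the MySQL database. The specific design is as follows:
Table 3 Strategy configuration data structure
Field | Description | Type
id | Auto-increment primary key | int
tag | Tag | varchar
match_mode | Matching mode | int
regex | Regular expression(s) | varchar
similarity_scentence | Standard reference statement of the tag for similarity comparison | text
similarity_api | Text similarity matching model interface | varchar
dl_api | Deep neural network matching model interface | varchar
confidence_calcu_mode | Multi-model confidence combination mode | varchar
confidence_threshold | Confidence threshold | float
The core code of the tag matching process is as follows: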
Figure PCTCN2022129937-appb-000009
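Since the core code itself is only available as an image, the following is an independent sketch, not a reconstruction, of how a loop driven by Table 3 rows might tag one article with the regular-expression strategy. Assumptions: the `StrategyConfig` dataclass simply mirrors the Table 3 columns (keeping the original spelling `similarity_scentence`); several expressions share the single `regex` column, separated by "、" or "," as in Table 4; rows without a `regex` value would be handled by calling `similarity_api` or `dl_api`, which is not shown; and confidences are rounded to two decimals for storage.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StrategyConfig:
    """One row of Table 3 (column names copied verbatim, including similarity_scentence)."""
    id: int
    tag: str
    match_mode: int
    regex: Optional[str] = None
    similarity_scentence: Optional[str] = None
    similarity_api: Optional[str] = None
    dl_api: Optional[str] = None
    confidence_calcu_mode: Optional[str] = None
    confidence_threshold: float = 0.6


def split_regex_field(field: str) -> List[str]:
    """Assumption: expressions are separated by '、' or ',' (so patterns like a{1,3} are not supported)."""
    return [p.strip() for p in re.split(r"[、,]", field) if p.strip()]


def tag_article(text: str, configs: List[StrategyConfig]) -> Dict[str, float]:
    """Regular-expression path only: return {tag: confidence} for tags above their threshold."""
    bound: Dict[str, float] = {}
    for cfg in configs:
        if not cfg.regex:
            continue  # rows using similarity_api / dl_api would call those services here (not shown)
        patterns = split_regex_field(cfg.regex)
        if not patterns:
            continue
        confidence = sum(1 for p in patterns if re.search(p, text)) / len(patterns)
        if confidence > cfg.confidence_threshold:
            bound[cfg.tag] = round(confidence, 2)
    return bound
```

The returned dictionary is what would then be bound to the article and written back to ES.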
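The present invention describes the above tagging process with the following example:
"Commercial remote sensing satellite" (商业遥感卫星) and "satellite launch" (卫星发射), commonly used keywords in the remote sensing satellite field, are included in the tag library as tags. The tag matching configuration designed for these two tags is as follows (regular-expression matching strategy):
Table 4 Regular-expression tag matching configuration
Field | Tag1 | Tag2
id | 1 | 2
tag | 商业遥感卫星 (commercial remote sensing satellite) | 卫星发射 (satellite launch)
match_mode | 4 | 4
regex | 商业、遥感卫星、遥感数据 (commercial, remote sensing satellite, remote sensing data) | 卫星、火箭、发射 (satellite, rocket, launch)
similarity_scentence | - | -
similarity_api | - | -
dl_api | - | -
confidence_calcu_mode | - | -
confidence_threshold | 0.6 | 0.6
For the two example documents of aspect one, the first matches the regular expressions 卫星 (satellite), 火箭 (rocket), 发射 (launch), 遥感数据 (remote sensing data), 商业 (commercial), and 遥感卫星 (remote sensing satellite). According to the regular-expression matching strategy described in aspect two, this document should be tagged "commercial remote sensing satellite" with a confidence of 1 (1 > threshold 0.6), and also tagged "satellite launch" with a confidence of 1 (1 > threshold 0.6). The second article matches 商业 (commercial), 遥感卫星 (remote sensing satellite), 卫星 (satellite), 火箭 (rocket), and 发射 (launch), and should likewise be tagged "commercial remote sensing satellite" and "satellite launch", with confidences of 0.67 and 1, respectively.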
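With the m/n rule of the regular-expression strategy and n = 3 expressions per tag in Table 4, the confidences in this example work out as follows (RS = remote sensing). Document 2 lacks only the 遥感数据 (remote sensing data) expression, which is why its first confidence drops to 2/3:

```latex
\begin{aligned}
&\text{Document 1:} && c_{\text{commercial RS satellite}} = \tfrac{3}{3} = 1 > 0.6,
  && c_{\text{satellite launch}} = \tfrac{3}{3} = 1 > 0.6 \\
&\text{Document 2:} && c_{\text{commercial RS satellite}} = \tfrac{2}{3} \approx 0.67 > 0.6,
  && c_{\text{satellite launch}} = \tfrac{3}{3} = 1 > 0.6
\end{aligned}
```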
After manual review, the tags are bound to the documents and updated in ES, for example:
Figure PCTCN2022129937-appb-000010
Figure PCTCN2022129937-appb-000011
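(3) Satellite knowledge graph construction module; the detailed flowchart is shown in Figure 3:
DeepKE, an open-source deep-learning-based knowledge graph extraction tool, is used to extract entities, relations, and attributes from satellite news; the open-source knowledge graph fusion tool OpenEA is then used to align the entities; and finally the entities, relations, etc. are imported into the graph database NEO4J to build the knowledge graph. The specific steps are:
Satellite news knowledge samples are prepared manually, consisting of news text and pre-annotated satellite knowledge entities, attributes, and relations. The data are fed into the corresponding DeepKE modules (NER, RE, AE) for model training; the trained knowledge extraction model (see Figure 3) is used to extract knowledge from the news text data, and the extraction results are imported into the Neo4J graph database.
Note that entities are objectively existing things, generally nouns, such as satellite designations, organization names, orbits, and equipment models; relations are the ways in which entities are connected, generally the verbs between two entities, such as launch, operate, and carry; attributes are characteristics of an entity, such as resolution or speed.
In this embodiment, the trained knowledge extraction model is used to extract knowledge from the two crawled news articles, and the extraction results are imported into the NEO4J graph database; the results are shown in Figure 4.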
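(4) Satellite information recommendation method module; the detailed flowchart is shown in Figure 5:
In the fourth aspect, based on the tagged news and the satellite knowledge graph above, the present invention proposes a satellite information recommendation method that obtains associated news through the knowledge graph as recommendation candidates and evaluates those candidates with the graph-enhanced semantic analysis network proposed in the present invention; news that meets the evaluation requirements is finally recommended to the user.
The specific implementation is as follows:
1) From the tags selected by the user, or the full-text search keywords entered, an ES query statement (term or match) is constructed, and ES recalls news documents using the TF/IDF algorithm;
2) For each recalled news document, a Cypher query over the document's associated entities is constructed, and the document's first-degree and second-degree associated documents are found through the associations between entities and used as recommendation candidates;
3) Using the graph-enhanced semantic analysis network proposed in the present invention, with the current document and a candidate document as the network inputs, the network computes an evaluation score characterizing the relevance of the candidate document to the current document; candidates whose scores exceed the threshold are sorted by score and returned to the user as the final recommendation result.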
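The three steps above can be sketched as three small helpers: an Elasticsearch query body for the initial recall, a Cypher statement for finding associated documents, and the threshold-and-sort step applied to the network's scores. This is a hedged sketch: the field names `tags`, `content`, and `publish_time`, the `Article` label, and the mapping of "first-degree" and "second-degree" association to 2 and 3 relationship hops through shared entities are all assumptions; the relevance scores themselves would come from the graph-enhanced semantic analysis network, which is not reproduced here.

```python
from typing import Dict, List, Optional


def build_es_query(tags: List[str], keywords: Optional[str], start: str, end: str) -> Dict:
    """Bool query: optional full-text match on content, exact tag filter, publish-time range filter."""
    bool_query: Dict = {"filter": [{"range": {"publish_time": {"gte": start, "lte": end}}}]}
    if tags:
        bool_query["filter"].append({"terms": {"tags": tags}})
    if keywords:
        bool_query["must"] = [{"match": {"content": keywords}}]
    return {"query": {"bool": bool_query}}


def build_candidate_cypher(max_hops: int = 3) -> str:
    """Articles linked to the current one via shared entities (2 hops) or entity-entity relations (3 hops)."""
    if not isinstance(max_hops, int) or max_hops < 2:
        raise ValueError("max_hops must be an integer >= 2")
    return (
        f"MATCH (cur:Article {{id: $id}})-[*2..{max_hops}]-(cand:Article) "
        "WHERE cand <> cur RETURN DISTINCT cand.id AS id"
    )


def rank_candidates(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep candidates scoring above the threshold, sorted by score, highest first."""
    kept = [(doc_id, s) for doc_id, s in scores.items() if s > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in kept]
```

The hop range is validated and formatted into the query string directly; only the document id is passed as a parameter. Note also that recent Elasticsearch versions score `match` queries with BM25 by default rather than classic TF/IDF.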
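The graph-enhanced semantic analysis network mentioned in 3) above consists mainly of three parts: a context feature analysis network, a tag feature analysis network, and a graph feature analysis network. Context feature analysis is mainly used to abstract and analyze the semantic features of the news text itself. Tag feature analysis is mainly used to extract, abstract, and analyze tag entity features. Graph feature analysis is mainly used to extract, abstract, and analyze graph features. The specific network computation steps are as follows: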
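1) First, the context feature analysis network works as follows: the current news text and the specified candidate news text each undergo random sentence sampling to generate a set of texts to be analyzed; the texts in this set are mapped by the PV-DBOW model to sentence vectors in a low-dimensional space, forming a context feature matrix, which is further abstracted into text context features by a transformer layer and a bidirectional LSTM layer.
2) Next, the entities associated with a news article are divided into tag entities and non-tag entities. A tag entity is itself a representation that generalizes certain characteristics of the article, and the present invention processes tag entities with the tag feature analysis network: the tags are vectorized by PV-DBOW and multiplied by their respective confidences to form a tag feature matrix, which is further abstracted by a multilayer perceptron;
3) Finally, satellite graph information serves as the carrier of relatively deep, hidden associations between satellite news articles, and the present invention processes it with the graph feature analysis network: graph entity/relation information is mapped to a low-dimensional space by TransE and TransH networks to form entity/relation vectors, which make up a graph embedding feature matrix; this matrix is further processed by a KGCNN network to extract and abstract graph features.
4) The current news text and the specified recommendation candidate text each undergo the above processing, yielding two sets of the three kinds of abstract feature matrices. Each kind of matrix is concatenated across the two texts and then passed through convolutional, fully connected, and other layers, finally outputting the evaluation result y′, which characterizes the degree of association between the two texts and informs the final recommendation decision.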
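The data flow of steps 1) to 4) can be illustrated with a deliberately simplified, untrained sketch in plain Python. It shows only the shape of the computation: random sentence sampling, scaling tag vectors by their confidences, concatenating the features of the current and candidate documents, and a single linear layer with a sigmoid producing a score y′ in (0, 1). The learned components named in the text (PV-DBOW, the transformer and bidirectional LSTM layers, the multilayer perceptron, TransE/TransH, KGCNN, and the convolutional and fully connected head) are not reproduced; mean pooling stands in for all of them, and in practice the vectors and weights would come from trained models.

```python
import math
import random
import re
from typing import List, Sequence

Matrix = List[List[float]]


def sample_sentences(text: str, k: int, seed: int = 0) -> List[str]:
    """Step 1 input: randomly sample up to k sentences (splitting on 。!?.!? is an assumption)."""
    sentences = [s.strip() for s in re.split(r"[。!?.!?]", text) if s.strip()]
    return random.Random(seed).sample(sentences, min(k, len(sentences)))


def tag_feature_matrix(tag_vectors: Matrix, confidences: Sequence[float]) -> Matrix:
    """Step 2: scale each tag's embedding by its match confidence."""
    return [[c * x for x in vec] for vec, c in zip(tag_vectors, confidences)]


def mean_pool(matrix: Matrix) -> List[float]:
    """Placeholder for the learned encoders: average the rows into one vector."""
    return [sum(column) / len(matrix) for column in zip(*matrix)]


def relevance_score(current: Sequence[Matrix], candidate: Sequence[Matrix],
                    weights: Sequence[float], bias: float = 0.0) -> float:
    """Step 4 in miniature: concatenate pooled features of both documents, one linear layer + sigmoid."""
    features: List[float] = []
    for cur_m, cand_m in zip(current, candidate):  # (context, tag, graph) matrices
        features += mean_pool(cur_m) + mean_pool(cand_m)
    if len(weights) != len(features):
        raise ValueError("need one weight per concatenated feature")
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

Scaling tag vectors by confidence means a weakly matched tag contributes proportionally less to the tag features, which is the effect step 2) describes.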
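The graph-enhanced semantic analysis network proposed in the present invention and the constructed knowledge graph complement each other. Once the graph is built, associations between articles can be obtained quickly through graph computation; these results are intuitive and highly interpretable, but the degree of relevance cannot be measured quantitatively. The graph-enhanced semantic analysis network can quantify the relevance between articles, but it is computationally expensive and its results are poorly interpretable. The present invention therefore first uses graph computation to obtain recommendation candidates and then uses the graph-enhanced semantic analysis network to compute the relevance between each candidate and the current article, achieving the best balance between efficiency and effectiveness in the recommendation process.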
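(5) Construction of the satellite information query and recommendation module and the satellite information statistical report generation module; the detailed flowchart is shown in Figure 6:
In this embodiment, the client submits a search request form based on tags or keywords and a time range;
the server receives the request and constructs a query statement; the ES full-text search engine performs a full-text search based on the inverted index and returns the results, which are organized into a response data structure and returned to the client, completing the response to the request.
The client submits an information details request form;
the server receives the request, obtains the details of the information Document through an information query, and at the same time constructs a Cypher query statement to recall related information based on first-degree (second-degree) associated knowledge in the knowledge graph. The recalled documents and the current document are used together as parameters, and relevance evaluation results are computed by the graph-enhanced semantic analysis network; information with evaluation scores above the threshold is added to the recommended document set. Finally, the current document details and the recommended document set are organized into a response data structure and returned to the client, completing the response to the request.
The client submits a request form for generating an information statistics report;
the server receives the request, presets statistical indicators based on the specified template, and builds statistical query statements: query (ES), SQL (MYSQL), and Cypher (NEO4J). The recalled data, together with the list of information documents specified for inclusion in the report, are passed through a template engine framework based on Word file splitting and reassembly, JFreeChart, and Thymeleaf, which fills in the content, sets the styles, and draws the charts of the report template, generating a customized report that is saved to the database and sent asynchronously to the user's mailbox.
This embodiment also includes, but is not limited to: multi-dimensional retrieval of satellite information, client-side custom creation of report templates, template management, report management, report preview, download, etc.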
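Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the present invention. Any person skilled in the art may use the methods and technical content disclosed above to make possible changes and modifications to the technical solution of the present invention without departing from its spirit and scope. Therefore, any simple modification, equivalent change, or alteration made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, falls within the protection scope of the technical solution of the present invention.
Matters not described in detail in the present invention belong to the common general knowledge of those skilled in the art.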

Claims (21)

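  1. A remote sensing satellite information recommendation method, characterized by comprising:
    collecting satellite information texts and performing preliminary processing on them, organizing the information texts into a preset data storage structure and storing them;
    performing intelligent tag matching on the stored information texts to obtain satellite information with tags and confidences;
    constructing a satellite knowledge graph from the satellite information with tags and confidences;
    performing preliminary recall of the stored satellite information according to the user's query;
    mining, based on the constructed satellite knowledge graph, information associated with the recalled results as a recommendation candidate set;
    performing relevance evaluation on the candidate information in the recommendation candidate set, and making the final recommendation according to the evaluation results.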
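  2. The method according to claim 1, characterized in that the preliminary processing comprises:
    performing multi-threaded crawling of news websites, official websites of remote sensing satellite operators, and websites of national government space agencies to obtain the raw text of satellite information;
    selecting a search engine as the storage component for satellite information;
    processing the raw text through an extract, transform/clean, and load process, organizing the information texts into a preset data storage structure and storing them in the search engine.
  3. The method according to claim 2, characterized in that the search engine is the ES full-text search engine, and the full-text search engine is deployed as a cluster.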
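  4. The method according to claim 1, characterized in that obtaining satellite information with tags and confidences comprises:
    extracting keywords in the remote sensing satellite domain from a search engine as tags to form a satellite information tag library;
    designing a configurable intelligent tag matching model, wherein the matching model encapsulates the matching logic of tags with different characteristics against the information text and can compute the matching confidence of a tag for the information text;
    performing intelligent tag matching on the stored information texts using the intelligent tag matching model to obtain satellite information with tags and confidences.
  5. The method according to claim 4, characterized in that the configurable intelligent tag matching model comprises three parts: tag matching strategies, a tag matching mode, and result confidence calculation modes;
    the tag matching strategies store a plurality of pre-designed matching strategies;
    the tag matching mode is used to select different matching strategies from the tag matching strategies as required, forming a single-strategy mode or a combined matching mode;
    the result confidence calculation modes store different confidence calculation modes.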
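  6. The method according to claim 5, characterized in that the matching strategies comprise a regular-expression tag matching strategy, a text similarity tag matching strategy, and a deep learning tag matching strategy;
    the regular-expression tag matching strategy is suitable for matching tags consisting of specialized phrases or fixed expressions in the remote sensing satellite domain;
    the text similarity tag matching strategy uses the text similarity between the information and a standard statement to determine whether a tag matches, the standard statement being a reference standard, namely a generalized statement designed to capture the tag's characteristics;
    the deep learning tag matching strategy trains a network with the information text as input and the tag matching score as output; if the score output by the network exceeds a set threshold, the tag is considered matched, and the score is used as the confidence.
  7. The method according to claim 5, characterized in that the configuration of the intelligent tag matching model is stored in a MySQL database using a maximally generic table design, ensuring that the intelligent tag matching model follows the open-closed principle so that the matching strategies can be iterated flexibly and quickly.
  8. The method according to claim 7, characterized in that the contents of the maximally generic table design comprise:
    design verification stage: the number of satellite information texts is less than 500; the matching strategy and tag matching mode is a single-strategy mode using the regular-expression matching strategy or the text similarity matching strategy; the result confidence calculation mode directly uses the confidence of the single strategy as the final confidence of the model;
    development and implementation stage: the number of satellite information texts is 500-2000; the matching strategy and tag matching mode is a combined-strategy mode of regular-expression matching + deep learning tag matching, or text similarity + deep learning tag matching; the result confidence calculation mode takes the average of the confidences of the different strategies as the final confidence of the model;
    trial operation stage: when the number of satellite information texts is 2000-5000, the matching strategy and tag matching mode is a combined-strategy mode of regular-expression matching + deep learning tag matching, or text similarity + deep learning tag matching; the result confidence calculation mode takes the weighted average of the confidences of the different strategies as the final confidence of the model;
    operation stage: when the number of satellite information texts exceeds 5000, the matching strategy and tag matching mode is a single-strategy mode using the deep learning tag matching strategy, and the result confidence calculation mode directly uses the confidence of the single strategy as the final confidence of the model.
  9. The method according to claim 8, characterized in that the matching results are manually verified, wherein the design verification stage and the development and implementation stage are both supplemented by full manual verification, the trial operation stage is supplemented by regular manual sampling verification, and the operation stage is supplemented by irregular manual sampling verification.
  10. The method according to claim 8, characterized in that in the development and implementation stage, the deep learning network is iteratively updated once each time the information grows by a preset number of articles, and the number of updates over the entire development and implementation stage is 10-20.
  11. The method according to claim 6, characterized in that the deep learning network consists of an embedding layer, 2 bidirectional LSTM layers, and 3 fully connected layers; the network is trained with the information text as input and the tag matching score as output; if the score output by the network exceeds a set threshold, the tag is considered matched, and the score is used as the confidence.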
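  12. The method according to claim 4, characterized in that constructing the satellite knowledge graph comprises:
    storing each tag of the satellite information tag library in a graph database in the form of a tag entity;
    storing the information texts in the graph database in the form of information entities;
    establishing relationships, namely matching relationships, between the information entities and the tag entities;
    using a knowledge graph extraction tool to extract knowledge entities, relationships, and attributes related to remote sensing satellites from the text attributes of the information entities;
    using a knowledge graph fusion tool to fuse and align the resulting entities;
    storing the extracted and fused entities, relationships, and attributes in the graph database in batches, completing the construction of the knowledge graph for the remote sensing satellite knowledge domain.
  13. The method according to claim 1, characterized in that relevance evaluation is performed on the candidate information by constructing a graph-enhanced semantic analysis network.
  14. The method according to claim 13, characterized in that the graph-enhanced semantic analysis network comprises a context feature analysis network, a tag feature analysis network, and a graph feature analysis network;
    the context feature analysis network is used to abstract and analyze the semantic feature information of the information text itself;
    the tag feature analysis network is used to extract, abstract, and analyze tag feature information;
    the graph feature analysis network is used to extract, abstract, and analyze graph feature information.
  15. The method according to claim 14, characterized in that the relevance evaluation comprises:
    S1. performing the following processing on the current information text and on each candidate information text in the recommendation candidate set:
    using the context feature analysis network, randomly sampling sentences of the information text to randomly generate a set of texts to be analyzed, mapping them to sentence vectors in a low-dimensional space to form a context feature matrix, and further abstracting text context features from the context feature matrix;
    using the tag feature analysis network, vectorizing the tag entities of the information text and multiplying them by the tag confidences to form a tag feature matrix, and performing further feature abstraction on the tag feature matrix;
    using the graph feature analysis network, taking the non-tag entities of the information text as carriers of relatively deep hidden association information, mapping them to a low-dimensional space to form entity/relationship vectors that make up a graph embedding feature matrix, and further extracting and abstracting graph features from the graph embedding feature matrix;
    S2. concatenating the processing result for each candidate information text in S1 with the processing result for the current information text, passing the result through a fully connected network, and finally outputting an evaluation result characterizing the degree of association between the two texts;
    the two texts being the current information text and each candidate information text in the recommendation candidate set, the current information text being the information text that the user opens from the preliminarily recalled satellite information.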
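  16. A remote sensing satellite information recommendation system, characterized by comprising a satellite information collection and storage module, a satellite information intelligent tag matching module, a remote sensing satellite domain knowledge graph construction module, and a satellite information query and recommendation module;
    the satellite information collection and storage module is used to collect satellite information texts and perform preliminary processing on them, and to organize the information texts into a preset data storage structure and store them;
    the satellite information intelligent tag matching module is used to perform intelligent tag matching on the stored information texts to obtain satellite information with tags and confidences;
    the remote sensing satellite domain knowledge graph construction module is used to construct a satellite knowledge graph from the satellite information with tags and confidences;
    the satellite information query and recommendation module performs preliminary recall of the satellite information stored by the satellite information collection and storage module according to the input query; mines, based on the constructed satellite knowledge graph, information associated with the recalled results as a recommendation candidate set; and performs relevance evaluation on the candidate information in the recommendation candidate set and makes the final recommendation according to the evaluation results.
  17. The system according to claim 16, characterized by further comprising an information retrieval interface, through which the user performs tag filtering, keyword queries, or time-range queries, the query being input to the satellite information query and recommendation module.
  18. The system according to claim 16, characterized by further comprising an information details interface, through which the user views information details, the information viewed by the user serving as the current information text, and the information details interface also returning the recommendation results for the current information document.
  19. The system according to claim 16, characterized by further comprising a satellite information statistical report generation module and a statistical report management interface;
    the user uploads, modifies, deletes, or queries custom statistical report templates and historically generated statistical reports through the statistical report management interface; the satellite information statistical report generation module generates statistical reports according to the statistical report templates.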
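  20. A remote sensing satellite information recommendation device, characterized by comprising a client and a server;
    the client is provided with an information retrieval interface, an information details interface, and a statistical report management interface, and interacts with the server through these interfaces;
    the user performs tag filtering, keyword queries, or time-range queries through the information retrieval interface, and the query is sent to the server as a search request form;
    the user views information details through the information details interface, which sends an information details request form to the server; the information details interface also returns the recommendation results for the current information document sent by the server;
    the user uploads, modifies, deletes, or queries custom statistical report templates and historically generated statistical reports through the statistical report management interface;
    the server runs the method according to any one of claims 1-15 and responds to the interface inputs from the client.
  21. The device according to claim 20, characterized in that the client is a web browser, the server consists of a service layer and a data layer, the service layer runs the method according to any one of claims 1-15, and the data layer comprises data storage components and the servers on which the components are deployed.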
PCT/CN2022/129937 2022-09-30 2022-11-04 Remote sensing satellite information recommendation method, system and device WO2024065952A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211216697.7 2022-09-30
CN202211216697.7A CN115640458A (zh) 2022-09-30 2022-09-30 Remote sensing satellite information recommendation method, system and device

Publications (1)

Publication Number Publication Date
WO2024065952A1 (zh) 2024-04-04

Family

ID=84941757

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/129937 WO2024065952A1 (zh) 2022-09-30 2022-11-04 Remote sensing satellite information recommendation method, system and device

Country Status (2)

Country Link
CN (1) CN115640458A (zh)
WO (1) WO2024065952A1 (zh)



Also Published As

Publication number Publication date
CN115640458A (zh) 2023-01-24


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22960551

Country of ref document: EP

Kind code of ref document: A1