CN116431825A - Construction method of 6G knowledge system for global full-scene on-demand service - Google Patents


Info

Publication number
CN116431825A
Authority
CN
China
Prior art keywords
knowledge
demand
knowledge base
topics
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310341182.8A
Other languages
Chinese (zh)
Inventor
李长乐
沙子凡
承楠
岳文伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202310341182.8A
Publication of CN116431825A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for constructing a 6G knowledge system for global full-scene on-demand service, comprising the following steps: constructing, based on 6G academic literature, a 6G knowledge base that combines top-down and bottom-up approaches; performing statistical analysis on the literature metadata in the 6G knowledge base to predict 6G academic development; processing and training on the 6G corpus data in the 6G knowledge base using natural language processing technology and deep learning methods to realize knowledge extraction and generation; during the statistical analysis and the knowledge extraction and generation, labeling the metadata in the 6G knowledge base with knowledge labels, wherein the three kernel layers of statistical analysis, knowledge extraction and generation, and knowledge labeling form the 6G knowledge system; and performing demand tasks in the 6G field based on the 6G knowledge base and the 6G knowledge system to realize 6G knowledge-driven on-demand applications. The method constructs a 6G knowledge base and a knowledge system, realizes on-demand application of knowledge on that basis, and enables an efficient knowledge closed loop and knowledge-driven on-demand service.

Description

Construction method of 6G knowledge system for global full-scene on-demand service
Technical Field
The invention belongs to the technical field of communication, and particularly relates to a construction method of a 6G knowledge system for global full-scene on-demand service.
Background
In recent years, research on 6G has grown rapidly, but current 6G-related concepts are not unified, and a consistent understanding and definition is urgently needed. Meanwhile, academia and industry lack an overall understanding of the development of 6G, and researchers have difficulty obtaining a clear picture of research progress in related fields. In addition, 6G aims to realize the paradigm shift from the interconnection of everything to the intelligent connection of everything, and the introduction of knowledge and the realization of intelligence are important characteristics that distinguish 6G from 5G.
The current development of the 6G academic field faces two problems. First, because research has been under way for only a short time, the exploration of 6G-related fields lacks overall system construction and a clear account of how the field has developed, which limits in-depth research on 6G theory and technology. Second, simply applying artificial intelligence algorithms in existing communication systems cannot realize the service vision of "services as desired, networks that change on demand, and resources shared at will"; deep fusion of mobile communication technology and artificial intelligence technology within a novel architecture with embedded knowledge is urgently needed.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a method for constructing a 6G knowledge system for global full-scene on-demand service. The technical problems to be solved by the invention are realized by the following technical scheme:
An embodiment of the invention provides a method for constructing a 6G knowledge system for global full-scene on-demand service, comprising the following steps:
S1, combining expert knowledge and natural language processing technology to construct, based on 6G academic literature, a 6G knowledge base that combines top-down and bottom-up approaches;
S2, performing statistical analysis on the literature metadata in the 6G knowledge base to predict 6G academic development;
S3, processing and training on the 6G corpus data in the 6G knowledge base using natural language processing technology and deep learning methods to realize knowledge extraction and generation;
S4, during the statistical analysis and the knowledge extraction and generation, labeling the metadata in the 6G knowledge base with knowledge labels, wherein the three kernel layers of statistical analysis, knowledge extraction and generation, and knowledge labeling form the 6G knowledge system;
S5, performing demand tasks in the 6G field based on the 6G knowledge base and the 6G knowledge system to realize 6G knowledge-driven on-demand applications.
In one embodiment of the present invention, step S1 includes:
combining expert knowledge and natural language processing technology, extracting ontology and schema information from the 6G academic literature by means of a structured data source, and adding the ontology and schema information to the 6G knowledge base, thereby realizing top-down construction of the 6G knowledge base; and at the same time, obtaining a target data schema from the 6G academic literature using labeling and induction methods, selecting the information with higher confidence in the target data schema, and recording it in the 6G knowledge base, thereby realizing bottom-up construction of the 6G knowledge base.
In one embodiment of the invention, the 6G knowledge base includes metadata fields and extended attributes, wherein,
the metadata fields comprise the ID, title, abstract, field, publication year, and DOI of each 6G academic document; the extended attributes comprise the number of articles and the article attribute categories.
In one embodiment of the present invention, step S2 includes:
performing statistical analysis on the distribution of papers, the hotspot fields, and the distribution of hotwords in the 6G academic literature to predict 6G academic development.
In one embodiment of the present invention, step S3 includes:
S31, extracting a keyword set from the 6G knowledge base, and constructing a matching vocabulary for the 6G field using expert knowledge;
S32, calculating the relevance between keywords and topics using a hierarchical topic detection algorithm to find the topics and their corresponding keyword sets;
S33, performing fuzzy matching on the topics using the matching vocabulary to obtain subject terms;
S34, calculating the relevance among the subject terms using the hierarchical topic detection algorithm to establish a topic hierarchy;
S35, calculating the relevance between the subject terms in the topic hierarchy and the papers to find the paper set corresponding to each subject term;
S36, forming a 6G knowledge tree from the topics and their corresponding keyword sets, the topic hierarchy, and the paper sets corresponding to the topics, thereby realizing knowledge extraction and generation.
In one embodiment of the present invention, step S32 includes:
calculating the co-occurrence frequency of keywords in the text corpus of the 6G knowledge base to capture related words and obtain the topics and their corresponding keyword sets;
ranking the relevance of the keywords using mutual information, and selecting the five keywords with the highest relevance under each topic to name that topic, wherein mutual information is given by:
$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} P(x,y) \log \frac{P(x,y)}{P(x)\,P(y)}$$
wherein X and Y denote different topics, P(x, y) denotes the joint probability density function of X and Y, P(x) denotes the marginal probability density function of X, and P(y) denotes the marginal probability density function of Y.
In one embodiment of the present invention, step S33 includes:
scoring the similarity of the topics using the BM25 algorithm, and selecting the candidate with the highest score as the target subject term, wherein the similarity score is given by:
$$\mathrm{Score}(Q, D) = \sum_{i} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}$$
wherein Q denotes the corpus set, D denotes one corpus in Q, IDF(q_i) denotes the IDF value of keyword q_i in Q, f(q_i, D) denotes the TF value of keyword q_i in corpus D, k_1 denotes the term frequency saturation parameter, b denotes the field length normalization parameter, which controls how strongly the length of D relative to the average corpus length in Q affects the score, |D| denotes the length of corpus D, and avgdl denotes the average length of all corpora in Q; TF and IDF are calculated as follows:
$$\mathrm{TF}_i = \frac{n_i}{\sum_k n_k}$$
wherein n_i denotes the number of times the keyword appears in the corpus, and \sum_k n_k denotes the total number of occurrences of all keywords in D;
$$\mathrm{IDF}_i = \log \frac{N}{1 + |\{ j : t_i \in d_j \}|}$$
wherein N is taken to be the total number of corpora in Q, and |{j : t_i ∈ d_j}| denotes the number of corpora in Q in which the keyword appears; 1 is added so that the denominator is never zero.
In one embodiment of the present invention, step S4 includes:
during the statistical analysis and the knowledge extraction and generation, labeling the scenarios, technologies, and indicators of each article in the 6G knowledge base in a targeted manner.
In one embodiment of the present invention, step S5 includes:
training a target model for the demand task using the knowledge extracted and generated in the 6G knowledge system and the 6G corpus data in the 6G knowledge base, and performing the demand task in the 6G field using the trained target model, thereby realizing 6G knowledge-driven on-demand applications.
In one embodiment of the invention, the on-demand applications include relevance of demand, ambiguity of demand, and scalability of demand.
Compared with the prior art, the invention has the beneficial effects that:
The invention constructs a 6G knowledge base and a knowledge system. The 6G knowledge base stores all 6G academic documents to date in structured form and extends the knowledge dimensions beyond the initial fields. The 6G knowledge system takes the 6G knowledge base as its carrier and is the kernel that enables knowledge growth and application in the 6G knowledge base; it comprises statistical analysis of the entire 6G field, extraction and generation of 6G knowledge, and labeling of specific knowledge, and realizes on-demand application of knowledge on that basis. The 6G knowledge base and the 6G knowledge system are the first knowledge cluster constructed for the entire 6G field; they are extensible across multiple dimensions and fields, support knowledge generation that combines top-down and bottom-up approaches, and enable an efficient knowledge closed loop and knowledge-driven on-demand service.
Drawings
FIG. 1 is a flow diagram of a method for constructing a 6G knowledge system for global full-scene on-demand service according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the 6G knowledge base and knowledge system construction process according to an embodiment of the present invention;
FIG. 3 is a statistical chart of the 5G and 6G literature distribution provided by an embodiment of the present invention;
FIG. 4 is a statistical chart of the 6G Top-20 hotword distribution provided by an embodiment of the present invention;
FIG. 5 is a flow chart of 6G knowledge tree construction provided by an embodiment of the present invention;
FIG. 6 is a visualization of the 6G knowledge tree provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of the ten typical 6G network scenarios and ten key technologies provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram of the 6G-BERT model architecture according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of 6G hotspot recommendation based on text generation provided by an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but embodiments of the present invention are not limited thereto.
Example 1
With 6G technology developing rapidly, intelligent analysis of the overall picture of 6G research and of the development context of specific technologies has become a common need of many researchers and engineers. To realize the mining of 6G knowledge and the embedding of native intelligence, this embodiment constructs a 6G knowledge base and knowledge system. The 6G knowledge base and knowledge system not only store and mine data intelligently, but also aim to create an intelligent management and control platform for the full life cycle of academic knowledge. Constructing them helps in understanding the academic and industrial landscape of 6G and its future development hotspots, and on-demand knowledge application can be realized through deep knowledge mining.
Referring to FIG. 1 and FIG. 2, FIG. 1 is a flow diagram of a method for constructing a 6G knowledge system for global full-scene on-demand service according to an embodiment of the present invention, and FIG. 2 is a schematic diagram of the 6G knowledge base and knowledge system construction process according to an embodiment of the present invention. The construction method comprises the following steps:
S1, combining expert knowledge and natural language processing technology to construct, based on 6G academic literature, a 6G knowledge base that combines top-down and bottom-up approaches.
The 6G knowledge base is based on structured 6G academic data and combines expert knowledge with natural language processing technology to realize construction that is both top-down and bottom-up. Top-down construction means extracting ontology and schema information from a structured data source and adding it to the knowledge base; bottom-up construction means obtaining the required data schema by labeling, induction, and similar methods, selecting the information with higher confidence, and adding it to the knowledge base.
In this embodiment, the 6G knowledge base stores all 6G academic documents to date in structured form and extends the knowledge dimensions beyond the initial fields. Specifically, a total of 1754 academic documents related to 6G were screened out in this example, belonging to 633 different fields. The screened 6G literature is first preprocessed into normalized text data, from which a 6G-oriented knowledge corpus is constructed. Currently, the 6G knowledge base contains the initial data fields (i.e., metadata fields) of the 1754 articles, such as ID, title, abstract, field, publication year, and DOI. The knowledge base also supports both longitudinal expansion (number of articles) and lateral expansion (article attribute categories). So far it has been extended with several attribute dimensions, including scenarios, technologies, and KPIs, and it supports on-demand knowledge expansion.
In summary, the 6G knowledge base includes metadata fields and extended attributes: the metadata fields comprise the ID, title, abstract, field, publication year, and DOI of each 6G academic document; the extended attributes comprise the number of articles and the article attribute categories.
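As an illustration only, the record layout described above could be sketched as follows. The class names, field names, and the sample DOI are hypothetical and are not taken from the embodiment; the sketch merely shows metadata fields alongside extensible attribute categories.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PaperRecord:
    """One 6G academic document; all names here are illustrative assumptions."""
    paper_id: str
    title: str
    abstract: str
    field_name: str
    year: int
    doi: str
    # Lateral expansion: attribute categories such as scenario, technology, KPI.
    extended: Dict[str, List[str]] = field(default_factory=dict)


class KnowledgeBase:
    def __init__(self) -> None:
        self.records: Dict[str, PaperRecord] = {}

    def add_paper(self, record: PaperRecord) -> None:
        # Longitudinal expansion: the number of stored articles grows.
        self.records[record.paper_id] = record

    def annotate(self, paper_id: str, category: str, value: str) -> None:
        # Lateral expansion: attach a value under an attribute category.
        self.records[paper_id].extended.setdefault(category, []).append(value)


kb = KnowledgeBase()
kb.add_paper(PaperRecord("p1", "THz links for 6G", "Placeholder abstract.",
                         "Communications", 2021, "10.0000/example"))  # made-up DOI
kb.annotate("p1", "technology", "terahertz")
```

Keeping the extended attributes in a dictionary means a new attribute category can be added without changing the record definition.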
S2, performing statistical analysis on the literature metadata in the 6G knowledge base to predict 6G academic development.
Specifically, statistical analysis can be performed on the distribution of papers over time, the hotspot fields, and the distribution of hotwords in the 6G academic literature in order to predict 6G academic development. That is, structured data such as the metadata fields and extended attributes of the 6G literature are analyzed statistically by year, field, and so on, giving an overall grasp of the current state of 6G academic development, on the basis of which future trends can be further predicted.
In a specific embodiment, a total of 1754 6G-related documents and 36682 5G-related documents were obtained for the knowledge base through a combination of automatic and manual cleaning and screening. This embodiment computes the distribution of these documents by year; see FIG. 3, which is a statistical chart of the 5G and 6G literature distribution provided by an embodiment of the present invention. As the figure shows, 6G research began to emerge in 2018 and its growth accelerated from 2019 to 2021; the number of articles published in 2021 alone reached 1067, development is rapid, and breakthrough growth is expected to continue over the next few years. However, 6G research is still in its infancy, and the future development of 6G cannot be inferred from the data of recent years alone. To better understand the future trend of 6G, this embodiment uses the distribution of the 5G academic literature as a reference, and analyzes and predicts the growth trend of 6G research by comparison. As can be seen from FIG. 3, 5G research began to emerge around 2010; 2013 to 2018 was a period of very rapid growth; growth slowed after 2018, and the 5G article distribution peaked in 2020 at nearly 8000 articles; the development period from emergence to peak was 8 to 10 years. A declining trend appears after 2020, and paper publication is expected to move from a gradual decline to a steep decline over the next 5 years.
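The year-distribution statistics described above can be sketched in a few lines. This is a minimal illustration on made-up sample years, not the embodiment's data or code; the function names are assumptions.

```python
from collections import Counter


def papers_per_year(years):
    """Count publications per year and return them in chronological order."""
    counts = Counter(years)
    return dict(sorted(counts.items()))


def yearly_growth(counts):
    """Year-over-year change in publication count."""
    ordered = sorted(counts)
    return {y: counts[y] - counts[prev] for prev, y in zip(ordered, ordered[1:])}


def peak_year(counts):
    """Year with the largest number of publications."""
    return max(counts, key=counts.get)


# Made-up sample; real input would be the publication-year metadata field.
sample_years = [2018, 2019, 2019, 2020, 2020, 2020, 2021, 2021, 2021, 2021]
dist = papers_per_year(sample_years)
growth = yearly_growth(dist)
```

Comparing the growth curve of one corpus (for example 5G) against another (6G) then reduces to computing these dictionaries for each and aligning them by years since first appearance.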
In a specific embodiment, the high-frequency hotwords appearing in the 6G knowledge base are extracted and the Top-20 hotwords are analyzed statistically. The result is shown in FIG. 4, which is a statistical chart of the 6G Top-20 hotword distribution provided by an embodiment of the present invention. As can be seen from FIG. 4, the Top-20 hotwords are, in order: edge computing, AI, security, Internet of Things, terahertz, Internet of Vehicles, MIMO, cellular networks, cloud computing, intelligent metasurfaces, virtualization, blockchain, satellite, Non-orthogonal Multiple Access (NOMA), software, beamforming, quantum communication, network slicing, big data, and the like. These hotwords clearly correspond to the typical scenarios, enabling technologies, and development directions of 6G. Specifically, they include future scenarios such as space-air-ground-sea integration, three-dimensional transportation in the Internet of Vehicles, intelligent cloud-edge-end collaboration, and deployment that fuses distributed and centralized architectures; in-depth exploration of emerging technologies such as terahertz, massive MIMO, and blockchain; and development concepts and new architectures such as virtualization, softwarization, and on-demand service. The hotword distribution also gives a picture of the current state of the whole field for 6G academic development and industrial planning, and points to future directions.
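A Top-k hotword count of this kind might look like the sketch below. It assumes each paper already carries a keyword list and counts each keyword at most once per paper; both choices are assumptions made for illustration, since the embodiment does not specify the counting rule.

```python
from collections import Counter


def top_hotwords(keyword_lists, k=20):
    """Return the k keywords that occur in the most papers, with their counts."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalize case and count each keyword once per paper.
        counts.update({w.lower() for w in keywords})
    return counts.most_common(k)


# Made-up keyword lists for three papers.
sample = [
    ["Edge computing", "AI"],
    ["AI", "MIMO"],
    ["AI", "edge computing"],
]
hot = top_hotwords(sample, k=2)
```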
S3, processing and training on the 6G corpus data in the 6G knowledge base using natural language processing technology and deep learning methods to realize knowledge extraction and generation.
Specifically, 6G knowledge extraction and generation includes, but is not limited to: tracing the development context of 6G with a hierarchical topic detection algorithm to generate a 6G knowledge tree; training a language model on the 6G literature corpus to obtain a 6G-oriented language model, 6G-BERT, which can later be applied to various 6G-related downstream knowledge services; and using a neural network model to realize 6G hotspot recommendation based on text generation, which can later be used for fine-grained hotspot recommendation and association for each 6G sub-scenario.
Specifically, referring to FIG. 5, FIG. 5 is a flow chart of 6G knowledge tree construction provided by an embodiment of the present invention. Taking the generation of a 6G knowledge tree as an example, knowledge extraction and generation comprises the following steps:
S31, extracting a keyword set from the 6G knowledge base, and constructing a matching vocabulary for the 6G field using expert knowledge.
Specifically, the 6G knowledge base stores normalized text data, from which a keyword set containing 10000 keywords in total is extracted. In addition, expert knowledge is used for screening and checking to construct a matching vocabulary for the 6G field. The matching vocabulary is used for fuzzy matching against the nodes of the knowledge context and contains 4278 candidate words for subject nodes of the 6G knowledge system.
S32, calculating the relevance between keywords and topics using a hierarchical topic detection algorithm to find the topics and their corresponding keyword sets.
Specifically, this step is the first-layer association and mainly comprises capturing related words and determining subject terms.
Related words are captured mainly through the co-occurrence characteristics of keywords: the frequency with which keywords occur together in the text corpus of the 6G knowledge base is calculated to capture related words and obtain the topics and their corresponding keyword sets. Further, when the co-occurrence frequency reaches a set threshold, the keywords are considered related, i.e. as belonging to the same topic.
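The co-occurrence thresholding just described can be illustrated as follows. This is a minimal sketch, assuming co-occurrence is counted per document over unordered keyword pairs; the threshold value and the sample data are made up.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each unordered keyword pair, the documents containing both."""
    pair_counts = Counter()
    for keywords in docs:
        # Sorting makes (a, b) and (b, a) map to the same key.
        for a, b in combinations(sorted(set(keywords)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def related_pairs(docs, threshold):
    """Keyword pairs whose co-occurrence frequency reaches the threshold."""
    return {pair for pair, c in cooccurrence_counts(docs).items() if c >= threshold}


docs = [
    ["mimo", "beamforming"],
    ["mimo", "beamforming", "noma"],
    ["noma", "satellite"],
]
pairs = related_pairs(docs, threshold=2)
```

Pairs that pass the threshold can then be grouped (for example as connected components) to form candidate topics; that grouping step is not shown here.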
To determine the subject terms, the relevance of the keywords is ranked using the mutual information I(X;Y), and the five keywords with the highest relevance under a given topic are selected to name that topic. The mutual information I(X;Y) is defined as:
$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} P(x,y) \log \frac{P(x,y)}{P(x)\,P(y)}$$
wherein X and Y denote different topics, P(x, y) denotes the joint probability density function of X and Y, P(x) denotes the marginal probability density function of X, and P(y) denotes the marginal probability density function of Y.
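A small numerical sketch of the mutual information formula may help. It estimates the probabilities from empirical counts over paired observations; treating keyword presence and topic membership as per-document binary flags is an assumption made here for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical I(X;Y) in nats for two equal-length sequences of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def top_keywords(topic_flags, keyword_flags, k=5):
    """Rank keywords by mutual information with topic membership; keep the top k."""
    scored = {kw: mutual_information(flags, topic_flags)
              for kw, flags in keyword_flags.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]


# Made-up flags over four documents: 1 = present / member, 0 = absent.
topic_flags = [1, 1, 0, 0]
keyword_flags = {"terahertz": [1, 1, 0, 0], "survey": [1, 0, 1, 0]}
best = top_keywords(topic_flags, keyword_flags, k=1)
```

In this sample, "terahertz" coincides exactly with the topic and scores log 2, while "survey" is independent of it and scores 0.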
S33, performing fuzzy matching on the topics using the matching vocabulary to obtain subject terms.
Specifically, knowledge should be presented concisely; a topic named with five words is somewhat redundant and does not lend itself to clear presentation in the knowledge tree visualization, so this embodiment further uses the constructed 6G matching vocabulary to fuzzy-match the topics and obtain subject terms.
Further, in this embodiment, the BM25 algorithm is used to score the similarity between the topic and the entries of the matching vocabulary, and the entry with the highest score is selected as the target subject term. The similarity score is given by:
$$\mathrm{Score}(Q, D) = \sum_{i} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}$$
wherein Q denotes the corpus set; D denotes one corpus in Q; IDF(q_i) denotes the IDF value of keyword q_i in Q, i.e. the weight of q_i: the rarer q_i is in Q, the higher its importance, so the value decreases as the number of corpora containing the word increases; f(q_i, D) denotes the TF value of keyword q_i in corpus D, i.e. the importance of q_i in D, which increases with the number of occurrences of the word; k_1 denotes the term frequency saturation parameter, which adjusts how quickly the term frequency contribution saturates; b denotes the field length normalization parameter, which controls how strongly the length of D relative to the average corpus length in Q affects the score; |D| denotes the length of corpus D; and avgdl denotes the average length of all corpora in Q. TF and IDF are calculated as follows:
$$\mathrm{TF}_i = \frac{n_i}{\sum_k n_k}$$
wherein n_i denotes the number of times the keyword appears in the corpus, and \sum_k n_k denotes the total number of occurrences of all keywords in D;
$$\mathrm{IDF}_i = \log \frac{N}{1 + |\{ j : t_i \in d_j \}|}$$
wherein N is taken to be the total number of corpora in Q, and |{j : t_i ∈ d_j}| denotes the number of corpora in Q in which the keyword appears; 1 is added so that the denominator is never zero.
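The BM25 scoring step can be sketched as below. The sketch uses the normalized TF and the smoothed IDF described in the text; the defaults k1 = 1.5 and b = 0.75 are commonly used values and are not given in the embodiment, and the tokenized vocabulary entries are made up. It assumes a non-empty list of non-empty entries.

```python
import math
from collections import Counter


def bm25_scores(query_terms, corpora, k1=1.5, b=0.75):
    """Score every tokenized entry in `corpora` against `query_terms` with BM25."""
    n_docs = len(corpora)
    avgdl = sum(len(d) for d in corpora) / n_docs
    # Number of entries containing each term (document frequency).
    df = Counter()
    for d in corpora:
        df.update(set(d))
    scores = []
    for d in corpora:
        counts = Counter(d)
        length_norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for q in query_terms:
            tf = counts[q] / len(d)                # TF_i = n_i / sum_k n_k
            idf = math.log(n_docs / (1 + df[q]))   # IDF_i = log(N / (1 + df))
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


# Hypothetical matching-vocabulary entries, already tokenized.
vocabulary = [
    ["terahertz", "communication"],
    ["edge", "computing"],
    ["network", "slicing"],
    ["quantum", "communication"],
]
topic = ["terahertz", "technology"]
scores = bm25_scores(topic, vocabulary)
best_index = max(range(len(vocabulary)), key=scores.__getitem__)
```

Note that with this IDF form a term appearing in nearly every entry gets a zero or negative weight, so very common terms contribute nothing or count against a match.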
S34, calculating the relevance among the subject terms using the hierarchical topic detection algorithm to establish a topic hierarchy.
Specifically, on the basis of the first-layer association, the first-level topics obtained are themselves used as keywords and associated again; repeating this process yields the hierarchical structure of the knowledge tree in the second-layer association, i.e. the topic hierarchy.
S35, calculating the relevance between the subject terms in the topic hierarchy and the papers to find the paper set corresponding to each subject term.
Specifically, after the topics and their hierarchy are obtained, the topics are matched for similarity against the corpus of papers, so that the set of papers related to each topic can be recommended. In one particular embodiment, the topics and the corpus of papers may be similarity-matched using an elastic search method.
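To make the topic-to-paper matching concrete, the sketch below uses plain Jaccard overlap between token sets as a stand-in similarity. This is a deliberately simple substitute, not the elastic search method mentioned above; the function names, threshold, and sample papers are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def papers_for_topic(topic_terms, papers, min_score=0.1):
    """Return paper IDs whose similarity to the topic is at least min_score, best first."""
    scored = [(pid, jaccard(topic_terms, tokens)) for pid, tokens in papers.items()]
    hits = [(pid, s) for pid, s in scored if s >= min_score]
    return [pid for pid, _ in sorted(hits, key=lambda item: item[1], reverse=True)]


papers = {
    "p1": ["vehicle", "network", "v2v"],
    "p2": ["satellite", "link"],
    "p3": ["internet", "of", "vehicles", "v2v", "network", "security"],
}
matches = papers_for_topic(["v2v", "network"], papers)
```

A production system would more likely rely on a search engine's relevance ranking, but the overall shape (score every paper against the topic, filter, sort) stays the same.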
S36, forming a 6G knowledge tree from the topics and their corresponding keyword sets, the topic hierarchy, and the paper sets corresponding to the topics, thereby realizing knowledge extraction and generation.
Referring to FIG. 6, FIG. 6 is a visualization of the 6G knowledge tree according to an embodiment of the present invention. In FIG. 6, the 6G knowledge tree contains 6 levels and a total of 1453 topics, with different colors representing nodes at different levels.
Overall, the 6G knowledge tree covers all aspects of the entire 6G field, and in particular corresponds to the ten typical 6G scenarios and ten key technologies. Specifically, as shown in the enlarged part of FIG. 6, the Internet of Things node includes the industrial Internet of Things and the Internet of Vehicles; the industrial Internet of Things involves sensor fusion technology, equipment detection, data security, and the like, while the Internet of Vehicles covers sub-fields such as Vehicle-to-Vehicle (V2V), Vehicle-to-Cloud (V2C), and Vehicle-to-Infrastructure (V2I). The space-air-ground-sea node includes space-based, air-based, and ground-based networks: the space-based network includes various kinds of satellite communication; the air-based network includes high-altitude and near-earth space and relates to UAV communication and related technologies; and the ground-based network includes cellular networks, Wi-Fi, Device-to-Device (D2D), and the like. The 6G knowledge tree thus provides an overview of the whole field of 6G research while maintaining the accuracy and reliability of the knowledge. In addition, visiting a node provides recommendations of related papers, which facilitates knowledge searching in a specific field.
This embodiment generates the 6G knowledge tree from massive academic data using a hierarchical topic detection algorithm, realizes a three-layer association among keywords, topics, and papers, and refines the knowledge structure of the whole 6G field.
S4, during the statistical analysis and the knowledge extraction and generation, labeling the metadata in the 6G knowledge base with knowledge labels; the three kernel layers of statistical analysis, knowledge extraction and generation, and knowledge labeling form the 6G knowledge system.
In addition to metadata-oriented statistical analysis and knowledge extraction, this embodiment also carries out rule-based knowledge labeling. Knowledge labeling refers to the structured labeling of text or other forms of data in order to identify important information in it, such as entities, relationships, and events. It is typically based on predefined categories and rules, so that computer programs can better understand and use the data. This embodiment mainly labels, in a targeted manner, important attributes of each article in the 6G knowledge base, such as typical scenarios, enabling technologies, and key KPIs. The labeled data can be applied to a wide range of on-demand knowledge services; at present these mainly involve scenario recognition, technology association, KPI cluster analysis, and the like, and in the future they will address more research and application needs.
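Rule-based labeling of this kind can be pictured with the sketch below. The categories follow the text (scenario, technology, KPI), but the specific labels, trigger phrases, and substring-matching rule are invented for illustration and are not the embodiment's rule set.

```python
# Hypothetical rules: category -> label -> trigger phrases (all lower case).
RULES = {
    "technology": {
        "terahertz": ["terahertz", "thz"],
        "massive MIMO": ["massive mimo"],
    },
    "scenario": {
        "smart city": ["smart city"],
    },
    "kpi": {
        "latency": ["latency"],
    },
}


def label_text(text, rules=RULES):
    """Return {category: [labels]} for every label with a trigger phrase in the text."""
    lowered = text.lower()
    labels = {}
    for category, label_map in rules.items():
        found = [label for label, triggers in label_map.items()
                 if any(trigger in lowered for trigger in triggers)]
        if found:
            labels[category] = found
    return labels


tags = label_text("Low-latency THz backhaul for the smart city")
```

Substring matching is crude (it would also fire inside longer words), so real rules would normally match on token boundaries and be reviewed manually.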
Furthermore, the three kernels of statistical analysis, knowledge extraction and generation, and knowledge labeling together form the 6G knowledge system. The 6G knowledge system takes the 6G knowledge base as its carrier and is the core that enables knowledge growth and application of the knowledge base; it comprises statistical analysis of the whole 6G field, extraction and generation of 6G knowledge, and labeling of specific knowledge, and on this basis realizes on-demand application of knowledge.
The three kernels of the 6G knowledge system also interact with one another: knowledge labeling provides data samples for knowledge generation, which drive the training of the relevant models; knowledge generation provides knowledge labeling with the knowledge dimensions to be labeled; knowledge generation provides statistical analysis with statistically meaningful data dimensions; and the results of statistical analysis can guide the extraction and generation of specific knowledge. Together the three kernels drive the 6G knowledge system to perform knowledge distillation for the specific 6G field, and the output of this distillation is fed back into the 6G knowledge base, so that knowledge circulates in a closed loop. The 6G knowledge base is therefore highly extensible: knowledge is not limited to predefined rules but can be reasonably inferred and discovered within the scope of the field. In addition, AI techniques can be used to present specific knowledge concepts, mine service requirements, recommend decisions and so on, and these results can in turn be fed into the knowledge base as input, realizing knowledge growth and a knowledge closed loop in the true sense. By constructing the 6G knowledge base and knowledge system, full-life-cycle closed-loop control is thus achieved, from the extraction and mining of 6G academic knowledge to its on-demand application.
Referring to Fig. 7, Fig. 7 is a schematic diagram of the ten typical 6G network scenes and ten key technologies provided in an embodiment of the invention. After multiple rounds of rule-based cleaning and manual screening of the knowledge base, this embodiment summarizes and defines ten typical 6G scenes: full-sense immersive communication, three-dimensional amphibious transportation, twin virtual interaction, full-function fully automated green industry, integrated communication-sensing-computing network, smart city and life, full-coverage cross-domain space communication, ubiquitous intelligent on-demand interaction, anti-interference secure trusted network, and disaster-adaptive network. This embodiment also summarizes ten core 6G technologies: holographic communication, terahertz technology, visible light technology, digital twin, intelligent metasurface, knowledge graph, intent-driven networking, massive MIMO, blockchain, and big data. The typical 6G scenes and core technologies are closely related and complementary, and together they depict the blueprint and direction of future 6G development.
S5: demand tasks in the 6G field are performed based on the 6G knowledge base and the 6G knowledge system, realizing 6G knowledge-driven on-demand application.
Specifically, the purpose of building the 6G knowledge system is to drive related knowledge applications and thereby enable global full-scene on-demand services. With the 6G knowledge base and the 6G knowledge system in place, the corresponding scenes, technologies and service requirements can be analyzed automatically for a specific field, realizing full-scene on-demand service in the true sense.
This embodiment first sorts out how the related concepts are connected. "On-demand services" belong to the category of 6G services, such as communication, access, autonomous driving and XR, and aim to provide customized services for 6G users with different subject, environment and demand features. "Knowledge driven" refers to a new technique intended to make on-demand services more accurate, efficient and low-cost. "Knowledge applications" are the programmed applications that the 6G knowledge system provides for knowledge-driven on-demand services, which are realized by invoking specific applications or methods. Enabling knowledge-driven 6G on-demand services means that, for known or unknown and standardized or customized service requirements, the corresponding data schemas and entity attributes are extracted and, on that basis, knowledge analysis, generation, reasoning, recommendation and the like are carried out to meet those requirements, with knowledge service management and control over the full life cycle. Scene cognition illustrates how "knowledge-driven on-demand services" are implemented: first, the elements of the 6G full scene are decomposed to construct an ontology structure comprising environment, subject, resource and service; second, a knowledge graph is used to characterize and recognize the relations between different entities and instances, on which basis a multi-dimensional, multi-granularity fast resource perception scheme is formed; finally, accurate demand identification and strategy generation in different scenes are realized according to the perception and transfer of network intent.
To further illustrate the on-demand capability of the 6G knowledge system, this embodiment summarizes three main connotations of "on demand". The first is relevance to the demand: the knowledge itself must help in analyzing and understanding the demand, so that the services provided actually reflect it. The second is ambiguity of the demand: a demand has no fixed granularity or features, and knowledge can give it a certain space of choices. The third is scalability of the demand: knowledge services must be able to handle new or unknown demands.
To fuse knowledge with on-demand services, the 6G knowledge base and knowledge system constructed in this embodiment provide rich knowledge applications. Specifically, the 6G knowledge base contains abundant corpus data, and by training on these texts a number of demand tasks in the 6G field, such as scene recognition, technology association and KPI clustering, can be performed. That is, a target model for a demand task is trained using the knowledge extraction and generation of the 6G knowledge system together with the 6G corpus data in the 6G knowledge base, and the trained target model is then used to perform the demand task, realizing 6G knowledge-driven on-demand application. 6G knowledge-driven on-demand applications include, but are not limited to: scene recognition, technology association, KPI clustering, knowledge graph generation, knowledge completion and reasoning, and on-demand knowledge recommendation.
This embodiment takes 6G hot-spot recommendation based on text generation as an example and uses tools such as deep learning and neural networks to realize a knowledge application oriented to 6G on-demand services.
Specifically, conventional text generation approaches include generation based on a language model and generation based on deep learning methods, and the 6G knowledge system integrates and applies both. For the former, the 6G knowledge system trains a 6G-BERT language model on the 6G literature corpus; Fig. 8 is a schematic diagram of the 6G-BERT model architecture provided by the embodiment of the invention. As the knowledge and corpus continue to expand, 6G-BERT will gradually become more complete and can in future be applied to a large number of downstream language-model-based applications. For the latter, this embodiment trains an LSTM model, which takes into account the sequential order and contextual dependence of natural language, to generate text sequences automatically.
The text generation task is essentially a multi-class classification problem; for character-level text generation, the classes to be distinguished are the character classes in the text. The 6G academic corpus is cut into segments of 100 characters, yielding 2022120 samples, which are divided into a training set and a validation set at a ratio of 4:1 for model training. The corresponding parameter settings are shown in Table 2:
Table 2. Related parameter settings
[Table provided as images in the original publication: Figure BDA0004158087870000131, Figure BDA0004158087870000141]
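The sample construction described above (cutting the corpus into 100-character segments and splitting them 4:1) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the windows are taken as non-overlapping, the split is a seeded random shuffle, and the names `make_samples` and `split_train_val` are hypothetical; the source does not specify these details.

```python
import random


def make_samples(corpus, window=100):
    """Cut the corpus into consecutive, non-overlapping `window`-character samples.

    A trailing remainder shorter than `window` is dropped.
    """
    return [corpus[i:i + window] for i in range(0, len(corpus) - window + 1, window)]


def split_train_val(samples, ratio=(4, 1), seed=0):
    """Shuffle with a fixed seed, then split into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * ratio[0] // sum(ratio)
    return shuffled[:cut], shuffled[cut:]


# Toy corpus standing in for the 6G academic corpus.
corpus = "6g networks will integrate sensing, communication and computing. " * 40
samples = make_samples(corpus)
train, val = split_train_val(samples)

# Character-level classes: one class per distinct character in the corpus.
char_to_id = {ch: i for i, ch in enumerate(sorted(set(corpus)))}
```

The size of `char_to_id` is the number of classes the character-level model has to distinguish, which is why the model output has one probability per character.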
In the prediction stage, the test text "future six-generation" is input into the model, and the output passes through a Softmax layer to give a 148-dimensional probability vector. Top-k sampling is then applied: the five highest-probability candidates are retained, one of them is sampled according to its probability, and the corresponding character is output. Part of the generated result is shown in Fig. 9, which is a schematic diagram of 6G hot-spot recommendation based on text generation according to an embodiment of the invention.
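A minimal Python sketch of this prediction step, softmax over the model outputs followed by Top-k sampling with k = 5, is given below. The logits are random placeholders rather than outputs of the trained model, and the function names are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_sample(probs, k=5, rng=random):
    """Keep the k most probable classes and sample one in proportion to its probability."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top]  # random.choices does not need normalized weights
    return rng.choices(top, weights=weights, k=1)[0]


# Placeholder logits for a 148-class character vocabulary.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(148)]
probs = softmax(logits)
next_char_id = top_k_sample(probs, k=5, rng=rng)
```

Restricting sampling to the top five candidates trades some diversity for fewer implausible characters; with k = 1 the procedure reduces to greedy decoding.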
Overall, the 6G knowledge system and the knowledge-base model generate a large amount of information related to the 6G field, covering 6G scenes, technologies and their attributes and features, such as artificial intelligence, network security, cellular networks, access networks, mobile edge computing, antenna technology, data transmission technology, non-orthogonal techniques, latency, capacity and energy efficiency, so that the hot-spot recommendations are consistent with and relevant to the input text. Building on this application, introducing further artificial intelligence theory and techniques can improve the fluency and correctness of the generated text and further improve the capability of on-demand knowledge service. Qualitatively, the 6G hot-spot recommendation application fits the three connotations of on-demand service: the recommendation results are highly related to the requested content and span multiple dimensions such as scenes, technologies, features and attributes, which satisfies the relevance and ambiguity of the demand. Because the input can be any characters or sentences, the application can also handle emerging or unknown demands, which satisfies the scalability of the demand. In addition, the 6G knowledge base and knowledge system provide other types of on-demand services and knowledge applications and, with further demand analysis and granularity division, are expected to support intelligent scene linkage and cross-domain services in the future.
The 6G knowledge base and knowledge system constructed in this embodiment realize on-demand application of knowledge on the basis of extracting and summarizing knowledge across the whole 6G field. Analysis of current 6G academic knowledge helps guide the strategic layout and future development of the 6G field. The introduction of 6G knowledge also enables three-dimensional perception, decision inference and dynamic adjustment for service demands and their management and control, for example network management, control and optimization enabled by knowledge accumulated over time in the network and communication field. The proposed 6G knowledge base and knowledge system form the first knowledge cluster constructed for the whole 6G field, which is significant for giving an overview of 6G and for enabling full-scene on-demand service.
In summary, this embodiment constructs a 6G knowledge base and knowledge system aimed at realizing the 6G vision of "knowledge driven" and "on-demand service". The 6G knowledge base stores, in structured form, all 6G academic literature to date and extends the knowledge dimensions on the basis of the initial fields. The 6G knowledge system takes the 6G knowledge base as its carrier and is the core that enables knowledge growth and application of the knowledge base; it comprises statistical analysis of the whole 6G field, extraction and generation of 6G knowledge, and labeling of specific knowledge, and on this basis realizes on-demand application of knowledge. The 6G knowledge base and knowledge system are extensible across multiple dimensions and fields, support knowledge generation that combines top-down and bottom-up approaches, and can realize an efficient knowledge closed loop and knowledge-driven on-demand service. In the future, on the basis of fine-grained demand sensing and knowledge expansion, a knowledge platform for 6G on-demand service can be created to provide more intelligent on-demand knowledge services.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (10)

1. A construction method of a 6G knowledge system for global full-scene on-demand service, characterized by comprising the following steps:
S1, combining expert knowledge and natural language processing technology, constructing a 6G knowledge base that combines top-down and bottom-up approaches based on 6G academic literature;
S2, performing statistical analysis on the literature metadata in the 6G knowledge base to predict 6G academic development;
S3, processing and training on the 6G corpus data in the 6G knowledge base using natural language processing technology and deep learning methods to realize knowledge extraction and generation;
S4, during the statistical analysis and the knowledge extraction and generation, performing knowledge labeling on the metadata in the 6G knowledge base, wherein the three kernels of statistical analysis, knowledge extraction and generation, and knowledge labeling form the 6G knowledge system;
S5, performing demand tasks in the 6G field based on the 6G knowledge base and the 6G knowledge system to realize 6G knowledge-driven on-demand application.
2. The method for building a 6G knowledge system for global full-scene on-demand services according to claim 1, wherein step S1 comprises:
combining expert knowledge and natural language processing technology, extracting ontology and schema information from the 6G academic literature by means of structured data sources, and adding the ontology and schema information to the 6G knowledge base, thereby realizing top-down construction of the 6G knowledge base; and at the same time, obtaining target data schemas from the 6G academic literature by labeling and induction, selecting the information with higher confidence in the target data schemas, and recording it in the 6G knowledge base, thereby realizing bottom-up construction of the 6G knowledge base.
3. The method for building a 6G knowledge system for global full-scene on-demand services according to claim 1, wherein the 6G knowledge base comprises metadata fields and extended attributes, wherein
the metadata fields comprise the ID, title, abstract, field, publication year and DOI of the 6G academic literature; the extended attributes comprise the number of articles and the article attribute category.
4. The method for building a 6G knowledge system for global full-scene on-demand services according to claim 1, wherein step S2 comprises:
performing statistical analysis on the paper distribution, hot-spot fields and hot-word distribution in the 6G academic literature to predict 6G academic development.
5. The method for building a 6G knowledge system for global full-scene on-demand services according to claim 1, wherein step S3 comprises:
S31, extracting a keyword set from the 6G knowledge base, and constructing a matching vocabulary for the 6G field using expert knowledge;
S32, calculating the relevance between keywords and topics using a hierarchical topic detection algorithm, and finding the topics and their corresponding keyword sets;
S33, performing fuzzy matching on the topics using the matching vocabulary to obtain topic words;
S34, calculating the relevance among the topic words using the hierarchical topic detection algorithm, and establishing a topic hierarchy;
S35, calculating the relevance between the topic words in the topic hierarchy and the papers, and finding the paper set corresponding to each topic word;
S36, forming the 6G knowledge tree from the topics and their corresponding keyword sets, the topic hierarchy, and the paper sets corresponding to the topics, thereby realizing knowledge extraction and generation.
6. The method for building a 6G knowledge system for global full scene on-demand services as claimed in claim 5, wherein step S32 comprises:
calculating the co-occurrence frequency of the keywords in the text corpus of the 6G knowledge base to capture related words, obtaining topics and the keyword set corresponding to each topic;
ranking the relevance of the keywords by mutual information, and selecting the five keywords with the highest relevance in each topic to name the topic, wherein the mutual information is calculated as:
$$I(X;Y)=\sum_{x\in X}\sum_{y\in Y}p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$$
wherein $I(X;Y)$ represents the mutual information between topic X and topic Y, $p(x,y)$ represents the joint probability density function of topics X and Y, $p(x)$ represents the marginal probability density function of topic X, and $p(y)$ represents the marginal probability density function of topic Y.
7. The method for building a 6G knowledge system for global full scene on-demand services as claimed in claim 5, wherein step S33 comprises:
scoring the similarity of the topics using the BM25 algorithm and selecting the topic with the highest score as the target topic word, wherein the similarity score is calculated as:
$$\mathrm{Score}(Q,D)=\sum_{i}\mathrm{IDF}(q_i)\cdot\frac{f(q_i,D)\,(k_1+1)}{f(q_i,D)+k_1\left(1-b+b\,\frac{|D|}{\mathrm{avgdl}}\right)}$$
wherein Q represents the corpus set, D represents one corpus in Q, $\mathrm{IDF}(q_i)$ represents the IDF value of keyword $q_i$ in Q, $f(q_i,D)$ represents the TF value of keyword $q_i$ in corpus D, $k_1$ represents the term-frequency saturation, b represents the field-length normalization, i.e. the degree to which the length of D relative to the corpora in Q is taken into account, $|D|$ represents the length of corpus D, and avgdl represents the average length of all corpora in Q; IDF and TF are calculated as follows:
$$\mathrm{TF}_i=\frac{n_i}{\sum_{k}n_k}$$
wherein $n_i$ represents the number of times the keyword appears in the corpus D, and $\sum_{k}n_k$ represents the total number of occurrences of all keywords in D;
$$\mathrm{IDF}_i=\log\frac{|Q|}{1+\left|\{j: t_i\in d_j\}\right|}$$
wherein $|Q|$ represents the total number of corpora in Q, and $\left|\{j: t_i\in d_j\}\right|$ represents the number of corpora in Q in which the keyword appears, to which 1 is added in the denominator.
8. The method for building a 6G knowledge system for global full-scene on-demand services according to claim 1, wherein step S4 comprises:
during the statistical analysis and the knowledge extraction and generation, labeling the scenes, technologies and indicators of each article in the 6G knowledge base in a targeted manner.
9. The method for building a 6G knowledge system for global full-scene on-demand services according to claim 1, wherein step S5 comprises:
training a target model for the demand task using the knowledge extraction and generation in the 6G knowledge system and the 6G corpus data in the 6G knowledge base, and performing the demand task in the 6G field using the trained target model, thereby realizing 6G knowledge-driven on-demand application.
10. The method for building a 6G knowledge system for global full-scene on-demand services according to claim 1, wherein the on-demand application involves relevance of demands, ambiguity of demands, and scalability of demands.
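For illustration only (this sketch is not part of the claims), the scoring steps recited in claims 6 and 7 can be written in Python as below. It assumes the standard definition of mutual information, I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x) p(y))], estimated from observed pairs, and an Okapi BM25 score that uses relative term frequency for TF and log(N / (1 + df)) for IDF, following the definitions in claim 7. The defaults k1 = 1.5 and b = 0.75 are common choices rather than values from the source, the function names are hypothetical, and the toy corpus is invented.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Estimate I(X;Y) in nats from a list of observed (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def bm25_score(query, doc, corpus, k1=1.5, b=0.75):
    """BM25 score of a tokenized `doc` for `query` terms within `corpus`.

    TF is the relative frequency n_i / sum_k n_k; IDF is log(N / (1 + df)).
    """
    if not doc:
        return 0.0
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)   # corpora containing the term
        idf = math.log(n_docs / (1 + df))
        tf = doc.count(term) / len(doc)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score


# Invented toy corpus of tokenized texts.
corpus = [
    ["terahertz", "channel", "model"],
    ["blockchain", "security"],
    ["vehicle", "network", "v2v"],
    ["satellite", "network"],
]
scores = [bm25_score(["terahertz"], d, corpus) for d in corpus]
best = max(range(len(corpus)), key=lambda i: scores[i])
```

Note that with this IDF variant a term occurring in nearly every corpus receives a zero or negative weight, so very common terms do not raise the score.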

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310341182.8A CN116431825A (en) 2023-03-31 2023-03-31 Construction method of 6G knowledge system for global full-scene on-demand service

Publications (1)

Publication Number Publication Date
CN116431825A true CN116431825A (en) 2023-07-14

Family

ID=87079077

Country Status (1)

Country Link
CN (1) CN116431825A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117236409A (en) * 2023-11-16 2023-12-15 中电科大数据研究院有限公司 Small model training method, device and system based on large model and storage medium
CN117236409B (en) * 2023-11-16 2024-02-27 中电科大数据研究院有限公司 Small model training method, device and system based on large model and storage medium
CN117573956A (en) * 2024-01-16 2024-02-20 中国电信股份有限公司深圳分公司 Metadata management method, device, equipment and storage medium
CN117573956B (en) * 2024-01-16 2024-05-07 中国电信股份有限公司深圳分公司 Metadata management method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN116431825A (en) Construction method of 6G knowledge system for global full-scene on-demand service
Jain et al. Ontology based information retrieval in semantic web: A survey
US10007679B2 (en) Enhanced max margin learning on multimodal data mining in a multimedia database
US7765176B2 (en) Knowledge discovery system with user interactive analysis view for analyzing and generating relationships
CN105653691B (en) Management of information resources method and managing device
Amato et al. Recommendation in social media networks
CN114218400A (en) Semantic-based data lake query system and method
CN101834837A (en) On-line landscape video active information service system of scenic spots in tourist attraction based on bandwidth network
CN111582587B (en) Prediction method and prediction system for video public sentiment
Ye et al. A web services classification method based on GCN
Wang et al. A novel blockchain oracle implementation scheme based on application specific knowledge engines
CN113792786A (en) Automatic commodity object classification method and device, equipment, medium and product thereof
Obaid et al. Semantic web and web page clustering algorithms: a landscape view
Gourru et al. Document network projection in pretrained word embedding space
Kalmukov et al. Design and development of an automated web crawler used for building image databases
CN112069306B (en) Paper partner recommendation method based on author writing tree and graph neural network
CN116738068A (en) Trending topic mining method, device, storage medium and equipment
Xu et al. Video structural description: a semantic based model for representing and organizing video surveillance big data
Fahad et al. Towards Classification of Web Ontologies for the Emerging Semantic Web.
CN112733021A (en) Knowledge and interest personalized tracing system for internet users
Hu et al. Video content classification using time-sync comments and titles
Amato et al. Semantic summarization of news from heterogeneous sources
Lakshmi et al. Search for social smart objects constituting sensor ontology, social iot and social network interaction
Zhang et al. A deep recommendation framework for completely new users in mashup creation
CN113505600B (en) Distributed indexing method of industrial chain based on semantic concept space

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination