CN116431825A - Construction method of 6G knowledge system for global full-scene on-demand service - Google Patents
- Publication number
- CN116431825A (application number CN202310341182.8A)
- Authority
- CN
- China
- Prior art keywords
- knowledge
- demand
- knowledge base
- topics
- scene
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
Abstract
The invention relates to a method for constructing a 6G knowledge system for global full-scene on-demand service, comprising the following steps: constructing, from 6G academic literature, a 6G knowledge base that combines top-down and bottom-up construction; performing statistical analysis on the literature metadata in the 6G knowledge base to predict 6G academic development; processing and training on the 6G corpus data in the 6G knowledge base using natural language processing technology and deep learning methods to realize knowledge extraction and generation; labeling the metadata in the 6G knowledge base with knowledge during the statistical analysis and the knowledge extraction and generation, wherein statistical analysis, knowledge extraction and generation, and knowledge labeling form the three-layer kernel of the 6G knowledge system; and carrying out demand tasks in the 6G field based on the 6G knowledge base and the 6G knowledge system to realize 6G knowledge-driven on-demand application. The method constructs a 6G knowledge base and a knowledge system, realizes on-demand application of knowledge on that basis, and enables an efficient knowledge closed loop and knowledge-driven on-demand service.
Description
Technical Field
The invention belongs to the technical field of communication, and particularly relates to a construction method of a 6G knowledge system for global full-scene on-demand service.
Background
In recent years, research on 6G has grown rapidly, but 6G-related concepts are not yet unified, and a consistent understanding and definition is urgently needed. Meanwhile, academia and industry lack an overall picture of 6G development, and researchers have difficulty obtaining a clear view of research progress in the related fields. In addition, 6G aims at a paradigm shift from the interconnection of everything to the intelligent connection of everything, and the introduction of knowledge and the realization of intelligence are important characteristics that distinguish 6G from 5G.
The current development of the 6G academic field faces two problems. First, because research began only recently, exploration of 6G-related fields lacks overall system construction and a clear account of how the field has developed, which limits in-depth research on 6G theory and technology. Second, simply applying artificial intelligence algorithms in existing communication systems cannot realize the service vision of "services as desired, networks that change on demand, and resources shared at will"; deep fusion of mobile communication technology and artificial intelligence technology in a novel architecture with embedded knowledge is urgently needed.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a method for constructing a 6G knowledge system for global full-scene on-demand service. The technical problem addressed by the invention is solved by the following technical solution:
An embodiment of the invention provides a method for constructing a 6G knowledge system for global full-scene on-demand service, which comprises the following steps:
S1, combining expert knowledge and natural language processing technology to construct, from 6G academic literature, a 6G knowledge base that combines top-down and bottom-up construction;
S2, performing statistical analysis on the literature metadata in the 6G knowledge base to predict 6G academic development;
S3, processing and training on the 6G corpus data in the 6G knowledge base using natural language processing technology and deep learning methods to realize knowledge extraction and generation;
S4, labeling the metadata in the 6G knowledge base with knowledge during the statistical analysis and the knowledge extraction and generation; wherein statistical analysis, knowledge extraction and generation, and knowledge labeling form the three-layer kernel of the 6G knowledge system;
S5, carrying out demand tasks in the 6G field based on the 6G knowledge base and the 6G knowledge system to realize 6G knowledge-driven on-demand application.
In one embodiment of the present invention, step S1 includes:
combining expert knowledge and natural language processing technology, extracting ontology and schema information from the 6G academic literature by means of a structured data source, and adding the ontology and schema information to the 6G knowledge base, thereby realizing top-down construction of the 6G knowledge base; and, at the same time, obtaining a target data schema from the 6G academic literature by labeling and induction, selecting the information with higher confidence in the target data schema, and recording that information in the 6G knowledge base, thereby realizing bottom-up construction of the 6G knowledge base.
In one embodiment of the invention, the 6G knowledge base includes metadata fields and extended attributes, wherein,
the metadata fields comprise the ID, title, abstract, field, publication year, and DOI of each 6G academic document; the extended attributes comprise the number of articles and the article attribute categories.
In one embodiment of the present invention, step S2 includes:
performing statistical analysis on the distribution of papers, the hotspot fields, and the distribution of hot words in the 6G academic literature to predict 6G academic development.
In one embodiment of the present invention, step S3 includes:
S31, extracting a keyword set from the 6G knowledge base, and constructing a matching word list for the 6G field using expert knowledge;
S32, calculating the relevance between keywords and topics using a hierarchical topic detection algorithm, and finding the topics and their corresponding keyword sets;
S33, performing fuzzy matching on the topics using the matching word list to obtain subject terms;
S34, calculating the relevance among the subject terms using the hierarchical topic detection algorithm, and establishing a topic hierarchy;
S35, calculating the relevance between the subject terms in the topic hierarchy and the papers, and finding the paper set corresponding to each subject term;
S36, forming a 6G knowledge tree from the topics and their corresponding keyword sets, the topic hierarchy, and the paper sets corresponding to the subject terms, thereby realizing knowledge extraction and generation.
In one embodiment of the present invention, step S32 includes:
calculating the co-occurrence frequency of keywords in the text corpus of the 6G knowledge base to capture related words, and obtaining the topics and the keyword set corresponding to each topic;
ranking the relevance of the keywords by mutual information, and selecting the five most relevant keywords under a topic to name that topic, wherein the mutual information is given by:
$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} P(x, y) \log \frac{P(x, y)}{P(x)\,P(y)}$$
wherein X and Y denote two different topics, P(x, y) denotes the joint probability function of topics X and Y, P(x) denotes the marginal probability function of topic X, and P(y) denotes the marginal probability function of topic Y.
In one embodiment of the present invention, step S33 includes:
scoring the similarity of the topics using the BM25 algorithm, and selecting the topic with the highest score as the target subject term, wherein the similarity score is given by:
$$\mathrm{Score}(Q, D) = \sum_{i} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}$$
wherein Q denotes the corpus set, D denotes one corpus in Q, IDF(q_i) denotes the IDF value of keyword q_i in Q, f(q_i, D) denotes the TF value of keyword q_i in corpus D, k_1 denotes the term frequency saturation, b denotes the field-length normalization, which governs how strongly the ratio of the length of D to the average corpus length in Q affects the score, |D| denotes the corpus length, and avgdl denotes the average length of all corpora in Q; TF and IDF are calculated as follows:
$$f(q_i, D) = \frac{n_i}{\sum_{k} n_k}$$
wherein n_i denotes the number of times the keyword appears in the corpus D, and \sum_k n_k denotes the total number of occurrences of all keywords in D;
$$\mathrm{IDF}(q_i) = \log \frac{|Q|}{1 + |\{ j : t_i \in d_j \}|}$$
wherein |Q| denotes the total number of corpora in Q, t_i denotes the keyword q_i, d_j denotes the j-th corpus in Q, and |\{ j : t_i \in d_j \}| denotes the number of corpora in Q in which the keyword appears; 1 is added so that the denominator is never zero.
In one embodiment of the present invention, step S4 includes:
labeling, in a targeted manner, the scenes, technologies, and indicators of each article in the 6G knowledge base during the statistical analysis and the knowledge extraction and generation.
In one embodiment of the present invention, step S5 includes:
training a target model for the demand task using the results of knowledge extraction and generation in the 6G knowledge system and the 6G corpus data in the 6G knowledge base, and carrying out the demand task in the 6G field with the trained target model, thereby realizing 6G knowledge-driven on-demand application.
In one embodiment of the invention, the on-demand applications include relevance of demand, ambiguity of demand, and scalability of demand.
Compared with the prior art, the invention has the beneficial effects that:
The invention constructs a 6G knowledge base and a 6G knowledge system. The 6G knowledge base stores all 6G academic literature to date in structured form and extends the knowledge dimensions beyond the initial fields. The 6G knowledge system takes the 6G knowledge base as its carrier and is the kernel that enables knowledge growth and application of the 6G knowledge base; it comprises statistical analysis of the entire 6G field, extraction and generation of 6G knowledge, and labeling of specific knowledge, and realizes on-demand application of knowledge on that basis. The 6G knowledge base and the 6G knowledge system form the first knowledge cluster constructed for the entire 6G field, are extensible across multiple dimensions and fields, support knowledge generation that combines top-down and bottom-up approaches, and enable an efficient knowledge closed loop and knowledge-driven on-demand service.
Drawings
FIG. 1 is a flow diagram of a method for constructing a 6G knowledge system for global full-scene on-demand service according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the 6G knowledge base and knowledge system construction process according to an embodiment of the present invention;
FIG. 3 is a statistical chart of the 5G and 6G literature distribution provided by an embodiment of the invention;
FIG. 4 is a statistical chart of the 6G Top-20 hot-word distribution provided by an embodiment of the invention;
FIG. 5 is a flow chart of 6G knowledge tree construction provided by an embodiment of the invention;
FIG. 6 is a visualization of the 6G knowledge tree provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of the ten typical 6G network scenarios and ten key technologies provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram of the 6G-BERT model architecture according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of text-generation-based 6G hotspot recommendation provided in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but embodiments of the present invention are not limited thereto.
Example 1
In the face of rapidly developing 6G technology, intelligent analysis of the overall landscape of 6G research and of the development context of specific technologies has become a common need of many researchers and engineers. To mine 6G knowledge and embed native intelligence, this embodiment constructs a 6G knowledge base and knowledge system. The constructed 6G knowledge base and knowledge system not only cover the storage and mining of data intelligence, but also aim to create an intelligent management and control platform for the full life cycle of academic knowledge. Constructing them helps in understanding the academic and industrial layout of 6G and its future development hotspots, and on-demand knowledge application can be realized through deep knowledge mining.
Referring to FIG. 1 and FIG. 2, FIG. 1 is a flow diagram of a method for constructing a 6G knowledge system for global full-scene on-demand service according to an embodiment of the present invention, and FIG. 2 is a schematic diagram of the 6G knowledge base and knowledge system construction process according to an embodiment of the present invention. The construction method comprises the following steps:
S1, combining expert knowledge and natural language processing technology to construct, from 6G academic literature, a 6G knowledge base that combines top-down and bottom-up construction.
The 6G knowledge base is built on structured 6G academic data and combines expert knowledge with natural language processing technology to realize knowledge system construction that is both top-down and bottom-up. Top-down construction means extracting ontology and schema information from a structured data source and adding it to the knowledge base; bottom-up construction means obtaining the required data schema by labeling, induction, and similar methods, selecting the information with higher confidence, and adding it to the knowledge base.
In this embodiment, the 6G knowledge base stores all 6G academic literature to date in structured form and extends the knowledge dimensions beyond the initial fields. Specifically, a total of 1754 academic documents related to 6G were screened out in this example, and these documents belong to 633 different fields. The screened 6G literature is first preprocessed into normalized text data, from which a 6G-oriented knowledge corpus is constructed. Currently, the 6G knowledge base contains the initial data fields (i.e., metadata fields) of the 1754 articles: ID, title, abstract, field, publication year, and DOI. The knowledge base also supports both longitudinal expansion (number of articles) and lateral expansion (article attribute categories). Several attribute dimensions, including scenes, technologies, and KPIs, have been added so far, and on-demand knowledge expansion is supported.
In summary, the 6G knowledge base includes metadata fields and extended attributes: the metadata fields comprise the ID, title, abstract, field, publication year, and DOI of each 6G academic document; the extended attributes comprise the number of articles and the article attribute categories.
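As an illustration, a record carrying the metadata fields and extended attributes listed above might be modeled as follows. This is a minimal Python sketch, not the filing's implementation: the class name `PaperRecord`, the field names, the `annotate` helper, and all sample values (including the placeholder DOI) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PaperRecord:
    """One knowledge base entry: metadata fields plus extended attributes."""
    paper_id: str
    title: str
    abstract: str
    field_name: str   # research field of the paper
    year: int         # publication year
    doi: str
    # Lateral expansion: attribute categories such as scene, technology, KPI.
    attributes: Dict[str, List[str]] = field(default_factory=dict)

    def annotate(self, category: str, label: str) -> None:
        """Attach a label under an attribute category, skipping duplicates."""
        labels = self.attributes.setdefault(category, [])
        if label not in labels:
            labels.append(label)


# Longitudinal expansion is simply appending more records to the store.
knowledge_base: List[PaperRecord] = []

# Hypothetical sample record; the values are placeholders, not real data.
record = PaperRecord("p-0001", "Example 6G paper", "Example abstract.",
                     "wireless communication", 2021, "10.0000/example")
record.annotate("technology", "terahertz")
record.annotate("technology", "terahertz")  # duplicate label is ignored
knowledge_base.append(record)
```

Keeping the extended attributes in an open dictionary rather than fixed fields is one way to let new attribute categories be added without changing the record layout.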
S2, performing statistical analysis on the literature metadata in the 6G knowledge base to predict 6G academic development.
Specifically, statistical analysis can be performed on the distribution of papers over time, the hotspot fields, and the distribution of hot words in the 6G academic literature to predict 6G academic development. That is, structured data such as the metadata fields and extended attributes of the 6G literature are analyzed statistically by year, field, and so on, which gives an overall grasp of the current state of 6G academic development and a basis for predicting future trends.
In a specific embodiment, 1754 6G-related documents and 36682 5G-related documents were obtained in total for the knowledge base through combined automatic and manual cleaning and screening. This embodiment computes the year distribution of these documents; see FIG. 3, a statistical chart of the 5G and 6G literature distribution provided by an embodiment of the invention. As FIG. 3 shows, 6G research began to emerge in 2018, and growth accelerated from 2019 to 2021; the number of articles published in 2021 alone reached 1067, the growth momentum is strong, and breakthrough growth is expected to continue over the next few years. However, 6G research is still in its infancy, and future 6G development cannot be inferred from the data of recent years alone. To better understand the future trend of 6G, this embodiment uses the distribution of 5G academic literature as a reference and compares it with the 6G data to analyze and predict the growth trend of 6G research. As can be seen from FIG. 3, 5G research began to emerge around 2010; 2013 to 2018 was a period of very rapid growth; growth slowed after 2018, and the 5G article distribution peaked in 2020 at nearly 8000 articles; the development cycle from emergence to peak was 8 to 10 years. After 2020 a downward trend appears, and the number of published papers is expected to move from a gradual decline to a steep decline over the next 5 years.
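The year-distribution statistics described above come down to counting documents per publication year and then reading off the peak and the year-to-year change. The following Python sketch shows one way this could be done; the function names are hypothetical, and the toy input is made up for illustration and does not reproduce the data of FIG. 3.

```python
from collections import Counter


def yearly_distribution(years):
    """Count documents per publication year, ordered by year."""
    return dict(sorted(Counter(years).items()))


def peak_year(distribution):
    """Return the year with the most documents (first listed on ties)."""
    return max(distribution, key=distribution.get)


def yearly_change(distribution):
    """Difference in document count between consecutive recorded years."""
    years = sorted(distribution)
    return {cur: distribution[cur] - distribution[prev]
            for prev, cur in zip(years, years[1:])}


# Toy publication years, for illustration only.
toy_years = [2018, 2019, 2019, 2020, 2020, 2020, 2021, 2021, 2021, 2021]
dist = yearly_distribution(toy_years)
peak = peak_year(dist)
change = yearly_change(dist)
```

Running the same functions on the 5G and 6G record sets separately would give the two curves that the embodiment compares.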
In a specific embodiment, the high-frequency hot words appearing in the 6G knowledge base are extracted and the Top-20 hot words are analyzed statistically; the result is shown in FIG. 4, a statistical chart of the 6G Top-20 hot-word distribution provided by an embodiment of the invention. As can be seen from FIG. 4, the Top-20 hot words are, in order: edge computing, AI, security, Internet of Things, terahertz, Internet of Vehicles, MIMO, cellular networks, cloud computing, intelligent metasurfaces, virtualization, blockchain, satellite, non-orthogonal multiple access (NOMA), software, beamforming, quantum communication, network slicing, big data, and the like. These hot words correspond clearly to the typical scenarios, enabling technologies, and development directions of 6G. Specifically, they cover future scenarios such as integrated space-air-ground-sea networks, three-dimensional transportation with the Internet of Vehicles, intelligent cloud-edge-end collaboration, and combined distributed and centralized deployment; in-depth exploration of emerging technologies such as terahertz, massive MIMO, and blockchain; and development concepts and new frameworks such as virtualization, software, and on-demand services. The hot-word distribution also gives a full-field overview of current 6G academic development and industrial layout and points to future directions.
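A Top-k hot-word ranking of this kind can be sketched as a frequency count over the keywords of each document. The Python example below is illustrative only: the function name `top_hotwords`, the choice to count each keyword once per document, and the toy input are assumptions, since the filing does not specify how frequency is measured.

```python
from collections import Counter


def top_hotwords(keyword_lists, k=20):
    """Return the k most frequent keywords, counted once per document."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalize case and de-duplicate within one document.
        counts.update({kw.lower() for kw in keywords})
    return counts.most_common(k)


# Toy keyword lists, for illustration only (not the filing's corpus).
toy_docs = [
    ["MIMO", "terahertz"],
    ["terahertz", "AI"],
    ["AI", "Terahertz", "MIMO"],
]
hotwords = top_hotwords(toy_docs, k=3)
```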
S3, processing and training on the 6G corpus data in the 6G knowledge base using natural language processing technology and deep learning methods to realize knowledge extraction and generation.
Specifically, 6G knowledge extraction and generation includes, but is not limited to: organizing the development context of 6G with a hierarchical topic detection algorithm to generate a 6G knowledge tree; training a language model on the obtained 6G literature corpus to obtain a 6G-oriented language model, 6G-BERT, which can be applied to various 6G-related downstream knowledge services in the future; and using a neural network model to realize 6G hotspot recommendation based on text generation, which can be used in the future for fine-grained hotspot recommendation and association in each 6G sub-scenario.
Specifically, referring to FIG. 5, FIG. 5 is a flow chart of 6G knowledge tree construction provided in an embodiment of the present invention. Taking the generation of the 6G knowledge tree as an example, knowledge extraction and generation comprises the following steps:
S31, extracting a keyword set from the 6G knowledge base, and constructing a matching word list for the 6G field using expert knowledge.
Specifically, normalized text data is stored in the 6G knowledge base, and a keyword set containing 10000 keywords in total is extracted from it. In addition, expert knowledge is used for screening and checking to construct a matching word list for the 6G field. The matching word list is used for fuzzy matching against the nodes of the knowledge context and contains 4278 candidate words for topic nodes of the 6G knowledge system.
S32, calculating the relevance between keywords and topics using a hierarchical topic detection algorithm, and finding the topics and their corresponding keyword sets.
Specifically, this step is the first-layer association and mainly comprises capturing related words and determining subject terms.
Related words are captured mainly through the co-occurrence of keywords: the frequency with which keywords occur together in the text corpus of the 6G knowledge base is calculated to capture related words, yielding the topics and their corresponding keyword sets. When the co-occurrence frequency reaches a set threshold, the keywords are regarded as related, i.e., as belonging to the same topic.
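The co-occurrence step can be sketched as pair counting followed by thresholding. In the Python example below, the grouping of thresholded pairs into topics by connected components (via a small union-find) is an assumption made for illustration; the filing only states that keywords whose co-occurrence frequency reaches the threshold belong to one topic. Function names, the threshold value, and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs_keywords):
    """Count, for each keyword pair, the documents in which both occur."""
    pair_counts = Counter()
    for keywords in docs_keywords:
        for a, b in combinations(sorted(set(keywords)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def group_into_topics(pair_counts, threshold):
    """Merge keywords linked by pairs meeting the threshold (assumed rule)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in pair_counts.items():
        if n >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for kw in list(parent):
        groups.setdefault(find(kw), set()).add(kw)
    return sorted(sorted(g) for g in groups.values())


# Toy keyword lists, for illustration only.
toy_docs = [
    ["thz", "mimo"], ["thz", "mimo"],
    ["uav", "satellite"], ["uav", "satellite"],
    ["thz", "uav"],  # co-occurs only once, below the threshold
]
topics = group_into_topics(cooccurrence_counts(toy_docs), threshold=2)
```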
To determine the subject terms, the relevance of the keywords is ranked by the mutual information I(X;Y), and the five most relevant keywords under a given topic are selected to name that topic. The mutual information I(X;Y) is defined as:
$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} P(x, y) \log \frac{P(x, y)}{P(x)\,P(y)}$$
wherein X and Y denote two different topics, P(x, y) denotes the joint probability function of topics X and Y, P(x) denotes the marginal probability function of topic X, and P(y) denotes the marginal probability function of topic Y.
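To make the mutual information I(X;Y) concrete, the sketch below estimates it for two terms from their presence or absence across a set of documents. Treating each term as a binary occurrence variable and estimating probabilities by document counts are assumptions made for this example; the filing does not say how the probabilities are estimated. The function name and toy documents are hypothetical.

```python
import math


def mutual_information(docs, term_x, term_y):
    """Estimate I(X;Y) in bits from binary term occurrence across docs."""
    n = len(docs)
    if n == 0:
        return 0.0
    # Joint counts over (x present?, y present?).
    joint = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for doc in docs:
        joint[(int(term_x in doc), int(term_y in doc))] += 1

    mi = 0.0
    for (a, b), count in joint.items():
        if count == 0:
            continue  # a zero-probability cell contributes nothing
        p_xy = count / n
        p_x = sum(c for (i, _), c in joint.items() if i == a) / n
        p_y = sum(c for (_, j), c in joint.items() if j == b) / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


# Toy documents as keyword sets, for illustration only.
dependent = [{"a", "b"}, {"a", "b"}, {"c"}, {"c"}]   # a and b always co-occur
independent = [{"a", "b"}, {"a"}, {"b"}, set()]      # a and b are unrelated
mi_dep = mutual_information(dependent, "a", "b")
mi_ind = mutual_information(independent, "a", "b")
```

Ranking keywords by such a score and keeping the five highest would then give the five-keyword topic name described above.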
S33, performing fuzzy matching on the topics using the matching word list to obtain subject terms.
Specifically, knowledge should be presented concisely, and a topic named by five words is somewhat redundant and ill-suited to the visual presentation of the knowledge tree. This embodiment therefore uses the constructed 6G matching word list to perform fuzzy matching on the topics and obtain the subject terms.
Further, in this embodiment, the BM25 algorithm is used to score the similarity of the topics against the matching word list, and the topic with the highest score is selected as the target subject term. The similarity score is given by:
$$\mathrm{Score}(Q, D) = \sum_{i} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}$$
wherein Q denotes the corpus set; D denotes one corpus in Q; IDF(q_i) denotes the IDF value of keyword q_i in Q, i.e., the weight of q_i: the rarer the keyword, the higher its importance in Q, so the value decreases as the number of occurrences of the word increases; f(q_i, D) denotes the TF value of keyword q_i in corpus D, i.e., the importance of q_i in D, which increases with the number of occurrences of the word; k_1 denotes the term frequency saturation, which adjusts how quickly saturation is reached; b denotes the field-length normalization, which governs how strongly the ratio of the length of D to the average corpus length in Q affects the score; |D| denotes the corpus length; and avgdl denotes the average length of all corpora in Q. TF and IDF are calculated as follows:
$$f(q_i, D) = \frac{n_i}{\sum_{k} n_k}$$
wherein n_i denotes the number of times the keyword appears in the corpus D, and \sum_k n_k denotes the total number of occurrences of all keywords in D;
$$\mathrm{IDF}(q_i) = \log \frac{|Q|}{1 + |\{ j : t_i \in d_j \}|}$$
wherein |Q| denotes the total number of corpora in Q, t_i denotes the keyword q_i, d_j denotes the j-th corpus in Q, and |\{ j : t_i \in d_j \}| denotes the number of corpora in Q in which the keyword appears; 1 is added so that the denominator is never zero.
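The BM25 matching step can be illustrated with a small scorer that ranks tokenized word-list entries against the keywords that name a topic. This sketch assumes the common forms TF = count / length and IDF = log(N / (1 + df)), and the defaults k1 = 1.5 and b = 0.75 are conventional choices that the filing does not state. The function name, the toy word list, and the toy query are hypothetical, and every candidate is assumed to be non-empty.

```python
import math


def bm25_scores(query_terms, candidates, k1=1.5, b=0.75):
    """Score each tokenized candidate against the query terms with BM25."""
    n = len(candidates)
    avgdl = sum(len(c) for c in candidates) / n  # average candidate length
    scores = []
    for cand in candidates:
        score = 0.0
        for q in query_terms:
            df = sum(1 for c in candidates if q in c)  # candidates containing q
            idf = math.log(n / (1 + df))               # assumed IDF form
            tf = cand.count(q) / len(cand)             # assumed TF form
            norm = k1 * (1 - b + b * len(cand) / avgdl)
            score += idf * tf * (k1 + 1) / (tf + norm)
        scores.append(score)
    return scores


# Toy matching word list (tokenized) and a five-keyword topic name.
word_list = [
    ["internet", "of", "vehicles"],
    ["edge", "computing"],
    ["quantum", "communication"],
    ["satellite", "communication"],
]
topic_keywords = ["internet", "vehicles", "v2v", "traffic", "road"]

scores = bm25_scores(topic_keywords, word_list)
best = max(range(len(scores)), key=scores.__getitem__)
subject_term = " ".join(word_list[best])
```

Note that with this IDF form a term appearing in nearly every candidate gets a weight of zero or below, so very small word lists can produce unintuitive rankings.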
S34, calculating the relevance among the subject terms using the hierarchical topic detection algorithm, and establishing a topic hierarchy.
Specifically, on the basis of the first-layer association, the first-level topics obtained are used as keywords and the association is performed again; repeating this process yields the hierarchical structure of the knowledge tree in the second-layer association, i.e., the topic hierarchy.
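The repeated association can be pictured as a loop that feeds each level's topics back in as the next level's keywords until no further grouping is found. The Python sketch below is only a schematic: `build_hierarchy` and the `associate` callback are hypothetical names, the stopping rule is an assumption, and the toy `associate` uses a hand-written parent table in place of the real relevance computation.

```python
def build_hierarchy(items, associate, max_levels=6):
    """Repeatedly group items; each level's groups feed the next level.

    `associate` maps a list of items to {group_name: [members]}.
    """
    levels = []
    current = list(items)
    for _ in range(max_levels):
        grouping = associate(current)
        # Stop when nothing is grouped or grouping no longer condenses.
        if not grouping or len(grouping) >= len(current):
            break
        levels.append(grouping)
        current = list(grouping)  # group names become the next keywords
    return levels


# Hand-written parent table standing in for the relevance computation.
PARENT_OF = {
    "V2V": "Internet of Vehicles",
    "V2I": "Internet of Vehicles",
    "equipment detection": "industrial Internet of Things",
    "Internet of Vehicles": "Internet of Things",
    "industrial Internet of Things": "Internet of Things",
}


def toy_associate(items):
    groups = {}
    for item in items:
        parent = PARENT_OF.get(item)
        if parent is not None:
            groups.setdefault(parent, []).append(item)
    return groups


levels = build_hierarchy(["V2V", "V2I", "equipment detection"], toy_associate)
```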
S35, calculating the relevance between the subject terms in the topic hierarchy and the papers, and finding the paper set corresponding to each subject term.
Specifically, after the topics and their hierarchy are obtained, similarity matching is performed again between the topics and the corpus of papers, so that the set of papers related to each topic can be recommended. In one particular embodiment, an Elasticsearch-based method may be used for the similarity matching between topics and the paper corpus.
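As a stand-in for the similarity matching between a topic and the paper corpus, the sketch below ranks papers by Jaccard overlap between topic tokens and paper tokens. This is a deliberately simple substitute for a search engine and is not the method of the filing; the function names, the whitespace tokenizer, and the toy papers are hypothetical.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return set(text.lower().split())


def recommend_papers(topic, papers, top_n=3):
    """Rank papers by Jaccard overlap between topic and paper tokens."""
    topic_tokens = tokenize(topic)
    scored = []
    for paper_id, text in papers.items():
        tokens = tokenize(text)
        union = topic_tokens | tokens
        score = len(topic_tokens & tokens) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, paper_id))
    # Highest score first; ties broken by paper ID for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [paper_id for _, paper_id in scored[:top_n]]


# Toy paper titles, for illustration only.
toy_papers = {
    "p1": "terahertz channel modeling",
    "p2": "edge computing offloading",
    "p3": "terahertz beamforming for edge networks",
}
recommended = recommend_papers("terahertz", toy_papers)
```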
S36, forming a 6G knowledge tree by the theme and the keyword set corresponding to the theme, the theme hierarchical structure and the discourse set corresponding to the theme, and realizing knowledge extraction and generation.
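One way to hold the three associated layers (keywords, topics, papers) is a recursive node type. This is only a sketch of a possible data structure: the class name `TopicNode`, its fields, and the paper identifiers are illustrative assumptions, not the structure used in the embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    """A node of the knowledge tree: a topic with its keywords and papers."""
    topic: str
    keywords: list = field(default_factory=list)
    papers: list = field(default_factory=list)    # placeholder paper IDs
    children: list = field(default_factory=list)

    def count(self):
        """Number of topics in the subtree rooted at this node."""
        return 1 + sum(child.count() for child in self.children)

    def depth(self):
        """Number of levels in the subtree rooted at this node."""
        return 1 + max((child.depth() for child in self.children), default=0)

    def find(self, topic):
        """Return the node with the given topic name, or None."""
        if self.topic == topic:
            return self
        for child in self.children:
            hit = child.find(topic)
            if hit is not None:
                return hit
        return None


tree = TopicNode(
    "internet of things",
    children=[
        TopicNode("industrial internet of things",
                  keywords=["sensor fusion", "data security"],
                  papers=["paper-001"]),
        TopicNode("internet of vehicles",
                  keywords=["V2V", "V2C", "V2I"],
                  papers=["paper-002", "paper-003"]),
    ],
)
print(tree.count(), tree.depth(), tree.find("internet of vehicles").papers)
```

Looking up a node and reading its `papers` list corresponds to the paper recommendation obtained when a node is accessed.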
Referring to fig. 6, fig. 6 is a visual result diagram of a 6G knowledge tree according to an embodiment of the present invention. In fig. 6, the 6G knowledge tree contains 6 levels and 1453 topics in total, with different colors representing nodes at different levels.
Overall, the 6G knowledge tree covers all aspects of the 6G field, and in particular corresponds to the ten typical 6G scenarios and the ten key 6G technologies. Specifically, as shown in the enlarged part of fig. 6, the Internet of Things node includes the industrial Internet of Things and the Internet of Vehicles; the industrial Internet of Things involves sensor fusion technology, equipment detection, data security and the like, and the Internet of Vehicles covers sub-fields such as Vehicle-to-Vehicle (V2V), Vehicle-to-Cloud (V2C) and Vehicle-to-Infrastructure (V2I). The space-air-ground-sea node includes space-based networks, which cover various kinds of satellite communication; air-based networks, which cover high-altitude space and near-earth space and relate to UAV communication and related technologies; and ground-based networks, which cover cellular networks, Wi-Fi, Device-to-Device (D2D) and the like. The 6G knowledge tree thus provides an overview of the whole 6G academic field while maintaining the accuracy and reliability of the knowledge. In addition, accessing a node provides recommendations of related papers, which facilitates knowledge search in a specific field.
The embodiment generates the 6G knowledge tree from massive academic data by using a hierarchical topic detection algorithm, realizes three-layer association among keywords, topics and papers, and realizes the refinement of the 6G full-field knowledge structure.
S4, performing knowledge labeling on the metadata in the 6G knowledge base during the statistical analysis and the knowledge extraction and generation processes; wherein the 6G knowledge system is formed by three kernel layers: statistical analysis, knowledge extraction and generation, and knowledge labeling.
In addition to metadata-oriented statistical analysis and knowledge extraction, the embodiment also performs rule-based knowledge labeling. Knowledge labeling refers to the structured labeling of text or other forms of data; it is used to identify important information in the text, such as entities, relationships and events, and is typically based on predefined categories and rules, so that a computer program can better understand and use the data. The embodiment mainly labels the important attributes of each article in the 6G knowledge base, such as typical scenes, enabling technologies and key KPIs. The labeled data can be applied to a wide range of on-demand knowledge services; at present these mainly involve scene recognition, technical association and KPI cluster analysis, and in the future they will serve more scientific research and application requirements.
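Labeling based on predefined categories and rules can be illustrated with a simple phrase-matching sketch. The rule table, the dimension names and the function `label_article` are hypothetical; the embodiment does not disclose its actual labeling rules.

```python
# Hypothetical rule table: dimension -> {trigger phrase: label}.
RULES = {
    "scene": {"smart city": "smart city and life"},
    "technology": {
        "terahertz": "terahertz technology",
        "blockchain": "blockchain",
        "massive mimo": "massive MIMO",
    },
    "kpi": {"latency": "latency", "energy efficiency": "energy efficiency"},
}


def label_article(text, rules=RULES):
    """Return the labels whose trigger phrase occurs in the text, per dimension."""
    lowered = text.lower()
    labels = {}
    for dimension, patterns in rules.items():
        hits = {label for phrase, label in patterns.items() if phrase in lowered}
        labels[dimension] = sorted(hits)
    return labels


abstract = "Terahertz links for smart city sensing with low latency."
print(label_article(abstract))
```

Plain substring matching is chosen here only for brevity; a real labeling pipeline would likely need tokenization and manual review.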
Furthermore, the 6G knowledge system is formed by three kernel layers: statistical analysis, knowledge extraction and generation, and knowledge labeling. The 6G knowledge system takes the 6G knowledge base as its carrier and is the key kernel for realizing knowledge growth and application of the 6G knowledge base. It comprises statistical analysis of the whole 6G field, extraction and generation of 6G knowledge, and labeling of specific knowledge, and realizes on-demand application of knowledge on this basis.
The three kernels of the 6G knowledge system also interact with one another: knowledge labeling provides data samples for knowledge generation to drive the training of related models; knowledge generation provides knowledge labeling with the knowledge dimensions to be labeled; knowledge generation provides statistical analysis with statistically meaningful data dimensions; and the results of statistical analysis can guide the extraction and generation of specific knowledge. The three kernel layers drive the 6G knowledge system to realize knowledge distillation for the 6G-specific field, and the output of knowledge distillation is fed back to the 6G knowledge base to realize the cyclic operation of knowledge, namely a knowledge closed loop. The 6G knowledge base is therefore highly extensible: knowledge is not limited to predefined rules, but can be reasonably inferred and discovered within the scope of the field. In addition, by using AI technology, the presentation of specific knowledge concepts, the mining of service requirements, the recommendation of decisions and the like can be realized, and these results can in turn be used as input to the knowledge base, realizing knowledge growth and a knowledge closed loop in the true sense. Therefore, by constructing the 6G knowledge base and the knowledge system, full-life-cycle closed-loop control is realized, from the extraction and mining of 6G academic knowledge to its on-demand application.
Referring to fig. 7, fig. 7 is a schematic diagram of the ten typical 6G network scenarios and the ten key 6G technologies provided in an embodiment of the present invention. After multiple rounds of rule-based cleaning and manual screening of the knowledge base, the embodiment summarizes and defines ten typical 6G scenarios: full-sense immersive communication, three-dimensional amphibious transportation, twin virtual interaction, full-function fully automatic green industry, integrated communication, sensing and computing network, smart city and life, full-coverage cross-domain space communication, ubiquitous intelligent on-demand interaction, anti-interference secure trusted network, and disaster-adaptive network. The embodiment also summarizes ten core 6G technologies: holographic communication, terahertz technology, visible light technology, digital twin, reconfigurable intelligent surface, knowledge graph, intent-driven networking, massive MIMO, blockchain and big data. The typical 6G scenarios and the core technologies are closely related and complement each other; together they depict the blueprint and direction of the future development of 6G.
S5, carrying out a demand task in the 6G field based on the 6G knowledge base and the 6G knowledge system, and realizing 6G knowledge driven on-demand application.
In particular, the purpose of building a 6G knowledge system is to drive related knowledge applications to enable global full-scene on-demand services. It can be understood that the 6G knowledge base and the 6G knowledge system are constructed, and corresponding scenes, technologies and service requirements can be automatically analyzed according to specific fields, so that full scene on-demand service in a true sense is realized.
The present embodiment first clarifies the logical relationships among the related concepts. "On-demand services" belong to the category of 6G services, such as communication, access, autonomous driving and XR, and are intended to provide customized services for 6G users with different subject, environment and demand characteristics. "Knowledge driven" refers to a new technique that aims to make on-demand services more accurate, more efficient and lower in cost. "Knowledge applications" are the programmatic applications that the 6G knowledge system provides for knowledge-driven on-demand services; such services can be realized by invoking specific applications or methods. Enabling knowledge-driven 6G on-demand services means that, for known or unknown, standardized or customized service requirements, the corresponding data patterns and entity attributes are extracted and, on that basis, knowledge analysis, generation, reasoning, recommendation and the like that meet the service requirements are realized, with knowledge service management and control capability over the full life cycle. Taking scene cognition as an example, "knowledge-driven on-demand service" is implemented as follows: first, the 6G full-scene elements are decomposed to construct an ontology structure comprising environment, subject, resource and service; second, a knowledge graph is used to characterize and recognize the relationships between different entities and instances, and a multi-dimensional, multi-granularity rapid resource perception scheme is formed on this basis; finally, accurate demand identification and strategy generation in different scenes are realized according to the perception and transfer of network intent.
To further illustrate the on-demand application capability of the 6G knowledge system, this embodiment summarizes three main connotations of "on demand": first, the relevance of demands, i.e. the knowledge itself must help in analyzing and understanding the demand, so that the services provided genuinely reflect the demand; second, the ambiguity of demands, i.e. a demand has no fixed granularity or characteristics, and knowledge can offer the demand a certain range of choices; third, the scalability of demands, i.e. the knowledge services need to be able to handle new or unknown demands.
In order to fuse knowledge with on-demand services, the 6G knowledge base and the knowledge system constructed by the embodiment provide a rich set of knowledge applications. Specifically, the 6G knowledge base contains abundant corpus data, and by training on these texts, a number of demand tasks in the 6G field, such as scene recognition, technical association and KPI clustering, can be performed, thereby realizing the application of the 6G knowledge base. That is, a target model for the demand task is trained using the knowledge extraction and generation in the 6G knowledge system together with the 6G corpus data in the 6G knowledge base, and the trained target model is used to perform the demand task in the 6G field, so that 6G knowledge-driven on-demand application is realized. Further, 6G knowledge-driven on-demand applications include, but are not limited to: scene recognition, technical association, KPI clustering, knowledge graph generation, knowledge completion and reasoning, and on-demand knowledge recommendation.
In the embodiment, taking 6G hot spot recommendation based on text generation as an example, knowledge application oriented to 6G on-demand service is realized by utilizing theoretical tools such as deep learning, neural networks and the like.
Specifically, conventional text generation approaches include text generation based on a language model and text generation based on a deep learning method, and the 6G knowledge system integrates and applies both. For the former, the 6G knowledge system trains a 6G-BERT language model on the 6G literature corpus; the model architecture is shown in fig. 8, which is a schematic diagram of the 6G-BERT model architecture provided by the embodiment of the invention. As knowledge and corpora continue to expand, 6G-BERT will gradually become more complete and will be applied to a large number of language-model-based downstream applications in the future. For the latter, the present embodiment trains an LSTM model, which takes into account the sequential order and contextual relevance of natural language, to generate text sequences automatically.
The text generation task is essentially a multi-class classification problem; for character-level text generation, the classes to be distinguished are the character classes in the text. The 6G academic corpus is cut into segments of 100 characters each, giving 2,022,120 samples, which are divided into a training set and a validation set at a ratio of 4:1 for model training. The corresponding parameter settings are shown in table 1:
table 2 related parameter settings
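The sample preparation described above (cutting the corpus every 100 characters and splitting 4:1) can be sketched as follows. Whether the embodiment drops a short final segment or shuffles the samples before splitting is not stated, so dropping the tail and splitting in order are assumptions; the function names are illustrative.

```python
def chunk_text(text, size=100):
    """Cut text into consecutive, non-overlapping segments of `size` characters.

    A final segment shorter than `size` is dropped (an assumption).
    """
    return [text[i:i + size] for i in range(0, len(text) - size + 1, size)]


def split_train_val(samples, ratio=(4, 1)):
    """Split samples in order into training and validation sets by `ratio`."""
    n_train = len(samples) * ratio[0] // sum(ratio)
    return samples[:n_train], samples[n_train:]


corpus = "6G networks integrate sensing and communication. " * 11  # toy text
samples = chunk_text(corpus)
train, val = split_train_val(samples)
print(len(samples), len(train), len(val))
```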
In the prediction stage, the test text "future six-generation" is input into the model, and the output passes through a Softmax layer to obtain a 148-dimensional probability vector. Finally, Top-k sampling is applied: one character is sampled from the five highest-probability characters according to their probability distribution, and the corresponding character is output. Part of the generation result is shown in fig. 9, which is a schematic diagram of 6G hot spot recommendation based on text generation according to an embodiment of the present invention.
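The prediction step can be sketched in a few lines: a softmax over the model's output scores, then sampling among the five most probable character classes in proportion to their probabilities. The logits below are toy values standing in for the LSTM output, and the function names are illustrative.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def top_k_sample(probs, k=5, rng=random):
    """Sample one class index from the k most probable classes."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top]  # choices() renormalizes the weights
    return rng.choices(top, weights=weights, k=1)[0]


# Toy 148-dimensional output: five character classes clearly ahead of the rest.
logits = [0.0] * 148
for index, value in zip([3, 10, 42, 77, 120], [5.0, 4.5, 4.0, 3.5, 3.0]):
    logits[index] = value

probs = softmax(logits)
rng = random.Random(0)
print([top_k_sample(probs, k=5, rng=rng) for _ in range(10)])
```

Restricting sampling to the top five classes keeps the generated text varied while excluding low-probability characters.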
Overall, the 6G knowledge system and the knowledge base model effectively generate a large amount of information related to the 6G field, including 6G scenes, technologies and their attributes and characteristics, such as artificial intelligence, network security, cellular network, access network, mobile edge computing, antenna technology, data transmission technology, non-orthogonal technology, latency, capacity and energy efficiency, thereby realizing hot spot recommendation that is consistent with and relevant to the input text. Building on this application of the knowledge system, the fluency and correctness of text generation can be improved by introducing more artificial intelligence theory and technology, further improving the capability of on-demand knowledge services. Qualitatively, the knowledge application for 6G hot spot recommendation fits the three connotations of on-demand service: the recommendation result is highly related to the required content and covers multiple dimensions such as scenes, technologies, characteristics and attributes, so the relevance and the ambiguity of demands are satisfied. Meanwhile, the input can be any character or sentence, so emerging or unknown demands can be handled and the scalability of demands is fully satisfied. In addition, the 6G knowledge base and the knowledge system also provide various other types of on-demand services and knowledge applications; with further demand analysis and granularity division, intelligent scene linkage and cross-domain services are expected to be realized in the future.
The 6G knowledge base and the knowledge system constructed by the embodiment realize the on-demand application of knowledge on the basis of extracting and summarizing the knowledge of the whole 6G field. The analysis of current 6G academic knowledge helps to guide the strategic layout and future development of the 6G field. Meanwhile, the introduction of 6G knowledge enables three-dimensional perception, decision inference and dynamic adjustment for service requirements and their management and control, for example network management, control and optimization enabled by knowledge accumulated over the long term in the networking and communication field. The proposed 6G knowledge base and knowledge system are the first knowledge cluster constructed for the whole 6G field, and are of great significance for surveying the full picture of 6G and enabling full-scene on-demand services.
In summary, the embodiment constructs a 6G knowledge base and a knowledge system, aiming to realize the 6G vision of "knowledge driving" and "on-demand service". The 6G knowledge base stores all 6G academic documents to date in a structured way and extends the knowledge dimensions on the basis of the initial fields. The 6G knowledge system takes the 6G knowledge base as its carrier and is the key kernel for realizing knowledge growth and application of the 6G knowledge base; it comprises statistical analysis of the whole 6G field, extraction and generation of 6G knowledge, and labeling of specific knowledge, and realizes the on-demand application of knowledge on this basis. The 6G knowledge base and the knowledge system are extensible across multiple dimensions and fields, support knowledge generation that combines top-down and bottom-up approaches, and can realize an efficient knowledge closed loop and knowledge-driven on-demand services. In the future, on the basis of fine-grained demand sensing and knowledge expansion, a knowledge platform for 6G on-demand services can be created to provide more intelligent on-demand knowledge services.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.
Claims (10)
1. The construction method of the 6G knowledge system for the global full scene on-demand service is characterized by comprising the following steps:
S1, combining expert knowledge and natural language processing technology, constructing a 6G knowledge base that combines top-down and bottom-up approaches based on 6G academic literature;
S2, carrying out statistical analysis on the literature metadata in the 6G knowledge base to realize prediction of 6G academic development;
S3, processing and training on the 6G corpus data in the 6G knowledge base by using natural language processing technology and a deep learning method to realize knowledge extraction and generation;
S4, performing knowledge labeling on the metadata in the 6G knowledge base during the statistical analysis and the knowledge extraction and generation processes; wherein a 6G knowledge system is formed by three kernel layers: statistical analysis, knowledge extraction and generation, and knowledge labeling;
S5, carrying out a demand task in the 6G field based on the 6G knowledge base and the 6G knowledge system, and realizing 6G knowledge-driven on-demand application.
2. The method for building a 6G knowledge system for global full-scene on-demand services according to claim 1, wherein step S1 comprises:
by combining expert knowledge and natural language processing technology, extracting ontology and schema information from the 6G academic literature by means of structured data sources, and adding the ontology and schema information into the 6G knowledge base, thereby realizing top-down construction of the 6G knowledge base; and meanwhile, acquiring a target data schema from the 6G academic literature by labeling and induction, selecting information with higher confidence from the target data schema, and recording the information into the 6G knowledge base, thereby realizing bottom-up construction of the 6G knowledge base.
3. The method for building a 6G knowledge system for global full-scene on-demand services according to claim 1, wherein the 6G knowledge base comprises metadata fields and extended attributes, wherein,
the metadata fields comprise the ID, title, abstract, field, publication year and DOI of the 6G academic document; the extended attributes include the number of articles and the article attribute category.
4. The method for building a 6G knowledge system for global full-scene on-demand services according to claim 1, wherein step S2 comprises:
and carrying out statistical analysis on the paper distribution, the hot spot field and the hot word distribution conditions in the 6G academic literature to realize prediction of 6G academic development.
5. The method for building a 6G knowledge system for global full-scene on-demand services according to claim 1, wherein step S3 comprises:
S31, extracting a keyword set from the 6G knowledge base, and constructing a matching vocabulary for the 6G field by using expert knowledge;
S32, calculating the relevance between keywords and topics by using a hierarchical topic detection algorithm, and finding the topics and their corresponding keyword sets;
S33, performing fuzzy matching on the topics by using the matching vocabulary to obtain subject terms;
S34, calculating the relevance among the subject terms by using the hierarchical topic detection algorithm, and establishing a topic hierarchy;
S35, calculating the relevance between the subject terms in the topic hierarchy and the papers, and finding the paper set corresponding to each subject term;
S36, forming a 6G knowledge tree from the topics and their corresponding keyword sets, the topic hierarchy, and the paper sets corresponding to the topics, thereby realizing knowledge extraction and generation.
6. The method for building a 6G knowledge system for global full scene on-demand services as claimed in claim 5, wherein step S32 comprises:
calculating the co-occurrence frequency of the keywords in the text corpus of the 6G knowledge base to capture related words, and obtaining the topics and the keyword set corresponding to each topic;
ranking the relevance of the keywords by using mutual information, and selecting the five keywords with the highest relevance in each topic to name the topic, wherein the mutual information is calculated as:
$$I(X; Y) = \sum_{x \in X}\sum_{y \in Y} P(x, y)\log\frac{P(x, y)}{P(x)\,P(y)}$$
wherein $I(X; Y)$ represents the mutual information between different topics $X$ and $Y$, $P(X, Y)$ represents the joint probability distribution of topics $X$ and $Y$, $P(X)$ represents the marginal probability distribution of topic $X$, and $P(Y)$ represents the marginal probability distribution of topic $Y$.
7. The method for building a 6G knowledge system for global full scene on-demand services as claimed in claim 5, wherein step S33 comprises:
scoring the similarity of the topics by using the BM25 algorithm, and selecting the topic with the highest score as the target subject term, wherein the similarity score is computed as:
$$\mathrm{Score}(D, Q) = \sum_{i} \mathrm{IDF}(q_i)\cdot\frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1\left(1 - b + b\cdot\frac{|D|}{\mathrm{avgdl}}\right)}$$
wherein $Q$ represents the corpus set, $D$ represents one corpus in $Q$, $\mathrm{IDF}(q_i)$ represents the IDF value of keyword $q_i$ in $Q$, $f(q_i, D)$ represents the TF value of keyword $q_i$ in corpus $D$, $k_1$ represents the term frequency saturation parameter, $b$ represents the field-length normalization parameter, which controls how strongly the length of $D$ relative to the average corpus length in $Q$ affects the score, $|D|$ represents the length of corpus $D$, and avgdl represents the average length of all corpora in $Q$; IDF and TF are calculated as follows:
$$\mathrm{TF}_i = \frac{n_i}{\sum_k n_k}$$
wherein $n_i$ represents the number of times the keyword appears in corpus $D$, and $\sum_k n_k$ represents the total number of occurrences of all keywords in $D$;
$$\mathrm{IDF}_i = \log\frac{|Q|}{1 + |\{j : t_i \in d_j\}|}$$
wherein $|Q|$ represents the total number of corpora in $Q$, and $1 + |\{j : t_i \in d_j\}|$ represents the number of corpora in $Q$ in which the keyword appears, incremented by one.
8. The method for building a 6G knowledge system for global full-scene on-demand services according to claim 1, wherein step S4 comprises:
and in the processes of statistical analysis, knowledge extraction and generation, the scenes, the technologies and the indexes of each article in the 6G knowledge base are marked in a targeted manner.
9. The method for building a 6G knowledge system for global full-scene on-demand services according to claim 1, wherein step S5 comprises:
and training a target model of the demand task by utilizing knowledge extraction and generation in the 6G knowledge system and 6G corpus data in the 6G knowledge base, and carrying out the demand task in the 6G field by utilizing the trained target model, so as to realize 6G knowledge driven on-demand application.
10. The method for building a 6G knowledge system for global full-scene on-demand services according to claim 1, wherein the on-demand applications include relevance of demands, ambiguity of demands, and scalability of demands.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310341182.8A CN116431825A (en) | 2023-03-31 | 2023-03-31 | Construction method of 6G knowledge system for global full-scene on-demand service |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116431825A true CN116431825A (en) | 2023-07-14 |
Family
ID=87079077
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310341182.8A Pending CN116431825A (en) | 2023-03-31 | 2023-03-31 | Construction method of 6G knowledge system for global full-scene on-demand service |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116431825A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117236409A (en) * | 2023-11-16 | 2023-12-15 | 中电科大数据研究院有限公司 | Small model training method, device and system based on large model and storage medium |
CN117236409B (en) * | 2023-11-16 | 2024-02-27 | 中电科大数据研究院有限公司 | Small model training method, device and system based on large model and storage medium |
CN117573956A (en) * | 2024-01-16 | 2024-02-20 | 中国电信股份有限公司深圳分公司 | Metadata management method, device, equipment and storage medium |
CN117573956B (en) * | 2024-01-16 | 2024-05-07 | 中国电信股份有限公司深圳分公司 | Metadata management method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||