CN114443813A - Intelligent online teaching resource knowledge point concept entity linking method - Google Patents
- Publication number
- CN114443813A CN114443813A CN202210018754.4A CN202210018754A CN114443813A CN 114443813 A CN114443813 A CN 114443813A CN 202210018754 A CN202210018754 A CN 202210018754A CN 114443813 A CN114443813 A CN 114443813A
- Authority
- CN
- China
- Prior art keywords
- knowledge point
- character
- vector
- concept
- entity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
An intelligent online teaching resource knowledge point concept entity linking method comprises a knowledge point concept entity recognition model and a knowledge point concept linking model. The application scenario mainly faces teaching resource organization and management in domestic online learning platforms; since domestic teaching is basically conducted in Chinese, the method is only suitable for Chinese language texts and is compatible with partial English texts. Knowledge point concept entity recognition extracts knowledge point concept entity vocabularies, such as disciplines, professional terms, and historical events, from a teaching resource text; the extracted knowledge point concept entities are called knowledge point mentions. Knowledge point concept association finds, from a knowledge base, the concept knowledge with the highest semantic similarity to the extracted knowledge point concept mention and the context in which it occurs, and establishes the association. Through knowledge point concept entity recognition and knowledge point concept linking, the association between teaching resources and knowledge point concepts is realized, achieving the purpose of constructing a teaching resource organization system with concept knowledge at its core.
Description
Technical Field
The invention relates to intelligent education, in particular to an intelligent online teaching resource knowledge point concept entity linking method.
Background
A traditional teaching resource library carries a large number of learning resources, and its rich variety of teaching resource types has drawn wide attention. As the number of users on online learning platforms keeps increasing, the number and types of teaching resources in a platform grow to meet the different requirements that different users place on resources. In practice, as teaching resources multiply and their contents diversify, a learner must spend more time and energy than before searching for and selecting the learning resources he or she needs on a teaching resource platform; the learner's efficiency within the platform gradually declines, seriously affecting learning quality and learning initiative.
Knowledge maps have become a core driving force for the development of the internet and artificial intelligence as a means to effectively structure human knowledge. The teaching resource library in the self-adaptive learning system can also build a teaching resource system taking knowledge as a core by means of knowledge graph technology. The teaching resources can be associated with the concept knowledge points, so that a teaching resource system can be effectively organized to enable the self-adaptive learning system.
The existing knowledge point concept marking and association in the online teaching resources are all input in a manual mode by teachers. However, the manual input mode consumes a lot of time and energy, most of knowledge point concepts provided by teachers are coarse-grained, fine-grained knowledge point concepts in teaching resources are ignored, the knowledge point concepts are not fully labeled, and learners cannot intuitively know details of course contents. To solve the above problems, an intelligent method or tool is needed to accurately identify and associate the knowledge point concept entities in the online teaching resources. Currently, only some researchers carry out part of relevant work, and mainly extract key phrases and terms in teaching resources by means of statistical learning. However, these research advances are far from adequate to solve the above-mentioned critical problems.
With the development of the knowledge graph and natural language processing fields, entity linking technology can adequately solve the above problems. Entity linking identifies mentions in text and links them to corresponding entities in a knowledge base. Most existing entity linking methods are open-domain, that is, they recognize key entity vocabularies in text corpora, such as person names, place names, organizations, and times, and link them to corresponding entries in a knowledge base (such as encyclopedias, Wikipedia, etc.). Several mature entity linking tools exist, such as Wikify!, AIDA, DBpedia Spotlight, TagMe, and Linkify. These entity linking systems consist mainly of two parts: entity mention detection and entity linking. Although the above entity linking systems are increasingly well developed, they have certain shortcomings. In entity mention detection, the systems mainly rely on existing Named Entity Recognition (NER) tools, such as Stanza, Jieba, and SnowNLP, which can achieve considerable entity recognition accuracy but can only recognize three entity categories: people, places, and organizations.
Unlike the entity linking task in the open domain, teaching resource knowledge point concept linking extracts and associates only the concept entities involved in teaching resources, not all entities such as place name entities, person entities, and time entities; hence the existing entity linking tools are not suitable for knowledge point concept linking in teaching resources.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide an intelligent online teaching resource knowledge point concept entity linking method, which is based on a natural language processing model Bert and is combined with a text data enhancement technology to extract and link knowledge point concepts contained in teaching resources. The association between teaching resources and concept knowledge is realized, and finally a teaching resource organization system taking knowledge point concepts as a core is constructed.
In order to achieve the purpose, the invention adopts the technical scheme that:
an intelligent online teaching resource knowledge point concept entity linking method is characterized by comprising the following steps:
1) firstly, the character string undergoes a preprocessing process of character string cleaning, which mainly consists of judging whether each character belongs to the Chinese, numeric, or English character sets, the union of which is denoted S; a character not in the character set S is removed;
2) the model then labels all elements of the cleaned character string C = {c1, c2, ..., cl} with the BIO labeling mechanism: when a character ci is labeled "B", ci is the first character of a knowledge point concept vocabulary entity; "I" marks a middle character of a knowledge point concept vocabulary entity; and "O" marks a non-knowledge-point concept vocabulary character; this finally yields the labeled text data;
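The BIO scheme of step 2) can be sketched as follows; the helper function and the example entity spans are illustrative, not part of the patent:

```python
def bio_label(chars, entity_spans):
    """Label a character sequence with the BIO scheme.

    entity_spans: list of (start, end) index pairs (end exclusive) marking
    knowledge point concept vocabulary entities; the spans are hypothetical.
    """
    labels = ["O"] * len(chars)
    for start, end in entity_spans:
        labels[start] = "B"          # first character of the entity
        for i in range(start + 1, end):
            labels[i] = "I"          # middle characters of the entity
    return labels

# e.g. the characters of "学习二叉树" with "二叉树" as an entity over [2, 5)
chars = list("学习二叉树")
print(bio_label(chars, [(2, 5)]))  # ['O', 'O', 'B', 'I', 'I']
```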
3) text data enhancement constructs a knowledge point concept dictionary Dict from the knowledge point entry nouns and their aliases in the knowledge base; the Bidirectional Maximum Matching algorithm is used to match the character string C and find the dictionary vocabularies it contains, and every matched character substring is labeled with the "BIEO" mechanism: if a matched substring is Csub = {ci, ci+1, ..., ci+m} with Csub ∈ Dict, the starting character ci is labeled "B", the ending character ci+m is labeled "E", all characters between them, {ci+1, ci+2, ..., ci+m-1}, are labeled "I", and all unmatched characters are labeled "O"; this mechanism yields a labeled character string, to which the start character "[CLS]" and the end character "[SEP]" are added, giving S = {s[CLS], s1, s2, ..., sl, s[SEP]}, where each element si consists of the character ci at the corresponding index position in the character string C and its label character;
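A minimal sketch of the bidirectional maximum matching and "BIEO" labeling of step 3); the dictionary contents, the fewer-words tie-breaking heuristic, and the single-character "B" case are assumptions for illustration:

```python
def forward_max_match(text, dictionary, max_len):
    """Greedy longest-first matching, scanning left to right."""
    spans, i = [], 0
    while i < len(text):
        for L in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + L] in dictionary:
                spans.append((i, i + L)); i += L; break
        else:
            i += 1
    return spans

def backward_max_match(text, dictionary, max_len):
    """Greedy longest-first matching, scanning right to left."""
    spans, j = [], len(text)
    while j > 0:
        for L in range(min(max_len, j), 0, -1):
            if text[j - L:j] in dictionary:
                spans.append((j - L, j)); j -= L; break
        else:
            j -= 1
    return spans[::-1]

def bidirectional_max_match(text, dictionary):
    max_len = max(map(len, dictionary))
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = backward_max_match(text, dictionary, max_len)
    # common heuristic (an assumption here): prefer fewer matched words
    return bwd if len(bwd) < len(fwd) else fwd

def bieo_label(text, spans):
    labels = ["O"] * len(text)
    for s, e in spans:
        labels[s] = "B"                       # starting character
        if e - s > 1:
            labels[e - 1] = "E"               # ending character
            for k in range(s + 1, e - 1):
                labels[k] = "I"               # characters in between
    return labels
```

For example, with Dict = {"二叉树", "遍历"}, the string "二叉树的遍历" is labeled B I E O B E.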
4) a vector space embedding operation Embedding(S) is applied to the labeled character string S, i.e., each element si in S is characterized as a ds-dimensional vector whose values are randomly initialized with the Kaiming distribution; the embedded sequence vector is ES = {e[CLS], e1, e2, ..., el, e[SEP]};
5) the sequence vector ES obtained above carries the boundary information of the knowledge point concept vocabulary; the context semantic information contained in the character string C is then characterized with the pre-trained neural network language model Bert, i.e., a model trained on large-scale general text data; used as a semantic encoder, the pre-trained Bert language model can effectively represent a text sequence as a high-dimensional vector; the cleaned character string C is taken as the input of the pre-trained Bert language model, which computes over C character by character; for the input string C = {c1, c2, ..., cl}, the Bert model first inserts the identifiers "[CLS]" before the start position and "[SEP]" after the end position of the string, i.e., the string {"[CLS]", c1, c2, ..., cl, "[SEP]"} is the data computed by the model;
6) the output vector F obtained by the Bert model is the coding vector of the character string C, and then the sequence vector E with the concept knowledge point vocabulary boundary information is combinedSExtracting candidate concept knowledge point entities from the character string C through an LSTM model and a conditional random field CRF; extracting corresponding substrings on the predicted tag sequence to obtain a knowledge point concept mentioning entity;
7) the knowledge point concept entity link model matches and associates the extracted knowledge point concept mention entities M = {m1, m2, ..., mk} with knowledge point entities in the knowledge base; candidate knowledge point concept entities are generated with the Levenshtein Distance string fuzzy matching algorithm: the current mention entity mi is fuzzily matched against the knowledge point concept vocabularies in the knowledge base, an edit distance parameter Distance is set in the fuzzy matching algorithm so that matched knowledge point concept vocabularies whose edit distance exceeds Distance are filtered out, and a candidate knowledge point concept entity set Ei = {e1, e2, ..., en} is generated;
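The candidate generation of step 7) can be sketched as below; the edit-distance implementation is the classic dynamic program, and the Distance threshold and knowledge-base terms in the example are hypothetical:

```python
def levenshtein(a, b):
    """Classic dynamic-programming Levenshtein edit distance."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution
        prev = cur
    return prev[n]

def candidate_entities(mention, kb_terms, max_distance):
    """Keep knowledge-base terms whose edit distance to the mention
    does not exceed the Distance threshold."""
    return [t for t in kb_terms if levenshtein(mention, t) <= max_distance]
```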
8) the abstract text description of each candidate knowledge point concept entity is encoded with the aforementioned pre-trained Bert model to obtain a vector characterizing that candidate; for a candidate knowledge point concept entity ei, its corresponding abstract description string is taken as the input of the Bert model, whose encoded output vector is H = {hcls, h1, ..., hsep}; the implicit vector hcls corresponding to the identifier "[CLS]" is passed through a fully-connected layer with tanh activation to obtain the output vector vei, which serves as the characterization vector of the candidate knowledge point concept entity, i.e., vei = tanh(W hcls + b); in this way the set of characterization vectors VE = {ve1, ve2, ..., ven} of the candidate knowledge point concept entity set is obtained;
9) each mentioned knowledge point concept mi is characterized as follows: first, the pre-trained Bert model encodes the course text C = {c1, c2, ..., cl} in which the knowledge point concept occurs, giving a characterization vector VC of the course text; VC is obtained in the same way as the characterization vector of a candidate knowledge point concept entity;
10) the encoding vector of each character of the course text after computation by the Bert model is HC = {hcls, h1, h2, ..., hl, hsep}; for an extracted knowledge point concept mention mi, the index position of the plaintext substring it denotes can be expressed as the binary group (beg, end), where beg is the index of the starting position of the substring in C and end is the index of its ending position in C. The slice of the encoding vector HC between the starting index beg and the ending index end is denoted Hmi = {hbeg, ..., hend}; Hmi is passed through the text convolution network TextCNN to obtain the characterization vector vmi of the mentioned knowledge point concept entity, with Hmi as the TextCNN model's input. The characterization vector VC of the course text and the characterization vector vmi of the mentioned knowledge point concept entity are joined by a Concate splicing operation and passed through a fully-connected layer with tanh activation to obtain the output vector umi, i.e., umi = tanh(Concate(VC, vmi) W + b);
11) cos similarity is computed between the output vector umi of the mentioned knowledge point concept entity and each characterization vector vej in the set VE of characterization vectors of the candidate knowledge point concept entity set; the knowledge point concept with the highest similarity is selected for the association, i.e., the final association result can be expressed as the binary group (mi, ei*);
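The cos-similarity selection of step 11) amounts to an argmax over the candidate characterization vectors; a small sketch with made-up vectors:

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

def link_mention(mention_vec, candidate_vecs):
    """Return the index of the candidate characterization vector with
    the highest cosine similarity to the mention's output vector."""
    sims = [cos_sim(mention_vec, v) for v in candidate_vecs]
    return max(range(len(sims)), key=sims.__getitem__)

# toy 2-d vectors: the second candidate points almost the same way
print(link_mention([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]))  # 1
```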
12) the link results of the knowledge point concepts contained in the input course text, R = {(m1, e1*), (m2, e2*), ..., (mk, ek*)}, complete the association between the teaching resources and the knowledge point concepts in the knowledge base.
The input of the knowledge point concept entity recognition model is a text string X = {x1, x2, ..., xn}, where X is composed of n characters and xi is the ith character of X; the text string may come from a course video caption, an electronic textbook text, etc.
The character string cleaning preprocessing is implemented mainly through the Unicode code table: when the Unicode code of a character xi lies between \u4e00 and \u9fa5, xi is a Chinese character; likewise, when it lies between \u0030 and \u0039, xi is a numeric character; and when it lies between \u0041 and \u005a or between \u0061 and \u007a, xi is an English character. All characters outside these Unicode ranges are deleted, completing the cleaning of the string; the cleaned character string is C = {c1, c2, ..., cl}, whose length l after cleaning satisfies l ≤ n.
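The Unicode-range filter described above can be sketched in a few lines; the code points are the ranges named in the text:

```python
def clean_string(text):
    """Keep only Chinese characters (\\u4e00-\\u9fa5), digits
    (\\u0030-\\u0039) and English letters (\\u0041-\\u005a,
    \\u0061-\\u007a); all other characters are removed."""
    def keep(ch):
        cp = ord(ch)
        return (0x4e00 <= cp <= 0x9fa5 or   # Chinese character
                0x30 <= cp <= 0x39 or       # digit
                0x41 <= cp <= 0x5a or       # uppercase English letter
                0x61 <= cp <= 0x7a)         # lowercase English letter
    return "".join(ch for ch in text if keep(ch))

print(clean_string("数据结构: Tree 2022!"))  # 数据结构Tree2022
```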
The calculation of the Bert model for the character string mainly comprises the following steps:
character embedding operation: each character in the string {"[CLS]", c1, c2, ..., cl, "[SEP]"} to be computed is characterized as a d-dimensional character vector by the embedding operation (Embedding); the embedded character string vector is EC = {ecls, e1, ..., el, esep};
Integrating position information coding: to obtain the sequence features of text data, the Bert model uses sin and cos mechanisms on charactersString vector EcThe position index of each element in (a) is encoded. I.e. for elements at the position of pos, where diD is more than or equal to 1 for the dimension position in each elementiD is less than or equal to d, when d isiFor even numbers, using sin function for conversion, diIf the number is odd, the cos function is used for conversion to obtain the position coding vectorEach element p is a d-dimensional vector, and the corresponding position coding formula is as follows:
self-attention mechanism based on dot product scaling: the character string vector EC obtained above is added to the position coding vector P to obtain the input vector Z = {zcls, z1, ..., zl, zsep} of the self-attention mechanism. The self-attention mechanism captures the degree of association between every pair of elements in the sequence through scaled dot products; the stronger the association between two elements, the larger the computed value. The calculation formula of the self-attention mechanism is as follows, where each input is the vector Z multiplied by its corresponding weight parameter W, i.e., Q = Z WQ, K = Z WK, V = Z WV, and d is the vector dimension of the input:
Attention(Q, K, V) = softmax(Q K^T / √d) V
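A sketch of the scaled dot-product attention computation; the max-subtraction inside the softmax is a numerical-stability detail added here, not from the text:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V: pairwise association scores are
    scaled by the square root of the input vector dimension d."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```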
the multi-head self-attention mechanism: to fully account for information from different independent subspaces after the scaled dot-product calculation, the vectors of h scaled dot-product calculations, i.e., h attention heads, are spliced (Concate) and then subjected to a linear transformation; the calculation formula is as follows, where WO is a trainable parameter matrix:
MultiHead(Q, K, V) = Concate(head1, ..., headh) WO
feed-forward network layer: the result of each character element after multi-head attention is only a linearly transformed result; to fully account for the interplay of information across different latent dimensions, a feed-forward network layer with a nonlinear transformation is integrated into the model. It is computed as follows, where W(1), b(1), W(2), b(2) are trainable parameter matrices:
F = FFN(Z) = ReLU(Z W(1) + b(1)) W(2) + b(2)
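The feed-forward layer above reduces to two matrix products around a ReLU; a minimal sketch:

```python
import numpy as np

def ffn(Z, W1, b1, W2, b2):
    """Position-wise feed-forward layer: ReLU(Z W1 + b1) W2 + b2."""
    return np.maximum(Z @ W1 + b1, 0.0) @ W2 + b2

# with identity weights and zero biases, the layer is just ReLU
Z = np.array([[1.0, -1.0]])
print(ffn(Z, np.eye(2), np.zeros(2), np.eye(2), np.zeros(2)))  # [[1. 0.]]
```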
extracting candidate concept knowledge point entities from the character string C through an LSTM model and a conditional random field CRF; the main process is as follows:
feature vector fusion: the feature fusion performs a Concate splicing of the encoding vector F carrying semantic features and the sequence vector ES carrying knowledge point concept vocabulary boundary information, followed by a linear transformation through the weight parameter matrix W, yielding the fused vector V = {vcls, v1, v2, ..., vl, vsep}; the formula is as follows:
V = Concate(F, ES) W
encoding of LSTM model: the LSTM model is a variant of the Recurrent Neural Network (RNN) and has a more robust predictive effect than the RNN model. Vector information of the first i-1 elements can be fully combined when the ith element is calculated, and the calculation process of the LSTM model for the elements at each time step t is as follows:
zt = σ(Wi · [ht-1, vt])
rt = σ(Wr · [ht-1, vt])
where σ is the sigmoid function, · is the element-wise product operator, vt is the t-th element of the fused vector V, and ht is the implicit state vector, i.e., the output of vt after the LSTM step; the output of the vector V after the LSTM model is H = {h1, h2, ..., hT}, where T = l + 2.
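The two gate equations shown above can be sketched per time step as follows; the weight shapes are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def step_gates(h_prev, v_t, Wi, Wr):
    """One time step of the gate computations as written in the text:
    z_t = sigma(Wi [h_{t-1}; v_t]),  r_t = sigma(Wr [h_{t-1}; v_t]).
    Wi and Wr map the concatenated state and input back to state size."""
    x = np.concatenate([h_prev, v_t])
    return sigmoid(Wi @ x), sigmoid(Wr @ x)
```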
CRF model prediction layer: the model prediction layer judges the implicit vectors output by the LSTM model and consists of a fully-connected layer and a CRF layer. First, the implicit state vector H = {h1, h2, ..., hT} output by the LSTM model is linearly transformed by the fully-connected layer to obtain, for each character, a score for each class label, i.e., each label score l_scorei = [score1, score2, score3] contains three elements, where score1 is the probability score of predicting the current character as "B", score2 the probability score of predicting it as "I", and score3 the probability score of predicting it as "O". The prediction-label scores of the characters in the string are collected as L_Score = {l_scorecls, l_score1, l_score2, ..., l_scorel, l_scoresep}, which serves as the input of the CRF layer. The CRF layer models the labels by taking the input score set as the emission score matrix, computes a score transition matrix T between label categories to represent the transition probability from one label to another, mines the dependency relationships between label categories, and computes the sequence score Score(H) of the character string; the score sequence Score(H) is decoded by the Viterbi algorithm to obtain the predicted tag sequence. After removing the prediction tags corresponding to the start identifier "[CLS]" and the end identifier "[SEP]" introduced by the Bert model, the predicted tag sequence of the character string is obtained, and the corresponding substrings are extracted from it to give the knowledge point concept mention entities M = {m1, m2, ..., mk}.
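The Viterbi decoding used by the CRF layer can be sketched as below; the emission and transition scores are toy log-scores, not trained values:

```python
def viterbi(emissions, transitions, tags=("B", "I", "O")):
    """Decode the best tag sequence from per-character label scores
    (emissions, n x k) and a tag-to-tag transition score matrix (k x k)."""
    n, k = len(emissions), len(tags)
    score = list(emissions[0])
    back = []
    for t in range(1, n):
        prev, score, ptr = score, [], []
        for j in range(k):
            # best predecessor tag for tag j at step t
            best_i = max(range(k), key=lambda i: prev[i] + transitions[i][j])
            score.append(prev[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        back.append(ptr)
    best = max(range(k), key=score.__getitem__)
    path = [best]
    for ptr in reversed(back):       # follow back-pointers
        path.append(ptr[path[-1]])
    return [tags[i] for i in reversed(path)]

# toy scores: first character strongly "B", second strongly "I"
print(viterbi([[5, 0, 0], [0, 5, 0]], [[0] * 3] * 3))  # ['B', 'I']
```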
The extracted knowledge point concept mention entities M = {m1, m2, ..., mk} are matched and associated with knowledge point entities in the knowledge base, mainly through the following steps: 1. the Levenshtein Distance string fuzzy matching algorithm performs a fuzzy search for each mention entity mi and selects a set of possibly matching candidate knowledge point entities from the knowledge base; 2. the mention entity mi and the candidate entities are given context semantic characterizations through the Bert model, yielding context semantic characterization vectors; 3. the similarity between the context semantic characterization vectors of the mention and each candidate entity is computed with the cos function, and the candidate knowledge point entity with the highest similarity is the linked knowledge point concept.
1. several one-dimensional convolution kernels are defined and used to perform convolution calculations on the input, capturing the correlations between adjacent characters.
2. time-series maximum pooling is applied to each output channel, and the pooled output values of the channels are spliced to obtain the characterization vector.
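The two TextCNN steps above can be sketched as follows; single-output-channel kernels are an illustrative simplification:

```python
import numpy as np

def textcnn_encode(X, kernels):
    """1-D convolutions over the character dimension followed by
    max-over-time pooling; the pooled channel values are concatenated.
    X: (seq_len, d_in); each kernel W: (width, d_in), one output channel."""
    pooled = []
    for W in kernels:
        width = W.shape[0]
        conv = [np.sum(X[i:i + width] * W)       # slide kernel over time
                for i in range(len(X) - width + 1)]
        pooled.append(max(conv))                 # max-over-time pooling
    return np.array(pooled)

# toy input: increasing 1-d features, one width-2 sum kernel
X = np.array([[1.0], [2.0], [3.0]])
print(textcnn_encode(X, [np.ones((2, 1))]))  # [5.]
```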
The invention has the beneficial effects that:
the technical framework of this patent mainly contains two main parts: the application scene of the patent mainly faces to the organization and management of teaching resources in a domestic online learning platform, and domestic teaching is basically Chinese teaching, so that the knowledge point concept entity recognition model and the knowledge point concept link model are only suitable for Chinese language texts and are compatible with partial English texts. The knowledge point concept entity identification is to extract the contained knowledge point concept entity vocabulary from the teaching resource text, such as: the extracted knowledge point concept entities are called knowledge point mentions; the knowledge point concept association means that the concept knowledge with the highest semantic similarity is found from a knowledge base according to the extracted knowledge point concept mention and the context where the knowledge point concept is located, and the relationship is carried out. The association between teaching resources and knowledge point concepts is realized through knowledge point concept entity recognition and knowledge point concept linkage, and the purpose of constructing a teaching resource organization system taking concept knowledge as a core is achieved.
Drawings
Fig. 1 is a working principle diagram of the present invention.
Fig. 2 is a schematic diagram of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in Figs. 1 and 2, the knowledge point concept entity recognition model works as follows.
The input of the knowledge point concept entity recognition model is a text string X = {x1, x2, ..., xn}, where X is composed of n characters and xi is the ith character of X; the text string may come from a course video caption, an electronic textbook text, etc.
The character string first undergoes a preprocessing process of character string cleaning, which mainly consists of judging whether each character belongs to the Chinese, numeric, or English character sets, the union of which is denoted S; a character not in the character set S is removed. This is implemented mainly through the Unicode code table: when the Unicode code of a character xi lies between \u4e00 and \u9fa5, xi is a Chinese character; likewise, when it lies between \u0030 and \u0039, xi is a numeric character; and when it lies between \u0041 and \u005a or between \u0061 and \u007a, xi is an English character. All characters outside these Unicode ranges are deleted, completing the cleaning of the string; the cleaned character string is C = {c1, c2, ..., cl}, whose length l after cleaning satisfies l ≤ n. Next, the model labels all elements of the cleaned character string C = {c1, c2, ..., cl} with the BIO labeling mechanism: when a character ci is labeled "B", ci is the first character of a knowledge point concept vocabulary entity; "I" marks a middle character of a knowledge point concept vocabulary entity; and "O" marks a non-knowledge-point concept vocabulary character.
Because the word frequency of the knowledge point concept vocabulary in the teaching text is low and the character string length of the concept vocabulary is large, the traditional character-level-based entity recognition model is difficult to recognize the text boundary of the knowledge point concept entity, and the knowledge point concept entity is difficult to be recognized completely. The method for enhancing the text data is used, and the accuracy of the knowledge point concept entity recognition model is improved by combining the Bert language model.
The text data enhancement constructs a knowledge point concept dictionary Dict from the knowledge point vocabulary terms and their aliases in the knowledge base; the external knowledge base used in this patent is a disciplinary knowledge base provided online by academia. The Bidirectional Maximum Matching algorithm is used to match the character string C and find the dictionary vocabularies it contains. Every matched character substring is labeled with the "BIEO" mechanism: if a matched substring is Csub = {ci, ci+1, ..., ci+m} with Csub ∈ Dict, the starting character ci is labeled "B", the ending character ci+m is labeled "E", all characters between them, {ci+1, ci+2, ..., ci+m-1}, are labeled "I", and all unmatched characters are labeled "O". This mechanism yields a labeled character string, to which the start character "[CLS]" and the end character "[SEP]" are added, giving S = {s[CLS], s1, s2, ..., sl, s[SEP]}, where each element si consists of the character ci at the corresponding index position in the character string C and its label character.
A vector space embedding operation Embedding(S) is applied to the labeled string S: each element s_i is characterized as a d_s-dimensional vector whose values are randomly initialized with the KaiMing distribution, giving the embedded sequence vector E_S.
The sequence vector E_S obtained above encodes the boundary information of the knowledge point concept vocabulary; the context semantic information contained in the string C must still be characterized. This patent uses the pre-trained neural network language model Bert, i.e. a model trained on large-scale general text data. With the pre-trained Bert model as semantic encoder, a text sequence can be effectively represented as a high-dimensional vector.
The cleaned string C is fed to the pre-trained Bert language model, which processes C character by character. For an input string C = {c_1, c_2, ..., c_l}, the Bert model first inserts the identifiers "[CLS]" before the start position and "[SEP]" after the end position, so that the string {"[CLS]", c_1, c_2, ..., c_l, "[SEP]"} becomes the model's input. The Bert model's computation over the string mainly comprises the following steps:
Character embedding operation: each character of the input string {"[CLS]", c_1, c_2, ..., c_l, "[SEP]"} is characterized as a d-dimensional character vector via an Embedding operation, giving the embedded string vector E_c.
Integrating position information coding: to capture the sequential structure of the text, the Bert model encodes the position index of each element of the string vector E_c with sin and cos functions. For the element at position pos, where d_i is the dimension index within each element, 1 ≤ d_i ≤ d, the sin function is used when d_i is even and the cos function when d_i is odd, yielding the position encoding vector P, in which each element p is a d-dimensional vector. The corresponding position encoding formulas (in the standard paired-index form) are:

p(pos, 2i) = sin(pos / 10000^(2i/d))
p(pos, 2i+1) = cos(pos / 10000^(2i/d))
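The sin/cos alternation can be sketched directly; this is the standard Transformer sinusoidal encoding, which the elided formula presumably matches (an assumption), with sin on even dimension indices and cos on odd ones:

```python
import math

def position_encoding(seq_len: int, d: int) -> list:
    """Sinusoidal position encoding: sin for even dimension indices,
    cos for odd ones, sharing the frequency of the preceding even index."""
    P = []
    for pos in range(seq_len):
        p = []
        for di in range(d):
            # each sin/cos pair shares one frequency
            angle = pos / (10000 ** ((di - di % 2) / d))
            p.append(math.sin(angle) if di % 2 == 0 else math.cos(angle))
        P.append(p)
    return P
```

Each position thus receives a unique, deterministic d-dimensional vector that is simply added to the character embeddings.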
Self-attention mechanism based on dot product scaling: the string vector E_c computed above is added to the position encoding vector P to obtain the self-attention input Z = {z_cls, z_1, ..., z_l, z_sep}. The self-attention mechanism captures the degree of association between every pair of elements in the sequence by scaled dot products; the stronger the association between two elements, the larger the computed value. The self-attention computation is as follows, where each input is the vector Z multiplied by its corresponding weight parameter W, i.e. Q = ZW_Q, K = ZW_K, V = ZW_V, and d is the input vector dimension:

Attention(Q, K, V) = softmax(QK^T / √d) V
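A plain-list sketch of the scaled dot-product step, without batching, masking, or the learned W_Q/W_K/W_V projections (Q, K, V are passed in already projected):

```python
import math

def softmax(xs):
    m = max(xs)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def attention(Q, K, V):
    """softmax(QK^T / sqrt(d)) V over plain Python lists."""
    d = len(Q[0])
    scores = matmul(Q, [list(r) for r in zip(*K)])  # QK^T
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, V)
```

Each output row is a convex combination of the rows of V, weighted by how strongly the corresponding query matches each key.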
The multi-head self-attention mechanism: to fully exploit information from different independent subspaces, the vectors produced by h scaled-dot-product computations, i.e. h self-attention heads, are concatenated (Concate) and then passed through a linear transformation. The formula is as follows, where each head_i = Attention(QW_i^Q, KW_i^K, VW_i^V) and W_O is a trainable parameter matrix:
MultiHead(Q, K, V) = Concate(head_1, ..., head_h) W_O
Feed-forward network layer: the result of multi-head attention for each character element is only a linear transformation. To fully account for the interaction between information along different latent dimensions, a feed-forward layer with a nonlinear transformation is added to the model. It is computed as follows, where W^(1), b^(1), W^(2), b^(2) are trainable parameters:
F = FFN(Z) = ReLU(ZW^(1) + b^(1))W^(2) + b^(2)
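The position-wise feed-forward layer applies the same two affine maps with a ReLU in between to every row of Z; a plain-list sketch:

```python
def ffn(Z, W1, b1, W2, b2):
    """FFN(Z) = ReLU(Z W1 + b1) W2 + b2, applied row by row."""
    def affine(row, W, b):
        # row-vector times matrix, plus bias
        return [sum(x * W[i][j] for i, x in enumerate(row)) + b[j]
                for j in range(len(b))]
    out = []
    for row in Z:
        h = [max(0.0, v) for v in affine(row, W1, b1)]  # ReLU nonlinearity
        out.append(affine(h, W2, b2))
    return out
```

With identity weights and zero biases the layer reduces to an element-wise ReLU, which makes the nonlinearity easy to see.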
The output vector F produced by the Bert model is the encoding vector of the string C. Combined with the sequence vector E_S carrying the concept knowledge point vocabulary boundary information, candidate concept knowledge point entities are then extracted from the string C by an LSTM model and a conditional random field (CRF). The main steps are:
Feature vector fusion: the encoding vector F carrying semantic features and the sequence vector E_S carrying knowledge point concept vocabulary boundary information are concatenated (Concate) and passed through a linear transformation with weight parameter matrix W, giving the fused vector V = {v_cls, v_1, v_2, ..., v_l, v_sep}:
V = Concate(F, E_S) W
Encoding with the LSTM model: the LSTM is a variant of the recurrent neural network (RNN) with more robust predictive behavior than the plain RNN. When computing the i-th element it can fully incorporate the vector information of the first i-1 elements. At each time step t the LSTM computes:
z_t = σ(W_i · [h_{t-1}, v_t])
r_t = σ(W_r · [h_{t-1}, v_t])
where σ is the sigmoid function, · is the dot-product operator, v_t is the t-th element of the fused vector V, and h_t is the hidden state vector, i.e. the output for v_t. The output of the vector V after the LSTM model is H = {h_1, h_2, ..., h_T}, where T = l + 2.
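The text shows only the two gate equations, so the remaining ones below follow the standard LSTM formulation (an assumption, not the patent's exact parameterization); scalar state is used for readability, with W holding each gate's weights as a (w_h, w_v, b) triple:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(h_prev, c_prev, v_t, W):
    """One LSTM time step over scalar state (standard formulation)."""
    def gate(name, act):
        w_h, w_v, b = W[name]
        return act(w_h * h_prev + w_v * v_t + b)
    i = gate('i', sigmoid)            # input gate
    f = gate('f', sigmoid)            # forget gate
    o = gate('o', sigmoid)            # output gate
    g = gate('g', math.tanh)          # candidate cell state
    c = f * c_prev + i * g            # new cell state
    h = o * math.tanh(c)              # new hidden state h_t
    return h, c
```

Iterating this step over the fused vectors v_1, ..., v_T and collecting each h_t yields the sequence H fed to the prediction layer.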
CRF model prediction layer: the prediction layer judges the hidden vectors output by the LSTM model and consists of a fully connected layer and a CRF layer. First, the hidden state vectors H = {h_1, h_2, ..., h_T} output by the LSTM are passed through a fully connected layer, giving each character a score for each class label: each label score l_score_i = [score_1, score_2, score_3] has three elements, where score_1 is the probability score of predicting the current character as "B", score_2 that of predicting it as "I", and score_3 that of predicting it as "O". The per-character label scores of the string form the set L_Score = {l_score_cls, l_score_1, l_score_2, ..., l_score_l, l_score_sep}, which is the input of the CRF layer. Treating the input score set as an emission score matrix, the CRF layer models the labels, computes a score transition matrix T between label categories to represent the probability of transitioning from one label to another, mines the dependencies between label categories, and computes the sequence score Score(H) of the string. Decoding the score sequence Score(H) with the Viterbi algorithm yields a predicted label sequence; after removing the prediction tags for Bert's start identifier "CLS" and end identifier "SEP", the predicted label sequence of the string is obtained. Extracting the corresponding substrings from the predicted label sequence yields the knowledge point concept mention entities M = {m_1, m_2, ..., m_k}.
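Viterbi decoding over the emission scores and the label-transition matrix can be sketched as follows; the B/I/O label set follows the text, while the score values in the usage example are illustrative log-domain numbers:

```python
def viterbi(emissions, transitions, labels=('B', 'I', 'O')):
    """Highest-scoring label path given per-character emission scores
    and a transition score matrix T[i][j] (label i -> label j)."""
    n = len(labels)
    dp = [list(emissions[0])]         # best score ending in each label
    back = []                         # back-pointers for path recovery
    for em in emissions[1:]:
        row, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: dp[-1][i] + transitions[i][j])
            row.append(dp[-1][best_i] + transitions[best_i][j] + em[j])
            ptr.append(best_i)
        dp.append(row)
        back.append(ptr)
    j = max(range(n), key=lambda k: dp[-1][k])
    path = [j]
    for ptr in reversed(back):        # walk back-pointers to the start
        j = ptr[j]
        path.append(j)
    return [labels[i] for i in reversed(path)]
```

With all transitions zero the decoder degenerates to a per-character argmax; a trained T is what lets the CRF forbid sequences such as "O" followed by "I".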
Knowledge point concept entity link model
The knowledge point concept entity linking model matches and associates the extracted knowledge point concept mention entities M = {m_1, m_2, ..., m_k} with the knowledge point entities in a knowledge base, in three main steps: 1. fuzzily search each mention entity m_i with the Levenshtein Distance string fuzzy matching algorithm, selecting from the knowledge base a set of candidate knowledge point entities that may match; 2. characterize the context semantics of the mention entity m_i and of each candidate entity with a Bert model, obtaining context semantic characterization vectors; 3. compute the similarity between the mention's and each candidate's characterization vectors with the cos function; the candidate knowledge point entity with the highest similarity is the linked knowledge point concept.
Candidate knowledge point concept entities are generated with the Levenshtein Distance string fuzzy matching algorithm: the current mention entity m_i is fuzzily matched against the knowledge point concept vocabulary in the knowledge base, and by setting the edit distance parameter Distance of the algorithm, matched knowledge point concept vocabulary whose edit distance exceeds Distance is filtered out, producing the candidate knowledge point concept entity set.
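A sketch of the candidate-generation step: standard dynamic-programming edit distance plus a threshold filter. The threshold default of 2 is an illustrative choice, not a value given in the text:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (one-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def candidates(mention: str, vocab, max_distance: int = 2):
    """Keep knowledge-point terms within the Distance threshold."""
    return [w for w in vocab if levenshtein(mention, w) <= max_distance]
```

Filtering by edit distance keeps near-miss surface forms (aliases, typos, partial terms) while discarding unrelated vocabulary before the more expensive Bert-based comparison.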
In the external knowledge base, every knowledge point concept vocabulary entry has a corresponding abstract text description. The method encodes the abstract description of each candidate knowledge point concept entity with the pre-trained Bert model introduced above, obtaining a vector that characterizes the candidate. For a candidate knowledge point concept entity_i, its abstract description, taken as a string, is the input of the Bert model. From the encoded output, the hidden vector h_cls corresponding to the identifier "CLS" is passed through a fully connected layer with tanh activation to obtain an output vector, which serves as the characterization vector of the candidate knowledge point concept entity. In this way the set of characterization vectors of the candidate knowledge point concept entity set is obtained.
Each mentioned knowledge point concept m_i is characterized as follows: first, the course text C = {c_1, c_2, ..., c_l} in which the knowledge point concept occurs is encoded with the pre-trained Bert model, giving the characterization vector V_C of the course text; V_C is obtained in the same way as the characterization vector of a candidate knowledge point concept entity.
After the Bert computation, the encoding vectors of the characters of the course text are H_C = {h_cls, h_1, h_2, ..., h_l, h_sep}. For an extracted mentioned knowledge point concept m_i, the index position of the plain-text substring it denotes can be expressed as a pair (beg, end), where beg is the index of the substring's start position in C and end the index of its end position. The encoding vectors of H_C between the start index beg and the end index end are passed through a text convolutional network TextCNN to obtain the characterization vector of the mentioned knowledge point concept entity. For this input, the TextCNN model computes as follows:
1. A number of one-dimensional convolution kernels are defined, and each kernel convolves the input to capture the correlations between adjacent characters.
2. Max-over-time pooling is applied to every output channel, and the pooled output values of all channels are concatenated into the characterization vector.
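The two TextCNN steps above can be sketched over plain lists; H is the span of character encodings between beg and end, each kernel is a width × d weight matrix, and the single-kernel-per-width setup is a simplifying assumption:

```python
def textcnn_encode(H, kernels):
    """1-D convolution over character encodings H (list of d-dim vectors)
    followed by max-over-time pooling: one output value per kernel."""
    feats = []
    for K in kernels:                 # K: width x d weight matrix
        width = len(K)
        outs = []
        for t in range(len(H) - width + 1):
            # dot product of the kernel with the window H[t : t+width]
            s = sum(K[w][j] * H[t + w][j]
                    for w in range(width) for j in range(len(H[0])))
            outs.append(s)
        feats.append(max(outs))       # max-over-time pooling
    return feats                      # concatenated characterization vector
```

Max-over-time pooling makes the output length depend only on the number of kernels, so mention spans of different lengths all map to fixed-size characterization vectors.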
Finally, the characterization vector V_C of the course text and the characterization vector of the mentioned knowledge point concept entity are concatenated (Concate) and passed through a fully connected layer with tanh activation to obtain the output vector of the mention.
The cos similarity between the output vector of the mentioned knowledge point concept entity and each vector in the characterization vector set of the candidate knowledge point concept entity set is computed; from the candidate knowledge point concept entity set, the knowledge point concept with the highest similarity is selected and associated with the mention, i.e. the final association result can be represented as a pair of the mention and its linked knowledge point entity.
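The final selection step is a straightforward cosine-similarity argmax; returning the index plus score is an illustrative interface choice:

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def link(mention_vec, candidate_vecs):
    """Pick the candidate characterization vector with the highest
    cosine similarity to the mention's output vector."""
    scores = [cos_sim(mention_vec, c) for c in candidate_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

Because cosine similarity ignores magnitude, candidates are ranked purely by the direction of their characterization vectors, which suits embeddings whose norms vary with description length.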
Claims (7)
1. An intelligent online teaching resource knowledge point concept entity linking method is characterized by comprising the following steps:
1) first, a preprocessing step of string cleaning is applied to the input string: cleaning judges whether each character belongs to the Chinese, numeric, or English character set, and characters outside these sets are removed;
2) the model labels every element of the cleaned string C = {c_1, c_2, ..., c_l} with the BIO tagging scheme: when a character c_i is labeled "B", c_i is the first character of some knowledge point concept vocabulary entity, "I" marks a middle character of such an entity, and "O" marks a non-knowledge-point-concept character, finally yielding the text data;
3) the text data enhancement builds a knowledge point concept dictionary Dict from the knowledge point vocabulary terms and their aliases in a knowledge base, matches the string C with the bidirectional maximum matching algorithm, and finds the dictionary words it contains; every matched substring is labeled with the "BIEO" scheme: if a matched substring is C_sub = {c_i, c_{i+1}, ..., c_{i+m}}, C_sub ∈ Dict, the starting character c_i is labeled "B", the ending character c_{i+m} is labeled "E", the characters between c_i and c_{i+m}, {c_{i+1}, c_{i+2}, ..., c_{i+m-1}}, are all labeled "I", and every unmatched character is labeled "O"; this mechanism yields a labeled string, to which the start token "[CLS]" and end token "[SEP]" are added: S = {s_[CLS], s_1, s_2, ..., s_l, s_[SEP]}, where each element s_i consists of the character c_i at the corresponding index of C together with its label character;
4) a vector space embedding operation Embedding(S) is applied to the labeled string S: each element s_i is characterized as a d_s-dimensional vector whose values are randomly initialized with the KaiMing distribution, giving the embedded sequence vector E_S;
5) the sequence vector E_S obtained above encodes the boundary information; the context semantic information contained in the string C is characterized with the pre-trained neural network language model Bert, i.e. a model trained on large-scale general text data; with the pre-trained Bert model as semantic encoder, a text sequence can be effectively represented as a high-dimensional vector; the cleaned string C is the input of the pre-trained Bert language model, which processes C character by character: for an input string C = {c_1, c_2, ..., c_l}, the Bert model first inserts the identifiers "[CLS]" before the start position and "[SEP]" after the end position, so that the string {"[CLS]", c_1, c_2, ..., c_l, "[SEP]"} becomes the model's input;
6) the output vector F produced by the Bert model is the encoding vector of the string C; combined with the sequence vector E_S carrying the concept knowledge point vocabulary boundary information, candidate concept knowledge point entities are extracted from the string C by an LSTM model and a conditional random field CRF; extracting the corresponding substrings from the predicted label sequence yields the knowledge point concept mention entities;
7) the knowledge point concept entity linking model matches and associates the extracted knowledge point concept mention entities M = {m_1, m_2, ..., m_k} with the knowledge point entities in a knowledge base; candidate knowledge point concept entities are generated with the Levenshtein Distance string fuzzy matching algorithm: the current mention entity m_i is fuzzily matched against the knowledge point concept vocabulary in the knowledge base, and by setting the edit distance parameter Distance of the algorithm, matched knowledge point concept vocabulary whose edit distance exceeds Distance is filtered out, producing the candidate knowledge point concept entity set;
8) the abstract text description of each candidate knowledge point concept entity is encoded with the pre-trained Bert model introduced above, obtaining a vector that characterizes the candidate; for a candidate knowledge point concept entity_i, its abstract description, taken as a string, is the input of the Bert model; from the encoded output, the hidden vector h_cls corresponding to the identifier "CLS" is passed through a fully connected layer with tanh activation to obtain an output vector, which serves as the characterization vector of the candidate knowledge point concept entity; in this way the set of characterization vectors of the candidate knowledge point concept entity set is obtained;
9) each mentioned knowledge point concept m_i is characterized as follows: first, the course text C = {c_1, c_2, ..., c_l} in which the knowledge point concept occurs is encoded with the pre-trained Bert model, giving the characterization vector V_C of the course text; V_C is obtained in the same way as the characterization vector of a candidate knowledge point concept entity;
10) after the Bert computation, the encoding vectors of the characters of the course text are H_C = {h_cls, h_1, h_2, ..., h_l, h_sep}; for an extracted mentioned knowledge point concept m_i, the index position of the plain-text substring it denotes can be expressed as a pair (beg, end), where beg is the index of the substring's start position in C and end the index of its end position; the encoding vectors of H_C between the start index beg and the end index end are passed through a text convolutional network TextCNN to obtain the characterization vector of the mentioned knowledge point concept entity; the characterization vector V_C of the course text and the characterization vector of the mentioned knowledge point concept entity are then concatenated (Concate) and passed through a fully connected layer with tanh activation to obtain the output vector of the mention;
11) the cos similarity between the output vector of the mentioned knowledge point concept entity and each vector in the characterization vector set of the candidate knowledge point concept entity set is computed; from the candidate knowledge point concept entity set, the knowledge point concept with the highest similarity is selected and associated with the mention, i.e. the final association result can be represented as a pair of the mention and its linked knowledge point entity.
2. The method as claimed in claim 1, wherein the input of the knowledge point concept entity recognition model is a text string X = {x_1, x_2, ..., x_n}; X consists of n characters, x_i being the i-th character of X, and the text string may come from course video captions, electronic textbook text, or the like.
3. The method as claimed in claim 1, wherein the preprocessing method for string cleaning is implemented mainly via the Unicode code table: when the Unicode encoding of a character x_i lies between \u4e00 and \u9fa5, x_i is a Chinese character; likewise, when its encoding falls in the numeric range, x_i is a numeric character, and when it falls in either English letter range, x_i is an English character; all characters whose encodings fall outside these ranges are deleted, completing the cleaning process, and the cleaned string is C = {c_1, c_2, ..., c_l}, whose length l is at most n.
4. The intelligent online teaching resource knowledge point concept entity linking method as claimed in claim 1, wherein the calculation of the character string by the Bert model mainly comprises the following steps:
1) character embedding operation: each character of the string {"[CLS]", c_1, c_2, ..., c_l, "[SEP]"} is characterized as a d-dimensional character vector via an Embedding operation, giving the embedded string vector E_c;
2) integrating position information coding: to capture the sequential structure of the text, the Bert model encodes the position index of each element of the string vector E_c with sin and cos functions; for the element at position pos, where d_i is the dimension index within each element, 1 ≤ d_i ≤ d, the sin function is used when d_i is even and the cos function when d_i is odd, yielding the position encoding vector P, in which each element p is a d-dimensional vector; the corresponding position encoding formulas (in the standard paired-index form) are:

p(pos, 2i) = sin(pos / 10000^(2i/d))
p(pos, 2i+1) = cos(pos / 10000^(2i/d))
3) self-attention mechanism based on dot product scaling: the string vector E_c computed above is added to the position encoding vector P to obtain the self-attention input Z = {z_cls, z_1, ..., z_l, z_sep}; the self-attention mechanism captures the degree of association between every pair of elements in the sequence by scaled dot products, and the stronger the association between two elements, the larger the computed value; the self-attention computation is as follows, where each input is the vector Z multiplied by its corresponding weight parameter W, i.e. Q = ZW_Q, K = ZW_K, V = ZW_V, and d is the input vector dimension:

Attention(Q, K, V) = softmax(QK^T / √d) V
4) the multi-head self-attention mechanism: to fully exploit information from different independent subspaces, the vectors produced by h scaled-dot-product computations, i.e. h attention heads, are concatenated (Concate) and then passed through a linear transformation; the formula is as follows, where W_O is a trainable parameter matrix:
MultiHead(Q, K, V) = Concate(head_1, ..., head_h) W_O
5) feed-forward network layer: the result of multi-head attention for each character element is only a linear transformation; to fully account for the interaction between information along different latent dimensions, a feed-forward layer with a nonlinear transformation is added to the model; it is computed as follows, where W^(1), b^(1), W^(2), b^(2) are trainable parameters:
F = FFN(Z) = ReLU(ZW^(1) + b^(1))W^(2) + b^(2).
5. the method of claim 1, wherein the candidate concept knowledge point entities are extracted from the character string C by an LSTM model and a conditional random field CRF; the main process is as follows:
1) feature vector fusion: the encoding vector F carrying semantic features and the sequence vector E_S carrying knowledge point concept vocabulary boundary information are concatenated (Concate) and passed through a linear transformation with weight parameter matrix W, giving the fused vector V = {v_cls, v_1, v_2, ..., v_l, v_sep}:
V = Concate(F, E_S) W
2) encoding with the LSTM model: the LSTM is a variant of the recurrent neural network (RNN) with more robust predictive behavior than the plain RNN; when computing the i-th element it can fully incorporate the vector information of the first i-1 elements, and at each time step t the LSTM computes:
z_t = σ(W_i · [h_{t-1}, v_t])
r_t = σ(W_r · [h_{t-1}, v_t])
where σ is the sigmoid function, · is the dot-product operator, v_t is the t-th element of the fused vector V, and h_t is the hidden state vector, i.e. the output for v_t; the output of the vector V after the LSTM model is H = {h_1, h_2, ..., h_T}, where T = l + 2.
3) CRF model prediction layer: the prediction layer judges the hidden vectors output by the LSTM model and consists of a fully connected layer and a CRF layer; first, the hidden state vectors H = {h_1, h_2, ..., h_T} output by the LSTM are passed through a fully connected layer, giving each character a score for each category label: each label score l_score_i = [score_1, score_2, score_3] has three elements, where score_1 is the probability score of predicting the current character as "B", score_2 that of predicting it as "I", and score_3 that of predicting it as "O"; the per-character label scores of the string form the set L_Score = {l_score_cls, l_score_1, l_score_2, ..., l_score_l, l_score_sep}, which is the input of the CRF layer; treating the input score set as an emission score matrix, the CRF layer models the labels, computes a score transition matrix T between label categories to represent the probability of transitioning from one label to another, mines the dependencies between label categories, and computes the sequence score Score(H) of the string; decoding the score sequence Score(H) with the Viterbi algorithm yields a predicted label sequence, and after removing the prediction tags for Bert's start identifier "CLS" and end identifier "SEP", the predicted label sequence of the string is obtained; extracting the corresponding substrings from the predicted label sequence yields the knowledge point concept mention entities M = {m_1, m_2, ..., m_k}.
6. The method as claimed in claim 1, wherein the extracted knowledge point concept mention entities M = {m_1, m_2, ..., m_k} are matched and associated with the knowledge point entities in a knowledge base in three main steps: 1) fuzzily search each mention entity m_i with the Levenshtein Distance string fuzzy matching algorithm, selecting from the knowledge base a set of candidate knowledge point entities that may match; 2) characterize the context semantics of the mention entity m_i and of each candidate entity with a Bert model, obtaining context semantic characterization vectors; 3) compute the similarity between the mention's and each candidate's characterization vectors with the cos function, the candidate knowledge point entity with the highest similarity being the linked knowledge point concept.
7. The intelligent online teaching resource knowledge point concept entity linking method as claimed in claim 1, wherein the TextCNN model computes over its input as follows:
1) a number of one-dimensional convolution kernels are defined, and each kernel convolves the input to capture the correlations between adjacent characters;
2) max-over-time pooling is applied to every output channel, and the pooled output values of all channels are concatenated into the characterization vector.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210018754.4A CN114443813B (en) | 2022-01-09 | 2022-01-09 | Intelligent on-line teaching resource knowledge point concept entity linking method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210018754.4A CN114443813B (en) | 2022-01-09 | 2022-01-09 | Intelligent on-line teaching resource knowledge point concept entity linking method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114443813A true CN114443813A (en) | 2022-05-06 |
CN114443813B CN114443813B (en) | 2024-04-09 |
Family
ID=81367718
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210018754.4A Active CN114443813B (en) | 2022-01-09 | 2022-01-09 | Intelligent on-line teaching resource knowledge point concept entity linking method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114443813B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115633090A (en) * | 2022-10-21 | 2023-01-20 | 北京中电飞华通信有限公司 | Multi-source data link method based on eSIM card and 5G network |
CN116976351A (en) * | 2023-09-22 | 2023-10-31 | 之江实验室 | Language model construction method based on subject entity and subject entity recognition device |
CN117852637A (en) * | 2024-03-07 | 2024-04-09 | 南京师范大学 | Definition-based subject concept knowledge system automatic construction method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107526799A (en) * | 2017-08-18 | 2017-12-29 | 武汉红茶数据技术有限公司 | A kind of knowledge mapping construction method based on deep learning |
CN109902298A (en) * | 2019-02-13 | 2019-06-18 | 东北师范大学 | Domain Modeling and know-how estimating and measuring method in a kind of adaptive and learning system |
CN111753098A (en) * | 2020-06-23 | 2020-10-09 | 陕西师范大学 | Teaching method and system based on cross-media dynamic knowledge graph |
WO2021012645A1 (en) * | 2019-07-22 | 2021-01-28 | 创新先进技术有限公司 | Method and device for generating pushing information |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107526799A (en) * | 2017-08-18 | 2017-12-29 | 武汉红茶数据技术有限公司 | A kind of knowledge mapping construction method based on deep learning |
CN109902298A (en) * | 2019-02-13 | 2019-06-18 | 东北师范大学 | Domain Modeling and know-how estimating and measuring method in a kind of adaptive and learning system |
WO2021012645A1 (en) * | 2019-07-22 | 2021-01-28 | 创新先进技术有限公司 | Method and device for generating pushing information |
CN111753098A (en) * | 2020-06-23 | 2020-10-09 | 陕西师范大学 | Teaching method and system based on cross-media dynamic knowledge graph |
Non-Patent Citations (1)
Title |
---|
吕健颖;尚福华;曹茂俊;: "课程知识本体自动构建方法研究", 计算机应用与软件, no. 08, 12 August 2018 (2018-08-12) * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115633090A (en) * | 2022-10-21 | 2023-01-20 | 北京中电飞华通信有限公司 | Multi-source data link method based on eSIM card and 5G network |
CN115633090B (en) * | 2022-10-21 | 2023-07-18 | 北京中电飞华通信有限公司 | Multi-source data linking method based on eSIM card and 5G network |
CN116976351A (en) * | 2023-09-22 | 2023-10-31 | 之江实验室 | Language model construction method based on subject entity and subject entity recognition device |
CN116976351B (en) * | 2023-09-22 | 2024-01-23 | 之江实验室 | Language model construction method based on subject entity and subject entity recognition device |
CN117852637A (en) * | 2024-03-07 | 2024-04-09 | 南京师范大学 | Definition-based subject concept knowledge system automatic construction method and system |
Also Published As
Publication number | Publication date |
---|---|
CN114443813B (en) | 2024-04-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108460013B (en) | Sequence labeling model and method based on fine-grained word representation model | |
CN111444721B (en) | Chinese text key information extraction method based on pre-training language model | |
CN110609891B (en) | Visual dialog generation method based on context awareness graph neural network | |
CN110083831B (en) | Chinese named entity identification method based on BERT-BiGRU-CRF | |
CN108388560B (en) | GRU-CRF conference name identification method based on language model | |
CN114443813B (en) | Intelligent online teaching resource knowledge point concept entity linking method | |
CN111985239B (en) | Entity identification method, entity identification device, electronic equipment and storage medium | |
CN112101028B (en) | Multi-feature bidirectional gated domain expert entity extraction method and system | |
CN111694924A (en) | Event extraction method and system | |
CN114943230B (en) | Method for linking entities in Chinese specific field by fusing common sense knowledge | |
CN110276052B (en) | Ancient Chinese automatic word segmentation and part-of-speech tagging integrated method and device | |
CN111274804A (en) | Case information extraction method based on named entity recognition | |
CN112487820A (en) | Chinese medical named entity recognition method | |
CN115292463B (en) | Information extraction-based method for joint multi-intention detection and overlapping slot filling | |
CN111709242A (en) | Chinese punctuation mark adding method based on named entity recognition | |
CN113297364A (en) | Natural language understanding method and device for dialog system | |
CN114239574A (en) | Miner violation knowledge extraction method based on entity and relationship joint learning | |
CN113190656A (en) | Chinese named entity extraction method based on multi-label framework and fusion features | |
CN111444720A (en) | Named entity recognition method for English text | |
CN113641809A (en) | XLNET-BiGRU-CRF-based intelligent question answering method | |
CN114020900A (en) | Chart English abstract generation method based on fusion space position attention mechanism | |
CN112101014B (en) | Chinese chemical industry document word segmentation method based on mixed feature fusion | |
CN113505207B (en) | Machine reading understanding method and system for financial public opinion research report | |
CN115169349A (en) | Chinese electronic resume named entity recognition method based on ALBERT | |
CN114757191A (en) | Electric power public opinion field named entity recognition method and system based on deep learning | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||