CN116204607A - Text online learning resource knowledge point labeling method, system and medium - Google Patents
Info
- Publication number: CN116204607A
- Application number: CN202310188731.2A
- Authority: CN (China)
- Prior art keywords: sequence, entity, attention, text, knowledge point
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/3344: Query execution using natural language analysis
- G06F16/35: Clustering; Classification
- G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
- G06F40/237: Lexical tools
- G06F40/295: Named entity recognition
- G06F40/30: Semantic analysis
- G06Q50/20: Education
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method, a system, and a medium for labeling knowledge points in text-based online learning resources. The method comprises: tokenizing an input course subtitle text to obtain a token sequence and obtaining its BERT encoding through a BERT encoding layer; performing dictionary matching between the input course subtitle text and a preset mention-entity table to obtain a candidate entity sequence; computing the dictionary attention encoding of each element of the candidate entity sequence with an entity encoder BE; concatenating the BERT encoding with the dictionary attention encodings and feeding the result into a Transformer layer to obtain an attention-enhanced representation; and feeding the attention-enhanced representation into a linear classification layer to obtain start scores, end scores, and mention-interior scores, which are then passed to a decoding layer to obtain the knowledge point labeling result. The method can automatically label knowledge points in text-based online learning resources and offers high precision and high recall.
Description
Technical Field
The invention relates to the technical field of online education and learning, and in particular to a method, a system, and a medium for labeling knowledge points in text-based online learning resources.
Background
Massive Open Online Courses (MOOCs) have become an important form of Internet-based online learning in recent years. Unlike traditional classroom teaching, MOOC students differ greatly in the background knowledge they have mastered and in how well they understand the various terms that appear in the course materials.
A MOOC provides students with course introduction pages, reading materials, illustrations, lecture videos (and their subtitles), quizzes, exercises, and so on, which together form the learning resources available to the students. Online learning resources represented by reading materials and lecture videos exist as long plain-text structures and are called text-based online learning resources. In text-based online learning resources, knowledge points (or course concepts) are one of the characteristic language structures. A knowledge point refers to a knowledge concept taught in a lecture video that helps students understand the topics covered by that video. Specifically, it must satisfy two criteria: (1) phraseness: it must be a grammatically and semantically well-formed phrase; (2) informativeness: it must denote a scientific or technological concept, and that concept must be relevant to the current course. Although the definition of knowledge points is somewhat subjective, annotators who have mastered the relevant knowledge usually agree on them.
Existing knowledge point recognition methods operate at the phrase level and rely only on literal-matching models; they do not actually understand course concepts in context. For example, the modeling process of Pan's model is: first, extract phrases from the subtitles and merge phrases with the same surface form and composition into candidate samples; next, perform manual annotation and machine scoring at the level of candidate samples; finally, take the positive candidate samples as a course concept library and match it against the video subtitles to complete the labeling. At inference time, every occurrence of the same surface text is treated as a reference to the course concept, a problem that would otherwise require techniques and procedures such as entity linking to resolve. An important task in online learning resource analysis is to distinguish which content is closely related to the course, i.e., the focus of the instructor's teaching and the content students need to learn. Labeling course-related entities in the text provides semantic highlighting, guides learners' attention, and helps learners review what they have learned without missing key points. How to label knowledge points in text-based online learning resources has therefore become a key technical problem that urgently needs to be solved.
Disclosure of Invention
The technical problem to be solved by the invention: in view of the above problems in the prior art, the invention provides a method, a system, and a medium for labeling knowledge points in text-based online learning resources.
In order to solve the technical problems, the invention adopts the following technical scheme:
a text online learning resource knowledge point labeling method comprises the following steps:
s101, tokenizing an input course caption text to obtain a tokenized sequence [ t ] 1 ,t 2 ,...,t n ]And obtain BERT code through BERT coding layerDictionary matching is carried out on the input course caption text and a preset designated entity table, and a candidate entity sequence [ e ] is obtained 1 ,e 2 ,...]And calculates a dictionary attention code of each element therein using an entity encoder BE;
s102, BERT is encodedAnd inputting the dictionary attention code split into a transducer layer to obtain an attention enhancement representation hr;
s103, enhancing the attention to the representation h r Inputting the initial scoring s into a linear classification layer for linear classification start Ending scoring s end And index internal scoring s mention Scoring s the start start Ending scoring s end And index internal scoring s mention And inputting the decoding layer to obtain a knowledge point labeling result.
Optionally, the functional expression for obtaining the BERT encoding h^b through the BERT encoding layer in step S101 is:
h^b = BERT([CLS], t_1, t_2, ..., t_n, [SEP]) ∈ R^(n×h),
where BERT denotes the BERT encoding model, [CLS] and [SEP] are the sentence-start and separator tag tokens, t_1 ~ t_n are the tokens in the token sequence, R^(n×h) denotes the dimensions of the representation, h is the hidden-layer dimension of the BERT encoding model, and n is the number of tokens in the token sequence.
Optionally, concatenating the BERT encoding h^b and the dictionary attention encodings BE(e_{i-n}) in step S102 comprises:
S201: first, using the mention-entity table and the prior probabilities, for all substrings of the token sequence [t_1, t_2, ..., t_n] that match the mention-entity table, find the matching entity with the highest prior probability and take that prior probability as the link confidence; then screen the link confidences against a preset threshold th_rl and keep the mention-entity pairs whose confidence exceeds the threshold, obtaining a mention list {(rs_i, re_i, e_i)}, where (rs_i, re_i) is the position information of candidate entity e_i, rs_i is the start position of candidate entity e_i, and re_i is the end position of candidate entity e_i;
S202: using the mention list {(rs_i, re_i, e_i)}, extend the original token sequence [t_1, t_2, ..., t_n] into three spliced sequences: the token-and-entity sequence x^r, obtained by appending the candidate entities e_i to the token sequence; the start-position sequence head^r, whose elements are the start positions of the elements of x^r in the original token sequence [t_1, t_2, ..., t_n]; and the end-position sequence tail^r, whose elements are the end positions of the elements of x^r in the original token sequence [t_1, t_2, ..., t_n];
S203: using the start-position sequence head^r and the end-position sequence tail^r, for any i-th element x_i^r and j-th element x_j^r of the token-and-entity sequence x^r, compute the head-tail relative distance d_{ij}^{ht}, the head-head relative distance d_{ij}^{hh}, the tail-head relative distance d_{ij}^{th}, and the tail-tail relative distance d_{ij}^{tt}, and compute the correlation R_{ij} between any i-th element x_i^r and j-th element x_j^r of the token-and-entity sequence x^r;
S204: based on the BERT encoding h^b and the dictionary attention encodings, determine the vector representation E_i of any i-th element x_i^r of the token-and-entity sequence x^r, and combine it with the correlation R_{ij} to determine the attention weight A_{i,j} between any i-th element x_i^r and j-th element x_j^r of the token-and-entity sequence x^r; perform attention weighting based on the attention weights A_{i,j} to obtain the weighted features A used as input to the Transformer layer.
Optionally, the functional expressions for computing the head-tail relative distance d_{ij}^{ht}, the head-head relative distance d_{ij}^{hh}, the tail-head relative distance d_{ij}^{th}, and the tail-tail relative distance d_{ij}^{tt} in step S203 are:
d_{ij}^{ht} = head_i^r - tail_j^r, d_{ij}^{hh} = head_i^r - head_j^r, d_{ij}^{th} = tail_i^r - head_j^r, d_{ij}^{tt} = tail_i^r - tail_j^r,
where head_i^r and tail_i^r are the elements of the start-position sequence head^r and the end-position sequence tail^r corresponding to the i-th element x_i^r of the token-and-entity sequence x^r, and head_j^r and tail_j^r are the elements of the start-position sequence head^r and the end-position sequence tail^r corresponding to the j-th element x_j^r of the token-and-entity sequence x^r.
Optionally, the calculation function expression of the correlation R_{ij} between the i-th element x_i^r and the j-th element x_j^r in step S203 is:
R_{ij} = ReLU( W_r ( P_{d_{ij}^{ht}} ⊕ P_{d_{ij}^{hh}} ⊕ P_{d_{ij}^{th}} ⊕ P_{d_{ij}^{tt}} ) ),
where ReLU denotes the ReLU activation function, W_r is a weight matrix for the token-and-entity sequence x^r, ⊕ denotes the concatenation operation, and P_{d_{ij}^{ht}}, P_{d_{ij}^{hh}}, P_{d_{ij}^{th}}, and P_{d_{ij}^{tt}} denote the results of encoding the head-tail, head-head, tail-head, and tail-tail relative distances with the relative position encoding P.
Optionally, the functional expression of the vector representation E_i in step S204 is:
E_i = h_i^b if i ≤ n, and E_i = BE(e_{i-n}) if i > n,
where h_i^b is the i-th encoding in the BERT encoding h^b, BE(e_{i-n}) is the (i-n)-th of the dictionary attention encodings, e_{i-n} is the (i-n)-th candidate entity in the candidate entity sequence, i is the index of the element x_i^r in the token-and-entity sequence x^r, n is the number of tokens in the token sequence, and x_i^r is any i-th element of the token-and-entity sequence x^r; and the calculation function expression of the attention weight A_{i,j} is:
A_{i,j} = (E_i W_q)(E_j W_{k,E})^T + (E_i W_q)(R_{ij} W_{k,R})^T + u (E_j W_{k,E})^T + v (R_{ij} W_{k,R})^T,
where W_q is a trainable weight matrix, E_i is the vector representation of the i-th element x_i^r of the token-and-entity sequence x^r, E_j is the vector representation of the j-th element x_j^r, W_{k,E} is a trainable weight matrix, R_{ij} is the correlation between the i-th element x_i^r and the j-th element x_j^r, W_{k,R} is a trainable weight matrix, and u and v are trainable weight vectors.
Optionally, step S102 comprises:
S301: concatenate the BERT encoding h^b and the dictionary attention encodings and feed them into the Transformer layer to obtain the complete dictionary attention-enhanced representation H, where the functional expression of the complete dictionary attention-enhanced representation H_i of the i-th element x_i^r of the token-and-entity sequence x^r is:
H_i = softmax(A) E W_v,
where softmax denotes the softmax activation function, A is the matrix composed of the attention weights A_{i,j}, E is the matrix composed of the vector representations E_i of all elements x_i^r of the token-and-entity sequence x^r, and W_v is a trainable weight matrix;
S302: from the complete dictionary attention-enhanced representations H_i of the elements x_i^r of the token-and-entity sequence x^r, take the first n items as the attention-enhanced representations h_i^r and obtain the attention-enhanced representation h^r composed of these representations:
h^r = [h_1^r, h_2^r, ..., h_n^r], with h_i^r = H_i for i ≤ n,
where i is the index of the element x_i^r in the token-and-entity sequence x^r and n is the number of tokens in the token sequence.
Optionally, in step S103, the functional expressions for the start score s_start, the end score s_end, and the mention-interior score s_mention obtained by linear classification in the linear classification layer are:
s_start(i) = w_start^T h_i^r, s_end(j) = w_end^T h_j^r, s_mention(k) = w_mention^T h_k^r,
where s_start(i) is the predicted score for position i being the start position of a knowledge point, h_i^r is the attention-enhanced encoding at position i, s_end(j) is the predicted score for position j being the end position of a knowledge point, h_j^r is the attention-enhanced encoding at position j, s_mention(k) is the predicted score for position k being an internal component of a knowledge point, h_k^r is the attention-enhanced encoding at position k, and w_start, w_end, and w_mention are trainable network parameters of the linear classification layer; when the decoding layer obtains the knowledge point labeling result, the calculation function expression for the probability of any region (i, j) is:
p(i, j) = σ( s_start(i) + s_end(j) + Σ_{k=i}^{j} s_mention(k) ),
where p(i, j) denotes the probability of region (i, j) and σ denotes the sigmoid function; if the probability of region (i, j) is greater than a set value, region (i, j) is judged to be a knowledge point labeling region, thereby obtaining the knowledge point labeling result.
In addition, the invention further provides a system for labeling knowledge points in text-based online learning resources, comprising a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to execute the above method for labeling knowledge points in text-based online learning resources.
Furthermore, the invention provides a computer-readable storage medium storing a computer program, wherein the computer program is used to be programmed or configured by a microprocessor to execute the above method for labeling knowledge points in text-based online learning resources.
Compared with the prior art, the invention has the following advantages: the invention tokenizes the input course subtitle text to obtain a token sequence and obtains its BERT encoding through a BERT encoding layer; performs dictionary matching between the input course subtitle text and a preset mention-entity table to obtain a candidate entity sequence; computes the dictionary attention encoding of each element of the candidate entity sequence with an entity encoder BE; concatenates the BERT encoding with the dictionary attention encodings and feeds the result into a Transformer layer to obtain an attention-enhanced representation; and thereby achieves automatic labeling of knowledge points in text-based online learning resources with high precision and high recall.
Drawings
FIG. 1 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the network structure of the entity annotation model DsMOOC used in an embodiment of the present invention.
Fig. 3 is a schematic diagram of the network training of the entity annotation model DsMOOC in an embodiment of the invention.
Fig. 4 is an example of general entity linking in an embodiment of the present invention.
Detailed Description
As shown in Fig. 1 and Fig. 2, the method for labeling knowledge points in text-based online learning resources of this embodiment comprises:
S101: tokenize the input course subtitle text to obtain a token sequence [t_1, t_2, ..., t_n] and obtain the BERT encoding h^b through a BERT encoding layer; perform dictionary matching between the input course subtitle text and a preset mention-entity table to obtain a candidate entity sequence [e_1, e_2, ...], and compute the dictionary attention encoding of each of its elements with an entity encoder BE;
S102: concatenate the BERT encoding h^b with the dictionary attention encodings and feed the result into a Transformer layer to obtain an attention-enhanced representation h^r;
S103: feed the attention-enhanced representation h^r into a linear classification layer for linear classification to obtain a start score s_start, an end score s_end, and a mention-interior score s_mention, and feed the start score s_start, end score s_end, and mention-interior score s_mention into a decoding layer to obtain the knowledge point labeling result.
In the method of this embodiment, the network model formed by the BERT encoding layer, the entity encoder BE, the Transformer layer, the linear classification layer, and the decoding layer is named the entity annotation model DsMOOC (Discovery and selection in MOOC). Entity annotation of course subtitles is a refinement of traditional course concept extraction research. To mark the knowledge a learner needs to understand in the subtitles, the degree of relevance between an entity and the course must be distinguished on the basis of entity recognition over the subtitles, and only phrases of entities that help the learner's understanding should be selected. Entity annotation of course subtitles differs from the conventional concept extraction task in that concept extraction does not consider context information, whereas this embodiment delineates entity boundaries from the context and judges whether an entity is related to the course. The context provides richer semantic information, which makes pre-trained language models such as BERT applicable, but it also introduces more complex problem boundaries: entity recognition is required, and mismatches between literally identical strings must be eliminated. Entity annotation of course subtitles is similar to the Wikipedia entity knowledge annotation task in that entities must first be recognized in plain text and then screened for relevance, but there are two significant differences. On the one hand, this embodiment does not need to avoid repeated labeling. On the other hand, Wikipedia and course subtitles differ in their criteria for what counts as "helpful", which leads to significantly different screening results, and this difference must be handled. In addition, the Wikipedia annotation task has relatively abundant training corpora, whereas the training set of this embodiment may be much smaller than the task requires, and this problem also needs to be solved. The entity annotation model DsMOOC adopted by the method of this embodiment uses information from the knowledge graph as a dictionary-attention representation enhancement, and compared with other existing labeling methods, DsMOOC shows a clear performance improvement.
In this embodiment, the functional expression for obtaining the BERT encoding h^b through the BERT encoding layer in step S101 is:
h^b = BERT([CLS], t_1, t_2, ..., t_n, [SEP]) ∈ R^(n×h),
where BERT denotes the BERT encoding model, [CLS] and [SEP] are the sentence-start and separator tag tokens, t_1 ~ t_n are the tokens in the token sequence, R^(n×h) denotes the dimensions of the representation, h is the hidden-layer dimension of the BERT encoding model, n is the number of tokens in the token sequence, and the superscript b indicates that this is the original semantic representation of the BERT encoding model, distinguishing it from the later enhanced representations such as h^r. The BERT encoding model is an existing model; see Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding." In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171-86. This embodiment only applies the encoding model and does not involve improvements to it, so its details are not described here.
One difficulty of the task is: which sub-regions of the text represent entities? Logically, this is a prerequisite to considering which entity a sub-region represents and whether that entity is helpful to the reader. In other tasks and systems this step is completed by consulting the mention-entity table, but that also means those systems can only find the mentions enumerated in the mention-entity table. Compared with tasks and systems such as NER that aim to discover new entities, the corpus provided by this embodiment directly gives the combined result of the discovery and screening steps, and the result of the discovery step alone cannot be analyzed independently. As a consequence, a basic model trained directly on the corpus of this embodiment scores low. This embodiment therefore proposes to use matching against the mention-entity table as part of the features for an enhanced representation, so that the information in the mention-entity table is exploited while avoiding the problem that hard matching rules cannot produce entities outside the entity table.
s201, firstly, using the indicator entity table and the prior probability, the indicator entity table is a word symbol sequence [ t ] 1 ,t 2 ,...,t n ]Finding out the matching entity with the highest prior probability from all substrings matched with the named entity table, and taking the prior probability as the link confidence level; then according to the preset threshold th rl Screening the link confidence, selecting a reference entity pair with the confidence greater than a threshold value to obtain a reference list { (rs) i ,re i ,e i ) Of which (rs) i ,re i ) For candidate entity e i Location information rs of (a) i For candidate entity e i The starting position, rs i For candidate entity e i End position of (2); this reference list { (rs) i ,re i ,e i ) The precision rate of the model is possibly lower, and the model is not necessarily completely recalled, but sufficient semantic information supplement can be provided for the model through a representation enhancement mechanism; the basic idea in the method of this embodiment is to add each item (rs i ,re i ,e i ) Position information (rs) i ,re i ) And semantic information e i Added to region (rs) i ,re i ) Such that the system can use this information to comprehensively consider the discovery and screening of the references to achieve an enhanced representation when scoring the tokens;
s202, reference list { (rs) i ,re i ,e i ) Further original sequence of tokens t 1 ,t 2 ,...,t n ]Splicing into three sequences:
in the above, x r Representing word symbols and entity sequences, head r Representing a logogram and a sequence of entities x r The medium element is in the original word symbol sequence t 1 ,t 2 ,...,t n ]In (3) a start position sequence, tail r Representing a logogram and a sequence of entities x r The medium element is in the original word symbol sequence t 1 ,t 2 ,...,t n ]End position sequence of (a);
s203, combining the initial position sequence head r And end position sequence tail r For the word symbol and the entity sequence x r Any ith element of (2)And j' th element->Calculating head-tail relative distance ∈>Relative distance of head->Relative distance of the tail head->Relative distance of tail +.>And calculates the word symbol and the entity sequence x r Any i-th element +.>And j' th element->Correlation R of (2) ij ;
S204, BERT-based codingAnd dictionary attention code determination of a word symbol and entity sequence x r Any i-th element +.>Determining its vector representation E i And combine with the correlation R ij Determining a logogram and an entity sequence x r Any i-th element +.>And j' th element->Attention weight a of (2) i,j The method comprises the steps of carrying out a first treatment on the surface of the Based on the word symbol and the entity sequence x r Any i-th element +.>And j' th element->Attention weight a of (2) i,j Attention weighting is performed to obtain a weighted feature a as input to the transducer layer. />
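To make steps S201-S202 concrete, the following is a rough sketch; the dictionary structure (a dict mapping surface strings to (entity, prior probability) pairs), the threshold value, and the maximum mention length are illustrative assumptions rather than parts of this embodiment.

```python
# Sketch of S201-S202: mention-entity table matching and construction of the
# spliced sequences x^r, head^r, tail^r. `mention_table` is an assumed stand-in
# for the preset mention-entity table: surface string -> [(entity_id, prior)].

def build_mention_list(tokens, mention_table, th_rl=0.3, max_len=8):
    mentions = []
    for i in range(len(tokens)):                               # candidate start
        for j in range(i, min(i + max_len, len(tokens))):      # candidate end
            surface = "".join(tokens[i:j + 1])
            if surface not in mention_table:
                continue
            # keep the matching entity with the highest prior probability
            entity, prior = max(mention_table[surface], key=lambda x: x[1])
            if prior > th_rl:                                  # confidence screening
                mentions.append((i, j, entity))                # (rs, re, e)
    return mentions

def splice_sequences(tokens, mentions):
    # tokens keep their own positions; each entity carries its span (rs, re)
    x_r = list(tokens) + [e for (_, _, e) in mentions]
    head_r = list(range(len(tokens))) + [rs for (rs, _, _) in mentions]
    tail_r = list(range(len(tokens))) + [re for (_, re, _) in mentions]
    return x_r, head_r, tail_r
```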
In this embodiment, the functional expressions for computing the head-tail relative distance d_{ij}^{ht}, the head-head relative distance d_{ij}^{hh}, the tail-head relative distance d_{ij}^{th}, and the tail-tail relative distance d_{ij}^{tt} in step S203 are:
d_{ij}^{ht} = head_i^r - tail_j^r, d_{ij}^{hh} = head_i^r - head_j^r, d_{ij}^{th} = tail_i^r - head_j^r, d_{ij}^{tt} = tail_i^r - tail_j^r,
where head_i^r and tail_i^r are the elements of the start-position sequence head^r and the end-position sequence tail^r corresponding to the i-th element x_i^r of the token-and-entity sequence x^r, and head_j^r and tail_j^r are the elements of the start-position sequence head^r and the end-position sequence tail^r corresponding to the j-th element x_j^r of the token-and-entity sequence x^r.
In this embodiment, the calculation function expression of the correlation R_{ij} between the i-th element x_i^r and the j-th element x_j^r in step S203 is:
R_{ij} = ReLU( W_r ( P_{d_{ij}^{ht}} ⊕ P_{d_{ij}^{hh}} ⊕ P_{d_{ij}^{th}} ⊕ P_{d_{ij}^{tt}} ) ),
where ReLU denotes the ReLU activation function, W_r is a weight matrix for the token-and-entity sequence x^r, ⊕ denotes the concatenation operation, and P_{d_{ij}^{ht}}, P_{d_{ij}^{hh}}, P_{d_{ij}^{th}}, and P_{d_{ij}^{tt}} denote the results of encoding the head-tail, head-head, tail-head, and tail-tail relative distances with the relative position encoding P. It should be noted that the relative position encoding P is an existing encoder; see Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. "Attention Is All You Need." In Proc. of NeurIPS, 5998-6008. This embodiment only applies the encoding and does not involve improvements to it, so its details are not described here.
The representation enhancement in this embodiment is based on an improvement of the mechanism of the FLAT method (Li, Xiaonan, Hang Yan, Xipeng Qiu, and Xuanjing Huang. 2020. "FLAT: Chinese NER Using Flat-Lattice Transformer." In Proc. of ACL, 6836-42). In the FLAT method, word information is added to the character representations for enhancement, using matched character and word representations trained as described in that literature. In the method of this embodiment, the token representation is the semantic representation h^b generated by BERT, and the entity representation uses an entity embedding method selected from the literature, denoted BE(e_{i-n}). This entity representation has two benefits: on the one hand, it is computed from the title and information of the entity using the same BERT model encoding, so it lies in the same space as the token representations; on the other hand, BE(e_{i-n}) remains independent, making full use of the information on the knowledge-base side. In this embodiment, the functional expression of the vector representation E_i in step S204 is:
E_i = h_i^b if i ≤ n, and E_i = BE(e_{i-n}) if i > n,
where h_i^b is the i-th encoding in the BERT encoding h^b, BE(e_{i-n}) is the (i-n)-th of the dictionary attention encodings, e_{i-n} is the (i-n)-th candidate entity in the candidate entity sequence, i is the index of the element x_i^r in the token-and-entity sequence x^r, n is the number of tokens in the token sequence, and x_i^r is any i-th element of the token-and-entity sequence x^r; and the calculation function expression of the attention weight A_{i,j} is:
A_{i,j} = (E_i W_q)(E_j W_{k,E})^T + (E_i W_q)(R_{ij} W_{k,R})^T + u (E_j W_{k,E})^T + v (R_{ij} W_{k,R})^T,
where W_q is a trainable weight matrix, E_i is the vector representation of the i-th element x_i^r of the token-and-entity sequence x^r, E_j is the vector representation of the j-th element x_j^r, W_{k,E} is a trainable weight matrix, R_{ij} is the correlation between the i-th element x_i^r and the j-th element x_j^r, W_{k,R} is a trainable weight matrix, and u and v are trainable weight vectors.
In this embodiment, step S102 includes:
s301, BERT is encodedAnd dictionary attention encoding is split and then input into a transducer layer to obtain a complete dictionary attention enhancement representation H, wherein the word symbol and the entity sequence x r The i-th element of (a)>Is a complete dictionary attention-enhancing representation H i The functional expression of (2) is:
H i =softmax(A)EW v ,
in the above formula, softmax represents a softmax activation function, A is the attention weight A i,j The matrix is composed of all the words and the entity sequence x r The i-th element of (a)Vector representation E of (E) i Matrix of formations, W v Is a trainable weight matrix;
s302, aiming at the word symbol and the entity sequence x r The ith element in (a)Is a complete dictionary attention-enhancing representation H i The first n items are taken out as attention enhancement representation +.>Get the representation of +.>The constituent attention-enhancing representation h r ;
In the above, i isWord symbol and entity sequence x r The ith element in (a)N is the number of characters in the sequence of tokens.
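In code, this fusion step could look like the short sketch below (continuing the attention-score sketch above); the single-head form is an assumption.

```python
# Sketch of S301-S302: complete dictionary attention-enhanced representation H
# and extraction of the token part h^r. A and E come from the sketch above.
import torch

def attention_enhanced_representation(A, E, W_v, n):
    H = torch.softmax(A, dim=-1) @ E @ W_v   # H_i = softmax(A) E W_v
    h_r = H[:n]                              # keep the first n items (tokens only)
    return h_r
```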
In this embodiment, in step S103, the functional expressions for the start score s_start, the end score s_end, and the mention-interior score s_mention obtained by linear classification in the linear classification layer are:
s_start(i) = w_start^T h_i^r, s_end(j) = w_end^T h_j^r, s_mention(k) = w_mention^T h_k^r,
where s_start(i) is the predicted score for position i being the start position of a knowledge point, h_i^r is the attention-enhanced encoding at position i, s_end(j) is the predicted score for position j being the end position of a knowledge point, h_j^r is the attention-enhanced encoding at position j, s_mention(k) is the predicted score for position k being an internal component of a knowledge point, h_k^r is the attention-enhanced encoding at position k, and w_start, w_end, and w_mention are trainable network parameters of the linear classification layer; when the decoding layer obtains the knowledge point labeling result, the calculation function expression for the probability of any region (i, j) is:
p(i, j) = σ( s_start(i) + s_end(j) + Σ_{k=i}^{j} s_mention(k) ),
where p(i, j) denotes the probability of region (i, j) and σ denotes the sigmoid function; if the probability of region (i, j) is greater than a set value, region (i, j) is judged to be a knowledge point labeling region, thereby obtaining the knowledge point labeling result.
Because the dataset is much smaller in both the number of sentences and the average sentence length, and because it lacks the entity encodings required for the dictionary-attention representation enhancement, the model must first be fine-tuned on a task related to entity discovery before it can learn the entity discovery task. As shown in Fig. 3, the method of this embodiment designs a two-stage fine-tuning scheme to complete the training task. First, data preparation: download a Chinese BERT pre-trained model and a public Chinese entity discovery and linking dataset, and prepare the knowledge point labeling dataset used to train course entity annotation. Second, generic entity discovery and linking fine-tuning: the method of this embodiment uses the generic entity discovery and linking task for the first fine-tuning. A generic entity linking sample is shown in Fig. 4; the task is to recognize and link all entities within the scope of Baidu Baike in general-domain web text, which is relatively close to the course entity annotation task in this paper. The specific fine-tuning procedure follows the ELQ literature (Li, Belinda Z., Sewon Min, Srinivasan Iyer, Yashar Mehdad, and Wen-tau Yih. 2020. "Efficient One-Pass End-to-End Entity Linking for Questions." In Proc. of EMNLP, 6433-41); the training procedure at this stage is not within the scope of the claimed invention. This fine-tuning updates the parameters of the context encoder, the entity encoder, and the linear classifier. Third, course entity annotation training: keep the parameters of the context encoder, entity encoder, and linear classifier, randomly initialize the parameters of the Transformer layer, and input the training split of the knowledge point labeling dataset for fine-tuning. The network parameters of each layer of the entity annotation model DsMOOC are updated with the back-propagation algorithm, and the trained entity annotation model DsMOOC is obtained after training is completed.
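The two-stage schedule can be summarized in a rough training-loop sketch; the loader names, the span-level loss, and the reset_transformer_layer helper are placeholders introduced here for illustration only.

```python
# High-level sketch of the two-stage fine-tuning schedule described above.
# `generic_el_loader` and `mooc_kp_loader` are placeholder data loaders for the
# public Chinese entity discovery/linking data and the course knowledge point
# labeling dataset; `span_loss` is an assumed binary loss on predicted spans.

def two_stage_finetune(model, generic_el_loader, mooc_kp_loader, optimizer, span_loss):
    # Stage 1: generic entity discovery and linking fine-tuning
    # (updates the context encoder, entity encoder, and linear classifier).
    for batch in generic_el_loader:
        loss = span_loss(model(batch.text, batch.entities), batch.gold_spans)
        loss.backward(); optimizer.step(); optimizer.zero_grad()

    # Stage 2: course knowledge point labeling fine-tuning
    # (the Transformer fusion layer is randomly re-initialized first;
    #  reset_transformer_layer is a hypothetical helper on the model).
    model.reset_transformer_layer()
    for batch in mooc_kp_loader:
        loss = span_loss(model(batch.text, batch.entities), batch.gold_spans)
        loss.backward(); optimizer.step(); optimizer.zero_grad()
    return model
```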
The method for labeling knowledge points in text-based online learning resources of this embodiment is further verified through experiments. In the experiments of this embodiment, a test set was constructed by manual annotation: for a sample of 12,440 online course subtitle sentences, 6 volunteers annotated the course knowledge point entities. The data were randomly split into training, validation, and test sets at a ratio of 8:1:1. The final test set contains 1,244 examples, including 1,956 entities. The existing methods compared against the entity annotation model DsMOOC of this embodiment include: WMP: a statistical dictionary matching method based on Wikipedia Miner (Milne and Witten 2008) combined with prior-probability screening, where the dictionary is a mention-entity table derived from Chinese Wikipedia statistics and the prior probability is the ratio of the number of Wikipedia documents in which a mention is selected as anchor text to the total number of documents containing that mention. In this embodiment, the parameter combination that maximizes recall, WMP_recall, and the parameter combination that maximizes the F1 value, WMP_best, are adopted. MOOCCube (Yu et al. 2020) and MOOCCubeX (Yu et al. 2021): methods that match against existing published concept libraries; these are existing research models for handling course concepts in video subtitles. Pan's method was not selected for comparison because its published course concept library does not label all course concepts in the subtitles, so its recall would be too low. MOOCCube organizes its course concepts related to computer science courses under the computer science and technology category, so the statistical performance of the MOOCCube (computer) sub-library is reported separately. In addition, "direct training" refers to skipping the generic entity discovery and linking fine-tuning and training directly on the course knowledge point labeling dataset on top of the pre-trained BERT model, while "first fine-tuning" refers to predicting knowledge point labels with the model obtained only through the generic entity discovery and linking fine-tuning. The experimental results are shown in Table 1.
Table 1: Experimental results.
Table 1 shows the performance of the method of this embodiment and of the existing methods on the course subtitle entity annotation task. Among the traditional literal-matching methods, the WMP method based on Chinese Wikipedia anchor-text statistics has an advantage in recall, and after threshold tuning its precision also improves. Compared with the three existing concept libraries, the precision of MOOCCube's computer-science concept library is considerable, but its recall is very low. In the general-subject comparison, the recall of MOOCCubeX is nearly 20 percentage points higher than that of MOOCCube, reflecting the advantage of its broader sources, while the narrower scope of MOOCCube gives it an advantage in precision. The F1 values of the three methods that directly match existing concept libraries lie between 16.05% and 21.69%, far lower than Wikipedia matching and the entity annotation model DsMOOC of this embodiment, which illustrates the inadequacy of direct matching. The precision and recall of the model trained directly on the training set are low, even lower than those of WMP_best, which is based on matching and statistical features; its F1 value is only slightly higher. After the first fine-tuning, recall improves significantly, even exceeding the maximum recall of the matching-based model. After the second fine-tuning, precision and recall improve further. Finally, the entity annotation model DsMOOC of this embodiment achieves the best performance on the course subtitle entity annotation task, with an F1 value of 53.40%. In summary, the method for labeling knowledge points in text-based online learning resources of this embodiment can automatically label knowledge points in text-based online learning resources and offers high precision and high recall.
In addition, this embodiment also provides a system for labeling knowledge points in text-based online learning resources, comprising a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to execute the above method for labeling knowledge points in text-based online learning resources. This embodiment also provides a computer-readable storage medium storing a computer program, wherein the computer program is used to be programmed or configured by a microprocessor to execute the above method for labeling knowledge points in text-based online learning resources.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the present invention may occur to one skilled in the art without departing from the principles of the present invention and are intended to be within the scope of the present invention.
Claims (10)
1. A method for labeling knowledge points in text-based online learning resources, characterized by comprising the following steps:
S101: tokenizing the input course subtitle text to obtain a token sequence [t_1, t_2, ..., t_n] and obtaining the BERT encoding h^b through a BERT encoding layer; performing dictionary matching between the input course subtitle text and a preset mention-entity table to obtain a candidate entity sequence [e_1, e_2, ...], and computing the dictionary attention encoding of each of its elements with an entity encoder BE;
S102: concatenating the BERT encoding h^b with the dictionary attention encodings and feeding the result into a Transformer layer to obtain an attention-enhanced representation h^r;
S103: feeding the attention-enhanced representation h^r into a linear classification layer for linear classification to obtain a start score s_start, an end score s_end, and a mention-interior score s_mention, and feeding the start score s_start, end score s_end, and mention-interior score s_mention into a decoding layer to obtain the knowledge point labeling result.
2. The method for labeling knowledge points in text-based online learning resources according to claim 1, wherein the functional expression for obtaining the BERT encoding h^b through the BERT encoding layer in step S101 is:
h^b = BERT([CLS], t_1, t_2, ..., t_n, [SEP]) ∈ R^(n×h),
where BERT denotes the BERT encoding model, [CLS] and [SEP] are the sentence-start and separator tag tokens, t_1 ~ t_n are the tokens in the token sequence, R^(n×h) denotes the dimensions of the representation, h is the hidden-layer dimension of the BERT encoding model, and n is the number of tokens in the token sequence.
3. The method for labeling knowledge points in text-based online learning resources according to claim 1, wherein concatenating the BERT encoding h^b and the dictionary attention encodings BE(e_{i-n}) in step S102 comprises:
S201: first, using the mention-entity table and the prior probabilities, for all substrings of the token sequence [t_1, t_2, ..., t_n] that match the mention-entity table, finding the matching entity with the highest prior probability and taking that prior probability as the link confidence; then screening the link confidences against a preset threshold th_rl and keeping the mention-entity pairs whose confidence exceeds the threshold, obtaining a mention list {(rs_i, re_i, e_i)}, where (rs_i, re_i) is the position information of candidate entity e_i, rs_i is the start position of candidate entity e_i, and re_i is the end position of candidate entity e_i;
S202: using the mention list {(rs_i, re_i, e_i)}, extending the original token sequence [t_1, t_2, ..., t_n] into three spliced sequences: the token-and-entity sequence x^r, obtained by appending the candidate entities e_i to the token sequence; the start-position sequence head^r, whose elements are the start positions of the elements of x^r in the original token sequence [t_1, t_2, ..., t_n]; and the end-position sequence tail^r, whose elements are the end positions of the elements of x^r in the original token sequence [t_1, t_2, ..., t_n];
S203: using the start-position sequence head^r and the end-position sequence tail^r, for any i-th element x_i^r and j-th element x_j^r of the token-and-entity sequence x^r, computing the head-tail relative distance d_{ij}^{ht}, the head-head relative distance d_{ij}^{hh}, the tail-head relative distance d_{ij}^{th}, and the tail-tail relative distance d_{ij}^{tt}, and computing the correlation R_{ij} between any i-th element x_i^r and j-th element x_j^r of the token-and-entity sequence x^r;
S204: based on the BERT encoding h^b and the dictionary attention encodings, determining the vector representation E_i of any i-th element x_i^r of the token-and-entity sequence x^r, and combining it with the correlation R_{ij} to determine the attention weight A_{i,j} between any i-th element x_i^r and j-th element x_j^r of the token-and-entity sequence x^r; performing attention weighting based on the attention weights A_{i,j} to obtain the weighted features A used as input to the Transformer layer.
4. The method for labeling knowledge points in text-based online learning resources according to claim 3, wherein the functional expressions for computing the head-tail relative distance d_{ij}^{ht}, the head-head relative distance d_{ij}^{hh}, the tail-head relative distance d_{ij}^{th}, and the tail-tail relative distance d_{ij}^{tt} in step S203 are:
d_{ij}^{ht} = head_i^r - tail_j^r, d_{ij}^{hh} = head_i^r - head_j^r, d_{ij}^{th} = tail_i^r - head_j^r, d_{ij}^{tt} = tail_i^r - tail_j^r,
where head_i^r and tail_i^r are the elements of the start-position sequence head^r and the end-position sequence tail^r corresponding to the i-th element x_i^r of the token-and-entity sequence x^r, and head_j^r and tail_j^r are the elements of the start-position sequence head^r and the end-position sequence tail^r corresponding to the j-th element x_j^r of the token-and-entity sequence x^r.
5. The method for labeling knowledge points in text-based online learning resources according to claim 3, wherein the calculation function expression of the correlation R_{ij} between the i-th element x_i^r and the j-th element x_j^r in step S203 is:
R_{ij} = ReLU( W_r ( P_{d_{ij}^{ht}} ⊕ P_{d_{ij}^{hh}} ⊕ P_{d_{ij}^{th}} ⊕ P_{d_{ij}^{tt}} ) ),
where ReLU denotes the ReLU activation function, W_r is a weight matrix for the token-and-entity sequence x^r, ⊕ denotes the concatenation operation, and P_{d_{ij}^{ht}}, P_{d_{ij}^{hh}}, P_{d_{ij}^{th}}, and P_{d_{ij}^{tt}} denote the results of encoding the head-tail, head-head, tail-head, and tail-tail relative distances with the relative position encoding P.
6. The method for labeling knowledge points in text-based online learning resources according to claim 3, wherein the functional expression of the vector representation E_i in step S204 is:
E_i = h_i^b if i ≤ n, and E_i = BE(e_{i-n}) if i > n,
where h_i^b is the i-th encoding in the BERT encoding h^b, BE(e_{i-n}) is the (i-n)-th of the dictionary attention encodings, e_{i-n} is the (i-n)-th candidate entity in the candidate entity sequence, i is the index of the element x_i^r in the token-and-entity sequence x^r, n is the number of tokens in the token sequence, and x_i^r is any i-th element of the token-and-entity sequence x^r; and the calculation function expression of the attention weight A_{i,j} is:
A_{i,j} = (E_i W_q)(E_j W_{k,E})^T + (E_i W_q)(R_{ij} W_{k,R})^T + u (E_j W_{k,E})^T + v (R_{ij} W_{k,R})^T,
where W_q is a trainable weight matrix, E_i is the vector representation of the i-th element x_i^r of the token-and-entity sequence x^r, E_j is the vector representation of the j-th element x_j^r, W_{k,E} is a trainable weight matrix, R_{ij} is the correlation between the i-th element x_i^r and the j-th element x_j^r, W_{k,R} is a trainable weight matrix, and u and v are trainable weight vectors.
7. The method for labeling knowledge points in text-based online learning resources according to claim 6, wherein step S102 comprises:
S301: concatenating the BERT encoding h^b and the dictionary attention encodings and feeding them into the Transformer layer to obtain the complete dictionary attention-enhanced representation H, where the functional expression of the complete dictionary attention-enhanced representation H_i of the i-th element x_i^r of the token-and-entity sequence x^r is:
H_i = softmax(A) E W_v,
where softmax denotes the softmax activation function, A is the matrix composed of the attention weights A_{i,j}, E is the matrix composed of the vector representations E_i of all elements x_i^r of the token-and-entity sequence x^r, and W_v is a trainable weight matrix;
S302: from the complete dictionary attention-enhanced representations H_i of the elements x_i^r of the token-and-entity sequence x^r, taking the first n items as the attention-enhanced representations h_i^r and obtaining the attention-enhanced representation h^r composed of these representations, where n is the number of tokens in the token sequence.
8. The method for labeling knowledge points in text-based online learning resources according to claim 1, wherein in step S103 the functional expressions for the start score s_start, the end score s_end, and the mention-interior score s_mention obtained by linear classification in the linear classification layer are:
s_start(i) = w_start^T h_i^r, s_end(j) = w_end^T h_j^r, s_mention(k) = w_mention^T h_k^r,
where s_start(i) is the predicted score for position i being the start position of a knowledge point, h_i^r is the attention-enhanced encoding at position i, s_end(j) is the predicted score for position j being the end position of a knowledge point, h_j^r is the attention-enhanced encoding at position j, s_mention(k) is the predicted score for position k being an internal component of a knowledge point, h_k^r is the attention-enhanced encoding at position k, and w_start, w_end, and w_mention are trainable network parameters of the linear classification layer; when the decoding layer obtains the knowledge point labeling result, the calculation function expression for the probability of any region (i, j) is:
p(i, j) = σ( s_start(i) + s_end(j) + Σ_{k=i}^{j} s_mention(k) ),
where p(i, j) denotes the probability of region (i, j) and σ denotes the sigmoid function; if the probability of region (i, j) is greater than a set value, region (i, j) is judged to be a knowledge point labeling region, thereby obtaining the knowledge point labeling result.
9. A text-based online learning resource knowledge point labeling system comprising a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to perform the text-based online learning resource knowledge point labeling method of any one of claims 1-8.
10. A computer readable storage medium having a computer program stored therein, wherein the computer program is configured or programmed by a microprocessor to perform the text-based on-line learning resource knowledge point labeling method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310188731.2A CN116204607A (en) | 2023-02-27 | 2023-02-27 | Text online learning resource knowledge point labeling method, system and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310188731.2A CN116204607A (en) | 2023-02-27 | 2023-02-27 | Text online learning resource knowledge point labeling method, system and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116204607A true CN116204607A (en) | 2023-06-02 |
Family
ID=86507442
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310188731.2A Pending CN116204607A (en) | 2023-02-27 | 2023-02-27 | Text online learning resource knowledge point labeling method, system and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116204607A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117435746A (en) * | 2023-12-18 | 2024-01-23 | 广东信聚丰科技股份有限公司 | Knowledge point labeling method and system based on natural language processing |
- 2023-02-27: Application CN202310188731.2A filed in CN; published as CN116204607A (en); status: active, Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117435746A (en) * | 2023-12-18 | 2024-01-23 | 广东信聚丰科技股份有限公司 | Knowledge point labeling method and system based on natural language processing |
CN117435746B (en) * | 2023-12-18 | 2024-02-27 | 广东信聚丰科技股份有限公司 | Knowledge point labeling method and system based on natural language processing |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 