CN117435746A - Knowledge point labeling method and system based on natural language processing - Google Patents

Knowledge point labeling method and system based on natural language processing

Info

Publication number
CN117435746A
Authority
CN
China
Prior art keywords
text
learning platform
offline
offline learning
sentences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311738105.2A
Other languages
Chinese (zh)
Other versions
CN117435746B (en)
Inventor
黎国权
朱晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Xinjufeng Technology Co ltd
Original Assignee
Guangdong Xinjufeng Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Xinjufeng Technology Co ltd filed Critical Guangdong Xinjufeng Technology Co ltd
Priority to CN202311738105.2A priority Critical patent/CN117435746B/en
Publication of CN117435746A publication Critical patent/CN117435746A/en
Application granted granted Critical
Publication of CN117435746B publication Critical patent/CN117435746B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • G06Q50/205Education administration or guidance


Abstract

The embodiment of the invention relates to the technical field of artificial intelligence, in particular to a knowledge point labeling method and system based on natural language processing, wherein the method comprises the following steps: acquiring context deduction features between identification words and sentences matched with learning text units in a first offline learning platform pre-configured text and a second offline learning platform pre-configured text; performing feature splicing between the identification words and sentences in the first offline learning platform pre-configured text and those in the second offline learning platform pre-configured text based on the context deduction features; determining knowledge point contact features corresponding to the feature-spliced identification words and sentences based on the distribution features of the plurality of learning text units corresponding to those identification words and sentences in the first offline learning platform pre-configured text and in the second offline learning platform pre-configured text; and generating a knowledge point labeled learning text based on the knowledge point contact features and the classification features corresponding to the feature-spliced identification words and sentences.

Description

Knowledge point labeling method and system based on natural language processing
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a knowledge point labeling method and system based on natural language processing.
Background
At present, offline learning platforms have become one of the important means by which people acquire knowledge. Traditional educational resources are often scattered across different offline learning platforms, each with its own learning materials and learning methods. This results in differences in how knowledge points are represented, making it challenging for learners to integrate and compare learning information from different sources. Therefore, how to efficiently integrate multi-source learning materials and provide a more systematic and consistent learning experience is a current challenge.
Disclosure of Invention
In order to address the technical problems in the related art, the invention provides a knowledge point labeling method and a knowledge point labeling system based on natural language processing.
In a first aspect, an embodiment of the present invention provides a knowledge point labeling method based on natural language processing, which is applied to an offline training platform system, and the method includes:
acquiring context deducing features between identification words and sentences matched with learning text units in a first offline learning platform preset text and a second offline learning platform preset text; the first offline learning platform pre-configured text and the second offline learning platform pre-configured text are different offline learning platform pre-configured texts extracted from two offline pages with page number differences not larger than a set difference threshold;
Based on the context deducing characteristics corresponding to the identification words and sentences matched with the learning text units, performing characteristic splicing on the identification words and sentences in the first offline learning platform pre-configured text and the identification words and sentences in the second offline learning platform pre-configured text, wherein the two identification words and sentences subjected to characteristic splicing represent the same identification words and sentences in the learning resource;
determining knowledge point contact features corresponding to the identification words and sentences completing feature splicing based on the distribution features of a plurality of learning text units corresponding to the identification words and sentences completing feature splicing in the first offline learning platform pre-configured text and the distribution features of the learning text units corresponding to the identification words and sentences completing feature splicing in the second offline learning platform pre-configured text;
and generating a knowledge point labeling learning text based on the knowledge point contact features of the identification words and sentences subjected to feature splicing and the classification features corresponding to the identification words and sentences subjected to feature splicing.
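The four steps above can be sketched as a hypothetical pipeline. All function names, data shapes, and the shared-unit-ratio context feature below are illustrative assumptions, not the patented implementation:

```python
# Hypothetical sketch of the four-step labeling pipeline described above.
# Texts are modeled as {identification term -> list of learning text unit IDs}.

def acquire_context_features(text_a, text_b):
    # Step 10: pair identification words/sentences that share learning text
    # units and derive a simple context deduction feature (shared-unit ratio).
    features = {}
    for term, units_a in text_a.items():
        units_b = text_b.get(term)
        if units_b:
            shared = len(set(units_a) & set(units_b))
            features[term] = shared / max(len(units_a), len(units_b))
    return features

def splice_features(features, threshold=0.5):
    # Step 20: splice terms whose context deduction features are strong enough.
    return {t for t, f in features.items() if f >= threshold}

def label_knowledge_points(spliced, text_a, text_b):
    # Steps 30-40: derive a contact feature (here, the count of distinct
    # learning text units across both texts) and emit labeled entries.
    return [
        {"term": t, "contact": len(set(text_a[t]) | set(text_b[t]))}
        for t in sorted(spliced)
    ]

text_a = {"gradient": ["u1", "u2", "u3"], "loss": ["u4"]}
text_b = {"gradient": ["u2", "u3"], "loss": ["u9"]}
labels = label_knowledge_points(
    splice_features(acquire_context_features(text_a, text_b)), text_a, text_b
)
```

In this toy run, only "gradient" survives splicing because "loss" shares no learning text units across the two texts.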
Preferably, the feature splicing between the identifying words and sentences in the pre-configured text of the first offline learning platform and the identifying words and sentences in the pre-configured text of the second offline learning platform based on the context deducing features corresponding to the identifying words and sentences matched with the learning text units includes:
determining a mask prediction result corresponding to the first mask recognition result in a second offline learning platform pre-configured text based on the distribution characteristic of the first mask recognition result corresponding to the first identification word and sentence and the context deduction characteristic corresponding to the first identification word and sentence for the first identification word and sentence in the first offline learning platform pre-configured text;
If a second mask recognition result corresponding to a second identification word and sentence belonging to the target classification feature exists in the mask prediction result, performing feature splicing on the first identification word and sentence in the first offline learning platform pre-configured text and the second identification word and sentence in the second offline learning platform pre-configured text; the classification feature corresponding to the first identification word and sentence is the target classification feature.
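The mask-prediction splicing described above can be illustrated as follows. The interval-based mask representation and the offset-based projection of a text-A mask into text B are assumptions made purely for illustration:

```python
# Hedged sketch: a first term's mask region in text A is projected into
# text B via its context deduction feature; if a second term of the same
# classification feature falls inside the predicted region, the two terms
# are feature-spliced.

def predict_mask_region(mask_a, context_shift):
    # Project a text-A mask (start, end) into text B, modeled here as a
    # simple positional offset derived from the context deduction feature.
    start, end = mask_a
    return (start + context_shift, end + context_shift)

def splice_if_match(mask_a, context_shift, candidates, target_class):
    region = predict_mask_region(mask_a, context_shift)
    for term, (start, end), cls in candidates:
        # A candidate matches when its mask lies inside the predicted
        # region and it shares the target classification feature.
        if cls == target_class and start >= region[0] and end <= region[1]:
            return term
    return None

candidates = [("derivative", (12, 18), "math"), ("verb", (40, 44), "grammar")]
match = splice_if_match((10, 16), 2, candidates, "math")
```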
Preferably, the feature splicing between the identifying words and sentences in the pre-configured text of the first offline learning platform and the identifying words and sentences in the pre-configured text of the second offline learning platform based on the context deducing features corresponding to the identifying words and sentences matched with the learning text units includes:
determining a mask prediction result corresponding to the first mask recognition result in a second offline learning platform pre-configured text based on the distribution characteristic of the first mask recognition result corresponding to the first identification word and sentence and the context deduction characteristic corresponding to the first identification word and sentence for the first identification word and sentence in the first offline learning platform pre-configured text;
performing mask size adjustment on the mask prediction result in the second offline learning platform pre-configured text;
And if a second mask recognition result corresponding to a second identification word and sentence belonging to the target classification feature exists in the mask prediction result after the mask size adjustment, performing feature splicing on the first identification word and sentence in the first offline learning platform pre-configured text and the second identification word and sentence in the second offline learning platform pre-configured text.
Preferably, the method further comprises:
if a third identification word and sentence which are not complete in feature splicing exists in the first offline learning platform pre-configured text, determining a global text description vector of the third identification word and sentence in the first offline learning platform pre-configured text based on a text block description vector of the third identification word and sentence in the first offline learning platform pre-configured text and a text knowledge logic vector of at least one fourth identification word and sentence which is already feature spliced in the first offline learning platform pre-configured text relative to the third identification word and sentence;
determining a global text description vector of a fifth identification word and sentence in an original offline learning platform pre-configured text based on a text block description vector of the fifth identification word and sentence in the original offline learning platform pre-configured text and a text knowledge logic vector of the sixth identification word and sentence which is subjected to feature splicing in the original offline learning platform pre-configured text relative to the fifth identification word and sentence; the classification characteristic corresponding to the fifth identification word and sentence is the same as the classification characteristic corresponding to the third identification word and sentence;
Determining a commonality score of the third identification word and the fifth identification word based on a global text description vector of the third identification word in the first offline learning platform pre-configured text and a global text description vector of the fifth identification word in the original offline learning platform pre-configured text;
and if the commonality score is larger than a commonality score threshold, performing feature splicing on the third identification words and sentences in the first offline learning platform pre-configured text and the fifth identification words and sentences in the original offline learning platform pre-configured text.
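The commonality-score fallback above can be sketched with a cosine-similarity comparison of global text description vectors. The averaging-based vector construction and the 0.95 threshold are illustrative assumptions:

```python
# Illustrative sketch: a term's global text description vector combines its
# text-block description vector with the text knowledge logic vectors of
# already-spliced neighbor terms; two terms are spliced when the cosine
# commonality score exceeds a threshold.
import math

def global_vector(block_vec, logic_vecs):
    # Combine the block vector with neighbor logic vectors (simple mean).
    combined = list(block_vec)
    for vec in logic_vecs:
        combined = [c + v for c, v in zip(combined, vec)]
    n = 1 + len(logic_vecs)
    return [c / n for c in combined]

def commonality(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0

third = global_vector([1.0, 0.0], [[0.8, 0.2]])   # third identification term
fifth = global_vector([0.9, 0.1], [[0.9, 0.1]])   # fifth identification term
should_splice = commonality(third, fifth) > 0.95  # commonality score threshold
```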
Preferably, the determining the knowledge point contact feature corresponding to the identification word and sentence for completing feature stitching based on the distribution feature of the plurality of learning text units corresponding to the identification word and sentence for completing feature stitching in the first offline learning platform pre-configured text and the distribution feature in the second offline learning platform pre-configured text includes:
determining knowledge concept updating characteristics based on the distribution characteristics of a plurality of learning text units corresponding to the identification words and sentences which finish feature splicing in the first offline learning platform pre-configured text and the distribution characteristics of the learning text units in the second offline learning platform pre-configured text;
based on the knowledge concept updating characteristics, determining first relative distribution characteristics of the identification words and sentences subjected to characteristic splicing in a first word vector space corresponding to the pre-matched text of the first offline learning platform and second relative distribution characteristics of the identification words and sentences subjected to characteristic splicing in a second word vector space corresponding to the pre-matched text of the second offline learning platform;
And determining knowledge point contact features corresponding to the identification words and sentences for completing feature splicing based on the page number difference between the page state corresponding to the first offline learning platform pre-configured text and the page state corresponding to the second offline learning platform pre-configured text, the first relative distribution features and the second relative distribution features.
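One way to read the combination above is that the contact feature weighs the two relative distribution features by how close the source pages are. The linear page weighting and histogram-overlap comparison below are assumptions for illustration only:

```python
# Hypothetical sketch of deriving a knowledge point contact feature from the
# page-number difference and the two relative distribution features in the
# first and second word vector spaces.

def contact_feature(page_diff, rel_a, rel_b, max_diff=5):
    # Closer pages are weighted higher; the two relative distribution
    # features are compared by their overlap.
    page_weight = max(0.0, 1.0 - page_diff / max_diff)
    overlap = sum(min(a, b) for a, b in zip(rel_a, rel_b))
    return page_weight * overlap

score = contact_feature(1, [0.5, 0.5], [0.4, 0.6])
```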
Preferably, before determining the knowledge point contact feature corresponding to the identification word and sentence for completing feature stitching based on the page number difference between the page state corresponding to the first offline learning platform pre-configured text and the page state corresponding to the second offline learning platform pre-configured text, the first relative distribution feature and the second relative distribution feature, the method further includes:
determining a first text content output state of a first offline page associated with the first offline learning platform pre-configured text under an offline learning task corresponding to the first offline learning platform pre-configured text, and taking the first text content output state as a page state corresponding to the first offline learning platform pre-configured text;
determining a second text content output state of a second offline page associated with the second offline learning platform pre-configured text under an offline learning task corresponding to the second offline learning platform pre-configured text, and taking the second text content output state as a page state corresponding to the second offline learning platform pre-configured text;
And determining a page number difference between a page state corresponding to the first offline learning platform pre-configured text and a page state corresponding to the second offline learning platform pre-configured text based on the first text content output state and the second text content output state.
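The page-difference determination above reduces each text's page state to a page number and takes the gap between them. The dictionary-based page-state representation and the threshold value are assumptions:

```python
# Sketch of the page-difference step: each pre-configured text's page state
# carries the page number of its associated offline page under the offline
# learning task.

def page_number_difference(state_a, state_b):
    # state: {"page": int, "output": str} — the text content output state
    # of the associated offline page.
    return abs(state_a["page"] - state_b["page"])

diff = page_number_difference({"page": 12, "output": "sec 2"},
                              {"page": 14, "output": "sec 2 cont."})
within = diff <= 3  # set difference threshold (assumed value)
```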
Preferably, the obtaining the context deducing feature between the identification words and sentences in the first offline learning platform preset text and the identification words and sentences matched with the learning text units in the second offline learning platform preset text includes:
determining a plurality of learning text unit doublets corresponding to the learning text unit matched identification words and sentences based on the identification words and sentences of the first offline learning platform pre-matched text and the learning text unit matched identification words and sentences in the second offline learning platform pre-matched text, wherein each learning text unit doublet comprises a first learning text unit in the learning text unit matched identification words and sentences in the first offline learning platform pre-matched text and a second learning text unit in the learning text unit matched identification words and sentences in the second offline learning platform pre-matched text;
and determining the context deducing characteristics corresponding to the matched identification words and sentences of the learning text units based on the distribution characteristics of the first learning text unit and the distribution characteristics of the second learning text unit in the plurality of learning text unit tuples.
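The tuple-based derivation above can be sketched by modeling each learning text unit's distribution feature as a position and taking the mean positional shift across the paired units — an illustrative assumption, not the claimed computation:

```python
# Sketch of the tuple-based variant of Step 10: context deduction features
# are derived from the distribution features of paired learning text units.

def context_deduction(unit_tuples):
    # unit_tuples: [(position_in_text_a, position_in_text_b), ...]
    if not unit_tuples:
        return 0.0
    shifts = [b - a for a, b in unit_tuples]
    return sum(shifts) / len(shifts)

shift = context_deduction([(10, 12), (20, 23), (30, 31)])
```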
Preferably, the obtaining the context deducing feature between the identification words and sentences in the first offline learning platform preset text and the identification words and sentences matched with the learning text units in the second offline learning platform preset text includes:
a first target identification word and sentence of which the context deducing characteristics are to be determined in the text are pre-configured for the first offline learning platform, and a first target sentence cluster corresponding to the first target identification word and sentence is determined;
determining context deducing features between the first target sentence cluster and the second target sentence cluster based on a plurality of learning text unit tuples with corresponding relations in the first target sentence cluster and the second target sentence cluster, wherein the second target sentence cluster is a sentence cluster corresponding to the first target sentence cluster in the second offline learning platform pre-configured text;
taking the context deducing features between the first target sentence cluster and the second target sentence cluster as the context deducing features between a first target identification word and sentence in a first offline learning platform pre-configured text and a second target identification word and sentence in a second offline learning platform pre-configured text; and the second target identification words and sentences are identification words and sentences of which learning text units in the second offline learning platform pre-configured text are located in the second target sentence cluster.
Preferably, before obtaining the context deducing feature between the identification words and sentences in the first offline learning platform pre-configured text and the identification words and sentences in the second offline learning platform pre-configured text, the method further includes:
based on text thermal identification data corresponding to a first offline learning platform pre-configured text and distribution characteristics of all identification words and sentences in the first offline learning platform pre-configured text, clustering the identification words and sentences in the first offline learning platform pre-configured text, and determining sentence clusters in the first offline learning platform pre-configured text;
based on text thermal identification data corresponding to a second offline learning platform pre-configured text and distribution characteristics of all identification words and sentences in the second offline learning platform pre-configured text, clustering the identification words and sentences in the second offline learning platform pre-configured text, and determining sentence clusters in the second offline learning platform pre-configured text;
and determining corresponding sentence clusters in the first offline learning platform pre-configured text and the second offline learning platform pre-configured text based on a learning text unit association result of the first offline learning platform pre-configured text and the second offline learning platform pre-configured text.
Preferably, the method further comprises:
if sentence clusters with the number of the learning text units smaller than a first threshold exist in the first offline learning platform pre-configured text or the second offline learning platform pre-configured text, expanding the sentence clusters so that the number of the learning text units in the expanded sentence clusters is not smaller than the first threshold.
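The expansion rule above can be sketched minimally. Representing units by 1-D positions and absorbing the nearest unclustered units is an assumption chosen only to make the rule concrete:

```python
# Minimal sketch of cluster expansion: a sentence cluster with fewer
# learning text units than the first threshold absorbs the nearest
# unclustered units until the threshold is met or the pool is exhausted.

def expand_cluster(cluster, pool, first_threshold):
    cluster = sorted(cluster)
    spare = sorted(u for u in pool if u not in cluster)
    while len(cluster) < first_threshold and spare:
        center = sum(cluster) / len(cluster)
        # Absorb the unit closest to the current cluster center.
        nearest = min(spare, key=lambda u: abs(u - center))
        spare.remove(nearest)
        cluster.append(nearest)
    return sorted(cluster)

expanded = expand_cluster([5, 6], [1, 7, 20], first_threshold=4)
```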
Preferably, the first offline learning platform pre-configured text and the second offline learning platform pre-configured text are in the same pre-configured text set; the method further comprises the steps of:
deleting a learning text unit positioned in the first noise text set in the first offline learning platform pre-configured text based on the distribution characteristics of the first noise text set indicated by the noise discrimination point of the first offline learning platform pre-configured text;
deleting learning text units in the second noise text set in the second offline learning platform pre-configured text based on the distribution characteristics of the second noise text set indicated by the noise discrimination point of the second offline learning platform pre-configured text;
performing learning text unit association on learning text units in the pre-configured text of the first offline learning platform which completes noise deletion and learning text units in the pre-configured text of the second offline learning platform which completes noise deletion;
And if the number of the associated learning text unit tuples is greater than a second threshold, determining that the first offline learning platform pre-configured text and the second offline learning platform pre-configured text are in the same pre-configured text set.
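The same-set test above can be sketched end to end. Modeling the noise discrimination point as a set of flagged unit IDs and association as shared IDs are simplifying assumptions:

```python
# Hedged sketch: noise units flagged by each text's noise discrimination
# point are deleted, the remaining learning text units are associated, and
# the two texts are placed in the same pre-configured text set when the
# number of associated tuples exceeds the second threshold.

def same_preconfigured_set(units_a, noise_a, units_b, noise_b, second_threshold):
    clean_a = [u for u in units_a if u not in noise_a]
    clean_b = [u for u in units_b if u not in noise_b]
    associated = set(clean_a) & set(clean_b)  # associated unit tuples
    return len(associated) > second_threshold

same = same_preconfigured_set(
    ["u1", "u2", "u3", "nz"], {"nz"},
    ["u1", "u2", "u3", "u9"], {"u9"},
    second_threshold=2,
)
```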
Preferably, the first offline learning platform pre-configured text and the second offline learning platform pre-configured text are obtained from an initial pre-configured text set; the method further comprises the steps of:
for a plurality of offline learning platform pre-configured texts to be processed, grouping the plurality of offline learning platform pre-configured texts based on the extracted page numbers corresponding to the offline learning platform pre-configured texts and chapter labels corresponding to the offline learning platform pre-configured texts to obtain at least one initial pre-configured text set; the difference of page numbers between the offline pages corresponding to any two offline learning platform pre-configured texts in the same initial pre-configured text set is not greater than a set difference threshold, and the difference of chapter labels corresponding to any two offline learning platform pre-configured texts is not greater than a third threshold.
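The grouping rule above can be sketched as a greedy pass over the extracted texts. The greedy strategy and the tuple representation of a text are assumptions; the patent only requires that any two members of a set satisfy both thresholds:

```python
# Illustrative sketch of building initial pre-configured text sets: any two
# texts in a set differ by at most the set difference threshold in page
# number and at most the third threshold in chapter label.

def group_texts(texts, page_threshold, chapter_threshold):
    # texts: [(name, page_number, chapter_label), ...]
    groups = []
    for text in sorted(texts, key=lambda t: (t[1], t[2])):
        for group in groups:
            if all(
                abs(text[1] - m[1]) <= page_threshold
                and abs(text[2] - m[2]) <= chapter_threshold
                for m in group
            ):
                group.append(text)
                break
        else:
            groups.append([text])
    return groups

texts = [("a", 10, 2), ("b", 11, 2), ("c", 30, 5)]
groups = group_texts(texts, page_threshold=3, chapter_threshold=1)
```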
In a second aspect, the present invention also provides an offline training platform system, including a processor and a memory; the processor is in communication with the memory, and the processor is configured to read and execute a computer program from the memory to implement the method described above.
In a third aspect, the present invention also provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the method described above.
By applying the scheme of the invention, different perspectives on and interpretations of the same knowledge point can be captured by comparing the pre-configured texts of different offline learning platforms and focusing on the matched identification words and sentences and their contexts. This helps provide a more comprehensive knowledge base, laying a solid foundation for subsequent feature splicing and knowledge point association analysis.
By splicing the contextual features of the same identification words and sentences across the first and second offline learning platforms, a comprehensive, multidimensional knowledge representation can be created. This not only promotes consistency of cross-platform content but may also reveal new knowledge links and opportunities for deeper understanding.
By analyzing the distribution features of the feature-spliced identification words and sentences in the two texts, one can identify which knowledge points frequently appear together or are discussed jointly. This helps map the network of relationships between knowledge points, deepening understanding of course structure and providing a more consistent and systematic learning experience for the learner.
Using the information extracted in the previous steps, a learning text with knowledge point contact features can be created, in which each knowledge point is explicitly labeled and categorized. Such a knowledge point labeled learning text serves as a powerful learning tool: it supports the design of adaptive learning paths, helps learners select learning materials according to their own needs and preferences, and improves learning efficiency and outcomes.
By the design, the limitation of a single education resource can be overcome, and a more comprehensive and complementary offline learning knowledge point labeling learning text can be generated by integrating and optimizing the content from a plurality of offline learning platforms.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic flow chart of a knowledge point labeling method based on natural language processing according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention.
It should be noted that the terms "first," "second," and the like in the description of the present invention and the above-described drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
The method embodiments provided by the present invention can be executed in an offline training platform system, a computer device, or a similar computing device. Taking execution on an offline training platform system as an example, the system may comprise one or more processors (which may include, but are not limited to, a microcontroller (MCU) or a processing device such as a field-programmable gate array (FPGA)) and a memory for storing data; optionally, the offline training platform system may further include a transmission device for communication functions. It will be appreciated by those of ordinary skill in the art that the above-described architecture is merely illustrative and is not intended to limit the architecture of the offline training platform system. For example, the offline training platform system may include more or fewer components than those described above, or have a different configuration.
The memory may be used to store a computer program, for example, a software program of application software and a module, for example, a computer program corresponding to a knowledge point labeling method based on natural language processing in the embodiment of the present invention, and the processor executes the computer program stored in the memory, thereby performing various functional applications and data processing, that is, implementing the method described above. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory. In some examples, the memory may further include memory remotely located with respect to the processor, the remote memory being connectable to the offline training platform system through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission means is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of an offline training platform system. In one example, the transmission means comprises a network adapter (Network Interface Controller, simply referred to as NIC) that can be connected to other network devices via a base station to communicate with the internet. In one example, the transmission device may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
Referring to fig. 1, fig. 1 is a flowchart of a knowledge point labeling method based on natural language processing according to an embodiment of the present invention. The method is applied to an offline training platform system and comprises steps 10-40.
Step 10, obtaining context deducing features between identification words and sentences matched with learning text units in a first offline learning platform preset text and a second offline learning platform preset text; the first offline learning platform pre-configured text and the second offline learning platform pre-configured text are different offline learning platform pre-configured texts extracted from two offline pages with page number differences not larger than a set difference threshold.
In the solution of the invention, the first offline learning platform pre-configured text refers to text that is preset for learning on one offline learning platform. Similarly, the second offline learning platform pre-configured text refers to learning text material preset on another offline learning platform. A learning text unit is a basic unit constituting the learning text, such as a paragraph, a sentence, a phrase, or a vocabulary item. An identification word or sentence is a word or sentence that carries a particular meaning in the text or is used to identify a knowledge point. Context deducing features refer to the context surrounding the identification words and sentences, which can help understand the meaning of the identification words and sentences and their role in the text. The page number difference refers to the difference between the numbers (possibly representing chapters, page numbers, etc.) of two offline pages. The set difference threshold is a predefined threshold defining the maximum page number difference accepted for subsequent processing. Offline pages are static pages that can be accessed without networking; here they refer specifically to pages that contain learning materials.
It is assumed that two different offline learning systems are respectively provided with learning materials for mathematics courses, and that each offline learning system has its own offline learning platform. The platform of the first offline learning system has an electronic version of a mathematics textbook, referred to as the "first offline learning platform pre-configured text", and the platform of the second offline learning system has an electronic version of another mathematics textbook, referred to as the "second offline learning platform pre-configured text". The content of the two textbooks is similar, but the typesetting and page numbering may differ slightly.
In step 10, it is necessary to compare similar contents in the two books. First, it is determined whether the difference in page numbers in the two books is within a preset threshold value, for example, the difference is not more than 5 pages. If the difference is within an acceptable range, the "learn text units" in both books are further extracted.
Next, the same "identification words and sentences" in both books, such as the term "Pythagorean theorem", are found, and the "context deducing features" around the term, that is, the explanations and questions surrounding the Pythagorean theorem, are analyzed. Through analysis of these features, the similarities and differences between the two texts can be better understood and possibly used for further educational resource integration or knowledge point linking.
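As an illustrative aid only (not part of the claimed method), the pairing logic of step 10 can be sketched in Python. The sketch pairs two offline pages only when their page number difference is within the set threshold, then collects the context around identification phrases shared by both texts. All names here (`Page`, `context_window`, `matched_contexts`) are assumptions introduced for this example:

```python
from dataclasses import dataclass

@dataclass
class Page:
    number: int
    text: str

def context_window(text: str, phrase: str, radius: int = 30) -> str:
    """Return the characters surrounding the first occurrence of `phrase`."""
    i = text.find(phrase)
    if i < 0:
        return ""
    return text[max(0, i - radius): i + len(phrase) + radius]

def matched_contexts(p1: Page, p2: Page, phrases, max_page_diff: int = 5):
    """Collect (phrase, ctx1, ctx2) for phrases present in both pages,
    provided the page number difference does not exceed the threshold."""
    if abs(p1.number - p2.number) > max_page_diff:
        return []
    out = []
    for ph in phrases:
        c1, c2 = context_window(p1.text, ph), context_window(p2.text, ph)
        if c1 and c2:
            out.append((ph, c1, c2))
    return out

a = Page(12, "The Pythagorean theorem relates the sides of a right triangle.")
b = Page(15, "Exercises on the Pythagorean theorem with worked solutions.")
matches = matched_contexts(a, b, ["Pythagorean theorem"])
print(matches[0][0])  # the shared identification phrase
```

The 5-page default mirrors the example threshold above; in practice the threshold would be configured per corpus.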
And step 20, performing feature stitching on the identification words and sentences in the first offline learning platform pre-configured text and the identification words and sentences in the second offline learning platform pre-configured text based on the context deducing features corresponding to the identification words and sentences matched with the learning text units, wherein the two feature-stitched identification words and sentences represent the same identification words and sentences in the learning resource.
In the scheme of the invention, feature stitching refers to combining two data segments of different sources (in this example, identification words and sentences and their context derived features) according to corresponding rules or logic to form a complete data representation. This typically involves comparing and analyzing the commonalities and differences between two data segments and constructing a unified view therefrom. Learning resources refer to materials that may be used for learning, including but not limited to books, articles, video, audio, interactive modules, and the like. In this example, the learning resources may be learning content contained on two different offline learning platforms.
Assume that there are sections on biology on two different offline learning platforms. The pre-configured text of the first offline learning platform has a portion related to "cell division", and the pre-configured text of the second offline learning platform also has a similar portion. The identification phrase "mitosis" in two text units is first determined. Subsequently, context-derived features surrounding these identified phrases, such as their context-interpreted text, charts, and examples, are collected.
In step 20, the context-derived features around "mitosis" in the first offline learning platform are compared and combined with corresponding features in the second offline learning platform. For example, if a first offline learning platform emphasizes mitotic phases, while a second offline learning platform provides more detail about chromosomal behavior, the result of feature stitching would be a more comprehensive mitotic learning unit that fuses the information advantages of both offline learning platforms.
By means of this stitching, not only is the information density of the learning text units enhanced, but new knowledge relations can also be revealed, further enriching the connotation of the learning resources. Finally, the two feature-stitched identification words and sentences jointly represent the same concept or knowledge point, laying a foundation for subsequent knowledge point labeling and learning resource integration.
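A minimal sketch of the feature stitching in step 20, under the simplifying assumption that each platform's context is represented as a bag-of-words vector (the names `bow` and `stitch_features` are illustrative, not the patent's data model). Overlapping terms accumulate weight, so the stitched unit keeps the strengths of both platforms:

```python
from collections import Counter

def bow(text: str) -> Counter:
    """Bag-of-words vector for a context string."""
    return Counter(text.lower().split())

def stitch_features(ctx_a: str, ctx_b: str) -> dict:
    """Merge the two context vectors; terms present in both contexts
    accumulate weight in the stitched representation."""
    merged = bow(ctx_a) + bow(ctx_b)
    return dict(merged)

unit = stitch_features(
    "mitosis phases prophase metaphase anaphase",
    "mitosis chromosome behaviour during division",
)
print(unit["mitosis"])  # appears in both contexts, so weight 2
```

A production system would stitch dense embeddings rather than word counts, but the merge-and-accumulate shape of the operation is the same.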
And step 30, determining knowledge point contact characteristics corresponding to the identification words and sentences completing feature splicing based on the distribution characteristics of a plurality of learning text units corresponding to the identification words and sentences completing feature splicing in the first offline learning platform pre-configured text and the distribution characteristics of the learning text units corresponding to the identification words and sentences completing feature splicing in the second offline learning platform pre-configured text.
In the solution of the present invention, the distribution feature (Distribution Feature) refers to learning the pattern or law of occurrence of text units in the text, such as frequency, location or relation to other text units, etc. In different offline learning platforms, even the same knowledge points, the corresponding learning text units may exhibit different distribution characteristics due to editing style, learning order, content depth, or the like. Knowledge point association features (Knowledge Point Connection Feature) refer to information about associations between knowledge points that can be revealed by analyzing distribution features of learning text units. For example, the derivation of a formula may be referenced in a number of places, where the context in which it appears, the relevant questions and exercises reflect the relationship of the formula to other mathematical concepts.
Assume that two offline learning platforms are provided, each of which contains a set of learning materials for a physical course. It is desirable to integrate the resources of the two offline learning platforms to enhance students' understanding of physical concepts.
From the data obtained in step 20, matched identification phrases (such as "Newton's second law") have been obtained by feature stitching; these represent the same knowledge points in the texts of the two offline learning platforms. Next, the distribution features of these identification phrases in the respective platform texts are analyzed. In the text of the first offline learning platform, "Newton's second law" may often appear alongside calculation questions involving force and acceleration, while in the text of the second offline learning platform it may be more closely connected with the content on conservation of momentum. By comparing these distribution features, it can be determined which knowledge points are strongly linked to "Newton's second law" on the two offline learning platforms. For example, if the text of the first offline learning platform relates it to engineering applications and the second offline learning platform relates it to theoretical derivations, then "Newton's second law" can be considered to have a close relationship with both areas. After these knowledge point association features are determined, a comprehensive resource can be designed that combines the advantages of the two offline learning platforms to provide a well-rounded view for learning "Newton's second law". This may include various forms of learning activities, such as calculation questions, theoretical instruction, and experimental simulations. In addition, the learning path can be optimized by utilizing the linkage features, for example, by first presenting the basic theory, then introducing practical applications, and finally verifying the theory through experiments.
Through the above steps, the technical solution in step 30 helps establish a bridge between knowledge points on different offline learning platforms, so that students can understand and master complex concepts through multiple perspectives and methods.
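One simple way to realize the distribution analysis of step 30 is a co-occurrence count: terms that repeatedly share a learning text unit with the stitched phrase become its candidate linkage features. This is an illustrative sketch only (the function name `cooccurrence` and the toy corpus are assumptions):

```python
from collections import Counter

def cooccurrence(units, phrase):
    """Count terms appearing in the same learning text unit as `phrase`."""
    counts = Counter()
    for unit in units:
        toks = unit.lower().split()
        if phrase in toks:
            counts.update(t for t in toks if t != phrase)
    return counts

# Toy learning text units from two platforms, both mentioning "newton".
platform1 = ["newton force acceleration problem",
             "newton force engineering application"]
platform2 = ["newton momentum conservation"]

# Counters add element-wise, merging the two platforms' distributions.
links = cooccurrence(platform1, "newton") + cooccurrence(platform2, "newton")
print(links["force"])  # co-occurs twice on platform 1
```

Terms with high merged counts ("force" here) would be the strongly linked knowledge points described above.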
And step 40, generating a knowledge point labeling learning text based on the knowledge point contact features of the identification words and sentences subjected to feature splicing and the classification features corresponding to the identification words and sentences subjected to feature splicing.
Wherein the process described in step 40 involves generating knowledge point annotation learning text using the data acquired and integrated in the previous steps. The key here is to apply the extracted knowledge point association features and classification features to the learning material in order to create a richer and learning-aided content.
Based on this, knowledge point labeling learning text (Knowledge Point Annotated Learning Text) contains detailed descriptions, definitions, examples, or related exercises of specific knowledge points that are explicitly labeled in the text by certain technical means to facilitate learner recognition and understanding. The knowledge point association feature (Knowledge Point Linkage Feature) may also refer to information that can embody logical or topic associations between different learning text units, such as co-occurring concepts, mutual reference theories, or coherent topics. In addition, classification features (Categorization Feature) can help to categorize knowledge points into specific discipline categories, difficulty levels, or relevance categories, etc., so that learning content can be organized and presented according to different dimensions.
Assuming that the sections on the two offline learning platforms about the "ecosystem" have been deeply analyzed through steps 10 to 30 and feature stitching is completed, a "ecosystem" knowledge point database is now provided that merges the advantages of the two offline learning platforms.
In step 40, the database is used to generate the final knowledge point annotation learning text. The linking characteristics of each knowledge point, such as "food chain", "biodiversity", "niche", etc., are first determined and how the knowledge points are related to each other in different text is determined. These knowledge points are then examined as to how they are classified, for example, in terms of their subject area (biology, environmental science), complexity (primary, advanced) or their order in the course.
The learning text is then annotated with this information, ensuring that each knowledge point is not only clearly defined and interpreted, but also that its relationship to other knowledge points is marked. For example, when teaching the "food chain", an interactive chart may be added showing the location of different species in the food chain, and additional resources linked to "biodiversity" and "niche" may be provided.
The generated learning text has the following characteristics:
(1) Each knowledge point is marked explicitly, and visual explanation and examples are provided;
(2) Revealing the links between knowledge points, helping learners build a framework understanding of the whole discipline;
(3) Based on the classification of the knowledge points, a customized learning path and in-depth material are provided to the learner.
By the method, the finally generated knowledge point labeling learning text becomes a learning tool which is highly organized and easy to navigate, and the understanding and the memorization of the knowledge points by learners are effectively promoted.
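The generation in step 40 can be sketched as attaching linkage and classification features to each knowledge point record and rendering an annotated text. The record layout below is an assumption for illustration, not the patent's actual data model:

```python
def annotate(knowledge_points):
    """Render each knowledge point with its classification feature and
    its linked knowledge points explicitly marked in the output text."""
    lines = []
    for kp in knowledge_points:
        links = ", ".join(kp["links"])
        lines.append(f"[{kp['category']}] {kp['name']} (related: {links})")
    return "\n".join(lines)

text = annotate([
    {"name": "food chain", "category": "biology/basic",
     "links": ["biodiversity", "niche"]},
])
print(text)
```

In a real system the rendered markers would drive navigation and adaptive path selection rather than plain text output.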
By comparing the pre-configured texts of different offline learning platforms and focusing on the matched identification words and sentences and their context, applying steps 10-40 makes it possible to capture the different perspectives and interpretation modes used to describe the same knowledge point. This helps to provide a more comprehensive knowledge base, which lays a solid foundation for subsequent feature stitching and knowledge point association analysis. By stitching the contextual features of the same identification words and sentences in the first offline learning platform and the second offline learning platform, a comprehensive, multidimensional knowledge representation can be created. This not only promotes consistency across platform content, but may also reveal new knowledge links and opportunities for in-depth understanding. By analyzing the distribution features of the feature-stitched identification words and sentences in the two different texts, it can be identified which knowledge points often appear together or are discussed jointly. This helps map out the network relationships between knowledge points, deepening understanding of the course structure and providing a more consistent and systematic learning experience for the learner. Using the information extracted in the previous steps, a learning text with knowledge point contact features can be created, in which each knowledge point is explicitly labeled and categorized. The knowledge point labeling learning text can serve as a powerful learning tool that supports the design of adaptive learning paths, helps learners select learning materials according to their own needs and preferences, and improves learning efficiency and effect.
By the design, the limitation of a single education resource can be overcome, and a more comprehensive and complementary offline learning knowledge point labeling learning text can be generated by integrating and optimizing the content from a plurality of offline learning platforms.
In some examples, step 20, that is, performing feature stitching on the identification words and sentences in the first offline learning platform pre-configured text and the identification words and sentences in the second offline learning platform pre-configured text based on the context deducing features corresponding to the identification words and sentences matched with the learning text units, is implemented through steps 21-22.
Step 21, for a first identification word and sentence in the first offline learning platform pre-configured text, determining a mask prediction result corresponding to the first mask recognition result in a second offline learning platform pre-configured text based on a distribution feature of a first mask recognition result corresponding to the first identification word and sentence and a context deducing feature corresponding to the first identification word and sentence.
Step 22, if a second mask recognition result corresponding to a second identification word and sentence belonging to the target classification feature exists in the mask prediction result, feature stitching is performed on the first identification word and sentence in the first offline learning platform pre-configured text and the second identification word and sentence in the second offline learning platform pre-configured text. The classification feature corresponding to the first identification word and sentence is the target classification feature.
In the above scenario, the mask recognition result (Masked Recognition Result) refers to a method for text analysis in which some portions of text are temporarily hidden or "masked" so that the model can predict or recognize the contents of the masked portions based on the remaining context. Such techniques are often used in natural language processing to improve the ability of a model to understand context. The masked prediction result (Masked Prediction Result) is a prediction that the model makes about the masked text based on a given context. For example, during the training process of machine learning, the model may need to predict the masked words or phrases in the sentence.
Suppose that biological lesson content on two different offline learning platforms is being processed. In the first offline learning platform, there is a paragraph about "photosynthesis", and the text surrounding this paragraph provides a detailed description about the photosynthesis process and importance. In this context, "photosynthesis" is the first identifying phrase sought.
In the text of the first offline learning platform, an algorithm may be used to "mask" the photosynthesis and generate a mask recognition result based on the text surrounding the sentence. Then, an attempt will be made to find similar context in the pre-configured text of the second offline learning platform in order to predict the mask prediction result of a possible match. This means that words and phrases with similar contextual characteristics are looked up in the second offline learning platform to determine if they correspond to the same knowledge point. Next, the prediction result of the second offline learning platform is checked to confirm whether or not there is an expression corresponding to "photosynthesis" therein. If there is a term belonging to the same classification feature (such as biological terms, key concepts, etc.), it is confirmed that this is a valid match. Once the match is confirmed, the "photosynthesis" of the first offline learning platform is feature stitched with the corresponding words and sentences of the second offline learning platform to create a more complete learning unit.
With this design, the mask technique forces the model to focus on context, which can improve its ability to predict unknown or missing information and thereby enhance its understanding of the text. The use of mask recognition and prediction results allows the same knowledge points to be mapped precisely between different learning platforms even when their expressions are not identical. Through feature stitching, the advantages of multiple learning platforms can be combined to provide the learner with a more comprehensive and coherent learning experience. This approach also helps to establish a unified standard between different learning platforms, making it easier to transform and integrate learning materials.
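A heavily simplified sketch of the matching in steps 21-22: the first identification phrase is "masked" and candidate phrases in the second text are scored by how well their surrounding words overlap the masked context. A production system would use a masked language model for the prediction; plain set overlap stands in for it here, and all names (`context_words`, `predict_match`) are assumptions:

```python
def context_words(text, phrase, radius=4):
    """Words within `radius` tokens of `phrase` (the masked position)."""
    toks = text.lower().split()
    if phrase not in toks:
        return set()
    i = toks.index(phrase)
    return set(toks[max(0, i - radius): i] + toks[i + 1: i + 1 + radius])

def predict_match(text1, phrase1, text2, candidates):
    """Pick the candidate in text2 whose context best matches the
    context of the masked phrase in text1; None if nothing overlaps."""
    ctx = context_words(text1, phrase1)
    best, best_score = None, 0
    for cand in candidates:
        score = len(ctx & context_words(text2, cand))
        if score > best_score:
            best, best_score = cand, score
    return best

t1 = "plants perform photosynthesis using sunlight and chlorophyll"
t2 = "using sunlight and chlorophyll leaves drive photosynthesis in plants"
match = predict_match(t1, "photosynthesis", t2,
                      ["photosynthesis", "respiration"])
print(match)  # photosynthesis
```

The matched candidate would then be checked against the target classification feature before feature stitching, as step 22 describes.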
In other examples, step 20, that is, based on the derived context features corresponding to the identification words and sentences matched with the learning text units, performs feature concatenation on the identification words and sentences in the first offline learning platform pre-configured text and the identification words and sentences in the second offline learning platform pre-configured text, which may also be implemented by the technical solutions described in steps 20 a-20 c.
Step 20a, for a first identification word and sentence in the first offline learning platform pre-configured text, determining a mask prediction result corresponding to the first mask recognition result in a second offline learning platform pre-configured text based on a distribution feature of a first mask recognition result corresponding to the first identification word and sentence and a context deducing feature corresponding to the first identification word and sentence.
And step 20b, performing mask size adjustment on the mask prediction result in the second offline learning platform pre-configured text.
And step 20c, if a second mask recognition result corresponding to a second identification word and sentence belonging to the target classification feature exists in the mask prediction result after the mask size adjustment, performing feature splicing on the first identification word and sentence in the first offline learning platform pre-configured text and the second identification word and sentence in the second offline learning platform pre-configured text.
In the above example, the mask sizing (Mask Size Adjustment) involves changing the length or number of masked (mask) portions. For example, only one word in the original text may be masked when predicting missing text, but to better accommodate the text of another offline learning platform, the mask may need to be sized to cover longer phrases or sentences, or vice versa, reducing the masked content. Doing so may help the model more accurately predict and match representations of similar content on different platforms.
It is assumed that two different offline learning platforms are provided, both of which provide learning content about the "industrial revolution", but in different expressions and depths.
On the first offline learning platform, there is a section detailing the invention of the "steam engine" and its impact on the industrial revolution. "Steam engine" is selected as the first identification phrase, and a mask recognition result is generated based on the text content in the vicinity of the phrase (e.g., inventor, influence, technical changes, etc.). On the second offline learning platform, the discussion of the "steam engine" may be more abbreviated or use different terminology. Here, the original mask prediction may need to be resized to match the content and expression in the second platform. For example, if the second platform focuses more on the "transformation of industrial production", it may be desirable to expand the mask range to cover this broader theme. Once the mask sizing is complete, it is checked whether the adjusted prediction finds the corresponding "steam engine" related description in the text of the second platform. If such a description is present and meets the target classification feature (i.e., a key technology point associated with the industrial revolution), then the "steam engine" related content of the two offline learning platforms can be feature stitched to create a comprehensive learning unit.
In this way, through feature stitching, the different perspectives and information of the two offline learning platforms regarding the industrial revolution can be integrated, providing students with a more comprehensive historical view. The feature-stitched material can provide interpretation and analysis from multiple angles, helping students better understand historical events and their background. Ensuring that important historical concepts and knowledge points are presented and interpreted consistently in different learning materials reduces confusion when switching between different offline learning platforms.
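The mask size adjustment of step 20b can be illustrated with a small sketch: the masked span around a center token is widened or narrowed to cover a longer phrase (e.g., expanding from one word to a broader theme) before re-running the prediction. The function name `mask_span` and the `[MASK]` marker are illustrative assumptions:

```python
def mask_span(tokens, center, size):
    """Replace `size` tokens centered on `center` with one [MASK] marker.
    Returns the masked token list and the tokens that were covered."""
    half = size // 2
    lo, hi = max(0, center - half), min(len(tokens), center + half + 1)
    return tokens[:lo] + ["[MASK]"] + tokens[hi:], tokens[lo:hi]

toks = "the steam engine transformed industrial production forever".split()

# Widen the mask from a single token to a 3-token span, so the prediction
# target covers the broader theme rather than only one word.
masked, covered = mask_span(toks, 2, 3)
print(covered)  # the tokens now hidden behind [MASK]
```

Shrinking the mask works the same way with a smaller `size`; the adjusted prediction is then checked against the target classification feature as in step 20c.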
Under some possible designs, the method further comprises steps 001-004.
And 001, if a third identification word and sentence which are not complete in feature splicing exists in the first offline learning platform pre-configured text, determining a global text description vector of the third identification word and sentence in the first offline learning platform pre-configured text based on a text block description vector of the third identification word and sentence in the first offline learning platform pre-configured text and a text knowledge logic vector of at least one fourth identification word and sentence which is already feature spliced in the first offline learning platform pre-configured text relative to the third identification word and sentence.
002, determining a global text description vector of a fifth identification word and sentence in an original offline learning platform pre-configured text based on a text block description vector of the fifth identification word and sentence in the original offline learning platform pre-configured text and a text knowledge logic vector of the sixth identification word and sentence which is subjected to feature splicing in the original offline learning platform pre-configured text relative to the fifth identification word and sentence; the classification characteristic corresponding to the fifth identification word and sentence is the same as the classification characteristic corresponding to the third identification word and sentence.
Step 003, determining a commonality score of the third identification word and the fifth identification word based on a global text description vector of the third identification word in the first offline learning platform pre-configured text and a global text description vector of the fifth identification word in the original offline learning platform pre-configured text.
And 004, if the commonality score is larger than a commonality score threshold, performing feature stitching on the third identification words and sentences in the preset text of the first offline learning platform and the fifth identification words and sentences in the preset text of the original offline learning platform.
Under the above design, the text block description vector (Text Block Description Vector) refers to a mathematical vector that is used to represent the content and characteristics of a piece of text (i.e., a text block). The elements in the vector may contain key information extracted from the text, such as the frequency of words, the presence of specific terms, etc., that numerically characterize the text block. The text knowledge logic vector (Text Knowledge Logic Vector) refers to a vector representing the logical relationship of knowledge points in text, such as the dependency or sequence between a knowledge point and other knowledge points. Such vectors help to understand the links between the different knowledge points. The global text description vector (Global Text Description Vector) refers to a vector describing a knowledge point in a broader text context, taking into account the role and importance of the knowledge point throughout the document or learning material. The commonality score (Commonality Score) is a metric used to evaluate the similarity of two knowledge points or text blocks in terms of content or features. If two knowledge points have a high commonality score, it means that they are interchangeable or similar to some extent.
It is assumed that two sets of junior high history learning materials are being processed, with the theme "industrial revolution" being involved.
In step 001, a text block associated with the "invention of the steam engine" is selected from the first set of learning materials, and its text block description vector is calculated. At the same time, the logical relations between this knowledge point and other knowledge points (such as "improvement of the textile industry") are considered to construct a global text description vector. In step 002, the "application of the steamship" knowledge point, which shares the same classification feature as the "invention of the steam engine", is found in another set of learning materials, and its text block description vector and global text description vector are likewise calculated. In step 003, a commonality score between the two knowledge points is calculated based on their global text description vectors. In step 004, if the commonality score exceeds a predetermined threshold, the two knowledge points are considered to have sufficient similarity in the different learning materials, feature stitching can be performed, and the two knowledge points are integrated into a unified knowledge point labeling learning text.
With this design, by constructing and comparing global text description vectors, students can better understand the position and importance of individual knowledge points in the larger historical context. Feature stitching allows students to see the similarities and differences between different learning materials, thereby achieving a deeper learning experience. By integrating learning resources from different sources, richer and more diversified learning materials can be created, improving learning efficiency. The global text description vector and the commonality score can also help teachers identify the core knowledge points in different learning materials, so as to provide more targeted learning support according to the needs of students.
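Under one plausible reading of steps 001-004, the global text description vector mixes a knowledge point's own text block vector with the logic vectors of its already stitched neighbours, and the commonality score is a cosine similarity between two such global vectors. The weighting scheme (`alpha`) and the toy vectors below are assumptions:

```python
import math

def combine(block_vec, logic_vecs, alpha=0.7):
    """Global vector = alpha * own block vector + (1 - alpha) * mean of
    neighbour logic vectors (assumes at least one neighbour)."""
    n = len(block_vec)
    mean = [sum(v[i] for v in logic_vecs) / len(logic_vecs) for i in range(n)]
    return [alpha * block_vec[i] + (1 - alpha) * mean[i] for i in range(n)]

def commonality(u, v):
    """Cosine similarity used as the commonality score of steps 003-004."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

g1 = combine([1.0, 0.0, 0.5], [[0.8, 0.1, 0.4]])
g2 = combine([0.9, 0.1, 0.6], [[0.7, 0.0, 0.5]])
score = commonality(g1, g2)
print(score > 0.9)  # highly similar points would be feature stitched
```

If `score` exceeds the commonality score threshold of step 004, the two knowledge points are stitched; otherwise they remain separate entries.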
In some alternatives of the present invention, step 30, that is, determining the knowledge point contact feature corresponding to the identification word and sentence completing feature stitching based on the distribution feature of a plurality of learning text units corresponding to the identification word and sentence completing feature stitching in the first offline learning platform pre-configured text and the distribution feature of the learning text units corresponding to the identification word and sentence completing feature stitching in the second offline learning platform pre-configured text may be implemented through steps 31-33.
And step 31, determining knowledge concept updating characteristics based on the distribution characteristics of a plurality of learning text units corresponding to the identification words and sentences completing feature splicing in the first offline learning platform pre-configured text and the distribution characteristics of the learning text units in the second offline learning platform pre-configured text.
And step 32, based on the knowledge concept updating features, determining first relative distribution features of the identification words and sentences subjected to feature splicing in a first word vector space corresponding to the first offline learning platform pre-configured text and second relative distribution features of the identification words and sentences subjected to feature splicing in a second word vector space corresponding to the second offline learning platform pre-configured text.
And step 33, determining knowledge point contact features corresponding to the identification words and sentences for completing feature splicing based on page number differences between the page states corresponding to the first offline learning platform pre-configured text and the page states corresponding to the second offline learning platform pre-configured text, the first relative distribution features and the second relative distribution features.
In the above alternatives, the knowledge concept update feature (Knowledge Concept Update Feature) refers to that in two different learning platforms, there may be updates or changes to the content or expression of the same knowledge point, and this feature reflects the latest description, definition or understanding of the knowledge concept in different texts. The word vector space (Word Vector Space) refers to the fact that in natural language processing, vocabulary is converted into a mathematical vector form for computer processing. In such a space, each vocabulary has its corresponding vector representation, which can typically be used to capture semantic relationships between the vocabularies. Page status may refer to the content of a page in the learning material, including the layout and organization of all elements, such as text, images, etc. The difference in page numbers may then refer to the difference between the numbers of the pages in which the corresponding content is located in the two different learning materials.
Assume that two sets of junior middle school history learning materials, each from a different publishing company, are being processed, and it is desirable to integrate the chapters therein related to "ancient Egypt civilization".
First, the distribution characteristics of the identification words and sentences about the "pyramid building technique" in the two sets of learning materials, such as their frequency of occurrence, associated knowledge points and respective content update conditions, are analyzed to determine the knowledge concept update features. Next, based on the knowledge concept update features, words and sentences related to the "pyramid building technique" are found in the first set of learning materials and their distribution features in the first word vector space are determined. Similarly, the same operation is performed in the second set of learning materials to determine the distribution features in the second word vector space. Finally, considering that the content related to the "pyramid building technique" may appear on different pages in the two sets of learning materials, the knowledge point contact feature of the "pyramid building technique" is determined based on the page number difference and the word vector distribution features obtained in step 32, that is, how mutual references and connections of the knowledge point are established across the two sets of learning materials.
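As an illustrative sketch only (the patent does not prescribe a concrete formula), the knowledge point contact feature of steps 32-33 can be approximated as the cosine similarity between the two relative distribution vectors, attenuated by the page number difference; the function names, the `decay` parameter and the multiplicative weighting scheme are assumptions, not the claimed method:

```python
import math

def cosine(u, v):
    # Cosine similarity between two distribution vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def contact_score(vec_a, vec_b, page_a, page_b, decay=0.1):
    # Semantic closeness of the feature-spliced identification words/sentences
    sim = cosine(vec_a, vec_b)
    # The page number difference attenuates the strength of the link
    weight = 1.0 / (1.0 + decay * abs(page_a - page_b))
    return sim * weight
```

With identical vectors on the same page the score is 1.0; a larger page gap only shrinks the score, it never changes its sign.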
In this way, by comparing the latest descriptions and understanding of knowledge points in different learning materials, students are able to obtain a more comprehensive and deep learning experience. The determination of knowledge concept update features helps to ensure that students are exposed to up-to-date academic research and historical knowledge. Through analysis of page number differences and word vector space, a teacher can more easily switch between different learning materials, thereby providing cross-text citations and references for students. Analysis of page states helps to understand the organization logic of different learning materials, making the learning design more accurate and orderly.
In other optional embodiments, before determining the knowledge point contact feature corresponding to the identification word and sentence for completing feature stitching in step 33, that is, based on the page number difference between the page state corresponding to the first offline learning platform pre-configured text and the page state corresponding to the second offline learning platform pre-configured text, the first relative distribution feature and the second relative distribution feature, the method further includes steps 301-303.
Step 301, determining a first text content output state of a first offline page associated with the first offline learning platform pre-configured text under an offline learning task corresponding to the first offline learning platform pre-configured text, and taking the first text content output state as a page state corresponding to the first offline learning platform pre-configured text.
Step 302, determining a second text content output state of a second offline page associated with the second offline learning platform pre-configured text under an offline learning task corresponding to the second offline learning platform pre-configured text, and taking the second text content output state as a page state corresponding to the second offline learning platform pre-configured text.
Step 303, determining a page number difference between a page state corresponding to the first offline learning platform pre-configured text and a page state corresponding to the second offline learning platform pre-configured text based on the first text content output state and the second text content output state.
In the above embodiment, the page state (Page Status) refers to the state of a specific educational content page at a given time or after completion of a specific learning task. This may include a combination of text content, images, notes, highlighting and other information on the page. The page number difference (Page Number Difference) refers to the numerical difference between the pages of the same or related knowledge points across different learning materials or platforms. Since chapter ordering or arrangement may differ between publishers, even content related to the same topic may appear on different page numbers.
Assume that the content of the unit "internal structure of the earth" in two sets of junior middle school geography learning materials is being processed.
In step 301, the content of the first set of learning materials about the "crust, mantle, and core" is reviewed to determine the state of the page on which that content is located after completion of the associated learning task. This may include text content, related charts, student notes, and the like. In step 302, the pages of the second set of learning materials concerning the internal structure of the earth are likewise reviewed and their output states are recorded, including the information provided by the learning materials and any additional learning aid materials. In step 303, the states of the corresponding pages in the two sets of learning materials are compared and the page number difference between them is determined. For example, a first book may introduce the crust on page 25, while a second book introduces the same content on page 32.
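A minimal sketch of step 303, assuming each text content output state is recorded as a dictionary that carries, among other things, the page number of the output content (the field names are illustrative assumptions):

```python
def page_number_difference(first_state, second_state):
    # Each page state records, among other fields, the page number
    # on which the relevant content is output (steps 301-302).
    return abs(first_state["page_number"] - second_state["page_number"])

# Hypothetical page states from the "crust" example above
first = {"page_number": 25, "notes": ["crust definition"]}
second = {"page_number": 32, "notes": ["crust diagram"]}
```

For the pages 25 and 32 of the example, the difference is 7.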
In this way, consistency of learning material content can be promoted, learning path design can be optimized, knowledge transfer capability across learning materials can be improved, and maximized utilization of learning resources can be supported.
In some examples, step 10, that is, obtaining the context derivation features between the identification words and sentences in the first offline learning platform pre-configured text that match the learning text units in the second offline learning platform pre-configured text, includes steps 11-12.
Step 11, determining a plurality of learning text unit tuples corresponding to the learning text unit matched identification words and sentences based on the identification words and sentences of the first offline learning platform preset text and the learning text unit matched identification words and sentences in the second offline learning platform preset text, wherein each learning text unit tuple comprises a first learning text unit in the learning text unit matched identification words and sentences in the first offline learning platform preset text and a second learning text unit in the learning text unit matched identification words and sentences in the second offline learning platform preset text.
And step 12, determining context deducing features corresponding to the matched identification words and sentences of the learning text units based on the distribution features of the first learning text unit and the distribution features of the second learning text unit in the plurality of learning text unit tuples.
In the above example, the learning text unit tuple (Learning Text Unit Tuple) refers to a pair of learning text units, one from the pre-configured text of the first offline learning platform and the other from the pre-configured text of the second offline learning platform. The two text units are considered to be matched, i.e. they contain the same or very similar knowledge points or information content.
Assume that two sets of junior middle school chemistry learning materials, each from a different publishing company, are being processed, and it is desirable to integrate the sections therein concerning "atomic structure".
In the first set of learning materials, a paragraph describing the "electron cloud model" is found, while in the second set of learning materials, a paragraph describing the "application of quantum mechanics in atomic models" is found. Although the headings and expressions of the two paragraphs differ, both teach the distribution of electrons within atoms, so the two paragraphs are treated as a learning text unit tuple. The distribution characteristics of the two learning text units in their respective learning materials are then analyzed, such as in which section they appear and with which other knowledge points they are closely related. Based on these distribution features, the context deducing features between the two text units are determined, e.g. both emphasize the probability distribution of electrons and the importance of this concept for understanding chemical bonds.
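One hedged way to realize steps 11-12 is to score each learning text unit tuple by the overlap of the two units' associated knowledge points (a Jaccard-style measure over the distribution features); the dictionary layout and the choice of measure are illustrative assumptions, not the patent's prescription:

```python
def context_deducing_feature(unit_a, unit_b):
    # Distribution features reduced to the set of associated knowledge points
    shared = unit_a["related"] & unit_b["related"]
    union = unit_a["related"] | unit_b["related"]
    overlap = len(shared) / len(union) if union else 0.0
    return {"tuple": (unit_a["title"], unit_b["title"]),
            "shared_points": sorted(shared),
            "overlap": overlap}

# Hypothetical units from the "atomic structure" example above
a = {"title": "electron cloud model",
     "related": {"electron distribution", "chemical bond"}}
b = {"title": "quantum mechanics in atomic models",
     "related": {"electron distribution", "probability"}}
feat = context_deducing_feature(a, b)
```

Here the two units share "electron distribution", giving an overlap of 1/3 over the three distinct knowledge points.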
By the design, knowledge understanding of the cross-learning materials can be promoted, adaptability of learning resources is improved, and butt joint of learning contents is optimized.
In other examples, step 10, that is, obtaining the context derivation features between the identification words and sentences in the first offline learning platform pre-configured text that match the learning text units in the second offline learning platform pre-configured text, includes steps 10 a-10 c.
Step 10a, for a first target identification word and sentence whose context deducing feature is to be determined in the first offline learning platform pre-configured text, determining a first target sentence cluster corresponding to the first target identification word and sentence.
Step 10b, determining a context deducing feature between the first target sentence cluster and the second target sentence cluster based on a plurality of learning text unit tuples with corresponding relations in the first target sentence cluster and the second target sentence cluster, wherein the second target sentence cluster is a sentence cluster corresponding to the first target sentence cluster in the second offline learning platform pre-configured text.
Step 10c, taking the context deducing features between the first target sentence cluster and the second target sentence cluster as the context deducing features between the first target identification words and sentences in the first offline learning platform pre-configured text and the second target identification words and sentences in the second offline learning platform pre-configured text; and the second target identification words and sentences are identification words and sentences of which learning text units in the second offline learning platform pre-configured text are located in the second target sentence cluster.
In the above example, the first target sentence cluster (First Target Sentence Cluster) refers to a collection of related sentences clustered around a particular identifying sentence in the first offline learning platform pre-configured text. These sentences are grouped together as being commonly associated with a topic or concept.
Assume that two sets of junior middle school chemistry learning materials, in particular the content of the unit "atomic structure", are being processed.
In step 10a, all sentences associated with the identification word "nucleus" are identified in the first set of learning materials to form a first target sentence cluster. This sentence cluster may include descriptions of the definition of the nucleus, its composition, its role in the atomic structure, and so on. In step 10b, sentence clusters associated with the "nucleus" are found in the second set of learning materials, and correspondences between learning text units in the two sentence clusters are determined. A context deducing feature, i.e. how the expressions of the two sets of learning materials relate to each other at the knowledge point "nucleus", is then determined based on these correspondences. In step 10c, the context deducing feature between the first target sentence cluster and the second target sentence cluster is used as a feature measuring the relationship between the "nucleus" identification words and sentences in the first and second sets of learning materials.
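The aggregation in step 10b can be sketched, under the assumption that each learning text unit tuple in correspondence already carries a unit-level similarity score, as a simple mean over the cluster's tuples (the averaging choice and the tuple layout are illustrative):

```python
def cluster_context_feature(unit_tuples):
    # unit_tuples: list of (first_unit, second_unit, unit_similarity)
    # drawn from the correspondences between the two sentence clusters.
    # The cluster-level context deducing feature is taken as the mean
    # of the unit-level similarities (a deliberately simple choice).
    if not unit_tuples:
        return 0.0
    return sum(sim for _, _, sim in unit_tuples) / len(unit_tuples)

# Hypothetical correspondences from the "nucleus" example above
pairs = [("nucleus definition", "nucleus overview", 0.8),
         ("proton count", "atomic number", 0.6)]
```

An empty correspondence list yields 0.0, so clusters with no matched units contribute no contact.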
The design is beneficial to establishing a bridge of knowledge points among different learning materials, so that the complementation among different learning material sources is realized.
Under some other design considerations, before obtaining the context derivation features between the identification words and sentences in the first offline learning platform pre-configured text and the identification words and sentences in the second offline learning platform pre-configured text in step 10, the method further includes steps 101-103.
Step 101, based on text thermal identification data corresponding to a first offline learning platform pre-configured text and distribution characteristics of all identification words and sentences in the first offline learning platform pre-configured text, clustering the identification words and sentences in the first offline learning platform pre-configured text, and determining sentence clusters in the first offline learning platform pre-configured text.
Step 102, based on text thermal identification data corresponding to the second offline learning platform pre-configured text and distribution characteristics of all the identification words and sentences in the second offline learning platform pre-configured text, clustering the identification words and sentences in the second offline learning platform pre-configured text, and determining sentence clusters in the second offline learning platform pre-configured text.
And step 103, determining corresponding sentence clusters in the first offline learning platform pre-configured text and the second offline learning platform pre-configured text based on a learning text unit association result of the first offline learning platform pre-configured text and the second offline learning platform pre-configured text.
The text thermal recognition data (Text Heatmap Recognition Data) refers to data obtained by analyzing how text content is read or interacted with, such as which parts are frequently read, annotated or marked. Such data can typically be represented as a graphical heat map in which the shade of color represents the attention paid to, or importance of, a portion of text.
Further, a Sentence Cluster (Sentence Cluster) refers to a collection of related sentences classified according to certain characteristics (e.g., topic similarity, frequency of use, etc.). The sentences in each cluster have a certain common feature that distinguishes them from sentences in other clusters.
Suppose that the content of two sets of junior middle school English learning materials concerning the unit "environmental protection" is being processed.
In step 101, the text thermal recognition data about "environmental protection" in the first set of learning materials, such as which paragraphs are most frequently read or annotated by students, is analyzed, and related sentences are clustered based on that data and on the distribution characteristics of identification sentences such as "recycle" and "save resources". In step 102, the related content in the second set of learning materials is analyzed in a similar manner and sentence clusters are likewise generated; these clusters may be organized around topics such as "renewable energy" and "pollution reduction". In step 103, the corresponding sentence clusters in the two sets of learning materials are determined from the sentence clusters generated in the previous two steps. For example, a cluster around "recycle" in the first set of materials may correspond to a cluster around "renewable energy" in the second set.
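A minimal sketch of the clustering in steps 101-102, assuming the theme of each identification sentence and its heat count are already available (a real implementation would derive both from the text thermal recognition data; the grouping-by-theme rule is an illustrative stand-in for a full clustering algorithm):

```python
def cluster_identification_sentences(sentence_themes, heat):
    # sentence_themes: {sentence: theme keyword}
    # heat: {sentence: read/annotation count from the heat-map data}
    clusters = {}
    for sentence, theme in sentence_themes.items():
        clusters.setdefault(theme, []).append(sentence)
    # Within each cluster, order sentences by descending text heat
    for members in clusters.values():
        members.sort(key=lambda s: -heat.get(s, 0))
    return clusters

# Hypothetical data from the "environmental protection" example above
themes = {"Recycling cuts waste.": "recycle",
          "Sorting rubbish helps recycling.": "recycle",
          "Turn off unused lights.": "save resources"}
heat = {"Recycling cuts waste.": 3,
        "Sorting rubbish helps recycling.": 9,
        "Turn off unused lights.": 5}
clusters = cluster_identification_sentences(themes, heat)
```

The hottest sentence of each theme surfaces first, which matches the intuition that the heat map marks the most-attended content.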
By the design, the interactivity among knowledge points can be improved, and the correlation of learning contents can be enhanced, so that the learning contents can be analyzed and integrated more accurately.
In some other possible embodiments, the method further comprises: if sentence clusters with the number of the learning text units smaller than a first threshold exist in the first offline learning platform pre-configured text or the second offline learning platform pre-configured text, expanding the sentence clusters so that the number of the learning text units in the expanded sentence clusters is not smaller than the first threshold.
Assume that the content of the sections on "photosynthesis" in two sets of junior middle school biology learning materials is being processed.
In this offline learning scenario, the first threshold means that a sentence cluster must contain at least 5 learning text units to be considered complete. If only 3 learning text units are found in the paragraph of the first set of learning materials related to "photosynthesis", then this sentence cluster needs to be expanded, which means adding additional sentences or paragraphs until the cluster reaches or exceeds 5 learning text units. A specific implementation may include: identifying the definition, process, importance and influence factors of "photosynthesis" related to the current sentence cluster; and, if the amount of such content does not reach the first threshold, continuing to find and add related information, such as the role of "photosynthesis" in nature and its relationship to respiration, until the threshold requirement is met.
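The expansion rule above can be sketched as follows, with the first threshold fixed at 5 as in the example; the candidate list is assumed to arrive pre-sorted by relevance from an upstream step, which the patent does not specify:

```python
FIRST_THRESHOLD = 5  # minimum learning text units per sentence cluster

def expand_sentence_cluster(cluster, candidates, threshold=FIRST_THRESHOLD):
    # Append related candidate units (assumed pre-sorted by relevance)
    # until the cluster reaches the first threshold.
    expanded = list(cluster)
    for unit in candidates:
        if len(expanded) >= threshold:
            break
        if unit not in expanded:
            expanded.append(unit)
    return expanded

# Hypothetical "photosynthesis" cluster with only 3 units
cluster = ["definition", "process", "importance"]
candidates = ["influence factors", "role in nature", "link to respiration"]
result = expand_sentence_cluster(cluster, candidates)
```

Starting from 3 units and 3 candidates, expansion stops as soon as the threshold of 5 is reached, leaving the original units in place.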
In this way, by setting the first threshold, it can be ensured that each sentence cluster has enough information quantity, and it is ensured that complete knowledge point understanding is obtained. Cross-text contact may also be promoted to accommodate diverse learning needs.
In some examples, the first offline learning platform pre-configured text and the second offline learning platform pre-configured text are in a same pre-configured text set. Based on this, the method further comprises steps 51-54.
And 51, deleting the learning text units in the first noise text set in the first offline learning platform pre-configured text based on the distribution characteristics of the first noise text set indicated by the noise discrimination point of the first offline learning platform pre-configured text.
And 52, deleting the learning text units in the second noise text set in the second offline learning platform pre-configured text based on the distribution characteristics of the second noise text set indicated by the noise discrimination point of the second offline learning platform pre-configured text.
And step 53, correlating the learning text units in the pre-configured text of the first offline learning platform with which the noise deletion is completed with the learning text units in the pre-configured text of the second offline learning platform with which the noise deletion is completed.
And 54, if the number of the associated learning text unit tuples is greater than a second threshold, determining that the first offline learning platform pre-configured text and the second offline learning platform pre-configured text are in the same pre-configured text set.
In the above examples, the noise discrimination point (Noise Discrimination Perspective) refers to a standard or perspective for distinguishing valuable content from irrelevant or low-quality content. In educational materials, this may involve identifying which content is filler information irrelevant to the learning objective, i.e. "noise". The noise text set (Noise Text Set) refers to a collection of text portions identified as noise; text units in this set are generally considered either unhelpful to the learning process or liable to interfere with learning. The threshold refers to a predetermined numerical criterion used to determine whether to perform an operation or make a decision. In this example, if the number of associated learning text unit tuples exceeds the set threshold, the two sets of pre-configured texts may be considered to belong to the same pre-configured text set.
Assume that the sections on "cell structure and function" in two sets of junior middle school biology learning materials are being processed.
In step 51, by analyzing the content in the first set of learning materials, noise information that is irrelevant to the learning objective, such as repeated definition, excessively simplified interpretation, etc., is identified, a first noise text set is formed, and is deleted from the valid learning content. In step 52, the content in the second set of learning materials is processed in a similar manner to remove noisy text elements therefrom to ensure the quality of the remaining content. In step 53, the learning text units in the two sets of noise cleaned learning materials are correlated, such as by correlating the "structure of cell membranes" in the first set of books with the "function and composition of cell membranes" in the second set of books. If in step 54, a sufficient number (exceeding a set threshold) of matched learning text element doublets can be found, it can be determined that the two sets of learning materials have sufficient similarity in teaching "cell structure and function" so they can be considered to belong to the same pre-configured text set.
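Steps 51-54 can be sketched as set subtraction followed by a threshold test; the unit names and the representation of associations as pairs are illustrative assumptions:

```python
def in_same_preconfigured_set(first_units, second_units,
                              first_noise, second_noise,
                              associations, second_threshold):
    # Steps 51-52: delete the noise text sets from both texts
    first_clean = set(first_units) - set(first_noise)
    second_clean = set(second_units) - set(second_noise)
    # Step 53: keep only associations whose members survived cleaning
    surviving = [(a, b) for a, b in associations
                 if a in first_clean and b in second_clean]
    # Step 54: the tuple count must exceed the second threshold
    return len(surviving) > second_threshold, surviving

# Hypothetical data from the "cell structure and function" example
first = ["cell membrane structure", "nucleus role", "repeated definition"]
second = ["cell membrane function", "nucleus overview", "filler aside"]
verdict, kept = in_same_preconfigured_set(
    first, second,
    first_noise=["repeated definition"], second_noise=["filler aside"],
    associations=[("cell membrane structure", "cell membrane function"),
                  ("nucleus role", "nucleus overview")],
    second_threshold=1)
```

With 2 surviving tuples against a threshold of 1, the two texts are judged to belong to the same pre-configured text set.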
In this way, the accuracy of learning material content can be improved by deleting noisy text, and consistency across learning materials can also be facilitated. By determining matched learning text units among different learning materials, a unified knowledge framework is constructed, and conversion among different learning resources is ensured. In addition, according to the association condition of the text units, personalized learning paths are designed, and the dominant content in different learning materials is combined to meet different learning requirements.
In some possible embodiments, the first offline learning platform pre-configured text and the second offline learning platform pre-configured text are obtained from an initial set of pre-configured text. Based thereon, the method further comprises: and grouping the offline learning platform pre-configured texts to be processed based on the extracted page numbers corresponding to the offline learning platform pre-configured texts and chapter labels corresponding to the offline learning platform pre-configured texts to obtain at least one initial pre-configured text set. The difference of page numbers between the offline pages corresponding to any two offline learning platform pre-configured texts in the same initial pre-configured text set is not greater than a set difference threshold, and the difference of chapter labels corresponding to any two offline learning platform pre-configured texts is not greater than a third threshold.
In the above embodiment, the extracted page number (Extraction Page Number) refers to the number of a specific page extracted from a learning material or other learning document. These numbers are typically used to identify specific locations in the document, helping to manage and reference the learning material content. Chapter tags (Chapter Labels) refer to names or numbers assigned to individual chapters in a learning material, used to quickly identify and locate learning content. The third threshold (Third Threshold) refers to the maximum difference allowed when comparing the chapter tags of different texts. This threshold is used to ensure that the chapter content of the grouped offline learning platform pre-configured texts is close enough to facilitate efficient comparison and integration.
Assume that the content of chapters about "ancient Egypt civilization" in a plurality of sets of junior middle school history learning materials is being processed.
For example, a page number and a chapter tag are extracted from each set of learning materials for the content related to "ancient Egypt civilization". The related content in the learning materials is then grouped according to the page numbers and the chapter tags to form initial pre-configured text sets. For example, all pages that mention "Pharaohs" may be grouped into one set, while pages that mention "pyramid building" are grouped into another. Within a set, the page number difference between any two offline learning platform pre-configured texts is not greater than the set difference threshold, and the chapter tag difference is not greater than the third threshold, which ensures the content correlation within the group.
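A hedged sketch of this grouping rule, assuming each pre-configured text is summarized by a numeric page number and a numeric chapter label index; the greedy first-fit strategy is an assumption, since the patent only states the two threshold constraints:

```python
def group_preconfigured_texts(texts, page_threshold, third_threshold):
    # texts: list of (name, page_number, chapter_label_index).
    # A text joins the first group in which EVERY member is within
    # both thresholds; otherwise it starts a new group.
    groups = []
    for name, page, chapter in texts:
        for group in groups:
            if all(abs(page - p) <= page_threshold and
                   abs(chapter - c) <= third_threshold
                   for _, p, c in group):
                group.append((name, page, chapter))
                break
        else:
            groups.append([(name, page, chapter)])
    return groups

# Hypothetical "ancient Egypt civilization" excerpts
texts = [("book A Pharaohs", 40, 3), ("book B Pharaohs", 42, 3),
         ("book A pyramids", 58, 4)]
groups = group_preconfigured_texts(texts, page_threshold=5, third_threshold=0)
```

The two "Pharaohs" excerpts fall within both thresholds of each other and share a group, while the "pyramids" excerpt, 18 pages away, starts its own.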
By the design, the matching precision of the learning materials can be improved, the combined or compared learning contents can be ensured to be highly correlated in the theme by using the page numbers and the chapter labels as grouping basis, and uncorrelated materials are prevented from being mixed together. The difference threshold and the third threshold are set to help screen out learning materials with more consistent structure and content, so that the integrated resources are more unified and consistent. The learning materials are logically grouped according to page numbers and chapter labels, so that the links among materials of different sources can be tracked, and systematic learning is promoted.
Further, there is also provided a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the above-described method.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus and method embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A knowledge point labeling method based on natural language processing, which is characterized by being applied to an offline training platform system, the method comprising:
acquiring context deducing features between identification words and sentences matched with learning text units in a first offline learning platform preset text and a second offline learning platform preset text; the first offline learning platform pre-configured text and the second offline learning platform pre-configured text are different offline learning platform pre-configured texts extracted from two offline pages with page number differences not larger than a set difference threshold;
based on the context deducing characteristics corresponding to the identification words and sentences matched with the learning text units, performing characteristic splicing on the identification words and sentences in the first offline learning platform pre-configured text and the identification words and sentences in the second offline learning platform pre-configured text, wherein the two identification words and sentences subjected to characteristic splicing represent the same identification words and sentences in the learning resource;
Determining knowledge point contact features corresponding to the identification words and sentences completing feature splicing based on the distribution features of a plurality of learning text units corresponding to the identification words and sentences completing feature splicing in the first offline learning platform pre-configured text and the distribution features of the learning text units corresponding to the identification words and sentences completing feature splicing in the second offline learning platform pre-configured text;
and generating a knowledge point labeling learning text based on the knowledge point contact features of the identification words and sentences subjected to feature splicing and the classification features corresponding to the identification words and sentences subjected to feature splicing.
2. The method of claim 1, wherein the feature stitching the identifying words and sentences in the first offline learning platform pre-configured text with the identifying words and sentences in the second offline learning platform pre-configured text based on the context derivation features corresponding to the identifying words and sentences in the learning text unit matches comprises:
determining a mask prediction result corresponding to the first mask recognition result in a second offline learning platform pre-configured text based on the distribution characteristic of the first mask recognition result corresponding to the first identification word and sentence and the context deduction characteristic corresponding to the first identification word and sentence for the first identification word and sentence in the first offline learning platform pre-configured text;
If a second mask recognition result corresponding to a second identification word and sentence belonging to the target classification feature exists in the mask prediction result, performing feature splicing on the first identification word and sentence in the first offline learning platform pre-configured text and the second identification word and sentence in the second offline learning platform pre-configured text; the classification feature corresponding to the first identification word and sentence is the target classification feature.
3. The method of claim 1, wherein performing feature splicing between the identification words and sentences in the first offline learning platform pre-configured text and the identification words and sentences in the second offline learning platform pre-configured text based on the context derivation features corresponding to the identification words and sentences matched by learning text units comprises:
for a first identification word and sentence in the first offline learning platform pre-configured text, determining a mask prediction result in the second offline learning platform pre-configured text corresponding to a first mask recognition result of the first identification word and sentence, based on the distribution feature of the first mask recognition result and the context derivation feature corresponding to the first identification word and sentence;
performing mask size adjustment on the mask prediction result in the second offline learning platform pre-configured text; and
if a second mask recognition result corresponding to a second identification word and sentence belonging to a target classification feature exists in the size-adjusted mask prediction result, performing feature splicing on the first identification word and sentence in the first offline learning platform pre-configured text and the second identification word and sentence in the second offline learning platform pre-configured text.
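The mask-prediction step shared by claims 2 and 3 can be sketched as follows. The claims do not fix a representation, so this is a deliberately reduced model: mask spans become (start, end) offsets, the context derivation feature becomes a scalar shift, and claim 3's "mask size adjustment" becomes a symmetric widening. All identifiers are hypothetical.

```python
# Hypothetical sketch of claims 2 and 3: project the first phrase's mask
# span into the second text via a context-derivation offset, optionally
# widen it (claim 3's mask size adjustment), and splice only when a
# second-text phrase of the target classification overlaps the prediction.

def predict_mask(first_span, offset, grow=0):
    """Project (start, end) into the second text; grow widens both ends."""
    s, e = first_span
    return (s + offset - grow, e + offset + grow)

def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]

def splice_targets(first_span, offset, second_phrases, target_class, grow=0):
    """Second-text phrases of the target class hit by the predicted mask."""
    pred = predict_mask(first_span, offset, grow)
    return [p["id"] for p in second_phrases
            if p["cls"] == target_class and overlaps(pred, p["span"])]

phrases = [{"id": "q2", "span": (21, 29), "cls": "geometry"},
           {"id": "q3", "span": (40, 48), "cls": "geometry"}]
print(splice_targets((2, 10), 10, phrases, "geometry"))          # → []
print(splice_targets((2, 10), 10, phrases, "geometry", grow=4))  # → ['q2']
```

The second call shows the point of claim 3: the unadjusted prediction (12, 20) misses q2, while the widened mask (8, 24) recovers the overlap.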
4. The method of claim 2 or 3, wherein the method further comprises:
if a third identification word and sentence for which feature splicing has not been completed exists in the first offline learning platform pre-configured text, determining a global text description vector of the third identification word and sentence in the first offline learning platform pre-configured text based on a text block description vector of the third identification word and sentence in that text and a text knowledge logic vector, relative to the third identification word and sentence, of at least one fourth identification word and sentence that has completed feature splicing in that text;
determining a global text description vector of a fifth identification word and sentence in an original offline learning platform pre-configured text based on a text block description vector of the fifth identification word and sentence in that text and a text knowledge logic vector, relative to the fifth identification word and sentence, of at least one sixth identification word and sentence that has completed feature splicing in that text, wherein the classification feature corresponding to the fifth identification word and sentence is the same as the classification feature corresponding to the third identification word and sentence;
determining a commonality score of the third identification word and sentence and the fifth identification word and sentence based on the global text description vector of the third identification word and sentence in the first offline learning platform pre-configured text and the global text description vector of the fifth identification word and sentence in the original offline learning platform pre-configured text; and
if the commonality score is greater than a commonality score threshold, performing feature splicing on the third identification word and sentence in the first offline learning platform pre-configured text and the fifth identification word and sentence in the original offline learning platform pre-configured text.
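Claim 4's fallback path can be sketched compactly. The claim leaves both the vector aggregation and the commonality metric open, so the mean and cosine similarity below are assumptions, as are all names (`global_vector`, `should_splice`).

```python
# Hypothetical sketch of claim 4: a phrase that failed feature splicing gets
# a global text description vector (its own block vector averaged with the
# knowledge-logic vectors of already-spliced neighbours); two such vectors
# are compared by a cosine commonality score against a threshold.
from math import sqrt

def global_vector(block_vec, neighbor_vecs):
    """Average the block vector with neighbour knowledge-logic vectors
    (the mean is an assumption; the claim leaves aggregation open)."""
    vecs = [block_vec] + list(neighbor_vecs)
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def commonality(u, v):
    """Cosine similarity used as the commonality score."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def should_splice(third, fifth, threshold=0.8):
    """Splice the third and fifth phrases when commonality > threshold."""
    g3 = global_vector(third["block"], third["neighbors"])
    g5 = global_vector(fifth["block"], fifth["neighbors"])
    return commonality(g3, g5) > threshold

third = {"block": [1.0, 0.0], "neighbors": [[1.0, 0.2]]}
fifth = {"block": [1.0, 0.1], "neighbors": [[1.0, 0.1]]}
print(should_splice(third, fifth))  # → True
```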
5. The method of claim 1, wherein determining the knowledge point contact features corresponding to the identification words and sentences completing feature splicing based on the distribution features of the plurality of learning text units corresponding to those identification words and sentences in the first offline learning platform pre-configured text and the distribution features of the corresponding learning text units in the second offline learning platform pre-configured text comprises:
determining knowledge concept updating features based on the distribution features of the plurality of learning text units corresponding to the identification words and sentences completing feature splicing in the first offline learning platform pre-configured text and the distribution features of those learning text units in the second offline learning platform pre-configured text;
determining, based on the knowledge concept updating features, first relative distribution features of the feature-spliced identification words and sentences in a first word vector space corresponding to the first offline learning platform pre-configured text and second relative distribution features of the feature-spliced identification words and sentences in a second word vector space corresponding to the second offline learning platform pre-configured text; and
determining the knowledge point contact features corresponding to the identification words and sentences completing feature splicing based on a page number difference between a page state corresponding to the first offline learning platform pre-configured text and a page state corresponding to the second offline learning platform pre-configured text, the first relative distribution features and the second relative distribution features;
wherein, before determining the knowledge point contact features based on the page number difference, the first relative distribution features and the second relative distribution features, the method further comprises: determining a first text content output state of a first offline page associated with the first offline learning platform pre-configured text under an offline learning task corresponding to that text, and taking the first text content output state as the page state corresponding to the first offline learning platform pre-configured text; determining a second text content output state of a second offline page associated with the second offline learning platform pre-configured text under an offline learning task corresponding to that text, and taking the second text content output state as the page state corresponding to the second offline learning platform pre-configured text; and determining the page number difference between the two page states based on the first text content output state and the second text content output state.
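The page-gap weighting of claim 5 can be sketched as follows. The claim only says the page number difference is taken into account; reducing the page state to a plain page index and attenuating by exp(-gap) are illustrative assumptions, and the function names are hypothetical.

```python
# Hypothetical sketch of claim 5's final step: combine the two relative
# distribution features, attenuated by the page gap between the offline
# pages associated with the two pre-configured texts.
import math

def page_gap(first_page_state, second_page_state):
    """Page number difference between the two text-content output states,
    reduced here to plain page indices (an assumption)."""
    return abs(first_page_state - second_page_state)

def contact_feature(rel_first, rel_second, first_page_state, second_page_state):
    """Dot product of the relative distribution features, decayed by the
    page gap; the exp(-gap) form is illustrative only."""
    sim = sum(a * b for a, b in zip(rel_first, rel_second))
    return sim * math.exp(-page_gap(first_page_state, second_page_state))

print(contact_feature([0.6, 0.8], [0.6, 0.8], 12, 12))  # same page: no attenuation
```

Identical relative features on the same page give the full score; each page of separation shrinks it, so knowledge points spliced across distant pages contribute weaker contact features.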
6. The method of claim 1, wherein obtaining the context derivation features between the identification words and sentences in the first offline learning platform pre-configured text and the second offline learning platform pre-configured text that are matched by learning text units comprises:
determining a plurality of learning text unit tuples corresponding to the matched identification words and sentences based on the identification words and sentences of the first offline learning platform pre-configured text and the matched identification words and sentences in the second offline learning platform pre-configured text, wherein each learning text unit tuple comprises a first learning text unit from the matched identification words and sentences in the first offline learning platform pre-configured text and a second learning text unit from the matched identification words and sentences in the second offline learning platform pre-configured text; and
determining the context derivation features corresponding to the matched identification words and sentences based on the distribution features of the first learning text units and the distribution features of the second learning text units in the plurality of learning text unit tuples.
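Claim 6's tuple construction and derivation step can be sketched minimally. Matching units by a shared token and summarizing the tuples as a mean positional offset are both assumptions; the claims leave the matching criterion and the estimator open.

```python
# Hypothetical sketch of claim 6: pair matched learning text units into
# tuples, then reduce the tuples to a context derivation feature — here a
# plain mean offset, i.e. a translation estimate.

def build_tuples(units_first, units_second):
    """Pair units by shared token to form learning text unit tuples
    (token matching is an assumption)."""
    pos_second = {u["tok"]: u["pos"] for u in units_second}
    return [(u["pos"], pos_second[u["tok"]])
            for u in units_first if u["tok"] in pos_second]

def context_derivation(tuples):
    """Mean positional offset across the tuples."""
    if not tuples:
        return 0.0
    return sum(b - a for a, b in tuples) / len(tuples)

first = [{"tok": "angle", "pos": 2}, {"tok": "edge", "pos": 5}, {"tok": "arc", "pos": 9}]
second = [{"tok": "angle", "pos": 12}, {"tok": "edge", "pos": 15}, {"tok": "arc", "pos": 19}]
print(context_derivation(build_tuples(first, second)))  # → 10.0
```

A constant offset of 10 across all three tuples yields a derivation feature of 10.0, which is exactly the shift the mask-prediction step of claims 2 and 3 would consume.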
7. The method of claim 1, wherein obtaining the context derivation features between the identification words and sentences in the first offline learning platform pre-configured text and the second offline learning platform pre-configured text that are matched by learning text units comprises:
for a first target identification word and sentence in the first offline learning platform pre-configured text whose context derivation features are to be determined, determining a first target sentence cluster corresponding to the first target identification word and sentence;
determining context derivation features between the first target sentence cluster and a second target sentence cluster based on a plurality of learning text unit tuples having a correspondence relation between the first target sentence cluster and the second target sentence cluster, wherein the second target sentence cluster is the sentence cluster in the second offline learning platform pre-configured text corresponding to the first target sentence cluster; and
taking the context derivation features between the first target sentence cluster and the second target sentence cluster as the context derivation features between the first target identification word and sentence in the first offline learning platform pre-configured text and a second target identification word and sentence in the second offline learning platform pre-configured text, wherein the second target identification word and sentence is an identification word and sentence whose learning text units in the second offline learning platform pre-configured text are located in the second target sentence cluster;
wherein, before obtaining the context derivation features between the identification words and sentences matched by learning text units in the first offline learning platform pre-configured text and the second offline learning platform pre-configured text, the method further comprises: clustering the identification words and sentences in the first offline learning platform pre-configured text based on text thermal identification data corresponding to that text and the distribution features of all identification words and sentences in that text, and determining the sentence clusters in the first offline learning platform pre-configured text; clustering the identification words and sentences in the second offline learning platform pre-configured text based on text thermal identification data corresponding to that text and the distribution features of all identification words and sentences in that text, and determining the sentence clusters in the second offline learning platform pre-configured text; and determining the corresponding sentence clusters in the first offline learning platform pre-configured text and the second offline learning platform pre-configured text based on a learning text unit association result of the two texts;
wherein the method further comprises: if a sentence cluster in which the number of learning text units is smaller than a first threshold exists in the first offline learning platform pre-configured text or the second offline learning platform pre-configured text, expanding that sentence cluster so that the number of learning text units in the expanded sentence cluster is not smaller than the first threshold.
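The clustering and cluster-expansion steps of claim 7 can be sketched in one dimension. The claim specifies neither the clustering algorithm nor the expansion rule, so the gap-based greedy clustering and nearest-neighbour merge below are assumptions, as are all names.

```python
# Hypothetical sketch of claim 7's preprocessing: cluster phrase positions
# into sentence clusters, then expand any cluster whose unit count falls
# below the first threshold by merging it into a neighbouring cluster.

def cluster_sentences(positions, gap=5):
    """Greedy 1-D clustering: start a new cluster whenever the distance
    to the previous position exceeds `gap`."""
    clusters, current = [], []
    for p in sorted(positions):
        if current and p - current[-1] > gap:
            clusters.append(current)
            current = []
        current.append(p)
    if current:
        clusters.append(current)
    return clusters

def expand_small(clusters, min_units):
    """Merge any cluster with fewer than min_units members into its
    neighbour, mirroring the expansion step of the claim."""
    clusters = [list(c) for c in clusters]
    changed = True
    while changed:
        changed = False
        for i, c in enumerate(clusters):
            if len(c) < min_units and len(clusters) > 1:
                j = i - 1 if i > 0 else i + 1
                clusters[j].extend(c)
                clusters[j].sort()
                del clusters[i]
                changed = True
                break
    return clusters

print(cluster_sentences([1, 2, 3, 20, 21, 40]))  # → [[1, 2, 3], [20, 21], [40]]
```

With a first threshold of 2, the singleton cluster [40] is absorbed into [20, 21], so every surviving cluster meets the minimum unit count.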
8. The method of claim 1, wherein the first offline learning platform pre-configured text and the second offline learning platform pre-configured text are in a same pre-configured text set; the method further comprises the steps of:
deleting a learning text unit positioned in the first noise text set in the first offline learning platform pre-configured text based on the distribution characteristics of the first noise text set indicated by the noise discrimination point of the first offline learning platform pre-configured text;
deleting learning text units in the second noise text set in the second offline learning platform pre-configured text based on the distribution characteristics of the second noise text set indicated by the noise discrimination point of the second offline learning platform pre-configured text;
performing learning text unit association on learning text units in the pre-configured text of the first offline learning platform which completes noise deletion and learning text units in the pre-configured text of the second offline learning platform which completes noise deletion;
and if the number of associated learning text unit tuples is greater than a second threshold, determining that the first offline learning platform pre-configured text and the second offline learning platform pre-configured text are in the same pre-configured text set.
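Claim 8's noise deletion and same-set test can be sketched as follows. Representing the noise text sets as position spans and associating surviving units by token equality are assumptions; the claim fixes neither.

```python
# Hypothetical sketch of claim 8: drop units inside the noise text sets
# indicated by each text's noise discrimination point, associate the
# survivors across the two texts, and declare the texts members of the
# same pre-configured set when the pair count exceeds a second threshold.

def remove_noise(units, noise_spans):
    """Drop learning text units whose position falls inside a noise span."""
    return [u for u in units
            if not any(s <= u["pos"] < e for s, e in noise_spans)]

def same_preconfigured_set(units_a, units_b, noise_a, noise_b, threshold):
    """Associate noise-free units by token equality and compare the
    resulting tuple count against the second threshold."""
    a = remove_noise(units_a, noise_a)
    b = remove_noise(units_b, noise_b)
    tokens_b = {u["tok"] for u in b}
    pairs = sum(1 for u in a if u["tok"] in tokens_b)
    return pairs > threshold

units_a = [{"tok": "sine", "pos": 2}, {"tok": "cosine", "pos": 8},
           {"tok": "tangent", "pos": 15}]
units_b = [{"tok": "cosine", "pos": 3}, {"tok": "tangent", "pos": 9}]
print(same_preconfigured_set(units_a, units_b, [(0, 5)], [], threshold=1))  # → True
```

Here "sine" sits inside the first text's noise span (0, 5) and is deleted before association, leaving two associated tuples, which exceeds the threshold of 1.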
9. The method of claim 8, wherein the first offline learning platform pre-configured text and the second offline learning platform pre-configured text are obtained from an initial set of pre-configured texts; the method further comprises the steps of:
for a plurality of offline learning platform pre-configured texts to be processed, grouping the plurality of offline learning platform pre-configured texts based on the extracted page numbers corresponding to the offline learning platform pre-configured texts and chapter labels corresponding to the offline learning platform pre-configured texts to obtain at least one initial pre-configured text set; the difference of page numbers between the offline pages corresponding to any two offline learning platform pre-configured texts in the same initial pre-configured text set is not greater than a set difference threshold, and the difference of chapter labels corresponding to any two offline learning platform pre-configured texts is not greater than a third threshold.
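The grouping rule of claim 9 can be sketched with a greedy pass. Integer page numbers and chapter labels, and first-fit group assignment, are assumptions; the claim only constrains the pairwise differences within a group.

```python
# Hypothetical sketch of claim 9: group pre-configured texts into initial
# sets so that, within each set, any two texts differ by at most
# page_diff_max in page number and chapter_diff_max in chapter label.

def group_texts(texts, page_diff_max, chapter_diff_max):
    """First-fit greedy grouping; a text joins a group only if it stays
    within the allowed differences of every existing member."""
    groups = []
    for t in sorted(texts, key=lambda x: (x["page"], x["chapter"])):
        for g in groups:
            if all(abs(t["page"] - m["page"]) <= page_diff_max and
                   abs(t["chapter"] - m["chapter"]) <= chapter_diff_max
                   for m in g):
                g.append(t)
                break
        else:
            groups.append([t])
    return groups

texts = [{"id": "t1", "page": 3, "chapter": 1},
         {"id": "t2", "page": 4, "chapter": 1},
         {"id": "t3", "page": 30, "chapter": 5}]
groups = group_texts(texts, page_diff_max=5, chapter_diff_max=1)
print([[t["id"] for t in g] for g in groups])  # → [['t1', 't2'], ['t3']]
```

Checking against every member of the group (not just the seed) is what enforces the claim's "any two" pairwise constraint.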
10. An offline training platform system, comprising a processor and a memory; the processor is communicatively connected to the memory, the processor being configured to read a computer program from the memory and execute the computer program to implement the method of any of claims 1-9.
CN202311738105.2A 2023-12-18 2023-12-18 Knowledge point labeling method and system based on natural language processing Active CN117435746B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311738105.2A CN117435746B (en) 2023-12-18 2023-12-18 Knowledge point labeling method and system based on natural language processing


Publications (2)

Publication Number Publication Date
CN117435746A true CN117435746A (en) 2024-01-23
CN117435746B CN117435746B (en) 2024-02-27

Family

ID=89546502

Country Status (1)

Country Link
CN (1) CN117435746B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107562896A (en) * 2017-09-06 2018-01-09 华中师范大学 A kind of the resource tissue and methods of exhibiting of knowledge based association
US20200242953A1 (en) * 2017-10-20 2020-07-30 Shenzhen Eaglesoul Technology Co., Ltd. Internet teaching platform-based following teaching system
WO2021082366A1 (en) * 2019-10-28 2021-05-06 南京师范大学 Interactive and iterative learning-based intelligent construction method for geographical name tagging corpus
CN114610892A (en) * 2020-12-09 2022-06-10 深圳市企鹅网络科技有限公司 Knowledge point annotation method and device, electronic equipment and computer storage medium
CN114757154A (en) * 2022-06-13 2022-07-15 深圳市承儒科技有限公司 Job generation method, device and equipment based on deep learning and storage medium
CN115114434A (en) * 2022-05-25 2022-09-27 贵州青朵科技有限公司 Multi-mode learning-based test question knowledge point classification method and system
CN116204607A (en) * 2023-02-27 2023-06-02 中国人民解放军国防科技大学 Text online learning resource knowledge point labeling method, system and medium


Non-Patent Citations (1)

Title
CHEN, Qihui; XU, Haining; LING, Peiliang: "Research on a Modeling Method for Learning Navigation Systems Based on Knowledge Point Relationships", Computer Science (计算机科学), no. 12, 25 December 2007 (2007-12-25) *

Also Published As

Publication number Publication date
CN117435746B (en) 2024-02-27

Similar Documents

Publication Publication Date Title
Contreras et al. Automated essay scoring with ontology based on text mining and nltk tools
CN110443571A (en) The method, device and equipment of knowledge based map progress resume assessment
CN109189894B (en) Answer extraction method and device
CN111291210A (en) Image material library generation method, image material recommendation method and related device
CN111310463B (en) Test question difficulty estimation method and device, electronic equipment and storage medium
CN111368048A (en) Information acquisition method and device, electronic equipment and computer readable storage medium
CN115564393B (en) Position recommendation method based on recruitment demand similarity
CN112084299A (en) Reading comprehension automatic question-answering method based on BERT semantic representation
CN113886567A (en) Teaching method and system based on knowledge graph
CN110968708A (en) Method and system for labeling education information resource attributes
CN115329200A (en) Teaching resource recommendation method based on knowledge graph and user similarity
CN111091002B (en) Chinese named entity recognition method
Bagaria et al. An intelligent system for evaluation of descriptive answers
CN116860978A (en) Primary school Chinese personalized learning system based on knowledge graph and large model
Botov et al. Mining labor market requirements using distributional semantic models and deep learning
CN117077679B (en) Named entity recognition method and device
CN113505583A (en) Sentiment reason clause pair extraction method based on semantic decision diagram neural network
Kusuma et al. Automatic question generation with classification based on mind map
CN117473034A (en) Interactive text processing method and device, electronic equipment and storage medium
CN117435746B (en) Knowledge point labeling method and system based on natural language processing
Gasparetti et al. A content-based approach for supporting teachers in discovering dependency relationships between instructional units in distance learning environments
CN116595188A (en) Educational knowledge graph system based on artificial intelligence and big data
CN115964997A (en) Confusion option generation method and device for choice questions, electronic equipment and storage medium
CN113901793A (en) Event extraction method and device combining RPA and AI
CN117171350A (en) Knowledge graph-based personalized course learning environment construction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant