CN117725148A

CN117725148A - Question-answer word library updating method based on self-learning

Info

Publication number: CN117725148A
Application number: CN202410175373.6A
Authority: CN
Inventors: 杨凯; 刘萍; 邓日晓; 彭康; 阳城; 王武杰
Original assignee: Hunan Sanxiang Bank Co Ltd
Current assignee: Hunan Sanxiang Bank Co Ltd
Priority date: 2024-02-07
Filing date: 2024-02-07
Publication date: 2024-03-19

Abstract

The invention relates to the technical field of artificial intelligence, and in particular to a self-learning-based method for updating a question-answer word library.

Description

Question-answer word library updating method based on self-learning

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a question-answer word library updating method based on self-learning.

Background

With teaching videos now widely used for instruction in many fields, it is important that learners can ask questions about a teaching video in real time while watching it and receive timely answers. To make learning from video more convenient and efficient, the question-answer word library must be updated promptly and efficiently as learners ask about the video content, so that their questions are collected effectively and attached to the appropriate progress positions of the video. Practitioners in the field continue to optimize this process.

Updating the question-answer word library of a teaching video efficiently requires comparing each new question effectively with the existing questions in the question library and adding any content that needs updating at the appropriate progress position of the video, so that learners can quickly find the right answer at any point in the video.

For example, a Chinese patent application discloses a method for adaptively acquiring a speech word library based on historical data and machine learning, comprising: step S1, performing semantic-level sentence-pattern classification on a speech recognition result to find the verb nucleus of the voice instruction and the arguments related to it; step S2, extracting the arguments from the voice instruction and selecting several word libraries by combining machine learning with the user's historical data; step S3, performing syntactic-level word segmentation in the selected word libraries using natural language processing, scoring the results for each word-library field, taking the field with the highest score as the optimal result, outputting it, and updating the user's historical data; and step S4, combining the optimal result with language-level sentence analysis to determine the final word-library field.

The prior art has the following problems:

the prior art does not account for the large amount of wasted computation that arises when a question text is matched against the entire question database while questions are asked about a teaching video, which lowers matching efficiency; nor can it update the question database adaptively through targeted matching of the question text against the question sample texts within a bounded acquisition range, which lowers the efficiency of database updates.

Disclosure of Invention

The invention therefore provides a self-learning-based question-answer word library updating method that addresses two problems of the prior art: the size of the question word library searched cannot be adjusted adaptively to the difficulty of the teaching content, and the question word library cannot be updated adaptively according to how the overlap of the question text, and of its segmented words, with existing entries compares.

To achieve the above object, the invention provides a self-learning-based method for updating a question-answer word library, comprising:

step S1, determining the time node at which a user side issues a predetermined operation instruction for a teaching video, and determining a preset time period before and after the time node as the video acquisition period, wherein the predetermined operation instruction includes entering a question text into a preset interaction component;

step S2, obtaining the number of pauses and the number of progress bar movements made by the user side on the teaching video within the video acquisition period, and calculating from them a content difficulty characterization coefficient for the video acquisition period;

step S3, determining an acquisition range for the question word library according to the content difficulty characterization coefficient, and collecting question sample texts from the question word library within that acquisition range;

step S4, taking the maximum of the overlap degrees between the question text and each question sample text as the sample overlap degree, and using it to judge whether the question word library needs to be updated;

step S5, selecting an update mode for the word library based on the sample overlap degree, comprising:

either determining the overlap degree between the question text and the teaching content text of the teaching video as the content overlap degree, and using it to decide whether to add the question text to the question word library in full, wherein the teaching content text is generated from the subtitles of the teaching video;

or screening out the question sample text that meets a screening condition and comparing the overlap degree between each question word of the question text and the corresponding question sample word of that sample text, to judge whether to replace the question sample word with the question word, wherein the screening condition is that the overlap degree between the question text and the question sample text equals the sample overlap degree.

Further, in step S2, the content difficulty characterization coefficient of the video acquisition period is calculated according to formula (1), in which $R$ is the content difficulty characterization coefficient of the video acquisition period, $P_n$ is the number of video pauses in the video acquisition period, $P_{n0}$ is a preset reference value for the number of video pauses, $M_p$ is the number of progress bar movements in the video acquisition period, $M_{p0}$ is a preset reference value for the number of progress bar movements, $\alpha$ is the weight coefficient of the number of pauses, $\beta$ is the weight coefficient of the number of progress bar movements, and $e$ is a constant.

Further, in step S3, the question word library is built in advance as follows: the teaching video is divided into several time periods; for each time period, the history of predetermined operation instructions issued by user sides for the teaching video in that period is retrieved to obtain sample question texts; an association between each sample question text and its time period is established; and the associations are stored in the question word library.

Further, in step S3, the acquisition range of the question word library is determined according to the content difficulty characterization coefficient, wherein

the acquisition range is a time period referenced to the time node, and the length of that period is positively correlated with the content difficulty characterization coefficient.

Further, in step S3, collecting question sample texts from the question word library within the acquisition range comprises

extracting from the question word library the sample question texts associated with the time periods corresponding to the acquisition range.

Further, in step S4, judging whether the word library needs to be updated comprises:

comparing the sample overlap degree with a preset sample overlap degree threshold;

if the sample overlap degree is less than the sample overlap degree threshold, judging that the word library needs to be updated.

Further, in step S5, selecting the update mode for the word library comprises:

comparing the sample overlap degree with a preset sample overlap degree comparison value;

if the sample overlap degree is less than the comparison value, determining the overlap degree between the question text and the teaching content text of the teaching video as the content overlap degree, and using it to decide whether to add the question text to the question word library;

if the sample overlap degree is greater than or equal to the comparison value, screening out the question sample text that meets the screening condition and comparing each question word of the question text with the corresponding question sample word of that sample text, to judge whether to replace the question sample word with the question word;

wherein the sample overlap degree comparison value is less than the sample overlap degree threshold.

Further, in step S5, the content overlap degree between the question text and the teaching content text of the teaching video is compared with a preset content overlap degree threshold:

if the content overlap degree is less than the content overlap degree threshold, determining to discard the question text;

if the content overlap degree is greater than or equal to the content overlap degree threshold, determining to add the question text to the question word library.

Further, step S5 also comprises determining the overlap degree between each question word of the question text and the corresponding question sample word of the question sample text as a word overlap degree, and comparing each word overlap degree with a preset word overlap degree threshold:

if a word overlap degree is less than or equal to the word overlap degree threshold, judging that the corresponding question sample word is to be replaced by the question word.
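The word-level replacement rule can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the source does not say how question words are paired with question sample words, so the sketch assumes both texts are already segmented and pairs words by position. The `similarity` callable and the `threshold` default are hypothetical stand-ins for the word overlap degree and the preset word overlap degree threshold, for which no value is given.

```python
from typing import Callable, List


def replace_sample_words(
    question_words: List[str],
    sample_words: List[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical word overlap degree threshold
) -> List[str]:
    """Return a copy of sample_words with low-overlap words replaced.

    Words are paired by position (an assumption). Where the word overlap
    degree of a pair is <= threshold, the question sample word is replaced
    by the question word; unpaired trailing words are left unchanged.
    """
    updated = list(sample_words)
    for i, (q_word, s_word) in enumerate(zip(question_words, sample_words)):
        if similarity(q_word, s_word) <= threshold:
            updated[i] = q_word
    return updated


# Exact-match similarity standing in for a semantic word similarity.
exact = lambda a, b: 1.0 if a == b else 0.0
print(replace_sample_words(["how", "to", "install", "python"],
                           ["how", "to", "install", "java"], exact))
# ['how', 'to', 'install', 'python']
```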

Further, the method comprises step S6: establishing an associated answer for each question sample text that has been updated or replaced.

Compared with the prior art, the method has the following beneficial effects. A video acquisition period is determined, and the numbers of video pauses and progress bar movements of the teaching video within it are obtained to calculate the content difficulty characterization coefficient, which is used to adjust the acquisition range of the question word library. Question sample texts within the acquisition range are extracted; whether the library needs to be updated is judged from the maximum overlap degree between the question text and the question sample texts; an update mode is selected; and answer indexes are established for the question sample texts that have been updated or replaced. As a result, the portion of the question word library searched is sized adaptively to the difficulty of the teaching content, the library is updated adaptively according to how the question text and its segmented words compare with existing entries, and the efficiency of the word library's self-learning updates is improved.

In particular, the invention calculates the content difficulty characterization coefficient of the video acquisition period from the numbers of video pauses and progress bar movements within that period of the teaching video. During actual playback, learners generally pause a teaching video to think about and take notes on its content, and move the progress bar to replay content that is hard to understand, so these counts reflect how difficult the content of the period is.

In particular, the invention selects an acquisition range of a different size in the question word library according to the difficulty of the video progress range containing the question time node. In practice, comparing a question text against the question word library for the whole video requires a large amount of matching computation, which slows the overlap judgment and wastes considerable computing resources; limiting the comparison to the acquisition range avoids this.

In particular, the invention takes the maximum of the overlap degrees between the question text and each question sample text in the question word library as the sample overlap degree, and uses it to judge whether the library needs to be updated. In practice, a question text whose sample overlap degree meets the preset condition overlaps an existing question so closely that no update is needed and the answer can be looked up directly. A question text whose sample overlap degree does not meet the preset condition may not be the same kind of question as any existing sample, so the library must be updated, with the update mode chosen case by case. The question word library is thus updated adaptively according to how the question text and its segmented words compare with existing entries, improving the efficiency of the word library's self-learning updates.

In particular, when the overlap degree between the question text and the question sample texts in the question word library is low (the sample overlap degree is below the comparison value), the invention calculates the overlap degree between the question text and the teaching content text of the teaching video. In practice, adding question texts that overlap the teaching content very little to the question word library would fill it with useless, redundant data and waste its storage resources. Conversely, as long as the overlap degree between the question text and the teaching content text meets the preset standard, the question is not unrelated to the teaching content, and the invention adds the question text to the library as a whole. The question word library is thus updated according to how the question text and its segmented words compare with existing entries, improving the efficiency of the word library's self-learning updates.

In particular, when the overlap degree between the question text and a question sample text is high, the two questions probably differ little, with only some fragments differing, and only those fragments need to be replaced. In this case the invention segments the question text into question words, computes the overlap degree between each question word and the corresponding question sample word, and replaces the question sample word with the question word where required. The question word library is thus updated adaptively according to how the question text and its segmented words compare with existing entries, improving the efficiency of the word library's self-learning updates.

Drawings

FIG. 1 is a step diagram of the self-learning-based question-answer word library updating method according to an embodiment of the invention;

FIG. 2 is a schematic diagram of determining the video acquisition period according to an embodiment of the invention;

FIG. 3 is a logic flow diagram of judging whether the word library needs to be updated according to an embodiment of the invention;

FIG. 4 is a logic flow diagram of selecting the update mode of the word library according to an embodiment of the invention;

In the figures: 1, time node; 2, first predetermined time period; 3, second predetermined time period; 4, video acquisition period.

Detailed Description

To make the objects and advantages of the invention clearer, the invention is further described below with reference to embodiments; it should be understood that the specific embodiments described herein are for illustration only and are not intended to limit the scope of the invention.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.

Referring to Fig. 1, which is a step diagram of the self-learning-based question-answer word library updating method according to an embodiment of the invention, the method comprises:

step S1, determining the time node 1 at which a user side issues a predetermined operation instruction for a teaching video, and determining a preset time period before and after the time node 1 as the video acquisition period 4, wherein the predetermined operation instruction includes entering a question text into a preset interaction component;

step S2, obtaining the number of pauses and the number of progress bar movements made by the user side on the teaching video within the video acquisition period 4, and calculating from them the content difficulty characterization coefficient of the video acquisition period 4;

Specifically, referring to Fig. 2, which is a schematic diagram of determining the video acquisition period 4 according to an embodiment of the invention, in step S1 a first predetermined time period 2 before the time node 1 and a second predetermined time period 3 after it are marked out, taking as reference the time node 1 at which the user side issues the predetermined operation instruction for the teaching video; the period spanned jointly by the first predetermined time period 2 and the second predetermined time period 3 is determined as the video acquisition period 4.

Preferably, when the predetermined time periods before and after the time node 1 are determined as the video acquisition period 4, the playback progress of the teaching video can be expressed as a playback time point. For example, if the user side enters a question text into the preset interaction component when the teaching video has played to the 20-minute mark, this embodiment may preset the span from 15 minutes to 25 minutes of playback as the video acquisition period 4; time periods of other lengths may of course be preset.
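The worked example above (question at 20 minutes, period from 15 to 25 minutes) can be reproduced with a small Python helper. This is a sketch under stated assumptions: the function name is hypothetical, times are in seconds of playback, and clamping the period to the start and end of the video is an assumption the source does not state.

```python
from typing import Tuple


def video_acquisition_period(
    time_node: float, before: float, after: float, duration: float
) -> Tuple[float, float]:
    """Return (start, end) of the video acquisition period, in seconds.

    time_node: playback position when the question text was entered.
    before / after: lengths of the first and second predetermined periods.
    duration: total length of the video; clamping the period to
    [0, duration] is an assumption not stated in the source.
    """
    start = max(0.0, time_node - before)
    end = min(duration, time_node + after)
    return float(start), float(end)


# Question entered at 20 min, 5 min either side, 60-min video: 15 to 25 min.
print(video_acquisition_period(20 * 60, 5 * 60, 5 * 60, 60 * 60))  # (900.0, 1500.0)
```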

Specifically, the invention does not limit the interaction mode of the interaction component in step S1. Preferably, in this embodiment, the user clicks a button or menu item preset on the playback interface to pop up an input window, and types the question text into that window with the keyboard and mouse, so that the question text is entered through the interaction component.

Specifically, the invention does not limit how the number of pauses and the number of progress bar movements made by the user side on the teaching video are obtained in step S2. Preferably, in this embodiment, they are captured by adding event listeners to the video player: the player's pause and progress-bar movement events are monitored (any suitable programming language may be used) and the number of times each event fires is counted, which gives the number of pauses and progress bar movements within a given period. For example, JavaScript can be used to listen for events such as video play, pause, and progress-bar movement; this technique is widely used to collect video content and viewing behavior and is not described further here.
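Once such events have been logged, counting $P_n$ and $M_p$ for a video acquisition period is a simple filter. The Python sketch below assumes a hypothetical log of `(video_position_seconds, event_type)` pairs, where `event_type` is `"pause"` or `"seek"`; in a browser, the HTML media element fires `pause` and `seeked` events that a page script could record in this form. For a seek, the sketch assumes the logged position is where the progress bar was moved to.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_pauses_and_moves(
    events: Iterable[Tuple[float, str]], start: float, end: float
) -> Tuple[int, int]:
    """Return (P_n, M_p) for the video acquisition period [start, end].

    events: (video_position_seconds, event_type) pairs from a hypothetical
    player log; event_type "pause" counts toward P_n and "seek" toward M_p.
    Other event types are ignored.
    """
    counts = Counter(kind for pos, kind in events if start <= pos <= end)
    return counts["pause"], counts["seek"]


log = [(910, "pause"), (1000, "seek"), (1200, "pause"), (1600, "pause"), (950, "play")]
print(count_pauses_and_moves(log, 900, 1500))  # (2, 1)
```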

Specifically, the invention does not limit how the overlap degree between texts is calculated. For example, each sentence can be represented with TF-IDF and the cosine similarity between the TF-IDF vectors of two sentences taken as their overlap degree; the usual practice is to segment each sentence into words, compute the TF-IDF value of each word as its weight in the sentence, and form a vector from the weights of all the words. Alternatively, an LSTM model can be used: the two sentences are fed into two independent LSTM networks and the overlap degree is computed from the networks' outputs. These techniques are not described further here.
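The TF-IDF option can be sketched in plain Python on already-segmented texts. This is one possible instance, not the patented implementation: the smoothed idf formula is one common variant chosen here as an assumption, and the helper names are hypothetical. Libraries such as scikit-learn (`TfidfVectorizer` with `cosine_similarity`) provide the same technique off the shelf.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Map each tokenized document to a sparse {term: tf * idf} vector."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf, one common variant; the source does not fix a formula.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1
        vectors.append({t: (c / length) * idf[t] for t, c in tf.items()})
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def text_overlap(a: List[str], b: List[str], corpus: List[List[str]]) -> float:
    """Overlap degree of two tokenized texts; idf is fitted on corpus + a + b."""
    vecs = tfidf_vectors(corpus + [a, b])
    return cosine(vecs[-2], vecs[-1])


q = ["how", "is", "gradient", "descent", "derived"]
s = ["how", "is", "gradient", "descent", "used"]
print(round(text_overlap(q, s, corpus=[]), 3))
```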

Specifically, the invention does not limit how the overlap degree between words is calculated; in this embodiment, the semantic similarity between words is taken as their overlap degree. Determining the semantic similarity between words is an important task in natural language processing: a pre-trained word vector model (e.g., Word2Vec, GloVe, fastText) converts each word into a vector, and the similarity between the word vectors is then measured with metrics such as cosine similarity or Euclidean distance.
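A minimal Python sketch of the cosine option for the word overlap degree follows. The vectors here are made-up toy values; in practice they would come from a pre-trained model (gensim's `KeyedVectors.similarity`, for example, computes the same cosine for loaded Word2Vec or fastText vectors). Returning 0.0 for out-of-vocabulary words is an assumption of this sketch.

```python
import math
from typing import Dict, Sequence


def word_overlap(a: str, b: str, vectors: Dict[str, Sequence[float]]) -> float:
    """Cosine similarity of two words' vectors, used as the word overlap degree.

    vectors: word -> embedding. Out-of-vocabulary words yield 0.0 (assumption).
    """
    if a not in vectors or b not in vectors:
        return 0.0
    u, v = vectors[a], vectors[b]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


toy = {"video": [1.0, 0.0], "film": [0.8, 0.6], "bank": [0.0, 1.0]}  # made-up vectors
print(round(word_overlap("video", "film", toy), 2))  # 0.8
```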

Specifically, the invention does not limit how the question text and the question sample texts are segmented into words in step S5. Word segmentation is a basic step of text mining. English text has spaces between words, so it can be segmented on spaces, although several words sometimes need to be treated together as one unit for semantic reasons (for example, "New York" should be kept as a single token). For Chinese question text, a corpus can be used to build statistical probabilities, and the optimal segmentation found by computing the joint probability of each candidate segmentation; for example, an N-gram model can be used, with the optimal segmentation solved by the Viterbi algorithm. Preferably, in this embodiment, English text is segmented with the NLTK tokenizer and Chinese text with the jieba segmentation tool, which is widely used in existing translation, transliteration, and speech recognition tools.
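The probabilistic segmentation described above can be illustrated with its simplest case, a unigram model solved by a Viterbi-style dynamic program over all candidate split points. This Python sketch is not jieba's implementation; the dictionary probabilities are hypothetical, and treating unknown characters as single-character words with a small probability `unk_prob` is an assumption that keeps every position reachable.

```python
import math
from typing import Dict, List


def segment(text: str, word_probs: Dict[str, float],
            max_word_len: int = 4, unk_prob: float = 1e-8) -> List[str]:
    """Split text into the word sequence with the highest unigram probability.

    best[i] holds (best log-probability of text[:i], start of its last word).
    """
    n = len(text)
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            p = word_probs.get(word)
            if p is None:
                if end - start > 1:
                    continue  # unknown multi-character strings are not words
                p = unk_prob
            score = best[start][0] + math.log(p)
            if score > best[end][0]:
                best[end] = (score, start)
    # Backtrack from the end of the text to recover the best path.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Hypothetical unigram probabilities estimated from a corpus.
probs = {"教学": 0.1, "视频": 0.1, "教学视频": 0.05}
print(segment("教学视频", probs))  # ['教学视频']
print(segment("视频教学", probs))  # ['视频', '教学']
```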

Specifically, in the invention, the associated answer for each question sample text that has been updated or replaced may be established by creating an answer text corresponding to each question text. With existing technology, a relational database such as Microsoft SQL Server, or a NoSQL database, can be built for this purpose; such databases are widely used by those skilled in the art and are not described further here.
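As an illustration of step S6 with a relational store, the Python sketch below uses the standard-library `sqlite3` module in place of Microsoft SQL Server. The table names, columns, and helper functions are hypothetical. `INSERT OR REPLACE` keyed on the question id lets the answer of a replaced question sample be overwritten in place.

```python
import sqlite3
from typing import Optional


def open_library(path: str = ":memory:") -> sqlite3.Connection:
    """Create a hypothetical question/answer schema and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS question_sample (
            id INTEGER PRIMARY KEY,
            text TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS answer (
            question_id INTEGER PRIMARY KEY REFERENCES question_sample(id),
            text TEXT NOT NULL
        );
    """)
    return conn


def link_answer(conn: sqlite3.Connection, question_id: int, answer_text: str) -> None:
    """Create or overwrite the answer associated with a question sample."""
    conn.execute("INSERT OR REPLACE INTO answer (question_id, text) VALUES (?, ?)",
                 (question_id, answer_text))
    conn.commit()


def lookup_answer(conn: sqlite3.Connection, question_id: int) -> Optional[str]:
    row = conn.execute("SELECT text FROM answer WHERE question_id = ?",
                       (question_id,)).fetchone()
    return row[0] if row else None


conn = open_library()
conn.execute("INSERT INTO question_sample (id, text) VALUES (1, 'What is TF-IDF?')")
link_answer(conn, 1, "A term-weighting scheme.")
print(lookup_answer(conn, 1))  # A term-weighting scheme.
```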

Specifically, in step S2, the content difficulty characterization coefficient of the video acquisition period 4 is calculated according to formula (1), in which $R$ is the content difficulty characterization coefficient of the video acquisition period 4, $P_n$ is the number of video pauses in the video acquisition period 4, $P_{n0}$ is a preset reference value for the number of video pauses, $M_p$ is the number of progress bar movements in the video acquisition period 4, $M_{p0}$ is a preset reference value for the number of progress bar movements, $\alpha$ is the weight coefficient of the number of pauses, $\beta$ is the weight coefficient of the number of progress bar movements, and $e$ is a constant, where $\alpha + \beta = 1$.

Preferably, in this embodiment, the preset reference value $P_{n0}$ for the number of video pauses is determined from the total number of plays $T$ of the teaching video as $P_{n0} = k_1 \times T$, where $k_1$ is the pause-count value factor and $k_1 \in [0.4, 0.6]$; the preset reference value $M_{p0}$ for the number of progress bar movements is likewise determined as $M_{p0} = k_2 \times T$, where $k_2$ is the progress-bar-movement value factor and $k_2 \in [0.25, 0.6]$.
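The two reference values are simple products, shown here in Python with the embodiment's ranges enforced. The default factors 0.5 and 0.4 are illustrative choices inside those ranges, and the function name is hypothetical.

```python
from typing import Tuple


def reference_values(total_plays: float, k1: float = 0.5, k2: float = 0.4) -> Tuple[float, float]:
    """Return (P_n0, M_p0) = (k1 * T, k2 * T) for a video played T times.

    k1 must lie in [0.4, 0.6] and k2 in [0.25, 0.6], as in the embodiment.
    """
    if not 0.4 <= k1 <= 0.6:
        raise ValueError("k1 must lie in [0.4, 0.6]")
    if not 0.25 <= k2 <= 0.6:
        raise ValueError("k2 must lie in [0.25, 0.6]")
    return k1 * total_plays, k2 * total_plays


print(reference_values(200))  # (100.0, 80.0)
```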

Specifically, the invention calculates the content difficulty characterization coefficient of the video acquisition period 4 from the numbers of video pauses and progress bar movements within the video acquisition period 4 of the teaching video. During actual playback, learners generally pause a teaching video to think about and take notes on its content, and move the progress bar to replay content that is hard to understand, so these counts reflect how difficult the content of the period is.

Specifically, in step S3, the question word library is built in advance as follows: the teaching video is divided into several time periods; for each time period, the history of predetermined operation instructions issued by user sides for the teaching video in that period is retrieved to obtain sample question texts; an association between each sample question text and its time period is established; and the associations are stored in the question word library.
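The association between sample question texts and time periods, and the later extraction of samples for an acquisition range, can be sketched with a small in-memory Python structure. The class and method names are hypothetical, and treating any overlap between a stored period and the acquisition range as a match (rather than full containment) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class QuestionLibrary:
    """Hypothetical in-memory question word library.

    Each entry associates a sample question text with the time period
    (start, end, in seconds of playback) in which it was asked.
    """
    entries: List[Tuple[float, float, str]] = field(default_factory=list)

    def add(self, start: float, end: float, text: str) -> None:
        self.entries.append((start, end, text))

    def collect(self, range_start: float, range_end: float) -> List[str]:
        """Return texts whose period overlaps [range_start, range_end]."""
        return [t for s, e, t in self.entries if s <= range_end and e >= range_start]


lib = QuestionLibrary()
lib.add(0, 600, "What is a derivative?")
lib.add(600, 1200, "Why does the chain rule work?")
lib.add(1800, 2400, "What is an integral?")
print(lib.collect(900, 1500))  # ['Why does the chain rule work?']
```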

Specifically, the question word library collects and organizes question sample texts, including question sample texts preset according to the teaching content. The library is updated in real time: question texts received from user sides whose overlap with the existing question sample texts is insufficient, but which meet the relevance condition, are added to or substituted into the question word library.

Specifically, in step S3, the acquisition range of the question word library is determined according to the content difficulty characterization coefficient, wherein

the acquisition range is a time period referenced to the time node 1, and the length of that period is positively correlated with the content difficulty characterization coefficient $R$.

Preferably, in this embodiment, at least three acquisition range adjustment modes, which adjust the acquisition range for the question word library based on the content difficulty characterization coefficient $R$, are preset, and step S3 further comprises comparing $R$ with a preset first difficulty characterization coefficient comparison value $R_1$ and a preset second difficulty characterization coefficient comparison value $R_2$:

if $R \le R_1$, the first acquisition range adjustment mode is selected, which sets the acquisition range of the question word library to the first acquisition range $Cs_1 = Cs_0 + \Delta Cs_1$;

if $R_1 < R < R_2$, the second acquisition range adjustment mode is selected, which sets the acquisition range of the question word library to the second acquisition range $Cs_2 = Cs_0 + \Delta Cs_2$;

if $R \ge R_2$, the third acquisition range adjustment mode is selected, which sets the acquisition range of the question word library to the third acquisition range $Cs_3 = Cs_0 + \Delta Cs_3$;

where $Cs_0$ is the initial value of the acquisition range of the question word library, which may be determined from the total duration $C_t$ of the teaching video as $Cs_0 = 0.2 \times C_t$, and $\Delta Cs_1$, $\Delta Cs_2$ and $\Delta Cs_3$ are the first, second and third acquisition range adjustment amounts. In this embodiment, the comparison values $R_1$ and $R_2$ are preset to distinguish how difficult the content is in the predetermined period around the question time node 1 of the teaching video; $R_1$ may be set in $[1.25, 1.35)$ and $R_2$ in $[1.35, 1.45)$. So that the adjustment is effective but not excessive, the adjustment amounts may satisfy $0.2\,Cs_0 \le \Delta Cs_1 < \Delta Cs_2 < \Delta Cs_3 \le 0.8\,Cs_0$.
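The three-way selection of the acquisition range can be sketched in Python as below. The default values of $R_1$, $R_2$ and the adjustment fractions are illustrative picks inside the embodiment's ranges, with each $\Delta Cs_i$ expressed as a fraction of $Cs_0$. The source does not say how the range is positioned relative to the time node 1; the `range_window` helper centres it on the node, which is purely an assumption.

```python
from typing import Tuple


def acquisition_range_length(
    R: float,
    total_duration: float,
    R1: float = 1.30,
    R2: float = 1.40,
    fractions: Tuple[float, float, float] = (0.3, 0.5, 0.7),
) -> float:
    """Return the acquisition range Cs (seconds) for difficulty coefficient R.

    Cs0 = 0.2 * C_t; each adjustment amount is a fraction of Cs0 and the
    fractions must satisfy 0.2 <= f1 < f2 < f3 <= 0.8.
    """
    f1, f2, f3 = fractions
    if not (0.2 <= f1 < f2 < f3 <= 0.8):
        raise ValueError("need 0.2 <= f1 < f2 < f3 <= 0.8")
    cs0 = 0.2 * total_duration
    if R <= R1:
        return cs0 + f1 * cs0  # first adjustment mode: Cs1
    if R < R2:
        return cs0 + f2 * cs0  # second adjustment mode: Cs2
    return cs0 + f3 * cs0      # third adjustment mode: Cs3


def range_window(time_node: float, cs: float, duration: float) -> Tuple[float, float]:
    """Place a range of length cs around the time node (centring is an assumption)."""
    start = max(0.0, time_node - cs / 2)
    return start, min(float(duration), time_node + cs / 2)


cs = acquisition_range_length(1.2, total_duration=3600)  # 720 + 216 = 936 s
print(range_window(1200, cs, 3600))  # (732.0, 1668.0)
```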

Specifically, the invention selects an acquisition range of a different size in the question word library according to the difficulty of the video progress range containing the question time node 1. In practice, comparing a question text against the question word library for the whole video requires a large amount of matching computation, which slows the overlap judgment and wastes considerable computing resources; limiting the comparison to the acquisition range avoids this.

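Specifically, in step S3, collecting question sample texts from the question word library within the acquisition range comprises extracting from the question word library the sample question texts associated with the time periods corresponding to the acquisition range.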

Specifically, referring to Fig. 3, which is a logic flow diagram of judging whether the word library needs to be updated according to an embodiment of the invention, in step S4 the judgment comprises:

comparing the sample overlap degree $Cr_1$ with a preset sample overlap degree threshold $Cr_a$;

if $Cr_1 < Cr_a$, judging that the word library needs to be updated.

Preferably, $Cr_a \in [85\%, 95\%]$.
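Step S4 reduces to taking a maximum and comparing it with $Cr_a$, as in this Python sketch. The overlap degrees are assumed to be computed already (for example with a TF-IDF cosine), and treating an empty acquisition range as "update needed" is an assumption of the sketch.

```python
from typing import Iterable


def sample_overlap(overlaps: Iterable[float]) -> float:
    """Cr1: the largest overlap degree between the question text and any
    question sample text in the acquisition range (0.0 if there are none)."""
    return max(overlaps, default=0.0)


def needs_update(overlaps: Iterable[float], cr_a: float = 0.90) -> bool:
    """Step S4: the word library needs updating when Cr1 < Cr_a."""
    if not 0.85 <= cr_a <= 0.95:
        raise ValueError("Cr_a is expected in [0.85, 0.95]")
    return sample_overlap(overlaps) < cr_a


print(needs_update([0.40, 0.93]))  # False: a near-duplicate question already exists
print(needs_update([0.40, 0.70]))  # True
```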

Specifically, the invention takes the maximum of the overlap degrees between the question text and each question sample text in the question word library as the sample overlap degree, and uses it to judge whether the library needs to be updated. In practice, a question text whose sample overlap degree meets the preset condition overlaps an existing question so closely that no update is needed and the answer can be looked up directly. A question text whose sample overlap degree does not meet the preset condition may not be the same kind of question as any existing sample, so the library must be updated, with the update mode chosen case by case. The question word library is thus updated adaptively according to how the question text and its segmented words compare with existing entries, improving the efficiency of the word library's self-learning updates.

In particular, referring to fig. 4, which is a logic flow diagram of a method for selecting a word stock update according to an embodiment of the present invention, in step S5, a process for selecting a method for updating a word stock includes,

the overlap ratio Cr of the sample ₁ Contrast value Cr of coincidence with preset sample _b Comparing;

if the overlap ratio of the sample is Cr ₁ Less than the sample coincidence degree contrast value Cr _b Determining the coincidence degree of the question text and the teaching content text of the teaching video as the content coincidence degree Cr ₂ Determining whether to update the question text to the question database;

if the sample overlap ratio Cr₁ is greater than or equal to the sample overlap ratio contrast value Cr_b, the question sample text that meets the screening condition is selected, and the overlap ratio of each question word of the question text with the corresponding question sample word of the question sample text is compared to decide whether to replace that question sample word with the question word;

wherein the sample overlap ratio contrast value Cr_b is less than the sample overlap ratio threshold Cr_a, and Cr_b lies in the range [55%, 65%].
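The branching of step S5 can be summarized in a small selector. This sketch only encodes the comparisons stated above, including the requirement that Cr_b be less than Cr_a; the defaults Cr_a = 0.90 and Cr_b = 0.60 are assumed values taken from the stated ranges, and the names `UpdateMode` and `select_update_mode` are hypothetical.

```python
from enum import Enum
from typing import Optional


class UpdateMode(Enum):
    """The two update means chosen in step S5 (illustrative names)."""
    CONTENT_CHECK = "content_check"  # Cr1 < Cr_b: compare with the teaching content text
    WORD_REPLACE = "word_replace"    # Cr_b <= Cr1 < Cr_a: word-level replacement


def select_update_mode(cr1: float, cr_a: float = 0.90, cr_b: float = 0.60) -> Optional[UpdateMode]:
    """Return the update means for sample overlap ratio cr1, or None if no update is needed."""
    if not cr_b < cr_a:
        raise ValueError("the contrast value Cr_b must be less than the threshold Cr_a")
    if cr1 >= cr_a:
        return None  # step S4: overlap is high enough, index the answer directly
    if cr1 < cr_b:
        return UpdateMode.CONTENT_CHECK
    return UpdateMode.WORD_REPLACE


for cr1 in (0.95, 0.70, 0.30):
    print(cr1, select_update_mode(cr1))
```

Read together with step S4, the two thresholds split the range of Cr₁ into three bands: no update at or above Cr_a, word-level replacement in [Cr_b, Cr_a), and a teaching-content check below Cr_b.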

Specifically, in step S5, the content overlap ratio Cr₂ between the question text and the teaching content text of the teaching video is compared with a preset content overlap ratio threshold Cr_c:

if the content overlap ratio Cr₂ is less than the content overlap ratio threshold Cr_c, the question text is discarded;

if the content overlap ratio Cr₂ is greater than or equal to the content overlap ratio threshold Cr_c, the whole question text is added to the question database.

Preferably, in an embodiment of the present invention, the content overlap ratio threshold Cr_c lies in the range [10%, 15%].

Specifically, the invention calculates the overlap ratio between the question text and the teaching content text of the teaching video when the overlap ratio between the question text and the question sample texts in the question data word bank satisfies the corresponding condition (Cr₁ < Cr_b). In practice, adding a question text that barely overlaps the teaching content to the question data word bank would introduce invalid, redundant data and waste the word bank's storage. Conversely, as long as the overlap ratio between the question text and the teaching content text meets the preset standard, the question is evidently related to the teaching content, and the question text can be added to the word bank as a whole. The question data word bank is thus updated according to the overlap comparison between the question text and the existing data, improving the efficiency of self-learning updates to the word bank.
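The content check can be sketched as below, under stated assumptions: the metric for Cr₂ is not specified in this section, so the hypothetical `content_overlap_ratio` assumes the share of the question's distinct tokens that also occur in the teaching content text, and Cr_c = 0.12 is an assumed value within [10%, 15%]. A plain Python list stands in for the question database.

```python
from __future__ import annotations


def content_overlap_ratio(question: str, content: str) -> float:
    """Hypothetical Cr2: share of the question's distinct tokens that also occur in the content."""
    q = set(question.split())
    if not q:
        return 0.0
    return len(q & set(content.split())) / len(q)


def apply_content_check(question: str, content: str, question_db: list[str],
                        cr_c: float = 0.12) -> bool:
    """Add the whole question text to question_db if Cr2 >= Cr_c; otherwise discard it.

    Returns True when the question text was added.
    """
    if content_overlap_ratio(question, content) < cr_c:
        return False  # unrelated to the teaching content: discard
    question_db.append(question)
    return True


content = "the derivative measures the rate of change of a function"
db = []
apply_content_check("what does the derivative measure", content, db)
apply_content_check("when is lunch served", content, db)
print(db)  # ['what does the derivative measure']
```

A question-normalized ratio is assumed here rather than a symmetric similarity because the teaching content text is usually far longer than a question, and a symmetric measure would shrink with the length of the content.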

Specifically, in step S5, the overlap ratio between each question word of the question text and the corresponding question sample word of the question sample text is determined as the word segmentation overlap ratio Cr₃, and each Cr₃ is compared with a preset word segmentation overlap ratio threshold Cr_m:

if the word segmentation overlap ratio Cr₃ is less than or equal to the word segmentation overlap ratio threshold Cr_m, the corresponding question sample word is replaced with the question word.

Preferably, in an embodiment of the present invention, the word segmentation overlap ratio threshold Cr_m lies in the range [75%, 80%].

Specifically, when the question text overlaps highly with the question sample texts, it probably differs from them only in a few fragments, and only those fragments need to be replaced. In this case the question text is segmented into question words, the overlap ratio of each question word with the corresponding question sample word is calculated, and every question sample word whose overlap ratio does not meet the requirement is replaced with the question word. The question data word library is thus updated adaptively according to the overlap comparison between the question text and its question words, improving the efficiency of self-learning updates to the word library.
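A minimal sketch of the word-level replacement, assuming the question and the screened question sample text have already been segmented into word lists and that words are compared position by position (this section does not state how words are aligned). The hypothetical `word_overlap_ratio` assumes a character-level Jaccard similarity for Cr₃, and Cr_m = 0.78 is an assumed value within [75%, 80%]; all names are illustrative.

```python
from __future__ import annotations


def word_overlap_ratio(w1: str, w2: str) -> float:
    """Hypothetical Cr3: Jaccard similarity of the characters of two words."""
    a, b = set(w1), set(w2)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def replace_sample_words(question_words: list[str], sample_words: list[str],
                         cr_m: float = 0.78) -> list[str]:
    """Replace each sample word whose Cr3 with the aligned question word is <= Cr_m.

    Assumes positional alignment; sample words beyond the shorter list are kept,
    and surplus question words are ignored in this sketch.
    """
    updated = list(sample_words)
    for i, (q, s) in enumerate(zip(question_words, sample_words)):
        if word_overlap_ratio(q, s) <= cr_m:
            updated[i] = q  # overlap too low: take the question word
    return updated


print(replace_sample_words(["how", "to", "compute", "integral"],
                           ["how", "to", "compute", "derivative"]))
```

Here only "derivative" and "integral" fall at or below the threshold (character overlap 5/10 = 0.5), so only that word is replaced; a near-identical pair such as "color" and "colour" (4/5 = 0.8) keeps the sample word.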

Specifically, the method further comprises step S6: establishing an associated answer for each question sample text that has been updated or replaced.

Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

The foregoing description is only of the preferred embodiments of the invention and is not intended to limit the invention; various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for updating a question-answer word library based on self-learning, characterized by comprising the following steps:

2. The method for updating a question-answer word library based on self-learning according to claim 1, wherein in step S2 the content difficulty characterization coefficient of the video acquisition period is calculated according to formula (1), in which R is the content difficulty characterization coefficient of the video acquisition period, P_n is the number of video pauses in the video acquisition period, P_n0 is the preset reference value of the number of video pauses, M_p is the number of progress bar movements in the video acquisition period, M_p0 is the preset reference value of the number of progress bar movements, α is the weight coefficient of the number of pauses, β is the weight coefficient of the number of progress bar movements, and e is a constant.

3. The method for updating a question-answer word library based on self-learning according to claim 1, wherein step S3 further comprises constructing a question data word library in advance, the construction process comprising: determining a plurality of time periods of the teaching video; retrieving, for each time period, the history records of predetermined operation instructions sent by the user terminal for the teaching video to obtain sample question texts; and establishing an association between each sample question text and its corresponding time period and storing the association in the question data word library.

4. The method for updating a question-answer word library based on self-learning according to claim 3, wherein in step S3 the collection range for the question database is determined according to the content difficulty characterization coefficient,

5. The method of claim 4, wherein the step of collecting the question sample text in the question database based on the collection range in step S3 includes,

6. The method for updating a question-answer word library based on self-learning according to claim 1, wherein in step S4 the process of determining whether the word stock needs to be updated includes,

7. The method for updating a question-answer word library based on self-learning according to claim 6, wherein in step S5 the process of selecting the means for updating the word stock includes,

8. The method for updating a question-answer word library based on self-learning according to claim 7, wherein in step S5 the content overlap ratio between the question text and the teaching content text of the teaching video is compared with a preset content overlap ratio threshold,

9. The method for updating a question-answer word library based on self-learning according to claim 8, wherein step S5 further comprises determining the overlap ratio between each question word of the question text and the corresponding question sample word of the question sample text as the word segmentation overlap ratio, and comparing each word segmentation overlap ratio with a preset word segmentation overlap ratio threshold,

10. The method for updating a question-answer word library based on self-learning according to claim 1, further comprising step S6: establishing an associated answer for each question sample text that has been updated or replaced.