CN117725148A - Question-answer word library updating method based on self-learning - Google Patents

Publication number
CN117725148A
Authority
CN
China
Prior art keywords
question
text
sample
word
questioning
Prior art date
Legal status
Pending
Application number
CN202410175373.6A
Other languages
Chinese (zh)
Inventor
杨凯
刘萍
邓日晓
彭康
阳城
王武杰
Current Assignee
Hunan Sanxiang Bank Co Ltd
Original Assignee
Hunan Sanxiang Bank Co Ltd
Application filed by Hunan Sanxiang Bank Co Ltd
Priority to CN202410175373.6A
Publication of CN117725148A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the technical field of artificial intelligence, in particular to a self-learning-based question and answer word library updating method.

Description

Question-answer word library updating method based on self-learning
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a question-answer word library updating method based on self-learning.
Background
With teaching videos now widely used as a teaching mode in many fields, it is important that learners can ask questions about a teaching video in real time while watching it and obtain timely answers. To provide a more convenient and efficient way of learning from video, the question-answer word library must be updated promptly and efficiently as questions about the video content are asked, so that the various questions raised by students are collected effectively and attached to reasonable progress positions of the teaching video. Practitioners in the related fields continue to optimize this process.
Efficiently updating the question-answer word library of a teaching video requires effectively comparing each question asked with the existing questions in the question library and adding the content to be updated at the proper progress position of the teaching video, so that students can quickly be indexed to suitable answers at any point in the video.
For example, a Chinese patent application discloses a method for adaptively acquiring a voice word library based on historical data and machine learning, comprising: step S1, classifying the speech recognition result by sentence pattern at the semantic level, and finding the moving core in the voice instruction and the moving elements related to it; step S2, extracting the moving elements in the voice instruction and selecting several word libraries by combining machine learning with user history data; step S3, performing syntactic-level word segmentation within the selected word libraries using a natural language processing method, evaluating the results for the several word library fields, taking the field with the highest evaluation value as the optimal result, outputting the optimal result, and updating the user history data; and step S4, combining the optimal result with sentence analysis at the language level to determine the final word library field.
The prior art has the following problems:
The prior art does not take into account the large amount of invalid computation incurred when a question text is matched against the question data word library while questions are asked about a teaching video, which lowers matching efficiency. Nor can it adaptively update the question data word library through targeted matching of the question text against the question sample texts within a limited acquisition range, which lowers the update efficiency of the word library.
Disclosure of Invention
Therefore, the invention provides a self-learning-based question-answer word library updating method to solve two problems of the prior art: the size of the question data word library cannot be adaptively adjusted according to the difficulty of the teaching content, and the question data word library cannot be adaptively updated according to the comparison of the overlap ratios of question texts and question word segments.
In order to achieve the above object, the present invention provides a method for updating a question-answer word library based on self-learning, comprising:
step S1, determining the time node at which a user side issues a predetermined operation instruction for a teaching video, and determining a predetermined time period before and after the time node as a video acquisition period, wherein the predetermined operation instruction comprises inputting a question text into a preset interaction component;
step S2, obtaining the number of pauses and the number of progress bar movements made by the user side on the teaching video within the video acquisition period, so as to calculate a content difficulty characterization coefficient of the video acquisition period;
step S3, determining an acquisition range for a question data word library according to the content difficulty characterization coefficient, and acquiring question sample texts from the question data word library based on the acquisition range;
step S4, determining the maximum of the overlap ratios between the question text and each question sample text as the sample overlap ratio, so as to judge whether the question data word library needs to be updated;
step S5, selecting a manner of updating the word library based on the sample overlap ratio, comprising:
determining the overlap ratio between the question text and the teaching content text of the teaching video as the content overlap ratio, so as to determine whether to update the question text as a whole into the question data word library, wherein the teaching content text is generated from the subtitles of the teaching video;
or screening out the question sample text that meets a screening condition, and comparing the overlap ratio between each question word of the question text and the corresponding question sample word of the question sample text, so as to judge whether to replace the corresponding question sample word with the question word, wherein the screening condition is that the overlap ratio between the question text and the question sample text equals the sample overlap ratio.
Further, in step S2, the content difficulty characterization coefficient of the video acquisition period is calculated according to formula (1) (the formula itself is not reproduced in this text). In formula (1), $R$ is the content difficulty characterization coefficient of the video acquisition period, $P_n$ is the number of video pauses in the video acquisition period, $P_{n0}$ is the preset reference value for the number of video pauses, $M_p$ is the number of progress bar movements in the video acquisition period, $M_{p0}$ is the preset reference value for the number of progress bar movements, $\alpha$ is the weight coefficient of the number of pauses, $\beta$ is the weight coefficient of the number of progress bar movements, and $e$ is a constant.
Further, step S3 also includes building the question data word library in advance. The building process includes dividing the teaching video into several time periods, retrieving the history of predetermined operation instructions issued by user sides for the teaching video in each time period to obtain sample question texts, establishing an association between each sample question text and its corresponding time period, and storing the association in the question data word library.
Further, in step S3, the acquisition range for the question data word library is determined according to the content difficulty characterization coefficient:
the acquisition range is a time period referenced to the time node, and the length of the time period is positively correlated with the content difficulty characterization coefficient.
Further, in step S3, the process of acquiring question sample texts from the question data word library based on the acquisition range includes:
extracting from the question data word library the sample question texts associated with the time period corresponding to the acquisition range.
Further, in step S4, the process of judging whether the word library needs to be updated includes:
comparing the sample overlap ratio with a preset sample overlap ratio threshold;
and if the sample overlap ratio is smaller than the sample overlap ratio threshold, judging that the word library needs to be updated.
Further, in step S5, the process of selecting the manner of updating the word library includes:
comparing the sample overlap ratio with a preset sample overlap ratio contrast value;
if the sample overlap ratio is smaller than the sample overlap ratio contrast value, determining the overlap ratio between the question text and the teaching content text of the teaching video as the content overlap ratio, so as to determine whether to update the question text into the question data word library;
if the sample overlap ratio is greater than or equal to the sample overlap ratio contrast value, screening out the question sample text that meets the screening condition, and comparing each question word of the question text with the corresponding question sample word of the question sample text, so as to judge whether to replace the corresponding question sample word with the question word;
wherein the sample overlap ratio contrast value is smaller than the sample overlap ratio threshold.
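The branching of steps S4 and S5 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `UpdateMode` and `choose_update_mode` are hypothetical, overlap ratios are assumed to be fractions in [0, 1], and the values 0.9 and 0.6 in the usage note are chosen only for illustration.

```python
from enum import Enum


class UpdateMode(Enum):
    NO_UPDATE = "no_update"        # sample overlap ratio already reaches the threshold
    WHOLE_TEXT = "whole_text"      # check content overlap ratio, then add or discard
    WORD_REPLACE = "word_replace"  # replace individual question sample words


def choose_update_mode(sample_overlap: float,
                       threshold: float,
                       contrast: float) -> UpdateMode:
    """Pick the update branch from the sample overlap ratio (hypothetical helper)."""
    # The text requires the contrast value to be smaller than the threshold.
    if not contrast < threshold:
        raise ValueError("contrast value must be smaller than the threshold")
    if sample_overlap >= threshold:
        return UpdateMode.NO_UPDATE
    if sample_overlap < contrast:
        return UpdateMode.WHOLE_TEXT
    return UpdateMode.WORD_REPLACE
```

For example, with an assumed threshold of 0.9 and an assumed contrast value of 0.6, a sample overlap ratio of 0.7 falls in the word-replacement branch, while 0.3 falls in the whole-text branch.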
Further, in step S5, the content overlap ratio between the question text and the teaching content text of the teaching video is compared with a preset content overlap ratio threshold:
if the content overlap ratio is smaller than the content overlap ratio threshold, it is determined that the question text is discarded;
and if the content overlap ratio is greater than or equal to the content overlap ratio threshold, it is determined that the question text is updated into the question data word library.
Further, step S5 also includes determining the overlap ratio between each question word of the question text and the corresponding question sample word of the question sample text as a word overlap ratio, and comparing each word overlap ratio with a preset word overlap ratio threshold:
if the word overlap ratio is smaller than or equal to the word overlap ratio threshold, it is judged that the corresponding question sample word is replaced with the question word.
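The word-level replacement rule can be sketched as below. The text does not say how question words are paired with question sample words, so this sketch assumes a position-by-position pairing; `replace_sample_words` and the toy `exact_match` measure are hypothetical names used only for illustration.

```python
from typing import Callable, List


def replace_sample_words(question_words: List[str],
                         sample_words: List[str],
                         word_overlap: Callable[[str, str], float],
                         threshold: float) -> List[str]:
    """Return a copy of sample_words with low-overlap words replaced.

    Assumption: words are compared position by position, and positions
    beyond the shorter list are left unchanged.
    """
    updated = list(sample_words)
    for i, (q_word, s_word) in enumerate(zip(question_words, sample_words)):
        # Replace when the word overlap ratio is at or below the threshold.
        if word_overlap(q_word, s_word) <= threshold:
            updated[i] = q_word
    return updated


def exact_match(a: str, b: str) -> float:
    # Toy overlap measure for the example only: 1.0 if identical, else 0.0.
    return 1.0 if a == b else 0.0
```

In practice the `word_overlap` argument would be a semantic similarity measure; the exact-match function merely keeps the sketch self-contained.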
Further, the method also comprises: step S6, establishing an associated answer for each question sample text that has been updated or replaced.
Compared with the prior art, the invention has the following beneficial effects. By determining the video acquisition period, the numbers of video pauses and progress bar movements within that period of the teaching video are obtained, and the content difficulty characterization coefficient is calculated to adjust the acquisition range of the question data word library. The question sample texts within the acquisition range are extracted; whether the question data word library needs to be updated is judged from the maximum of the overlap ratios between the question text and each question sample text, and a manner of updating the word library is selected. Answer indexes are built for the updated or replaced question sample texts. The size of the question data word library is thus adaptively adjusted according to the difficulty of the teaching content, the question data word library is adaptively updated according to the comparison of the overlap ratios of question texts and question word segments, and the self-learning update efficiency of the word library is improved.
In particular, the invention calculates the content difficulty characterization coefficient of the video acquisition period from the numbers of video pauses and progress bar movements within that period of the teaching video. When a teaching video is actually played, a learner generally pauses it to think about and take notes on the teaching content, and moves the progress bar to study hard-to-understand content repeatedly, so these two counts serve as an indication of how difficult the content is.
In particular, the invention selects an acquisition range of a different size for the question data word library according to the difficulty of the video progress range in which the question time node lies. In practice, comparing the question text with the question data word library of the entire video requires a large amount of matching computation, which lowers the efficiency of the matching judgment and wastes a large amount of computing resources.
In particular, the maximum of the overlap ratios between the question text and each question sample text in the question data word library is determined as the sample overlap ratio, in order to judge whether the question data word library needs to be updated. In practice, when the sample overlap ratio meets the preset condition, the question text overlaps a question sample text to a very high degree, so the question data word library need not be updated and the answer can be indexed directly. When the sample overlap ratio does not meet the preset condition, the question text and the question sample texts may not be questions of the same type, and the question data word library needs to be updated according to the further cases. In this way the question data word library is adaptively updated according to the comparison of the overlap ratios of question texts and question word segments, which improves the self-learning update efficiency of the word library.
In particular, when the overlap ratio between the question text and each question sample text in the question data word library is low (the sample overlap ratio is below the sample overlap ratio contrast value), the invention calculates the overlap ratio between the question text and the teaching content text of the teaching video. In practice, updating a question text that has a very low overlap ratio with the teaching content into the question data word library would create invalid, redundant data and waste the storage resources of the word library. Conversely, as long as the overlap ratio between the question text and the teaching content text meets the preset standard, the question is not unrelated to the teaching content, and the question text can be updated into the question data word library as a whole. In this way the question data word library is adaptively updated according to the comparison of the overlap ratios of question texts and question word segments, which improves the self-learning update efficiency of the word library.
In particular, when the overlap ratio between the question text and a question sample text is relatively high, the two questions probably differ little, with only some fragments differing, and only those fragments need to be replaced and updated. In this case the invention segments the question text into question words, calculates the overlap ratio between the question words and the question sample words, and replaces question sample words with question words. In this way the question data word library is adaptively updated according to the comparison of the overlap ratios of question texts and question word segments, which improves the self-learning update efficiency of the word library.
Drawings
FIG. 1 is a step diagram of a self-learning based question and answer thesaurus updating method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of determining a video acquisition period according to an embodiment of the present invention;
FIG. 3 is a logic flow diagram of a method for determining whether a word stock needs to be updated in accordance with an embodiment of the present invention;
FIG. 4 is a logic flow diagram of selecting the manner of updating the word library in accordance with an embodiment of the present invention;
in the figures, 1: time node, 2: first predetermined time period, 3: second predetermined time period, 4: video acquisition period.
Detailed Description
In order that the objects and advantages of the invention will become more apparent, the invention will be further described with reference to the following examples; it should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
Referring to fig. 1, which is a step diagram of a self-learning-based question-answer word library updating method according to an embodiment of the present invention, the self-learning-based question-answer word library updating method of the present invention includes:
step S1, determining the time node 1 at which a user side issues a predetermined operation instruction for a teaching video, and determining a predetermined time period before and after the time node 1 as a video acquisition period 4, wherein the predetermined operation instruction comprises inputting a question text into a preset interaction component;
step S2, obtaining the number of pauses and the number of progress bar movements made by the user side on the teaching video within the video acquisition period 4, so as to calculate a content difficulty characterization coefficient of the video acquisition period 4;
step S3, determining an acquisition range for a question data word library according to the content difficulty characterization coefficient, and acquiring question sample texts from the question data word library based on the acquisition range;
step S4, determining the maximum of the overlap ratios between the question text and each question sample text as the sample overlap ratio, so as to judge whether the question data word library needs to be updated;
step S5, selecting a manner of updating the word library based on the sample overlap ratio, comprising:
determining the overlap ratio between the question text and the teaching content text of the teaching video as the content overlap ratio, so as to determine whether to update the question text as a whole into the question data word library, wherein the teaching content text is generated from the subtitles of the teaching video;
or screening out the question sample text that meets a screening condition, and comparing the overlap ratio between each question word of the question text and the corresponding question sample word of the question sample text, so as to judge whether to replace the corresponding question sample word with the question word, wherein the screening condition is that the overlap ratio between the question text and the question sample text equals the sample overlap ratio.
Specifically, refer to FIG. 2, which is a schematic diagram of determining the video acquisition period 4 according to an embodiment of the present invention. In step S1, taking as reference the time node 1 at which the user side issues the predetermined operation instruction for the teaching video, a first predetermined time period 2 before the time node 1 and a second predetermined time period 3 after the time node 1 are marked out, and the time span jointly covered by the first predetermined time period 2 and the second predetermined time period 3 is determined as the video acquisition period 4.
Preferably, when determining the predetermined time period before and after the time node 1 as the video acquisition period 4, the playing progress of the teaching video may be expressed by its playback time point. For example, if the user side inputs a question text into the preset interaction component when the teaching video has played to the 20-minute position, then in this embodiment the span from the 15-minute to the 25-minute position of the teaching video may be predetermined and determined as the video acquisition period 4. Time periods of other lengths may of course be predetermined.
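The 15-to-25-minute example can be reproduced with a few lines. This is a sketch under stated assumptions: the function name `acquisition_period` is hypothetical, times are in seconds, and clamping the period to the bounds of the video is an assumption that the text does not state.

```python
from typing import Tuple


def acquisition_period(time_node_s: float,
                       before_s: float,
                       after_s: float,
                       video_length_s: float) -> Tuple[float, float]:
    """Return (start, end) of the video acquisition period in seconds."""
    # Clamp to the video bounds (an assumption, not stated in the text).
    start = max(0.0, time_node_s - before_s)
    end = min(video_length_s, time_node_s + after_s)
    return start, end


# Question asked at the 20-minute mark, 5 minutes before and after,
# in a 60-minute video: the period runs from 900 s to 1500 s.
period = acquisition_period(20 * 60, 5 * 60, 5 * 60, 60 * 60)
```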
Specifically, the invention does not limit the specific interaction mode of the interaction component in step S1. Preferably, in this embodiment, the user may click a button or menu item preset on the playing interface to pop up an interactive input window, and type a question text into that window with the keyboard and mouse, thereby inputting the question text through the interaction component.
Specifically, the invention does not limit how the numbers of pauses and progress bar movements made by the user side on the teaching video are obtained in step S2. Preferably, in this embodiment, they may be captured by adding event listeners in the video playing background: the player's pause events and progress bar movement events are listened for, in whichever programming language is chosen, and the number of times each event fires is counted, which gives the numbers of pauses and progress bar movements within a given period. For example, JavaScript may be used to listen for events such as video play, pause, and progress bar movement. This technique is widely used in fields concerned with collecting video content and viewing behavior and is not described further here.
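Once such events have been captured on the player side, counting them within the video acquisition period is straightforward. The sketch below assumes a hypothetical event log of `(playback_position_seconds, kind)` tuples with the kinds `"pause"` and `"seek"`; neither the log format nor the function name comes from the text.

```python
from typing import Iterable, Tuple


def count_events(events: Iterable[Tuple[float, str]],
                 start_s: float,
                 end_s: float) -> Tuple[int, int]:
    """Count pauses and progress bar movements inside [start_s, end_s]."""
    pauses = 0
    seeks = 0
    for position_s, kind in events:
        if start_s <= position_s <= end_s:
            if kind == "pause":
                pauses += 1
            elif kind == "seek":
                seeks += 1
    return pauses, seeks


# Hypothetical log: only events between 900 s and 1500 s are counted.
log = [(850.0, "pause"), (910.0, "pause"), (1000.0, "seek"),
       (1200.0, "pause"), (1490.0, "seek"), (1600.0, "seek")]
pause_count, seek_count = count_events(log, 900.0, 1500.0)
```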
Specifically, the invention does not limit how the overlap ratio between texts is calculated. For example, the TF-IDF algorithm may be used to represent each text sentence, and the cosine similarity between the TF-IDF vectors of two sentences then measures their overlap ratio. The usual practice is to segment each sentence into words, compute the TF-IDF value of each word as its weight in the sentence, and assemble the weights of all words into a vector that represents the sentence. Alternatively, an LSTM model may be used: the two text sentences are fed into two separate LSTM networks, and the overlap ratio is calculated from the networks' outputs. These techniques are not described further here.
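A minimal, dependency-free sketch of the TF-IDF and cosine-similarity option is shown below. It assumes the sentences are already segmented into words and uses a smoothed IDF, `log((1 + N) / (1 + df)) + 1`, which is one common variant chosen here as an assumption; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector per pre-segmented sentence."""
    n = len(docs)
    # Document frequency: number of sentences containing each word.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            w: (count / len(doc)) * (math.log((1 + n) / (1 + df[w])) + 1.0)
            for w, count in tf.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Identical sentences give a similarity of 1 (up to rounding) and sentences with no shared words give 0, so the value can be read directly as an overlap ratio in [0, 1].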
Specifically, the invention does not limit how the overlap ratio between words is calculated. In this embodiment, the semantic similarity between words is taken as their overlap ratio. Determining the semantic similarity between words is an important task in natural language processing: a pre-trained word vector model (e.g. Word2Vec, GloVe, FastText) may be used to convert each word into a vector representation, and the similarity between the word vectors is then calculated using a measure such as cosine similarity or Euclidean distance.
Specifically, the invention does not limit how the question text and the question sample text are segmented into words in step S5. Word segmentation is a basic step of text mining. English words are separated by spaces, so English text can be segmented on spaces, although several words sometimes need to be treated together according to their meaning; for example, "New York" needs to be handled as a single token. For Chinese question texts, statistical probabilities can be built from a corpus, and the optimal segmentation is found by calculating the joint probability of each candidate segmentation; an N-gram model may be chosen, and the optimal segmentation can be solved with the Viterbi algorithm. Preferably, in this embodiment, English text is segmented with the nltk word segmentation tool and Chinese text with the jieba word segmentation tool; jieba is widely used in existing translation, homophonic translation and speech recognition tools.
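The probability-based segmentation described above can be illustrated with a unigram (1-gram) model and a Viterbi-style dynamic program. This is a simplified sketch, not what jieba does internally: the word probabilities are made up for the example, and the penalty for unseen single characters and the `max_len` limit are assumptions.

```python
import math
from typing import Dict, List


def segment(text: str, word_prob: Dict[str, float], max_len: int = 4) -> List[str]:
    """Most probable segmentation of text under a unigram model."""
    unknown_logp = math.log(1e-8)    # penalty for unseen single characters (assumption)
    n = len(text)
    best = [0.0] + [-math.inf] * n   # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)             # back[i]: start index of the last word in text[:i]
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_prob:
                logp = math.log(word_prob[word])
            elif end - start == 1:
                logp = unknown_logp  # fall back to a single unknown character
            else:
                continue
            score = best[start] + logp
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


# Illustrative probabilities only; a real model would be estimated from a corpus.
probs = {"利息": 0.02, "怎么": 0.03, "计算": 0.02,
         "利": 0.001, "息": 0.001, "计": 0.001, "算": 0.001}
tokens = segment("利息怎么计算", probs)
```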
Specifically, in the invention, an associated answer may be established for each updated or replaced question sample text by creating an answer text corresponding to each question text. In the prior art this may be done with a relational database such as Microsoft SQL Server, or with a NoSQL database. Both are widely used by those skilled in the database field and are not described further here.
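As a small illustration of the relational option, the sketch below stores each question sample text with its time period and an optional answer. It uses Python's built-in SQLite module rather than Microsoft SQL Server purely so that the example is self-contained; the table name, column names and helper functions are hypothetical.

```python
import sqlite3
from typing import Optional


def create_store(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table linking questions, periods and answers."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS question_samples (
               id INTEGER PRIMARY KEY,
               period_start_s REAL NOT NULL,
               period_end_s REAL NOT NULL,
               question_text TEXT NOT NULL,
               answer_text TEXT
           )"""
    )


def add_sample(conn: sqlite3.Connection, start_s: float, end_s: float, text: str) -> int:
    """Insert a question sample text and return its row id."""
    cur = conn.execute(
        "INSERT INTO question_samples (period_start_s, period_end_s, question_text) "
        "VALUES (?, ?, ?)",
        (start_s, end_s, text),
    )
    return cur.lastrowid


def set_answer(conn: sqlite3.Connection, sample_id: int, answer: str) -> None:
    """Associate an answer with an updated or replaced question sample text."""
    conn.execute(
        "UPDATE question_samples SET answer_text = ? WHERE id = ?",
        (answer, sample_id),
    )


def get_answer(conn: sqlite3.Connection, sample_id: int) -> Optional[str]:
    row = conn.execute(
        "SELECT answer_text FROM question_samples WHERE id = ?", (sample_id,)
    ).fetchone()
    return row[0] if row else None
```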
Specifically, in step S2, the content difficulty characterization coefficient of the video acquisition period 4 is calculated according to formula (1) (the formula itself is not reproduced in this text). In formula (1), $R$ is the content difficulty characterization coefficient of the video acquisition period 4, $P_n$ is the number of video pauses in the video acquisition period 4, $P_{n0}$ is the preset reference value for the number of video pauses, $M_p$ is the number of progress bar movements in the video acquisition period 4, $M_{p0}$ is the preset reference value for the number of progress bar movements, $\alpha$ is the weight coefficient of the number of pauses, $\beta$ is the weight coefficient of the number of progress bar movements, and $e$ is a constant, where $\alpha + \beta = 1$.
Preferably, in this embodiment, the preset reference value $P_{n0}$ for the number of video pauses is determined from the total number of plays $T$ of the teaching video, with $P_{n0} = k_1 \times T$, where $k_1$ is the pause-count value factor and takes values in $[0.4, 0.6]$. The preset reference value $M_{p0}$ for the number of progress bar movements is likewise determined from $T$, with $M_{p0} = k_2 \times T$, where $k_2$ is the progress-bar-movement value factor and takes values in $[0.25, 0.6]$.
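The two reference values follow directly from the total number of plays. The helper below is a hypothetical sketch that checks the stated ranges for $k_1$ and $k_2$; the default factors 0.5 and 0.4 are arbitrary picks inside those ranges, not values given in the text.

```python
from typing import Tuple


def reference_values(total_plays: int,
                     k1: float = 0.5,
                     k2: float = 0.4) -> Tuple[float, float]:
    """Return (P_n0, M_p0) as k1 * T and k2 * T."""
    if not 0.4 <= k1 <= 0.6:
        raise ValueError("k1 must lie in [0.4, 0.6]")
    if not 0.25 <= k2 <= 0.6:
        raise ValueError("k2 must lie in [0.25, 0.6]")
    return k1 * total_plays, k2 * total_plays


# For T = 1000 plays with k1 = 0.5 and k2 = 0.25:
p_n0, m_p0 = reference_values(1000, k1=0.5, k2=0.25)
```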
Specifically, the invention calculates the content difficulty characterization coefficient of the video acquisition period 4 from the numbers of video pauses and progress bar movements within the video acquisition period 4 of the teaching video. When a teaching video is actually played, a learner generally pauses it to think about and take notes on the teaching content, and moves the progress bar to study hard-to-understand content repeatedly, so these two counts serve as an indication of how difficult the content is.
Specifically, step S3 also includes building the question data word library in advance. The building process includes dividing the teaching video into several time periods, retrieving the history of predetermined operation instructions issued by user sides for the teaching video in each time period to obtain sample question texts, establishing an association between each sample question text and its corresponding time period, and storing the association in the question data word library.
Specifically, the question data word library collects and organizes the various question sample texts. These include question sample texts set in advance according to the teaching content, and they are updated in real time: a question text received from the user side whose overlap ratio with the preset question sample texts is insufficient, but which satisfies the relevance condition, is added to the question data word library by replacement or update.
Specifically, in step S3, the acquisition range for the question data word library is determined according to the content difficulty characterization coefficient:
the acquisition range is a time period referenced to the time node 1, and the length of the time period is positively correlated with the content difficulty characterization coefficient $R$.
Preferably, in this embodiment, at least three acquisition range adjustment modes are preset for adjusting the acquisition range of the question data word library based on the content difficulty characterization coefficient $R$. Step S3 further includes comparing $R$ with a preset first difficulty characterization coefficient contrast value $R_1$ and a preset second difficulty characterization coefficient contrast value $R_2$:
if $R \le R_1$, the first acquisition range adjustment mode is selected, which adjusts the acquisition range of the question data word library to a first acquisition range $Cs_1$, where $Cs_1 = Cs_0 + \Delta Cs_1$;
if $R_1 < R < R_2$, the second acquisition range adjustment mode is selected, which adjusts the acquisition range of the question data word library to a second acquisition range $Cs_2$, where $Cs_2 = Cs_0 + \Delta Cs_2$;
if $R \ge R_2$, the third acquisition range adjustment mode is selected, which adjusts the acquisition range of the question data word library to a third acquisition range $Cs_3$, where $Cs_3 = Cs_0 + \Delta Cs_3$;
where $Cs_0$ is the initial value of the acquisition range of the question data word library, which may be determined from the total duration $C_t$ of the teaching video as $Cs_0 = 0.2 \times C_t$; $\Delta Cs_1$ is the first acquisition range adjustment amount, $\Delta Cs_2$ is the second acquisition range adjustment amount, and $\Delta Cs_3$ is the third acquisition range adjustment amount. In this embodiment, the contrast values $R_1$ and $R_2$ are preset to distinguish how difficult the content is in the predetermined period around the question time node 1 of the teaching video; $R_1$ may take values in $[1.25, 1.35)$ and $R_2$ in $[1.35, 1.45)$. To make the adjustment effective without over-adjusting, $0.2\,Cs_0 \le \Delta Cs_1 < \Delta Cs_2 < \Delta Cs_3 \le 0.8\,Cs_0$ may be used.
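The three adjustment modes can be written as a single function. In this sketch the function name, the default contrast values ($R_1 = 1.3$, $R_2 = 1.4$) and the adjustment amounts, expressed as fractions of $Cs_0$, are illustrative assumptions chosen inside the ranges stated above.

```python
from typing import Tuple


def acquisition_range(r: float,
                      total_duration_s: float,
                      r1: float = 1.3,
                      r2: float = 1.4,
                      delta_fractions: Tuple[float, float, float] = (0.2, 0.5, 0.8)) -> float:
    """Return the acquisition range Cs (in seconds) for difficulty coefficient r."""
    f1, f2, f3 = delta_fractions
    # Enforce 0.2*Cs0 <= dCs1 < dCs2 < dCs3 <= 0.8*Cs0 from the text.
    if not 0.2 <= f1 < f2 < f3 <= 0.8:
        raise ValueError("adjustment fractions must satisfy 0.2 <= f1 < f2 < f3 <= 0.8")
    if not r1 < r2:
        raise ValueError("r1 must be smaller than r2")
    cs0 = 0.2 * total_duration_s   # initial range: Cs0 = 0.2 * Ct
    if r <= r1:
        return cs0 + f1 * cs0      # first adjustment mode
    if r < r2:
        return cs0 + f2 * cs0      # second adjustment mode
    return cs0 + f3 * cs0          # third adjustment mode
```

For a 60-minute video ($C_t = 3600$ s, so $Cs_0 = 720$ s) these defaults give ranges of about 864 s, 1080 s and 1296 s for low, medium and high difficulty respectively.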
Specifically, the invention selects acquisition ranges of different sizes in the question data word library according to the difficulty level of the video progress range around question time node 1. In practice, comparing the question text against the question data word library of the entire video requires a large amount of matching computation, which lowers the efficiency of the matching judgment and wastes computing resources. Restricting the comparison to the selected acquisition range avoids this cost.
Specifically, in step S3, collecting the question sample text in the question data word library with the acquisition range as a reference includes:
extracting, from the question data word library, the sample question text associated with the time period corresponding to the acquisition range.
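The extraction step can be sketched as a time-window filter. The record type `SampleQuestion`, the function `collect_samples`, and the choice to center the acquisition range on the time node are all assumptions made for illustration; the text above only requires that the extracted samples be associated with the time period corresponding to the acquisition range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleQuestion:
    start: float  # start of the associated time period, in seconds
    end: float    # end of the associated time period, in seconds
    text: str     # sample question text


def collect_samples(library, time_node, cs):
    """Return sample question texts whose time period overlaps the window.

    The window is assumed to be centered on the time node with total
    length cs (the acquisition range).
    """
    lo, hi = time_node - cs / 2, time_node + cs / 2
    return [s.text for s in library if s.start < hi and s.end > lo]
```

Only samples tied to the selected window are returned, so the later overlap calculation never touches the rest of the library.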
Specifically, referring to FIG. 3, which is a logic flow diagram for determining whether the word library needs to be updated according to an embodiment of the present invention, in step S4 the process of determining whether the word library needs to be updated includes:
comparing the sample overlap ratio Cr_1 with a preset sample overlap ratio threshold Cr_a;
if the sample overlap ratio Cr_1 is less than the sample overlap ratio threshold Cr_a, determining that the word library needs to be updated.
Preferably, the sample overlap ratio threshold Cr_a takes a value in [85%, 95%].
Specifically, the maximum of the overlap ratios between the question text and each question sample text in the question data word library is taken as the sample overlap ratio, which is used to judge whether the question data word library needs to be updated. In practice, a question text whose sample overlap ratio meets the preset condition overlaps very closely with an existing question sample text, so the question data word library does not need to be updated and the answer can be indexed directly. A question text whose sample overlap ratio does not meet the preset condition may not be the same type of question as the question sample text, so the update of the question data word library needs to be further subdivided. The question data word library is thereby updated adaptively according to the overlap ratio comparison of the question text and the question words, which improves the self-learning update efficiency of the word library.
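The judgment in step S4 can be sketched as follows. The overlap measure used here (Jaccard similarity over whitespace-separated tokens) is a stand-in assumption, since this passage does not fix how the overlap ratio is computed; the function names and the default threshold of 0.9 are likewise illustrative.

```python
def overlap(a, b):
    """Assumed overlap measure: Jaccard similarity of whitespace tokens."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def sample_overlap(question, samples):
    """Cr_1: the maximum overlap between the question and any sample text."""
    return max((overlap(question, s) for s in samples), default=0.0)


def needs_update(question, samples, cr_a=0.9):
    """The word library needs updating when Cr_1 is below the threshold Cr_a."""
    return sample_overlap(question, samples) < cr_a
```

A question that already appears among the samples yields an overlap of 1.0 and triggers no update, while an unseen question falls below the threshold.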
Specifically, referring to FIG. 4, which is a logic flow diagram for selecting the word library update mode according to an embodiment of the present invention, in step S5 the process of selecting the mode for updating the word library includes:
comparing the sample overlap ratio Cr_1 with a preset sample overlap ratio contrast value Cr_b;
if the sample overlap ratio Cr_1 is less than the sample overlap ratio contrast value Cr_b, determining the overlap ratio between the question text and the teaching content text of the teaching video as the content overlap ratio Cr_2, in order to determine whether to update the question text into the question data word library;
if the sample overlap ratio Cr_1 is greater than or equal to the sample overlap ratio contrast value Cr_b, screening out the question sample text that meets the screening condition, and comparing the overlap ratio of each question word of the question text with the question sample words of that question sample text to judge whether to replace the corresponding question sample word with the question word;
where the sample overlap ratio contrast value Cr_b is less than the sample overlap ratio threshold Cr_a, and Cr_b takes a value in [55%, 65%].
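Taken together with the threshold Cr_a from step S4, the contrast value Cr_b splits the sample overlap ratio into three bands. The sketch below makes that explicit; the enum `UpdateMode`, the function `select_mode` and the default values 0.9 and 0.6 are assumptions for illustration.

```python
from enum import Enum


class UpdateMode(Enum):
    NONE = "no update, index the answer directly"
    WHOLE_TEXT = "check content overlap, then add or discard the whole text"
    WORD_REPLACEMENT = "compare word by word and replace sample words"


def select_mode(cr1, cr_a=0.9, cr_b=0.6):
    """Pick the update mode from the sample overlap ratio Cr_1."""
    if not cr_b < cr_a:
        raise ValueError("Cr_b must be smaller than Cr_a")
    if cr1 >= cr_a:   # step S4: the word library does not need updating
        return UpdateMode.NONE
    if cr1 < cr_b:    # low overlap: decide on the question text as a whole
        return UpdateMode.WHOLE_TEXT
    return UpdateMode.WORD_REPLACEMENT  # Cr_b <= Cr_1 < Cr_a
```

Requiring Cr_b < Cr_a guarantees that the middle band, in which word-level replacement applies, is never empty.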
Specifically, in step S5, the content overlap ratio Cr_2 between the question text and the teaching content text of the teaching video is compared with a preset content overlap ratio threshold Cr_c:
if the content overlap ratio Cr_2 is less than the content overlap ratio threshold Cr_c, it is determined to discard the question text;
if the content overlap ratio Cr_2 is greater than or equal to the content overlap ratio threshold Cr_c, it is determined to update the whole question text into the question data word library.
Preferably, in an embodiment of the present invention, the content overlap ratio threshold Cr_c takes a value in [10%, 15%].
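A minimal sketch of this whole-text branch follows. The content overlap measure (the fraction of question tokens that also occur in the teaching content text), the function names and the default threshold of 0.12 are assumptions for illustration, and the word library is modeled as a plain Python list.

```python
def content_overlap(question, content_text):
    """Assumed Cr_2: share of question tokens found in the teaching content."""
    q = set(question.lower().split())
    if not q:
        return 0.0
    c = set(content_text.lower().split())
    return len(q & c) / len(q)


def update_whole_text(question, content_text, library, cr_c=0.12):
    """Append the question to the library unless Cr_2 is below Cr_c.

    Returns True when the question text was added, False when discarded.
    """
    if content_overlap(question, content_text) < cr_c:
        return False          # unrelated to the teaching content: discard
    library.append(question)  # related: update the whole question text
    return True
```

The low threshold reflects that the check only filters out questions with essentially no relation to the teaching content.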
Specifically, when the sample overlap ratio between the question text and the question sample texts in the question data word library is below the sample overlap ratio contrast value Cr_b, the invention calculates the overlap ratio between the question text and the teaching content text of the teaching video. In practice, updating a question text that has an extremely low overlap ratio with the teaching content into the question data word library would introduce invalid, redundant data and waste the storage resources of the library. Conversely, as long as the overlap ratio between the question text and the teaching content text meets the preset standard, the question is not unrelated to the teaching content, and the question text can be updated into the question data word library as a whole. The question data word library is thereby updated adaptively according to the overlap ratio comparison of the question text and the question words, which improves the self-learning update efficiency of the word library.
Specifically, in step S5, the overlap ratio between each question word of the question text and the question sample word of the question sample text is determined as the word segmentation overlap ratio Cr_3, and each word segmentation overlap ratio Cr_3 is compared with a preset word segmentation overlap ratio threshold Cr_m:
if the word segmentation overlap ratio Cr_3 is less than or equal to the word segmentation overlap ratio threshold Cr_m, it is determined that the corresponding question sample word is replaced with the question word.
Preferably, in an embodiment of the present invention, the word segmentation overlap ratio threshold Cr_m takes a value in [75%, 80%].
Specifically, when the overlap ratio between the question text and a question sample text is high, the two texts probably differ only slightly, in a few segments, and only those segments need to be replaced and updated. In this state, the question text is segmented into question words, the overlap ratio between the question words and the question sample words is calculated, and each question sample word whose overlap ratio does not meet the requirement is replaced with the question word. The question data word library is thereby updated adaptively according to the overlap ratio comparison of the question text and the question words, which improves the self-learning update efficiency of the word library.
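The word-level replacement can be sketched in Python as below. Two points are assumptions not fixed by the text above: question words and question sample words are paired by position, and the word segmentation overlap ratio Cr_3 is approximated with the character-level similarity from the standard library's `difflib.SequenceMatcher`. The function names and the default threshold of 0.78 are also illustrative.

```python
from difflib import SequenceMatcher


def word_overlap(question_word, sample_word):
    """Assumed Cr_3: character-level similarity of the two words, in [0, 1]."""
    return SequenceMatcher(None, question_word, sample_word).ratio()


def replace_words(question_words, sample_words, cr_m=0.78):
    """Replace each sample word whose overlap with the paired question word
    is less than or equal to Cr_m; words are paired by position."""
    updated = list(sample_words)
    for i, (q, s) in enumerate(zip(question_words, sample_words)):
        if word_overlap(q, s) <= cr_m:
            updated[i] = q
    return updated
```

For example, with the sample words `what is the loan rates` and the question words `what is the fund rate`, the pair `loan`/`fund` has a low overlap and is replaced, while `rates`/`rate` stays above the threshold and the sample word is kept.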
Specifically, the method further includes step S6: establishing an associated answer for the question sample text that has been updated and replaced.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.
The foregoing description is only of the preferred embodiments of the invention and is not intended to limit the invention; various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A self-learning-based question-answer word library updating method, characterized by comprising the following steps:
step S1, determining the time node at which a user terminal sends a preset operation instruction for a teaching video, and determining a preset time period before and after the time node as a video acquisition period, wherein the preset operation instruction comprises inputting a question text into a preset interaction component;
step S2, obtaining the number of pauses and the number of progress bar movements made by the user terminal for the teaching video in the video acquisition period, so as to calculate a content difficulty characterization coefficient of the video acquisition period;
step S3, determining an acquisition range for a question data word library according to the content difficulty characterization coefficient, and collecting question sample texts in the question data word library with the acquisition range as a reference;
step S4, determining the maximum of the overlap ratios between the question text and each question sample text as a sample overlap ratio, so as to judge whether the question data word library needs to be updated;
step S5, selecting a mode for updating the word library based on the sample overlap ratio, comprising:
determining the overlap ratio between the question text and the teaching content text of the teaching video as a content overlap ratio, so as to determine whether to update the whole question text into the question data word library, wherein the teaching content text is generated from the subtitles of the teaching video;
or screening out the question sample text that meets a screening condition, and comparing the overlap ratio of each question word of the question text with the question sample words of the question sample text to judge whether to replace the corresponding question sample word with the question word, wherein the screening condition is that the overlap ratio between the question text and the question sample text equals the sample overlap ratio.
2. The self-learning-based question-answer word library updating method according to claim 1, wherein in step S2, the content difficulty characterization coefficient of the video acquisition period is calculated according to formula (1), in which R is the content difficulty characterization coefficient of the video acquisition period, P_n is the number of video pauses in the video acquisition period, P_n0 is a preset reference value for the number of video pauses, M_p is the number of progress bar movements in the video acquisition period, M_p0 is a preset reference value for the number of progress bar movements, α is the weight coefficient of the number of pauses, β is the weight coefficient of the number of progress bar movements, and e is a constant.
3. The self-learning-based question-answer word library updating method according to claim 1, wherein step S3 further comprises constructing the question data word library in advance, the construction process comprising: determining a plurality of time periods for the teaching video; retrieving, for each time period, the history of preset operation instructions sent by the user terminal for the teaching video to obtain sample question texts; constructing an association between each sample question text and the corresponding time period; and storing the association in the question data word library.
4. The self-learning-based question-answer word library updating method according to claim 3, wherein in step S3, the acquisition range for the question data word library is determined according to the content difficulty characterization coefficient,
the acquisition range comprising a time period that takes the time node as a reference, the length of the time period being positively correlated with the content difficulty characterization coefficient.
5. The self-learning-based question-answer word library updating method according to claim 4, wherein in step S3, collecting the question sample text in the question data word library with the acquisition range as a reference comprises:
extracting, from the question data word library, the sample question text associated with the time period corresponding to the acquisition range.
6. The self-learning-based question-answer word library updating method according to claim 1, wherein in step S4, the process of determining whether the word library needs to be updated comprises:
comparing the sample overlap ratio with a preset sample overlap ratio threshold;
and if the sample overlap ratio is less than the sample overlap ratio threshold, determining that the word library needs to be updated.
7. The self-learning-based question-answer word library updating method according to claim 6, wherein in step S5, the process of selecting the mode for updating the word library comprises:
comparing the sample overlap ratio with a preset sample overlap ratio contrast value;
if the sample overlap ratio is less than the sample overlap ratio contrast value, determining the overlap ratio between the question text and the teaching content text of the teaching video as the content overlap ratio, so as to determine whether to update the question text into the question data word library;
if the sample overlap ratio is greater than or equal to the sample overlap ratio contrast value, screening out the question sample text that meets the screening condition, and comparing each question word of the question text with the question sample words of the question sample text to judge whether to replace the corresponding question sample word with the question word;
wherein the sample overlap ratio contrast value is less than the sample overlap ratio threshold.
8. The self-learning-based question-answer word library updating method according to claim 7, wherein in step S5, the content overlap ratio between the question text and the teaching content text of the teaching video is compared with a preset content overlap ratio threshold,
if the content overlap ratio is less than the content overlap ratio threshold, it is determined to discard the question text;
and if the content overlap ratio is greater than or equal to the content overlap ratio threshold, it is determined to update the question text into the question data word library.
9. The self-learning-based question-answer word library updating method according to claim 8, wherein step S5 further comprises determining the overlap ratio between each question word of the question text and the question sample word of the question sample text as the word segmentation overlap ratio, and comparing each word segmentation overlap ratio with a preset word segmentation overlap ratio threshold,
and if the word segmentation overlap ratio is less than or equal to the word segmentation overlap ratio threshold, it is determined that the corresponding question sample word is replaced with the question word.
10. The self-learning-based question-answer word library updating method according to claim 1, further comprising step S6: establishing an associated answer for the question sample text that has been updated and replaced.
CN202410175373.6A 2024-02-07 2024-02-07 Question-answer word library updating method based on self-learning Pending CN117725148A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410175373.6A CN117725148A (en) 2024-02-07 2024-02-07 Question-answer word library updating method based on self-learning

Publications (1)

Publication Number Publication Date
CN117725148A true CN117725148A (en) 2024-03-19

Family

ID=90200162

Country Status (1)

Country Link
CN (1) CN117725148A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination