CN111508289A - Language learning system based on word use frequency - Google Patents

Publication number
CN111508289A
Authority
CN
China
Prior art keywords
word
learning
words
difficulty
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010291647.XA
Other languages
Chinese (zh)
Other versions
CN111508289B (en)
Inventor
杨定生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jushi Intelligent Technology Co ltd
Original Assignee
Shanghai Jushi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jushi Intelligent Technology Co ltd filed Critical Shanghai Jushi Intelligent Technology Co ltd
Priority to CN202010291647.XA priority Critical patent/CN111508289B/en
Publication of CN111508289A publication Critical patent/CN111508289A/en
Application granted granted Critical
Publication of CN111508289B publication Critical patent/CN111508289B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00Electrically-operated educational appliances
    • G09B5/06Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a language learning system based on word use frequency, comprising a server and a user side. The server contains a word frequency database, a learning resource database and a learning resource difficulty marking unit; the learning resources in the learning resource database are learning resources with text content. The word frequency database stores words of multiple languages; all the words of each language are sorted from high to low by how frequently they are used in daily life, and each word is marked with its sorting sequence number. The learning resource difficulty marking unit marks the difficulty of each learning resource entered into the learning resource database based on the frequency values of its words. The user side retrieves, for the user to study, learning resources whose difficulty matches the difficulty value obtained as suitable for the current user. The invention enables the user, to a certain extent, to learn high-frequency words first and low-frequency words afterwards, thereby reducing the learning difficulty.

Description

Language learning system based on word use frequency
Technical Field
The invention belongs to the technical field of education software, and particularly relates to a language learning system based on word use frequency.
Background
The English level of most college graduates in China is below that of a 6-year-old child in England: a 6-year-old English child can freely understand English conversations among parents, friends and teachers, can express ideas, opinions and suggestions in English, can understand and tell stories in English, and can understand English content in media such as television, whereas most college graduates in China cannot. The reason is that the traditional foreign language education system does not conform to the rules of learning and cognition, specifically: the learning content is unscientific, a native language environment is lacking, and learners listen too little.
1. The learning content is unscientific: the foreign language words and texts in traditional courses are selected subjectively by the textbook writers, so many high-frequency words are never learned while many low-frequency words are learned first. Learners spend a lot of time memorizing large numbers of low-frequency words, such as sheet and tiger, that go unused for a month or even several months. Because the low-frequency vocabulary is hard to use in practice and demands a great deal of rote memorization, interest in learning drops and study is difficult to sustain;
2. Lack of a native language environment: in reality learners have no foreigners to communicate with over the long term, and the audio and original-language video on the internet is scattered, uneven in difficulty and offered without assistance, so learners cannot immerse themselves in it for long;
3. Too little listening: lacking a native language environment, foreign language learners listen too little. A 6-year-old child in the United Kingdom has listened to English for at least 5 hours/day × 365 days/year × 6 years = 10,950 hours, whereas a typical graduate in China has listened for about 15 minutes/class × 5 classes/week × 20 weeks/term × 32 terms ÷ 60 minutes/hour = 800 hours (note: each class lasts 45 minutes, of which 2/3 is explanation in Chinese and only 15 minutes is in English). With so little listening, the learner cannot understand what foreigners say.
It follows that, to learn English well, learning must be grounded in how English is used in real life, moving gradually from vocabulary with a high frequency of use to vocabulary with a low frequency of use, and a native language environment should be built as far as possible so that students hear more authentic English and learn immersively, thereby improving their learning efficiency.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a language learning system based on word use frequency. By marking learning resources with difficulty marks based on word use frequency, the system lets a user who studies the resources in order of increasing difficulty learn, to a certain extent, high-frequency words first and low-frequency words afterwards, which reduces the learning difficulty and arouses interest in learning.
To solve this technical problem, the invention adopts the following technical scheme: a language learning system based on word use frequency comprises a server and a user side, wherein a word frequency database, a learning resource database and a learning resource difficulty marking unit are arranged in the server, and the learning resources in the learning resource database are learning resources with text content;
the word frequency database stores words of multiple languages; all the words of each language are sorted from high to low by their use frequency values in daily life, sorting sequence numbers are assigned from small to large, and each word is marked with its sorting sequence number;
the learning resource difficulty marking unit is used for marking each learning resource entered into the learning resource database with difficulty information based on the frequency values of its words;
and the user side is used for retrieving the corresponding learning resources according to the difficulty information for the user to study.
In the above language learning system based on word use frequency, the learning resource difficulty marking unit marks the difficulty of each learning resource entered into the learning resource database, based on the frequency values of its words, through the following steps:
step 1, acquiring the text data of the learning resource and performing word segmentation on the text data;
step 2, deduplicating the words obtained by segmentation;
step 3, querying the word frequency database for the marked sorting sequence number of each deduplicated word;
step 4, arranging the sequence numbers found from small to large to obtain a set Q;
step 5, from the last 20-50% of the sorting sequence numbers in the set Q, taking the sequence number B with the largest value and the sequence number A with the smallest value;
step 6, marking the difficulty of the learning resource as [A, B].
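The six steps above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function name `label_difficulty`, the `rank_of` dictionary standing in for the word frequency database, the regex tokenizer, and the single `tail_fraction` parameter (one value chosen from the 20-50% range) are all assumptions, as is the choice to skip words that are missing from the database.

```python
import math
import re


def label_difficulty(text, rank_of, tail_fraction=0.3):
    """Return the difficulty mark (A, B) of one resource, or None if no word is ranked.

    rank_of is a hypothetical stand-in for the word frequency database:
    it maps a word to its sorting sequence number (1 = most frequent).
    """
    # Step 1: word segmentation (a simple regex tokenizer, assumed here).
    words = re.findall(r"[a-z']+", text.lower())
    # Step 2: deduplicate the segmented words.
    unique_words = set(words)
    # Step 3: look up each word's sequence number; unknown words are skipped (assumption).
    ranks = [rank_of[w] for w in unique_words if w in rank_of]
    if not ranks:
        return None
    # Step 4: sort ascending to obtain the set Q.
    q = sorted(ranks)
    # Step 5: keep only the last tail_fraction of Q (the rarest words).
    tail_len = max(1, math.ceil(len(q) * tail_fraction))
    tail = q[-tail_len:]
    # Step 6: A is the smallest and B the largest sequence number in that tail.
    return (tail[0], tail[-1])
```

With a tail of 50%, a resource whose distinct words have sequence numbers 1, 10, 500, 900, 3000 and 4000 would be marked [900, 4000], so the mark reflects the harder half of its vocabulary rather than the common words every text shares.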
Alternatively, in the above language learning system based on word use frequency, the learning resource difficulty marking unit marks the difficulty of each learning resource entered into the learning resource database, based on the frequency values of its words, through the following steps:
step 1, acquiring the text data of the learning resource and performing word segmentation on the text data;
step 2, deduplicating the words obtained by segmentation;
step 3, querying the word frequency database for the marked sorting sequence number of each deduplicated word;
step 4, arranging the sequence numbers found from small to large to obtain a set Q;
step 5, taking the last 20-50% of the sorting sequence numbers in the set Q and deleting from them the sequence numbers whose value exceeds 10000-12000;
step 6, from the sequence numbers remaining after deletion, taking the sequence number B with the largest value and the sequence number A with the smallest value;
step 7, marking the difficulty of the learning resource as [A, B].
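A sketch of this variant, under stated assumptions, is given below. The function name `label_with_cutoff` is hypothetical, `cutoff` stands for one value picked from the 10000-12000 range, `tail_fraction` for one value picked from 20-50%, and returning `None` when every extracted sequence number is deleted is an assumption, since the text does not say what happens in that case.

```python
import math


def label_with_cutoff(ranks, tail_fraction=0.3, cutoff=10000):
    """Return (A, B) after discarding very rare words from the tail of Q.

    ranks: sorting sequence numbers already looked up for the resource's words.
    """
    # Steps 2-4: deduplicate and sort ascending to obtain Q.
    q = sorted(set(ranks))
    if not q:
        return None
    # Step 5a: take the last tail_fraction of Q.
    tail_len = max(1, math.ceil(len(q) * tail_fraction))
    tail = q[-tail_len:]
    # Step 5b: delete sequence numbers above the cutoff.
    kept = [r for r in tail if r <= cutoff]
    if not kept:
        return None  # assumption: no mark can be produced
    # Steps 6-7: smallest and largest remaining sequence numbers form the mark.
    return (kept[0], kept[-1])
```

The cutoff keeps a single very rare word, such as a proper noun or technical term, from inflating B for an otherwise easy resource.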
Alternatively, in the above language learning system based on word use frequency, the learning resource difficulty marking unit marks the difficulty of each learning resource entered into the learning resource database, based on the frequency values of its words, through the following steps:
step 1, acquiring the text data of the learning resource and performing word segmentation on the text data;
step 2, deduplicating the words obtained by segmentation;
step 3, querying the word frequency database for the marked sorting sequence number of each deduplicated word;
step 4, arranging the sequence numbers found from small to large to obtain a set Q;
step 5, taking the last 20-50% of the sorting sequence numbers in the set Q and counting how many of them have a value exceeding 10000-12000, the count being P; judging whether P exceeds 2% of the total number of sorting sequence numbers in the set Q; if so, going to step 6, and if not, going to step 7;
step 6, from the last 20-50% of the sorting sequence numbers in the set Q, taking the sequence number B with the largest value and the sequence number A with the smallest value, and marking the difficulty of the learning resource as [A, B];
step 7, taking the last 20-50% of the sorting sequence numbers in the set Q and deleting from them the sequence numbers whose value exceeds 10000-12000; from the sequence numbers remaining after deletion, taking the sequence number B with the largest value and the sequence number A with the smallest value, and marking the difficulty of the learning resource as [A, B].
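The branch between steps 6 and 7 can be illustrated with the sketch below. It is only one reading of the text: the name `label_adaptive` and its parameters are hypothetical, `cutoff` is one value from the 10000-12000 range, and the `None` result for an empty selection is an assumption.

```python
import math


def label_adaptive(ranks, tail_fraction=0.3, cutoff=10000, rare_share=0.02):
    """Return (A, B); very rare words are kept only when they are numerous enough."""
    # Steps 2-4: deduplicate and sort ascending to obtain Q.
    q = sorted(set(ranks))
    if not q:
        return None
    tail_len = max(1, math.ceil(len(q) * tail_fraction))
    tail = q[-tail_len:]
    # Step 5: P counts the tail entries above the cutoff.
    p = sum(1 for r in tail if r > cutoff)
    if p > rare_share * len(q):
        # Step 6: rare words are a real part of the text, so keep them.
        chosen = tail
    else:
        # Step 7: rare words are incidental, so delete them first.
        chosen = [r for r in tail if r <= cutoff]
    if not chosen:
        return None  # assumption: no mark can be produced
    return (chosen[0], chosen[-1])
```

For a resource with 100 distinct words, one word above the cutoff is dropped (1 is not more than 2% of 100), while three such words are kept and raise B accordingly.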
In the above language learning system based on word use frequency, a word dictionary module is arranged in the user side;
the word dictionary module comprises a word playing unit and a word learning video playing unit;
a forward switching button for switching to the word learning video playing unit is arranged on the word playing unit;
when the forward switching button is used to switch from the word playing unit to the word learning video playing unit, the word learning video playing unit retrieves and plays the learning video of the word that the word playing unit was playing before the switch;
a reverse switching button for switching to the word playing unit is arranged on the word learning video playing unit;
when the reverse switching button is used to switch from the word learning video playing unit to the word playing unit, the word playing unit plays the words contained in the video that was playing before the switch, in ascending order of their marked sorting sequence numbers in the word frequency database.
In the above language learning system based on word use frequency, a word consolidation module is arranged in the user side;
the word consolidation module is used for presenting the words needing to be consolidated to the user;
the word consolidation module presents the words needing consolidation to the user through the following steps:
acquiring information on all words learned by the user within T time units before the current time point;
querying the word frequency database for the marked sorting sequence numbers of all the learned words;
calculating a difficulty mark value Q2 by the formula given in Figure BDA0002450619200000051 (the formula image is not reproduced in this text), where Xi is the marked sorting sequence number found in the word frequency database for the i-th word and n is the number of all words learned by the user within the T time units;
the words whose marked sorting sequence numbers in the word frequency database fall within the range Q2 ± K are the words needing to be consolidated, where 100 ≤ K ≤ 300.
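The selection step can be sketched as below. Because the formula for Q2 is not recoverable from this text, the sketch takes `q2` as an input instead of computing it. The function name, the `rank_of` dictionary standing in for the word frequency database, and the decision to choose only among the learned words (rather than among all database words in the range) are assumptions.

```python
def words_to_consolidate(learned_words, rank_of, q2, k=200):
    """Return the learned words whose sequence number lies within q2 +/- k.

    q2 is the difficulty mark value, supplied by the caller because its
    formula is not reproduced in the source text.
    """
    if not 100 <= k <= 300:
        raise ValueError("K must satisfy 100 <= K <= 300")
    selected = [
        w for w in set(learned_words)
        if w in rank_of and abs(rank_of[w] - q2) <= k
    ]
    # Present the words from most frequent to least frequent.
    return sorted(selected, key=lambda w: rank_of[w])
```

For example, with `q2 = 500` and `k = 200`, learned words ranked 450 and 700 are selected, while words ranked 100 and 2000 are not.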
In the above language learning system based on word use frequency, a language level testing module is arranged in the user side;
the language level testing module is used for testing the language level of the user;
the language level testing module tests the language level of a user through the following steps:
step C1, acquiring information on the language to be tested;
step C2, retrieving any word of that language whose sorting sequence number in the word frequency database is between 2000 and 2500;
step C3, pushing the preset test question for the word to the user, obtaining the test result, and increasing the test count by 1; if the result is correct, adding 1 to the test value and going to step C4; if the result is wrong, subtracting 1 from the test value and going to step C5;
step C4, judging whether the test count has reached a threshold α, where α ≥ 3; if so, outputting the sorting sequence number of the word currently tested and ending the test; if not, judging whether the test value is greater than or equal to 3; if so, resetting the test value to 0, retrieving the word whose sorting sequence number is that of the tested word plus 300, and returning to step C3; otherwise, retrieving the word whose sorting sequence number is that of the tested word plus 10, and returning to step C3;
step C5, judging whether the test count has reached the threshold α, where α ≥ 3; if so, outputting the sorting sequence number of the word currently tested and ending the test; if not, judging whether the test value is less than or equal to -3; if so, resetting the test value to 0, retrieving the word whose sorting sequence number is that of the tested word minus 300, and returning to step C3; otherwise, retrieving the word whose sorting sequence number is that of the tested word minus 10, and returning to step C3.
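Steps C1 to C5 describe an adaptive test that walks up or down the frequency ranking. The sketch below is one possible reading, not the patented implementation: the `ask` callback (which shows the question for the word at a given sequence number and returns whether the answer was correct), the default `alpha=20`, the `max_rank` bound, and the clamping of the sequence number to the database range are all assumptions.

```python
import random


def run_level_test(ask, max_rank, alpha=20, start_range=(2000, 2500), rng=None):
    """Return the sorting sequence number of the last word tested."""
    if alpha < 3:
        raise ValueError("alpha must be at least 3")
    rng = rng or random.Random()
    # Step C2: start from any word ranked between 2000 and 2500.
    rank = rng.randint(*start_range)
    count = 0   # number of questions asked so far
    value = 0   # running test value (+1 per correct, -1 per wrong answer)
    while True:
        # Step C3: ask one question and update the counters.
        correct = ask(rank)
        count += 1
        value += 1 if correct else -1
        # Steps C4/C5: stop once the test count reaches alpha.
        if count >= alpha:
            return rank
        if correct:
            # Step C4: a test value of 3 or more triggers a big jump up.
            if value >= 3:
                value, step = 0, 300
            else:
                step = 10
        else:
            # Step C5: a test value of -3 or less triggers a big jump down.
            if value <= -3:
                value, step = 0, -300
            else:
                step = -10
        # Keep the sequence number inside the database (assumption).
        rank = min(max(rank + step, 1), max_rank)
```

A simulated learner who knows every word up to sequence number 3000 can be modelled with `ask=lambda rank: rank <= 3000`, which makes the walk deterministic once the starting word is fixed.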
In the above language learning system based on word use frequency, the user side retrieves the corresponding learning resources according to the difficulty information for the user to study through the following steps:
step a, acquiring the difficulty marks [A, B] of all learning resources the user has learned so far;
step b, screening all the acquired difficulty marks [A, B] to find the largest value of B, denoted Bmax;
step c, acquiring all the words in all the learning resources the user has learned so far and deduplicating them;
step d, obtaining all the words in the word frequency database whose sorting sequence number is less than Bmax;
step e, comparing the deduplicated words from step c with the words obtained in step d, and screening out those words from step d that have not been learned;
step f, arranging the unlearned words screened out in step e from small to large by sorting sequence number;
step g, retrieving the word ranked first in step f;
step h, searching the learning resource database for learning resources that contain the retrieved word and whose difficulty mark [A, B] has B less than Bmax;
step i, pushing the learning resources found in step h to the user.
In the above language learning system based on word use frequency, in step h the learning resource database is first searched for learning resources that contain the retrieved word and whose difficulty mark [A, B] has B less than Bmax;
if no learning resource meeting this condition is found, the learning resource database is searched for learning resources that contain the retrieved word and whose difficulty mark [A, B] has B less than Bmax + 300.
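Steps a to i, together with the Bmax + 300 fallback, can be sketched as follows. The `Resource` record, the `recommend` function and the in-memory `catalog` and `rank_of` structures are hypothetical stand-ins for the learning resource database and the word frequency database; returning an empty list when nothing matches is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    rid: str            # resource identifier (hypothetical)
    words: frozenset    # deduplicated words of the resource
    label: tuple        # difficulty mark (A, B)


def recommend(learned, catalog, rank_of):
    """Return (target_word, matching_resources) for the next lesson."""
    if not learned:
        return None, []
    # Steps a-b: largest B among the resources already learned.
    b_max = max(r.label[1] for r in learned)
    # Step c: all distinct words the user has already met.
    learned_words = set().union(*(r.words for r in learned))
    # Steps d-e: database words ranked below b_max that are still unlearned.
    gaps = [w for w, rank in rank_of.items()
            if rank < b_max and w not in learned_words]
    if not gaps:
        return None, []
    # Steps f-g: the most frequent unlearned word comes first.
    target = min(gaps, key=rank_of.get)
    # Step h with fallback: try B < b_max, then B < b_max + 300.
    for limit in (b_max, b_max + 300):
        hits = [r for r in catalog
                if target in r.words and r.label[1] < limit]
        if hits:
            return target, hits  # Step i: these are pushed to the user.
    return target, []
```

The effect is to fill gaps: the user is steered toward the most frequent word still missing from what has been studied, using material no harder than what has already been handled.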
Alternatively, in the above language learning system based on word use frequency, the user side retrieves the corresponding learning resources according to the difficulty information for the user to study through the following steps:
step 1), acquiring the difficulty marks [A, B] of all learning resources the user has learned so far;
step 2), screening all the acquired difficulty marks [A, B] to find the largest value of B, denoted Bmax;
step 3), acquiring all the words in all the learning resources the user has learned so far and deduplicating them;
step 4), screening the words obtained in step 3) to find the words whose learning frequency T is less than Ty, where Ty is a threshold;
step 5), arranging the words screened out in step 4) from small to large by sorting sequence number;
step 6), retrieving the word ranked first in step 5);
step 7), searching the learning resource database for learning resources that contain the retrieved word and whose difficulty mark [A, B] has B less than Bmax;
step 8), pushing the learning resources found in step 7) to the user.
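This second retrieval scheme favours words the user has met but not yet studied often enough. A minimal sketch follows, assuming that `study_counts` maps each learned word to its learning frequency T, that `catalog` is a list of `(resource_id, words, (A, B))` tuples, and that words missing from `rank_of` are skipped; all of these names and structures are hypothetical.

```python
def recommend_for_review(learned_labels, study_counts, catalog, rank_of, t_y=20):
    """Return (target_word, resource_ids) for under-practised vocabulary."""
    if not learned_labels:
        return None, []
    # Steps 1)-2): largest B among the marks of resources already learned.
    b_max = max(b for _, b in learned_labels)
    # Steps 3)-4): learned words whose learning frequency T is below t_y.
    weak = [w for w, t in study_counts.items() if t < t_y and w in rank_of]
    if not weak:
        return None, []
    # Steps 5)-6): pick the most frequent of those words.
    target = min(weak, key=rank_of.get)
    # Step 7): resources containing the word with B below b_max.
    hits = [rid for rid, words, (_, b) in catalog
            if target in words and b < b_max]
    # Step 8): these resources are pushed to the user.
    return target, hits
```

Compared with the gap-filling scheme of steps a to i, this one revisits vocabulary already encountered, so the two could plausibly be combined in one client.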
Compared with the prior art, the invention has the following advantages: by marking learning resources with difficulty marks based on word use frequency, the invention lets a user who studies in order of increasing difficulty learn, to a certain extent, high-frequency words first and low-frequency words afterwards, which reduces the learning difficulty, stimulates interest in learning and enables immersive English learning, thereby improving students' learning efficiency.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
Fig. 1 is a schematic block diagram of the present invention.
Detailed Description
As shown in Fig. 1, a language learning system based on word use frequency includes a server and a user side. A word frequency database, a learning resource database and a learning resource difficulty marking unit are arranged in the server, and the learning resources in the learning resource database are learning resources with text content, such as teaching videos, PPT lectures, teaching books and periodicals;
the word frequency database stores words of multiple languages; all the words of each language are sorted from high to low by their use frequency values in daily life, sorting sequence numbers are assigned from small to large, and each word is marked with its sorting sequence number;
the use frequency value of a word in daily life is a statistic of how often the word is actually used, obtained by big data analysis of words extracted from a large volume of content in the language, such as different scenes, different conversations, movies and courseware;
the learning resource difficulty marking unit is used for marking each learning resource entered into the learning resource database with difficulty information based on the frequency values of its words;
and the user side is used for retrieving the corresponding learning resources according to the difficulty information for the user to study.
In this embodiment, the learning resource difficulty marking unit marks the difficulty of each learning resource entered into the learning resource database, based on the frequency values of its words, through the following steps:
step 1, acquiring the text data of the learning resource and performing word segmentation on the text data;
step 2, deduplicating the words obtained by segmentation;
step 3, querying the word frequency database for the marked sorting sequence number of each deduplicated word;
step 4, arranging the sequence numbers found from small to large to obtain a set Q;
step 5, from the last 20-50% of the sorting sequence numbers in the set Q, taking the sequence number B with the largest value and the sequence number A with the smallest value;
step 6, marking the difficulty of the learning resource as [A, B].
In another embodiment of the present invention, the learning resource difficulty marking unit marks the difficulty of each learning resource entered into the learning resource database, based on the frequency values of its words, through the following steps:
step 1, acquiring the text data of the learning resource and performing word segmentation on the text data;
step 2, deduplicating the words obtained by segmentation;
step 3, querying the word frequency database for the marked sorting sequence number of each deduplicated word;
step 4, arranging the sequence numbers found from small to large to obtain a set Q;
step 5, taking the last 20-50% of the sorting sequence numbers in the set Q and deleting from them the sequence numbers whose value exceeds 10000-12000;
step 6, from the sequence numbers remaining after deletion, taking the sequence number B with the largest value and the sequence number A with the smallest value;
step 7, marking the difficulty of the learning resource as [A, B].
In another embodiment of the present invention, the learning resource difficulty marking unit marks the difficulty of each learning resource entered into the learning resource database, based on the frequency values of its words, through the following steps:
step 1, acquiring the text data of the learning resource and performing word segmentation on the text data;
step 2, deduplicating the words obtained by segmentation;
step 3, querying the word frequency database for the marked sorting sequence number of each deduplicated word;
step 4, arranging the sequence numbers found from small to large to obtain a set Q;
step 5, taking the last 20-50% of the sorting sequence numbers in the set Q and counting how many of them have a value exceeding 10000-12000, the count being P; judging whether P exceeds 2% of the total number of sorting sequence numbers in the set Q; if so, going to step 6, and if not, going to step 7;
step 6, from the last 20-50% of the sorting sequence numbers in the set Q, taking the sequence number B with the largest value and the sequence number A with the smallest value, and marking the difficulty of the learning resource as [A, B];
step 7, taking the last 20-50% of the sorting sequence numbers in the set Q and deleting from them the sequence numbers whose value exceeds 10000-12000; from the sequence numbers remaining after deletion, taking the sequence number B with the largest value and the sequence number A with the smallest value, and marking the difficulty of the learning resource as [A, B].
In this embodiment, a word dictionary module is arranged in the user side;
the word dictionary module comprises a word playing unit and a word learning video playing unit;
a forward switching button for switching to the word learning video playing unit is arranged on the word playing unit;
when the forward switching button is used to switch from the word playing unit to the word learning video playing unit, the word learning video playing unit retrieves and plays the learning video of the word that the word playing unit was playing before the switch;
a reverse switching button for switching to the word playing unit is arranged on the word learning video playing unit;
when the reverse switching button is used to switch from the word learning video playing unit to the word playing unit, the word playing unit plays the words contained in the video that was playing before the switch, in ascending order of their marked sorting sequence numbers in the word frequency database.
It should be noted that a learning video repository storing a learning video for each word is also provided in the server; when the forward switching button is used to switch from the word playing unit to the word learning video playing unit, the word learning video playing unit retrieves from this repository, and plays, the learning video of the word that the word playing unit was playing before the switch.
In this embodiment, a word consolidation module is arranged in the user side;
the word consolidation module is used for presenting the words needing to be consolidated to the user;
the word consolidation module presents the words needing consolidation to the user through the following steps:
acquiring information on all words learned by the user within T time units before the current time point;
querying the word frequency database for the marked sorting sequence numbers of all the learned words;
calculating a difficulty mark value Q2 by the formula given in Figure BDA0002450619200000101 (the formula image is not reproduced in this text), where Xi is the marked sorting sequence number found in the word frequency database for the i-th word and n is the number of all words learned by the user within the T time units;
the words whose marked sorting sequence numbers in the word frequency database fall within the range Q2 ± K are the words needing to be consolidated, where 100 ≤ K ≤ 300.
In this embodiment, a language level testing module is arranged in the user side;
the language level testing module is used for testing the language level of the user;
the language level testing module tests the language level of a user through the following steps:
step C1, acquiring information on the language to be tested;
step C2, retrieving any word of that language whose sorting sequence number in the word frequency database is between 2000 and 2500;
step C3, pushing the preset test question for the word to the user, obtaining the test result, and increasing the test count by 1; if the result is correct, adding 1 to the test value and going to step C4; if the result is wrong, subtracting 1 from the test value and going to step C5;
step C4, judging whether the test count has reached a threshold α, where α ≥ 3; if so, outputting the sorting sequence number of the word currently tested and ending the test; if not, judging whether the test value is greater than or equal to 3; if so, resetting the test value to 0, retrieving the word whose sorting sequence number is that of the tested word plus 300, and returning to step C3; otherwise, retrieving the word whose sorting sequence number is that of the tested word plus 10, and returning to step C3;
step C5, judging whether the test count has reached the threshold α, where α ≥ 3; if so, outputting the sorting sequence number of the word currently tested and ending the test; if not, judging whether the test value is less than or equal to -3; if so, resetting the test value to 0, retrieving the word whose sorting sequence number is that of the tested word minus 300, and returning to step C3; otherwise, retrieving the word whose sorting sequence number is that of the tested word minus 10, and returning to step C3.
In this embodiment, the user side calls the corresponding learning resources according to the difficulty information for the user to learn, which includes the following steps:
step a, acquiring the difficulty marks [A, B] of all learning resources that the user has learned so far;
step b, screening all acquired difficulty marks [A, B] to select the B with the maximum value, Bmax;
step c, acquiring all words in all learning resources that the user has learned so far, and deduplicating them;
step d, obtaining from the word frequency database all words whose sorting serial number is less than Bmax;
step e, comparing the deduplicated words obtained in step c with the words obtained in step d, and screening out those words obtained in step d that have not been learned;
step f, arranging the unlearned words screened out in step e in ascending order of sorting serial number;
step g, calling the word arranged first in step f;
step h, searching the learning resource database for learning resources that contain the called word and whose difficulty mark [A, B] satisfies B < Bmax;
step i, pushing the learning resources found in step h to the user.
In this embodiment, in step h, the learning resource database is first searched for learning resources that contain the called word and whose difficulty mark [A, B] satisfies B < Bmax.
If no learning resource meeting this condition is found, the learning resource database is searched for learning resources that contain the called word and whose difficulty mark [A, B] satisfies B < Bmax + 300.
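Steps a to i, together with the fallback search with the relaxed bound Bmax + 300, can be sketched as follows. The `Resource` class, the `recommend` function, and the representation of the word frequency database as a dictionary from word to sorting serial number are illustrative assumptions, not structures described in the text.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Resource:
    """Hypothetical learning resource: its words and difficulty mark [A, B]."""
    name: str
    words: FrozenSet[str]
    difficulty: Tuple[int, int]


def recommend(
    learned: List[Resource],
    library: List[Resource],
    rank_of: Dict[str, int],
    fallback_margin: int = 300,
) -> List[Resource]:
    if not learned:
        return []
    # Steps a and b: largest B among the resources learned so far.
    b_max = max(r.difficulty[1] for r in learned)
    # Step c: deduplicated set of all words already learned.
    known = set().union(*(r.words for r in learned))
    # Steps d to f: unlearned words ranked below b_max, most frequent first.
    unlearned = sorted(
        (w for w, rank in rank_of.items() if rank < b_max and w not in known),
        key=rank_of.__getitem__,
    )
    if not unlearned:
        return []
    # Step g: the first word in that ordering.
    target = unlearned[0]
    # Step h: try B < b_max first, then the relaxed bound b_max + 300.
    for limit in (b_max, b_max + fallback_margin):
        hits = [r for r in library
                if target in r.words and r.difficulty[1] < limit]
        if hits:
            return hits  # Step i: these resources are pushed to the user.
    return []
```

The sketch returns an empty list when neither bound yields a resource; the text does not say what happens in that case.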
In another embodiment of the present invention, the user side calls the corresponding learning resources according to the difficulty information for the user to learn, which includes the following steps:
step 1), acquiring the difficulty marks [A, B] of all learning resources that the user has learned so far;
step 2), screening all acquired difficulty marks [A, B] to select the B with the maximum value, Bmax;
step 3), acquiring all words in all learning resources that the user has learned so far, and deduplicating them;
step 4), screening the words obtained in step 3) to select the words whose number of times learned, T, is less than Ty, where Ty is a threshold value; preferably, Ty = 20;
step 5), arranging the words screened out in step 4) in ascending order of sorting serial number;
step 6), calling the word arranged first in step 5);
step 7), searching the learning resource database for learning resources that contain the called word and whose difficulty mark [A, B] satisfies B < Bmax;
step 8), pushing the learning resources found in step 7) to the user.
It should be noted that requiring B < Bmax ensures that the learning resources pushed to the user are not too difficult and suit the user's current learning ability.
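Steps 1) to 8) differ from the previous embodiment in that the target word is chosen among words already encountered but learned fewer than Ty times. A minimal sketch, assuming a hypothetical `Resource` tuple, a `times_learned` dictionary that records how often each word has been learned, and a dictionary `rank_of` standing in for the word frequency database:

```python
from typing import Dict, FrozenSet, List, NamedTuple, Tuple


class Resource(NamedTuple):
    """Hypothetical learning resource: its words and difficulty mark [A, B]."""
    name: str
    words: FrozenSet[str]
    difficulty: Tuple[int, int]


def recommend_by_learning_count(
    learned: List[Resource],
    library: List[Resource],
    rank_of: Dict[str, int],
    times_learned: Dict[str, int],
    t_y: int = 20,  # preferred threshold value given in the text
) -> List[Resource]:
    if not learned:
        return []
    # Steps 1) and 2): largest B among the resources learned so far.
    b_max = max(r.difficulty[1] for r in learned)
    # Step 3): deduplicated set of all words in those resources.
    known = set().union(*(r.words for r in learned))
    # Step 4): keep words learned fewer than t_y times.
    # Assumption: words missing from the frequency database are skipped.
    weak = [w for w in known
            if w in rank_of and times_learned.get(w, 0) < t_y]
    if not weak:
        return []
    # Steps 5) and 6): the most frequent of those words.
    weak.sort(key=rank_of.__getitem__)
    target = weak[0]
    # Steps 7) and 8): resources containing the word with B < b_max.
    return [r for r in library
            if target in r.words and r.difficulty[1] < b_max]
```

Whether resources the user has already learned should be filtered out of `library` beforehand is not stated in the text, so the sketch leaves that choice to the caller.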
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and all simple modifications, changes and equivalent structural changes made to the above embodiment according to the technical spirit of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims (10)

1. A language learning system based on word usage frequency, comprising: the system comprises a server and a user side, wherein a word frequency database, a learning resource database and a learning resource difficulty marking unit are arranged in the server, and learning resources in the learning resource database are learning resources with text contents;
the word frequency database stores words of multiple languages, all the words in each language are sorted from high to low according to the use frequency values of the words in life, the sorting sequence number is from small to large, and the sorting sequence number of each word is marked;
the learning difficulty marking unit is used for marking difficulty information of the learning resources in each input learning resource database based on the frequency value of the words;
and the user side is used for calling corresponding learning resources according to the difficulty information for the user to learn.
2. A language learning system based on word usage frequency according to claim 1, characterized in that: when the learning difficulty marking unit marks the difficulty of the learning resources in each input learning resource database based on the frequency value of the words, the method comprises the following steps:
step 1, acquiring text data of learning resources, and performing word segmentation processing on the text data;
step 2, carrying out duplicate removal treatment on the words obtained by segmentation;
step 3, inquiring the marked sequencing serial number of the deduplicated words in a word frequency database;
step 4, arranging the searched sequencing serial numbers from small to large to obtain a set Q;
step 5, taking the sequencing serial number B with the largest value and the sequencing serial number A with the smallest value from the last 20-50% of the sequencing serial numbers in the set Q;
and 6, marking the difficulty of the learning resources as [ A, B ].
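The marking procedure of claim 2 can be sketched as follows. The regular-expression word segmentation (suitable only for English-like text), the function name `mark_difficulty`, the default tail fraction of 0.3, and the decision to skip words absent from the word frequency database are assumptions made for illustration.

```python
import math
import re
from typing import Dict, Tuple


def mark_difficulty(
    text: str,
    rank_of: Dict[str, int],
    tail_fraction: float = 0.3,  # assumed default within the 20-50% range
) -> Tuple[int, int]:
    """Return the difficulty mark (A, B) of a text."""
    if not 0.2 <= tail_fraction <= 0.5:
        raise ValueError("tail_fraction must lie between 0.2 and 0.5")
    # Steps 1 and 2: segment the text into words and deduplicate them.
    words = set(re.findall(r"[a-z']+", text.lower()))
    # Steps 3 and 4: look up the serial numbers and sort them ascending (set Q).
    q = sorted(rank_of[w] for w in words if w in rank_of)
    if not q:
        raise ValueError("no word of the text is in the frequency database")
    # Step 5: the last 20-50% of Q, i.e. the rarest words of the text.
    tail_len = max(1, math.ceil(len(q) * tail_fraction))
    tail = q[-tail_len:]
    # Step 6: A is the smallest and B the largest serial number in that tail.
    return tail[0], tail[-1]
```

Because Q is sorted, A and B are simply the first and last elements of the tail.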
3. A language learning system based on word usage frequency according to claim 1, characterized in that: when the learning difficulty marking unit marks the difficulty of the learning resources in each input learning resource database based on the frequency value of the words, the method comprises the following steps:
step 1, acquiring text data of learning resources, and performing word segmentation processing on the text data;
step 2, carrying out duplicate removal treatment on the words obtained by segmentation;
step 3, inquiring the marked sequencing serial number of the deduplicated words in a word frequency database;
step 4, arranging the searched sequencing serial numbers from small to large to obtain a set Q;
step 5, taking the last 20-50% of the sorting serial numbers in the set Q, and deleting the sorting serial numbers with the numerical value of more than 10000-12000 in the extracted sorting serial numbers;
step 6, extracting the sequencing serial number B with the largest value and the sequencing serial number A with the smallest value from the sequencing serial numbers left after deletion;
and 7, marking the difficulty of the learning resources as [ A, B ].
4. A language learning system based on word usage frequency according to claim 1, characterized in that: when the learning difficulty marking unit marks the difficulty of the learning resources in each input learning resource database based on the frequency value of the words, the method comprises the following steps:
step 1, acquiring text data of learning resources, and performing word segmentation processing on the text data;
step 2, carrying out duplicate removal treatment on the words obtained by segmentation;
step 3, inquiring the marked sequencing serial number of the deduplicated words in a word frequency database;
step 4, arranging the searched sequencing serial numbers from small to large to obtain a set Q;
step 5, taking the last 20-50% of the sorting serial numbers in the set Q, counting the number of the sorting serial numbers with the value of more than 10000-12000 in the extracted sorting serial numbers, wherein the counting result is P, judging whether P exceeds 2% of the total number of the sorting serial numbers in the set Q, if so, entering a step 6, and if not, entering a step 7;
step 6, taking the sequencing serial number B with the largest value and the sequencing serial number A with the smallest value from the last 20-50% of the sequencing serial numbers in the set Q; the difficulty of learning resources is marked as [ A, B ];
step 7, taking the last 20-50% of the sorting serial numbers in the set Q, and deleting the sorting serial numbers with the numerical value of more than 10000-12000 in the extracted sorting serial numbers; extracting a sequencing serial number B with the largest value and a sequencing serial number A with the smallest value from the sequencing serial numbers left after deletion; the difficulty of learning resources is labeled [ A, B ].
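The branching of claim 4 can be sketched as follows: very rare words are treated as outliers and dropped from the tail, unless they make up more than 2% of the set Q, in which case they are kept. The function name, the default tail fraction of 0.5, and the cutoff of 10000 (one value from the stated 10000 to 12000 range) are assumptions. The sketch takes the sorting serial numbers of the deduplicated words as input.

```python
import math
from typing import Iterable, Tuple


def mark_difficulty_filtered(
    serial_numbers: Iterable[int],
    tail_fraction: float = 0.5,   # assumed value within the 20-50% range
    rare_cutoff: int = 10000,     # assumed value within 10000-12000
    rare_share: float = 0.02,     # the 2% limit from step 5
) -> Tuple[int, int]:
    """Return the difficulty mark (A, B) following steps 4 to 7 of claim 4."""
    q = sorted(serial_numbers)            # step 4: set Q in ascending order
    if not q:
        raise ValueError("set Q is empty")
    tail_len = max(1, math.ceil(len(q) * tail_fraction))
    tail = q[-tail_len:]                  # step 5: last 20-50% of Q
    rare_count = sum(1 for r in tail if r > rare_cutoff)   # P in step 5
    if rare_count > rare_share * len(q):
        kept = tail                       # step 6: rare words are kept
    else:
        kept = [r for r in tail if r <= rare_cutoff]       # step 7: drop them
    return kept[0], kept[-1]
```

Claim 3 corresponds to always taking the filtering branch of step 7.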
5. A language learning system based on word usage frequency according to claim 1, characterized in that: a word dictionary module is arranged in the user side;
the word dictionary module comprises a word playing unit and a word learning video playing unit;
a forward switching button for switching to the word learning video playing unit is arranged on the word playing unit;
when the word playing unit is switched to the word learning video playing unit through the forward switching button, the word learning video playing unit calls and plays the learning video of the word that was being played by the word playing unit before the switch;
a reverse switching button for switching to the word playing unit is arranged on the word learning video playing unit;
when the word learning video playing unit is switched to the word playing unit through the reverse switching button, the word playing unit plays the words contained in the video that was being played before the switch, in ascending order of their sorting serial numbers in the word frequency database.
6. A language learning system based on word usage frequency according to claim 1, characterized in that: a word consolidation module is arranged in the user side;
the word consolidation module is used for presenting the words needing to be consolidated to the user;
the word consolidation module comprises the following steps when presenting the words needing consolidation to a user:
acquiring all word information learned by a user within T time units from the current time point;
inquiring the marked sequencing serial numbers of all the learned words in a word frequency database;
calculating a difficulty mark value Q2 [formula given as image FDA0002450619190000031 in the original; not recoverable from the text],
where Xi is the sorting serial number found for the i-th word in the word frequency database, and N is the number of all words learned by the user within the T time units;
the words whose sorting serial numbers in the word frequency database lie within the range Q2 ± K are the words needing to be consolidated, where 100 ≤ K ≤ 300.
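The final selection step of claim 6 can be sketched as follows. The formula for the difficulty mark value Q2 is not legible in this text, so the sketch takes Q2 as an input rather than guessing its definition. The function name, the default K of 200, and the dictionary `rank_of` standing in for the word frequency database are assumptions.

```python
from typing import Dict, List


def words_to_consolidate(
    rank_of: Dict[str, int],
    q2: float,
    k: int = 200,  # assumed default; the claim requires 100 <= K <= 300
) -> List[str]:
    """Return the words whose serial number lies within q2 - k .. q2 + k."""
    if not 100 <= k <= 300:
        raise ValueError("K must lie between 100 and 300")
    hits = [w for w, rank in rank_of.items() if q2 - k <= rank <= q2 + k]
    # Present the most frequent words first.
    return sorted(hits, key=rank_of.__getitem__)
```

The bounds are treated as inclusive here; the claim does not say whether the endpoints of the range belong to it.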
7. A language learning system based on word usage frequency according to claim 1, characterized in that: a language level testing module is arranged in the user side;
the language level testing module is used for testing the language level of the user;
the language level testing module comprises the following steps when testing the language level of a user:
step C1, acquiring language information needed to be used for testing;
step C2, calling any word of that language whose sorting serial number in the word frequency database is between 2000 and 2500;
step C3, pushing a preset test question for the word to the user, obtaining the test result, and incrementing the number of tests by 1; if the test result is correct, adding 1 to the test value and proceeding to step C4; if the test result is wrong, subtracting 1 from the test value and proceeding to step C5;
step C4, judging whether the number of tests has reached a threshold value α, where α ≥ 3; if so, outputting the sorting serial number of the currently tested word and ending the test; if not, judging whether the test value is not less than 3; if so, resetting the test value to 0, calling the word whose sorting serial number equals that of the tested word plus 300, and returning to step C3; otherwise, calling the word whose sorting serial number equals that of the tested word plus 10, and returning to step C3;
step C5, judging whether the number of tests has reached the threshold value α, where α ≥ 3; if so, outputting the sorting serial number of the currently tested word and ending the test; if not, judging whether the test value is less than or equal to -3; if so, resetting the test value to 0, calling the word whose sorting serial number equals that of the tested word minus 300, and returning to step C3; otherwise, calling the word whose sorting serial number equals that of the tested word minus 10, and returning to step C3.
8. A language learning system based on word usage frequency according to claim 2 or 3 or 4 characterized in that: the user side calls corresponding learning resources according to the difficulty information for the user to learn, and the method comprises the following steps:
step a, acquiring the difficulty marks [A, B] of all learning resources that the user has learned so far;
step b, screening all acquired difficulty marks [A, B] to select the B with the maximum value, Bmax;
step c, acquiring all words in all learning resources that the user has learned so far, and deduplicating them;
step d, obtaining from the word frequency database all words whose sorting serial number is less than Bmax;
step e, comparing the deduplicated words obtained in step c with the words obtained in step d, and screening out those words obtained in step d that have not been learned;
step f, arranging the unlearned words screened out in step e in ascending order of sorting serial number;
step g, calling the word arranged first in step f;
step h, searching the learning resource database for learning resources that contain the called word and whose difficulty mark [A, B] satisfies B < Bmax;
step i, pushing the learning resources found in step h to the user.
9. A language learning system based on word usage frequency according to claim 8, wherein: in step h, the learning resource database is first searched for learning resources that contain the called word and whose difficulty mark [A, B] satisfies B < Bmax;
if no learning resource meeting this condition is found, the learning resource database is searched for learning resources that contain the called word and whose difficulty mark [A, B] satisfies B < Bmax + 300.
10. A language learning system based on word usage frequency according to claim 1, characterized in that: the user side calls corresponding learning resources according to the difficulty information for the user to learn, and the method comprises the following steps:
step 1), acquiring the difficulty marks [A, B] of all learning resources that the user has learned so far;
step 2), screening all acquired difficulty marks [A, B] to select the B with the maximum value, Bmax;
step 3), acquiring all words in all learning resources that the user has learned so far, and deduplicating them;
step 4), screening the words obtained in step 3) to select the words whose number of times learned, T, is less than Ty, where Ty is a threshold value;
step 5), arranging the words screened out in step 4) in ascending order of sorting serial number;
step 6), calling the word arranged first in step 5);
step 7), searching the learning resource database for learning resources that contain the called word and whose difficulty mark [A, B] satisfies B < Bmax;
step 8), pushing the learning resources found in step 7) to the user.
CN202010291647.XA 2020-04-14 2020-04-14 Language learning system based on word use frequency Active CN111508289B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010291647.XA CN111508289B (en) 2020-04-14 2020-04-14 Language learning system based on word use frequency

Publications (2)

Publication Number Publication Date
CN111508289A true CN111508289A (en) 2020-08-07
CN111508289B CN111508289B (en) 2021-10-08

Family

ID=71876003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010291647.XA Active CN111508289B (en) 2020-04-14 2020-04-14 Language learning system based on word use frequency

Country Status (1)

Country Link
CN (1) CN111508289B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991122A (en) * 2021-05-10 2021-06-18 北京世纪好未来教育科技有限公司 Planning method and device for Chinese character teaching

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090197225A1 (en) * 2008-01-31 2009-08-06 Kathleen Marie Sheehan Reading level assessment method, system, and computer program product for high-stakes testing applications
CN101814066A (en) * 2009-02-23 2010-08-25 富士通株式会社 Text reading difficulty judging device and method thereof
CN101877181A (en) * 2009-04-28 2010-11-03 夏普株式会社 Be used for generating automatically the apparatus and method of personalized learning and diagnostic exercises
CN104392640A (en) * 2014-11-07 2015-03-04 曾立人 Computer assisted foreign language corpus providing method and system
US20170316708A1 (en) * 2016-04-29 2017-11-02 Rovi Guides, Inc. Systems and methods for providing word definitions based on user exposure
US20180061274A1 (en) * 2016-08-27 2018-03-01 Gereon Frahling Systems and methods for generating and delivering training scenarios
CN107943993A (en) * 2017-12-04 2018-04-20 西北民族大学 A kind of method for learning Chinese and system based on complex network

Also Published As

Publication number Publication date
CN111508289B (en) 2021-10-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant