CN110334215A - Construction method, device, electronic equipment and the storage medium of study of words frame - Google Patents
- Publication number
- CN110334215A (application number CN201910620581.1A)
- Authority
- CN
- China
- Prior art keywords
- word
- entry
- family
- similarity
- etymology
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/358—Browsing; Visualisation therefor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The present invention provides a construction method and device for a vocabulary learning framework, an electronic device, and a storage medium. Entries and their entry word frequencies are obtained; the entries are then combined into word families, and a word family word frequency is calculated for each word family. Finally, on the basis of the word families, the word families are divided by etymology into clusters that share a root and are displayed in expanded form. The method applies not only to frequency-graded vocabularies mined statistically from a corpus but also to the static word lists given by various syllabuses. With this technique the vocabulary is no longer structured as isolated points or a simple linear list, the resulting memory is deeper than that obtained by memorizing single entries, and the efficiency and effect of vocabulary learning are effectively improved.
Description
Technical field
The present invention relates to the field of education, and in particular to a construction method and device for a vocabulary learning framework, an electronic device, and a computer-readable storage medium.
Background art
With the development of society, foreign languages are becoming increasingly important in our life and work, and more and more people are learning a language such as English. Learning a language generally requires memorizing a large number of words. Rote memorization of words is very tedious, and because many words resemble one another, learners also find them hard to remember.
Most existing vocabulary learning approaches simply recommend the words to be learned and leave the learner to memorize them unaided. Such an approach lacks a systematic learning framework, so learning is inefficient and the words are not deeply memorized.
Summary of the invention
In view of this, embodiments of the present invention are directed to providing a construction method for a vocabulary learning framework in which entries are divided into multiple clusters according to etymology and displayed in expanded form by cluster, so as to enhance memorization.
According to one aspect of the present invention, an embodiment of the present invention provides a construction method for a vocabulary learning framework, comprising: obtaining the content of current vocabulary learning, the content of current vocabulary learning comprising multiple entries; dividing the multiple entries into multiple clusters according to a feature of the multiple entries; and displaying the multiple entries in expanded form according to the multiple clusters.
In one embodiment, dividing the multiple entries into multiple clusters according to a feature of the multiple entries comprises: dividing the multiple entries into multiple clusters according to the etymology of the multiple entries.
In one embodiment, obtaining the content of current vocabulary learning comprises: obtaining the entry word frequencies of the multiple entries from a corpus; combining entries whose form similarity is greater than a preset similarity threshold into a word family, wherein the word family word frequency of the word family is obtained from the entry word frequencies of all entries in the word family; and selecting some or all of the top-ranked word families by word family word frequency, together with their entries, as the content of current vocabulary learning.
In one embodiment, after combining entries whose form similarity is greater than the preset similarity threshold into a word family, the method further comprises: selecting a centre word of the word family, wherein the centre word represents the word family.
In one embodiment, selecting the centre word of the word family comprises: selecting the entry with the highest entry word frequency in the word family as the centre word.
In one embodiment, selecting the entry with the highest entry word frequency in the word family as the centre word comprises: determining whether there is any other entry whose entry word frequency differs from the highest entry word frequency by less than a preset difference; and, when such an entry exists, selecting as the centre word whichever of the entry with the highest entry word frequency and the other entries is a verb and/or is shorter than a preset length threshold.
In one embodiment, before selecting, according to the number of entries currently to be learned, the top-ranked word families by word family word frequency up to that number of entries, together with their entries, as the content of current vocabulary learning, the method further comprises: preprocessing the word families.
In one embodiment, preprocessing the word families comprises: deleting from a word family any entry whose entry word frequency is less than a first preset word frequency threshold.
In one embodiment, preprocessing the word families comprises: calculating the similarity between an entry in the current word family and the centre word of that word family; determining whether the similarity is less than a first preset similarity threshold; and, when the similarity is less than the first preset similarity threshold, moving the corresponding entry to another word family.
In one embodiment, the similarity comprises a phonetic similarity and/or a first semantic similarity and/or an etymological similarity.
In one embodiment, obtaining the word family word frequency from the entry word frequencies of all entries in the word family comprises: taking the word family word frequency to be the sum of the entry word frequencies of all entries in the word family.
In one embodiment, the corpus comprises one or a combination of the following corpora: the Corpus of Contemporary American English, the British National Corpus, and a Chinese word frequency corpus.
In one embodiment, dividing the multiple entries into multiple clusters according to the etymology of the multiple entries comprises: looking up the etymology of the centre word of each word family; and dividing the word families whose centre words share the same etymology into the same class.
In one embodiment, the etymology comprises the Indo-European roots of the American Heritage Dictionary.
In one embodiment, after dividing the multiple entries into multiple clusters according to the etymology of the multiple entries, the method further comprises: reassigning the word families that meet a preset condition to classes.
In one embodiment, the preset condition comprises: the number of word families in the corresponding cluster being below a preset quantity threshold, and/or the similarity to the corresponding etymology being less than a second preset similarity threshold, and/or the word family not being assigned to any class.
In one embodiment, reassigning a word family to a class comprises: extracting the stem of the centre word of the current word family; calculating the similarity between the stem and every etymology; when exactly one similarity is greater than a third preset similarity threshold, assigning the current word family to the class of the etymology corresponding to that similarity; and, when multiple similarities are greater than the third preset similarity threshold, assigning the current word family to the class of whichever of the corresponding etymologies has the fewest word families.
In one embodiment, extracting the stem of the centre word of the current word family comprises: removing the prefix and suffix of the centre word, and/or splitting a compound word into its constituent entries.
In one embodiment, displaying the multiple entries in expanded form according to the multiple clusters comprises: displaying the multiple entries in expanded form as a chart.
In one embodiment, the chart comprises multi-level nodes, wherein the root node is an etymology.
In one embodiment, the multi-level nodes comprise: the centre word and the other entries in the corresponding word family.
In one embodiment, the multi-level nodes further comprise any one or a combination of the following nodes: etymology, phonetic feature, semantic feature, and prefix.
In one embodiment, the chart comprises a mind map.
According to another aspect of the present invention, an embodiment of the present invention provides a construction device for a vocabulary learning framework, comprising: an obtaining module, configured to obtain the content of current vocabulary learning, the content of current vocabulary learning comprising multiple entries; a clustering module, configured to divide the multiple entries into multiple clusters according to a feature of the multiple entries; and a display module, configured to display the multiple entries in expanded form according to the multiple clusters.
According to another aspect of the present invention, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, the computer program being used to execute any of the construction methods for a vocabulary learning framework described above.
According to another aspect of the present invention, an embodiment of the present invention provides an electronic device, comprising: a processor; and a memory for storing instructions executable by the processor; the processor being configured to execute any of the construction methods for a vocabulary learning framework described above.
In the construction method for a vocabulary learning framework provided by embodiments of the present invention, entries are divided into multiple clusters according to a feature and displayed in expanded form by cluster. This helps learners understand the entries they learn from the origin of those entries, produces deeper memory than memorizing single entries, and effectively improves the efficiency and effect of vocabulary learning.
Brief description of the drawings
Fig. 1 is a flow chart of a construction method for a vocabulary learning framework according to an embodiment of the present application.
Fig. 2 is a flow chart of a method for determining the current vocabulary learning content according to an embodiment of the present application.
Fig. 3 is a flow chart of a method for determining the current vocabulary learning content according to another embodiment of the present application.
Fig. 4 is a flow chart of a method for determining the current vocabulary learning content according to another embodiment of the present application.
Fig. 5 is a flow chart of a preprocessing method according to an embodiment of the present application.
Fig. 6 is a flow chart of a construction method for a vocabulary learning framework according to another embodiment of the present application.
Fig. 7 is a flow chart of a method for reassigning classes according to an embodiment of the present application.
Fig. 8 is a schematic structural diagram of a chart according to an embodiment of the present application.
Fig. 9 is a schematic structural diagram of a construction device for a vocabulary learning framework according to an embodiment of the present application.
Fig. 10 is a schematic structural diagram of a determining module according to an embodiment of the present application.
Fig. 11 is a schematic structural diagram of a determining module according to another embodiment of the present application.
Fig. 12 is a schematic structural diagram of a determining module according to another embodiment of the present application.
Fig. 13 is a schematic structural diagram of a preprocessing module according to an embodiment of the present application.
Fig. 14 is a schematic structural diagram of a construction device for a vocabulary learning framework according to another embodiment of the present application.
Fig. 15 is a schematic structural diagram of a reassignment module according to an embodiment of the present application.
Fig. 16 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
In addition, in the exemplary embodiments, identical reference signs denote identical components having identical structures or identical steps of identical methods. Accordingly, once one embodiment has been described by way of example, only the structures or methods that differ from the described embodiment are described in the other exemplary embodiments.
Throughout the specification and claims, when a component is described as being "connected" to another component, the component may be "directly connected" to the other component, or "electrically connected" to the other component through a third component. In addition, unless explicitly described to the contrary, the term "comprising" and its variations should be understood as implying the inclusion of the stated components but not the exclusion of any other components.
First, the terms used in the present application are explained. An entry, also called a headword, is a lexicographic term that refers to a listed word together with its annotations. An entry may be a word or a phrase. In a dictionary, the entry is the basic unit of composition and the unit in which the spelling, pronunciation, meaning, usage and so on of a word are annotated. A word family is a family or system made up of multiple cognates whose sound and meaning are related or similar. Etymology refers to the origin of an entry or word, and covers: tracing the development of its pronunciation, written form and meaning from the earliest recorded occurrence of the linguistic element in a language; tracing the process by which it passed from one language to another; analyzing it into its component parts; identifying its cognates in other languages; or tracing it and its cognates back to a common ancestral form in a recorded or hypothesized ancestor language.
Fig. 1 is a flow chart of a construction method for a vocabulary learning framework according to an embodiment of the present application. As shown in Fig. 1, the construction method for the vocabulary learning framework comprises:
Step 110: obtain the content of current vocabulary learning, the content of current vocabulary learning comprising multiple entries.
The content of current vocabulary learning may arise in an English learning scenario in which no specific vocabulary is provided and only a broad description of ability, or only a broad ability requirement, is given; it may be vocabulary learning content recommended for a specific learner according to that learner's vocabulary level; or it may arise in a single-level learning scenario with a given syllabus or a fixed word list.
Step 120: divide the multiple entries into multiple clusters according to a feature of the multiple entries.
In one embodiment, the multiple entries may be divided into multiple clusters according to their etymology. Every entry has a historical origin (its etymology). Tracing the etymology of a word family allows its entries to be understood from the root, which makes them easier for the learner to understand and memorize. The embodiments of the present application therefore cluster the entries by looking up their etymology: different etymologies serve as the basis for classification, and each entry is divided into a class accordingly. It should be understood that the embodiments of the present application may also use other features, such as word family, part of speech, meaning or usage scenario, to divide the entries into multiple clusters; the embodiments of the present application do not limit the feature. In one embodiment, when English vocabulary is learned, the etymology may comprise the Indo-European roots of the American Heritage Dictionary. The Indo-European roots mainly follow, but are not limited to, those of the American Heritage Dictionary. An Indo-European root is a root form of Proto-Indo-European, and Proto-Indo-European is a reconstructed language produced by the scientific study, using the methods of historical-comparative linguistics, of more than 400 living and ancient Indo-European languages. It reveals in depth the systematic correspondences among the four major sources that account for 90% of modern English vocabulary: Germanic, Greek, Latin and French.
Step 130: display the multiple entries in expanded form according to the multiple clusters.
Dividing the entries into multiple clusters according to etymology and displaying them in expanded form by cluster helps learners understand the entries they learn from the origin of those entries, produces deeper memory than memorizing single entries, and effectively improves the efficiency and effect of vocabulary learning.
Fig. 2 is a flow chart of a method for determining the current vocabulary learning content according to an embodiment of the present application. As shown in Fig. 2, step 110 may include the following sub-steps:
Step 111: obtain the entry word frequencies of the multiple entries from a corpus.
The entry word frequency is the frequency with which an entry occurs in the corpus and represents the probability of that entry occurring in the corpus. In reading, the general idea of an article can usually be understood without consulting a dictionary only when the density of unfamiliar words is below a certain value (for example 2%), and mastering the entries that occur most often does the most to help a reader grasp the general idea of an article. The entry word frequency of each entry in the word list to be learned is therefore obtained, and an entry that occurs frequently in the corpus should be learned first. The word list may be a given examination syllabus, or a word list containing all entries of the language being learned.
In one embodiment, the corpus may be any natural language corpus of the language being learned, for example one or a combination of the following corpora: the Corpus of Contemporary American English, the British National Corpus, a Chinese word frequency corpus, and so on. When English vocabulary is learned, the corpus may be one or a combination of the above corpora, or a corpus may of course be built according to actual needs. It should be understood that the embodiments of the present application may select different corpora according to the needs of the actual application, provided that the selected corpus reflects how often the entries to be learned occur; the present application does not limit the specific type of corpus.
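By way of illustration only, the entry word frequency of step 111 can be sketched in Python as follows. The sketch assumes that the corpus is available as plain text and that each entry is a single word; the function name `entry_frequencies` and the toy corpus are hypothetical and are not taken from the application.

```python
import re
from collections import Counter

def entry_frequencies(corpus_text, word_list):
    """Relative frequency of each listed entry in the corpus text (assumed measure)."""
    tokens = re.findall(r"[a-z]+", corpus_text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on an empty corpus
    return {word: counts[word.lower()] / total for word in word_list}

# Toy corpus for illustration; a real corpus would be far larger.
corpus = "To act is to take action. An active mind prefers action."
freqs = entry_frequencies(corpus, ["act", "action", "active", "activity"])
```

An entry that never occurs in the corpus simply receives a frequency of zero, so it sorts last when entries are ranked.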
Step 112: combine entries whose form similarity is greater than a preset similarity threshold into word families, wherein the word family word frequency of a word family is obtained from the entry word frequencies of all entries in the word family.
Entries are combined into word families according to the form similarity between them; for example, "act", "action", "active" and "activity" can be combined into one word family. Because the entries in a word family are generally the same or similar in meaning and similar in form, combining entries into word families helps the learner understand and memorize multiple entries together.
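The grouping in step 112 might be sketched as below. The application does not say how form similarity is measured, so this sketch assumes the ratio returned by Python's `difflib.SequenceMatcher` and a simple greedy pass that compares each entry with the first entry of each existing family; both choices, the function names and the threshold of 0.5 are illustrative assumptions.

```python
from difflib import SequenceMatcher

def form_similarity(a, b):
    """Form similarity in [0, 1]; SequenceMatcher is an assumed stand-in measure."""
    return SequenceMatcher(None, a, b).ratio()

def group_into_families(entries, threshold):
    """Greedily place each entry in the first family whose seed entry it resembles."""
    families = []
    for entry in entries:
        for family in families:
            if form_similarity(entry, family[0]) > threshold:
                family.append(entry)
                break
        else:
            families.append([entry])  # no family is similar enough: start a new one
    return families

families = group_into_families(["act", "action", "active", "activity", "metal"], 0.5)
```

Because the pass is greedy, the result depends on the order of the entries; a production system would probably use a proper clustering step instead.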
The word family word frequency of a word family may be obtained from the entry word frequencies of all the entries it contains. In one embodiment, the word family word frequency is the sum of the entry word frequencies of all entries in the word family. It should be understood that the embodiments of the present application may choose different ways of obtaining the word family word frequency according to the needs of the actual application, provided that the chosen way reflects how often the word family to be learned occurs; the present application does not limit the specific method of obtaining the word family word frequency.
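The summation embodiment can be written in a few lines. The function name `family_frequency` and the counts below are hypothetical values chosen for illustration.

```python
def family_frequency(family, entry_freq):
    """Word family word frequency as the sum of its entries' word frequencies."""
    return sum(entry_freq.get(entry, 0) for entry in family)

# Hypothetical raw counts, not taken from any real corpus.
entry_freq = {"act": 120, "action": 300, "active": 90, "activity": 60}
total = family_frequency(["act", "action", "active", "activity"], entry_freq)
```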
Step 113: select some or all of the top-ranked word families by word family word frequency, together with their entries, as the content of current vocabulary learning.
In one embodiment, according to the number of entries currently to be learned, the top-ranked word families by word family word frequency, up to that number of entries, and their entries are selected as the content of current vocabulary learning. That is, after the word family word frequencies of all word families are obtained, the word families with the highest word family word frequencies, and all the entries they contain, are selected according to the number of entries currently to be learned (i.e. the vocabulary size). For example, where the aim is simply to raise one's personal English level (for instance for TOEFL, which provides no specific vocabulary and gives only a broad description of ability, or where there is only a broad ability requirement such as general listening and speaking, or reading common unabridged texts without difficulty, possibly including specialized vocabulary for a particular profession), all word families may be sorted in descending order of word family word frequency and a certain number of top-ranked word families chosen first as the current learning content, so that the learner learns the most frequently used entries as far as possible. The embodiments of the present application may also be applied to learners who already have a certain vocabulary level and wish to raise their personal English level: according to a test of the learner's English level, a different word bank may be chosen as the selectable range for vocabulary learning, or the entries of the word frequency band corresponding to the tested level may be chosen as that range. The embodiments of the present application may also be applied to single-level learning scenarios with a given syllabus or a fixed word list. For the college entrance examination, for example, there is a specific English syllabus or word list; the embodiment then only needs to combine the entries of that word list into multiple word families (that is, to restrict the range from which entries or word families are obtained) and sort them in descending order of word family word frequency, so that entries with a higher word family word frequency are learned first. This ensures that even an examinee who cannot fully master every word in the list masters the most frequently used entries as far as possible.
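The selection in step 113 can be illustrated with the sketch below, which ranks word families by summed frequency and keeps taking families until the target number of entries is reached. The stopping rule (whole families are kept, so the target may be slightly exceeded), the function name `select_families` and the counts are assumptions made for the example.

```python
def select_families(families, entry_freq, target_entries):
    """Pick top-ranked families by summed frequency until enough entries are covered."""
    ranked = sorted(
        families,
        key=lambda fam: sum(entry_freq.get(e, 0) for e in fam),
        reverse=True,
    )
    selected, count = [], 0
    for family in ranked:
        if count >= target_entries:
            break
        selected.append(family)  # a family is always taken whole
        count += len(family)
    return selected

# Hypothetical counts for illustration only.
entry_freq = {"act": 120, "action": 300, "metal": 50,
              "run": 500, "runner": 20, "running": 80}
families = [["act", "action"], ["metal"], ["run", "runner", "running"]]
selected = select_families(families, entry_freq, target_entries=4)
```

For a fixed word list such as an examination syllabus, the same function applies once `families` has been built from the entries of that list only.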
By combining entries into word families, obtaining the word family word frequency from the entry word frequencies in the corpus, and preferentially choosing word families with a high word family word frequency and their entries as the vocabulary learning content, the efficiency and effect of vocabulary learning are effectively improved. Memorizing by word family also produces deeper memory than memorizing single entries and further increases the number of words memorized.
Fig. 3 is a flow chart of a method for determining the current vocabulary learning content according to another embodiment of the present application. As shown in Fig. 3, after step 113, step 110 may further include the following sub-step:
Step 114: select a centre word of the word family, wherein the centre word represents the word family.
After the entries are combined into word families, a centre word is selected for each word family to represent it. The centre word may be any entry in the word family, provided that it represents the principal form and meaning of all entries in the word family; the embodiments of the present application do not limit the centre word of a word family.
In one embodiment, the centre word of a word family may be selected by choosing the entry with the highest entry word frequency in the word family; taking this entry as the centre word best reflects the frequency of use of the word family. In a further embodiment, it is determined whether there is any other entry whose entry word frequency differs from the highest entry word frequency by less than a preset difference. When such an entry exists (that is, an entry whose word frequency is about the same as the highest), the centre word is chosen, from among the entry with the highest entry word frequency and those other entries, as an entry that is a verb and/or is shorter than a preset length threshold. A verb is usually more representative of the form and meaning of the word family, because many entries of other parts of speech are obtained from the verb by adding a prefix or suffix or by other combination. To keep the entry to be memorized as short as possible and so improve memorization, a shorter entry in the word family may also be chosen: the user understands and memorizes the whole word family by memorizing the centre word, which reduces the difficulty of memorization. An entry that is both short and a verb may of course be chosen. For example, for the word family mentioned in the previous embodiment, which contains entries such as "act", "action", "active" and "activity", "act" may be chosen as the centre word. It should be understood that the embodiments of the present application give exemplary methods of selecting the centre word, and the method of selecting the centre word is not limited to them.
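One possible reading of the centre word selection is sketched below. The order of preference (verbs first, then entries shorter than the length threshold, then higher frequency), the set of verbs supplied by the caller and the name `choose_centre_word` are assumptions; the application leaves these details open.

```python
def choose_centre_word(family, entry_freq, verbs, max_gap, max_len):
    """Pick the centre word among entries whose frequency is close to the maximum."""
    top = max(entry_freq[e] for e in family)
    # Candidates: the most frequent entry plus any entry within max_gap of it.
    candidates = [e for e in family
                  if entry_freq[e] == top or top - entry_freq[e] < max_gap]
    # Assumed preference: verbs, then short entries, then higher frequency.
    return min(candidates,
               key=lambda e: (e not in verbs, len(e) >= max_len, -entry_freq[e]))

# Hypothetical counts and part-of-speech information.
entry_freq = {"act": 280, "action": 300, "active": 90, "activity": 60}
family = ["act", "action", "active", "activity"]
centre = choose_centre_word(family, entry_freq, verbs={"act"}, max_gap=50, max_len=5)
```

With a small `max_gap` no other entry is close to the maximum, and the function falls back to the most frequent entry.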
In one embodiment, step 120 may specifically be implemented by looking up the etymology of the centre word of each selected word family and dividing the word families whose centre words share the same etymology into the same class.
By looking up the etymology of each centre word and dividing the word families whose centre words share the same etymology into the same class, the learner comes to understand the origin of each word family and the connections between word families that share an etymology, which further helps understanding and memorization.
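A minimal sketch of this clustering step follows. It assumes that the etymology lookup is available as a plain dictionary from centre word to root; the roots shown are illustrative values rather than data from the application, and word families whose centre word has no known root are collected under `None`.

```python
from collections import defaultdict

def cluster_by_etymology(families, etymology_of):
    """Group word families (keyed by centre word) under the root of the centre word."""
    clusters = defaultdict(list)
    for centre_word in families:
        root = etymology_of.get(centre_word)  # None when no etymology is found
        clusters[root].append(centre_word)
    return dict(clusters)

# Hypothetical word families and an illustrative root table.
families = {
    "act": ["act", "action", "active"],
    "stand": ["stand", "standing"],
    "agent": ["agent", "agency"],
    "state": ["state", "statement"],
    "zip": ["zip", "zipper"],
}
etymology_of = {"act": "ag-", "agent": "ag-", "stand": "stā-", "state": "stā-"}
clusters = cluster_by_etymology(families, etymology_of)
```

The families left under `None` are natural candidates for a later reassignment step.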
Fig. 4 is a flow chart of a method for determining the current vocabulary learning content according to another embodiment of the present application. As shown in Fig. 4, before step 113, the method may further include:
Step 115: preprocess the word families.
In one embodiment, preprocessing the word families may include deleting from a word family any entry whose entry word frequency is less than a first preset word frequency threshold. Some entries are very rare, that is, the probability that they appear in everyday reading or in an examination is very low. Spending time and energy memorizing them is not only wasteful but also reduces the efficiency of learning other entries. Entries whose entry word frequency is less than a preset value (the first preset word frequency threshold) are therefore deleted and need not be deliberately memorized. Even when such a rare entry is encountered, the related word family can be recalled from its composition to arrive at its approximate meaning, so reading and overall comprehension are not materially affected.
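The deletion of rare entries amounts to a simple filter, sketched here with a hypothetical function name and hypothetical counts.

```python
def prune_rare_entries(family, entry_freq, min_freq):
    """Keep only entries whose word frequency reaches the preset threshold."""
    return [entry for entry in family if entry_freq.get(entry, 0) >= min_freq]

# Hypothetical counts; "actuate" stands in for a rare entry.
entry_freq = {"act": 280, "action": 300, "actuate": 2}
pruned = prune_rare_entries(["act", "action", "actuate"], entry_freq, min_freq=10)
```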
Fig. 5 is a flow chart of a preprocessing method according to an embodiment of the present application. As shown in Fig. 5, step 115 may include the following sub-steps:
Step 1151: calculate the similarity between an entry in the current word family and the centre word of that word family.
Some entries are quite similar in form yet differ considerably in meaning or pronunciation. If such entries were combined into the same word family, this would not help the learner understand and memorize them and would instead cause confusion. Entries that are similar in form but differ greatly in meaning or pronunciation therefore need to be placed in different word families to aid understanding and memorization. For this purpose, the embodiments of the present application calculate the similarity between each entry in a word family and the centre word to determine whether the entry should be placed in that word family. In one embodiment, the similarity comprises a phonetic similarity and/or a first semantic similarity and/or an etymological similarity; that is, whether an entry should be placed in the word family is determined by calculating the phonetic similarity and/or semantic similarity and/or etymological similarity between the entry and the centre word.
Step 1152: determine whether the similarity is less than the first preset similarity threshold; if so, go to step 1153; otherwise, end.
A similarity threshold is preset, and the similarity between the entry and the centre word is compared with the preset similarity threshold to determine whether the entry should be placed in the word family.
Step 1153: move the entry corresponding to the similarity to another word family.
When the similarity between an entry and the centre word is less than the preset similarity threshold, the entry should not be placed in the word family of that centre word, and it is moved to another word family. For example, the entries "mental" and "metal" are very similar in form, but "mental" means of the mind or psychological, whereas "metal" means a metal or to cover with metal, so their meanings differ considerably. If word families were formed purely by form similarity, these two entries would very probably be placed in the same word family, yet putting them together clearly does not suit understanding and memorization. The two entries should therefore not be placed in the same word family.
In one embodiment, the adjustment may proceed as follows: the entry is moved in turn into the word families that have a high form similarity to it, and the semantic and/or phonetic similarity between the entry and the center word of each such word family is then judged again. If the similarity between the entry and the center word of some word family is greater than or equal to the first preset similarity threshold, the entry is moved into that word family; if the similarity between the entry and the center words of all word families is less than the first preset similarity threshold, the entry is placed in a new word family. It should be understood that any other adjustment method may also be used; the embodiment of the present application does not limit the specific adjustment method.
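As a non-limiting illustration, steps 1151 to 1153 and the adjustment just described might be sketched in Python as follows. The function names, the dictionary layout (each center word mapped to the entries of its word family), the use of `difflib.SequenceMatcher` as a stand-in form similarity, the choice to try candidate word families in descending order of form similarity, and the caller-supplied `similarity` function are all assumptions of this sketch; the application does not prescribe any of them.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def form_similarity(a: str, b: str) -> float:
    """Spelling similarity in [0, 1]; a stand-in for the form similarity."""
    return SequenceMatcher(None, a, b).ratio()


def adjust_entry(entry: str,
                 current_center: str,
                 families: Dict[str, List[str]],
                 similarity: Callable[[str, str], float],
                 threshold: float) -> str:
    """Step 1153: move `entry` elsewhere and return its new center word."""
    families[current_center].remove(entry)
    # Try the other word families, most similar in form first (assumption).
    candidates = sorted(
        (c for c in families if c != current_center),
        key=lambda c: form_similarity(entry, c),
        reverse=True,
    )
    for center in candidates:
        if similarity(entry, center) >= threshold:
            families[center].append(entry)
            return center
    # No existing word family qualifies: start a new one for this entry.
    families.setdefault(entry, []).append(entry)
    return entry


def preprocess(families: Dict[str, List[str]],
               similarity: Callable[[str, str], float],
               threshold: float) -> None:
    """Steps 1151-1152: compare every entry with its center word."""
    for center in list(families):
        for entry in list(families[center]):
            if entry != center and similarity(entry, center) < threshold:
                adjust_entry(entry, center, families, similarity, threshold)
```

With a semantic measure under which "mental" is close to "mind" but not to "metal", calling `preprocess` on a word family that wrongly contains "mental" under the center word "metal" would move it to the word family of "mind".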
Fig. 6 shows a flowchart of the construction method of the vocabulary learning framework provided by another embodiment of the present application. As shown in Fig. 6, after step 120 the method may further include:
Step 140: re-divide the word families that satisfy a preset condition into classes.
As a language develops, many entries that derive from the same etymon come to differ considerably in form, pronunciation or meaning. If classes were divided by etymology alone, some very different word families would fall into the same class, which would also make memorization harder.
In one embodiment, the preset condition may include: the number of word families in the corresponding cluster is lower than a preset quantity threshold, and/or the similarity to the corresponding etymon is less than a second preset similarity threshold, and/or the word family has not been assigned to any class. When a cluster contains too few word families (for example, fewer than 5), when the etymology of a word family is unknown, or when the etymology is overly complex (that is, the similarity between the word family and its etymon is low), such word families can be re-divided into classes to make memorization more efficient and to achieve the best understanding and memorization effect. It should be understood that, to better help the learner learn and memorize, the embodiment of the present application may re-divide the word families that satisfy the preset condition, or may choose not to re-divide the word families after clustering.
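The preset condition can be read as a simple predicate over each word family. The sketch below is only one possible reading: it treats the three conditions as alternatives (the "or" branch of "and/or"), and the class `FamilyStatus`, the function `needs_redivision` and the default similarity value 0.5 are hypothetical. Only the example quantity threshold of 5 comes from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FamilyStatus:
    """State of one word family after the etymology-based clustering."""
    cluster_size: Optional[int]         # word families in its cluster; None if unassigned
    etymon_similarity: Optional[float]  # similarity to its etymon; None if unknown


def needs_redivision(status: FamilyStatus,
                     min_cluster_size: int = 5,
                     min_etymon_similarity: float = 0.5) -> bool:
    """Return True if any one of the three preset conditions holds."""
    if status.cluster_size is None:
        return True   # not assigned to any class
    if status.cluster_size < min_cluster_size:
        return True   # the cluster holds too few word families
    if (status.etymon_similarity is not None
            and status.etymon_similarity < min_etymon_similarity):
        return True   # weak link to the assigned etymon
    return False
```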
Fig. 7 shows a flowchart of the class re-division method provided by an embodiment of the present application. As shown in Fig. 7, the re-division may include:
Step 1401: extract the stem of the center word of the current word family.
In one embodiment, extracting the stem of the center word of the current word family may include: removing the prefix and suffix of the center word, or splitting a compound word into its component entries. An affix is a morpheme that attaches to a root to form a new word and cannot form a word by itself, such as a prefix or a suffix; the root is the part of an entry that mainly carries its meaning. If the center word is a compound word, the semantically important part is generally chosen. For example, "hairdresser" is a compound word; after the suffix is removed, "dress" can be chosen. Extracting the stem of the center word in this way yields, as the stem, the part that carries the meaning of the word family.
Step 1402: calculate the similarity between the stem and all etymons.
Calculating the similarity between the stem and all etymons removes the interference of affixes with the form of the center word. The stem of the center word is compared with each etymon, and phonetic, semantic, formal and etymological features can be considered together, so that the etymon of the center word is found more accurately.
Step 1403: when exactly one similarity is greater than a third preset similarity threshold, assign the current word family to the class of the etymon corresponding to that similarity.
Step 1404: when multiple similarities are greater than the third preset similarity threshold, assign the current word family to the class of whichever of the corresponding etymons has the fewest word families.
When multiple similarities are greater than the third preset similarity threshold (that is, several etymons are close to the stem), the word family is assigned to the class of the etymon with fewer word families, which increases the number of word families in that class so that small clusters can be merged into usable classes.
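The two assignment steps reduce to "keep the etymons above the threshold, then pick the one with the smallest class". A minimal sketch under assumed names follows; `choose_class`, the mapping `families_per_etymon` and the caller-supplied `similarity` function are hypothetical. Returning `None` when no etymon exceeds the threshold is also an assumption, since the text does not say what happens in that case.

```python
from typing import Callable, Dict, Optional


def choose_class(stem: str,
                 families_per_etymon: Dict[str, int],
                 similarity: Callable[[str, str], float],
                 threshold: float) -> Optional[str]:
    """Return the etymon whose class the current word family should join.

    `families_per_etymon` maps each etymon to the number of word families
    currently in its class.
    """
    matches = [e for e in families_per_etymon
               if similarity(stem, e) > threshold]
    if not matches:
        return None  # assumption: leave the word family where it is
    # One match: min() returns it. Several: the smallest class wins
    # (ties resolve to the first etymon in dictionary order).
    return min(matches, key=lambda e: families_per_etymon[e])
```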
In one embodiment, step 130 may be implemented by expanding and displaying the multiple entries in the form of a chart.
After the word families have been clustered into multiple classes by etymology, the entries can be displayed as a chart, so that the learner can take in the origin, class and meaning of each entry visually, which further improves the learning effect. In one embodiment, the chart may include a mind map.
In one embodiment, the chart may include multiple levels of nodes, with the etymon as the root node. In one embodiment, the multiple levels of nodes may further include the center word and the other entries in the corresponding word family. In one embodiment, the multiple levels of nodes may further include: etymological source, phonetic feature, semantic feature and prefix. Fig. 8 shows a schematic structural diagram of the chart provided by an embodiment of the present application. As shown in Fig. 8, the root node is the etymon "dhē-". The first-level node is the etymological source (that is, it indicates where the word family comes from and the corresponding sound-change rule), for example "Germanic": for a word family of Germanic origin this means that the sound changes of Grimm's law apply, and the node may also give a general description of the sound change. The second-level node is the meaning of the word family; for example, "action" indicates that the entries under that node mean "action". The third-level nodes are the center words "do" and "deed", and the fourth-level nodes are the other entries in the word family corresponding to each center word. It should be understood that this is only an exemplary chart structure. Different chart structures may be chosen according to the actual application scenario; for example, further levels of nodes (part of speech and so on) may be added to the chart. Any chart structure that displays the entries to be learned well and helps the learner learn and understand them may be used, and the embodiment of the present application does not limit the specific structure of the chart.
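The multi-level chart described for Fig. 8 can be modeled as an ordinary tree. The sketch below is illustrative only: the `Node` class, the `render` helper and the `kind` labels are hypothetical, and the fourth-level entries "doing", "done" and "deeds" are filler examples that do not come from Fig. 8.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                       # text shown in the chart
    kind: str                        # role of the node at its level
    children: List["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> List[str]:
    """Flatten the tree into indented lines, one line per node."""
    lines = ["  " * depth + f"{node.label} [{node.kind}]"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


def build_example() -> Node:
    """Root etymon -> source -> meaning -> center words -> other entries."""
    do = Node("do", "center word",
              [Node("doing", "entry"), Node("done", "entry")])
    deed = Node("deed", "center word", [Node("deeds", "entry")])
    meaning = Node("action", "word family meaning", [do, deed])
    source = Node("Germanic", "etymological source", [meaning])
    return Node("dhē-", "etymon", [source])


if __name__ == "__main__":
    print("\n".join(render(build_example())))
```

A mind-map renderer would walk the same tree; adding a level such as part of speech only means inserting one more layer of `Node` objects.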
Fig. 9 shows a schematic structural diagram of the construction device of the vocabulary learning framework provided by an embodiment of the present application. As shown in Fig. 9, the construction device includes: an obtaining module 21, configured to obtain the content of the current vocabulary learning, where the content includes multiple entries; a clustering module 22, configured to divide the multiple entries into multiple clusters according to a certain feature of the entries; and a display module 23, configured to expand and display the multiple entries according to the multiple clusters.
Dividing the entries into multiple clusters according to a certain feature and displaying them by cluster helps the learner understand the entries from their origin; compared with memorizing single entries, this leaves a deeper impression and effectively improves the efficiency and effect of vocabulary learning.
In one embodiment, the clustering module 22 is further configured to divide the multiple entries into multiple clusters according to the etymology of the entries.
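Clustering by etymology amounts to grouping by a looked-up key. The following sketch assumes that each word family is represented by its center word and that a lookup table from center word to etymon is available; the function name `cluster_by_etymon` and the small hand-written table in the usage note are hypothetical stand-ins for a real etymological dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def cluster_by_etymon(center_words: Iterable[str],
                      etymon_of: Dict[str, str]
                      ) -> Dict[Optional[str], List[str]]:
    """Group center words (and hence their word families) by etymon.

    Words missing from `etymon_of` are collected under the key None,
    i.e. they are not assigned to any class.
    """
    clusters: Dict[Optional[str], List[str]] = defaultdict(list)
    for word in center_words:
        clusters[etymon_of.get(word)].append(word)
    return dict(clusters)
```

For example, with the table `{"do": "dhē-", "deed": "dhē-"}`, the center words "do" and "deed" land in the same class, while a word absent from the table is left unassigned and may later be re-divided.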
In one embodiment, when the vocabulary being learned is English, the etymology may include the Indo-European roots of the American Heritage Dictionary.
In one embodiment, the display module 23 is further configured to expand and display the entries in the form of a chart.
In one embodiment, the chart may include multiple levels of nodes, with the etymon as the root node. In one embodiment, the multiple levels of nodes may further include the center word and the other entries in the corresponding word family. In one embodiment, the multiple levels of nodes may further include: etymological source, phonetic feature, semantic feature and prefix.
Fig. 10 shows a schematic structural diagram of the obtaining module provided by an embodiment of the present application. As shown in Fig. 10, the obtaining module 21 includes the following submodules: an obtaining submodule 211, configured to obtain the entry word frequencies of the multiple entries from a corpus; a combining submodule 212, configured to combine entries whose form similarity is greater than a preset similarity threshold into word families, where the word family word frequency of each word family is obtained from the entry word frequencies of all entries that the word family contains; and a selection submodule 213, configured to select some or all of the word families ranked highest by word family word frequency, together with their corresponding entries, as the content of the current vocabulary learning.
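One hedged way to picture the combining submodule is a greedy pass that compares each entry with the first member of every existing word family. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as the form similarity; that choice, the greedy strategy, the function names and the threshold value are assumptions of this sketch and not the method of the application.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def form_similarity(a: str, b: str) -> float:
    """Spelling similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a, b).ratio()


def group_into_families(entries: Iterable[str],
                        threshold: float = 0.6) -> List[List[str]]:
    """Greedily combine entries whose form similarity exceeds `threshold`."""
    families: List[List[str]] = []
    for entry in entries:
        for family in families:
            # Compare against the first member as the family's representative.
            if form_similarity(entry, family[0]) > threshold:
                family.append(entry)
                break
        else:
            families.append([entry])  # no family is close enough
    return families
```

Under this measure "mental" and "metal" score well above the threshold, so a purely form-based grouping would put them in one word family; this is the situation that the preprocessing by semantic, phonetic or etymological similarity is meant to correct.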
In one embodiment, the selection submodule 213 is further configured to: according to the number of entries that currently need to be learned, select the word families ranked highest by word family word frequency that together contain that number of entries, along with their corresponding entries, as the content of vocabulary learning.
Combining entries into word families, obtaining the word family word frequency from the entry word frequencies in the corpus, and preferentially selecting the word families with a high word family word frequency and their entries as the learning content effectively improves the efficiency and effect of vocabulary learning. Memorizing by word family also leaves a deeper impression than memorizing single entries and further increases the number of words memorized.
In one embodiment, the corpus includes one or a combination of the following corpora: the Corpus of Contemporary American English, the British National Corpus, and a Chinese word frequency corpus.
In one embodiment, the word family word frequency of a word family is obtained by summing the entry word frequencies of all entries that the word family contains.
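The summation and the frequency-ranked selection can be sketched in a few lines. The names `family_frequency` and `select_content` are hypothetical, and so is the stopping rule: whole word families are added in rank order until at least the requested number of entries is reached, so the result may slightly exceed that number.

```python
from typing import Dict, List


def family_frequency(family: List[str], entry_freq: Dict[str, int]) -> int:
    """Word family word frequency: the sum of its entries' word frequencies."""
    return sum(entry_freq.get(entry, 0) for entry in family)


def select_content(families: List[List[str]],
                   entry_freq: Dict[str, int],
                   n_entries: int) -> List[List[str]]:
    """Pick the most frequent word families until n_entries entries are covered."""
    ranked = sorted(families,
                    key=lambda fam: family_frequency(fam, entry_freq),
                    reverse=True)
    selected: List[List[str]] = []
    covered = 0
    for family in ranked:
        if covered >= n_entries:
            break
        selected.append(family)   # word families are never split
        covered += len(family)
    return selected
```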
Fig. 11 shows a schematic structural diagram of the obtaining module provided by another embodiment of the present application. As shown in Fig. 11, the obtaining module 21 may further include a center word selection submodule 214, configured to select the center word of a word family, where the center word represents the word family.
In one embodiment, the center word of a word family may be selected by taking the entry with the highest entry word frequency in the word family as the center word. Preferably, when there are other entries whose entry word frequency differs from the highest entry word frequency by less than a preset difference, an entry that is a verb and/or whose length is less than a preset length threshold is selected as the center word from among the entry with the highest entry word frequency and those other entries.
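The selection rule for the center word can be sketched as follows. The function name, the set of known verbs passed in by the caller, and the tie-break among several preferred candidates (highest word frequency wins) are assumptions of this sketch; the "and/or" of the text is read here as "or".

```python
from typing import Dict, List, Set


def choose_center_word(family: List[str],
                       entry_freq: Dict[str, int],
                       verbs: Set[str],
                       max_diff: int,
                       max_len: int) -> str:
    """Pick the center word of a word family.

    Default: the entry with the highest word frequency. If other entries
    are within `max_diff` of that frequency, prefer a verb or an entry
    shorter than `max_len` among the close candidates.
    """
    top = max(family, key=lambda e: entry_freq[e])
    # Candidates whose frequency is close to the maximum (includes `top`).
    close = [e for e in family if entry_freq[top] - entry_freq[e] < max_diff]
    preferred = [e for e in close if e in verbs or len(e) < max_len]
    if preferred:
        return max(preferred, key=lambda e: entry_freq[e])
    return top
```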
Fig. 12 shows a schematic structural diagram of the obtaining module provided by another embodiment of the present application. As shown in Fig. 12, the obtaining module 21 may further include a preprocessing submodule 215, configured to preprocess the word families.
In one embodiment, preprocessing a word family may include deleting from the word family the entries whose entry word frequency is less than a first preset word frequency threshold.
Fig. 13 shows a schematic structural diagram of the preprocessing submodule provided by an embodiment of the present application. As shown in Fig. 13, the preprocessing submodule 215 may include: a first calculation submodule 2151, configured to calculate the similarity between an entry in the current word family and the center word of that word family; a judging submodule 2152, configured to judge whether the similarity is less than the first preset similarity threshold; and an adjusting submodule 2153, configured to move the entry corresponding to the similarity into another word family when the similarity is judged to be less than the first preset similarity threshold.
In one embodiment, the similarity includes a phonetic similarity and/or a first semantic similarity and/or an etymological similarity.
Fig. 14 shows a schematic structural diagram of the construction device of the vocabulary learning framework provided by another embodiment of the present application. As shown in Fig. 14, the device may include a re-division module 24, configured to re-divide the word families that satisfy a preset condition into classes.
In one embodiment, the preset condition may include: the number of word families in the corresponding cluster is lower than a preset quantity threshold, and/or the similarity to the corresponding etymon is less than a second preset similarity threshold, and/or the word family has not been assigned to any class.
Fig. 15 shows a schematic structural diagram of the re-division module provided by an embodiment of the present application. As shown in Fig. 15, the re-division module 24 may include: a stem extraction submodule 241, configured to extract the stem of the center word of the current word family; a second calculation submodule 242, configured to calculate the similarity between the stem and all etymons; and a division submodule 243, configured to assign the current word family to the class of the etymon corresponding to the similarity when exactly one similarity is greater than a third preset similarity threshold, and to assign the current word family to the class of whichever of the corresponding etymons has the fewest word families when multiple similarities are greater than the third preset similarity threshold.
In one embodiment, extracting the stem of the center word of the current word family may include: removing the prefix and suffix of the center word, or splitting a compound word into its component entries.
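A rough sketch of such stem extraction is given below. The affix lists are short illustrative samples, the compound split relies on a caller-supplied lexicon, and keeping the last component of a compound as its semantically important part is an assumption of this sketch. All names are hypothetical, and a practical system would more likely use a morphological analyzer.

```python
from typing import Iterable, Optional, Set, Tuple

# Illustrative affix lists only; real lists would be much longer.
PREFIXES = ("dis", "pre", "un", "re")
SUFFIXES = ("ness", "ment", "able", "ing", "er", "ed", "ly")


def strip_affixes(word: str,
                  prefixes: Iterable[str] = PREFIXES,
                  suffixes: Iterable[str] = SUFFIXES,
                  min_len: int = 3) -> str:
    """Remove at most one prefix and one suffix, keeping >= min_len letters."""
    stem = word.lower()
    for p in sorted(prefixes, key=len, reverse=True):   # longest first
        if stem.startswith(p) and len(stem) - len(p) >= min_len:
            stem = stem[len(p):]
            break
    for s in sorted(suffixes, key=len, reverse=True):   # longest first
        if stem.endswith(s) and len(stem) - len(s) >= min_len:
            stem = stem[:-len(s)]
            break
    return stem


def split_compound(word: str, lexicon: Set[str]) -> Optional[Tuple[str, str]]:
    """Split `word` into two known entries, or return None."""
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in lexicon and right in lexicon:
            return left, right
    return None


def extract_stem(center_word: str, lexicon: Set[str]) -> str:
    """Strip affixes, then keep the last part of a compound (assumed head)."""
    stem = strip_affixes(center_word)
    parts = split_compound(stem, lexicon)
    return parts[-1] if parts else stem
```

With a lexicon containing "hair" and "dress", `extract_stem("hairdresser", lexicon)` first drops the suffix "er" and then keeps "dress", matching the example given in the description of step 1401.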
Next, an electronic device according to an embodiment of the present application is described with reference to Fig. 16. The electronic device may be either or both of a first device and a second device, or a stand-alone device independent of them; the stand-alone device can communicate with the first device and the second device to receive the input signals they collect.
Fig. 16 illustrates a block diagram of the electronic device according to an embodiment of the present application.
As shown in Fig. 16, the electronic device 10 includes one or more processors 11 and a memory 12.
The processor 11 may be a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability, and may control the other components in the electronic device 10 to perform the desired functions.
The memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory and the like. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 11 may run the program instructions to implement the construction method of the vocabulary learning framework of the embodiments of the present application described above and/or other desired functions. Various contents such as input signals, signal components and noise components may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include an input device 13 and an output device 14, these components being interconnected by a bus system and/or another form of connection mechanism (not shown).
For example, when the electronic device is the first device or the second device, the input device 13 may be a microphone or a microphone array for capturing an input signal from a sound source. When the electronic device is a stand-alone device, the input device 13 may be a communication network connector for receiving the input signals collected by the first device and the second device.
In addition, the input device 13 may also include, for example, a keyboard, a mouse and the like.
The output device 14 may output various information to the outside, including the determined distance information, direction information and the like. The output device 14 may include, for example, a display, a loudspeaker, a printer, and a communication network and the remote output devices connected to it.
Of course, for simplicity, Fig. 16 shows only some of the components of the electronic device 10 that are relevant to the present application, and components such as buses and input/output interfaces are omitted. In addition, the electronic device 10 may include any other appropriate components depending on the specific application.
In addition to the above methods and devices, an embodiment of the present application may also be a computer program product that includes computer program instructions which, when run by a processor, cause the processor to perform the steps of the construction method of the vocabulary learning framework according to the various embodiments of the present application described in the "Exemplary methods" part of this specification.
The computer program product may contain program code, written in any combination of one or more programming languages, for performing the operations of the embodiments of the present application. The programming languages include object-oriented languages such as Java, Python and C++, as well as conventional procedural languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
In addition, an embodiment of the present application may also be a computer-readable storage medium storing computer program instructions which, when run by a processor, cause the processor to perform the steps of the construction method of the vocabulary learning framework according to the various embodiments of the present application described in the "Exemplary methods" part of this specification.
The computer-readable storage medium may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement and the like made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (25)
1. A construction method of a vocabulary learning framework, characterized by comprising:
obtaining the content of the current vocabulary learning, the content of the current vocabulary learning comprising multiple entries;
dividing the multiple entries into multiple clusters according to a certain feature of the multiple entries; and
expanding and displaying the multiple entries according to the multiple clusters.
2. The method according to claim 1, characterized in that dividing the multiple entries into multiple clusters according to a certain feature of the multiple entries comprises:
dividing the multiple entries into multiple clusters according to the etymology of the multiple entries.
3. The method according to claim 1, characterized in that obtaining the content of the current vocabulary learning comprises:
obtaining the entry word frequencies of the multiple entries from a corpus;
combining entries whose form similarity is greater than a preset similarity threshold into word families, wherein the word family word frequency of each word family is obtained from the entry word frequencies of all entries that the word family contains; and
selecting some or all of the word families ranked highest by word family word frequency, together with their corresponding entries, as the content of the current vocabulary learning.
4. The method according to claim 3, characterized in that, after combining the entries whose form similarity is greater than the preset similarity threshold into word families, the method further comprises:
selecting a center word of the word family, wherein the center word represents the word family.
5. The method according to claim 4, characterized in that selecting the center word of the word family comprises:
selecting the entry with the highest entry word frequency in the word family as the center word.
6. The method according to claim 5, characterized in that selecting the entry with the highest entry word frequency in the word family as the center word comprises:
judging whether there is another entry whose entry word frequency differs from the highest entry word frequency by less than a preset difference; and
when there is another entry whose entry word frequency differs from the highest entry word frequency by less than the preset difference, selecting as the center word, from among the entry with the highest entry word frequency and the other entries, an entry that is a verb and/or whose length is less than a preset length threshold.
7. The method according to claim 2, characterized in that, before selecting some or all of the word families ranked highest by word family word frequency and their corresponding entries as the content of the current vocabulary learning, the method further comprises:
preprocessing the word families.
8. The method according to claim 7, characterized in that preprocessing the word families comprises:
deleting from the word family the entries whose entry word frequency is less than a first preset word frequency threshold.
9. The method according to claim 7, characterized in that preprocessing the word families comprises:
calculating the similarity between an entry in the current word family and the center word of the word family;
judging whether the similarity is less than a first preset similarity threshold; and
when the similarity is less than the first preset similarity threshold, moving the entry corresponding to the similarity into another word family.
10. The method according to claim 9, characterized in that the similarity comprises a phonetic similarity and/or a first semantic similarity and/or an etymological similarity.
11. The method according to claim 3, characterized in that obtaining the word family word frequency of the word family from the entry word frequencies of all entries that the word family contains comprises:
obtaining the word family word frequency of the word family by summing the entry word frequencies of all entries that the word family contains.
12. The method according to claim 2, characterized in that dividing the multiple entries into multiple clusters according to the etymology of the multiple entries comprises:
looking up the etymon of the center word of each word family; and
dividing the word families whose center words have the same etymon into the same class.
13. The method according to claim 2, characterized in that the etymology comprises the Indo-European roots of the American Heritage Dictionary.
14. The method according to claim 2, characterized in that, after dividing the multiple entries into multiple clusters according to the etymology of the multiple entries, the method further comprises:
re-dividing the word families that satisfy a preset condition into classes.
15. The method according to claim 14, characterized in that the preset condition comprises:
the number of word families in the corresponding cluster being lower than a preset quantity threshold, and/or the similarity to the corresponding etymon being less than a second preset similarity threshold, and/or the word family not having been assigned to any class.
16. The method according to claim 14, characterized in that the re-division into classes comprises:
extracting the stem of the center word of the current word family;
calculating the similarity between the stem and all etymons;
when the number of similarities greater than a third preset similarity threshold is one, assigning the current word family to the class of the etymon corresponding to that similarity; and
when the number of similarities greater than the third preset similarity threshold is more than one, assigning the current word family to the class of the etymon that has the fewest word families among the etymons corresponding to those similarities.
17. The method according to claim 16, characterized in that extracting the stem of the center word of the current word family comprises:
removing the prefix and suffix of the center word, and/or splitting a compound word into its component entries.
18. The method according to claim 1, characterized in that expanding and displaying the multiple entries according to the multiple clusters comprises:
expanding and displaying the multiple entries in the form of a chart.
19. The method according to claim 18, characterized in that the chart comprises multiple levels of nodes, wherein the root node is the etymon.
20. The method according to claim 19, characterized in that the multiple levels of nodes comprise:
the center word and the other entries in the corresponding word family.
21. The method according to claim 20, characterized in that the multiple levels of nodes further comprise any one or a combination of the following nodes:
etymological source, phonetic feature, semantic feature and prefix.
22. The method according to claim 18, characterized in that the chart comprises a mind map.
23. A construction device of a vocabulary learning framework, characterized by comprising:
an obtaining module, configured to obtain the content of the current vocabulary learning, the content of the current vocabulary learning comprising multiple entries;
a clustering module, configured to divide the multiple entries into multiple clusters according to a certain feature of the multiple entries; and
a display module, configured to expand and display the multiple entries according to the multiple clusters.
24. A computer-readable storage medium storing a computer program, the computer program being used to execute the construction method of the vocabulary learning framework according to any one of claims 1 to 22.
25. An electronic device, comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the construction method of the vocabulary learning framework according to any one of claims 1 to 22.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910620581.1A CN110334215B (en) | 2019-07-10 | 2019-07-10 | Construction method and device of vocabulary learning framework, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110334215A true CN110334215A (en) | 2019-10-15 |
CN110334215B CN110334215B (en) | 2021-08-10 |
Family
ID=68146159
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910620581.1A Active CN110334215B (en) | 2019-07-10 | 2019-07-10 | Construction method and device of vocabulary learning framework, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110334215B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1916889A (en) * | 2005-08-19 | 2007-02-21 | 株式会社日立制作所 | Language material storage preparation device and its method |
CN101587662A (en) * | 2009-01-20 | 2009-11-25 | 郭传喜 | Word frequency based word list sequence |
CN103324626A (en) * | 2012-03-21 | 2013-09-25 | 北京百度网讯科技有限公司 | Method for setting multi-granularity dictionary and segmenting words and device thereof |
JP5504097B2 (en) * | 2010-08-20 | 2014-05-28 | Kddi株式会社 | Binary relation classification program, method and apparatus for classifying semantically similar word pairs into binary relation |
CN105224664A (en) * | 2015-10-08 | 2016-01-06 | 孙继兰 | A kind of digital publication vocabulary extraction, display packing and system |
Non-Patent Citations (3)
Title |
---|
朱伟: "《恋练有词 考研英语词汇识记与应用大全》", 28 February 2015, 北京:群言出版社 * |
王珏: "《运用思维导图促进高中英语词汇教学》", 《基础教育研究》 * |
金亚美: "《从认知语言学角度看少儿英语词汇教学》", 《安徽文学(下半月)》 * |
Also Published As
Publication number | Publication date |
---|---|
CN110334215B (en) | 2021-08-10 |
Rofiq | Indonesian news extractive text summarization using latent semantic analysis | |
Baruah et al. | Character coreference resolution in movie screenplays | |
CN111243351B (en) | Foreign language spoken language training system based on word segmentation technology, client and server | |
Munandar et al. | POS-tagging for non-english tweets: An automatic approach:(Study in Bahasa Indonesia) | |
CN113158693A (en) | Uygur language keyword generation method and device based on Chinese keywords, electronic equipment and storage medium | |
KR101318674B1 (en) | Word recongnition apparatus by using n-gram |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||