WO2023228360A1 - Model generation device, method, and program - Google Patents

Model generation device, method, and program

Info

Publication number
WO2023228360A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
words
familiarity
test
model
Prior art date
Application number
PCT/JP2022/021578
Other languages
English (en)
Japanese (ja)
Inventor
早苗 藤田
哲生 小林
正嗣 服部
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to PCT/JP2022/021578
Publication of WO2023228360A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/216 Parsing using statistical methods

Definitions

  • the disclosed technology relates to a technology for generating a model.
  • The vocabulary size estimation test is a test that accurately estimates a person's vocabulary size in a short time (see, for example, Non-Patent Document 1). An outline of the estimation procedure is shown below.
  • Word familiarity DB (database)
  • Familiarity is a numerical index of how familiar a word is: the higher a word's familiarity value, the more familiar the word is.
  • The estimated vocabulary size is the value estimated to be the user's vocabulary size.
  • The number of words corresponding to each familiarity level is not the same; in other words, the number of words varies with familiarity.
  • If the independent variable x of the logistic model is the total number of words in the word familiarity DB that have a familiarity higher than that of each test word, x tends to increase suddenly around familiarity levels where many words are concentrated. In the example of FIG. 11, x increases suddenly around a familiarity of 5 and around a familiarity of 3.
  • the disclosed technology aims to generate a model that can robustly and accurately estimate a user's vocabulary size.
  • One aspect of the disclosed technology is a model generation device that receives, as input, a plurality of test words and the answers regarding knowledge of the test words given by a user to whom the plurality of test words are presented, and that includes a model generation section which, using a word familiarity database storing words together with familiarity (an index expressing familiarity with a word), obtains a model representing the relationship between a value based on the familiarity corresponding to each test word and a value based on the probability that the user answers that he or she knows the test word.
  • FIG. 1 is a diagram showing an example of the functional configuration of a model generation device and a word selection device.
  • FIG. 2 is a diagram illustrating an example of the processing procedure of the model generation method and word selection method.
  • FIG. 3 is a diagram showing an example of a logistic regression model.
  • FIG. 4 is a diagram illustrating an example of the functional configuration of the acquisition probability acquisition device.
  • FIG. 5 is a diagram illustrating an example of the processing procedure of the acquisition probability acquisition method.
  • FIG. 6 is a diagram for explaining an example of generation of acquired word information.
  • FIG. 7 is a diagram showing an example of the functional configuration of the recommended learning word extraction device.
  • FIG. 8 is a diagram illustrating an example of the processing procedure of the recommended learning word extraction method.
  • FIG. 9 is a diagram showing an example of recommended learning words.
  • FIG. 10 is a diagram showing an example of a functional configuration of a computer.
  • FIG. 11 is a diagram showing an example of the correspondence between familiarity and number of words.
  • The first embodiment is a model generation device and method, and a word selection device and method.
  • The model generation device 1 of this embodiment includes a storage section 11, a word selection section 12, a presentation section 13, an answer reception section 14, a model generation section 15, and a vocabulary number estimation section 16.
  • the model generation device 1 does not need to include the word selection section 12, the presentation section 13, the answer reception section 14, the storage section 11, and the vocabulary number estimation section 16.
  • The word selection device A1 is configured by the storage section 11 and the word selection section 12. Note that the word selection device A1 may also include a presentation section 13 and an answer reception section 14.
  • The storage unit 11 stores a word familiarity database (DB) in advance.
  • the word familiarity DB is a database that stores sets of M words (a plurality of words) and a predetermined familiarity (word familiarity) for each word.
  • a word familiarity DB is stored that stores a plurality of words and a plurality of familiarity levels corresponding to the plurality of words.
  • the M words in the word familiarity DB are ranked in an order based on familiarity (for example, in order of familiarity).
  • M is an integer of 2 or more representing the number of words included in the word familiarity DB.
  • M is preferably 70,000 or more
  • M is preferably 10,000 or more. This is because the vocabulary size of Japanese adults is said to be around 40,000 to 50,000, so around 70,000 words would cover most people's vocabulary, including individual differences.
  • Vocabulary size varies greatly depending on how words are counted, for example how spelling variants and derived words are handled. Therefore, depending on the counting method, an M of 100,000 or more may be needed for a native language.
  • The upper limit of the estimated vocabulary size is the number of words included in the word familiarity DB used as the reference. Therefore, when estimating the vocabulary of an outlier with an especially large vocabulary, it is desirable to increase the value of M.
  • Familiarity is an index that expresses the familiarity with a word.
  • Examples of indices expressing familiarity with a word include: the degree of familiarity of the word (for example, the numerical word familiarity introduced in Non-Patent Document 1); and indices expressing how often the word is seen or heard, how well the word is known, how well the word can be written, and how well the word can be used in speech.
  • the storage unit 11 receives read requests from the word selection unit 12 and the model generation unit 15, and outputs the word corresponding to the request and the familiarity of the word.
  • The word selection unit 12 uses the word familiarity DB stored in the storage unit 11 to select a plurality of test words w(1), ..., w(N) from the plurality of words so that the familiarity intervals corresponding to the test words are constant (step S12).
  • That is, the word selection unit 12 evenly selects N words from all the words included in the word familiarity DB in the storage unit 11 so that the familiarity values of the selected words are at approximately constant intervals, and outputs the selected words as test words w(1), ..., w(N).
  • For example, the word selection unit 12 selects words such that the familiarity interval is 0.1. In this case, the word selection unit 12 may select a word w(1) with a familiarity of 1, a word w(2) with a familiarity of 1.1, ..., a word w(60) with a familiarity of 6.9, and a word w(61) with a familiarity of 7, for a total of 61 words.
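  • A minimal sketch of this selection step is shown below (Python; the familiarity range, the 0.1 interval, and all function and variable names are illustrative assumptions, since the patent does not prescribe an implementation):

```python
from typing import Dict, List

def select_test_words(word_familiarity_db: Dict[str, float],
                      interval: float = 0.1,
                      lo: float = 1.0,
                      hi: float = 7.0) -> List[str]:
    """Pick roughly one word per familiarity step so that the test words
    cover the familiarity range at approximately constant intervals."""
    test_words: List[str] = []
    target = lo
    while target <= hi + 1e-9:
        # choose the word whose familiarity is closest to the current target
        word = min(word_familiarity_db,
                   key=lambda w: abs(word_familiarity_db[w] - target))
        if word not in test_words:
            test_words.append(word)
        target += interval
    return test_words
```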
  • The familiarity values of the test words w(1), ..., w(N) do not necessarily have to be at exactly regular intervals; it is sufficient that they are selected evenly. If, from past research, the familiarity around the boundary between words the user knows and words the user does not know can be predicted, a larger number of words near the familiarity range to be investigated may be selected. That is, the density of the familiarity values of the test words w(1), ..., w(N) may vary.
  • There is no limit on the order in which the test words w(1), ..., w(N) are output from the word selection unit 12; for example, the word selection unit 12 may output the test words w(1), ..., w(N) in order of familiarity.
  • The number N of test words may be specified by the question generation request, or may be predetermined. Although there is no limit on the value of N, it is desirable that, for example, about 50 ≤ N ≤ 100. In order to perform sufficient estimation, it is desirable that N ≥ 25. A larger N allows more accurate estimation but increases the burden on the user (subject) (step S12).
  • Tests of, for example, 50 words each may be conducted multiple times (for example, 3 times), the vocabulary size estimated for each test, and the vocabulary size re-estimated by combining the answers from the multiple tests. In this case, the number of words tested at one time can be reduced, which reduces the burden on the user, and if the results can be viewed for each test, the user's motivation to answer can be maintained. Furthermore, if the final vocabulary size estimation is performed by combining the answers from multiple tests, the estimation accuracy can be improved.
  • the presentation unit 13 presents the test words w(1), ..., w(N) to the user 100 (subject) according to a preset display format (step S13).
  • For example, in accordance with a preset display format, the presentation unit 13 presents to the user 100, in a format for a vocabulary size estimation test, a predetermined instruction sentence prompting the user 100 to input answers regarding his or her knowledge of the test words, and the N test words w(1), ..., w(N).
  • This information may be presented as visual information such as text or images, auditory information such as audio, or tactile information such as Braille.
  • the presentation unit 13 may electronically display the instruction sentence and the test words on the display screen of a terminal device such as a PC (personal computer), tablet, or smartphone. That is, the presentation unit 13 may generate screen information to be presented on a display or the like, and may output the screen information to the display.
  • the presentation unit 13 may be a printing device, and the instruction sentences and test words may be printed on paper or the like and output.
  • the presentation unit 13 may be a speaker of the terminal device and may output the instruction sentence and the test word aloud.
  • the presentation unit 13 may be a Braille display and present the instruction sentence and the test word in Braille.
  • test words are presented in descending order of familiarity, but the presentation order is not limited to this, and the test words may be presented in a random order.
  • <Answer reception section 14> Input: answers regarding the user's knowledge of the test words
  • The user 100, who has been presented with the instruction sentence and the test words, inputs the answers regarding his or her knowledge of the test words to the answer reception section 14 (step S14).
  • the answer reception unit 14 is a touch panel of a terminal device such as a PC, a tablet, or a smartphone, and the user 100 inputs the answer to the touch panel.
  • the answer receiving unit 14 may be a microphone of a terminal device, and in this case, the user 100 inputs the answer by voice into the microphone.
  • the user 100 may input an answer into the answer reception unit 14 by clicking with a mouse or the like.
  • The answer reception unit 14 receives an input answer regarding knowledge of each test word (for example, an answer that the test word is known or an answer that the test word is not known), and outputs the answer as electronic data.
  • the answer receiving unit 14 may output an answer for each test word, may output answers for one test at once, or may output answers for multiple tests at once.
  • For example, when the answer reception unit 14 receives an answer that the user 100 knows the test word, it assigns a value of 1 to the answer regarding knowledge of that test word. On the other hand, when the answer reception unit 14 receives an answer that the user 100 does not know the test word, it assigns a value of 0. These numerical values are output to the model generation section 15.
  • Model generation unit 15 Input: Answer regarding the user's knowledge of the test word
  • The answers regarding the user 100's knowledge of the test words output from the answer reception unit 14 are input to the model generation unit 15.
  • The model generation unit 15 uses the answers regarding knowledge of the test words and the word familiarity DB stored in the storage unit 11 to obtain a model representing the relationship between a value based on the familiarity corresponding to each test word and a value based on the probability that the user 100 answers that he or she knows the test word (step S15).
  • the obtained model is output to the vocabulary number estimation section 16.
  • The value based on the familiarity corresponding to a test word may be the familiarity itself, or may be the value of a non-monotonically-decreasing function (for example, a monotonically increasing function) of that familiarity. To simplify the explanation, the case where the value based on the familiarity corresponding to the test word is the familiarity itself is exemplified below.
  • Likewise, the value based on the probability that the user 100 answers that he or she knows the test word may be that probability itself, or may be the value of a non-monotonically-decreasing function (for example, a monotonically increasing function) of that probability. Below, the case where the value based on this probability is the probability itself is exemplified.
  • An example of the model is a logistic regression model (logistic model), for example y = 1/(1 + exp(-(a·x + b))), where x is the familiarity, y is the probability of answering that the word is known, and a and b are model parameters.
  • The model generation unit 15 refers to the word familiarity DB stored in the storage unit 11 to obtain the familiarity corresponding to each test word w(n); let x(n) be this familiarity. A test word w(n) that the user 100 answered that he or she knows corresponds to the point (x, y) = (x(n), 1).
  • A test word w(n) for which the user 100 answered that he or she does not know it (or did not answer that he or she knows it) corresponds to the point (x, y) = (x(n), 0). The model generation unit 15 obtains the model by fitting it to these points.
  • the horizontal axis represents the degree of familiarity
  • the vertical axis represents the probability (y) of a person answering that they know the word.
  • "AIC" in FIG. 3 represents the Akaike information criterion, and the smaller the value, the better the fit of the model.
  • "n” in FIG. 3 represents the number of test words.
  • The model generation section 15 may also be called a model construction section 15, and a model may equally be said to be created or constructed.
  • Estimation methods 1 to 3 will be explained below as examples of methods by which the vocabulary number estimation unit 16 estimates the vocabulary size of the user 100.
  • The vocabulary number estimation unit 16 obtains the predetermined-value acquisition familiarity, which is the familiarity at which the value based on the probability that the user 100 answers that he or she knows a word is equal to (or in the vicinity of) a predetermined value.
  • Examples of the predetermined value are 0.5 or 0.8.
  • the predetermined value may be any other value greater than 0 and less than 1.
  • The vocabulary number estimation unit 16 then refers to the word familiarity DB stored in the storage unit 11, obtains the number of words whose familiarity is equal to or higher than the predetermined-value acquisition familiarity, and uses the obtained number as the estimated vocabulary size of the user 100.
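  • A sketch of this estimation method follows (the predetermined value of 0.5, the grid search over familiarity, and the reuse of the fitted model object from the previous sketch are assumptions):

```python
import numpy as np

def estimate_vocabulary_size(model, word_familiarity_db, predetermined=0.5):
    """Find the familiarity at which the model output is closest to the
    predetermined value, then count the words in the familiarity DB whose
    familiarity is at or above that value."""
    fams = np.linspace(min(word_familiarity_db.values()),
                       max(word_familiarity_db.values()), 1000)
    probs = model.predict_proba(fams.reshape(-1, 1))[:, 1]
    acquisition_familiarity = fams[np.argmin(np.abs(probs - predetermined))]
    return sum(1 for f in word_familiarity_db.values()
               if f >= acquisition_familiarity)
```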
  • Alternatively, the vocabulary number estimation unit 16 refers to the model and the word familiarity DB stored in the storage unit 11 and obtains the output value y(m) when the familiarity x(m) corresponding to a word w(m) included in the word familiarity DB is input to the model. In other words, the vocabulary number estimation unit 16 calculates the value of y corresponding to the familiarity x(m) of the word w(m) in the model and sets the calculated value as the output value y(m).
  • The vocabulary number estimation unit 16 may also estimate the vocabulary size of the user 100 by additionally taking into account the answers regarding knowledge of the test words.
  • Compared with directly using the cumulative number of words as x, estimating the vocabulary size based on a logistic model estimated from the familiarity x of the test words and the probability y that the user 100 answers that he or she knows them makes the model converge more easily and allows the vocabulary size to be estimated more robustly. Moreover, even if the distribution of the number of words per familiarity level differs greatly, sudden changes in the estimated vocabulary size can be suppressed.
  • The vocabulary number estimation unit 16 refers to the model and the word familiarity DB stored in the storage unit 11 and obtains the output value y(i) when a familiarity x(i) included in the word familiarity DB is input to the model. In other words, the vocabulary number estimation unit 16 calculates the value of y corresponding to the familiarity x(i) in the model and sets the calculated value as the output value y(i). Further, the vocabulary number estimation unit 16 refers to the word familiarity DB stored in the storage unit 11 and obtains the number n(i) of words corresponding to the familiarity x(i) included in the word familiarity DB.
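  • The excerpt above stops at obtaining y(i) and n(i); how they are combined into a vocabulary size is not spelled out here. A natural reading, shown purely as an assumption in the sketch below, is to use the expected number of known words, i.e. the sum of y(i)·n(i) over the familiarity levels:

```python
from collections import Counter

def estimate_vocabulary_size_expected(model, word_familiarity_db):
    """Aggregate the model outputs as the expected number of known words:
    sum over familiarity levels x(i) of y(i) * n(i).
    (This aggregation is an assumption; the excerpt only states that
    y(i) and the word count n(i) are obtained.)"""
    counts = Counter(word_familiarity_db.values())   # n(i) for each familiarity x(i)
    total = 0.0
    for x_i, n_i in counts.items():
        y_i = model.predict_proba([[x_i]])[0, 1]     # y(i)
        total += y_i * n_i
    return round(total)
```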
  • The word selection unit 12 may also simply select the plurality of test words w(1), ..., w(N) from the plurality of words without setting the familiarity intervals corresponding to the test words at regular intervals.
  • The model generation unit 15 may assume answers regarding knowledge of non-presented words, and obtain a model representing the relationship between values based on the familiarity corresponding to the test words and the non-presented words, and values based on the probability, actual or assumed, that the user 100 answers that he or she knows the test words and the non-presented words.
  • the non-presented word is a word other than the plurality of test words among the plurality of words.
  • answers for non-presented words that were not used as test words are assumed and used to create the model.
  • Words near the upper limit of familiarity are words that many people know, and words near the lower limit are words that many people do not know. Therefore, if the user 100 answers that he/she knows the word with the highest degree of familiarity among the test words, it is assumed that the user 100 also knows the non-presented words with a degree of familiarity higher than that degree of familiarity. Conversely, if the user answers that he or she does not know the word with the lowest familiarity among the test words, it is assumed that the user does not know the non-presented word with a familiarity lower than that familiarity.
  • By assuming answers regarding knowledge of non-presented words, which are words that were not presented to the user 100, and estimating the model with them, the model converges more easily and a more appropriate model can be generated. For example, even if the user 100 answers that he or she knows most of the test words, or answers that he or she does not know most of the test words, the model still converges more easily and a more appropriate model can be generated.
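  • A sketch of this augmentation step, following the thresholding described above (names and data layout are assumptions):

```python
def augment_with_assumed_answers(test_familiarities, answers, word_familiarity_db):
    """If the user knows the highest-familiarity test word, assume all
    non-presented words with higher familiarity are also known; if the user
    does not know the lowest-familiarity test word, assume all non-presented
    words with lower familiarity are also unknown."""
    pairs = sorted(zip(test_familiarities, answers))
    lowest_fam, lowest_ans = pairs[0]
    highest_fam, highest_ans = pairs[-1]
    xs, ys = list(test_familiarities), list(answers)
    for fam in word_familiarity_db.values():
        if highest_ans == 1 and fam > highest_fam:
            xs.append(fam); ys.append(1)   # assumed to be known
        elif lowest_ans == 0 and fam < lowest_fam:
            xs.append(fam); ys.append(0)   # assumed to be unknown
    return xs, ys
```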
  • the second embodiment is an acquisition probability acquisition device and method.
  • The acquisition probability acquisition device 2 of this embodiment includes a storage section 11, a model storage section 21, a word extraction section 22, a familiarity acquisition section 23, an acquisition probability acquisition section 24, and an acquired word information generation section 25.
  • the acquisition probability acquisition device 2 does not need to include the word extraction section 22 and the acquisition word information generation section 25.
  • the storage unit 11 is the same as the storage unit 11 of the first embodiment.
  • the storage unit 11 stores a word familiarity DB that stores a plurality of words and a plurality of familiarity levels corresponding to the plurality of words.
  • the degree of familiarity is an index representing the degree of familiarity with a word.
  • the model storage unit 21 stores a model representing the relationship between a value based on the degree of familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word.
  • The certain person is the person for whom the acquisition probabilities are obtained; the certain person may be the user 100.
  • Acquiring a word means, in other words, knowing the word, being able to use the word, or being able to explain the word.
  • An example of this model is a model generated by the model generation device 1 of the first embodiment or of a modification of the first embodiment.
  • the acquisition probability acquisition device 2 may further include a model generation device 1 for generating a model stored in the model storage unit 21.
  • For example, the acquisition probability acquisition device 2 may further include (1) a word selection unit 12 that selects a plurality of test words from the plurality of words, (2) a presentation unit 13 that presents the test words to a user, (3) an answer reception unit 14 that accepts the user's answers regarding knowledge of the test words, and (4) a model generation unit 15 that uses the answers regarding knowledge of the test words and the word familiarity DB stored in the storage unit 11 to obtain a model expressing the relationship between a value based on the familiarity corresponding to each test word and a value based on the probability that the user answers that he or she knows the test word, and that uses the obtained model as the model stored in the model storage unit 21.
  • Each extracted word is output to the familiarity acquisition unit 23.
  • the text input to the word extraction unit 22 may be any text that can be read by the word extraction unit 22, which is an information processing device. Examples of texts are books such as textbooks and novels, newspapers and magazines, and texts published on web pages.
  • the word extraction unit 22 extracts each word contained in the input text, for example, by performing morphological analysis on the input text.
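  • A sketch of this extraction step for Japanese text, using MeCab in word-splitting (wakati) mode; the choice of morphological analyzer is an assumption, and any tokenizer producing surface forms would do:

```python
import MeCab

def extract_words(text: str):
    """Split the input text into surface-form words by morphological analysis."""
    tagger = MeCab.Tagger("-Owakati")   # wakati mode: space-separated tokens
    return tagger.parse(text).split()
```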
  • The familiarity acquisition unit 23 acquires the familiarity corresponding to each word from the word familiarity DB stored in the storage unit 11 (step S23).
  • When the acquisition probability acquisition device 2 does not include the word extraction unit 22, each word included in the text is input directly.
  • the familiarity acquisition unit 23 acquires the familiarity corresponding to each word included in the text from the word familiarity DB stored in the storage unit 11 (step S23).
  • Each word and the familiarity corresponding to each word are output to the acquisition probability acquisition unit 24.
  • The familiarity acquisition unit 23 and the word extraction unit 22 do not need to acquire the familiarity of proper nouns, or of function words such as numerals and particles.
  • the word extraction unit 22 may acquire the familiarity of only words that are content words.
  • Function words such as numerals and particles are words that many people know. Therefore, by acquiring the familiarity of these function words, in other words by including them as processing targets, the ratio of estimated acquired words in the text calculated by the acquired word information generation unit 25 can be raised. Conversely, by not acquiring the familiarity of these function words, in other words by excluding them from processing, the ratio of estimated acquired words calculated by the acquired word information generation unit 25 can be lowered.
  • the familiarity acquisition unit 23 may ignore words that are not included in the word familiarity DB without acquiring the familiarity. Thereby, even if the morphological analysis is incorrect, the acquisition probability acquisition process can be performed appropriately.
  • the acquisition probability acquisition unit 24 obtains an output value when the familiarity corresponding to each word is input into the model, and uses the obtained output value as the acquisition probability corresponding to each word. In other words, the acquisition probability acquisition unit 24 calculates the value of y corresponding to the familiarity x corresponding to each word in the model, and uses the calculated value as the acquisition probability corresponding to each word.
  • the acquisition probability acquisition unit 24 may acquire the acquisition probability by considering the part of speech, word length, etc. For example, the acquisition probability acquisition unit 24 may acquire the acquisition probability using part of speech, word length, etc. as explanatory variables.
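  • A sketch of the acquisition probability acquisition unit 24: each word's familiarity is looked up in the word familiarity DB and fed to the model, and words absent from the DB are skipped as described above (the model object and names follow the earlier sketches and are assumptions):

```python
def acquisition_probabilities(words, word_familiarity_db, model):
    """Return {word: probability that the person has acquired the word}."""
    probs = {}
    for w in words:
        fam = word_familiarity_db.get(w)
        if fam is None:
            continue                     # ignore words not in the familiarity DB
        probs[w] = model.predict_proba([[fam]])[0, 1]
    return probs
```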
  • Each word and the acquisition probability corresponding to each word are output to the acquisition word information generation unit 25.
  • the acquired word information generation unit 25 generates acquired word information, which is information regarding acquisition of words included in the text, using the acquisition probability corresponding to each word (step S25 ).
  • An example of the acquired word information is at least one of the following: estimated acquired words in the text, the number of estimated acquired words in the text, and the ratio of estimated acquired words in the text.
  • the acquired word information generation unit 25 estimates the number of vocabulary words of a certain person.
  • the number of vocabulary can be estimated by the method described in the vocabulary number estimation unit 16 of the first embodiment.
  • the word familiarity DB from the storage unit 11 and the model from the model storage unit 21 may be input to the acquired word information generation unit 25, as shown by the dashed line in FIG.
  • the acquired word information generation unit 25 obtains the number GOISU(k) of words with a familiarity level greater than or equal to the familiarity level corresponding to each input word w(k).
  • the word familiarity DB may be input from the storage unit 11 to the acquired word information generation unit 25, as shown by the dashed line in FIG.
  • the acquired word information generation unit 25 sets words whose GOISU(k) is less than or equal to the number of vocabulary of a certain person as estimated acquired words in the text.
  • The higher the familiarity of a word, the smaller GOISU(k) is. Therefore, it can be assumed that a person knows the words whose GOISU(k) is equal to or less than that person's vocabulary size.
  • FIG. 6 shows an example of GOISU(k).
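  • A sketch of GOISU(k) and of selecting the estimated acquired words on that basis (variable names are assumptions):

```python
def goisu(word, word_familiarity_db):
    """GOISU(k): number of words whose familiarity is greater than or equal
    to the familiarity of the given word."""
    fam = word_familiarity_db[word]
    return sum(1 for f in word_familiarity_db.values() if f >= fam)

def estimated_acquired_words(text_words, word_familiarity_db, vocabulary_size):
    """Words in the text whose GOISU(k) does not exceed the person's
    estimated vocabulary size are treated as acquired."""
    return [w for w in set(text_words)
            if w in word_familiarity_db
            and goisu(w, word_familiarity_db) <= vocabulary_size]
```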
  • Similarly, to obtain the estimated number of acquired words in the text, the acquired word information generation unit 25 estimates the vocabulary size of the certain person (for example, by the method described for the vocabulary number estimation unit 16 of the first embodiment) and obtains GOISU(k), the number of words with a familiarity greater than or equal to that of each input word w(k), as described above.
  • the acquired word information generation unit 25 sets the number of words whose GOISU(k) is less than or equal to the number of vocabulary of a certain person as the estimated number of acquired words in the text.
  • the acquired word information generation unit 25 calculates a value determined by, for example, the following formula (1) or formula (2), and uses the calculated value as the ratio of estimated acquired words in the text.
  • FREQ(k) is the number of times the word w(k) appears in the text. Assuming that the text is divided into multiple parts, DIFF(k) is the number of parts in which the word w(k) appears.
  • An example of a part is a predetermined unit that constitutes a text, such as a unit, chapter, or section. The entire text may be used as a unit.
  • K is the total number of words included in the text and for which the acquisition probability has been acquired by the acquisition probability acquisition unit 24.
  • the acquired word information generation unit 25 counts FREQ(k) and DIFF(k) based on the input word.
  • the acquired word information generation unit 25 calculates a value determined by equation (1) or equation (2) using FREQ(k) and DIFF(k) found by counting.
  • FIG. 6 shows an example of FREQ(k) and DIFF(k).
  • the number of occurrences of rare words in text will be less than the number of occurrences of well-known words in text.
  • the acquired word information generation unit 25 may use the number of estimated acquired words in the text/K as the ratio of the estimated acquired words in the text.
  • the number of estimated acquired words in a text can be determined by the method described in (Estimated number of acquired words in text).
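  • Formulas (1) and (2) themselves are not reproduced in this excerpt. The sketch below shows the simple ratio described above (number of estimated acquired words divided by K) and, purely as an assumption about what a frequency-weighted variant could look like, a ratio weighted by FREQ(k):

```python
from collections import Counter

def acquired_word_ratio(text_words, acquired_set):
    """Simple ratio: distinct estimated acquired words / K
    (here K is approximated by the number of distinct words in the text)."""
    vocab = set(text_words)
    return len(vocab & acquired_set) / len(vocab)

def acquired_token_ratio(text_words, acquired_set):
    """Frequency-weighted ratio using FREQ(k); an assumed stand-in for a
    formula such as (1), which is not reproduced in this excerpt."""
    freq = Counter(text_words)                  # FREQ(k)
    total = sum(freq.values())
    acquired = sum(c for w, c in freq.items() if w in acquired_set)
    return acquired / total
```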
  • the third embodiment is a recommended learning word extraction device and method.
  • the recommended learning word extraction device 3 of this embodiment includes a storage section 11, a model storage section 31, an acquisition probability acquisition section 32, and a recommended learning word extraction section 33.
  • the storage unit 11 is the same as the storage unit 11 of the first embodiment.
  • the storage unit 11 stores a word familiarity DB that stores a plurality of words and a plurality of familiarity levels corresponding to the plurality of words.
  • the degree of familiarity is an index representing the degree of familiarity with a word.
  • the model storage unit 31 stores a model representing the relationship between a value based on the degree of familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word.
  • The certain person is the person for whom recommended learning words are extracted; the certain person may be the user 100.
  • An example of this model is a model generated by the model generation device 1 of the first embodiment or of a modification of the first embodiment.
  • the recommended learning word extraction device 3 may further include a model generation device 1 for generating a model stored in the model storage unit 31.
  • For example, the recommended learning word extraction device 3 may further include (1) a word selection section 12 that selects a plurality of test words from the plurality of words, (2) a presentation section 13 that presents the test words to a user, (3) an answer reception section 14 that receives the user's answers regarding knowledge of the test words, and (4) a model generation section 15 that uses the answers regarding knowledge of the test words and the word familiarity DB stored in the storage unit 11 to obtain a model expressing the relationship between a value based on the familiarity corresponding to each test word and a value based on the probability that the user answers that he or she knows the test word, and that uses the obtained model as the model stored in the model storage unit 31.
  • The acquisition probability acquisition unit 32 uses at least the word familiarity DB stored in the storage unit 11 and the model stored in the model storage unit 31 to obtain, for each word included in the input word set, the acquisition probability, which is the probability that the certain person has acquired that word (step S32).
  • the acquisition probability acquisition unit 32 obtains an output value when the familiarity corresponding to each word is input into the model, and uses the obtained output value as the acquisition probability corresponding to each word. In other words, the acquisition probability acquisition unit 32 calculates the value of y corresponding to the familiarity x corresponding to each word in the model, and uses the calculated value as the acquisition probability corresponding to each word.
  • the acquisition probability acquisition unit 32 may acquire the acquisition probability by considering the part of speech, the length of the word, etc. For example, the acquisition probability acquisition unit 32 may acquire the acquisition probability using part of speech, word length, etc. as explanatory variables.
  • Each word and the acquisition probability corresponding to each word are output to the recommended learning word extraction unit 33.
  • <Recommended learning word extraction unit 33> Input: words, acquisition probabilities
  • Output: recommended learning words
  • the recommended learning word extraction unit 33 extracts recommended learning words from the word set based on the acquired acquisition probability (step S33).
  • the recommended learning word extraction unit 33 may extract words whose acquisition probability is close to a predetermined probability as recommended learning words.
  • the predetermined probability is a number greater than 0 and less than 1.
  • An example of a predetermined probability is 0.5.
  • the recommended learning word extraction unit 33 may extract a predetermined number of words with a predetermined probability as recommended learning words.
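  • A sketch of this extraction step: words are ranked by the distance between their acquisition probability and the predetermined probability (0.5 here, matching Distance50 in FIG. 9) and the closest ones are returned; the cut-off count (7, matching the example in FIG. 9) is an assumption:

```python
def recommend_learning_words(acquisition_probs, predetermined=0.5, top_n=7):
    """acquisition_probs: {word: acquisition probability}.
    Return the words whose acquisition probability is closest to the
    predetermined probability."""
    ranked = sorted(acquisition_probs.items(),
                    key=lambda item: abs(item[1] - predetermined))
    return [word for word, _ in ranked[:top_n]]
```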
  • the 7 words shown in FIG. 9 are extracted as recommended learning words.
  • ENTRY is the notation of the word
  • PSY is the familiarity
  • Prob is the acquisition probability
  • YN is information about the answer, if any, that the user 100 gave as to whether he or she knows the word
  • Distance50 is the size of the difference between 0.5, which is the predetermined probability in this case, and Prob.
  • "-" is displayed in YN because the user 100 has not answered that he or she knows or does not know. If user 100 answers that they know the word, "1" is displayed in YN, and if user 100 answers that they do not know the word, "0" is displayed in YN. ” is displayed.
  • The recommended learning words are presented to the person for whom they are extracted. For example, the recommended learning words may be presented in the form of the table shown in FIG. 9.
  • the recommended learning word extraction unit 33 may extract words included within a predetermined range that includes a predetermined probability as recommended learning words.
  • the recommended learning word extraction unit 33 may extract words with a predetermined part of speech and whose obtained acquisition probability is close to a predetermined probability as recommended learning words.
  • predetermined parts of speech are verbs, nouns, and adjectives.
  • the predetermined part of speech may be two or more types of parts of speech.
  • the recommended learning word extracting unit 33 may extract words for which the obtained acquisition probability is close to a predetermined probability from among words of two or more types of parts of speech as recommended learning words.
  • the part of speech information may be stored in the word familiarity DB.
  • the learning recommended word extraction unit 33 can refer to the word familiarity DB, obtain the part of speech of the word, and perform the above processing.
  • the learning recommended word extraction unit 33 may refer to a dictionary in which words and their parts of speech are stored in a storage unit (not shown), obtain the part of speech of the word, and perform the above processing.
  • the word set that is input to the acquisition probability acquisition unit 32 and is made up of a plurality of words that are candidates for learning recommended words may be words that are included in a predetermined text.
  • the recommended learning word extraction device 3 may include a word extraction section 34 described below.
  • Each extracted word is output to the acquisition probability acquisition unit 32 as a word set that is a candidate for learning recommended words.
  • The text input to the word extraction unit 34 may be any text that can be read by the word extraction unit 34, which is an information processing device. Examples of texts are books such as textbooks and novels, newspapers and magazines, and texts published on web pages.
  • the word extraction unit 34 extracts each word included in the input text, for example, by performing morphological analysis on the input text.
  • Data exchange between the components of the model generation device 1, the acquisition probability acquisition device 2, and the recommended learning word extraction device 3 may be performed directly or via a storage unit (not shown).
  • a program that describes this processing content can be recorded on a computer-readable recording medium.
  • the computer-readable recording medium is, for example, a non-transitory recording medium, specifically a magnetic recording device, an optical disk, or the like.
  • Distribution of this program is performed, for example, by selling, transferring, or lending portable recording media such as DVDs and CD-ROMs on which the program is recorded. Furthermore, this program may be distributed by storing it in the storage device of a server computer and transferring it from the server computer to another computer via a network.
  • For example, a computer that executes such a program first stores the program recorded on a portable recording medium, or the program transferred from the server computer, in the auxiliary storage unit 1050, which is its own non-temporary storage device. When executing a process, the computer loads the program stored in the auxiliary storage unit 1050 into the storage unit 1020 and executes the process according to the loaded program. As another form of execution, the computer may load the program directly from a portable recording medium into the storage unit 1020 and execute processing according to the program, or it may execute processing according to the received program each time a program is transferred to it from the server computer.
  • Furthermore, the above-mentioned processing may be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to this computer and the processing functions are realized only by issuing execution instructions and obtaining results.
  • the present apparatus is configured by executing a predetermined program on a computer, but at least a part of these processing contents may be implemented in hardware.
  • the word selection unit 12, the presentation unit 13, the answer reception unit 14, the model generation unit 15, the number of vocabulary estimation unit 16, the word extraction unit 22, the familiarity acquisition unit 23, the acquisition probability acquisition unit 24, the acquired word information generation unit 25 , the acquisition probability acquisition section 32, the recommended learning word extraction section 33, and the word extraction section 34 may be constituted by a processing circuit.
  • the storage unit 11, model storage unit 21, and model storage unit 31 may be configured by memory.
  • A word selection device comprising a memory and a processor, wherein the memory stores a word familiarity DB that stores a plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words, familiarity being an index representing familiarity with a word; and the processor selects a plurality of test words from the plurality of words, using the word familiarity DB stored in the memory, such that the familiarity intervals corresponding to the test words are constant.
  • a non-transitory storage medium storing a program executable by a computer to perform a word selection process,
  • The word selection process selects a plurality of test words from a plurality of words, using a word familiarity DB that stores the plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words (familiarity being an index representing familiarity with a word), such that the familiarity intervals corresponding to the test words are constant.
  • A model generation device comprising a memory and a processor, wherein the memory stores a word familiarity DB that stores a plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words, familiarity being an index representing familiarity with a word; and the processor receives, as input, a plurality of test words and the answers regarding knowledge of the test words given by a user to whom the plurality of test words were presented, and uses the answers regarding knowledge of the test words and the word familiarity DB stored in the memory to obtain a model representing the relationship between a value based on the familiarity corresponding to each test word and a value based on the probability that the user answers that he or she knows the test word.
  • a non-transitory storage medium storing a program executable by a computer to perform a model generation process,
  • The model generation process receives, as input, a plurality of test words and the answers regarding knowledge of the test words given by a user to whom the plurality of test words were presented, and uses the answers regarding knowledge of the test words and a word familiarity DB to obtain a model representing the relationship between a value based on the familiarity corresponding to each test word and a value based on the probability that the user answers that he or she knows the test word, the familiarity being an index representing familiarity with a word, and the word familiarity DB storing a plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words.
  • An acquisition probability acquisition device comprising a memory and a processor, wherein the memory stores a word familiarity DB that stores a plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words (familiarity being an index representing familiarity with a word) and a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word; and the processor obtains, from the word familiarity DB stored in the memory, the familiarity corresponding to each word included in an input text, and acquires an acquisition probability, which is the probability that the certain person has acquired each word, using at least the familiarity corresponding to each word and the model stored in the memory.
  • A non-transitory storage medium storing a program executable by a computer to execute an acquisition probability acquisition process,
  • wherein the acquisition probability acquisition process obtains, from a word familiarity DB that stores a plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words (familiarity being an index representing familiarity with a word), the familiarity corresponding to each word included in an input text, and acquires an acquisition probability, which is the probability that a certain person has acquired each word, using at least the familiarity corresponding to each word and a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that the certain person has acquired each word.
  • A recommended learning word extraction device comprising a memory and a processor, wherein the memory stores a word familiarity DB that stores a plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words (familiarity being an index representing familiarity with a word) and a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word; and the processor acquires an acquisition probability, which is the probability that the certain person has acquired each word included in an input word set, using at least the word familiarity DB and the model stored in the memory, and extracts recommended learning words from the word set based on the acquired acquisition probabilities.
  • A non-transitory storage medium storing a program executable by a computer to execute a recommended learning word extraction process,
  • wherein the recommended learning word extraction process acquires an acquisition probability, which is the probability that a certain person has acquired each word included in an input word set, using at least a word familiarity DB that stores a plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words (familiarity being an index representing familiarity with a word) and a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that the certain person has acquired each word, and extracts recommended learning words from the word set based on the acquired acquisition probabilities.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention concerns a model generation device comprising: a storage unit (11) that stores a word familiarity database storing a plurality of words and a plurality of familiarity values respectively associated with the plurality of words, a familiarity value being a measure of familiarity with a word; and a model generation unit (15) that receives, as input, a plurality of test words presented to a user and the user's answers regarding knowledge of the test words, and that obtains a model representing the relationship between a value based on the familiarity associated with each test word and a value based on the probability that the user will answer that the user knows the test word, using the answers regarding knowledge of the test words and the word familiarity database stored in the storage unit (11).
PCT/JP2022/021578 2022-05-26 2022-05-26 Model generation device, method, and program WO2023228360A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/021578 WO2023228360A1 (fr) 2022-05-26 2022-05-26 Model generation device, method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/021578 WO2023228360A1 (fr) 2022-05-26 2022-05-26 Model generation device, method, and program

Publications (1)

Publication Number Publication Date
WO2023228360A1 true WO2023228360A1 (fr) 2023-11-30

Family

ID=88918745

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/021578 WO2023228360A1 (fr) 2022-05-26 2022-05-26 Model generation device, method, and program

Country Status (1)

Country Link
WO (1) WO2023228360A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005107483A (ja) * 2003-09-11 2005-04-21 Nippon Telegr & Teleph Corp <Ntt> Word learning method, word learning device, word learning program and recording medium recording the program, and character string learning method, character string learning device, character string learning program and recording medium recording the program
WO2021260760A1 (fr) * 2020-06-22 2021-12-30 日本電信電話株式会社 Vocabulary size estimation device, vocabulary size estimation method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22943762

Country of ref document: EP

Kind code of ref document: A1