WO2023228359A1 - Word selection device, method, and program - Google Patents

Word selection device, method, and program Download PDF

Info

Publication number
WO2023228359A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
words
familiarity
test
unit
Prior art date
Application number
PCT/JP2022/021577
Other languages
French (fr)
Japanese (ja)
Inventor
早苗 藤田
哲生 小林
正嗣 服部
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社 filed Critical 日本電信電話株式会社
Priority to PCT/JP2022/021577 priority Critical patent/WO2023228359A1/en
Publication of WO2023228359A1 publication Critical patent/WO2023228359A1/en

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods

Definitions

  • The disclosed technology relates to a technology for selecting words.
  • The vocabulary size estimation test is a test that estimates a person's vocabulary size accurately in a short time (see, for example, Non-Patent Document 1). An outline of the estimation procedure is shown below.
  • word familiarity DB (database)
  • Familiarity is a numerical measure of how familiar a word is; the higher a word's familiarity, the more familiar the word.
  • The estimated vocabulary size means the value estimated to be the user's vocabulary size.
  • The number of words corresponding to each familiarity level is not the same; in other words, the number of words varies with familiarity.
  • Even if test words are selected at approximately constant intervals from the words arranged in order of familiarity, the familiarity values of the selected test words will not be at constant intervals. In other words, more words are selected from familiarity levels where many words are concentrated and, conversely, fewer words are selected from familiarity levels where few words exist.
  • The disclosed technology aims to make the logistic regression analysis converge more easily and to estimate vocabulary size robustly.
  • One aspect of the disclosed technology is a word selection device including: a storage unit in which a word familiarity DB storing a plurality of words and a plurality of familiarity levels corresponding to the plurality of words is stored, familiarity being an index representing familiarity with a word; and a word selection unit that uses the word familiarity DB to select a plurality of test words from the plurality of words such that the familiarity intervals corresponding to the test words are constant.
  • FIG. 1 is a diagram showing an example of the functional configuration of a model generation device and a word selection device.
  • FIG. 2 is a diagram illustrating an example of the processing procedure of the model generation method and word selection method.
  • FIG. 3 is a diagram showing an example of a logistic regression model.
  • FIG. 4 is a diagram illustrating an example of the functional configuration of the acquisition probability acquisition device.
  • FIG. 5 is a diagram illustrating an example of the processing procedure of the acquisition probability acquisition method.
  • FIG. 6 is a diagram for explaining an example of generation of acquired word information.
  • FIG. 7 is a diagram showing an example of the functional configuration of the recommended learning word extraction device.
  • FIG. 8 is a diagram illustrating an example of the processing procedure of the recommended learning word extraction method.
  • FIG. 9 is a diagram showing an example of recommended learning words.
  • FIG. 10 is a diagram showing an example of a functional configuration of a computer.
  • FIG. 11 is a diagram showing an example of the correspondence between familiarity and number of words.
  • the first embodiment is a model generation device and method, and a word generation device and method.
  • The model generation device 1 of this embodiment includes a storage section 11, a word selection section 12, a presentation section 13, an answer reception section 14, a model generation section 15, and a vocabulary number estimation section 16.
  • the model generation device 1 does not need to include the word selection section 12, the presentation section 13, the answer reception section 14, the storage section 11, and the vocabulary number estimation section 16.
  • the word generation device A1 is configured by the storage section 11 and the word selection section 12. Note that the word generation device A1 may include a presentation section 13 and a response reception section 14.
  • The storage unit 11 stores a word familiarity database (DB) in advance.
  • the word familiarity DB is a database that stores sets of M words (a plurality of words) and a predetermined familiarity (word familiarity) for each word.
  • a word familiarity DB is stored that stores a plurality of words and a plurality of familiarity levels corresponding to the plurality of words.
  • the M words in the word familiarity DB are ranked in an order based on familiarity (for example, in order of familiarity).
  • M is an integer of 2 or more representing the number of words included in the word familiarity DB.
  • When measuring vocabulary size in a native language, M is preferably 70,000 or more; when measuring vocabulary size in a second language (for example, English for native Japanese speakers), M is preferably 10,000 or more. This is because the vocabulary size of Japanese adults is said to be around 40,000 to 50,000, so around 70,000 words covers most people's vocabulary, including individual differences.
  • However, vocabulary size varies greatly depending on how words are counted, such as spelling variants and the handling of derived words. Therefore, depending on how vocabulary is counted, an M of 100,000 or more may be needed for a native language.
  • the upper limit of the estimated number of vocabulary is the number of words included in the standard word familiarity DB. Therefore, when estimating the vocabulary of a person with a large vocabulary who is an outlier, it is desirable to increase the value of M.
  • Familiarity is an index that expresses the familiarity with a word.
  • Examples of indices expressing familiarity with a word are: an index expressing how familiar a word is (for example, the numerical word familiarity introduced in Non-Patent Document 1), an index expressing how often the word is seen or heard, an index expressing how well the word is known, an index expressing how well the word can be written, and an index expressing how well one can speak using the word.
  • the storage unit 11 receives read requests from the word selection unit 12 and the model generation unit 15, and outputs the word corresponding to the request and the familiarity of the word.
  • The word selection unit 12 uses the word familiarity DB stored in the storage unit 11 to select a plurality of test words w(1), ..., w(N) from the plurality of words such that the familiarity intervals corresponding to the test words are constant (step S12).
  • The word selection unit 12 evenly selects N words from all the words included in the word familiarity DB in the storage unit 11 so that the familiarity values of the selected words are at approximately constant intervals, and outputs the selected words as test words w(1), ..., w(N).
  • For example, the word selection unit 12 selects words such that the familiarity interval is 0.1: a word w(1) with a familiarity of 1, a word w(2) with a familiarity of 1.1, ..., a word w(60) with a familiarity of 6.9, and a word w(61) with a familiarity of 7, for a total of 61 words.
  • The familiarity values of the test words w(1), ..., w(N) do not necessarily have to be at regular intervals; it is sufficient that they are selected evenly. If, from past research, the familiarity around the boundary between what the user knows and does not know can be predicted, a larger number of words in the vicinity of that familiarity may be selected. That is, the familiarity values of the series of test words w(1), ..., w(N) may vary in density.
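As an illustration of the constant-interval selection described above, the following is a minimal sketch in Python (not taken from the publication; the function name and the representation of the word familiarity DB as a list of (word, familiarity) pairs are assumptions). For each target familiarity spaced at a constant step, it picks the not-yet-used word whose familiarity is closest to that target.

```python
# Minimal sketch of constant-familiarity-interval test word selection.
# Assumption: the word familiarity DB is a list of (word, familiarity) pairs,
# with familiarity ranging roughly from 1 to 7 as in the example above.

def select_test_words(familiarity_db, low=1.0, high=7.0, step=0.1):
    """Pick one test word per target familiarity value spaced by `step`."""
    n_targets = int(round((high - low) / step)) + 1      # 61 targets for 1.0..7.0 by 0.1
    targets = [low + i * step for i in range(n_targets)]
    test_words, used = [], set()
    for target in targets:
        candidates = [(abs(fam - target), word, fam)
                      for word, fam in familiarity_db if word not in used]
        if not candidates:
            break
        _, word, fam = min(candidates)                   # word whose familiarity is closest
        used.add(word)
        test_words.append((word, fam))
    return test_words
```

With the defaults above this yields 61 test words, matching the 0.1-interval example in the text.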
  • There is no limit to the order in which the test words w(1), ..., w(N) are output from the word selection unit 12; for example, the word selection unit 12 may output the test words w(1), ..., w(N) in order of familiarity.
  • The number N of test words may be specified by the question generation request, or may be predetermined. There is no limit to the value of N, but it is desirable that, for example, about 50 ≤ N ≤ 100. In order to perform sufficient estimation, it is desirable that N ≥ 25. A larger N allows more accurate estimation, but increases the burden on the user (subject) (step S12).
  • For example, tests of 50 words each may be conducted multiple times (for example, three times), the vocabulary size estimated for each test, and the vocabulary size re-estimated by combining the answers from the multiple tests. In this case, the number of words tested at one time can be reduced, which reduces the burden on the user, and if the results can be viewed for each test, the user's motivation to answer can be maintained. Furthermore, if the final vocabulary size estimation is performed by combining the answers from multiple tests, the estimation accuracy can be improved.
  • the presentation unit 13 presents the test words w(1), ..., w(N) to the user 100 (subject) according to a preset display format (step S13).
  • The presentation unit 13 presents, in accordance with a preset display format, a predetermined instruction sentence prompting the user 100 to input answers regarding his or her knowledge of the test words, and the N test words w(1), ..., w(N), in a format for a vocabulary size estimation test.
  • This information may be presented as visual information such as text or images, auditory information such as audio, or tactile information such as Braille.
  • the presentation unit 13 may electronically display the instruction sentence and the test words on the display screen of a terminal device such as a PC (personal computer), tablet, or smartphone. That is, the presentation unit 13 may generate screen information to be presented on a display or the like, and may output the screen information to the display.
  • the presentation unit 13 may be a printing device, and the instruction sentences and test words may be printed on paper or the like and output.
  • the presentation unit 13 may be a speaker of the terminal device and may output the instruction sentence and the test word aloud.
  • the presentation unit 13 may be a Braille display and present the instruction sentence and the test word in Braille.
  • For example, the test words are presented in descending order of familiarity; however, the presentation order is not limited to this, and the test words may be presented in random order.
  • <Answer reception section 14> Input: answer regarding the user's knowledge of the test words
  • The user 100, who has been presented with the instruction sentence and the test words, inputs the answers regarding his or her knowledge of the test words into the answer reception section 14 (step S14).
  • the answer reception unit 14 is a touch panel of a terminal device such as a PC, a tablet, or a smartphone, and the user 100 inputs the answer to the touch panel.
  • the answer receiving unit 14 may be a microphone of a terminal device, and in this case, the user 100 inputs the answer by voice into the microphone.
  • the user 100 may input an answer into the answer reception unit 14 by clicking with a mouse or the like.
  • The answer reception unit 14 receives an input answer regarding knowledge of a test word (for example, an answer that the test word is known or an answer that the test word is not known), and outputs the answer as electronic data.
  • the answer receiving unit 14 may output an answer for each test word, may output answers for one test at once, or may output answers for multiple tests at once.
  • When the answer reception unit 14 receives an answer that the user 100 knows the test word, it assigns a value of 1 to the answer regarding knowledge of that test word. On the other hand, when it receives an answer that the user 100 does not know the test word, it assigns a value of 0. These numerical values are output to the model generation section 15.
  • <Model generation unit 15> Input: answer regarding the user's knowledge of the test words. Output: model
  • The answers regarding the user 100's knowledge of the test words output from the answer reception unit 14 are input to the model generation unit 15.
  • The model generation unit 15 uses the answers regarding knowledge of the test words and the word familiarity DB stored in the storage unit 11 to obtain a model representing the relationship between a value based on the familiarity corresponding to a test word and a value based on the probability that the user 100 answers that he or she knows the test word (step S15).
  • the obtained model is output to the vocabulary number estimation section 16.
  • The value based on the familiarity corresponding to the test word may be the familiarity itself corresponding to the test word, or may be a value of a non-monotonically-decreasing function (for example, a monotonically increasing function) of the familiarity corresponding to the test word. To simplify the explanation, the case in which the value based on the familiarity corresponding to the test word is the familiarity itself is exemplified below.
  • Likewise, the value based on the probability that the user 100 answers that he or she knows the test word may be that probability itself, or may be a value of a non-monotonically-decreasing function (for example, a monotonically increasing function) of that probability.
  • Below, the case in which the value based on the probability that the user 100 answers that he or she knows the test word is that probability itself is exemplified.
  • An example of the model is a logistic regression model (logistic model); the coefficients of the logistic regression (its slope and intercept) are the model parameters.
  • The model generation unit 15 refers to the word familiarity DB stored in the storage unit 11 to obtain the familiarity x(n) corresponding to each test word w(n) that the user 100 answered that he or she knows; each such test word gives a point (x, y) = (x(n), 1).
  • For a test word w(n) that the user 100 answered that he or she does not know (or did not answer that he or she knows), the point (x, y) = (x(n), 0) is used. The model generation unit 15 generates the model from these points.
  • In FIG. 3, the horizontal axis represents the familiarity (x), and the vertical axis represents the probability (y) of answering that the word is known.
  • "AIC" in FIG. 3 represents the Akaike information criterion, and the smaller the value, the better the fit of the model.
  • "n” in FIG. 3 represents the number of test words.
  • The model generation section 15 may also be referred to as a model construction section 15, and the model may be said to be generated or constructed.
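As one possible concrete form of this model generation step, the following sketch fits a logistic regression of the 1/0 answers on the test word familiarities with scikit-learn. This is an assumption about the implementation; the publication only specifies that a logistic regression model relating the familiarity x to the probability y of answering "know" is obtained.

```python
# Sketch of the model generation step (step S15): fit "knows the word" (1/0)
# against word familiarity x with an ordinary logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_knowledge_model(familiarities, answers):
    """familiarities: x(n) for each test word; answers: 1 (knows) or 0 (does not know)."""
    X = np.asarray(familiarities, dtype=float).reshape(-1, 1)
    y = np.asarray(answers, dtype=int)
    return LogisticRegression().fit(X, y)

def know_probability(model, familiarity):
    """Model output y for a given familiarity x."""
    return model.predict_proba([[familiarity]])[0, 1]
```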
  • estimation methods 1 to 3 will be explained as examples of methods for estimating the number of vocabulary of the user 100 by the vocabulary number estimation unit 16.
  • The vocabulary size estimation unit 16 obtains the predetermined value acquisition familiarity, which is the familiarity at which the value based on the probability that the user 100 answers that he or she knows a word is at, or in the vicinity of, a predetermined value.
  • Examples of the predetermined value are 0.5 or 0.8.
  • the predetermined value may be any other value greater than 0 and less than 1.
  • The vocabulary number estimation unit 16 then refers to the word familiarity DB stored in the storage unit 11, obtains the number of words whose familiarity is equal to or higher than the predetermined value acquisition familiarity, and uses the obtained number as the estimated vocabulary size of the user 100.
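A sketch of the estimation just described, assuming the logistic model from the earlier sketch: the model is inverted to find the familiarity at which its output equals the predetermined value, and the DB words at or above that familiarity are counted. The analytical inversion is an assumption made here for illustration.

```python
# Sketch: find the predetermined value acquisition familiarity (model output =
# `target`, e.g. 0.5), then count the DB words whose familiarity is at or above it.
import numpy as np

def estimate_vocabulary_by_threshold(model, familiarity_db, target=0.5):
    a = model.coef_[0, 0]                       # slope of the fitted logistic model
    b = model.intercept_[0]                     # intercept
    threshold = (np.log(target / (1.0 - target)) - b) / a
    return sum(1 for _word, fam in familiarity_db if fam >= threshold)
```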
  • The vocabulary size estimation unit 16 refers to the model and the word familiarity DB stored in the storage unit 11, and obtains the output value y(m) produced when the familiarity x(m) corresponding to a word w(m) included in the word familiarity DB is input to the model. In other words, the vocabulary size estimation unit 16 calculates the value of y corresponding to the familiarity x(m) of the word w(m) in the model, and sets the calculated value as the output value y(m).
  • The vocabulary size estimation unit 16 may estimate the vocabulary size of the user 100 by also taking into account the answers regarding knowledge of the test words w(m).
  • By estimating the vocabulary size from a logistic model fitted to y, the probability that the user 100 answers that he or she knows a test word, and the familiarity x of the test word, instead of setting the vocabulary count directly as x, the model converges more easily and the vocabulary size can be estimated more robustly. Moreover, even if the distribution of the number of words corresponding to each familiarity level differs greatly, a sudden change in the estimated vocabulary size can be suppressed.
  • The vocabulary number estimation unit 16 refers to the model and the word familiarity DB stored in the storage unit 11, and obtains the output value y(i) produced when a familiarity x(i) included in the word familiarity DB is input to the model. In other words, the vocabulary number estimation unit 16 calculates the value of y corresponding to the familiarity x(i) in the model and sets the calculated value as the output value y(i). Further, the vocabulary number estimation unit 16 refers to the word familiarity DB stored in the storage unit 11 and obtains the number n(i) of words corresponding to the familiarity x(i) included in the word familiarity DB.
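The following sketch computes y(i) and n(i) as described above. The excerpt does not show how the two are combined into an estimate; summing y(i) times n(i) over all familiarity levels (the expected number of known words) is an assumption used here purely for illustration.

```python
# Sketch of the y(i), n(i) computation. Combining them as sum(y(i) * n(i)) is an
# assumption; the publication's exact combination step is not shown in this excerpt.
from collections import Counter

def estimate_vocabulary_expected(model, familiarity_db):
    counts = Counter(fam for _word, fam in familiarity_db)     # n(i) for each familiarity x(i)
    total = 0.0
    for x_i, n_i in counts.items():
        y_i = model.predict_proba([[x_i]])[0, 1]               # y(i): model output for x(i)
        total += y_i * n_i
    return total
```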
  • The word selection unit 12 may simply select the plurality of test words w(1), ..., w(N) from the plurality of words, rather than setting the familiarity intervals corresponding to the test words at regular intervals.
  • The model generation unit 15 may assume answers regarding knowledge of the non-presented words, and obtain a model representing the relationship between values based on the familiarity corresponding to the test words and the non-presented words, and values based on the probability that the user 100 answers, or is assumed to answer, that he or she knows the test words and the non-presented words.
  • Here, a non-presented word is a word, among the plurality of words, other than the plurality of test words.
  • In other words, answers for non-presented words, which were not used as test words, are assumed and used to create the model.
  • Words near the upper limit of familiarity are words that many people know, and words near the lower limit are words that many people do not know. Therefore, if the user 100 answers that he/she knows the word with the highest degree of familiarity among the test words, it is assumed that the user 100 also knows the non-presented words with a degree of familiarity higher than that degree of familiarity. Conversely, if the user answers that he or she does not know the word with the lowest familiarity among the test words, it is assumed that the user does not know the non-presented word with a familiarity lower than that familiarity.
  • By assuming answers regarding knowledge of the non-presented words, which are words that were not presented to the user 100, and estimating the model using them, the model converges more easily and a more appropriate model can be generated. For example, even if the user 100 answers that he or she knows most of the test words, or answers that he or she does not know most of the test words, the model still converges easily and a more appropriate model can be generated.
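A sketch of this modification, under the assumptions that test results are represented as (familiarity, answer) pairs and that the assumed answers are simply appended to the training data before the model is fitted; the names are illustrative, not from the publication.

```python
# Sketch of adding assumed answers for non-presented words before fitting the model.

def augment_with_assumed_answers(test_results, familiarity_db):
    """test_results: list of (familiarity, answer) pairs, answer 1 (knows) / 0 (does not know)."""
    augmented = list(test_results)
    top_fam, top_ans = max(test_results)          # test word with the highest familiarity
    low_fam, low_ans = min(test_results)          # test word with the lowest familiarity
    for _word, fam in familiarity_db:
        if top_ans == 1 and fam > top_fam:
            augmented.append((fam, 1))            # assume known: more familiar than the known top word
        elif low_ans == 0 and fam < low_fam:
            augmented.append((fam, 0))            # assume unknown: less familiar than the unknown bottom word
    return augmented
```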
  • the second embodiment is an acquisition probability acquisition device and method.
  • The acquisition probability acquisition device 2 of this embodiment includes a storage section 11, a model storage section 21, a word extraction section 22, a familiarity acquisition section 23, an acquisition probability acquisition section 24, and an acquired word information generation section 25.
  • the acquisition probability acquisition device 2 does not need to include the word extraction section 22 and the acquisition word information generation section 25.
  • the storage unit 11 is the same as the storage unit 11 of the first embodiment.
  • the storage unit 11 stores a word familiarity DB that stores a plurality of words and a plurality of familiarity levels corresponding to the plurality of words.
  • the degree of familiarity is an index representing the degree of familiarity with a word.
  • the model storage unit 21 stores a model representing the relationship between a value based on the degree of familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word.
  • a certain person is a person who obtains the acquisition probability.
  • “Someone” may be the user 100.
  • acquiring words means, in other words, knowing the words, being able to use the words, knowing the words, or being able to explain the words.
  • An example of this model is a model generated by the model generation device 1 of the first embodiment and a modification of the first embodiment.
  • the acquisition probability acquisition device 2 may further include a model generation device 1 for generating a model stored in the model storage unit 21.
  • In that case, the acquisition probability acquisition device 2 may further include: (1) a word selection unit 12 that selects a plurality of test words from the plurality of words; (2) a presentation unit 13 that presents the test words to a user; (3) an answer reception unit 14 that accepts the user's answers regarding knowledge of the test words; and (4) a model generation unit 15 that uses the answers regarding knowledge of the test words and the word familiarity DB stored in the storage unit 11 to obtain a model expressing the relationship between a value based on the familiarity corresponding to a test word and a value based on the probability that the user answers that he or she knows the test word, and uses the obtained model as the model stored in the model storage unit 21.
  • Each extracted word is output to the familiarity acquisition unit 23.
  • the text input to the word extraction unit 22 may be any text that can be read by the word extraction unit 22, which is an information processing device. Examples of texts are books such as textbooks and novels, newspapers and magazines, and texts published on web pages.
  • the word extraction unit 22 extracts each word contained in the input text, for example, by performing morphological analysis on the input text.
  • The familiarity acquisition unit 23 acquires the familiarity corresponding to each word from the word familiarity DB stored in the storage unit 11 (step S23).
  • When the acquisition probability acquisition device 2 does not include the word extraction unit 22, each word included in the text is input.
  • the familiarity acquisition unit 23 acquires the familiarity corresponding to each word included in the text from the word familiarity DB stored in the storage unit 11 (step S23).
  • Each word and the familiarity corresponding to each word are output to the acquisition probability acquisition unit 24.
  • The familiarity acquisition unit 23 and the word extraction unit 22 do not need to acquire the familiarity for proper nouns or for function words such as numerals and particles.
  • In other words, the familiarity may be acquired only for words that are content words.
  • Function words such as numerals and particles are words that many people know. Therefore, by acquiring the familiarity of these function words, in other words, by making these function words processing targets, the ratio of estimated acquired words in the text calculated by the acquired word information generation unit 25 can be increased. Conversely, by not acquiring the familiarity of these function words, in other words, by not making them processing targets, the ratio of estimated acquired words in the text calculated by the acquired word information generation unit 25 can be lowered.
  • the familiarity acquisition unit 23 may ignore words that are not included in the word familiarity DB without acquiring the familiarity. Thereby, even if the morphological analysis is incorrect, the acquisition probability acquisition process can be performed appropriately.
  • the acquisition probability acquisition unit 24 obtains an output value when the familiarity corresponding to each word is input into the model, and uses the obtained output value as the acquisition probability corresponding to each word. In other words, the acquisition probability acquisition unit 24 calculates the value of y corresponding to the familiarity x corresponding to each word in the model, and uses the calculated value as the acquisition probability corresponding to each word.
  • the acquisition probability acquisition unit 24 may acquire the acquisition probability by considering the part of speech, word length, etc. For example, the acquisition probability acquisition unit 24 may acquire the acquisition probability using part of speech, word length, etc. as explanatory variables.
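A sketch of steps S23 and S24 together, assuming the fitted logistic model from the first embodiment and the same (word, familiarity) list representation of the DB; words missing from the DB are skipped, as described above.

```python
# Sketch: look up each word's familiarity (step S23) and feed it to the model to
# get its acquisition probability (step S24); words not in the DB are ignored.

def acquisition_probabilities(words, familiarity_db, model):
    fam_of = dict(familiarity_db)                          # word -> familiarity
    probs = {}
    for w in words:
        if w not in fam_of:
            continue                                       # ignore words outside the familiarity DB
        probs[w] = model.predict_proba([[fam_of[w]]])[0, 1]
    return probs
```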
  • Each word and the acquisition probability corresponding to each word are output to the acquisition word information generation unit 25.
  • the acquired word information generation unit 25 generates acquired word information, which is information regarding acquisition of words included in the text, using the acquisition probability corresponding to each word (step S25 ).
  • An example of the acquired word information is at least one of the following: estimated acquired words in the text, the number of estimated acquired words in the text, and the ratio of estimated acquired words in the text.
  • the acquired word information generation unit 25 estimates the number of vocabulary words of a certain person.
  • the number of vocabulary can be estimated by the method described in the vocabulary number estimation unit 16 of the first embodiment.
  • the word familiarity DB from the storage unit 11 and the model from the model storage unit 21 may be input to the acquired word information generation unit 25, as shown by the dashed line in FIG.
  • the acquired word information generation unit 25 obtains the number GOISU(k) of words with a familiarity level greater than or equal to the familiarity level corresponding to each input word w(k).
  • the word familiarity DB may be input from the storage unit 11 to the acquired word information generation unit 25, as shown by the dashed line in FIG.
  • the acquired word information generation unit 25 sets words whose GOISU(k) is less than or equal to the number of vocabulary of a certain person as estimated acquired words in the text.
  • The higher the familiarity of a word, the smaller GOISU(k) is. Therefore, it can be assumed that the person knows the words whose GOISU(k) is less than or equal to that person's vocabulary size.
  • FIG. 6 shows an example of GOISU(k).
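A sketch of GOISU(k) and the estimated acquired words under the same assumed data representation; `vocabulary_size` is the person's vocabulary size estimated as in the first embodiment.

```python
# Sketch: GOISU(k) = number of DB words whose familiarity is >= that of word w(k);
# words with GOISU(k) <= the person's vocabulary size are treated as acquired.
import bisect

def estimated_acquired_words(text_words, familiarity_db, vocabulary_size):
    fam_of = dict(familiarity_db)
    fams = sorted(fam for _word, fam in familiarity_db)    # ascending familiarities

    def goisu(word):
        return len(fams) - bisect.bisect_left(fams, fam_of[word])

    return [w for w in text_words if w in fam_of and goisu(w) <= vocabulary_size]
```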
  • the acquired word information generation unit 25 estimates the number of vocabulary words of a certain person.
  • the number of vocabulary can be estimated by the method described in the vocabulary number estimation unit 16 of the first embodiment.
  • the word familiarity DB from the storage unit 11 and the model from the model storage unit 21 may be input to the acquired word information generation unit 25, as shown by the dashed line in FIG.
  • the acquired word information generation unit 25 obtains the number GOISU(k) of words with a familiarity level greater than or equal to the familiarity level corresponding to each input word w(k).
  • the word familiarity DB may be input from the storage unit 11 to the acquired word information generation unit 25, as shown by the dashed line in FIG.
  • the acquired word information generation unit 25 sets the number of words whose GOISU(k) is less than or equal to the number of vocabulary of a certain person as the estimated number of acquired words in the text.
  • the acquired word information generation unit 25 calculates a value determined by, for example, the following formula (1) or formula (2), and uses the calculated value as the ratio of estimated acquired words in the text.
  • FREQ(k) is the number of times the word w(k) appears in the text. Assuming that the text is divided into multiple parts, DIFF(k) is the number of parts in which the word w(k) appears.
  • An example of a part is a predetermined unit that constitutes a text, such as a unit, chapter, or section. The entire text may be used as a unit.
  • K is the total number of words included in the text and for which the acquisition probability has been acquired by the acquisition probability acquisition unit 24.
  • the acquired word information generation unit 25 counts FREQ(k) and DIFF(k) based on the input word.
  • the acquired word information generation unit 25 calculates a value determined by equation (1) or equation (2) using FREQ(k) and DIFF(k) found by counting.
  • FIG. 6 shows an example of FREQ(k) and DIFF(k).
  • the number of occurrences of rare words in text will be less than the number of occurrences of well-known words in text.
  • The acquired word information generation unit 25 may use (the number of estimated acquired words in the text)/K as the ratio of estimated acquired words in the text.
  • the number of estimated acquired words in a text can be determined by the method described in (Estimated number of acquired words in text).
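A sketch of the simple ratio just mentioned (number of estimated acquired words divided by K), reusing the `estimated_acquired_words` helper above. Formulas (1) and (2), which weight words by FREQ(k) or DIFF(k), are not reproduced in this excerpt and are therefore not sketched.

```python
# Sketch of the ratio "(number of estimated acquired words in the text) / K",
# where K is the number of text words whose acquisition probability was obtained
# (approximated here as the words found in the familiarity DB).

def acquired_word_ratio(text_words, familiarity_db, vocabulary_size):
    fam_of = dict(familiarity_db)
    covered = [w for w in text_words if w in fam_of]       # the K words considered
    if not covered:
        return 0.0
    acquired = estimated_acquired_words(covered, familiarity_db, vocabulary_size)
    return len(acquired) / len(covered)
```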
  • the third embodiment is a recommended learning word extraction device and method.
  • the recommended learning word extraction device 3 of this embodiment includes a storage section 11, a model storage section 31, an acquisition probability acquisition section 32, and a recommended learning word extraction section 33.
  • the storage unit 11 is the same as the storage unit 11 of the first embodiment.
  • the storage unit 11 stores a word familiarity DB that stores a plurality of words and a plurality of familiarity levels corresponding to the plurality of words.
  • the degree of familiarity is an index representing the degree of familiarity with a word.
  • the model storage unit 31 stores a model representing the relationship between a value based on the degree of familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word.
  • a certain person is a person from whom recommended learning words are extracted.
  • “Someone” may be the user 100.
  • An example of this model is a model generated by the model generation device 1 of the first embodiment and a modification of the first embodiment.
  • the recommended learning word extraction device 3 may further include a model generation device 1 for generating a model stored in the model storage unit 31.
  • In that case, the recommended learning word extraction device 3 may further include: (1) a word selection section 12 that selects a plurality of test words from the plurality of words; (2) a presentation section 13 that presents the test words to a user; (3) an answer reception section 14 that receives the user's answers regarding knowledge of the test words; and (4) a model generation section 15 that uses the answers regarding knowledge of the test words and the word familiarity DB stored in the storage unit 11 to obtain a model expressing the relationship between a value based on the familiarity corresponding to a test word and a value based on the probability that the user answers that he or she knows the test word, and uses the obtained model as the model stored in the model storage unit 31.
  • The acquisition probability acquisition unit 32 uses at least the word familiarity DB stored in the storage unit 11 and the model stored in the model storage unit 31 to obtain, for each word included in the input word set, the acquisition probability, which is the probability that the certain person has acquired that word (step S32).
  • the acquisition probability acquisition unit 32 obtains an output value when the familiarity corresponding to each word is input into the model, and uses the obtained output value as the acquisition probability corresponding to each word. In other words, the acquisition probability acquisition unit 32 calculates the value of y corresponding to the familiarity x corresponding to each word in the model, and uses the calculated value as the acquisition probability corresponding to each word.
  • the acquisition probability acquisition unit 32 may acquire the acquisition probability by considering the part of speech, the length of the word, etc. For example, the acquisition probability acquisition unit 32 may acquire the acquisition probability using part of speech, word length, etc. as explanatory variables.
  • Each word and the acquisition probability corresponding to each word are output to the recommended learning word extraction unit 33.
  • <Recommended learning word extraction unit 33> Input: word, acquisition probability
  • Output: recommended learning words
  • the recommended learning word extraction unit 33 extracts recommended learning words from the word set based on the acquired acquisition probability (step S33).
  • the recommended learning word extraction unit 33 may extract words whose acquisition probability is close to a predetermined probability as recommended learning words.
  • the predetermined probability is a number greater than 0 and less than 1.
  • An example of a predetermined probability is 0.5.
  • the recommended learning word extraction unit 33 may extract a predetermined number of words with a predetermined probability as recommended learning words.
  • the 7 words shown in FIG. 9 are extracted as recommended learning words.
  • ENTRY is the notation of the word
  • PSY is the familiarity
  • Prob is the acquisition probability
  • YN is information about the answer, if any, that the user 100 knows or does not know the word.
  • Distance50 is the magnitude of the difference between Prob and 0.5, which is the predetermined probability in this case.
  • "-" is displayed in YN when the user 100 has not answered whether he or she knows the word. If the user 100 answers that he or she knows the word, "1" is displayed in YN, and if the user 100 answers that he or she does not know the word, "0" is displayed in YN.
  • the recommended learning words are presented to the person from whom the recommended learning words are to be extracted.
  • For example, the recommended learning words may be presented to the person from whom the recommended learning words are to be extracted in the form of the table shown in FIG. 9.
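A sketch of the extraction rule illustrated in FIG. 9: rank candidate words by how close their acquisition probability Prob is to the predetermined probability (0.5 here, corresponding to the Distance50 column) and keep the closest ones. The function name and the dict input format are assumptions.

```python
# Sketch: pick the words whose acquisition probability is closest to the
# predetermined probability (0.5), i.e. the smallest |Prob - 0.5|.

def recommend_learning_words(word_probs, target=0.5, how_many=7):
    """word_probs: dict mapping word -> acquisition probability Prob."""
    ranked = sorted(word_probs.items(), key=lambda item: abs(item[1] - target))
    return ranked[:how_many]                               # (word, Prob) pairs, closest first
```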
  • the recommended learning word extraction unit 33 may extract words included within a predetermined range that includes a predetermined probability as recommended learning words.
  • the recommended learning word extraction unit 33 may extract words with a predetermined part of speech and whose obtained acquisition probability is close to a predetermined probability as recommended learning words.
  • predetermined parts of speech are verbs, nouns, and adjectives.
  • the predetermined part of speech may be two or more types of parts of speech.
  • the recommended learning word extracting unit 33 may extract, as recommended learning words, words whose acquisition probability is close to a predetermined probability from among words of two or more types of parts of speech.
  • the part of speech information may be stored in the word familiarity DB.
  • the learning recommended word extraction unit 33 can refer to the word familiarity DB, obtain the part of speech of the word, and perform the above processing.
  • the learning recommended word extraction unit 33 may refer to a dictionary in which words and their parts of speech are stored in a storage unit (not shown), obtain the part of speech of the word, and perform the above processing.
  • the word set that is input to the acquisition probability acquisition unit 32 and is made up of a plurality of words that are candidates for learning recommended words may be words that are included in a predetermined text.
  • the recommended learning word extraction device 3 may include a word extraction section 34 described below.
  • Each extracted word is output to the acquisition probability acquisition unit 32 as a word set that is a candidate for a recommended learning word.
  • The text input to the word extraction unit 34 may be any text that can be read by the word extraction unit 34, which is an information processing device. Examples of texts are books such as textbooks and novels, newspapers and magazines, and texts published on web pages.
  • the word extraction unit 34 extracts each word included in the input text, for example, by performing morphological analysis on the input text.
  • Data may be exchanged directly between the components of the model generation device 1, the acquisition probability acquisition device 2, and the recommended learning word extraction device 3, or may be exchanged via a storage unit (not shown).
  • a program that describes this processing content can be recorded on a computer-readable recording medium.
  • the computer-readable recording medium is, for example, a non-transitory recording medium, specifically a magnetic recording device, an optical disk, or the like.
  • Distribution of this program is performed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Furthermore, this program may be distributed by storing it in the storage device of a server computer and transferring it from the server computer to another computer via a network.
  • A computer that executes such a program, for example, first stores the program recorded on a portable recording medium or transferred from the server computer in the auxiliary storage unit 1050, which is its own non-transitory storage device. When executing a process, this computer loads the program stored in the auxiliary storage unit 1050 into the storage unit 1020 and executes processing according to the loaded program. As another form of execution, the computer may load the program directly from the portable recording medium into the storage unit 1020 and execute processing according to the program; furthermore, each time a program is transferred from the server computer to this computer, processing may be executed in accordance with the received program.
  • The above-described processing may be executed by a so-called ASP (Application Service Provider) type service, which realizes the processing functions only by issuing execution instructions and obtaining results, without transferring the program from the server computer to this computer.
  • the present apparatus is configured by executing a predetermined program on a computer, but at least a part of these processing contents may be implemented in hardware.
  • the word selection unit 12, the presentation unit 13, the answer reception unit 14, the model generation unit 15, the number of vocabulary estimation unit 16, the word extraction unit 22, the familiarity acquisition unit 23, the acquisition probability acquisition unit 24, the acquired word information generation unit 25 , the acquisition probability acquisition section 32, the recommended learning word extraction section 33, and the word extraction section 34 may be constituted by a processing circuit.
  • the storage unit 11, model storage unit 21, and model storage unit 31 may be configured by memory.
  • A word selection device including a memory and a processor, wherein: the memory stores a word familiarity DB that stores a plurality of words and a plurality of familiarity levels corresponding to the plurality of words, familiarity being an index representing familiarity with a word; and the processor selects, using the word familiarity DB stored in the memory, a plurality of test words from the plurality of words such that the familiarity intervals corresponding to the test words are constant.
  • A non-transitory storage medium storing a program executable by a computer to perform a word selection process, wherein the word selection process selects, using a word familiarity DB that stores a plurality of words and a plurality of familiarity levels corresponding to the plurality of words (familiarity being an index representing familiarity with a word), a plurality of test words from the plurality of words such that the familiarity intervals corresponding to the test words are constant.
  • A model generation device including a memory and a processor, wherein: the memory stores a word familiarity DB that stores a plurality of words and a plurality of familiarity levels corresponding to the plurality of words, familiarity being an index representing familiarity with a word; and the processor receives a plurality of test words and answers regarding knowledge of the test words from a user to whom the plurality of test words were presented, and uses the answers regarding knowledge of the test words and the word familiarity DB stored in the memory to obtain a model representing the relationship between a value based on the familiarity corresponding to a test word and a value based on the probability that the user answers that he or she knows the test word.
  • A non-transitory storage medium storing a program executable by a computer to perform a model generation process, wherein the model generation process receives a plurality of test words and answers regarding knowledge of the test words from a user to whom the plurality of test words were presented, and uses the answers regarding knowledge of the test words and a word familiarity DB to obtain a model representing the relationship between a value based on the familiarity corresponding to a test word and a value based on the probability that the user answers that he or she knows the test word, familiarity being an index representing familiarity with a word, and the word familiarity DB storing a plurality of words and a plurality of familiarity levels corresponding to the plurality of words.
  • An acquisition probability acquisition device including a memory and a processor, wherein: the memory stores a word familiarity DB that stores a plurality of words and a plurality of familiarity levels corresponding to the plurality of words (familiarity being an index representing familiarity with a word), and a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word; and the processor acquires, from the word familiarity DB stored in the memory, the familiarity corresponding to each word included in an input text, and acquires an acquisition probability, which is the probability that the certain person has acquired each word, using at least the acquired familiarity corresponding to each word and the model stored in the memory.
  • A non-transitory storage medium storing a program executable by a computer to execute an acquisition probability acquisition process, wherein the acquisition probability acquisition process acquires, from a word familiarity DB that stores a plurality of words and a plurality of familiarity levels corresponding to the plurality of words (familiarity being an index representing familiarity with a word), the familiarity corresponding to each word included in an input text, and acquires an acquisition probability, which is the probability that a certain person has acquired each word, using at least a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that the certain person has acquired each word, and the acquired familiarity corresponding to each word.
  • A recommended learning word extraction device including a memory and a processor, wherein: the memory stores a word familiarity DB that stores a plurality of words and a plurality of familiarity levels corresponding to the plurality of words (familiarity being an index representing familiarity with a word), and a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word; and the processor acquires an acquisition probability, which is the probability that the certain person has acquired each word included in an input word set, using at least the word familiarity DB stored in the memory and the model stored in the memory, and extracts recommended learning words from the word set based on the acquired acquisition probabilities.
  • A non-transitory storage medium storing a program executable by a computer to execute a recommended learning word extraction process, wherein the recommended learning word extraction process acquires an acquisition probability, which is the probability that a certain person has acquired each word included in an input word set, using at least a word familiarity DB that stores a plurality of words and a plurality of familiarity levels corresponding to the plurality of words (familiarity being an index representing familiarity with a word) and a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that the certain person has acquired each word, and extracts recommended learning words from the word set based on the acquired acquisition probabilities.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A word selection device A1 comprises: a storage unit 11 in which a word familiarity DB storing a plurality of words and a plurality of familiarities corresponding to the plurality of words is stored, the familiarity being an indicator of intimacy with a word; and a word selection unit 12 that uses the word familiarity DB stored in the storage unit 11 to select a plurality of test words from among the plurality of words such that familiarity intervals corresponding to the test words are constant intervals.

Description

Word selection device, method, and program
The disclosed technology relates to a technology for selecting words.
The total number of words that a person knows is called that person's vocabulary size. The vocabulary size estimation test is a test that estimates this vocabulary size accurately in a short time (see, for example, Non-Patent Document 1). An outline of the estimation procedure is shown below.
(1) Arrange the word list in the word familiarity DB (database) in order of familiarity and select test words at approximately regular intervals (for example, select one word for every 1000 words). Familiarity (word familiarity) is a numerical measure of how familiar a word is; the higher a word's familiarity, the more familiar the word.
(2) Present the test words to the user and have the user answer whether or not he or she knows each word.
(3) Perform a logistic regression analysis that best explains these combinations of test words and answers. In this logistic regression analysis, the independent variable x is the total number of words in the word familiarity DB whose familiarity is equal to or higher than that of each test word, and the dependent variable y is the probability (for example, 0 or 1) that the user answers that he or she knows each word. The logistic regression analysis yields a logistic model (logistic regression equation). An example of a logistic model is shown in FIG. 12.
(4) In the obtained logistic model, find the value of x corresponding to y = 0.5 and use it as the estimated vocabulary size. The estimated vocabulary size means the value estimated to be the user's vocabulary size.
With this method, by using the word familiarity DB, the user's vocabulary size can be estimated accurately simply by testing whether or not the user knows the selected test words.
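For contrast with the disclosed technique, the following is a sketch of the background procedure (1) to (4) above, with the fitting done in scikit-learn as an assumption: x is each test word's familiarity rank (the number of DB words at least as familiar), y is the 1/0 answer, and the estimated vocabulary size is the x at which the fitted curve crosses y = 0.5.

```python
# Sketch of the background estimation: logistic regression of the 1/0 answers on
# the familiarity rank x, then the vocabulary estimate is x at y = 0.5 (= -b/a).
# As noted below, this estimate can exceed the DB size when y = 0 answers are scarce.
import numpy as np
from sklearn.linear_model import LogisticRegression

def background_vocabulary_estimate(test_ranks, answers):
    X = np.asarray(test_ranks, dtype=float).reshape(-1, 1)
    y = np.asarray(answers, dtype=int)
    model = LogisticRegression().fit(X, y)
    a, b = model.coef_[0, 0], model.intercept_[0]
    return -b / a
```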
Note that, as shown in FIG. 11, the number of words corresponding to each familiarity level is not the same. In other words, the number of words varies with familiarity.
For this reason, even if test words are selected at approximately constant intervals from the words arranged in order of familiarity, the familiarity values of the selected test words will not be at constant intervals. In other words, more words are selected from familiarity levels where many words are concentrated and, conversely, fewer words are selected from familiarity levels where few words exist.
As a result, for example, few test words are selected from words with low familiarity, so the user may know most of the presented test words; words with y = 0 are then not sufficiently obtained, the logistic regression analysis does not converge easily, and the estimated vocabulary size can exceed the maximum number of words in the word familiarity DB. For example, in the example of FIG. 12, the value of x corresponding to y = 0.5 falls outside the range of the figure.
The disclosed technology aims to make the logistic regression analysis converge more easily and to estimate vocabulary size robustly.
One aspect of the disclosed technology is a word selection device including: a storage unit in which a word familiarity DB storing a plurality of words and a plurality of familiarity levels corresponding to the plurality of words is stored, familiarity being an index representing familiarity with a word; and a word selection unit that uses the word familiarity DB stored in the storage unit to select a plurality of test words from the plurality of words such that the familiarity intervals corresponding to the test words are constant.
According to the disclosed technology, the logistic regression analysis converges more easily and the vocabulary size can be estimated robustly.
FIG. 1 is a diagram showing an example of the functional configuration of a model generation device and a word selection device. FIG. 2 is a diagram illustrating an example of the processing procedure of the model generation method and the word selection method. FIG. 3 is a diagram showing an example of a logistic regression model. FIG. 4 is a diagram illustrating an example of the functional configuration of the acquisition probability acquisition device. FIG. 5 is a diagram illustrating an example of the processing procedure of the acquisition probability acquisition method. FIG. 6 is a diagram for explaining an example of generation of acquired word information. FIG. 7 is a diagram showing an example of the functional configuration of the recommended learning word extraction device. FIG. 8 is a diagram illustrating an example of the processing procedure of the recommended learning word extraction method. FIG. 9 is a diagram showing an example of recommended learning words. FIG. 10 is a diagram showing an example of the functional configuration of a computer. FIG. 11 is a diagram showing an example of the correspondence between familiarity and number of words. FIG. 12 is a diagram for explaining the background technology.
Hereinafter, embodiments of the disclosed technology will be described with reference to the drawings.
[First embodiment]
First, the first embodiment will be described. The first embodiment is a model generation device and method, and a word generation device and method.
As illustrated in FIG. 1, the model generation device 1 of this embodiment includes a storage section 11, a word selection section 12, a presentation section 13, an answer reception section 14, a model generation section 15, and a vocabulary number estimation section 16. The model generation device 1 need not include the word selection section 12, the presentation section 13, the answer reception section 14, the storage section 11, and the vocabulary number estimation section 16.
Note that, as shown by the broken line in FIG. 1, the word generation device A1 is configured by the storage section 11 and the word selection section 12. The word generation device A1 may further include the presentation section 13 and the answer reception section 14.
 <Storage unit 11>
 A word familiarity database (DB) is stored in advance in the storage unit 11. The word familiarity DB is a database that stores pairs of M words (a plurality of words) and a familiarity (word familiarity) predetermined for each of those words. In other words, the storage unit 11 stores a word familiarity DB that stores a plurality of words and a plurality of familiarity levels respectively corresponding to the plurality of words.

 The M words in the word familiarity DB are ranked in an order based on familiarity (for example, in descending order of familiarity). M is an integer of 2 or more representing the number of words included in the word familiarity DB. There is no limit on the value of M, but, for example, M of 70,000 or more is desirable when measuring the vocabulary size of a native language, and M of 10,000 or more is desirable when measuring the vocabulary size of a second language (for example, English for native Japanese speakers). Since the vocabulary size of a Japanese adult is said to be about 40,000 to 50,000 words, about 70,000 words can cover the vocabulary of most people, including individual differences. In a second language, on the other hand, the vocabulary is often not as large as in the native language, and it is considered that the vocabulary of most people can be covered with fewer words than the M used for the native language. However, the vocabulary size varies greatly depending on how words are counted, for example how spelling variants and derived words are handled. Therefore, depending on how the vocabulary is counted, M of 100,000 or more may be necessary even for a native language. Furthermore, the estimated vocabulary size is bounded above by the number of words included in the reference word familiarity DB. Therefore, when estimating the vocabulary of a person whose vocabulary is so large as to be an outlier, it is desirable to make the value of M larger.

 Familiarity (word familiarity) is an index representing the degree of familiarity with a word. Examples of such an index are an index representing how familiar a word feels (for example, the numerical measure of word familiarity introduced in Non-Patent Document 1), an index representing how often the word is seen or heard, an index representing how well the word is known, an index representing how well the word can be written, and an index representing how well one can speak using the word.

 For example, a word with a higher familiarity is a more familiar word. In this embodiment, a larger familiarity value represents a higher familiarity. However, this does not limit the invention.

 The storage unit 11 receives read requests from the word selection unit 12 and the model generation unit 15 and outputs the word corresponding to each request and the familiarity of that word.
 <Word selection unit 12>
 Input: a question generation request from a user or the system
 Output: N test words used for the vocabulary size estimation test
 Upon receiving a question generation request from a user or the system, the word selection unit 12 selects and outputs a plurality of test words w(1), ..., w(N) to be used for the vocabulary size estimation test from the plurality of ordered words included in the word familiarity DB of the storage unit 11.

 For example, the word selection unit 12 uses the word familiarity DB stored in the storage unit 11 to select the plurality of test words w(1), ..., w(N) from the plurality of words such that the intervals between the familiarity levels corresponding to the test words are constant (step S12).

 For example, the word selection unit 12 selects, from all the words included in the word familiarity DB of the storage unit 11, N words spread evenly so that the familiarity levels of the selected words are at approximately constant intervals, and outputs the selected N words as the test words w(1), ..., w(N).

 For example, the word selection unit 12 selects words so that the familiarity interval is 0.1. For example, the word selection unit 12 may select a total of 61 words: a word w(1) with a familiarity of 1, a word w(2) with a familiarity of 1.1, ..., a word w(60) with a familiarity of 6.9, and a word w(61) with a familiarity of 7.

 The familiarity levels of the test words w(1), ..., w(N) do not necessarily have to be at exactly constant intervals; it is sufficient that the words are selected evenly. For example, when past surveys predict the familiarity range around the boundary between words the user knows and words the user does not know, more words may be selected around the familiarity levels to be examined intensively. That is, the familiarity values of the series of test words w(1), ..., w(N) may be unevenly spaced.

 There is no limitation on the order of the test words w(1), ..., w(N) output from the word selection unit 12, but the word selection unit 12 outputs the test words w(1), ..., w(N) in descending order of familiarity, for example.

 The number N of test words may be specified by the question generation request or may be predetermined. There is no limit on the value of N, but, for example, about 50 ≤ N ≤ 100 is desirable. For sufficient estimation, N ≥ 25 is desirable. A larger N allows more accurate estimation but increases the burden on the user (test subject) (step S12).

 To reduce the burden on the user and increase accuracy, a test of, for example, 50 words each may be conducted multiple times (for example, three times), the vocabulary size may be estimated for each test, and the estimation may be redone using the answers from the multiple tests combined. In this case, the number of test words per test can be reduced, which lightens the burden on the user, and if the result can be viewed after each test, the user's motivation to answer can be maintained. Furthermore, performing the final vocabulary size estimation using the words from the multiple tests combined improves the estimation accuracy.

 By selecting the plurality of test words so that the intervals between the familiarity levels corresponding to the test words are constant, variation in familiarity can be suppressed, which makes it easier for the logistic curve to converge.
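 The selection in step S12 can be illustrated by the following minimal sketch. It assumes that the word familiarity DB is given as a list of (word, familiarity) pairs and that familiarity ranges from 1 to 7 as in the example above; the function name select_test_words and the data layout are assumptions of this sketch and are not part of the embodiment.

```python
def select_test_words(word_familiarity_db, n_words=61, fam_min=1.0, fam_max=7.0):
    """Select test words whose familiarity values are spaced at (approximately)
    constant intervals over [fam_min, fam_max].

    word_familiarity_db: list of (word, familiarity) pairs.
    Returns a list of (word, familiarity) pairs, one per target familiarity.
    """
    # Target familiarity values at constant intervals, e.g. 1.0, 1.1, ..., 7.0.
    step = (fam_max - fam_min) / (n_words - 1)
    targets = [fam_min + i * step for i in range(n_words)]

    selected = []
    used = set()
    for t in targets:
        # Pick the not-yet-used word whose familiarity is closest to the target.
        candidates = [(abs(fam - t), word, fam)
                      for word, fam in word_familiarity_db if word not in used]
        _, word, fam = min(candidates)
        used.add(word)
        selected.append((word, fam))
    return selected
```

 With the defaults above, the selected words approximate the familiarity sequence 1, 1.1, ..., 6.9, 7 described for the word selection unit 12.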
 <Presentation unit 13>
 Input: N test words
 Output: an instruction sentence and the N test words
 The N test words w(1), ..., w(N) output from the word selection unit 12 are input to the presentation unit 13. The presentation unit 13 presents the test words w(1), ..., w(N) to the user 100 (test subject) in accordance with a preset display format (step S13).

 For example, in accordance with the preset display format, the presentation unit 13 presents to the user 100, in a format for the vocabulary size estimation test, a predetermined instruction sentence prompting the user 100 to input answers regarding his or her knowledge of the test words, and the N test words w(1), ..., w(N).

 There is no limitation on this presentation format; the information may be presented as visual information such as text or images, as auditory information such as speech, or as tactile information such as Braille.

 For example, the presentation unit 13 may electronically display the instruction sentence and the test words on the display screen of a terminal device such as a PC (personal computer), a tablet, or a smartphone. That is, the presentation unit 13 may generate screen information to be presented on a display or the like and output it to the display.

 Alternatively, the presentation unit 13 may be a printing device that prints the instruction sentence and the test words on paper or the like and outputs them. The presentation unit 13 may be a speaker of a terminal device that outputs the instruction sentence and the test words as speech. The presentation unit 13 may be a Braille display that presents the instruction sentence and the test words in Braille.

 The answer of the user 100 regarding knowledge of a test word may indicate either "known" or "not known" (an answer that the user knows, or does not know, the test word of each rank), or may indicate one of three or more options including "known" and "not known". Examples of options other than "known" and "not known" are "not sure (whether I know it)" and "I know the word form but not its meaning". However, even if the user 100 is asked to answer from three or more options including "known" and "not known", the vocabulary size estimation accuracy may not improve compared with having the user answer only "known" or "not known". For example, when the user 100 is asked to choose from the three options "known", "not known", and "not sure", whether "not sure" is chosen depends on the personality of the user 100. In such a case, increasing the number of options does not improve the vocabulary size estimation accuracy. Therefore, it is usually preferable to have the user 100 answer with two options such as "known" or "not known" for each test word.

 However, instead of "known" or "not known", the answer may be sought from other viewpoints, such as whether the user "can make an example sentence (using the test word)" or "cannot make an example sentence", or whether the user "can explain the meaning (of the test word)" or "cannot explain the meaning". The vocabulary size that is estimated changes depending on the chosen viewpoint. For example, with "can make an example sentence", the vocabulary size that the user believes he or she can actively use is estimated.

 In the following, an example in which the user 100 answers either "known" or "not known" for each test word will be described.

 Also, for example, the test words are presented in descending order of familiarity, but the presentation order is not limited to this, and the test words may be presented in random order.
 <Answer reception unit 14>
 Input: the user's answers regarding knowledge of the test words
 Output: the user's answers regarding knowledge of the test words
 The user 100, to whom the instruction sentence and the test words have been presented, inputs his or her answers regarding knowledge of the test words into the answer reception unit 14 (step S14).

 For example, the answer reception unit 14 is a touch panel of a terminal device such as a PC, a tablet, or a smartphone, and the user 100 inputs the answers on the touch panel. The answer reception unit 14 may be a microphone of a terminal device, in which case the user 100 inputs the answers by voice into the microphone.

 The user 100 may also input the answers into the answer reception unit 14 by clicking with a mouse or the like.

 The answer reception unit 14 receives the input answers regarding knowledge of the test words (for example, an answer that a test word is known or an answer that a test word is not known) and outputs the answers as electronic data. The answer reception unit 14 may output an answer for each test word, may output the answers for one test together, or may output the answers for a plurality of tests together.

 For example, when the answer reception unit 14 receives an answer that the user 100 knows a test word, it assigns the numerical value 1 to the answer regarding knowledge of that test word. On the other hand, when the answer reception unit 14 receives an answer that the user 100 does not know a test word, it assigns the numerical value 0 to the answer regarding knowledge of that test word. These numerical values are output to the model generation unit 15.
 <Model generation unit 15>
 Input: the user's answers regarding knowledge of the test words
 Output: a model
 The answers regarding the user 100's knowledge of the test words output from the answer reception unit 14 are input to the model generation unit 15.

 Using the answers regarding knowledge of the test words and the word familiarity DB stored in the storage unit 11, the model generation unit 15 obtains a model representing the relationship between a value based on the familiarity corresponding to a test word and a value based on the probability that the user 100 answers that he or she knows the test word (step S15). The obtained model is output to the vocabulary size estimation unit 16.

 The value based on the familiarity corresponding to a test word may be the familiarity itself, or may be the value of a function of the familiarity that is not monotonically decreasing (for example, a monotonically increasing function). To simplify the explanation, the case where the value based on the familiarity corresponding to a test word is the familiarity itself is illustrated below.

 The value based on the probability that the user 100 answers that he or she knows a test word may be that probability itself, or may be the value of a function of that probability that is not monotonically decreasing (for example, a monotonically increasing function). To simplify the explanation, the case where the value based on the probability that the user 100 answers that he or she knows a test word is that probability itself is illustrated below.

 The model is not limited, but one example of the model is a logistic regression model (logistic model). To simplify the explanation, the following illustrates the case where the model is a logistic curve y = f(x, Ψ) in which the familiarity corresponding to a test word is the independent variable x and the probability that the user 100 answers that he or she knows each word is the dependent variable y. Ψ is a model parameter.

 The model generation unit 15 refers to the word familiarity DB stored in the storage unit 11 to obtain the familiarity corresponding to a test word w(n) that the user 100 answered that he or she knows, and sets the obtained familiarity as x(n). This familiarity x(n) is the familiarity corresponding to the test word w(n).

 For a test word w(n) that the user 100 answered that he or she knows, the model generation unit 15 sets a point (x, y) = (x(n), 1), where the probability y that the user 100 answers that he or she knows the test word w(n) is 1 (that is, 100%) and the familiarity corresponding to the test word w(n) is x(n).

 For a test word w(n) that the user 100 answered that he or she does not know (or did not answer that he or she knows), the model generation unit 15 sets a point (x, y) = (x(n), 0), where the probability y that the user 100 answers that he or she knows the test word w(n) is 0 (that is, 0%) and the familiarity corresponding to the test word w(n) is x(n).

 The model generation unit 15 fits a logistic curve to the points (x, y) = (x(n), 1) or (x(n), 0) for n = 1, ..., N and obtains, as the model, the logistic curve y = f(x, Ψ) that minimizes the error. That is, the model generation unit 15 obtains, as the model, the logistic curve y = f(x, Ψ) that minimizes the error for the points (x, y) = (x(n), 1) or (x(n), 0) for n = 1, ..., N.

 FIG. 3 illustrates an example of the logistic curve y = f(x, Ψ) as the model. In FIG. 3, the horizontal axis represents the familiarity, and the vertical axis represents the probability (y) of answering that the word is known. The circles represent the points (x, y) = (x(n), 1) for the test words w(n) that the user 100 answered that he or she knows and the points (x, y) = (x(n), 0) for the test words w(n) that the user 100 answered that he or she does not know (or did not answer that he or she knows). "AIC" in FIG. 3 denotes the Akaike information criterion; the smaller the value, the better the fit of the model. "n" in FIG. 3 denotes the number of test words.

 Here, "generation" can also be rephrased as creation or construction. Therefore, the model generation unit 15 may be called a model creation unit 15 or a model construction unit 15. Likewise, the model may be said to be created or constructed.
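 As a concrete illustration of the fitting in step S15, a minimal sketch is shown below. It assumes that scikit-learn is available, treats the familiarity x(n) as the single explanatory variable and the 0/1 answer as the response, and uses scikit-learn's regularized maximum-likelihood fit as a stand-in for the error-minimizing fit described above. The names fit_logistic_model and know_probability are assumptions of this sketch and do not appear in the embodiment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_logistic_model(familiarities, answers):
    """Fit a logistic curve y = f(x, Psi) to (familiarity, 0/1 answer) pairs.

    familiarities: list of x(n), the familiarity of each test word.
    answers: list of 1 ("known") or 0 ("not known") for each test word.
    """
    X = np.array(familiarities, dtype=float).reshape(-1, 1)
    y = np.array(answers, dtype=int)
    model = LogisticRegression()
    model.fit(X, y)
    return model

def know_probability(model, familiarity):
    """Return y = f(x, Psi): the probability of answering "known" at this familiarity."""
    return float(model.predict_proba([[familiarity]])[0, 1])
```

 Note that such a fit requires both "known" and "not known" answers among the data points; this is one reason the modification described later assumes answers for non-presented words so that the model converges more easily.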
 <Vocabulary size estimation unit 16>
 Input: the model
 Output: the vocabulary size of the user 100
 The vocabulary size estimation unit 16 estimates the vocabulary size of the user 100 based on the model (step S16).

 Hereinafter, estimation methods 1 to 3 will be described as examples of how the vocabulary size estimation unit 16 estimates the vocabulary size of the user 100.

 (Estimation method 1)
 The vocabulary size estimation unit 16 obtains the predetermined-value familiarity, which is the familiarity at which, in the model, the value based on the probability that the user 100 answers that he or she knows a word is equal to or near a predetermined value. Examples of the predetermined value are 0.5 and 0.8. Of course, the predetermined value may be any other value greater than 0 and less than 1.

 The vocabulary size estimation unit 16 then refers to the word familiarity DB stored in the storage unit 11, obtains the number of words whose familiarity is equal to or higher than the predetermined-value familiarity, and sets the obtained number as the vocabulary size of the user 100.
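 A minimal sketch of estimation method 1 follows, reusing the hypothetical know_probability helper shown earlier. The bisection search for the threshold familiarity is an assumption of this sketch; the embodiment only requires finding the familiarity at which the curve takes the predetermined value.

```python
def estimate_vocabulary_method1(model, word_familiarity_db, target=0.5,
                                fam_min=1.0, fam_max=7.0):
    """Estimation method 1: count the words whose familiarity is at or above
    the familiarity at which the fitted curve equals the predetermined value.

    word_familiarity_db: list of (word, familiarity) pairs.
    """
    lo, hi = fam_min, fam_max
    # Assuming the fitted curve increases with familiarity, bisection finds
    # the familiarity where y equals the target value.
    for _ in range(50):
        mid = (lo + hi) / 2
        if know_probability(model, mid) < target:
            lo = mid
        else:
            hi = mid
    threshold_familiarity = (lo + hi) / 2
    # Number of DB words whose familiarity is at or above that threshold.
    return sum(1 for _, fam in word_familiarity_db if fam >= threshold_familiarity)
```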
 (Estimation method 2)
 The vocabulary size estimation unit 16 refers to the model and the word familiarity DB stored in the storage unit 11 and obtains the output value y(m) obtained when the familiarity x(m) corresponding to a word w(m) included in the word familiarity DB is input to the model. In other words, the vocabulary size estimation unit 16 calculates the value of y in the model that corresponds to the familiarity x(m) corresponding to the word w(m) and sets the calculated value as the output value y(m). The vocabulary size estimation unit 16 performs this process for each word w(m) (m = 1, ..., M) included in the word familiarity DB, thereby obtaining the output values y(m) (m = 1, ..., M).

 The vocabulary size estimation unit 16 then calculates Σ_{m=1}^{M} y(m) and sets this calculated value as the vocabulary size of the user 100.

 At that time, if a word w(m) is a test word and an answer regarding knowledge of the test word w(m) has been obtained, the vocabulary size estimation unit 16 may estimate the vocabulary size of the user 100 taking that answer into account.

 For example, the vocabulary size estimation unit 16 sets y(m) = 1 when the answer regarding knowledge of the test word w(m) is that the word is known, and sets y(m) = 0 when the answer is that the word is not known. For words other than the test words, the output values y(m) obtained from the model as described above are used.

 The vocabulary size estimation unit 16 then calculates Σ_{m=1}^{M} y(m) using these y(m) and sets this calculated value as the vocabulary size of the user 100.

 By taking the answers regarding knowledge of the test words into account, a more appropriate vocabulary size can be estimated.

 By estimating the vocabulary size based on the logistic model estimated from y, the probability that the user 100 answers that he or she knows a test word, and x, the familiarity of the test word, the model converges more easily and the vocabulary size can be estimated more robustly than when the vocabulary size itself is used directly as x. Moreover, even when the distribution of the number of words corresponding to each familiarity differs greatly, abrupt changes in the estimated vocabulary size can be suppressed.

 (Estimation method 3)
 The vocabulary size estimation unit 16 refers to the model and the word familiarity DB stored in the storage unit 11 and obtains the output value y(i) obtained when a familiarity x(i) included in the word familiarity DB is input to the model. In other words, the vocabulary size estimation unit 16 calculates the value of y in the model that corresponds to the familiarity x(i) and sets the calculated value as the output value y(i). The vocabulary size estimation unit 16 also refers to the word familiarity DB stored in the storage unit 11 and obtains the number n(i) of words included in the word familiarity DB that correspond to the familiarity x(i). The vocabulary size estimation unit 16 performs these processes for each familiarity x(i) (i = 1, ..., I) included in the word familiarity DB, thereby obtaining the output values y(i) (i = 1, ..., I) and the numbers of words n(i) (i = 1, ..., I). I is the number of distinct familiarity values.

 The vocabulary size estimation unit 16 then calculates Σ_{i=1}^{I} y(i) × n(i) and sets this calculated value as the vocabulary size of the user 100.

 Words with the same familiarity have the same corresponding value of y, and there can be multiple words with the same familiarity. Therefore, by calculating per familiarity value as in estimation method 3 rather than per word as in estimation method 2, the vocabulary size estimation can be computed faster.
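 The two summations above can be sketched as follows, again reusing the hypothetical know_probability helper; the function names and the (word, familiarity) list layout are assumptions of this sketch.

```python
from collections import Counter

def estimate_vocabulary_method2(model, word_familiarity_db, answers_by_word=None):
    """Estimation method 2: sum y(m) over all M words in the DB.

    answers_by_word optionally maps a test word to its 0/1 answer, which
    overrides the model output for that word, as described above.
    """
    answers_by_word = answers_by_word or {}
    total = 0.0
    for word, fam in word_familiarity_db:
        if word in answers_by_word:
            total += answers_by_word[word]          # use the actual answer (1 or 0)
        else:
            total += know_probability(model, fam)   # use the model output y(m)
    return total

def estimate_vocabulary_method3(model, word_familiarity_db):
    """Estimation method 3: sum y(i) * n(i) over the distinct familiarity values."""
    counts = Counter(fam for _, fam in word_familiarity_db)   # n(i) per familiarity x(i)
    return sum(know_probability(model, fam) * n for fam, n in counts.items())
```

 Grouping by familiarity in estimate_vocabulary_method3 evaluates the model once per distinct familiarity value, which is the speed advantage noted above.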
 <Modification of the first embodiment>
 The word selection unit 12 may simply select the plurality of test words w(1), ..., w(N) from the plurality of words, rather than selecting them so that the intervals between the familiarity levels corresponding to the test words are constant.

 Further, the model generation unit 15 may assume answers regarding knowledge of non-presented words and obtain a model representing the relationship between a value based on the familiarity corresponding to the test words and the non-presented words and a value based on the probability, or the assumption, that the user 100 knows the test words and the non-presented words.

 Here, a non-presented word is a word among the plurality of words other than the plurality of test words. So that the logistic model converges more easily, answers for non-presented words that were not used as test words are assumed and used for creating the model. Words near the upper limit of familiarity are words that many people know, and words near the lower limit are words that many people do not know. Therefore, when the user 100 answers that he or she knows the test word with the highest familiarity, it is assumed that the user also knows the non-presented words whose familiarity is equal to or higher than that familiarity. Conversely, when the user answers that he or she does not know the test word with the lowest familiarity, it is assumed that the user also does not know the non-presented words whose familiarity is equal to or lower than that familiarity.

 In other words, the answers regarding the user 100's knowledge of the non-presented words, assumed as if the non-presented words had been presented to the user 100, are "known" for words whose familiarity is higher than the maximum familiarity of the test words and "not known" for words whose familiarity is lower than the minimum familiarity of the test words.

 For example, when the user answers that he or she knows a test word with a familiarity of 6.5, the non-presented words with familiarities of 6.7 and 6.9 are assumed to be answered as known. When the user answers that he or she does not know a test word with a familiarity of 2, the non-presented words with familiarities of 1.8 and 1.6 are assumed to be answered as not known.

 In this way, by adding the non-presented words, which are words that were not presented to the user 100, and the assumed answers regarding knowledge of the non-presented words, and then estimating the model, the model converges more easily and a more appropriate model can be generated. As a result, the model converges more easily and a more appropriate model can be generated even when, for example, the user 100 answers that he or she knows almost all of the test words or answers that he or she knows almost none of them.
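 A minimal sketch of this augmentation, under the assumptions just described, is shown below; the function name augment_with_non_presented_words and the data layout are assumptions of this sketch.

```python
def augment_with_non_presented_words(test_words, answers_by_word, word_familiarity_db):
    """Add assumed 0/1 answers for non-presented words.

    test_words: list of (word, familiarity) pairs actually presented.
    answers_by_word: dict mapping each presented word to 1 ("known") or 0 ("not known").
    word_familiarity_db: list of (word, familiarity) pairs for the whole DB.
    Returns parallel lists of familiarities and 0/1 answers for model fitting.
    """
    max_fam = max(fam for _, fam in test_words)
    min_fam = min(fam for _, fam in test_words)
    presented = {word for word, _ in test_words}

    xs = [fam for _, fam in test_words]
    ys = [answers_by_word[word] for word, _ in test_words]

    for word, fam in word_familiarity_db:
        if word in presented:
            continue
        if fam > max_fam:
            xs.append(fam); ys.append(1)   # assumed "known"
        elif fam < min_fam:
            xs.append(fam); ys.append(0)   # assumed "not known"
    return xs, ys
```

 The returned lists can then be passed to the fitting sketch shown earlier, for example as fit_logistic_model(xs, ys).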
 [Second embodiment]
 A second embodiment will be described. The second embodiment relates to an acquisition probability acquisition device and method.

 In the following, the description focuses on the differences from the first embodiment and its modification. Description of matters that have already been explained may be omitted.

 As illustrated in FIG. 4, the acquisition probability acquisition device 2 of this embodiment includes a storage unit 11, a model storage unit 21, a word extraction unit 22, a familiarity acquisition unit 23, an acquisition probability acquisition unit 24, and an acquired word information generation unit 25. The acquisition probability acquisition device 2 need not include the word extraction unit 22 and the acquired word information generation unit 25.

 <Storage unit 11>
 The storage unit 11 is the same as the storage unit 11 of the first embodiment.

 The storage unit 11 stores a word familiarity DB that stores a plurality of words and a plurality of familiarity levels respectively corresponding to the plurality of words. Here, the familiarity is an index representing the degree of familiarity with a word.

 <Model storage unit 21>
 The model storage unit 21 stores a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word. Here, the "certain person" is the person for whom the acquisition probabilities are obtained. The "certain person" may be the user 100.

 Here, acquiring a word means, in other words, knowing the word, being able to use the word, understanding the word, or being able to explain the word.

 An example of this model is a model generated by the model generation device 1 of the first embodiment or its modification.

 As indicated by the broken line in FIG. 4, the acquisition probability acquisition device 2 may further include the model generation device 1 for generating the model stored in the model storage unit 21.

 That is, the acquisition probability acquisition device 2 may further include (1) the word selection unit 12 that selects a plurality of test words from the plurality of words, (2) the presentation unit 13 that presents the test words to a user, (3) the answer reception unit 14 that receives the user's answers regarding knowledge of the test words, and (4) the model generation unit 15 that uses the answers regarding knowledge of the test words and the word familiarity DB stored in the storage unit 11 to obtain a model representing the relationship between a value based on the familiarity corresponding to a test word and a value based on the probability that the user answers that he or she knows the test word, and that sets the obtained model as the model stored in the model storage unit.
 <Word extraction unit 22>
 Input: a text
 Output: words
 The word extraction unit 22 extracts each word included in the input text (step S22).

 Each extracted word is output to the familiarity acquisition unit 23.

 The text input to the word extraction unit 22 may be any text that is readable by the word extraction unit 22, which is an information processing device. Examples of texts are books such as textbooks and novels, newspapers and magazines, and texts published on web pages.

 The word extraction unit 22 extracts each word included in the text, for example, by performing morphological analysis on the input text.
 <Familiarity acquisition unit 23>
 Input: words
 Output: words, familiarity levels
 Each word extracted by the word extraction unit 22 is input to the familiarity acquisition unit 23. The familiarity acquisition unit 23 acquires the familiarity corresponding to each word from the word familiarity DB stored in the storage unit 11 (step S23).

 When the acquisition probability acquisition device 2 does not include the word extraction unit 22, each word included in the text is input directly. In this case, the familiarity acquisition unit 23 acquires the familiarity corresponding to each word included in the text from the word familiarity DB stored in the storage unit 11 (step S23).

 Each word and the familiarity corresponding to each word are output to the acquisition probability acquisition unit 24.

 Note that the familiarity acquisition unit 23 need not acquire familiarity for proper nouns or for words that are function words such as numerals and particles. In other words, the familiarity acquisition unit 23 may acquire familiarity only for words that are content words.

 Function words such as numerals and particles are words that many people know. Therefore, by acquiring familiarity for these function words, in other words by including them in the processing, the proportion of estimated acquired words in the text calculated by the acquired word information generation unit 25 can be made higher. Conversely, by not acquiring familiarity for these function words, in other words by excluding them from the processing, the proportion of estimated acquired words in the text calculated by the acquired word information generation unit 25 can be made lower.

 The familiarity acquisition unit 23 may also ignore words that are not included in the word familiarity DB, without acquiring familiarity for them. In this way, the acquisition probability acquisition process can be performed appropriately even when the morphological analysis is incorrect.
 <Acquisition probability acquisition unit 24>
 Input: words, familiarity levels
 Output: words, acquisition probabilities
 The acquisition probability acquisition unit 24 uses at least the familiarity corresponding to each word and the model stored in the model storage unit 21 to obtain, for each word, the acquisition probability, which is the probability that the certain person has acquired that word (step S24).

 The acquisition probability acquisition unit 24 obtains the output value obtained when the familiarity corresponding to each word is input to the model, and sets the obtained output value as the acquisition probability corresponding to that word. In other words, the acquisition probability acquisition unit 24 calculates the value of y in the model that corresponds to the familiarity x corresponding to each word, and sets the calculated value as the acquisition probability corresponding to that word.

 When the model stored in the model storage unit 21 is a logistic curve y = f(x, Ψ) in which the familiarity corresponding to a word is the independent variable x and the probability that the certain person answers that he or she knows each word is the dependent variable y, the acquisition probability acquisition unit 24 calculates the value of y = f(x, Ψ) corresponding to the familiarity x of each word and sets the calculated value as the acquisition probability corresponding to that word.

 The acquisition probability acquisition unit 24 may obtain the acquisition probability taking into account the part of speech, the word length, and the like. For example, the acquisition probability acquisition unit 24 may obtain the acquisition probability using the part of speech, the word length, and the like as additional explanatory variables.

 Each word and the acquisition probability corresponding to each word are output to the acquired word information generation unit 25.
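 A minimal sketch of steps S23 and S24 combined is shown below. It assumes the word familiarity DB is given as a dict from word to familiarity and reuses the hypothetical know_probability helper; skipping out-of-DB words reflects the note above about ignoring words not included in the DB.

```python
def acquisition_probabilities(words, familiarity_db, model):
    """Return (word, familiarity, acquisition probability) for each word.

    words: words extracted from the text (e.g. by morphological analysis).
    familiarity_db: dict mapping a word to its familiarity.
    Words not found in the word familiarity DB are ignored.
    """
    results = []
    for word in words:
        fam = familiarity_db.get(word)
        if fam is None:
            continue                          # not in the DB: ignore
        prob = know_probability(model, fam)   # y = f(x, Psi) at this familiarity
        results.append((word, fam, prob))
    return results
```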
 <Acquired word information generation unit 25>
 Input: words, acquisition probabilities
 Output: acquired word information
 The acquired word information generation unit 25 uses the acquisition probability corresponding to each word to generate acquired word information, which is information about the acquisition of the words included in the text (step S25).

 Examples of the acquired word information are at least one of the estimated acquired words in the text, the number of estimated acquired words in the text, and the proportion of estimated acquired words in the text.

 Examples of how to obtain the estimated acquired words in the text, the number of estimated acquired words in the text, and the proportion of estimated acquired words in the text are described below.
 (Estimated acquired words in the text)
 First, the acquired word information generation unit 25 estimates the vocabulary size of the certain person. The vocabulary size can be estimated by the method described for the vocabulary size estimation unit 16 of the first embodiment. To estimate the vocabulary size, the word familiarity DB from the storage unit 11 and the model from the model storage unit 21 may be input to the acquired word information generation unit 25, as indicated by the dash-dotted lines in FIG. 4.

 Next, the acquired word information generation unit 25 obtains, for each input word w(k), the number GOISU(k) of words whose familiarity is equal to or higher than the familiarity corresponding to w(k). To obtain GOISU(k), the word familiarity DB may be input from the storage unit 11 to the acquired word information generation unit 25, as indicated by the dash-dotted line in FIG. 4.

 The acquired word information generation unit 25 then sets the words whose GOISU(k) is equal to or less than the vocabulary size of the certain person as the estimated acquired words in the text. In general, the higher the familiarity of a word, the smaller GOISU(k) is. It can therefore be assumed that the certain person knows the words whose GOISU(k) is equal to or less than that person's vocabulary size.

 FIG. 6 shows an example of GOISU(k).
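 A minimal sketch of this procedure is shown below; estimated_vocabulary is assumed to have been obtained by one of the estimation methods of the first embodiment, and the function names goisu and estimated_acquired_words are assumptions of this sketch.

```python
def goisu(familiarity, familiarity_db):
    """GOISU(k): number of DB words whose familiarity is >= the given familiarity."""
    return sum(1 for fam in familiarity_db.values() if fam >= familiarity)

def estimated_acquired_words(text_words, familiarity_db, estimated_vocabulary):
    """Words w(k) in the text whose GOISU(k) is <= the person's estimated vocabulary size.

    familiarity_db: dict mapping a word to its familiarity.
    """
    acquired = []
    for word in text_words:
        fam = familiarity_db.get(word)
        if fam is None:
            continue                              # not in the word familiarity DB: ignore
        if goisu(fam, familiarity_db) <= estimated_vocabulary:
            acquired.append(word)
    return acquired
```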
 (Number of estimated acquired words in the text)
 First, the acquired word information generation unit 25 estimates the vocabulary size of the certain person. The vocabulary size can be estimated by the method described for the vocabulary size estimation unit 16 of the first embodiment. To estimate the vocabulary size, the word familiarity DB from the storage unit 11 and the model from the model storage unit 21 may be input to the acquired word information generation unit 25, as indicated by the dash-dotted lines in FIG. 4.

 Next, the acquired word information generation unit 25 obtains, for each input word w(k), the number GOISU(k) of words whose familiarity is equal to or higher than the familiarity corresponding to w(k). To obtain GOISU(k), the word familiarity DB may be input from the storage unit 11 to the acquired word information generation unit 25, as indicated by the dash-dotted line in FIG. 4.

 The acquired word information generation unit 25 then sets the number of words whose GOISU(k) is equal to or less than the vocabulary size of the certain person as the number of estimated acquired words in the text.
 (Proportion of estimated acquired words in the text)
 The acquired word information generation unit 25 calculates, for example, a value determined by the following formula (1) or formula (2) and sets the calculated value as the proportion of estimated acquired words in the text.

 (Σ_{k=1}^{K} y(k)·FREQ(k)) / (Σ_{k=1}^{K} FREQ(k))   ... (1)

 (Σ_{k=1}^{K} y(k)·DIFF(k)) / (Σ_{k=1}^{K} DIFF(k))   ... (2)

 Here, FREQ(k) is the number of times the word w(k) appears in the text. Assuming the text is divided into a plurality of parts, DIFF(k) is the number of parts in which the word w(k) appears. Examples of a part are predetermined units that constitute a text, such as a unit, a chapter, or a section. The entire text may also be treated as a single part. K is the total number of words that are included in the text and for which acquisition probabilities have been obtained by the acquisition probability acquisition unit 24.
 The acquired word information generation unit 25 counts FREQ(k) and DIFF(k) based on the input words. Using the FREQ(k) and DIFF(k) obtained by counting, the acquired word information generation unit 25 calculates the value determined by formula (1) or formula (2).

 FIG. 6 shows an example of FREQ(k) and DIFF(k).

 In general, a word that many people know appears more frequently, and a word that many people do not know appears less frequently. Therefore, a rare word appears in a text fewer times than a well-known word does.

 For this reason, the proportion of estimated acquired words in the text obtained by formula (1) using FREQ(k) is considered to be higher than the proportion obtained by formula (2) using DIFF(k). Which of formulas (1) and (2) to use is determined appropriately according to, for example, what kind of information is required as the acquired word information.

 Note that the acquired word information generation unit 25 may also use (the number of estimated acquired words in the text) / K as the proportion of estimated acquired words in the text. The number of estimated acquired words in the text can be obtained by the method described in (Number of estimated acquired words in the text).
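 Formulas (1) and (2) can be sketched as follows, where y(k) is taken to be the acquisition probability of the word w(k) obtained by the acquisition probability acquisition unit 24; representing the parts as a list of word lists and the function name acquired_word_ratio are assumptions of this sketch.

```python
def acquired_word_ratio(probabilities, parts):
    """Proportion of estimated acquired words in the text.

    probabilities: dict mapping each word w(k) to its acquisition probability y(k).
    parts: list of parts (chapters, sections, ...), each a list of words.
    Returns (ratio by formula (1) using FREQ(k), ratio by formula (2) using DIFF(k)).
    """
    freq = {}   # FREQ(k): occurrences of w(k) in the whole text
    diff = {}   # DIFF(k): number of parts in which w(k) appears
    for part in parts:
        seen_in_part = set()
        for word in part:
            if word not in probabilities:
                continue                  # no acquisition probability: not counted in K
            freq[word] = freq.get(word, 0) + 1
            seen_in_part.add(word)
        for word in seen_in_part:
            diff[word] = diff.get(word, 0) + 1

    ratio_freq = (sum(probabilities[w] * f for w, f in freq.items())
                  / sum(freq.values()))
    ratio_diff = (sum(probabilities[w] * d for w, d in diff.items())
                  / sum(diff.values()))
    return ratio_freq, ratio_diff
```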
 [Third embodiment]
 A third embodiment will be described. The third embodiment relates to a recommended learning word extraction device and method.

 In the following, the description focuses on the differences from the first embodiment and its modification. Description of matters that have already been explained may be omitted.

 As illustrated in FIG. 7, the recommended learning word extraction device 3 of this embodiment includes a storage unit 11, a model storage unit 31, an acquisition probability acquisition unit 32, and a recommended learning word extraction unit 33.

 <Storage unit 11>
 The storage unit 11 is the same as the storage unit 11 of the first embodiment.

 The storage unit 11 stores a word familiarity DB that stores a plurality of words and a plurality of familiarity levels respectively corresponding to the plurality of words. Here, the familiarity is an index representing the degree of familiarity with a word.

 <Model storage unit 31>
 The model storage unit 31 stores a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word. Here, the "certain person" is the person for whom the recommended learning words are extracted. The "certain person" may be the user 100.

 An example of this model is a model generated by the model generation device 1 of the first embodiment or its modification.

 As indicated by the broken line in FIG. 7, the recommended learning word extraction device 3 may further include the model generation device 1 for generating the model stored in the model storage unit 31.

 That is, the recommended learning word extraction device 3 may further include (1) the word selection unit 12 that selects a plurality of test words from the plurality of words, (2) the presentation unit 13 that presents the test words to a user, (3) the answer reception unit 14 that receives the user's answers regarding knowledge of the test words, and (4) the model generation unit 15 that uses the answers regarding knowledge of the test words and the word familiarity DB stored in the storage unit 11 to obtain a model representing the relationship between a value based on the familiarity corresponding to a test word and a value based on the probability that the user answers that he or she knows the test word, and that sets the obtained model as the model stored in the model storage unit.
 <Acquisition probability acquisition unit 32>
 Input: words
 Output: words, acquisition probabilities
 A word set consisting of a plurality of words that are candidates for recommended learning words is input to the acquisition probability acquisition unit 32.

 The acquisition probability acquisition unit 32 uses at least the word familiarity DB stored in the storage unit 11 and the model stored in the model storage unit 31 to obtain, for each word included in the input word set, the acquisition probability, which is the probability that the certain person has acquired that word (step S32).

 The acquisition probability acquisition unit 32 obtains the output value obtained when the familiarity corresponding to each word is input to the model, and sets the obtained output value as the acquisition probability corresponding to that word. In other words, the acquisition probability acquisition unit 32 calculates the value of y in the model that corresponds to the familiarity x corresponding to each word, and sets the calculated value as the acquisition probability corresponding to that word.

 When the model stored in the model storage unit 31 is a logistic curve y = f(x, Ψ) in which the familiarity corresponding to a word is the independent variable x and the probability that the certain person answers that he or she knows each word is the dependent variable y, the acquisition probability acquisition unit 32 calculates the value of y = f(x, Ψ) corresponding to the familiarity x of each word and sets the calculated value as the acquisition probability corresponding to that word.

 The acquisition probability acquisition unit 32 may obtain the acquisition probability taking into account the part of speech, the word length, and the like. For example, the acquisition probability acquisition unit 32 may obtain the acquisition probability using the part of speech, the word length, and the like as additional explanatory variables.

 Each word and the acquisition probability corresponding to each word are output to the recommended learning word extraction unit 33.
 <学習推奨語抽出部33>
 入力:単語、獲得確率
 出力:学習推奨語
 学習推奨語抽出部33は、取得された獲得確率に基づいて前記単語集合から学習推奨語を抽出する(ステップS33)。
<Learning recommended word extraction unit 33>
Input: word, acquisition probability Output: recommended learning word The recommended learning word extraction unit 33 extracts recommended learning words from the word set based on the acquired acquisition probability (step S33).
 For example, the recommended learning word extraction unit 33 may extract, as recommended learning words, words whose obtained acquisition probability is close to a predetermined probability.
 The predetermined probability is a number greater than 0 and less than 1; an example is 0.5.
 The recommended learning word extraction unit 33 may extract, as recommended learning words, a predetermined number of words whose acquisition probabilities are closest to the predetermined probability.
 If the predetermined probability is 0.5 and the predetermined number is 7, for example, the seven words shown in FIG. 9 are extracted as recommended learning words. In FIG. 9, ENTRY is the written form of the word, PSY is the familiarity, Prob is the acquisition probability, YN is information about whether the user 100 has answered that he or she knows or does not know the word (when such an answer has been obtained), and Distance50 is the magnitude of the difference between Prob and 0.5, the predetermined probability in this case.
 In this example, "-" is displayed in YN because the user 100 has not answered whether he or she knows the word; "1" is displayed in YN if the user 100 has answered that he or she knows the word, and "0" is displayed if the user 100 has answered that he or she does not know it.
 The recommended learning words are presented to the person for whom they were extracted, for example in the form of the table shown in FIG. 9.
 The recommended learning word extraction unit 33 may extract, as recommended learning words, words whose acquisition probabilities fall within a predetermined range that includes the predetermined probability.
 The recommended learning word extraction unit 33 may also extract, as recommended learning words, words of a predetermined part of speech whose obtained acquisition probabilities are close to the predetermined probability. Examples of the predetermined part of speech are verbs, nouns, and adjectives. Two or more parts of speech may be predetermined; in that case, the recommended learning word extraction unit 33 may extract, as recommended learning words, words whose acquisition probabilities are close to the predetermined probability from among the words of each of the two or more parts of speech.
 The part-of-speech information may be stored in the word familiarity DB. In that case, the recommended learning word extraction unit 33 can refer to the word familiarity DB to obtain the part of speech of each word and perform the above processing.
 Alternatively, the recommended learning word extraction unit 33 may obtain the part of speech of each word by referring to a dictionary, stored in a storage unit (not shown), in which words and their parts of speech are recorded, and then perform the above processing.
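 As an illustration of the extraction variants described above, the following Python sketch ranks candidate words by the distance between their acquisition probability and the predetermined probability, with an optional part-of-speech filter; the data layout (one dict per word) and the default values are assumptions made for the example, not details of the embodiment.

```python
def extract_recommended_words(candidates, target_prob=0.5, count=7, allowed_pos=None):
    # candidates: list of dicts such as {"entry": "...", "pos": "noun", "prob": 0.48}
    # target_prob: the predetermined probability (0.5 in the FIG. 9 example)
    # count: the predetermined number of recommended learning words
    # allowed_pos: optional set of parts of speech, e.g. {"noun", "verb", "adjective"}
    if allowed_pos is not None:
        candidates = [w for w in candidates if w["pos"] in allowed_pos]
    # The Distance50 column of FIG. 9 corresponds to abs(prob - target_prob) when target_prob = 0.5.
    ranked = sorted(candidates, key=lambda w: abs(w["prob"] - target_prob))
    return ranked[:count]
```

 With target_prob = 0.5 and count = 7, this reproduces the kind of selection shown in FIG. 9: the seven candidates whose acquisition probabilities sit closest to 0.5.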
 <Modification of the third embodiment>
 The word set input to the acquisition probability acquisition unit 32, consisting of a plurality of words that are candidates for recommended learning words, may be the words contained in a predetermined text. For this purpose, the recommended learning word extraction device 3 may include a word extraction unit 34, described below.
 <Word extraction unit 34>
 Input: text
 Output: words
 The word extraction unit 34 extracts each word contained in the input text (step S34).
 The extracted words are output to the acquisition probability acquisition unit 32 as the word set of candidates for recommended learning words.
 The text input to the word extraction unit 34 may be any text readable by the word extraction unit 34, which is an information processing device. Examples of such texts are books such as textbooks and novels, newspapers and magazines, and text published on web pages.
 The word extraction unit 34 extracts each word contained in the input text, for example by performing morphological analysis on the text.
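 A minimal sketch of this step in Python, assuming the MeCab morphological analyzer (via the mecab-python3 package); the choice of analyzer is an assumption for the example, and any tokenizer that splits the text into words would serve.

```python
import MeCab  # assumed analyzer; provided by the mecab-python3 package

def extract_words(text: str) -> list:
    # Split the input text into words (surface forms) by morphological analysis.
    tagger = MeCab.Tagger()
    words = []
    for line in tagger.parse(text).splitlines():
        if line in ("EOS", ""):
            continue
        surface = line.split("\t")[0]  # surface form of the morpheme
        words.append(surface)
    return words
```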
 [Modifications]
 Note that the present disclosure is not limited to the embodiments described above, and various modifications and applications are possible without departing from the gist of the present disclosure.
 The various processes described in the embodiments are not necessarily executed in chronological order according to the order of description; they may also be executed in parallel or individually, depending on the processing capacity of the device that executes them or as necessary.
 For example, data may be exchanged between the components of the model generation device 1, the acquisition probability acquisition device 2, and the recommended learning word extraction device 3 either directly or via a storage unit (not shown).
 [Program and recording medium]
 The processing of each unit of each device described above may be implemented by a computer; in that case, the processing content of the functions that each device should have is described by a program. By loading this program into the storage unit 1020 of the computer 1000 shown in FIG. 10 and having the arithmetic processing unit 1010, the input unit 1030, the output unit 1040, the display unit 1060, and so on operate according to it, the various processing functions of each of the above devices are realized on the computer.
 The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium is, for example, a non-transitory recording medium, specifically a magnetic recording device, an optical disk, or the like.
 The program is distributed, for example, by selling, transferring, or lending a portable recording medium, such as a DVD or CD-ROM, on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to another computer via a network.
 A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or transferred from the server computer in the auxiliary storage unit 1050, which is its own non-transitory storage device. When executing a process, the computer loads the program stored in the auxiliary storage unit 1050 into the storage unit 1020 and executes the process according to the loaded program. As another form of execution, the computer may load the program directly from the portable recording medium into the storage unit 1020 and execute the process according to the program, or it may execute a process according to a received program each time a program is transferred to it from the server computer. The above-described processing may also be executed by a so-called ASP (Application Service Provider) type service, which realizes the processing functions only through execution instructions and result acquisition, without transferring the program from the server computer to the computer. Note that the program in this embodiment includes information that is used for processing by an electronic computer and that is equivalent to a program (data that is not a direct command to the computer but that has properties defining the processing of the computer, etc.).
 In this embodiment, the present apparatus is configured by executing a predetermined program on a computer, but at least a part of the processing content may be implemented in hardware.
 For example, the word selection unit 12, the presentation unit 13, the answer reception unit 14, the model generation unit 15, the vocabulary number estimation unit 16, the word extraction unit 22, the familiarity acquisition unit 23, the acquisition probability acquisition unit 24, the acquired word information generation unit 25, the acquisition probability acquisition unit 32, the recommended learning word extraction unit 33, and the word extraction unit 34 may each be configured by a processing circuit.
 The storage unit 11, the model storage unit 21, and the model storage unit 31 may each be configured by a memory.
 Regarding the above embodiments, the following additional notes are further disclosed.
 (Additional note 1)
 A word selection device comprising:
 a memory; and
 at least one processor connected to the memory,
 wherein the memory stores a word familiarity DB storing a plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words, the familiarity being an index representing how familiar a word is, and
 the processor selects, using the word familiarity DB stored in the memory, a plurality of test words from the plurality of words such that the familiarity values corresponding to the test words are spaced at constant intervals.
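 For concreteness, the selection described in additional note 1 can be sketched as follows in Python: equally spaced target familiarity values are laid over the familiarity range, and for each target the word with the closest familiarity is chosen as a test word. The interpolation and tie-handling strategy is an assumption made for the example, not a detail of the embodiment.

```python
def select_test_words(word_familiarity: dict, num_test_words: int) -> list:
    # word_familiarity: the word familiarity DB as {word: familiarity}
    words = sorted(word_familiarity, key=word_familiarity.get)
    lo, hi = word_familiarity[words[0]], word_familiarity[words[-1]]
    step = (hi - lo) / max(num_test_words - 1, 1)
    selected = []
    for i in range(num_test_words):
        target = lo + i * step  # equally spaced familiarity value
        best = min(words, key=lambda w: abs(word_familiarity[w] - target))
        if best not in selected:  # avoid picking the same word twice
            selected.append(best)
    return selected
```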
 (Additional note 2)
 A non-transitory storage medium storing a program executable by a computer to execute a word selection process,
 the word selection process comprising:
 selecting, using a word familiarity DB storing a plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words, the familiarity being an index representing how familiar a word is, a plurality of test words from the plurality of words such that the familiarity values corresponding to the test words are spaced at constant intervals.
 (Additional note 3)
 A model generation device comprising:
 a memory; and
 at least one processor connected to the memory,
 wherein the memory stores a word familiarity DB storing a plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words, the familiarity being an index representing how familiar a word is, and
 the processor receives, as input, a plurality of test words and answers regarding knowledge of the test words given by a user to whom the plurality of test words were presented, and obtains, using the answers regarding knowledge of the test words and the word familiarity DB stored in the memory, a model representing the relationship between a value based on the familiarity corresponding to a test word and a value based on the probability that the user answers that he or she knows the test word.
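 For concreteness, the fitting described in additional note 3 can be sketched as a logistic regression of the user's yes/no answers on the familiarity of the presented test words; scikit-learn is used here purely as an illustrative choice of fitting library, not as part of the embodiment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # illustrative fitting library

def fit_knowledge_model(familiarities, knows):
    # familiarities: familiarity of each presented test word (from the word familiarity DB)
    # knows: 1 if the user answered that he or she knows the word, 0 otherwise
    x = np.asarray(familiarities, dtype=float).reshape(-1, 1)
    y = np.asarray(knows, dtype=int)
    model = LogisticRegression()
    model.fit(x, y)
    return model

# model.predict_proba([[familiarity]])[0, 1] then gives the modelled probability
# that the user answers that he or she knows a word of that familiarity.
```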
 (Additional note 4)
 A non-transitory storage medium storing a program executable by a computer to execute a model generation process,
 the model generation process comprising:
 receiving, as input, a plurality of test words and answers regarding knowledge of the test words given by a user to whom the plurality of test words were presented, and obtaining, using the answers regarding knowledge of the test words and a word familiarity DB, a model representing the relationship between a value based on the familiarity corresponding to a test word and a value based on the probability that the user answers that he or she knows the test word,
 wherein the familiarity is an index representing how familiar a word is, and the word familiarity DB stores a plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words.
 (Additional note 5)
 An acquisition probability acquisition device comprising:
 a memory; and
 at least one processor connected to the memory,
 wherein the memory stores:
 a word familiarity DB storing a plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words, the familiarity being an index representing how familiar a word is; and
 a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word, and
 the processor obtains, from the word familiarity DB stored in the memory, the familiarity corresponding to each word contained in an input text, and obtains, using at least the obtained familiarity corresponding to each word and the model stored in the memory, an acquisition probability that is the probability that the certain person has acquired each word.
 (Additional note 6)
 A non-transitory storage medium storing a program executable by a computer to execute an acquisition probability acquisition process,
 the acquisition probability acquisition process comprising:
 obtaining, from a word familiarity DB storing a plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words, the familiarity being an index representing how familiar a word is, the familiarity corresponding to each word contained in an input text; and
 obtaining, using at least a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word, and the obtained familiarity corresponding to each word, an acquisition probability that is the probability that the certain person has acquired each word.
 (Additional note 7)
 A recommended learning word extraction device comprising:
 a memory; and
 at least one processor connected to the memory,
 wherein the memory stores:
 a word familiarity DB storing a plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words, the familiarity being an index representing how familiar a word is; and
 a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word, and
 the processor obtains, using at least the word familiarity DB and the model stored in the memory, an acquisition probability that is the probability that the certain person has acquired each word contained in an input word set, and extracts recommended learning words from the word set based on the obtained acquisition probabilities.
 (Additional note 8)
 A non-transitory storage medium storing a program executable by a computer to execute a recommended learning word extraction process,
 the recommended learning word extraction process comprising:
 obtaining, using at least a word familiarity DB storing a plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words, the familiarity being an index representing how familiar a word is, and a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word, an acquisition probability that is the probability that the certain person has acquired each word contained in an input word set; and
 extracting recommended learning words from the word set based on the obtained acquisition probabilities.
 All documents, patent applications, and technical standards mentioned in this specification are incorporated herein by reference to the same extent as if each individual document, patent application, or technical standard were specifically and individually indicated to be incorporated by reference.

Claims (4)

  1.  A word selection device comprising:
     a storage unit storing a word familiarity DB that stores a plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words, the familiarity being an index representing how familiar a word is; and
     a word selection unit that selects, using the word familiarity DB stored in the storage unit, a plurality of test words from the plurality of words such that the familiarity values corresponding to the test words are spaced at constant intervals.
  2.  The word selection device according to claim 1, further comprising:
     a presentation unit that presents the test words to a user; and
     an answer reception unit that receives answers regarding the user's knowledge of the test words.
  3.  A word selection method comprising:
     a word selection step in which a word selection unit selects, using a word familiarity DB that stores a plurality of words and a plurality of familiarity values respectively corresponding to the plurality of words, the familiarity being an index representing how familiar a word is, a plurality of test words from the plurality of words such that the familiarity values corresponding to the test words are spaced at constant intervals.
  4.  A program for causing a computer to function as each unit of the word selection device according to claim 1.
PCT/JP2022/021577 2022-05-26 2022-05-26 Word selection device, method, and program WO2023228359A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/021577 WO2023228359A1 (en) 2022-05-26 2022-05-26 Word selection device, method, and program

Publications (1)

Publication Number Publication Date
WO2023228359A1 true WO2023228359A1 (en) 2023-11-30

Family

ID=88918766

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/021577 WO2023228359A1 (en) 2022-05-26 2022-05-26 Word selection device, method, and program

Country Status (1)

Country Link
WO (1) WO2023228359A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005107483A (en) * 2003-09-11 2005-04-21 Nippon Telegr & Teleph Corp <Ntt> Word learning method, word learning apparatus, word learning program, and recording medium with the program recorded thereon, and character string learning method, character string learning apparatus, character string learning program, and recording medium with the program recorded thereon
WO2021260760A1 (en) * 2020-06-22 2021-12-30 日本電信電話株式会社 Vocabulary count estimation device, vocabulary count estimation method, and program

Similar Documents

Publication Publication Date Title
US10217464B2 (en) Vocabulary generation system
JP6544131B2 (en) INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING PROGRAM
JP5664978B2 (en) Learning support system and learning support method
US9208144B1 (en) Crowd-sourced automated vocabulary learning system
BR122017002789A2 (en) systems and methods for language learning
Han Investigating score dependability in English/Chinese interpreter certification performance testing: A generalizability theory approach
EP2966601A1 (en) Comprehension assistance system, comprehension assistance server, comprehension assistance method, and computer-readable recording medium
AU2018345706A1 (en) Tailoring an interactive dialog application based on creator provided content
EP2974654A1 (en) Hearing examination device, hearing examination method, and method for generating words for hearing examination
Butler et al. Exploration of automatic speech recognition for deaf and hard of hearing students in higher education classes
JP6030659B2 (en) Mental health care support device, system, method and program
Ambrazaitis Nuclear intonation in Swedish: Evidence from experimental-phonetic studies and a comparison with German
JP6717387B2 (en) Text evaluation device, text evaluation method and recording medium
WO2023228359A1 (en) Word selection device, method, and program
WO2023228361A1 (en) Acquisition probability acquisition device, method, and program
WO2023228360A1 (en) Model generation device, method, and program
WO2023228358A1 (en) Learning recommendation word extraction device, method, and program
KR20180096317A (en) System for learning the english
WO2020036011A1 (en) Information processing device, information processing method, and program
KR101432791B1 (en) Sentence display method according to pitch of sentence and language contens service system using the method for sentence display method
JPWO2015093123A1 (en) Information processing device
JP7396488B2 (en) Vocabulary count estimation device, vocabulary count estimation method, and program
JP7396487B2 (en) Vocabulary count estimation device, vocabulary count estimation method, and program
JP2021162732A (en) Subject recommendation system
CN112307748A (en) Method and device for processing text

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22943761

Country of ref document: EP

Kind code of ref document: A1