WO2021245997A1 - Language learning support device, program, and information processing method - Google Patents

Language learning support device, program, and information processing method

Info

Publication number
WO2021245997A1
Authority
WO
WIPO (PCT)
Prior art keywords
unit
phrase
words
sort
language learning
Application number
PCT/JP2021/006599
Other languages
English (en)
Japanese (ja)
Inventor
拓途 西村
Original Assignee
言語研究開発合同会社
Application filed by 言語研究開発合同会社
Publication of WO2021245997A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/06 Foreign languages

Definitions

  • the present invention relates to a language learning support device, a program, and an information processing method.
  • Patent Document 1 discloses a teaching material creation support system that supports the creation of teaching materials.
  • the present invention aims to provide a technique that enables language learning teaching materials to be created based on objective data.
  • a language learning support device includes a word extraction unit, a counting unit, and a sorting unit.
  • the word extraction unit is configured to extract words contained in a sentence.
  • the counting unit is configured to count the number of occurrences of words and phrases.
  • a phrase is a unit that treats a plurality of extracted words as a combination of a plurality of words arranged in the order of appearance of a sentence.
  • the sort unit is configured to sort the counted phrases based on the number of occurrences.
  • the program for realizing the software described in the present embodiment may be provided on a non-transitory computer-readable medium, may be provided so that it can be downloaded from an external server, or may be provided so that the program is started on an external computer and its functions are realized on a client terminal (so-called cloud computing).
  • a "unit" may include, for example, a combination of hardware resources implemented by circuitry in a broad sense and software information processing concretely realized by those hardware resources.
  • various kinds of information are handled in this embodiment. This information is represented, for example, by physical signal values representing voltage or current, by high and low signal values forming a binary bit aggregate of 0s and 1s, or by quantum superposition (so-called qubits), and communication and computation on it can be executed on circuitry in a broad sense.
  • circuitry in a broad sense is circuitry realized by appropriately combining at least a circuit, circuitry, a processor, a memory, and the like. That is, it includes an application-specific integrated circuit (ASIC), programmable logic devices (for example, a simple programmable logic device (SPLD) and a complex programmable logic device (CPLD)), a field-programmable gate array (FPGA), and the like.
  • FIG. 1 is a block diagram showing a hardware configuration of the language learning support device 3 according to the first embodiment.
  • the language learning support device 3 is implemented by installing a dedicated program on a computer.
  • the language learning support device 3 has a communication unit 31, a storage unit 32, a control unit 33, a display unit 34, and an input unit 35, and these components are electrically connected via a communication bus 30 inside the language learning support device 3. Each component is described further below.
  • Communication unit 31: wired communication means such as USB, IEEE 1394, Thunderbolt, and wired LAN network communication are preferable, but the communication unit 31 may also include wireless LAN network communication, mobile communication such as LTE/3G, Bluetooth (registered trademark) communication, and the like as necessary. That is, it is more preferable to implement it as a set of these communication means.
  • Storage unit 32: the storage unit 32 stores the various information described above. It can be implemented, for example, as a storage device such as a solid-state drive (SSD) that stores the various programs related to the language learning support device 3 executed by the control unit 33, or as a memory such as a random access memory (RAM) that temporarily stores information needed for program computation (arguments, arrays, etc.). A combination of these may also be used.
  • Control unit 33: the control unit 33 processes and controls the overall operation of the language learning support device 3.
  • the control unit 33 is, for example, a central processing unit (CPU) (not shown).
  • the control unit 33 realizes the various functions of the language learning support device 3 by reading out a predetermined program stored in the storage unit 32. That is, information processing by software (stored in the storage unit 32) is concretely realized by hardware (the control unit 33) and can be executed as the functional units included in the control unit 33 (see FIG. 2). These are described in more detail in the next section.
  • the control unit 33 is not limited to a single unit; a plurality of control units 33 may be provided, one for each function, or a combination of these arrangements may be used.
  • Display unit 34: the display unit 34 may, for example, be included in the housing of the language learning support device 3 or be externally attached.
  • the display unit 34 displays a screen of a graphical user interface (GUI) that can be operated by the user.
  • the display unit 34 can selectively display a display screen in response to a control signal from the output unit 338 of the control unit 33. This allows the user to visually check the sort result S on the display unit 34.
  • the input unit 35 may be included in the housing of the language learning support device 3 or may be externally attached.
  • the input unit 35 may be implemented as a touch panel integrally with the display unit 34. If it is a touch panel, the user can input a tap operation, a swipe operation, and the like.
  • alternatively, a switch button, a mouse, a QWERTY keyboard, or the like may be adopted. In either case, the input unit 35 receives operation input made by the user.
  • the input is transferred to the control unit 33 as a command signal via the communication bus 30, and the control unit 33 can execute predetermined control or calculation as needed.
  • using the input unit 35, the user can enter settings such as the upper limit of the number of words to be included in a phrase, whether natural language processing is to be performed, the number of words of the adopted phrase 4, the display conditions of the sort result S, and the processing conditions of the sentence data T1.
  • FIG. 2 is a functional block diagram showing the functions of the language learning support device 3.
  • that is, the information processing by the software stored in the storage unit 32 is concretely realized by the hardware (control unit 33).
  • the language learning support device 3 (control unit 33) includes a reception unit 331, a setting unit 332, a processing unit 333, a word extraction unit 334, a counting unit 335, a sort unit 336, a duplicate deletion unit 337, and an output unit 338.
  • the reception unit 331 receives information via the communication unit 31 or the storage unit 32 and is configured to read it into working memory.
  • the reception unit 331 is configured to receive various information via the communication unit 31, the storage unit 32, or the input unit 35.
  • the reception unit 331 accepts the text data T1 and information related to the processing settings of the text data T1 as input data.
  • the sentence data T1, the information on its processing settings, the phrase group F, and the like may be read in advance into the storage unit 32 of the language learning support device 3, or may be read from an external medium. Alternatively, the user may create these data or information directly using the input unit 35, or download them from outside via the communication unit 31.
  • the setting unit 332 sets the processing conditions of the language learning support device 3 based on the various information received by the reception unit 331. Specifically, based on the information on the processing settings of the sentence data T1, the setting unit 332 sets processing conditions of the text data T1 such as the upper limit of the number of words to be included in a phrase, whether natural language processing is performed, the number of words included in the adopted phrase 4, and the display conditions of the sort result S.
  • the various settings made by the setting unit 332 are stored in the storage unit 32 as a settings file. That is, the setting unit 332 makes the various settings based on the processing conditions and thereby reflects the processing conditions of the sentence data T1 received by the reception unit 331 in the information processing of the language learning support device 3. The processing conditions are described in detail in the next section.
  • the processing unit 333 processes the files containing the sentence data T1, and the words, symbols, numbers, and the like contained in the sentence data T1, based on the settings made by the setting unit 332. Specifically, it combines or divides the files containing the sentence data T1, converts the format of the sentence data T1, and deletes words and phrases containing specified symbols.
  • the word extraction unit 334 extracts the words contained in a sentence based on predetermined delimiters in the sentence. The word extraction unit 334 also extracts the symbols and numbers contained in the sentence together with the words.
  • the counting unit 335 is configured to count the number of occurrences of words and phrases. Further, when the upper limit value is set, the counting unit 335 is configured to count the number of occurrences of words and phrases equal to or less than the set upper limit value.
  • the sort unit 336 sorts the counted words and phrases based on the number of occurrences.
  • the sort unit 336 sorts the adopted phrase 4 based on the number of occurrences. Further, the sort unit may sort the phrases counted by the count unit 335 based on the appearance degree F4. As a result, words and phrases that appear frequently are shown in a ranking format.
  • the duplicate deletion unit 337 is configured to determine the adopted phrase 4 by deleting some of the plurality of phrases.
  • the output unit 338 outputs the sort result S, which is displayed on the display unit 34 of the language learning support device 3.
  • the sort result S output by the output unit 338 is, for example, the sort result S18 to the sort result S22, and the details will be described later.
  • FIG. 3 is an activity diagram showing an operation flow of the language learning support device 3. Hereinafter, each activity in FIG. 3 will be described.
  • the user uses the input unit 35 to read the information regarding the processing settings of the sentence data T1 and the sentence data T1 into the dedicated program pre-installed in the language learning support device 3.
  • the reception unit 331 receives information regarding the processing settings of the text data T1 and the text data T1 (activity A1).
  • the reception unit 331 may accept text data T1 to which natural language processing has already been applied.
  • the file format of the text data T1 is, for example, a text format (.txt or .csv).
  • the text data T1 may consist of a plurality of files or may be a single file.
  • the text data T1 is preferably a language corpus, but is not limited to this; any kind of material written in any language, such as academic papers, newspapers, or speeches, may be used.
  • the sentence data T1 is preferably composed of hundreds of millions of words or more, but is not limited to this, and may be 1000 words or less.
  • the information regarding the processing settings of the sentence data T1 is, for example, information such as an upper limit of the number of words to be included in the phrase, a setting regarding natural language processing, and a setting regarding a file division unit.
  • the language of the sentence data T1 is not particularly limited and may be, for example, English, Chinese, French, German, Spanish, Russian, Portuguese, Malay, Arabic, or the like. In this embodiment, English is used as an example.
  • the setting unit 332 sets the upper limit of the number of words to be included in the phrase based on the processing setting received in the activity A1 (activity A2). At this time, the processing conditions of the sentence data T1 such as the setting related to the natural language processing and the number of words of the adopted phrase 4 are set.
  • the processing unit 333 divides the file for each predetermined number of words (activity A3).
  • the predetermined number of words is, for example, 10,000 words, 1 million words, etc., based on the processing settings of the setting unit 332.
  • the processing unit 333 combines all the files before dividing the files and then divides the files. For example, if the total number of words contained in the file is 1 billion words, the processing unit 333 divides the file into 1000 files for every 1 million words.
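The combine-then-divide step of activity A3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each input file is modeled as an in-memory string, words are split on whitespace, and the name combine_and_split is hypothetical. With 1 billion words and words_per_file set to 1,000,000, the same slicing would yield the 1000 files mentioned above.

```python
def combine_and_split(files: list[str], words_per_file: int) -> list[list[str]]:
    """Combine all input texts, then divide the combined word sequence into
    chunks of at most `words_per_file` words (the last chunk may be shorter).

    Hypothetical helper: each "file" is modeled as an in-memory string and
    words are split on whitespace (assumptions, not from the source).
    """
    if words_per_file <= 0:
        raise ValueError("words_per_file must be positive")
    # Combine step: concatenate the words of every file in their original order.
    words = [w for text in files for w in text.split()]
    # Divide step: slice the combined sequence into consecutive fixed-size chunks.
    return [words[i:i + words_per_file] for i in range(0, len(words), words_per_file)]


chunks = combine_and_split(["one two three", "four five six seven"], words_per_file=3)
print([len(chunk) for chunk in chunks])  # [3, 3, 1]
```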
  • based on the settings made in activity A2, the processing unit 333 converts the characters, numbers, and symbols contained in the text data T1 into a predetermined format and deletes the line feeds contained in the text data T1 (activity A4). Specifically, for example, the processing unit 333 converts full-width characters (including alphanumerics and symbols) in the text into half-width characters and converts uppercase letters of the alphabet into lowercase letters.
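A rough Python sketch of the activity A4 conversion follows. It assumes Unicode NFKC normalization as a stand-in for the full-width to half-width conversion; NFKC also applies other compatibility mappings, so a production system might prefer a narrower table. Line feeds are replaced with a space rather than deleted outright, an assumption made so that words on adjacent lines do not run together. The name normalize_text is hypothetical.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Hypothetical sketch of activity A4: full-width -> half-width,
    uppercase -> lowercase, line feeds removed."""
    # NFKC folds full-width letters, digits and symbols (e.g. "ＡＢＣ１２３！")
    # into their half-width ASCII forms, along with other compatibility mappings.
    text = unicodedata.normalize("NFKC", text)
    # Assumption: replace each line break with a space instead of deleting it,
    # so the last word of one line does not merge with the first word of the next.
    text = text.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")
    # Convert uppercase letters to lowercase.
    return text.lower()


print(normalize_text("My Father's\nDRAGON １２３"))  # my father's dragon 123
```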
  • FIG. 4 is a diagram showing the text data T1 and the result of the conversion process by the processing unit 333.
  • the text data T1 shown in FIG. 4 is the text data T1 received in the activity A1.
  • the intermediate data T10 is a processing result when the line feed included in the text data T1 is deleted by the processing unit 333.
  • the intermediate data T11 is a processing result when the uppercase letters included in the text are converted into lowercase letters by the processing unit 333.
  • when the setting unit 332 is set to perform natural language processing, the processing unit 333 replaces each word in the sentence with its part of speech.
  • the intermediate data T12 is an example of the processing result when part of the sentence, "my father's dragon chapter one my father meets", is converted and replaced with "determiner, determiner, noun, number, determiner, noun, verb".
  • the intermediate data T12 shows an example in which natural language processing is applied regardless of the type of word, but such processing may be applied only to specific words based on the settings of the setting unit 332. For example, when the setting unit 332 is set to apply natural language processing only to "a" and "the", only "a" and "the" are processed.
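The part-of-speech replacement, including the option of processing only selected words, might look like the following sketch. The toy lexicon POS_LEXICON and the function replace_with_pos are hypothetical; a real system would rely on a trained part-of-speech tagger, which the source does not name.

```python
# Hypothetical toy lexicon; a real system would use a part-of-speech tagger.
POS_LEXICON = {
    "a": "determiner",
    "the": "determiner",
    "my": "determiner",
    "dragon": "noun",
    "father": "noun",
    "one": "number",
    "meets": "verb",
}


def replace_with_pos(words, only=None):
    """Replace each word with its part of speech.

    If `only` is given (e.g. {"a", "the"}), only those words are replaced;
    words missing from the lexicon are left unchanged.
    """
    result = []
    for word in words:
        if only is not None and word not in only:
            result.append(word)  # not selected by the setting: keep the word
        else:
            result.append(POS_LEXICON.get(word, word))
    return result


print(replace_with_pos(["my", "father", "meets", "a", "dragon"]))
# ['determiner', 'noun', 'verb', 'determiner', 'noun']
print(replace_with_pos(["the", "dragon", "meets", "a", "boy"], only={"a", "the"}))
# ['determiner', 'dragon', 'meets', 'determiner', 'boy']
```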
  • the word extraction unit 334 extracts the words, symbols, and numbers (hereinafter, "words, etc.") contained in the sentence data T1 based on predetermined delimiters in the sentence (for example, spaces, tab characters, or line breaks) (activity A5). When the words, etc. have been extracted, the processing unit 333 generates a word list in which the extracted words, etc. are arranged in their order of appearance in the sentence and, based on the word list, generates phrase lists containing phrases of no more than the upper limit number of words, etc. (activity A6). The processing unit 333 treats each symbol and each number extracted together with the words as one word when creating the word list and phrase lists.
  • a phrase is a unit that treats a plurality of extracted words as a combination of a plurality of words arranged in the order of appearance of a sentence.
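Activities A5 and A6 start from a tokenizer that splits on spaces, tabs, and line breaks and keeps each symbol as a separate word. The regular expression below is an assumption, chosen so that "little boy." yields "little", "boy", and "." while "father's" stays a single word; extract_words is a hypothetical name.

```python
import re

# Assumed token pattern: a run of word characters, optionally joined by
# internal apostrophes ("father's"), or any single non-space symbol.
TOKEN_RE = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def extract_words(text: str) -> list[str]:
    """Hypothetical tokenizer: split on whitespace (spaces, tabs, line breaks)
    and keep each symbol as a separate word, in order of appearance."""
    return TOKEN_RE.findall(text)


print(extract_words("my father's dragon.\tis he here?"))
# ['my', "father's", 'dragon', '.', 'is', 'he', 'here', '?']
```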
  • FIGS. 5 and 6 are diagrams showing results of the conversion processing by the processing unit 333.
  • when the upper limit value is 2, after the words contained in the sentence data T1 are extracted by the word extraction unit 334, the processing unit 333 generates a word list (for example, intermediate data T13) (FIG. 5). Based on the word list, the processing unit 333 also generates a phrase list (for example, intermediate data T15) in which the extracted words are listed as two-word phrases.
  • in order of appearance in the sentence, the two-word phrases are listed as "my father's", "dragon chapter" and as "father's dragon", "chapter one". That is, every consecutive combination is listed as a phrase. More generally, for n-word phrases, n ways of listing can be considered (one for each starting offset), and using all of them achieves a complete listing. As a result, the processing unit 333 converts the sentence data T1 into two lists (a word list and a two-word phrase list). Because enumerating and explaining every consecutive combination would be cumbersome, a representative one of these combinations is selected and described below as an example.
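The observation that n-word phrases admit n listings can be checked with a short sketch, assuming (as the example suggests) that each listing takes non-overlapping n-word chunks starting at a different offset. The union of the n listings equals the ordinary sliding-window list of all consecutive n-word combinations. Both function names are hypothetical.

```python
def ngrams_by_offsets(words: list[str], n: int) -> list[list[tuple[str, ...]]]:
    """Hypothetical: return n listings of non-overlapping n-word phrases,
    one listing per starting offset 0..n-1."""
    listings = []
    for offset in range(n):
        listing = [tuple(words[i:i + n]) for i in range(offset, len(words) - n + 1, n)]
        listings.append(listing)
    return listings


def sliding_ngrams(words: list[str], n: int) -> list[tuple[str, ...]]:
    """All consecutive n-word phrases in order of appearance."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


words = ["my", "father's", "dragon", "chapter", "one"]
print(ngrams_by_offsets(words, 2))
# [[('my', "father's"), ('dragon', 'chapter')], [("father's", 'dragon'), ('chapter', 'one')]]
```

Every starting position i falls into exactly one listing (the one whose offset equals i mod n), which is why taking all n listings gives a complete enumeration.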
  • when the upper limit value is 3, the processing unit 333 generates three lists: the word list and the two-word phrase list described above, plus a phrase list containing three-word phrases. In this case, the extracted words are listed as three-word phrases in their order of appearance in the sentence, with "my father's dragon" as a representative example (not shown).
  • similarly, when the upper limit value is 5, a phrase list containing four-word phrases and a phrase list containing five-word phrases are also generated, for a total of five word or phrase lists.
  • each symbol and each number is treated as one word. That is, when "little", "boy", and "." are listed in the word list and are treated together as one phrase, "little boy." is regarded as a three-word phrase.
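A sketch of generating the word list and all phrase lists up to the upper limit follows, under the assumption that phrases are stored as space-joined strings (so "little boy." appears as "little boy ." here). build_lists is a hypothetical name; note that every symbol counts toward the phrase length.

```python
def build_lists(words: list[str], upper_limit: int) -> dict[int, list[str]]:
    """Hypothetical: return {n: n-word phrases in order of appearance} for n = 1..upper_limit.

    n = 1 is the word list. Symbols and numbers count as one word each, and
    phrases are stored as space-joined strings (an assumption).
    """
    return {
        n: [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
        for n in range(1, upper_limit + 1)
    }


lists = build_lists(["the", "little", "boy", "."], upper_limit=3)
print(len(lists))  # 3
print(lists[3])    # ['the little boy', 'little boy .']
```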
  • based on the processing settings, the processing unit 333 deletes predetermined symbols extracted together with the words, as well as phrases containing those symbols (activity A7). For example, when the setting unit 332 is set to delete symbols such as commas, periods, question marks, and double quotation marks, the processing unit 333 deletes these symbols from the intermediate data T13 to generate the intermediate data T14. For the phrase list of two-word phrases, as shown in FIG. 6, the processing unit 333 deletes phrases in the intermediate data T15 such as "boy." and "street." and phrases containing "?", and generates the intermediate data T16.
  • the word list (for example, intermediate data T14) and the phrase lists (for example, intermediate data T16) generated here are preferably generated in text format.
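Activity A7 treats the two kinds of list differently: symbols are removed from the word list, but a phrase containing a symbol is dropped as a whole. A minimal sketch, with the deletion set taken from the example setting above and the function names hypothetical:

```python
# Assumed deletion set from the example setting: comma, period, question mark, double quote.
DELETE_SYMBOLS = {",", ".", "?", '"'}


def drop_symbols(words):
    """Hypothetical: word list, remove the symbol tokens themselves."""
    return [w for w in words if w not in DELETE_SYMBOLS]


def drop_phrases_with_symbols(phrases):
    """Hypothetical: phrase list, remove every phrase (a tuple of words) containing a deleted symbol."""
    return [p for p in phrases if not any(w in DELETE_SYMBOLS for w in p)]


words = ["little", "boy", ".", "the", "street", "?"]
bigrams = [tuple(words[i:i + 2]) for i in range(len(words) - 1)]
print(drop_symbols(words))                 # ['little', 'boy', 'the', 'street']
print(drop_phrases_with_symbols(bigrams))  # [('little', 'boy'), ('the', 'street')]
```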
  • the counting unit 335 counts the number of occurrences of the words and of the phrases whose number of words is equal to or less than the set upper limit value (activity A8). Once the occurrences have been counted, the sort unit 336 sorts the counted words and phrases based on the number of occurrences. That is, the sort unit 336 arranges the words or phrases that appear in the sentence in descending order of the number of occurrences.
  • FIG. 7 is a diagram showing the sort result S produced by the sort unit 336. For example, in the sort result S10, the words included in the word list are arranged in descending order of the number of occurrences. In the sort result S11, the phrases included in the two-word phrase list are arranged in descending order of the number of occurrences.
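Counting and sorting (activity A8) map naturally onto Python's collections.Counter, whose most_common() returns items in descending order of count and orders equal counts by first occurrence. This is an illustrative sketch, not the device's actual code; rank is a hypothetical name.

```python
from collections import Counter


def rank(items):
    """Hypothetical: count occurrences and return (item, count) pairs in
    descending order of count. Counter.most_common() orders equal counts
    by first occurrence."""
    return Counter(items).most_common()


print(rank(["he", "is", "he", "a", "he", "is"]))  # [('he', 3), ('is', 2), ('a', 1)]
```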
  • the duplicate deletion unit 337 determines the adopted phrase 4 by deleting a part of the plurality of phrases (Activity A9).
  • the adopted phrase 4 is preferably determined based on the number of words included in the phrase. More specifically, the adopted phrase 4 is preferably the phrase with the maximum or the minimum number of words. Whether the adopted phrase 4 is the phrase with the maximum or the minimum number of words, and how many words it contains, are determined by the processing settings of the sentence data T1 made by the setting unit 332.
  • FIG. 8 is a diagram showing the sort result S before and after the duplicate deletion.
  • FIG. 8 shows an example in which the upper limit value is set to 3 and the phrase with the maximum number of words is set as the adopted phrase 4. For example, referring to the sort results S12 to S14 before duplicate deletion, when the word "he" appears three times, the phrase "he is" twice, and the phrase "he is a student" twice in the sentence, the duplicate deletion unit 337 deletes "he" and "he is", which are duplicated across the sort results S, determines "he is a student" as the adopted phrase 4, and generates the sort results S15 to S17.
  • the duplicate deletion unit 337 deletes the rest of the plurality of phrases except for one adopted phrase 4.
  • referring to FIG. 8 as an example in which the phrase with the largest number of words is set as the adopted phrase 4, the phrases included in the sort results S12 and S13 are deleted, except for the sort result S14, which has the maximum number of words. That is, the phrases included in the sort result S14 are preferentially kept as the adopted phrase 4. More specifically, comparing the sort results S before and after duplicate deletion shows that "he", "is", "a", "he is", and "is a" are deleted because each of them is contained in a phrase of the sort result S14.
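One simplified reading of the maximum-word-count rule is: delete any phrase that occurs as a contiguous run of words inside a longer phrase that is also present. The sketch below applies that reading to the FIG. 8 example. It ignores occurrence counts and any further conditions the actual device may apply, and keep_longest is a hypothetical name.

```python
def contains(longer: tuple, shorter: tuple) -> bool:
    """True if `shorter` occurs as a contiguous run of words inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def keep_longest(ranked):
    """Hypothetical: drop every phrase contained in a longer phrase that is
    also in the ranking.

    `ranked` is a list of (phrase_tuple, count) pairs; surviving pairs keep
    their original order.
    """
    phrases = [p for p, _ in ranked]
    return [
        (p, c) for p, c in ranked
        if not any(len(q) > len(p) and contains(q, p) for q in phrases)
    ]


ranked = [(("he",), 3), (("he", "is"), 2), (("he", "is", "a", "student"), 2), (("good",), 1)]
print(keep_longest(ranked))  # [(('he', 'is', 'a', 'student'), 2), (('good',), 1)]
```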
  • the number of words included in the adopted phrase 4 has a lower limit of 1 and no upper limit, but it is preferably 3 or more and 20 or less. Specifically, it is, for example, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20, and may be within a range between any two of the values exemplified here.
  • for example, when the number of words included in the adopted phrase 4 is set to 4 and phrases with 3 to 20 words each contain the same combination of words, the phrases with 3 words and with 5 to 20 words are deleted, and the phrase with 4 words is kept as the adopted phrase 4.
  • in this way, a phrase with a number of words suitable for language learning is preferentially kept as the adopted phrase 4, so the learner can learn the language more efficiently.
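The fixed-length variant (keep the 4-word phrase, delete overlapping 3-word and 5-to-20-word phrases) can be sketched in the same way. This is again an assumption about how "contain the same combination of words" is tested, namely contiguous containment in either direction; keep_target_length is a hypothetical name.

```python
def keep_target_length(phrases, target):
    """Hypothetical: keep target-length phrases and drop other-length phrases
    that are contained in, or contain, some target-length phrase.
    Phrases unrelated to any target-length phrase are kept."""
    targets = [p for p in phrases if len(p) == target]

    def contains(longer, shorter):
        # True if `shorter` is a contiguous run of words inside `longer`.
        n = len(shorter)
        return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

    return [
        p for p in phrases
        if len(p) == target
        or not any(contains(t, p) or contains(p, t) for t in targets)
    ]


phrases = [
    ("he", "is", "a"),
    ("he", "is", "a", "student"),
    ("he", "is", "a", "student", "here"),
    ("good", "morning"),
]
print(keep_target_length(phrases, 4))  # [('he', 'is', 'a', 'student'), ('good', 'morning')]
```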
  • the sort unit 336 sorts the adopted phrase 4 based on the number of occurrences (activity A10). Then, when the natural language processing is not performed, the output unit 338 outputs such a result (activity A11). Then, the display unit 34 displays the output sort result S. As a result, the user can confirm the word or phrase that appears frequently in the sentence in the ranking format, and the learner can learn the language based on the objective data.
  • FIG. 9 is an example of the sort result S displayed on the display unit 34.
  • the sort result S18, the sort result S19, and the sort result S20 are the sort results S in the phrases of 2 words, 3 words, and 5 words, respectively.
  • the sort result S is displayed based on the display setting of the sort result S set by the setting unit 332.
  • the display setting of the sort result S is, for example, a setting for displaying only words and phrases having an appearance frequency of two or more times, and a setting for displaying only words and phrases having the top ten appearance frequencies.
  • the sort result S18 to the sort result S22 are examples of the sort result S displayed based on the setting of displaying only words and phrases having an appearance frequency of 3 times or more.
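The two example display settings amount to simple filters over an already-sorted ranking. In the sketch below, filter_for_display and its parameters min_count and top_n are hypothetical stand-ins for those settings.

```python
def filter_for_display(ranked, min_count=None, top_n=None):
    """Hypothetical: apply display settings to a ranking of (phrase, count)
    pairs already sorted in descending order of count."""
    shown = ranked
    if min_count is not None:
        # e.g. show only words and phrases appearing min_count times or more
        shown = [(p, c) for p, c in shown if c >= min_count]
    if top_n is not None:
        # e.g. show only the top ten by frequency
        shown = shown[:top_n]
    return shown


ranked = [("have to", 5), ("going to", 3), ("at the", 2), ("in a", 1)]
print(filter_for_display(ranked, min_count=3))  # [('have to', 5), ('going to', 3)]
print(filter_for_display(ranked, top_n=1))      # [('have to', 5)]
```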
  • FIG. 10 is an example of a diagram showing phrases before and after the conversion process by the processing unit 333.
  • the processing unit 333 converts phrases in which predetermined words in the sentence have been replaced with predetermined parts of speech into predetermined strings (activity A12). For example, the processing unit 333 converts "be going to verb" and "have to verb" into "be going to do" and "have to do" (see sort result S24).
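The conversion from part-of-speech patterns to display strings can be modeled as a lookup table. The TEMPLATES mapping and the function to_template below are hypothetical and hold only the two examples given in the text; unmatched phrases pass through unchanged.

```python
# Hypothetical mapping from part-of-speech patterns to display strings.
TEMPLATES = {
    ("be", "going", "to", "verb"): ("be", "going", "to", "do"),
    ("have", "to", "verb"): ("have", "to", "do"),
}


def to_template(phrase: tuple) -> tuple:
    """Replace a POS-tagged phrase with its display string, if one is defined."""
    return TEMPLATES.get(phrase, phrase)


print(" ".join(to_template(("have", "to", "verb"))))  # have to do
```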
  • the output unit 338 outputs the sort result S (activity A13). Even when natural language processing is performed, activity A12 may be skipped and the sort result S may be output with each word replaced by its predetermined part of speech, as in the sort result S22.
  • the sort unit 336 may sort the adopted phrases 4 based on the number of occurrences regardless of the number of words. Specifically, after activity A10 or activity A12, the processing unit 333 may combine the sort results S18 to S20, and the sort unit 336 may sort the combined result based on the number of occurrences before it is output in activity A11 or activity A13.
  • the sort result S21 is the result of sorting the adopted phrases 4 regardless of the number of words included in each phrase when the upper limit value is set to 5. That is, it is not a ranking of phrase occurrences for each number of words but an overall ranking in which phrases with different numbers of words appear together.
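Producing an overall ranking such as the sort result S21 amounts to merging the per-length rankings and sorting once more by count. A minimal sketch, assuming each per-length ranking is a list of (phrase, count) pairs keyed by word count; overall_ranking is a hypothetical name.

```python
from collections import Counter


def overall_ranking(rankings_by_length):
    """Hypothetical: merge per-length rankings ({n: [(phrase, count), ...]})
    into one ranking sorted by count regardless of the number of words."""
    total = Counter()
    for ranked in rankings_by_length.values():
        for phrase, count in ranked:
            total[phrase] += count
    return total.most_common()


per_length = {2: [("have to", 4), ("going to", 2)], 3: [("be going to", 3)]}
print(overall_ranking(per_length))
# [('have to', 4), ('be going to', 3), ('going to', 2)]
```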
  • with the language learning support device 3, it is possible to generate a ranking of words and phrases based on their frequency of actual use, using the sentence data T1 as input. Using such rankings makes it possible to create language learning materials based on more objective data, which is expected to contribute to language learning support. Further, because the present embodiment uses the language learning support device 3 with the dedicated program installed, it can be used even in an offline environment and is suitable for handling a large amount of sentence data T1.
  • FIG. 11 is a diagram showing an outline of the configuration of the system 1 according to the present embodiment.
  • the system 1 includes a terminal 2 and a language learning support device 3, which are configured to be communicable through a telecommunication line.
  • the terminal 2 may take any form, such as a smartphone, a tablet terminal, or a computer, as long as it can access the language learning support device 3 through a telecommunication line.
  • the terminal 2 has a communication unit, a storage unit, a control unit, a display unit, and an input unit, and these components are electrically connected to each other inside the terminal 2 via a communication bus.
  • for the communication unit, storage unit, control unit, display unit, and input unit of the terminal 2, refer to the descriptions of the communication unit 31, storage unit 32, control unit 33, display unit 34, and input unit 35 of the language learning support device 3 given in Section 1.
  • the language learning support device 3 has a communication unit 31, a storage unit 32, and a control unit 33 as its hardware configuration, and these components are electrically connected inside the language learning support device 3 via the communication bus 30.
  • the language learning support device 3 (control unit 33) in the second embodiment includes, as its functional configuration, a reception unit 331, a setting unit 332, a processing unit 333, a word extraction unit 334, a counting unit 335, a sort unit 336, a duplicate deletion unit 337, and an output unit 338.
  • the reception unit 331 is configured to receive various information from the terminal 2 used by the user via the network and the communication unit 31. Specifically, the reception unit 331 of the language learning support device 3 receives the sentence data T1 and the information regarding its processing settings from the terminal 2.
  • the output unit 338 outputs the sort result S, which is displayed on the display unit of the terminal 2.
  • the output unit 338 may generate only the rendering information for displaying the sort result S on the terminal 2.
  • with the system 1, it is possible to generate a ranking of words and phrases based on their frequency of actual use by inputting the sentence data T1.
  • by using such rankings, it is possible to create language learning materials based on more objective data, which is expected to contribute to language learning support.
  • in addition, because users can access the language learning support device 3, an external server, via the terminal 2, many users can generate rankings more affordably.
  • FIG. 12 is a functional block diagram showing the functions of the language learning support device 3 according to the third embodiment. In the third embodiment, the language learning support device 3 further includes a calculation unit 339.
  • FIG. 13 is an activity diagram showing the flow of operation of the language learning support device 3. Hereinafter, each activity in FIG. 13 will be described.
  • the user may use the input unit 35 to read the phrase group F, the frequency data T5, and the weighted data T6 into a dedicated program pre-installed in the language learning support device 3 as input data. Further, at this time, the user may read the information related to the weighting processing condition as input data.
  • the reception unit 331 receives these input data (activity A101). In particular, the reception unit 331 receives the phrase group F and the frequency data T5.
  • the input data received by the reception unit 331 is stored in the storage unit 32.
  • FIG. 14 is a diagram showing an example of input data.
  • the phrase group F includes a plurality of phrases, each consisting of one word or of two or more words.
  • the duplicate deletion unit 337 may delete the duplicate phrases.
  • Frequency data T5 is data indicating the number of times a word or phrase appears. Specifically, the frequency data T5 associates each word or phrase with its number of occurrences. In the example of FIG. 14, "This" appears 10 times and "is" appears 20 times.
  • the duplicate deletion unit 337 may delete the duplicate after adding up the number of occurrences of the duplicate word or phrase.
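The two duplicate-handling options above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes the phrase group F is a list of strings and that the frequency data T5 arrives as (word or phrase, count) rows, and the function names `dedupe_phrases` and `merge_frequency_rows` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def dedupe_phrases(phrase_group: Iterable[str]) -> list[str]:
    """Remove duplicate phrases from a phrase group, keeping first-seen order."""
    # dict.fromkeys keeps insertion order and silently drops repeated keys.
    return list(dict.fromkeys(phrase_group))


def merge_frequency_rows(rows: Iterable[tuple[str, int]]) -> dict[str, int]:
    """Collapse duplicate word/phrase rows by summing their occurrence counts."""
    merged: Counter = Counter()
    for phrase, count in rows:
        merged[phrase] += count
    return dict(merged)


# Hypothetical input data.
print(dedupe_phrases(["this is good", "now", "this is good"]))
print(merge_frequency_rows([("this", 6), ("is", 20), ("this", 4)]))
```

Summing before deleting means no occurrences are lost when the same entry appears on several rows.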
  • The weighted data T6 is data indicating the weight of specific words or phrases. In the example of FIG. 14, among the words and phrases included in the frequency data T5, "good", "this is", and "now" are given weights of 5, 2, and 4, respectively. The information related to the weighting processing conditions is, for example, information on a condition for uniformly weighting a specific type of word or phrase.
  • The calculation unit 339 calculates the appearance degree F4 of each word or phrase included in the phrase group F based on the phrase group F and the frequency data T5 (A102). Specifically, with reference to the frequency data T5, the calculation unit 339 calculates the appearance degree F4 from the total number of appearances of each word or phrase included in the phrase group F.
  • FIG. 15 is a diagram showing an example of a processing result by the control unit 33.
  • The intermediate data T17 is an example of the calculated appearance degree F4a.
  • For example, the calculation unit 339 takes 65, the sum of the numbers of appearances of the words "this", "is", and "good" contained in "this is good" (10, 20, and 30 times, respectively) and the number of appearances of the phrase "this is" contained in "this is good" (5 times). It divides this total by 3, the number of words in the phrase, and obtains 21.67 as the appearance degree F4a of "this is good".
  • The calculation unit 339 rounds the result when calculating the appearance degree F4a. Specifically, for example, the calculation unit 339 rounds the value off at the third decimal place (that is, to two decimal places) to obtain the appearance degree F4a. In this way, the appearance degree F4a is calculated for every phrase included in the phrase group F.
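A minimal Python sketch of the unweighted calculation follows. It assumes that "each word or phrase included in" a phrase means every contiguous word sequence of that phrase (single words up to the whole phrase) and that sequences absent from T5 count as zero; the helper names are hypothetical. Half-up rounding via `decimal` is used to match rounding off at the third decimal place, since Python's built-in `round` rounds exact halves to even.

```python
from decimal import Decimal, ROUND_HALF_UP


def contiguous_spans(words: list[str]) -> list[str]:
    """All contiguous word sequences of a phrase, from single words to the full phrase."""
    n = len(words)
    return [" ".join(words[i:j]) for i in range(n) for j in range(i + 1, n + 1)]


def appearance_degree(phrase: str, frequency: dict[str, int]) -> float:
    """Sum the T5 counts of every span, divide by the word count, round half-up to 0.01."""
    words = phrase.split()
    # Spans missing from the frequency data contribute 0 (e.g. "is good").
    total = sum(frequency.get(span, 0) for span in contiguous_spans(words))
    value = Decimal(total) / Decimal(len(words))
    return float(value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


# Counts taken from the worked example for "this is good".
t5 = {"this": 10, "is": 20, "good": 30, "this is": 5}
print(appearance_degree("this is good", t5))  # 21.67
```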
  • The calculation unit 339 may also calculate the appearance degree F4 after weighting specific words or phrases based on the weighted data T6 and the information related to the weighting processing conditions.
  • With reference to FIGS. 14 and 15, an example in which the calculation unit 339 calculates the appearance degree F4b of "this is good" based on the weighted data T6 will now be described.
  • The intermediate data T18 is an example of the calculated appearance degree F4b.
  • In the weighted data T6, a weight of 3 is given to "good", which is contained in "this is good". Since the frequency data T5 gives 30 appearances for "good", the calculation unit 339 multiplies this by 3 and treats "good" as appearing 90 times when calculating F4b. That is, the calculation unit 339 divides 125, the sum of the numbers of appearances of the words "this", "is", and "good" contained in "this is good" (10, 20, and 90 times, respectively) and the number of appearances of the phrase "this is" (5 times), by 3, the number of words in the phrase, and obtains 41.67 as the appearance degree F4b of "this is good".
  • In this way, the appearance degree F4b is calculated for every phrase included in the phrase group F.
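Extending the same idea with per-entry weights reproduces the weighted figure. As before, the contiguous-span interpretation and the function name are assumptions; the weight table here contains only the value 3 for "good", because that is the only weight the worked example actually applies.

```python
from decimal import Decimal, ROUND_HALF_UP


def weighted_appearance_degree(
    phrase: str, frequency: dict[str, int], weights: dict[str, float]
) -> float:
    """Like the unweighted degree, but each span's count is multiplied by its weight."""
    words = phrase.split()
    n = len(words)
    total = Decimal(0)
    for i in range(n):
        for j in range(i + 1, n + 1):
            span = " ".join(words[i:j])
            count = frequency.get(span, 0)
            # Spans without an entry in the weight table keep a factor of 1.
            weight = weights.get(span, 1)
            total += Decimal(count) * Decimal(str(weight))
    value = total / Decimal(n)
    return float(value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


t5 = {"this": 10, "is": 20, "good": 30, "this is": 5}
t6 = {"good": 3}
print(weighted_appearance_degree("this is good", t5, t6))  # 41.67
```

Converting the weight through `str` keeps fractional weights such as 0.5 exact inside `Decimal`.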
  • The calculation unit 339 can calculate the appearance degree F4 by various methods. For example, instead of dividing by 3, the number of words in the phrase, the calculation unit 339 may divide or multiply the total number of occurrences of the words and phrases included in the phrase group F by a preset value, or may use that total as-is as the appearance degree F4b. As another example, the calculation unit 339 may apply an arithmetic operation (addition, subtraction, multiplication, or division) to that total using the number of words in the longest phrase included in the phrase group F.
  • For example, the calculation unit 339 may perform natural language processing and then calculate the appearance degree F4 after doubling the numbers of appearances of the verbs included in the frequency data T5.
  • Similarly, the calculation unit 339 may calculate the appearance degree F4 after multiplying the numbers of appearances of the nouns included in the frequency data T5 by 0.5.
  • In this way, the calculation unit 339 can calculate the appearance degree F4 while freely changing the magnitude of the weights according to the conditions.
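The alternative calculations can be illustrated with two small hypothetical helpers. The patent does not specify which natural language processing is used, so a plain dictionary of part-of-speech tags stands in for a real tagger; the tag names, multipliers, and sample counts are illustrative assumptions.

```python
from typing import Callable, Optional


def pos_weighted_counts(
    frequency: dict[str, int],
    pos_of: Callable[[str], str],
    multipliers: dict[str, float],
) -> dict[str, float]:
    """Scale each count by a multiplier chosen from the entry's part-of-speech tag."""
    # Tags without a configured multiplier keep a factor of 1.0.
    return {item: count * multipliers.get(pos_of(item), 1.0) for item, count in frequency.items()}


def degree_from_total(total: float, divisor: Optional[float] = None) -> float:
    """Normalise a summed count by `divisor`, or return the raw total when divisor is None."""
    return total if divisor is None else total / divisor


# Hypothetical tags standing in for the output of a real NLP tagger.
TAGS = {"run": "VERB", "dog": "NOUN"}
t5 = {"run": 4, "dog": 6, "the": 10}
scaled = pos_weighted_counts(t5, lambda w: TAGS.get(w, "OTHER"), {"VERB": 2.0, "NOUN": 0.5})
print(scaled)  # {'run': 8.0, 'dog': 3.0, 'the': 10.0}
print(degree_from_total(21.0), degree_from_total(21.0, 3))  # 21.0 7.0
```

The `divisor` can be the phrase's word count, a preset constant, or the word count of the longest phrase in F, for example `max(len(p.split()) for p in phrase_group)` where `phrase_group` is a hypothetical list of phrases.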
  • The sort unit 336 sorts the phrases based on the appearance degree F4 (A103). Specifically, the sort unit 336 sorts the phrases included in the phrase group F in descending order of the appearance degree F4.
  • The sort result S25 is an example of the phrase group F sorted based on the appearance degree F4a.
  • The sort result S26 is an example of the phrase group F sorted based on the appearance degree F4b. As a result, the phrases with the most occurrences, as calculated from the frequency data T5, are presented in ranking format.
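Sorting into a ranking amounts to a descending sort on F4. The sketch below is an assumption about presentation (the patent only states that phrases are sorted in descending order of F4): the hypothetical `rank_phrases` also attaches a 1-based rank, and because Python's sort is stable even with `reverse=True`, phrases with equal degrees keep their input order.

```python
def rank_phrases(degrees: dict[str, float]) -> list[tuple[int, str, float]]:
    """Return (rank, phrase, degree) rows ordered by descending appearance degree."""
    ordered = sorted(degrees.items(), key=lambda item: item[1], reverse=True)
    return [(rank, phrase, degree) for rank, (phrase, degree) in enumerate(ordered, start=1)]


# Hypothetical appearance degrees (only 41.67 comes from the worked example).
f4 = {"now": 12.0, "this is good": 41.67, "this is": 17.5}
for rank, phrase, degree in rank_phrases(f4):
    print(rank, phrase, degree)
```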
  • The sort unit 336 may also sort the phrases counted by the counting unit 335 based on the appearance degree F4.
  • The output unit 338 outputs the sort result S25 or the sort result S26 (A104). This completes the information processing in the third embodiment.
  • (1) An aspect of the present embodiment may be a program. This program causes a computer to function as the language learning support device 3. (2) The program may be pre-installed in the language learning support device 3, or it may be installed on a computer afterward so that the computer functions as the language learning support device 3. (3) An aspect of the present embodiment may be an information processing method.
  • The information processing method includes a word extraction step, a count step, and a sort step.
  • In the word extraction step, the words contained in a sentence are extracted.
  • In the count step, the numbers of occurrences of words and phrases are counted within the set upper limit.
  • A phrase is a unit that treats a plurality of extracted words as a combination of words arranged in the order in which they appear in the sentence.
  • In the sort step, the counted phrases are sorted based on their numbers of occurrences.
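The three steps of the method can be sketched end to end as below. Assumptions: English input, a simple lowercase letters-and-apostrophes regex as the word extractor (the patent does not specify one), and phrases counted only within a single sentence; `max_words` plays the role of the set upper limit, and all function names are hypothetical.

```python
import re
from collections import Counter


def extract_words(sentence: str) -> list[str]:
    """Word extraction step: lowercase the text and pull out runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())


def count_phrases(sentences: list[str], max_words: int) -> Counter:
    """Count step: count every word sequence of 1..max_words words, in sentence order."""
    counts: Counter = Counter()
    for sentence in sentences:
        words = extract_words(sentence)
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts


def sort_phrases(counts: Counter) -> list[tuple[str, int]]:
    """Sort step: order words and phrases by descending number of occurrences."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical input sentences.
sentences = ["This is good.", "This is a pen.", "Now this is good!"]
ranking = sort_phrases(count_phrases(sentences, max_words=3))
print(ranking[:3])  # [('this', 3), ('is', 3), ('this is', 3)]
```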
  • The language learning support device further includes a setting unit. The setting unit is configured to set an upper limit on the number of words included in a phrase, and the counting unit is configured to count the numbers of occurrences of words and of phrases containing no more than the set upper limit of words.
  • The language learning support device further includes a duplicate deletion unit. When a plurality of the phrases include the same combination of words, the duplicate deletion unit is configured to delete some of those phrases to determine an adopted phrase, and the sort unit is configured to sort the adopted phrases based on their numbers of occurrences.
  • The duplicate deletion unit is configured to delete all of the plurality of phrases except one adopted phrase.
  • The adopted phrase is determined based on the number of words included in the phrase.
  • The adopted phrase is the phrase with the maximum or minimum number of words.
  • The number of words included in the adopted phrase is between 3 and 20, inclusive.
  • The language learning support device further includes a reception unit and a calculation unit. The reception unit is configured to receive a phrase group and frequency data, and the phrase group includes the words or the phrases.
  • The frequency data is data indicating the numbers of occurrences of the words or the phrases.
  • The calculation unit is configured to calculate, based on the phrase group and the frequency data, the appearance degree of each word or phrase included in the phrase group.
  • The sort unit is configured to sort the phrases counted by the counting unit based on the appearance degree.
  • In the count step, the numbers of occurrences of the words and phrases are counted, a phrase being a unit that treats the extracted words as a combination of words arranged in the order in which they appear in the sentence, and in the sort step, the counted phrases are sorted based on their numbers of occurrences.

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The object of the present invention is to provide a technology that makes it possible to create language learning materials based on objective data. One aspect of the present invention relates to a language learning support device. The language learning support device comprises a word extraction unit, a counting unit, and a sort unit. The word extraction unit is configured to extract the words contained in sentences. The counting unit is configured to count the numbers of occurrences of words and phrases. A phrase is a unit used to treat a plurality of the extracted words as a combination of a plurality of words arranged in the order in which they appear in the sentences. The sort unit is configured to sort the counted phrases based on the number of times they appear.
PCT/JP2021/006599 2020-06-05 2021-02-22 Language learning support device, program, and information processing method WO2021245997A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020098953A JP2023110106A (ja) 2020-06-05 2020-06-05 Language learning support device, program, and information processing method
JP2020-098953 2020-06-05

Publications (1)

Publication Number Publication Date
WO2021245997A1 (fr) 2021-12-09

Family

ID=78830783

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/006599 WO2021245997A1 (fr) 2020-06-05 2021-02-22 Language learning support device, program, and information processing method

Country Status (2)

Country Link
JP (1) JP2023110106A (fr)
WO (1) WO2021245997A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003066828A (ja) * 2001-08-28 2003-03-05 Techno Link:Kk Method for determining the difficulty of foreign-language text, device therefor, recording medium, and program
JP2008282407A (ja) * 2007-05-11 2008-11-20 Sony United Kingdom Ltd Information processing device
JP2015102914A (ja) * 2013-11-21 2015-06-04 日本電信電話株式会社 Method for training a model for determining non-understood sentences, method for determining non-understood sentences, device, and program
WO2017060994A1 (fr) * 2015-10-07 2017-04-13 株式会社日立製作所 System and method for controlling information presented to a user referring to content

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "'English word counter' that counts the number of English words", STABUCKY, 3 June 2020 (2020-06-03), XP055880402, Retrieved from the Internet <URL:https://stabucky.com/wp/archives/2193> *

Also Published As

Publication number Publication date
JP2023110106A (ja) 2023-08-09

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21817318

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21/03/2023)

NENP Non-entry into the national phase

Ref country code: JP

122 EP: PCT application non-entry in European phase

Ref document number: 21817318

Country of ref document: EP

Kind code of ref document: A1