WO2021093968A1

WO2021093968A1 - Computerized system and method of using word embedding for generating a list of words personalized to the learning needs of a user

Info

Publication number: WO2021093968A1
Application number: PCT/EP2019/081498
Authority: WO
Inventors: Shijun Zhou; Enio Ohmaye
Original assignee: Signum International Ag
Priority date: 2019-11-15
Filing date: 2019-11-15
Publication date: 2021-05-20

Abstract

There is provided a computerized system (100) and method using word embedding for generating a list of words personalized to the learning needs of a user of the system (100) at a given time instance (t), the words on the list being selected from a plurality of words each represented as an M-dimensional numeric vector (V_WORD) having a position (P_WORD) in an M-dimensional word embedding space, by obtaining a first input signal (S_{INPUT_1}) indicative of user specific system initialization settings; initializing the system (100), by assigning a respective score value (S_MASTER) to each numeric vector (V_WORD), based on the first input signal (S_{INPUT_1}) and a predetermined set of rules obtained from the memory (120); calculating the position of the center of mass (P_CM) of the M-dimensional word embedding space, at the initial time instance (t_INITIAL), based on the respective positions (P_WORD) and score value (S_MASTER) of the numeric vectors (V_WORD) comprised in the M-dimensional word embedding space; and generating a list of words personalized to the learning needs of a user based on the respective distances from the position (P_WORD) of each numeric vector (V_WORD) to the calculated position of the center of mass (P_CM).

Description

COMPUTERIZED SYSTEM AND METHOD OF USING WORD EMBEDDING FOR GENERATING A LIST OF WORDS PERSONALIZED TO THE LEARNING NEEDS OF A USER

TECHNICAL FIELD

The present disclosure relates to a computerized system, computerized method and computer program product using word embedding for automatically generating a list of words personalized to the learning needs of a user, selected from a corpus of words represented as vectors in an M-dimensional word embedding space.

BACKGROUND

In learning a language, whether it is a person's first language, a second language or other, learning the vocabulary of the language is crucial to being able to understand and use the language. Over the past decades, the study patterns of learners have evolved from being in a stationary classroom performing assignments using pen and paper, to including or completely transitioning to digital learning applications accessible via computers or smart devices, e.g. smart phones. With this change there has also been a change from the previous approach of studying with the aid of a single teacher to having access to a combination of one or more human instructors and specifically developed and often adaptive computer systems, based on knowledge obtained by language learning experts.

A common way for such a specifically developed adaptive computer system to assist a learner in his/her learning process is to make the learner aware of, and remediate, the mistake areas of a currently studied learning point of a certain language. The nature of such systems is thus to prompt a learner to "redo what you have not done well" by, e.g., going back in a provided static list of words and repeating the assignment with regard to the words where an erroneous answer was given by the user/learner. An example of a known repetition method implemented in computerized language learning systems is presenting the learner with the same word at certain intervals with the goal of the learner eventually memorizing the word. This is commonly referred to as "spaced repetition".

The above presented methods have limitations, one of them being that the content is more or less static and that the learning process is not well adapted/personalized to the needs of the individual learner. There is a need for providing a customized/personalized learning path for each individual student, as each of them has different needs and progresses differently.

SUMMARY

An object of the present disclosure is to address at least one of the issues described above.

The inventors have realized that in order to provide a customized/personalized learning path for each individual student, an improved computerized system and method must be provided that enable users/learners to expand their knowledge outside of the currently studied learning points, so the users learn not only from their mistakes, but are enabled to learn something new. For this purpose, previous solutions using e.g. spaced repetition or making learners repeat assignments for only the words that they have already studied before moving on to a "new chapter" or the like will not suffice. These previous solutions do not enable the learner to learn something new, and specifically do not introduce any new vocabulary information/words personalized to the learning needs of the user. What is needed, the inventors realized, is to enable at each time instance the generation of a list of words, a recommendation, personalized to the learner that will be the natural next step to take in terms of expanding their vocabulary and the understanding of the words in it.

The previously known systems do not provide any satisfying solution to this problem. From this realization the inventors, having good knowledge in the science of the human brain, pondered the prospect of using word embedding to achieve an improved computerized vocabulary learning system and method.

Word embedding may be used for calculating how similar a piece of text is to another piece of text, or how similar a word is to another word, in a high dimensional word embedding space wherein each dimension represents a property of the word and the word is represented in the word embedding space as a vector comprising a set of numeric values, one for each dimension of the word embedding space.

Such a high dimensional word embedding space typically comprises hundreds, or more, dimensions. It is hence not possible for the human mind to produce the data of the word embedding space. To obtain the needed word embedding space, the inventors used a large corpus of language learning information to train a machine learning algorithm to perform the word embedding. Based on the training data provided, the machine learning algorithm was configured to set the distance between words in the high dimensional word embedding space dependent on the similarity of the words according to the property represented in that dimension. In other words, the closer two words (vectors representing words) are in the word embedding space, the more similar they are deemed to be. The similarity may mean that they are related in meaning, appear in a similar context in the training data, etc.
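By way of a non-limiting illustration, the notion that closer vectors represent more similar words can be sketched as follows. The three-dimensional toy vectors, the words chosen and the function names are hypothetical and serve only as an example. The disclosure does not prescribe a particular distance measure, so both the Euclidean distance and the cosine similarity are shown as possible choices.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings with M = 3; a real space has hundreds of dimensions.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

# "cat" lies closer to "dog" than to "car" under both measures.
print(euclidean_distance(embeddings["cat"], embeddings["dog"]))
print(euclidean_distance(embeddings["cat"], embeddings["car"]))
print(cosine_similarity(embeddings["cat"], embeddings["dog"]))
print(cosine_similarity(embeddings["cat"], embeddings["car"]))
```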

For the same reason, the sheer vastness of information available, it is not possible for the human mind to process the data in the word embedding space, especially not to comprehend the relationship between words in the large number of dimensions available, to generate a comprehensive list of words recommended for study by, i.e. personalized to the learning needs of, a learner based on the word embedding information.

Having come this far in the inventive process, the inventors further realised that the word embedding space thus generated may suitably be used for dynamically generating personalized recommendations for vocabulary training, e.g. in the form of a list of words suggested for study, if the vocabulary comprehension of the individual learner could be determined, and possibly tracked, in relation to the words represented in the word embedding space.

The list of words may then be presented to the user for self-study or digitally assisted study, or used as input to the same or a different computerized system configured to provide digital language learning assignments based on the recommended words on the list. However, the task essential to enabling any of these aims is to determine, and possibly track, the vocabulary comprehension of the individual learner in relation to the words represented in the word embedding space and to use this knowledge for generating the recommended word list.

In embodiments described herein, this object is achieved by an end-to-end specialized adaptive system, and corresponding computerized method, using word embedding in a high dimensional word space to not only remediate the vocabulary that is not being mastered, but also adaptively progress the learners towards new parts of the vocabulary, wherein the new parts are selected personally for the learner, based on the learner's preferences and personalized by the specialized adaptive system based on knowledge of the workings of the human brain.

The invention is defined by the appended claims.

According to a first aspect of the invention, there is provided a computerized system using word embedding for generating a list of words personalized to the learning needs of a user of the system at a given time instance, the words on the list being selected from a plurality of words each represented as an M-dimensional numeric vector having a position in an M-dimensional word embedding space, the system comprising processing circuitry and a memory configured to communicate with the processing circuitry. The processing circuitry is configured to obtain, via a first interface, a first input signal indicative of user specific system initialization settings and to initialize the system, by assigning a respective score value to each numeric vector, based on the first input signal and a predetermined set of rules obtained from the memory and calculating the position of the center of mass of the M-dimensional word embedding space, at the initial time instance, based on the respective positions and score value of the numeric vectors comprised in the M-dimensional word embedding space. The processing circuitry is further configured to generate a list of words personalized to the learning needs of a user based on the respective distances from the position of each numeric vector to the calculated position of the center of mass.

The processing circuitry may further be configured to, repeatedly: obtain, via the first interface, a second input signal indicative of user input related to one or more of the numeric vectors comprised in the M-dimensional word embedding space, at a current time instance; adjust the settings of the system by updating the respective score value assigned to each of the one or more numeric vectors, based on the second input signal and the predetermined set of rules and calculating an updated position of the center of mass of the M-dimensional word embedding space, at the current time instance, based on the respective positions and updated score value of the numeric vectors comprised in the M-dimensional word embedding space; and generate an updated list of words personalized to the learning needs of a user based on the respective distances from the position of each numeric vector to the calculated updated position of the center of mass.

Embodiments described herein thereby solve the limitation of spaced repetition, by enabling the learners to expand their vocabulary by providing new relevant vocabulary information based on the vocabulary comprehension of the learner. Suitably, this enables learners to learn vocabulary faster and more efficiently by providing personalized recommendations of words to focus on next, adapted to the individual learner/user of the system.

In one or more embodiments, the processing circuitry is configured to, before generating the list of words, or the updated list of words: apply a filter mask centered at the position of the calculated center of mass of the M-dimensional word embedding space; and determine a subset of numeric vectors comprising the numeric vectors that are inside the filter mask.

The processing circuitry may be configured to set the length of the list of words based on user input received via the first user interface or a second user interface or an input device connected to the system.

The memory may be configured to, for each time instance, store information on the calculated position of the center of mass and the respective associated time instance at which the position of the center of mass was calculated. In these embodiments, the processing circuitry may further be configured to, for two or more of the time instances for which information has been stored: retrieve information on the calculated position of the center of mass and the respective associated time instance at which the position was calculated; and determine the change in the position of the center of mass in the M-dimensional word embedding space over time, based on the information on the two or more calculated positions of the center of mass and the respective associated time instances at which the positions were calculated. The processing circuitry may further be configured to generate the list of words, or the updated list of words, also based on the determined change in the position of the center of mass in the M-dimensional word embedding space over time.

The processing circuitry may further be configured to present a visualization of the determined change in the position of the center of mass in the M-dimensional word embedding space over time via the first user interface or the second user interface.

Advantageously, embodiments herein thereby provide the possibility to represent the vocabulary comprehension of a learner/user of the system, and possibly also to represent and/or track the progression of the vocabulary comprehension. The representation may be fed back into the system and be used as a basis for further personalization of future recommendations, and/or it may be visualized via a user interface comprised in or connected to the system. If tracking is performed, the system may be configured to determine, based on the tracking of a number of learners, optimal paths for learning for an individual learner, i.e. an optimal order of being presented with different parts of the vocabulary and/or suitable activities to perform, in order to optimize the learning progress/vocabulary comprehension progress of the learner.

According to a second aspect of the invention, there is provided a method, in a computerized system, of using word embedding for generating a list of words personalized to the learning needs of a user at a given time instance, the words on the list being selected from a plurality of words each represented as an M-dimensional numeric vector having a position in an M-dimensional word embedding space. The method comprises obtaining, via a first interface, a first input signal indicative of user specific system initialization settings and initializing, using processing circuitry, the system, by assigning a respective score value to each numeric vector, based on the first input signal and a predetermined set of rules; and calculating the position of the center of mass of the M-dimensional word embedding space, at the initial time instance, based on the respective positions and score value of the numeric vectors comprised in the M-dimensional word embedding space. The method further comprises generating, using the processing circuitry, a list of words personalized to the learning needs of a user based on the respective distances from the position of each numeric vector to the calculated position of the center of mass. 
In one or more embodiments, the method further comprises, repeatedly: obtaining, via the first interface, a second input signal indicative of user input related to one or more of the numeric vectors comprised in the M-dimensional word embedding space, at a current time instance; adjusting, using the processing circuitry, the settings of the system by: updating the respective score value assigned to each of the one or more numeric vectors, based on the second input signal and the predetermined set of rules; and calculating an updated position of the center of mass of the M-dimensional word embedding space, at the current time instance, based on the respective positions and updated score value of the numeric vectors comprised in the M-dimensional word embedding space; and finally generating, using the processing circuitry, an updated list of words personalized to the learning needs of a user based on the respective distances from the position of each numeric vector to the calculated updated position of the center of mass.

In one or more embodiments the method comprises, before generating the list of words, or the updated list of words: applying, using the processing circuitry, a filter mask centered at the position of the calculated center of mass of the M-dimensional word embedding space and determining, using the processing circuitry, a subset of numeric vectors comprising the numeric vectors that are inside the filter mask. In these embodiments, generating the list of words, or generating the updated list of words, comprises generating the list to only comprise words represented by numeric vectors in the determined subset of numeric vectors.

In some embodiments the method may further comprise setting, using the processing circuitry, the length of the list of words based on user input received via the first user interface or a second user interface or an input device connected to the system.

The method according to some embodiments comprises storing, in a memory of the system, information on the calculated position of the center of mass and the respective associated time instance at which the position of the center of mass was calculated. In these embodiments, the method may further comprise, for two or more of the time instances for which information has been stored: retrieving, using the processing circuitry, information on the calculated position of the center of mass and the respective associated time instance at which the position was calculated; and determining, using the processing circuitry, the change in the position of the center of mass in the M-dimensional word embedding space over time, based on the information on the two or more calculated positions of the center of mass and the respective associated time instances at which the positions were calculated. Generating the list of words, or the updated list of words, using the processing circuitry, may in these embodiments further be based on the determined change in the position of the center of mass in the M-dimensional word embedding space over time. The method of these embodiments may further comprise presenting, via the first user interface or the second user interface, a visualization of the determined change in the position of the center of mass in the M-dimensional word embedding space over time.

According to a third aspect of the invention, there is provided a computer program loadable into a memory communicatively connected or coupled to at least one data processor, comprising software for executing the method according to any of the method embodiments described herein when the program is run on the at least one data processor.

According to a fourth aspect of the invention, there is provided a processor-readable medium, having a program recorded thereon, where the program is to make at least one data processor execute the method according to any of the method embodiments described herein when the program is loaded into the at least one data processor.

The effects and/or advantages presented in the present disclosure for embodiments of the first aspect also apply to corresponding embodiments of the second, third and fourth aspect.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is now to be explained more closely by means of preferred embodiments, which are disclosed as examples, and with reference to the attached drawings.

Fig. 1 shows a schematic overview of a system according to one or more embodiments;

Fig. 2 is a flow chart of a computerized method for using word embedding for generating a list of words personalized to the learning needs of a user, according to one or more embodiments;

Fig. 3 is a flow chart of a computerized method for using word embedding for generating a list of words personalized to the learning needs of a user, according to one or more embodiments;

Fig. 4 is a flow chart of a computerized method for determining and possibly using information on a change in position of the center of mass over time, according to one or more embodiments;

Fig. 5 shows an oversimplified 2D representation of a word embedding space;

Figs. 6 to 7 show an illustrative example of center of mass calculation and updating in an oversimplified 2D representation of a word embedding space.

DETAILED DESCRIPTION

Introduction

Firstly, we provide some definitions of terms used herein.

In the M-dimensional word embedding space described for embodiments herein, M is in a non-limiting example an integer around 300, but it may in different implementations range from 50 or 100 to several thousands, depending on factors such as the number of properties relevant to describe the word in the embedding space and the computational capabilities of the system used.

An oversimplified 2D representation of a word embedding space (unmarked axes) is shown in Figure 5. In the word embedding space, words that are similar to each other, based on the properties defined for the words (included in the word vectors) and pre-set rules and conditions, are positioned close to each other, while words that are less similar by the same standards are positioned far from each other in the word embedding space.

The center of mass is the unique point at the center of a distribution of mass in space, here the word embedding space, that has the property that the weighted position vectors relative to this point sum to zero. In analogy to statistics, the center of mass is the mean location of a distribution of mass in space. In the case of a system of particles P_i, i = 1, ..., n, each with mass m_i, that are located in space with coordinates r_i, i = 1, ..., n, the coordinates R of the center of mass satisfy the condition:

$$\sum_{i=1}^{n} m_i (\mathbf{r}_i - \mathbf{R}) = \mathbf{0}$$

Solving this equation for R yields the formula:

$$\mathbf{R} = \frac{1}{M} \sum_{i=1}^{n} m_i \mathbf{r}_i$$

where M is the sum of the masses of all of the particles (in these two formulas only, M denotes the total mass and not the dimension of the word embedding space).

The mass of a vector V_WORD ("particle") in the word embedding space of this disclosure corresponds to a numeric "weight" determined based on the initial or current score value S_MASTER assigned to the vector V_WORD. The mass may e.g. be determined as mass = 1/log₂(S_MASTER), where S_MASTER is larger than or equal to 2 after initialization, to make sure that the mass is always positive and smaller than or equal to 1, but any suitable conversion function may be used that fulfils the condition that as the value of S_MASTER increases, the mass of a vector V_WORD having an assigned score value S_MASTER decreases. A non-limiting example is illustrated in Fig. 6, showing an oversimplified 2D representation of a word embedding space wherein the dots and circles represent the initialized words with different mastery score values. As described above, the higher the score value for a word is, the smaller its mass is. The triangle is the center of mass. When the system receives information that the learner/user of the system has learnt a new word, improved his/her understanding of a word, or has decayed in his/her knowledge of a word, the score values are updated according to embodiments herein, whereby the center of mass will move to a new position. This change in position of the center of mass is illustrated by the dashed arrow in Fig. 7.
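As a non-limiting sketch, the example conversion mass = 1/log₂(S_MASTER) and the center of mass formula may be combined as follows. The two-dimensional positions, the score values and the function names are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

def mass_from_score(score):
    """Example conversion from the text: mass = 1 / log2(S_MASTER), S_MASTER >= 2."""
    if score < 2:
        raise ValueError("this example conversion assumes S_MASTER >= 2")
    return 1.0 / math.log2(score)

def center_of_mass(positions, masses):
    """Mass-weighted mean position: R = (1 / M_total) * sum(m_i * r_i)."""
    total_mass = sum(masses)
    dims = len(positions[0])
    return [
        sum(m * p[d] for p, m in zip(positions, masses)) / total_mass
        for d in range(dims)
    ]

# Two hypothetical word vectors in an oversimplified 2D space.
positions = [[0.0, 0.0], [4.0, 0.0]]
scores = [2, 4]  # the second word is better mastered, so it weighs less
masses = [mass_from_score(s) for s in scores]  # 1.0 and 0.5
p_cm = center_of_mass(positions, masses)
print(masses, p_cm)
```

In this sketch the center of mass lies at x = 4/3, i.e. closer to the less mastered word, because that word carries the larger mass.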

In a non-limiting example, the score value S_MASTER assigned to each numeric vector V_WORD, based on the first input signal S_INPUT_1 and a predetermined set of rules according to embodiments herein, may be selected as one of the following values:

S_MASTER = 0: meaning that the learner/user of the system has not been presented with the word before.

S_MASTER = MIN: a preset minimum value > 0 representing a minimum score of mastery of the word.

MIN < S_MASTER < MASTER: representing that the learner/user of the system is learning the word. A suitable number of internal levels between MIN and MASTER may be applied, for example being represented as integers or float numbers.

S_MASTER = MASTER: a preset maximum value meaning the learner/user of the system has mastered the word.
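The score levels listed above may, purely as an illustration, be represented as follows. The concrete values MIN = 2 and MASTER = 16, as well as the constant and function names, are assumptions made for this sketch; the disclosure only requires a preset minimum greater than 0 and a preset maximum.

```python
S_UNSEEN = 0   # the word has not been presented to the learner before
S_MIN = 2      # hypothetical preset minimum score of mastery (MIN)
S_MAX = 16     # hypothetical preset maximum score (MASTER)

def mastery_status(score):
    """Map a score value S_MASTER to one of the four levels described in the text."""
    if score == S_UNSEEN:
        return "not presented"
    if score == S_MIN:
        return "minimum mastery"
    if S_MIN < score < S_MAX:
        return "learning"
    if score == S_MAX:
        return "mastered"
    raise ValueError(f"score {score} is outside the allowed range")

for s in (0, 2, 7.5, 16):
    print(s, mastery_status(s))
```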

System architecture

Figure 1 shows a schematic overview of a computerized system 100 using word embedding for generating a list of words personalized to the learning needs of a user of the system 100 at a given time instance t.

The words on the list are selected from a plurality of words each represented as an M-dimensional numeric vector V_WORD having a position P_WORD in an M-dimensional word embedding space. The system 100 comprises processing circuitry 110 and a memory 120 configured to communicate with the processing circuitry 110. The processing circuitry 110 is configured to obtain, via a first interface 130, a first input signal S_INPUT_1 indicative of user specific system initialization settings and to initialize the system 100, by assigning a respective score value S_MASTER to each numeric vector V_WORD, based on the first input signal S_INPUT_1 and a predetermined set of rules obtained from the memory 120 and calculating the position of the center of mass P_CM of the M-dimensional word embedding space, at the initial time instance t_INITIAL, based on the respective positions P_WORD and score value S_MASTER of the numeric vectors V_WORD comprised in the M-dimensional word embedding space. The processing circuitry 110 is further configured to generate a list of words personalized to the learning needs of a user based on the respective distances from the position P_WORD of each numeric vector V_WORD to the calculated position of the center of mass P_CM.

This solves the limitation of spaced repetition, by enabling learners to expand their vocabulary by providing new relevant vocabulary information based on the vocabulary comprehension of the learner. Suitably, this enables learners to learn vocabulary faster and more efficiently by providing personalized recommendations of words to focus on next, adapted to the individual learner/user of the system.
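A minimal sketch of generating the list from the distances to the center of mass is given below. The disclosure states only that the list is based on these distances; ranking the nearest words first, the use of the Euclidean distance, and all names and values in the example are assumptions made for illustration.

```python
import math

def recommend_words(positions, p_cm, list_length):
    """Return the list_length words whose vectors lie nearest to p_cm.

    Nearest-first ordering is an assumption of this sketch.
    """
    ranked = sorted(positions, key=lambda word: math.dist(positions[word], p_cm))
    return ranked[:list_length]

# Hypothetical word positions in an oversimplified 2D space.
positions = {
    "apple": [0.0, 0.0],
    "pear": [1.0, 0.0],
    "engine": [5.0, 5.0],
}
p_cm = [0.8, 0.0]  # assumed to have been calculated beforehand

word_list = recommend_words(positions, p_cm, list_length=2)
print(word_list)
```

Here "pear" (distance 0.2) is ranked before "apple" (distance 0.8), and "engine" falls outside the requested list length.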

In one or more embodiments, the processing circuitry 110 is configured to, repeatedly: obtain, via the first interface 130, a second input signal S_INPUT_2 indicative of user input related to one or more of the numeric vectors V_WORD comprised in the M-dimensional word embedding space, at a current time instance t_CURRENT; and adjust the settings of the system 100 by updating the respective score value S_MASTER assigned to each of the one or more numeric vectors V_WORD, based on the second input signal S_INPUT_2 and the predetermined set of rules and calculating an updated position of the center of mass P_CM of the M-dimensional word embedding space, at the current time instance t_CURRENT, based on the respective positions P_WORD and updated score value S_MASTER of the numeric vectors V_WORD comprised in the M-dimensional word embedding space. Thereafter the processing circuitry 110 is configured to, for each time the second input signal S_INPUT_2 is obtained and system settings adjusted, generate an updated list of words personalized to the learning needs of a user based on the respective distances from the position P_WORD of each numeric vector V_WORD to the calculated updated position of the center of mass P_CM.

Thereby, the personalized recommendations of words to focus on next are continuously adapted to the individual learner/user of the system, which further increases the relevance of the recommended words on the generated list to the user.
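The repeated adjustment may be illustrated by the following sketch, in which a single piece of user feedback changes one score value and the center of mass is recalculated. The feedback rule (doubling the score on a correct answer, halving it on an incorrect one), the bounds 2 and 16, and all names are hypothetical; the disclosure leaves the predetermined set of rules open.

```python
import math

def center_of_mass(positions, scores):
    """Mass-weighted mean position, using the example mass = 1 / log2(S_MASTER)."""
    masses = {w: 1.0 / math.log2(s) for w, s in scores.items()}
    total = sum(masses.values())
    dims = len(next(iter(positions.values())))
    return [sum(masses[w] * positions[w][d] for w in scores) / total
            for d in range(dims)]

def apply_feedback(scores, word, correct, s_min=2, s_max=16):
    """Hypothetical rule: double the score if correct, halve it otherwise."""
    updated = dict(scores)
    if correct:
        updated[word] = min(scores[word] * 2, s_max)
    else:
        updated[word] = max(scores[word] // 2, s_min)
    return updated

positions = {"apple": [0.0, 0.0], "engine": [4.0, 0.0]}
scores = {"apple": 2, "engine": 2}
before = center_of_mass(positions, scores)  # equal masses: midpoint

# Second input signal (assumed): the learner answered correctly on "apple".
scores = apply_feedback(scores, "apple", correct=True)
after = center_of_mass(positions, scores)
print(before, after)
```

Because the mass of "apple" drops from 1.0 to 0.5, the center of mass in this sketch moves from x = 2 towards the less mastered word "engine", to x = 8/3.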

In one or more embodiments, the processing circuitry 110 is configured to, before generating the list of words, or the updated list of words: apply a filter mask centered at the position of the calculated center of mass P_CM of the M-dimensional word embedding space; and determine a subset of numeric vectors V_WORD comprising the numeric vectors V_WORD that are inside the filter mask, wherein the processing circuitry 110 is further configured to generate the list of words, or generate the updated list of words, to only comprise words represented by numeric vectors V_WORD in the determined subset of numeric vectors V_WORD. The filter mask has the same dimension as the word embedding space and hence filters in all dimensions, using the same value/search radius for all dimensions, or differentiated values/search radii for different dimensions. The filter mask is pre-defined/pre-calculated. As the number of words/vectors to be considered is lowered by the filtering, the generation of the list of words is less computationally expensive and faster.
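One possible reading of the filter mask is sketched below as an axis-aligned box centered at the center of mass, with either a single search radius or one radius per dimension. The box shape, the function names and the example values are assumptions; the disclosure does not fix the geometry of the mask.

```python
def inside_mask(position, p_cm, radii):
    """True if the position is within the per-dimension radius of p_cm."""
    return all(abs(x - c) <= r for x, c, r in zip(position, p_cm, radii))

def filter_words(positions, p_cm, radius):
    """Keep only words inside the mask; radius is a number or one value per dimension."""
    if isinstance(radius, (int, float)):
        radii = [radius] * len(p_cm)
    else:
        radii = list(radius)
    return {w: p for w, p in positions.items() if inside_mask(p, p_cm, radii)}

# Hypothetical word positions in an oversimplified 2D space.
positions = {"a": [1.0, 1.0], "b": [3.0, 1.0], "c": [1.0, 4.0]}
p_cm = [1.0, 1.0]

same_radius = filter_words(positions, p_cm, 2.5)          # keeps "a" and "b"
per_dimension = filter_words(positions, p_cm, [1.0, 5.0])  # keeps "a" and "c"
print(sorted(same_radius), sorted(per_dimension))
```

Only the words remaining after the filtering would then need to be ranked by distance, which is where the reduction in computational cost arises.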

The processing circuitry 110 may be configured to set the length of the list of words based on user input received via the first user interface 130 or a second user interface 140 or an input device 150 connected to the system 100. Thereby, the user is enabled to select the length of the list of words to focus on and hence control the pace at which the learning progresses to suit the needs of the user.

The memory 120 may be configured to, for each time instance t_INITIAL, t_CURRENT, store information on the calculated position of the center of mass P_CM and the respective associated time instance t_INITIAL, t_CURRENT at which the position of the center of mass P_CM was calculated. Thereby, the vocabulary knowledge status of the learner can be determined at one or more given time instances.

The memory 120 may in these embodiments be configured to, for each time instance t_INITIAL, t_CURRENT, store information on the calculated position of the center of mass P_CM and the respective associated time instance t_INITIAL, t_CURRENT at which the position of the center of mass P_CM was calculated for more than one learner/user 155 connected to the system, whereby comparison of the vocabulary knowledge status of the learners at one or more given time instances is enabled.

In some embodiments, the memory may further be configured to store the score values S_MASTER assigned to each or a selection of the words represented by numeric vectors V_WORD in the word embedding system, or the determined subset of numeric vectors V_WORD, at the respective associated time instance t_INITIAL, t_CURRENT. Thereby a more granular determination of the vocabulary knowledge status of the learner can be made at one or more given time instances. If the memory is configured to store the score values S_MASTER for more than one learner/user 155 connected to the system in this manner, a more granular comparison of the vocabulary knowledge status of the learners at one or more given time instances is correspondingly enabled.

The processing circuitry 110 may further be configured to, for two or more time instances for which information has been stored: retrieve information on the calculated position of the center of mass P_CM and the respective associated time instance at which the position was calculated; and determine the change in the position of the center of mass P_CM in the M-dimensional word embedding space over time, based on the information on the two or more calculated positions of the center of mass P_CM and the respective associated time instances at which the positions were calculated. The processing circuitry 110 may in these embodiments be configured to generate the list of words, or the updated list of words, also based on the determined change in the position of the center of mass P_CM in the M-dimensional word embedding space over time. Alternatively, or additionally, the processing circuitry 110 may be configured to present a visualization of the determined change in the position of the center of mass P_CM in the M-dimensional word embedding space over time via the first user interface 130 or the second user interface 140. Advantageously, embodiments herein thereby provide the possibility to represent the vocabulary comprehension of the learner/user of the system 100, and possibly also to represent and/or track the progression of the vocabulary comprehension. The representation may be fed back into the system and be used as a basis for further personalization of future recommendations, and/or it may be visualized via a user interface comprised in or connected to the system. If tracking is performed, the system may be configured to determine, based on the tracking of a number of learners, optimal paths for learning for an individual learner, i.e. an optimal order of being presented with different parts of the vocabulary and/or suitable activities to perform, in order to optimize the learning progress/vocabulary comprehension progress of the learner.
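Determining the change in the position of the center of mass over time may, as a non-limiting sketch, be done from the stored (time instance, position) pairs as shown below. Representing the history as a time-ordered list, expressing the change as a displacement and an average rate of change, and the names used are assumptions of this example.

```python
def center_of_mass_drift(history):
    """Displacement and average rate of change between the first and last entry.

    history is a time-ordered list of (time_instance, position) pairs.
    """
    if len(history) < 2:
        raise ValueError("at least two stored time instances are required")
    (t_first, p_first), (t_last, p_last) = history[0], history[-1]
    elapsed = t_last - t_first
    if elapsed <= 0:
        raise ValueError("time instances must be strictly increasing")
    displacement = [b - a for a, b in zip(p_first, p_last)]
    rate = [d / elapsed for d in displacement]
    return displacement, rate

# Hypothetical stored positions of P_CM at t_INITIAL = 0.0 and two later instances.
history = [
    (0.0, [0.0, 0.0]),
    (1.0, [0.5, 1.5]),
    (2.0, [2.0, 4.0]),
]
displacement, rate = center_of_mass_drift(history)
print(displacement, rate)
```

The sequence of stored positions could likewise be plotted to obtain the visualization mentioned above; how the determined change is used when generating the list is not specified in the text and is therefore left out of the sketch.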

Method embodiments

Turning now to Figure 2, there is shown a method, in a computerized system 100, of using word embedding for generating a list of words personalized to the learning needs of a user at a given time instance t, the words on the list being selected from a plurality of words each represented as an M-dimensional numeric vector V_WORD having a position P_WORD in an M-dimensional word embedding space, the method comprising:

In step 210: obtaining, via a first interface 130, a first input signal S_INPUT_1 indicative of user specific system initialization settings.

The user specific system initialization settings may comprise results of a test performed by the user and input into the system via a digital learning environment (program application or the like). Alternatively, the user specific system initialization settings may be input to the system as manual input from the user, a teacher, or another interested party - e.g. by enabling selection of learning preferences in a displayed menu via a user interface or input as a signal from the system, or an external program application communicatively connected to the system, wherein the signal represents results of a placement test or the like. Alternatively, if no specific input has been made, the initialisation setting may comprise pre-set default values.

In step 220: initializing, using processing circuitry 110, the system 100.

The initialization of step 220 includes two sub-steps 222, 224, comprising:

In sub-step 222: assigning a respective score value S_MASTER to each numeric vector V_WORD, based on the first input signal S_INPUT_1 and a predetermined set of rules. In some embodiments, the first input signal S_INPUT_1 may comprise the respective score values S_MASTER and the predetermined set of rules define that the respective score values are to be assigned to the numeric vectors V_WORD. In some embodiments, the first input signal S_INPUT_1 may comprise score values S_MASTER for some of the numeric vectors V_WORD in the M-dimensional word embedding space and the rules further comprise how to approximate score values for the numeric vectors V_WORD of groups/clusters of words based on the provided score values S_MASTER. Alternatively, or in combination, the first input signal S_INPUT_1 may comprise an estimated "mastery level" for one or more of the numeric vectors V_WORD and the rules may comprise how the words/vectors in the M-dimensional word embedding space are to be scored for users of different mastery levels. Alternatively, the first input signal S_INPUT_1 may comprise only default values (if no specific values are available for the user) and the rules may comprise how to score the words/vectors based on the default values.

In sub-step 224: calculating the position of the center of mass P_CM of the M-dimensional word embedding space, at the initial time instance t_INITIAL, based on the respective positions P_WORD and score value S_MASTER of the numeric vectors V_WORD comprised in the M-dimensional word embedding space.
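By way of a non-limiting illustration, the calculation of sub-step 224 may be sketched as a weighted centroid. The disclosure states only that a higher score value corresponds to a lower "mass"; the linear mass model (mass = max_score - score), the score range 0 to 5, and the name center_of_mass are assumptions made for this sketch and are not prescribed by the embodiments.

```python
from typing import List, Sequence


def center_of_mass(positions: Sequence[Sequence[float]],
                   scores: Sequence[float],
                   max_score: float = 5.0) -> List[float]:
    """Return P_CM as the mass-weighted mean of the word positions P_WORD.

    Assumption: mass = max_score - S_MASTER, so well-known words weigh less.
    """
    masses = [max_score - s for s in scores]
    if any(m < 0 for m in masses):
        raise ValueError("a score value exceeds max_score")
    total = sum(masses)
    if total == 0:
        raise ValueError("all words have maximum score; P_CM is undefined")
    dims = len(positions[0])
    return [sum(m * p[k] for m, p in zip(masses, positions)) / total
            for k in range(dims)]


# Three words in a 2-dimensional toy embedding space (M = 2).
positions = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]
scores = [5.0, 3.0, 1.0]          # the first word is fully mastered
p_cm = center_of_mass(positions, scores)
```

In this toy example the fully mastered word contributes no mass, so P_CM is pulled towards the two less well-known words.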

After initialization of the system settings, the method shown in Figure 2 further comprises:

In step 240: generating, using the processing circuitry 110, a list of words personalized to the learning needs of a user based on the respective distances from the position P_WORD of each numeric vector V_WORD to the calculated position of the center of mass P_CM.

In one or more embodiments, generating the list of words personalized to the learning needs of a user based on the respective distances from the position P_WORD of each of the numeric vectors V_WORD to the calculated position of the center of mass P_CM e.g. comprises generating a list comprising the N words that are represented by the N numeric vectors V_WORD with a respective position P_WORD closest to the position of the center of mass P_CM in the M-dimensional word embedding space, wherein N is an integer > 0 representing the length of the word list.

In other embodiments, generating the list of words personalized to the learning needs of a user based on the respective distances from the position P_WORD of each of the numeric vectors V_WORD to the calculated position of the center of mass P_CM e.g. comprises generating a list comprising all words represented by a numeric vector V_WORD with a position P_WORD located at less than a pre-set distance d from the position of the center of mass P_CM in the M-dimensional word embedding space.
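The two list generation alternatives described above may, purely as an example, be sketched as follows. Euclidean distance is assumed here as the distance measure, which the embodiments do not mandate, and the function names nearest_words and words_within are hypothetical.

```python
import math
from typing import List, Sequence


def nearest_words(words: Sequence[str],
                  positions: Sequence[Sequence[float]],
                  p_cm: Sequence[float],
                  n: int) -> List[str]:
    """Return the n words whose positions P_WORD are closest to P_CM."""
    ranked = sorted(zip(words, positions),
                    key=lambda wp: math.dist(wp[1], p_cm))
    return [word for word, _ in ranked[:n]]


def words_within(words: Sequence[str],
                 positions: Sequence[Sequence[float]],
                 p_cm: Sequence[float],
                 d: float) -> List[str]:
    """Return all words located at less than distance d from P_CM."""
    return [word for word, pos in zip(words, positions)
            if math.dist(pos, p_cm) < d]


words = ["cat", "dog", "tax", "law"]
positions = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
p_cm = [0.8, 0.0]

top_two = nearest_words(words, positions, p_cm, n=2)
close_by = words_within(words, positions, p_cm, d=0.5)
```

The first variant always yields a list of fixed length N, whereas the second yields a list whose length depends on how densely the space is populated around P_CM.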

After generation of the list of words personalized to the learning needs of a user, the method may further comprise presenting the list of words to the user/learner via a user interface 130, 140, thereby enabling the user to perform self-study or digitally assisted study of the words selected as optimal for the individual user. Alternatively, or in combination, the method may further comprise inputting the list of words into the system 100, or a different computerized system, wherein the system 100 (or other system) is configured to provide digital language learning assignments or actions to the user based on the words on the list.

In some embodiments, the method shown in Figure 2 further comprises, before step 240 of generating the list of words:

In an optional step 230: applying, using the processing circuitry 110, a filter mask centered at the position of the calculated center of mass P_CM of the M-dimensional word embedding space; and determining, using the processing circuitry 110, a subset of numeric vectors V_WORD comprising the numeric vectors V_WORD that are inside the filter mask.

In embodiments wherein step 230 is performed, the method step 240 of generating the list of words comprises generating the list to only comprise words represented by numeric vectors V_WORD in the determined subset of numeric vectors V_WORD. As the number of words/vectors to be considered is lowered by the filtering, the generation of the list of words is less computationally expensive and thereby also faster.
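As a non-limiting sketch of the optional filtering of step 230, the filter mask may for instance be taken to be an axis-aligned hypercube centered at P_CM. The shape of the mask, the parameter half_width and the function name apply_filter_mask are assumptions made for this example only; other mask shapes are equally conceivable.

```python
from typing import List, Sequence


def apply_filter_mask(positions: Sequence[Sequence[float]],
                      p_cm: Sequence[float],
                      half_width: float) -> List[int]:
    """Return the indices of the vectors V_WORD lying inside the mask.

    Assumed mask: a hypercube with side 2 * half_width centered at P_CM.
    Only per-coordinate comparisons are needed, so no distances are computed.
    """
    return [i for i, pos in enumerate(positions)
            if all(abs(x - c) <= half_width for x, c in zip(pos, p_cm))]


positions = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.5, 1.5]]
p_cm = [0.5, 0.0]
subset = apply_filter_mask(positions, p_cm, half_width=1.0)
```

Distances to P_CM then only need to be evaluated for the vectors in the returned subset, which is what makes the subsequent list generation cheaper.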

The method of Figure 2 may further comprise, as shown in Figure 3, performing the following steps, repeatedly at selected (pre-set or user input) time intervals or points in time:

In step 310: obtaining, via the first interface 130, a second input signal S_INPUT_2 indicative of user input related to one or more of the numeric vectors V_WORD comprised in the M-dimensional word embedding space, at a current time instance t_CURRENT.

The second input signal S_INPUT_2 may comprise results of a test, assignment, one or more actions, or the like performed by the user and input into the system via a digital learning environment (program application or the like). The input may be made via a user interface 130, 140 or an input device 150 and may comprise text input, voice input, selections, and/or other information. Alternatively, the input to the system may be manual input performed by a teacher, or another interested party, via a user interface 130, 140 or input device 150, relating to the learning process of the learner.

In step 320: adjusting, using the processing circuitry 110, the settings of the system 100.

In the embodiments shown in Figure 3, the adjusting of step 320 comprises two sub-steps 322, 324, comprising:

In sub-step 322: updating the respective score value S_MASTER assigned to each of the one or more numeric vectors V_WORD, based on the second input signal S_INPUT_2 and the predetermined set of rules.

The predetermined set of rules may include that a score value S_MASTER assigned to a numeric vector V_WORD should be increased (e.g. updated to the next, higher, level or a defined number of levels closer to the maximum mastery level) if the second input signal S_INPUT_2 comprises information indicating that the learner/user 155 of the system has e.g. performed and/or correctly answered an exercise including the word represented by the numeric vector V_WORD.

The predetermined set of rules may further include that a score value S_MASTER assigned to a numeric vector V_WORD should be decreased (e.g. updated to the next, lower, level or a defined number of levels closer to the minimum mastery level) if the time since the learner/user of the system was last presented with the word represented by the numeric vector V_WORD in an assignment, as indicated by the second input signal S_INPUT_2 or based on one or more previously received second input signals S_INPUT_2, exceeds a preset threshold.
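The two exemplary rules above may be sketched as follows. The integer score levels 0 to 5, the step size of one level, the threshold of 14 days, and the function names on_correct_answer and on_time_elapsed are all assumptions chosen for illustration; the predetermined set of rules may well be defined differently.

```python
MIN_SCORE = 0   # assumed minimum mastery level
MAX_SCORE = 5   # assumed maximum mastery level


def on_correct_answer(score: int, step: int = 1) -> int:
    """Raise S_MASTER by `step` levels, capped at the maximum level."""
    return min(MAX_SCORE, score + step)


def on_time_elapsed(score: int,
                    days_since_presented: float,
                    threshold_days: float = 14.0,
                    step: int = 1) -> int:
    """Lower S_MASTER by `step` levels if the word has not been presented
    for longer than the preset threshold; otherwise leave it unchanged."""
    if days_since_presented > threshold_days:
        return max(MIN_SCORE, score - step)
    return score


raised = on_correct_answer(4)                            # one level up
capped = on_correct_answer(5)                            # already at maximum
decayed = on_time_elapsed(3, days_since_presented=20)    # threshold exceeded
kept = on_time_elapsed(3, days_since_presented=5)        # within threshold
```

Clamping to the minimum and maximum levels keeps the score values within a fixed range, so that any mass derived from them remains well defined.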

In sub-step 324: calculating an updated position of the center of mass P_CM of the M-dimensional word embedding space, at the current time instance t_CURRENT, based on the respective positions P_WORD and updated score value S_MASTER of the numeric vectors V_WORD comprised in the M-dimensional word embedding space.

In other words, when the mastery level and score values S_MASTER change, the position of P_CM will move within the M-dimensional word embedding space. If the mastery level and score values S_MASTER related to a certain word or group of words (e.g. neighbouring words in the word embedding space) increase, the position of P_CM will move closer to un-initialized parts of the vocabulary, as higher score values mean that the "mass" of the numeric vectors V_WORD being assigned the score values decreases.
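The movement of the center of mass may be illustrated with a small numeric example. As before, the linear relation mass = max_score - score is an assumption of this sketch; the disclosure only requires that the mass decreases as the score value increases.

```python
from typing import List, Sequence


def center_of_mass(positions: Sequence[Sequence[float]],
                   scores: Sequence[float],
                   max_score: float = 5.0) -> List[float]:
    """Mass-weighted mean position, with assumed mass = max_score - score."""
    masses = [max_score - s for s in scores]
    total = sum(masses)
    dims = len(positions[0])
    return [sum(m * p[k] for m, p in zip(masses, positions)) / total
            for k in range(dims)]


# Two clusters of words: one near x = 0 and one near x = 10.
positions = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]]

# At the initial time instance no word has been practised.
p_cm_initial = center_of_mass(positions, [0, 0, 0, 0])

# Later, the learner has nearly mastered the first cluster.
p_cm_current = center_of_mass(positions, [4, 4, 0, 0])
```

Here the x-coordinate of P_CM moves from 5.5 to approximately 8.83, i.e. away from the practised cluster and towards the words that have not yet been studied.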

After adjustment of the system settings, one or more method embodiments shown in Figure 3 further comprise:

In step 340: generating, using the processing circuitry 110, an updated list of words personalized to the learning needs of a user based on the respective distances from the position P_WORD of each numeric vector V_WORD to the calculated updated position of the center of mass P_CM.

In one or more embodiments, generating the updated list of words personalized to the learning needs of a user based on the respective distances from the position P_WORD of each of the numeric vectors V_WORD to the updated position of the center of mass P_CM e.g. comprises generating a list comprising the N words that are represented by the N numeric vectors V_WORD with a respective position P_WORD closest to the updated position of the center of mass P_CM in the M-dimensional word embedding space, wherein N is an integer > 0 representing the length of the word list.

In other embodiments, generating the updated list of words personalized to the learning needs of a user based on the respective distances from the position P_WORD of each of the numeric vectors V_WORD to the updated position of the center of mass P_CM e.g. comprises generating a list comprising all words represented by a numeric vector V_WORD with a position P_WORD located at less than a pre-set distance d from the updated position of the center of mass P_CM in the M-dimensional word embedding space.

After generation of the updated list of words personalized to the learning needs of a user, the method may further comprise presenting the list of words to the user/learner via a user interface 130, 140, thereby enabling the user to perform self-study or digitally assisted study of the words selected as optimal for the individual user. Alternatively, or in combination, the method may further comprise inputting the list of words into the system 100, or a different computerized system, wherein the system 100 (or other system) is configured to provide digital language learning assignments or actions to the user based on the words on the list.

In similarity to what is described in connection with Figure 2, the method shown in Figure 3 may in some embodiments further comprise, before step 340 of generating the updated list of words:

In an optional step 330: applying, using the processing circuitry 110, a filter mask centered at the position of the calculated center of mass P_CM of the M-dimensional word embedding space; and determining, using the processing circuitry 110, a subset of numeric vectors V_WORD comprising the numeric vectors V_WORD that are inside the filter mask.

In embodiments wherein step 330 is performed, the method step 340 of generating the updated list of words comprises generating the list to only comprise words represented by numeric vectors V_WORD in the determined subset of numeric vectors V_WORD. As the number of words/vectors to be considered is lowered by the filtering, the generation of the list of words is less computationally expensive and thereby also faster.

In some embodiments, the method described in connection with Figure 2 and/or 3 may further comprise setting, using the processing circuitry 110, the length of the list of words based on user input received via the first user interface 130 or a second user interface 140 or an input device 150 connected to the system 100. Thereby, the user of the system, i.e. the learner or another user influencing the learning process such as a teacher, instructor, supervisor, or other party, is enabled to control the number of words recommended for study. Alternatively, the length of the list may be pre-set in the system, and/or be dynamically adjusted by the system depending on pre-set rules and conditions.

Using the M-dimensional word embedding system, not only can recommendations be made for future study of vocabulary for optimized learning, but the vocabulary knowledge status of the learner, or each of several learners, connected to the system 100 can be determined for a given time instance by determining the current position of the center of mass P_CM for each specified learner.

Furthermore, the learning progress of the learner, or each of several learners, connected to the system 100 can be monitored over time, by determining the current position of the center of mass P_CM for each specified learner at more than one time instance, e.g. at a number of consecutive time instances, and the information thus gathered can be presented visually via a user interface and/or be fed back into the system to further enhance future recommendations.

The information may also be used for training the machine learning algorithm generating the word embedding system so that the accuracy of the word embedding system may be continuously improved.

In Figure 4, method embodiments relating to determining the current position of the center of mass P_CM for a current learner/user 155 of the system are shown, the method embodiments comprising:

In step 410: storing, in a memory 120 of the system 100, information on the calculated position of the center of mass P_CM and the respective associated time instance t_INITIAL, t_CURRENT at which the position of the center of mass P_CM was calculated. Thereby, the vocabulary knowledge status of the learner can be determined at one or more given time instances.

If step 410 is performed for more than one learner/user 155 connected to the system, comparison of the vocabulary knowledge status of the learners at one or more given time instances is enabled.

The storing of step 410 may further comprise storing the score values S_MASTER assigned to each or a selection of the words represented by numeric vectors V_WORD in the word embedding system or the determined subset of numeric vectors V_WORD, at the respective associated time instance t_INITIAL, t_CURRENT. Thereby a more granular determination of the vocabulary knowledge status of the learner can be made at one or more given time instances. If step 410 is performed for more than one learner/user 155 connected to the system, a more granular comparison of the vocabulary knowledge status of the learners at one or more given time instances is correspondingly enabled.

In step 420: for two or more of the time instances for which information has been stored: retrieving, using the processing circuitry 110, information on the calculated position of the center of mass P_CM and the respective associated time instance at which the position was calculated.

In embodiments wherein step 410 comprises storing the score values S_MASTER assigned to each or a selection of the words represented by numeric vectors V_WORD in the word embedding system or the determined subset of numeric vectors V_WORD, at the respective associated time instance t_INITIAL, t_CURRENT, step 420 may further comprise, for the two or more of the time instances for which information has been stored: retrieving, using the processing circuitry 110, information on the score values S_MASTER assigned to each or the selection of the words represented by numeric vectors V_WORD in the word embedding system or the determined subset of numeric vectors V_WORD at the respective associated time instance t_INITIAL, t_CURRENT.

In step 430: for the two or more time instances in step 420: determining, using the processing circuitry 110, the change in the position of the center of mass P_CM in the M-dimensional word embedding space over time, based on the information on the two or more calculated positions of the center of mass P_CM and the respective associated time instances at which the positions were calculated.

Thereby, the learning progress of the learner, over time, may be determined.
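As a non-limiting sketch, the stored information and the determination of the change in position over time may be represented as follows. The Snapshot record, the functions displacement and drift_rate, and the choice of Euclidean distance per unit time as a measure of progress are assumptions made for this example; the embodiments do not prescribe how the change is to be quantified.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Snapshot:
    """One stored entry: a time instance and the P_CM calculated at it."""
    t: float                  # time instance, e.g. in days since t_INITIAL
    p_cm: Sequence[float]     # position of the center of mass at time t


def displacement(earlier: Snapshot, later: Snapshot) -> List[float]:
    """Vector by which P_CM has moved between the two time instances."""
    return [b - a for a, b in zip(earlier.p_cm, later.p_cm)]


def drift_rate(earlier: Snapshot, later: Snapshot) -> float:
    """Distance moved by P_CM per unit of time (assumed progress measure)."""
    dt = later.t - earlier.t
    if dt <= 0:
        raise ValueError("snapshots must be ordered in time")
    return math.dist(earlier.p_cm, later.p_cm) / dt


s_initial = Snapshot(t=0.0, p_cm=[0.0, 0.0])
s_current = Snapshot(t=10.0, p_cm=[3.0, 4.0])

moved = displacement(s_initial, s_current)
rate = drift_rate(s_initial, s_current)
```

Storing one such snapshot per learner and time instance also permits the comparison between learners that is described in connection with step 410.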

If steps 410 and 420 are performed for more than one learner/user 155 connected to the system, the learning progress of more than one learner may be determined in step 430, and a comparison of the learning progress of the learners, at one or more given time instances, is enabled.

In some embodiments, generating the list of words in step 240, and/or generating the updated list of words in step 340, may further be based on the determined change in the position of the center of mass P_CM in the M-dimensional word embedding space over time.

In an optional step 440: presenting, via the first user interface 130 or the second user interface 140, a visualization of the calculated position of the center of mass P_CM in the M-dimensional word embedding space, and/or the determined change in the position of the center of mass P_CM in the M-dimensional word embedding space over time.

Thereby, the learner, or any other interested party, may view and thereby better understand the vocabulary knowledge status of the learner at a given time, and/or the learning progress of the learner over time. Any anomalies, such as decreasing knowledge or progress slowing down, may hence easily be detected and action taken. In embodiments wherein step 420 comprises retrieving, using the processing circuitry 110, information on the score values S_MASTER assigned to each or the selection of the words represented by numeric vectors V_WORD in the word embedding system or the determined subset of numeric vectors V_WORD, the presenting of step 440 may further comprise presenting information on the score values S_MASTER assigned to each or the selection of the words represented by numeric vectors V_WORD in the word embedding system or the determined subset of numeric vectors V_WORD.

If steps 410 and 420, and optionally also step 430, are performed for more than one learner/user 155 connected to the system, step 440 may comprise presenting a visualization of the calculated position of the center of mass P_CM in the M-dimensional word embedding space, and/or the determined change in the position of the center of mass P_CM in the M-dimensional word embedding space over time, for the more than one learner/user 155. Thereby, a visualization of the learning progress of the learners, at one or more given time instances, is provided.

In an optional step 450: Feeding back to the system the calculated position of the center of mass P_CM in the M-dimensional word embedding space, and/or the determined change in the position of the center of mass (P_CM) in the M-dimensional word embedding space over time.

Of course, step 450 may also be performed with regard to more than one learner/user 155 connected to the system.

Thereby, information on the vocabulary knowledge status of a learner, or several learners connected to the system, at a given time, and/or the learning progress of the learner, or learners, over time may be used to further enhance future recommendations and/or to be used for training the machine learning algorithm generating the word embedding system so that the accuracy of the word embedding system is continuously improved.

The processing circuitry 110 may further be configured to perform the steps and functions according to any of the method embodiments described herein.

Further embodiments

All of the process steps, as well as any sub-sequence of steps, described with reference to Fig. 2 above may be controlled by means of a programmed data processor. Moreover, although the embodiments of the invention described above with reference to the drawings comprise processing circuitry, the invention thus also extends to computer programs, particularly computer programs on or in a carrier, adapted for putting the invention into practice. The program may be in the form of source code, object code, a code intermediate source and object code such as in partially compiled form, or in any other form suitable for use in the implementation of the process according to the invention. The program may either be a part of an operating system or be a separate application. The carrier may be any entity or device capable of carrying the program. For example, the carrier may comprise a storage medium, such as a Flash memory, a ROM (Read Only Memory), an EPROM (Erasable Programmable Read-Only Memory), an EEPROM (Electrically Erasable Programmable Read only Memory), or a magnetic recording medium, for example a floppy disc or hard disc. Further, the carrier may be a transmissible carrier such as an electrical or optical signal which may be conveyed via electrical or optical cable or by radio or by other means. When the program is embodied in a signal which may be conveyed directly by a cable or other device or means, the carrier may be constituted by such cable or device or means. Alternatively, the carrier may be an integrated circuit in which the program is embedded, the integrated circuit being adapted for performing, or for use in the performance of, the relevant processes.

Program code, which, when run by the processing circuitry 110, causes the system 100 to perform the method according to any of the method embodiments herein may already be pre-stored in an internal memory 120 of the system 100. The processor 110 is in such embodiments communicably connected to the memory 120.

In one or more embodiments, there may be provided a computer program loadable into a memory communicatively connected or coupled to at least one data processor, e.g. the processor 110, comprising software for executing the method according to any of the embodiments herein when the program is run on the at least one processor 110.

In one or more further embodiments, there may be provided a processor-readable medium, having a program recorded thereon, where the program is to make at least one data processor, e.g. the processor 110, execute the method according to any of the embodiments herein when the program is loaded into the at least one data processor.

The invention is not restricted to the described embodiments in the figures but may be varied freely within the scope of the claims.

Claims

1) Computerized system (100) using word embedding for generating a list of words personalized to the learning needs of a user of the system (100) at a given time instance (t), the words on the list being selected from a plurality of words each represented as an M-dimensional numeric vector (V_WORD) having a position (P_WORD) in an M-dimensional word embedding space, the system (100) comprising: processing circuitry (110); and a memory (120) configured to communicate with the processing circuitry (110); wherein the processing circuitry (110) is configured to: obtain, via a first interface (130), a first input signal (S_INPUT_1) indicative of user specific system initialization settings; initialize the system (100), by: assigning a respective score value (S_MASTER) to each numeric vector (V_WORD), based on the first input signal (S_INPUT_1) and a predetermined set of rules obtained from the memory (120); and calculating the position of the center of mass (P_CM) of the M-dimensional word embedding space, at the initial time instance (t_INITIAL), based on the respective positions (P_WORD) and score value (S_MASTER) of the numeric vectors (V_WORD) comprised in the M-dimensional word embedding space; and generate a list of words personalized to the learning needs of a user based on the respective distances from the position (P_WORD) of each numeric vector (V_WORD) to the calculated position of the center of mass (P_CM).

2) The system (100) of claim 1, wherein the processing circuitry (110) is further configured to, repeatedly: obtain, via the first interface (130), a second input signal (S_INPUT_2) indicative of user input related to one or more of the numeric vectors (V_WORD) comprised in the M-dimensional word embedding space, at a current time instance (t_CURRENT); adjust the settings of the system (100) by: updating the respective score value (S_MASTER) assigned to each of the one or more numeric vectors (V_WORD), based on the second input signal (S_INPUT_2) and the predetermined set of rules; and calculating an updated position of the center of mass (P_CM) of the M-dimensional word embedding space, at the current time instance (t_CURRENT), based on the respective positions (P_WORD) and updated score value (S_MASTER) of the numeric vectors (V_WORD) comprised in the M-dimensional word embedding space; and generate an updated list of words personalized to the learning needs of a user based on the respective distances from the position (P_WORD) of each numeric vector (V_WORD) to the calculated updated position of the center of mass (P_CM).

3) The system (100) of claim 1 or 2, wherein the processing circuitry (110) is configured to, before generating the list of words, or the updated list of words: apply a filter mask centered at the position of the calculated center of mass (P_CM) of the M-dimensional word embedding space; and determine a subset of numeric vectors (V_WORD) comprising the numeric vectors (V_WORD) that are inside the filter mask, wherein the processing circuitry (110) is configured to generate the list of words, or generate the updated list of words, to only comprise words represented by numeric vectors (V_WORD) in the determined subset of numeric vectors (V_WORD).

4) The computerized system (100) of any of the preceding claims, wherein the processing circuitry (110) is configured to set the length of the list of words based on user input received via the first user interface (130) or a second user interface (140) or an input device (150) connected to the system (100).

5) The system (100) of any of the preceding claims, wherein the memory (120) is configured to, for each time instance (t_INITIAL, t_CURRENT), store information on the calculated position of the center of mass (P_CM) and the respective associated time instance (t_INITIAL, t_CURRENT) at which the position of the center of mass (P_CM) was calculated.

6) The system (100) of claim 5, wherein the processing circuitry (110) is configured to, for two or more of the time instances for which information has been stored: retrieve information on the calculated position of the center of mass (P_CM) and the respective associated time instance the position was calculated; and determine the change in the position of the center of mass (P_CM) in the M-dimensional word embedding space over time, based on the information on the two or more calculated position of the center of mass (P_CM) and the respective associated time instance the position was calculated.

7) The system (100) of claim 6, wherein the processing circuitry (110) is configured to generate the list of words, or the updated list of words, also based on the determined change in the position of the center of mass (P_CM) in the M-dimensional word embedding space over time.

8) The system (100) of claim 6, wherein the processing circuitry (110) is configured to present a visualization of the determined change in the position of the center of mass (P_CM) in the M-dimensional word embedding space over time via the first user interface (130) or the second user interface (140).

9) A method, in a computerized system (100), of using word embedding for generating a list of words personalized to the learning needs of a user at a given time instance (t), the words on the list being selected from a plurality of words each represented as an M-dimensional numeric vector (V_WORD) having a position (P_WORD) in an M-dimensional word embedding space, the method comprising: obtaining, via a first interface (130), a first input signal (S_INPUT_1) indicative of user specific system initialization settings; initializing, using processing circuitry (110), the system (100), by: assigning a respective score value (S_MASTER) to each numeric vector (V_WORD), based on the first input signal (S_INPUT_1) and a predetermined set of rules; and calculating the position of the center of mass (P_CM) of the M-dimensional word embedding space, at the initial time instance (t_INITIAL), based on the respective positions (P_WORD) and score value (S_MASTER) of the numeric vectors (V_WORD) comprised in the M-dimensional word embedding space; and generating, using the processing circuitry (110), a list of words personalized to the learning needs of a user based on the respective distances from the position (P_WORD) of each numeric vector (V_WORD) to the calculated position of the center of mass (P_CM).

10) The method of claim 9, further comprising, repeatedly: obtaining, via the first interface (130), a second input signal (S_INPUT_2) indicative of user input related to one or more of the numeric vectors (V_WORD) comprised in the M-dimensional word embedding space, at a current time instance (t_CURRENT); adjusting, using the processing circuitry (110), the settings of the system (100) by: updating the respective score value (S_MASTER) assigned to each of the one or more numeric vectors (V_WORD), based on the second input signal (S_INPUT_2) and the predetermined set of rules; and calculating an updated position of the center of mass (P_CM) of the M-dimensional word embedding space, at the current time instance (t_CURRENT), based on the respective positions (P_WORD) and updated score value (S_MASTER) of the numeric vectors (V_WORD) comprised in the M-dimensional word embedding space; and generating, using the processing circuitry (110), an updated list of words personalized to the learning needs of a user based on the respective distances from the position (P_WORD) of each numeric vector (V_WORD) to the calculated updated position of the center of mass (P_CM).

11) The method of claim 9 or 10, further comprising, before generating the list of words, or the updated list of words: applying, using the processing circuitry (110), a filter mask centered at the position of the calculated center of mass (P_CM) of the M-dimensional word embedding space; and determining, using the processing circuitry (110), a subset of numeric vectors (V_WORD) comprising the numeric vectors (V_WORD) that are inside the filter mask, wherein the method step of generating the list of words, or generating the updated list of words, comprises generating the list to only comprise words represented by numeric vectors (V_WORD) in the determined subset of numeric vectors (V_WORD).

12) The method of any of the claims 9 to 11, further comprising setting, using the processing circuitry (110), the length of the list of words based on user input received via the first user interface (130) or a second user interface (140) or an input device (150) connected to the system (100).

13) The method of any of the claims 9 to 12, further comprising storing, in a memory (120) of the system (100), information on the calculated position of the center of mass (P_CM) and the respective associated time instance (t_INITIAL, t_CURRENT) at which the position of the center of mass (P_CM) was calculated.

14) The method of claim 13, further comprising, for two or more of the time instances for which information has been stored: retrieving, using the processing circuitry (110), information on the calculated position of the center of mass (P_CM) and the respective associated time instance the position was calculated; and determining, using the processing circuitry (110), the change in the position of the center of mass (P_CM) in the M-dimensional word embedding space over time, based on the information on the two or more calculated position of the center of mass (P_CM) and the respective associated time instance the position was calculated.

15) The method of claim 14, wherein generating the list of words, or the updated list of words, using the processing circuitry (110), is further based on the determined change in the position of the center of mass (P_CM) in the M-dimensional word embedding space over time.

16) The method of claim 14, further comprising presenting, via the first user interface (130) or the second user interface (140), a visualization of the determined change in the position of the center of mass (P_CM) in the M-dimensional word embedding space over time.

17) A computer program loadable into a memory communicatively connected or coupled to at least one data processor, comprising software for executing the method according to any of the method claims 9 to 16 when the program is run on the at least one data processor.

18) A processor-readable medium, having a program recorded thereon, where the program is to make at least one data processor execute the method according to any of the method claims 9 to 16 when the program is loaded into the at least one data processor.