WO2015087084A1 - System and method for inputting images or labels into electronic devices - Google Patents

System and method for inputting images or labels into electronic devices

Info

Publication number
WO2015087084A1
Authority
WO
WIPO (PCT)
Prior art keywords
label
image
text
user
words
Prior art date
Application number
PCT/GB2014/053688
Other languages
French (fr)
Inventor
James Aley
Gareth Jones
Luke HEWITT
Original Assignee
Touchtype Limited
Priority date
Filing date
Publication date
Application filed by Touchtype Limited filed Critical Touchtype Limited
Priority to KR1020167018754A priority Critical patent/KR102345453B1/en
Priority to CN201480067660.XA priority patent/CN105814519B/en
Priority to EP14819056.4A priority patent/EP3080682A1/en
Publication of WO2015087084A1 publication Critical patent/WO2015087084A1/en
Priority to US15/179,833 priority patent/US10664657B2/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233 Character input methods
    • G06F3/0236 Character input methods using selection techniques to select from displayed items
    • G06F3/0237 Character input methods using prediction or retrieval techniques
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/04883 Interaction techniques using a touch-screen or digitiser for inputting data by handwriting, e.g. gesture or text
    • G06F3/04886 Interaction techniques using a touch-screen or digitiser by partitioning the display area of the touch-screen or the surface of the digitising tablet into independently controllable areas, e.g. virtual keyboards or menus

Definitions

  • the present invention relates to a system and method for inputting images/labels into an electronic device.
  • the invention relates to a system and method for offering an image/label to be input into a device on the basis of user entered text.
  • the Unicode (6.0) standard allocates 722 codepoints as descriptions of emojis (examples include U+1F60D: Smiling face with heart-shaped eyes and U+1F692: Fire engine). It is typical for messaging services (e.g. Facebook, WhatsApp) to design their own set of images, which they use to render each of these Unicode characters so that they may be sent and received. Additionally, both Android (4.1+) and iOS (5+) provide representations of these characters natively as part of the default font.
  • emoji selection panel in which emojis are organised into several categories which can be scrolled through.
  • the user is still required to search through the emojis of that category in order to find the emoji they want to use.
  • some emojis may not be easily classified, making it more difficult for the user to decide in which category they should search for that emoji.
  • solutions which attempt to reduce further the burden of inputting emojis.
  • several messaging clients will replace automatically certain shorthand text with images. For example, Facebook Messenger will convert the emoticon :-) to a picture of a smiling face and will convert the short hand text sequence, (y), to a picture of a thumbs up when the message is sent.
  • the Google Android Jellybean keyboard will offer an emoji candidate when the user types, exactly, a word corresponding to a description of that emoji, e.g. if 'snowflake' is typed, a picture of a snowflake is offered to the user as candidate input.
  • the present invention provides systems in accordance with independent claims 1 and 2, methods in accordance with independent claims 32, 33, 34, 54 and 55, and a computer program in accordance with independent claim 56.
  • Figs. 1a and 1b show systems for generating image/label predictions in accordance with a first system type of the present invention;
  • Figs. 2a-2c are schematics of alternative image/label language models according to the invention, to be used in the systems of Figs. 1a and 1b;
  • Fig. 3 is a schematic of an n-gram map comprising text sections associated with images/labels (for this example, emojis), for use in the language model of Figs. 2b and 2c;
  • Fig. 4 is a schematic of an n-gram map comprising text sections associated with images/labels (for this example, emojis), where images/labels identified in the training text have been associated with sections of text which do not immediately precede the identified image/label, for use in the image/label language model of Figs. 2b and 2c;
  • Fig. 5 shows a system for generating image/label predictions in accordance with a second system type of the present invention
  • Fig. 6 shows a system for generating image/label predictions in accordance with a third system type of the present invention
  • Figs. 7-11 illustrate different embodiments of a user interface in accordance with the present invention.
  • Figs. 12-16 show flow charts according to methods of the present invention.

Detailed description of the invention
  • the system of the present invention is configured to generate an image/label prediction relevant for user inputted text.
  • the system of the invention comprises a prediction means trained on sections of the text associated with an image/label.
  • the prediction means is configured to receive the text input by a user and predict the relevance of the image/label to the user inputted text.
  • the image prediction may relate to any kind of image, including a photo, logo, drawing, icon, emoji or emoticon, sticker, or any other image which may be associated with a section of text.
  • the image is an emoji.
  • the label prediction may relate to any label associated with a body of text, where that label is used to identify or categorise the body of text.
  • the label could therefore refer to the author of the text, a company/person generating sections of text, or any other relevant label.
  • the label is a hashtag, for example as used in Twitter feeds.
  • the present invention provides three alternative ways of generating image/label predictions to solve the problem of reducing the burden of image/label entry into electronic devices.
  • the solutions comprise using a language model to generate image/label predictions, using a search engine to generate image/label predictions from a plurality of statistical models, and using a classifier to generate image/label predictions.
  • the alternative solutions (i.e. alternative prediction means) are described in turn below.
  • a system in accordance with the first solution can be implemented as shown in Figs. 1a and 1b, which show block diagrams of the high level text prediction architecture according to the invention.
  • the system comprises a prediction engine 100 configured to generate an image/label prediction 50 relevant for user inputted text.
  • the prediction engine 100 comprises an image/label language model 10 to generate image/label predictions 50 and, optionally, word prediction(s) 60.
  • the image/label language model 10 may be a generic image/label language model, for example a language model based on the English language, or may be an application-specific image/label language model, e.g. a language model trained on SMS messages or email messages, or any other suitable type of language model.
  • the prediction engine 100 may comprise any number of additional language models, which may be text-only language models or an image/label language model in accordance with the present invention, as illustrated in Fig. 1b.
  • the prediction engine 100 may comprise a multi-language model 30 (Multi-LM) to combine the image/label predictions and/or word predictions sourced from each of the language models 10, 20 to generate final image/label predictions 50 and/or final word predictions 60 that may be provided to a user interface for display and user selection.
  • the final image/label predictions 50 are preferably a set (i.e. a specified number) of the overall most probable predictions. The system may present to the user only the most likely image/label prediction 50.
  • The use of a Multi-LM 30 to combine word predictions sourced from a plurality of language models is described from line 1 of page 11 to line 2 of page 12 of WO 2010/112841, which is hereby incorporated by reference.
  • the additional language model 20 is a standard word-based language model, for example as described in detail in WO 2010/112842, and in particular as shown in relation to Figs. 2a-d of WO 2010/112842
  • the standard word-based language model can be used alongside the image/label-based language model 10, such that the prediction engine 100 generates an image/label prediction 50 from the image/label language model 10 and a word prediction 60 from the word-based language model 20.
  • the image/word based language model 10 may also generate word predictions (as described below with respect to Figs. 2a-2c) which are used by the Multi-LM 30 to generate a final set of word predictions 60.
  • the word-based language model 20 may be replaced by any suitable language model for generating word predictions, which may include language models based on morphemes or word-segments, as discussed in detail in UK patent application no. 1321927.4, which is hereby incorporated by reference in its entirety.
  • the Multi-LM 30 can be used to generate final image/label predictions 50 from image/label predictions sourced from both language models 10, 20.
  • the Multi-LM 30 may also be used to tokenise user inputted text, as described in the first paragraph of page 21 of WO 2010/112842, and as described in more detail below, in relation to the language model embodiments of the present invention. A minimal sketch of how such combination might be performed is given after this item.
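  • By way of illustration only (this sketch is not part of the patent disclosure), a Multi-LM of the kind described above might merge the per-model prediction sets as follows; the function name, the uniform default weighting and the example probabilities are assumptions made for the sketch:

```python
# Illustrative sketch (not from the patent): combining prediction sets from
# several language models into a final ranked list, as a Multi-LM might.
from collections import defaultdict

def combine_predictions(prediction_sets, weights=None, top_n=3):
    """Merge {candidate: probability} dicts from several models.

    prediction_sets: list of dicts mapping a word or emoji to a probability.
    weights: optional per-model weights (assumed uniform if omitted).
    """
    weights = weights or [1.0] * len(prediction_sets)
    combined = defaultdict(float)
    for preds, w in zip(prediction_sets, weights):
        for candidate, prob in preds.items():
            combined[candidate] += w * prob
    total = sum(combined.values()) or 1.0
    ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    return [(c, p / total) for c, p in ranked[:top_n]]

# Example: an emoji language model combined with a word language model
emoji_lm = {"🍕": 0.30, "pizza": 0.25}
word_lm = {"pizza": 0.40, "pasta": 0.20}
print(combine_predictions([emoji_lm, word_lm]))
```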
  • An image/label language model 10 will be described with reference to figures 2a-2c which illustrate schematics of image/label language models which receive user inputted text and return image/label predictions 50 (and optionally word/term predictions 60).
  • the language model may use either or both of the possible inputs.
  • the current term input 11 comprises information the system has about the term the system is trying to predict, e.g. the word the user is attempting to enter (e.g. if the user has entered "I am working on ge", the current term input 11 is 'ge'). This could be a sequence of multi-character keystrokes, individual character keystrokes, the characters determined from a continuous touch gesture across a touchscreen keypad, or a mixture of input forms.
  • the context input 12 comprises the sequence of terms entered so far by the user, directly preceding the current term (e.g. "I am working on", for the example above).
  • the context input 12 will contain the preceding n-1 terms that have been selected and input into the system by the user.
  • the n-1 terms of context may comprise a single word, a sequence of words, or no words if the current word input relates to a word beginning a sentence.
  • a language model may comprise an input model (which takes the current term input 11 as input) and a context model (which takes the context input 12 as input).
  • the language model comprises a trie 13 (an example of an input model) and a word-based n-gram map 14 (an example of a context model) to generate word predictions from current input 11 and context 12 respectively.
  • the first part of this language model corresponds to that discussed in detail in WO 2010/112841, and in particular as described in relation to Figs. 2a-2d of WO 2010/112841.
  • the language model of Fig. 2a of the present invention can also include an intersection 15 to compute a final set of word predictions 60 from the predictions generated by the trie 13 and n-gram map 14.
  • the trie 13 can be a standard trie (see Fig. 3 of WO 2010/112841) or an approximate trie (see Fig. 4a of WO 2010/112841) which is queried with the direct current word-segment input 11.
  • Alternatively, the trie 13 can be a probabilistic trie which is queried with a KeyPressVector generated from the current input, as described in detail on line 16 of page 17 to line 16 of page 20 (and illustrated in Figs. 4b and 4c) of WO 2010/112841, which is hereby incorporated by reference.
  • the language model can also comprise any number of filters to generate the final set of word predictions 60, as described in that earlier application.
  • the intersection 15 of the language model 10 of Figs. 2a and 2c can be configured to employ a back-off approach if a candidate predicted by the trie has not also been predicted by the n-gram map (rather than retaining only candidates generated by both, which is described in WO 2010/112841).
  • In this case, the intersection mechanism 15 may apply a 'back-off' penalty to the probability (which may be a fixed penalty, e.g. by multiplying by a fixed value).
  • the language model of Fig. 2a includes a word → image/label correspondence map 40, which maps each word of the language model 10 to one or more relevant images/labels, e.g. if the word prediction 60 is 'pizza', the language model outputs an image of a pizza (e.g. the pizza emoji) as the image prediction 50.
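  • As an illustrative sketch only (not taken from the patent), such a word → image/label correspondence map could be as simple as a dictionary keyed by predicted words; the entries shown are hypothetical:

```python
# Illustrative sketch (not from the patent): a word → image/label
# correspondence map used to turn a word prediction into an emoji prediction.
WORD_TO_EMOJI = {          # hypothetical entries
    "pizza": ["🍕"],
    "beer": ["🍺"],
    "happy": ["😊", "😀"],
}

def image_predictions_for(word_prediction):
    """Return the images/labels mapped to a predicted word (may be empty)."""
    return WORD_TO_EMOJI.get(word_prediction.lower(), [])

print(image_predictions_for("pizza"))   # ['🍕']
```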
  • Fig. 2b illustrates a second image/label language model 10 in accordance with the first solution of the present invention.
  • the image/label language model 10 is configured to generate an image/label prediction 50 and, optionally a word prediction 60 on the basis of context 12 alone.
  • the image/label language model receives the context input 12 only, which comprises one or more words which are used to search the n-gram map 14'.
  • the n-gram map 14' of Fig. 2b is trained in a different way to that of Fig. 2a, enabling the image/label language model 10 to generate relevant image/label predictions 50 without the use of a word → image/label correspondence map 40.
  • the language model 10 may output the most likely image/label 50 associated with the most likely word 60 used to start a sentence. For certain circumstances, it may be appropriate to predict an image/label on the basis of context only, e.g. the prediction of emojis. In other circumstances, e.g. the prediction of a label (such as a hashtag), it might be more appropriate to use current word input (by itself or in addition to context input), because the user might partially type the label before it is predicted.
  • n-gram maps 14' of the second embodiment are illustrated schematically in Figs. 3 and 4, where for illustrative purposes an emoji has been chosen for the image/label.
  • the n-gram map 14' of Fig. 3 has been trained on source data comprising images/labels embedded in sections of text.
  • the language model could be trained on data from Twitter, where the tweets have been filtered to collect tweets comprising emojis.
  • the emojis (which are used merely as an example for the image/label) are treated like words to generate the language model, i.e. the n-gram context map comprises emojis in the context in which they have been identified. For example, if the source data comprises the sentence "I am not happy about this [emoji]", the emoji will only follow its preceding context, e.g. "happy about this".
  • the language model will therefore predict that emoji if the context 12 fed into the language model comprises "happy about this", as the emoji is the next part of the stored sequence.
  • the n-gram map comprises the probabilities associated with sequences of words and emojis, where emojis and words are treated indiscriminately for assigning probabilities. The probabilities can therefore be assigned on the basis of frequency of appearance in the training data given a particular context in that training data. A minimal sketch of such a map is given below.
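  • A minimal sketch (not part of the patent text) of building such an n-gram map, in which emojis are treated exactly like words, might look as follows; the training sentence and the count-map structure are assumptions for the example:

```python
# Illustrative sketch (not from the patent): building an n-gram count map in
# which emojis are treated exactly like words, so an emoji can be predicted
# as the "next term" after its observed context.
from collections import defaultdict

def build_ngram_counts(sentences, n=3):
    counts = defaultdict(lambda: defaultdict(int))  # context -> next term -> count
    for tokens in sentences:
        for i in range(1, len(tokens)):
            for order in range(1, n):
                if i - order < 0:
                    break
                context = tuple(tokens[i - order:i])
                counts[context][tokens[i]] += 1
    return counts

training = [["i", "am", "not", "happy", "about", "this", "😞"]]
counts = build_ngram_counts(training)
print(dict(counts[("about", "this")]))   # {'😞': 1}
```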
  • the n-gram map of Fig. 4 has been trained by associating images/labels identified within the source text with sections of text which do not immediately precede the identified images/labels.
  • the language model is able to predict relevant/appropriate images/labels, even though the user has not entered text which describes the relevant image/label and has not entered text which would usually immediately precede the image/label, such as 'I am' for 'I am [emoji]'.
  • images/labels are identified within a source text (e.g. filtered Twitter tweets) and each identified image/label is associated with sections of text within that source text.
  • an emoji of a particular tweet is associated with all n-grams from that tweet. For example, training on the tweet "I'm not happy about this [emoji]" would generate n-grams such as "I'm", "I'm not", "not happy" and "happy about this", each associated with that emoji.
  • One way to generate emoji predictions from such a non-direct context n-gram map 14' is to take the emojis that are appended to the word sequences of the n-gram map 14' which most closely match the word sequence of the user inputted text. If the user inputted text is w1 w2 w3 w4, the predicted emoji is the emoji that is appended to the stored sequence w1 w2 w3 w4.
  • An alternative way to generate emoji predictions from a non-direct context n-gram map 14' is to predict an emoji for each word of the user inputted text, e.g. if the word sequence of user inputted text is w1 w2 w3 w4, etc., predict a first emoji, e1, for w1, a second emoji, e2, for w1 w2 (where w1 w2 means predicting an emoji for the word sequence w1 w2), e3 for w1 w2 w3 and e4 for w1 w2 w3 w4, etc.
  • the weighted average of the set of emoji predictions (e1, e2, e3, e4) can be used to generate the emoji predictions 50, i.e. the most frequently predicted emoji will be outputted as the most likely emoji. By taking a weighted average of the set of emoji predictions, it may be possible to increase the contextual reach of the emoji prediction. A sketch of this idea is given below.
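  • The per-prefix prediction and weighted-vote idea described above could be sketched as follows (illustrative only, not the patent's implementation); weighting each prefix by its length is an assumption:

```python
# Illustrative sketch (not from the patent): predicting an emoji for each
# prefix of the user's text against a non-direct-context n-gram map, then
# taking a weighted vote over the per-prefix predictions.
from collections import defaultdict

def predict_emoji(words, prefix_to_emoji, weights=None):
    """prefix_to_emoji maps a tuple of words to an appended emoji (assumed data).

    Longer prefixes are given higher weight by default, since they carry
    more context.
    """
    votes = defaultdict(float)
    for i in range(1, len(words) + 1):
        prefix = tuple(words[:i])
        emoji = prefix_to_emoji.get(prefix)
        if emoji is None:
            continue
        weight = weights[i - 1] if weights else i   # assumption: weight = prefix length
        votes[emoji] += weight
    return max(votes, key=votes.get) if votes else None

ngram_map = {("i'm",): "😞", ("i'm", "not"): "😞", ("i'm", "not", "happy"): "😞"}
print(predict_emoji(["i'm", "not", "happy"], ngram_map))   # 😞
```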
  • the model is preferably pruned in two ways.
  • the first is to prune based on frequency of occurrence, e.g. prune n-grams with frequency counts of less than a fixed number of occurrences (e.g. if a particular n-gram and associated emoji is seen less than 10 times in the training data, remove that n-gram and associated emoji).
  • the second way of pruning is to prune on the basis of the probability difference from the unigram probabilities.
  • For example, for an n-gram such as "about this [emoji]", the probability of predicting that emoji will not be much larger than the unigram probability of the emoji, because training will also have encountered many other n-grams of the form "about this [EMOJI]" with no particular bias.
  • the n-gram "about this [emoji]" can therefore be pruned.
  • a combination of the two pruning methods is also possible, as are any other suitable pruning methods.
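  • A rough sketch of the two pruning strategies described above (illustrative only; the thresholds are assumptions, e.g. the minimum count of 10 mentioned earlier):

```python
# Illustrative sketch (not from the patent): the two pruning strategies
# described above, applied to an n-gram -> emoji count table.
def prune(ngram_emoji_counts, emoji_unigram_probs, min_count=10, min_ratio=2.0):
    """ngram_emoji_counts: {(ngram, emoji): count}; thresholds are assumptions."""
    total_per_ngram = {}
    for (ngram, _), count in ngram_emoji_counts.items():
        total_per_ngram[ngram] = total_per_ngram.get(ngram, 0) + count

    pruned = {}
    for (ngram, emoji), count in ngram_emoji_counts.items():
        if count < min_count:                      # frequency-based pruning
            continue
        conditional = count / total_per_ngram[ngram]
        unigram = emoji_unigram_probs.get(emoji, 1e-9)
        if conditional / unigram < min_ratio:      # prune if no real lift over unigram
            continue
        pruned[(ngram, emoji)] = count
    return pruned
```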
  • the language model 10 receives a sequence of one or more words (context 12) from the Multi-LM 30 and compares the sequence of one or more words to a sequence of words stored in the n-gram map 14'.
  • For example, in relation to the n-gram map of Fig. 3, if the context 12 comprises "happy about this", the language model would predict the associated emoji.
  • In relation to the n-gram map of Fig. 4, the language model generates an emoji prediction much more regularly, as the language model has been trained on direct and non-direct context.
  • the language model can optionally output one or more word predictions 60 alongside the image/label prediction(s) 50.
  • the language model compares the input sequence of one or more words (context 12) to the stored sequences of words (with appended emojis). If it identifies a stored sequence of words that comprises the sequence of one or more words, it outputs the next word in the stored sequence that follows the sequence of one or more words, for direct input of the next word into the system or for display of the next word 60 on a user interface for user selection, for example.
  • a third embodiment of a language model 10 is illustrated in Fig. 2c. As with the language model of Fig. 2a, the language model 10 of Fig. 2c comprises a trie 13 and an n-gram map 14' to generate word predictions from current input 11 and context input 12 respectively, and an intersection 15 to generate one or more final word prediction(s) 60.
  • the n-gram map 14' of the third embodiment is the same as that of the second embodiment, i.e. it comprises images/labels embedded within sections of text or appended to sections of text. The same n-gram map 14' can therefore be used to generate image/label predictions 50, as well as word predictions 60.
  • the system of the first solution predicts an image/label on the basis of the user entered text and, optionally, a word/term on the basis of that user entered text.
  • the second solution to reducing the burden of image/label input relates to a search engine configured to generate image/label predictions for user input, similar to that discussed in detail in UK patent application 1223450.6, which is hereby incorporated by reference in its entirety.
  • Fig. 5 shows a block diagram of the high level system architecture of the system of the invention.
  • the search engine 100' uses an image/label database 70 that preferably comprises a one-to-one mapping of statistical models to image/labels, i.e.
  • the image/label database comprises a statistical model associated with each image/label (e.g. emoji or hashtag), each image/label statistical model being trained on sections of text associated with that image/label.
  • a language model is a non-limiting example of a statistical model, where the language model is a probability distribution representing the statistical probability of sequences of words occurring within a natural language.
  • a language model in accordance with this solution does not have images/labels within the language model, it is a text only language model mapped to a particular image/label.
  • the search engine 100' uses the image/label database 70 and user inputted text 12' and, optionally, one or more other evidence sources 12", e.g. the image/label input history for a given user of a system. To trigger a search, the search engine receives user entered text 12'.
  • the image/label database 70 associates individual images/labels with an equal number of statistical models and, optionally, alternative statistical models (not shown) that are not language based (e.g. a model that estimates user relevance given prior input of a particular image/label), as will be described later.
  • the search engine 100' is configured to query the image/label database 70 with the user inputted text evidence 12' in order to generate for each image/label in the content database an estimate of the likelihood that the image/label is relevant given the user inputted text.
  • the search engine outputs the most probable or the p most probable images/labels as image/label predictions 50, which may optionally be presented to a user.
  • An estimate of the probability, P(e | c, M), of observing the user inputted text, e, given that an image/label, c, is relevant under an associated image/label statistical model, M, is computed as set out below.
  • the first two approaches are based on extracting a set of features and training a generative model (which in this case equates to extracting features from a text associated with an image/label and training an image/label statistical model on those features), while statistical language modelling attempts to model a sequential distribution over the terms in the user inputted text.
  • a set of features is extracted from user inputted text, preferably by using any suitable feature extraction mechanism which is part of the search engine 100'. To generate a relevance estimate, these features are assumed to have been independently generated by an associated image/label statistical model. An estimate of the probability of a given feature being relevant to particular image/label is stored in the image/label statistical model.
  • an image/label statistical model is trained on text associated with an image/label by extracting features from the text associated with the image/label and analysing the frequency of these features in that text.
  • There are various methods used in the art for the generation of these features from text. For example:
  • Term combination: features may include combinations of terms, either contiguous n-grams or representing non-local sentential relations.
  • Syntactic: features may include syntactic information such as part-of-speech tags, or higher level parse tree elements.
  • Latent topics/clusters: features may be sets/clusters of terms that may represent underlying "topics" or themes within the text.
  • the preferred features are typically individual terms or short phrases (n-grams).
  • Individual term features are extracted from a text sequence by tokenising the sequence into terms (where a term denotes both words and additional orthographic items such as morphemes and/or punctuation) and discarding unwanted terms (e.g. terms that have no semantic value such as 'stopwords').
  • features may also be case-normalised, i.e. converted to lowercase.
  • N-gram features are generated by concatenating adjacent terms into atomic entities. For example, given the text sequence "Dear special friends", the individual term features would be: "Dear", "special" and "friends", while the bigram (2-gram) features would be "Dear_special" and "special_friends".
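  • As an illustration only (not from the patent), tokenisation into individual-term and bigram features could be sketched as follows; the stopword list is a hypothetical example:

```python
# Illustrative sketch (not from the patent): extracting individual-term and
# bigram features from a text sequence; the stopword list is an assumption.
import re

STOPWORDS = {"the", "a", "an", "and", "of", "to"}   # hypothetical

def extract_features(text, use_bigrams=True):
    terms = [t.lower() for t in re.findall(r"\w+", text)]
    terms = [t for t in terms if t not in STOPWORDS]
    features = list(terms)
    if use_bigrams:
        features += [f"{a}_{b}" for a, b in zip(terms, terms[1:])]
    return features

print(extract_features("Dear special friends"))
# ['dear', 'special', 'friends', 'dear_special', 'special_friends']
```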
  • the feature generation mechanism of the search engine 100' preferably weights features extracted from the user inputted text 12' in order to exaggerate the importance of those which are known to have a greater chance a priori of carrying useful information. For instance, for term features, this is normally done using some kind of heuristic technique which encapsulates the scarcity of the words in common English (such as the term frequency-inverse document frequency, TFiDF), since unusual words are more likely to be indicative of the relevant statistical models than common words.
  • TFiDF may be defined in the usual way, e.g. as TFiDF(t) = tf(t) · log(N / df(t)), where N is the total number of image/label statistical models, and where:
  • tf(t) is the number of times term t occurs in the user inputted text; and
  • df(t) is the number of image/label statistical models in which t occurs across all image/label statistical models.
  • the D features of the user inputted text 12' can be represented by a real valued D-dimensional vector. Normalization can then be achieved by the search engine 100' by converting each of the vectors to unit length. It may be preferable to normalise the feature vector because a detrimental consequence of the independence assumption on features is that user inputted text samples of different length are described by a different number of events, which can lead to spurious discrepancies in the range of values returned by different system queries.
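  • A minimal sketch (not part of the patent) of TFiDF weighting followed by unit-length normalisation of the feature vector; the document-frequency statistics shown are hypothetical:

```python
# Illustrative sketch (not from the patent): TFiDF-weighting the extracted
# features and normalising the resulting vector to unit length.
import math
from collections import Counter

def tfidf_vector(features, doc_freq, num_models):
    """doc_freq: feature -> number of image/label models containing it."""
    tf = Counter(features)
    vec = {f: tf[f] * math.log(num_models / (1 + doc_freq.get(f, 0))) for f in tf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {f: v / norm for f, v in vec.items()}   # unit-length vector

doc_freq = {"beer": 3, "friends": 40}              # hypothetical statistics
print(tfidf_vector(["beer", "friends", "beer"], doc_freq, num_models=100))
```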
  • The estimate of the probability, P(e | c, M), of observing the user inputted text, e, given that an image/label, c, is relevant under an associated image/label statistical model, M, is computed as a product over the independent features, f_i, extracted from the text input by a user, e: P(e | c, M) = Π_i P(f_i | c, M).
  • the search engine 100' is configured to query the image/label database 70 with each feature f_i.
  • the database returns a list of all the image/label statistical models comprising that feature and the probability estimate associated with that feature for each image/label statistical model.
  • the weight vector is preferably normalized to have unit length.
  • To evaluate this product, an estimate of the image/label-dependent feature likelihood, P(f_i | c, M), is needed.
  • the search engine 100' takes this estimate from the image/label statistical model, which has been trained by analysing the frequency of features in the source text. Under this approach, however, if the probability estimate for any feature of the user inputted text is zero (because, for example, the term is not present in the model), the final probability P(e | c, M) will also be zero.
  • the search engine 100' can therefore determine which image/label 50 is the most relevant given the user inputted text by querying each image/label statistical model of the image/label database 70 with the features f_i extracted from the user inputted text, to determine which image/label statistical model provides the greatest probability estimate (since the image/label statistical models are mapped to corresponding images/labels).
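  • Illustrative sketch only (not the patent's implementation): scoring each image/label statistical model as a product of per-feature likelihoods, computed in log space; the add-one smoothing used to avoid zero estimates is an assumption:

```python
# Illustrative sketch (not from the patent): scoring each image/label model
# as a product of per-feature likelihoods, P(e|c,M) = prod_i P(f_i|c,M), done
# in log space; add-one smoothing is an assumption to avoid zero estimates.
import math

def score_models(features, models):
    """models: {emoji: {feature: count}}; returns emoji ranked by log P(e|c,M)."""
    scores = {}
    for emoji, counts in models.items():
        total = sum(counts.values())
        vocab = len(counts) or 1
        log_p = 0.0
        for f in features:
            p = (counts.get(f, 0) + 1) / (total + vocab)   # smoothed estimate
            log_p += math.log(p)
        scores[emoji] = log_p
    return sorted(scores, key=scores.get, reverse=True)

models = {"🍺": {"beer": 8, "pub": 5}, "🍕": {"pizza": 9, "cheese": 4}}
print(score_models(["beer", "tonight"], models))   # ['🍺', '🍕']
```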
  • the search engine 100' can take into account additional types of evidence, e.g. evidence that relates specifically to a given user, e.g. previously generated language, previously entered images/labels, or social context / demographic (e.g. since the type of emoji that is popularly used may vary with nationality/culture/age).
  • the search engine may take into account a prior probability of image/label relevance, e.g. a measure of the likelihood that an image/label will be relevant in the absence of any specific evidence related to an individual user or circumstance.
  • This prior probability can be modelled using an aggregate analysis of general usage patterns across all images/labels.
  • Recency, i.e. how recently the image/label was inputted by a user, may also be taken into account.
  • If multiple evidence sources 12', 12" are taken into account, the search engine 100' generates an estimate for each image/label given each evidence source. For each image/label, the search engine is configured to combine the estimates for the evidence sources to generate an overall estimate for that image/label. To do this, the search engine 100' may be configured to treat each of the evidence sources as independent, i.e. to treat a user's image/label input history as independent from the text input.
  • The overall estimate, P(E | c, M_c), is therefore calculated by the search engine 100' as a product of the probability estimates for the independent evidence sources e_i.
  • the search engine 100' is therefore configured to calculate the individual evidence estimates separately, under the statistical model M associated with each evidence source, for each image/label c.
  • the relative impact of individual evidence sources can be controlled by the search engine 100' by a per-distribution smoothing hyper-parameter which allows the system to specify a bound on the amount of information yielded by each source. This can be interpreted as a confidence in each evidence source.
  • An aggressive smoothing factor on an evidence source (with the limiting case being the uniform distribution, in which case the evidence source is essentially ignored) relative to other evidence sources will reduce the differences between probability estimates for that evidence source conditioned on different images/labels. The distribution becomes flatter as the smoothing increases, and the overall impact of the source on the probability, P(E | c, M_c), is reduced.
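  • A rough sketch (not from the patent) of combining independent evidence sources with a per-source smoothing factor that flattens, and so down-weights, a source; the blend towards a flat estimate and the example values are assumptions:

```python
# Illustrative sketch (not from the patent): combining per-evidence-source
# estimates for an image/label, with a per-source smoothing factor that
# flattens (and so down-weights) a source; values and names are assumptions.
def combined_estimate(per_source_probs, smoothing):
    """per_source_probs: {source: P(evidence | image/label)}.
    smoothing: {source: alpha in [0, 1]}, 1.0 = ignore the source (uniform)."""
    estimate = 1.0
    for source, p in per_source_probs.items():
        alpha = smoothing.get(source, 0.0)
        estimate *= (1 - alpha) * p + alpha * 0.5   # blend towards a flat estimate
    return estimate

probs = {"text": 0.02, "input_history": 0.30}
print(combined_estimate(probs, {"input_history": 0.8}))
```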
  • the statistical model may be a language model, such that there is a plurality of language models associated with the plurality of images/labels, where those language models comprise n-gram word sequences.
  • the language models may be used to generate word predictions on the basis of the user inputted text (e.g. by comparing the sequence of words of the user inputted text to a stored sequence of words, to predict the next word on the basis of the stored sequence).
  • the system is therefore able to generate a word prediction via the individual language models as well as an image/label prediction via the search engine.
  • the system may comprise one or more language models (e.g. word-based language model, morpheme-based language model etc.), in addition to the statistical models of the search engine, to generate text predictions.
  • the search engine 100' may be configured to discard all features f_i which have a TFiDF value lower than a certain threshold.
  • Features with a low TFiDF weighting will, in general, have a minimal impact on the overall probability estimates.
  • low TFIDF terms also tend to have a reasonably uniform distribution of occurrence across content corpora, meaning their impact on the probability estimates will also be reasonably uniform across classes.
  • the search engine can be configured to retrieve the top k images/labels.
  • the top-k image/label retrieval acts as a first pass to reduce the number of candidate images/labels, which can then be ranked using a more computationally expensive procedure.
  • For each feature f_i of the user inputted text, with TFiDF value t (normalised to be in the range [0, 1]), the search engine is configured to find the k·t images/labels which have the highest probabilistic association with f_i, where this set of images/labels is denoted C_f.
  • the search engine then 'scores' the evidence with respect to this limited set of candidate images/labels. Since k is likely to be small compared to the original number of images/labels, this provides a significant performance improvement. Any other suitable solution for retrieving the top k images/labels can be employed, for example by using Apache Lucene (http://lucene.apache.org/) or by using a k-nearest neighbour approach (http://en.wikipedia.org/wiki/Nearest_neighbor_search#k-nearest_neighbor), etc.
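  • An illustrative sketch (not the patent's implementation) of such a top-k first pass, keeping for each feature the k·t most strongly associated images/labels before full scoring:

```python
# Illustrative sketch (not from the patent): a top-k first pass that, for each
# feature, keeps the k*t images/labels most strongly associated with it, and
# only fully scores that reduced candidate set.
def candidate_images(features_with_tfidf, feature_to_images, k=20):
    """feature_to_images: feature -> [(image, association strength), ...]."""
    candidates = set()
    for feature, t in features_with_tfidf.items():     # t assumed in [0, 1]
        limit = max(1, int(k * t))
        ranked = sorted(feature_to_images.get(feature, []),
                        key=lambda pair: pair[1], reverse=True)
        candidates.update(image for image, _ in ranked[:limit])
    return candidates
```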
  • Fig. 6 illustrates a system in accordance with a third embodiment of the invention that comprises a classifier 100" to generate image/label predictions 50 that are relevant to user inputted text 12'.
  • a classifier 100" for generating text predictions has been described in detail in WO 2011/042710, which is hereby incorporated by reference in its entirety.
  • classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.
  • the classifier 100" is the feature that implements classification, mapping input data to a category.
  • the classifier 100" is configured to map user inputted text to image/labels.
  • the classifier 100" is trained on text data that has been pre-labelled with images/labels, and makes real-time image/label predictions 50 for sections of text 12 entered into the system by a user.
  • a plurality of text sources 80 are used to train the classifier 100".
  • Each of the plurality of text sources 80 comprises all of the sections of text associated with a particular image/label as found in the source data.
  • any text of a sentence comprising a particular image/label may be taken to be text associated with that image/label, or any text which precedes the image/label may be taken to be associated text, for example a Twitter feed and its associated hashtag or a sentence and its associated emoji.
  • each text source of the plurality of text sources 80 is mapped to or associated with a particular image/label.
  • Feature Vector Generator 90 of the system.
  • the Feature Vector Generator 90 is configured to convert the user inputted text 12' into a feature vector ready for classification.
  • the Feature Vector Generator 90 is as described above for the search engine system.
  • the Feature Vector Generator 90 is also used to generate the feature vectors used to train the classifier (from the plurality of text sources) via a classifier trainer 95.
  • the value D of the vector space is governed by the total number of features used in the model, typically upwards of 10,000 for a real-world classification problem.
  • the Feature Vector Generator 90 is configured to convert a discrete section of text into a vector by weighting each cell according to a value related to the frequency of occurrence of that term in the given text section, normalised by the inverse of its frequency of occurrence (TFiDF) across the entire body of text, where tf(t) is the number of times term t occurs in the current source text, and df(t) is the number of source texts in which t occurs across the whole collection of text sources.
  • the Feature Vector Generator 90 is configured to split user inputted text 12' into features (typically individual words or short phrases) and to generate a feature vector from the features.
  • the feature vectors are D-dimensional real-valued vectors, R D , where each dimension represents a particular feature used to represent the text.
  • the feature vector is passed to the classifier 100" (which uses the feature vector to generate image/label predictions).
  • the classifier 100" is trained by a training module 95 using the feature vectors generated by the Feature Vector Generator 90 from the text sources 80.
  • a trained classifier 100" takes as input a feature vector that has been generated from text input by a user 12', and yields image/label predictions 50, comprising a set of image/label predictions mapped to probability values, as an output.
  • the image/label predictions 50 are drawn from the space of image/label predictions associated with/mapped to the plurality of text sources.
  • the classifier 100" is a linear classifier (which makes a classification decision based on the value of a linear combination of the features) or a classifier based on the batch perceptron principle where, during training, a weights vector is updated in the direction of all misclassified instances simultaneously, although any suitable classifier may be utilised.
  • a timed aggregate perceptron (TAP) classifier is used.
  • the TAP classifier is natively a binary (2-class) classification model. To handle multi-class problems, i.e. multiple images/labels, a one-versus-all scheme is utilised, in which the TAP classifier is trained for each image/label against all other images/labels.
  • the training of a classifier is described in more detail from line 26 of page 10 to line 8 of page 12 of WO 2011/042710, which is hereby incorporated by reference.
  • a classifier training module 95 carries out the training process as already mentioned.
  • the training module 95 yields a weights vector for each class, i.e. a weights vector for each image/label.
  • the image/label confidence values generated by the classifier 100" are used to generate a set of image/label predictions (where the dot product with the highest value (greatest confidence) is matched to the most likely image/label).
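  • As an illustration only (not taken from the patent or WO 2011/042710), one-versus-all prediction by dot products with per-image/label weights vectors could be sketched as follows; the weights shown are hypothetical:

```python
# Illustrative sketch (not from the patent): one-versus-all linear
# classification, taking the dot product of the input feature vector with a
# per-image/label weights vector and ranking by confidence.
def classify(feature_vector, class_weights, top_n=3):
    """feature_vector and each weights vector: {feature: value} dicts."""
    confidences = {}
    for label, weights in class_weights.items():
        confidences[label] = sum(v * weights.get(f, 0.0)
                                 for f, v in feature_vector.items())
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]

weights = {"🍺": {"beer": 1.2, "pub": 0.8}, "🍕": {"pizza": 1.5}}
print(classify({"beer": 0.9, "tonight": 0.3}, weights))
```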
  • the system may further comprise a weighting module.
  • the weighting module (not shown) may use the vector of confidence values generated by the classifier to weight the prior probabilities for each image/label to provide a weighted set of image/label predictions 50.
  • the weighting module may be configured to respect the absolute probabilities assigned to a set of image/label predictions, so as not to skew spuriously future comparisons.
  • the weighting module can be configured to leave image/label predictions from the most likely prediction component unchanged, and down-scales the probability from less likely images/labels proportionally.
  • the image/label predictions 50 output by the classifier 100" can be displayed on a user interface for user selection.
  • the classifier 100" is required to generate the dot product of the input vector with each image/label vector to generate image/label predictions 50.
  • the greater the number of image/labels the greater the number of dot products the classifier is required to calculate.
  • the images/labels may be grouped together, e.g. all emojis relating to a particular emotion (such as happiness) can be grouped into one class, or all emojis relating to a particular topic or subject, such as clothing etc.
  • the classifier would predict the class, for example an emotion (sad, happy, etc.) and the n most likely emoji predictions of that class can be displayed to the user for user selection.
  • this does result in the user having to select from a larger panel of emojis.
  • the coarser grade classes could be used to find the right category of emoji, with the finer emoji prediction occurring only for that coarser category, thus reducing the number of dot products the classifier is required to take.
  • a first set of features can be extracted from the user inputted text to generate an initial set of image/label predictions
  • a second set of features can be extracted from the user inputted text to determine the one or more most-likely image/label predictions from that initial set of image/label predictions.
  • the first set of features may be smaller in number than the second set of features.
  • With a large number of images/labels, the search engine 100' may become more desirable than the classifier 100", because the search engine calculates the probabilities associated with the images/labels by a different mechanism which is able to cope better with determining probability estimates for a large volume of images/labels.
  • the systems of the present invention can be employed in a broad range of electronic devices.
  • the present system can be used for messaging, texting, emailing, tweeting etc. on mobile phones, PDA devices, tablets, or computers.
  • the present invention is also directed to a user interface for an electronic device, wherein the user interface displays the predicted image/label 50 for user selection and input.
  • the image/label prediction 50 can be generated by any of the systems discussed above.
  • the user interface preferably displays one or more word/term predictions 60 for user selection, in addition to the display of one or more image/label predictions 50.
  • Figures 7-11 illustrate, by way of example only, the display of an emoji on a user interface for user selection and input.
  • the invention is not limited to the display and input of an emoji, and is applicable to any image/label prediction 50.
  • the user interface comprises one or more candidate prediction buttons (in this example, three candidate prediction buttons) displaying one or more (in this example three) most likely user text predictions (i.e. 'The', 'I', 'What', in this example).
  • the user interface 150 also comprises a virtual button 155 for displaying the current most relevant image/label prediction 50 (in a preferred embodiment, an emoji, and in the particular example illustrated a beer emoji).
  • Processing circuitry of the device is configured such that a first user input, for example a tap on a touchscreen device, directed at the virtual button 155 displaying the emoji, inputs the displayed emoji into the device; and a second user input (different from the first user input), for example a long-press or directional swipe directed at the button 155, opens a menu to other actions, e.g. next most relevant emoji predictions, all emoji, carriage return, etc.
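  • Purely as an illustration (not part of the patent), the gesture handling described above could be sketched as a simple dispatch; the gesture names and callbacks are assumptions:

```python
# Illustrative sketch (not from the patent): how processing circuitry might
# dispatch the two gestures on the image/label button 155; the event names
# and callbacks are assumptions for the example.
def handle_emoji_button(gesture, displayed_emoji, insert, open_menu):
    if gesture == "tap":                       # first user input: insert emoji
        insert(displayed_emoji)
    elif gesture in ("long_press", "swipe"):   # second user input: other actions
        open_menu(["next_predictions", "all_emoji", "carriage_return"])

handle_emoji_button("tap", "🍺", insert=print, open_menu=print)
```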
  • an image (e.g. emoji) prediction 50 mapped to a word prediction 60 (e.g. through the word->emoji correspondence map of fig. 2a) will be presented as a prediction 160 on a prediction pane, alongside the matching word prediction 161.
  • the candidate prediction buttons therefore display the two most relevant word predictions (for the example of a user interface with three candidate buttons) and the image (e.g. emoji) most appropriate for the most relevant word prediction.
  • the image/label prediction presented as a prediction 160 on the prediction pane is the most-likely image/label prediction (as determined by any of the above described systems) and does not therefore need to correspond to a word prediction of the prediction pane.
  • the image (e.g. emoji) prediction 160 may always be displayed on the right-hand side of the prediction pane, making the image easy to locate.
  • Alternative image (e.g. emoji) predictions 50 may be made available by long-pressing the image (e.g. emoji) prediction button 160.
  • the emoji button 155 reflects this prediction and also presents emoji related to recently typed words.
  • a first gesture (e.g. a tap) on the image (for the illustrated example, emoji) button 155 will insert the emoji displayed by the button, and a second gesture on the button (e.g. a longpress or swipe) will display the emojis related to recently typed words for user selection.
  • In this example, the current word candidates are 'food', 'and' and 'is', and words that have been recently typed include 'cat'.
  • the emoji displayed on the button 165 can be inserted via a first gesture on the button 165 or button 155 (e.g. a tap), with the alternative emojis available via a second gesture on the button 155 or button 165 (e.g. by a long-press or swipe).
  • the image/label panel (e.g. emoji panel) can be accessed by long-pressing the image/label candidate prediction button 165.
  • the user long presses the emoji candidate prediction button 165, slides their finger towards the emoji panel icon and releases.
  • the emoji panel icon will be on the far left side of the pop-up to allow a 'blind directional swipe' to access it.
  • the rest of the pop-up is filled with extended emoji predictions.
  • the image/label (e.g. emoji) can be displayed with its matching word on a candidate button 170 of the prediction pane.
  • the word can be inserted by a first user gesture on the candidate button 170 (e.g. by tapping the button 170), with the image/label (e.g. emoji) inserted via a second user gesture on the candidate button 170 (for example, by a long press of the button 170).
  • a standard emoji key 155 can be provided as with previous user interface embodiments to allow the user to insert a predicted emoji (which may not necessarily match the predicted word) or allow the user to search for alternative emojis.
  • FIG. 11 illustrates how an image (e.g. emoji) can be displayed and inserted with a continuous touch input, for example as described in detail in earlier application WO2013/107998, which is hereby incorporated by reference in its entirety, and as illustrated in Fig. 1 of WO2013/107998.
  • the prediction pane comprises a word prediction button 175 'heart' and an emoji prediction button 165 which displays a relevant emoji, e.g. [heart emoji].
  • the user moves over to the word prediction pane and removes their finger from contact with the user interface at a location on the word prediction button 175.
  • the word prediction is inserted whenever the user lifts their finger from the user interface, unless their finger is lifted at the emoji button.
  • the processing circuitry can be configured to insert the word if the user lifts their finger from the user interface whilst on the last character of the word or even mid-word when the prediction engine has predicted and displayed that word for user selection and input.
  • the user breaks contact with the touchscreen interface at the emoji candidate button 165.
  • the processing circuitry for the user interface may be configured such that the user ending the continuous touch gesture on the emoji button 165 and remaining on the emoji button 165 for a particular length of time brings up a pop-up panel 200 of alternative emojis for user selection.
  • the user interface has been described as comprising various 'buttons'.
  • the term 'button' is used to describe an area on a user interface where an image/label/word is displayed, where that image/label/word which is displayed can be input by a user by activating the 'button', e.g. by gesturing on or over the area which displays the image/label/word.
  • Figs. 12-16 are schematic flow charts of methods according to the invention.
  • the present invention provides a method of generating a prediction means to predict an image/label relevant to user inputted text.
  • the method comprises receiving text having one or more images/labels embedded within sections of text 400, identifying an image/label embedded within the text 410, and associating the identified image/label with sections of the text 420.
  • the prediction means is then trained on the sections of text associated with the image/label.
  • In one embodiment, the prediction means is a language model 10, which is trained on text comprising images/labels, for example by including an image/label in an n-gram word/image sequence or by appending the image/label to an n-gram word sequence.
  • Where the prediction means is a search engine 100', each statistical model of its image/label database can be mapped to a given image/label and trained on text associated with that image/label.
  • Where the prediction means is a classifier 100", it is trained on a plurality of text sources, each text source comprising sections of text associated with a given image/label.
  • In a second method of the invention, as illustrated in Fig. 13, there is provided a method of predicting, using a prediction means, an image/label relevant to text input into a system by a user, wherein the prediction means is trained on sections of text associated with an image/label.
  • the method comprises receiving at the prediction means the text input by a user 500, determining the relevance of the text input by a user to the sections of text associated with the image/label 510, and predicting on the basis of the sections of text associated with the image/label the relevance of the image/label to the text input by a user 520.
  • Where the prediction means is the search engine 100', the search engine determines the relevance of the user inputted text by extracting features from the user inputted text and querying an image/label database 70 with those features.
  • By querying the database 70, the search engine 100' is able to determine which image/label statistical model is the most relevant and is therefore able to generate image/label predictions 50, because each statistical model is mapped to a particular image/label.
  • Where the prediction means is a classifier 100", the classifier is able to determine the relevance of an image/label to user inputted text by generating the dot product of a feature vector representing the image/label (generated from the source text which comprises sections of text associated with that image/label) with a feature vector representing the user inputted text.
  • In a third method of the invention, as illustrated in Fig. 14, there is provided a method to predict, using a prediction means, an image/label relevant to text input into a system by a user, wherein the prediction means is trained on text which comprises an image/label embedded within text, the prediction means having been trained by identifying the image/label within the text and associating the identified image/label with sections of the text.
  • the method comprises receiving at the prediction means the text input by a user 600, comparing the text input by a user to the sections of text associated with the image/label 610, and predicting on the basis of the sections of text associated with the identified image/label the relevance of the image/label to the text input by a user 620.
  • the language model may comprise an image/label within an n-gram word/image sequence of an n-gram map 14' or an image/label appended to an n-gram word sequence of an n-gram map 14'.
  • the language model predicts a relevant image/label 50 by comparing the user inputted text to a stored n-gram sequence and outputting a relevant image/label which is part of the stored n-gram or is appended to the stored n-gram.
  • the language model comprises a word-based n-gram map 14 and a word→image correspondence map 40 which is trained on sections of text (i.e. words) associated with the images.
  • the language model is configured to predict the next word in a sequence of user inputted words by comparing the word sequence to a stored n-gram of the map 14 and then mapping this predicted word to an image using the correspondence map 40.
  • Fourth and fifth methods of the invention relate to a user's interaction with a touchscreen user interface of a device comprising one or more of the above described systems for generating image/label predictions 50.
  • the fourth method of the invention provides a method of entering data into an electronic device comprising a touchscreen user interface having a keyboard, wherein the user interface comprises a virtual image/label button configured to display the predicted image/label for user selection.
  • the method comprises inputting a character sequence via a continuous gesture across the keyboard 700.
  • the method comprises inputting the image/label as data 720.
  • the gesture may include breaking contact with the user interface at the image/label virtual button.
  • the fifth method relates to a method for selecting between entry of a word/term and entry of an image/label that corresponds to that word/term on a touchscreen user interface comprising a virtual button configured to display a predicted word/term and/or the predicted image/label.
  • the method comprises, in response to receipt of a first gesture type on/across the button, inputting the predicted word/term 800; and, in response to a second gesture type on/across the button, inputting the predicted image/label 810.
  • the present invention solves the above-mentioned problems by providing a system and method for predicting emojis/stickers based on user entered text.
  • the present invention is able to increase the speed of emoji input by offering one or several relevant emoji predictions, which saves the user from having to scroll through different emojis to identify the one they want.
  • the system and method of the present invention provides increased emoji discoverability, as the prediction of emojis based on next-word prediction/correction and context means that an emoji may be predicted and presented to a user, even though the user may not be aware that a relevant or appropriate emoji exists.
  • the systems and methods of the present invention therefore provide efficient emoji selection and input into an electronic device. Rather than having to scroll through possible emojis, the user can insert a relevant emoji by the tap of a virtual key displaying a predicted emoji.
  • although the examples have been provided with reference to emojis, the invention is equally applicable to the insertion of any image/label relevant to user entered text, as previously described.
  • the present invention also provides a computer program product comprising a computer readable medium having stored thereon computer program means for causing a processor to carry out one or more of the methods according to the present invention.
  • the computer program product may be a data carrier having stored thereon computer program means for causing a processor external to the data carrier, i.e. a processor of an electronic device, to carry out the method according to the present invention.
  • the computer program product may also be available for download, for example from a data carrier or from a supplier over the internet or other available network, e.g. downloaded as an app onto a mobile device (such as a mobile phone) or downloaded onto a computer, the mobile device or computer comprising a processor for executing the computer program means once downloaded.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

System and method for inputting images/labels into electronic devices. There are provided systems and methods for predicting an image/label relevant to text input by a user. In a first aspect, there is provided a system comprising a means for receiving text input by a user and a prediction means trained on sections of text associated with an image/label. The prediction means is configured to receive the text input by the user, determine the relevance of the text input by the user to the sections of text associated with the image/label, and predict on the basis of the sections of text associated with the image/label the relevance of the image/label to the text input by the user. The systems and methods of the present invention reduce the burden of entering an image/label.

Description

SYSTEM AND METHOD FOR INPUTTING IMAGES OR LABELS INTO ELECTRONIC DEVICES
Field of the invention
The present invention relates to a system and method for inputting images/labels into an electronic device. In particular, the invention relates to a system and method for offering an image/label to be input into a device on the basis of user entered text.
Background of the invention
In the texting and messaging environment, it has become popular for users to include images in word-based text. For example, it is common for users to enter text-based representations of images, known as emoticons, to express emotion such as :-) or ;-p [typical in the west] or (^_^) [typical in Asia]. More recently, small character-sized images, called emojis, have become popular. Stickers have also become popular. A sticker is a detailed illustration of a character that represents an emotion or action that is a mix of cartoons and emojis.
As of October 2010, the Unicode (6.0) standard allocates 722 codepoints as descriptions of emojis (examples include U+1F60D: Smiling face with heart shaped eyes and U+1F692: Fire engine). It is typical for messaging services (e.g. Facebook, WhatsApp) to design their own set of images, which they use to render each of these Unicode characters so that they may be sent and received. Additionally, both Android (4.1+) and iOS (5+) provide representations of these characters natively as part of the default font. Although it is popular to input emojis, it remains difficult to do so, because the user has to discover appropriate emojis and, even knowing the appropriate emoji, has to navigate through a great number of possible emojis to find the one they want to input.
Keyboards and messaging clients have tried to reduce the problem by including an emoji selection panel, in which emojis are organised into several categories which can be scrolled through. Although the emojis have been grouped into relevant categories, the user is still required to search through the emojis of that category in order to find the emoji they want to use. Furthermore, some emojis may not be easily classified, making it more difficult for the user to decide in which category they should search for that emoji. There are known solutions which attempt to reduce further the burden of inputting emojis. For example, several messaging clients will replace automatically certain shorthand text with images. For example, Facebook Messenger will convert the emoticon :-) to a picture of a smiling face and will convert the short hand text sequence, (y), to a picture of a thumbs up when the message is sent.
Additionally, the Google Android Jellybean keyboard will offer an emoji candidate when the user types, exactly, a word corresponding to a description of that emoji, e.g. if 'snowflake' is typed, the snowflake picture is offered to the user as candidate input.
These known solutions to reduce the burden of emoji input still require a user to provide the shorthand text that identifies the emoji or to type the exact description of the emoji. Although the known systems obviate the requirement to scroll through screens of emojis, they still require the user to explicitly and correctly identify the emoji they wish to enter.
It is an object of the present invention to address the above-mentioned problem and reduce the burden of image (e.g. emoji, emoticon or sticker) and label input in the messaging/texting environment.
Summary of the invention
The present invention provides systems in accordance with independent claims 1 and 2, methods in accordance with independent claims 32, 33, 34, 54 and 55, and a computer program in accordance with independent claim 56.
Optional features of the invention are the subject of dependent claims.
Brief description of the drawings
The present invention will now be described in detail with reference to the accompanying drawings, in which:
Figs. 1a and 1b show systems for generating image/label predictions in accordance with a first system type of the present invention;
Figs. 2a-2c are schematics of alternative image/label language models according to the invention, to be used in the systems of Figs. 1a and 1b; Fig. 3 is a schematic of an n-gram map comprising text sections associated with images/labels (for this example, emojis), for use in the language model of Figs. 2b and 2c;
Fig. 4 is a schematic of an n-gram map comprising text sections associated with images/labels (for this example, emojis), where images/labels identified in the training text have been associated with sections of text which do not immediately precede the identified image/label, for use in the image/label language model of Figs. 2b and 2c;
Fig. 5 shows a system for generating image/label predictions in accordance with a second system type of the present invention;
Fig. 6 shows a system for generating image/label predictions in accordance with a third system type of the present invention;
Figs. 7-11 illustrate different embodiments of a user interface in accordance with the present invention; and
Figs. 12-16 show flow charts according to methods of the present invention.
Detailed description of the invention
The system of the present invention is configured to generate an image/label prediction relevant for user inputted text. In general, the system of the invention comprises a prediction means trained on sections of the text associated with an image/label. The prediction means is configured to receive the text input by a user and predict the relevance of the image/label to the user inputted text.
The image prediction may relate to any kind of image, including a photo, logo, drawing, icon, emoji or emoticon, sticker, or any other image which may be associated with a section of text. In a preferred embodiment of the present invention, the image is an emoji.
The label prediction may relate to any label associated with a body of text, where that label is used to identify or categorise the body of text. The label could therefore refer to the author of the text, a company/person generating sections of text, or any other relevant label. In a preferred embodiment of the present invention, the label is a hashtag, for example as used in Twitter feeds.
The present invention provides three alternative ways of generating image/label predictions to solve the problem of reducing the burden of image/label entry into electronic devices. In particular, the solutions comprise using a language model to generate image/label predictions, using a search engine to generate image/label predictions from a plurality of statistical models, and using a classifier to generate image/label predictions. The alternative solutions (i.e. alternative prediction means) will be described in that order. A system in accordance with the first solution can be implemented as shown in Figs. 1a and 1b, which show block diagrams of the high level text prediction architecture according to the invention. The system comprises a prediction engine 100 configured to generate an image/label prediction 50 relevant for user inputted text. In Fig. 1a, the prediction engine 100 comprises an image/label language model 10 to generate image/label predictions 50 and, optionally, word prediction(s) 60. The image/label language model 10 may be a generic image/label language model, for example a language model based on the English language, or may be an application-specific image/label language model, e.g. a language model trained on SMS messages or email messages, or any other suitable type of language model. The prediction engine 100 may comprise any number of additional language models, which may be text-only language models or an image/label language model in accordance with the present invention, as illustrated in Fig. 1b.
As shown in Fig. 1b, if the prediction engine 100 comprises one or more additional language models, such as additional language model 20, the prediction engine 100 may comprise a multi-language model 30 (Multi-LM) to combine the image/label predictions and/or word predictions, sourced from each of the language models 10, 20 to generate final image/label predictions 50 and/or final word predictions 60 that may be provided to a user interface for display and user selection. The final image/label predictions 50 are preferably a set (i.e. a specified number) of the overall most probable predictions. The system may present to the user only the most likely image/label prediction 50.
The use of a Multi-LM 30 to combine word predictions sourced from a plurality of language models is described on line 1 of page 11 to line 2 of page 12 of WO 2010/112841, which is hereby incorporated by reference.
If the additional language model 20 is a standard word-based language model, for example as described in detail in WO 2010/112842, and in particular as shown in relation to Figs. 2a-d of WO 2010/112842, the standard word-based language model can be used alongside the image/label-based language model 10, such that the prediction engine 100 generates an image/label prediction 50 from the image/label language model 10 and a word prediction 60 from the word-based language model 20. If preferred, the image/word based language model 10 may also generate word predictions (as described below with respect to Figs. 2a-2c) which are used by the Multi-LM 30 to generate a final set of word predictions 60. Since the additional language model 20 of this embodiment can predict words only, the Multi-LM 30 is not needed to output final image/label predictions 50. The word-based language model 20 may be replaced by any suitable language model for generating word predictions, which may include language models based on morphemes or word-segments, as discussed in detail in UK patent application no. 1321927.4, which is hereby incorporated by reference in its entirety.
If the additional language model 20 is an additional image/label language model, then the Multi- LM 30 can be used to generate final image/label predictions 50 from image/label predictions sourced from both language models 10, 20. The Multi-LM 30 may also be used to tokenise user inputted text, as described on the first paragraph of page 21 of WO 2010/112842, and as described in more detail below, in relation to the language model embodiments of the present invention.
An image/label language model 10 will be described with reference to figures 2a-2c which illustrate schematics of image/label language models which receive user inputted text and return image/label predictions 50 (and optionally word/term predictions 60).
There are two possible inputs into a given language model, a current term input 11 and a context input 12. The language model may use either or both of the possible inputs. The current term input 11 comprises information the system has about the term the system is trying to predict, e.g. the word the user is attempting to enter (e.g. if the user has entered "I am working on ge", the current term input 11 is 'ge'). This could be a sequence of multi-character keystrokes, individual character keystrokes, the characters determined from a continuous touch gesture across a touchscreen keypad, or a mixture of input forms. The context input 12 comprises the sequence of terms entered so far by the user, directly preceding the current term (e.g. "I am working"), and this sequence is split into 'tokens' by the Multi-LM 30 or a separate tokeniser (not shown). If the system is generating a prediction for the nth term, the context input 12 will contain the preceding n-1 terms that have been selected and input into the system by the user. The n-1 terms of context may comprise a single word, a sequence of words, or no words if the current word input relates to a word beginning a sentence. A language model may comprise an input model (which takes the current term input 11 as input) and a context model (which takes the context input 12 as input).
In a first embodiment illustrated in Fig. 2a, the language model comprises a trie 13 (an example of an input model) and a word-based n-gram map 14 (an example of a context model) to generate word predictions from current input 11 and context 12 respectively. The first part of this language model corresponds to that discussed in detail in WO 2010/112841, and in particular as described in relation to Figs. 2a-2d of WO 2010/112841. The language model of Fig. 2a of the present invention can also include an intersection 15 to compute a final set of word predictions 60 from the predictions generated by the trie 13 and n-gram map 14. As described in detail on line 4 of page 16 to line 14 of page 17 of WO 2010/112841, the trie 13 can be a standard trie (see fig. 3 of WO 2010/112841) or an approximate trie (see fig. 4a of WO 2010/112841) which is queried with the direct current word-segment input 11. Alternatively, the trie 13 can be a probabilistic trie which is queried with a KeyPressVector generated from the current input, as described in detail on line 16 of page 17 to line 16 of page 20 (and illustrated in figs. 4b and 4c) of WO 2010/112841, which is hereby incorporated by reference. The language model can also comprise any number of filters to generate the final set of word predictions 60, as described in that earlier application. If desired, the intersection 15 of the language model 10 of figs. 2a and 2c can be configured to employ a back-off approach if a candidate predicted by the trie has not been predicted by the n-gram map also (rather than retaining only candidates generated by both, which is described in WO 2010/112841). Each time the system has to back off on the context searched for, the intersection mechanism 15 may apply a 'back-off' penalty to the probability (which may be a fixed penalty, e.g. by multiplying by a fixed value). In this embodiment, the context model (e.g. the n-gram map) may comprise unigram probabilities with the back-off penalties applied.
The language model of Fig. 2a includes a word→image/label correspondence map 40, which maps each word of the language model 10 to one or more relevant images/labels, e.g. if the word prediction 60 is 'pizza', the language model outputs an image of a pizza (e.g. the pizza emoji) as the image prediction 50.
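By way of illustration only, the following minimal sketch shows how the Fig. 2a arrangement could be realised in code: candidates from an input model (trie) are intersected with a context model (n-gram map), a fixed back-off penalty is applied where the full context is not stored, and the resulting word predictions are mapped to images via a word→image correspondence map. All data structures, names, vocabulary and probability values are hypothetical and are not taken from the application.

```python
# Minimal sketch of the Fig. 2a pipeline (illustrative assumptions throughout).

BACKOFF_PENALTY = 0.1  # fixed multiplicative penalty applied when backing off the context

# Toy context model: P(word | context) for a few contexts, plus a unigram level.
NGRAM = {
    ("a", "cold"): {"beer": 0.4, "day": 0.3},
    (): {"beer": 0.05, "day": 0.04, "pizza": 0.03},   # unigram fallback
}

def trie_candidates(prefix):
    """Toy input model: words consistent with the current (partial) term input."""
    vocab = ["beer", "because", "day", "pizza"]
    return {w for w in vocab if w.startswith(prefix)}

# Word -> image/label correspondence map (word -> emoji in this example).
CORRESPONDENCE = {"beer": "🍺", "pizza": "🍕"}

def predict(context, prefix):
    candidates = trie_candidates(prefix)
    scores = {}
    for word in candidates:
        # Use the stored context if available, otherwise back off and penalise.
        if context in NGRAM and word in NGRAM[context]:
            scores[word] = NGRAM[context][word]
        elif word in NGRAM[()]:
            scores[word] = NGRAM[()][word] * BACKOFF_PENALTY
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    word_predictions = [w for w, _ in ranked]
    image_predictions = [CORRESPONDENCE[w] for w in word_predictions if w in CORRESPONDENCE]
    return word_predictions, image_predictions

print(predict(("a", "cold"), "be"))   # (['beer'], ['🍺']) for these toy values
```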
Fig. 2b illustrates a second image/label language model 10 in accordance with the first solution of the present invention. The image/label language model 10 is configured to generate an image/label prediction 50 and, optionally a word prediction 60 on the basis of context 12 alone. In this embodiment, the image/label language model receives the context input 12 only, which comprises one or more words which are used to search the n-gram map 14'. The n-gram map 14' of Fig. 2b is trained in a different way to that of Fig. 2a, enabling the image/label language model 10 to generate relevant image/label predictions 50 without the use of a word→image/label correspondence map 40. If there is no context 12, then the language model 10 may output the most likely image/label 50 associated with the most likely word 60 used to start a sentence. For certain circumstances, it may be appropriate to predict an image/label on the basis of context only, e.g. the prediction of emojis. In other circumstances, e.g. the prediction of a label (such as a hashtag), it might be more appropriate to use current word input (by itself or in addition to context input), because the user might partially type the label before it is predicted.
Examples of n-gram maps 14' of the second embodiment are illustrated schematically in Figs. 3 and 4, where for illustrative purposes an emoji has been chosen for the image/label.
The n-gram map 14' of Fig. 3 has been trained on source data comprising images/labels embedded in sections of text. For example, the language model could be trained on data from Twitter, where the tweets have been filtered to collect tweets comprising emojis. In the n-gram map 14' of Fig. 3, the emojis (which are used merely as an example for the image/label) are treated like words to generate the language model, i.e. the n-gram context map comprises emojis in the context in which they have been identified. For example, if the source data comprises the sentence "I am not happy about this [unhappy emoji]", the [unhappy emoji] will only follow its preceding context, e.g. if the n-gram has a depth of four, "happy about this [unhappy emoji]". The language model will therefore predict the [unhappy emoji] if the context 12 fed into the language model comprises "happy about this", as it is the next part of the sequence. The n-gram map comprises the probabilities associated with sequences of words and emojis, where emojis and words are treated indiscriminately for assigning probabilities. The probabilities can therefore be assigned on the basis of frequency of appearance in the training data given a particular context in that training data.
The n-gram map of Fig. 4 has been trained by associating images/labels identified within the source text with sections of text which do not immediately precede the identified images/labels. By training the language model in this fashion, the language model is able to predict relevant/appropriate images/labels, even though the user has not entered text which describes the relevant image/label and has not entered text which would usually immediately precede the image/label, such as 'I am' for 'I am [unhappy emoji]'. To train this language model 10, images/labels are identified within a source text (e.g. filtered Twitter tweets) and each identified image/label is associated with sections of text within that source text. Using the example of tweets, an emoji of a particular tweet is associated with all n-grams from that tweet (a code sketch of this extraction follows the list below). For example, training on the tweet "I'm not happy about this [unhappy emoji]" would generate the following n-grams with associated emoji:
  o I'm not happy [unhappy emoji]
  o not happy about [unhappy emoji]
  o happy about this [unhappy emoji]
  o I'm not [unhappy emoji]
  o not happy [unhappy emoji]
etc.
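The following sketch illustrates, under stated assumptions, how such emoji-annotated n-grams could be extracted from a tweet and pruned by frequency; the tokeniser, the chosen emoji and the pruning threshold are illustrative only and are not taken from the application.

```python
# Rough sketch of building training data for a Fig. 4 style n-gram map 14':
# every n-gram of a tweet is associated with the emoji found in that tweet.
from collections import Counter
import re

def ngrams_with_emoji(tweet_words, emoji, max_n=3):
    """Yield (n-gram, emoji) pairs for all n-grams up to max_n in the tweet."""
    for n in range(1, max_n + 1):
        for i in range(len(tweet_words) - n + 1):
            yield tuple(tweet_words[i:i + n]), emoji

counts = Counter()
for tweet, emoji in [("I'm not happy about this", "☹")]:   # emoji assumed for illustration
    words = re.findall(r"[\w']+", tweet.lower())
    counts.update(ngrams_with_emoji(words, emoji))

# Frequency-based pruning: drop (n-gram, emoji) pairs seen fewer than MIN_COUNT times.
# (With this single toy tweet every count is 1, so the toy model ends up empty.)
MIN_COUNT = 10
model = {k: c for k, c in counts.items() if c >= MIN_COUNT}
```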
One way to generate emoji predictions from such a non-direct context n-gram map 14' is to take the emojis that are appended to the word sequences of the n-gram map 14' which most closely match the word sequence of the user inputted text. If the user inputted text is w1w2w3w4, the predicted emoji is the emoji that is appended to the sequence w1w2w3w4. An alternative way to generate emoji predictions from a non-direct context n-gram map 14' is to predict an emoji for each prefix of the user inputted text, e.g. if the word sequence of user inputted text is w1w2w3w4, etc., predict a first emoji, e1, for w1, a second emoji, e2, for w1w2 (where w1w2 means predicting an emoji for the word sequence w1w2), e3 for w1w2w3 and e4 for w1w2w3w4, etc. The weighted average of the set of emoji predictions (e1, e2, e3, e4) can be used to generate the emoji predictions 50, i.e. the most frequently predicted emoji will be outputted as the most likely emoji. By taking a weighted average of the set of emoji predictions, it may be possible to increase the contextual reach of the emoji prediction.
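A minimal sketch of this second strategy, assuming a toy lookup function in place of the trained n-gram map 14' and arbitrary weights, might take a weighted vote over the emoji predicted for each prefix of the user's word sequence:

```python
# Illustrative only: predict_emoji() and the weights are assumptions for the example.
from collections import defaultdict

def predict_emoji(prefix):
    """Stand-in for looking up the emoji appended to this word sequence in the map 14'."""
    toy_map = {("i'm",): "😐", ("i'm", "not"): "☹", ("i'm", "not", "happy"): "☹"}
    return toy_map.get(tuple(prefix))

def vote(words, weights=None):
    scores = defaultdict(float)
    prefixes = [words[:i] for i in range(1, len(words) + 1)]
    weights = weights or [1.0] * len(prefixes)        # e.g. longer prefixes could weigh more
    for prefix, w in zip(prefixes, weights):
        emoji = predict_emoji(prefix)
        if emoji is not None:
            scores[emoji] += w
    return max(scores, key=scores.get) if scores else None

print(vote(["i'm", "not", "happy"]))   # '☹' under these toy values
```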
Owing to the number of different sections of text that can be associated with each emoji, the model is preferably pruned in two ways. The first is to prune based on frequency of occurrence, e.g. prune n-grams with frequency counts of less than a fixed number of occurrences (e.g. if a particular n-gram and associated emoji is seen less than 10 times in the training data, remove that n-gram and associated emoji).
The second way of pruning is to prune on the basis of the probability difference from the unigram probabilities. As an example, after the context "about this", the probability of predicting the [unhappy emoji] will not be much larger than the unigram probability of the [unhappy emoji], because training will also have encountered many other n-grams of the form "about this [EMOJI]" with no particular bias. The n-gram "about this [unhappy emoji]" can therefore be pruned. A combination of the two pruning methods is also possible, as are any other suitable pruning methods.
Referring to Fig. 2b, the language model 10 receives a sequence of one or more words (context 12) from the Multi-LM 30 and compares the sequence of one or more words to a sequence of words stored in the n-gram map 14'. In relation to the n-gram map of Fig. 3, an emoji is only predicted if the emoji directly follows the sequence of one or more words, e.g. from the context sequence "not happy about this", the language model would predict the [unhappy emoji]. In relation to the n-gram map of Fig. 4, the language model generates an emoji prediction much more regularly, as the language model has been trained on direct and non-direct context.
As shown in Fig. 2b, the language model can optionally output one or more word prediction 60 alongside the image/label prediction(s) 50. The language model compares the input sequence of one or more words (context 12) to the stored sequences of words (with appended emojis). If it identifies a stored sequence of words that comprises the sequence of one or more words, it outputs the next word in the stored sequence that follows the sequence of one or more words, for direct input of the next word into the system or for display of the next word 60 on a user interface for user selection, for example. A third embodiment of a language model 10 is illustrated in Fig. 2c. As with the language model of Fig. 2a, the language model 10 of Fig. 2c comprises a trie 13 and an n-gram map 14' to generate word predictions from current input 11 and context input 12 respectively, and an intersection 15 to generate one or more final word prediction(s) 60. The n-gram map 14' of the third embodiment is the same of that of the second embodiment, i.e. it comprises images/labels embedded within sections of text or appended to sections of text. The same n-gram map 14' can therefore be used to generate image/label predictions 50, as well as word predictions 60.
As will be understood from above, the system of the first solution predicts an image/label on the basis of the user entered text and, optionally, a word/term on the basis of that user entered text.
Although the image/label language models 10 of the first solution have been described in relation to language models comprising trained n-gram maps, this is by way of example only, and any other suitably trained language model can be used. The second solution to reducing the burden of image/label input, relates to a search engine configured to generate image/label predictions for user input, similar to that discussed in detail in UK patent application 1223450.6, which is hereby incorporated by reference in its entirety. Fig. 5 shows a block diagram of the high level system architecture of the system of the invention. The search engine 100' uses an image/label database 70 that preferably comprises a one-to-one mapping of statistical models to image/labels, i.e. the image/label database comprises a statistical model associated with each image/label (e.g. emoji or hashtag), each image/label statistical model being trained on sections of text associated with that image/label. A language model is a non-limiting example of a statistical model, where the language model is a probability distribution representing the statistical probability of sequences of words occurring within a natural language. Unlike the language model 10 of the first solution, a language model in accordance with this solution does not have images/labels within the language model, it is a text only language model mapped to a particular image/label.
To generate the image/label prediction(s) 50, the search engine 100' uses the image/label database 70 and user inputted text 12' and, optionally, one or more other evidence sources 12", e.g. the image/label input history for a given user of a system. To trigger a search, the search engine receives user entered text 12'.
The image/label database 70 associates individual images/labels with an equal number of statistical models and, optionally, alternative statistical models (not shown) that are not language based (e.g. a model that estimates user relevance given prior input of a particular image/label), as will be described later.
The search engine 100' is configured to query the image/label database 70 with the user inputted text evidence 12' in order to generate for each image/label in the content database an estimate of the likelihood that the image/label is relevant given the user inputted text. The search engine outputs the most probable or the p most probable images/labels as image/label predictions 50, which may optionally be presented to a user.
An estimate for the probability, P, of observing the user inputted text, e, given that an image/label, c, is relevant, under an associated image/label statistical model M, is:
P(e|c, M)
There are many techniques which could be applied by the search engine to compute the required estimate, such as:
• naive Bayesian modelling
  • maximum entropy modelling
• statistical language modelling
The first two approaches are based on extracting a set of features and training a generative model (which in this case equates to extracting features from a text associated with an image/label and training an image/label statistical model on those features), while statistical language modelling attempts to model a sequential distribution over the terms in the user inputted text. To provide a working example, the first approach is discussed, but they are all applicable. A set of features is extracted from user inputted text, preferably by using any suitable feature extraction mechanism which is part of the search engine 100'. To generate a relevance estimate, these features are assumed to have been independently generated by an associated image/label statistical model. An estimate of the probability of a given feature being relevant to particular image/label is stored in the image/label statistical model. In particular, an image/label statistical model is trained on text associated with an image/label by extracting features from the text associated with the image/label and analysing the frequency of these features in that text. There are various methods used in the art for the generation of these features from text. For example:
• 'Bag-of-words' term presence/absence: The features are the set of unique words used in the text.
• Unigram: The features are simply the words of the text. This model results in words which appear multiple times being given proportionally greater weight.
• Term combination: Features may include combinations of terms, either contiguous n-grams or representing non-local sentential relations.
  • Syntactic: Features may include syntactic information such as part-of-speech tags, or higher level parse tree elements.
  • Latent topics/clusters: Features may be sets/clusters of terms that may represent underlying "topics" or themes within the text.
The preferred features are typically individual terms or short phrases (n-grams). Individual term features are extracted from a text sequence by tokenising the sequence into terms (where a term denotes both words and additional orthographic items such as morphemes and/or punctuation) and discarding unwanted terms (e.g. terms that have no semantic value such as 'stopwords'). In some cases, features may also be case-normalised, i.e. converted to lowercase. N-gram features are generated by concatenating adjacent terms into atomic entities. For example, given the text sequence "Dear special friends", the individual term features would be: "Dear", "special" and "friends", while the bigram (2-gram) features would be "Dear_special" and "special_friends".
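A minimal sketch of this kind of feature extraction, assuming a simple tokeniser and stopword list that are not the application's own, is:

```python
# Illustrative term and n-gram feature extraction.
import re

STOPWORDS = {"the", "a", "an", "and", "of", "to"}   # toy stopword list

def extract_features(text, use_bigrams=True):
    terms = [t.lower() for t in re.findall(r"[\w']+", text)]   # tokenise and case-normalise
    terms = [t for t in terms if t not in STOPWORDS]           # discard unwanted terms
    features = list(terms)                                     # unigram features
    if use_bigrams:
        features += [f"{a}_{b}" for a, b in zip(terms, terms[1:])]   # bigram features
    return features

print(extract_features("Dear special friends"))
# ['dear', 'special', 'friends', 'dear_special', 'special_friends']
```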
It is preferable for the feature generation mechanism of the search engine 100' to weight features extracted from the user inputted text 12' in order to exaggerate the importance of those which are known to have a greater chance a priori of carrying useful information. For instance, for term features, this is normally done using some kind of heuristic technique which encapsulates the scarcity of the words in common English (such as the term frequency-inverse document frequency, TFiDF), since unusual words are more likely to be indicative of the relevant statistical models than common words. TFiDF is defined as:
[Equation: the TFiDF weight of a term t, expressed in terms of tf(t) and df(t) as defined below]
where tf(t) is the number of times term t occurs in the user inputted text, and df(t) is the number of image/label statistical models in which t occurs across all image/label statistical models. The D features of the user inputted text 12' can be represented by a real valued D-dimensional vector. Normalization can then be achieved by the search engine 100' by converting each of the vectors to unit length. It may be preferable to normalise the feature vector because a detrimental consequence of the independence assumption on features is that user inputted text samples of different length are described by a different number of events, which can lead to spurious discrepancies in the range of values returned by different system queries.
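The following sketch shows one possible TFiDF weighting and unit-length normalisation step, assuming a standard tf(t)·log(N/df(t)) form of the weight (an assumption made for the example; the text above only defines tf(t) and df(t)):

```python
# Illustrative TFiDF weighting of extracted features, followed by unit-length normalisation.
import math
from collections import Counter

def tfidf_vector(features, df, n_models):
    tf = Counter(features)                                   # term frequencies in the user text
    weights = {t: tf[t] * math.log(n_models / df.get(t, 1)) for t in tf}
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {t: w / norm for t, w in weights.items()}         # unit-length weight vector

df = {"pizza": 3, "tonight": 40}                             # toy document frequencies
print(tfidf_vector(["pizza", "tonight", "pizza"], df, n_models=100))
```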
The probability, P(e|c,M), of observing the user inputted text, e, given that an image/label, c, is relevant, under an associated image/label statistical model M, is computed as a product over independent features, fᵢ, extracted from the text input by a user, e:
P(e|c, M) = ∏ᵢ P(fᵢ | c, M)
The search engine 100' is configured to query the image/label database 70 with each feature fᵢ. The database returns a list of all the image/label statistical models comprising that feature and the probability estimate associated with that feature for each image/label statistical model. The probability, P(e|c,M), of observing the user inputted text, e, given that an image/label, c, is relevant, under an image/label statistical model, M, is computed as a product of the probability estimates for all of the features fᵢ of the user inputted evidence e, over all of the image/label statistical models M that comprise those features fᵢ.
This expression is rewritten, taking gᵢ to be each unique feature which has occurred a given number of times, nᵢ, in the user inputted text e, 12' (such that the features fᵢ comprise each unique feature gᵢ repeated nᵢ times):
P(e|c, M) = ∏ᵢ P(gᵢ | c, M)^nᵢ
Assuming the search engine 100' includes the TFiDF weighting, nᵢ can be replaced with its corresponding weight, wᵢ. The weight vector w is a vector containing the TFiDF scores for all features extracted from the user inputted text. The weight vector is preferably normalized to have unit length:
P(e|c, M) = ∏ᵢ P(gᵢ | c, M)^wᵢ
And converting to logs:
log(P(e|c, M)) = Σᵢ wᵢ · log(P(gᵢ | c, M))
log(P(e|c, M)) can be rewritten as the dot product of two vectors, one representing the weights and the other representing the log probabilities:
log(P(e|c, M)) = w · v
where v is the vector whose components are the log probabilities log(P(gᵢ | c, M)). In order to compute the above, an estimate of the image/label dependent feature likelihood, P(gᵢ|c, M), is needed. The search engine 100' takes this estimate from the image/label statistical model which has been trained by analysing the frequency of features in the source text. Under this approach, however, if the probability estimate for any feature of the user inputted text is zero (because, for example, the term is not present in the language model), the final probability P(e|c, M) would be zero. If the training corpus is sparse, it is unlikely that every feature in the user inputted text will have been observed in the training corpus for the image/label statistical model. Hence some form of smoothing can be used to reallocate some of the probability mass of observed features to unobserved features. There are many widely accepted techniques for smoothing the frequency-based probabilities, e.g. Laplace smoothing.
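As an illustration only, a toy scoring routine for a single image/label statistical model, using the dot-product formulation above with Laplace (add-one) smoothing, could look as follows; the model contents, vocabulary size and weights are assumptions:

```python
# Score one image/label statistical model as log P(e|c,M) = w . v, with add-one smoothing.
import math

def score(model_counts, total_count, vocab_size, weights):
    """weights: normalised TFiDF weights of the user text;
    model_counts: feature counts in the text associated with one image/label."""
    log_p = 0.0
    for feature, w in weights.items():
        count = model_counts.get(feature, 0)
        p = (count + 1) / (total_count + vocab_size)        # Laplace smoothing
        log_p += w * math.log(p)
    return log_p

models = {
    "🍕": ({"pizza": 30, "hungry": 10}, 40),
    "🍺": ({"beer": 25, "pub": 15}, 40),
}
weights = {"pizza": 0.9, "tonight": 0.44}
best = max(models, key=lambda c: score(models[c][0], models[c][1], vocab_size=1000, weights=weights))
print(best)   # '🍕' for this toy input
```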
The search engine 100' can therefore determine which image/label 50 is the most relevant given the user inputted text by querying each image/label statistical model of the image/label database 70 with features fᵢ extracted from the user inputted text, to determine which image/label statistical model provides the greatest probability estimate (since the image/label statistical models are mapped to corresponding images/labels).
As mentioned previously, the search engine 100' can take into account additional types of evidence, e.g. evidence that relates specifically to a given user, e.g. previously generated language, previously entered images/labels, or social context / demographic (e.g. since the type of emoji that is popularly used may vary with nationality/culture/age).
Furthermore, the search engine may take into account a prior probability of image/label relevance, e.g. a measure of the likelihood that an image/label will be relevant in the absence of any specific evidence related to an individual user or circumstance. This prior probability can be modelled using an aggregate analysis of general usage patterns across all images/labels. There are many further information sources that can be taken into account, for instance recency (how recently the image/label was inputted by a user) could be important, particularly in the case where an up-to-date image/label is particularly relevant, or if the image/label is used in a twitter feed followed by a large number of followers.
If multiple evidence sources 12', 12" are taken into account, the search engine 100' generates an estimate for each image/label given each evidence source. For each image/label, the search engine is configured to combine the estimates for the evidence sources to generate an overall estimate for that image/label. To do this, the search engine 100' may be configured to treat each of the evidence sources as independent, i.e. a user's image/label input history as independent from the text input. To compute the probability, P(E|c,Mc), of seeing the evidence, E, given a particular image/label, c, the evidence E is assumed to be separated into non-overlapping, mutually independent sets, [e₁, ..., eₙ], that are independently generated from some distribution, conditioned on a target image/label c and an associated model Mc. This independence assumption can be written as:
P(E|c, Mc) = ∏ᵢ P(eᵢ | c, Mc)
The probability P(E|c,Mc) is therefore calculated by the search engine 100' as a product of the probability estimates for the independent evidence sources eᵢ. The search engine 100' is therefore configured to calculate the individual evidence estimates separately.
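A minimal sketch of this combination step, with placeholder scoring functions standing in for the per-source models and toy values throughout, is:

```python
# Under the independence assumption the per-source estimates multiply, i.e. their logs add.
import math

def text_log_prob(label, text_features):      # e.g. the search-engine score sketched above
    toy = {"🍕": -6.2, "🍺": -9.3}
    return toy[label]

def history_log_prob(label, user_history):    # e.g. a model of the user's image/label input history
    toy = {"🍕": math.log(0.2), "🍺": math.log(0.05)}
    return toy[label]

def combined(label, text_features, user_history):
    return text_log_prob(label, text_features) + history_log_prob(label, user_history)

labels = ["🍕", "🍺"]
print(max(labels, key=lambda c: combined(c, None, None)))   # most likely image/label overall
```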
There is a statistical model for each image/label , M, associated with each evidence source, and the relative impact of individual evidence sources can be controlled by the search engine 100' by a per-distribution smoothing hyper-parameter which allows the system to specify a bound on the amount of information yielded by each source. This can be interpreted as a confidence in each evidence source. An aggressive smoothing factor on an evidence source (with the limiting case being the uniform distribution, in which case the evidence source is essentially ignored) relative to other evidence sources will reduce the differences between probability estimates for an evidence source conditioned on different pieces of images/labels. The distribution becomes flatter as the smoothing increases, and the overall impact of the source on the probability, P(E|c, Mc), diminishes.
As described above, in one example, the statistical model may be a language model, such that there is a plurality of language models associated with the plurality of images/labels, where those language models comprise n-gram word sequences. In such an embodiment, the language models may be used to generate word predictions on the basis of the user inputted text (e.g. by comparing the sequence of words of the user inputted text to a stored sequence of words, to predict the next word on the basis of the stored sequence). The system is therefore able to generate a word prediction via the individual language models as well as an image/label prediction via the search engine. Alternatively, the system may comprise one or more language models (e.g. word-based language model, morpheme-based language model etc.), in addition to the statistical models of the search engine, to generate text predictions.
To increase processing speed, the search engine 100' may be configured to discard all features fᵢ which have a TFiDF value lower than a certain threshold. Features with a low TFiDF weighting will, in general, have a minimal impact on the overall probability estimates. Furthermore, low TFiDF terms ('stop words') also tend to have a reasonably uniform distribution of occurrence across content corpora, meaning their impact on the probability estimates will also be reasonably uniform across classes. By reducing the number of features the search engine 100' uses to query the image/label database 70 with, the processing speed is increased.
Alternatively, or in addition, the search engine can be configured to retrieve the top k images/labels. The top-k image/label retrieval acts as a first pass to reduce the number of candidate images/labels, which can then be ranked using a more computationally expensive procedure. For each feature of the user inputted text, f, with TFiDF t (normalised to be in the range [0, 1]), the search engine is configured to find the k·t images/labels which have the highest probabilistic association with f, where this set of images/labels is denoted Cf. The search engine can then determine the union across all features, C = ∪f∈F Cf, to obtain a set of candidate images/labels which is bounded above by |F|·k in size. The search engine then 'scores' the evidence with respect to this limited set of candidate images/labels. Since k is likely to be small compared to the original number of images/labels, this provides a significant performance improvement. Any other suitable solution for retrieving the top k images/labels can be employed, for example by using Apache Lucene (http://lucene.apache.org/) or by using a k-nearest neighbour approach (http://en.wikipedia.org/wiki/Nearest_neighbor_search#k-nearest_neighbor), etc. The value for k will depend on device capabilities versus accuracy requirements and computational complexity (for example, the number of features, etc.).
The third solution to reduce the burden of image/label input uses a classifier to generate relevant image/label predictions on the basis of user entered text. Fig. 6 illustrates a system in accordance with a third embodiment of the invention that comprises a classifier 100" to generate image/label predictions 50 that are relevant to user inputted text 12'. A classifier 100" for generating text predictions has been described in detail in WO 2011/042710, which is hereby incorporated by reference in its entirety. In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. The classifier 100" is the feature that implements classification, mapping input data to a category. In the present invention, the classifier 100" is configured to map user inputted text to images/labels. The classifier 100" is trained on text data that has been pre-labelled with images/labels, and makes real-time image/label predictions 50 for sections of text 12' entered into the system by a user.
A plurality of text sources 80 are used to train the classifier 100". Each of the plurality of text sources 80 comprises all of the sections of text associated with a particular image/label as found in the source data. For an unsupervised approach to generating the text sources, any text of a sentence comprising a particular image/label may be taken to be text associated with that image/label or any text which precedes the image/label may be taken to be associated text, for example a twitter feed and its associated hashtag or a sentence and its associated emoji.
Thus, each text source of the plurality of text sources 80 is mapped to or associated with a particular image/label.
User inputted text 12' is input into a Feature Vector Generator 90 of the system. The Feature Vector Generator 90 is configured to convert the user inputted text 12' into a feature vector ready for classification. The Feature Vector Generator 90 is as described above for the search engine system. The Feature Vector Generator 90 is also used to generate the feature vectors used to train the classifier (from the plurality of text sources) via a classifier trainer 95. The value D of the vector space is governed by the total number of features used in the model, typically upwards of 10,000 for a real-world classification problem. The Feature Vector Generator 90 is configured to convert a discrete section of text into a vector by weighting each cell according to a value related to the frequency of occurrence of that term in the given text section, normalised by the inverse of its frequency of occurrence (TFiDF) across the entire body of text, where tf(t) is the number of times term t occurs in the current source text, and df(t) is the number of source texts in which t occurs across the whole collection of text sources. Each vector is then normalised to unit length by the Feature Vector Generator 90.
The Feature Vector Generator 90 is configured to split user inputted text 12' into features (typically individual words or short phrases) and to generate a feature vector from the features. The feature vectors are D-dimensional real-valued vectors, ℝ^D, where each dimension represents a particular feature used to represent the text. The feature vector is passed to the classifier 100" (which uses the feature vector to generate image/label predictions). The classifier 100" is trained by a training module 95 using the feature vectors generated by the Feature Vector Generator 90 from the text sources 80. A trained classifier 100" takes as input a feature vector that has been generated from text input by a user 12', and yields image/label predictions 50, comprising a set of image/label predictions mapped to probability values, as an output. The image/label predictions 50 are drawn from the space of image/label predictions associated with/mapped to the plurality of text sources.
In a preferred embodiment, the classifier 100" is a linear classifier (which makes a classification decision based on the value of a linear combination of the features) or a classifier based on the batch perceptron principle where, during training, a weights vector is updated in the direction of all misclassified instances simultaneously, although any suitable classifier may be utilised. In one embodiment, a timed aggregate perceptron (TAP) classifier is used. The TAP classifier is natively a binary (2-class) classification model. To handle multi-class problems, i.e. multiple images/labels, a one-versus-all scheme is utilised, in which the TAP classifier is trained for each image/label against all other images/labels. The training of a classifier is described in more detail on line 26 of page 10 to line 8 of page 12 in WO 2011/042710, which is hereby incorporated by reference.
A classifier training module 95 carries out the training process as already mentioned. The training module 95 yields a weights vector for each class, i.e. a weights vector for each image/label.
Given a set of N sample vectors of dimensionality D, paired with target labels, the classifier training procedure returns an optimized weights vector, w ∈ ℝ^D. The prediction, f(x), of whether an image/label is relevant for a new user inputted text sample, x ∈ ℝ^D, can be determined by:
f(x) = sign(w · x)    (1)
where the sign function converts an arbitrary real number to +/-1 based on its sign. The default decision boundary lies along the unbiased hyperplane w · x = 0, although a threshold can be introduced to adjust the bias. A modified form of the classification expression (1) is used without the sign function to yield a confidence value for each image/label, resulting in an M-dimensional vector of confidence values, where M is the number of images/labels. So, for instance, given a new, unseen user inputted text section represented by vector sample x ∈ ℝ^D, the following confidence vector c ∈ ℝ^M would be generated (where M=3 for simplicity):
c = (w₁ · x, w₂ · x, w₃ · x), where wⱼ is the weights vector for the j-th image/label
Assuming a flat probability over all images/labels, the image/label confidence values generated by the classifier 100" are used to generate a set of image/label predictions (where the dot product with the highest value (greatest confidence) is matched to the most likely image/label).
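For illustration, a toy one-versus-all scoring step of this kind, with assumed (untrained) weights vectors and a toy feature vector rather than real TAP weights, could be written as:

```python
# One weights vector per image/label; confidence per image/label is the dot product w . x.
import numpy as np

weights = {                      # one weights vector per image/label (D = 4 here)
    "😀": np.array([0.9, -0.1, 0.0, 0.2]),
    "☹": np.array([-0.8, 0.7, 0.1, 0.0]),
    "🍕": np.array([0.0, 0.1, 0.9, -0.2]),
}

x = np.array([0.1, 0.05, 0.95, 0.0])   # feature vector of the user inputted text
x = x / np.linalg.norm(x)              # normalised to unit length, as described above

confidence = {label: float(w @ x) for label, w in weights.items()}
prediction = max(confidence, key=confidence.get)   # highest dot product -> most likely image/label
print(confidence, prediction)
```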
If the images/labels are provided with a prior probability, e.g. a measure of the likelihood that an image/label will be relevant in the absence of any specific evidence related to an individual user or circumstance, or a prior probability based on the user's image/label input history, etc., then the system may further comprise a weighting module. The weighting module (not shown) may use the vector of confidence values generated by the classifier to weight the prior probabilities for each image/label to provide a weighted set of image/label predictions 50.
The weighting module may be configured to respect the absolute probabilities assigned to a set of image/label predictions, so as not to spuriously skew future comparisons. Thus, the weighting module can be configured to leave image/label predictions from the most likely prediction component unchanged, and to down-scale the probability from less likely images/labels proportionally.
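A minimal sketch of such a weighting step is shown below; the use of a softmax-style scaling of the confidence values is an assumption made for the example, not the application's own formulation:

```python
# Scale the prior probability of each image/label by its classifier confidence,
# leaving the highest-confidence image/label's prior unchanged (scale 1.0).
import numpy as np

def weighted_predictions(priors, confidence):
    labels = list(priors)
    conf = np.array([confidence[l] for l in labels])
    scale = np.exp(conf - conf.max())          # max element keeps scale 1.0, others are down-scaled
    weighted = {l: priors[l] * s for l, s in zip(labels, scale)}
    return sorted(weighted.items(), key=lambda kv: kv[1], reverse=True)

priors = {"😀": 0.05, "☹": 0.02, "🍕": 0.01}    # e.g. from aggregate usage patterns (toy values)
confidence = {"😀": 0.1, "☹": -0.3, "🍕": 0.8}
print(weighted_predictions(priors, confidence))
```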
The image/label predictions 50 output by the classifier 100" (or weighting module) can be displayed on a user interface for user selection. As will be understood from above, the classifier 100" is required to generate the dot product of the input vector with each image/label vector to generate image/label predictions 50. Thus, the greater the number of images/labels, the greater the number of dot products the classifier is required to calculate.
To reduce the number of classes, the images/labels may be grouped together, e.g. all emojis relating to a particular emotion (such as happiness) can be grouped into one class, or all emojis relating to a particular topic or subject, such as clothing etc. In that instance, the classifier would predict the class, for example an emotion (sad, happy, etc.) and the n most likely emoji predictions of that class can be displayed to the user for user selection. However, this does result in the user having to select from a larger panel of emojis. To reduce processing power, whilst still predicting the most relevant emoji, the coarser grade classes could be used to find the right category of emoji, with the finer emoji prediction occurring only for that coarser category, thus reducing the number of dot products the classifier is required to take.
Alternatively, a first set of features can be extracted from the user inputted text to generate an initial set of image/label predictions, and a second set of features can be extracted from the user inputted text to determine the one or more most-likely image/label predictions from that initial set of image/label predictions. To save on processing power, the first set of features may be smaller in number than the second set of features.
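A rough sketch of the coarse-to-fine approach described above, with hypothetical categories and placeholder scoring functions, is:

```python
# First classify the broad emoji category, then rank only the emojis inside that category.
CATEGORIES = {
    "happy": ["😀", "😄", "😊"],
    "food":  ["🍕", "🍔", "🍺"],
}

def category_score(category, features):      # coarse pass: few dot products
    toy = {"happy": 0.2, "food": 0.9}
    return toy[category]

def emoji_score(emoji, features):            # fine pass: only within the chosen category
    toy = {"🍕": 0.8, "🍔": 0.3, "🍺": 0.5, "😀": 0.4, "😄": 0.2, "😊": 0.1}
    return toy[emoji]

def predict(features, n=3):
    best_cat = max(CATEGORIES, key=lambda c: category_score(c, features))
    ranked = sorted(CATEGORIES[best_cat], key=lambda e: emoji_score(e, features), reverse=True)
    return ranked[:n]

print(predict(features=None))   # ['🍕', '🍺', '🍔'] for these toy scores
```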
If the system is to deal with a large volume of images/labels then the use of a search engine 100' may become more desirable than the classifier 100", because the search engine calculates the probabilities associated with the images/labels by a different mechanism which is able to cope better with determining probability estimates for a large volume of images/labels.
The systems of the present invention can be employed in a broad range of electronic devices. By way of non-limiting example, the present system can be used for messaging, texting, emailing, tweeting etc. on mobile phones, PDA devices, tablets, or computers.
The present invention is also directed to a user interface for an electronic device, wherein the user interface displays the predicted image/label 50 for user selection and input. The image/label prediction 50 can be generated by any of the systems discussed above. As described in more detail below, the user interface preferably displays one or more word/term predictions 60 for user selection, in addition to the display of one or more image/label predictions 50.
A user interface in accordance with embodiments of the invention will now be described with reference to Figs. 7-11. Figures 7-11 illustrate, by way of example only, the display of an emoji on a user interface for user selection and input. However, the invention is not limited to the display and input of an emoji, and is applicable to any image/label prediction 50.
In a first embodiment of a user interface, as illustrated in Fig. 7, the user interface comprises one or more candidate prediction buttons (in this example, three candidate prediction buttons) displaying one or more (in this example three) most likely user text predictions (i.e. 'The', 'I', 'What', in this example). The user interface 150 also comprises a virtual button 155 for displaying the current most relevant image/label prediction 50 (in a preferred embodiment, an emoji, and in the particular example illustrated a beer emoji). Processing circuitry of the device is configured such that a first user input, for example a tap on a touchscreen device, directed at the virtual button 155 displaying the emoji, inputs the displayed emoji into the device; and a second user input (different from the first user input), for example a long-press or directional swipe directed at the button 155, opens a menu to other actions, e.g. next most relevant emoji predictions, all emoji, carriage return, etc.
In a second embodiment of a user interface 150 illustrated in Fig. 8, an image (e.g. emoji) prediction 50 mapped to a word prediction 60 (e.g. through the word->emoji correspondence map of fig. 2a) will be presented as a prediction 160 on a prediction pane, alongside the matching word prediction 161. The candidate prediction buttons therefore display the two most relevant word predictions (for the example of a user interface with three candidate buttons) and the image (e.g. emoji) most appropriate for the most relevant word prediction. Alternatively, the image/label prediction presented as a prediction 160 on the prediction pane is the most-likely image/label prediction (as determined by any of the above described systems) and does not therefore need to correspond to a word prediction of the prediction pane. For consistency of layout, the image (e.g. emoji) prediction 160 may always be displayed on the right-hand side of the prediction pane, making the image easy to locate. Alternative image (e.g. emoji) predictions 60 may be made available by long-pressing the image (e.g. emoji) prediction button 160. The emoji button 155 reflects this prediction and also presents emoji related to recently typed words. A first gesture (e.g. a tap) on the image (for the illustrated example, emoji) button 155 will insert the emoji displayed by the button, and a second gesture on the button (e.g. a longpress or swipe) will display the emojis related to recently typed words for user selection.
In a third embodiment of a user interface 150 illustrated in Fig. 9, an image/label (e.g. for the illustrated example, an emoji) candidate prediction button 165 which displays the current most likely image (e.g. emoji) permanently appears on the prediction pane. When there are emoji associated with either the current word candidates (in this example 'food', 'and', 'is') or words that have been recently typed (e.g. 'cat'), one is presented on this candidate button 165. The emoji displayed on the button 165 can be inserted via a first gesture on the button 165 or button 155 (e.g. a tap), with the alternative emojis available via a second gesture on the button 155 or button 165 (e.g. by a long-press or swipe).
In a preferred embodiment, the image/label panel (e.g. emoji panel) displaying alternative relevant images (e.g. emojis) can be accessed by long-pressing the image/label candidate prediction button 165. To access all emoji (rather than just those offered as the most likely emoji), the user long-presses the emoji candidate prediction button 165, slides their finger towards the emoji panel icon and releases. The emoji panel icon will be on the far left side of the pop-up to allow a 'blind directional swipe' to access it. The rest of the pop-up is filled with extended emoji predictions.
In an alternative user interface, as illustrated in Fig. 10, the image/label (e.g. emoji) can be displayed with its matching word on a candidate button 170 of the prediction pane. The word can be inserted by a first user gesture on the candidate button 170 (e.g. by tapping the button 170), with the image/label (e.g. emoji) inserted via a second user gesture on the candidate button 170 (for example, by a long press of the button 170). Furthermore, if desired, a standard emoji key 155 can be provided as with the previous user interface embodiments to allow the user to insert a predicted emoji (which may not necessarily match the predicted word) or allow the user to search for alternative emojis. Fig. 11 illustrates how an image (e.g. emoji) can be displayed and inserted with a continuous touch input, for example as described in detail in earlier application WO2013/107998, which is hereby incorporated by reference in its entirety, and as illustrated in Fig. 1 of WO2013/107998. In the user interface of Fig. 11, the prediction pane comprises a word prediction button 175 'heart' and an emoji prediction button 165 which displays a relevant emoji, e.g. [heart emoji]. To insert the text prediction 'heart', the user moves over to the word prediction pane and removes their finger from contact with the user interface at a location on the word prediction button 175. Alternatively, the word prediction is inserted whenever the user lifts their finger from the user interface, unless their finger is lifted at the emoji button. For example, the processing circuitry can be configured to insert the word if the user lifts their finger from the user interface whilst on the last character of the word, or even mid-word when the prediction engine has predicted and displayed that word for user selection and input. To insert the predicted emoji, the user breaks contact with the touchscreen interface at the emoji candidate button 165. Furthermore, the processing circuitry for the user interface may be configured such that the user ending the continuous touch gesture on the emoji button 165 and remaining on the emoji button 165 for a particular length of time brings up a pop-up panel 200 of alternative emojis for user selection.
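The handling of the end of such a continuous gesture can be illustrated with the following sketch. The region names, the 0.5 s dwell threshold and the return values are assumptions made for illustration rather than details taken from the embodiment.

```python
# Sketch of resolving the end of a continuous (flow) gesture: lifting the
# finger on the word button inserts the word, lifting on the emoji button
# inserts the emoji, and dwelling on the emoji button opens a pop-up panel.
DWELL_SECONDS = 0.5


def resolve_gesture_end(lift_region, dwell_time, predicted_word,
                        predicted_emoji, alternative_emojis):
    if lift_region == "emoji_button":
        if dwell_time >= DWELL_SECONDS:
            # Remaining on the button brings up alternatives for selection.
            return ("show_popup", alternative_emojis)
        return ("insert", predicted_emoji)
    # Lifting anywhere else (e.g. on the word button or mid-word) inserts
    # the currently predicted word.
    return ("insert", predicted_word)


print(resolve_gesture_end("word_button", 0.1, "heart", "❤️", ["💙", "💜"]))
print(resolve_gesture_end("emoji_button", 0.8, "heart", "❤️", ["💙", "💜"]))
```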
The user interface has been described as comprising various 'buttons'. The term 'button' is used to describe an area on a user interface where an image/label/word is displayed, and where the displayed image/label/word can be input by the user by activating the 'button', e.g. by gesturing on or over the area which displays the image/label/word.
By the described user interface, the user is able to insert relevant images/labels (including emojis) with minimal effort. Methods of the present invention will now be described with reference to Figs. 12-16 which are schematic flow charts of methods according to the invention.
Referring to Fig. 12, the present invention provides a method of generating a prediction means to predict an image/label relevant to user inputted text. As discussed above in relation to the various systems of the present invention, the method comprises receiving text having one or more images/labels embedded within sections of text 400, identifying an image/label embedded within the text 410, and associating the identified image/label with sections of the text 420. The prediction means is then trained on the sections of text associated with the image/label. As described above, when the prediction means is a language model 10, the language model 10 is trained on text comprising images/labels, for example by including an image/label in an n-gram word/image sequence or by appending the image/label to an n-gram word sequence. When the prediction means is a search engine 100' comprising a plurality of statistical models, each statistical model can be mapped to a given image/label and trained on text associated with that image/label. When the prediction means is a classifier 100" trained on a plurality of text sources, each text source comprises sections of text associated with a given image/label.

In a second method of the invention, as illustrated in Fig. 13, there is provided a method of predicting, using a prediction means, an image/label relevant to text input into a system by a user, wherein the prediction means is trained on sections of text associated with an image/label. The method comprises receiving at the prediction means the text input by a user 500, determining the relevance of the text input by a user to the sections of text associated with the image/label 510, and predicting on the basis of the sections of text associated with the image/label the relevance of the image/label to the text input by a user 520. As described above in relation to the system description, when the prediction means is the search engine 100', the search engine 100' determines the relevance of the user inputted text by extracting features from the user inputted text and querying an image/label database 70 with those features. By querying the database 70, the search engine 100' is able to determine which image/label statistical model is the most relevant and is therefore able to generate image/label predictions 50, because each statistical model is mapped to a particular image/label. Again, as described above with respect to the system, when the prediction means is a classifier 100", the classifier 100" is able to determine the relevance of an image/label to user inputted text by generating the dot product of a feature vector representing the image/label (generated from the source text which comprises sections of text associated with that image/label) with a feature vector representing the user inputted text.
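As an illustration only of the feature-vector route summarised above, the following sketch associates each emoji with the sections of text found alongside it in a small corpus, builds a TF-IDF weighted vector per emoji, and scores user-entered text against each vector by dot product. The toy corpus, the tokeniser and the function names are assumptions made purely for this sketch and do not reproduce the disclosed implementation.

```python
# Sketch: associate each emoji with its surrounding sections of text, build a
# TF-IDF vector per emoji, and score user input by dot product.
import math
import re
from collections import Counter, defaultdict

CORPUS = [  # (section of text, emoji found embedded in / appended to it)
    ("lets grab a cold beer after work", "🍺"),
    ("pub tonight a pint sounds great", "🍺"),
    ("I love you so much", "❤️"),
    ("sending you all my love", "❤️"),
]


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


# 1. Associate each emoji with all of its sections of text.
sections_per_emoji = defaultdict(list)
for text, emoji in CORPUS:
    sections_per_emoji[emoji].extend(tokenize(text))

# 2. Build one TF-IDF weighted feature vector per emoji.
doc_freq = Counter()
for tokens in sections_per_emoji.values():
    doc_freq.update(set(tokens))
n_docs = len(sections_per_emoji)


def tfidf_vector(tokens):
    tf = Counter(tokens)
    return {t: tf[t] * math.log((1 + n_docs) / (1 + doc_freq[t])) for t in tf}


emoji_vectors = {e: tfidf_vector(toks) for e, toks in sections_per_emoji.items()}


# 3. Score user input by the dot product with each emoji vector.
def predict_emoji(user_text):
    user_vec = tfidf_vector(tokenize(user_text))
    scores = {e: sum(user_vec.get(t, 0.0) * w for t, w in vec.items())
              for e, vec in emoji_vectors.items()}
    return max(scores, key=scores.get), scores


print(predict_emoji("fancy a beer tonight"))  # expected to favour 🍺
```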
In a third method of the invention, as illustrated in Fig. 14, there is provided a method to predict, using a prediction means, an image/label relevant to text input into a system by a user, wherein the prediction means is trained on text which comprises an image/label embedded within text, the prediction means having been trained by identifying the image/label within the text and associating the identified image/label with sections of the text. The method comprises receiving at the prediction means the text input by a user 600, comparing the text input by a user to the sections of text associated with the image/label 610, and predicting on the basis of the sections of text associated with the identified image/label the relevance of the image/label to the text input by a user 620. As described above in relation to the system description, when the prediction means is a language model 10, the language model may comprise an image/label within an n-gram word/image sequence of an n-gram map 14' or an image/label appended to an n-gram word sequence of an n-gram map 14'. The language model predicts a relevant image/label 50 by comparing the user inputted text to a stored n-gram sequence and outputting a relevant image/label which is part of the stored n-gram or is appended to the stored n-gram. Alternatively, the language model comprises a word-based n-gram map 14 and a word→image correspondence map 40 which is trained on sections of text (i.e. words) associated with the images. The language model is configured to predict the next word in a sequence of user inputted words by comparing the word sequence to a stored n-gram of the map 14 and then mapping this predicted word to an image using the correspondence map 40.
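The word-based n-gram map combined with a word→image correspondence map can be illustrated with the minimal sketch below. The n-gram counts and the correspondence map contents are invented purely for illustration and are not taken from the embodiment.

```python
# Sketch: a word-based n-gram map predicts the next word from the tail of the
# user's word sequence, and a word→emoji correspondence map turns that
# predicted word into an image prediction.
BIGRAM_COUNTS = {  # previous word -> counts of the word that follows it
    ("a",): {"beer": 7, "heart": 2, "pizza": 4},
    ("my",): {"heart": 9, "beer": 1},
}

WORD_TO_EMOJI = {"beer": "🍺", "heart": "❤️", "pizza": "🍕"}


def predict_next_word(context_words, n=1):
    """Compare the tail of the user's word sequence with stored n-grams."""
    key = tuple(context_words[-n:])
    candidates = BIGRAM_COUNTS.get(key, {})
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def predict_word_and_emoji(context_words):
    word = predict_next_word(context_words)
    # Map the predicted next word to an emoji via the correspondence map.
    return word, WORD_TO_EMOJI.get(word)


print(predict_word_and_emoji(["fancy", "a"]))          # ('beer', '🍺')
print(predict_word_and_emoji(["you", "have", "my"]))   # ('heart', '❤️')
```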
Fourth and fifth methods of the invention relate to a user's interaction with a touchscreen user interface of a device comprising one or more of the above-described systems for generating image/label predictions 50. In particular, the fourth method of the invention provides a method of entering data into an electronic device comprising a touchscreen user interface having a keyboard, wherein the user interface comprises a virtual image/label button configured to display the predicted image/label for user selection. The method comprises inputting a character sequence via a continuous gesture across the keyboard 700 and, in response to a user gesture across the image/label virtual button, inputting the image/label as data 720. The gesture may include breaking contact with the user interface at the image/label virtual button.

The fifth method relates to a method for selecting between entry of a word/term and entry of an image/label that corresponds to that word/term on a touchscreen user interface comprising a virtual button configured to display a predicted word/term and/or the predicted image/label. The method comprises, in response to receipt of a first gesture type on/across the button, inputting the predicted word/term 800; and, in response to receipt of a second gesture type on/across the button, inputting the predicted image/label 810.
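The selection between word entry and image entry on a single candidate button can be sketched as below; the gesture type names are assumptions made for illustration only.

```python
# Sketch of the word/emoji selection on a single combined candidate button:
# a first gesture type (here a tap) inserts the predicted word, a second
# gesture type (here a long press) inserts the corresponding emoji.
def handle_candidate_gesture(gesture_type, predicted_word, corresponding_emoji):
    if gesture_type == "tap":
        return predicted_word
    if gesture_type == "long_press":
        return corresponding_emoji
    raise ValueError(f"unrecognised gesture: {gesture_type}")


print(handle_candidate_gesture("tap", "beer", "🍺"))         # beer
print(handle_candidate_gesture("long_press", "beer", "🍺"))  # 🍺
```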
As will be apparent from the above description, the present invention solves the above-mentioned problems by providing a system and method for predicting emojis/stickers based on user-entered text. The present invention is able to increase the speed of emoji input by offering one or several relevant emoji predictions, which saves the user from having to scroll through different emojis to identify the one they want.
Furthermore, the system and method of the present invention provides increased emoji discoverability, as the prediction of emojis based on next-word prediction/correction and context means that an emoji may be predicted and presented to a user, even though the user may not be aware that a relevant or appropriate emoji exists. The systems and methods of the present invention therefore provide efficient emoji selection and input into an electronic device. Rather than having to scroll through possible emojis, the user can insert a relevant emoji by the tap of a virtual key displaying a predicted emoji. Although the examples have been provided with reference to emojis, the invention is equally applicable to the insertion of any image/label relevant to user entered text, as previously described.
The present invention also provides a computer program product comprising a computer readable medium having stored thereon computer program means for causing a processor to carry out one or more of the methods according to the present invention.
The computer program product may be a data carrier having stored thereon computer program means for causing a processor external to the data carrier, i.e. a processor of an electronic device, to carry out the method according to the present invention. The computer program product may also be available for download, for example from a data carrier or from a supplier over the internet or other available network, e.g. downloaded as an app onto a mobile device (such as a mobile phone) or downloaded onto a computer, the mobile device or computer comprising a processor for executing the computer program means once downloaded.
It will be appreciated that this description is by way of example only; alterations and modifications may be made to the described embodiment without departing from the scope of the invention as defined in the claims.

Claims
1. A system configured to predict an image/label relevant to text input by a user, the system comprising:
a means for receiving text input by a user;
a prediction means trained on sections of text associated with an image/label, wherein the prediction means is configured to:
receive the text input by the user;
determine the relevance of the text input by the user to the sections of text associated with the image/label; and
predict on the basis of the sections of text associated with the image/label the relevance of the image/label to the text input by the user.
2. A system configured to predict an image/label relevant to text input by a user, the system comprising:
a means for receiving text input by a user;
a prediction means trained on text which comprises an image/label embedded within text, wherein the prediction means has been trained by identifying the image/label within the text and associating the identified image/label with sections of the text;
wherein the prediction means is configured to:
receive the text input by the user;
compare the text input by the user to the sections of text associated with the image/label; and
predict on the basis of the sections of text associated with the identified image/label the relevance of the image/label to the text input by the user.
3. The system of claim 2, wherein the prediction means is a language model trained on text comprising a plurality of images/labels embedded within the text.
4. The system of claim 1, wherein the prediction means is a search engine comprising a plurality of statistical models corresponding to a plurality of images/labels, each of the plurality of statistical models being trained on sections of text associated with a corresponding image/label for that statistical model.
5. The system of claim 1, wherein the prediction means is a classifier which has been trained on a plurality of text sources, each text source comprising sections of text associated with a particular image/label.
6. The system of claim 1, 4 or 5, wherein the system comprises a feature generation mechanism which is configured to extract a set of features from the text input by the user.
7. The system of claim 6, wherein the feature generation mechanism is configured to generate a feature vector from the features, the feature vector representing the text input by the user.
8. The system of claim 7, wherein the feature generation mechanism is configured to weight the features by their term frequency-inverse document frequency.
9. The system of claim 7 or 8 when dependent on claim 5, wherein the feature generation mechanism is configured to extract features from each of the plurality of text sources and generate a feature vector for each image/label.
10. The system of claim 9, wherein the classifier is trained on the feature vectors for the images/labels.
11. The system of claim 10, wherein the classifier generates the dot product of the feature vector representing the text input by the user with the feature vector representing the text associated with an image/label, to determine whether that image/label is relevant given the text input by the user.
12. The system of claim 7 or 8 when dependent on claim 4, wherein the feature generation mechanism is configured to extract features from the sections of text associated with an image/label and train the corresponding statistical model on the extracted features.
13. The system of claim 12, wherein the search engine is configured to query each statistical model with each feature of the text input by the user to determine the presence of the feature and its frequency.
14. The system of claim 4 or 5, wherein the system further comprises a model which comprises prior probabilities for the images/labels on the basis of prior use or prior input of the images/labels by the user.
15. The system of any preceding claim, wherein the identified image/label is associated with sections of the text which do not immediately precede the identified image/label.
16. The system of any preceding claim, wherein the text input does not correspond to a description of that image/label.
17. The system of claim 3 or claim 15 or 16 when dependent on claim 3, wherein the language model comprises an n-gram map comprising word sequences associated with an image/label.
18. The system of claim 17, wherein the prediction means comprises a prediction engine that comprises:
the language model trained on text comprising a plurality of images/labels embedded within the text; and
a means to generate from the text input by the user a sequence of one or more words; wherein the prediction engine is configured to:
receive the sequence of one or more words;
compare the sequence of one or more words to a stored sequence of one or more words associated with an image/label; and
predict, based on a stored sequence of one or more words associated with an image/label, an image/label relevant for the sequence of one or more words.
19. The system of claim 18, wherein the prediction engine is configured to predict based on the stored sequence of one or more words associated with an image/label, the next word in the sequence of one or more words.
20. The system of claim 2, wherein the prediction means comprises a word-based language model comprising stored sequences of words, a map which maps images to words, the map being trained on words associated with images/labels, and a means to generate from the text input by the user a sequence of one or more words; wherein the prediction means is configured to: compare the sequence of one or more words to a stored sequence of words to predict the next word in the sequence of one or more words; and
predict using the map an image that is associated with the next word.
21. The system of any preceding claim, wherein the prediction means further comprises: a word-based language model comprising stored sequences of words;
wherein the prediction means is configured to
generate a sequence of one or more words from the text input by the user;
compare the sequence of one or more words to a stored sequence of words in the word-based language model; and
predict based on a stored sequence of words the next word in the sequence.
22. The system of any preceding claim, wherein the image is an emoji, emoticon or sticker.
23. The system of any preceding claim, wherein the label is a hashtag.
24. The system of any preceding claim, wherein the prediction engine is configured to output the image/label if it is determined to be relevant to the text input by the user.
25. An electronic device comprising:
a system according to any preceding claim; and
a user interface configured to receive user input and display the predicted image/label.
26. The system of claim 25, wherein the user interface further comprises:
a virtual image/label button configured to display the predicted image/label for user selection.
27. The system of claim 26 when dependent on claim 19 or 21, wherein the user interface further comprises a word/term virtual button configured to display a predicted word/term for user selection.
28. The system of claim 26 or 27, wherein the user interface is configured to accept user text input as a continuous gesture across a keypad, and the interface is configured such that the predicted image/label is input in response to a gesture across or on the image/label virtual button.
29. The system of one of claims 25-27, further comprising processing circuitry configured to receive as inputs a first user input on the user interface and a second user input on the user interface, wherein the first user input is different in at least one respect from the second user input;
in response to the receipt of the first user input directed at the virtual button, display on the display the predicted image/label as user inputted data; and
in response to the receipt of the second user input directed at the virtual button, display on the display alternative image/label predictions for user selection.
30. The system of claim 25, wherein the user interface further comprises:
a virtual button configured to display a predicted word/term and/or the predicted image/label for user selection, wherein the word/term corresponds to the image/label.
31. The system of claim 30, wherein the processing circuitry is configured to distinguish between two types of gestures on/across the virtual button, wherein in response to receipt of a first gesture type on/across the button the predicted word/term is input, and in response to a second gesture type on/across the button the predicted image/label is input.
32. A method of generating a prediction means to predict an image/label relevant to text input by a user, the method comprising:
receiving text which comprises one or more images/labels embedded within sections of text;
identifying an image/label embedded within the text;
associating the identified image/label with sections of the text; and
training the prediction means on the sections of text associated with the image/label.
33. A method to predict using a prediction means an image/label relevant to text input into a system by a user, wherein the prediction means is trained on sections of text associated with an image/label, and the method comprises:
receiving at the prediction means the text input by the user;
determining the relevance of the text input by the user to the sections of text associated with the image/label; and
predicting on the basis of the sections of text associated with the image/label the relevance of the image/label to the text input by the user.
34. A method to predict using a prediction means an image/label relevant to text input into a system by a user, wherein the prediction means is trained on text which comprises an image/label embedded within text, wherein the prediction means has been trained by identifying the image/label within the text and associating the identified image/label with sections of the text, and the method comprises:
receiving at the prediction means the text input by the user;
comparing the text input by the user to the sections of text associated with the image/label; and
predicting on the basis of the sections of text associated with the identified image/label the relevance of the image/label to the text input by the user.
35. The method of claim 34, wherein the prediction means is a language model and the method comprises training the language model on text comprising a plurality of images/labels embedded within the text.
36. The method of claim 33 or 34, wherein the prediction means is a search engine comprising a plurality of statistical models corresponding to a plurality of images/labels, wherein the method comprises training each of the plurality of statistical models on sections of text associated with a corresponding image/label for that statistical model.
37. The method of claim 33, wherein the prediction means is a classifier, and the method comprises training the classifier on a plurality of text sources, each text source comprising sections of text associated with a particular image/label.
38. The method of claim 36 or 37, further comprising extracting, using a feature generation mechanism, a set of features from the text input by the user.
39. The method of claim 38, further comprising generating using the feature generation mechanism a feature vector from the features, the feature vector representing the text input by the user.
40. The method of claim 39, further comprising determining the term frequency-inverse document frequency for each feature and weighting the features of the feature vector with the determined values.
41. The method of claim 39 or 40 when dependent on claim 37, further comprising extracting using the feature generation mechanism features from each of the plurality of text sources and generating a feature vector for each image/label.
42. The method of claim 41, further comprising training the classifier on the feature vectors for the images/labels.
43. The method of claim 41, wherein the method comprises:
generating the dot product, using the classifier, of the feature vector representing the text input by the user with the feature vector representing the text associated with an image/label to determine whether that image/label is relevant given the text input by the user.
44. The method of claim 38 or 39 when dependent on claim 36, further comprising extracting using the feature generation mechanism features from the sections of text associated with an image/label and training the corresponding statistical model on the extracted features.
45. The method of claim 44, further comprising querying, using the search engine, each statistical model with each feature of the text input by the user to determine the presence of the feature and its frequency.
46. The method of any one of claims 33-45, wherein the identified image/label is associated with sections of the text which do not immediately precede the identified image/label.
47. The method of any one of claims 33-46, wherein the text input does not correspond to a description of that image/label.
48. The method of claim 35, or of claim 46 or 47 when dependent on claim 35, wherein the language model comprises an n-gram map comprising word sequences associated with an image/label and the method comprises:
generating from the text input by the user a sequence of one or more words;
receiving, at a prediction engine comprising the language model, the sequence of one or more words;
comparing the sequence of one or more words to a stored sequence of one or more words associated with an image/label; and predicting, based on a stored sequence of one or more words associated with an image/label, the image/label relevant for the sequence of one or more words.
49. The method of claim 48, wherein the method further comprises predicting with the prediction engine the next word in the sequence of one or more words based on the stored sequence of one or more words associated with an image/label.
50. The method of claim 33, wherein the prediction means comprises a word-based language model having stored sequences of words and a map which maps images to appropriate words, wherein the map has been trained on words associated with images, and the method further comprises:
generating from the input text a sequence of one or more words;
comparing the sequence of one or more words to a stored sequence of words and predicting the next word in the sequence of one or more words; and identifying, using the map, the image associated with the next word.
51. The method of any one of claims 33-49, wherein the prediction means further comprises a word-based language model comprising stored sequences of words; and the method comprises:
generating a sequence of one or more words from the text input by the user;
comparing the sequence of one or more words to a stored sequence of words in the word-based language model; and
predicting based on a stored sequence of words the next word in the sequence.
52. The method of any one of claims 32-51, wherein the image is an emoji, emoticon, or sticker.
53. The method of any one of claims 32-52, wherein the label is a hashtag.
54. A method of entering data into an electronic device comprising a touchscreen user interface having a keyboard, wherein the user interface comprises a virtual image/label button configured to display a predicted image/label for user selection, wherein the method comprises: inputting a character sequence via a continuous gesture across the keyboard, and in response to a gesture across the image/label virtual button, inputting the image/label as data.
55. A method for selecting between entry of a word/term and entry of an image/label that corresponds to that word/term on a touchscreen user interface comprising a virtual button configured to display a predicted word/term and/or the predicted image/label, the method comprising:
in response to receipt of a first gesture type on/across the button, inputting the predicted word/term; and
in response to a second gesture type on/across the button, inputting the predicted image/label.
56. A computer program for causing a processor to carry out the method of any of claims 32-55.
PCT/GB2014/053688 2012-12-27 2014-12-12 System and method for inputting images or labels into electronic devices WO2015087084A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020167018754A KR102345453B1 (en) 2013-12-12 2014-12-12 System and method for inputting images or labels into electronic devices
CN201480067660.XA CN105814519B (en) 2013-12-12 2014-12-12 System and method for inputting image or label to electronic equipment
EP14819056.4A EP3080682A1 (en) 2013-12-12 2014-12-12 System and method for inputting images or labels into electronic devices
US15/179,833 US10664657B2 (en) 2012-12-27 2016-06-10 System and method for inputting images or labels into electronic devices

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1322037.1 2013-12-12
GBGB1322037.1A GB201322037D0 (en) 2013-12-12 2013-12-12 System and method for inputting images/labels into electronic devices

Related Parent Applications (3)

Application Number Title Priority Date Filing Date
PCT/GB2013/053433 Continuation-In-Part WO2014102548A2 (en) 2012-12-27 2013-12-27 Search system and corresponding method
PCT/GB2013/053433 A-371-Of-International WO2014102548A2 (en) 2012-12-27 2013-12-27 Search system and corresponding method
US14/758,221 Continuation-In-Part US11200503B2 (en) 2012-12-27 2013-12-27 Search system and corresponding method

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US201514758221A Continuation 2012-12-27 2015-06-26
US15/179,833 Continuation US10664657B2 (en) 2012-12-27 2016-06-10 System and method for inputting images or labels into electronic devices

Publications (1)

Publication Number Publication Date
WO2015087084A1 true WO2015087084A1 (en) 2015-06-18

Family

ID=50030861

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2014/053688 WO2015087084A1 (en) 2012-12-27 2014-12-12 System and method for inputting images or labels into electronic devices

Country Status (5)

Country Link
EP (1) EP3080682A1 (en)
KR (1) KR102345453B1 (en)
CN (1) CN105814519B (en)
GB (1) GB201322037D0 (en)
WO (1) WO2015087084A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080244446A1 (en) * 2007-03-29 2008-10-02 Lefevre John Disambiguation of icons and other media in text-based applications
WO2010112841A1 (en) * 2009-03-30 2010-10-07 Touchtype Ltd System and method for inputting text into electronic devices
WO2011042710A1 (en) * 2009-10-09 2011-04-14 Touchtype Ltd System and method for inputting text into electronic devices
US20130159919A1 (en) * 2011-12-19 2013-06-20 Gabriel Leydon Systems and Methods for Identifying and Suggesting Emoticons
WO2013107998A1 (en) * 2012-01-16 2013-07-25 Touchtype Limited A system and method for inputting text

Also Published As

Publication number Publication date
GB201322037D0 (en) 2014-01-29
EP3080682A1 (en) 2016-10-19
KR20160097352A (en) 2016-08-17
KR102345453B1 (en) 2021-12-29
CN105814519B (en) 2020-02-14
CN105814519A (en) 2016-07-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 14819056; Country of ref document: EP; Kind code of ref document: A1)
REEP Request for entry into the european phase (Ref document number: 2014819056; Country of ref document: EP)
WWE Wipo information: entry into national phase (Ref document number: 2014819056; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 20167018754; Country of ref document: KR; Kind code of ref document: A)