KR20160097352A

KR20160097352A - System and method for inputting images or labels into electronic devices

Info

Publication number: KR20160097352A
Application number: KR1020167018754A
Authority: KR
Inventors: James Aley; Gareth Jones; Luke Hewitt
Original assignee: TouchType Limited
Priority date: 2013-12-12
Filing date: 2014-12-12
Publication date: 2016-08-17
Also published as: EP3080682A1; CN105814519B; CN105814519A; KR102345453B1; GB201322037D0; WO2015087084A1

Abstract

Systems and methods for inputting an image/label into an electronic device are disclosed. Systems and methods are provided for predicting an image/label relevant to text input by a user. In a first aspect, a system is provided that comprises a means of receiving text input by a user and a prediction means trained on sections of text associated with an image/label. The prediction means is configured to receive the text input by the user, determine the relevance of the text input by the user to the sections of text associated with the image/label, and predict, on the basis of the sections of text associated with the image/label, the relevance of the image/label to the text input by the user. The systems and methods of the present invention reduce the burden of inputting an image/label.

Description


The present invention relates to a system and method for inputting images/labels into an electronic device. In particular, the invention relates to a system and method for offering an image/label to be input into a device on the basis of text entered by a user.

In text creation and messaging environments, it has become popular for users to include images in word-based text. For example, it is common for users to enter text-based representations of images, known as emoticons, to express emotion, such as :-) or ;-p [typical in the West] or (^_^) [typical in Asia]. More recently, small character-sized images, called emojis, have become popular. Stickers have also become popular. A sticker is a detailed illustration of a character that represents an emotion or action, and is a mix of cartoons and emojis.

As of October 2010, the Unicode (6.0) Standard allocates 722 codepoints as descriptions of emojis (examples include U+1F60D: smiling face with heart-shaped eyes, and U+1F692: fire engine). It is typical for messaging services (e.g. Facebook, WhatsApp) to design their own set of images, which they use to render each of these Unicode characters so that they may be sent and received. Additionally, both Android (4.1+) and iOS (5+) provide representations of these characters natively as part of the default font.

Although it is popular to input emojis, it remains difficult to do so, because the user has to discover appropriate emojis and, even when they know the appropriate emoji, has to navigate through a great number of possible emojis to find the one they want to input.

Keyboards and messaging clients have tried to reduce the problem by including an emoji selection panel, in which emojis are organised into several categories that can be scrolled through. Although the emojis have been grouped into relevant categories, the user is still required to search through the emojis of that category in order to find the emoji they want to use. Furthermore, some emojis may not be easily classified, making it more difficult for the user to decide in which category they should search for that emoji.

There are known solutions which attempt to reduce further the burden of inputting emojis. For example, several messaging clients will automatically replace certain shorthand text with images. For example, Facebook Messenger will convert the emoticon :-) into a picture of a smiling face, and will convert the shorthand text sequence (y) into a picture of a thumbs up when the message is sent.

Additionally, the Google Android Jellybean keyboard will offer an emoji candidate when the user types, exactly, a word corresponding to a description of that emoji, e.g. when 'snowflake' is typed, the picture of a snowflake is offered to the user as a candidate input.

These known solutions for reducing the burden of emoji input still require the user to provide the shorthand text that identifies the emoji, or to type the exact description of the emoji. Although the known systems remove the need to scroll through screens of emojis, they still require the user to identify, explicitly and correctly, the emoji they wish to enter.

It is an object of the present invention to address the above-mentioned problem and to reduce the burden of image (e.g. emoji, emoticon or sticker) and label input in a messaging/text creation environment.

The present invention provides systems in accordance with independent claims 1 and 2, methods in accordance with independent claims 32, 33, 34, 54 and 55, and a program in accordance with independent claim 56.

Optional features of the invention are the subject of the dependent claims.

The present invention will now be described with reference to the accompanying drawings.
Figs. 1a and 1b show a system for generating image/label predictions in accordance with a first system type of the present invention.
Figs. 2a to 2c are schematics of alternative image/label language models according to the invention, for use in the systems of Figs. 1a and 1b.
Fig. 3 is a schematic of an n-gram map comprising text sections associated with images/labels (for this example, emojis), for use in the language models of Figs. 2b and 2c.
Fig. 4 is a schematic of an n-gram map comprising text sections associated with images/labels (for this example, emojis), in which the images/labels identified in the training text have been associated with sections of text that do not immediately precede the identified images/labels, for use in the image/label language models of Figs. 2b and 2c.
Fig. 5 shows a system for generating image/label predictions in accordance with a second system type of the present invention.
Fig. 6 shows a system for generating image/label predictions in accordance with a third system type of the present invention.
Figs. 7 to 11 illustrate different embodiments of a user interface in accordance with the present invention.
Figs. 12 to 16 show flowcharts according to the present invention.

The system of the present invention is configured to generate an image/label prediction relevant to user-inputted text. In general, the system of the invention comprises a prediction means trained on sections of text associated with an image/label. The prediction means is configured to receive the text input by a user and to predict the relevance of the image/label to the user-inputted text.

The image prediction may relate to any kind of image, including a photo, logo, drawing, icon, emoji or emoticon, sticker, or any other image which may be associated with a section of text. In a preferred embodiment of the present invention, the image is an emoji.

The label prediction may relate to any label associated with a body of text, where that label is used to identify or categorise the body of text. The label could therefore refer to the author of the text, a company/person generating sections of text, or any other relevant label. In a preferred embodiment of the present invention, the label is a hashtag, for example as used in Twitter feeds.

The present invention provides three alternative ways of generating image/label predictions to solve the problem of reducing the burden of image/label entry into electronic devices. In particular, the solutions comprise using a language model to generate image/label predictions, using a search engine to generate image/label predictions from a plurality of statistical models, and using a classifier to generate image/label predictions. The alternative solutions (i.e. alternative prediction means) will be described in that order.

A system in accordance with the first solution can be implemented as shown in Figs. 1a and 1b, which show block diagrams of the high-level text prediction architecture according to the invention. The system comprises a prediction engine 100 configured to generate an image/label prediction 50 relevant to user-inputted text.

In Fig. 1a, the prediction engine 100 comprises an image/label language model 10 to generate image/label predictions 50 and, optionally, word prediction(s) 60. The image/label language model 10 may be a generic image/label language model, for example a language model based on the English language, or may be an application-specific image/label language model, e.g. a language model trained on SMS messages or email messages, or any other suitable type of language model. As shown in Fig. 1b, the prediction engine 100 may comprise any number of additional language models, which may be text-only language models or image/label language models in accordance with the present invention.

As shown in Fig. 1b, if the prediction engine 100 comprises one or more additional language models, such as additional language model 20, the prediction engine 100 may comprise a multi-language model 30 (Multi-LM) to combine the image/label predictions and/or word predictions sourced from each of the language models 10, 20, in order to generate final image/label predictions 50 and/or final word predictions 60 that may be provided to a user interface for display and user selection. The final image/label predictions 50 are preferably a set (i.e. a specified number) of the overall most probable predictions. The system may present only the most likely image/label prediction 50 to the user.

The use of a Multi-LM 30 to combine word predictions sourced from a plurality of language models is described from line 1 of page 11 to line 12 of page 12 of WO 2010/112841, which is hereby incorporated by reference.

If the additional language model 20 is a standard word-based language model, for example as described in detail in WO 2010/112842, and in particular as shown in relation to Figs. 2a to 2d of WO 2010/112842, the standard word-based language model can be used alongside the image/label based language model 10, such that the prediction engine 100 generates an image/label prediction 50 from the image/label language model 10 and a word prediction 60 from the word-based language model 20. If desired, the image/word based language model 10 may also generate word predictions (as described below with respect to Figs. 2a to 2c), which are used by the Multi-LM 30 to generate a final set of word predictions 60. Since the additional language model 20 of this embodiment can predict words only, the Multi-LM 30 is not needed to output the final image/label predictions 50. The word-based language model 20 may be replaced by any suitable language model for generating word predictions, which may include language models based on morphemes or word-segments, as discussed in detail in UK patent application no. 1321927.4, which is hereby incorporated by reference in its entirety.

If the additional language model 20 is an additional image/label language model, the Multi-LM 30 can be used to generate the final image/label predictions 50 from the image/label predictions sourced from both language models 10, 20.
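The combination step performed by the Multi-LM can be sketched as follows. This is only an illustrative sketch: the function name `combine_predictions` and the rule of keeping the higher probability when two models propose the same candidate are assumptions, since the merge rule itself is described in WO 2010/112841 rather than here. The emojis and probabilities are invented sample data.

```python
from typing import Dict, List, Tuple


def combine_predictions(
    per_model: List[Dict[str, float]], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Merge candidate -> probability maps from several language models.

    Assumption: if two models propose the same candidate, the higher
    probability is kept. The real merge rule may differ.
    """
    merged: Dict[str, float] = {}
    for predictions in per_model:
        for candidate, prob in predictions.items():
            merged[candidate] = max(prob, merged.get(candidate, 0.0))
    # Rank by descending probability and keep the specified number.
    ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical outputs of image/label language models 10 and 20.
model_10 = {"🍕": 0.6, "😀": 0.3}
model_20 = {"😀": 0.5, "🎉": 0.2}
final = combine_predictions([model_10, model_20], top_n=2)
```

Setting `top_n=1` would correspond to presenting only the most likely image/label prediction to the user.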

The Multi-LM 30 may also be used to tokenise user-inputted text, as described in the first paragraph of page 21 of WO 2010/112842, and as described in more detail below in relation to the language model embodiments of the present invention.

An image/label language model 10 will be described with reference to Figs. 2a to 2c, which illustrate schematics of image/label language models that receive user-inputted text and return image/label predictions 50 (and, optionally, word/term predictions 60).

There are two possible inputs into a given language model: a current term input 11 and a context input 12. The language model may use either or both of the possible inputs. The current term input 11 comprises the information the system has about the term it is trying to predict, e.g. the word the user is attempting to enter (e.g. if the user has entered "I am working on ge", the current term input 11 is 'ge'). This could be a sequence of multi-character keystrokes, individual character keystrokes, the characters determined from a continuous touch gesture across a touchscreen keypad, or a mixture of input forms. The context input 12 comprises the sequence of terms entered so far by the user, directly preceding the current term (e.g. "I am working"), and this sequence is split into 'tokens' by the Multi-LM 30 or a separate tokeniser (not shown). If the system is generating a prediction for the nth term, the context input 12 will contain the preceding n-1 terms that have been selected and input into the system by the user. The n-1 terms of context may comprise a single word, a sequence of words, or no words if the current word input relates to a word beginning a sentence.

A language model may comprise an input model (which takes the current term input 11 as input) and a context model (which takes the context input 12 as input).

In a first embodiment, illustrated in Fig. 2a, the language model comprises a trie 13 (an example of an input model) and a word-based n-gram map 14 (an example of a context model) to generate word predictions from current input 11 and context 12 respectively. The first part of this language model corresponds to that discussed in detail in WO 2010/112841, and in particular to that described in relation to Figs. 2a to 2d of WO 2010/112841. The language model of Fig. 2a of the present invention can also include an intersection 15 to compute a final set of word predictions 60 from the predictions generated by the trie 13 and the n-gram map 14. As described in detail from line 4 of page 16 to line 14 of page 17 of WO 2010/112841, the trie 13 can be a standard trie (see Fig. 3 of WO 2010/112841) or an approximate trie (see Fig. 4a of WO 2010/112841) which is queried with the direct current word-segment input 11. Alternatively, the trie 13 can be a probabilistic trie which is queried with a KeyPressVector generated from the current input, as described in detail from line 16 of page 17 to line 16 of page 20 (and illustrated in Figs. 4b and 4c) of WO 2010/112841, which is hereby incorporated by reference. The language model can also comprise any number of filters to generate the final set of word predictions 60, as described in that earlier application.
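As a rough illustration of the input model, a standard trie queried with the current term input can be sketched as below. This is a minimal sketch under stated assumptions: the class names `TrieNode` and `Trie`, the stored per-word probabilities, and the sample vocabulary are all hypothetical, and the approximate and probabilistic tries of WO 2010/112841 are not reproduced.

```python
from typing import Dict, List, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.probability: float = 0.0  # non-zero only where a word ends


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str, probability: float) -> None:
        node = self.root
        for char in word:
            node = node.children.setdefault(char, TrieNode())
        node.probability = probability

    def complete(self, prefix: str) -> List[Tuple[str, float]]:
        """Return all stored words starting with prefix, most probable first."""
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []
            node = node.children[char]
        results: List[Tuple[str, float]] = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.probability > 0.0:
                results.append((text, current.probability))
            for char, child in current.children.items():
                stack.append((child, text + char))
        return sorted(results, key=lambda item: item[1], reverse=True)


trie = Trie()
for word, prob in [("german", 0.02), ("get", 0.05), ("good", 0.04)]:
    trie.insert(word, prob)

# Current term input 11 is 'ge', as in the "I am working on ge" example.
candidates = trie.complete("ge")
```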

If desired, the intersection 15 of the language model 10 of Figs. 2a and 2c can be configured to employ a back-off approach if a candidate predicted by the trie has not also been predicted by the n-gram map (rather than retaining only the candidates generated by both, as described in WO 2010/112841). Each time the system has to back off on the context searched for, the intersection mechanism 15 may apply a 'back-off' penalty to the probability (which may be a fixed penalty, e.g. applied by multiplying by a fixed value). In this embodiment, the context model (e.g. the n-gram map) may comprise unigram probabilities with the back-off penalties applied.
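The back-off behaviour of the intersection can be sketched as follows. This is an illustrative sketch only: the function name `intersect_with_backoff`, the choice to multiply the input-model and context-model probabilities, and the penalty value of 0.1 are assumptions made for the example, not details given in the text.

```python
from typing import Dict


def intersect_with_backoff(
    trie_candidates: Dict[str, float],
    ngram_candidates: Dict[str, float],
    unigram_probs: Dict[str, float],
    backoff_penalty: float = 0.1,
) -> Dict[str, float]:
    """Combine input-model and context-model candidates.

    A trie candidate that the n-gram map did not predict is kept, but its
    context probability backs off to the unigram probability multiplied by
    a fixed penalty.
    """
    combined: Dict[str, float] = {}
    for word, input_prob in trie_candidates.items():
        if word in ngram_candidates:
            context_prob = ngram_candidates[word]
        else:
            # Back off: unigram probability with the fixed penalty applied.
            context_prob = unigram_probs.get(word, 0.0) * backoff_penalty
        combined[word] = input_prob * context_prob
    return combined


result = intersect_with_backoff(
    trie_candidates={"get": 0.5, "german": 0.5},
    ngram_candidates={"get": 0.2},
    unigram_probs={"german": 0.01},
)
```

Here 'german' survives the intersection with a heavily discounted probability instead of being dropped, which is the difference from retaining only candidates generated by both models.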

The language model of Fig. 2a includes a word→image/label correspondence map 40, which maps each word of the language model 10 to one or more relevant images/labels, e.g. if the word prediction 60 is 'pizza', the language model outputs an image of a pizza (e.g. the pizza emoji) as the image prediction 50.
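A word→image/label correspondence map of this kind can be sketched as a simple lookup table. The table contents, the name `WORD_TO_IMAGE` and the helper `images_for_word_predictions` are hypothetical and serve only to illustrate the mapping step.

```python
from typing import Dict, List

# Hypothetical correspondence map 40: each word maps to one or more images.
WORD_TO_IMAGE: Dict[str, List[str]] = {
    "pizza": ["🍕"],
    "snowflake": ["❄"],
}


def images_for_word_predictions(word_predictions: List[str]) -> List[str]:
    """Map word predictions to image predictions, preserving order."""
    images: List[str] = []
    for word in word_predictions:
        for image in WORD_TO_IMAGE.get(word.lower(), []):
            if image not in images:
                images.append(image)
    return images


# Words without an entry in the map simply yield no image prediction.
image_predictions = images_for_word_predictions(["pizza", "pizzeria"])
```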

Fig. 2b illustrates a second image/label language model 10 in accordance with the first solution of the present invention. The image/label language model 10 is configured to generate an image/label prediction 50 and, optionally, a word prediction 60 on the basis of context 12 alone. In this embodiment, the image/label language model receives the context input 12 only, which comprises one or more words that are used to search the n-gram map 14'. The n-gram map 14' of Fig. 2b is trained in a different way from that of Fig. 2a, enabling the image/label language model 10 to generate relevant image/label predictions 50 without the use of a word→image/label correspondence map 40. If there is no context 12, the language model 10 may output the most likely image/label 50 associated with the most likely word 60 used to start a sentence. In certain circumstances, it may be appropriate to predict an image/label on the basis of context only, e.g. the prediction of emojis. In other circumstances, e.g. the prediction of a label (such as a hashtag), it might be more appropriate to use the current word input (by itself or in addition to the context input), because the user might have partially typed the label before it is predicted.

Examples of the n-gram maps 14' of the second embodiment are illustrated schematically in Figs. 3 and 4, where, for illustrative purposes, an emoji has been chosen for the image/label.

The n-gram map 14' of Fig. 3 has been trained on source data comprising images/labels embedded in sections of text. For example, the language model could be trained on data from Twitter, where the tweets have been filtered to collect tweets comprising emojis. In the n-gram map 14' of Fig. 3, the emojis (which are used merely as an example of the image/label) are treated like words when generating the language model, i.e. the n-gram context map comprises emojis in the context in which they have been identified. For example, if the source data comprises the sentence "I am not happy about this [emoji]", the emoji will only follow its preceding context, e.g. "happy about this [emoji]" if the n-gram has a depth of four. The language model will therefore predict the emoji if the context 12 fed into the language model comprises "happy about this", as the emoji is the next part of the sequence. The n-gram map comprises the probabilities associated with sequences of words and emojis, where emojis and words are treated indiscriminately for assigning probabilities. The probabilities can therefore be assigned on the basis of frequency of appearance in the training data, given a particular context in that training data.
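Training an n-gram map in which emojis are treated like words, and assigning probabilities by frequency of appearance given a context, can be sketched as below. This is a minimal sketch with assumptions: the function names, the whitespace tokenisation, the sample sentences and the specific emojis (the emoji images are not recoverable from the source text) are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

EMOJI = {"😞", "😀"}  # hypothetical emoji inventory for this sketch


def train_direct_ngrams(
    sentences: List[str], depth: int = 4
) -> Dict[Tuple[str, ...], Counter]:
    """Count which token follows each context of up to depth-1 tokens.

    Emojis are ordinary tokens, so they appear as next-token outcomes.
    """
    counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for end in range(1, len(tokens)):
            for size in range(1, depth):
                if end - size < 0:
                    break
                context = tuple(tokens[end - size:end])
                counts[context][tokens[end]] += 1
    return counts


def predict_emoji(
    counts: Dict[Tuple[str, ...], Counter], context: List[str], depth: int = 4
) -> Dict[str, float]:
    """Relative frequency of each emoji directly following the context."""
    key = tuple(context[-(depth - 1):])
    following = counts.get(key, Counter())
    total = sum(following.values())
    return {tok: n / total for tok, n in following.items() if tok in EMOJI}


counts = train_direct_ngrams([
    "I am not happy about this 😞",
    "I am happy about this 😀",
    "not happy about this 😞",
])
probs = predict_emoji(counts, ["not", "happy", "about", "this"])
```

With a depth of four, only the last three context words ("happy about this") are used for the lookup, mirroring the example in the text.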

The n-gram map of Fig. 4 has been trained by associating images/labels identified within the source text with sections of text that do not immediately precede the identified images/labels. By training the language model in this way, the language model is able to predict relevant/appropriate images/labels even if the user has not entered text which describes the relevant image/label and has not entered text which would usually immediately precede the image/label, such as "I am" for "I am [emoji]". To train this language model 10, images/labels are identified within a source text (e.g. filtered Twitter tweets) and each identified image/label is associated with sections of text within that source text. Using the example of tweets, an emoji of a particular tweet is associated with all n-grams from that tweet. For example, training on the tweet "I'm not happy about this [emoji]" would generate the following n-grams with the associated emoji:

○ I'm not happy

○ not happy about

○ happy about this

○ I'm not

○ not happy

etc.
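The association of an emoji with all n-grams of the tweet it appears in can be sketched as follows. This is an illustrative sketch: the function name `train_non_direct`, the restriction to bigrams and trigrams (chosen because those are the n-grams listed above) and the specific emoji are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

EMOJI = {"😞"}  # hypothetical; the emoji in the source example is not recoverable


def train_non_direct(
    tweets: List[str], min_n: int = 2, max_n: int = 3
) -> Dict[Tuple[str, ...], Counter]:
    """Associate every emoji in a tweet with every word n-gram of that tweet."""
    counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for tweet in tweets:
        tokens = tweet.split()
        words = [t for t in tokens if t not in EMOJI]
        emojis = [t for t in tokens if t in EMOJI]
        for n in range(min_n, max_n + 1):
            for start in range(len(words) - n + 1):
                ngram = tuple(words[start:start + n])
                for emoji in emojis:
                    counts[ngram][emoji] += 1
    return counts


ngram_map = train_non_direct(["I'm not happy about this 😞"])
```

Unlike the direct-context map, the n-gram ("I'm", "not") now carries the emoji even though it does not immediately precede it in the tweet.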

One way to generate emoji predictions from such a non-direct context n-gram map 14' is to take the emojis that are appended to the word sequences of the n-gram map 14' which most closely match the word sequence of the user-inputted text. If the user-inputted text is W₁W₂W₃W₄, the predicted emoji is the emoji that is appended to the sequence W₁W₂W₃W₄. An alternative way to generate emoji predictions from a non-direct context n-gram map 14' is to predict an emoji for each word of the user-inputted text, e.g. if the word sequence of the user-inputted text is W₁W₂W₃W₄, etc., predict a first emoji e₁ for W₁, a second emoji e₂ for W₁W₂ (where W₁W₂ means predicting an emoji for the word sequence W₁W₂), e₃ for W₁W₂W₃, and e₄ for W₁W₂W₃W₄. The weighted average of the set of emoji predictions (e₁, e₂, e₃, e₄) can be used to generate the emoji predictions 50, i.e. the most frequently predicted emoji will be output as the most likely emoji. By taking a weighted average of the set of emoji predictions, it may be possible to increase the contextual reach of the emoji prediction.
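The alternative approach, predicting one emoji per prefix W₁, W₁W₂, ... and outputting the most frequently predicted one, can be sketched as below. Two details are assumptions made for the sketch: each prefix is matched by its longest suffix found in the map (up to three words), and every prefix prediction carries an equal weight. The function names and the sample map are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

NGramMap = Dict[Tuple[str, ...], Counter]


def best_emoji(
    ngram_map: NGramMap, words: List[str], max_n: int = 3
) -> Optional[str]:
    """Most frequent emoji for the longest suffix of words found in the map."""
    for n in range(min(max_n, len(words)), 0, -1):
        entry = ngram_map.get(tuple(words[-n:]))
        if entry:
            return entry.most_common(1)[0][0]
    return None


def predict_by_vote(ngram_map: NGramMap, words: List[str]) -> Optional[str]:
    """Predict e1 for W1, e2 for W1W2, ... and return the most frequent one."""
    votes: Counter = Counter()
    for end in range(1, len(words) + 1):
        emoji = best_emoji(ngram_map, words[:end])
        if emoji is not None:
            votes[emoji] += 1  # equal weights: an assumption of this sketch
    return votes.most_common(1)[0][0] if votes else None


sample_map: NGramMap = {
    ("not", "happy"): Counter({"😞": 3}),
    ("happy", "about"): Counter({"😞": 2}),
    ("about", "this"): Counter({"😞": 1}),
    ("happy",): Counter({"😀": 5}),
}
prediction = predict_by_vote(sample_map, ["not", "happy", "about", "this"])
```

Non-uniform weights, for example favouring longer prefixes, could be substituted for the unit vote without changing the structure.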

Owing to the number of different sections of text that can be associated with each emoji, the model is preferably pruned in two ways. The first is to prune on the basis of frequency of occurrence, e.g. to prune n-grams with frequency counts below a fixed number of occurrences (e.g. if a particular n-gram and associated emoji is seen fewer than 10 times in the training data, that n-gram and associated emoji are removed).

The second way of pruning is to prune on the basis of the probability difference from the unigram probabilities. As an example, after the context "about this", the probability of predicting [emoji] will not be much larger than the unigram probability of [emoji], because training will also have encountered many other n-grams of the form "about this [EMOJI]" (where [EMOJI] stands for any emoji) with no particular bias. The n-gram "about this [emoji]" can therefore be pruned. A combination of the two pruning methods is also possible, as are any other suitable pruning methods.
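The two pruning criteria can be sketched together as follows. This is an illustrative sketch under stated assumptions: the function name `prune`, the data layout, and the rule that an entry is kept only if its conditional probability is at least `min_ratio` times the emoji's unigram probability are hypothetical; the text only states that entries whose probability is not much larger than the unigram probability can be pruned. The counts are invented sample data.

```python
from typing import Dict, Tuple

Entry = Tuple[Tuple[str, ...], str]  # (context n-gram, emoji)


def prune(
    counts: Dict[Entry, int],
    context_totals: Dict[Tuple[str, ...], int],
    unigram_probs: Dict[str, float],
    min_count: int = 10,
    min_ratio: float = 2.0,
) -> Dict[Entry, int]:
    """Drop rare entries and entries that add little over the unigram."""
    kept: Dict[Entry, int] = {}
    for (context, emoji), count in counts.items():
        # First criterion: frequency of occurrence.
        if count < min_count:
            continue
        # Second criterion: conditional probability vs. unigram probability.
        conditional = count / context_totals[context]
        if conditional < min_ratio * unigram_probs[emoji]:
            continue
        kept[(context, emoji)] = count
    return kept


counts = {
    (("about", "this"), "😞"): 50,   # common, but uninformative context
    (("not", "happy"), "😞"): 40,    # common and informative
    (("so", "sad"), "😞"): 3,        # too rare
}
context_totals = {("about", "this"): 1000, ("not", "happy"): 100, ("so", "sad"): 4}
unigram_probs = {"😞": 0.05}
kept = prune(counts, context_totals, unigram_probs)
```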

Referring to Fig. 2b, the language model 10 receives a sequence of one or more words (context 12) from the Multi-LM 30 and compares the sequence of one or more words to the sequences of words stored in the n-gram map 14'. In relation to the n-gram map of Fig. 3, an emoji is only predicted if the emoji directly follows the sequence of one or more words, e.g. from the context sequence "not happy about this" the language model would predict "[emoji]". In relation to the n-gram map of Fig. 4, the language model generates an emoji prediction much more regularly, because the language model has been trained on direct and non-direct context.

As shown in FIG. 2b, the language model can optionally output one or more word predictions 60 alongside the image/label prediction(s) 50. The language model compares the input sequence of one or more words (context 12) to the stored sequences of words (with appended emojis). If it identifies a stored sequence of words that comprises the sequence of one or more words, it outputs the next word in the stored sequence that follows the sequence of one or more words, either for display of the next word 60 on a user interface for user selection or for direct input of the next word into the system.
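A rough sketch of this lookup is given below. It assumes, purely for illustration, that the n-gram map is a dictionary from context tuples to counted continuations (words or emoji placeholders) and that the longest matching context suffix is preferred; the function `predict_next`, the `max_order` parameter, and the sample data are hypothetical and not taken from the patent.

```python
def predict_next(context, ngram_map, max_order=3):
    """Return continuations stored for the longest matching context suffix."""
    for n in range(min(max_order, len(context)), 0, -1):
        key = tuple(context[-n:])
        if key in ngram_map:
            # Most frequent continuation first.
            return sorted(ngram_map[key].items(), key=lambda kv: -kv[1])
    return []

# Hypothetical stored sequences; values count what followed each context.
ngram_map = {
    ("about", "this"): {"EMOJI_A": 3, "weekend": 5},
    ("happy", "about", "this"): {"EMOJI_A": 8, "news": 2},
}
preds = predict_next(["not", "happy", "about", "this"], ngram_map)
```

Because words and emoji share one map in this sketch, a single lookup yields both kinds of prediction, mirroring the optional joint output described above.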

A third embodiment of the language model 10 is illustrated in FIG. 2c. As with the language model of FIG. 2a, the language model 10 of FIG. 2c comprises a tree 13 and an n-gram map 14' for generating word predictions from the current input 11 and the context input 12 respectively, and an intersection 15 for generating one or more final word prediction(s) 60. The n-gram map 14' of the third embodiment is the same as that of the second embodiment, i.e. it comprises images/labels embedded within, or appended to, sections of text. The same n-gram map 14' can therefore be used to generate image/label predictions 50 as well as word predictions 60.

As will be understood from the above, the system of the first solution predicts an image/label on the basis of user-entered text and, optionally, a word/term on the basis of that user-entered text.

Although the image/label language model 10 of the first solution has been described in relation to a language model comprising a trained n-gram map, this is by way of example only, and any other suitably trained language model can be used.

The second solution for reducing the burden of image/label entry relates to a search engine configured to generate image/label predictions for user input, similar to that described in detail in UK patent application 1223450.6, which is hereby incorporated by reference in its entirety.

FIG. 5 shows a block diagram of the high-level system architecture of the system of the invention. The search engine 100' uses an image/label database 70 that preferably comprises a one-to-one mapping of statistical models to images/labels, i.e. the image/label database comprises a statistical model associated with each image/label (e.g. emoji or hashtag), each image/label statistical model being trained on sections of text associated with that image/label. A language model is a non-limiting example of a statistical model, where the language model is a probability distribution representing the statistical probability of sequences of words occurring within a natural language. Unlike the language model 10 of the first solution, a language model in accordance with this solution does not contain images/labels; it is a text-only language model mapped to a particular image/label.

To generate the image/label prediction(s) 50, the search engine 100' uses the image/label database 70 and the user-entered text 12' and, optionally, one or more other evidence sources 12'', e.g. the image/label input history for a given user of the system. To trigger a search, the search engine receives the user-entered text 12'.

As described further below, the image/label database 70 associates individual images/labels with an equal number of statistical models and, optionally, with alternative statistical models (not shown) that are not language based (e.g. a model that estimates user relevance given prior input of a particular image/label).

The search engine 100' is configured to query the image/label database 70 with the user-entered text evidence 12' in order to generate, for each image/label in the content database, an estimate of the likelihood that the image/label is relevant given the user-entered text. The search engine outputs the most probable image/label, or the p most probable images/labels, as image/label predictions 50, which may optionally be presented to the user.

An estimate for the probability, P, of observing the user-entered text, e, given that an image/label, c, is relevant under an associated image/label statistical model M is:

$$P(e \mid c, M)$$

There are many techniques that could be applied by the search engine to compute the required estimate, such as:

● naive Bayesian modeling

● maximum entropy modeling

● statistical language modeling

The first two approaches are based on extracting a set of features and training a generative model (which in this case equates to extracting features from the text associated with an image/label and training an image/label statistical model on those features), while statistical language modeling attempts to model a sequential distribution over the terms in the user-entered text. To provide a working example, the first approach is discussed, but all three are applicable.

A set of features is extracted from the user-entered text, preferably by using any suitable feature extraction mechanism that is part of the search engine 100'. To generate a relevance estimate, these features are assumed to have been independently generated by the associated image/label statistical model.

An estimate of the probability of a given feature being relevant to a particular image/label is stored in the image/label statistical model. In particular, an image/label statistical model is trained on text associated with an image/label by extracting features from that text and analyzing the frequency of these features within it.

There are various methods used in the art for generating these features from text. For example:

● 'Bag-of-words' term presence/absence: the features are the set of unique words used in the text.

● Unigram: the features are simply the words of the text. This model results in words that appear multiple times being given proportionally greater weight.

● Term combination: the features may include combinations of terms, either contiguous n-grams or combinations representing non-local sentential relations.

● Syntactic: the features may include syntactic information such as part-of-speech tags or higher-level parse tree elements.

● Latent topics/clusters: the features may be sets/clusters of terms that may represent underlying "topics" or themes within the text.

Preferred features are typically individual terms or short phrases (n-grams). Individual term features are extracted from a text sequence by tokenizing the sequence into terms (where a term denotes both words and additional orthographic items such as morphemes and/or punctuation) and discarding unwanted terms (e.g. terms that have no semantic value, such as 'stopwords'). In some cases, features may also be case-normalized, i.e. converted to lower case. N-gram features are generated by concatenating adjacent terms into atomic entities. For example, given the text sequence "Dear special friends", the individual term features would be "Dear", "special" and "friends", while the bigram (2-gram) features would be "Dear_special" and "special_friends".
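The term and bigram extraction just described can be sketched as below. The regular-expression tokenizer, the tiny `STOPWORDS` set, and the function name `extract_features` are illustrative assumptions; the text does not specify a particular tokenizer or stopword list.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "to"}  # hypothetical, illustrative list

def extract_features(text, lowercase=True):
    """Tokenize into terms, drop stopwords, and append underscore-joined bigrams."""
    tokens = re.findall(r"\w+|[^\w\s]", text)  # words and punctuation marks
    if lowercase:
        tokens = [t.lower() for t in tokens]   # optional case normalization
    terms = [t for t in tokens if t not in STOPWORDS]
    bigrams = [f"{a}_{b}" for a, b in zip(terms, terms[1:])]
    return terms + bigrams

features = extract_features("Dear special friends", lowercase=False)
```

In this sketch bigrams are formed after stopword removal; forming them before removal is an equally plausible design and would change which phrases survive.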

The feature generation mechanism of the search engine 100' preferably weights the features extracted from the user-entered text 12 in order to exaggerate the importance of those known a priori to be more likely to carry useful information. For instance, for term features, this is normally done using some kind of heuristic technique that encapsulates the scarcity of the words in common English (e.g. term frequency-inverse document frequency, TFiDF), since unusual words are more likely than common words to be indicative of the relevant image/label statistical models. TFiDF is defined as:

$$\mathrm{TFiDF}(t) = \frac{tf(t)}{df(t)}$$

where $tf(t)$ is the number of times term $t$ occurs in the user-entered text, and $df(t)$ is the number of image/label statistical models in which $t$ occurs, across all image/label statistical models.

The D features of the user-entered text 12' can be represented by a real-valued D-dimensional vector. Normalization can then be achieved by the search engine 100' by converting each of the vectors to unit length. It may be preferable to normalize the feature vector because a detrimental consequence of the independence assumption on features is that user-entered text samples of different length are described by a different number of events, which can lead to spurious discrepancies in the range of values returned by different system queries.
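As an illustration of the weighting and normalization steps, the sketch below assumes that the TFiDF weight of a term is its count in the user-entered text divided by the number of image/label models containing it, and that the vector is stored sparsely as a dictionary. The function name `tfidf_unit_vector` and the `df` counts are hypothetical.

```python
import math
from collections import Counter

def tfidf_unit_vector(features, df):
    """Weight each feature by tf/df, then scale the vector to unit length."""
    tf = Counter(features)
    # Features absent from every model (df == 0) are skipped in this sketch.
    weights = {f: tf[f] / df[f] for f in tf if df.get(f, 0) > 0}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {f: w / norm for f, w in weights.items()} if norm else {}

df = {"happy": 50, "birthday": 2}  # hypothetical: models containing each term
vec = tfidf_unit_vector(["happy", "birthday", "birthday"], df)
```

The rare term "birthday" dominates the resulting vector, which is the intended effect of exaggerating features that are more likely to be informative.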

The probability, $P(e \mid c, M)$, of observing the user-entered text, e, given that an image/label, c, is relevant under an associated image/label statistical model M is computed as a product over the independent features, $f_i$, extracted from the text entered by the user, e:

$$P(e \mid c, M) = \prod_i P(f_i \mid c, M)$$

The search engine 100' is configured to query the image/label database 70 with each feature $f_i$. The database returns a list of all the image/label statistical models comprising that feature, together with the probability estimate associated with that feature for each image/label statistical model. The probability, $P(e \mid c, M)$, of observing the user-entered text, e, given an image/label, c, under an image/label statistical model, M, is computed as the product of the probability estimates for all of the features $f_i$ of the user-entered evidence e, over all of the image/label statistical models M that comprise those features $f_i$.

This expression can be rewritten by taking $g_i$ to be each unique feature that occurs a given number of times ($n_i$) in the user-entered text e (12'):

$$P(e \mid c, M) = \prod_i P(g_i \mid c, M)^{n_i}$$

Assuming the search engine 100' includes the TFiDF weighting, $n_i$ can be replaced with its corresponding weight, $w_i$. The weight vector $\mathbf{w}$ is a vector containing the TFiDF scores for all features extracted from the user-entered text. The weight vector is preferably normalized to have unit length:

$$P(e \mid c, M) = \prod_i P(g_i \mid c, M)^{w_i}$$

And converting to logs:

$$\log P(e \mid c, M) = \sum_i w_i \log P(g_i \mid c, M)$$

$\log P(e \mid c, M)$ can be rewritten as the dot product of two vectors, one representing the weights and the other, $\mathbf{v}$, representing the log probabilities $\log P(g_i \mid c, M)$:

$$\log P(e \mid c, M) = \mathbf{w} \cdot \mathbf{v}$$
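The log-domain score, i.e. the dot product between the feature weights and the per-model log probabilities, can be sketched as follows. The model names `cake` and `rain`, the probability values, and the function `log_score` are invented for illustration, and every weighted feature is assumed to have a non-zero probability in each model.

```python
import math

def log_score(weights, feature_probs):
    """Compute sum_i w_i * log P(g_i | c, M) for one image/label model."""
    return sum(w * math.log(feature_probs[g]) for g, w in weights.items())

weights = {"happy": 0.6, "birthday": 0.8}          # unit-length weight vector
model_cake = {"happy": 0.10, "birthday": 0.20}     # hypothetical P(g | c, M)
model_rain = {"happy": 0.01, "birthday": 0.001}

scores = {
    "cake": log_score(weights, model_cake),
    "rain": log_score(weights, model_rain),
}
best = max(scores, key=scores.get)
```

Working with sums of logs rather than products of probabilities also avoids numerical underflow when many features are involved.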

To compute the above, an estimate of the image/label-dependent feature likelihood, $P(g_i \mid c, M)$, is needed. The search engine 100' takes this estimate from the image/label statistical model, which has been trained by analyzing the frequency of features in the source text.

Under this approach, however, if the probability estimate for any feature of the user-entered text is zero (because, for example, the term is not present in the language model), the final probability $P(e \mid c, M)$ would be zero. If the training corpus is sparse, it is unlikely that every feature in the user-entered text will have been observed in the training corpus for the image/label statistical model. Some form of smoothing can therefore be used to reallocate part of the probability mass of observed features to unobserved features. There are many widely accepted techniques for smoothing frequency-based probabilities, e.g. Laplace smoothing.
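Laplace (add-alpha) smoothing, named above as one option, can be sketched as follows. The vocabulary size, the counts, and the function name `laplace_prob` are illustrative assumptions rather than values from the patent.

```python
from collections import Counter

def laplace_prob(feature, counts, vocab_size, alpha=1.0):
    """Add-alpha estimate of P(feature | model); never zero for alpha > 0."""
    total = sum(counts.values())
    return (counts.get(feature, 0) + alpha) / (total + alpha * vocab_size)

# Hypothetical feature counts from the text associated with one image/label.
counts = Counter({"happy": 3, "birthday": 1})
p_seen = laplace_prob("happy", counts, vocab_size=4)    # (3 + 1) / (4 + 4)
p_unseen = laplace_prob("rain", counts, vocab_size=4)   # (0 + 1) / (4 + 4)
```

An unseen feature now contributes a small non-zero factor instead of forcing the whole product to zero.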

The search engine 100' can therefore determine which image/label 50 is the most relevant given the user-entered text by querying each image/label statistical model of the image/label database 70 with the features $f_i$ extracted from the user-entered text, to determine which image/label statistical model provides the greatest probability estimate (since each image/label statistical model is mapped to a corresponding image/label).

As mentioned previously, the search engine 100' can take into account additional types of evidence, e.g. evidence that relates specifically to a given user, such as previously generated language, previously entered images/labels, or social context/demographics (since, for example, the type of emoji that is popularly used may vary with nationality/culture/age).

Furthermore, the search engine may take into account a prior probability of image/label relevance, e.g. a measure of the likelihood that an image/label will be relevant in the absence of any specific evidence related to an individual user or circumstance. This prior probability can be modeled using an aggregate analysis of general usage patterns across all images/labels. There are many further information sources that can be taken into account. For instance, recency (how recently the image/label was input by a user) could be important, particularly where an up-to-date image/label is especially relevant, or where the image/label is used in a Twitter feed followed by a large number of followers.

If multiple evidence sources 12', 12'' are taken into account, the search engine 100' generates an estimate for each image/label given each evidence source. For each image/label, the search engine is configured to combine the estimates for the evidence sources to generate an overall estimate for that image/label. To do this, the search engine 100' may be configured to treat each of the evidence sources as independent, i.e. to treat a user's image/label input history as independent of the text input.

To compute the probability, $P(E \mid c, M_c)$, of seeing the evidence, E, given a particular image/label, c, the evidence E is assumed to be separable into non-overlapping, mutually independent sets, $[e_1, \ldots, e_n]$, that are independently generated from some distribution, conditioned on a target image/label c and an associated model $M_c$. This independence assumption can be written as:

$$P(E \mid c, M_c) = \prod_{i=1}^{n} P(e_i \mid c, M_c)$$

The probability $P(E \mid c, M_c)$ is therefore calculated by the search engine 100' as a product of the probability estimates for the independent evidence sources $e_i$. The search engine 100' is accordingly configured to calculate the individual evidence estimates separately.
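Under the independence assumption, combining evidence sources reduces to multiplying the per-source estimates, or equivalently summing their logarithms. The sketch below assumes two sources (for example, text input and input history) with invented probabilities; the function `combine_evidence` and the labels are hypothetical.

```python
import math

def combine_evidence(per_source_probs):
    """Sum log P(e_i | c, M_c) over sources, i.e. multiply the estimates."""
    return {
        label: sum(math.log(p) for p in probs)
        for label, probs in per_source_probs.items()
    }

# Hypothetical estimates per label: [P(text | c), P(history | c)].
per_source = {"EMOJI_A": [0.2, 0.5], "EMOJI_B": [0.4, 0.1]}
log_totals = combine_evidence(per_source)
ranked = sorted(log_totals, key=log_totals.get, reverse=True)
```

Here `EMOJI_B` is favored by the text evidence alone, but `EMOJI_A` ranks first once the second source is multiplied in.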

There is a statistical model, M, for each image/label associated with each evidence source, and the relative impact of individual evidence sources can be controlled by the search engine 100' through a per-distribution smoothing hyper-parameter, which allows the system to specify a bound on the amount of information yielded by each source. This can be interpreted as a confidence in each evidence source. An aggressive smoothing factor on one evidence source relative to the others (the limiting case being the uniform distribution, in which case the evidence source is essentially ignored) will reduce the differences between the probability estimates for that evidence source conditioned on different images/labels. The distribution becomes flatter as the smoothing increases, and the overall impact of the source on the probability, $P(E \mid c, M_c)$, diminishes.

As described above, in one example, the statistical model may be a language model, such that there is a plurality of language models associated with the plurality of images/labels, where those language models comprise n-gram word sequences. In such an embodiment, the language models can be used to generate word predictions on the basis of the user-entered text (e.g. by comparing the sequence of words of the user-entered text to a stored sequence of words, to predict the next word on the basis of the stored sequence). The system is therefore able to generate a word prediction via the individual language models as well as an image/label prediction via the search engine. Alternatively, the system may comprise one or more language models (e.g. a word-based language model, a morpheme-based language model, etc.), in addition to the statistical models of the search engine, to generate text predictions.

To increase processing speed, the search engine 100' may be configured to discard all features $f_i$ that have a TFiDF value lower than a certain threshold. Features with a low TFiDF weighting will, in general, have a minimal impact on the overall probability estimates. Furthermore, low-TFiDF terms ('stopwords') also tend to have a reasonably uniform distribution of occurrence across content corpora, meaning that their impact on the probability estimates will also be reasonably uniform across classes. Reducing the number of features the search engine 100' uses to query the image/label database 70 increases the processing speed.

Alternatively, or in addition, the search engine can be configured to retrieve the top k images/labels. The top-k image/label retrieval acts as a first pass to reduce the number of candidate images/labels, which can then be ranked using a more resource-intensive procedure. For each feature, f, of the user-entered text, having TFiDF t (normalized to be in the range [0,1]), the search engine is configured to find the $k \cdot t$ images/labels that have the highest probabilistic association with f, where this set of images/labels is denoted $C_f$. The search engine can then determine the union across all features, $C = \bigcup_{f \in F} C_f$, to obtain a set of candidate images/labels that is bounded above by $|F| \cdot k$ in size. The search engine then 'scores' the evidence with respect to this limited set of candidate images/labels. Since k is likely to be small compared to the original number of images/labels, this provides a significant performance improvement. Any other suitable solution for retrieving the top k images/labels can be employed, for example by using Apache Lucene (http://lucene.apache.org/) or by using a k-nearest-neighbor approach (http://en.wikipedia.org/Nearest_neighbor_search#k-nearest_neighbor), etc. The value for k will depend on device capabilities versus accuracy requirements and computational complexity (for example, the number of features, etc.).

A third solution for reducing the burden of image/label entry uses a classifier to generate relevant image/label predictions on the basis of user-entered text.
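Returning to the top-k retrieval of the search-engine solution, the candidate-set construction can be sketched as below. The inverted index layout, the use of `math.ceil` to turn k·t into a whole number of labels, and the function name `candidate_set` are assumptions made for the sake of a runnable example.

```python
import math

def candidate_set(feature_weights, index, k):
    """Union, over features f with weight t, of the ceil(k*t) labels
    most strongly associated with f."""
    candidates = set()
    for f, t in feature_weights.items():
        ranked = sorted(index.get(f, {}).items(), key=lambda kv: -kv[1])
        candidates.update(label for label, _ in ranked[: math.ceil(k * t)])
    return candidates

# Hypothetical index: feature -> {label: association probability}.
index = {
    "birthday": {"cake": 0.6, "gift": 0.3, "rain": 0.01},
    "happy": {"smile": 0.5, "cake": 0.2, "gift": 0.1},
}
cands = candidate_set({"birthday": 1.0, "happy": 0.5}, index, k=2)
```

Since each weight t lies in [0,1], no feature contributes more than k labels, so the candidate set never exceeds the number of features times k; only these candidates are then passed to the more expensive scoring step.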

FIG. 6 illustrates a system in accordance with a third embodiment of the invention that comprises a classifier 100'' to generate image/label predictions 50 that are relevant to user-entered text 12'. A classifier 100'' for generating text predictions has been described in detail in WO 2011/042710, which is hereby incorporated by reference in its entirety. In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. A classifier 100'' is the feature that implements classification, mapping input data to a category. In the present invention, the classifier 100'' is configured to map user-entered text to images/labels.

The classifier 100'' is trained on text data that has been pre-labeled with images/labels, and makes real-time image/label predictions 50 for sections of text 12 entered into the system by a user.

A plurality of text sources 80 are used to train the classifier 100''. Each of the plurality of text sources 80 comprises all of the sections of text associated with a particular image/label as found in the source data. For an unsupervised approach to generating the text sources, any text of a sentence comprising a particular image/label may be taken to be text associated with that image/label, or any text that precedes the image/label may be taken to be associated text, for example a Twitter feed and its associated hashtag, or a sentence and its associated emoji.

Thus, each text source of the plurality of text sources 80 is mapped to or associated with a particular image/label.

The user-entered text 12' is input to a feature vector generator 90 of the system. The feature vector generator 90 is configured to convert the user-entered text 12' into a feature vector ready for classification. The feature vector generator 90 is as described above for the search engine system. The feature vector generator 90 is also used to generate the feature vectors used to train the classifier (from the plurality of text sources) via a classifier trainer 95.

The value D of the vector space is governed by the total number of features used in the model, typically upwards of 10,000 for a real-world classification problem. The feature vector generator 90 is configured to convert a discrete section of text into a vector by weighting each cell according to a value related to the frequency of occurrence of that term in the given text section, normalized by the inverse of its frequency of occurrence (TFiDF) across the entire body of text, where $tf(t)$ is the number of times term t occurs in the current source text, and $df(t)$ is the number of source texts in which t occurs across the whole collection of text sources. Each vector is then normalized to unit length by the feature vector generator 90.

The feature vector generator 90 is configured to split the user-entered text 12' into features (typically individual words or short phrases) and to generate a feature vector from those features. The feature vectors are D-dimensional real-valued vectors, $\mathbb{R}^D$, where each dimension represents a particular feature used to represent the text. The feature vector is passed to the classifier 100'' (which uses the feature vector to generate image/label predictions).

The classifier 100'' is trained by a training module 95 using the feature vectors generated by the feature vector generator 90 from the text sources 80. A trained classifier 100'' takes as input a feature vector generated from text 12' entered by a user, and yields as output image/label predictions 50, comprising a set of image/label predictions mapped to probability values. The image/label predictions 50 are drawn from the space of image/label predictions associated with/mapped to the plurality of text sources.

In a preferred embodiment, the classifier 100'' is a linear classifier (which makes a classification decision based on the value of a linear combination of the features) or a classifier based on the batch perceptron principle, where, during training, a weight vector is updated in the direction of all misclassified instances simultaneously, although any suitable classifier may be utilized. In one embodiment, a timed aggregate perceptron (TAP) classifier is used. The TAP classifier is natively a binary (2-class) classification model. To handle multi-class problems, i.e. multiple images/labels, a one-versus-all scheme is utilized, in which the TAP classifier is trained for each image/label against all other images/labels. The training of a classifier is described in more detail from line 26 of page 10 to line 8 of page 12 of WO 2011/042710, which is hereby incorporated by reference.

The classifier training module 95 carries out the training process as already mentioned. The training module 95 yields a weight vector for each class, i.e. a weight vector for each image/label.
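
The one-versus-all training described above can be sketched as follows. This is a minimal illustration that assumes a plain batch perceptron rather than the TAP classifier of WO 2011/042710; the function names (`train_batch_perceptron`, `train_one_versus_all`), the learning rate and the epoch limit are hypothetical and do not come from the specification.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_batch_perceptron(samples: List[Vector], targets: List[int],
                           epochs: int = 100, rate: float = 1.0) -> List[float]:
    """Train one binary perceptron; targets are +1 or -1 (assumed encoding)."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        # Collect every instance that the current weights misclassify.
        wrong = [(x, y) for x, y in zip(samples, targets) if y * dot(w, x) <= 0]
        if not wrong:
            break
        # Batch step: move towards all misclassified instances simultaneously.
        for d in range(len(w)):
            w[d] += rate * sum(y * x[d] for x, y in wrong)
    return w


def train_one_versus_all(samples: List[Vector],
                         labels: List[str]) -> Dict[str, List[float]]:
    """Return one weight vector per image/label, each trained against all others."""
    weights = {}
    for label in sorted(set(labels)):
        targets = [1 if current == label else -1 for current in labels]
        weights[label] = train_batch_perceptron(samples, targets)
    return weights
```

Each entry of the returned dictionary plays the role of the per-image/label weight vector produced by the training module 95.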

Given a set of N sample vectors of dimensionality D, paired with target labels, $(\mathbf{x}_i, y_i)$, the classifier training procedure returns an optimized weight vector $\hat{\mathbf{w}} \in \mathbb{R}^{D}$. The prediction $f(\mathbf{x})$ of whether an image/label is relevant for a new user-inputted text sample $\mathbf{x} \in \mathbb{R}^{D}$ can be determined by:

$$f(\mathbf{x}) = \operatorname{sign}(\hat{\mathbf{w}} \cdot \mathbf{x}) \qquad (1)$$

where the sign function converts an arbitrary real number to +/-1 based on its sign. The default decision boundary lies along the unbiased hyperplane $\hat{\mathbf{w}} \cdot \mathbf{x} = 0$, although a threshold can be introduced to adjust the bias.

A modified form of classification expression (1) is used without the sign function to yield a confidence value for each image/label, resulting in an M-dimensional vector of confidence values, where M is the number of images/labels. So, for instance, given a new, unseen user-inputted text section represented by vector sample $\mathbf{x}$, the following confidence vector $\mathbf{c}$ would be generated (where M = 3 for simplicity):

$$\mathbf{c} = \begin{pmatrix} \hat{\mathbf{w}}_1 \cdot \mathbf{x} \\ \hat{\mathbf{w}}_2 \cdot \mathbf{x} \\ \hat{\mathbf{w}}_3 \cdot \mathbf{x} \end{pmatrix}$$

Assuming a flat probability over all images/labels, the image/label confidence values generated by the classifier 100'' are used to generate a set of image/label predictions (where the dot product with the highest value, i.e. the greatest confidence, is matched to the most likely image/label).
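
As a rough illustration of this step, the sketch below computes one confidence value per image/label as the dot product of that image/label's weight vector with the input feature vector, then ranks the images/labels by confidence. The names `confidence_vector`, `rank_predictions` and `is_relevant`, and the treatment of a value exactly on the decision boundary as not relevant, are assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def is_relevant(w: Vector, x: Vector, threshold: float = 0.0) -> bool:
    """Binary decision for one image/label; the threshold shifts the bias."""
    return dot(w, x) > threshold


def confidence_vector(weights: Dict[str, Vector], x: Vector) -> Dict[str, float]:
    """One confidence value per image/label (the decision rule without the sign)."""
    return {label: dot(w, x) for label, w in weights.items()}


def rank_predictions(weights: Dict[str, Vector], x: Vector,
                     n: int = 3) -> List[Tuple[str, float]]:
    """Return the n images/labels with the highest confidence, best first."""
    confidences = confidence_vector(weights, x)
    return sorted(confidences.items(), key=lambda item: item[1], reverse=True)[:n]
```

With M images/labels this takes M dot products per input, which is the cost that grouping into coarser classes aims to reduce.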

If the images/labels are provided with a prior probability, e.g. a measure of the likelihood that an image/label is relevant in the absence of any specific evidence relating to an individual user or circumstance, or a prior probability based on the user's image/label input history, the system may further comprise a weighting module. The weighting module (not shown) may use the vector of confidence values generated by the classifier to weight the prior probability for each image/label, to provide a weighted set of image/label predictions 50.

The weighting module may be configured to respect the absolute probabilities assigned to a set of image/label predictions, so as not to spuriously skew future comparisons. Thus, the weighting module can be configured to leave the image/label predictions from the most likely prediction component unchanged, and to down-scale the probabilities of less likely images/labels proportionally.
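
The specification does not give a formula for the weighting module, so the following is only one possible reading, offered as a sketch. It assumes that each prior is multiplied by a factor derived from the classifier confidence, chosen so that the most confident image/label keeps its prior unchanged and every other prior is scaled down. The exponential mapping and the name `weight_priors` are assumptions, not part of the described system.

```python
import math
from typing import Dict


def weight_priors(priors: Dict[str, float],
                  confidences: Dict[str, float]) -> Dict[str, float]:
    """Scale each prior by exp(confidence - top confidence).

    The factor is exactly 1.0 for the most confident image/label and
    strictly below 1.0 for any image/label with a lower confidence.
    """
    top = max(confidences.values())
    return {label: prior * math.exp(confidences[label] - top)
            for label, prior in priors.items()}
```

Because the largest factor is 1.0, no probability is inflated, which is one way to avoid skewing later comparisons between prediction sets.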

The image/label predictions 50 output by the classifier 100'' (or the weighting module) can be displayed on a user interface for user selection.

As will be understood from the above, the classifier 100'' is required to generate the dot product of the input vector with each image/label vector in order to generate the image/label predictions 50. Thus, the greater the number of images/labels, the greater the number of dot products the classifier is required to calculate.

To reduce the number of classes, the images/labels may be grouped together, e.g. all emojis relating to a particular emotion (such as happiness) can be grouped into one class, as can all emojis relating to a particular topic or subject, such as clothing. In that case, the classifier predicts the class, for example an emotion (sad, happy, etc.), and the n most likely emoji predictions of that class can be displayed to the user for user selection. However, this may result in the user having to select from a larger panel of emojis. To reduce processing power whilst still predicting the most relevant emoji, the coarser-grade classes can be used to find the right category of emoji, with the finer emoji prediction occurring only for that coarse category, which reduces the number of dot products the classifier is required to take.
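
The coarse-to-fine scheme described above might look like the sketch below. It assumes that one weight vector is available per coarse category and one per emoji; the function name `coarse_to_fine` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def coarse_to_fine(x: Vector,
                   category_weights: Dict[str, Vector],
                   emoji_weights: Dict[str, Vector],
                   members: Dict[str, List[str]],
                   n: int = 3) -> Tuple[str, List[str]]:
    """Pick the best coarse category, then rank only the emojis in it."""
    # Coarse pass: one dot product per category.
    best = max(category_weights, key=lambda c: dot(category_weights[c], x))
    # Fine pass: dot products only for the members of the chosen category.
    ranked = sorted(members[best],
                    key=lambda e: dot(emoji_weights[e], x), reverse=True)
    return best, ranked[:n]
```

The number of dot products falls from one per emoji to one per category plus one per emoji in the selected category.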

Alternatively, a first set of features can be extracted from the user-inputted text to generate an initial set of image/label predictions, and a second set of features can be extracted from the user-inputted text to determine the one or more most likely image/label predictions from that initial set. To save on processing power, the first set of features may be smaller in number than the second set of features.

If the system is to deal with a large volume of images/labels, the use of a search engine 100' may become more desirable than the classifier 100'', because the search engine calculates the probabilities associated with the images/labels by a different mechanism, which is better able to cope with determining probability estimates for a large volume of images/labels.

The systems of the present invention can be employed in a broad range of electronic devices. By way of non-limiting example, the present system can be used for messaging, texting, emailing, tweeting, etc. on mobile phones, PDA devices, tablets, or computers.

The present invention is also directed to a user interface for an electronic device, wherein the user interface displays the predicted image/label 50 for user selection and input. The image/label prediction 50 can be generated by any of the systems discussed above. As described in more detail below, the user interface preferably displays one or more word/term predictions 60 for user selection, in addition to the display of one or more image/label predictions 50.

A user interface in accordance with embodiments of the invention will now be described with reference to Figs. 7 to 11. Figs. 7 to 11 illustrate, by way of example only, the display of an emoji on a user interface for user selection and input. However, the invention is not limited to the display and input of an emoji, and is applicable to any image/label prediction 50.

In a first embodiment of a user interface, as illustrated in Fig. 7, the user interface comprises one or more candidate prediction buttons (in this example, three candidate prediction buttons) displaying one or more (in this example, three) most likely user text predictions (i.e. 'The', 'I' and 'What' in this example). The user interface 150 also comprises a virtual button 155 for displaying the current most relevant image/label prediction 60 (in the preferred embodiment an emoji and, in the particular example illustrated, a beer emoji). Processing circuitry of the device is configured such that a first user input, for example a tap on a touchscreen device, directed at the virtual button 155 displaying the emoji, inputs the displayed emoji into the device; and a second user input (different from the first user input), for example a long press or directional swipe directed at the button 155, opens a menu to other actions, e.g. next most relevant emoji predictions, all emojis, carriage return, etc.

In a second embodiment of a user interface 150, illustrated in Fig. 8, an image (e.g. emoji) prediction 50 mapped to a word prediction 60 (e.g. through the word→emoji correspondence map of Fig. 2a) will be presented as a prediction 160 on the prediction pane, alongside the matching word prediction 161. The candidate prediction buttons therefore display the two most relevant word predictions (for the example of a user interface with three candidate buttons) and the image (e.g. emoji) most appropriate for the most relevant word prediction. Alternatively, the image/label prediction presented as a prediction 160 on the prediction pane is the most likely image/label prediction (as determined by any of the systems described above) and therefore need not correspond to a word prediction of the prediction pane. For consistency of layout, the image (e.g. emoji) prediction 160 may always be displayed on the right-hand side of the prediction pane, making the image easy to locate. Alternative image (e.g. emoji) predictions 60 may be made available by long-pressing the image (e.g. emoji) prediction button 160. The emoji button 155 reflects this prediction and also presents emojis related to recently typed words. A first gesture (e.g. a tap) on the image (in the illustrated example, emoji) button 155 will insert the emoji displayed by the button, and a second gesture on the button (e.g. a long press or swipe) will display the emojis related to recently typed words for user selection.

In a third embodiment of a user interface 150, illustrated in Fig. 9, an image/label (in the illustrated example, emoji) candidate prediction button 165, which displays the current most likely image (e.g. emoji), appears permanently on the prediction pane. When there is an emoji associated with the current word candidates (in this example, 'food', 'and', 'is') or with recently typed words (e.g. 'cat'), it is presented on this candidate button 165. The emoji displayed on the button 165 can be inserted via a first gesture (e.g. a tap) on the button 165 or the button 155, with alternative emojis available via a second gesture (e.g. a long press or swipe) on the button 155 or the button 165.

In a preferred embodiment, the image/label panel (e.g. emoji panel) displaying alternative relevant images (e.g. emojis) can be accessed by long-pressing the image/label candidate prediction button 165. To access all emojis (not just those offered as the most likely emojis), the user long-presses the emoji candidate prediction button 165, slides their finger towards the emoji panel icon and releases. The emoji panel icon will be on the far left-hand side of the pop-up so that it can be accessed with a 'blind directional swipe'. The rest of the pop-up is filled with extended emoji predictions.

In an alternative user interface, as illustrated in Fig. 10, the image/label (e.g. emoji) can be displayed with its matching word on a candidate button 170 of the prediction pane. The word can be inserted by a first user gesture on the candidate button 170 (e.g. by tapping the button 170), with the image/label (e.g. emoji) inserted via a second user gesture on the candidate button 170 (e.g. by a long press of the button 170). Furthermore, if desired, a standard emoji key 155 can be provided, as with the previous user interface embodiments, to allow the user to insert a predicted emoji (which may not necessarily match the predicted word) or to allow the user to search for alternative emojis.

Fig. 11 illustrates how an image (e.g. emoji) can be displayed and inserted with a continuous touch input, for example as described in detail in earlier application WO2013/107998, which is hereby incorporated by reference in its entirety, and as illustrated in Fig. 1 of WO2013/107998. In the user interface of Fig. 11, the prediction pane comprises a word prediction button 175 'heart' and an emoji prediction button 165 which displays a relevant emoji, e.g. [heart emoji]. To insert the text prediction 'heart', the user moves over to the word prediction pane and removes their finger from contact with the user interface at a location on the word prediction button 175. Alternatively, the word prediction is inserted whenever the user lifts their finger from the user interface, unless the finger is lifted at the emoji button. For example, the processing circuitry can be configured to insert the word if the user lifts their finger from the user interface whilst on the last character of the word, or even mid-word, when the prediction engine has predicted and displayed that word for user selection and input. To insert the predicted emoji, the user breaks contact with the touchscreen interface at the emoji candidate button 165. Furthermore, the processing circuitry for the user interface may be configured such that, if the user ends the continuous touch gesture on the emoji button 165 and remains on the emoji button 165 for a particular length of time, a pop-up panel 200 of alternative emojis is brought up for user selection.

The user interface has been described as comprising various 'buttons'. The term 'button' is used to describe an area on a user interface where an image/label/word is displayed, where the displayed image/label/word can be input by a user by activating the 'button', e.g. by gesturing on or over the area which displays the image/label/word.

With the described user interface, the user is able to insert relevant images/labels (including emojis) with minimal effort.

Methods of the present invention will now be described with reference to Figs. 12 to 16, which are schematic flow charts of methods according to the invention.

Referring to Fig. 12, the present invention provides a method of generating a prediction means to predict an image/label relevant to user-inputted text. As discussed above in relation to the various systems of the invention, the method comprises receiving (400) text having one or more images/labels embedded within sections of the text, identifying (410) an image/label embedded within the text, and associating (420) the identified image/label with sections of the text. The prediction means is then trained on the sections of text associated with the image/label. As described above, when the prediction means is a language model 10, the language model 10 is trained on text comprising images/labels, for example by including an image/label in an n-gram word/image sequence or by appending the image/label to an n-gram word sequence. When the prediction means is a search engine 100' comprising a plurality of statistical models, each statistical model can be mapped to a given image/label and trained on text associated with that image/label. When the prediction means is a classifier 100'' trained on a plurality of text sources, each text source comprises sections of text associated with a given image/label.
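
The identify-and-associate steps (410, 420) can be sketched as follows. The example assumes a small hypothetical set of known images (`KNOWN_IMAGES`) and, purely for illustration, associates each image with the section of text immediately preceding it; the specification also contemplates associations with sections that do not immediately precede the image.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical inventory of images to look for: beer mug and crying face.
KNOWN_IMAGES = {"\U0001F37A", "\U0001F622"}
IMAGE_PATTERN = re.compile(
    "(" + "|".join(re.escape(image) for image in sorted(KNOWN_IMAGES)) + ")")


def associate_sections(text: str) -> Dict[str, List[str]]:
    """Map each embedded image to the sections of text that precede it."""
    sections = defaultdict(list)
    # Splitting on a capturing group yields [text, image, text, image, ..., text].
    parts = IMAGE_PATTERN.split(text)
    for index in range(1, len(parts), 2):
        section = parts[index - 1].strip()
        if section:  # skip empty sections, e.g. between repeated images
            sections[parts[index]].append(section)
    return dict(sections)
```

The resulting mapping is the kind of per-image/label text collection on which a language model, search engine or classifier could then be trained.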

In a second method of the invention, as illustrated in Fig. 13, there is provided a method of predicting, using a prediction means, an image/label relevant to text input into a system by a user, wherein the prediction means is trained on sections of text associated with an image/label. The method comprises receiving (500) at the prediction means the text input by the user, determining (510) the relevance of the text input by the user to the sections of text associated with the image/label, and predicting (520) the relevance of the image/label to the text input by the user based on the sections of text associated with the image/label. As described above in relation to the system description, when the prediction means is a search engine 100', the search engine 100' determines the relevance of the user-inputted text by extracting features from the user-inputted text and querying an image/label database 70 with those features. By querying the database 70, the search engine 100' is able to determine which image/label statistical model is the most relevant, since each statistical model is mapped to a particular image/label, and is therefore able to generate image/label predictions 50. Again, as described above in relation to the system, when the prediction means is a classifier 100'', the classifier 100'' is able to determine the relevance of an image/label to the user-inputted text by generating the dot product of a feature vector representing the image/label (generated from the source text which comprises sections of text associated with that image/label) with a feature vector representing the user-inputted text.
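
A toy version of the search-engine variant is sketched below. It assumes that each per-image/label statistical model is simply a table of word frequencies and that relevance is the summed relative frequency of the query features in that model; the real scoring used by the search engine 100' is not specified here, and the names `build_models` and `query_models` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_models(sections_by_image: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Build one word-frequency model per image/label from its text sections."""
    return {image: Counter(word for section in sections
                           for word in section.lower().split())
            for image, sections in sections_by_image.items()}


def query_models(models: Dict[str, Counter], text: str) -> List[Tuple[str, float]]:
    """Score every model against the features of the input text, best first."""
    features = text.lower().split()
    scores = {}
    for image, counts in models.items():
        total = sum(counts.values())
        # Presence and frequency of each feature in this image/label's model.
        hits = sum(counts[feature] for feature in features)
        scores[image] = hits / total if total else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Each model is consulted once per query feature, so the cost grows with the length of the input rather than with a dense vector product per image/label.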

In a third method of the invention, as illustrated in Fig. 14, there is provided a method of predicting, using a prediction means, an image/label relevant to text input into a system by a user, wherein the prediction means is trained on text which comprises an image/label embedded within the text, the prediction means having been trained by identifying the image/label within the text and associating the identified image/label with sections of the text. The method comprises receiving (600) at the prediction means the text input by the user, comparing (610) the text input by the user to the sections of text associated with the image/label, and predicting (620) the relevance of the image/label to the text input by the user based on the sections of text associated with the identified image/label. As described above in relation to the system description, when the prediction means is a language model 10, the language model may comprise an image/label within an n-gram word/image sequence of an n-gram map 14', or an image/label appended to an n-gram word sequence of an n-gram map 14'. The language model predicts a relevant image/label 50 by comparing the user-inputted text to a stored n-gram sequence and outputting a relevant image/label which is part of the stored n-gram or is appended to the stored n-gram. Alternatively, the language model comprises a word-based n-gram map 14 and a word→image correspondence map 40 which is trained on sections of text (i.e. words) associated with the images. The language model is configured to predict the next word in a sequence of user-inputted words by comparing the word sequence to a stored n-gram of the map 14, and then to map this predicted word to an image using the correspondence map 40.
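
The alternative with a word-based n-gram map and a word→image correspondence map can be illustrated with a small bigram model. This is a sketch under simplifying assumptions: the n-gram order is fixed at two, counts stand in for probabilities, and the names `train_bigrams` and `predict_image` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def train_bigrams(sentences: Iterable[str]) -> Dict[str, Counter]:
    """Count, for every word, the words that follow it in the training text."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for previous, following in zip(words, words[1:]):
            table[previous][following] += 1
    return table


def predict_image(table: Dict[str, Counter], word_to_image: Dict[str, str],
                  text: str) -> Optional[Tuple[str, str]]:
    """Predict the next word, then map it to an image where a mapping exists."""
    words = text.lower().split()
    if not words:
        return None
    candidates = table.get(words[-1])
    if not candidates:
        return None
    # Walk the next-word candidates from most to least frequent.
    for word, _count in candidates.most_common():
        if word in word_to_image:
            return word, word_to_image[word]
    return None
```

Here `word_to_image` plays the role of the correspondence map 40, and `table` the role of the word-based n-gram map 14.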

Fourth and fifth methods of the invention relate to a user's interaction with a touchscreen user interface of a device comprising one or more of the above described systems for generating image/label predictions 50. In particular, the fourth method of the invention provides a method of entering data into an electronic device comprising a touchscreen user interface having a keyboard, wherein the user interface comprises a virtual image/label button configured to display the predicted image/label for user selection. The method comprises inputting (700) a character sequence via a continuous gesture across the keyboard. In response to a user gesture across the image/label virtual button, the method comprises inputting (720) the image/label as data. The gesture may include breaking contact with the user interface at the image/label virtual button.

The fifth method relates to a method for selecting between entry of a word/term and entry of an image/label corresponding to that word/term on a touchscreen user interface comprising a virtual button configured to display a predicted word/term and/or a predicted image/label. The method comprises inputting (800) the predicted word/term in response to receipt of a first gesture type on/across the button, and inputting (810) the predicted image/label in response to a second gesture type on/across the button.
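
The selection between word entry and image/label entry on a single button could be dispatched as in the sketch below. The gesture names, the `CandidateButton` type and the choice of tap for the word and long press for the image are illustrative assumptions; the description only requires two distinguishable gesture types.

```python
from dataclasses import dataclass
from enum import Enum


class Gesture(Enum):
    TAP = "tap"                 # assumed first gesture type
    LONG_PRESS = "long_press"   # assumed second gesture type


@dataclass
class CandidateButton:
    word: str    # predicted word/term shown on the button
    image: str   # predicted image/label shown alongside it


def handle_gesture(button: CandidateButton, gesture: Gesture) -> str:
    """Return the data to enter: the word for a tap, the image for a long press."""
    if gesture is Gesture.TAP:
        return button.word
    if gesture is Gesture.LONG_PRESS:
        return button.image
    raise ValueError(f"unsupported gesture: {gesture}")
```

For example, a button showing 'heart' with a heart emoji would enter the word on a tap and the emoji on a long press.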

As will be apparent from the above description, the present invention solves the above-mentioned problems by providing a system and method for predicting emojis/stickers based on user-entered text. The present invention is able to increase the speed of emoji input by offering one or several relevant emoji predictions, which saves the user from having to scroll through different emojis to identify the one they want.

Furthermore, the system and method of the present invention provide increased emoji discoverability, since predicting emojis on the basis of next-word prediction/correction and context means that an emoji may be predicted and presented to a user even though the user may not be aware that a relevant or appropriate emoji exists.

The systems and methods of the present invention therefore provide efficient emoji selection and input into an electronic device. Rather than having to scroll through possible emojis, the user can insert a relevant emoji by tapping a virtual key which displays a predicted emoji.

Although the examples have been provided with reference to emojis, the invention is equally applicable to the insertion of any image/label relevant to user-entered text, as previously described.

The present invention also provides a computer program product comprising a computer-readable medium having stored thereon computer program means for causing a processor to carry out one or more of the methods according to the present invention.

The computer program product may be a data carrier having stored thereon computer program means for causing a processor external to the data carrier, i.e. a processor of an electronic device, to carry out the method according to the present invention. The computer program product may also be available for download, for example from a data carrier or from a supplier over the internet or another available network, e.g. downloaded as an app onto a mobile device (such as a mobile phone) or downloaded onto a computer, the mobile device or computer comprising a processor for executing the computer program means once downloaded.

It will be appreciated that this description is by way of example only; alterations and modifications may be made to the described embodiments without departing from the scope of the invention as defined in the claims.

Claims

1. A system configured to predict an image/label relevant to text input by a user, the system comprising:
means for receiving text input by a user; and
prediction means trained on sections of text associated with an image/label,
wherein the prediction means is configured to:
receive the text input by the user;
determine the relevance of the text input by the user to the sections of text associated with the image/label; and
predict the relevance of the image/label to the text input by the user based on the sections of text associated with the image/label.

2. A system configured to predict an image/label relevant to text input by a user, the system comprising:
means for receiving text input by a user; and
prediction means trained on text comprising an image/label embedded within the text, the prediction means having been trained by identifying the image/label within the text and associating the identified image/label with sections of the text,
wherein the prediction means is configured to:
receive the text input by the user;
compare the text input by the user to the sections of text associated with the image/label; and
predict the relevance of the image/label to the text input by the user based on the sections of text associated with the identified image/label.

3. The system of claim 2, wherein the prediction means is a language model trained on text comprising a plurality of images/labels embedded within the text.

4. The system of claim 1, wherein the prediction means is a search engine comprising a plurality of statistical models corresponding to a plurality of images/labels, each of the plurality of statistical models being trained on sections of text associated with the corresponding image/label.

5. The system of claim 1, wherein the prediction means is a classifier trained on a plurality of text sources, each text source comprising sections of text associated with a particular image/label.

6. The system of claim 1, 4, or 5, wherein the system comprises a feature generation mechanism configured to extract a set of features from the text input by the user.

7. The system of claim 6, wherein the feature generation mechanism is configured to generate a feature vector from the features, the feature vector representing the text input by the user.

8. The system of claim 7, wherein the feature generation mechanism is configured to weight the features by their term frequency-inverse document frequency.
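
By way of illustration only, the feature extraction and term frequency-inverse document frequency weighting recited above might be sketched as follows. The tokenizer, the toy corpus, and the names `extract_features`, `idf`, and `tfidf_vector` are assumptions introduced for this sketch, and the smoothed IDF formula is one common variant rather than anything the specification prescribes.

```python
import math
import re
from collections import Counter

def extract_features(text):
    """Hypothetical feature generator: lower-cased word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def idf(term, documents):
    """Smoothed inverse document frequency (an assumed variant)."""
    containing = sum(1 for doc in documents if term in doc)
    return math.log((1 + len(documents)) / (1 + containing)) + 1.0

def tfidf_vector(text, documents):
    """Map each feature of `text` to its term frequency times its IDF."""
    counts = Counter(extract_features(text))
    total = sum(counts.values())
    return {t: (c / total) * idf(t, documents) for t, c in counts.items()}

# Toy corpus: each document is the feature set of one text source.
corpus = [set(extract_features(s)) for s in (
    "i love pizza",
    "i love the beach",
    "pizza tonight",
)]
vector = tfidf_vector("I love pizza pizza", corpus)
```

In this sketch a feature that occurs in few text sources (here "beach") receives a larger IDF than one that occurs in many, so rarer words contribute more to the resulting feature vector.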

9. The system of claim 7 or claim 8, wherein the feature generation mechanism is configured to extract features from each of the plurality of text sources and to generate a feature vector for each image/label.

10. The system of claim 9, wherein the classifier is trained on the feature vectors for the images/labels.

11. The system of claim 10, wherein the classifier is configured to generate a dot product of the feature vector representing the text associated with an image/label and the feature vector representing the text input by the user, in order to determine whether the image/label is relevant given the text input by the user.
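
The dot-product comparison recited above might, purely as a sketch, look like the following. The sparse-dictionary representation, the hand-written weights, and the names `dot` and `rank_labels` are assumptions made for illustration; a real classifier would derive the per-label vectors from training text.

```python
def dot(u, v):
    """Dot product of two sparse vectors stored as {feature: weight}."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(f, 0.0) for f, w in u.items())

def rank_labels(input_vector, label_vectors):
    """Score every image/label against the input and sort best-first."""
    scores = {label: dot(input_vector, vec) for label, vec in label_vectors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical per-label feature vectors, as if built from associated text.
label_vectors = {
    "🍕": {"pizza": 0.9, "hungry": 0.4, "dinner": 0.3},
    "🌴": {"beach": 0.8, "holiday": 0.5, "sun": 0.4},
}
user_vector = {"hungry": 0.5, "dinner": 0.5}
ranking = rank_labels(user_vector, label_vectors)
```

Note that the user text here never mentions "pizza", yet the pizza image still ranks first because its associated text shares other features with the input.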

12. The system of claim 7 or claim 8, wherein the feature generation mechanism is configured to extract features from the sections of text associated with an image/label and to train the corresponding statistical model on the extracted features.

13. The system of claim 12, wherein the search engine is configured to query each statistical model with each feature of the text input by the user to determine the presence and frequency of that feature.
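
As a minimal sketch of the search-engine arrangement recited above, each image/label can be given its own count-based model that is queried feature by feature. The `LabelModel` class, the whitespace tokenization, and the sum-of-frequencies score are assumptions chosen for brevity, not the scoring the specification defines.

```python
from collections import Counter

class LabelModel:
    """Hypothetical statistical model: feature counts for one image/label."""

    def __init__(self, sections):
        self.counts = Counter(w for s in sections for w in s.lower().split())

    def query(self, feature):
        """Return (present, frequency) for a single feature."""
        n = self.counts[feature]
        return n > 0, n

def search(text, models):
    """Query each label's model with every feature of the input text."""
    features = text.lower().split()
    hits = {}
    for label, model in models.items():
        hits[label] = sum(model.query(f)[1] for f in features)
    return sorted(hits.items(), key=lambda kv: kv[1], reverse=True)

models = {
    "🍕": LabelModel(["pizza for dinner", "so hungry want pizza"]),
    "🌴": LabelModel(["holiday at the beach", "beach day"]),
}
results = search("beach holiday", models)
```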

14. The system of claim 4 or claim 5, wherein the system further comprises a model comprising prior probabilities for the images/labels based on previous use or previous input of images/labels by the user.
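
One way a usage-based prior could be combined with relevance scores is sketched below. The multiplicative combination, the add-one smoothing, and the name `apply_prior` are assumptions for illustration; the claim itself only requires that prior probabilities reflect the user's previous use or input.

```python
def apply_prior(scores, usage_counts, smoothing=1.0):
    """Multiply each relevance score by a smoothed prior from past usage."""
    total = sum(usage_counts.get(label, 0) + smoothing for label in scores)
    return {
        label: score * (usage_counts.get(label, 0) + smoothing) / total
        for label, score in scores.items()
    }

scores = {"😀": 0.5, "😂": 0.5}   # equal relevance from the predictor
usage = {"😂": 8}                 # hypothetical: the user entered this 8 times
weighted = apply_prior(scores, usage)
```

With equal relevance scores, the image the user has entered more often ends up ranked higher.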

15. The system of any one of claims 1 to 14, wherein the identified image/label is associated with sections of text that do not immediately precede the identified image/label.

16. The system of any one of claims 1 to 15, wherein the text input does not correspond to a description of the image/label.

17. The system of claim 15 or claim 16, wherein the language model comprises an n-gram map comprising word sequences associated with an image/label.

18. The system of claim 17, wherein the prediction means comprises a prediction engine,
the prediction engine comprising:
a language model trained on text comprising a plurality of images/labels embedded within the text; and
means for generating a sequence of one or more words from the text input by the user,
wherein the prediction engine is configured to:
receive the sequence of one or more words;
compare the sequence of one or more words with a stored sequence of one or more words associated with an image/label; and
predict, based on the stored sequence of one or more words associated with the image/label, an image/label relevant to the sequence of one or more words.
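
A minimal sketch of an n-gram map of the kind recited above is given below. It assumes, for simplicity, that each image/label is associated with the words directly before it, uses a back-off from longer to shorter sequences, and introduces the hypothetical names `train_ngram_map` and `predict_image`; none of these choices is taken from the specification.

```python
from collections import defaultdict, Counter

def train_ngram_map(tokens, is_image, max_n=3):
    """Map each word sequence (up to max_n words) directly preceding an
    image/label to counts of the images/labels seen after it."""
    ngram_map = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        if not is_image(tok):
            continue
        context = []
        j = i - 1
        while j >= 0 and len(context) < max_n and not is_image(tokens[j]):
            context.insert(0, tokens[j])
            j -= 1
        for k in range(len(context)):
            ngram_map[tuple(context[k:])][tok] += 1
    return ngram_map

def predict_image(words, ngram_map, max_n=3):
    """Back off from the longest matching word sequence to shorter ones."""
    for n in range(min(max_n, len(words)), 0, -1):
        key = tuple(words[-n:])
        if key in ngram_map:
            return ngram_map[key].most_common(1)[0][0]
    return None

EMOJI = {"🍕", "❤"}
corpus = "i love you ❤ i love pizza 🍕 love pizza 🍕".split()
ngram_map = train_ngram_map(corpus, lambda t: t in EMOJI)
prediction = predict_image("we all love pizza".split(), ngram_map)
```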

19. The system of claim 18, wherein the prediction engine is configured to predict the next word in the sequence of one or more words based on a stored sequence of one or more words associated with the image/label.

20. The system of claim 2, wherein the prediction means comprises: a word-based language model comprising stored sequences of words; a map, trained on words associated with images/labels, that maps images to words; and means for generating a sequence of one or more words from the text input by the user, wherein the prediction means is configured to:
compare the sequence of one or more words with a stored sequence of words to predict the next word in the sequence of one or more words; and
predict an image associated with the next word using the map.
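
The two-stage arrangement recited above, in which a word-based language model predicts the next word and a map then supplies the corresponding image, might be sketched as follows. The bigram model, the training sentences, and the `WORD_TO_IMAGE` table are illustrative assumptions only.

```python
from collections import defaultdict, Counter

def train_bigrams(sentences):
    """Word-based language model: counts of which word follows which."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1
    return model

def predict_next_word(words, model):
    """Most frequent stored continuation of the last word, if any."""
    followers = model.get(words[-1].lower()) if words else None
    return followers.most_common(1)[0][0] if followers else None

# Hypothetical map from words to images.
WORD_TO_IMAGE = {"pizza": "🍕", "beer": "🍺"}

model = train_bigrams(["let us order pizza", "order pizza now", "order beer"])
next_word = predict_next_word("shall we order".split(), model)
image = WORD_TO_IMAGE.get(next_word)
```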

21. The system of any one of claims 1 to 20,
wherein the prediction means further comprises a word-based language model comprising stored sequences of words,
and wherein the prediction means is configured to:
generate a sequence of one or more words from the text input by the user;
compare the sequence of one or more words with a stored sequence of words in the word-based language model; and
predict the next word in the sequence based on the stored sequence of words.

22. The system of any one of claims 1 to 21, wherein the image is an emoji, an emoticon, or a sticker.

23. The system of any one of claims 1 to 22, wherein the label is a hashtag.

24. The system of any one of claims 1 to 23, wherein the prediction engine is configured to output the image/label if it is determined to be relevant to the text input by the user.

25. An electronic device comprising:
the system of any one of the preceding claims; and
a user interface configured to receive user input and to display the predicted image/label.

26. The electronic device of claim 25,
wherein the user interface further comprises a virtual image/label button configured to display the predicted image/label for user selection.

27. The electronic device of claim 26, wherein the user interface further comprises a word/term virtual button configured to display a predicted word/term for user selection.

28. The electronic device of claim 26 or claim 27, wherein the user interface is configured to accept user text input as a continuous gesture across a keypad, and wherein the predicted image/label is input in response to the gesture crossing the image/label virtual button.

29. The electronic device of any one of claims 25 to 27,
further comprising processing circuitry, the processing circuitry being configured to:
accept as inputs a first user input on the user interface and a second user input on the user interface, the first user input differing from the second user input in at least one respect;
in response to receiving the first user input directed at the virtual button, display the predicted image/label on the display as user-entered data; and
in response to receiving the second user input directed at the virtual button, display alternative image/label predictions on the display for user selection.
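
The behavior recited above, in which two different user inputs on the same virtual button lead to different outcomes, might be sketched as follows. Treating the first input as a tap and the second as a long press is an assumption made for illustration, as are the function name and the returned action names.

```python
def handle_button_input(input_type, predictions):
    """Dispatch on the kind of user input received on the virtual button.

    `predictions` is a best-first list of predicted images/labels.
    """
    if input_type == "tap":            # assumed first user input
        return ("enter", predictions[0])
    if input_type == "long_press":     # assumed second user input
        return ("offer_alternatives", predictions[1:])
    raise ValueError(f"unsupported input type: {input_type}")

predictions = ["🍕", "🍔", "🍟"]
entered = handle_button_input("tap", predictions)
offered = handle_button_input("long_press", predictions)
```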

30. The electronic device of claim 25,
further comprising a virtual button configured to display a predicted word/term and/or the predicted image/label for user selection, wherein the word/term corresponds to the image/label.

31. The electronic device of claim 30, wherein the processing circuitry is further configured to distinguish between two gesture types on/across the virtual button, wherein the predicted word/term is entered in response to receiving a first gesture type on/across the virtual button, and the predicted image/label is entered in response to receiving a second gesture type on/across the virtual button.

32. A method for generating prediction means for predicting an image/label relevant to text input by a user, the method comprising:
receiving text comprising one or more images/labels embedded within sections of text;
identifying an image/label embedded within the text;
associating the identified image/label with sections of the text; and
training the prediction means on the sections of text associated with the image/label.
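
The four steps above might be sketched as follows, under stated assumptions: text is tokenized on whitespace, an image/label is associated with a fixed window of preceding words (the window size is arbitrary), and "training" is reduced to counting words per image/label. The names `associate_sections` and `train` are hypothetical.

```python
from collections import defaultdict, Counter

def associate_sections(text, is_image, window=3):
    """Identify images/labels in the text and pair each one with the
    words found in a window before it."""
    tokens = text.split()
    pairs = []
    for i, tok in enumerate(tokens):
        if is_image(tok):
            start = max(0, i - window)
            section = [t for t in tokens[start:i] if not is_image(t)]
            pairs.append((tok, section))
    return pairs

def train(pairs):
    """Count-based stand-in for training the prediction means."""
    model = defaultdict(Counter)
    for label, section in pairs:
        model[label].update(section)
    return model

def is_image_or_label(token):
    """Assumed detector: one known emoji, or any hashtag."""
    return token == "🌴" or token.startswith("#")

pairs = associate_sections("great day at the beach 🌴 #holiday", is_image_or_label)
model = train(pairs)
```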

33. A method of predicting, using prediction means, an image/label relevant to text input into a system by a user, the prediction means having been trained on sections of text associated with an image/label, the method comprising:
receiving the text input by the user at the prediction means;
determining the relevance of the text input by the user to the sections of text associated with the image/label; and
predicting the relevance of the image/label to the text input by the user based on the sections of text associated with the image/label.

34. A method of predicting, using prediction means, an image/label relevant to text input into a system by a user, the prediction means having been trained on text comprising an image/label embedded within the text by identifying the image/label within the text and associating the identified image/label with sections of the text, the method comprising:
receiving the text input by the user at the prediction means;
comparing the text input by the user with the sections of text associated with the identified image/label; and
predicting the relevance of the identified image/label to the text input by the user based on the sections of text associated with the identified image/label.

35. The method of claim 34, wherein the prediction means is a language model, the method comprising training the language model on text comprising a plurality of images/labels embedded within the text.

36. The method of claim 33 or claim 34, wherein the prediction means is a search engine comprising a plurality of statistical models corresponding to a plurality of images/labels, the method comprising training each of the plurality of statistical models on sections of text associated with the corresponding image/label.

37. The method of claim 33, wherein the prediction means is a classifier, the method comprising training the classifier on a plurality of text sources, each text source comprising sections of text associated with a particular image/label.

38. The method of claim 36 or claim 37, further comprising extracting a set of features from the text input by the user using a feature generation mechanism.

39. The method of claim 38, further comprising generating a feature vector from the features using the feature generation mechanism, the feature vector representing the text input by the user.

40. The method of claim 39, further comprising determining the term frequency-inverse document frequency of each feature and weighting the features of the feature vector with the determined values.

41. The method of claim 39, further comprising extracting features from each of the plurality of text sources using the feature generation mechanism and generating a feature vector for each image/label.

42. The method of claim 41, further comprising training the classifier on the feature vectors for the images/labels.

43. The method of claim 41,
further comprising using the classifier to generate a dot product of the feature vector representing the text associated with an image/label and the feature vector representing the text input by the user, in order to determine whether the image/label is relevant given the text input by the user.

44. The method of claim 38 or claim 39, further comprising extracting features from the sections of text associated with the image/label using the feature generation mechanism, and training the corresponding statistical model on the extracted features.

45. The method of claim 44, further comprising using the search engine to query each statistical model with each feature of the text input by the user to determine the presence and frequency of that feature.

46. The method of any one of claims 33 to 45, wherein the identified image/label is associated with sections of the text that do not immediately precede the identified image/label.

47. The method of any one of claims 33 to 46, wherein the text input does not correspond to a description of the image/label.

48. The method of claim 46 or claim 47, wherein the language model comprises an n-gram map comprising word sequences associated with an image/label, the method comprising:
generating a sequence of one or more words from the text input by the user;
receiving the sequence of one or more words at a prediction engine comprising the language model;
comparing the sequence of one or more words with a stored sequence of one or more words associated with an image/label; and
predicting, based on the stored sequence of one or more words associated with the image/label, an image/label relevant to the sequence of one or more words.

49. The method of claim 48, further comprising predicting, using the prediction engine, the next word in the sequence of one or more words based on a stored sequence of one or more words associated with the image/label.

50. The method of claim 33, wherein the prediction means comprises a word-based language model comprising stored sequences of words and a map, trained on words associated with images, that maps images to appropriate words, the method comprising:
generating a sequence of one or more words from the text input by the user; and
comparing the sequence of one or more words with a stored sequence of words to predict the next word in the sequence of one or more words, and using the map to identify an image associated with the next word.

51. The method of any one of claims 33 to 49, wherein the prediction means further comprises a word-based language model comprising stored sequences of words, the method comprising:
generating a sequence of one or more words from the text input by the user;
comparing the sequence of one or more words with a stored sequence of words in the word-based language model; and
predicting the next word in the sequence based on the stored sequence of words.

52. The method of any one of claims 32 to 51, wherein the image is an emoji, an emoticon, or a sticker.

53. The method of any one of claims 32 to 52, wherein the label is a hashtag.

54. A method of inputting data into an electronic device comprising a touchscreen user interface having a keyboard, wherein the user interface comprises a virtual image/label button configured to display a predicted image/label for user selection, the method comprising:
inputting a character sequence via a continuous gesture across the keyboard; and
in response to the gesture crossing the image/label virtual button, inputting the image/label as data.
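
As a rough sketch of the continuous-gesture input above, the gesture can be represented as a list of sampled touch points and tested against the button's rectangle. Decoding the gesture into a word is outside the scope of this sketch, so the decoded word is simply passed in; the coordinates, the rectangle layout, and the names `crosses` and `finish_gesture` are all assumptions.

```python
def crosses(path, rect):
    """True if any sampled touch point of the gesture lies inside rect."""
    left, top, right, bottom = rect
    return any(left <= x <= right and top <= y <= bottom for x, y in path)

def finish_gesture(path, decoded_word, predicted_image, button_rect):
    """Enter the decoded word, plus the image if the gesture crossed the button."""
    entered = [decoded_word]
    if crosses(path, button_rect):
        entered.append(predicted_image)
    return entered

IMAGE_BUTTON = (200, 0, 260, 40)                      # hypothetical (left, top, right, bottom)
path = [(20, 120), (90, 110), (150, 60), (230, 20)]   # gesture ends on the button
result = finish_gesture(path, "pizza", "🍕", IMAGE_BUTTON)
```

Because this sketch tests only the sampled points, a fast swipe could pass over the button between samples; a fuller implementation would presumably test the line segments between consecutive points instead.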

55. A method of selecting between entry of a word/term and entry of an image/label corresponding to that word/term on a touchscreen user interface comprising a virtual button configured to display a predicted word/term and/or a predicted image/label, the method comprising:
inputting the predicted word/term in response to receiving a first gesture type on/across the button; and
inputting the predicted image/label in response to receiving a second gesture type on/across the button.

56. A computer program for causing a processor to carry out the method of any one of claims 32 to 55.