RU2444072C2

RU2444072C2 - System and method for using content features and metadata of digital images to find related audio accompaniment

Info

Publication number: RU2444072C2
Application number: RU2008125058/28A
Authority: RU
Inventors: Бартел М. СЛЕЙС (NL); Марк ВЕРБЕРКТ (NL); Кун Х. Й. ВРИЛИНК (NL); Альберт РЕЙКАРТ (NL); Вилхелмус Ф. Й. ФОНТЕЙН (NL)
Original assignee: Конинклейке Филипс Электроникс, Н.В.
Priority date: 2005-11-21
Filing date: 2006-11-15
Publication date: 2012-02-27
Also published as: JP2009516951A; EP1958203A2; KR101329266B1; RU2008125058A; US8171016B2; CN101313364A; JP5457676B2; KR20080085848A; CN101313364B; WO2007057850A2; US20080256100A1; WO2007057850A3

Abstract

FIELD: information technology.

SUBSTANCE: the invention discloses a system, apparatus and method for automatically playing or suggesting at least one audio accompaniment while a sequence of digital images is displayed, such that the audio accompaniment matches the content of that sequence of images and any provided and/or generated image metadata. Search terms are derived from the images themselves as well as from any metadata provided by the user. These search terms are then used to find audio accompaniment that either contains the search terms or their synonyms in the title or associated text (e.g., song lyrics), or represents the sound normally associated with the images, such as the sound of rushing water for an image of a fast-flowing brook.

EFFECT: automatic playing of audio accompaniment matching content of digital images.

17 cl, 3 dwg

Description

This invention relates to using the metadata of a sequence of digital images to identify and associate with it audio accompaniment, including music and sound, whose text and metadata are similar to the image metadata, and to create a sequence combined with the identified audio for presentation to the user as a suggested playlist or as a combined visual and audio show.

Increasingly, consumer systems combine the storage and playback of different types of content. As a result, a system that the user uses to view digital photographs will often also be able to play music while those photographs are displayed. The problem is that this capability creates an additional task for the user, namely finding and selecting suitable music to accompany a photo slide show (a sequence of digital images).

Therefore, a way is needed to automatically play (or suggest) audio accompaniment that matches the content of a particular sequence of digital images.

In one embodiment, the system, device, and method of this invention make it possible to use the metadata of digital images (such as photographs, photo albums, home videos) to search for music whose lyrics relate to those photographs.

To achieve this:

(1) text labels are derived from the (set/sequence of) images or from the image metadata, and

(2) these text labels or key phrases, each comprising at least one keyword, are used to find audio accompaniment that contains the same key phrases in the music title, in (salient) parts of the lyrics associated with the music, or in the metadata of the audio recording.
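The two steps above can be sketched as a simple case-insensitive substring search. This is only an illustration under stated assumptions: the `Track` type, the `find_tracks` helper, and the small catalog (including its placeholder lyrics) are hypothetical and do not come from the patent, which does not specify a matching algorithm.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """Hypothetical record for one audio accompaniment."""
    title: str
    lyrics: str = ""  # empty for instrumental music or sound effects


def find_tracks(key_phrases, tracks):
    """Return tracks whose title or lyrics contain any key phrase."""
    hits = []
    for track in tracks:
        haystack = f"{track.title}\n{track.lyrics}".lower()
        if any(phrase.lower() in haystack for phrase in key_phrases):
            hits.append(track)
    return hits


# Made-up catalog for illustration only.
catalog = [
    Track("I'm Always Chasing Rainbows"),
    Track("Ocean Waves"),
    Track("City Lights", "placeholder lyrics about neon streets"),
]

# Key phrases as they might be derived from image labels.
matches = find_tracks(["rainbows", "ocean"], catalog)
print([t.title for t in matches])
```

Plain substring matching is the simplest possible choice; a real system would more likely search an index keyed by standardized key phrases.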

Lyrics have been written for many classical pieces of music; for example, "I'm Always Chasing Rainbows" is a song set to the music of Chopin's "Fantasie Impromptu". Many pieces of music without any words are also suitable as background music for slide shows; for example, a slide show of a seaside resort is associated with the sounds of the ocean. In addition to music, sound tracks are provided as audio accompaniment. These latter audio accompaniments must have associated metadata describing their content.

LIST OF DRAWINGS

Figure 1 illustrates a functional flowchart of the method according to this invention;

Figure 2 illustrates a device according to this invention for associating music with songs; and

Figure 3 illustrates a system according to this invention.

Those skilled in the art should understand that the following descriptions are provided by way of example and not limitation. A skilled person understands that there are many variations within the spirit of the invention and the scope defined by the appended claims. Unnecessary detail of known functions and operations may be omitted from this description so as not to obscure the invention.

In a preferred embodiment, digital image content and user-provided characteristics are used to obtain text labels (image metadata), which may be a key phrase comprising at least one keyword derived from a user-provided image title/label, or a key phrase derived from a text annotation of the image. In a preferred embodiment, the content of the image sequence is recognized using optical character recognition (OCR) for text content, image scene recognition for scenes, and image object recognition for objects (including people and animals). User-provided image characteristics are treated as image metadata and may include such items as user-specific data (e.g., ethnicity, gender, age, occupation) and information on when and where the image(s) were created, which is converted into meaningful names or key phrases that contain standardized keywords and describe the most likely event taking place and the location (e.g., birthday, Christmas, New York, Paris, summer holiday, ...).
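As a rough illustration of converting creation-time information into meaningful names, the sketch below maps a date to a season and, where applicable, a fixed-date event. The lookup tables and the `labels_from_date` helper are assumptions made for this example; the patent does not describe how the conversion is implemented, and the season table assumes the northern hemisphere.

```python
from datetime import date

# Hypothetical fixed-date events; a real system would need regional calendars
# and personal data (birthdays, anniversaries) to be useful.
FIXED_EVENTS = {(12, 25): "Christmas", (1, 1): "New Year"}

# Northern-hemisphere assumption, made only for this sketch.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def labels_from_date(created):
    """Turn an image creation date into standardized text labels."""
    labels = [SEASON_BY_MONTH[created.month]]
    event = FIXED_EVENTS.get((created.month, created.day))
    if event is not None:
        labels.append(event)
    return labels


print(labels_from_date(date(2005, 12, 25)))
```

Location data could be handled the same way, with a table or reverse-geocoding service that turns coordinates into place names such as "Paris".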

In one embodiment, image content analysis algorithms are used to classify the scenery of the images and provide metadata describing the images that can be used as search terms for searching a database of audio accompaniments indexed by key phrases. Words or labels associated with such a class are also used to obtain text labels (image metadata), which are then used to find matching audio content. For example, a winter scene can be detected from overall whiteness. The identification of such a characteristic is converted into text metadata (labels) such as winter, snow, white. This metadata can then be used to search for audio recordings whose lyrics and audio metadata concern winter, snow, etc.
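The whiteness example can be sketched with a very crude classifier that counts near-white pixels. The thresholds, function names, and the flat list-of-RGB-tuples input format are all assumptions chosen to keep the example self-contained; the patent does not prescribe a particular scene recognition algorithm, and a practical system would use a trained classifier.

```python
def white_fraction(pixels, threshold=200):
    """Share of pixels whose R, G and B channels are all at least `threshold`."""
    if not pixels:
        return 0.0
    white = sum(1 for r, g, b in pixels if min(r, g, b) >= threshold)
    return white / len(pixels)


def scene_labels(pixels, min_fraction=0.6):
    """Map a mostly white image to winter-related text labels."""
    if white_fraction(pixels) >= min_fraction:
        return ["winter", "snow", "white"]
    return []


# Synthetic ten-pixel "images": eight near-white pixels plus two dark green
# ones, and an all dark green image for contrast.
snowy = [(250, 250, 250)] * 8 + [(30, 60, 20)] * 2
forest = [(30, 60, 20)] * 10

print(scene_labels(snowy))
print(scene_labels(forest))
```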

In a preferred embodiment, the metadata for an image is combined. For example, time information may be converted to "Christmas", while scene analysis yields (among other words) "White". In this example the invention finds the song "White Christmas" and the sound of falling snow.

In a preferred embodiment, the selected audio accompaniment is extended in time to cover the sequence of images because, in general, it is undesirable for such a sequence to jump to a different audio accompaniment for each constituent image. It is therefore preferable to aggregate the metadata available for the images of the sequence into a description of the whole sequence. Then, using the aggregated metadata, at least one audio accompaniment can be selected that suits the entire sequence of images.
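One simple way to aggregate per-image metadata into a description of the whole sequence is to keep only the labels shared by a sufficient share of the images. The `sequence_labels` helper and its 50% cut-off are assumptions for illustration; the patent only states that the metadata is aggregated, not how.

```python
from collections import Counter


def sequence_labels(per_image_labels, min_share=0.5):
    """Keep labels that occur in at least `min_share` of the images."""
    total = len(per_image_labels)
    if total == 0:
        return []
    counts = Counter()
    for labels in per_image_labels:
        counts.update(set(labels))  # count each label once per image
    return sorted(label for label, n in counts.items() if n / total >= min_share)


# Hypothetical labels for a three-image sequence.
album = [
    ["winter", "snow"],
    ["winter", "Christmas"],
    ["winter", "snow", "dog"],
]
print(sequence_labels(album))
```

With this cut-off, labels that appear in only one of the three images ("Christmas", "dog") are dropped, so the accompaniment is chosen for what the sequence has in common.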

Further, in an alternative embodiment, the conversion of metadata into meaningful text labels is improved when information about the user is available (e.g., the user's own geographic location to provide an appropriate level of location granularity, regional/cultural background to derive the relevant events, personal/family information to determine holidays, etc.).

In a preferred embodiment, sound effects are provided as audio accompaniment that relates to the metadata of the image(s). For example, a slide show of a party can be enhanced with the sound of glasses. An image of a crowd can trigger a murmuring sound.

Referring now to Figure 1, one example of a flowchart of a preferred embodiment of the method according to this invention is illustrated. At step 102, an image or image sequence 101 is input together with associated metadata 101, which is at least one of received and created for at least one image, and the image and metadata are then stored in short-term persistent memory 103. Image metadata may be entered by the user or derived according to this invention and may include: date, time, event, location, the relationship of the image to the user, or another descriptor. The system, device and method according to this invention may include a dictionary of terms and their synonyms 104.1, used to normalize any metadata entered by the user to a standard set; for example, mom, mother, ma, mama, etc. all refer to the same "mother" characteristic of an image.
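The synonym dictionary 104.1 can be pictured as a lookup from each variant to its standard term. The sketch below is a minimal illustration; the dictionary contents beyond the "mother" example and the `normalize` helper are assumptions and not part of the patent.

```python
# Hypothetical synonym dictionary: standard term -> accepted variants.
SYNONYMS = {
    "mother": {"mom", "mother", "ma", "mama"},
    "christmas": {"christmas", "xmas"},
}

# Inverted lookup: variant -> standard term.
LOOKUP = {
    variant: standard
    for standard, variants in SYNONYMS.items()
    for variant in variants
}


def normalize(terms):
    """Map user-entered terms to standard terms; unknown terms pass through lowercased."""
    cleaned = (term.strip().lower() for term in terms)
    return [LOOKUP.get(term, term) for term in cleaned]


print(normalize(["Mom", " MA ", "beach"]))
```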

Similarly, the method 100 according to this invention may include image analysis capabilities for creating image metadata, for example for a white winter scene. Both types of metadata, user-entered and system-created, are stored in persistent memory 103 together with the image sequence, and at least one type must be stored for the system 300, device 200 and method 100 according to this invention to find matching audio accompaniment for the input image sequence.

An audio accompaniment database 104 is provided that has previously been annotated (indexed) with standardized audio accompaniment metadata. Using the standardized captured/created image metadata stored in short-term persistent memory 103, at step 105 the provided music database 104 is searched for matching music metadata. At step 106, all matching music metadata is compiled into a playlist associated with the image(s) and stored in persistent memory 103. In a preferred embodiment, a degree of match is also derived; for example, an image of winter with a white background and a user-provided date of December 25 leads to a 100% match with "White Christmas" and a lesser match with "Walking in a Winter Wonderland". At step 107, the search results are retrieved from persistent memory 103, and the best match is either played or presented in a ranked list of suggested musical accompaniments while the images are displayed. In a preferred embodiment, both the images and the audio annotations are stored in a database 108 for later retrieval, display and playback.
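A degree of match and the resulting ranked list might be computed as sketched below. The scoring rule (the share of image terms found among a track's index terms), the index contents, and the function names are all assumptions made for this example; the patent states only that a degree of match is derived and that the best match is played or a ranked list is offered.

```python
def match_degree(image_terms, track_terms):
    """Share of image terms that also appear in the track's index terms."""
    wanted = {term.lower() for term in image_terms}
    if not wanted:
        return 0.0
    have = {term.lower() for term in track_terms}
    return len(wanted & have) / len(wanted)


def rank_tracks(image_terms, index):
    """Return (degree, title) pairs with a nonzero degree, best match first."""
    scored = [(match_degree(image_terms, terms), title) for title, terms in index.items()]
    scored = [pair for pair in scored if pair[0] > 0]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical index terms for the two songs named in the text, plus a
# made-up non-matching entry.
audio_index = {
    "White Christmas": {"white", "christmas", "winter", "snow"},
    "Walking in a Winter Wonderland": {"winter", "snow", "walking"},
    "Hypothetical Beach Song": {"summer", "beach"},
}

ranked = rank_tracks(["winter", "white", "Christmas"], audio_index)
for degree, title in ranked:
    print(f"{degree:.0%}  {title}")
```

Under these assumed index terms, "White Christmas" covers all three image terms and "Walking in a Winter Wonderland" covers one of them, which mirrors the full and lesser match described above.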

Referring now to Figure 2, a device 200 is illustrated for associating audio accompaniments with images while the images are displayed, or for presenting the user with a playlist of audio accompaniments. The device includes an image metadata capture/creation module 201, which receives an image, an image sequence, and metadata describing the image and the image sequence. The metadata includes date, time, season, event, relationship to the user, name(s) of the person(s)/pet(s), and the location of the image and the image sequence. User-entered metadata is captured by module 201, and metadata is also created by module 201 through image analysis, e.g., ocean or lake, islands, etc. Once the image and its metadata have been captured and the metadata for the input image sequence has been created by module 201, they are stored in short-term persistent memory 103. The search/association module 203 then searches database 104 for matching audio accompaniments based on the metadata, and the suggestion/playback module 204 performs at least one of suggesting a playlist and playing the most relevant audio accompaniment found by the search. The device 200 further comprises an image sequence display module 202 for displaying the image sequence while the suggestion/playback module 204 plays the audio accompaniment resulting from the search. In a preferred embodiment, the results are stored in the annotated image database 108 for future retrieval and playback. Each of the image metadata capture/creation module 201 and the search/association module 203 is preferably configured to store the image sequence, the metadata, and the audio accompaniment resulting from the search in database 108, and each of the audio accompaniment suggestion/playback module 204 and the image sequence display module 202 is preferably configured to retrieve the image sequence and associated metadata from database 108 for simultaneous display and playback. Persistent memory 103 is a relatively short-term memory that persists only as long as the user wants the display device that includes device 200 to display the image sequence.
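The division of labor among the modules can be sketched as three small classes wired together. The class names, method names, and the dictionary used as a stand-in for short-term memory 103 are hypothetical; display module 202 and database 108 are left out to keep the example short, and nothing here should be read as the actual structure of device 200.

```python
class CaptureModule:
    """Stands in for module 201: collects images and user-entered metadata."""

    def capture(self, images, user_metadata):
        return {"images": list(images), "metadata": [m.lower() for m in user_metadata]}


class SearchModule:
    """Stands in for module 203: looks up accompaniment by metadata terms."""

    def __init__(self, audio_index):
        self.audio_index = audio_index  # title -> set of index terms

    def search(self, metadata):
        wanted = set(metadata)
        return [title for title, terms in self.audio_index.items() if wanted & terms]


class SuggestModule:
    """Stands in for module 204: offers a playlist and a first pick to play."""

    def suggest(self, results):
        return {"playlist": results, "play_first": results[0] if results else None}


def run_device(images, user_metadata, audio_index):
    # The `session` dict plays the role of short-term memory 103 in this sketch.
    session = CaptureModule().capture(images, user_metadata)
    session["results"] = SearchModule(audio_index).search(session["metadata"])
    return SuggestModule().suggest(session["results"])


# Made-up inputs for illustration only.
outcome = run_device(
    ["img1.jpg"],
    ["Ocean"],
    {"Ocean Waves": {"ocean", "sea"}, "Party Glasses": {"party"}},
)
print(outcome)
```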

Referring now to Figure 3, a system 300 is illustrated that includes the device 200 of Figure 2. The system further comprises a display device 302 and an audio accompaniment playback device 301, each of which is operatively connected to the device 200. The system 300 receives digital image content and metadata 101 entered by the user and, using the device 200 of Figure 2, creates additional image metadata, if necessary with the help of the dictionary 104.1, in order to find matching audio accompaniment in the database 104, storing the resulting audio accompaniment associated with the input sequence in short-term persistent memory 103. The system then either plays the audio accompaniment through the audio accompaniment playback device 301 while the display device 302 displays the image/sequence, or suggests a playlist of results to the user via the display device 302. The results and the image(s) may also be stored in the annotated image database 108 for future retrieval and viewing.

Although a preferred embodiment of this invention has been illustrated and described, those skilled in the art will understand that the system, device and method described here are illustrative, and that various changes and modifications may be made and elements may be replaced by equivalents without departing from the true scope of this invention. In addition, many modifications may be made to adapt the teachings of this invention to a specific setup for reproducing images with sound/sound effects without departing from its scope. It is therefore intended that this invention not be limited to the specific embodiments disclosed as the best mode for carrying out the invention, but that the invention include all embodiments falling within the scope defined by the appended claims.

Claims

1. A method of using content metadata associated with a sequence of at least one image (101) to provide audio accompaniment for it, comprising the steps of:
obtaining a text label from the content metadata,
identifying (105) matching audio accompaniment in a database (104) using the content metadata associated with said sequence, and
providing (107) the identified audio accompaniment as accompaniment for said sequence,
characterized in that said matching audio accompaniment is identified in the database by searching the database using the text label as a search term.

2. The method according to claim 1, characterized in that the metadata of the content includes date, time, event, location, image-to-user relationship, user characteristics, and a descriptive key phrase.

3. The method according to claim 1, characterized in that it further includes the steps of:
providing a dictionary (104.1) of standard content metadata and their synonyms, and
using the dictionary (104.1) to normalize the content metadata associated with said sequence (101) to the standard content metadata (104.1).

4. The method according to claim 3, characterized in that the metadata of the content includes the date, time, event, location, the relationship of the image to the user and the descriptive key phrase.

5. The method according to claim 1, characterized in that the content metadata associated with said sequence is provided by a user or obtained by analyzing the content of said at least one image.

6. The method according to claim 5, characterized in that the content analysis is selected from the group consisting of optical character recognition of the text, recognition of image scenes and recognition of image objects.

7. The method according to claim 5, characterized in that the providing step further includes first performing the step of compiling (106) the identified audio accompaniment into a playlist associated with said sequence.

8. The method according to claim 7, characterized in that it further includes the steps of:
storing said sequence and the audio accompaniment identified for it in short-term persistent memory (103), and
prior to the providing step, retrieving the stored sequence and the audio accompaniment identified for it.

9. The method according to claim 8, characterized in that the content metadata includes user data, date, time, event, location, the relationship of the image to the user, the name of a person in the image, the name of a pet in the image, the title of the image, season, temperature, latitude, longitude, size, body part, color, and a descriptive key phrase.

10. The method according to claim 9, characterized in that it further includes the steps of:
providing a dictionary (104.1) of standard content metadata and their synonyms, and
using the dictionary (104.1) to normalize the content metadata associated with said sequence (101) to the standard content metadata (104.1).

11. The method according to claim 10, characterized in that the identifying step further includes a step of deriving a degree of match, and the providing step further includes the step of first ordering the provided audio accompaniment by degree of match from lowest to highest.

12. The method according to claim 10, characterized in that the providing step further includes the steps of:
storing the sequence, associated with the identified audio accompaniment, in a database (108) of annotated images, and
retrieving the stored sequence and associated audio accompaniment from the database (108) of annotated images.

13. A device (200) that associates audio accompaniment with a sequence of at least one image having content, for simultaneous presentation with it, comprising:
an image/metadata capture/creation module (201) for capturing said sequence and for capturing and creating metadata describing the content of said at least one image,
an audio accompaniment metadata search/association module (203) that searches for audio accompaniment in an audio accompaniment database (104) using a text label obtained from the content metadata as a search term,
an audio accompaniment suggestion/playback module (204) for suggesting a playlist of audio accompaniments resulting from the search, the playlist associating the audio accompaniments with said sequence, and
an image sequence display module (202) for displaying said sequence while the audio accompaniment suggestion/playback module (204) plays audio accompaniment resulting from the search,
characterized in that the matching audio accompaniment is identified in said database by searching said database using the text label as a search term.

14. The device (200) according to claim 13, characterized in that it further includes a database (108) of annotated images, each of the image/metadata capture/creation module (201) and the audio accompaniment metadata search/association module (203) is further configured to store said sequence, the metadata, and the identified audio accompaniment in the database (108) of annotated images, and each of the audio accompaniment suggestion/playback module (204) and the image sequence display module (202) is further configured to retrieve the sequence and associated metadata from the database (108) of annotated images for simultaneous display and playback.

15. The device (200) according to claim 13, wherein the image / metadata capture / creation module (201) is further configured to create metadata using image content analysis techniques.

16. The device (200) according to claim 15, characterized in that the image content analysis techniques are selected from the group consisting of optical character recognition of text, recognition of image scenes, recognition of image objects.

17. A system (300) for providing audio accompaniment for a sequence of at least one digital image, comprising:
a display device (302),
an audio accompaniment playback device (301), and
the device (200) according to claim 16, operatively connected to the display device (302) and the audio accompaniment playback device (301),
wherein the device (200) receives a sequence of at least one image and first image metadata (101), obtains second image metadata using the content analysis techniques (102), identifies matching audio accompaniment using the first and second metadata, and then either the audio accompaniment playback device (301) plays the audio accompaniment while the display device (302) displays said sequence, or the display device (302) offers the user a playlist of the corresponding results.