WO2019112145A1

WO2019112145A1 - Method, device, and system for sharing photographs on basis of voice recognition

Info

Publication number: WO2019112145A1
Application number: PCT/KR2018/009228
Authority: WO
Inventors: 이석희
Original assignee: 라이브픽쳐스(주)
Priority date: 2017-12-05
Filing date: 2018-08-10
Publication date: 2019-06-13
Also published as: KR20190066537A; KR102196199B1

Abstract

According to one aspect of the present invention, disclosed is a method for sharing photographs on the basis of voice recognition. The method comprises the steps of: taking a photograph through a camera so as to acquire an image of the taken photograph; acquiring voice data related to the acquired image; recognizing the acquired voice data so as to generate text; linking and storing the acquired image, the acquired voice data, and the generated text; and outputting the stored image together with the stored voice data and/or the stored text.

Description

Speech recognition based photo sharing method, apparatus and system

The present invention relates to a method for sharing photographs, and more particularly, to a method that adds elements of fun and convenience to photo content so that the content can be shared easily and enjoyed by a large number of users.

With a conventional analog camera, a captured image can be viewed only after the film exposed to light has been developed and printed.

Recently, however, with the development of electronic technology, and optical technology in particular, many new types of digital devices have appeared, and camera performance has improved not only in conventional cameras but also in newer devices such as smartphones with built-in camera functions.

Therefore, a digital camera or the camera of a smartphone does not require complicated development and printing after a photograph is taken. Instead, it stores the image in a digital storage medium embedded in the camera or smartphone, so that the photographed image can be checked easily on the device itself. In addition, the digital camera has the merit that it can take over the roles of a conventional camera and a scanner, and it is highly compatible with image data on a PC, which facilitates editing and correction. However, a digital camera records only a still image. As time passes, the place where the image was taken, the feeling at the time of photographing, the companions, and the specific situation and atmosphere must all be recalled from memory alone, which is a problem.

Smartphones continue to diversify in function and to meet the needs of a variety of consumers, for example with higher resolution and better image correction for photographs and moving pictures. However, they still share the problem described above with conventional smartphones and digital cameras.

To solve this problem, technologies are being developed for adding information such as text to images photographed with a camera or a smartphone. In the conventional video information input system of registered patent No. 10-1053045, text, voice, or image information provided by a user or a user terminal is added to the photograph or moving picture information stored in an image capturing apparatus that includes a camera.

However, because this video information input system requires separate text, voice, or new image information to be supplied for each photograph or moving picture, it increases cost and is limited in how new images can be linked.

Registered patent No. 10-1115701, a conventional method and apparatus for annotating video content with metadata generated using voice recognition technology, begins by rendering video content on a display device. A voice segment annotates the portion of the image content that is currently rendered, the voice segment is converted to a text segment, and the text segment is associated with the rendered portion of the image content. The text segment is stored in an optionally searchable manner so as to be associated with the rendered portion of the image content.

Such a conventional technique recognizes a voice through speech recognition, converts the recognized voice into text, and adds the text to a digital picture. However, it merely converts the recognized voice into text and adds it to the picture, and a voice recognition error that occurs while the voice is being uttered can prevent the voice recognition function from being performed properly.

An object according to one aspect of the present invention is to provide a voice recognition-based photo sharing method, apparatus, and system that generate text based on speech recognition and share a photograph together with the text and/or the voice.

An object according to another aspect of the present invention is to provide a method, an apparatus and a system for sharing a photograph using a block chain.

According to an aspect of the present invention, there is provided a voice recognition-based photo sharing method comprising: taking a photograph through a camera to acquire an image of the photograph; acquiring voice data related to the acquired image; recognizing the acquired voice data to generate text; associating and storing the acquired image, the acquired voice data, and the generated text; and outputting the stored image together with at least one of the stored voice data and the stored text.

The acquired image may be obtained from at least one of a photograph currently taken and a photograph previously taken at a time before the present time and previously stored.

The step of associating and storing the acquired image, the acquired voice data, and the generated text may include storing the acquired image, the acquired voice data, and information associated with the generated text in a server.

When retrieving the data stored in the server, it is possible to search based on at least one of the voice data and the text.

The step of associating and storing the acquired image, the acquired voice data, and the generated text may include inserting the text into the image, wherein the text may be inserted into a first layer that is the same as the layer of the image or into a second layer different from it.

Inserting the text into the first layer may include inserting the text into an arbitrary area on the image, identifying a first area in which the text is inserted, and generating the image in which the text is embedded as an image file, wherein the image file is associated with identification information for the first area.

The step of generating the image in which the text is inserted as an image file may include scanning the image into which the text has been inserted to generate the image file.

When the text is inserted into the first layer, the stored voice data may be output corresponding to a user input for the identified first area.

When the text is inserted into the second layer, the stored speech data may be output corresponding to a user input for the second layer of text.

The stored voice data may be packaged and stored with the image and the text.

Alternatively, the voice data may be stored in a separate storage, and the image and the text may be packaged with link information pointing to the storage that holds the voice data.

The associated voice data may include at least one of voice data associated with a photographer present outside a first space associated with photographing and voice data associated with a subject present in the first space.

The step of associating and storing the acquired image, the acquired voice data, and the generated text may include obtaining first voice data having a first voice characteristic and second voice data having a second voice characteristic, and separating the first voice data from the second voice data.

A first text may be generated by recognizing the separated first voice data, and a second text may be generated by recognizing the separated second voice data. The first text may be associated with the first voice data, and the second text may be associated with the second voice data.

The first text may be located at a location on the stored image according to a first input of a user and the second text may be located at a location on the stored image in accordance with a second input of a user.

The step of associating and storing the acquired image, the acquired voice data, and the generated text may include recognizing a first subject and a second subject included in the image by applying an object recognition algorithm to the image, associating the first subject with the first text, and associating the second subject with the second text.

The first text may be disposed around the first subject, and the second text may be disposed around the second subject.

The step of associating and storing the acquired image, the acquired voice data, and the generated text may further include identifying the voice data by comparing voice characteristic information associated with the acquired voice data with voice characteristic information previously stored in a voice database.

The location of the text may be determined in either a first mode, in which the text is automatically arranged at at least one of a position in the image and a position according to an image analysis result, or a second mode, in which the text is arranged according to a user input.

The method may further include analyzing the meaning of the text, and when the first mode is operated, the text may be automatically arranged in an area corresponding to the semantic analysis result.

Text having a first meaning may be placed in an area associated with the subject in the image, and text having a second meaning may be placed in a predetermined area of the entire image regardless of the subject.

When the stored image is registered in a social network service (SNS), a hashtag may be automatically generated and registered based on at least one of the image, the voice data, the text, and metadata associated with the image.

When the stored image is registered in a social network service (SNS), a first object may be extracted from the image, and a hashtag may be automatically generated and registered based on information about the first object.

The method may further include outputting the text, wherein, based on the output order of the plurality of characters constituting the text, the output order of the plurality of strokes included in each character, and the drawing order of each stroke, the text may be reproduced as if it were being written out from its first character to its last character.

The step of associating and storing the acquired image, the acquired voice data, and the generated text may further include recording the acquired image, the acquired voice data, and information associated with the generated text in a block chain.

If there is a request to record the acquired image, the acquired voice data, and the information associated with the generated text in block chain form, a public key and a private key may be generated through an authentication information issuing server and transmitted to a block chain-based data management server, so that the acquired image, the acquired voice data, and the information associated with the generated text are provided to the block chain holding server.

The public key and the private key may be used for confirmation in the block chain-based data management server. The acquired image, the acquired voice data, and the information associated with the generated text may be processed into a hash value to generate a transaction, and the generated transaction may be delivered to and approved by the block chain holding server.

According to another aspect of the present invention, there is provided a voice recognition-based photo sharing apparatus comprising: an information obtaining unit for acquiring an image of a photograph taken through a camera and voice data related to the acquired image; a text conversion unit for recognizing the acquired voice data and generating text; a data storing unit for storing the acquired image, the acquired voice data, and the generated text in association with each other; and a data output unit for outputting the stored image together with at least one of the stored voice data and the stored text.

According to another aspect of the present invention, there is provided a voice recognition-based photo sharing system that may include: a user terminal that acquires an image associated with a photograph and voice data associated with the image, recognizes the voice data to generate text, associates and stores the image, the voice data, and the text, and requests that the stored image, voice data, and text be recorded in block chain form; a plurality of block chain holding servers that record the image, voice data, and text in block chain form; and a block chain-based data management server that processes block chain management tasks, including at least one of addition, transfer, and deletion of the block chain information recorded in the block chain holding servers, based on the approval of the plurality of block chain holding servers.

The block chain-based data management server may record, in the block chain holding servers, at least one of download information and payment information related to the image, voice data, and text transmitted between a first user terminal and a second user terminal.

The user terminal generates a public key and a private key through the authentication information issuing server and transmits them to the block chain-based data management server. The block chain-based data management server confirms whether the public key and the private key received from the user terminal are registered, generates a transaction for information recording by processing the image, voice data, and text requested by the user terminal into a hash value, and delivers the transaction to the block chain holding servers.
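The hash-and-transaction step described above can be illustrated with a minimal sketch. The text does not name a hash function or a transaction format, so SHA-256, the function name `build_transaction`, and the dictionary layout are all assumptions made for illustration only; key registration and approval by the holding servers are omitted.

```python
import hashlib


def build_transaction(image_bytes, voice_bytes, text, public_key):
    """Hypothetical transaction: one SHA-256 digest per linked item."""
    return {
        "public_key": public_key,  # assumed to be checked by the management server
        "image_hash": hashlib.sha256(image_bytes).hexdigest(),
        "voice_hash": hashlib.sha256(voice_bytes).hexdigest(),
        "text_hash": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }


# Example with placeholder contents; real inputs would be file contents.
tx = build_transaction(b"image-bytes", b"voice-bytes", "I love you", "demo-public-key")
```

Only the digests are placed in the transaction in this sketch, so the image, voice, and text themselves could be kept elsewhere and later verified against the recorded values.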

According to the voice recognition-based photo sharing method, apparatus, and system of the present invention, information can be added to a photograph in real time through voice recognition, so that users can enjoy emotion and liveliness together with elements of fun and convenience.

FIG. 1 is a conceptual diagram for schematically explaining a method of sharing a photo based on speech recognition according to an embodiment of the present invention.

FIG. 2 is a block diagram schematically illustrating a voice recognition-based photo sharing apparatus according to an embodiment of the present invention.

FIG. 3 is a flowchart schematically illustrating a method of inserting a photographer's voice and a subject's voice in an image by distinguishing a voice of a photographer and a voice of a subject in a method of sharing photos based on speech recognition, according to an embodiment of the present invention.

FIGS. 4A and 4B are conceptual diagrams for explaining how the text is inserted into the image.

FIGS. 5A and 5B are conceptual diagrams for explaining a method of storing images, texts and voices.

FIG. 6 is a block diagram showing a configuration in which speech-recognized text data is inserted into an image in association with a subject.

FIG. 7 is a conceptual diagram for explaining a method of matching voice data of subjects having different voice characteristics with a specific subject in the image.

FIGS. 8A and 8B are conceptual diagrams illustrating a process in which text is arranged at an arbitrary position in an image according to an automatic mode and a manual mode.

FIG. 9 is a block diagram specifically illustrating a structure for determining an insertion position according to the meaning of the recognized text.

FIG. 10 is a conceptual diagram for explaining automatic generation of a hashtag.

FIG. 11 is a conceptual diagram for explaining an emotional text drawing.

FIG. 12 is a block diagram illustrating a system for storing data based on a block chain according to an embodiment of the present invention.

FIG. 13 is a flowchart illustrating a method of storing data based on a block chain according to an embodiment of the present invention.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail.

It should be understood, however, that the invention is not intended to be limited to the particular embodiments, but includes all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

The terms first, second, etc. may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.

It is to be understood that when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or other elements may be present in between. On the other hand, when an element is referred to as being "directly connected" or "directly coupled" to another element, it should be understood that there are no other elements in between.

The terminology used in this application is used only to describe a specific embodiment and is not intended to limit the invention. The singular expressions include plural expressions unless the context clearly dictates otherwise. In the present application, the terms "comprises" or "having" and the like are used to specify that there is a feature, a number, a step, an operation, an element, a component or a combination thereof described in the specification, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and are not to be interpreted in an ideal or overly formal sense unless explicitly so defined in the present application.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In order to facilitate the understanding of the present invention, the same reference numerals are used for the same constituent elements in the drawings and redundant explanations for the same constituent elements are omitted.

FIG. 1 is a conceptual diagram for schematically explaining a method of sharing a photo based on speech recognition according to an embodiment of the present invention.

Referring to FIG. 1, the photo sharing apparatus according to an embodiment of the present invention includes photographing means such as a camera and can acquire image information through photographing. The apparatus also includes voice acquisition means such as a microphone. The apparatus can acquire data associated with the voice generated at the time of photographing, recognize the acquired voice data using at least one voice recognition algorithm, and convert the recognized information into text. The data associated with the voice may also include a voice input while a previously stored picture is being viewed after it was taken. After speech recognition, the voice data is not discarded but is stored in association with the image and text information. For example, a matching relationship between image and text, text and voice, and/or image and voice is defined, so that the voice stored in association with the text can be reproduced when a user input such as clicking or touching the generated text occurs. However, the voice need not be reproduced only when the text is clicked; it can also be reproduced when an area of the image is clicked.
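The matching relationship between image, text, and voice can be pictured as a single linked record. The following is a minimal sketch; the class name `PhotoRecord`, its fields, and the helper `voice_for_input` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhotoRecord:
    """Hypothetical record linking one image, its voice clip, and the recognized text."""
    image_path: str
    voice_path: str
    text: str


def voice_for_input(record: PhotoRecord, target: str) -> Optional[str]:
    """Return the voice clip to play when the user clicks or touches `target`.

    `target` is assumed to be "text" or "image"; any other value plays nothing.
    """
    if target in ("text", "image"):
        return record.voice_path
    return None


record = PhotoRecord("beach.jpg", "beach.wav", "I love you")
clip = voice_for_input(record, "text")
```

Because the voice clip is kept in the record rather than discarded after recognition, either kind of user input can resolve to the same stored audio.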

Referring to the upper right figure of FIG. 1, the converted text is embedded in one area of the image. The location at which the text is inserted may be determined manually through the user's input, or the text may be placed automatically in an area. In particular, the meaning of the speech-recognized text can be analyzed and the text arranged at a position associated with the result of that analysis. For example, a phrase whose meaning connects two or more subjects, such as "I love you", may be defined in advance to be placed in the space between two subjects. Alternatively, when a word representing a person's name, such as "xx" or "yy", is associated with an image containing a single subject, it may be arranged around that subject. That is, the location of the text may be determined based on at least one of the semantic analysis result, the number of objects in the image, the relationship between the objects, and the result of analyzing the objects (including information on whether an object is a person). After the image and text are composited in this way, the associated image may be output together with the text and/or the voice. For example, when the photograph is clicked, the voice can be reproduced. The same applies when the photograph has been downloaded to another terminal.
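The placement rule in this paragraph can be sketched as follows. This is an illustration under stated assumptions: the semantic analysis is assumed to have already produced a label ("relation" or "name"), subjects are assumed to be bounding boxes given as (x, y, width, height), and the function name `place_text` is hypothetical.

```python
def center(box):
    """Center point of an (x, y, width, height) bounding box."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def place_text(meaning, subjects):
    """Pick an insertion point from an assumed semantic label and subject boxes.

    "relation": phrase linking two subjects, placed midway between them.
    "name":     a person's name with a single subject, placed just below it.
    Returns None when no automatic rule applies (manual placement is needed).
    """
    if meaning == "relation" and len(subjects) >= 2:
        (x1, y1), (x2, y2) = center(subjects[0]), center(subjects[1])
        return ((x1 + x2) / 2, (y1 + y2) / 2)
    if meaning == "name" and len(subjects) == 1:
        x, y, w, h = subjects[0]
        return (x + w / 2, y + h)
    return None


between = place_text("relation", [(0, 0, 10, 10), (20, 0, 10, 10)])
below = place_text("name", [(0, 0, 10, 10)])
```

Returning `None` for unmatched cases leaves room for the manual mode, in which the user chooses the position.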

Referring to FIG. 1, the device may package the image, text, and voice data and upload the package to a shared channel such as a typical social network service (SNS), a blog, a cafe, and/or a blockchain, so that it can be shared with a large number of users. Data shared through such a channel can be searched based on metadata associated with the image (e.g., photographing date and time, location, photographing device information), the voice data, and/or the text data. For example, in addition to a search based on a photographing place such as "Seoul", an image can be found through text generated by speech recognition, such as "I love you", that it contains.
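A search over shared records by metadata or recognized text might look like the sketch below. The record layout (a dictionary with `text` and `metadata` keys) and the function name `search_shared` are assumptions made for illustration; searching by voice data is not covered.

```python
def search_shared(records, query):
    """Return records whose recognized text or metadata values contain the query."""
    needle = query.lower()
    hits = []
    for record in records:
        fields = [record.get("text", "")]
        fields += [str(value) for value in record.get("metadata", {}).values()]
        if any(needle in field.lower() for field in fields):
            hits.append(record)
    return hits


records = [
    {"image": "a.jpg", "text": "I love you", "metadata": {"location": "Seoul"}},
    {"image": "b.jpg", "text": "Happy birthday", "metadata": {"location": "Busan"}},
]
by_place = search_shared(records, "Seoul")
by_text = search_shared(records, "love")
```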

In an embodiment of the present invention, the apparatus is a communication-capable device equipped with a camera and a microphone for photographing and audio acquisition. The apparatus either executes a speech recognition algorithm directly or obtains speech-recognized information by using an external speech recognition algorithm. An apparatus according to an embodiment of the present invention may be referred to as a mobile station (MS), user equipment (UE), user terminal (UT), wireless terminal, access terminal (AT), subscriber unit, subscriber station (SS), cellular telephone, wireless device, wireless communication device, wireless transmit/receive unit (WTRU), mobile node, personal digital assistant (PDA), smart phone, laptop, netbook, personal computer, wireless sensor, consumer electronics (CE) device, or by other terms. Various embodiments of the apparatus may include, but are not limited to, cellular telephones, smart phones with wireless communication capabilities, personal digital assistants (PDAs) with wireless communication capabilities, wireless modems, portable computers with wireless communication capabilities, gaming devices with wireless communication capabilities, video/music storage and playback appliances with wireless communication capabilities, Internet appliances capable of wireless Internet access and browsing, and portable units or terminals incorporating combinations of such features.

FIG. 2 is a block diagram schematically illustrating a voice recognition-based photo sharing apparatus according to an exemplary embodiment of the present invention. Referring to FIG. 2, the photo sharing apparatus 200 includes an information obtaining unit 210, a voice separating unit 220, a text converting unit 230, an image combining unit 240, a data storage unit 250, and a data output unit 260.

Each of the components may be embodied as hardware mounted in the apparatus. The voice separating unit 220, the text converting unit 230, and the image combining unit 240 may be implemented as a single microprocessor or as a combination of two or more microprocessors that execute instructions to perform each function. The instructions may be stored in a memory (not shown).

Referring to FIG. 2, the information obtaining unit 210 may include a camera and a microphone. The camera can be operated by running a photo shooting application and captures an image of the subject. The camera can also generate and provide optics-related information, which can be used to calculate the distance to the subject. The information obtaining unit 210 may obtain an image not only by running the camera but also by fetching an image previously stored in a local storage (not shown). A voice associated with the pre-stored image may already exist, but a file storing only the image can also be fetched. By acquiring, through the microphone, a voice uttered by the user while the file is being fetched in response to the user's input, the information obtaining unit 210 can receive a pre-stored image together with a currently input voice. The microphone is the component that acquires the voice signal. The camera and the microphone may be mounted in the apparatus or connected to it through a separate interface. Of the information obtained by the information obtaining unit 210, the image may be provided to the image combining unit 240 and the voice data to the voice separating unit 220.

The voice separating unit 220 analyzes the voice acquired through the microphone and separates it into at least one voice signal. First, human voice is extracted through a filter (not shown): because the input voice signal may contain a large amount of noise, the noise is filtered out so that only human voice remains. The extracted human voice is then divided into the voice signals of one or more individuals by using the frequency and/or the strength of the voice. The voice separating unit 220 analyzes the frequency components of the filtered signal to obtain voice characteristic information. The waveform of a speech signal fluctuates widely in the time domain, whereas its frequency spectrum varies comparatively little, which makes it easier to extract properties that characterize the speech. In particular, when a plurality of frequency components are mixed, the voice separating unit 220 analyzes them, extracts the individual frequency components, and generates a plurality of voice signals. For example, when two signals having different voice characteristics are mixed, the first and second voice signals may be separated and provided to the text converting unit 230. In addition, further voice signal analysis may be performed to compare the signals against an associated voice signal database (not shown) and to extract voice signals that match pre-stored voice signal characteristics. Identification information may be added to a matched voice signal. This additional analysis may instead be performed by the image combining unit 240, in which case the signal analyzed in the voice separating unit 220 is provided to the image combining unit 240. When a photograph is taken, the voice separating unit 220 can separate the voice of a subject inside the photographing area from the voice of a person outside it (e.g., a third person such as the photographer).
This will be described in more detail below with reference to FIG.
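The frequency analysis step can be illustrated with a small sketch that finds the dominant frequency components of a mixed signal using a naive discrete Fourier transform. This is not the separation method of the specification, which does not disclose an algorithm; it only shows how two components with different frequencies become distinguishable in the frequency domain. The function name `dominant_frequencies` and the synthetic two-tone test signal are assumptions.

```python
import cmath
import math


def dominant_frequencies(samples, sample_rate, count=2):
    """Return the `count` strongest frequencies (in Hz) found by a naive DFT."""
    n = len(samples)
    magnitudes = []
    for k in range(1, n // 2):  # skip the DC bin, stop below the Nyquist bin
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples)
        )
        magnitudes.append((abs(coeff), k * sample_rate / n))
    magnitudes.sort(reverse=True)
    return sorted(freq for _, freq in magnitudes[:count])


# One second of a synthetic mixture: a 50 Hz tone plus a weaker 120 Hz tone.
sample_rate = 400
mixed = [
    math.sin(2 * math.pi * 50 * i / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 120 * i / sample_rate)
    for i in range(sample_rate)
]
peaks = dominant_frequencies(mixed, sample_rate)
```

Real voices occupy overlapping frequency bands rather than single tones, so a practical system would need considerably more than peak picking; the sketch only makes the time-domain versus frequency-domain point concrete.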

FIG. 3 is a flowchart schematically illustrating a method of inserting a photographer's voice and a subject's voice in an image by distinguishing a voice of a photographer and a voice of a subject in the method of sharing photos based on speech recognition according to an embodiment of the present invention.

Referring to FIG. 3, the voice separating unit can separate a plurality of voice signals by the frequency components contained in the input signal, as described above, and acquires information related to each separated voice signal (S310). If the input contains only one person's voice signal, separation may not be necessary. The apparatus can then distinguish the photographer from the subject based on the intensity of each separated voice signal and/or information related to the image capture (S320). From the strength of a voice signal and other voice characteristic information, the device can estimate how far the voice traveled to reach the device. The image capture information may include the size of the subject and the zoom in/out information of the camera. From this it is possible to determine how far the camera's optical system has zoomed in or out on the subject, and the optical information and the size of the subject in the image can be analyzed to estimate how far away the subject is located. A first distance, calculated from the separated voice signal, is compared with a second distance, calculated from the size analysis of the subject and/or the optical information of the camera. It is determined whether the first distance falls within a predetermined range (which may be defined by a predetermined first reference value), and on that basis whether the voice is that of the subject. In addition, a voice may be identified as that of the photographer, input close to the device, according to whether the first distance is within a second reference value from the device. A voice that is neither that of the subject nor that of the photographer can be treated as noise or handled differently by a separate algorithm.
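The distance comparison in step S320 can be sketched as a small decision function. The specification does not give numeric reference values, and it does not state explicitly that the predetermined range is centered on the second distance; both are assumptions here, as are the function name `classify_voice` and the default thresholds.

```python
def classify_voice(voice_distance_m, subject_distance_m,
                   subject_tolerance_m=1.0, photographer_max_m=0.5):
    """Label a separated voice as photographer, subject, or noise.

    voice_distance_m:    first distance, estimated from the voice signal
    subject_distance_m:  second distance, estimated from subject size and optics
    subject_tolerance_m: assumed first reference value (range around the subject)
    photographer_max_m:  assumed second reference value (closeness to the device)
    """
    if voice_distance_m <= photographer_max_m:
        return "photographer"
    if abs(voice_distance_m - subject_distance_m) <= subject_tolerance_m:
        return "subject"
    return "noise"


labels = [
    classify_voice(0.3, 3.0),  # spoken right next to the device
    classify_voice(2.8, 3.0),  # spoken near where the subject stands
    classify_voice(8.0, 3.0),  # matches neither position
]
```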

Once the voices have been classified as those of the photographer and/or the subject through the above process, each voice may be converted into text by the text converting unit (S330). At this time, the converted text may be given identification information (or "association information") identifying it as text corresponding to the voice of the photographer and/or text based on the voice of the subject. Each text to which identification information has been assigned can then be inserted into the image (S340). Based on the identification information, the texts can be handled differently when they are inserted. For example, the photographer's text can be centered on the entire area of the image, while the subject's text can be inserted around the subject in the image. The editing method may also differ between the two.
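Steps S330 and S340 can be illustrated with a sketch that tags each converted text with its speaker role and picks an insertion point from that tag. The names `tag_text` and `insertion_point`, the dictionary layout, and the choice of placing subject text just below the subject's bounding box are all assumptions.

```python
def tag_text(text, role):
    """Attach identification information (assumed roles: "photographer", "subject")."""
    return {"text": text, "role": role}


def insertion_point(tagged, image_size, subject_box=None):
    """Choose where to insert tagged text.

    Subject text goes just below the subject's (x, y, width, height) box;
    photographer text, or subject text without a box, is centered on the image.
    """
    width, height = image_size
    if tagged["role"] == "subject" and subject_box is not None:
        x, y, w, h = subject_box
        return (x + w / 2, y + h)
    return (width / 2, height / 2)


photographer_pos = insertion_point(tag_text("Nice shot", "photographer"), (640, 480))
subject_pos = insertion_point(tag_text("Cheese", "subject"), (640, 480), (100, 50, 80, 120))
```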

Returning to FIG. 2, the text converting unit 230 converts at least one voice signal separated by the voice separating unit 220 into text. When the input is separated into two voice signals, the first voice signal is converted into a first text and the second voice signal into a second text. The first voice signal may be that of the photographer and the second that of the subject. Alternatively, the first voice signal may be that of a first subject and the second that of a second subject. The text converting unit 230 converts each separated voice signal into text using a speech recognition algorithm. The speech recognition algorithm according to an embodiment of the present invention includes an algorithm for identifying the linguistic semantic content of each voice signal separated by the voice separating unit 220. More specifically, it includes analyzing the voice waveform to identify a word or word sequence and extracting its meaning, and broadly comprises voice analysis, phoneme recognition, word recognition, sentence analysis, and semantic extraction. The text converting unit 230 that runs the speech recognition algorithm can be realized as an integrated circuit a few millimeters in size, using large-scale integration (LSI) for the speech recognition means and speech synthesis means. The speech recognition algorithm according to an embodiment of the present invention can interoperate with a semantic analysis algorithm to realize full speech-to-text conversion, in which naturally spoken voice is recognized and completely converted into text. This means that it is associated with a speech understanding system that not only recognizes words but also accurately extracts the meaning of continuous speech or sentences using syntactic information (grammar), semantic information, and information and knowledge related to the task. This will be described in more detail below with reference to FIG.

The speech recognition algorithm according to an embodiment of the present invention as described above may be executed in the apparatus. In some cases, the text conversion unit 230 may provide the separated speech signals to the server 290 or a separate device and obtain the recognized text information after speech recognition is performed in the server 290 or the separate device. The text conversion unit 230 may assign identification information to the text information converted from each separated speech signal so that it is possible to check which text matches which speech signal.

The image synthesis unit 240 synthesizes the image photographed by the camera with the text information converted by the text conversion unit 230. As described above, the image on which the synthesis is based may be not only an image currently being captured but also a previously stored image or an image received from another apparatus. The image synthesis unit 240 combines the image and the text converted by the text conversion unit 230 and generates a single file. There are various ways in which text can be inserted in the image. Through voice signal analysis, the text can be classified as text associated with the subject or text associated with the photographer, and it can be inserted at different positions according to this classification. Further, in the case of an image in which a plurality of subjects exist, the text associated with each subject may be arranged around that subject.

FIGS. 4A and 4B are conceptual diagrams for explaining how text is inserted into an image.

Referring to FIG. 4A, the apparatus may combine an image and text by inserting the text in the same layer as the image. The image may be a file in at least one of the PNG, JPG, PDF, GIF, and/or TIFF formats, although it is not necessarily limited to files with these extensions. The image synthesis unit 240 can convert the file into a format suitable for synthesis with text. In this case, the text is inserted in the same first layer as the image. The image synthesis unit 240 may render the text as an image and then insert the image-format text in that layer. Alternatively, the text may be placed on the image as text, using its text properties, and the result may then be scanned to form an image in a single layer. The image thus generated can be a file in a single format, such as JPG, PNG, or PDF. At this time, area information indicating where the imaged text is arranged is generated, so that the text can react to a user input on the corresponding area. The reaction of the text can be to fetch and output the associated voice information. For example, if there is a user input in the area 410 where "I love you" is displayed, the user input on the area 410 may be detected and the voice associated with the text may be output.

Referring to FIG. 4B, when the layer in which the image file exists is taken to be a first layer, the image synthesis unit 240 can insert the text into a second layer different from the first layer and combine the image and the text by superimposing the first layer and the second layer. Accordingly, the second layer in which the text exists can be controlled separately from the first layer, and the placement region 420 of the text in the second layer can react independently to a user input. That is, a user input at the coordinates of the area 420 where the text is actually placed may be detected, and the voice associated with the text may be output in response.
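The region-based reaction described for FIGS. 4A and 4B can be illustrated with a simple hit test. The following is only a minimal sketch: the `TextRegion` type, its fields, the coordinates, and the voice identifier are hypothetical names and values introduced here for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextRegion:
    """Rectangular area of the image in which a piece of text is placed (hypothetical)."""
    text: str
    x: int          # left edge, in image pixels
    y: int          # top edge, in image pixels
    width: int
    height: int
    voice_id: str   # assumed key of the voice data associated with the text

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def voice_for_input(regions: List[TextRegion], px: int, py: int) -> Optional[str]:
    """Return the voice id linked to the text region hit by a user input, if any."""
    for region in regions:
        if region.contains(px, py):
            return region.voice_id
    return None


regions = [TextRegion("I love you", x=40, y=30, width=120, height=24, voice_id="voice-1")]
hit = voice_for_input(regions, 50, 40)      # input inside the text area
miss = voice_for_input(regions, 300, 200)   # input outside every text area
```

The same test applies whether the text shares a layer with the image or sits on its own layer; only the stored area information is needed.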

Additionally, the image and text thus generated may be stored separately. In separate storage, the images and text may be stored separately in the same format (e.g., PNG format) or may be stored separately in different formats (e.g., PNG and JPG formats).

According to the embodiment of the present invention, the text can be edited and inserted in various fonts, colors, sizes, and the like according to user settings. In particular, it can be inserted in different fonts, colors, and sizes depending on whether it is associated with the subject and/or the photographer. For example, the text associated with the subject may be inserted at size 12 in the paladin font, and the text associated with the photographer may be inserted at size 15 in the Gothic font.

Referring back to FIG. 2, the data storage unit 250 stores image, voice, and text information. As described above, the image and the text may be stored as one image or may be stored separately from each other. The storage of data is described in more detail with reference to FIGS. 5A and 5B.

FIGS. 5A and 5B are conceptual diagrams illustrating methods for storing images, texts, and voices.

Referring to FIG. 5A, the data storage unit may store image, text, voice, and metadata as one file 510. At this time, the voice may include a plurality of voice data, and a plurality of voice data having different voice characteristics may be separately stored. The metadata may include a shooting date and time, a shooting location, a shooting device, and photographing-related application information.

As described above with reference to FIGS. 4A and 4B, according to the embodiment of the present invention, images and texts may be stored in one image file or in different files. Accordingly, they can be output at once from one file or output sequentially from different files. However, even if the text is stored in one image file together with the image, the information associated with the text may be separately recorded as metadata. For example, the text itself and related information indicating the voice associated with the text may be stored as metadata, and search and hashtag generation may be performed based on the stored metadata.

Referring to FIG. 5B, the data storage unit packages the image, text, and metadata into one file 520, stores the voice data in a separate storage 530 (a local storage and/or a database of a server external to the apparatus), and stores link information to the voice data in the metadata.

The text information may include a plurality of texts matched to a plurality of separate voice data. In the embodiment of FIG. 5B, the text includes a first text and a second text, which are matched to link information for the first voice data and link information for the second voice data, respectively. With this storage method, the link information for the first voice data is extracted in response to a user input on the first text, and the first voice data can be retrieved from the storage 530 based on the extracted link information and reproduced. Since only the link information for the voice data is packaged, the packaged file is lighter than in the embodiment of FIG. 5A.
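The link-based storage of FIG. 5B can be illustrated as follows. This is a minimal sketch under stated assumptions: the dictionary layout, the key names (`voice_links`, `text-1`, and so on), and the placeholder byte strings are hypothetical and only stand in for the packaged file 520 and the separate storage 530.

```python
from typing import Dict

# Separate storage for voice data (stands in for storage 530); keys are link strings.
voice_storage: Dict[str, bytes] = {
    "voice/0001": b"<first voice data>",
    "voice/0002": b"<second voice data>",
}

# Packaged file (stands in for file 520): image, texts, and metadata, but no audio.
package = {
    "image": b"<image bytes>",
    "texts": [
        {"id": "text-1", "content": "first text"},
        {"id": "text-2", "content": "second text"},
    ],
    "metadata": {
        # Link information matching each text to its voice data (assumed layout).
        "voice_links": {"text-1": "voice/0001", "text-2": "voice/0002"},
    },
}


def voice_for_text(pkg: dict, storage: Dict[str, bytes], text_id: str) -> bytes:
    """Resolve the link stored in the metadata and fetch the voice data."""
    link = pkg["metadata"]["voice_links"][text_id]
    return storage[link]


# A user input on the first text retrieves the first voice data for playback.
first_voice = voice_for_text(package, voice_storage, "text-1")
```

Because the package holds only short link strings instead of the audio itself, its size stays small regardless of how long the recordings are.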

Referring back to FIG. 2, the data storage unit 250 may store image, text, and voice data in the device, and the stored data may be shared with the server 290. The server 290 may include a server for managing an SNS. According to an embodiment of the present invention, the stored image, text, and/or audio package data may be shared with other terminals 295-1 through 295-N through a particular web page on the Internet via the server 290. The device 200 can upload the stored data to a specific web page using the server 290 and provide the uploaded data to the terminals 295-1 through 295-N visiting the web page. The terminals 295-1 through 295-N may output the text contained in the image of the uploaded data and/or the audio associated therewith in response to a user input. Conversely, the device may receive text and/or audio information associated with the image of the uploaded data from the terminals 295-1 through 295-N and store the received text and/or audio information in the data storage unit 250.

The data output unit 260 may include display means such as a monitor, a touch panel, or a TV screen, and sound output means such as a speaker or an earphone. The data output unit 260 outputs an image together with the text and/or audio information associated with it. The data output unit 260 may output a stored image file in response to a user input received through a user interface (not shown) such as a touch screen, a mouse, or a keyboard, and may output the text contained in the image. In addition, when there is a user input on the image and/or the text, the associated voice data is output using the association information with the text.

According to another embodiment of the present invention, the server 290 includes a server associated with a block chain. In this case, the server 290 operates as a server that manages the block chain, and each of the terminals 295-1 through 295-N can operate as a block chain holding server. This will be described in more detail below with reference to FIGS. 12 and 13.

The server 290 can receive and store image, text, and/or voice data from the plurality of terminals 295-1 through 295-N, and the terminals 295-1 through 295-N can search for and retrieve desired data. At this time, it is possible to search not only the images but also the text and/or voice data. In particular, a search through text and voice data can exclude advertising data, which makes it more useful than a search through general search words. That is, the text and voice data items may be selected so that only the images containing text and voice data are searched. The item to be searched may be selected not only from text and voice but also from objects in the image or metadata (shooting date and time, location, etc.), so that the stored data can be searched more narrowly.

FIG. 6 is a block diagram showing a configuration in which speech-recognized text data is inserted into an image in association with a subject. As shown in FIG. 6, the configuration for image insertion according to an embodiment of the present invention includes a voice acquisition unit 610, a voice identification unit 620, an object identification unit 630, and a voice/image matching unit 640.

Referring to FIG. 6, the voice acquiring unit 610 may acquire first voice data and second voice data separated through the voice separating unit. Then, the divided voice is provided to the voice identification unit 620.

The voice identification unit 620 identifies the voice of a specific subject and/or the voice of the photographer by comparing voice characteristics of the separated voice, such as its frequency, with the voice characteristic information stored in the voice database 625. The voice identification unit 620 can receive information on the separated voice data from the voice separation unit and use it for voice identification.

Basically, the speech analysis in the voice identification unit 620 is based on frequency analysis. Since a frequency spectrum is obtained by frequency analysis of the acquired voice data, this is also referred to as spectrum analysis. Because the perception of speech as linguistic sound is largely unaffected by differences in the phase spectrum, the power spectrum, which represents only amplitude, can be used. The speech waveform exhibits almost constant characteristics (called quasi-stationary) over a relatively short period (several tens to several hundreds of milliseconds), and its characteristics change over longer time intervals.

Therefore, in the spectrum analysis of a voice signal, it is preferable that the voice identification unit 620 performs a short-time spectral analysis over intervals in which the signal can be regarded as quasi-stationary. In addition to analysis by the Fourier transform, the frequency analysis may employ a filter bank method that uses the outputs of a plurality of band-pass filters having different center frequencies.
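A short-time power spectrum of the kind described above can be sketched as follows. This is an illustrative example only, not the disclosed implementation: the 20 ms frame length, the Hann window, the 8 kHz sample rate, and the function names are assumptions chosen for the demonstration, and a naive discrete Fourier transform is used for clarity rather than speed.

```python
import cmath
import math
from typing import List


def power_spectrum(frame: List[float]) -> List[float]:
    """Power spectrum of one Hann-windowed frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    windowed = [s * 0.5 * (1 - math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        spectrum.append(abs(acc) ** 2)  # amplitude only; phase is discarded
    return spectrum


def short_time_spectra(signal: List[float], sample_rate: int,
                       frame_ms: int = 20) -> List[List[float]]:
    """Split the signal into short frames and analyze each frame separately."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [power_spectrum(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, frame_len)]


sample_rate = 8000
# Test signal: a 1000 Hz tone lasting 60 ms (three 20 ms frames).
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(480)]
spectra = short_time_spectra(tone, sample_rate)
peak_bin = max(range(len(spectra[0])), key=spectra[0].__getitem__)
peak_hz = peak_bin * sample_rate / 160  # 160 samples per 20 ms frame at 8 kHz
```

Each frame is short enough to be treated as quasi-stationary, so the sequence of per-frame spectra tracks how the voice characteristics change over time.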

The process of analyzing the voice characteristic in the voice identification unit 620 and matching the analyzed result with a specific person will be described in more detail with reference to FIG.

FIG. 7 is a conceptual diagram for explaining a method of matching voice data having different voice characteristics with specific subjects in the image.

Referring to FIG. 7, the first voice data obtained by the voice identification unit may have a first voice characteristic, and the second voice data may have a second voice characteristic. It can be confirmed that the first voice characteristic corresponds to the person "A" stored in the voice database and that the second voice characteristic corresponds to the person "B" stored in the voice database.

That is, the voice database may basically store voice characteristic information for the user of the apparatus. Since the user of the apparatus is highly likely to be the photographer, it is preferable that the voice characteristic information of the photographer is stored in advance. In addition, voice information about people around the user who frequently appear in the user's photographs may be stored. This information may be stored in advance through a user-setting interface of the camera application for recording voice characteristics. Alternatively, when a voice input along with photographing according to an embodiment of the present invention does not match any previously stored voice characteristic, person information for the input voice may be entered and stored in association with that voice characteristic. The person information includes indication information showing whether the information is associated with the photographer. This is given in the form of a flag such that "0" represents the photographer and "1" represents a person other than the photographer (including the subject). Alternatively, "0" may indicate the photographer, "1" a known person other than the photographer, "2" a case where there is no corresponding person but the sex and/or age of the person can be distinguished, and "3" a case where no person-related information can be determined. The person information, which includes image information of the specific person, is used for matching with the object identified in the object identification unit.
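The four-valued flag described above can be modeled as a small enumeration. The following is a minimal sketch: the names `PersonFlag`, `PersonInfo`, and `build_person_info`, and the fields carried alongside the flag, are hypothetical and are introduced only to make the flag values concrete.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class PersonFlag(IntEnum):
    PHOTOGRAPHER = 0      # the voice belongs to the photographer
    OTHER_PERSON = 1      # a known person other than the photographer (incl. subjects)
    DEMOGRAPHIC_ONLY = 2  # no matching person, but sex and/or age can be distinguished
    UNKNOWN = 3           # no person-related information can be determined


@dataclass
class PersonInfo:
    flag: PersonFlag
    name: Optional[str] = None
    sex: Optional[str] = None
    age_group: Optional[str] = None


def build_person_info(matched_name: Optional[str], is_photographer: bool,
                      sex: Optional[str] = None,
                      age_group: Optional[str] = None) -> PersonInfo:
    """Choose the flag from the outcome of voice identification (assumed inputs)."""
    if matched_name is not None:
        flag = PersonFlag.PHOTOGRAPHER if is_photographer else PersonFlag.OTHER_PERSON
        return PersonInfo(flag, name=matched_name, sex=sex, age_group=age_group)
    if sex is not None or age_group is not None:
        return PersonInfo(PersonFlag.DEMOGRAPHIC_ONLY, sex=sex, age_group=age_group)
    return PersonInfo(PersonFlag.UNKNOWN)


photographer = build_person_info("A", is_photographer=True)
stranger = build_person_info(None, is_photographer=False, sex="female", age_group="teens")
unknown = build_person_info(None, is_photographer=False)
```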

The voice database including the voice characteristic information and the corresponding person information may be implemented as a local storage in the apparatus or as a large-capacity database interworking with the server. In particular, in the case of a large-capacity database interworking with a server, the voice identification information is extracted by the device and provided to the server, and the information related to the corresponding person can be obtained from the server. The voice database can accumulate more voice characteristic information and corresponding person information as the device continues to be used for photographing. In addition, since the server obtains voice characteristic information and corresponding person information from a plurality of terminals, it can accumulate a very large amount of such information.

Voice data whose voice characteristic corresponds to the previously stored voice characteristic of a specific person is identified as that of the specific person, and identification information is assigned to the voice data. This identification information relates to the specific person and is distinct from the association information with the text described above.

In addition, the voice identification unit includes an algorithm for distinguishing whether the input voice is a male voice, a female voice, or the voice of a person of a certain age when there is no voice data matching a specific person. This can be done by using the typical vocal range of men and women and the typical vocal range of people of a certain age. In addition, the voice identification algorithm in the voice identification unit and/or in a server cooperating with the voice identification unit may generate a training data set from the voice characteristic information accumulated in real time and the corresponding person information (including the sex and age of the person) and may continue to be machine-learned. The training is based on a deep-learning algorithm. The person information given to the voice data by the voice identification unit through the above process may further include the sex and age information of the voice.
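Distinguishing voices by vocal range can be illustrated with a simple pitch estimate. The sketch below substitutes a plain autocorrelation pitch estimator and a single fixed boundary for the trained algorithm described above; the 165 Hz boundary, the 60 to 400 Hz search range, and the function names are illustrative assumptions, and real voices overlap across any such boundary.

```python
import math
from typing import List


def estimate_f0(signal: List[float], sample_rate: int,
                f_min: float = 60.0, f_max: float = 400.0) -> float:
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        value = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


def rough_voice_range(f0: float, boundary_hz: float = 165.0) -> str:
    """Label a pitch by a single boundary (an illustrative assumption only)."""
    return "male-range" if f0 < boundary_hz else "female-range"


sample_rate = 8000
low_tone = [math.sin(2 * math.pi * 120 * n / sample_rate) for n in range(1024)]
high_tone = [math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(1024)]
low_f0 = estimate_f0(low_tone, sample_rate)
high_f0 = estimate_f0(high_tone, sample_rate)
low_label = rough_voice_range(low_f0)
high_label = rough_voice_range(high_f0)
```

A fixed boundary like this is only a starting point; the continued training on accumulated voice characteristics and person information is what would refine the distinction in practice.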

Referring back to FIG. 6, after the person identification information for specific voice data is obtained in the voice identification unit 620, the object identification unit 630 analyzes the objects existing in the image using an object recognition algorithm. Basically, it concentrates on the portions of the image that show people. The object identification unit 630 interfaces with the object database 635. The object database 635 may also be implemented as a local storage in the device and/or a large-capacity database interworking with a server.

The object database 635 stores image information associated with specific persons and specific objects, together with the corresponding person and object information. For example, an image of the person "A" (which may include the face and other parts such as the arms and legs) may be stored together with information related to the person "A", such as sex. That is, the objects included in the acquired image are analyzed and collated with the images in the object database, and if there is a corresponding image, the corresponding person information is acquired. Alternatively, for a non-person object, the database may hold an image of the object (e.g., a building, a bridge, etc.) and the information corresponding to it. Such information can be accumulated continuously as photographs are taken.

The object database 635 and the voice database 625 can be interlocked. That is, the information for the same person (image information, audio information, and/or person/object information) can be shared and accumulated together. Alternatively, the two can be implemented as a single database.

The voice/image matching unit 640 acquires the person and/or object information of the object from the object identification unit 630, acquires the person information obtained through voice data identification from the voice identification unit 620, and compares the two. When the comparison shows that they are the same person, the voice data is associated with the corresponding subject.

As a result of the association, the text associated with specific voice data is acquired from the text conversion unit based on the text-voice association information (first association information) and can be arranged around the associated subject using the voice-subject association information (second association information). That is, when the first voice data is identified as the person "A" and the first subject in the image is identified as the person "A", the first text obtained from the first voice data is associated with the first subject. Likewise, when the second voice data is identified as the person "B" and the second subject in the image is identified as the person "B", the second text obtained from the second voice data is associated with the second subject. When a user input is detected on the text arranged in the vicinity of a subject, the voice data associated with the text is fetched and output. For example, when the second text around the second subject is clicked, the second voice data is output, reproducing what the second subject said at the time of photographing.
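The two-step association described above (text to voice, then voice to subject) can be sketched as a lookup on the identified person. This is a minimal illustration; the record layout and the identifiers such as `text-1` and `subject-1` are hypothetical.

```python
from typing import Dict, List, Optional


def match_texts_to_subjects(voices: List[dict],
                            subjects: List[dict]) -> Dict[str, Optional[str]]:
    """Map each text id to the subject identified as the same person, or None."""
    subject_by_person = {s["person"]: s["subject_id"]
                         for s in subjects if s.get("person")}
    placement = {}
    for voice in voices:
        # First association: the text belongs to this voice data.
        # Second association: the voice and the subject are the same person.
        placement[voice["text_id"]] = subject_by_person.get(voice.get("person"))
    return placement


voices = [
    {"voice_id": "voice-1", "text_id": "text-1", "person": "A"},
    {"voice_id": "voice-2", "text_id": "text-2", "person": "B"},
    {"voice_id": "voice-3", "text_id": "text-3", "person": None},  # unidentified voice
]
subjects = [
    {"subject_id": "subject-1", "person": "A"},
    {"subject_id": "subject-2", "person": "B"},
]
placement = match_texts_to_subjects(voices, subjects)
```

Texts that map to `None` have no identified subject and would fall back to a predetermined or manually chosen position.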

In addition, if the person information obtained from the voice data does not clearly identify a specific person but identifies the speaker as a teenage female, and the object analysis identifies a subject as a teenage female, the voice can be matched with that subject and the text converted from the voice data can be arranged around the teenage female subject. Thus, the age and sex information obtained from the voice data can be matched in an optimal manner with the subject information obtained by object analysis. It is also possible to analyze the age and sex of a person, and the corresponding voice tone, by object analysis.

When voice data is identified as that of the photographer and the photographer appears in the photograph as a specific subject, the text is matched with that subject and arranged around it. If the photographer is not in the photograph, the text can be placed at a predetermined position, either in association with a subject or regardless of the subjects.

FIGS. 8A and 8B are conceptual diagrams illustrating a process in which text is arranged at an arbitrary position in the image according to the automatic mode and the manual mode.

Referring to FIG. 8A, the device can use the automatic mode to place text around an associated subject. Because the first text 810 is associated with the first subject 812, it is automatically placed around the first subject 812. Likewise, the second text 820 is associated with the second subject 822 and is therefore automatically placed around the second subject 822. The peripheral area in which the text is placed may be predetermined by user setting as the upper or lower side and/or the left or right side of the subject. It is also possible to analyze other objects around the subject in the object identification unit and to arrange the text in the most suitable position in relation to those objects. That is, even if the text is set to be placed above the subject, if another object (e.g., a building, the sun, etc.) exists above the subject, the text can be placed to the left or right of the subject instead.
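The fallback behavior of the automatic mode can be sketched with bounding boxes. The following is only an illustration under stated assumptions: boxes are axis-aligned `(left, top, width, height)` tuples, the preference order of sides is a hypothetical user setting, and image-boundary checks are omitted for brevity.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height) in image pixels


def overlaps(a: Box, b: Box) -> bool:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def candidate_box(subject: Box, size: Tuple[int, int], side: str) -> Box:
    """Text box adjacent to the subject on the given side."""
    sx, sy, sw, sh = subject
    tw, th = size
    if side == "top":
        return (sx, sy - th, tw, th)
    if side == "bottom":
        return (sx, sy + sh, tw, th)
    if side == "left":
        return (sx - tw, sy, tw, th)
    return (sx + sw, sy, tw, th)  # "right"


def place_text(subject: Box, text_size: Tuple[int, int], obstacles: List[Box],
               preferred: Sequence[str] = ("top", "left", "right", "bottom")
               ) -> Optional[Box]:
    """Try each side in order and return the first box clear of other objects."""
    for side in preferred:
        box = candidate_box(subject, text_size, side)
        if not any(overlaps(box, obstacle) for obstacle in obstacles):
            return box
    return None


subject = (100, 100, 50, 80)
building = (90, 60, 80, 40)  # another object directly above the subject
blocked_top = place_text(subject, (60, 20), [building])
free_top = place_text(subject, (60, 20), [])
```

With the building above the subject, the preferred top position is rejected and the text moves to the left side; without it, the top position is used.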

Referring to FIG. 8B, the first text 830 and the second text 840, which are separated from each other, are manually placed at specific positions in the image through user inputs 834 and 844 for the respective texts. The texts do not necessarily have to be located around the associated subjects 832 and 842; the user can determine their positions arbitrarily.

According to another embodiment of the present invention, the apparatus (or server) generates a training data set from the positional relationship between the text layout area and the subject and/or the positional relationship between the text layout area and other objects around the subject, so that the optimized insertion position in the automatic mode can be learned through a deep learning algorithm. This allows the text insertion position in the automatic mode to be adapted to the preference of the user (or of the plurality of members connecting to the server).

FIG. 9 is a block diagram specifically illustrating a structure for determining an insertion position according to the meaning of the recognized text. As shown in FIG. 9, the structure for determining the text insertion position according to an embodiment of the present invention may include a semantic analysis unit 910 and an insertion position determination unit 920. This may be a component included in the image combining unit of FIG.

Referring to FIG. 9, the semantic analysis unit 910 acquires the recognized text information from the text conversion unit and performs semantic analysis based on the words stored in the word database 912. This can be done through parsing.

Then, the analyzed semantic information is provided to the insertion position determining unit 920. The insertion position determination unit 920 determines the insertion position based on the meaning of the text. That is, the positional relationship according to a specific meaning is stored in advance, and the insertion position corresponding to the input text is appropriately determined.

The insertion position determination unit 920 arranges text having a meaning related to a person around that person. For example, it is preferable that a person's name, such as "Emily", or words indicating specific parts of a person be arranged around the corresponding person.

Further, text having a meaning associated with a relationship between persons is arranged between the portrait subjects. For example, words such as "I love you", "I like you", "I hate you", "I love you" can be placed between two or more people or in a central location.

In addition, text having another specific meaning may be set to be placed at the center, at the left or right, or at the upper or lower outermost portion of the entire image area, without considering the arrangement of the subjects.

In particular, the result of the semantic analysis can be used to insert stickers for decorating the subject and the photographic image. For example, text such as "I love you" can be displayed in the image together with a heart-shaped sticker such as "♡". That is, text having a specific meaning and the corresponding sticker are stored in advance, so that the sticker matching the semantic analysis result of the text can be displayed together with the text in the image.
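The pre-stored mapping from meaning to insertion position and sticker can be sketched as a lookup table. Note that the example below replaces real semantic analysis with simple keyword matching, and the rule table, category names, and default are hypothetical; it only illustrates how a stored rule could drive both placement and sticker selection.

```python
from typing import Dict, Optional, Tuple

# Hypothetical pre-stored rules: keyword -> (placement category, optional sticker).
PLACEMENT_RULES: Dict[str, Tuple[str, Optional[str]]] = {
    "emily": ("around_person", None),        # a person's name
    "i love you": ("between_persons", "♡"),  # relationship between persons
    "i like you": ("between_persons", None),
}


def decide_insertion(text: str) -> Tuple[str, Optional[str]]:
    """Return (placement category, sticker) for a recognized text."""
    normalized = text.strip().lower()
    for keyword, rule in PLACEMENT_RULES.items():
        if keyword in normalized:
            return rule
    # Text with no stored rule is placed without regard to the subjects.
    return ("image_edge", None)


love = decide_insertion("I love you")
name = decide_insertion("Emily")
other = decide_insertion("What a view")
```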

FIG. 10 is a conceptual diagram for explaining automatic generation of a hashtag.

Referring to FIG. 10, the device may automatically convert the metadata, voice files, and text associated with a photo into hashtags. In general, SNS platforms have the disadvantage that retrieval accuracy is very low, because photographs are registered with indiscriminately chosen hashtags and searches therefore return a large amount of advertising content. Accordingly, the photo sharing apparatus according to the embodiment of the present invention can automatically convert the metadata of the photograph, such as the shooting date and time and the shooting place, into hashtags. In addition, the text and audio information is automatically converted into hashtags.

According to an embodiment of the present invention, the device extracts a specific object in the image and converts the object into a hashtag. For example, when "XX cafe" is displayed on a signboard attached to a specific building in the image, the object identification unit described above extracts "XX cafe", and a corresponding hashtag can be automatically generated.

In addition, the retrieval accuracy can be improved by generating tags that combine metadata, such as the photographing date and time, photographing location, and photographing device, with the text, the voice, and/or the object information in the image.
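Automatic hashtag generation from metadata, recognized text, and extracted objects can be sketched as follows. This is a minimal illustration; the metadata keys, the normalization rule (dropping spaces and punctuation), and the sample values are assumptions made for the example.

```python
import re
from typing import Iterable, List


def to_hashtag(value: str) -> str:
    """Strip spaces and punctuation and prefix '#'; empty input yields ''."""
    cleaned = re.sub(r"[^\w]+", "", value)
    return "#" + cleaned if cleaned else ""


def build_hashtags(metadata: dict, texts: Iterable[str],
                   objects: Iterable[str]) -> List[str]:
    """Combine metadata, recognized text, and extracted objects into unique tags."""
    sources = [metadata.get("date", ""), metadata.get("place", ""), *texts, *objects]
    tags: List[str] = []
    for source in sources:
        tag = to_hashtag(source)
        if tag and tag not in tags:
            tags.append(tag)
    return tags


# Sample values for illustration only.
tags = build_hashtags(
    {"date": "2018-08-10", "place": "Seoul"},
    texts=["I love you"],
    objects=["XX cafe"],
)
```

Because every tag is derived from what was actually captured, said, or shown in the photo, tags of this kind are harder to abuse for unrelated advertising than freely typed ones.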

FIG. 11 is a conceptual diagram for explaining emotional text drawing.

Referring to FIG. 11, in the photo sharing apparatus according to an embodiment of the present invention, when text is output on an image, the text may be reproduced as if it were being written by hand (dictation format). To this end, it is preferable to store the output order of the plurality of characters constituting the text, the output order of the plurality of strokes contained in each character, and drawing information from the output start point to the output end point of each stroke, and then to reproduce the text in the dictation format from its first character to its final character based on this information. That is, it is preferable that the characters are recognized from the left side of the text and that the order is determined so that the leftmost character is output first. In the case of "사랑해" ("I love you"), "사" is output first, then "랑", and then "해". Then, the strokes of each character are written based on Korean stroke-order information. In the case of "사", "ㅅ" and "ㅏ" are written, and the strokes "/", "\", "ㅣ", and "-" are output in order. Each stroke is drawn from the top left to the bottom right. Such emotional drawing may be implemented through multiple frames, as in animation, so that the text portion appears to be drawn. That is, it can be reproduced in the form of a moving picture such as a GIF file.
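The playback order for emotional text drawing can be sketched for the Korean phrase 사랑해 ("I love you"). Decomposing a Hangul syllable into its letters follows the standard Unicode arithmetic for precomposed syllables; the stroke table, however, is a hypothetical fragment covering only the letters of 사, and the function names are introduced here for illustration.

```python
from typing import Iterator, List, Tuple

CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"   # 21 vowels
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

# Hypothetical stroke table: only the entries needed for this example.
STROKES = {"ㅅ": ["/", "\\"], "ㅏ": ["ㅣ", "-"]}


def decompose(syllable: str) -> List[str]:
    """Split a precomposed Hangul syllable (U+AC00..U+D7A3) into its letters."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 11172:
        return [syllable]  # not a Hangul syllable: draw as a single unit
    initial, rest = divmod(index, 21 * 28)
    medial, final = divmod(rest, 28)
    letters = [CHOSEONG[initial], JUNGSEONG[medial]]
    if JONGSEONG[final]:
        letters.append(JONGSEONG[final])
    return letters


def drawing_sequence(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (character, letter, stroke) in playback order, left to right."""
    for char in text:
        for letter in decompose(char):
            # Letters without a table entry are drawn as one stroke.
            for stroke in STROKES.get(letter, [letter]):
                yield (char, letter, stroke)


frames = list(drawing_sequence("사랑해"))
first_char_strokes = [stroke for char, _, stroke in frames if char == "사"]
```

Each yielded tuple could correspond to one or more animation frames, with the stroke itself interpolated from its start point to its end point.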

However, such emotional text drawing is not always executed, but can be changed through user setting.

According to another embodiment of the present invention, a file in which an image and text are synthesized can be reproduced so that only an image is output first, and text is output thereon at once.

FIG. 12 is a block diagram illustrating a system for storing data based on a block chain according to an embodiment of the present invention. The system for storing data based on a block chain according to an embodiment of the present invention includes a user terminal 1210, an authentication information issuing server 1220, a block chain-based data management server 1230, and a block chain data holding server 1240.

Referring to FIG. 12, a block-chain is a technology for securely recording and storing transaction contents on network communication, as is known. Transaction contents are recorded in each block, which forms a chain over time, and these chains are distributed and stored on the P2P network to form a block-chain network.
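The way blocks form a chain over time can be sketched with a minimal hash chain. This is an illustration only: SHA-256 and the JSON block layout are assumptions, and the distribution of the chain over a P2P network is not modeled.

```python
import hashlib
import json
from typing import List


def block_hash(block: dict) -> str:
    """Hash of a block's full contents (SHA-256 is an assumption for this sketch)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[dict], transactions: List[str]) -> None:
    """Record transactions in a new block linked to the previous block's hash."""
    previous = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain),
                  "previous_hash": previous,
                  "transactions": transactions})


def is_valid(chain: List[dict]) -> bool:
    """Every block must reference the hash of the block before it."""
    return all(chain[i]["previous_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))


chain: List[dict] = []
append_block(chain, ["tx-a"])
append_block(chain, ["tx-b"])
valid_before = is_valid(chain)

chain[0]["transactions"] = ["tampered"]  # altering an old block breaks the link
valid_after = is_valid(chain)
```

Because each block commits to the hash of its predecessor, changing recorded transaction contents invalidates every later link, which is what makes the record tamper-evident.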

Referring to FIG. 12, the terminal 1210 generates a public key and a private key, and transmits to the authentication information issuing server 1220 the public key together with the user information for issuing block-chain-based authentication information, which includes the user's identification information required for issuing the block-chain-based authentication information. To this end, the terminal 1210 may include a key generation engine and an encryption/decryption engine. The user information for issuing the block-chain-based authentication information may include at least a part of a user name, a user registration number, a user telephone number, and a user email.

The terminal 1210 can check whether the user using the terminal 1210 has registered the identification information of the user in the authentication information issuing server 1220 before generating the public key and the private key. The terminal 1210 transmits the block-chain-based authentication information issuing user information to the authentication information issuing server 1220 to request issuance of the block-chain-based authentication information.

The authentication information issuing server 1220 matches the user information for issuing the block-chain-based authentication information against the user identification information database (not shown) for each member, and if matching information exists, it generates a key generation guide signal that guides the generation of the public key and the private key and transmits the signal to the terminal 1210. If there is no matching information, the authentication information issuing server 1220 can transmit a message indicating that the authentication information cannot be issued.

Specifically, when the authentication information issuing server 1220 acquires the identification information of the specific user as the issuing request for the authentication information from the terminal 1210, the authentication information issuing server 1220 confirms whether the identification information of the specific user is registered. When the identification information of the specific user is registered, the authentication information issuing server 1220 generates the key generation guide signal to support the terminal 1210 to generate the public key and the private key of the specific user.

Upon receiving the key generation guide signal from the authentication information issuing server 1220, the terminal 1210 executes a key generation engine (not shown) to generate a public key and a private key. At this time, the terminal 1210 preferably generates the public key and the private key while the network is disconnected, so that any possible leakage of the keys is blocked in advance.

The terminal 1210 operates an encryption / decryption engine (not shown) to encrypt the private key based on the password and / or image designated by the user and stores the encrypted private key in the local storage (not shown). Accordingly, even if the user's private key is leaked, the information can be read only by knowing the password and image designated by the user, thereby enhancing the security. Terminal 1210 outputs a notification to reconnect the network once the encrypted private key is stored, and the user can connect to the network.

The authentication information issuing server 1220 may have a database linked thereto. This database stores identification information of the user who operates the terminal 1210, and includes a per-member user identification information database that stores user identification information identical to the user information for issuing the block chain-based authentication information.

The authentication information issuing server 1220 receives the public key and the user information for issuing the block chain-based authentication information from the terminal 1210, performs a hash operation on the user information for issuing the block chain-based authentication information, and processes it into user identification hash information.

The authentication information issuing server 1220 collects the user identification hash information, the public key, and designated user identification information (that is, a designated item among the identification information of the user constituting the user information for issuing the block chain-based authentication information), and transmits them to the block chain-based data management server 1230.

The block chain-based data management server 1230 can perform transaction creation and transmission operations according to whether the user's identification information is registered. Here, the designated user identification information may include the telephone number of the user. To this end, the authentication information issuing server 1220 may include a hash processing engine (not shown). As described above, the hash processing engine performs a function of hashing the user information for issuing the block-chain-based authentication information and processing the user information into the user identification hash information.

When the identification information of the user is acquired in response to the issuance request for the authentication information from the terminal 1210, it is determined whether the identification information of the user is registered. If the identification information is registered, a transaction is generated whose output is a hash value of the public key and the identification information, or a value obtained by processing that hash value, and the transaction is transmitted to the block chain. The transmission to the block chain may be made to the block chain data holding server 1240. To this end, the block chain-based data management server 1230 can look up the specific user's identification information in the database. The block chain-based data management server 1230 can acquire and store a transaction ID indicating the location at which the transaction is recorded on the block chain, and can hash the user identification hash information together with the transaction ID to produce user verification hash information.
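The chain of hash values described above might look like the following minimal sketch. SHA-256, the JSON canonicalization, the way the transaction ID is derived, and all function names are assumptions made for illustration; the source names neither a hash function nor a transaction format, and on a real block chain the transaction ID would come from the network rather than from this helper.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def user_identification_hash(user_info: dict) -> str:
    # Canonical JSON so that key order does not change the hash.
    canonical = json.dumps(user_info, sort_keys=True, ensure_ascii=False)
    return sha256_hex(canonical.encode("utf-8"))


def build_transaction(public_key: str, identification: str) -> dict:
    # Output: hash of the public key together with the identification info.
    tx = {"output": sha256_hex((public_key + identification).encode("utf-8"))}
    # Hypothetical transaction ID: hash of the transaction body.
    tx["txid"] = sha256_hex(json.dumps(tx, sort_keys=True).encode("utf-8"))
    return tx


def user_verification_hash(id_hash: str, txid: str) -> str:
    # Hash of the user identification hash combined with the transaction ID.
    return sha256_hex((id_hash + txid).encode("utf-8"))
```

Because the verification hash depends on the transaction ID, it ties the user's identification hash to one specific recorded transaction.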

The block chain-based data management server 1230 performing such a function may be a server of a company that requires authentication to be performed when using the service.

Each of the block chain data holding servers 1240 is a member (node) of the block chain network. This may be a configuration corresponding to the terminals 295-1 to 295-N of FIG. Each block chain data holding server 1240 stores the block chain containing the transactions corresponding to the user information. When a new transaction is received, the server verifies it, records the transaction information, and at the same time propagates the transaction, which carries the packaged image, text, and voice data according to an embodiment of the present invention (hereinafter referred to as "image/text packaging information"), to the other block chain data holding servers 1240.

More specifically, the propagation of the transaction corresponding to the image/text packaging information is governed by a communication protocol. When a new transaction is created, one node (here, a block chain data holding server 1240) propagates it to a designated number of nodes (for example, eight), and each node that receives the transaction in turn propagates it to its own designated nodes. Through this repeated, pyramid-style propagation, the transaction reaches all of the block chain data holding servers 1240, which completes the processing. The underlying cryptocurrency may be Bitcoin or another cryptocurrency such as Ethereum. As such, transactions written to the block chain cannot be forged or falsified.
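The pyramid-style propagation can be illustrated with a small simulation. How each node chooses its designated peers is not stated in the source; the sketch below assumes a simple tree layout relative to the originating node, and the names `designated_peers` and `propagate` are hypothetical.

```python
from collections import deque
from typing import Dict, List


def designated_peers(node: int, origin: int, n_nodes: int, fanout: int) -> List[int]:
    # Assumed layout: nodes are ranked relative to the origin, and the node of
    # rank r forwards to ranks fanout*r + 1 .. fanout*r + fanout.
    rank = (node - origin) % n_nodes
    ranks = range(fanout * rank + 1, fanout * rank + fanout + 1)
    return [(origin + r) % n_nodes for r in ranks if r < n_nodes]


def propagate(origin: int, n_nodes: int, fanout: int = 8) -> Dict[int, int]:
    """Return, for every node reached, the number of hops from the origin."""
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for peer in designated_peers(node, origin, n_nodes, fanout):
            if peer not in hops:  # each node records the transaction once
                hops[peer] = hops[node] + 1
                queue.append(peer)
    return hops
```

With 100 holding servers and a fan-out of eight, every server in this simulated layout receives the transaction within three hops, which shows why repeated fan-out reaches the whole network quickly.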

As described above, the system for storing data on a block chain basis according to an embodiment of the present invention records image / text packaging information in a block chain holding server 1240 in a block chain form. In addition, the system may record transmission / reception history, search history, and / or payment history information of packaging information transmitted between a terminal and a plurality of terminals in the block chain holding servers 1240.

The block chain-based data management server 1230 performs information management tasks, including addition, transfer, and deletion of the information recorded in the block chain holding servers 1240, based on the approval of the block chain holding servers 1240.

The image / text packaging information recorded in the block chain holding servers 1240 includes image, text, voice data (or link information for voice data), and metadata.
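One possible in-memory shape for such a package is sketched below. The class name, field names, the rule that exactly one of embedded voice data or a voice link must be present, and the digest layout are all assumptions for illustration; the source only lists the four kinds of content.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PackagingInfo:
    image: bytes
    text: str
    voice_data: Optional[bytes] = None   # embedded voice recording, or
    voice_link: Optional[str] = None     # link to separately stored voice data
    metadata: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumption: a package carries either the voice data or a link, not both.
        if (self.voice_data is None) == (self.voice_link is None):
            raise ValueError("provide exactly one of voice_data or voice_link")

    def digest(self) -> str:
        """SHA-256 over all parts, each length-prefixed to avoid ambiguity."""
        if self.voice_data is not None:
            voice = self.voice_data
        else:
            voice = self.voice_link.encode("utf-8")
        meta = json.dumps(self.metadata, sort_keys=True).encode("utf-8")
        h = hashlib.sha256()
        for part in (self.image, self.text.encode("utf-8"), voice, meta):
            h.update(len(part).to_bytes(8, "big"))
            h.update(part)
        return h.hexdigest()
```

A digest of this kind is one way the package could be reduced to a single hash value before being placed in a transaction.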

According to an embodiment of the present invention, when there is a request to record image/text packaging information, the terminal 1210 generates a public key and a private key through the authentication information issuing server 1220 and transmits them to the block chain-based data management server 1230. The block chain-based data management server 1230 checks whether the public key and the private key received from the terminal 1210 are registered, processes the image/text packaging information that the terminal 1210 has requested to record into a hash value to generate a transaction for information recording, and transmits the generated transaction to the block chain holding servers 1240 for approval.

Referring to FIG. 13, the terminal first requests the block chain-based data management server to record the photo information (image/text packaging information) (S1310). After confirming whether the public key and the private key received from the terminal are registered, the block chain-based data management server generates a hash value and creates a transaction block for information recording (S1320), and transmits the generated transaction block to the block chain holding servers (S1330). At this time, the propagation of the transaction is governed by a communication protocol: when a new transaction is generated, one node propagates it to a designated number of nodes, and each node that receives the transaction information repeats the propagation to its own designated nodes, so that the transaction information reaches all block chain data holding servers through pyramid-style propagation. The transaction information may be transaction information for Bitcoin payment; Ethereum or another cryptocurrency may also be used. When all the block chain holding servers approve the transaction block (S1340), the transaction block is added (S1350), and the recording of the photo information requested by the terminal is completed (S1360).
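The S1310 to S1360 flow can be walked through with a toy model. Everything concrete here is an assumption for illustration: the block layout, the approval rule (each holding server checks the previous-block hash and the block hash), and the names `HoldingServer` and `record_photo_info`. The sketch also checks only the public key against the registry, which is a simplification of the registration check described in the text.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List, Set

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(block: dict) -> str:
    body = {k: block[k] for k in ("prev_hash", "payload_hash", "public_key")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class HoldingServer:
    chain: List[dict] = field(default_factory=list)

    def tip(self) -> str:
        return self.chain[-1]["block_hash"] if self.chain else GENESIS

    def approve(self, block: dict) -> bool:
        # S1340: approve only a well-formed block that extends this server's chain.
        return block["prev_hash"] == self.tip() and block["block_hash"] == block_hash(block)


def record_photo_info(payload: bytes, public_key: str,
                      registered_keys: Set[str], servers: List[HoldingServer]) -> bool:
    # S1310: the terminal's recording request arrives with its payload and key.
    # S1320: check registration, then hash the payload into a transaction block.
    if public_key not in registered_keys:
        return False
    block = {
        "prev_hash": servers[0].tip(),
        "payload_hash": hashlib.sha256(payload).hexdigest(),
        "public_key": public_key,
    }
    block["block_hash"] = block_hash(block)
    # S1330 and S1340: send the block to every holding server and collect approvals.
    if not all(server.approve(block) for server in servers):
        return False
    # S1350: every server appends the approved block.
    for server in servers:
        server.chain.append(block)
    return True  # S1360: recording completed
```

In this model a single holding server whose chain has diverged is enough to block the recording, mirroring the requirement that all holding servers approve the transaction block.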

According to an embodiment of the present invention, a cryptocurrency such as Bitcoin may be generated upon the occurrence of a transaction associated with photo information, or upon other users' sharing requests for a particular transaction. In other words, cryptocurrency can be earned according to the reputation gained through information sharing. In addition, the block chain-based platform includes sharing platforms such as an SNS. That is, a system in which a photo sharing platform such as Facebook or Instagram is operated in the open block chain manner may be a system to which the photo sharing method according to an embodiment of the present invention is applied.

Additionally, in addition to the open block chain method described above, a closed block chain method can also be applied to the photo sharing system according to an embodiment of the present invention.

The system or apparatus described above may be implemented as a hardware component, a software component, and/or a combination of hardware components and software components. For example, the systems, devices, and components described in the embodiments may be implemented using one or more processing devices such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array, a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may execute an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device may be described as being used singly, but those skilled in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may comprise a plurality of processors, or one processor and one controller. Other processing configurations, such as a parallel processor, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of the foregoing, and may configure the processing device to operate as desired or may command the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions that can be executed through various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be those specially designed and configured for the embodiments, or may be known and available to those skilled in the art of computer software. Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine language code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. For example, appropriate results may be achieved even if the described techniques are performed in a different order than the described methods, and/or if components of the described systems, structures, devices, circuits, and the like are combined in a form different from that described, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims are also within the scope of the following claims.

Claims

A speech recognition-based photo sharing method, comprising:

Acquiring an image of a photograph as the photograph is taken through a camera;

Acquiring voice data associated with the acquired image;

Recognizing the acquired voice data to generate text;

Associating and storing the acquired image, the acquired voice data, and the generated text; and

Outputting the stored image together with at least one of the stored voice data and the stored text.
The method according to claim 1,

Wherein the acquired image is obtained from at least one of a photograph currently being taken and a photograph taken and stored before the current point in time.
The method according to claim 1,

Wherein the associating and storing of the obtained image, the obtained voice data, and the generated text comprises

Storing the acquired image, the obtained voice data, and information associated with the generated text in a server.
The method of claim 3,

Further comprising searching for data stored in the server based on at least one of the voice data and the text.
5. The method of claim 1, wherein associating and storing the obtained image, the obtained voice data, and the generated text comprises:

Inserting the text into the image,

Wherein the text is inserted into a first layer that is the same layer as the image or into a second layer that is different from the layer of the image.
6. The method of claim 5, wherein insertion of the text into the first layer comprises:

Inserting the text into an arbitrary area on the image;

Identifying a first area in which the text is embedded; And

Generating an image in which the text is embedded as an image file,

Wherein the image file is associated with identification information for the first area.
The method according to claim 6,

Wherein the step of generating the image in which the text is embedded as an image file comprises scanning the image in which the text is embedded to generate the image file.
The method according to claim 6,

Wherein when the text is inserted into the first layer, the stored voice data is output corresponding to a user input for the identified first area.
9. The method of claim 5,

When the text is inserted into the second layer,

Wherein the stored voice data is output corresponding to a user input for the text of the second layer.
The method according to claim 1,

Wherein the stored voice data is packaged and stored with the image and the text.
The method according to claim 1,

Wherein the voice data is stored in a separate repository, and

The image and the text are packaged together with link information to the repository of the voice data.
The method according to claim 1,

Wherein the associated voice data comprises at least one of voice data associated with a photographer present outside a first space associated with photographing and voice data associated with a subject present in the first space.
13. The method of claim 1, wherein associating and storing the obtained image, the obtained voice data, and the generated text comprises:

Separating the acquired voice data, which includes first voice data having a first voice characteristic and second voice data having a second voice characteristic, into the first voice data and the second voice data through voice analysis.
14. The method of claim 13,

Recognizing the separated first voice data to generate a first text,

Recognizing the separated second voice data to generate a second text,

Wherein the first text and the second text are associated with the first voice data and the second voice data, respectively.
15. The method of claim 14,

Wherein the first text is located at a location on the stored image according to a first input of a user,

Wherein the second text is located at a location on the stored image according to a second input of the user.
16. The method of claim 14, wherein associating and storing the obtained image, the obtained voice data, and the generated text comprises:

Recognizing a first subject and a second subject included in the image by applying an object recognition algorithm to the image;

Associating the first subject included in the image with the first text; and

Associating the second subject included in the image with the second text.
17. The method of claim 16,

Wherein the first text is disposed around the first subject,

And wherein the second text is disposed around the second subject.
The method according to claim 1, wherein associating and storing the obtained image, the obtained voice data, and the generated text comprises:

Comparing voice characteristic information associated with the obtained voice data with voice characteristic information previously stored in a voice database to identify the voice data.
The method according to claim 1,

Wherein the position of the text is determined in one of:

A first mode in which the text is automatically placed at at least one of a region of the image, a pre-designated position, and a position determined according to an image analysis result; and

A second mode in which the text is placed according to user input.
20. The method of claim 19,

Further comprising analyzing the meaning of the text,

Wherein the text is automatically placed in an area corresponding to a semantic analysis result when operating in the first mode.
21. The method of claim 20,

Wherein text having a first meaning is placed in an area associated with a subject in the image, and

Text having a second meaning is placed in a predetermined region of the entire image region, regardless of the subject.
The method according to claim 1,

Wherein, when the stored image is registered in a social network service (SNS), a hashtag is automatically generated based on at least one of the image, the voice data, the text, and metadata associated with the image.
The method according to claim 1,

Wherein, when the stored image is registered in a social network service (SNS), a first object is extracted from the image, and a hashtag is automatically generated and registered based on information about the first object.
The method according to claim 1,

Wherein, in outputting the text,

The text is reproduced in the form of dictation, from the first character to the last character of the text,

Based on the output order of the plurality of characters constituting the text, the output order of the plurality of strokes included in each of the plurality of characters, and drawing information from the output start point to the output end point of each of the plurality of strokes.
The method according to claim 1,

Wherein the associating and storing of the obtained image, the obtained voice data, and the generated text comprises

Recording the acquired image, the obtained voice data, and information associated with the generated text in a block chain.
26. The method of claim 25,

Wherein, when there is a request to record the acquired image, the obtained voice data, and the information associated with the generated text in block chain form,

A public key and a private key are generated through an authentication information issuing server and are transmitted to a block chain-based data management server together with the acquired image, the obtained voice data, and the information associated with the generated text.
27. The method of claim 26,

Wherein the public key and the private key are used by the block chain-based data management server to confirm whether they are registered,

The acquired image, the acquired voice data, and the information associated with the generated text are processed into a hash value to generate a transaction for information recording, and

The generated transaction is delivered to the block chain holding servers and approved by them.
A speech recognition-based photo sharing apparatus, comprising:

An information acquiring unit that acquires an image of a photograph as the photograph is taken through a camera, and acquires voice data associated with the acquired image;

A text conversion unit that recognizes the acquired voice data and generates text;

A data storage unit that associates and stores the acquired image, the acquired voice data, and the generated text; and

A data output unit that outputs the stored image together with at least one of the stored voice data and the stored text.
A speech recognition-based photo sharing system, comprising:

A user terminal that acquires an image of a photograph and voice data associated with the image, recognizes the acquired voice data to generate text, stores the image, the voice data, and the text in association with each other, and requests that the image, the voice data, and the text be recorded in block chain form;

A plurality of block chain holding servers that record the image, the voice data, and the text generated in the user terminal in block chain form; and

A block chain-based data management server that performs a block chain management task, including at least one of adding, transferring, and deleting block chain information recorded in the block chain holding servers, based on the approval of the plurality of block chain holding servers.
30. The system of claim 29,

Wherein the block chain-based data management server stores at least one of download information and settlement information related to the image, the voice data, and the text transmitted between a first user terminal and a second user terminal.
31. The system of claim 29,

Wherein the user terminal generates a public key and a private key through an authentication information issuing server and transmits the public key and the private key to the block chain-based data management server, and

The block chain-based data management server checks whether the public key and the private key received from the user terminal are registered, generates a transaction for information recording by processing the image, the voice data, and the text requested by the user terminal into a hash value, and forwards the generated transaction to the block chain holding servers for approval.