WO2023085695A1 - Image editing device

Info

Publication number: WO2023085695A1
Application number: PCT/KR2022/017172
Authority: WO
Inventors: 이상연
Original assignee: 주식회사 벨루가
Priority date: 2021-11-10
Filing date: 2022-11-03
Publication date: 2023-05-19

Abstract

An image editing device is disclosed. The image editing device comprises: a processor; and a memory operatively connected to the processor, wherein the memory stores instructions that, when executed, cause the processor to identify text data from first image data, execute a translation algorithm for providing at least one translation of the text data on the basis of information on the first image data, receive, from the outside, a translation selection signal for selecting a first translation, which is one of the at least one translation, on the basis of the execution result of the translation algorithm, and identify second image data in which the first translation is applied to the first image data, on the basis of the reception of the translation selection signal.

Description

Image editing device

The present invention relates to an image editing apparatus, and more particularly, to an image editing apparatus capable of providing translation for text data included in an image and editing the image.

The contents described in this part merely provide background information on the present embodiment and do not constitute prior art.

Recently, due to the development of communication networks such as the Internet, it has become common to read webtoons using various mobile devices as well as PCs through network communication.

On the other hand, when reading webtoons through a mobile device, since the screen of the mobile device is generally smaller than the screen of a PC, the image or text of the webtoon is small, so the user may feel uncomfortable reading the webtoon.

Therefore, it is necessary to develop a webtoon providing method through speech bubble recognition that adjusts the size of the text in the speech bubble area according to the user's needs.

In addition, due to the development of communication networks, it has become common to view image contents from around the world through network communication. On the other hand, when browsing image content from around the world, there may be difficulties in interpreting text included in image content.

Accordingly, there is a need for a device capable of easily translating various languages included in image content and further editing images.

An object of the present invention is to provide an image editing device that provides a webtoon providing method through speech bubble recognition, in which a speech bubble area is set by executing a speech bubble recognition algorithm when an input event occurs and the size of the text in the speech bubble area is adjusted.

Another object of the present invention is to provide an image editing device that provides a webtoon providing method through speech bubble recognition, in which a deep learning module learns to distinguish between speech bubble images and error images and the trained deep learning module is used to set a speech bubble area in image data.

Another object of the present invention is to provide an image editing device that provides a translation based on information of image data using deep learning.

The objects of the present invention are not limited to the above-mentioned objects, and other objects and advantages of the present invention not mentioned above can be understood by the following description and will be more clearly understood by the examples of the present invention. It will also be readily apparent that the objects and advantages of the present invention may be realized by means of the instrumentalities and combinations indicated in the claims.

An image editing apparatus according to an embodiment of the present invention includes a processor and a memory operatively connected to the processor, wherein the memory stores instructions that, when executed, cause the processor to: identify text data in first image data; execute, based on information of the first image data, a translation algorithm for providing at least one translation of the text data; receive from the outside, based on a result of execution of the translation algorithm, a translation selection signal for selecting a first translation that is one of the at least one translation; and identify, based on receiving the translation selection signal, second image data in which the first translation is applied to the first image data.

In addition, the instructions cause the processor to execute a speech bubble recognition algorithm using the first image data in response to an input event received from the outside, to set a speech bubble area included in the first image data based on the execution of the speech bubble recognition algorithm, and to identify the text data within the speech bubble area.

Further, the instructions cause the processor to set the speech bubble area based on the first image data through a deep learning module, wherein the deep learning module is trained, using a generative adversarial network, to distinguish between a speech bubble image and an error image. The deep learning module includes a generator module and an identifier module; the generator module is trained to generate fake data associated with the error image, and the identifier module is trained to distinguish between real data associated with the speech bubble image and the fake data.

In addition, the instructions cause the processor to train the translation algorithm through a translation deep learning module by applying the information of the first image data, the text data, and the at least one translation to an input node and applying the first translation to an output node.

Further, the instructions cause the processor to identify a text area in the first image data and to identify the text data for the text area.

Further, the instructions cause the processor to provide an editing screen that simultaneously displays the first image data and the second image data, and to provide a user interface capable of inputting user data for the second image data.

Further, the instructions cause the processor to identify the second image data in which the font of the first translation is changed based on a font selection signal received from the outside.

Further, the instructions cause the processor to identify at least one image excluding the text data from the second image data, and to perform image editing on the at least one image.

In addition, the instructions cause the processor to receive an inpainting signal for a first image of the at least one image from a unit, remove the first image based on the inpainting signal, and apply a background image surrounding the first image to the region from which the first image is removed.

Further, the instructions cause the processor to receive input text data from the outside and identify a text conversion image corresponding to the input text data based on a result of execution of a pre-translation algorithm.

The image editing apparatus providing the webtoon providing method through speech bubble recognition according to the present invention may output an image in which the size of the text in the speech bubble area is adjusted when an input event occurs in the user terminal. In this case, the size of the text may be adjusted so as to be included within the speech bubble area. Therefore, even if the image output to the user terminal is small and it is difficult to read the text, the user can adjust the size of the text by generating an input event. In addition, even if the size of the text is adjusted, the size of the text is adjusted only within the speech bubble area, so there is an advantage in that the webtoon can be read without covering the image included in the webtoon.

An image editing device that provides a webtoon providing method through speech bubble recognition according to the present invention can learn to distinguish a speech bubble image from an error image through a deep learning module and can set a speech bubble area in image data using the trained deep learning module. Since the deep learning module learns to distinguish between speech bubble images and error images, speech bubble areas can be set accurately and quickly within image data input to the deep learning module, without the need to manually designate speech bubble areas for each piece of image data.

The image editing apparatus of the present invention can increase the accuracy of translation and provide a natural translation by providing a translation based on information of image data through deep learning.

In addition to the above description, specific effects of the present invention will be described together while explaining specific details for carrying out the present invention.

FIG. 1 is a conceptual diagram illustrating a webtoon providing system through speech bubble recognition according to some embodiments of the present invention.

FIG. 2 is a flowchart illustrating a webtoon providing method through speech bubble recognition according to some embodiments of the present invention.

FIG. 3 is an exemplary view of a user interface for explaining a method of displaying the image data of FIG. 2 on a screen of a user terminal.

FIG. 4 is a block diagram schematically illustrating a deep learning module included in a first server according to some embodiments of the present invention.

FIG. 5 is a diagram showing the configuration of the deep learning module of FIG. 4.

FIGS. 6 to 8 are exemplary views of a user interface for explaining a webtoon providing method according to some embodiments of the present invention.

FIGS. 9 to 11 are exemplary views of a user interface for explaining a method of adjusting the size of a speech bubble area and text according to some embodiments of the present invention.

FIG. 12 is a diagram for explaining hardware implementation for performing a webtoon providing system through speech bubble recognition according to some embodiments of the present invention.

FIG. 13 is a diagram for explaining a system including an image editing apparatus according to some embodiments of the present invention.

FIG. 14 is a diagram for explaining an image editing apparatus according to some embodiments of the present invention.

FIG. 15 is a diagram for explaining the operation of an image editing apparatus according to some embodiments of the present invention.

FIG. 16 is a diagram for explaining step S1001 of FIG. 15.

FIG. 17 is a diagram for explaining step S1007 of FIG. 15.

FIG. 18 is a diagram for explaining step S1007 of FIG. 15.

FIG. 19 is a diagram for explaining a learning method of a translation deep learning module according to some embodiments of the present invention.

FIG. 20 is a sequence diagram illustrating operations of an image editing device and a deep learning translation module according to some embodiments of the present invention.

FIG. 21A is a diagram for explaining step S2005 of FIG. 20.

FIG. 21B is a diagram for explaining step S2005 of FIG. 20.

FIG. 22 is a diagram for explaining step S2015 of FIG. 20.

FIG. 23 is a diagram for explaining a learning method of a translation deep learning module according to some embodiments of the present invention.

FIG. 24 is a diagram for explaining image editing of an image editing apparatus according to some embodiments of the present disclosure.

FIG. 25 is a sequence diagram illustrating operations of an image editing apparatus and an image editing module according to some embodiments of the present disclosure.

FIG. 26 is a diagram for explaining step S3001 of FIG. 25.

FIGS. 27, 28 and 29 are diagrams for explaining step S3005 of FIG. 25.

Terms or words used in this specification and claims should not be construed as being limited to their general or dictionary meanings. In accordance with the principle that an inventor may define terms and concepts in order to best describe the invention, they should be interpreted with meanings and concepts consistent with the technical spirit of the present invention. In addition, the embodiments described in this specification and the configurations shown in the drawings are only one embodiment in which the present invention is realized and do not represent all of the technical spirit of the present invention, so it should be understood that there may be various equivalents, variations, and applicable examples that can replace them at the time of the present application.

Terms such as first, second, A, and B used in this specification and claims may be used to describe various components, but the components should not be limited by the terms. These terms are only used for the purpose of distinguishing one component from another. For example, a first element may be termed a second element, and similarly, a second element may be termed a first element, without departing from the scope of the present invention. The term 'and/or' includes a combination of a plurality of related recited items or any one of a plurality of related recited items.

Terms used in this specification and claims are only used to describe specific embodiments, and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. It should be understood that terms such as "include" or "have" in this application indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and do not exclude in advance the possibility of the existence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs.

Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art and, unless explicitly defined in the present application, should not be interpreted in an ideal or excessively formal sense. In addition, each configuration, process, or method included in each embodiment of the present invention may be shared within a range in which they do not technically contradict each other.

Hereinafter, with reference to FIGS. 1 to 12, a webtoon providing method through speech bubble recognition provided by an image editing device according to some embodiments of the present invention will be described in detail.

Referring to FIG. 1, a webtoon providing system (hereinafter, the system) through speech bubble recognition according to some embodiments of the present invention may include a first server 100, a user terminal 200, and a communication network 300.

The first server 100 operates in association with the user terminal 200 and may be the entity that performs the webtoon providing method through speech bubble recognition according to some embodiments of the present invention. The first server 100 may execute a speech bubble recognition algorithm using image data. Through the speech bubble recognition algorithm, the first server 100 may set a speech bubble area within the image data, recognize text within the speech bubble area, and adjust the size of the text.

According to some embodiments, the user terminal 200 may transmit a reading request for a webtoon that the user wants to read to the first server 100. The first server 100 may provide image data to the user terminal 200 in response to a webtoon reading request. The image data may include one or more images included in the webtoon requested by the user. The user terminal 200 may display and visualize the received image data on a screen.

When an input event occurs in the user terminal 200, the first server 100 may execute a speech bubble recognition algorithm using image data. According to some embodiments, an input event may mean that the user terminal 200 receives an input signal. The input signal may include at least one of a click signal, a double-click signal, a tap signal, a double-tap signal, and a screen transition signal. In this case, the double-click signal or the double-tap signal means that both a first input signal (e.g., a first click signal) and a second input signal (e.g., a second click signal) are generated within a predetermined time.
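The double-click or double-tap condition described above can be sketched as follows. This is only an illustrative sketch: the class name `DoubleInputDetector` and the 0.3-second window are assumptions, since the specification states only that the two signals occur "within a predetermined time".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold; the source only says "a predetermined time".
DOUBLE_INPUT_WINDOW_S = 0.3


@dataclass
class DoubleInputDetector:
    """Reports a double click/tap when two inputs arrive within the window."""

    window_s: float = DOUBLE_INPUT_WINDOW_S
    _last_time: Optional[float] = None

    def on_input(self, timestamp_s: float) -> bool:
        """Return True if this input completes a double click or double tap."""
        last = self._last_time
        if last is not None and 0 <= timestamp_s - last <= self.window_s:
            # The second signal arrived in time: consume both signals.
            self._last_time = None
            return True
        # Otherwise treat this input as a new first signal.
        self._last_time = timestamp_s
        return False
```

In this sketch a completed double input resets the detector, so a third rapid input starts a new pair rather than extending the previous one.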

For example, when an input event occurs in the user terminal 200, the user terminal 200 may provide an input event generation signal to the first server 100. The first server 100 may execute a speech bubble recognition algorithm for image data based on the input event generation signal.

The first server 100 may set a speech bubble area included in the image data based on an execution result of the speech bubble recognition algorithm. The speech bubble area refers to a virtual area formed along the edge of the speech bubble image. The speech bubble image is an image in the image data in which text representing a character's lines, thoughts, or story is written, and the speech bubble image allows pictures and text to be distinguished. Hereinafter, for convenience of explanation, it is assumed that the size and shape of the speech bubble area and the speech bubble image are dependent on each other. For example, when the size of the speech bubble image increases, the size of the speech bubble area also increases. Conversely, when the size of the speech bubble area increases, the size of the speech bubble image may also increase. Therefore, in the following, the terms speech bubble area and speech bubble image may be used interchangeably.

When the speech bubble area is set, the first server 100 may recognize text within the speech bubble area. Also, the first server 100 may adjust the size of the recognized text. In this case, the size of the text may be determined based on the speech bubble area. Specifically, the size of the text may be determined so that all of the text is included within the speech bubble area. In other words, even if the size of the text is adjusted, the text may not deviate from the speech bubble area. For convenience of explanation, image data before text size adjustment is defined as first image data, and image data after text size adjustment is defined as second image data.
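The constraint that resized text must stay within the speech bubble area can be illustrated with a simple bounding-box model. This is a minimal sketch under assumptions not found in the source: both areas are approximated as axis-aligned rectangles, and the names `Box`, `max_text_scale`, `resize_text`, and the `margin` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box standing in for a speech bubble area or a text area."""

    width: float
    height: float


def max_text_scale(bubble: Box, text: Box, margin: float = 0.0) -> float:
    """Largest uniform scale at which the text box still fits in the bubble box."""
    usable_w = bubble.width - 2 * margin
    usable_h = bubble.height - 2 * margin
    if usable_w <= 0 or usable_h <= 0 or text.width <= 0 or text.height <= 0:
        raise ValueError("boxes must have positive usable size")
    return min(usable_w / text.width, usable_h / text.height)


def resize_text(bubble: Box, text: Box, requested_scale: float,
                margin: float = 0.0) -> Box:
    """Scale the text box, clamping so that it never leaves the bubble box."""
    scale = min(requested_scale, max_text_scale(bubble, text, margin))
    return Box(text.width * scale, text.height * scale)
```

A real speech bubble is rarely rectangular, so an actual implementation would likely need a tighter fit test than this rectangle comparison.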

In other words, when an input event occurs, the first server 100 may set a speech bubble area included in the first image data, recognize text in the speech bubble area, and adjust the size of the text to generate second image data. The second image data generated by the first server 100 may be provided to the user terminal 200. The user terminal 200 may display the second image data on the screen of the user terminal 200. In some embodiments, the second image data may include an animation effect in which the size of text is gradually adjusted, but the embodiments are not limited thereto.

The user terminal 200 may communicate with the first server 100 through a network. The user terminal 200 may be, for example, a personal digital assistant (PDA), a portable computer, a web tablet, a wireless phone, a mobile phone, a digital music player, a memory card, or any electronic product capable of transmitting and/or receiving information in a wireless environment.

In addition, although only one user terminal 200 is shown in the drawing, the present invention is not limited thereto, and the first server 100 may operate in conjunction with a plurality of user terminals 200.

Additionally, the user terminal 200 may include an input unit for receiving a user's input, a display unit for displaying visual information, a communication unit for sending and receiving signals to and from the outside, a camera unit for photographing the user's face, a microphone unit for converting the user's voice into digital data, and a control unit that processes data, controls each unit inside the user terminal 200, and controls data transmission and reception between the units.

The input unit of the user terminal 200 may include at least one of a keypad, a keyboard, a touchpad, and a touchscreen. The user terminal 200 may receive an input signal through an input unit. Accordingly, the input signal may be generated by at least one of a keypad, keyboard, touchpad, and touchscreen.

Meanwhile, the communication network 300 serves to connect the first server 100 and the user terminal 200. That is, the communication network 300 refers to a communication network that provides an access path through which the user terminal 200 can connect to the first server 100 and then transmit and receive data. The communication network 300 may be, for example, a wired network such as a LAN (Local Area Network), WAN (Wide Area Network), MAN (Metropolitan Area Network), or ISDN (Integrated Services Digital Network), or a wireless network such as a wireless LAN, CDMA, Bluetooth, or satellite communication. However, the scope of the present invention is not limited thereto.

Hereinafter, a webtoon providing method through speech bubble recognition performed in a system according to some embodiments of the present invention will be examined in detail using flowcharts and exemplary drawings. Of course, the webtoon providing method through speech bubble recognition according to some embodiments of the present invention may be performed by omitting some steps of the flowcharts shown in this specification, or by adding specific steps not shown in this specification. In addition, the method does not necessarily need to be performed in the order of the flowcharts shown in this specification, and may be performed by changing the order of specific steps or by performing them simultaneously. Those skilled in the art will be able to implement the embodiments of the present invention through various modifications without departing from the scope of the present invention.

Referring to FIGS. 1 and 2, the user terminal 200 may provide the first server 100 with a reading request for a webtoon that the user wants to read. The first server 100 may provide the first image data to the user terminal 200 in response to the webtoon reading request. The user terminal 200 displays the received first image data on the screen of the user terminal 200 (S100). To describe an example of displaying the first image data on the screen of the user terminal 200, further reference is made to FIG. 3.

Referring to FIGS. 1 to 3, the user terminal 200 may display the first image data received from the first server 100 on the screen of the user terminal 200. The first image data may include a first image A1 and a second image A2.

The first image A1 may include a first speech bubble area B1, a first text area T1, and a first background area C1. For example, the first speech bubble area B1 refers to a virtual area formed along the contour of the speech bubble image included in the first image A1. The speech bubble image may be an image in which text about lines, thoughts, and stories of characters appearing in the webtoon is described.

The first text area T1 may be a virtual area surrounding text included in the first speech bubble area B1. Similar to the relationship between the speech bubble image and the speech bubble area, it is assumed that the text and the size of the text area are dependent on each other. For example, when the size of the text increases, the size of the text area also increases, and when the size of the text area increases, the size of the text also increases. Therefore, the term 'text' and the term 'text area' may be used interchangeably below. The first text area T1 may be included in the first speech bubble area B1. In other words, the size of the first text area T1 may be smaller than or equal to the size of the first speech bubble area B1.

The first background area C1 may be an area other than the first speech bubble area B1 in the first image A1. That is, the first background area C1 may be an area including characters and a background included in the first image A1 excluding the first speech bubble area B1.

Similarly, the second image A2 may include a second speech bubble area B2, a second text area T2, and a second background area C2. In the drawings, the first image A1 and the second image A2 are illustrated as including one speech bubble area and one text area, but the present invention is not limited thereto.

According to some embodiments, the user terminal 200 may further include a scroll bar (SB) for switching images displayed on the screen of the user terminal 200, but the embodiments are not limited thereto.

When an input event occurs in the user terminal 200, the first server 100 executes a speech bubble recognition algorithm for the first image data (S200). The first server 100 sets a speech bubble area in the first image data based on the execution result of the speech bubble recognition algorithm (S300).

According to some embodiments, the first server 100 may set a speech bubble area included in the first image data by using a deep learning module. For a detailed description of the deep learning module that sets the speech bubble area included in the first image data, further reference is made to FIG. 4 .

FIG. 4 is a block diagram schematically illustrating a deep learning module included in a first server according to some embodiments of the present invention, and FIG. 5 is a diagram showing the configuration of the deep learning module of FIG. 4.

(b1) of FIG. 4 shows a learning process of the deep learning module (DM) included in the first server 100. The first server 100 may train the deep learning module (DM) using the speech bubble image.

As described above, the speech bubble image may be an image in which text about lines, thoughts, and stories of characters appearing in the webtoon is described. For example, the speech bubble image may have the shape of a circular balloon with a triangular tip attached to it, a cloud-shaped balloon with a circular tip attached to it, a square balloon with a triangular tip attached to it, or a polygonal balloon. However, the shape of the speech bubble image is not limited to the above-described examples.

The deep learning module (DM) may perform learning based on a generative adversarial network (GAN). The deep learning module (DM) may include a generator module and an identifier module. The generator module may generate fake data by receiving the noise vector. The identifier module may receive real data and fake data generated by the generator module, and learn to distinguish between real data and fake data. The classification result value in the identifier module is provided to the generator module again and can be used for learning of the generator module. In other words, the generator module may be trained in a direction of generating fake data that is difficult to distinguish from real data, and the identifier module may be trained in a direction of increasing classification accuracy between real data and fake data.
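The opposing training directions of the two modules can be made concrete with the standard binary cross-entropy losses commonly used for generative adversarial networks. This is a sketch of one common formulation (the "non-saturating" generator loss), not necessarily the one used in the specification; the function names are hypothetical, and each score is assumed to be the identifier module's estimated probability that a sample is real data.

```python
import math
from typing import Sequence


def _bce(prob: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one prediction; eps avoids log(0)."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def identifier_loss(scores_real: Sequence[float],
                    scores_fake: Sequence[float]) -> float:
    """Loss for the identifier module: real data is labeled 1, fake data 0."""
    losses = [_bce(s, 1.0) for s in scores_real]
    losses += [_bce(s, 0.0) for s in scores_fake]
    return sum(losses) / len(losses)


def generator_loss(scores_fake: Sequence[float]) -> float:
    """Loss for the generator module: low when fake data is scored as real."""
    return sum(_bce(s, 1.0) for s in scores_fake) / len(scores_fake)
```

Minimizing `identifier_loss` pushes classification accuracy up, while minimizing `generator_loss` pushes the generator toward fake data that the identifier scores as real, which mirrors the feedback loop described above.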

According to some embodiments, the deep learning module (DM) may receive a speech bubble image previously stored in the first server 100 as real data. That is, the identifier module may receive a speech bubble image as real data. The generator module may use the noise vector to generate, as fake data, an error image that is similar to the speech bubble image but is not a speech bubble image. The error image generated by the generator module may be provided to the identifier module as fake data. For example, when the speech bubble image has a shape in which a circular balloon has a triangular tip attached to it, the error image may be a circular shape image.

In other words, the identifier module may receive the error image as fake data and the speech bubble image as real data. The identifier module may learn to distinguish between fake data and real data, that is, an error image and a speech bubble image.

Accordingly, as shown in (b2) of FIG. 4, the trained deep learning module (DM) may receive image data as input data and provide a speech bubble area as output data. In other words, the trained deep learning module (DM) may set a speech bubble area in the image data by distinguishing real data included in the image data.

Deep learning, a type of machine learning, learns from data through multiple levels of layers. That is, deep learning refers to a set of machine learning algorithms that extract core data from a plurality of data as the levels become progressively deeper.

The deep learning module (DM) may derive a speech bubble area by running an algorithm with image data as input. Although the deep learning module (DM) has been described herein as performing learning based on a generative adversarial network, the embodiments are not limited thereto. The deep learning module (DM) may use various well-known neural network structures. For example, the deep learning module (DM) may include a convolutional neural network (CNN), a recurrent neural network (RNN), a deep belief network (DBN), or a graph neural network (GNN).

As described above, the deep learning module (DM) may be trained to distinguish between a speech bubble image and an error image through a generative adversarial network by receiving a speech bubble image and a noise vector as inputs. Accordingly, the trained deep learning module (DM) can distinguish a speech bubble image, which is real data, within image data, and a speech bubble area can be set accordingly.

However, the above-described image data is only an example of an input parameter input to the deep learning module (DM), and input data applied to the deep learning module (DM) may be variously added or changed and used.

Meanwhile, learning of the deep learning module (DM) can be performed by adjusting the weight of the connection line between nodes (and adjusting the bias value if necessary) so that a desired output is produced for a given input. In addition, the deep learning module (DM) may continuously update weight values by learning.
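As a concrete illustration of adjusting weights and a bias so that a given input produces a desired output, the following sketch performs gradient descent on a single linear node with a squared-error loss. The function name, the learning rate, and the choice of loss are assumptions made for illustration; the specification does not state which update rule is used.

```python
from typing import List, Tuple


def train_step(weights: List[float], bias: float, inputs: List[float],
               target: float, lr: float = 0.1) -> Tuple[List[float], float, float]:
    """One gradient-descent step for a single linear node with squared error.

    Returns the updated weights, the updated bias, and the error measured
    before the update.
    """
    output = sum(w * x for w, x in zip(weights, inputs)) + bias
    error = output - target
    # The gradient of 0.5 * error**2 is error * x for each weight and error for the bias.
    new_weights = [w - lr * error * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * error
    return new_weights, new_bias, error
```

Calling `train_step` repeatedly on the same input and target shrinks the error at each step, which is the sense in which the weight values are continuously updated by learning.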

Additionally, although not clearly shown in the drawing, in another embodiment of the present invention, the operation of the deep learning module (DM) may be implemented in the first server 100 or in a separate first cloud server. Hereinafter, an example of the aforementioned deep learning module (DM) will be described.

Referring to FIG. 5, the deep learning module (DM) includes an input layer (Input) including an input node that receives image data, an output layer (Output) including an output node that outputs a speech bubble area, and M hidden layers arranged between the input layer and the output layer.
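The layered structure of FIG. 5 can be sketched as a plain feed-forward pass through M hidden layers. This is a simplified illustration only: the ReLU activation, the fully connected layout, and the names `dense` and `forward` are assumptions, and how image data and a speech bubble area would be encoded as numbers is left open because the source does not specify it.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights per output node, biases)


def dense(inputs: Sequence[float], weights: List[List[float]],
          biases: List[float]) -> List[float]:
    """Weighted sum over the incoming edges of each node, plus its bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(inputs: Sequence[float], hidden_layers: List[Layer],
            output_layer: Layer) -> List[float]:
    """Propagate input node values through M hidden layers to the output nodes."""
    activations = list(inputs)
    for weights, biases in hidden_layers:  # M = len(hidden_layers)
        # ReLU is an assumed activation; the source does not name one.
        activations = [max(0.0, v) for v in dense(activations, weights, biases)]
    weights, biases = output_layer
    return dense(activations, weights, biases)
```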

Here, a weight may be set on each edge connecting nodes of the layers. These weights and edges can be added, removed, or updated during the learning process. Therefore, through the learning process, the weights of the nodes and edges disposed between the k input nodes and the i output nodes may be updated.

All nodes and edges may be set to initial values before the deep learning module (DM) performs learning. However, as information is input cumulatively, the weights of the nodes and edges change, and in this process a matching can be made between the parameters input as learning factors (i.e., image data) and the values assigned to the output nodes (i.e., speech bubble areas).

Additionally, in the case of using the first cloud server, the deep learning module (DM) may receive and process a large number of parameters. Therefore, the deep learning module (DM) can perform learning based on massive data.

In addition, the weights of nodes and edges between an input node and an output node constituting the deep learning module (DM) may be updated by the learning process of the deep learning module (DM). In addition, it goes without saying that parameters output from the deep learning module (DM) can be additionally extended to various data besides the speech bubble area.

In the webtoon providing method through speech bubble recognition of the present invention, the deep learning module can learn to distinguish a speech bubble image from an erroneous image, and the trained deep learning module can be used to set a speech bubble area in image data. Since the deep learning module learns to distinguish between speech bubble images and erroneous images, speech bubble areas can be set accurately and quickly within image data input to the deep learning module, without the need to manually designate speech bubble areas for each piece of image data.

Referring again to FIGS. 1 to 3, the first server 100 recognizes text within the set speech bubble area (S400). The first server 100 may recognize text within the speech bubble area through a character recognition technique. For example, the first server 100 may recognize text in the first speech bubble area B1 through a character recognition technique, and may set the area including the recognized text as the first text area T1.

The first server 100 may generate second image data by adjusting the size of the text in the speech bubble area (S500). In this case, the size of the adjusted text may not deviate from the speech bubble area. In other words, the size of the text within the speech bubble area may be adjusted so as to be included within the speech bubble area. In some embodiments, the second image data may include an animation effect in which the size of text is gradually changed, but the embodiments are not limited thereto.
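The constraint that the adjusted text must remain within the speech bubble area can be sketched as a capped scaling step. The sketch below assumes both areas are axis-aligned rectangles described only by width and height; the function name `enlarge_text` and the `factor` parameter are hypothetical and are not taken from the source.

```python
def enlarge_text(text_w, text_h, bubble_w, bubble_h, factor=1.5):
    """Return the enlarged text-area size, capped so that it fits inside the bubble."""
    # Largest uniform scale at which the text area still fits in the bubble area.
    max_factor = min(bubble_w / text_w, bubble_h / text_h)
    applied = min(factor, max_factor)
    return text_w * applied, text_h * applied


# A 40x20 text area inside a 100x30 bubble: the requested 2.0x enlargement
# is limited to 1.5x by the bubble height.
new_w, new_h = enlarge_text(40, 20, 100, 30, factor=2.0)
```

Using a single uniform factor keeps the aspect ratio of the text area unchanged, which is a design choice of this sketch rather than a requirement stated in the description.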

The second image data generated by the first server 100 may be provided to the user terminal 200 through the communication network 300 . The user terminal 200 may receive the second image data and display it on the screen of the user terminal 200 (S600).

When the input event ends, the first server 100 may readjust the text size (S700). According to some embodiments, when an input event ends in the user terminal 200, the user terminal 200 may generate an input event end signal. The generated input event termination signal may be provided to the first server 100 through the communication network 300 . The first server 100 may receive the input event end signal and provide the first image data to the user terminal 200 again. The user terminal 200 may output the first image data to the screen of the user terminal 200 .

In the webtoon providing method through speech bubble recognition according to the present invention, when an input event occurs in a user terminal, an image having a text size adjusted in a speech bubble area may be output. In this case, the size of the text may be adjusted so as to be included within the speech bubble area. Therefore, even if the image output to the user terminal is small and it is difficult to read the text, the user can adjust the size of the text by generating an input event. In addition, even if the size of the text is adjusted, the size of the text is adjusted only within the speech bubble area, so there is an advantage in that the webtoon can be read without covering the image included in the webtoon.

According to some embodiments, the first server 100 may increase the size of the text while fixing the size of the speech bubble area. According to some other embodiments, the first server 100 may increase the size of the speech bubble area together with the size of the text. These embodiments are described by way of example with reference to FIGS. 6 to 11.

First, referring to FIGS. 6 to 8 , a method of increasing the size of text while fixing the size of the speech bubble area will be described.

FIGS. 6 to 8 are exemplary views of a user interface for explaining a webtoon providing method according to some embodiments of the present invention. For convenience of description, content that is the same as or similar to the above description will be omitted or briefly described.

First, referring to (c1) of FIG. 6 , the user may provide an input signal through the user terminal 200 . When an input signal is provided to the user terminal 200, the user terminal 200 may generate an input event. That is, the occurrence of an input event in the user terminal 200 may mean that an input signal is provided to the user terminal 200 . When an input event occurs, the user terminal 200 may provide an input event generation signal to the first server 100 .

For example, the user terminal 200 may receive any one of a click signal, a tap signal, a double click signal, and a double tap signal. When an input signal is provided to the user terminal 200, an input event may occur. The double-click signal or double-tap signal may mean inputting two click signals or tap signals within a predetermined period. In other words, when the user terminal 200 receives the first signal and the second signal within a predetermined time, it may be determined that a double click signal or a double tap signal is input.
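The double-click or double-tap determination described above can be sketched as a comparison of two signal times against a predetermined period. The 0.3-second window and the function name `classify_taps` are assumptions for illustration; the source only states that the two signals arrive "within a predetermined time".

```python
def classify_taps(timestamps, window=0.3):
    """Group ascending tap times (in seconds) into 'single' and 'double' signals."""
    signals = []
    i = 0
    while i < len(timestamps):
        has_next = i + 1 < len(timestamps)
        if has_next and timestamps[i + 1] - timestamps[i] <= window:
            signals.append("double")  # first and second signal within the window
            i += 2
        else:
            signals.append("single")
            i += 1
    return signals


signals = classify_taps([0.0, 0.2, 1.0, 2.0, 2.25])
```

In this example the taps at 0.0 s and 0.2 s form a double tap, the tap at 1.0 s is a single tap, and the taps at 2.0 s and 2.25 s form another double tap.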

Upon receiving the input event generation signal, the first server 100 may set a first speech bubble area B1 included in the first image A1, and may set a first text area T1 within the first speech bubble area B1. The first server 100 may increase the size of the first text area T1 to generate second image data. In this case, the size of the first text area T1 may be smaller than or equal to the size of the first speech bubble area B1. In other words, the first text area T1 may be included in the first speech bubble area B1.

As shown in FIG. 6 (c2), the second image data generated by the first server 100 may be provided to the user terminal 200. The user terminal 200 may receive the second image data and display it on the screen of the user terminal 200 . That is, the user terminal 200 may display the second image data in which the size of the first text area T1 in the first speech bubble area B1 is increased on the screen of the user terminal 200 .

As shown in (c3) of FIG. 6 , when the input event ends, the user terminal 200 may generate an input event end signal and provide it to the first server 100 . When receiving the input event end signal, the first server 100 provides first image data to the user terminal 200 again, and the user terminal 200 may output the first image data.

An input event may be terminated according to various circumstances. For example, the input event may end after a specific time elapses (eg, 3 seconds later) after the input event has occurred. For another example, when an input signal is provided to the user terminal 200 again after an input event has occurred, the input event may end. For another example, an input event may occur only while an input signal is provided to the user terminal 200, and may end when the supply of the input signal is stopped.
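The three termination conditions listed above can be sketched as a small state holder with a selectable end policy. The class and method names (`EndPolicy`, `InputEvent`, `press`, `release`, `tick`) are hypothetical, and times are plain numbers supplied by the caller; this is an illustrative sketch, not the implementation used by the user terminal 200.

```python
from enum import Enum


class EndPolicy(Enum):
    TIMEOUT = "timeout"  # ends a fixed time after the event occurred
    TOGGLE = "toggle"    # ends when an input signal is provided again
    HOLD = "hold"        # lasts only while the input signal is maintained


class InputEvent:
    def __init__(self, policy, timeout=3.0):
        self.policy = policy
        self.timeout = timeout
        self.active = False
        self.started_at = None

    def press(self, now):
        """An input signal is provided to the terminal."""
        if self.active and self.policy is EndPolicy.TOGGLE:
            self.active = False
        elif not self.active:
            self.active = True
            self.started_at = now

    def release(self, now):
        """The supply of the input signal is stopped."""
        if self.active and self.policy is EndPolicy.HOLD:
            self.active = False

    def tick(self, now):
        """Called periodically so that a timeout can end the event."""
        expired = self.active and now - self.started_at >= self.timeout
        if self.policy is EndPolicy.TIMEOUT and expired:
            self.active = False

    def image_to_show(self):
        # Second image data (enlarged text) while the event is active,
        # first image data otherwise.
        return "second" if self.active else "first"
```

For instance, with `EndPolicy.TIMEOUT` and the default 3-second timeout, `press(0.0)` followed by `tick(3.0)` returns the display to the first image data.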

Referring to (d1) and (d2) of FIG. 7, the user terminal 200 according to some embodiments may generate an input event when an input signal is provided and may end the input event when the input signal is stopped. In other words, the user terminal 200 may generate an input event only while the input signal continues. For example, the user terminal 200 may generate an input event while a tap signal or a click signal is maintained, and may end the input event when the tap signal or click signal is stopped. Accordingly, the user terminal 200 may output the second image data with the adjusted text size only while the tap signal or the click signal is maintained.

Referring to (e1) and (e2) of FIG. 8, the user terminal 200 may generate an input event when receiving a screen switching signal. The screen switching signal may mean that the first image A1 included in the image data is switched to a second image A2 different from the first image A1. When receiving a screen switching signal for switching from the first image A1 to the second image A2, the user terminal 200 may generate an input event and provide the input event generation signal to the first server 100. The first server 100 may increase the size of the second text area T2 to generate second image data. The second image data is provided to the user terminal 200, and the user terminal 200 may output the second image data. For example, a user may generate a screen switching signal using a scroll bar SB displayed on the screen of the user terminal 200, but embodiments are not limited thereto.

Hereinafter, a method of increasing the size of a speech bubble area and a text area will be described with reference to FIGS. 9 to 11 . In addition, in the following description, parts overlapping with the above-described embodiment are simplified or omitted.

Referring to FIGS. 1, 3, and 9 to 11 , when an input event occurs, the first server 100 may increase the size of the speech bubble area and the text area together. Even in this case, the size of the text area may be smaller than or equal to the size of the speech bubble area. The first server 100 may generate second image data in which the sizes of the speech bubble area and the text area are increased, and provide the second image data to the user terminal 200 . The user terminal 200 may output the second image data provided to the screen of the user terminal 200 .

According to some embodiments, as shown in (f1) and (f2) of FIG. 9, when the user terminal 200 receives one of a tap signal, a click signal, a double tap signal, and a double click signal, the user terminal 200 may generate an input event. The user terminal 200 provides an input event generation signal to the first server 100, and the first server 100 may receive the input event generation signal and increase the sizes of the first speech bubble area B1 and the first text area T1. Even in this case, the size of the first text area T1 may be smaller than or equal to the size of the first speech bubble area B1.

According to some other embodiments, as shown in (g1) and (g2) of FIG. 10, an input event may be generated when an input signal is provided to the user terminal 200, and the input event may end when the supply of the input signal is stopped. In other words, while an input signal is received from the user terminal 200, the first server 100 may increase the sizes of the first speech bubble area B1 and the first text area T1. In this case as well, the size of the first text area T1 may be smaller than or equal to the size of the first speech bubble area B1.

According to some other embodiments, as shown in (h1) and (h2) of FIG. 11, when a screen change signal is provided to the user terminal 200, the user terminal 200 may generate an input event. For example, the user terminal 200 may receive a screen switching signal for switching from the first image A1 to the second image A2. The user terminal 200 may receive a screen change signal, generate an input event, and provide an input event generation signal to the first server 100 . The first server 100 may receive an input event generation signal and increase the size of the first speech bubble area B1 and the first text area T1.

Hereinafter, hardware implementation that performs a webtoon providing system through speech bubble recognition will be described in detail.

Referring to FIGS. 1 and 12 , the first server 100 may be implemented as an electronic device 1000 . The electronic device 1000 may include a controller 1010, an input/output device 1020 (I/O), a memory device 1030 (memory device), an interface 1040, and a bus 1050. The controller 1010 , the input/output device 1020 , the memory device 1030 and/or the interface 1040 may be coupled to each other through a bus 1050 . The bus 1050 corresponds to a path through which data is moved.

Specifically, the controller 1010 may include at least one of a central processing unit (CPU), a micro processor unit (MPU), a micro controller unit (MCU), a graphic processing unit (GPU), a microprocessor, a digital signal processor, a microcontroller, an application processor (AP), and logic elements capable of performing functions similar thereto.

The input/output device 1020 may include at least one of a keypad, keyboard, touchpad, touchscreen, display device, and joystick. The memory device 1030 may store data and/or programs.

The memory device 1030 is an operating memory for improving the operation of the controller 1010 and may include high-speed DRAM and/or SRAM. The memory device 1030 may store a program or application for a webtoon providing method through speech bubble recognition therein.

The interface 1040 may perform a function of transmitting data to a communication network or receiving data from the communication network. The interface 1040 may operate in a wired or wireless form. For example, the interface 1040 may include an antenna or a wired/wireless transceiver.

However, this is only an example of the electronic device 1000 in which the system of the present invention is implemented, and the webtoon providing method through speech bubble recognition of the present invention can be implemented and operated in various systems and hardware.

According to some embodiments of the present invention, a method for providing a webtoon through speech bubble recognition, performed in a first server associated with a user terminal, includes: displaying first image data on a screen of the user terminal; executing a speech bubble recognition algorithm using the first image data according to generation of an input event; setting a first speech bubble area included in the first image data based on a result of the execution of the speech bubble recognition algorithm; setting a first text area in the first speech bubble area; generating second image data by adjusting the size of the first text area; and displaying the second image data on the screen of the user terminal, wherein the adjusted size of the first text area is smaller than or equal to the size of the first speech bubble area.

In some embodiments, adjusting the size of the first text area may include fixing the size of the first speech bubble area and increasing the size of the first text area.

In some embodiments, the method may further include displaying the first image data on the screen of the user terminal according to the end of the input event.

In some embodiments, adjusting the size of the first text area may include increasing the size of the first speech bubble area and increasing the size of the first text area.

In some embodiments, an input event may occur when an input signal is provided to a user terminal.

In some embodiments, the first signal and the second signal may be provided as input signals within a predetermined time period.

In some embodiments, the input signal may be generated by at least one of a touch pad, a touch screen, a mouse, and a keyboard.

In some embodiments, when the first image included in the first image data displayed on the screen of the user terminal is converted to a second image different from the first image, an input signal may be provided to the user terminal.

In some embodiments, an input event may occur when an input signal is provided to the user terminal, and may end when the input signal is stopped to the user terminal.

In some embodiments, the speech bubble recognition algorithm includes setting a first speech bubble area based on the first image data through a deep learning module, wherein the deep learning module uses a generative adversarial network. The deep learning module includes a generator module and an identifier module, the generator module may be trained to generate fake data associated with the error image, and the identifier module may be trained to distinguish data associated with the speech bubble image from the fake data.

Steps according to the webtoon providing method through speech bubble recognition according to some embodiments of the present invention may be performed by an image editing device described below.

Hereinafter, an image editing apparatus according to some embodiments of the present invention will be described with reference to FIG. 13 . For clarity of description, overlapping with those described above are simplified or omitted.

FIG. 13 is a diagram for explaining a system including an image editing device 400 according to some embodiments of the present invention.

Referring to FIG. 13 , an image editing device 400 according to some embodiments of the present invention may communicate with a first server 100 and a second server 2000 through a communication network 300 . The image editing device 400 may be included in the user terminal 200 .

The second server 2000 may operate in association with the image editing device 400 .

In some embodiments, the second server 2000 may provide at least one translation based on information of the first image data (eg, original data) by using a translation algorithm. The second server 2000 may include, for example, a translation deep learning module capable of executing a translation algorithm.

In some embodiments, the second server 2000 may identify a text conversion image corresponding to the input text data using a translation algorithm. For example, when input text data is input, the second server 2000 may identify the input text data through a translation algorithm, and may identify an image corresponding to the text content of the input text data as a text conversion image.

Although it has been described in FIG. 13 that the translation algorithm is performed by the second server 2000, it is not limited thereto. For example, the operation of the second server 2000 may be executed in the image editing device 400 .

Hereinafter, an image editing apparatus 400 according to some embodiments of the present invention will be described with reference to FIG. 14 . For clarity of description, overlapping with those described above are simplified or omitted.

FIG. 14 is a diagram for explaining an image editing apparatus 400 according to some embodiments of the present invention.

Referring to FIG. 14 , an image editing device 400 according to some embodiments of the present invention may include a processor 410, a memory 420, a communication module 430, and a display 440.

The memory 420 may store commands, information, or data related to operations of components included in the image editing device 400 . For example, memory 420 may store instructions that, when executed, enable processor 410 to perform various operations described herein.

The image editing device 400 may communicate with the outside (e.g., at least one of a user, the first server (100 in FIG. 13), and the second server (2000 in FIG. 13)) through the communication module 430.

The display 440 may visually provide information to the outside of the image editing apparatus 400 (eg, a user).

The processor 410 may be operatively coupled to the display 440 , the memory 420 , and the communication module 430 in order to perform overall functions of the image editing device 400 . Processor 410 may include, for example, one or more processors. The one or more processors may include, for example, an image signal processor (ISP), an application processor (AP), or a communication processor (CP).

In some embodiments, the processor 410 may identify text data in the first image data. For example, the processor 410 may identify a text area including text data of the first image data and identify text data included in the text area. For example, the processor 410 may identify a speech bubble area of the first image data and identify text included in the speech bubble area.

The processor 410 may execute a translation algorithm that provides at least one translation of the text data based on information of the first image data. The information of the first image data may include, for example, at least one of information about a source of the first image data, information about a creator who created the first image data, and information about a category of the first image data. The information of the first image data has been described with three examples, but is not limited thereto. For example, any information that can serve as a basis for executing the translation algorithm may be included in the information of the first image data.

The processor 410 may communicate with an external server to execute a translation algorithm in order to provide at least one translation of text data. The translation algorithm may be executed, for example, in an external server (eg, the second server 2000 of FIG. 13).

The processor 410 may receive a translation selection signal for selecting a first translation, which is one of at least one translation, from the outside, based on an execution result of the translation algorithm. For example, the processor 410 may receive data including at least one translation from an external server (eg, the second server 2000 of FIG. 13 ) based on an execution result of a translation algorithm. The processor 410 may provide, for example, at least one translated text through the display 440 . For example, the processor 410 may receive a translation selection signal for selecting a first translation, which is one of at least one translation, from the outside (eg, a user). The processor 410 may notify an external server that a translation selection signal has been received, for example.

The processor 410 may identify second image data to which the first translation is applied to the first image data, based on receiving the translation selection signal. The processor 410 may receive, for example, second image data to which the first translation is applied from an external server (eg, the second server 2000 of FIG. 13 ). The processor 410 may provide, for example, second image data through the display 440 .

In some embodiments, as described with reference to FIGS. 1 to 12, the processor 410 may execute a speech bubble recognition algorithm using the first image data according to generation of an input event received from the outside (e.g., a user). Based on the execution of the speech bubble recognition algorithm, the processor 410 may set a speech bubble area included in the first image data and identify text data within the speech bubble area.

The processor 410 may set a speech bubble area through a deep learning module based on the first image data.

In some embodiments, the processor 410 may identify the second image data in which the font of the first translation is changed, based on a font selection signal received from the outside. The processor 410 may notify an external server (e.g., the second server 2000 of FIG. 13) that the font selection signal has been received. The processor 410 may, for example, receive the second image data in which the font of the first translation is changed from the external server (e.g., the second server 2000 of FIG. 13), and provide it through the display 440.

In some embodiments, the processor 410 may receive input text data from an external source (e.g., a user), and may identify a text conversion image corresponding to the input text data based on an execution result of a translation algorithm. The processor 410 may, for example, transmit the input text data to an external server (e.g., the second server 2000 of FIG. 13) that uses a translation algorithm capable of interpreting the text and identifying its meaning. The processor 410 may receive, for example, a text conversion image corresponding to the content of the text included in the input text data from the external server (e.g., the second server 2000 of FIG. 13). The processor 410 may provide, for example, the text conversion image through the display 440.

In some embodiments, the processor 410 may apply information of the first image data, text data, and at least one translation to an input node, and apply the first translation to an output node, so that the translation algorithm is learned through a translation deep learning module.

In some embodiments, the processor 410 may provide an editing screen that simultaneously displays the first image data and the second image data.

In some embodiments, the processor 410 may provide a user interface through which user data may be input for the second image data.

In some embodiments, after text data is identified in the first image data, the processor 410 may identify at least one image excluding the text data from the first image data, and may enable image editing on the at least one image. The processor 410 may cause an external server (e.g., the second server 2000 of FIG. 13) to identify the at least one image excluding the text data from the first image data. The at least one image may be, for example, at least one main image and/or at least one background image. The processor 410 may execute an operation related to image editing on the external server (e.g., the second server 2000 of FIG. 13) to enable image editing of the at least one image. Image editing may mean performing image processing such as changing resolution, removing a background image, or changing size.

In some embodiments, the processor 410 may receive an inpainting signal for a first image of the at least one image from the outside (e.g., a user), remove the first image based on the inpainting signal, and apply a background image surrounding the first image to the area from which the first image is removed. The processor 410 may notify an external server (e.g., the second server 2000 of FIG. 13) that the inpainting signal for the first image has been received. For example, the processor 410 may receive, from the external server (e.g., the second server 2000 of FIG. 13), second image data in which the first image is removed based on the inpainting signal and the background image surrounding the first image is applied to the area from which the first image is removed. The processor 410 may provide the second image data through, for example, the display 440.
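A toy illustration of removing a first image and filling its area from the surrounding background is given below. It assumes a grayscale image stored as a list of rows and a Boolean mask marking the pixels of the first image. The neighbor-averaging fill is a simple stand-in chosen for illustration; the source does not specify the inpainting technique, and the function name `inpaint` is hypothetical.

```python
def inpaint(pixels, mask):
    """Fill masked pixels from surrounding unmasked pixels, working inward."""
    height, width = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]  # leave the input image untouched
    todo = {(y, x) for y in range(height) for x in range(width) if mask[y][x]}
    while todo:
        filled = {}
        for y, x in todo:
            neighbors = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            known = [out[ny][nx] for ny, nx in neighbors
                     if 0 <= ny < height and 0 <= nx < width
                     and (ny, nx) not in todo]
            if known:
                filled[(y, x)] = sum(known) / len(known)
        if not filled:
            break  # every pixel is masked, so there is no background to copy
        for (y, x), value in filled.items():
            out[y][x] = value
        todo.difference_update(filled)
    return out


# A 3x3 background of value 10 with one foreground pixel (99) to remove.
pixels = [[10, 10, 10], [10, 99, 10], [10, 10, 10]]
mask = [[False, False, False], [False, True, False], [False, False, False]]
result = inpaint(pixels, mask)
```

After the call, the masked center pixel takes the average of its four background neighbors, so the removed area blends into the surrounding background.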

Hereinafter, the operation of the image editing device (400 in FIG. 14) according to some embodiments of the present invention will be described with reference to FIGS. 15 to 18. For clarity of description, overlapping with those described above are simplified or omitted.

FIG. 15 is a diagram for explaining the operation of an image editing apparatus according to some embodiments of the present invention. FIG. 16 is a diagram for explaining step S1001 of FIG. 15. FIG. 17 is a diagram for explaining step S1007 of FIG. 15. FIG. 18 is a diagram for explaining step S1007 of FIG. 15.

Hereinafter, it is assumed that the image editing device 400 of FIGS. 13 and 14 performs the process of FIG. 15. An operation described as being performed by the image editing device 400 may be implemented as instructions (commands) that can be performed (or executed) by the processor (410 in FIG. 14) of the image editing device (400 in FIGS. 13 and 14). The instructions may be stored, for example, in a computer recording medium or in the memory (420 in FIG. 14) of the image editing device 400 of FIGS. 13 and 14.

Referring to FIG. 15, in step S1001, the image editing device may identify text data from the first image data. The first image data may further include, for example, at least one of a main image and a background image. The image editing device may transmit the first image data, from which the text data is to be identified, to the second server (2000 in FIG. 13). The image editing apparatus may identify the text data in the first image data by receiving information on the text data of the first image data from the second server (2000 in FIG. 13).

Referring to FIG. 16, the image editing device may identify text data 513, a main image 517, and a background image 519 from first image data 510. The image editing device may identify the text data 513, the main image 517, and the background image 519 by receiving information on the text data 513, the main image 517, and the background image 519 from the second server (2000 in FIG. 13).

In some embodiments, the image editing device may identify text data 513 from the first image data 510. The image editing device may identify the text data 513 from the first image data 510 by receiving information about the text area 511 and the text data 513 included in the text area 511 from the second server (2000 in FIG. 13).

In some embodiments, the image editing device may identify a speech bubble area 515 in the first image data 510 as described with reference to FIGS. 1 to 12. The image editing device may identify the text data 513 from the first image data 510 by receiving information about the speech bubble area 515 and the text data 513 included in the speech bubble area 515 from the second server (2000 in FIG. 13).

Referring back to FIG. 15 , in step S1003 , the image editing device may execute a translation algorithm that provides at least one translation of the text data based on the information of the first image data. The translation algorithm may be executed, for example, in the second server (2000 in FIG. 13). The image editing device may transmit information of the first image data to the second server (2000 in FIG. 13). When the image editing device transmits information of the first image data to the second server (2000 of FIG. 13), the second server (2000 of FIG. 13) may execute the translation algorithm. The second server (2000 in FIG. 13) may generate at least one translation of the text data using a translation algorithm. The second server (2000 in FIG. 13 ) may generate at least one translation including at least one of a direct translation of the text data and a paraphrase of the text data, based on information of the first image data. The image editing device may receive at least one translated text from the second server (2000 in FIG. 13). The image editing device may provide at least one translated text through a display ( 440 in FIG. 14 ).

In step S1005, the image editing device may receive a translation selection signal for selecting a first translation, which is one of at least one translation, from the outside based on the execution result of the translation algorithm. The image editing device may receive a translation selection signal from the outside (eg, a user).

In operation S1007, the image editing device may identify second image data to which a first translation is applied to the first image data, based on receiving the translation selection signal. The image editing device may inform the second server (2000 in FIG. 13) that the translation selection signal has been received. The second server (2000 in FIG. 13 ) may generate second image data obtained by changing the text data of the text area of the first image data into the first translation, based on the translation selection signal. The image editing apparatus may identify the second image data by receiving the second image data from the second server (2000 in FIG. 13 ).
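The flow of steps S1001 to S1007 can be sketched end to end with a stand-in for the second server. Everything named here (`ImageData`, `StubTranslationServer`, `edit_image`, the canned translation table) is hypothetical and exists only to make the control flow concrete; a real second server would run the translation algorithm and render the translated text into the image.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ImageData:
    text: str   # text data identified in the image
    info: dict  # information of the image data (e.g., source, creator, category)


class StubTranslationServer:
    """Stands in for the second server and returns canned candidates."""

    def translate(self, text, info):
        # A real system would run a translation model that also uses `info`.
        table = {"Bonjour": ["Hello", "Good day"]}
        return table.get(text, [])

    def apply(self, image, translation):
        # Produces second image data with the text replaced by the translation.
        return replace(image, text=translation)


def edit_image(first_image, server, choose):
    text = first_image.text                                # S1001: identify text data
    candidates = server.translate(text, first_image.info)  # S1003: get translations
    if not candidates:
        return first_image                                 # nothing to apply
    selected = candidates[choose(candidates)]              # S1005: selection signal
    return server.apply(first_image, selected)             # S1007: second image data


first = ImageData(text="Bonjour", info={"category": "comedy"})
second = edit_image(first, StubTranslationServer(), choose=lambda c: 1)
```

Here `choose` plays the role of the translation selection signal: it receives the candidate list and returns the index of the first translation picked by the user.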

Referring to FIG. 17 , in some embodiments, an image editing device may provide an editing screen 600 that simultaneously displays first image data 510 and second image data 520 . The image editing device may provide the editing screen 600 through the display ( 440 of FIG. 14 ). The second image data 520 may be image data to which the first translation 523 is applied to the text area 511 . The image editing device provides the first image data 510 to the first area 601 of the editing screen 600 and provides the second image data 520 to the second area 603 of the editing screen 600. can do. The user can simultaneously view the first image data 510 and the second image data 520 to which the first translation 523 is applied on one editing screen 600 .

Referring to FIG. 18 , in some embodiments, the image editing device may provide an editing screen 600 that further includes a user interface 530 through which user data may be input for second image data 520 . A user may input data (eg, text) into the user interface 530 . User data input to the user interface 530 may be editable. Furthermore, user data input to the user interface 530 and the user interface 530 may be deleted by the user.

For example, when a plurality of users edit the second image data 520 using the editing screen 600, a first user may input a comment (i.e., user data) into the user interface 530 and save it, after which a second user, who is the next user, may check the comment left by the first user through the user interface 530. The second user may create another user interface 530 to input and store a comment (i.e., user data). A third user, who is the next user, may check both the comment left by the first user and the comment left by the second user. By enabling a plurality of users to leave comments through the user interface 530, the image editing device according to an embodiment of the present invention makes it easier for a plurality of users to edit image data.
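The comment behavior described for the user interface 530 can be sketched as a small data structure. `CommentBox` and `EditingScreen` are hypothetical names, and storing comments in an in-memory list is an assumption; the description does not state where or how user data is stored.

```python
from dataclasses import dataclass, field


@dataclass
class CommentBox:
    """Hypothetical model of one user interface 530 holding one user's comment."""
    author: str
    text: str


@dataclass
class EditingScreen:
    """Hypothetical model of the comment-related part of the editing screen 600."""
    comment_boxes: list = field(default_factory=list)

    def add_comment(self, author, text):
        box = CommentBox(author, text)
        self.comment_boxes.append(box)
        return box

    def edit_comment(self, box, new_text):
        box.text = new_text  # user data remains editable after it is saved

    def delete_comment(self, box):
        # Remove by identity so another box with identical content is not affected.
        self.comment_boxes = [b for b in self.comment_boxes if b is not box]

    def visible_comments(self):
        return [(b.author, b.text) for b in self.comment_boxes]


screen = EditingScreen()
first_box = screen.add_comment("first user", "Check the line break")
second_box = screen.add_comment("second user", "Font looks too small")
print(screen.visible_comments())  # what a third user would see
```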

Hereinafter, learning of the translation deep learning module 2001 included in the second server (2000 in FIG. 13) according to some embodiments of the present invention will be described with reference to FIGS. 5 and 19. For clarity, descriptions that overlap with those given above are simplified or omitted.

FIG. 19 is a diagram for explaining a learning method of a translation deep learning module 2001 according to some embodiments of the present invention.

The description of FIG. 5 may be applied to the translation deep learning module 2001 of FIG. 19. An operation described as being performed by the image editing device (400 of FIG. 14) may be implemented as instructions (commands) that can be performed (or executed) by the processor (410 of FIG. 14) of the image editing device (400 of FIG. 14). The instructions may be stored in, for example, a computer recording medium or the memory 420 of the image editing device 400 of FIGS. 13 and 14.

Referring to FIGS. 5 and 19, the image editing device may cause the translation algorithm to be learned through the translation deep learning module 2001 by applying the information of the first image data, the text data, and the at least one translation to an input node and applying the first translation to an output node. The image editing device may transmit first learning input data to the translation deep learning module 2001 included in the second server (2000 in FIG. 13) so that the translation deep learning module 2001 is trained. The first learning input data may include the information of the first image data, the text data, the at least one translation, and the first translation. That is, the translation deep learning module 2001 may be trained with the information of the first image data, the text data, and the at least one translation at the input node and the first translation at the output node.
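As a rough illustration of how the first learning input data might be split between the input node and the output node, consider the sketch below. `FirstLearningInput`, `to_training_pair`, and the contents of `image_info` are hypothetical; the description does not define the format of the information of the first image data or how the module encodes its inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstLearningInput:
    """Hypothetical container for the first learning input data."""
    image_info: dict        # information of the first image data (format assumed)
    text_data: str          # text identified in the first image data
    translations: tuple     # the at least one translation that was offered
    first_translation: str  # the translation chosen by the selection signal


def to_training_pair(sample):
    """Split a sample into (input-node values, output-node value)."""
    if sample.first_translation not in sample.translations:
        raise ValueError("the first translation must be one of the offered translations")
    inputs = {
        "image_info": sample.image_info,
        "text_data": sample.text_data,
        "translations": sample.translations,
    }
    return inputs, sample.first_translation


sample = FirstLearningInput(
    image_info={"page": 1},
    text_data="ORIGINAL TEXT",
    translations=("Hello", "Hi there"),
    first_translation="Hi there",
)
inputs, target = to_training_pair(sample)
print(target)
```

The validation step reflects the statement that the first translation is one of the at least one translation; everything about the numeric encoding and the network itself is left out of this sketch.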

Hereinafter, operations of the image editing device 400 and the translation deep learning module 2001 according to some embodiments of the present invention will be described with reference to FIGS. 20 to 22. For clarity, descriptions that overlap with those given above are simplified or omitted.

FIG. 20 is a sequence diagram illustrating operations of an image editing device 400 and a translation deep learning module 2001 according to some embodiments of the present invention. FIGS. 21A and 21B are diagrams for explaining step S2005 of FIG. 20. FIG. 22 is a diagram for explaining step S2015 of FIG. 20.

An operation described as being performed by the image editing device 400 may be implemented as instructions (commands) that may be performed (or executed) by a processor (410 in FIG. 14) of the image editing device 400. The instructions may be stored in, for example, a computer recording medium or the memory 420 of the image editing device 400 of FIGS. 13 and 14.

Referring to FIG. 20, in step S2001, the image editing device 400 may receive input text data. The input text data may be received, for example, before or after the image editing device 400 identifies the second image data to which the first translation is applied. The image editing device 400 may receive the input text data from the outside (e.g., a user). The input text data may be text data input in order to obtain an image.

In step S2003, the image editing device 400 may transmit the input text data to the translation deep learning module 2001. The translation deep learning module 2001 may be included in the second server (2000 in FIG. 13).

In step S2005, the translation deep learning module 2001 may generate at least one text conversion image corresponding to the input text data.

Referring to FIGS. 21A and 21B, the translation deep learning module 2001 may generate at least one text conversion image 701 or 703 corresponding to the input text data by identifying the meaning of the input text data using a translation algorithm. For example, when the image editing device 400 receives input text data of “separate collection box” from the outside, the translation deep learning module 2001 may generate at least one text conversion image 701 and 703 corresponding to the input text data “separate collection box”. The at least one text conversion image 701 or 703 may include a reference image 701 corresponding to the input text data and a derivative image 703 of the reference image 701.

Referring back to FIG. 20 , in step S2007 , the translation deep learning module 2001 may transmit at least one text conversion image to the image editing device 400 .

In step S2009, the image editing device 400 may identify the at least one text conversion image. The image editing device 400 may provide the at least one text conversion image through a display (440 of FIG. 14).

In step S2011, the image editing device 400 may receive a correction signal for a first text conversion image, which is one of the at least one text conversion image. The image editing device 400 may receive the correction signal for correcting the first text conversion image from the outside (e.g., a user). The correction may include, for example, changing the background image of the first text conversion image.

In step S2013, the image editing device 400 may transmit a correction signal for the first text conversion image to the translation deep learning module 2001.

In step S2015, the translation deep learning module 2001 may generate a modified image of the first text conversion image based on the correction signal. For example, when the correction signal requests a change of the background image of the first text conversion image, the translation deep learning module 2001 may generate a modified image in which the background image of the first text conversion image is changed.

Referring to FIG. 22, the image editing device 400 may receive, from the outside (e.g., a user), a correction signal that includes content selecting a first text conversion image, which is one of the at least one text conversion image, and content requesting modification of the first text conversion image. The translation deep learning module 2001 may generate a modified image 705 by changing the background image of the first text conversion image (e.g., 703 of FIG. 21B) based on the correction signal.

Referring back to FIG. 20, in step S2017, the translation deep learning module 2001 may transmit the modified image to the image editing device 400.

In step S2019, the image editing device 400 may identify the modified image. The image editing device 400 may provide the modified image through a display (440 of FIG. 14).
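The exchange in steps S2001 to S2019 can be traced with a minimal Python sketch. `StubTranslationModule` is a hypothetical stand-in that returns dictionaries instead of real images, and `run_sequence` with its index-based selection is likewise an assumption; the sketch only shows the order of the messages, not how the translation deep learning module 2001 generates images.

```python
class StubTranslationModule:
    """Hypothetical stand-in for the translation deep learning module 2001."""

    def generate_text_conversion_images(self, input_text):
        # S2005: a reference image and a derivative image for the input text.
        reference = {"subject": input_text, "kind": "reference", "background": "plain"}
        derivative = dict(reference, kind="derivative")
        return [reference, derivative]

    def generate_modified_image(self, image, correction_signal):
        # S2015: apply the requested change (for example, a new background).
        modified = dict(image)
        modified.update(correction_signal)
        modified["kind"] = "modified"
        return modified


def run_sequence(module, input_text, chosen_index, correction_signal):
    """Walk through S2001 to S2019 from the image editing device's point of view."""
    images = module.generate_text_conversion_images(input_text)  # S2003 to S2009
    first_text_conversion_image = images[chosen_index]           # S2011: user picks one image
    return module.generate_modified_image(                       # S2013 to S2019
        first_text_conversion_image, correction_signal
    )


result = run_sequence(
    StubTranslationModule(),
    input_text="separate collection box",
    chosen_index=1,
    correction_signal={"background": "park"},
)
print(result)
```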

Hereinafter, learning of the translation deep learning module 2001 included in the second server (2000 in FIG. 13) according to some embodiments of the present invention will be described with reference to FIGS. 5 and 23. For clarity, descriptions that overlap with those given above are simplified or omitted.

FIG. 23 is a diagram for explaining a learning method of a translation deep learning module 2001 according to some embodiments of the present invention.

Referring to FIGS. 5 and 23, the image editing device may cause the translation algorithm to be learned through the translation deep learning module 2001 by applying the input text data and the correction signal to an input node and applying the at least one text conversion image and the modified image to an output node. The image editing device may transmit second learning input data to the translation deep learning module 2001 included in the second server (2000 in FIG. 13) so that the translation deep learning module 2001 is trained. The second learning input data may include the input text data, the correction signal, the at least one text conversion image, and the modified image. That is, the translation deep learning module 2001 may be trained with the input text data and the correction signal at the input node and the at least one text conversion image and the modified image at the output node.

Hereinafter, an image editing operation of an image editing device according to some embodiments of the present invention will be described with reference to FIGS. 13 and 24. For clarity, descriptions that overlap with those given above are simplified or omitted.

FIG. 24 is a diagram for explaining image editing by the image editing device 400 according to some embodiments of the present disclosure.

An operation described as being performed by the image editing device 400 may be implemented as instructions (commands) that may be performed (or executed) by a processor (410 in FIG. 14) of the image editing device 400. The instructions may be stored, for example, in a computer recording medium or a memory (420 in FIG. 14) of the image editing device 400 of FIGS. 14 and 24.

Referring to FIGS. 13 and 24 , the image editing device 400 according to some embodiments of the present invention may edit the first translation 523 . Also, the image editing device 400 may perform image editing on at least one image other than text data of the second image data 520 .

In some embodiments, the image editing device 400 may identify third image data 540, which is image data in which the font of the first translation in the second image data is changed, based on a font selection signal received from the outside (e.g., a user). The image editing device 400 may identify the third image data 540 by receiving, from the second server 2000, the third image data 540 in which the font of the first translation has been changed.

For example, the image editing device 400 may notify the second server 2000 that a font selection signal has been received. Based on the font selection signal, the second server 2000 may generate the third image data 540 by applying, to the second image data 520, a modified first translation 524 in which the font of the first translation 523 is changed. The image editing device 400 may identify the third image data 540 by receiving the third image data 540 from the second server 2000. The image editing device 400 may provide the third image data 540 in the second area 603 of the editing screen 600 through the display 440 of FIG. 14.

In some embodiments, the image editing device 400 may identify, from the second image data 520, at least one image (the main image 517 and the background image 519) other than the first translation 523, which is text data. The image editing device 400 may perform image editing on the at least one image.

For example, the image editing device 400 may receive a signal for selecting the background image 519, which is one of the at least one image, and notify the second server 2000 of the signal. The second server 2000 may further receive, from the image editing device 400, a signal related to image editing of the selected background image 519, perform the image editing indicated by the received signal, and generate the third image data 540. For example, the image editing device 400 may identify the third image data 540 by receiving, from the second server 2000, the third image data 540 to which the changed background image 529 is applied. The image editing device 400 may provide the third image data 540 in the second area 603 of the editing screen 600 through the display 440 of FIG. 14.

Image editing has been described for the background image 519 with reference to FIG. 24, but the present invention is not limited thereto. For example, the description of image editing of the background image 519 may also be applied to the main image 517.

In addition, FIG. 24 has been described using an example in which both the background image 519 and the font of the first translation 523 are changed, but the present invention is not limited thereto. For example, in some embodiments, only image editing of the at least one image may be performed, or only a font change of the first translation 523 may be performed.
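The font change and the background edit, applied separately or together, can be sketched as follows. `EditableImageData` and `make_third_image_data` are hypothetical names, and representing each part of the image as a string label is an assumption made only to keep the example self-contained.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EditableImageData:
    """Hypothetical decomposition of image data into its separately editable parts."""
    main_image: str
    background_image: str
    translation_text: str
    translation_font: str


def make_third_image_data(second_image, font=None, background=None):
    """Apply a font change, a background change, or both; None leaves that part unchanged."""
    changes = {}
    if font is not None:
        changes["translation_font"] = font        # models the font selection signal
    if background is not None:
        changes["background_image"] = background  # models the background editing signal
    return replace(second_image, **changes)


second = EditableImageData("character", "street", "Hello", "serif")
font_only = make_third_image_data(second, font="handwriting")
both = make_third_image_data(second, font="handwriting", background="beach")
print(font_only)
print(both)
```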

In some embodiments, the image editing device 400 may receive an inpainting signal from the outside (e.g., a user). The image editing device 400 may notify the second server 2000 that the inpainting signal has been received. Based on the inpainting signal, the second server 2000 may perform an editing operation of removing a specific image and applying a background image surrounding the specific image to the area from which the specific image is removed. This is explained below with reference to FIGS. 25 to 29.

Hereinafter, operations of the image editing device 400 and the image editing module 2003 according to some embodiments of the present invention will be described with reference to FIGS. 25 to 29. For clarity, descriptions that overlap with those given above are simplified or omitted.

FIG. 25 is a sequence diagram illustrating operations of an image editing device 400 and an image editing module 2003 according to some embodiments of the present disclosure. FIG. 26 is a diagram for explaining step S3001 of FIG. 25. FIGS. 27, 28 and 29 are diagrams for explaining step S3005 of FIG. 25.

An operation described as being performed by the image editing device 400 may be implemented as instructions (commands) that may be performed (or executed) by a processor (410 in FIG. 14) of the image editing device 400. The instructions may be stored in, for example, a computer recording medium or the memory 420 of the image editing device 400 of FIG. 14.

Referring to FIG. 25 , in step S3001, the image editing apparatus 400 may receive an inpainting signal for the first image. The first image may be, for example, one of at least one image excluding the first translation in the image data to which the translation for the text data is applied.

Referring to FIG. 26 , the image editing device 400 may identify at least one image excluding the first translation 523 from the fourth image data 550 to which the translation for text data is applied. The at least one image may include a main image 517 , a background image 519 , and a first image 516 . The image editing device 400 may receive an inpainting signal for the first image 516 of at least one image.

Referring to FIG. 25 , in step S3003, the image editing device 400 may transmit an inpainting signal to the image editing module 2003. The image editing module 2003 may be, for example, a module included in the second server (2000 in FIG. 13 ). The image editing module 2003 may perform an editing operation of image data.

In step S3005, based on receiving the inpainting signal, the image editing module 2003 may generate inpainting image data by removing the first image and applying a background image surrounding the removed first image to the region from which the first image is removed.

The image editing module 2003 may identify a background image surrounding the first image. The image editing module 2003 may generate the inpainting image data by removing the first image and then applying the identified background image to the region from which the first image is removed.

Referring to FIG. 27 , the image editing module 2003 may identify a background image 518 surrounding the first image 516 based on the inpainting signal. The background image 518 surrounding the first image 516 may be a partial area of the fourth image data 550 including the first image 516 .

Referring to FIGS. 28 and 29, the image editing module 2003 may remove the first image 516. The image editing module 2003 may generate inpainting image data 560 by applying the identified background image 518 surrounding the first image 516 to the area 516' from which the first image 516 is removed. For example, the image editing module 2003 may generate the inpainting image data 560 by inserting clouds as the background image 518 into the region 516' from which the first image 516 is removed.

Referring back to FIG. 25 , in step S3007, the image editing module 2003 may transmit inpainting image data to the image editing device 400.

In step S3009, the image editing device 400 may identify inpainting image data. The image editing device 400 may provide inpainting image data through a display ( 440 of FIG. 14 ).
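The idea of step S3005 can be shown with a deliberately simple Python sketch on a grid of labelled pixels. The function `inpaint_with_surrounding_background`, the mask representation, and the fill rule (the most common value among the pixels bordering the removed region) are all assumptions standing in for whatever method the image editing module 2003 actually uses; practical inpainting is considerably more involved.

```python
from collections import Counter


def inpaint_with_surrounding_background(image, mask):
    """Fill the masked region with the most common value of the pixels that border it.

    image: list of rows of pixel values; mask: same shape, True where the first image is.
    This crude fill rule is an illustrative assumption, not the method of the description.
    """
    height, width = len(image), len(image[0])
    if not any(any(row) for row in mask):
        return [row[:] for row in image]  # nothing to remove
    border_values = []
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                continue
            # Keep unmasked pixels that touch the masked region (4-neighbourhood).
            neighbours = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            if any(0 <= ny < height and 0 <= nx < width and mask[ny][nx]
                   for ny, nx in neighbours):
                border_values.append(image[y][x])
    if not border_values:
        raise ValueError("no background pixels surround the masked region")
    fill = Counter(border_values).most_common(1)[0][0]
    return [[fill if mask[y][x] else image[y][x] for x in range(width)]
            for y in range(height)]


image = [
    ["cloud", "cloud", "cloud", "cloud"],
    ["cloud", "bird", "bird", "cloud"],
    ["cloud", "cloud", "cloud", "cloud"],
]
mask = [[pixel == "bird" for pixel in row] for row in image]
inpainted = inpaint_with_surrounding_background(image, mask)
print(inpainted[1])
```

In this toy example the removed region is surrounded by cloud pixels, so the region is filled with clouds, which matches the example given for the area 516'.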

An image editing device according to an embodiment of the present invention can improve user convenience by simultaneously providing an image editing function as well as a translation function.

The above description is merely an example of the technical idea of the present embodiment, and various modifications and variations can be made by those skilled in the art without departing from the essential characteristics of the present embodiment. Therefore, the present embodiments are intended to explain, not to limit, the technical idea of the present embodiment, and the scope of the technical idea of the present embodiment is not limited by these embodiments. The scope of protection of this embodiment should be construed according to the claims below, and all technical ideas within the scope equivalent thereto should be construed as being included in the scope of rights of this embodiment.

Claims

1. An image editing device comprising:

a processor; and

a memory operatively coupled to the processor,

wherein the memory stores instructions that, when executed, cause the processor to:

identify text data in first image data;

execute a translation algorithm that provides at least one translation of the text data based on information of the first image data;

receive, from the outside, a translation selection signal for selecting a first translation, which is one of the at least one translation, based on an execution result of the translation algorithm; and

identify, based on receiving the translation selection signal, second image data in which the first translation is applied to the first image data.

2. The image editing device of claim 1, wherein the instructions cause the processor to:

execute a speech bubble recognition algorithm using the first image data in response to an input event received from the outside;

set a speech bubble area included in the first image data based on the execution of the speech bubble recognition algorithm; and

identify the text data in the speech bubble area.

3. The image editing device of claim 2, wherein the instructions cause the processor to set the speech bubble area through a deep learning module based on the first image data,

wherein the deep learning module is trained to distinguish a speech bubble image from an error image using a generative adversarial network, and

wherein the deep learning module includes a generator module and an identifier module, the generator module being trained to generate fake data associated with the error image, and the identifier module being trained to distinguish between real data associated with the speech bubble image and the fake data.

4. The image editing device of claim 1, wherein the instructions cause the processor to:

apply the information of the first image data, the text data, and the at least one translation to an input node and apply the first translation to an output node, so that the translation algorithm is learned through a translation deep learning module.

5. The image editing device of claim 1, wherein the instructions cause the processor to:

identify a text area in the first image data; and

identify the text data for the text area.

6. The image editing device of claim 1, wherein the instructions cause the processor to:

provide an editing screen that simultaneously displays the first image data and the second image data; and

provide a user interface for inputting user data for the second image data.

7. The image editing device of claim 1, wherein the instructions cause the processor to:

identify the second image data in which the font of the first translation is changed, based on a font selection signal received from the outside.

8. The image editing device of claim 1, wherein the instructions cause the processor to:

identify at least one image excluding the text data from the second image data; and

perform image editing on the at least one image.

9. The image editing device of claim 8, wherein the instructions cause the processor to:

receive, from the outside, an inpainting signal for a first image of the at least one image; and

based on the inpainting signal, remove the first image and apply a background image surrounding the first image to a region from which the first image is removed.

10. The image editing device of claim 1, wherein the instructions cause the processor to:

receive input text data from the outside; and

identify a text conversion image corresponding to the input text data based on the execution result of the translation algorithm.