WO2023191182A1 - System and method for automatically converting animation into webcomics by one touch - Google Patents

System and method for automatically converting animation into webcomics by one touch

Info

Publication number
WO2023191182A1
Authority
WO
WIPO (PCT)
Prior art keywords
webtoon
animation
onomatopoeia
cut
sound
Prior art date
Application number
PCT/KR2022/007300
Other languages
French (fr)
Korean (ko)
Inventor
김탁훈
최종원
배소연
황진수
Original Assignee
(주)탁툰엔터프라이즈
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by (주)탁툰엔터프라이즈
Publication of WO2023191182A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F 3/0483 Interaction with page-structured environments, e.g. book metaphor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/10 Services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Definitions

  • the present invention relates to a system and method for automatically converting animation into webtoon with one touch.
  • the image editing technology field includes filtering technology that changes the original color of an image to express a desired effect, and image warping technology that changes the entire rectangular area of the image or part of the image into a desired shape.
  • video editing technology includes video compression technology, which handles the format of the video itself; from the perspective of compositing animation, there are keyframe animation techniques, which deform a given object's shape based on several predetermined key shapes, procedural animation generation techniques, which express motion as a mathematical function of time and apply it, and simulation-based animation generation techniques, which apply the laws of motion of particles or higher-order physical laws.
  • the purpose of the present invention is to provide a system and method for automatically converting animation to webtoon with one touch, which can solve conventional problems.
  • a system for automatically converting animation into webtoon with one touch includes: an input unit that receives an animation from a user terminal; an image cut extraction unit that determines the movement of an object, the start and end points of the object's dialogue, and the camera movement based on the frame, sound, and scene camera coordinate information in the animation, and then extracts at least one of a plurality of valid cuts matching the judgment result as a webtoon cut; an onomatopoeia and tone analysis unit that separates the object's onomatopoeia from the sound and then analyzes the object's tone; and a speech balloon and special effect application unit that inserts or changes concentration lines, speed lines, speech balloons, sound effects, and the layout color so that the onomatopoeia and tone are reflected in the valid cut image.
  • a method of operating the system for automatically converting animation to webtoon with one touch includes: receiving an animation at the input unit; determining, at the image cut extraction unit, the movement of an object, the start and end points of the object's dialogue, and the camera movement based on the frame, sound, and scene camera coordinate information in the animation, and then extracting at least one of a plurality of valid cuts matching the judgment result as a webtoon cut; checking the accuracy between the webtoon cut and the ground-truth (GT) cut at an accuracy inspection unit; separating the object's onomatopoeia from the sound and then analyzing the object's tone at the onomatopoeia and tone analysis unit; and inserting concentration lines, speed lines, speech balloons, sound effects, and the layout color at the speech balloon and special effect application unit so that the object's onomatopoeia and tone are reflected in the webtoon cut.
  • GT: ground-truth ("correct answer") cut
  • Figure 1 is a block diagram of a system for automatically converting animation into webtoon with one touch according to an embodiment of the present invention.
  • FIG. 2 is a detailed configuration diagram of the image cut extractor shown in FIG. 1.
  • Figures 3 and 4 are exemplary diagrams for explaining the cropping process.
  • Figure 5 is an example diagram comparing a test result cut and a GT cut for accuracy inspection with the result cut.
  • Figure 6 is an example diagram of a special effect applied by the speech balloon and special effect application unit shown in Figure 1.
  • FIG. 7 is an example diagram illustrating the arrangement of speech balloons applied by the speech balloon and special effect application unit shown in FIG. 1.
  • Figure 8 is an example of a special line inserted into a webtoon cut based on the coordinate values of objects for each frame.
  • Figure 9 is an example diagram of the layout rules of the character cropping algorithm.
  • Figures 10 and 11 are examples of a clustering effectiveness analysis sheet and a speaker classification accuracy analysis sheet.
  • Figure 12 is a flowchart explaining a method of automatically converting animation to webtoon with one touch according to an embodiment of the present invention.
  • FIG. 13 is a detailed flowchart of the S720 process shown in FIG. 12.
  • FIG. 14 is a detailed flowchart of the S740 process shown in FIG. 12.
  • Figure 15 is a diagram illustrating an example computing environment in which one or more embodiments disclosed herein may be implemented.
  • the system 100 for automatically converting animation to webtoon with one touch includes an input unit 110, an image cut extractor 120, a speech balloon and special effect application unit 130, and an output unit 140.
  • the present invention may further include an onomatopoeia and tone analysis unit 150.
  • the system 100 of the present invention can work in conjunction with an OPEN API running on a user terminal (not shown); the OPEN API refers to an application on the terminal and includes, for example, an app running on a mobile terminal such as a smartphone. Apps can be downloaded and installed from an application market, a virtual marketplace where mobile content is freely bought and sold, or run in conjunction with the cloud.
  • the input unit 110 may be configured to receive animation input from a user terminal.
  • the image cut extraction unit 120 may be configured to determine the movement of an object, the start and end points of the object's dialogue, and the camera movement based on the frame, sound, and scene camera coordinate information in the animation, and then to extract at least one of a plurality of valid cuts matching the judgment result as a webtoon cut.
  • the image cut extraction unit 120 includes a source classification unit 121, a background/object tracking unit 122, an amplitude and frequency extraction unit 123, a camera coordinate tracking unit 124, an object movement confirmation unit 125, a dialogue timing point tracking unit 126, a camera technique determination unit 127, a valid cut determination unit 128, and a webtoon cut extraction unit 129.
  • the source classification unit 121 may be configured to classify the sources in the animation transmitted from the input unit 110 into ① frame images, ② sound, and ③ scene camera coordinates.
  • the background/object tracking unit 122 may be configured to track an object or its movement or position change through differences between consecutive frame images using an image distribution-based neural network learning algorithm (Representation Learning).
  • the background/object tracking unit 122 can separate the object and the background when animation is input and analyze the shape and size of the object.
  • by comparing the previous frame with the next frame, a cut consisting only of moving objects is extracted; the object region is then specified by connecting each pixel, other than the pixels with a value of 0 that are judged to be background, to the pixels above, below, to the left and right, and diagonal to it.
  • the amplitude and frequency extraction unit 123 may be configured to extract the amplitude and frequency of the sound of the animation.
  • the camera coordinate tracking unit 124 may be configured to track camera movement based on scene camera coordinates.
  • the object movement confirmation unit 125 is configured to check (select) frames in which the movement or position change of the object tracked by the background/object tracking unit 122 is minimal.
  • the object movement confirmation unit 125 may calculate comparison values by comparing the distances to the objects detected in the next frame, based on the position and size information of the object detected in the previous frame. The closest object within a certain distance can then be set as the next position of the current object, and the coordinate values of the objects can be output for each frame.
  • the dialogue timing point tracking unit 126 may be configured to track frames corresponding to the dialogue timing point from the start point where the object starts the dialogue to the end point where the object ends the dialogue.
  • the valid cut determination unit 128 may be configured to determine the frame confirmed by the object motion confirmation unit 125 and the frame tracked by the dialogue timing point tracking unit 126 as a valid cut.
  • the camera technique determination unit 127 may be configured to determine the frames in which a camera technique (e.g., zoom in/out, tilt, pan) is reflected, based on the camera coordinates tracked by the camera coordinate tracking unit 124.
  • for example, feature points are extracted and matched between two frames (image feature matching), using a saliency-applied difference image with the objects removed from the original image for the previous frame and the original image for the next frame; the camera's up/down/left/right movement is calculated from the x- and y-axis displacements of the matched points, and zoom-in versus zoom-out is determined from the ratio by which the distances between matched feature points change between the two frames.
  • the webtoon cut extraction unit 129 may be configured to select the frames determined as valid cuts by the valid cut determination unit 128 and the frames determined by the camera technique determination unit 127 as webtoon cuts.
  • the onomatopoeia and tone analysis unit 150 may be configured to separate the object onomatopoeia from the sound and then analyze the object tone.
  • the onomatopoeia and tone analysis unit 150 detects the frames in which a voice appears using a pre-trained voice activity detection algorithm; when the object's voice is recognized in the sound but there is no dialogue at that point, the sound is classified as onomatopoeia.
  • the onomatopoeia and tone analysis unit 150 analyzes the decibel level of the onomatopoeia. For example, a class can be assigned by detecting where the voice level at the time of cut extraction is a certain number of decibels louder than the average level of the animation.
  • at cut extraction time, the section in which the dialogue occurs is received as a time or frame range; if the proportion of this section that is louder than a given threshold exceeds 20%, the dialogue may be classified as being a certain number of decibels louder than the average.
  • the speech balloon and special effect application unit 130 may be configured to insert or change concentration lines, speed lines, speech balloons, sound effects, and layout colors so that the object onomatopoeia and tone are reflected in the webtoon cut image.
  • the speech balloon and special effect application unit 130 automatically generates a speech balloon according to the object onomatopoeia and the tone of the object.
  • after tracking the positions of the main object (speaker) and the other object (listener) using a saliency map, if there is no change in the object's position or appearance across two or more cuts, the object is cropped using an object crop algorithm; a speech balloon and the onomatopoeia converted into letters are then placed in the area adjacent to the speaker, and the layout color is changed when the background in the video becomes dark.
  • the speech balloon and special effect application unit 130 may insert effect lines into the webtoon cut along the movement path, in proportion to the magnitude of the object's movement or position change.
  • the speech balloon and special effect application unit 130 can arrange speech balloons according to the location of the object. For example, if the object is located on the left within the webtoon cut, the speech bubble is placed on the right.
  • the speech balloon and special effect application unit 130 may recognize as the speaker the region with the highest value of (sum of the difference-image values over the frames in which dialogue occurs) / (area of the saliency map), as calculated by the background and object extraction unit.
  • the speech balloon and special effect application unit 130 can also insert special lines into the webtoon cut based on the per-frame coordinate values of the objects, using the comparison values computed by the object movement confirmation unit 125 (the distances to the objects detected in the next frame, based on the position and size of the object detected in the previous frame) and the rule that sets the closest object within a certain distance as the current object's next position.
  • the object crop algorithm may be a program that recognizes and clusters cuts with similar positions and compositions of objects as similar cuts, and repeats the layout form of the entire cut and speaker crop in order within each group.
  • the object crop algorithm also recognizes and extracts, as similar cuts, the cuts whose object and character composition changes little compared with the previous webtoon cut, and, when the whole-cut and speaker-crop layout patterns are applied to each group, judges their validity, that is, whether they can actually be used in the webtoon.
  • FIG. 12 is a flowchart explaining a method of automatically converting animation to webtoon with one touch according to an embodiment of the present invention, FIG. 13 is a detailed flowchart of the S720 process shown in FIG. 12, and FIG. 14 is a detailed flowchart of the S740 process shown in FIG. 12.
  • the method S700 for automatically converting animation to webtoon includes receiving an animation at the input unit (S710), then determining, at the image cut extraction unit, the movement of an object, the start and end points of the object's dialogue, and the camera movement based on the frame, sound, and scene camera coordinate information in the animation, and extracting at least one of a plurality of valid cuts matching the judgment result as a webtoon cut (S720).
  • the object onomatopoeia is separated from the sound in the onomatopoeia and tone analysis unit, and then the object tone is analyzed (S730).
  • in process S730, the frames in which a voice appears are detected using a pre-trained voice activity detection algorithm; when the object's voice is recognized in the sound but the script contains no dialogue at that point, a step of classifying it as onomatopoeia may be included.
  • the speech balloon and special effect application unit inserts concentration lines, speed lines, speech balloons, sound effects, and layout colors to reflect the onomatopoeia and tone of the object within the webtoon cut (S740).
  • the S720 process includes a series of steps: classifying the sources in the animation into frame images, sound, and scene camera coordinates; tracking object movement or position changes through the differences between consecutive frame images; extracting the amplitude and frequency of the sound; tracking the camera coordinates based on the scene camera coordinates; identifying the frames with the least movement or position change of the tracked object; tracking the frames corresponding to the dialogue timing points, from the start to the end of each dialogue; judging those frames to be valid cuts; and extracting, among the valid cuts, the valid cuts that reflect a camera technique as webtoon cuts.
  • the process S740 may include automatically generating a speech bubble according to the object onomatopoeia and the tone of the object.
  • after tracking the positions of the main object (speaker) and the other object (listener) using a saliency map, if there is no change in the object's position across two or more cuts, the object is cropped; the process may then include placing a speech balloon and the onomatopoeia converted into letters in the area adjacent to the speaker, and changing the layout color using the camera coordinate information.
  • a step of inserting effect lines into the webtoon cut along the movement path, in proportion to the magnitude of the object's movement or position change, may be further included.
  • the computing device 1100 may be, but is not limited to, a personal computer, a server computer, a handheld or laptop device, a mobile device (mobile phone, PDA, media player, etc.), a multiprocessor system, consumer electronics, a minicomputer, a mainframe computer, or a distributed computing environment including any of the above systems or devices.
  • Computing device 1100 may include at least one processing unit 1110 and memory 1120.
  • the processing unit 1110 may include, for example, a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), etc. and can have multiple cores.
  • memory 1120 may be volatile memory (e.g., RAM), non-volatile memory (e.g., ROM, flash memory), or a combination thereof.
  • computing device 1100 may include additional storage 1130.
  • Storage 1130 includes, but is not limited to, magnetic storage, optical storage, etc.
  • the storage 1130 may store computer-readable instructions for implementing one or more embodiments disclosed in this specification, and other computer-readable instructions for implementing an operating system, application program, etc. may also be stored. Computer-readable instructions stored in storage 1130 may be loaded into memory 1120 for execution by processing unit 1110. Computing device 1100 may also include input device(s) 1140 and output device(s) 1150.
  • the input device(s) 1140 may include, for example, a keyboard, mouse, pen, voice input device, touch input device, infrared camera, video input device, or any other input device, etc.
  • output device(s) 1150 may include, for example, one or more displays, speakers, printers, or any other output devices.
  • the computing device 1100 may use an input device or output device provided in another computing device as the input device(s) 1140 or the output device(s) 1150.
  • computing device 1100 may include communication connection(s) 1160 that allows computing device 1100 to communicate with another device (e.g., computing device 1300).
  • communication connection(s) 1160 may include a modem, a network interface card (NIC), an integrated network interface, a radio frequency transmitter/receiver, an infrared port, a USB connection, or other interfaces for connecting the computing device 1100 to another computing device. Additionally, communication connection(s) 1160 may include a wired connection or a wireless connection. Each component of the computing device 1100 described above may be connected by various interconnects such as buses (e.g., peripheral component interconnect (PCI), USB, FireWire (IEEE 1394), an optical bus structure, etc.) and may be interconnected by a network 1200. As used herein, terms such as "component" and "system" generally refer to computer-related entities: hardware, a combination of hardware and software, software, or software in execution.
  • a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
  • both the application running on the controller and the controller can be components.
  • One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer or distributed between two or more computers.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Hospice & Palliative Care (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Business, Economics & Management (AREA)
  • Psychiatry (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A system for automatically converting an animation into webcomics by one touch according to an embodiment of the present invention comprises: an input unit for receiving an input of an animation from a user terminal; an image cut extraction unit for determining a motion of an object, speech start and end points of the object, and a motion of a camera on the basis of a frame within the animation, a sound, and scene camera coordinate information, and then extracting at least one of multiple valid cuts matching a result of the determination, as a webcomics cut; an onomatopoeia and tone analysis unit for separating object onomatopoeia from the sound, and then analyzing an object tone; and a speech balloon and special effect application unit for inserting or changing concentration lines, speed lines, a speech balloon, a sound effect, and a layout color so that the object onomatopoeia and tone are reflected to an image of the valid cut.

Description

System and method for automatically converting animation to webtoon with one touch
The present invention relates to a system and method for automatically converting animation into webtoon with one touch.
In the field of image editing technology, there are filtering techniques, which change an image's original colors to express a desired effect, and image warping techniques, which change the entire rectangular area of an image, or a part of it, into a desired shape.
There are also image selection techniques, which select a desired part or object of a given image and separate it from the background, and image blending techniques, which merge the separated image into another image. Video editing technology includes video compression technology, which handles the format of the video itself; viewed from the perspective of compositing animation, there are keyframe animation techniques, which deform a given object's shape based on several predetermined key shapes, procedural animation generation techniques, which express motion as a mathematical function of time and apply it, and simulation-based animation generation techniques, which apply the laws of motion of particles or higher-order physical laws.
[Prior art literature]
[Patent document]
Korean Registered Patent Publication No. 10-2086780
The object of the present invention is to provide a system and method for automatically converting animation to webtoon with one touch that can solve the conventional problems.
To solve the above problem, a system for automatically converting animation into webtoon with one touch according to an embodiment of the present invention includes: an input unit that receives an animation from a user terminal; an image cut extraction unit that determines the movement of an object, the start and end points of the object's dialogue, and the camera movement based on the frame, sound, and scene camera coordinate information in the animation, and then extracts at least one of a plurality of valid cuts matching the judgment result as a webtoon cut; an onomatopoeia and tone analysis unit that separates the object's onomatopoeia from the sound and then analyzes the object's tone; and a speech balloon and special effect application unit that inserts or changes concentration lines, speed lines, speech balloons, sound effects, and the layout color so that the onomatopoeia and tone are reflected in the valid cut image.
To solve the above problem, a method of operating the system for automatically converting animation to webtoon with one touch according to an embodiment of the present invention includes: receiving an animation at an input unit; determining, at an image cut extraction unit, the movement of an object, the start and end points of the object's dialogue, and the camera movement based on the frame, sound, and scene camera coordinate information in the animation, and then extracting at least one of a plurality of valid cuts matching the judgment result as a webtoon cut; checking the accuracy between the webtoon cut and the ground-truth (GT) cut at an accuracy inspection unit; separating the object's onomatopoeia from the sound and then analyzing the object's tone at an onomatopoeia and tone analysis unit; and inserting concentration lines, speed lines, speech balloons, sound effects, and the layout color at a speech balloon and special effect application unit so that the object's onomatopoeia and tone are reflected in the webtoon cut.
Therefore, using the system and method for automatically converting animation to webtoon with one touch according to an embodiment of the present invention minimizes the labor-intensive, repetitive work that arises in the process of converting animation into webtoon, which saves cost and time.
In addition, because anyone can use it easily, accessibility for industry workers is maximized, and because the result is output as separate layers, it is easy to correct and supplement.
Figure 1 is a block diagram of a system for automatically converting animation into webtoon with one touch according to an embodiment of the present invention.
Figure 2 is a detailed configuration diagram of the image cut extraction unit shown in Figure 1.
Figures 3 and 4 are exemplary diagrams for explaining the cropping process.
Figure 5 is an exemplary diagram comparing a test result cut and a GT cut for accuracy inspection with the result cut.
Figure 6 is an exemplary diagram of a special effect applied by the speech balloon and special effect application unit shown in Figure 1.
Figure 7 is an exemplary diagram illustrating the placement of speech balloons applied by the speech balloon and special effect application unit shown in Figure 1.
Figure 8 is an example of special lines inserted into a webtoon cut based on the per-frame coordinate values of the objects.
Figure 9 is an exemplary diagram of the layout rules of the character cropping algorithm.
Figures 10 and 11 are exemplary diagrams of a clustering validity analysis sheet and a speaker classification accuracy analysis sheet.
Figure 12 is a flowchart explaining a method of automatically converting animation to webtoon with one touch according to an embodiment of the present invention.
Figure 13 is a detailed flowchart of process S720 shown in Figure 12.
Figure 14 is a detailed flowchart of process S740 shown in Figure 12.
Figure 15 is a diagram illustrating an example computing environment in which one or more embodiments disclosed herein may be implemented.
Hereinafter, the system and method for automatically converting animation into webtoon according to an embodiment of the present invention will be described in more detail with reference to the attached drawings.
Figure 1 is a block diagram of a system for automatically converting animation to webtoon with one touch according to an embodiment of the present invention; Figure 2 is a detailed configuration diagram of the image cut extraction unit shown in Figure 1; Figures 3 and 4 are exemplary diagrams for explaining the cropping process; Figure 5 is an exemplary diagram comparing a test result cut and a GT cut for accuracy inspection with the result cut; Figure 6 is an exemplary diagram of a special effect applied by the speech balloon and special effect application unit shown in Figure 1; Figure 7 is an exemplary diagram illustrating the placement of speech balloons applied by the speech balloon and special effect application unit shown in Figure 1; Figure 8 is an example of special lines inserted into a webtoon cut based on the per-frame coordinate values of the objects; Figure 9 is an exemplary diagram of the layout rules of the character cropping algorithm; and Figures 10 and 11 are exemplary diagrams of a clustering validity analysis sheet and a speaker classification accuracy analysis sheet.
First, as shown in Figure 1, the system 100 for automatically converting animation to webtoon with one touch according to an embodiment of the present invention includes an input unit 110, an image cut extraction unit 120, a speech balloon and special effect application unit 130, and an output unit 140.
In addition, the present invention may further include an onomatopoeia and tone analysis unit 150.
Meanwhile, the system 100 of the present invention can work in conjunction with an OPEN API running on a user terminal (not shown). Here, the OPEN API refers to an application on the terminal and includes, for example, an app running on a mobile terminal such as a smartphone. Apps can be downloaded and installed from an application market, a virtual marketplace where mobile content is freely bought and sold, or run in conjunction with the cloud.
More specifically, referring to Figure 1, the input unit 110 may be configured to receive an animation from the user terminal.
The image cut extraction unit 120 may be configured to determine the movement of an object, the start and end points of the object's dialogue, and the camera movement based on the frame, sound, and scene camera coordinate information in the animation, and then to extract at least one of a plurality of valid cuts matching the judgment result as a webtoon cut.
Referring to Figure 2, the image cut extraction unit 120 includes a source classification unit 121, a background/object tracking unit 122, an amplitude and frequency extraction unit 123, a camera coordinate tracking unit 124, an object movement confirmation unit 125, a dialogue timing point tracking unit 126, a camera technique determination unit 127, a valid cut determination unit 128, and a webtoon cut extraction unit 129.
The source classification unit 121 may be configured to classify the sources in the animation transmitted from the input unit 110 into ① frame images, ② sound, and ③ scene camera coordinates.
The background/object tracking unit 122 may be configured to track an object, or its movement or position change, through the differences between consecutive frame images using an image-distribution-based neural network learning algorithm (representation learning).
For reference, the background/object tracking unit 122 can separate the object from the background when an animation is input and analyze the object's shape and size.
In addition, it compares the previous frame with the next frame to extract a cut consisting only of moving objects, and then specifies the object region by connecting each pixel, other than the pixels with a value of 0 that are judged to be background, to the pixels above, below, to the left and right, and diagonal to it.
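To illustrate this region-specification step, here is a minimal sketch (the helper name is hypothetical, not from the patent): background pixels are 0 in the frame-difference image, and every other pixel is connected to its eight neighbours to form object regions.

```python
import numpy as np

def extract_object_regions(diff):
    """Group 8-connected non-zero pixels of a frame-difference image into
    object regions; pixels with value 0 are treated as background."""
    h, w = diff.shape
    labels = np.zeros((h, w), dtype=np.int32)
    count = 0
    for sy in range(h):
        for sx in range(w):
            if diff[sy, sx] == 0 or labels[sy, sx]:
                continue
            count += 1                        # start a new object region
            stack = [(sy, sx)]
            labels[sy, sx] = count
            while stack:                      # flood-fill over the 8 neighbours
                y, x = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and diff[ny, nx] != 0 and not labels[ny, nx]):
                            labels[ny, nx] = count
                            stack.append((ny, nx))
    return labels, count

# e.g. labels, n = extract_object_regions(np.abs(next_frame - prev_frame))
```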
The amplitude and frequency extraction unit 123 may be configured to extract the amplitude and frequency of the animation's sound.
The camera coordinate tracking unit 124 may be configured to track the camera movement based on the scene camera coordinates.
The object movement confirmation unit 125 is configured to identify (select) the frames in which the movement or position change of the object tracked by the background/object tracking unit 122 is smallest.
In addition, the object movement confirmation unit 125 may calculate comparison values by comparing the distances to the objects detected in the next frame, based on the position and size information of the object detected in the previous frame. The closest object within a certain distance can then be set as the next position of the current object, and the coordinate values of the objects can be output for each frame.
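A small sketch of this frame-to-frame association, assuming each detected object is given as an (x, y, w, h) box; the function name and the max_dist threshold are illustrative, not from the patent.

```python
import math

def track_objects(prev_objs, next_objs, max_dist=50.0):
    """For each (x, y, w, h) box from the previous frame, compare the
    distances to the boxes detected in the next frame and take the
    closest one within max_dist as that object's next position."""
    matches = {}
    for i, (px, py, pw, ph) in enumerate(prev_objs):
        best, best_d = None, max_dist
        for j, (nx, ny, nw, nh) in enumerate(next_objs):
            d = math.hypot(nx - px, ny - py)      # the comparison value
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            matches[i] = next_objs[best]          # per-frame coordinate output
    return matches
```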
The dialogue timing point tracking unit 126 may be configured to track the frames corresponding to the dialogue timing points, from the start point where the object begins its dialogue to the end point where it finishes.
The valid cut determination unit 128 may be configured to judge the frames identified by the object movement confirmation unit 125 and the frames tracked by the dialogue timing point tracking unit 126 to be valid cuts.
The camera technique determination unit 127 may be configured to determine the frames in which a camera technique (e.g., zoom in/out, tilt, pan) is reflected, based on the camera coordinates tracked by the camera coordinate tracking unit 124.
For example, feature points are extracted and matched between two frames (image feature matching), using a saliency-applied difference image with the objects removed from the original image for the previous frame and the original image for the next frame. The camera's up/down/left/right movement is calculated from the x- and y-axis displacements of the matched points; then, using some of the matched feature points from the previous and next frames, the distances between the points are computed, and zoom-in versus zoom-out is determined from the ratio by which those distances change.
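A rough sketch of how such pan/zoom estimation could look with off-the-shelf ORB feature matching in OpenCV; unlike the patent's description, it matches the raw frames directly rather than a saliency-applied difference image, and the helper name is hypothetical.

```python
import cv2
import numpy as np

def camera_motion(prev_img, next_img):
    """Estimate pan from the median displacement of matched ORB feature
    points, and zoom from how the pairwise distances between matched
    points change between the two frames (ratio > 1 suggests zoom-in)."""
    orb = cv2.ORB_create(500)
    k1, d1 = orb.detectAndCompute(prev_img, None)
    k2, d2 = orb.detectAndCompute(next_img, None)
    if d1 is None or d2 is None:
        return np.zeros(2), 1.0
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    if len(matches) < 2:
        return np.zeros(2), 1.0
    p1 = np.float32([k1[m.queryIdx].pt for m in matches])
    p2 = np.float32([k2[m.trainIdx].pt for m in matches])
    pan = np.median(p2 - p1, axis=0)              # up/down/left/right movement
    idx = np.random.default_rng(0).choice(len(p1), min(30, len(p1)), replace=False)
    dist1 = np.linalg.norm(p1[idx][:, None] - p1[idx][None], axis=-1)
    dist2 = np.linalg.norm(p2[idx][:, None] - p2[idx][None], axis=-1)
    keep = dist1 > 1e-6                           # ignore coincident points
    zoom = float(np.mean(dist2[keep] / dist1[keep])) if keep.any() else 1.0
    return pan, zoom
```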
The webtoon cut extraction unit 129 may be configured to select, as webtoon cuts, the frames judged to be valid cuts by the valid cut determination unit 128 and the frames determined by the camera technique determination unit 127.
Next, the onomatopoeia and tone analysis unit 150 may be configured to separate the object's onomatopoeia from the sound and then analyze the object's tone.
When the object's onomatopoeia is contained in the sound, the onomatopoeia and tone analysis unit 150 detects the frames in which a voice appears using a pre-trained voice activity detection algorithm; when the object's voice is recognized in the sound but there is no dialogue at that point, the sound is classified as onomatopoeia.
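The patent only states that a pre-trained voice activity detection algorithm is used, so the sketch below substitutes a simple RMS-energy detector as a stand-in; the frame size and decibel threshold are assumptions.

```python
import numpy as np

def voiced_flags(samples, sr, frame_ms=30, threshold_db=-35.0):
    """Energy-based stand-in for a trained voice activity detector:
    mark a frame as voiced when its RMS level exceeds threshold_db."""
    n = max(1, int(sr * frame_ms / 1000))
    flags = []
    for start in range(0, len(samples) - n + 1, n):
        rms = np.sqrt(np.mean(samples[start:start + n] ** 2)) + 1e-12
        flags.append(20 * np.log10(rms) > threshold_db)
    return flags

def onomatopoeia_flags(voiced, script_has_dialogue):
    """A voiced frame with no dialogue in the script at that point
    is classified as onomatopoeia."""
    return [v and not d for v, d in zip(voiced, script_has_dialogue)]
```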
The onomatopoeia and tone analysis unit 150 also analyzes the decibel level of the onomatopoeia. For example, a class can be assigned by detecting where the voice level at the time of cut extraction is a certain number of decibels louder than the average level of the animation.
In addition, at cut extraction time, the section in which the dialogue occurs is received as a time or frame range; if the proportion of this section that is louder than a given threshold exceeds 20%, the dialogue may be classified as being a certain number of decibels louder than the average.
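A sketch of the 20% rule as described; the margin (how many decibels above the animation's average counts as "loud") is an assumed parameter, and the average is approximated by the whole signal's RMS level.

```python
import numpy as np

def louder_than_average(samples, sr, start_s, end_s, margin_db=6.0):
    """Assign the 'loud' class when more than 20% of the dialogue section
    is at least margin_db decibels above the animation's average level."""
    def level_db(x):
        return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)
    average = level_db(samples)                    # whole-animation level
    section = samples[int(start_s * sr):int(end_s * sr)]
    n = max(1, sr // 100)                          # 10 ms windows
    windows = [section[i:i + n] for i in range(0, len(section) - n + 1, n)]
    if not windows:
        return False
    loud = sum(level_db(w) > average + margin_db for w in windows)
    return loud / len(windows) > 0.20
```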
Next, the speech balloon and special effect application unit 130 may be configured to insert or change concentration lines, speed lines, speech balloons, sound effects, and the layout color so that the object's onomatopoeia and tone are reflected in the webtoon cut image.
The speech balloon and special effect application unit 130 automatically generates speech balloons according to the object's onomatopoeia and tone.
In addition, after tracking the positions of the main object (speaker) and the other object (listener) using a saliency map, if there is no change in the object's position or appearance across two or more cuts, the object is cropped using an object crop algorithm; a speech balloon and the onomatopoeia converted into letters are then placed in the area adjacent to the speaker, and the layout color is changed when the background in the video becomes dark.
The speech balloon and special effect application unit 130 can also insert effect lines into the webtoon cut along the movement path, in proportion to the magnitude of the object's movement or position change.
For reference, the speech balloon and special effect application unit 130 can place speech balloons according to the position of the object. For example, if the object is located on the left side of the webtoon cut, the speech balloon is placed on the right.
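This placement rule reduces to a one-line check; a sketch with a hypothetical helper, coordinates in pixels.

```python
def balloon_side(obj_x, obj_w, cut_w):
    """Place the balloon opposite the speaker: an object in the left half
    of the cut gets its balloon on the right, and vice versa."""
    return "right" if obj_x + obj_w / 2 < cut_w / 2 else "left"

# e.g. balloon_side(obj_x=40, obj_w=80, cut_w=800) -> "right"
```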
In addition, the speech balloon and special effect application unit 130 can recognize as the speaker the region with the highest value of (sum of the difference-image values over the frames in which dialogue occurs) / (area of the saliency map), as calculated by the background and object extraction unit.
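A sketch of this scoring, assuming the dialogue-frame difference images and each candidate's saliency region are available as NumPy arrays; the candidate-selection line is illustrative.

```python
import numpy as np

def speaker_score(diff_frames, saliency_mask):
    """(Sum of difference-image values over the dialogue frames) divided
    by (area of the saliency region); the candidate region with the
    highest score is taken to be the speaker."""
    area = float(np.count_nonzero(saliency_mask))
    if area == 0.0:
        return 0.0
    total = sum(float(d[saliency_mask].sum()) for d in diff_frames)
    return total / area

# speaker = max(candidate_masks, key=lambda m: speaker_score(diff_frames, m))
```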
The speech balloon and special effect application unit 130 can also insert special lines into the webtoon cut based on the per-frame coordinate values of the objects, using the comparison values computed by the object movement confirmation unit 125 (the distances to the objects detected in the next frame, based on the position and size of the object detected in the previous frame) and the rule that sets the closest object within a certain distance as the current object's next position.
For reference, the object crop algorithm may be a program that recognizes cuts with similar object positions and compositions as similar cuts, clusters them, and repeats the layout pattern of whole cut and speaker crop, in order, within each group.
In addition, the object crop algorithm recognizes and extracts, as similar cuts, the cuts whose object and character composition changes little compared with the previous webtoon cut, and, when the whole-cut and speaker-crop layout patterns are applied to each group, judges their validity, that is, whether they can actually be used in the webtoon. A greedy sketch of such clustering follows.
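This sketch assumes each cut's composition is summarized as a fixed-length vector of normalized object-center coordinates; the tolerance is an assumed parameter, not from the patent.

```python
import numpy as np

def cluster_similar_cuts(layouts, tol=0.15):
    """Greedy grouping: a cut joins the current group when its layout
    vector barely differs from the previous cut's; otherwise it starts
    a new group (within which whole-cut/speaker-crop layouts repeat)."""
    if not layouts:
        return []
    groups, current = [], [layouts[0]]
    for vec in layouts[1:]:
        if np.linalg.norm(np.asarray(vec) - np.asarray(current[-1])) < tol:
            current.append(vec)
        else:
            groups.append(current)
            current = [vec]
    groups.append(current)
    return groups
```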
Hereinafter, referring to Figures 10 and 11, the analysis sheets recording the clustering validity analysis and the speaker classification accuracy analysis are described.
First, the test animation was episode 1 of "Suspicious Neighbors in Folk Village"; the test achieved a validity of 79%.
For speaker classification accuracy, one cut was extracted for each line of dialogue, based on the script of episode 1 of "Suspicious Neighbors in Folk Village".
Classifying the person with relatively large movement in each scene as the speaker yielded a speaker classification accuracy of 54%.
Therefore, using the system for automatically converting animation to webtoon with one touch according to an embodiment of the present invention minimizes the labor-intensive, repetitive work that arises in the process of converting animation into webtoon, which saves cost and time. In addition, because anyone can use it easily, accessibility for industry workers is maximized, and because the result is output as separate layers, it is easy to correct and supplement.
Figure 12 is a flowchart explaining a method of automatically converting animation to webtoon with one touch according to an embodiment of the present invention, Figure 13 is a detailed flowchart of process S720 shown in Figure 12, and Figure 14 is a detailed flowchart of process S740 shown in Figure 12.
First, referring to Figure 12, the method S700 for automatically converting animation into webtoon according to an embodiment of the present invention includes receiving an animation at the input unit (S710), then determining, at the image cut extraction unit, the movement of an object, the start and end points of the object's dialogue, and the camera movement based on the frame, sound, and scene camera coordinate information in the animation, and extracting at least one of a plurality of valid cuts matching the judgment result as a webtoon cut (S720).
Next, the onomatopoeia and tone analysis unit separates the object's onomatopoeia from the sound and then analyzes the object's tone (S730).
Here, when the object's onomatopoeia is contained in the sound, process S730 may include detecting the frames in which a voice appears using a pre-trained voice activity detection algorithm, and classifying the sound as onomatopoeia when the object's voice is recognized but the script contains no dialogue at that point.
Then, the method includes inserting, at the speech balloon and special effect application unit, concentration lines, speed lines, speech balloons, sound effects, and the layout color so that the object's onomatopoeia and tone are reflected in the webtoon cut (S740).
Process S720 includes a series of steps: classifying the sources in the animation into frame images, sound, and scene camera coordinates; tracking object movement or position changes through the differences between consecutive frame images; extracting the amplitude and frequency of the sound; tracking the camera coordinates based on the scene camera coordinates; identifying the frames with the least movement or position change of the tracked object; tracking the frames corresponding to the dialogue timing points, from the start to the end of each dialogue; judging those frames to be valid cuts; and extracting, among the valid cuts, the valid cuts that reflect a camera technique as webtoon cuts.
상기 S740 과정은 상기 객체 의성어와 객체의 어조에 따른 말풍선을 자동 생성하는 단계를 포함할 수 있다.The process S740 may include automatically generating a speech bubble according to the object onomatopoeia and the tone of the object.
또한, 도 13을 참조, 샐리언시 맵을 이용하여 주요 객체(화자)와 객체(청자)의 위치를 추적한 후, 두 컷 이상에서 객체의 위치에 변화가 없으면, 객체를 크롭한 후, 상기 화자의 인접영역에 말풍선 및 글자로 변환된 의성어를 배치하고, 카메라 좌표 정보를 이용하여 레이아웃 컬러를 변경하는 단계를 포함할 수 있다.Also, referring to FIG. 13, after tracking the positions of the main object (speaker) and object (listener) using the saliency map, if there is no change in the position of the object in two or more cuts, the object is cropped, and then It may include placing speech bubbles and onomatopoeia converted into letters in an area adjacent to the speaker, and changing the layout color using camera coordinate information.
또한, 객체 또는 물체의 움직임 또는 위치변화의 크기에 비례하 효과선을 움직임 동선을 따라서 웹툰 컷에 삽입하는 단계를 더 포함할 수 있다.In addition, the step of inserting an effect line into the webtoon cut along the movement line in proportion to the size of the object or object's movement or change in position may be further included.
따라서, 본 발명의 일 실시예에 따른 원터치로 애니메이션을 웹툰으로 자동 변환하는 시스템 및 방법을 이용하면, 애니메이션을 웹툰화하는 과정에서 발생되는 노동집약적인 반복 업무를 최소화시켜, 비용과 시간을 절약할 수 있다는 이점을 제공한다.Therefore, by using the system and method for automatically converting animation to webtoon with one touch according to an embodiment of the present invention, labor-intensive repetitive work that occurs in the process of converting animation to webtoon can be minimized, saving cost and time. It provides the advantage of being able to
In addition, because anyone can use it easily, accessibility for industry practitioners is maximized, and because the results are output as separated layers, easy correction and security are possible.
FIG. 15 illustrates an exemplary computing environment in which one or more embodiments disclosed herein may be implemented, showing an example of a system 1000 that includes a computing device 1100 configured to implement one or more of the embodiments described above. For example, the computing device 1100 includes, but is not limited to, a personal computer, a server computer, a handheld or laptop device, a mobile device (mobile phone, PDA, media player, etc.), a multiprocessor system, consumer electronics, a minicomputer, a mainframe computer, and a distributed computing environment including any of the above systems or devices.
The computing device 1100 may include at least one processing unit 1110 and a memory 1120. Here, the processing unit 1110 may include, for example, a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), etc., and may have multiple cores. The memory 1120 may be volatile memory (e.g., RAM), non-volatile memory (e.g., ROM, flash memory), or a combination thereof. The computing device 1100 may also include additional storage 1130. The storage 1130 includes, but is not limited to, magnetic storage and optical storage. The storage 1130 may store computer-readable instructions for implementing one or more embodiments disclosed herein, and may also store other computer-readable instructions for implementing an operating system, application programs, and the like. The computer-readable instructions stored in the storage 1130 may be loaded into the memory 1120 for execution by the processing unit 1110. The computing device 1100 may also include input device(s) 1140 and output device(s) 1150.
Here, the input device(s) 1140 may include, for example, a keyboard, mouse, pen, voice input device, touch input device, infrared camera, video input device, or any other input device. The output device(s) 1150 may include, for example, one or more displays, speakers, printers, or any other output device. The computing device 1100 may also use an input device or output device provided in another computing device as its input device(s) 1140 or output device(s) 1150. In addition, the computing device 1100 may include communication connection(s) 1160 that allow the computing device 1100 to communicate with another device (e.g., a computing device 1300).
Here, the communication connection(s) 1160 may include a modem, a network interface card (NIC), an integrated network interface, a radio-frequency transmitter/receiver, an infrared port, a USB connection, or another interface for connecting the computing device 1100 to another computing device. The communication connection(s) 1160 may include a wired connection or a wireless connection. The components of the computing device 1100 described above may be connected by various interconnects such as buses (e.g., Peripheral Component Interconnect (PCI), USB, FireWire (IEEE 1394), an optical bus structure, etc.) and may be interconnected by a network 1200. As used herein, terms such as "component" and "system" generally refer to computer-related entities that are hardware, a combination of hardware and software, software, or software in execution.
For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. For example, both an application running on a controller and the controller itself may be components. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer or distributed between two or more computers.
The present invention is not limited to the embodiments described above and the accompanying drawings. It will be apparent to those of ordinary skill in the art to which the present invention pertains that the components of the present invention may be substituted, modified, and changed without departing from the technical spirit of the present invention.

Claims (15)

  1. A system for automatically converting an animation into a webtoon with one touch, the system comprising:
    an input unit that receives an animation from a user terminal;
    an image cut extraction unit that determines an object's movement, the start and end points of the object's dialogue, and the camera movement based on the frame, sound, and scene camera coordinate information in the animation, and then extracts at least one of a plurality of valid cuts matching the determination result as a webtoon cut;
    an onomatopoeia and tone analysis unit that separates the object's onomatopoeia from the sound and then analyzes the object's tone; and
    a speech-balloon and special-effect application unit that inserts or changes concentration lines, speed lines, speech balloons, sound effects, and layout colors.
  2. The system of claim 1, wherein the image cut extraction unit classifies the sources in the animation into frame images, sound, and scene camera coordinates.
  3. The system of claim 2, further comprising:
    a background/object tracking unit that tracks an object's movement or position changes through the differences between consecutive frame images;
    an amplitude and frequency extraction unit that extracts the amplitude and frequency of the sound; and
    a camera coordinate tracking unit that tracks camera coordinates based on the scene camera coordinates.
  4. The system of claim 3, further comprising:
    an object movement confirmation unit that identifies the frames in which the movement or position change of the object tracked by the background/object tracking unit is smallest;
    a dialogue timing point tracking unit that tracks the frames corresponding to the dialogue timing interval, from the point where the object starts its dialogue to the point where the dialogue ends; and
    a valid cut determination unit that determines the frames identified by the object movement confirmation unit and the frames tracked by the dialogue timing point tracking unit as valid cuts.
  5. The system of claim 4, further comprising a webtoon cut extraction unit that extracts, from among the valid cuts, the valid cuts reflecting camera techniques as webtoon cuts.
  6. The system of claim 1, wherein the speech-balloon and special-effect application unit automatically generates a speech balloon according to the object's onomatopoeia and tone.
  7. The system of claim 6, wherein the onomatopoeia and tone analysis unit, when an object onomatopoeia is contained in the sound, detects the frames in which a voice occurs using a pre-trained voice activity detection algorithm, and, among the cases where the object's voice is recognized in the sound, classifies it as an onomatopoeia when the script has no dialogue at that point in time.
  8. The system of claim 6, wherein the speech-balloon and special-effect application unit tracks the positions of the main object (the speaker) and the object (the listener) using a saliency map and, if the object's position does not change across two or more cuts, crops the object, places a speech balloon and the onomatopoeia converted into text in the area adjacent to the speaker, and changes the layout color using the camera coordinate information.
  9. The system of claim 8, wherein the speech-balloon and special-effect application unit inserts effect lines into the webtoon cut along the path of movement, in proportion to the magnitude of the movement or position change of an object or physical thing.
  10. A method of automatically converting an animation into a webtoon with one touch, the method comprising:
    receiving an animation at an input unit;
    determining, at an image cut extraction unit, an object's movement, the start and end points of the object's dialogue, and the camera movement based on the frame, sound, and scene camera coordinate information in the animation, and then extracting at least one of a plurality of valid cuts matching the determination result as a webtoon cut;
    separating, at an onomatopoeia and tone analysis unit, the object's onomatopoeia from the sound and then analyzing the object's tone; and
    inserting, at a speech-balloon and special-effect application unit, concentration lines, speed lines, speech balloons, sound effects, and layout colors so that the object's onomatopoeia and tone are reflected in the webtoon cut.
  11. The method of claim 10, wherein the extracting as a webtoon cut comprises:
    classifying the sources in the animation into frame images, sound, and scene camera coordinates;
    tracking object movement or position changes through the differences between consecutive frame images;
    extracting the amplitude and frequency of the sound;
    tracking camera coordinates based on the scene camera coordinates;
    identifying the frames in which the tracked object's movement or position change is smallest;
    tracking the frames corresponding to the dialogue timing interval, from the point where the dialogue starts to the point where it ends;
    determining the frames in which a position change was confirmed and the frames corresponding to the dialogue timing interval as valid cuts; and
    extracting, from among the valid cuts, the valid cuts reflecting camera techniques as webtoon cuts.
  12. The method of claim 11, wherein the inserting of the concentration lines, speed lines, speech balloons, sound effects, and layout colors comprises automatically generating a speech balloon according to the object's onomatopoeia and tone.
  13. The method of claim 12, wherein the analyzing of the object's tone comprises, when an object onomatopoeia is contained in the sound, detecting the frames in which a voice occurs using a pre-trained voice activity detection algorithm, and, among the cases where the object's voice is recognized in the sound, classifying it as an onomatopoeia when the script has no dialogue at that point in time.
  14. The method of claim 13, wherein the inserting of the concentration lines, speed lines, speech balloons, sound effects, and layout colors comprises tracking the positions of the main object (the speaker) and the object (the listener) using a saliency map and, if the object's position does not change across two or more cuts, cropping the object, placing a speech balloon and the onomatopoeia converted into text in the area adjacent to the speaker, and changing the layout color using the camera coordinate information.
  15. The method of claim 14, wherein the inserting of the concentration lines, speed lines, speech balloons, sound effects, and layout colors further comprises inserting effect lines into the webtoon cut along the path of movement, in proportion to the magnitude of the movement or position change of an object or physical thing.
PCT/KR2022/007300 2022-03-31 2022-05-23 System and method for automatically converting animation into webcomics by one touch WO2023191182A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020220040557A KR20230141237A (en) 2022-03-31 2022-03-31 System and method for automatically converting animation into webtoon with one touch
KR10-2022-0040557 2022-03-31

Publications (1)

Publication Number Publication Date
WO2023191182A1 true WO2023191182A1 (en) 2023-10-05

Family

ID=88202989

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/007300 WO2023191182A1 (en) 2022-03-31 2022-05-23 System and method for automatically converting animation into webcomics by one touch

Country Status (2)

Country Link
KR (1) KR20230141237A (en)
WO (1) WO2023191182A1 (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102086780B1 (en) 2018-08-22 2020-03-09 네이버웹툰 주식회사 Method, apparatus and computer program for generating cartoon data

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20060030179A (en) * 2004-10-05 2006-04-10 (주)인터넷엠비씨 Electronic cartoon and manufacturing methode thereof
KR20160014072A (en) * 2016-01-12 2016-03-09 정승묵 Movie and Drama is a make Skill Wed toon Technology
KR20190054721A (en) * 2017-11-14 2019-05-22 한성호 Apparatus and method for generating of cartoon using video

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Master Thesis", 1 February 2015, THE GRADUATE SCHOOL OF CULTURE AND ART CONTENT SEJONG UNIVERSITY, Seoul, Korea, article GO DONG-GYUN: "A Study of Editing Conversion from Published Comics to Webtoon: focusing on structural conversion of panel. ", pages: 1 - 83, XP009549344 *
XIN YANG; ZONGLIANG MA; LETIAN YU; YING CAO; BAOCAI YIN; XIAOPENG WEI; QIANG ZHANG; RYNSON W.H. LAU: "Automatic Comic Generation with Stylistic Multi-page Layouts and Emotion-driven Text Balloon Generation", ARXIV.ORG, 26 January 2021 (2021-01-26), XP081867718 *

Also Published As

Publication number Publication date
KR20230141237A (en) 2023-10-10


Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22935857

Country of ref document: EP

Kind code of ref document: A1