WO2023106522A1 - System for adding subtitles to sign language video - Google Patents


Info

Publication number
WO2023106522A1
Authority
WO
WIPO (PCT)
Prior art keywords
sign language
keyword
video
extracted
folder
Application number
PCT/KR2022/008059
Other languages
French (fr)
Korean (ko)
Inventor
조계연
Original Assignee
주식회사 위아프렌즈
Application filed by 주식회사 위아프렌즈 filed Critical 주식회사 위아프렌즈
Publication of WO2023106522A1 publication Critical patent/WO2023106522A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/231 Content storage operation, e.g. caching movies for short term storage, replicating data over plural servers, prioritizing data for deletion
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/278 Subtitling

Definitions

  • the present invention relates to a system for adding subtitles to sign language videos, and more particularly to a system in which a sign language interpreter translates a transmitted sign language video in gloss units capable of conveying meaning in deaf culture, the unique culture of the hearing impaired, and the resulting subtitle data is provided as subtitles so that an appropriate translation is obtained; at the same time, keyword folders are created using keywords extracted from the translated subtitle data by a text mining technique as folder names, and the videos segmented on the basis of the subtitle data are stored so that each segmented video goes into the keyword folder containing its extracted keyword.
  • as a result, whether a sign language interpreter is asked directly to translate a sign language video or subtitles are added to a sign language video by a video mining technique, keywords are extracted using the frequently used segmented videos in the keyword folders built from the interpreters' translations, so that sign language subtitles suited to deaf culture can be provided.
  • sign language generally refers to a visual language expressed using the hands, facial expressions, and gestures. Since it is a visual language, it cannot be used to communicate with a person who does not know sign language or in a place where the signing cannot be seen. Accordingly, as shown in (Patent Document 1) to (Patent Document 3) below, devices that enable communication in sign language in various ways have been developed.
  • Patent Document 1 Korean Patent Registration No. 10-1915088
  • the present invention relates to a sign language translation apparatus comprising: a main body having displays on both sides; a camera unit formed on the main body that photographs the signer's sign language motions; and a translation unit including an image processing unit that receives the images obtained by the camera unit and extracts the sign language motions, a database unit in which sign language motions and words are stored in matched pairs, an analyzer that extracts sentences by analyzing the words matched to the extracted sign language motions against the database unit, and a control unit that displays the sentences extracted by the analyzer on the displays.
  • Patent Document 2 Korean Patent Registration No. 10-2314710
  • It relates to a sign language interpretation service system for the hearing impaired, comprising: a first wearable device unit that can be worn on the user's head, recognizes hand-joint motion by photographing the user's signing to generate hand-joint motion data, recognizes the other person's voice and displays it as text, and receives sign language interpretation data from the outside and outputs it as voice;
  • a second wearable device unit that can be worn on the user's hand and generates hand-movement tracking data by tracking the movement of the hand;
  • a portable communication device unit that can be worn on the user's body, corrects the hand-joint motion data based on the hand-movement tracking data, transmits the corrected sign language motion data to the outside, receives the sign language interpretation data from the outside in response, and forwards it to the first wearable device unit;
  • and a cloud server unit that receives the sign language motion data from the portable communication device unit, generates the sign language interpretation data through a machine learning algorithm based on the sign language motion data, and transmits the generated sign language interpretation data to the portable communication device unit.
  • such a sign language interpretation service system is disclosed.
  • Patent Document 3 Korean Patent Registration No. 10-2300589
  • it relates to an artificial intelligence (AI)-based sign language interpretation system comprising: a dictionary storage unit storing a base-form dictionary that defines words derived from the same word as one word, a synonym dictionary that defines words with the same or similar meaning as one word, a stopword dictionary defining stopwords not used in sign language translation among the morphologically analyzed sentences, and a homonym dictionary in which different identification information is set for each homonym; a morpheme analyzer for distinguishing morphemes in an input sentence; a sign language sentence generator that generates the sign language sentence to be translated by comparing each of the distinguished morphemes against the dictionaries; a motion data extraction unit that extracts from storage each piece of motion data indicated by the sign language word code matched to each morpheme of the generated sign language sentence; and an avatar motion display unit that displays and controls the motion of a sign-language-delivery avatar on the display unit according to the extracted motion data.
  • in addition, the existing sign language interpretation systems not only carry a risk of mistranslation, since they translate from the movement in the images, but are also limited in how accurately they can convey meaning.
  • moreover, unlike standard Korean, sign language can differ by region and group, and the same signs can carry different meanings; because of these differences, with existing sign language interpretation systems that perform mechanical translation there is a risk that the requester's signed content and the translated content turn out completely different.
  • the present invention takes these points into consideration: a sign language interpreter watches the sign language video and translates it into gloss units in which communication is possible in deaf culture, keyword-name folders are created using keywords extracted from that subtitle data by a text mining technique, and each keyword-name folder stores the videos segmented on the basis of the subtitle data, so that when a subtitle request for a sign language video is received, the sign language video is analyzed by a video mining technique with keywords extracted by reference to the keyword-name folders.
  • the extracted keywords are provided to the client as text to be used as subtitles; when no keyword can be extracted, or the provided text is not used as subtitles, the subtitle data translated directly by a sign language interpreter is used as the subtitles and is also used to create keyword-name folders.
  • its purpose is thus to provide a system for adding subtitles to sign language videos that improves the quality of sign language translation through the direct involvement of sign language interpreters and further raises subtitle quality through searches over frequently used keywords.
  • the present invention is configured so that, when a sign language interpreter generates subtitle data, the translation is performed by dividing each input sign language video into one to ten sign segments, so that the meaning the deaf person wants to convey is divided into gloss units, that is, single words, word segments, or phrases.
  • another object is therefore to provide a system for adding subtitles to sign language videos in which meaning is conveyed in these gloss units so that it can be conveyed more appropriately.
  • a system for adding subtitles to a sign language video includes: a first step (S100) of receiving a sign language video through an electric/electronic communication network; a second step (S200) of storing the transmitted sign language video; a third step (S300) of analyzing the sign language video for keywords using a video mining technique with reference to keyword-name folders; a fourth step (S400) of checking whether a keyword was extracted in the analysis step and, if not, requesting a sign language interpreter to translate the sign language video; a fifth step (S500) in which, if a keyword is extracted in the fourth step (S400), the extracted keyword is transmitted as text to the client to confirm whether it will be used as a subtitle, and, if it is not used as a subtitle, the translation is requested from the sign language interpreter of the fourth step (S400); a sixth step (S600) in which, if the client agrees in the fifth step (S500) to use the transmitted text as a subtitle, the extracted keyword is added to the sign language video as a subtitle and stored; and a seventh step (S700) of transmitting the subtitled sign language video to the client.
  • the procedure performed when translation is requested from the sign language interpreter includes: a step 4-1 (S410) of requesting the translation from the sign language interpreter; a step 4-2 (S420) in which, while watching the sign language video, the sign language interpreter creates and stores subtitle data containing the start time and end time of each gloss unit, consisting of one to ten signs and representing a word, word segment, or phrase, together with the translated text for that gloss unit; a step 4-3 (S430) of adding the subtitle data to the sign language video as subtitles and, at the same time, separating the translated text from the subtitle data; a step 4-4 (S440) of extracting keywords from the separated translated text using a text mining technique; a step 4-5 (S450) of checking whether a keyword-name folder using the extracted keyword as its folder name has been created and, if not, creating one; a step 4-6 (S460) of dividing the sign language video at the start and end times stored in the subtitle data into segmented videos in gloss units; and a step 4-7 (S470) of storing each segmented video in the keyword-name folder named after the keyword extracted from its gloss unit, so that when keywords are extracted by the video mining technique in the third step (S300), the keyword-name folders are consulted and keywords are found through the frequently used segmented videos.
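The claimed flow, from receiving the video (S100) to returning it with subtitles (S700), can be summarized in a short sketch. This is only an illustration and not part of the patent text: the helper callables (mine_keyword, interpreter_translate, and so on) and the Python data types are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class GlossSegment:
    start: float      # strStartTime, seconds on the video timeline
    end: float        # strEndTime
    text: str         # strText, the interpreter's translation of this gloss unit

def subtitle_pipeline(
    video_path: str,
    mine_keyword: Callable[[str], Optional[str]],               # S300-S400: video mining lookup
    client_accepts: Callable[[str], bool],                       # S500: client confirms the text
    interpreter_translate: Callable[[str], list[GlossSegment]],  # S410-S420: human translation
    attach_subtitles: Callable[[str, str], str],                 # S430/S600: returns subtitled file
    update_folders: Callable[[str, list[GlossSegment]], None],   # S440-S470: feed keyword folders
) -> str:
    """Return the path of the subtitled video, following the claimed S100-S700 flow."""
    keyword = mine_keyword(video_path)
    if keyword is not None and client_accepts(keyword):
        return attach_subtitles(video_path, keyword)             # S600-S700
    # No keyword, or the client declined: fall back to a sign language interpreter.
    segments = interpreter_translate(video_path)
    subtitled = attach_subtitles(video_path, " ".join(s.text for s in segments))
    update_folders(video_path, segments)                         # used by future video mining
    return subtitled
```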
  • the sign language video is characterized in that its length is from 1 second to 20 minutes.
  • the sign language consists of at least one of hand shape (手形, dez), hand position (手位, tab), hand movement (手動, sig), palm orientation (手向, orientation), and non-manual signals (非手指信號), which are facial expressions and body movements.
  • in step 4-7, when the extracted keyword corresponding to a segmented video consists of at least two words, the segmented video is stored in each keyword-name folder that contains each of those words.
  • the system for adding subtitles to the sign language video is characterized in that updates are performed at predetermined time intervals.
  • in step 4-1, when requesting translation from a sign language interpreter, the client may select from the registered sign language interpreters, request a sign language interpreter registered in the region where the client lives, designate the sign language interpreter to whom the client previously submitted a sign language video when several interpreters are available, select the sign language interpreter in charge of the client's region, or have a sign language interpreter designated at random to perform the translation.
  • the system for adding subtitles to sign language videos according to the present invention has the following effects.
  • when subtitles are provided for a sign language video, they can be generated by a video mining technique that uses segmented videos stored in folders created from keywords extracted, by a text mining technique, from content that a sign language interpreter translated directly in gloss units, the units in which meaning is conveyed in deaf culture.
  • because the translation is made in gloss units in which meaning is conveyed in deaf culture, meaning is conveyed in deaf culture more appropriately.
  • in particular, when no keyword is extracted by the video mining technique, or when an extracted keyword is provided to the client but not used as a subtitle, subtitle data can be obtained by requesting a translation directly from a sign language interpreter; this subtitle data is used immediately as the subtitles of the sign language video, and at the same time folders are created from keywords obtained by applying a text mining technique to the subtitle data, the segmented videos are stored there, and they are used when keywords are extracted by the video mining technique, so that the subtitle data translated in gloss units by sign language interpreters is put to use and the meaning of deaf culture is conveyed more appropriately.
  • the present invention is configured to be updated at predetermined time intervals, such as three times a day, so that the accuracy of translation is increased and the subtitles match the meaning to be conveyed in sign language as closely as possible.
  • the sign language includes hand shape (手形, dez), hand position (手位, tab), hand movement (手動, sig), palm orientation (手向, orientation), and non-manual signals (非手指信號), which are facial expressions and body movements.
  • as a result, expressions of intention made through facial expressions together with hand or body movements can be converted into subtitle data, so meaning can be conveyed more appropriately through the subtitles.
  • FIG. 1 is a flowchart showing the operation of a system for adding subtitles to a sign language video according to the present invention.
  • FIG. 2 is an image showing subtitle data translated and stored by a sign language interpreter according to the present invention as an example.
  • FIG. 3 is an image showing an example of extracting translated text from subtitle data according to the present invention.
  • FIG. 4 is an image showing part of the result of keyword extraction using a text mining technique for the translated text extracted according to the present invention.
  • for the transmitted sign language video, keywords are extracted by a video mining technique that refers to the keyword-name folders in which segmented videos are stored.
  • the extracted keywords are provided to the client to confirm whether they will be used as subtitles and are then used as the subtitles of the sign language video.
  • because the keyword-name folders are created on the basis of subtitle data produced with the participation of sign language interpreters, a translation suited to deaf culture is possible, which increases the accuracy of the subtitles.
  • the subtitle data is translated by dividing the video into gloss units capable of conveying meaning in deaf culture, the unique culture of the hearing impaired, and the transmitted sign language video is divided on the basis of this data to obtain segmented videos.
  • when keywords to be used as subtitles are extracted from a sign language video by the video mining technique using these segmented videos, the extraction is performed in units of words, word segments, or phrases that convey meaning, so that the intended meaning is conveyed more appropriately.
  • the first step (S100) is a step of receiving a sign language video as shown in [Fig. 1].
  • the client makes a request through an electric/electronic communication network such as the Internet using a mobile terminal such as a smart phone or PDA or a terminal such as a personal computer.
  • the second step (S200) is a step of storing the transmitted sign language video as shown in [Fig. 1].
  • the sign language video is kept as the original; it is used when the subtitles obtained through the process described later are added and sent to the client, and also when segmented videos are produced by dividing the video into gloss units, the word, word-segment, or phrase units in which meaning is conveyed in deaf culture. This will be described later.
  • the third step (S300) is a step of extracting keywords from the above-described sign language video using a video mining technique, as shown in [Fig. 1].
  • keywords are extracted by referring to divided videos stored in a keyword name folder, which will be described later.
  • the keyword name folder and the divided videos will be described later along with the fourth step (S400).
  • the fourth step (S400), as shown in [Fig. 1], is a step of checking whether a keyword can be extracted by comparing the sign language video, through a video mining technique, with the segmented videos stored in the keyword-name folders.
  • video mining here means the usual technique used, for example, in retail, where the behavior of visitors captured by in-store video cameras is analyzed and data on what products the recorded customers buy is used to estimate expected sales and visitor numbers and to grasp trends in the industry.
  • in the present invention in particular, the sign language video is analyzed and the analyzed motion is compared with the segmented videos described later in order to extract keywords.
  • in the fourth step (S400), keywords are extracted from the sign language video using the video mining technique by referring to the keyword-name folders in which the segmented videos are stored.
  • if no keyword is extracted, translation is requested from a sign language interpreter (step 4-1 (S410)); when a keyword is extracted, the fifth step (S500) of providing the keyword to the client is performed.
  • the process of building the segmented videos used for keyword extraction in video mining from subtitle data obtained by requesting translation from a sign language interpreter (S410 to S470) will be described later; the handling of keywords extracted through video mining (S500) is explained first.
  • the fifth step (S500) is a step in which the extracted keywords are transmitted through an electric/electronic communication network to the client who requested the sign language video subtitles, so that the client reviews them and chooses whether to use them as subtitles or to request a translation from a sign language interpreter.
  • if the client uses them as subtitles, the sixth step (S600) described later is performed; if a translation is requested from a sign language interpreter, step 4-1 (S410) described later is performed.
  • this is the same procedure as when no keyword is extracted in the fourth step (S400), so the two cases are described together later.
  • the sixth step (S600), as shown in [Fig. 1], is a step of adding the keywords that were provided to the client and approved for use as subtitles to the sign language video and storing the subtitled sign language video.
  • the seventh step (S700) is a step of transmitting the stored sign language video to the client, as shown in [Fig. 1].
  • the sign language video may be transmitted to a terminal used to receive the sign language video or to another terminal designated by the client.
  • step 4-1 is a step of requesting a sign language interpreter to translate the sign language video, as shown in [Fig. 1].
  • at this time, the requester of the sign language video may designate the sign language interpreter to whom he or she previously submitted a sign language video, select the sign language interpreter in charge of the region where the client lives, or have a sign language interpreter designated arbitrarily.
  • in step 4-2, the sign language interpreter watches the transmitted sign language video and translates it to create and store subtitle data.
  • here, as shown in [Fig. 2], subtitle data refers to data produced while the sign language interpreter watches the transmitted sign language video, dividing it at the interpreter's discretion into gloss units capable of conveying meaning in deaf culture, the unique culture of the hearing impaired, and translating each unit.
  • such subtitle data includes the times chosen by the sign language interpreter to convey meaning, that is, the start time and end time marking the beginning and end of each gloss unit on the timeline of the sign language video, together with the translated text for that gloss unit.
  • the translated text is entered directly by the sign language interpreter while watching the sign language video, so that the meaning can be conveyed appropriately for the situation.
  • in [Fig. 2], 'intRequestIdx' identifies one sign language video; rows sharing the same 'intRequestIdx' show that one video has been divided into several gloss units and translated.
  • for example, the sign language video whose 'intRequestIdx' is stored as '1,503' has been divided by the sign language interpreter into eight gloss units to convey meaning properly in deaf culture, and each gloss unit is identified by the pair of 'intRequestIdx' and 'intOrder'.
  • each gloss unit is stored with 'strStartTime' (start time) and 'strEndTime' (end time) based on the timeline of the sign language video, and the content translated for each gloss unit is stored as 'strText'.
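A record with the fields shown in [Fig. 2] could be modeled as below. The field names come from the figure; the Python types and the example rows are assumptions made purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class SubtitleRow:
    intRequestIdx: int   # identifies one sign language video (e.g. 1503)
    intOrder: int        # position of this gloss unit within that video
    strStartTime: str    # start of the gloss unit on the video timeline
    strEndTime: str      # end of the gloss unit
    strText: str         # the interpreter's translation of this gloss unit

# One video split into several gloss units shares the same intRequestIdx.
rows = [
    SubtitleRow(1503, 1, "00:00:00.000", "00:00:02.100", "hello"),
    SubtitleRow(1503, 2, "00:00:02.100", "00:00:05.400", "where is the hospital"),
]
```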
  • here, the sign language may include all movements, actions, facial expressions, and the like used in signing; most preferably it means at least one of hand shape (手形, dez), hand position (手位, tab), hand movement (手動, sig), palm orientation (手向, orientation), and non-manual signals (facial expressions and body movements). This is so that the sign language interpreter can directly observe everything the deaf person uses to convey meaning, grasp the exact meaning, and convert it into subtitles.
  • in addition, the translation is made in gloss units capable of conveying meaning in deaf culture, the unique culture of the hearing impaired.
  • here, a gloss unit is a unit in which one meaning is conveyed: a single sign motion may convey meaning like a single noun, or two or three sign motions may together form one unit of meaning.
  • it is preferable to use a sign language video whose length is about 1 second to 20 minutes. This allows the sign language interpreter to concentrate and translate accurately, and allows subtitles to be added quickly so that feedback can be given to the deaf person right away.
  • if the video is longer than this, processing simply takes a little more time, and anyone skilled in the art will readily understand that there is no problem in translating it and providing subtitles.
  • on the other hand, the subtitle data is not only provided as subtitles for the sign language video (step 4-3), but is also used to extract keywords through a text mining technique and to divide the sign language video on the basis of the subtitle data.
  • these keywords and segmented videos are used when search keywords are extracted from other sign language videos by the video mining technique and subtitles are generated automatically (steps 4-3 to 4-7).
  • step 4-3 (S430) is a step of using the above-described subtitle data as subtitles for the sign language video, as shown in [Fig. 1].
  • since the subtitle data includes the start time and end time of each gloss unit, the unit in which meaning is conveyed, together with the translated text produced directly by the sign language interpreter for that gloss unit, subtitles are generated immediately from this data so that the subtitled sign language video can be supplied right away.
  • at the same time, step 4-3 separates only the translated text from the above-described subtitle data, as shown in [Fig. 1]. This is done so that keywords can be extracted from the separated translated text by the text mining technique.
  • the extracted keywords are used as the names of the folders in which the necessary segmented videos are stored, and, as described later, the data stored in these folders can be used when extracting keywords from a sign language video with the video mining technique. This is explained step by step below.
  • step 4-4 is a step of extracting keywords from the translated text separated in gloss units, as shown in [Fig. 1].
  • the keywords are extracted from the translated text separated from the subtitle data using a text mining technique.
  • here, the text mining technique refers to the usual technique of using natural language processing technology based on linguistics, statistics, machine learning, and the like to standardize semi-structured or unstructured text data, extract its features in the form of keywords, and find meaningful information.
  • the keywords extracted in this way are not only used as the folder names (keyword names) of the folders described later, but are also used as subtitles, with appropriate keywords selected according to frequency of use when keywords are extracted from other videos by the video mining technique.
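A minimal sketch of this kind of keyword extraction is shown below, using plain word-frequency counting over the separated translated text. The patent does not specify the tokenizer, stopword list, or morphological analysis actually used, so everything in the sketch is an assumption.

```python
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "is", "to", "of"}  # assumed; the real stopword list is not given

def extract_keywords(translated_texts: list[str], top_n: int = 5) -> list[str]:
    """Count word frequencies over the separated gloss-unit texts and keep the top keywords."""
    counts: Counter[str] = Counter()
    for text in translated_texts:
        for token in re.findall(r"\w+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return [word for word, _ in counts.most_common(top_n)]

# Example with the translated texts of one video's gloss units:
print(extract_keywords(["where is the hospital", "hospital entrance", "thank you"]))
```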
  • [Fig. 3] shows the translated text separated from the subtitle data of [Fig. 2], and [Fig. 4] shows some of the keywords extracted from the text of [Fig. 3] by the text mining technique according to the present invention.
  • in [Fig. 4], 'keyword' is the keyword extracted by the text mining technique, and 'frequency' represents how often each segmented video stored in the keyword-name folder created with that keyword as its folder name has been selected when keywords are extracted by the video mining technique.
  • step 4-5, as shown in [Fig. 1], checks whether a folder (a keyword-name folder) using the keyword extracted in step 4-4 (S440) as its folder name has been created. This is so that the segmented videos described later can be stored in the keyword-name folders, and so that the necessary segmented videos can be found and used through these folders when keywords are extracted from a sign language video with the video mining technique.
  • as shown in [Fig. 1], it is checked whether such a keyword-name folder exists; if it does not, a new folder with that keyword name is created in the database, and once the keyword-name folder exists, step 4-6 (S460) is performed.
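Step 4-5 amounts to a check-then-create operation over folders named after extracted keywords. A minimal sketch, assuming the keyword-name folders live on an ordinary file system under a base directory:

```python
from pathlib import Path

def ensure_keyword_folder(base_dir: str, keyword: str) -> Path:
    """Return the keyword-name folder, creating it if it does not exist yet (step 4-5)."""
    folder = Path(base_dir) / keyword
    folder.mkdir(parents=True, exist_ok=True)
    return folder

# e.g. ensure_keyword_folder("keyword_folders", "hospital") -> Path("keyword_folders/hospital")
```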
  • step 4-6 is a step of dividing the sign language video transmitted together with the above-described subtitle data into segmented videos, as shown in [Fig. 1].
  • the segmented videos are cut using the time information recorded when the sign language video was divided into gloss units on its timeline, that is, the start time and end time of each gloss unit. Because the video is divided at the start and end times stored in the subtitle data, each segmented video corresponds to one gloss unit in which meaning is conveyed in deaf culture.
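Cutting a segmented video at the stored start and end times could look like the sketch below, which shells out to ffmpeg. The patent does not name a tool, so the use of ffmpeg, the file layout, and the time format are assumptions.

```python
import subprocess

def cut_gloss_segment(video_path: str, start: str, end: str, out_path: str) -> None:
    """Cut one gloss-unit segment [start, end) out of the sign language video (step 4-6)."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-ss", start, "-to", end, "-c", "copy", out_path],
        check=True,
    )

# Example: cut the segment stored as strStartTime/strEndTime in the subtitle data.
# cut_gloss_segment("request_1503.mp4", "00:00:02.100", "00:00:05.400", "segments/1503_2.mp4")
```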
  • step 4-7 is a step of storing the segmented videos produced in step 4-6 (S460) in folders, as shown in [Fig. 1].
  • here, the folder means the keyword-name folder whose name is the keyword extracted, through the steps described above, from the translated text of the subtitle data corresponding to that segmented video section. Accordingly, for each segmented video, the keyword conveyed by the sign language content in the video is the same as the name of the folder in which the segmented video is stored.
  • at this time, when the extracted keyword corresponding to a segmented video consists of two or more words, it is preferable to search for the keyword-name folder containing each word and store the segmented video in each folder found. This allows a search for a single word to find the gloss units that include that word.
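Storing one segmented video under every word of a multi-word keyword, as described above, could be sketched as follows; the flat file layout and the use of plain file copies are assumptions for illustration.

```python
import shutil
from pathlib import Path

def store_segment(segment_path: str, keyword: str, base_dir: str = "keyword_folders") -> list[Path]:
    """Store one segmented video under the folder of every word in its keyword (step 4-7)."""
    stored = []
    for word in keyword.split():                 # "hospital entrance" -> two folders
        folder = Path(base_dir) / word
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / Path(segment_path).name
        shutil.copy2(segment_path, target)
        stored.append(target)
    return stored
```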
  • in addition, the database is configured to be updated at predetermined times, for example three times a day or every hour, so that newly created folders and the accumulated frequency-of-use values are reflected; the higher these values become, the more accurate the translation.
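The periodic update could be as simple as a scheduled job that refreshes the folders and usage counts. The three-updates-a-day interval comes from the text; the loop-and-sleep scheduling and the rebuild_index callable are assumptions.

```python
import time
from typing import Callable

UPDATE_INTERVAL_SECONDS = 8 * 60 * 60   # three updates per day, as described in the text

def run_periodic_updates(rebuild_index: Callable[[], None]) -> None:
    """Call rebuild_index() at the predetermined interval so new folders and usage counts are reflected."""
    while True:
        rebuild_index()
        time.sleep(UPDATE_INTERVAL_SECONDS)
```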
  • through step 4-7, the database is prepared so that subtitles can be generated for a sign language video directly by the video mining technique, without going through a sign language interpreter. That is, when a translation of a sign language video is requested and keywords are extracted from it by the video mining technique (the third and fourth steps), the keywords are extracted by referring to the segmented videos and the keyword-name folders.
  • in this way, content properly translated by a sign language interpreter in gloss units, the units in which meaning is conveyed in deaf culture, can be used as subtitles, so that the intended meaning is conveyed more appropriately.
  • at this time, it is desirable that the segmented videos stored in each keyword-name folder and used for keyword extraction from sign language videos by the video mining technique are sorted by how often they have been selected, and that only a fixed number of them, for example the 3 to 10 most frequently used, are kept while the rest are deleted. Leaving only the frequently used segmented videos allows quick and appropriate translation in gloss units.
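The pruning described here keeps only the most frequently used segments in each keyword-name folder. A sketch, assuming usage counts are tracked in a dictionary keyed by file path and that keeping five segments (within the 3 to 10 range mentioned above) is acceptable:

```python
from pathlib import Path

def prune_folder(folder: Path, usage_counts: dict[str, int], keep: int = 5) -> None:
    """Keep only the `keep` most frequently used segments in a keyword folder; delete the rest."""
    segments = sorted(folder.glob("*.mp4"),
                      key=lambda p: usage_counts.get(str(p), 0),
                      reverse=True)
    for stale in segments[keep:]:
        stale.unlink()     # remove segments that are rarely selected during video mining
```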
  • as described above, the present invention obtains keywords by applying a text mining technique to the subtitle data translated by a sign language interpreter, and stores the segmented videos obtained on the basis of that subtitle data in keyword-name folders generated from those keywords for use in video mining.
  • in this way, the meaning that the signer intends to convey can be delivered properly, because content appropriately translated by a sign language interpreter in gloss units, which can convey meaning in deaf culture, the unique culture of the hearing impaired, is used.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

In the present invention, folders are created from keywords extracted by text mining from subtitle data that a sign language interpreter has translated in gloss units, the units by which meaning is conveyed as words, word segments, or phrases in deaf culture, and the videos segmented with respect to the subtitle data are stored in the corresponding folders. When a subtitle request for a sign language video is received, the sign language video is analyzed by video mining, with a keyword extracted by reference to the keyword-name folders, and the extracted keyword is provided to the client as text to be used as subtitles. When no keyword can be extracted, or the provided text is not used as subtitles, subtitle data translated directly by the sign language interpreter can be used as the subtitles and also for folder creation, and thus the quality of sign language translation can be increased through the direct involvement of the sign language interpreter.

Description

System for adding subtitles to sign language videos
The present invention relates to a system for adding subtitles to sign language videos, and more particularly to a system in which a sign language interpreter translates a transmitted sign language video in gloss units capable of conveying meaning in deaf culture, the unique culture of the hearing impaired, and the resulting subtitle data is provided as subtitles so that an appropriate translation is obtained. At the same time, keyword folders are created using keywords extracted from the translated subtitle data by a text mining technique as folder names, and the videos segmented on the basis of the subtitle data are stored so that each segmented video is placed in the keyword folder containing its extracted keyword. As a result, whether a sign language interpreter is asked directly to translate a sign language video or subtitles are added to a sign language video by a video mining technique, keywords are extracted using the frequently used segmented videos in the keyword folders built from the interpreters' translations, so that sign language subtitles suited to deaf culture can be provided.
In general, sign language refers to a visual language expressed using the hands, facial expressions, and gestures. Because it is a visual language, it cannot be used to communicate with a person who does not know sign language or in a place where the signing cannot be seen. Accordingly, as shown in (Patent Document 1) to (Patent Document 3) below, devices that enable communication in sign language in various ways have been developed.
(Patent Document 1) Korean Patent Registration No. 10-1915088
It relates to a sign language translation apparatus comprising: a main body having displays on both sides; a camera unit formed on the main body that photographs the signer's sign language motions; and a translation unit including an image processing unit that receives the images obtained by the camera unit and extracts the sign language motions, a database unit in which sign language motions and words are stored in matched pairs, an analyzer that extracts sentences by analyzing the words matched to the sign language motions extracted by the image processing unit against the database unit, and a control unit that displays the sentences extracted by the analyzer on the displays.
(Patent Document 2) Korean Patent Registration No. 10-2314710
It relates to a sign language interpretation service system for the hearing impaired comprising: a first wearable device unit that can be worn on the user's head, recognizes hand-joint motion by photographing the user's signing to generate hand-joint motion data, recognizes the other person's voice and displays it as text, and receives sign language interpretation data from the outside and outputs it as voice; a second wearable device unit that can be worn on the user's hand and generates hand-movement tracking data by tracking the movement of the hand; a portable communication device unit that can be worn on the user's body, corrects the hand-joint motion data based on the hand-movement tracking data, transmits the corrected sign language motion data to the outside, receives the sign language interpretation data from the outside in response, and forwards it to the first wearable device unit; and a cloud server unit that receives the sign language motion data from the portable communication device unit, generates the sign language interpretation data through a machine learning algorithm based on that data, and transmits the generated interpretation data to the portable communication device unit.
(Patent Document 3) Korean Patent Registration No. 10-2300589
It relates to an artificial intelligence (AI)-based sign language interpretation system comprising: a dictionary storage unit storing a base-form dictionary that defines words derived from the same word as one word, a synonym dictionary that defines words with the same or similar meaning as one word, a stopword dictionary defining stopwords not used in sign language translation among the morphologically analyzed sentences, and a homonym dictionary in which different identification information is set for each homonym; a morpheme analyzer for distinguishing morphemes in an input sentence; a sign language sentence generator that generates the sign language sentence to be translated by comparing each of the distinguished morphemes against the dictionaries; a motion data extraction unit that extracts from storage each piece of motion data indicated by the sign language word code matched to each morpheme of the generated sign language sentence; and an avatar motion display unit that displays and controls the motion of a sign-language-delivery avatar on the display unit according to the extracted motion data.
However, these existing sign language interpretation systems have the following problems.
(1) Since the sign language content must be identified with a camera or the like and then translated, such systems cannot immediately translate a sign language video transmitted from a mobile terminal, which is inconvenient.
(2) In addition, because the existing systems translate from the movement in the images, there is a risk of mistranslation as well as a limit to how accurately meaning can be conveyed.
(3) In particular, because the existing sign language interpretation systems translate simply from words and joint movements, even the parts that should be translated around the word segments recognizable in deaf culture are translated and displayed mainly as individual Korean words, so the exact meaning cannot be conveyed.
(4) Moreover, unlike standard Korean, sign language can differ by region and group, and the same signs can carry different meanings. Because of these differences, with existing systems that perform mechanical translation there is a risk that the requester's signed content and the translated content turn out completely different.
The present invention takes these points into consideration. A sign language interpreter watches the sign language video and translates it into gloss units in which communication is possible in deaf culture; keyword-name folders are created using keywords extracted from that subtitle data by a text mining technique; and each keyword-name folder stores the videos segmented on the basis of the subtitle data. When a subtitle request for a sign language video is received, the sign language video is analyzed by a video mining technique, with keywords extracted by reference to the keyword-name folders, and the extracted keywords are provided to the client as text to be used as subtitles. When no keyword can be extracted, or the provided text is not used as subtitles, the subtitle data translated directly by a sign language interpreter is used as the subtitles and is also used to create keyword-name folders. The object of the invention is therefore to provide a system for adding subtitles to sign language videos that improves the quality of sign language translation through the direct involvement of sign language interpreters and further raises subtitle quality through searches over frequently used keywords.
In particular, the present invention is configured so that, when a sign language interpreter generates subtitle data, the translation is performed by dividing each input sign language video into one to ten sign segments, so that the meaning the deaf person wants to convey is divided into gloss units, that is, single words, word segments, or phrases. Another object is therefore to provide a system for adding subtitles to sign language videos in which meaning is conveyed in these gloss units so that it can be conveyed more appropriately.
To achieve these objects, a system for adding subtitles to a sign language video according to the present invention includes: a first step (S100) of receiving a sign language video through an electric/electronic communication network; a second step (S200) of storing the transmitted sign language video; a third step (S300) of analyzing the sign language video for keywords using a video mining technique with reference to keyword-name folders; a fourth step (S400) of checking whether a keyword was extracted in the analysis step and, if not, requesting a sign language interpreter to translate the sign language video; a fifth step (S500) in which, if a keyword is extracted in the fourth step (S400), the extracted keyword is transmitted as text to the client to confirm whether it will be used as a subtitle, and, if it is not used as a subtitle, the translation is requested from the sign language interpreter of the fourth step (S400); a sixth step (S600) in which, if the client agrees in the fifth step (S500) to use the transmitted text as a subtitle, the extracted keyword is added to the sign language video as a subtitle and stored; and a seventh step (S700) of transmitting the subtitled sign language video to the client.
The procedure performed when translation is requested from the sign language interpreter includes: a step 4-1 (S410) of requesting the translation from the sign language interpreter; a step 4-2 (S420) in which, while watching the sign language video, the sign language interpreter creates and stores subtitle data containing the start time and end time of each gloss unit, consisting of one to ten signs and representing a word, word segment, or phrase, together with the translated text for that gloss unit; a step 4-3 (S430) of adding the subtitle data to the sign language video as subtitles and, at the same time, separating the translated text from the subtitle data; a step 4-4 (S440) of extracting keywords from the separated translated text using a text mining technique; a step 4-5 (S450) of checking whether a keyword-name folder using the extracted keyword as its folder name has been created and, if not, creating one; a step 4-6 (S460) of dividing the sign language video at the start and end times stored in the subtitle data into segmented videos in gloss units; and a step 4-7 (S470) of storing each segmented video in the keyword-name folder named after the keyword extracted from its gloss unit, so that when keywords are extracted by the video mining technique in the third step (S300), the keyword-name folders are consulted and keywords are found through the frequently used segmented videos.
In particular, the sign language video is characterized in that its length is from 1 second to 20 minutes.
In addition, the sign language consists of at least one of hand shape (手形, dez), hand position (手位, tab), hand movement (手動, sig), palm orientation (手向, orientation), and non-manual signals (非手指信號), which are facial expressions and body movements.
And in step 4-7 (S470), when the extracted keyword corresponding to a segmented video consists of at least two words, the system is characterized in that the segmented video is stored in each keyword-name folder that contains each of those words.
Meanwhile, the system for adding subtitles to the sign language video is characterized in that updates are performed at predetermined time intervals.
Finally, in step 4-1 (S410), when requesting translation from a sign language interpreter, the client may select from the registered sign language interpreters, request a sign language interpreter registered in the region where the client lives, designate the sign language interpreter to whom the client previously submitted a sign language video when several interpreters are available, select the sign language interpreter in charge of the client's region, or have a sign language interpreter designated at random to perform the translation.
[Effects of the Invention]
The system for adding subtitles to sign language videos according to the present invention has the following effects.
(1) When subtitles are provided for a sign language video, they can be provided through a video mining technique that uses segmented videos stored in folders created from keywords extracted, by a text mining technique, from content that a sign language interpreter translated directly in gloss units, the units in which meaning is conveyed in deaf culture. Because the translation is made in gloss units, meaning is conveyed in deaf culture more appropriately.
(2) In particular, when no keyword is extracted by the video mining technique, or when an extracted keyword is provided to the client but not used as a subtitle, subtitle data can be obtained by requesting a translation directly from a sign language interpreter. This subtitle data is used immediately as the subtitles of the sign language video; at the same time, folders are created from keywords obtained by applying a text mining technique to the subtitle data, the segmented videos are stored there, and they are used when keywords are extracted by the video mining technique. In this way, the subtitle data translated in gloss units by sign language interpreters is put to use and the meaning of deaf culture is conveyed more appropriately.
(3) Meanwhile, because the translated text (subtitle data) translated and stored by a sign language interpreter can be used directly as the subtitles of a sign language video, the interpreter's translated text can be used without going through text mining, so that meaning is conveyed more appropriately in the units in which meaning is conveyed in deaf culture.
(4) Also, when video mining is performed by referring to the keyword-name folders in which the segmented videos are stored, the most frequently used segmented video in each keyword-name folder is selected for keyword extraction, so frequently extracted keywords are extracted repeatedly and meaning is conveyed in deaf culture still more appropriately.
(5) Meanwhile, because the subtitle data is translated in gloss units made up of several signs, the intended translation result can be obtained even when only part of a motion is captured, for example when signing is slow or when a meaning that cannot be fully conveyed is attempted with a single word.
(6) In addition, the present invention can be updated at predetermined time intervals, such as three times a day, so that the accuracy of translation is increased and the subtitles match the meaning to be conveyed in sign language as closely as possible.
(7) Finally, the sign language includes hand shape (手形, dez), hand position (手位, tab), hand movement (手動, sig), palm orientation (手向, orientation), and non-manual signals (非手指信號), which are facial expressions and body movements, so that expressions of intention made through facial expressions together with hand or body movements can be converted into subtitle data, and meaning is therefore conveyed more appropriately through the subtitles.
[Fig. 1] is a flowchart showing the operation of a system for adding subtitles to a sign language video according to the present invention.
[Fig. 2] is an image showing, by way of example, subtitle data translated and stored by a sign language interpreter according to the present invention.
[Fig. 3] is an image showing, by way of example, the translated text extracted from the subtitle data according to the present invention.
[Fig. 4] is an image showing part of the result of keyword extraction, using a text mining technique, from the translated text extracted according to the present invention.
[발명의 실시를 위한 최선의 형태][Best mode for carrying out the invention]
이하, 첨부된 도면을 참조하여 본 발명의 바람직한 실시예를 더욱 상세히 설명하기로 한다. 이에 앞서, 본 명세서 및 청구범위에 사용된 용어나 단어는 통상적이거나 사전적인 의미로 한정해서 해석되어서는 안 되며, 발명자는 그 자신의 발명을 최고의 방법으로 설명하기 위해 용어의 개념을 적절하게 정의할 수 있다는 원칙에 따라 본 발명의 기술적 사상에 부합하는 의미와 개념으로 해석되어야만 한다.Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the accompanying drawings. Prior to this, the terms or words used in this specification and claims should not be construed as being limited to their usual or dictionary meanings, and the inventors should properly define the concept of terms in order to best explain their invention. According to the principle that it can be interpreted as meaning and concept consistent with the technical spirit of the present invention.
따라서 본 명세서에 기재된 실시예와 도면에 도시된 구성은 본 발명의 가장 바람직한 일 실시예에 불과할 뿐이고 본 발명의 기술적 사상을 모두 대변하는 것은 아니므로, 본 출원 시점에서 이들을 대체할 수 있는 다양한 균등물과 변형례가 있을 수 있음을 이해하여야 한다.Therefore, since the embodiments described in this specification and the configurations shown in the drawings are only one of the most preferred embodiments of the present invention and do not represent all of the technical spirit of the present invention, various equivalents that can replace them at the time of this application It should be understood that there may be variations.
[수어 동영상에 자막을 추가하는 시스템][System to add subtitles to sign language videos]
In the system for adding subtitles to a sign language video according to the present invention, as shown in [Fig. 1] to [Fig. 4], keywords are extracted from the received sign language video by a video mining technique that refers to keyword-name folders in which split videos are stored; the extracted keywords are provided to the client to confirm whether they will be used as subtitles, and are then used as the subtitles of the sign language video. Because the keyword-name folders are created from subtitle data produced with the involvement of a sign language interpreter, a translation suited to deaf culture is obtained and the accuracy of the subtitles can be improved.
Here, the subtitle data is translated after being divided into gloss units capable of conveying meaning in deaf culture, the culture unique to hearing-impaired people, and the received sign language video is divided on the basis of this data to obtain split videos. Accordingly, when keywords to be used as subtitles are extracted from a sign language video by the video mining technique using the split videos, the extraction is performed in units of words, phrases, or word groups that actually convey meaning, so that the intended meaning can be delivered even more appropriately.
Hereinafter, this configuration is described in more detail with reference to the accompanying drawings. Since the system for adding subtitles to a sign language video according to the present invention is carried out in seven steps, each step is described in turn.
A. Step 1
The first step (S100) is, as shown in [Fig. 1], a step of receiving a sign language video. The client submits the request over an electric/electronic communication network such as the Internet, using a portable terminal such as a smartphone or PDA, or a terminal such as a personal computer.
B. Step 2
The second step (S200) is, as shown in [Fig. 1], a step of storing the received sign language video. The stored video serves as the original: it is used when the subtitles obtained through the process described below are added and the result is sent to the client, and also when the sign language video is divided into split videos translated in gloss units, that is, the word, phrase, or word-group units in which meaning is conveyed in deaf culture. This is described later.
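Purely as an illustration of how the receiving and storing of steps 1 and 2 might be realized, the sketch below exposes a small upload endpoint. Flask, the route name, the storage directory, and the request-identifier scheme are all assumptions of this example and are not specified by the disclosure.

```python
# Illustrative sketch of steps S100-S200: receive a sign language video over the network and
# keep the original file. Flask, the route, and the storage layout are assumptions of this
# example, not part of the disclosed system.
import uuid
from pathlib import Path

from flask import Flask, jsonify, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
ORIGINALS_DIR = Path("originals")            # assumed location for the stored originals
ORIGINALS_DIR.mkdir(exist_ok=True)

@app.route("/sign-videos", methods=["POST"])
def receive_sign_video():
    uploaded = request.files["video"]        # video uploaded from the client's terminal (S100)
    request_id = uuid.uuid4().hex            # hypothetical per-request identifier
    filename = f"{request_id}_{secure_filename(uploaded.filename)}"
    uploaded.save(ORIGINALS_DIR / filename)  # keep the original for later subtitling (S200)
    return jsonify({"request_id": request_id}), 201

if __name__ == "__main__":
    app.run()
```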
C. Step 3
The third step (S300) is, as shown in [Fig. 1], a step of extracting keywords from the sign language video described above using a video mining technique. When the video mining technique is applied, keywords are extracted by referring to the split videos stored in the keyword-name folders described later. The keyword-name folders and split videos are described below together with the fourth step (S400).
D. Step 4
The fourth step (S400) is, as shown in [Fig. 1], a step of checking whether a keyword can be extracted from the sign language video by comparing it, through the video mining technique, with the split videos stored in the keyword-name folders. Video mining here refers to the conventional technique of analyzing video to extract useful information, for example analyzing in-store camera footage of visitors to determine what they buy and using the results to estimate expected sales, visitor numbers, and retail trends. In the present invention, in particular, the sign language video is analyzed and the analyzed motion is compared with the split videos described later in order to extract keywords.
Meanwhile, in the fourth step (S400), keywords are extracted from the sign language video by the video mining technique with reference to the keyword-name folders in which the split videos are stored. If no keyword can be extracted because there is no split video to refer to, translation is requested from a sign language interpreter (step 4-1 (S410)); if a keyword is extracted, the fifth step (S500) of providing the keyword to the client is performed. The process of storing the split videos used for keyword extraction in video mining, based on the subtitle data obtained by requesting translation from the sign language interpreter (S410 to S470), is described later; the handling of keywords extracted through video mining (S500) is described first.
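The disclosure does not fix the comparison algorithm used by this video mining step, so the following is only a minimal sketch under stated assumptions: each clip is assumed to have been reduced beforehand to a sequence of per-frame feature vectors (for example hand or pose keypoints), and a plain dynamic-time-warping distance picks the closest keyword-name folder, falling back to the interpreter path when nothing is close enough.

```python
# Illustrative sketch only: the matching method is an assumption, not the disclosed technique.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic time warping between two (frames x features) sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = float(np.linalg.norm(a[i - 1] - b[j - 1]))
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def extract_keyword(query: np.ndarray,
                    references: dict[str, list[np.ndarray]],
                    threshold: float) -> str | None:
    """Return the folder name (keyword) of the closest stored split clip, or None when no
    clip is close enough, i.e. the case that falls back to the interpreter (S410)."""
    best_keyword, best_dist = None, float("inf")
    for keyword, clips in references.items():
        for clip in clips:
            dist = dtw_distance(query, clip)
            if dist < best_dist:
                best_keyword, best_dist = keyword, dist
    return best_keyword if best_dist <= threshold else None
```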
E. Step 5
The fifth step (S500) is, as shown in [Fig. 1], a step of providing the extracted keyword to the client, when a keyword has been extracted in the fourth step (S400), and confirming whether it will be used as a subtitle. The extracted keyword is transmitted over the electric/electronic communication network to the client who requested the sign language video subtitles, so that the client reviews the keyword and then chooses whether to use it as a subtitle or to request translation from a sign language interpreter.
If the client chooses to use it as a subtitle, the sixth step (S600) described below is performed; if the client requests translation from a sign language interpreter, step 4-1 (S410) described below is performed. Since step 4-1 (S410) follows the same procedure as when translation is requested because no keyword could be extracted in the fourth step (S400), it is described together with that case below.
F. Step 6
The sixth step (S600) is, as shown in [Fig. 1], a step of adding the keyword that the client has approved for use as a subtitle to the sign language video and storing the sign language video with the subtitle added.
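One possible realization of this step, assuming the approved keywords have already been written to a standard subtitle file, is to mux that file into the stored original with ffmpeg, as sketched below; the file names and the choice of a selectable (soft) subtitle track are assumptions of the example.

```python
# Sketch of step S600: attach the approved, timed keywords to the stored original video.
# Assumes ffmpeg is installed and that the keywords have already been written to an SRT file;
# '-c:s mov_text' stores them as a selectable subtitle track in an MP4 container.
import subprocess

def add_subtitles(video_path: str, srt_path: str, out_path: str) -> None:
    subprocess.run(
        ["ffmpeg", "-y",
         "-i", video_path,              # original sign language video kept in step S200
         "-i", srt_path,                # approved keywords as timed subtitles
         "-c:v", "copy", "-c:a", "copy",
         "-c:s", "mov_text",            # embed as a soft subtitle track
         out_path],
        check=True,
    )

# Hypothetical file names, for illustration only:
# add_subtitles("originals/1503.mp4", "subtitles/1503.srt", "subtitled/1503.mp4")
```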
G. Step 7
The seventh step (S700) is, as shown in [Fig. 1], a step of transmitting the stored sign language video to the client. The sign language video may be transmitted to the terminal from which it was received, or to another terminal designated by the client.
(Requesting Translation from a Sign Language Interpreter)
Meanwhile, as shown in [Fig. 1], when no keyword is extracted through video mining in the fourth step (S400) described above, or when the client does not use the provided keyword as a subtitle and translation is requested from a sign language interpreter, the procedure is as follows: the content translated by the sign language interpreter is used as subtitles, and the sign language video is divided into split videos that are stored in keyword-name folders so that they can be used for video mining. This procedure consists of the following seven steps.
1. Step 4-1
Step 4-1 (S410) is, as shown in [Fig. 1], a step of requesting a sign language interpreter to translate the sign language video. When there are several sign language interpreters, the requester of the sign language video may designate the interpreter to whom a sign language video was previously entrusted, select the interpreter in charge of the region where the client lives, or have an interpreter designated arbitrarily.
2. Step 4-2
Step 4-2 (S420) is, as shown in [Fig. 1] and [Fig. 2], a step in which the sign language interpreter watches the received sign language video, translates it, and creates and stores subtitle data.
Here, the subtitle data is, as shown in [Fig. 2], data that the sign language interpreter produces while watching the received sign language video by dividing it, at his or her discretion, into gloss units capable of conveying meaning in deaf culture, the culture unique to hearing-impaired people, and translating each unit. The subtitle data includes the times at which the interpreter divided the video to convey meaning, that is, the start time and end time of each gloss unit on the timeline of the sign language video, and the translated text of that gloss unit. The translated text is entered directly by the sign language interpreter while watching the sign language video, so the translation conveys the meaning appropriately for the situation observed in the signing itself.
In [Fig. 2], 'intRequestIdx' indicates how a single sign language video is divided into gloss units; rows sharing the same 'intRequestIdx' show that one video was divided into several gloss units and translated. For example, in [Fig. 2], the sign language video stored with 'intRequestIdx' of '1,503' was divided by the sign language interpreter into eight gloss units so that meaning is conveyed appropriately in deaf culture, and each gloss has the pair 'intRequestIdx' and 'intOrder' as its identifier. Each gloss unit is stored with 'strStartTime' (start time) and 'strEndTime' (end time) referenced to the timeline of the sign language video, and the content translated for each gloss unit is stored as 'strText'.
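For illustration only, each row of the subtitle data in [Fig. 2] can be modeled as a small record with exactly these field names; the timestamp format and the example rows below are assumptions, not values taken from the figure.

```python
# One gloss-unit row of the subtitle data shown in [Fig. 2], modeled as a plain record.
# The field names follow the figure; the time format and the example values are invented
# for illustration.
from dataclasses import dataclass

@dataclass
class GlossSubtitle:
    intRequestIdx: int    # identifies the sign language video / translation request
    intOrder: int         # position of this gloss unit within that video
    strStartTime: str     # gloss-unit start on the video timeline, e.g. "00:00:01.200"
    strEndTime: str       # gloss-unit end on the video timeline
    strText: str          # translation entered directly by the sign language interpreter

# Hypothetical rows: one video (same intRequestIdx) split into two gloss units.
example_rows = [
    GlossSubtitle(1503, 1, "00:00:00.000", "00:00:02.100", "hello"),
    GlossSubtitle(1503, 2, "00:00:02.100", "00:00:04.800", "where is the library"),
]
```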
Here, in a preferred embodiment of the present invention, the sign language may include all of the movements, actions, and facial expressions used in signing; most preferably it is expressed by at least one of handshape (dez), hand location (tab), hand movement (sig), palm orientation, and non-manual signals, that is, facial expressions and body movements. This allows the sign language interpreter to observe directly the meaning that the deaf person expresses by every means used to communicate, grasp the exact meaning, and convert it into subtitles.
Also, in a preferred embodiment of the present invention, the sign language preferably uses gloss units capable of conveying meaning in deaf culture, the culture unique to hearing-impaired people. To this end, a gloss unit consists of 1 to 10 signs, most preferably 1 to 3 signs, so that a single sign movement can convey meaning in the way a single noun does, or two or three sign movements can be used as one unit of meaning.
Also, in a preferred embodiment of the present invention, the sign language video preferably has a length of about 1 second to 20 minutes. This allows the sign language interpreter to concentrate and translate accurately, and allows subtitles to be added quickly so that feedback reaches the deaf person right away. Of course, anyone skilled in the art will readily appreciate that a longer video merely takes a little more time to process and poses no problem for translating and providing subtitles.
As shown in [Fig. 1], this subtitle data is not only provided as subtitles for the sign language video (step 4-2). Keywords are also extracted from it by a text mining technique, keyword-name folders are created to store the split videos obtained by dividing the sign language video on the basis of the subtitle data, and these folders are then used for keyword extraction when search keywords are extracted from other sign language videos by the video mining technique, so that subtitles are generated automatically (steps 4-3 to 4-7).
Finally, step 4-2 (S420) is, as shown in [Fig. 1], a step of using the subtitle data described above as subtitles for the sign language video. Since the subtitle data contains the start time and end time of each gloss unit, the unit in which meaning is conveyed, together with the translated text produced directly by the sign language interpreter for that gloss unit, subtitles can be generated for the sign language video immediately and the subtitled video can be delivered right away.
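A minimal sketch of how the stored start time, end time, and translated text could be turned directly into a subtitle track is shown below; it assumes the times are held as seconds from the start of the video, and SRT is simply one convenient container, not something the disclosure prescribes.

```python
# Sketch of turning gloss-unit records (start, end, translated text) straight into an SRT
# subtitle file. Times are assumed to be seconds from the start of the video; the example
# gloss units are invented.
def _fmt(seconds: float) -> str:
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(gloss_units: list[tuple[float, float, str]]) -> str:
    blocks = []
    for i, (start, end, text) in enumerate(gloss_units, start=1):
        blocks.append(f"{i}\n{_fmt(start)} --> {_fmt(end)}\n{text}\n")
    return "\n".join(blocks)

units = [(0.0, 2.1, "hello"), (2.1, 4.8, "where is the library")]
with open("1503.srt", "w", encoding="utf-8") as f:
    f.write(to_srt(units))
```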
3. Step 4-3
Step 4-3 (S430) is, as shown in [Fig. 1], a step of separating only the translated text from the subtitle data described above. This is done so that extraction keywords can be extracted from the separated translated text by a text mining technique. The extraction keywords are used as the names of folders in which the required split videos are stored and, as described later, the data stored in these folders is used when keywords are extracted from a sign language video by the video mining technique. This is explained step by step below.
4. Step 4-4
Step 4-4 (S440) is, as shown in [Fig. 1], a step of extracting an extraction keyword from each piece of translated text separated in gloss units. The extraction keywords are obtained by applying a text mining technique to the translated text separated from the subtitle data. Text mining here refers to the conventional technique of structuring semi-structured or unstructured text data with natural language processing based on linguistics, statistics, machine learning, and the like, extracting its features in the form of keywords, and thereby finding meaningful information.
The extraction keywords obtained in this way are not only used as the folder names (keyword names) of the folders described later; they also serve the subtitles, since an appropriate keyword is chosen according to its frequency of use when keywords are extracted from other videos by the video mining technique. [Fig. 3] shows the text separated from the subtitle data of [Fig. 2], and [Fig. 4] shows part of the extraction keywords obtained from the text of [Fig. 3] by the text mining technique according to the present invention. In [Fig. 4], 'keyword' denotes an extraction keyword extracted by the text mining technique according to the present invention, and 'Frequency' denotes how often the split videos stored in the keyword-name folder created with that extraction keyword as its folder name were selected when keywords were extracted by the video mining technique.
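Since the disclosure relies on conventional text mining without fixing a particular method, the sketch below is only a minimal stand-in: it tokenizes the separated translation text, drops a few invented stopwords, and counts frequencies. A production system would more likely use a proper morphological analyzer, which is outside the scope of this example.

```python
# Minimal stand-in for the text mining of step S440: tokenize the separated translation text,
# drop a few invented stopwords, and count frequencies.
import re
from collections import Counter

STOPWORDS = {"the", "a", "is", "to", "and", "at"}   # hypothetical stopword list

def extract_keywords(translated_texts: list[str], top_n: int = 5) -> list[tuple[str, int]]:
    counts = Counter()
    for text in translated_texts:
        for token in re.findall(r"\w+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts.most_common(top_n)

print(extract_keywords(["Where is the library", "The library opens at nine"]))
# e.g. [('library', 2), ('where', 1), ('opens', 1), ('nine', 1)]
```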
5. Step 4-5
Step 4-5 (S450) is, as shown in [Fig. 1], a step of checking whether a folder (keyword-name folder) whose folder name is the extraction keyword extracted in step 4-4 (S440) has already been created. The purpose is to store the split videos described later in keyword-name folders formed from the keyword names, and to use these folders to search for and retrieve the necessary split videos when keywords are extracted from a sign language video by the video mining technique.
In step 4-5 (S450), as shown in [Fig. 1], it is checked whether such a keyword-name folder has been created; if it has not, a folder with the keyword name is newly created in the database, and if it already exists, step 4-6 (S460) is performed.
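If the database of keyword-name folders is realized simply as a directory tree, which is an assumption of this sketch rather than a requirement of the disclosure, the existence check and creation of step 4-5 reduce to a few lines.

```python
# Sketch of step S450 under the assumption that the keyword-name "database" is a directory
# tree: check for the keyword-name folder and create it only if it does not exist yet.
from pathlib import Path

KEYWORD_ROOT = Path("keyword_folders")   # assumed root of the keyword-name folders

def ensure_keyword_folder(keyword: str) -> Path:
    folder = KEYWORD_ROOT / keyword
    folder.mkdir(parents=True, exist_ok=True)
    return folder

ensure_keyword_folder("library")         # creates keyword_folders/library/ if missing
```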
6. Step 4-6
Step 4-6 (S460) is, as shown in [Fig. 1], a step of dividing the sign language video transmitted together with the subtitle data described above into split videos. The division uses the time information recorded when the sign language video was divided into gloss units on its timeline, that is, the start time and end time of the signing in each gloss unit. Since each split video is cut at the boundary times stored in the subtitle data, the video can be divided into segments that each correspond to a gloss unit conveying meaning in deaf culture.
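As one way this step could be carried out, the sketch below cuts the original at each gloss unit's start and end time with ffmpeg; the availability of ffmpeg, the output naming, and the decision to re-encode so that cuts need not fall on keyframes are all assumptions of the example.

```python
# Sketch of step S460: cut the sign language video into one clip per gloss unit at the
# start/end times recorded in the subtitle data.
import subprocess

def split_gloss_units(video_path: str, units: list[tuple[str, str, str]]) -> list[str]:
    """units: (start, end, keyword) tuples with times given as 'HH:MM:SS.mmm' strings."""
    clip_paths = []
    for i, (start, end, keyword) in enumerate(units, start=1):
        out_path = f"clip_{i:03d}_{keyword}.mp4"
        subprocess.run(
            ["ffmpeg", "-y", "-i", video_path,
             "-ss", start, "-to", end,       # gloss-unit boundaries from the subtitle data
             "-c:v", "libx264", "-an",       # re-encode video; audio is not needed here
             out_path],
            check=True,
        )
        clip_paths.append(out_path)
    return clip_paths
```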
7. Step 4-7
Step 4-7 (S470) is, as shown in [Fig. 1], a step of storing the split videos produced in step 4-6 (S460) in folders. Here, the folder is the keyword-name folder whose folder name is the keyword extracted, through the steps described above, from the translated text corresponding to that split video section of the subtitle data. As a result, for each split video, the keyword that the signing in that video is intended to convey is the same as the name of the folder in which the split video is stored.
Meanwhile, in a preferred embodiment of the present invention, when the extraction keyword obtained from the text corresponding to a split video contains at least two words, the keyword-name folders for each word are looked up and the split video is stored in each of the folders found, as sketched below. This makes it possible to search in gloss units that contain a given single word even when a keyword consists of a single word.
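Under the same directory-tree assumption as before, filing a clip under the folder of every word in its keyword could look like this; the paths and file names are hypothetical.

```python
# Sketch of step S470 for a multi-word extraction keyword: file a copy of the split clip
# under the keyword-name folder of every word it contains (directory layout and file names
# are hypothetical).
import shutil
from pathlib import Path

KEYWORD_ROOT = Path("keyword_folders")

def store_split_clip(clip_path: str, extraction_keyword: str) -> None:
    for word in extraction_keyword.split():
        folder = KEYWORD_ROOT / word
        folder.mkdir(parents=True, exist_ok=True)
        shutil.copy2(clip_path, folder / Path(clip_path).name)

# A clip whose keyword is "library location" would end up in both
# keyword_folders/library/ and keyword_folders/location/:
# store_split_clip("clip_001_library_location.mp4", "library location")
```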
Also, in a preferred embodiment of the present invention, the database is configured to be updated at predetermined times, for example three times a day or once an hour. As data comes in and the number of newly created folders grows, the frequency of use increases accordingly, making correspondingly more accurate translation possible.
Finally, through step 4-7 (S470) the database is made available so that a sign language video can be translated and subtitled directly through the video mining technique, without going through a sign language interpreter. That is, when a translation request is received for a sign language video and keywords are extracted from it through the video mining technique (third and fourth steps), the split videos and keyword-name folders are referenced to extract the keywords. Content that a sign language interpreter has appropriately translated in gloss units, so that meaning is conveyed in deaf culture, can thus be used as subtitles, and the intended meaning is delivered even more appropriately.
Meanwhile, in a preferred embodiment of the present invention, the split videos that are stored in the keyword-name folders and used for keyword extraction from sign language videos through the video mining technique are preferably sorted by how often they are used for keyword extraction, and only a fixed number of the most frequently used ones, for example 3 to 10 in descending order of frequency, are kept while the rest are deleted. Keeping only the frequently used split videos allows keyword searching to proceed quickly while still producing an appropriate gloss-unit translation.
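The housekeeping described above could look like the following sketch, which assumes that each clip's selection count is tracked elsewhere in a dictionary; keeping the top few clips and deleting the rest follows the text.

```python
# Sketch of the pruning described above: within one keyword-name folder, keep only the N
# split clips most often selected during keyword extraction and delete the rest. The usage
# counts are assumed to be tracked elsewhere (e.g., incremented whenever a clip wins a match).
from pathlib import Path

def prune_folder(folder: Path, usage_counts: dict[str, int], keep: int = 5) -> None:
    clips = sorted(folder.glob("*.mp4"),
                   key=lambda p: usage_counts.get(p.name, 0),
                   reverse=True)
    for clip in clips[keep:]:                # everything beyond the N most-used clips
        clip.unlink()

# Example (hypothetical counts): keep the 5 most-used clips for the keyword "library".
# prune_folder(Path("keyword_folders/library"), {"clip_001.mp4": 12, "clip_002.mp4": 3})
```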
The present invention configured as described above extracts keywords from the subtitle data translated by the sign language interpreter using the text mining technique, stores the split videos obtained on the basis of that subtitle data in the keyword-name folders created from those keywords, and extracts the keywords to be used as subtitles from sign language videos through video mining. Content that the sign language interpreter has appropriately translated in gloss units capable of conveying meaning in deaf culture, the culture unique to hearing-impaired people, is thereby put to use, so that the intended meaning is delivered even more appropriately.

Claims (6)

  1. A system for adding subtitles to a sign language video, comprising: a first step (S100) of receiving a sign language video over an electric/electronic communication network; a second step (S200) of storing the received sign language video; a third step (S300) of analyzing the sign language video for keywords by a video mining technique with reference to keyword-name folders; a fourth step (S400) of checking whether a keyword was extracted in the analysis step and, if not, requesting a sign language interpreter to translate the sign language video; a fifth step (S500) of, if a keyword is extracted in the fourth step (S400), transmitting the extracted keyword as text to a client to confirm whether it will be used as a subtitle and, if it is not to be used as a subtitle, requesting translation from the sign language interpreter of the fourth step (S400); a sixth step (S600) of, if the client agrees in the fifth step (S500) to use the transmitted text as a subtitle, adding the extracted keyword to the sign language video as a subtitle and storing the result; and a seventh step (S700) of transmitting the sign language video with the subtitle added to the client,
    wherein the procedure performed when translation is requested through the sign language interpreter comprises: a step 4-1 (S410) of requesting translation from the sign language interpreter; a step 4-2 (S420) in which the sign language interpreter, while watching the sign language video, creates and stores subtitle data including the start time and end time of each gloss unit, a gloss unit consisting of 1 to 10 signs and representing a word, phrase, or word group, together with the translated text of that gloss unit, and in which the stored subtitle data is added to the sign language video as subtitles; a step 4-3 (S430) of, at the same time, separating the translated text from the subtitle data; a step 4-4 (S440) of extracting an extraction keyword from the separated translated text using a text mining technique; a step 4-5 (S450) of checking whether a keyword-name folder using the extraction keyword as its folder name has been created and, if it has not, creating a keyword-name folder with the extraction keyword as its folder name; a step 4-6 (S460) of dividing the sign language video at the start times and end times stored in the subtitle data to produce split videos in gloss units; and a step 4-7 (S470) of storing each split video in the keyword-name folder that uses the extraction keyword extracted from that gloss unit as its folder name, so that when keywords are extracted by the video mining technique in the third step (S300), the keyword-name folders are referenced and keyword searching is performed through the most frequently used split videos.
  2. The system for adding subtitles to a sign language video of claim 1, wherein the sign language video has a length of 1 second to 20 minutes.
  3. The system for adding subtitles to a sign language video of claim 1, wherein the sign language is expressed by at least one of handshape (dez), which is the shape of the hand; location (tab), which is the position of the hand; movement (sig), which is the motion of the hand; palm orientation; and non-manual signals, which are facial expressions and body movements.
  4. The system for adding subtitles to a sign language video of claim 1, wherein, in step 4-7 (S470), when the extraction keyword corresponding to a split video consists of at least two words, the split video is stored in each keyword-name folder that contains each of the words.
  5. The system for adding subtitles to a sign language video of claim 1, wherein the system is configured to be updated at predetermined time intervals.
  6. The system for adding subtitles to a sign language video of any one of claims 1 to 5, wherein, when translation is requested from a sign language interpreter in step 4-1 (S410), the client is allowed to select from registered sign language interpreters, or the request is directed to a sign language interpreter registered in the region where the client lives, or, when there are several sign language interpreters, the client designates the sign language interpreter to whom a sign language video was previously entrusted, or the client selects the sign language interpreter in charge of the region where the client lives, or a sign language interpreter is designated arbitrarily to perform the translation.
PCT/KR2022/008059 2021-12-10 2022-06-08 System for adding subtitles to sign language video WO2023106522A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020210176669A KR102428677B1 (en) 2021-12-10 2021-12-10 System of adding subtitles to sign language videos
KR10-2021-0176669 2021-12-10

Publications (1)

Publication Number Publication Date
WO2023106522A1 true WO2023106522A1 (en) 2023-06-15

Family

ID=82845324

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/008059 WO2023106522A1 (en) 2021-12-10 2022-06-08 System for adding subtitles to sign language video

Country Status (2)

Country Link
KR (1) KR102428677B1 (en)
WO (1) WO2023106522A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004282134A (en) * 2003-03-12 2004-10-07 Seiko Epson Corp Sign language translation system and sign language translation program
KR20110102990A (en) * 2010-03-12 2011-09-20 주식회사 써드아이 System and method for interpreting sign language
KR20120073795A (en) * 2010-12-27 2012-07-05 엘지에릭슨 주식회사 Video conference system and method using sign language to subtitle conversion function
JP2015076774A (en) * 2013-10-10 2015-04-20 みずほ情報総研株式会社 Communication support system, communication support method, and communication support program
KR102115551B1 (en) * 2019-08-06 2020-05-26 전자부품연구원 Sign language translation apparatus using gloss and translation model learning apparatus

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101915088B1 (en) 2017-08-25 2018-11-05 신정현 Sign language translation device
KR102300589B1 (en) 2019-12-02 2021-09-10 주식회사 에이아이톡 Sign language interpretation system
KR102314710B1 (en) 2019-12-19 2021-10-19 이우준 System sign for providing language translation service for the hearing impaired person

Also Published As

Publication number Publication date
KR102428677B1 (en) 2022-08-08

Similar Documents

Publication Publication Date Title
WO2018128238A1 (en) Virtual consultation system and method using display device
WO2010036013A2 (en) Apparatus and method for extracting and analyzing opinions in web documents
WO2018097379A1 (en) Method for inserting hash tag by image recognition, and software distribution server storing software for performing same method
WO2011136425A1 (en) Device and method for resource description framework networking using an ontology schema having a combined named dictionary and combined mining rules
WO2015020354A1 (en) Apparatus, server, and method for providing conversation topic
WO2016125949A1 (en) Automatic document summarizing method and server
EP3545487A1 (en) Electronic apparatus, controlling method of thereof and non-transitory computer readable recording medium
WO2020107761A1 (en) Advertising copy processing method, apparatus and device, and computer-readable storage medium
EP3915039A1 (en) System and method for context-enriched attentive memory network with global and local encoding for dialogue breakdown detection
WO2021071137A1 (en) Method and system for automatically generating blank-space inference questions for foreign language sentence
WO2014106979A1 (en) Method for recognizing statistical voice language
WO2015126097A1 (en) Interactive server and method for controlling the server
WO2013077527A1 (en) Multilingual speech system and method of character
WO2018097439A1 (en) Electronic device for performing translation by sharing context of utterance and operation method therefor
WO2021215620A1 (en) Device and method for automatically generating domain-specific image caption by using semantic ontology
WO2014115952A1 (en) Voice dialog system using humorous speech and method thereof
WO2014142422A1 (en) Method for processing dialogue based on processing instructing expression and apparatus therefor
WO2017115994A1 (en) Method and device for providing notes by using artificial intelligence-based correlation calculation
WO2020159140A1 (en) Electronic device and control method therefor
WO2019112117A1 (en) Method and computer program for inferring meta information of text content creator
WO2023106522A1 (en) System for adding subtitles to sign language video
WO2020235910A1 (en) Text reconstruction system and method thereof
WO2021051557A1 (en) Semantic recognition-based keyword determination method and apparatus, and storage medium
WO2023106523A1 (en) Method for establishing database for system adding subtitles to sign language videos and database apparatus using same
WO2021085811A1 (en) Automatic speech recognizer and speech recognition method using keyboard macro function

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22904361

Country of ref document: EP

Kind code of ref document: A1