WO2023106522A1 - System for adding subtitles to sign language video - Google Patents


Info

Publication number
WO2023106522A1
Authority
WO
WIPO (PCT)
Prior art keywords
sign language
keyword
video
extracted
folder
Application number
PCT/KR2022/008059
Other languages
French (fr)
Korean (ko)
Inventor
조계연
Original Assignee
주식회사 위아프렌즈
Application filed by 주식회사 위아프렌즈 filed Critical 주식회사 위아프렌즈
Publication of WO2023106522A1 publication Critical patent/WO2023106522A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/231 Content storage operation, e.g. caching movies for short term storage, replicating data over plural servers, prioritizing data for deletion
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/278 Subtitling

Definitions

  • the present invention relates to a system for adding subtitles to sign language videos, and more particularly to a system in which a sign language interpreter translates a transmitted sign language video in gloss units capable of conveying meaning in deaf culture, the unique culture of the hearing impaired, and the resulting subtitle data is provided as subtitles so that an appropriate translation is obtained; at the same time, keyword folders are created using keywords extracted from the translated subtitle data by a text mining technique as folder names, and the videos segmented on the basis of the subtitle data are stored so that each segmented video goes into the keyword folder containing its extracted keyword.
  • as a result, whether a sign language interpreter is asked directly to translate a sign language video or subtitles are added to a sign language video by a video mining technique, keywords are extracted using the frequently used segmented videos in the keyword folders built from the interpreters' translations, so that sign language subtitles suited to deaf culture can be provided.
  • sign language generally refers to a visual language expressed using the hands, facial expressions, and gestures. Since it is a visual language, it cannot be used to communicate with a person who does not know sign language or in a place where the signing cannot be seen. Accordingly, as shown in (Patent Document 1) to (Patent Document 3) below, devices that enable communication in sign language in various ways have been developed.
  • Patent Document 1 Korean Patent Registration No. 10-1915088
  • the present invention relates to a sign language translation apparatus comprising: a main body having displays on both sides; a camera unit formed on the main body that photographs the signer's sign language motions; and a translation unit including an image processing unit that receives the images obtained by the camera unit and extracts the sign language motions, a database unit in which sign language motions and words are stored in matched pairs, an analyzer that extracts sentences by analyzing the words matched to the extracted sign language motions against the database unit, and a control unit that displays the sentences extracted by the analyzer on the displays.
  • Patent Document 2 Korean Patent Registration No. 10-2314710
  • It relates to a sign language interpretation service system for the hearing impaired, comprising: a first wearable device unit that can be worn on the user's head, recognizes hand-joint motion by photographing the user's signing to generate hand-joint motion data, recognizes the other person's voice and displays it as text, and receives sign language interpretation data from the outside and outputs it as voice;
  • a second wearable device unit that can be worn on the user's hand and generates hand-movement tracking data by tracking the movement of the hand;
  • a portable communication device unit that can be worn on the user's body, corrects the hand-joint motion data based on the hand-movement tracking data, transmits the corrected sign language motion data to the outside, receives the sign language interpretation data from the outside in response, and forwards it to the first wearable device unit;
  • and a cloud server unit that receives the sign language motion data from the portable communication device unit, generates the sign language interpretation data through a machine learning algorithm based on the sign language motion data, and transmits the generated sign language interpretation data to the portable communication device unit.
  • such a sign language interpretation service system is disclosed.
  • Patent Document 3 Korean Patent Registration No. 10-2300589
  • it relates to an artificial intelligence (AI)-based sign language interpretation system comprising: a dictionary storage unit storing a base-form dictionary that defines words derived from the same word as one word, a synonym dictionary that defines words with the same or similar meaning as one word, a stopword dictionary defining stopwords not used in sign language translation among the morphologically analyzed sentences, and a homonym dictionary in which different identification information is set for each homonym; a morpheme analyzer for distinguishing morphemes in an input sentence; a sign language sentence generator that generates the sign language sentence to be translated by comparing each of the distinguished morphemes against the dictionaries; a motion data extraction unit that extracts from storage each piece of motion data indicated by the sign language word code matched to each morpheme of the generated sign language sentence; and an avatar motion display unit that displays and controls the motion of a sign-language-delivery avatar on the display unit according to the extracted motion data.
  • in addition, the existing sign language interpretation systems not only carry a risk of mistranslation, since they translate from the movement in the images, but are also limited in how accurately they can convey meaning.
  • moreover, unlike standard Korean, sign language can differ by region and group, and the same signs can carry different meanings; because of these differences, with existing sign language interpretation systems that perform mechanical translation there is a risk that the requester's signed content and the translated content turn out completely different.
  • the present invention takes these points into consideration: a sign language interpreter watches the sign language video and translates it into gloss units in which communication is possible in deaf culture, keyword-name folders are created using keywords extracted from that subtitle data by a text mining technique, and each keyword-name folder stores the videos segmented on the basis of the subtitle data, so that when a subtitle request for a sign language video is received, the sign language video is analyzed by a video mining technique with keywords extracted by reference to the keyword-name folders.
  • the extracted keywords are provided to the client as text to be used as subtitles; when no keyword can be extracted, or the provided text is not used as subtitles, the subtitle data translated directly by a sign language interpreter is used as the subtitles and is also used to create keyword-name folders.
  • its purpose is thus to provide a system for adding subtitles to sign language videos that improves the quality of sign language translation through the direct involvement of sign language interpreters and further raises subtitle quality through searches over frequently used keywords.
  • the present invention is configured so that, when a sign language interpreter generates subtitle data, the translation is performed by dividing each input sign language video into one to ten sign segments, so that the meaning the deaf person wants to convey is divided into gloss units, that is, single words, word segments, or phrases.
  • another object is therefore to provide a system for adding subtitles to sign language videos in which meaning is conveyed in these gloss units so that it can be conveyed more appropriately.
  • a system for adding subtitles to a sign language video includes: a first step (S100) of receiving a sign language video through an electric/electronic communication network; a second step (S200) of storing the transmitted sign language video; a third step (S300) of analyzing the sign language video for keywords using a video mining technique with reference to keyword-name folders; a fourth step (S400) of checking whether a keyword was extracted in the analysis step and, if not, requesting a sign language interpreter to translate the sign language video; a fifth step (S500) in which, if a keyword is extracted in the fourth step (S400), the extracted keyword is transmitted as text to the client to confirm whether it will be used as a subtitle, and, if it is not used as a subtitle, the translation is requested from the sign language interpreter of the fourth step (S400); a sixth step (S600) in which, if the client agrees in the fifth step (S500) to use the transmitted text as a subtitle, the extracted keyword is added to the sign language video as a subtitle and stored; and a seventh step (S700) of transmitting the subtitled sign language video to the client.
  • the procedure performed when translation is requested from the sign language interpreter includes: a step 4-1 (S410) of requesting the translation from the sign language interpreter; a step 4-2 (S420) in which, while watching the sign language video, the sign language interpreter creates and stores subtitle data containing the start time and end time of each gloss unit, consisting of one to ten signs and representing a word, word segment, or phrase, together with the translated text for that gloss unit; a step 4-3 (S430) of adding the subtitle data to the sign language video as subtitles and, at the same time, separating the translated text from the subtitle data; a step 4-4 (S440) of extracting keywords from the separated translated text using a text mining technique; a step 4-5 (S450) of checking whether a keyword-name folder using the extracted keyword as its folder name has been created and, if not, creating one; a step 4-6 (S460) of dividing the sign language video at the start and end times stored in the subtitle data into segmented videos in gloss units; and a step 4-7 (S470) of storing each segmented video in the keyword-name folder named after the keyword extracted from its gloss unit, so that when keywords are extracted by the video mining technique in the third step (S300), the keyword-name folders are consulted and keywords are found through the frequently used segmented videos.
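The claimed flow, from receiving the video (S100) to returning it with subtitles (S700), can be summarized in a short sketch. This is only an illustration and not part of the patent text: the helper callables (mine_keyword, interpreter_translate, and so on) and the Python data types are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class GlossSegment:
    start: float      # strStartTime, seconds on the video timeline
    end: float        # strEndTime
    text: str         # strText, the interpreter's translation of this gloss unit

def subtitle_pipeline(
    video_path: str,
    mine_keyword: Callable[[str], Optional[str]],               # S300-S400: video mining lookup
    client_accepts: Callable[[str], bool],                       # S500: client confirms the text
    interpreter_translate: Callable[[str], list[GlossSegment]],  # S410-S420: human translation
    attach_subtitles: Callable[[str, str], str],                 # S430/S600: returns subtitled file
    update_folders: Callable[[str, list[GlossSegment]], None],   # S440-S470: feed keyword folders
) -> str:
    """Return the path of the subtitled video, following the claimed S100-S700 flow."""
    keyword = mine_keyword(video_path)
    if keyword is not None and client_accepts(keyword):
        return attach_subtitles(video_path, keyword)             # S600-S700
    # No keyword, or the client declined: fall back to a sign language interpreter.
    segments = interpreter_translate(video_path)
    subtitled = attach_subtitles(video_path, " ".join(s.text for s in segments))
    update_folders(video_path, segments)                         # used by future video mining
    return subtitled
```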
  • the sign language video is characterized in that its length is from 1 second to 20 minutes.
  • the sign language consists of at least one of hand shape (手形, dez), hand position (手位, tab), hand movement (手動, sig), palm orientation (手向, orientation), and non-manual signals (非手指信號), which are facial expressions and body movements.
  • in step 4-7, when the extracted keyword corresponding to a segmented video consists of at least two words, the segmented video is stored in each keyword-name folder that contains each of those words.
  • the system for adding subtitles to the sign language video is characterized in that updates are performed at predetermined time intervals.
  • in step 4-1, when requesting translation from a sign language interpreter, the client may select from the registered sign language interpreters, request a sign language interpreter registered in the region where the client lives, designate the sign language interpreter to whom the client previously submitted a sign language video when several interpreters are available, select the sign language interpreter in charge of the client's region, or have a sign language interpreter designated at random to perform the translation.
  • the system for adding subtitles to sign language videos according to the present invention has the following effects.
  • when subtitles are provided for a sign language video, they can be generated by a video mining technique that uses segmented videos stored in folders created from keywords extracted, by a text mining technique, from content that a sign language interpreter translated directly in gloss units, the units in which meaning is conveyed in deaf culture.
  • because the translation is made in gloss units in which meaning is conveyed in deaf culture, meaning is conveyed in deaf culture more appropriately.
  • in particular, when no keyword is extracted by the video mining technique, or when an extracted keyword is provided to the client but not used as a subtitle, subtitle data can be obtained by requesting a translation directly from a sign language interpreter; this subtitle data is used immediately as the subtitles of the sign language video, and at the same time folders are created from keywords obtained by applying a text mining technique to the subtitle data, the segmented videos are stored there, and they are used when keywords are extracted by the video mining technique, so that the subtitle data translated in gloss units by sign language interpreters is put to use and the meaning of deaf culture is conveyed more appropriately.
  • the present invention is configured to be updated at predetermined time intervals, such as three times a day, so that the accuracy of translation is increased and the subtitles match the meaning to be conveyed in sign language as closely as possible.
  • the sign language includes hand shape (手形, dez), hand position (手位, tab), hand movement (手動, sig), palm orientation (手向, orientation), and non-manual signals (非手指信號), which are facial expressions and body movements.
  • as a result, expressions of intention made through facial expressions together with hand or body movements can be converted into subtitle data, so meaning can be conveyed more appropriately through the subtitles.
  • FIG. 1 is a flowchart showing the operation of a system for adding subtitles to a sign language video according to the present invention.
  • FIG. 2 is an image showing subtitle data translated and stored by a sign language interpreter according to the present invention as an example.
  • FIG. 3 is an image showing an example of extracting translated text from subtitle data according to the present invention.
  • FIG. 4 is an image showing part of the result of keyword extraction using a text mining technique for the translated text extracted according to the present invention.
  • for the transmitted sign language video, keywords are extracted by a video mining technique that refers to the keyword-name folders in which segmented videos are stored.
  • the extracted keywords are provided to the client to confirm whether they will be used as subtitles and are then used as the subtitles of the sign language video.
  • because the keyword-name folders are created on the basis of subtitle data produced with the participation of sign language interpreters, a translation suited to deaf culture is possible, which increases the accuracy of the subtitles.
  • the subtitle data is translated by dividing the video into gloss units capable of conveying meaning in deaf culture, the unique culture of the hearing impaired, and the transmitted sign language video is divided on the basis of this data to obtain segmented videos.
  • when keywords to be used as subtitles are extracted from a sign language video by the video mining technique using these segmented videos, the extraction is performed in units of words, word segments, or phrases that convey meaning, so that the intended meaning is conveyed more appropriately.
  • the first step (S100) is a step of receiving a sign language video as shown in [Fig. 1].
  • the client makes a request through an electric/electronic communication network such as the Internet using a mobile terminal such as a smart phone or PDA or a terminal such as a personal computer.
  • the second step (S200) is a step of storing the transmitted sign language video as shown in [Fig. 1].
  • the sign language video is kept as the original; it is used when the subtitles obtained through the process described later are added and sent to the client, and also when segmented videos are produced by dividing the video into gloss units, the word, word-segment, or phrase units in which meaning is conveyed in deaf culture. This will be described later.
  • the third step (S300) is a step of extracting keywords from the above-described sign language video using a video mining technique, as shown in [Fig. 1].
  • keywords are extracted by referring to divided videos stored in a keyword name folder, which will be described later.
  • the keyword name folder and the divided videos will be described later along with the fourth step (S400).
  • the fourth step (S400), as shown in [Fig. 1], is a step of checking whether a keyword can be extracted by comparing the sign language video, through a video mining technique, with the segmented videos stored in the keyword-name folders.
  • video mining here means the usual technique used, for example, in retail, where the behavior of visitors captured by in-store video cameras is analyzed and data on what products the recorded customers buy is used to estimate expected sales and visitor numbers and to grasp trends in the industry.
  • in the present invention in particular, the sign language video is analyzed and the analyzed motion is compared with the segmented videos described later in order to extract keywords.
  • in the fourth step (S400), keywords are extracted from the sign language video using the video mining technique by referring to the keyword-name folders in which the segmented videos are stored.
  • if no keyword is extracted, translation is requested from a sign language interpreter (step 4-1 (S410)); when a keyword is extracted, the fifth step (S500) of providing the keyword to the client is performed.
  • the process of building the segmented videos used for keyword extraction in video mining from subtitle data obtained by requesting translation from a sign language interpreter (S410 to S470) will be described later; the handling of keywords extracted through video mining (S500) is explained first.
  • the fifth step (S500) is a step in which the extracted keywords are transmitted through an electric/electronic communication network to the client who requested the sign language video subtitles, so that the client reviews them and chooses whether to use them as subtitles or to request a translation from a sign language interpreter.
  • if the client uses them as subtitles, the sixth step (S600) described later is performed; if a translation is requested from a sign language interpreter, step 4-1 (S410) described later is performed.
  • this is the same procedure as when no keyword is extracted in the fourth step (S400), so the two cases are described together later.
  • the sixth step (S600), as shown in [Fig. 1], is a step of adding the keywords that were provided to the client and approved for use as subtitles to the sign language video and storing the subtitled sign language video.
  • the seventh step (S700) is a step of transmitting the stored sign language video to the client, as shown in [Fig. 1].
  • the sign language video may be transmitted to a terminal used to receive the sign language video or to another terminal designated by the client.
  • step 4-1 is a step of requesting a sign language interpreter to translate the sign language video, as shown in [Fig. 1].
  • at this time, the requester of the sign language video may designate the sign language interpreter to whom he or she previously submitted a sign language video, select the sign language interpreter in charge of the region where the client lives, or have a sign language interpreter designated arbitrarily.
  • in step 4-2, the sign language interpreter watches the transmitted sign language video and translates it to create and store subtitle data.
  • here, as shown in [Fig. 2], subtitle data refers to data produced while the sign language interpreter watches the transmitted sign language video, dividing it at the interpreter's discretion into gloss units capable of conveying meaning in deaf culture, the unique culture of the hearing impaired, and translating each unit.
  • such subtitle data includes the times chosen by the sign language interpreter to convey meaning, that is, the start time and end time marking the beginning and end of each gloss unit on the timeline of the sign language video, together with the translated text for that gloss unit.
  • the translated text is entered directly by the sign language interpreter while watching the sign language video, so that the meaning can be conveyed appropriately for the situation.
  • in [Fig. 2], 'intRequestIdx' identifies one sign language video; rows sharing the same 'intRequestIdx' show that one video has been divided into several gloss units and translated.
  • for example, the sign language video whose 'intRequestIdx' is stored as '1,503' has been divided by the sign language interpreter into eight gloss units to convey meaning properly in deaf culture, and each gloss unit is identified by the pair of 'intRequestIdx' and 'intOrder'.
  • each gloss unit is stored with 'strStartTime' (start time) and 'strEndTime' (end time) based on the timeline of the sign language video, and the content translated for each gloss unit is stored as 'strText'.
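A record with the fields shown in [Fig. 2] could be modeled as below. The field names come from the figure; the Python types and the example rows are assumptions made purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class SubtitleRow:
    intRequestIdx: int   # identifies one sign language video (e.g. 1503)
    intOrder: int        # position of this gloss unit within that video
    strStartTime: str    # start of the gloss unit on the video timeline
    strEndTime: str      # end of the gloss unit
    strText: str         # the interpreter's translation of this gloss unit

# One video split into several gloss units shares the same intRequestIdx.
rows = [
    SubtitleRow(1503, 1, "00:00:00.000", "00:00:02.100", "hello"),
    SubtitleRow(1503, 2, "00:00:02.100", "00:00:05.400", "where is the hospital"),
]
```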
  • here, the sign language may include all movements, actions, facial expressions, and the like used in signing; most preferably it means at least one of hand shape (手形, dez), hand position (手位, tab), hand movement (手動, sig), palm orientation (手向, orientation), and non-manual signals (facial expressions and body movements). This is so that the sign language interpreter can directly observe everything the deaf person uses to convey meaning, grasp the exact meaning, and convert it into subtitles.
  • in addition, the translation is made in gloss units capable of conveying meaning in deaf culture, the unique culture of the hearing impaired.
  • here, a gloss unit is a unit in which one meaning is conveyed: a single sign motion may convey meaning like a single noun, or two or three sign motions may together form one unit of meaning.
  • it is preferable to use a sign language video whose length is about 1 second to 20 minutes. This allows the sign language interpreter to concentrate and translate accurately, and allows subtitles to be added quickly so that feedback can be given to the deaf person right away.
  • if the video is longer than this, processing simply takes a little more time, and anyone skilled in the art will readily understand that there is no problem in translating it and providing subtitles.
  • on the other hand, the subtitle data is not only provided as subtitles for the sign language video (step 4-3), but is also used to extract keywords through a text mining technique and to divide the sign language video on the basis of the subtitle data.
  • these keywords and segmented videos are used when search keywords are extracted from other sign language videos by the video mining technique and subtitles are generated automatically (steps 4-3 to 4-7).
  • step 4-3 (S430) is a step of using the above-described subtitle data as subtitles for the sign language video, as shown in [Fig. 1].
  • since the subtitle data includes the start time and end time of each gloss unit, the unit in which meaning is conveyed, together with the translated text produced directly by the sign language interpreter for that gloss unit, subtitles are generated immediately from this data so that the subtitled sign language video can be supplied right away.
  • at the same time, step 4-3 separates only the translated text from the above-described subtitle data, as shown in [Fig. 1]. This is done so that keywords can be extracted from the separated translated text by the text mining technique.
  • the extracted keywords are used as the names of the folders in which the necessary segmented videos are stored, and, as described later, the data stored in these folders can be used when extracting keywords from a sign language video with the video mining technique. This is explained step by step below.
  • step 4-4 is a step of extracting keywords from the translated text separated in gloss units, as shown in [Fig. 1].
  • the keywords are extracted from the translated text separated from the subtitle data using a text mining technique.
  • here, the text mining technique refers to the usual technique of using natural language processing technology based on linguistics, statistics, machine learning, and the like to standardize semi-structured or unstructured text data, extract its features in the form of keywords, and find meaningful information.
  • the keywords extracted in this way are not only used as the folder names (keyword names) of the folders described later, but are also used as subtitles, with appropriate keywords selected according to frequency of use when keywords are extracted from other videos by the video mining technique.
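A minimal sketch of this kind of keyword extraction is shown below, using plain word-frequency counting over the separated translated text. The patent does not specify the tokenizer, stopword list, or morphological analysis actually used, so everything in the sketch is an assumption.

```python
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "is", "to", "of"}  # assumed; the real stopword list is not given

def extract_keywords(translated_texts: list[str], top_n: int = 5) -> list[str]:
    """Count word frequencies over the separated gloss-unit texts and keep the top keywords."""
    counts: Counter[str] = Counter()
    for text in translated_texts:
        for token in re.findall(r"\w+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return [word for word, _ in counts.most_common(top_n)]

# Example with the translated texts of one video's gloss units:
print(extract_keywords(["where is the hospital", "hospital entrance", "thank you"]))
```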
  • [Fig. 3] shows the translated text separated from the subtitle data of [Fig. 2], and [Fig. 4] shows some of the keywords extracted from the text of [Fig. 3] by the text mining technique according to the present invention.
  • in [Fig. 4], 'keyword' is the keyword extracted by the text mining technique, and 'frequency' represents how often each segmented video stored in the keyword-name folder created with that keyword as its folder name has been selected when keywords are extracted by the video mining technique.
  • step 4-5, as shown in [Fig. 1], checks whether a folder (a keyword-name folder) using the keyword extracted in step 4-4 (S440) as its folder name has been created. This is so that the segmented videos described later can be stored in the keyword-name folders, and so that the necessary segmented videos can be found and used through these folders when keywords are extracted from a sign language video with the video mining technique.
  • as shown in [Fig. 1], it is checked whether such a keyword-name folder exists; if it does not, a new folder with that keyword name is created in the database, and once the keyword-name folder exists, step 4-6 (S460) is performed.
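Step 4-5 amounts to a check-then-create operation over folders named after extracted keywords. A minimal sketch, assuming the keyword-name folders live on an ordinary file system under a base directory:

```python
from pathlib import Path

def ensure_keyword_folder(base_dir: str, keyword: str) -> Path:
    """Return the keyword-name folder, creating it if it does not exist yet (step 4-5)."""
    folder = Path(base_dir) / keyword
    folder.mkdir(parents=True, exist_ok=True)
    return folder

# e.g. ensure_keyword_folder("keyword_folders", "hospital") -> Path("keyword_folders/hospital")
```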
  • step 4-6 is a step of dividing the sign language video transmitted together with the above-described subtitle data into segmented videos, as shown in [Fig. 1].
  • the segmented videos are cut using the time information recorded when the sign language video was divided into gloss units on its timeline, that is, the start time and end time of each gloss unit. Because the video is divided at the start and end times stored in the subtitle data, each segmented video corresponds to one gloss unit in which meaning is conveyed in deaf culture.
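Cutting a segmented video at the stored start and end times could look like the sketch below, which shells out to ffmpeg. The patent does not name a tool, so the use of ffmpeg, the file layout, and the time format are assumptions.

```python
import subprocess

def cut_gloss_segment(video_path: str, start: str, end: str, out_path: str) -> None:
    """Cut one gloss-unit segment [start, end) out of the sign language video (step 4-6)."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-ss", start, "-to", end, "-c", "copy", out_path],
        check=True,
    )

# Example: cut the segment stored as strStartTime/strEndTime in the subtitle data.
# cut_gloss_segment("request_1503.mp4", "00:00:02.100", "00:00:05.400", "segments/1503_2.mp4")
```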
  • step 4-7 is a step of storing the segmented videos produced in step 4-6 (S460) in folders, as shown in [Fig. 1].
  • here, the folder means the keyword-name folder whose name is the keyword extracted, through the steps described above, from the translated text of the subtitle data corresponding to that segmented video section. Accordingly, for each segmented video, the keyword conveyed by the sign language content in the video is the same as the name of the folder in which the segmented video is stored.
  • at this time, when the extracted keyword corresponding to a segmented video consists of two or more words, it is preferable to search for the keyword-name folder containing each word and store the segmented video in each folder found. This allows a search for a single word to find the gloss units that include that word.
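Storing one segmented video under every word of a multi-word keyword, as described above, could be sketched as follows; the flat file layout and the use of plain file copies are assumptions for illustration.

```python
import shutil
from pathlib import Path

def store_segment(segment_path: str, keyword: str, base_dir: str = "keyword_folders") -> list[Path]:
    """Store one segmented video under the folder of every word in its keyword (step 4-7)."""
    stored = []
    for word in keyword.split():                 # "hospital entrance" -> two folders
        folder = Path(base_dir) / word
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / Path(segment_path).name
        shutil.copy2(segment_path, target)
        stored.append(target)
    return stored
```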
  • in addition, the database is configured to be updated at predetermined times, for example three times a day or every hour, so that newly created folders and the accumulated frequency-of-use values are reflected; the higher these values become, the more accurate the translation.
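The periodic update could be as simple as a scheduled job that refreshes the folders and usage counts. The three-updates-a-day interval comes from the text; the loop-and-sleep scheduling and the rebuild_index callable are assumptions.

```python
import time
from typing import Callable

UPDATE_INTERVAL_SECONDS = 8 * 60 * 60   # three updates per day, as described in the text

def run_periodic_updates(rebuild_index: Callable[[], None]) -> None:
    """Call rebuild_index() at the predetermined interval so new folders and usage counts are reflected."""
    while True:
        rebuild_index()
        time.sleep(UPDATE_INTERVAL_SECONDS)
```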
  • through step 4-7, the database is prepared so that subtitles can be generated for a sign language video directly by the video mining technique, without going through a sign language interpreter. That is, when a translation of a sign language video is requested and keywords are extracted from it by the video mining technique (the third and fourth steps), the keywords are extracted by referring to the segmented videos and the keyword-name folders.
  • in this way, content properly translated by a sign language interpreter in gloss units, the units in which meaning is conveyed in deaf culture, can be used as subtitles, so that the intended meaning is conveyed more appropriately.
  • at this time, it is desirable that the segmented videos stored in each keyword-name folder and used for keyword extraction from sign language videos by the video mining technique are sorted by how often they have been selected, and that only a fixed number of them, for example the 3 to 10 most frequently used, are kept while the rest are deleted. Leaving only the frequently used segmented videos allows quick and appropriate translation in gloss units.
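The pruning described here keeps only the most frequently used segments in each keyword-name folder. A sketch, assuming usage counts are tracked in a dictionary keyed by file path and that keeping five segments (within the 3 to 10 range mentioned above) is acceptable:

```python
from pathlib import Path

def prune_folder(folder: Path, usage_counts: dict[str, int], keep: int = 5) -> None:
    """Keep only the `keep` most frequently used segments in a keyword folder; delete the rest."""
    segments = sorted(folder.glob("*.mp4"),
                      key=lambda p: usage_counts.get(str(p), 0),
                      reverse=True)
    for stale in segments[keep:]:
        stale.unlink()     # remove segments that are rarely selected during video mining
```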
  • as described above, the present invention obtains keywords by applying a text mining technique to the subtitle data translated by a sign language interpreter, and stores the segmented videos obtained on the basis of that subtitle data in keyword-name folders generated from those keywords for use in video mining.
  • in this way, the meaning that the signer intends to convey can be delivered properly, because content appropriately translated by a sign language interpreter in gloss units, which can convey meaning in deaf culture, the unique culture of the hearing impaired, is used.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

In the present invention, folders are created from keywords extracted by text mining from subtitle data that a sign language interpreter has translated in gloss units, the units by which meaning is conveyed as words, word segments, or phrases in deaf culture, and the videos segmented with respect to the subtitle data are stored in the corresponding folders. When a subtitle request for a sign language video is received, the sign language video is analyzed by video mining, with a keyword extracted by reference to the keyword-name folders, and the extracted keyword is provided to the client as text to be used as subtitles. When no keyword can be extracted, or the provided text is not used as subtitles, subtitle data translated directly by the sign language interpreter can be used as the subtitles and also for folder creation, and thus the quality of sign language translation can be increased through the direct involvement of the sign language interpreter.

Description

System for adding subtitles to sign language videos
The present invention relates to a system for adding subtitles to sign language videos, and more particularly to a system in which a sign language interpreter translates a transmitted sign language video in gloss units capable of conveying meaning in deaf culture, the unique culture of the hearing impaired, and the resulting subtitle data is provided as subtitles so that an appropriate translation is obtained. At the same time, keyword folders are created using keywords extracted from the translated subtitle data by a text mining technique as folder names, and the videos segmented on the basis of the subtitle data are stored so that each segmented video is placed in the keyword folder containing its extracted keyword. As a result, whether a sign language interpreter is asked directly to translate a sign language video or subtitles are added to a sign language video by a video mining technique, keywords are extracted using the frequently used segmented videos in the keyword folders built from the interpreters' translations, so that sign language subtitles suited to deaf culture can be provided.
In general, sign language refers to a visual language expressed using the hands, facial expressions, and gestures. Because it is a visual language, it cannot be used to communicate with a person who does not know sign language or in a place where the signing cannot be seen. Accordingly, as shown in (Patent Document 1) to (Patent Document 3) below, devices that enable communication in sign language in various ways have been developed.
(Patent Document 1) Korean Patent Registration No. 10-1915088
It relates to a sign language translation apparatus comprising: a main body having displays on both sides; a camera unit formed on the main body that photographs the signer's sign language motions; and a translation unit including an image processing unit that receives the images obtained by the camera unit and extracts the sign language motions, a database unit in which sign language motions and words are stored in matched pairs, an analyzer that extracts sentences by analyzing the words matched to the sign language motions extracted by the image processing unit against the database unit, and a control unit that displays the sentences extracted by the analyzer on the displays.
(Patent Document 2) Korean Patent Registration No. 10-2314710
It relates to a sign language interpretation service system for the hearing impaired comprising: a first wearable device unit that can be worn on the user's head, recognizes hand-joint motion by photographing the user's signing to generate hand-joint motion data, recognizes the other person's voice and displays it as text, and receives sign language interpretation data from the outside and outputs it as voice; a second wearable device unit that can be worn on the user's hand and generates hand-movement tracking data by tracking the movement of the hand; a portable communication device unit that can be worn on the user's body, corrects the hand-joint motion data based on the hand-movement tracking data, transmits the corrected sign language motion data to the outside, receives the sign language interpretation data from the outside in response, and forwards it to the first wearable device unit; and a cloud server unit that receives the sign language motion data from the portable communication device unit, generates the sign language interpretation data through a machine learning algorithm based on that data, and transmits the generated interpretation data to the portable communication device unit.
(Patent Document 3) Korean Patent Registration No. 10-2300589
It relates to an artificial intelligence (AI)-based sign language interpretation system comprising: a dictionary storage unit storing a base-form dictionary that defines words derived from the same word as one word, a synonym dictionary that defines words with the same or similar meaning as one word, a stopword dictionary defining stopwords not used in sign language translation among the morphologically analyzed sentences, and a homonym dictionary in which different identification information is set for each homonym; a morpheme analyzer for distinguishing morphemes in an input sentence; a sign language sentence generator that generates the sign language sentence to be translated by comparing each of the distinguished morphemes against the dictionaries; a motion data extraction unit that extracts from storage each piece of motion data indicated by the sign language word code matched to each morpheme of the generated sign language sentence; and an avatar motion display unit that displays and controls the motion of a sign-language-delivery avatar on the display unit according to the extracted motion data.
However, these existing sign language interpretation systems have the following problems.
(1) Since the sign language content must be identified with a camera or the like and then translated, such systems cannot immediately translate a sign language video transmitted from a mobile terminal, which is inconvenient.
(2) In addition, because the existing systems translate from the movement in the images, there is a risk of mistranslation as well as a limit to how accurately meaning can be conveyed.
(3) In particular, because the existing sign language interpretation systems translate simply from words and joint movements, even the parts that should be translated around the word segments recognizable in deaf culture are translated and displayed mainly as individual Korean words, so the exact meaning cannot be conveyed.
(4) Moreover, unlike standard Korean, sign language can differ by region and group, and the same signs can carry different meanings. Because of these differences, with existing systems that perform mechanical translation there is a risk that the requester's signed content and the translated content turn out completely different.
The present invention takes these points into consideration. A sign language interpreter watches the sign language video and translates it into gloss units in which communication is possible in deaf culture; keyword-name folders are created using keywords extracted from that subtitle data by a text mining technique; and each keyword-name folder stores the videos segmented on the basis of the subtitle data. When a subtitle request for a sign language video is received, the sign language video is analyzed by a video mining technique, with keywords extracted by reference to the keyword-name folders, and the extracted keywords are provided to the client as text to be used as subtitles. When no keyword can be extracted, or the provided text is not used as subtitles, the subtitle data translated directly by a sign language interpreter is used as the subtitles and is also used to create keyword-name folders. The object of the invention is therefore to provide a system for adding subtitles to sign language videos that improves the quality of sign language translation through the direct involvement of sign language interpreters and further raises subtitle quality through searches over frequently used keywords.
In particular, the present invention is configured so that, when a sign language interpreter generates subtitle data, the translation is performed by dividing each input sign language video into one to ten sign segments, so that the meaning the deaf person wants to convey is divided into gloss units, that is, single words, word segments, or phrases. Another object is therefore to provide a system for adding subtitles to sign language videos in which meaning is conveyed in these gloss units so that it can be conveyed more appropriately.
To achieve these objects, a system for adding subtitles to a sign language video according to the present invention includes: a first step (S100) of receiving a sign language video through an electric/electronic communication network; a second step (S200) of storing the transmitted sign language video; a third step (S300) of analyzing the sign language video for keywords using a video mining technique with reference to keyword-name folders; a fourth step (S400) of checking whether a keyword was extracted in the analysis step and, if not, requesting a sign language interpreter to translate the sign language video; a fifth step (S500) in which, if a keyword is extracted in the fourth step (S400), the extracted keyword is transmitted as text to the client to confirm whether it will be used as a subtitle, and, if it is not used as a subtitle, the translation is requested from the sign language interpreter of the fourth step (S400); a sixth step (S600) in which, if the client agrees in the fifth step (S500) to use the transmitted text as a subtitle, the extracted keyword is added to the sign language video as a subtitle and stored; and a seventh step (S700) of transmitting the subtitled sign language video to the client.
The procedure performed when translation is requested from the sign language interpreter includes: a step 4-1 (S410) of requesting the translation from the sign language interpreter; a step 4-2 (S420) in which, while watching the sign language video, the sign language interpreter creates and stores subtitle data containing the start time and end time of each gloss unit, consisting of one to ten signs and representing a word, word segment, or phrase, together with the translated text for that gloss unit; a step 4-3 (S430) of adding the subtitle data to the sign language video as subtitles and, at the same time, separating the translated text from the subtitle data; a step 4-4 (S440) of extracting keywords from the separated translated text using a text mining technique; a step 4-5 (S450) of checking whether a keyword-name folder using the extracted keyword as its folder name has been created and, if not, creating one; a step 4-6 (S460) of dividing the sign language video at the start and end times stored in the subtitle data into segmented videos in gloss units; and a step 4-7 (S470) of storing each segmented video in the keyword-name folder named after the keyword extracted from its gloss unit, so that when keywords are extracted by the video mining technique in the third step (S300), the keyword-name folders are consulted and keywords are found through the frequently used segmented videos.
In particular, the sign language video is characterized in that its length is from 1 second to 20 minutes.
In addition, the sign language consists of at least one of hand shape (手形, dez), hand position (手位, tab), hand movement (手動, sig), palm orientation (手向, orientation), and non-manual signals (非手指信號), which are facial expressions and body movements.
And in step 4-7 (S470), when the extracted keyword corresponding to a segmented video consists of at least two words, the system is characterized in that the segmented video is stored in each keyword-name folder that contains each of those words.
Meanwhile, the system for adding subtitles to the sign language video is characterized in that updates are performed at predetermined time intervals.
Finally, in step 4-1 (S410), when requesting translation from a sign language interpreter, the client may select from the registered sign language interpreters, request a sign language interpreter registered in the region where the client lives, designate the sign language interpreter to whom the client previously submitted a sign language video when several interpreters are available, select the sign language interpreter in charge of the client's region, or have a sign language interpreter designated at random to perform the translation.
[Effects of the Invention]
The system for adding subtitles to sign language videos according to the present invention has the following effects.
(1) When subtitles are provided for a sign language video, they can be provided through a video mining technique that uses segmented videos stored in folders created from keywords extracted, by a text mining technique, from content that a sign language interpreter translated directly in gloss units, the units in which meaning is conveyed in deaf culture. Because the translation is made in gloss units, meaning is conveyed in deaf culture more appropriately.
(2) In particular, when no keyword is extracted by the video mining technique, or when an extracted keyword is provided to the client but not used as a subtitle, subtitle data can be obtained by requesting a translation directly from a sign language interpreter. This subtitle data is used immediately as the subtitles of the sign language video; at the same time, folders are created from keywords obtained by applying a text mining technique to the subtitle data, the segmented videos are stored there, and they are used when keywords are extracted by the video mining technique. In this way, the subtitle data translated in gloss units by sign language interpreters is put to use and the meaning of deaf culture is conveyed more appropriately.
(3) Meanwhile, because the translated text (subtitle data) translated and stored by a sign language interpreter can be used directly as the subtitles of a sign language video, the interpreter's translated text can be used without going through text mining, so that meaning is conveyed more appropriately in the units in which meaning is conveyed in deaf culture.
(4) Also, when video mining is performed by referring to the keyword-name folders in which the segmented videos are stored, the most frequently used segmented video in each keyword-name folder is selected for keyword extraction, so frequently extracted keywords are extracted repeatedly and meaning is conveyed in deaf culture still more appropriately.
(5) Meanwhile, because the subtitle data is translated in gloss units made up of several signs, the intended translation result can be obtained even when only part of a motion is captured, for example when signing is slow or when a meaning that cannot be fully conveyed is attempted with a single word.
(6) In addition, the present invention can be updated at predetermined time intervals, such as three times a day, so that the accuracy of translation is increased and the subtitles match the meaning to be conveyed in sign language as closely as possible.
(7) Finally, the sign language includes hand shape (手形, dez), hand position (手位, tab), hand movement (手動, sig), palm orientation (手向, orientation), and non-manual signals (非手指信號), which are facial expressions and body movements, so that expressions of intention made through facial expressions together with hand or body movements can be converted into subtitle data, and meaning is therefore conveyed more appropriately through the subtitles.
[Fig. 1] is a flowchart showing the operation of a system for adding subtitles to a sign language video according to the present invention.
[Fig. 2] is an image showing, by way of example, subtitle data translated and stored by a sign language interpreter according to the present invention.
[Fig. 3] is an image showing, by way of example, the translated text extracted from the subtitle data according to the present invention.
[Fig. 4] is an image showing part of the result of keyword extraction, using a text mining technique, from the translated text extracted according to the present invention.
[발명의 실시를 위한 최선의 형태][Best mode for carrying out the invention]
이하, 첨부된 도면을 참조하여 본 발명의 바람직한 실시예를 더욱 상세히 설명하기로 한다. 이에 앞서, 본 명세서 및 청구범위에 사용된 용어나 단어는 통상적이거나 사전적인 의미로 한정해서 해석되어서는 안 되며, 발명자는 그 자신의 발명을 최고의 방법으로 설명하기 위해 용어의 개념을 적절하게 정의할 수 있다는 원칙에 따라 본 발명의 기술적 사상에 부합하는 의미와 개념으로 해석되어야만 한다.Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the accompanying drawings. Prior to this, the terms or words used in this specification and claims should not be construed as being limited to their usual or dictionary meanings, and the inventors should properly define the concept of terms in order to best explain their invention. According to the principle that it can be interpreted as meaning and concept consistent with the technical spirit of the present invention.
따라서 본 명세서에 기재된 실시예와 도면에 도시된 구성은 본 발명의 가장 바람직한 일 실시예에 불과할 뿐이고 본 발명의 기술적 사상을 모두 대변하는 것은 아니므로, 본 출원 시점에서 이들을 대체할 수 있는 다양한 균등물과 변형례가 있을 수 있음을 이해하여야 한다.Therefore, since the embodiments described in this specification and the configurations shown in the drawings are only one of the most preferred embodiments of the present invention and do not represent all of the technical spirit of the present invention, various equivalents that can replace them at the time of this application It should be understood that there may be variations.
[수어 동영상에 자막을 추가하는 시스템][System to add subtitles to sign language videos]
In the system for adding subtitles to a sign language video according to the present invention, as shown in [Fig. 1] to [Fig. 4], keywords are extracted from the received sign language video by a video mining technique that refers to keyword-name folders in which split videos are stored; the extracted keywords are provided to the client to confirm whether they will be used as subtitles, and are then used as the subtitles of the sign language video. Because the keyword-name folders are created from subtitle data produced with the involvement of a sign language interpreter, a translation suited to deaf culture is obtained and the accuracy of the subtitles can be improved.
Here, the subtitle data is translated after being divided into gloss units capable of conveying meaning in deaf culture, the culture unique to hearing-impaired people, and the received sign language video is divided on the basis of this data to obtain split videos. Accordingly, when keywords to be used as subtitles are extracted from a sign language video by the video mining technique using the split videos, the extraction is performed in units of words, phrases, or word groups that actually convey meaning, so that the intended meaning can be delivered even more appropriately.
Hereinafter, this configuration is described in more detail with reference to the accompanying drawings. Since the system for adding subtitles to a sign language video according to the present invention is carried out in seven steps, each step is described in turn.
A. Step 1
The first step (S100) is, as shown in [Fig. 1], a step of receiving a sign language video. The client submits the request over an electric/electronic communication network such as the Internet, using a portable terminal such as a smartphone or PDA, or a terminal such as a personal computer.
B. Step 2
The second step (S200) is, as shown in [Fig. 1], a step of storing the received sign language video. The stored video serves as the original: it is used when the subtitles obtained through the process described below are added and the result is sent to the client, and also when the sign language video is divided into split videos translated in gloss units, that is, the word, phrase, or word-group units in which meaning is conveyed in deaf culture. This is described later.
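Purely as an illustration of how the receiving and storing of steps 1 and 2 might be realized, the sketch below exposes a small upload endpoint. Flask, the route name, the storage directory, and the request-identifier scheme are all assumptions of this example and are not specified by the disclosure.

```python
# Illustrative sketch of steps S100-S200: receive a sign language video over the network and
# keep the original file. Flask, the route, and the storage layout are assumptions of this
# example, not part of the disclosed system.
import uuid
from pathlib import Path

from flask import Flask, jsonify, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
ORIGINALS_DIR = Path("originals")            # assumed location for the stored originals
ORIGINALS_DIR.mkdir(exist_ok=True)

@app.route("/sign-videos", methods=["POST"])
def receive_sign_video():
    uploaded = request.files["video"]        # video uploaded from the client's terminal (S100)
    request_id = uuid.uuid4().hex            # hypothetical per-request identifier
    filename = f"{request_id}_{secure_filename(uploaded.filename)}"
    uploaded.save(ORIGINALS_DIR / filename)  # keep the original for later subtitling (S200)
    return jsonify({"request_id": request_id}), 201

if __name__ == "__main__":
    app.run()
```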
C. Step 3
The third step (S300) is, as shown in [Fig. 1], a step of extracting keywords from the sign language video described above using a video mining technique. When the video mining technique is applied, keywords are extracted by referring to the split videos stored in the keyword-name folders described later. The keyword-name folders and split videos are described below together with the fourth step (S400).
D. Step 4
The fourth step (S400) is, as shown in [Fig. 1], a step of checking whether a keyword can be extracted from the sign language video by comparing it, through the video mining technique, with the split videos stored in the keyword-name folders. Video mining here refers to the conventional technique of analyzing video to extract useful information, for example analyzing in-store camera footage of visitors to determine what they buy and using the results to estimate expected sales, visitor numbers, and retail trends. In the present invention, in particular, the sign language video is analyzed and the analyzed motion is compared with the split videos described later in order to extract keywords.
Meanwhile, in the fourth step (S400), keywords are extracted from the sign language video by the video mining technique with reference to the keyword-name folders in which the split videos are stored. If no keyword can be extracted because there is no split video to refer to, translation is requested from a sign language interpreter (step 4-1 (S410)); if a keyword is extracted, the fifth step (S500) of providing the keyword to the client is performed. The process of storing the split videos used for keyword extraction in video mining, based on the subtitle data obtained by requesting translation from the sign language interpreter (S410 to S470), is described later; the handling of keywords extracted through video mining (S500) is described first.
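The disclosure does not fix the comparison algorithm used by this video mining step, so the following is only a minimal sketch under stated assumptions: each clip is assumed to have been reduced beforehand to a sequence of per-frame feature vectors (for example hand or pose keypoints), and a plain dynamic-time-warping distance picks the closest keyword-name folder, falling back to the interpreter path when nothing is close enough.

```python
# Illustrative sketch only: the matching method is an assumption, not the disclosed technique.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic time warping between two (frames x features) sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = float(np.linalg.norm(a[i - 1] - b[j - 1]))
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def extract_keyword(query: np.ndarray,
                    references: dict[str, list[np.ndarray]],
                    threshold: float) -> str | None:
    """Return the folder name (keyword) of the closest stored split clip, or None when no
    clip is close enough, i.e. the case that falls back to the interpreter (S410)."""
    best_keyword, best_dist = None, float("inf")
    for keyword, clips in references.items():
        for clip in clips:
            dist = dtw_distance(query, clip)
            if dist < best_dist:
                best_keyword, best_dist = keyword, dist
    return best_keyword if best_dist <= threshold else None
```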
E. Step 5
The fifth step (S500) is, as shown in [Fig. 1], a step of providing the extracted keyword to the client, when a keyword has been extracted in the fourth step (S400), and confirming whether it will be used as a subtitle. The extracted keyword is transmitted over the electric/electronic communication network to the client who requested the sign language video subtitles, so that the client reviews the keyword and then chooses whether to use it as a subtitle or to request translation from a sign language interpreter.
If the client chooses to use it as a subtitle, the sixth step (S600) described below is performed; if the client requests translation from a sign language interpreter, step 4-1 (S410) described below is performed. Since step 4-1 (S410) follows the same procedure as when translation is requested because no keyword could be extracted in the fourth step (S400), it is described together with that case below.
F. Step 6
The sixth step (S600) is, as shown in [Fig. 1], a step of adding the keyword that the client has approved for use as a subtitle to the sign language video and storing the sign language video with the subtitle added.
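One possible realization of this step, assuming the approved keywords have already been written to a standard subtitle file, is to mux that file into the stored original with ffmpeg, as sketched below; the file names and the choice of a selectable (soft) subtitle track are assumptions of the example.

```python
# Sketch of step S600: attach the approved, timed keywords to the stored original video.
# Assumes ffmpeg is installed and that the keywords have already been written to an SRT file;
# '-c:s mov_text' stores them as a selectable subtitle track in an MP4 container.
import subprocess

def add_subtitles(video_path: str, srt_path: str, out_path: str) -> None:
    subprocess.run(
        ["ffmpeg", "-y",
         "-i", video_path,              # original sign language video kept in step S200
         "-i", srt_path,                # approved keywords as timed subtitles
         "-c:v", "copy", "-c:a", "copy",
         "-c:s", "mov_text",            # embed as a soft subtitle track
         out_path],
        check=True,
    )

# Hypothetical file names, for illustration only:
# add_subtitles("originals/1503.mp4", "subtitles/1503.srt", "subtitled/1503.mp4")
```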
G. Step 7
The seventh step (S700) is, as shown in [Fig. 1], a step of transmitting the stored sign language video to the client. The sign language video may be transmitted to the terminal from which it was received, or to another terminal designated by the client.
(Requesting Translation from a Sign Language Interpreter)
Meanwhile, as shown in [Fig. 1], when no keyword is extracted through video mining in the fourth step (S400) described above, or when the client does not use the provided keyword as a subtitle and translation is requested from a sign language interpreter, the procedure is as follows: the content translated by the sign language interpreter is used as subtitles, and the sign language video is divided into split videos that are stored in keyword-name folders so that they can be used for video mining. This procedure consists of the following seven steps.
1. Step 4-1
Step 4-1 (S410) is, as shown in [Fig. 1], a step of requesting a sign language interpreter to translate the sign language video. When there are several sign language interpreters, the requester of the sign language video may designate the interpreter to whom a sign language video was previously entrusted, select the interpreter in charge of the region where the client lives, or have an interpreter designated arbitrarily.
2. Step 4-2
Step 4-2 (S420) is, as shown in [Fig. 1] and [Fig. 2], a step in which the sign language interpreter watches the received sign language video, translates it, and creates and stores subtitle data.
Here, the subtitle data is, as shown in [Fig. 2], data that the sign language interpreter produces while watching the received sign language video by dividing it, at his or her discretion, into gloss units capable of conveying meaning in deaf culture, the culture unique to hearing-impaired people, and translating each unit. The subtitle data includes the times at which the interpreter divided the video to convey meaning, that is, the start time and end time of each gloss unit on the timeline of the sign language video, and the translated text of that gloss unit. The translated text is entered directly by the sign language interpreter while watching the sign language video, so the translation conveys the meaning appropriately for the situation observed in the signing itself.
In [Fig. 2], 'intRequestIdx' indicates how a single sign language video is divided into gloss units; rows sharing the same 'intRequestIdx' show that one video was divided into several gloss units and translated. For example, in [Fig. 2], the sign language video stored with 'intRequestIdx' of '1,503' was divided by the sign language interpreter into eight gloss units so that meaning is conveyed appropriately in deaf culture, and each gloss has the pair 'intRequestIdx' and 'intOrder' as its identifier. Each gloss unit is stored with 'strStartTime' (start time) and 'strEndTime' (end time) referenced to the timeline of the sign language video, and the content translated for each gloss unit is stored as 'strText'.
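For illustration only, each row of the subtitle data in [Fig. 2] can be modeled as a small record with exactly these field names; the timestamp format and the example rows below are assumptions, not values taken from the figure.

```python
# One gloss-unit row of the subtitle data shown in [Fig. 2], modeled as a plain record.
# The field names follow the figure; the time format and the example values are invented
# for illustration.
from dataclasses import dataclass

@dataclass
class GlossSubtitle:
    intRequestIdx: int    # identifies the sign language video / translation request
    intOrder: int         # position of this gloss unit within that video
    strStartTime: str     # gloss-unit start on the video timeline, e.g. "00:00:01.200"
    strEndTime: str       # gloss-unit end on the video timeline
    strText: str          # translation entered directly by the sign language interpreter

# Hypothetical rows: one video (same intRequestIdx) split into two gloss units.
example_rows = [
    GlossSubtitle(1503, 1, "00:00:00.000", "00:00:02.100", "hello"),
    GlossSubtitle(1503, 2, "00:00:02.100", "00:00:04.800", "where is the library"),
]
```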
Here, in a preferred embodiment of the present invention, the sign language may include all of the movements, actions, and facial expressions used in signing; most preferably it is expressed by at least one of handshape (dez), hand location (tab), hand movement (sig), palm orientation, and non-manual signals, that is, facial expressions and body movements. This allows the sign language interpreter to observe directly the meaning that the deaf person expresses by every means used to communicate, grasp the exact meaning, and convert it into subtitles.
Also, in a preferred embodiment of the present invention, the sign language preferably uses gloss units capable of conveying meaning in deaf culture, the culture unique to hearing-impaired people. To this end, a gloss unit consists of 1 to 10 signs, most preferably 1 to 3 signs, so that a single sign movement can convey meaning in the way a single noun does, or two or three sign movements can be used as one unit of meaning.
Also, in a preferred embodiment of the present invention, the sign language video preferably has a length of about 1 second to 20 minutes. This allows the sign language interpreter to concentrate and translate accurately, and allows subtitles to be added quickly so that feedback reaches the deaf person right away. Of course, anyone skilled in the art will readily appreciate that a longer video merely takes a little more time to process and poses no problem for translating and providing subtitles.
As shown in [Fig. 1], this subtitle data is not only provided as subtitles for the sign language video (step 4-2). Keywords are also extracted from it by a text mining technique, keyword-name folders are created to store the split videos obtained by dividing the sign language video on the basis of the subtitle data, and these folders are then used for keyword extraction when search keywords are extracted from other sign language videos by the video mining technique, so that subtitles are generated automatically (steps 4-3 to 4-7).
Finally, step 4-2 (S420) is, as shown in [Fig. 1], a step of using the subtitle data described above as subtitles for the sign language video. Since the subtitle data contains the start time and end time of each gloss unit, the unit in which meaning is conveyed, together with the translated text produced directly by the sign language interpreter for that gloss unit, subtitles can be generated for the sign language video immediately and the subtitled video can be delivered right away.
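A minimal sketch of how the stored start time, end time, and translated text could be turned directly into a subtitle track is shown below; it assumes the times are held as seconds from the start of the video, and SRT is simply one convenient container, not something the disclosure prescribes.

```python
# Sketch of turning gloss-unit records (start, end, translated text) straight into an SRT
# subtitle file. Times are assumed to be seconds from the start of the video; the example
# gloss units are invented.
def _fmt(seconds: float) -> str:
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(gloss_units: list[tuple[float, float, str]]) -> str:
    blocks = []
    for i, (start, end, text) in enumerate(gloss_units, start=1):
        blocks.append(f"{i}\n{_fmt(start)} --> {_fmt(end)}\n{text}\n")
    return "\n".join(blocks)

units = [(0.0, 2.1, "hello"), (2.1, 4.8, "where is the library")]
with open("1503.srt", "w", encoding="utf-8") as f:
    f.write(to_srt(units))
```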
3. Step 4-3
Step 4-3 (S430) is, as shown in [Fig. 1], a step of separating only the translated text from the subtitle data described above. This is done so that extraction keywords can be extracted from the separated translated text by a text mining technique. The extraction keywords are used as the names of folders in which the required split videos are stored and, as described later, the data stored in these folders is used when keywords are extracted from a sign language video by the video mining technique. This is explained step by step below.
4. Step 4-4
Step 4-4 (S440) is, as shown in [Fig. 1], a step of extracting an extraction keyword from each piece of translated text separated in gloss units. The extraction keywords are obtained by applying a text mining technique to the translated text separated from the subtitle data. Text mining here refers to the conventional technique of structuring semi-structured or unstructured text data with natural language processing based on linguistics, statistics, machine learning, and the like, extracting its features in the form of keywords, and thereby finding meaningful information.
The extraction keywords obtained in this way are not only used as the folder names (keyword names) of the folders described later; they also serve the subtitles, since an appropriate keyword is chosen according to its frequency of use when keywords are extracted from other videos by the video mining technique. [Fig. 3] shows the text separated from the subtitle data of [Fig. 2], and [Fig. 4] shows part of the extraction keywords obtained from the text of [Fig. 3] by the text mining technique according to the present invention. In [Fig. 4], 'keyword' denotes an extraction keyword extracted by the text mining technique according to the present invention, and 'Frequency' denotes how often the split videos stored in the keyword-name folder created with that extraction keyword as its folder name were selected when keywords were extracted by the video mining technique.
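Since the disclosure relies on conventional text mining without fixing a particular method, the sketch below is only a minimal stand-in: it tokenizes the separated translation text, drops a few invented stopwords, and counts frequencies. A production system would more likely use a proper morphological analyzer, which is outside the scope of this example.

```python
# Minimal stand-in for the text mining of step S440: tokenize the separated translation text,
# drop a few invented stopwords, and count frequencies.
import re
from collections import Counter

STOPWORDS = {"the", "a", "is", "to", "and", "at"}   # hypothetical stopword list

def extract_keywords(translated_texts: list[str], top_n: int = 5) -> list[tuple[str, int]]:
    counts = Counter()
    for text in translated_texts:
        for token in re.findall(r"\w+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts.most_common(top_n)

print(extract_keywords(["Where is the library", "The library opens at nine"]))
# e.g. [('library', 2), ('where', 1), ('opens', 1), ('nine', 1)]
```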
5. Step 4-5
Step 4-5 (S450) is, as shown in [Fig. 1], a step of checking whether a folder (keyword-name folder) whose folder name is the extraction keyword extracted in step 4-4 (S440) has already been created. The purpose is to store the split videos described later in keyword-name folders formed from the keyword names, and to use these folders to search for and retrieve the necessary split videos when keywords are extracted from a sign language video by the video mining technique.
In step 4-5 (S450), as shown in [Fig. 1], it is checked whether such a keyword-name folder has been created; if it has not, a folder with the keyword name is newly created in the database, and if it already exists, step 4-6 (S460) is performed.
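If the database of keyword-name folders is realized simply as a directory tree, which is an assumption of this sketch rather than a requirement of the disclosure, the existence check and creation of step 4-5 reduce to a few lines.

```python
# Sketch of step S450 under the assumption that the keyword-name "database" is a directory
# tree: check for the keyword-name folder and create it only if it does not exist yet.
from pathlib import Path

KEYWORD_ROOT = Path("keyword_folders")   # assumed root of the keyword-name folders

def ensure_keyword_folder(keyword: str) -> Path:
    folder = KEYWORD_ROOT / keyword
    folder.mkdir(parents=True, exist_ok=True)
    return folder

ensure_keyword_folder("library")         # creates keyword_folders/library/ if missing
```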
6. Step 4-6
Step 4-6 (S460) is, as shown in [Fig. 1], a step of dividing the sign language video transmitted together with the subtitle data described above into split videos. The division uses the time information recorded when the sign language video was divided into gloss units on its timeline, that is, the start time and end time of the signing in each gloss unit. Since each split video is cut at the boundary times stored in the subtitle data, the video can be divided into segments that each correspond to a gloss unit conveying meaning in deaf culture.
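As one way this step could be carried out, the sketch below cuts the original at each gloss unit's start and end time with ffmpeg; the availability of ffmpeg, the output naming, and the decision to re-encode so that cuts need not fall on keyframes are all assumptions of the example.

```python
# Sketch of step S460: cut the sign language video into one clip per gloss unit at the
# start/end times recorded in the subtitle data.
import subprocess

def split_gloss_units(video_path: str, units: list[tuple[str, str, str]]) -> list[str]:
    """units: (start, end, keyword) tuples with times given as 'HH:MM:SS.mmm' strings."""
    clip_paths = []
    for i, (start, end, keyword) in enumerate(units, start=1):
        out_path = f"clip_{i:03d}_{keyword}.mp4"
        subprocess.run(
            ["ffmpeg", "-y", "-i", video_path,
             "-ss", start, "-to", end,       # gloss-unit boundaries from the subtitle data
             "-c:v", "libx264", "-an",       # re-encode video; audio is not needed here
             out_path],
            check=True,
        )
        clip_paths.append(out_path)
    return clip_paths
```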
7. Step 4-7
Step 4-7 (S470) is, as shown in [Fig. 1], a step of storing the split videos produced in step 4-6 (S460) in folders. Here, the folder is the keyword-name folder whose folder name is the keyword extracted, through the steps described above, from the translated text corresponding to that split video section of the subtitle data. As a result, for each split video, the keyword that the signing in that video is intended to convey is the same as the name of the folder in which the split video is stored.
Meanwhile, in a preferred embodiment of the present invention, when the extraction keyword obtained from the text corresponding to a split video contains at least two words, the keyword-name folders for each word are looked up and the split video is stored in each of the folders found, as sketched below. This makes it possible to search in gloss units that contain a given single word even when a keyword consists of a single word.
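Under the same directory-tree assumption as before, filing a clip under the folder of every word in its keyword could look like this; the paths and file names are hypothetical.

```python
# Sketch of step S470 for a multi-word extraction keyword: file a copy of the split clip
# under the keyword-name folder of every word it contains (directory layout and file names
# are hypothetical).
import shutil
from pathlib import Path

KEYWORD_ROOT = Path("keyword_folders")

def store_split_clip(clip_path: str, extraction_keyword: str) -> None:
    for word in extraction_keyword.split():
        folder = KEYWORD_ROOT / word
        folder.mkdir(parents=True, exist_ok=True)
        shutil.copy2(clip_path, folder / Path(clip_path).name)

# A clip whose keyword is "library location" would end up in both
# keyword_folders/library/ and keyword_folders/location/:
# store_split_clip("clip_001_library_location.mp4", "library location")
```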
Also, in a preferred embodiment of the present invention, the database is configured to be updated at predetermined times, for example three times a day or once an hour. As data comes in and the number of newly created folders grows, the frequency of use increases accordingly, making correspondingly more accurate translation possible.
Finally, through step 4-7 (S470) the database is made available so that a sign language video can be translated and subtitled directly through the video mining technique, without going through a sign language interpreter. That is, when a translation request is received for a sign language video and keywords are extracted from it through the video mining technique (third and fourth steps), the split videos and keyword-name folders are referenced to extract the keywords. Content that a sign language interpreter has appropriately translated in gloss units, so that meaning is conveyed in deaf culture, can thus be used as subtitles, and the intended meaning is delivered even more appropriately.
Meanwhile, in a preferred embodiment of the present invention, the split videos that are stored in the keyword-name folders and used for keyword extraction from sign language videos through the video mining technique are preferably sorted by how often they are used for keyword extraction, and only a fixed number of the most frequently used ones, for example 3 to 10 in descending order of frequency, are kept while the rest are deleted. Keeping only the frequently used split videos allows keyword searching to proceed quickly while still producing an appropriate gloss-unit translation.
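The housekeeping described above could look like the following sketch, which assumes that each clip's selection count is tracked elsewhere in a dictionary; keeping the top few clips and deleting the rest follows the text.

```python
# Sketch of the pruning described above: within one keyword-name folder, keep only the N
# split clips most often selected during keyword extraction and delete the rest. The usage
# counts are assumed to be tracked elsewhere (e.g., incremented whenever a clip wins a match).
from pathlib import Path

def prune_folder(folder: Path, usage_counts: dict[str, int], keep: int = 5) -> None:
    clips = sorted(folder.glob("*.mp4"),
                   key=lambda p: usage_counts.get(p.name, 0),
                   reverse=True)
    for clip in clips[keep:]:                # everything beyond the N most-used clips
        clip.unlink()

# Example (hypothetical counts): keep the 5 most-used clips for the keyword "library".
# prune_folder(Path("keyword_folders/library"), {"clip_001.mp4": 12, "clip_002.mp4": 3})
```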
The present invention configured as described above extracts keywords from the subtitle data translated by the sign language interpreter using the text mining technique, stores the split videos obtained on the basis of that subtitle data in the keyword-name folders created from those keywords, and extracts the keywords to be used as subtitles from sign language videos through video mining. Content that the sign language interpreter has appropriately translated in gloss units capable of conveying meaning in deaf culture, the culture unique to hearing-impaired people, is thereby put to use, so that the intended meaning is delivered even more appropriately.

Claims (6)

  1. A system for adding subtitles to a sign language video, comprising: a first step (S100) of receiving a sign language video over an electric/electronic communication network; a second step (S200) of storing the received sign language video; a third step (S300) of analyzing the sign language video for keywords by a video mining technique with reference to keyword-name folders; a fourth step (S400) of checking whether a keyword was extracted in the analysis step and, if not, requesting a sign language interpreter to translate the sign language video; a fifth step (S500) of, if a keyword is extracted in the fourth step (S400), transmitting the extracted keyword as text to a client to confirm whether it will be used as a subtitle and, if it is not to be used as a subtitle, requesting translation from the sign language interpreter of the fourth step (S400); a sixth step (S600) of, if the client agrees in the fifth step (S500) to use the transmitted text as a subtitle, adding the extracted keyword to the sign language video as a subtitle and storing the result; and a seventh step (S700) of transmitting the sign language video with the subtitle added to the client,
    wherein the procedure performed when translation is requested through the sign language interpreter comprises: a step 4-1 (S410) of requesting translation from the sign language interpreter; a step 4-2 (S420) in which the sign language interpreter, while watching the sign language video, creates and stores subtitle data including the start time and end time of each gloss unit, a gloss unit consisting of 1 to 10 signs and representing a word, phrase, or word group, together with the translated text of that gloss unit, and in which the stored subtitle data is added to the sign language video as subtitles; a step 4-3 (S430) of, at the same time, separating the translated text from the subtitle data; a step 4-4 (S440) of extracting an extraction keyword from the separated translated text using a text mining technique; a step 4-5 (S450) of checking whether a keyword-name folder using the extraction keyword as its folder name has been created and, if it has not, creating a keyword-name folder with the extraction keyword as its folder name; a step 4-6 (S460) of dividing the sign language video at the start times and end times stored in the subtitle data to produce split videos in gloss units; and a step 4-7 (S470) of storing each split video in the keyword-name folder that uses the extraction keyword extracted from that gloss unit as its folder name, so that when keywords are extracted by the video mining technique in the third step (S300), the keyword-name folders are referenced and keyword searching is performed through the most frequently used split videos.
  2. The system for adding subtitles to a sign language video of claim 1, wherein the sign language video has a length of 1 second to 20 minutes.
  3. The system for adding subtitles to a sign language video of claim 1, wherein the sign language is expressed by at least one of handshape (dez), which is the shape of the hand; location (tab), which is the position of the hand; movement (sig), which is the motion of the hand; palm orientation; and non-manual signals, which are facial expressions and body movements.
  4. The system for adding subtitles to a sign language video of claim 1, wherein, in step 4-7 (S470), when the extraction keyword corresponding to a split video consists of at least two words, the split video is stored in each keyword-name folder that contains each of the words.
  5. The system for adding subtitles to a sign language video of claim 1, wherein the system is configured to be updated at predetermined time intervals.
  6. The system for adding subtitles to a sign language video of any one of claims 1 to 5, wherein, when translation is requested from a sign language interpreter in step 4-1 (S410), the client is allowed to select from registered sign language interpreters, or the request is directed to a sign language interpreter registered in the region where the client lives, or, when there are several sign language interpreters, the client designates the sign language interpreter to whom a sign language video was previously entrusted, or the client selects the sign language interpreter in charge of the region where the client lives, or a sign language interpreter is designated arbitrarily to perform the translation.
PCT/KR2022/008059 2021-12-10 2022-06-08 System for adding subtitles to sign language video WO2023106522A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020210176669A KR102428677B1 (en) 2021-12-10 2021-12-10 System of adding subtitles to sign language videos
KR10-2021-0176669 2021-12-10

Publications (1)

Publication Number Publication Date
WO2023106522A1 true WO2023106522A1 (en) 2023-06-15

Family

ID=82845324

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/008059 WO2023106522A1 (en) 2021-12-10 2022-06-08 System for adding subtitles to sign language video

Country Status (2)

Country Link
KR (1) KR102428677B1 (en)
WO (1) WO2023106522A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004282134A (en) * 2003-03-12 2004-10-07 Seiko Epson Corp Sign language translation system and sign language translation program
KR20110102990A (en) * 2010-03-12 2011-09-20 주식회사 써드아이 System and method for interpreting sign language
KR20120073795A (en) * 2010-12-27 2012-07-05 엘지에릭슨 주식회사 Video conference system and method using sign language to subtitle conversion function
JP2015076774A (en) * 2013-10-10 2015-04-20 みずほ情報総研株式会社 Communication support system, communication support method, and communication support program
KR102115551B1 (en) * 2019-08-06 2020-05-26 전자부품연구원 Sign language translation apparatus using gloss and translation model learning apparatus

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101915088B1 (en) 2017-08-25 2018-11-05 신정현 Sign language translation device
KR102300589B1 (en) 2019-12-02 2021-09-10 주식회사 에이아이톡 Sign language interpretation system
KR102314710B1 (en) 2019-12-19 2021-10-19 이우준 System sign for providing language translation service for the hearing impaired person

Also Published As

Publication number Publication date
KR102428677B1 (en) 2022-08-08

Similar Documents

Publication Publication Date Title
WO2018128238A1 (en) Virtual consultation system and method using display device
WO2010036013A2 (en) Apparatus and method for extracting and analyzing opinions in web documents
WO2018097379A1 (en) Method for inserting hash tag by image recognition, and software distribution server storing software for performing same method
WO2011136425A1 (en) Device and method for resource description framework networking using an ontology schema having a combined named dictionary and combined mining rules
WO2015020354A1 (en) Apparatus, server, and method for providing conversation topic
WO2016125949A1 (en) Automatic document summarizing method and server
EP3545487A1 (en) Electronic apparatus, controlling method of thereof and non-transitory computer readable recording medium
WO2020107761A1 (en) Advertising copy processing method, apparatus and device, and computer-readable storage medium
EP3915039A1 (en) System and method for context-enriched attentive memory network with global and local encoding for dialogue breakdown detection
WO2021071137A1 (en) Method and system for automatically generating blank-space inference questions for foreign language sentence
WO2014106979A1 (en) Method for recognizing statistical voice language
WO2015126097A1 (en) Interactive server and method for controlling the server
WO2013077527A1 (en) Multilingual speech system and method of character
WO2018097439A1 (en) Electronic device for performing translation by sharing context of utterance and operation method therefor
WO2021215620A1 (en) Device and method for automatically generating domain-specific image caption by using semantic ontology
WO2014115952A1 (en) Voice dialog system using humorous speech and method thereof
WO2014142422A1 (en) Method for processing dialogue based on processing instructing expression and apparatus therefor
WO2017115994A1 (en) Method and device for providing notes by using artificial intelligence-based correlation calculation
WO2020159140A1 (en) Electronic device and control method therefor
WO2019112117A1 (en) Method and computer program for inferring meta information of text content creator
WO2023106522A1 (en) System for adding subtitles to sign language video
WO2020235910A1 (en) Text reconstruction system and method thereof
WO2021051557A1 (en) Semantic recognition-based keyword determination method and apparatus, and storage medium
WO2023106523A1 (en) Method for establishing database for system adding subtitles to sign language videos and database apparatus using same
WO2021085811A1 (en) Automatic speech recognizer and speech recognition method using keyboard macro function

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22904361

Country of ref document: EP

Kind code of ref document: A1