KR20160002081A - Apparatus and method for translating of sign language using speech recognition - Google Patents
Apparatus and method for translating of sign language using speech recognition
- Publication number: KR20160002081A
- Application number: KR1020140080858A
- Authority
- KR
- South Korea
- Prior art keywords
- sign language
- text data
- speech
- data
- speech recognition
- Prior art date
Classifications
- G—PHYSICS › G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE › G09B21/00—Teaching, or communicating with, the blind, deaf or mute
- G09B21/00 › G09B21/02—Devices for Braille writing
- G—PHYSICS › G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING › G10L15/00—Speech recognition › G10L15/26—Speech to text systems
Abstract
The present invention relates to a sign language translation apparatus and method using speech recognition. The apparatus comprises: a speech recognition processing unit that converts speech recognized from a video object into speech data and generates text data from the converted speech data; a text recognition processing unit that extracts text data embedded in the video object; a sign language translation processing unit that translates the generated and extracted text data into sign language words and fingerspelling information, and forms sign language sentences from them; and an output unit that outputs the formed sign language sentences.
Description
The present invention relates to a sign language translation apparatus and method using speech recognition, and more particularly, to a sign language translation apparatus and method using speech recognition that allow users with hearing impairments to easily utilize video content.
Hearing impairment refers to weakened hearing or the complete inability to hear. A person with a hearing impairment who cannot perceive the accent or pronunciation of words, and who has not received proper language training, cannot communicate adequately through speech. Hearing-impaired people therefore communicate using sign language, but since sign language is unfamiliar to the general public, communication between hearing and hearing-impaired people remains difficult.
A conventional technique for translating sign language expressions into text recognizes the movement of the entire arm and therefore requires data covering every arm motion. Although such systems use multiple sensors to recognize arm movement, similar sign language motions are still translated into two or more candidate characters, making accurate translation difficult.
Likewise, because the related art classifies finger movement only coarsely (as bent, straightened, or intermediate), it cannot accurately discriminate between sign language expressions whose finger movements are similar.
In this regard, Korean Patent Publication No. 2008-0010234 discloses a "multi-function fingerspelling system."
SUMMARY OF THE INVENTION The present invention has been made to solve the above problems, and an object of the present invention is to provide a sign language translation apparatus and method using speech recognition that convert speech data recognized from a video object into text data and translate that text into sign language.
Another object of the present invention is to provide a sign language translation apparatus and method using speech recognition that recognize the start time and end time of each sentence when recognizing speech data from a video object, so that the sign language output can be synchronized with the speech.
A further object of the present invention is to provide a sign language translation apparatus and method using speech recognition that determine whether a homonym exists when converting speech data recognized from a video object into text data, and resolve the exact meaning of a noun or verb on the basis of the accompanying particle (josa).
It is another object of the present invention to provide a sign language translation apparatus and method using speech recognition that performs sign language translation of a caption output from a video object.
According to an aspect of the present invention, there is provided a sign language translation apparatus using speech recognition, comprising: a speech recognition processing unit for converting speech recognized from a video object into speech data and generating text data from the converted speech data; a text recognition processing unit for extracting text data inserted into the video object; a sign language translation processing unit for translating the generated and extracted text data into sign language words and fingerspelling information, and forming sign language sentences using them; and an output unit for outputting the formed sign language sentences.
Further, the speech recognition processing unit may include: a speech extraction unit for recognizing and extracting speech from the video object; a data conversion unit for converting the extracted speech into speech data; and a text data generation unit for generating text data from the converted speech data.
The sign language translation processing unit may include: a morpheme analysis unit for analyzing the morphemes of the generated and extracted text data; a homonym processing unit for determining and handling homonyms in the generated and extracted text data; a sign language word extraction unit for extracting sign language words based on the morpheme analysis and homonym determination results; a fingerspelling information extraction unit for extracting fingerspelling information corresponding to the initial, medial, and final sounds of a morpheme when no sign language word exists for it; and a sign language sentence formation unit for forming a sign language sentence by combining the extracted sign language words and fingerspelling information.
In addition, the homonym processing unit determines whether a homonym exists in the morpheme analysis result. If the homonym is a noun, its meaning is resolved on the basis of the accompanying verb; if it is a verb, its meaning is resolved on the basis of the accompanying noun and particle.
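As a concrete illustration of this disambiguation rule, the sketch below resolves a noun homonym from its governing verb. The sense table, the romanized entry "bae", and the function name are invented for illustration; the patent specifies only the rule, and the verb-side case (resolved from the noun and its particle) would follow the same pattern.

```python
# Noun-homonym resolution keyed on the governing verb. The tiny sense
# table below is invented; the patent does not supply a lexicon.

NOUN_SENSES = {
    # Korean 'bae' can mean ship, pear, or belly; the verb selects a sense.
    "bae": {"ride": "ship", "eat": "pear", "ache": "belly"},
}

def resolve_noun(noun, verb):
    """Return the disambiguated sense of `noun`, or the noun itself
    when it is not a known homonym or the verb gives no evidence."""
    senses = NOUN_SENSES.get(noun)
    if senses is None:
        return noun
    return senses.get(verb, noun)

print(resolve_noun("bae", "eat"))   # pear
print(resolve_noun("bae", "ride"))  # ship
```

Only a noun known to be ambiguous is touched; everything else passes through unchanged, which mirrors the claim's "determine whether a homonym exists" precondition.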
The output unit divides the screen of the display unit, outputting the sign language sentence as text on one part of the divided screen and its sign language rendering on another part.
The apparatus further includes a time log recording unit for recording a time log of the start time at which speech extraction from the video begins and the end time at which it ends.
In addition, the output unit outputs the sign language sentence on the display screen in accordance with the recorded time log.
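A minimal sketch of this time-log mechanism, assuming each log entry is a (start, end, sentence) record; the record format, event dictionary, and function name are illustrative assumptions, not from the patent.

```python
# Time-log synchronization sketch: each sign sentence is scheduled to
# appear at the recorded start time of the speech it translates and to
# stay on screen until the recorded end time.

def schedule_sign_output(time_log):
    """time_log: iterable of (start_seconds, end_seconds, sentence)."""
    events = [{"at": start, "until": end, "show": sentence}
              for start, end, sentence in time_log]
    events.sort(key=lambda ev: ev["at"])   # play back in speech order
    return events

log = [(12.0, 15.5, "SIGN_SENTENCE_2"), (3.0, 6.2, "SIGN_SENTENCE_1")]
for ev in schedule_sign_output(log):
    print(ev["at"], ev["show"])
```

Sorting by start time is what lets the output unit make "the audio start time of the video object and the output time of the sign language sentence" coincide, as the description puts it.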
According to another aspect of the present invention, there is provided a sign language translation method using speech recognition, the method comprising: converting, by a speech recognition processing unit, speech recognized from a video object into speech data and generating text data from the converted speech data; extracting, by a text recognition processing unit, text data inserted into the video object; translating, by a sign language translation processing unit, the generated and extracted text data into sign language words and fingerspelling information and forming a sign language sentence using them; and outputting, by an output unit, the formed sign language sentence.
The step of converting the speech recognized from the video object into speech data and generating text data from it includes: recognizing and extracting speech from the video object; converting the extracted speech into speech data; and generating text data from the converted speech data.
The step of translating the generated and extracted text data into sign language words and fingerspelling information and forming a sign language sentence may include: analyzing the morphemes of the generated and extracted text data; determining and handling homonyms in the text data; extracting sign language words based on the morpheme analysis and homonym determination results; extracting fingerspelling information corresponding to the initial, medial, and final sounds of a morpheme when no sign language word exists for it; and forming a sign language sentence by combining the extracted sign language words and fingerspelling information.
In addition, the step of determining whether homonyms exist in the generated and extracted text data may include determining whether a homonym exists in the morpheme analysis result; if the homonym is a noun, its meaning is resolved on the basis of the accompanying verb, and if it is a verb, on the basis of the accompanying noun and particle.
In addition, the step of converting the speech recognized from the video object into speech data and generating text data may further include recording, by the time log recording unit, a time log of the start time at which speech extraction from the video begins and the end time at which it ends.
Also, in the outputting step, the screen of the display unit is divided so that the sign language sentence is output as text on one part of the screen and rendered in sign language on another part, and the output time of the sign language sentence is made to coincide with the audio start time of the video object.
The sign language translation apparatus and method using speech recognition according to the present invention, configured as described above, convert speech data recognized from video into text data and translate it into sign language, so that users with hearing impairments can easily use video content. This makes it possible to expand video content for hearing-impaired users and to use video as a medium for sharing culture in various fields.
Further, the present invention improves translation quality by recognizing the start time and end time of each sentence when speech data is recognized from a video object, and by adjusting the sign language output time to match the speech output time.
In addition, the present invention determines whether a homonym exists when converting the recognized speech data into text data and resolves the exact meaning of a verb on the basis of the noun and its particle, thereby eliminating the ambiguity caused by homonyms and improving translation accuracy.
In addition, the present invention has an effect of enabling easy and immediate communication between a user having a hearing impairment and a general user, and expanding contents for users with hearing impairment.
FIG. 1 is a diagram illustrating the configuration of a sign language translation apparatus using speech recognition according to the present invention.
FIG. 2 is a diagram illustrating the detailed configuration of the speech recognition processing unit employed in the sign language translation apparatus using speech recognition according to the present invention.
FIG. 3 is a diagram illustrating the detailed configuration of the sign language translation processing unit employed in the sign language translation apparatus using speech recognition according to the present invention.
FIG. 4 is a diagram illustrating the procedure of a sign language translation method using speech recognition according to the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings, so that a person skilled in the art can easily carry out the technical idea of the present invention. In adding reference numerals to the elements of the drawings, the same elements are denoted by the same reference numerals wherever possible, even when they appear in different drawings. In the following description, detailed descriptions of known functions and configurations are omitted where they would obscure the subject matter of the present invention.
Hereinafter, a sign language translation apparatus and method using speech recognition according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 is a diagram illustrating the configuration of a sign language translation apparatus using speech recognition according to the present invention.
Referring to FIG. 1, the sign language translation apparatus 100 using speech recognition includes a speech recognition processing unit 110, a time log recording unit 120, a text recognition processing unit 130, a sign language translation processing unit 140, and an output unit 150.
The speech recognition processing unit 110 converts speech recognized from a video object into speech data and generates text data from the converted speech data.
The time log recording unit 120 records a time log of the start time at which speech extraction from the video begins and the end time at which it ends.
The text recognition processing unit 130 extracts text data, such as subtitles, inserted into the video object.
The sign language translation processing unit 140 translates the generated and extracted text data into sign language words and fingerspelling information, and forms sign language sentences from them.
The output unit 150 outputs the formed sign language sentences.
FIG. 2 is a diagram illustrating the detailed configuration of the speech recognition processing unit employed in the sign language translation apparatus using speech recognition according to the present invention.
Referring to FIG. 2, the speech recognition processing unit 110 converts speech recognized from a video object into speech data and generates text data from the converted speech data.
To this end, the speech recognition processing unit 110 includes a speech extraction unit, a data conversion unit, and a text data generation unit.
The speech extraction unit recognizes and extracts speech from the video object.
The data conversion unit converts the extracted speech into speech data.
The text data generation unit generates text data from the converted speech data.
FIG. 3 is a diagram illustrating the detailed configuration of the sign language translation processing unit employed in the sign language translation apparatus using speech recognition according to the present invention.
Referring to FIG. 3, the sign language translation processing unit 140 translates the generated and extracted text data into sign language words and fingerspelling information and forms sign language sentences from them.
To this end, the sign language translation processing unit 140 includes a morpheme analysis unit, a homonym processing unit, a sign language word extraction unit, a fingerspelling information extraction unit, and a sign language sentence formation unit.
The morpheme analysis unit analyzes the morphemes of the generated text data and the extracted text data.
The homonym processing unit determines whether homonyms exist in the text data and resolves them.
The sign language word extraction unit extracts sign language words based on the morpheme analysis and homonym determination results.
If no sign language word exists for a morpheme as a result of the morpheme analysis, the fingerspelling information extraction unit extracts fingerspelling information corresponding to the initial, medial, and final sounds of that morpheme.
The sign language sentence formation unit forms a sign language sentence by combining the extracted sign language words and fingerspelling information.
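When no sign language word exists, the fallback just described spells the morpheme out by its initial, medial, and final sounds. For Korean, those three components can be recovered from a precomposed Hangul syllable with standard Unicode arithmetic; the arithmetic itself is standard, but treating it as the patent's exact method is an assumption.

```python
# Decompose a precomposed Hangul syllable (U+AC00..U+D7A3) into its
# initial (choseong), medial (jungseong), and final (jongseong) jamo,
# the three components the fingerspelling fallback would sign.

CHOSEONG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
JUNGSEONG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def decompose(syllable):
    """Split one Hangul syllable into (initial, medial, final) jamo;
    the final element is '' when the syllable has no final consonant."""
    idx = ord(syllable) - 0xAC00
    if not 0 <= idx <= 0xD7A3 - 0xAC00:
        raise ValueError("not a precomposed Hangul syllable")
    return (CHOSEONG[idx // 588],
            JUNGSEONG[(idx % 588) // 28],
            JONGSEONG[idx % 28])

print(decompose("한"))  # ('ㅎ', 'ㅏ', 'ㄴ')
print(decompose("수"))  # ('ㅅ', 'ㅜ', '')
```

The constants 588 and 28 come from the Unicode Hangul composition formula (21 medials × 28 finals = 588 syllables per initial).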
FIG. 4 is a diagram for explaining a procedure of a sign language translation method using speech recognition according to the present invention, and FIG. 5 is a diagram illustrating an example of a screen implemented according to the present invention.
Referring to FIG. 4, a sign language translation method using speech recognition according to the present invention uses a sign language translation apparatus using speech recognition as described above, and a repeated description will be omitted.
First, speech and text data are recognized from a video object (S100).
Next, a time log is recorded of the start time at which audio extraction from the video begins and the end time at which it ends (S105).
Next, the recognized speech is converted into speech data, text data is generated from the converted speech data, and the text data inserted in the video is extracted (S110). In step S110, the speech recognized and extracted from the video is converted into speech data, and the converted speech data is generated as text data. In addition, general nouns, compound nouns, numbers, and alphabetic characters included in the subtitles embedded in the video are extracted, and unnecessary parts of speech are removed so that word roots are separated from endings.
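The caption-cleaning part of step S110 (keeping nouns, numbers, and alphabetic tokens while dropping other parts of speech) can be sketched as a filter over tagged tokens. The token format and tag names below are invented for illustration; a real system would take them from a Korean morphological analyzer.

```python
# Sketch of caption cleaning: keep general/compound nouns, numbers, and
# alphabetic tokens; drop other parts of speech such as particles.

KEEP_TAGS = {"NOUN", "NUM", "ALPHA"}

def clean_caption(tagged_tokens):
    """tagged_tokens: list of (surface, pos_tag) pairs produced by an
    assumed morphological analyzer; returns only the kept surfaces."""
    return [tok for tok, tag in tagged_tokens if tag in KEEP_TAGS]

tokens = [("오늘", "NOUN"), ("은", "PARTICLE"), ("3", "NUM"),
          ("시", "NOUN"), ("에", "PARTICLE"), ("OK", "ALPHA")]
print(clean_caption(tokens))  # ['오늘', '3', '시', 'OK']
```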
Next, the morpheme of the generated text data and the extracted text data is analyzed (S120).
Next, it is determined whether there is a homonym in the generated text data and the extracted text data (S130).
Next, it is determined whether a sign language word exists, based on the results of steps S120 and S130 (S140).
If no sign language word exists, fingerspelling information is extracted (S145).
If a sign language word exists, a sign language sentence is formed (S150).
Finally, the sign language sentence, formed by combining the extracted sign language words and fingerspelling information, is output (S160).
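The steps S100 to S160 above can be sketched as one driver function. All component callables are stubs standing in for the processing units described in the patent; their names and signatures are assumptions.

```python
# Driver sketch for steps S100 to S160; every callable passed in is an
# illustrative stub, not the patent's implementation.

def translate_video(video, recognize, extract_captions, analyze,
                    lookup_sign, fingerspell):
    # S100-S110: recognize speech and extract embedded caption text.
    text = recognize(video) + " " + extract_captions(video)
    # S120-S130: morpheme analysis (homonym handling would happen here).
    morphemes = analyze(text)
    sentence = []
    for m in morphemes:
        sign = lookup_sign(m)          # S140: does a sign word exist?
        if sign is None:
            sign = fingerspell(m)      # S145: fingerspelling fallback
        sentence.append(sign)          # S150: build the sign sentence
    return sentence                    # S160: output

result = translate_video(
    {"id": "clip-1"},                  # stand-in video object
    recognize=lambda v: "hello",
    extract_captions=lambda v: "world",
    analyze=lambda t: t.split(),
    lookup_sign={"hello": "SIGN_HELLO"}.get,
    fingerspell=lambda m: "FS:" + m,
)
print(result)  # ['SIGN_HELLO', 'FS:world']
```

Passing the units in as parameters mirrors the apparatus structure: each processing unit can be replaced independently without changing the step order.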
As described above, the sign language translation apparatus and method using speech recognition according to the present invention convert speech data recognized from a video object into text data and translate it into sign language, so that users with hearing impairments can easily use video content; this makes it possible to expand video content for hearing-impaired users and to use video as a medium for sharing culture in various fields.
In addition, the present invention improves translation quality by recognizing the start time and end time of each sentence when speech data is recognized from the video object, and by adjusting the sign language output time to match the speech output time.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those skilled in the art that many alternatives, modifications, and variations may be made without departing from the scope of the appended claims.
100: Sign language translation apparatus using speech recognition
110: Speech recognition processing unit 120: Time log recording unit
130: Text recognition processing unit 140: Sign language translation processing unit
150: Output unit
Claims (13)
A speech recognition processing unit for converting speech recognized from a video object into speech data and generating text data from the converted speech data;
A text recognition processing unit for extracting text data from the video object;
A sign language translation processing unit for translating the generated text data and extracted text data into sign language words and fingerspelling information, and forming sign language sentences using the extracted sign language words and fingerspelling information; And
An output unit for outputting the generated sign language sentence;
And a sign language translation apparatus using speech recognition.
The speech recognition processing unit,
A voice extracting unit for recognizing and extracting a voice from a video object;
A data conversion unit for converting the extracted voice into voice data; And
A text data generation unit for generating the converted speech data as text data;
And a sign language translation apparatus using speech recognition.
The sign language translation processing unit,
A morpheme analysis unit for analyzing morphemes of the generated text data and the extracted text data;
A homonym processing unit for determining and handling homonyms in the generated text data and the extracted text data;
A sign language word extraction unit for extracting a sign language word based on the morpheme analysis result and the homonym determination result;
A fingerspelling information extraction unit for extracting fingerspelling information corresponding to the initial, medial, and final sounds of a morpheme when no sign language word exists for it as a result of the morpheme analysis; And
A sign language sentence formation unit for forming a sign language sentence by combining the extracted sign language word and fingerspelling information;
And a sign language translation apparatus using speech recognition.
The homonym processing unit determines whether a homonym exists in the morpheme analysis result; if the homonym is a noun, its meaning is resolved on the basis of the accompanying verb, and if the homonym is a verb, its meaning is resolved on the basis of the accompanying noun and particle, in the sign language translation apparatus using speech recognition.
Wherein the output unit divides the screen of the display unit to output a sign language sentence on a part of the divided screen, and outputs the sign language sentence to another divided part of the screen to be output as a sign language.
Further comprising a time log recording unit for recording a time log for a start time for starting extraction of speech from the video object and an end time for ending speech extraction.
Wherein the output unit outputs a sign language sentence on the display unit screen according to the recorded time log.
Converting, by a speech recognition processing unit, speech recognized from a video object into speech data and generating text data from the converted speech data;
Extracting, by a text recognition processing unit, text data inserted into the video object;
Translating the generated text data and extracted text data into sign language words and fingerspelling information by a sign language translation processing unit, and forming a sign language sentence using the extracted sign language words and fingerspelling information; And
Outputting a sign language sentence formed by the output unit;
And a sign language translation method using speech recognition.
The step of converting the voice recognized from the video object into the voice data and generating the converted voice data as text data,
Recognizing and extracting speech from a video object;
Converting the extracted voice into voice data; And
Generating converted speech data as text data;
And a sign language translation method using speech recognition.
Translating the generated text data and extracted text data into sign language words and fingerspelling information, and forming a sign language sentence using the extracted sign language words and fingerspelling information, comprises:
Analyzing the morphemes of the generated text data and the extracted text data;
Determining and handling homonyms in the generated text data and the extracted text data;
Extracting a sign language word based on the morpheme analysis result and the homonym determination result;
Extracting fingerspelling information corresponding to the initial, medial, and final sounds of a morpheme when no sign language word exists for it as a result of the morpheme analysis; And
Forming a sign language sentence by combining the extracted sign language word and fingerspelling information;
In a sign language translation method using speech recognition.
The step of determining whether homonyms exist in the generated text data and the extracted text data comprises determining whether a homonym exists in the morpheme analysis result; if the homonym is a noun, its meaning is resolved on the basis of the accompanying verb, and if it is a verb, on the basis of the accompanying noun and particle, in the sign language translation method using speech recognition.
In the step of converting the voice recognized from the video object into the voice data and generating the converted voice data as text data,
Further comprising the step of recording a time log of a start time for starting extraction of speech from the video and an ending time of ending speech extraction by the time log recording unit.
In the step of outputting the formed sign language sentence,
The screen of the display unit is divided so that the sign language sentence is output as text on one part of the screen and rendered in sign language on another part, and the output time of the sign language sentence is made to coincide with the audio start time of the video object, in the sign language translation method using speech recognition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020140080858A KR20160002081A (en) | 2014-06-30 | 2014-06-30 | Apparatus and method for translating of sign language using speech recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020140080858A KR20160002081A (en) | 2014-06-30 | 2014-06-30 | Apparatus and method for translating of sign language using speech recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20160002081A true KR20160002081A (en) | 2016-01-07 |
Family
ID=55168796
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020140080858A KR20160002081A (en) | 2014-06-30 | 2014-06-30 | Apparatus and method for translating of sign language using speech recognition |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20160002081A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20180072136A (en) | 2016-12-21 | 2018-06-29 | 주식회사 이앤지테크 | Communication system capable of displaying emotion information ,and Drive Method of the Same |
CN112420046A (en) * | 2020-10-22 | 2021-02-26 | 深圳市声活科技文化有限公司 | Multi-person conference method, system and device suitable for hearing-impaired people to participate |
KR20210097243A (en) * | 2020-01-29 | 2021-08-09 | 이웅희 | Method, system, program and computer readable recording medium for provoding virtual experiencing tour |
KR102463283B1 (en) * | 2022-05-17 | 2022-11-07 | 주식회사 엘젠 | automatic translation system of video contents for hearing-impaired and non-disabled |
- 2014-06-30: Application KR1020140080858A filed (KR); published as KR20160002081A; status: active (Search and Examination)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108447486B (en) | Voice translation method and device | |
CN105244022B (en) | Audio-video method for generating captions and device | |
CN105704538A (en) | Method and system for generating audio and video subtitles | |
JP4448450B2 (en) | Multi-mode speech language translation and display | |
KR20180129486A (en) | Method for chunk-unit separation rule and display automated key word to develop foreign language studying, and system thereof | |
KR20140121580A (en) | Apparatus and method for automatic translation and interpretation | |
Pennell et al. | Normalization of text messages for text-to-speech | |
KR20160002081A (en) | Apparatus and method for translating of sign language using speech recognition | |
JP6887508B2 (en) | Information processing methods and devices for data visualization | |
JP2008158055A (en) | Language pronunciation practice support system | |
CN112329485A (en) | Translation method, device, system and storage medium | |
KR101130276B1 (en) | System and method for interpreting sign language | |
Shahriar et al. | A communication platform between bangla and sign language | |
JP7117629B2 (en) | translation device | |
KR101990019B1 (en) | Terminal for performing hybrid caption effect, and method thereby | |
KR102300589B1 (en) | Sign language interpretation system | |
US20220237379A1 (en) | Text reconstruction system and method thereof | |
Monga et al. | Speech to Indian Sign Language Translator | |
KR102148021B1 (en) | Information search method and apparatus in incidental images incorporating deep learning scene text detection and recognition | |
KR101553469B1 (en) | Apparatus and method for voice recognition of multilingual vocabulary | |
KR101920653B1 (en) | Method and program for edcating language by making comparison sound | |
KR20180130933A (en) | Analysis method for chunk and key word based on voice signal of video data, and system thereof | |
JP2020057401A (en) | Display support device, method and program | |
JPH1097280A (en) | Speech image recognition and translation device | |
Jiang | SDW-ASL: A Dynamic System to Generate Large Scale Dataset for Continuous American Sign Language |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
AMND | Amendment | ||
E601 | Decision to refuse application | ||
AMND | Amendment |