WO2023135963A1 - Translation device - Google Patents

Translation device

Info

Publication number
WO2023135963A1
Authority
WO
WIPO (PCT)
Prior art keywords
character string
sentence
translation
recognized character
unit
Prior art date
Application number
PCT/JP2022/043979
Other languages
French (fr)
Japanese (ja)
Inventor
謙吾 竹谷
憲卓 岡本
心語 郭
Original Assignee
株式会社Nttドコモ
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社Nttドコモ filed Critical 株式会社Nttドコモ
Publication of WO2023135963A1 publication Critical patent/WO2023135963A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Definitions

  • One aspect of the present disclosure relates to a translation device that translates a character string of speech recognition results or character recognition results.
  • Patent Document 1 (JP 2016-206929 A) discloses an interpretation device that generates a speech recognition result by performing speech recognition processing on an input uttered voice, and generates a machine translation result by machine-translating the speech recognition result from a first language into a second language.
  • A translation device according to one aspect of the present disclosure includes an acquisition unit that acquires a recognized character string, which is a character string of a speech recognition result or a character recognition result, a detection unit that detects the sentence end of the recognized character string, and a translation unit that translates the recognized character string up to the sentence end.
  • According to this aspect, recognition results can be translated more accurately.
  • FIG. 9 is a diagram showing an example (part 1) of translation behavior by the translation device according to the embodiment.
  • FIG. 10 is a diagram showing an example (part 2) of translation behavior by the translation device according to the embodiment.
  • FIG. 11 is a diagram showing an example (part 3) of translation behavior by the translation device according to the embodiment.
  • FIG. 12 is a diagram showing an example (part 4) of translation behavior by the translation device according to the embodiment.
  • FIG. 13 is a diagram showing an example (part 1) of conventional translation behavior.
  • FIG. 14 is a diagram showing an example (part 2) of conventional translation behavior.
  • FIG. 15 is a diagram showing an example (part 1) of conventional sequential translation behavior.
  • FIG. 16 is a diagram showing an example (part 2) of conventional sequential translation behavior.
  • FIG. 17 is a diagram showing an example (part 3) of conventional sequential translation behavior.
  • FIG. 18 is a diagram showing another example (part 1) of conventional sequential translation behavior.
  • FIG. 19 is a diagram showing another example (part 2) of conventional sequential translation behavior.
  • FIG. 20 is a diagram showing the target of translation by the translation device according to the embodiment.
  • FIG. 21 is a diagram showing an example of the hardware configuration of a computer used in the translation device according to the embodiment.
  • FIG. 1 is a diagram showing an example of the system configuration of the translation system 3 including the translation device 1 according to the embodiment.
  • the translation system 3 includes a translation device 1 and a recognition device 2 .
  • the translation device 1 and the recognition device 2 are connected for communication with each other via a network such as the Internet, and can exchange information with each other.
  • the translation device 1 is a computer device that translates a recognized character string, which is a character string resulting from speech recognition or character recognition.
  • a voice recognition result is a result of voice recognition.
  • Speech recognition is a technology that allows a computer to recognize human voices and convert them into character strings.
  • a character recognition result is a result of character recognition.
  • Character recognition is a technique for making a computer recognize images of printed characters or handwritten text and convert them into character strings. Existing technologies are used for speech recognition and character recognition in this embodiment.
  • a string is a set of one or more characters.
  • a recognized character string is a character string including at least one of a character string resulting from speech recognition and a character string resulting from character recognition.
  • Translation means replacing a character string expressed in a first language with a second language that is different from the first language.
  • the first language is, for example, Japanese, but may be any other language.
  • the second language is, for example, English, but may be any other language.
  • the first language and the second language may be different local dialects (for example, standard Japanese and Kansai dialect in Japan).
  • the language is not limited to natural language, but may be artificial language or formal language (such as computer programming language).
  • the translation is, for example, machine translation, which is automatic translation using a computer. The details of the translation device 1 will be described later.
  • the recognition device 2 is a computer device equipped with a function to perform voice recognition or character recognition.
  • For example, the recognition device 2 inputs a human voice in real time (as the person speaks), performs speech recognition, and transmits the generated recognized character string to the translation device 1 via the network.
  • the recognition device 2 inputs an image of handwritten text in real time (as a person handwrites the text), performs character recognition, and transmits the generated recognized character string to the translation device 1 via the network.
  • the translation device 1 uses the recognized character string in a function block described later.
  • Note that the functions of the recognition device 2 described above may be incorporated into the translation device 1, and the same processing may be performed in the translation device 1. That is, the translation device 1 may itself have a function of performing speech recognition or character recognition, the speech recognition or character recognition may be performed by the translation device 1, and the generated recognized character string may be used in the function blocks of the translation device 1 described later.
  • FIG. 2 is a diagram showing an example of the functional configuration of the translation device 1 according to the embodiment.
  • the translation device 1 includes a storage unit 10, a learning unit 11, an acquisition unit 12 (acquisition unit), a detection unit 13 (detection unit), and a translation unit 14 (translation unit).
  • Each functional block of the translation device 1 is assumed to function within the translation device 1, but is not limited to this.
  • For example, some of the functional blocks of the translation device 1 may run on a computer device different from the translation device 1 and connected to it via a network, functioning while transmitting and receiving information to and from the translation device 1 as appropriate.
  • Also, some functional blocks of the translation device 1 may be omitted, a plurality of functional blocks may be integrated into one functional block, or one functional block may be decomposed into a plurality of functional blocks.
  • the storage unit 10 stores arbitrary information used in calculations in the translation device 1, calculation results in the translation device 1, and the like.
  • the information stored by the storage unit 10 may be referred to by each function of the translation device 1 as appropriate.
  • the storage unit 10 stores a sentence ending symbol insertion model for outputting a character string in which a sentence delimiting symbol (or a sentence ending symbol) is inserted when a character string without a sentence delimiting symbol (or a sentence ending symbol) is input.
  • Sentence delimiters in Japanese include "、", "。", "!", and "?".
  • When a character string without a sentence delimiter is input to the sentence-end symbol insertion model, a character string with a sentence delimiter inserted is output.
  • the sentence ending insertion model may be generated by existing technology.
  • The sentence-end symbol insertion model may be a trained model that has been (machine) learned based on learning data consisting of pairs of character strings without sentence delimiters (or sentence-end symbols) and character strings with sentence delimiters (or sentence-end symbols).
  • FIG. 3 is a diagram showing an example of learning data.
  • character strings without sentence delimiters and character strings with sentence delimiters are associated as pairs.
  • The learning data shown in FIG. 3 exemplifies all or part of one sentence, but the learning data is not limited to this; for example, it may be all or part of two or more sentences.
  • character strings without sentence delimiters can be regarded as input data
  • character strings with sentence delimiters can be regarded as teacher data.
  • The learning data may be a pair consisting of an extracted character string, which is a part taken from a character string with sentence delimiters (or sentence-end symbols), and the character string obtained by removing the sentence delimiters (or sentence-end symbols) from that extracted character string.
  • The extracted character string may be, for example, a partial character string obtained by segmenting a character string with sentence delimiters (or sentence-end symbols) into word units and splitting it at random positions.
  • FIG. 4 is a diagram showing an example of a method of generating learning data.
  • In FIG. 4, from the original data, which is the character string with sentence delimiters "Well, the meeting will begin.", four extracted character strings are taken out: "Well, the meeting will begin.", "Well,", "meeting", and "begin".
  • Then, for each of the four extracted character strings, the character strings with the sentence delimiters removed are generated: "Well, the meeting will begin", "Well", "meeting", and "begin".
  • Pairs such as ("Well, the meeting will begin", "Well, the meeting will begin.") and ("Well", "Well,") then serve as learning data, as in the sketch below.
  • The character strings with sentence delimiters (or sentence-end symbols) included in the training data may be given labels for sequence labeling that indicate, for each word composing the string, whether a sentence delimiter (or sentence-end symbol) comes next. In that case, the character strings without sentence delimiters (or sentence-end symbols) included in the learning data may be divided into words.
  • In this way, machine learning can be performed as a sequence labeling task that predicts which sentence delimiter follows which word.
  • FIG. 5 is a diagram showing an example of labeled learning data in sequence labeling.
  • In the example of FIG. 5, for the character string with sentence delimiters "Well, the meeting will begin.", words in the middle of the sentence such as "meeting" are given the label "<O>", indicating that no sentence delimiter comes next, and the final word "begin" is given the label "<PERIOD>", indicating that a period comes next, as illustrated in the sketch below.
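  • As a non-limiting illustration of the labeling scheme of FIG. 5: only the labels "<O>" and "<PERIOD>" appear in the figure, so the remaining label names and the pre-segmented token list below are assumptions made for this sketch.

```python
# Mapping from a delimiter token to the label placed on the preceding word.
LABELS = {"。": "<PERIOD>", "、": "<COMMA>", "!": "<EXCL>", "?": "<QUES>"}


def to_labeled_sequence(tokens: list[str]) -> list[tuple[str, str]]:
    """Turn a pre-segmented, delimited token list into (word, label) pairs:
    each word receives the label of the delimiter that follows it, or <O>."""
    labeled = []
    for i, tok in enumerate(tokens):
        if tok in LABELS:
            continue  # delimiter tokens become labels, not words
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        labeled.append((tok, LABELS.get(nxt, "<O>")))
    return labeled


# "それでは、会議を始めます。" pre-segmented into word units plus delimiters.
print(to_labeled_sequence(["それでは", "、", "会議", "を", "始め", "ます", "。"]))
# [('それでは', '<COMMA>'), ('会議', '<O>'), ('を', '<O>'),
#  ('始め', '<O>'), ('ます', '<PERIOD>')]
```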
  • The learning unit 11 generates the sentence-end symbol insertion model. More specifically, the learning unit 11 performs (machine) learning based on learning data consisting of pairs of character strings without sentence delimiters (or sentence-end symbols) and character strings with sentence delimiters (or sentence-end symbols), and generates the sentence-end symbol insertion model as a trained model. The learning unit 11 may also perform (machine) learning based on the various types of learning data described above to generate the sentence-end symbol insertion model. Further, the learning unit 11 may itself generate the learning data based on the generation method described above.
  • the learning unit 11 causes the storage unit 10 to store the generated sentence ending symbol insertion model.
  • the sentence ending symbol insertion model stored in the storage unit 10 may not be generated by the learning unit 11, but may be generated by another device and obtained via a network.
  • the acquisition unit 12 acquires a recognized character string, which is a character string resulting from speech recognition or character recognition.
  • The acquisition unit 12 may receive (acquire) the recognized character string transmitted from the recognition device 2, may acquire a recognized character string generated by a speech recognition or character recognition function provided in the translation device 1 itself, or may acquire a recognized character string stored in advance in the storage unit 10.
  • The acquisition unit 12 may output the acquired recognized character string to the detection unit 13 and the translation unit 14, may store it in the storage unit 10, may display (output) it to the user of the translation device 1 via the communication device 1004 or the output device 1006 described later, or may transmit (output) it to another device.
  • the acquisition unit 12 may acquire the recognized character string each time the function that performs speech recognition or character recognition outputs the recognized character string.
  • For example, while the speech recognition or character recognition function of the recognition device 2 inputs a human voice in real time (as the person speaks) or inputs an image of handwritten text in real time (as the person writes), the acquisition unit 12 may sequentially acquire the recognized character string each time one is sequentially output.
  • Each time it sequentially acquires a recognized character string, the acquisition unit 12 may sequentially output it to the detection unit 13 and the translation unit 14, or may store it in the storage unit 10.
  • The detection unit 13 detects (determines) the sentence end of the recognized character string. More specifically, the detection unit 13 detects the sentence end of the recognized character string acquired (output) by the acquisition unit 12. For example, the detection unit 13 detects whether or not a sentence end is included in the recognized character string and, if one is included, detects its position in the recognized character string (for example, how many characters it is from the beginning); if no sentence end is included, the detection unit 13 detects that no sentence end is included.
  • An existing technique may be used as a method for detecting the end of a sentence in a character string (recognition character string) by the detection unit 13 .
  • the detection unit 13 may detect the end of the recognized character string based on the end-of-sentence symbol included in the recognized character string.
  • When the recognized character string contains a plurality of sentence ends, the detection unit 13 may detect all of them, may detect the sentence end closest to the beginning of the recognized character string, may detect the sentence end closest to the end of the recognized character string, or may detect a sentence end that satisfies a predetermined criterion.
  • the detection unit 13 may output the detection result to the translation unit 14 or store it in the storage unit 10 .
  • the detection result may include, for example, information regarding the position (or plural positions) of the end of the sentence in the recognized character string and information as to whether or not the end of the sentence is included in the recognized character string.
  • The detection unit 13 may detect the sentence end of the recognized character string based on an output character string, which is the character string obtained by inputting the recognized character string with sentence-end symbols removed into the sentence-end symbol insertion model, the model that outputs a character string with sentence-end symbols inserted when a character string without them is input. More specifically, the detection unit 13 detects the sentence end of the recognized character string based on the output character string obtained by inputting the recognized character string, from which sentence-end symbols have been removed, into the sentence-end symbol insertion model stored in advance in the storage unit 10. For example, when inputting the recognized character string "Well, the meeting will begin" yields the output character string "Well, the meeting will begin.", the detection unit 13 detects that the last character of "begin" is the sentence end (or that the sentence end is immediately after it) in the recognized character string.
  • The detection unit 13 need not detect, as a sentence end, a sentence end lying within a predetermined number of characters from the end of the recognized character string. For example, if the recognized character string is "thank you, but" and the predetermined number of characters is five, the detection unit 13 does not detect the last character of "thank you" (or the position immediately after it) as a sentence end, because it lies within five characters of the end of the recognized character string.
  • Alternatively, the detection unit 13 may perform sentence-end detection only on the part of the recognized character string that excludes a predetermined number of characters from the end. For example, if the recognized character string is "thank you, but" and the predetermined number of characters is six, the detection unit 13 attempts to detect a sentence end only in the part excluding the last six characters (with the result that, in this example, no sentence end is detected). A detection sketch follows.
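  • A minimal sketch of this guarded detection, assuming the symbol set "。!?" and a rightmost-match strategy (as noted above, the patent also allows other selection criteria):

```python
SENTENCE_END_SYMBOLS = set("。!?")  # assumed sentence-end symbols


def detect_sentence_end(recognized: str, n: int = 5) -> int | None:
    """Return the index one past the detected sentence end, or None.
    A sentence-end symbol within the last n characters is ignored, since it
    may belong to an utterance that is still in progress."""
    for i in range(len(recognized) - n - 1, -1, -1):
        if recognized[i] in SENTENCE_END_SYMBOLS:
            return i + 1
    return None


print(detect_sentence_end("ありがとうございます。でも"))  # None: "。" is inside the guard
print(detect_sentence_end("それでは、会議を始めます。よろしくお願いします"))  # 13
```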
  • The detection unit 13 may insert a sentence-end symbol at the detected position of the recognized character string. For example, when the detection unit 13 detects that the last character of the recognized character string "Well, the meeting will begin" is the sentence end, it inserts the sentence-end symbol "." there, so that the recognized character string becomes "Well, the meeting will begin.". The detection unit 13 may output the recognized character string after insertion to the translation unit 14.
  • The detection unit 13 may replace the recognized character string with the output character string obtained from the sentence-end symbol insertion model. For example, when inputting the recognized character string "Well, the meeting will begin" into the sentence-end symbol insertion model yields the output character string "Well, the meeting will begin.", the detection unit 13 replaces the former with the latter. The detection unit 13 may output the recognized character string after replacement to the translation unit 14.
  • The translation unit 14 translates the recognized character string (from its beginning) up to the sentence end. More specifically, the translation unit 14 translates, in the recognized character string acquired (output) by the acquisition unit 12, up to the sentence end detected by the detection unit 13, based on the detection result output from the detection unit 13. Existing technology such as machine translation may be used for the translation.
  • The translation unit 14 may display (output) the translation result to the user of the translation device 1 via the communication device 1004 or the output device 1006 described later, may transmit (output) it to another device, or may store it in the storage unit 10.
  • The translation unit 14 may translate, each time the acquisition unit 12 acquires a recognized character string, up to the sentence end detected in it by the detection unit 13. That is, each time the acquisition unit 12 acquires a recognized character string, the processes of the detection unit 13 and the translation unit 14 may be executed to translate the recognized character string up to the sentence end.
  • The translation unit 14 need not translate the recognized character string in some cases. More specifically, when the detection result output by the detection unit 13 includes information indicating that the recognized character string does not include a sentence end, the translation unit 14 does not translate the recognized character string (that is, it skips translation).
  • The translation unit 14 need not translate the recognized character string if the sentence end is within a predetermined number of characters from the end of the recognized character string. More specifically, if the sentence end detected by the detection unit 13 for the recognized character string acquired (output) by the acquisition unit 12 (the sentence end based on the detection result output from the detection unit 13) is within a predetermined number of characters from the end of the recognized character string, the translation unit 14 may leave the recognized character string untranslated. For example, if the recognized character string is "thank you. But", the detected sentence end is ".", and the predetermined number of characters is five, the translation unit 14 does not translate the recognized character string because the sentence end "." is within five characters of its end.
  • The translation unit 14 may translate the recognized character string with the sentence-end symbol inserted up to that sentence-end symbol. For example, the translation unit 14 translates the recognized character string after insertion, "Well, the meeting will begin.", up to the sentence-end symbol.
  • The translation unit 14 may translate the replaced recognized character string up to the sentence end. For example, the translation unit 14 translates the replaced recognized character string "Well, the meeting will begin." up to the sentence-end symbol.
  • FIG. 6 is a flowchart showing an example of translation processing executed by the translation device 1.
  • the acquisition unit 12 acquires a recognized character string (step S1).
  • the detection unit 13 detects the end of the recognized character string acquired in S1 (step S2).
  • the translation unit 14 translates the recognized character string acquired in S1 up to the end of the sentence detected in S2 (step S3).
  • FIG. 7 is a flowchart showing another example of translation processing (sequential processing) executed by the translation apparatus 1.
  • the acquisition unit 12 acquires (attempts to acquire) a recognized character string (step S10).
  • The acquisition unit 12 determines whether or not a recognized character string was acquired in S10 (step S11). If it is determined in S11 that one could not be acquired (S11: NO), the process returns to S10. On the other hand, if it is determined in S11 that one was acquired (S11: YES), the detection unit 13 detects (attempts to detect) the sentence end of the recognized character string acquired in S10 (step S12).
  • The detection unit 13 determines whether or not a sentence end was detected in S12 (step S13). If it is determined in S13 that none was detected (S13: NO), the process returns to S10 (without translation by the translation unit 14). On the other hand, if it is determined in S13 that one was detected (S13: YES), the translation unit 14 translates the recognized character string acquired in S10 up to the sentence end detected in S12 (step S14), and the process returns to S10.
  • FIG. 8 is a flowchart showing still another example of translation processing (sequential processing with end determination) executed by the translation device 1.
  • Since S20 to S23 are the same as S10 to S13 in FIG. 7, their description is omitted. If it is determined in S23 that a sentence end was detected (S23: YES), the detection unit 13 or the translation unit 14 determines whether the sentence end detected in S22 is beyond n characters (n is an integer of 1 or more) from the end of the recognized character string acquired in S20 (step S24). If it is determined in S24 that it is within n characters (S24: NO), the process returns to S20 (without translation by the translation unit 14).
  • On the other hand, if it is determined in S24 that the sentence end is beyond n characters from the end (S24: YES), the translation unit 14 translates the recognized character string acquired in S20 up to the sentence end detected in S22 (step S25), and the process returns to S20. A sketch of this loop follows.
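  • A minimal sketch of the loops of FIGS. 7 and 8, under the same assumptions as the detection sketch above; translate() is a placeholder for any machine-translation backend, not an API named in the disclosure. Setting n = 0 disables the end guard and reduces FIG. 8 to FIG. 7.

```python
from collections.abc import Iterable

SENTENCE_END_SYMBOLS = set("。!?")  # assumed sentence-end symbols


def detect_sentence_end(recognized: str, n: int = 5) -> int | None:
    """Rightmost sentence end outside the last n characters, or None."""
    for i in range(len(recognized) - n - 1, -1, -1):
        if recognized[i] in SENTENCE_END_SYMBOLS:
            return i + 1
    return None


def translate(text: str) -> str:
    return f"<EN: {text}>"  # placeholder for a machine-translation backend


def run_sequential_translation(stream: Iterable[str], n: int = 5) -> None:
    """For each updated recognition result, translate up to a usable sentence
    end; otherwise skip translation and wait for the next update."""
    for recognized in stream:                     # S10/S20: acquire
        end = detect_sentence_end(recognized, n)  # S12/S22 with the S24 guard
        if end is None:                           # S13/S23: NO, or S24: NO
            continue                              # back to acquisition
        print(translate(recognized[:end]))        # S14/S25: translate


run_sequential_translation([
    "それでは、",                                     # no sentence end yet
    "それでは、会議を始めます。よろし",                # "。" still inside the n=5 guard
    "それでは、会議を始めます。よろしくお願いします",  # "。" now outside the guard
])
```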
  • As a premise for FIGS. 9 to 12, the upper part of each balloon in the drawings shows the recognized character string acquired and displayed by the acquisition unit 12, and the lower part shows the translation result (translated character string) translated and displayed by the translation unit 14. An empty translation result indicates that translation is pending (waiting).
  • FIG. 9 is a diagram showing an example (part 1) of translation behavior by the translation device 1.
  • In FIG. 9, the acquisition unit 12 sequentially acquires and displays recognized character strings. Specifically, the acquisition unit 12 first displays the acquired recognized character string "Long ago," (first state), then displays the acquired recognized character string "Long ago, long ago," (second state), and then displays the acquired recognized character string "Long ago, long ago, in a certain place, an old man and an old woman" (third state). In each state, the detection unit 13 performs sentence-end detection, but no sentence end is detected, so the translation unit 14 does not perform translation.
  • FIG. 10 is a diagram showing an example (part 2) of translation behavior by the translation device 1.
  • FIG. 10 shows the behavior following FIG. 9 (continuing from the circled connector A).
  • In FIG. 10, the acquisition unit 12 displays the acquired recognized character string "Long ago, long ago, in a certain place, an old man and an old woman were piled up. The old man" (fourth state).
  • In the fourth state, the detection unit 13 performs sentence-end detection, but the sentence end (the sentence-end symbol "." in FIG. 10) lies within five characters of the end of the recognized character string and is therefore not detected as a sentence end, so translation by the translation unit 14 is still not performed.
  • FIG. 11 is a diagram showing an example (part 3) of translation behavior by the translation apparatus 1.
  • FIG. 11 shows the behavior following FIG. 10 (continuing from the circled connector B).
  • In FIG. 11, the acquisition unit 12 additionally displays the acquired recognized character string, which has grown so that the sentence-end symbol "." no longer lies within five characters of its end (fifth state).
  • In the fifth state, the detection unit 13 detects the sentence end (the sentence-end symbol "." in FIG. 11), so translation by the translation unit 14 is performed.
  • The translation unit 14 translates the current recognized character string "Long ago, long ago, in a certain place, an old man and an old woman were piled up. The old man" up to the detected sentence end, and the translated character string "Once upon a time, there was an old man and an old woman who were piled up." is displayed at the bottom.
  • FIG. 12 is a diagram showing an example (part 4) of translation behavior by the translation apparatus 1.
  • FIG. 12 shows the behavior following FIG. 11 (continuing from the circled connector C).
  • In FIG. 12, the acquisition unit 12 displays the acquired recognized character string "Long ago, long ago, in a certain place, an old man and an old woman lived. The old man went to the mountains" (sixth state); the wording "were piled up" of the earlier states has changed to "lived" as the recognition result was updated.
  • In the sixth state, the detection unit 13 detects the sentence end, so translation is performed by the translation unit 14.
  • The translation unit 14 translates the current recognized character string up to the detected sentence end, and the updated translated character string "Once upon a time, in a certain place, there lived an old man and an old woman." is displayed at the bottom.
  • As described above, the translation device 1 includes the acquisition unit 12 that acquires a recognized character string, which is a character string of a speech recognition result or a character recognition result, the detection unit 13 that detects the sentence end of the recognized character string, and the translation unit 14 that translates the recognized character string up to the sentence end.
  • With this configuration, the translation target does not end in the middle of a sentence but extends to the sentence end, so the recognition result can be translated more accurately.
  • The acquisition unit 12 may acquire the recognized character string each time the function that performs speech recognition or character recognition outputs one, and the translation unit 14 may translate, each time a recognized character string is acquired, up to the sentence end detected in it by the detection unit 13.
  • With this configuration, translation is performed each time a recognized character string is output (if a sentence end is detected), so a translation result can be obtained at an early timing.
  • The translation unit 14 need not translate the recognized character string when the detection unit 13 does not detect a sentence end in the recognized character string.
  • The detection unit 13 need not detect, as a sentence end, a sentence end lying within a predetermined number of characters from the end of the recognized character string. With this configuration, translation based on misrecognized sentence ends can be prevented, which in turn prevents misleading translation results. For example, when speech recognition has only reached "desu" in an utterance containing "desuga" ("is, but"), "desu" can be prevented from being detected as a sentence end even though "desuga" is originally a single expression.
  • the detection unit 13 may detect the end of the sentence of the part of the recognized character string that excludes a predetermined number of characters from the end.
  • the translation unit 14 does not need to translate the recognized character string when the end of the sentence is within a predetermined number of characters from the end of the recognized character string.
  • When the detection unit 13 detects the sentence end of the recognized character string, it may insert a sentence-end symbol at the detected position, and the translation unit 14 may translate the recognized character string with the sentence-end symbol inserted up to that sentence-end symbol.
  • With this configuration, the recognized character string with the sentence-end symbol inserted, which is the translation target, can be used effectively in subsequent processing.
  • The detection unit 13 may detect the sentence end of the recognized character string based on an output character string, which is the character string obtained by inputting the recognized character string with sentence-end symbols removed into the sentence-end symbol insertion model, the model that outputs a character string with sentence-end symbols inserted when a character string without them is input.
  • When the detection unit 13 detects the sentence end of the recognized character string, it may replace the recognized character string with the output character string, and the translation unit 14 may translate the replaced recognized character string up to the sentence-end symbol. With this configuration, translation up to the sentence end can be performed more reliably. In addition, the recognized character string with the sentence-end symbol inserted, which is the translation target, can be used effectively in subsequent processing.
  • The sentence-end symbol insertion model may be a trained model trained on learning data consisting of pairs of character strings without sentence-end symbols and character strings with sentence-end symbols. With this configuration, a sentence-end symbol insertion model that provides more accurate output can be generated more reliably.
  • the translation device 1 is an easy-to-understand speech translation control device that applies the end-of-sentence determination technology.
  • the translation method by the translation device 1 is an easy-to-understand speech translation control method that applies the end-of-sentence determination technology.
  • FIG. 13 is a diagram showing an example (part 1) of conventional translation behavior.
  • FIG. 14 is a diagram showing an example (part 2) of conventional translation behavior. FIGS. 13 and 14 show the behavior when a translation result is output for each silent section. As shown in FIGS. 13 and 14, the translation results are displayed all at once when the speech recognition results are finalized, which makes them difficult for the user to read. In a presentation, for example, the material has advanced to the next page by then, and the user cannot keep up with the talk.
  • FIG. 15 is a diagram showing an example (part 1) of conventional sequential translation behavior.
  • FIG. 16 is a diagram showing a behavior example (part 2) of conventional sequential translation.
  • FIG. 17 is a diagram showing a behavior example (part 3) of conventional sequential translation.
  • FIGS. 15 to 17 show examples of behavior when sequential translation results are output.
  • the translation results fluctuate and are difficult for the user to read.
  • The fluctuating results can also mislead the user. For example, as shown in FIG. 16, an utterance meaning "whatever" has been recognized only up to the part meaning "how", so the partial translation momentarily gives the misleading impression of a surprised question.
  • Moreover, when the speech recognition result changes, the translation result may change accordingly, and translation during speech recognition may mislead the user (see FIGS. 18 and 19).
  • FIG. 18 is a diagram showing another behavior example (Part 1) of conventional consecutive translation.
  • FIG. 19 is a diagram showing another behavior example (part 2) of conventional sequential translation. FIGS. 18 and 19 show an example of behavior when sequential translation results are output. As shown in FIGS. 18 and 19, the speech recognition result changes, and the translation result changes accordingly. That is, the display is difficult for the user to read and may cause misunderstanding.
  • FIG. 20 is a diagram showing the target of translation by the translation device 1. As shown in FIG. 20, the translation device 1 translates up to the end of the sentence.
  • punctuation processing is sequentially performed on the intermediate results of speech recognition, and if a sentence contains a sentence-ending symbol (such as ".”), the sentence up to the sentence-ending symbol is machine-translated and output.
  • the interim speech recognition result is successively subjected to punctuation processing and punctuation is inserted. If the sentence contains a sentence ending symbol (such as ".”), the sentence up to the sentence ending symbol is machine-translated and provisionally output.
  • the translation result can be output quickly in units that make sense.
  • In the translation device 1, when the sentence-end symbol falls within the last n characters (for example, 5 characters) of the interim speech recognition result, it is not determined to be a sentence end. As a result, erroneous sentence-end determination during speech recognition can be eliminated.
  • the sentence ending symbol is added by punctuation processing, and is used to determine the unit of the sentence to be translated.
  • erroneous recognition of the end of the sentence is reduced by performing punctuation processing on successive speech recognition and translating up to the end of the sentence if it is not within the last n characters.
  • In the middle of speech recognition, the sentence end is often erroneously recognized.
  • sequential punctuation processing (only for speech recognition results, not for translation results) is performed, and translation is performed up to the end of a sentence. Also, the sequential speech recognition result is output up to the sentence ending symbol (for example, the part with ".”).
  • In the translation device 1, every time the speech recognition result is updated, punctuation processing is performed and the text up to the sentence end is translated and output. When judging the sentence end, the last n characters are ignored.
  • In the translation device 1, if a period appears in the result of applying punctuation processing to the speech recognition result, the text up to the period is translated and the translation result is output; if not, no translation is performed.
  • In the translation device 1, every time the speech recognition result is updated, punctuation processing is performed and the text up to the sentence end is translated and output (the translation result is also updated, that is, overwritten, each time); when judging the sentence end, a sentence-end symbol within the last n characters is not determined to be a sentence end. That is, because unintended erroneous sentence-end determinations occur during speech recognition, the last n characters are excluded from consideration.
  • punctuation is processed for "sequential" speech recognition, and if the end of the sentence is not within the last n characters, it is translated up to that point.
  • In the translation device 1, the following processing is continuously executed during speech recognition.
  • the voice recognition result is updated (about every 0.2 seconds)
  • In the translation device 1, changes are handled by continuously performing recognition, sentence-end judgment, and updating of the translation result, and by outputting results until the speech recognition result is finalized.
  • Translation is performed up to the position judged to be the sentence end; a streaming sketch of this behavior follows.
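  • A minimal sketch of this continuous processing, assuming a punctuate() placeholder for the sentence-end symbol insertion model and a simulated update interval; only the translate-up-to-the-sentence-end and overwrite-on-update behavior is taken from the description above.

```python
import time

SENTENCE_END_SYMBOLS = set("。!?")  # assumed sentence-end symbols


def punctuate(text: str) -> str:
    """Placeholder for the sentence-end symbol insertion model."""
    return text  # assume symbols were already inserted upstream for this sketch


def translate(text: str) -> str:
    return f"<EN: {text}>"  # placeholder machine-translation backend


def on_recognition_updates(updates: list[str], n: int = 5) -> None:
    """Re-run punctuation and sentence-end judgment on every interim result
    (arriving roughly every 0.2 seconds), overwriting the shown translation."""
    shown = ""
    for interim in updates:
        punctuated = punctuate(interim)
        end = next((i + 1
                    for i in range(len(punctuated) - n - 1, -1, -1)
                    if punctuated[i] in SENTENCE_END_SYMBOLS), None)
        if end is not None:
            shown = translate(punctuated[:end])  # overwrite, never append
        print(shown or "(waiting)")
        time.sleep(0.2)  # simulated recognition update interval
```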
  • Each functional block may be implemented using one physically or logically coupled device, or using two or more physically or logically separate devices that are connected directly or indirectly (for example, by wire or wirelessly), and may be implemented using these multiple devices.
  • a functional block may be implemented by combining software in the one device or the plurality of devices.
  • Functions include, but are not limited to, judging, determining, calculating, computing, processing, deriving, investigating, searching, confirming, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, regarding, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating, mapping, and assigning.
  • For example, a functional block (component) that performs transmission may be called a transmitting unit or a transmitter. In any case, as described above, the implementation method is not particularly limited.
  • the translation device 1 may function as a computer that performs the translation method of the present disclosure.
  • FIG. 21 is a diagram showing an example of a hardware configuration of translation device 1 according to an embodiment of the present disclosure.
  • the translation device 1 described above may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.
  • the term “apparatus” can be read as a circuit, device, unit, or the like.
  • the hardware configuration of the translation device 1 may be configured to include one or more of the devices shown in the figure, or may be configured without some of the devices.
  • Each function of the translation device 1 is implemented by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, causing the processor 1001 to perform computations, controlling communication by the communication device 1004, and controlling at least one of reading and writing of data in the memory 1002 and the storage 1003.
  • the processor 1001 for example, operates an operating system and controls the entire computer.
  • the processor 1001 may be configured by a central processing unit (CPU) including an interface with peripheral devices, a control device, an arithmetic device, registers, and the like.
  • the learning unit 11 , acquisition unit 12 , detection unit 13 , translation unit 14 and the like described above may be implemented by the processor 1001 .
  • the processor 1001 reads programs (program codes), software modules, data, etc. from at least one of the storage 1003 and the communication device 1004 to the memory 1002, and executes various processes according to them.
  • As the program, a program that causes a computer to execute at least part of the operations described in the above embodiment is used.
  • For example, the learning unit 11, the acquisition unit 12, the detection unit 13, and the translation unit 14 may be implemented by a control program that is stored in the memory 1002 and runs on the processor 1001, and other functional blocks may be implemented in the same way.
  • The processor 1001 may be implemented by one or more chips.
  • the program may be transmitted from a network via an electric communication line.
  • The memory 1002 is a computer-readable recording medium and may be composed of at least one of, for example, ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), and RAM (Random Access Memory).
  • the memory 1002 may also be called a register, cache, main memory (main storage device), or the like.
  • The memory 1002 can store executable programs (program code), software modules, and the like for implementing a translation method according to an embodiment of the present disclosure.
  • the storage 1003 is a computer-readable recording medium, for example, an optical disk such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (for example, a compact disk, a digital versatile disk, a Blu-ray disk), smart card, flash memory (eg, card, stick, key drive), floppy disk, magnetic strip, and/or the like.
  • Storage 1003 may also be called an auxiliary storage device.
  • the storage medium described above may be, for example, a database, server, or other suitable medium including at least one of memory 1002 and storage 1003 .
  • the communication device 1004 is hardware (transmitting/receiving device) for communicating between computers via at least one of a wired network and a wireless network, and is also called a network device, a network controller, a network card, a communication module, or the like.
  • The communication device 1004 may include, for example, a high-frequency switch, a duplexer, a filter, a frequency synthesizer, and the like in order to realize at least one of frequency division duplex (FDD) and time division duplex (TDD).
  • the learning unit 11 , acquisition unit 12 , detection unit 13 , translation unit 14 and the like described above may be implemented by the communication device 1004 .
  • the input device 1005 is an input device (for example, keyboard, mouse, microphone, switch, button, sensor, etc.) that receives input from the outside.
  • the output device 1006 is an output device (for example, display, speaker, LED lamp, etc.) that outputs to the outside. Note that the input device 1005 and the output device 1006 may be integrated (for example, a touch panel).
  • Each device such as the processor 1001 and the memory 1002 is connected by a bus 1007 for communicating information.
  • the bus 1007 may be configured using a single bus, or may be configured using different buses between devices.
  • The translation device 1 may be configured to include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), and an FPGA (Field Programmable Gate Array), and part or all of each functional block may be implemented by such hardware.
  • processor 1001 may be implemented using at least one of these pieces of hardware.
  • Each aspect/embodiment described in the present disclosure may be applied to at least one of LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G (4th generation mobile communication system), 5G (5th generation mobile communication system), FRA (Future Radio Access), NR (New Radio), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi (registered trademark)), IEEE 802.16 (WiMAX (registered trademark)), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), other suitable systems, and next-generation systems extended based on these. A plurality of systems may also be applied in combination (for example, a combination of at least one of LTE and LTE-A with 5G).
  • Input/output information may be stored in a specific location (for example, memory) or managed using a management table. Input/output information and the like can be overwritten, updated, or appended. The output information and the like may be deleted. The entered information and the like may be transmitted to another device.
  • The determination may be made by a value represented by one bit (0 or 1), by a true/false value (Boolean: true or false), or by numerical comparison (for example, comparison with a predetermined value).
  • Notification of predetermined information is not limited to being performed explicitly and may be performed implicitly (for example, by not performing notification of the predetermined information).
  • Software, whether referred to as software, firmware, middleware, microcode, hardware description language, or by any other name, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, and the like.
  • software, instructions, information, etc. may be transmitted and received via a transmission medium.
  • For example, when software is sent from a website, a server, or another remote source using at least one of wired technology (coaxial cable, optical fiber cable, twisted pair, digital subscriber line (DSL), etc.) and wireless technology (infrared, microwave, etc.), these wired and wireless technologies are included within the definition of transmission medium.
  • Data, instructions, commands, information, signals, bits, symbols, chips, and the like may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination of these.
  • The terms "system" and "network" used in the present disclosure are used interchangeably.
  • Information, parameters, and the like described in the present disclosure may be expressed using absolute values, using relative values from a predetermined value, or using other corresponding information.
  • The terms "judging" and "determining" used in the present disclosure may encompass a wide variety of actions.
  • "Judging" and "determining" can include, for example, regarding judging, calculating, computing, processing, deriving, investigating, looking up/searching/inquiring (for example, looking up in a table, a database, or another data structure), and ascertaining as having "judged" or "determined".
  • "Judging" and "determining" can also include regarding receiving (for example, receiving information), transmitting (for example, transmitting information), input, output, and accessing (for example, accessing data in a memory) as having "judged" or "determined".
  • "Judging" and "determining" can also include regarding resolving, selecting, choosing, establishing, comparing, and the like as having "judged" or "determined".
  • That is, "judging" and "determining" can include regarding some action as having "judged" or "determined".
  • "Judging (determining)" may also be read as "assuming", "expecting", "considering", and the like.
  • The terms "connected" and "coupled", or any variation thereof, mean any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. Couplings or connections between elements may be physical, logical, or a combination thereof. For example, "connection" may be read as "access".
  • When used in the present disclosure, two elements can be considered "connected" or "coupled" to each other using at least one of one or more wires, cables, and printed electrical connections and, as some non-limiting and non-exhaustive examples, using electromagnetic energy having wavelengths in the radio frequency domain, the microwave domain, and the optical (both visible and invisible) domain.
  • any reference to elements using the "first,” “second,” etc. designations used in this disclosure does not generally limit the quantity or order of those elements. These designations may be used in this disclosure as a convenient method of distinguishing between two or more elements. Thus, reference to a first and second element does not imply that only two elements can be employed or that the first element must precede the second element in any way.
  • The statement "A and B are different" may mean "A and B are different from each other".
  • The statement may also mean "A and B are each different from C".
  • Terms such as "separated" and "coupled" may be interpreted in the same way as "different".

Abstract

This translation device 1 comprises: an acquisition unit 12 for acquiring a recognition character string, which is a character string of a speech recognition result or a text recognition result; a detection unit 13 for detecting the end of a sentence in the recognition character string; and a translation unit 14 for translating to the end of the sentence in the recognition character string. The acquisition unit 12 acquires the recognition character string every time a function for carrying out speech recognition or text recognition outputs a recognition character string, and every time the acquisition unit 12 acquires a recognition character string, the translation unit 14 may translate to the end of the sentence detected in said recognition character string by the detection unit 13. The translation unit 14 is not required to translate the relevant character string when the end of a sentence in the recognition character string has not been detected by the detection unit 13. The detection unit 13 is not required to detect, as the end of a sentence, an end of a sentence which is within a prescribed number of characters from the end of the recognition character string.

Description

翻訳装置translation device
 本開示の一側面は、音声認識結果又は文字認識結果の文字列を翻訳する翻訳装置に関する。 One aspect of the present disclosure relates to a translation device that translates a character string of speech recognition results or character recognition results.
 下記特許文献1には、入力された発話音声に音声認識処理を行うことによって音声認識結果を生成し、音声認識結果を第1の言語から第2の言語に機械翻訳することによって機械翻訳結果を生成する通訳装置が開示されている。 In Patent Document 1 below, a speech recognition result is generated by performing speech recognition processing on an input uttered voice, and the machine translation result is obtained by machine-translating the speech recognition result from a first language into a second language. An interpretation device for generating is disclosed.
特開2016-206929号公報JP 2016-206929 A
 一般的に、音声認識結果が文の途中で終わっている場合、当該音声認識結果を機械翻訳すると、誤解を生むような機械翻訳結果を生成してしまうことがある。そこで、認識結果に対してより正確な翻訳を行うことが望まれている。 In general, if the speech recognition result ends in the middle of a sentence, machine translation of the speech recognition result may produce misleading machine translation results. Therefore, it is desired to translate the recognition result more accurately.
 本開示の一側面に係る翻訳装置は、音声認識結果又は文字認識結果の文字列である認識文字列を取得する取得部と、認識文字列の文末を検出する検出部と、認識文字列のうち文末までを翻訳する翻訳部と、を備える。 A translation device according to one aspect of the present disclosure includes an acquisition unit that acquires a recognized character string that is a character string of a speech recognition result or a character recognition result, a detection unit that detects the end of the recognized character string, and and a translation unit for translating up to the end of a sentence.
 このような側面においては、認識文字列のうち検出された文末までが翻訳される。これにより例えば、翻訳対象が文の途中で終わっておらず、文末までの文であるため、認識結果に対してより正確な翻訳を行うことができる。 In this aspect, up to the end of the detected sentence in the recognized character string is translated. As a result, for example, since the translation target does not end in the middle of the sentence, but is the sentence up to the end of the sentence, more accurate translation can be performed for the recognition result.
 本開示の一側面によれば、認識結果に対してより正確な翻訳を行うことができる。 According to one aspect of the present disclosure, more accurate translation can be performed on recognition results.
実施形態に係る翻訳装置を含む翻訳システムのシステム構成の一例を示す図である。It is a figure showing an example of a system configuration of a translation system containing a translation device concerning an embodiment. 実施形態に係る翻訳装置の機能構成の一例を示す図である。It is a figure showing an example of functional composition of a translation device concerning an embodiment. 学習データの一例を示す図である。It is a figure which shows an example of learning data. 学習データの生成方法の一例を示す図である。It is a figure which shows an example of the production|generation method of learning data. 系列ラベリングにおけるラベルが付与されている学習データの一例を示す図である。FIG. 4 is a diagram showing an example of learning data to which labels have been added in series labeling; 実施形態に係る翻訳装置が実行する翻訳処理の一例を示すフローチャートである。4 is a flow chart showing an example of translation processing executed by the translation device according to the embodiment; 実施形態に係る翻訳装置が実行する翻訳処理の別の一例を示すフローチャートである。9 is a flowchart showing another example of translation processing executed by the translation device according to the embodiment; 実施形態に係る翻訳装置が実行する翻訳処理のさらに別の一例を示すフローチャートである。9 is a flowchart showing yet another example of translation processing executed by the translation device according to the embodiment; 実施形態に係る翻訳装置による翻訳の挙動例(その1)を示す図である。FIG. 4 is a diagram showing an example (part 1) of translation behavior by the translation device according to the embodiment; 実施形態に係る翻訳装置による翻訳の挙動例(その2)を示す図である。FIG. 10 is a diagram showing an example (part 2) of behavior of translation by the translation device according to the embodiment; 実施形態に係る翻訳装置による翻訳の挙動例(その3)を示す図である。FIG. 10 is a diagram showing an example (part 3) of translation behavior by the translation device according to the embodiment; 実施形態に係る翻訳装置による翻訳の挙動例(その4)を示す図である。FIG. 11 is a diagram showing an example (part 4) of translation behavior by the translation device according to the embodiment; 従来の翻訳の挙動例(その1)を示す図である。It is a figure which shows the behavior example (1) of the conventional translation. 従来の翻訳の挙動例(その2)を示す図である。It is a figure which shows the behavior example (2) of the conventional translation. 従来の逐次翻訳の挙動例(その1)を示す図である。FIG. 10 is a diagram showing an example (part 1) of behavior of conventional sequential translation; 従来の逐次翻訳の挙動例(その2)を示す図である。FIG. 10 is a diagram showing a behavior example (part 2) of conventional sequential translation; 従来の逐次翻訳の挙動例(その3)を示す図である。FIG. 10 is a diagram showing a behavior example (part 3) of conventional sequential translation; 従来の逐次翻訳の別の挙動例(その1)を示す図である。FIG. 10 is a diagram showing another behavior example (No. 1) of conventional sequential translation; 従来の逐次翻訳の別の挙動例(その2)を示す図である。FIG. 10 is a diagram showing another behavior example (No. 2) of conventional sequential translation; 実施形態に係る翻訳装置による翻訳の対象を示す図である。FIG. 4 is a diagram showing a target of translation by the translation device according to the embodiment; 実施形態に係る翻訳装置で用いられるコンピュータのハードウェア構成の一例を示す図である。It is a figure which shows an example of the hardware constitutions of the computer used with the translation apparatus which concerns on embodiment.
 Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. In the description of the drawings, the same elements are denoted by the same reference signs, and redundant description is omitted. The embodiments described below are specific examples of the present invention, and the present invention is not limited to these embodiments unless otherwise stated.
 FIG. 1 is a diagram showing an example of the system configuration of a translation system 3 including the translation device 1 according to the embodiment. As shown in FIG. 1, the translation system 3 includes the translation device 1 and a recognition device 2. The translation device 1 and the recognition device 2 are communicably connected to each other via a network such as the Internet and can exchange information with each other.
 The translation device 1 is a computer device that translates a recognized character string, which is a character string resulting from speech recognition or character recognition. A speech recognition result is the result of speech recognition, a technology that causes a computer to recognize human speech and convert it into a character string. A character recognition result is the result of character recognition, a technology that causes a computer to recognize images of printed or handwritten text and convert them into character strings. Existing technologies are used for the speech recognition and character recognition in this embodiment. A character string is a sequence of one or more characters. A recognized character string is a character string that includes at least one of a character string resulting from speech recognition and a character string resulting from character recognition.
 Translation means re-expressing a character string expressed in a first language in a second language different from the first language. The first language is, for example, Japanese, but may be any other language. The second language is, for example, English, but may be any other language. The first language and the second language may be different regional dialects (for example, standard Japanese and the Kansai dialect in Japan). The languages are not limited to natural languages and may be artificial languages or formal languages (such as computer programming languages). The translation is, for example, machine translation, that is, automatic translation using a computer. Details of the translation device 1 will be described later.
 The recognition device 2 is a computer device having a function of performing speech recognition or character recognition. For example, the recognition device 2 receives a human voice in real time (as the person speaks), performs speech recognition, and transmits the generated recognized character string to the translation device 1 via the network. As another example, the recognition device 2 receives an image of handwritten text in real time (as the person writes), performs character recognition, and transmits the generated recognized character string to the translation device 1 via the network. Upon receiving the recognized character string, the translation device 1 uses it in the functional blocks described later.
 Note that the functions of the recognition device 2 described above may be incorporated into the translation device 1, and the same processing may be performed by the translation device 1. That is, the translation device 1 may have a function of performing speech recognition or character recognition, perform the speech recognition or character recognition itself, and use the generated recognized character string in the functional blocks of the translation device 1 described later.
 FIG. 2 is a diagram showing an example of the functional configuration of the translation device 1 according to the embodiment. As shown in FIG. 2, the translation device 1 includes a storage unit 10, a learning unit 11, an acquisition unit 12 (acquisition unit), a detection unit 13 (detection unit), and a translation unit 14 (translation unit).
 Each functional block of the translation device 1 is assumed to function within the translation device 1, but this is not restrictive. For example, some of the functional blocks of the translation device 1 may function in a computer device that is different from the translation device 1 and is connected to the translation device 1 via a network, while exchanging information with the translation device 1 as appropriate. Some functional blocks of the translation device 1 may be omitted, a plurality of functional blocks may be integrated into one functional block, and one functional block may be decomposed into a plurality of functional blocks.
 Each function of the translation device 1 shown in FIG. 2 will be described below.
 The storage unit 10 stores arbitrary information used in computations by the translation device 1, the results of those computations, and the like. The information stored in the storage unit 10 may be referred to as appropriate by each function of the translation device 1.
 The storage unit 10 may store a sentence-ending symbol insertion model that, given a character string without sentence delimiters (or sentence-ending symbols), which are symbols that separate sentences, outputs the character string with sentence delimiters (or sentence-ending symbols) inserted. Examples of sentence delimiters in Japanese include 「、」, 「。」, 「!」 and 「?」. For example, given the character string 「さて会議を始めます」 without sentence delimiters, the sentence-ending symbol insertion model outputs the character string 「さて、会議を始めます。」 with sentence delimiters inserted. The sentence-ending symbol insertion model may be generated by existing technology.
 The sentence-ending symbol insertion model may be a trained model obtained by (machine) learning based on learning data consisting of pairs of a character string without sentence delimiters (or sentence-ending symbols) and the corresponding character string with sentence delimiters (or sentence-ending symbols).
 FIG. 3 is a diagram showing an example of learning data. In the learning data shown in FIG. 3, each character string without sentence delimiters is paired with the corresponding character string with sentence delimiters. Although the learning data shown in FIG. 3 consists of all or part of a single sentence, this is not restrictive; each example may be, for instance, all or part of two or more sentences. In the learning data, the character strings without sentence delimiters can be treated as input data, and the character strings with sentence delimiters as teacher data.
 The learning data may be pairs of an extracted character string, which is a partial character string taken from a character string containing sentence delimiters (or sentence-ending symbols), and the character string obtained by removing the sentence delimiters (or sentence-ending symbols) from that extracted character string. The extracted character string may be, for example, a partial character string obtained by segmenting a character string containing sentence delimiters (or sentence-ending symbols) into words and splitting it at a random position.
 FIG. 4 is a diagram showing an example of a method of generating learning data. As shown in FIG. 4, from the original data 「さて、会議を始めます。」, which is a character string with sentence delimiters, four extracted character strings are taken: 「さて、会議を始めます。」, 「さて、」, 「会議を」 and 「始めます。」. Then, for each of the four extracted character strings, the character strings 「さて会議を始めます」, 「さて」, 「会議を」 and 「始めます」 are generated by removing the sentence delimiters. In FIG. 4, the pairs 「さて会議を始めます」/「さて、会議を始めます。」, 「さて」/「さて、」, 「会議を」/「会議を」 and 「始めます」/「始めます。」 constitute the learning data.
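 As a rough illustration of the generation method in FIG. 4, the following minimal Python sketch builds such pairs. The word segmenter `segment_words` is a hypothetical stand-in (word segmentation is existing technology and is not specified by this disclosure), and the splitting strategy is one possible reading of "splitting at a random position", not a definitive implementation.

```python
import random

SENTENCE_DELIMITERS = set("、。!?")

def make_training_pairs(original: str, segment_words, num_splits: int = 3):
    """Generate (input, teacher) learning pairs from one delimited string.

    original:      a character string WITH delimiters, e.g. 「さて、会議を始めます。」
    segment_words: hypothetical word segmenter, str -> list[str]
    """
    words = segment_words(original)
    candidates = [words]  # the whole string is also used as one extracted string
    for _ in range(num_splits):
        # Split the word sequence at a random position into two partial strings.
        cut = random.randrange(1, len(words)) if len(words) > 1 else 1
        candidates.extend([words[:cut], words[cut:]])
    pairs = []
    for chunk in candidates:
        extracted = "".join(chunk)  # teacher data (delimiters kept)
        stripped = "".join(ch for ch in extracted if ch not in SENTENCE_DELIMITERS)
        pairs.append((stripped, extracted))  # (input data, teacher data)
    return pairs
```

 For example, calling `make_training_pairs` on 「さて、会議を始めます。」 can yield pairs such as (「さて会議を始めます」, 「さて、会議を始めます。」) and (「さて」, 「さて、」), matching FIG. 4.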
 Each character string with sentence delimiters (or sentence-ending symbols) included in the learning data may be annotated, for each word constituting the string, with a sequence-labeling label indicating whether a sentence delimiter (or sentence-ending symbol) follows that word. In that case, the character strings without sentence delimiters (or sentence-ending symbols) in the learning data may be segmented into words. By using learning data annotated with sequence-labeling labels, machine learning can be performed as a sequence-labeling task that predicts which sentence delimiter follows which word.
 FIG. 5 is a diagram showing an example of learning data annotated with sequence-labeling labels. In FIG. 5, for example, in the character string with sentence delimiters 「さて、会議を始めます。」, the word 「さて」 is given the label <COMMA>, indicating that a comma comes next; the words 「会議」 and 「を」 are each given the label <O>, indicating that no sentence delimiter comes next; and the word 「始めます」 is given the label <PERIOD>, indicating that a period comes next.
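 The label assignment in FIG. 5 can be sketched as follows. The token sequence and the label names <COMMA>, <O> and <PERIOD> follow the figure; the helper function itself and its name are illustrative assumptions.

```python
LABEL_FOR_DELIMITER = {"、": "<COMMA>", "。": "<PERIOD>"}

def to_sequence_labels(words: list[str]) -> tuple[list[str], list[str]]:
    """Convert an already word-segmented sequence that still contains
    delimiter tokens into (tokens, labels) for sequence labeling.

    Example:
        input:  ["さて", "、", "会議", "を", "始めます", "。"]
        output: (["さて", "会議", "を", "始めます"],
                 ["<COMMA>", "<O>", "<O>", "<PERIOD>"])
    """
    tokens, labels = [], []
    for word in words:
        if word in LABEL_FOR_DELIMITER:
            if tokens:
                # The delimiter follows the previous word, so relabel it.
                labels[-1] = LABEL_FOR_DELIMITER[word]
        else:
            tokens.append(word)
            labels.append("<O>")  # default: no delimiter follows this word
    return tokens, labels
```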
 The learning unit 11 generates the sentence-ending symbol insertion model. More specifically, the learning unit 11 performs (machine) learning based on learning data consisting of pairs of a character string without sentence delimiters (or sentence-ending symbols) and a character string with sentence delimiters (or sentence-ending symbols), and generates the sentence-ending symbol insertion model as a trained model. Alternatively, the learning unit 11 may perform (machine) learning based on any of the types of learning data described above to generate the sentence-ending symbol insertion model. The learning unit 11 may also generate the learning data itself, for example based on the learning data generation method described above.
 The learning unit 11 causes the storage unit 10 to store the generated sentence-ending symbol insertion model. Note that the sentence-ending symbol insertion model stored in the storage unit 10 need not be generated by the learning unit 11; it may instead be a model generated in the same way by another device and acquired via a network.
 The acquisition unit 12 acquires a recognized character string, which is a character string resulting from speech recognition or character recognition. The acquisition unit 12 may receive (acquire) a recognized character string transmitted from the recognition device 2, may acquire a recognized character string generated by a speech recognition or character recognition function provided in the translation device 1, or may acquire a recognized character string stored in advance in the storage unit 10. The acquisition unit 12 may output the acquired recognized character string to the detection unit 13 and the translation unit 14, may cause the storage unit 10 to store it, may display (output) it to the user of the translation device 1 via the communication device 1004 or the output device 1006 described later, or may transmit (output) it to another device.
 The acquisition unit 12 may acquire a recognized character string each time the speech recognition or character recognition function outputs one. For example, each time the speech recognition or character recognition function of the recognition device 2 (or the translation device 1) sequentially outputs a recognized character string, by receiving a human voice in real time (as the person speaks) or an image of handwritten text in real time (as the person writes), the acquisition unit 12 may sequentially acquire that recognized character string. Each time the acquisition unit 12 sequentially acquires a recognized character string, it may sequentially output the acquired string to the detection unit 13 and the translation unit 14, or may cause the storage unit 10 to sequentially store it.
 The detection unit 13 detects (determines) the end of a sentence in the recognized character string. More specifically, the detection unit 13 detects the end of a sentence in the recognized character string acquired (output) by the acquisition unit 12. For example, the detection unit 13 detects whether the recognized character string contains the end of a sentence and, if so, detects the position of the sentence end within the string (for example, the character offset from the beginning of the string). As another example, the detection unit 13 detects the position of the sentence end within the recognized character string (for example, the character offset from the beginning) and, if no sentence end is contained, detects that fact. An existing technique may be used for detecting the end of a sentence in a (recognized) character string. For example, the detection unit 13 may detect the end of a sentence in the recognized character string based on a sentence-ending symbol contained in the string.
 When the recognized character string contains a plurality of (two or more) sentence ends, the detection unit 13 may detect all of them, may detect the sentence end closest to the beginning of the string, may detect the sentence end closest to the end of the string, or may detect a sentence end that satisfies a predetermined criterion. A sketch of these alternatives is shown after this paragraph.
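 One minimal, non-normative reading of the symbol-based detection and of the handling of multiple sentence ends is the following sketch; the symbol set and the function names are illustrative assumptions.

```python
SENTENCE_END_SYMBOLS = ("。", "!", "?")

def find_sentence_ends(recognized: str) -> list[int]:
    """Return the 0-based positions of all sentence-ending symbols
    contained in the recognized character string (empty list: none found)."""
    return [i for i, ch in enumerate(recognized) if ch in SENTENCE_END_SYMBOLS]

def select_sentence_end(positions: list[int], strategy: str = "first"):
    """Pick one sentence end from several candidates, as described above:
    "first" = closest to the beginning, "last" = closest to the end.
    Returns the chosen 0-based index, or None when nothing was detected."""
    if not positions:
        return None
    return positions[0] if strategy == "first" else positions[-1]
```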
 The detection unit 13 may output the detection result to the translation unit 14 or cause the storage unit 10 to store it. The detection result may include, for example, information on the position(s) of the sentence end(s) in the recognized character string and information on whether the recognized character string contains a sentence end.
 The detection unit 13 may detect the end of a sentence in the recognized character string based on an output character string, which is the character string obtained by inputting the recognized character string, with its sentence-ending symbols removed, into the sentence-ending symbol insertion model, the model that, given a character string without sentence-ending symbols, outputs the string with sentence-ending symbols inserted. More specifically, the detection unit 13 detects the end of a sentence in the recognized character string based on the output character string obtained by inputting the recognized character string, with sentence-ending symbols removed, into the sentence-ending symbol insertion model stored in advance in the storage unit 10. For example, based on the output character string 「さて、会議を始めます。」, obtained by inputting 「さて会議を始めます」, which is the recognized character string 「さて会議を。始めます」 with its sentence-ending symbol removed, the detection unit 13 detects that 「す」 in the recognized character string is the end of a sentence (or that the end of a sentence comes immediately after 「す」).
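 A minimal sketch of this model-based detection follows. Here `insert_punctuation` is a hypothetical stand-in for the trained sentence-ending symbol insertion model, and the sketch assumes the model only inserts delimiters without otherwise changing the text; neither the name nor the mapping logic is prescribed by this disclosure.

```python
SENTENCE_END_SYMBOLS = set("。!?")

def detect_end_with_model(recognized: str, insert_punctuation):
    """Detect the first sentence end in `recognized` via the insertion model.

    insert_punctuation: hypothetical stand-in for the trained model
        (str without sentence-end symbols -> str with delimiters inserted).
    Returns the 0-based index in `recognized` of the last character of the
    first sentence, or None if the model inserts no sentence-end symbol.
    """
    # Strip sentence-end symbols, remembering where each kept character
    # came from in the original recognized string.
    kept = [(i, ch) for i, ch in enumerate(recognized)
            if ch not in SENTENCE_END_SYMBOLS]
    stripped = "".join(ch for _, ch in kept)
    output = insert_punctuation(stripped)

    consumed = 0  # how many characters of `stripped` have been matched so far
    for ch in output:
        if ch in SENTENCE_END_SYMBOLS:
            if consumed == 0:
                return None  # end symbol before any content; ignore it
            return kept[consumed - 1][0]  # original index of the sentence's last char
        if ch == "、" and (consumed >= len(stripped) or ch != stripped[consumed]):
            continue  # comma inserted by the model, absent from `stripped`
        consumed += 1
    return None
```

 With the recognized character string 「さて会議を。始めます」 and the model output 「さて、会議を始めます。」, this sketch returns the index of the final 「す」, matching the example above.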
 The detection unit 13 need not detect, as a sentence end, a sentence end located within a predetermined number of characters from the end of the recognized character string. For example, if the recognized character string is 「ありがたいです。けれども」 and the predetermined number of characters is 5, the detection unit 13 does not detect that 「す」 in the recognized character string is the end of a sentence (or that the end of a sentence comes immediately after 「す」).
 The detection unit 13 may instead detect a sentence end only in the portion of the recognized character string excluding the last predetermined number of characters. For example, if the recognized character string is 「ありがたいです。けれども」 and the predetermined number of characters is 6, the detection unit 13 detects a sentence end in 「ありがたいで」, the portion of the recognized character string excluding the last six characters 「す。けれども」 (and as a result no sentence end is detected).
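 The tail rule can be written compactly; this is a sketch under the same 0-based indexing assumption as the sketches above.

```python
def sentence_end_outside_tail(recognized: str, end_index, n: int) -> bool:
    """Return True only when a sentence end was detected and it is NOT
    within the last n characters of the recognized string.

    end_index: 0-based index of the detected sentence-end character,
               or None when no end was detected.
    """
    if end_index is None:
        return False
    # Positions len(recognized) - n .. len(recognized) - 1 count as
    # "within the last n characters" and are ignored.
    return end_index < len(recognized) - n
```

 With 「ありがたいです。けれども」 (12 characters, 「。」 at index 7) and n = 5, the check returns False because the symbol lies in the last five characters, so the sentence end is ignored, as in the example above.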
 When the detection unit 13 detects the end of a sentence in the recognized character string, it may insert a sentence-ending symbol at the detected position in the string. For example, upon detecting that 「す」 in the recognized character string 「さて会議を始めます」 is the end of a sentence, the detection unit 13 inserts the sentence-ending symbol 「。」 immediately after that 「す」, making the recognized character string after insertion 「さて会議を始めます。」. The detection unit 13 may output the recognized character string after insertion to the translation unit 14.
 When the detection unit 13 detects the end of a sentence in the recognized character string, it may replace the recognized character string with the output character string (obtained from the sentence-ending symbol insertion model). For example, when the output character string 「さて、会議を始めます。」 is obtained by inputting 「さて会議を始めます」, which is the recognized character string 「さて会議を。始めます」 with its sentence-ending symbol removed, into the model, the detection unit 13 replaces the recognized character string 「さて会議を。始めます」 with the output character string 「さて、会議を始めます。」. The detection unit 13 may output the recognized character string after replacement to the translation unit 14.
 The translation unit 14 translates the recognized character string (from its beginning) up to the end of the sentence. More specifically, the translation unit 14 translates the recognized character string acquired (output) by the acquisition unit 12 up to the sentence end detected by the detection unit 13 for that string (the sentence end based on the detection result output from the detection unit 13). Existing technology such as machine translation may be used for the translation. The translation unit 14 may display (output) the translation result to the user of the translation device 1 via the communication device 1004 or the output device 1006 described later, may transmit (output) it to another device, or may cause the storage unit 10 to store it.
 Each time the acquisition unit 12 acquires a recognized character string, the translation unit 14 may translate that string up to the sentence end detected by the detection unit 13. That is, each time the acquisition unit 12 acquires a recognized character string, the processing of the detection unit 13 and the translation unit 14 may be executed to translate the string up to the end of the sentence.
 When the detection unit 13 does not detect the end of a sentence in the recognized character string, the translation unit 14 need not translate the string. More specifically, when the detection result output by the detection unit 13 includes information indicating that the recognized character string does not contain the end of a sentence, the translation unit 14 does not translate the string (skips the translation).
 When the sentence end is within a predetermined number of characters from the end of the recognized character string, the translation unit 14 need not translate the string. More specifically, when the sentence end detected by the detection unit 13 for the recognized character string acquired (output) by the acquisition unit 12 (the sentence end based on the detection result output from the detection unit 13) is within a predetermined number of characters from the end of that string, the translation unit 14 need not translate it. For example, if the recognized character string is 「ありがたいです。けれども」, the detected sentence end is 「。」, and the predetermined number of characters is 5, the translation unit 14 does not translate the string because the sentence end 「。」 is within five characters from the end of 「ありがたいです。けれども」.
 The translation unit 14 may translate the recognized character string into which a sentence-ending symbol has been inserted, up to that sentence-ending symbol. For example, for the post-insertion recognized character string 「さて会議を始めます。」 output by the detection unit 13, the translation unit 14 translates 「さて会議を始めます。」 up to the sentence-ending symbol 「。」.
 The translation unit 14 may translate the replaced recognized character string up to the sentence-ending symbol. For example, for the post-replacement recognized character string 「さて、会議を始めます。」 output by the detection unit 13, the translation unit 14 translates 「さて、会議を始めます。」 up to the sentence-ending symbol 「。」.
 The functions of the translation device 1 have been described above.
 Next, some examples of the processing (translation processing) executed by the translation device 1 will be described with reference to FIGS. 6 to 8.
 FIG. 6 is a flowchart showing an example of the translation processing executed by the translation device 1. First, the acquisition unit 12 acquires a recognized character string (step S1). Next, the detection unit 13 detects the end of a sentence in the recognized character string acquired in S1 (step S2). Next, the translation unit 14 translates the recognized character string acquired in S1 up to the sentence end detected in S2 (step S3).
 FIG. 7 is a flowchart showing another example (sequential processing) of the translation processing executed by the translation device 1. First, the acquisition unit 12 acquires (attempts to acquire) a recognized character string (step S10). Next, the acquisition unit 12 (or the translation device 1) determines whether a recognized character string was acquired in S10 (step S11). If it is determined in S11 that no string was acquired (S11: NO), the processing returns to S10. If it is determined in S11 that a string was acquired (S11: YES), the detection unit 13 detects (attempts to detect) the end of a sentence in the recognized character string acquired in S10 (step S12). Next, the detection unit 13 (or the translation device 1) determines whether a sentence end was detected in S12 (step S13). If it is determined in S13 that no sentence end was detected (S13: NO), the processing returns to S10 (without translation by the translation unit 14). If it is determined in S13 that a sentence end was detected (S13: YES), the translation unit 14 translates the recognized character string acquired in S10 up to the sentence end detected in S12 (step S14), and the processing returns to S10.
 FIG. 8 is a flowchart showing yet another example (sequential processing with tail determination) of the translation processing executed by the translation device 1. Steps S20 to S23 are the same as steps S10 to S13 in FIG. 7, and their description is omitted. If it is determined in S23 that a sentence end was detected (S23: YES), the detection unit 13 or the translation unit 14 determines whether the sentence end detected in S22 is outside the last n characters (where n is an integer of 1 or more) of the recognized character string acquired in S20 (step S24). If it is determined in S24 that the sentence end is within the last n characters (S24: NO), the processing returns to S20 (without translation by the translation unit 14). If it is determined in S24 that the sentence end is not within the last n characters (S24: YES), the translation unit 14 translates the recognized character string acquired in S20 up to the sentence end detected in S22 (step S25), and the processing returns to S20.
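 Putting the pieces together, the loop of FIG. 8 (S20 to S25) might look like the following sketch. Here `try_acquire`, `detect_sentence_end` and `translate` are hypothetical stand-ins for the acquisition unit 12, the detection unit 13 and the translation unit 14 respectively; they are not APIs defined by this disclosure.

```python
def run_sequential_translation(try_acquire, detect_sentence_end, translate, n: int = 5):
    """Sequential translation loop corresponding to FIG. 8 (steps S20 to S25).

    try_acquire:         hypothetical; returns the latest recognized string or None
    detect_sentence_end: hypothetical; returns a 0-based sentence-end index or None
    translate:           hypothetical; machine-translates a character string
    """
    while True:
        recognized = try_acquire()                   # S20
        if recognized is None:                       # S21: NO -> acquire again
            continue
        end_index = detect_sentence_end(recognized)  # S22
        if end_index is None:                        # S23: NO -> skip translation
            continue
        if end_index >= len(recognized) - n:         # S24: within last n characters
            continue                                 # NO -> skip translation
        sentence = recognized[: end_index + 1]       # up to the detected sentence end
        print(translate(sentence))                   # S25: translate and output
```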
 Next, examples of the translation behavior of the translation device 1 will be described with reference to FIGS. 9 to 12. As a premise, in each figure the upper row of the balloon shows the recognized character string acquired and displayed by the acquisition unit 12, and the lower row shows the translation result (translated character string) produced and displayed by the translation unit 14. A translation result of 「…」 indicates that translation is pending (waiting).
 FIG. 9 is a diagram showing an example (part 1) of the translation behavior of the translation device 1. As shown in FIG. 9, the acquisition unit 12 sequentially acquires and displays recognized character strings. Specifically, the acquisition unit 12 first displays the acquired recognized character string 「むかし、」 (first state), then displays 「むかし、むかし、」 (second state), and then displays 「むかし、むかし、あるところに、おじいさんとおばあさん」 (third state). In each state, the detection unit 13 attempts to detect the end of a sentence, but since no sentence end is detected, no translation is performed by the translation unit 14.
 FIG. 10 is a diagram showing an example (part 2) of the translation behavior of the translation device 1, continuing from the point circled A in FIG. 9. As shown in FIG. 10, the acquisition unit 12 displays the acquired recognized character string 「むかし、むかし、あるところに、おじいさんとおばあさんが積んでいました。おじい」 (fourth state). In the fourth state as well, the detection unit 13 attempts to detect the end of a sentence; however, because a sentence end within five characters from the end of the recognized character string (the sentence-ending symbol 「。」 in FIG. 10) is not detected as a sentence end, no sentence end is detected, and the translation unit 14 still performs no translation.
 FIG. 11 is a diagram showing an example (part 3) of the translation behavior of the translation device 1, continuing from the point circled B in FIG. 10. As shown in FIG. 11, the acquisition unit 12 additionally displays the acquired recognized character string 「むかし、むかし、あるところに、おじいさんとおばあさんが積んでいました。おじいさん」 (fifth state). In the fifth state, the detection unit 13 detects the end of a sentence (the sentence-ending symbol 「。」 in FIG. 11) because the sentence end now lies more than five characters from the end of the recognized character string, so the translation unit 14 performs translation. Specifically, of the current recognized character string 「むかし、むかし、あるところに、おじいさんとおばあさんが積んでいました。おじいさん」, the translation unit 14 translates up to the end of the sentence, 「むかし、むかし、あるところに、おじいさんとおばあさんが積んでいました。」, and displays the translated character string "Once upon a time, there was an old man and an old woman who were piling up." in the lower row.
 FIG. 12 is a diagram showing an example (part 4) of the translation behavior of the translation device 1, continuing from the point circled C in FIG. 11. As shown in FIG. 12, the acquisition unit 12 displays the acquired recognized character string 「むかし、むかし、あるところに、おじいさんとおばあさんが住んでいました。おじいさんは山へ」 (sixth state). Note that the portion erroneously recognized (on the recognition side) as 「積んで」 ("piling up") in the fourth and fifth states is correctly recognized as 「住んで」 ("living") in the sixth state. In the sixth state, as in the fifth state, the detection unit 13 detects the end of a sentence, so the translation unit 14 performs translation. Specifically, of the current recognized character string 「むかし、むかし、あるところに、おじいさんとおばあさんが住んでいました。おじいさんは山へ」, the translation unit 14 translates up to the end of the sentence, 「むかし、むかし、あるところに、おじいさんとおばあさんが住んでいました。」, and displays the translated character string "Once upon a time, there lived an old man and an old woman." in the lower row.
 Next, the operation and effects of the translation device 1 according to the embodiment will be described.
 The translation device 1 includes the acquisition unit 12 that acquires a recognized character string, which is a character string resulting from speech recognition or character recognition, the detection unit 13 that detects the end of a sentence in the recognized character string, and the translation unit 14 that translates the recognized character string up to the end of the sentence. With this configuration, for example, since the translation target does not end in the middle of a sentence but runs to the end of a sentence, the recognition result can be translated more accurately.
 According to the translation device 1, the acquisition unit 12 may acquire a recognized character string each time the speech recognition or character recognition function outputs one, and each time the acquisition unit 12 acquires a recognized character string, the translation unit 14 may translate it up to the sentence end detected by the detection unit 13. With this configuration, for example, translation is performed each time a recognized character string is output (provided a sentence end is detected), so translation results can be obtained at an early timing.
 According to the translation device 1, when the detection unit 13 does not detect the end of a sentence in the recognized character string, the translation unit 14 need not translate the string. With this configuration, for example, recognized character strings that end mid-sentence are not translated, which prevents the generation of misleading translation results.
 According to the translation device 1, the detection unit 13 need not detect, as a sentence end, a sentence end within a predetermined number of characters from the end of the recognized character string. This configuration prevents translation based on an erroneously recognized sentence end and thereby prevents the generation of misleading translation results. For example, when speech recognition has only reached the 「です」 part of the utterance 「ですけれども」, this prevents 「です」 from being detected as the end of a sentence even though 「ですけれども」 is originally a single expression.
 According to the translation device 1, the detection unit 13 may detect a sentence end only in the portion of the recognized character string excluding the last predetermined number of characters. As above, this configuration prevents translation based on an erroneously recognized sentence end and thereby prevents the generation of misleading translation results.
 According to the translation device 1, the translation unit 14 need not translate the recognized character string when the sentence end is within a predetermined number of characters from the end of the string. As above, this configuration prevents translation based on an erroneously recognized sentence end and thereby prevents the generation of misleading translation results.
 According to the translation device 1, when the detection unit 13 detects the end of a sentence in the recognized character string, it may insert a sentence-ending symbol at the detected position, and the translation unit 14 may translate the string up to that inserted sentence-ending symbol. This configuration makes translation up to the end of the sentence more reliable. In addition, the recognized character string with sentence-ending symbols inserted exactly as translated can be used effectively in subsequent processing.
 According to the translation device 1, the detection unit 13 may detect the end of a sentence in the recognized character string based on the output character string, which is obtained by inputting the recognized character string, with its sentence-ending symbols removed, into the sentence-ending symbol insertion model that, given a character string without sentence-ending symbols, outputs the string with sentence-ending symbols inserted. With this configuration, processing can be performed based on a more accurate output character string corrected by the sentence-ending symbol insertion model, so more accurate processing can be performed.
 According to the translation device 1, when the detection unit 13 detects the end of a sentence in the recognized character string, it may replace the string with the output character string, and the translation unit 14 may translate the replaced string up to the sentence-ending symbol. This configuration makes translation up to the end of the sentence more reliable, and the recognized character string with sentence-ending symbols inserted exactly as translated can be used effectively in subsequent processing.
 According to the translation device 1, the sentence-ending symbol insertion model may be a trained model trained on learning data consisting of pairs of a character string without sentence-ending symbols and a character string with sentence-ending symbols. This configuration makes it possible to more reliably generate a sentence-ending symbol insertion model that produces more accurate output.
 The translation device 1 is an easy-to-understand speech translation control device that applies end-of-sentence determination technology. The translation method performed by the translation device 1 is an easy-to-understand speech translation control method that applies end-of-sentence determination technology.
 In a system that translates speech (including handwritten characters and the like; the same applies hereinafter) in real time, how the speech (including text and the like; the same applies hereinafter) recognition results and translation results are presented is important for the user's ease of understanding. If the translation result is output only after the speech recognition result has been finalized by a silent interval, the translation result is output late for long utterances, and the user cannot keep up with the speech (see FIGS. 13 and 14).
 FIG. 13 is a diagram showing an example (part 1) of the behavior of conventional translation, and FIG. 14 is a diagram showing an example (part 2). FIGS. 13 and 14 show example behavior when translation results are output per silent interval. As shown in FIGS. 13 and 14, the translation results appear all at once when the speech recognition result is finalized, which makes them hard for the user to read. In a presentation, for example, the material has already advanced to the next page, and the user cannot keep up.
 On the other hand, if the speech recognition results that are continually updated during recognition are translated and output as they change, the translation results are updated frequently and are hard to read, and translating mid-sentence can cause misunderstanding (see FIGS. 15 to 17).
 FIG. 15 is a diagram showing an example (part 1) of the behavior of conventional sequential translation; FIG. 16 shows an example (part 2); and FIG. 17 shows an example (part 3). FIGS. 15 to 17 show example behavior when translation results are output sequentially. As shown in FIGS. 15 to 17, the translation results flicker and change, making them hard for the user to read. Moreover, since translation is performed mid-sentence, it causes misunderstanding. For example, as shown in FIG. 16, translating 「何と言っても」 only up to the 「なんと」 part produces a translation that is misread as expressing surprise.
 Furthermore, since speech recognition results that have already been output may change during recognition, the translation results may change accordingly, and translation during recognition can cause misunderstanding (see FIGS. 18 and 19).
 FIG. 18 is a diagram showing another example (part 1) of the behavior of conventional sequential translation, and FIG. 19 shows another example (part 2). FIGS. 18 and 19 show example behavior when translation results are output sequentially. As shown in FIGS. 18 and 19, the speech recognition result changes, and the translation result changes greatly along with it. This is hard for the user to read and also causes misunderstanding.
 According to the translation device 1, translation results are output in units of meaningful sentences, at the moment the speech recognition result is finalized as a meaningful sentence. By performing sequential end-of-sentence determination on the sequential speech recognition results, independently of silent intervals and while also accounting for changes in already-output recognition results during recognition, translation results are output early, and no user misunderstanding arises from translating mid-sentence. By performing the above processing, easy-to-understand speech translation can be realized. FIG. 20 is a diagram showing targets of translation by the translation device 1. As shown in FIG. 20, the translation device 1 translates up to the end of the sentence.
 According to the translation device 1, punctuation processing is sequentially applied to the intermediate speech recognition results, and if a sentence-ending symbol (such as 「。」) is included in the text, the sentence up to the sentence-ending symbol is machine-translated and output. That is, the translation device 1 sequentially applies punctuation processing to the intermediate speech recognition results and inserts punctuation; if the text contains a sentence-ending symbol (such as 「。」), the sentence up to that symbol is machine-translated and provisionally output. As a result, translation results can be produced quickly in units whose meaning is clear.
 According to the translation device 1, if the position of the sentence-ending symbol is within the last n characters (for example, 5 characters) of the intermediate speech recognition result, it is not determined to be the end of a sentence. This eliminates erroneous end-of-sentence determinations during speech recognition. Note that sentence-ending symbols are added by the punctuation processing and are used to fix the units of sentences submitted for translation.
 In the translation device 1, by applying punctuation processing to the sequential speech recognition results and translating up to the sentence end only when it is not within the last n characters, erroneous recognition of sentence ends is reduced. In the conventional technology, for example, when speech recognition has only reached the 「です」 part of the utterance 「ですけれども」, the end of the sentence is often misrecognized.
 According to the translation device 1, sequential punctuation processing is performed (only on the speech recognition results, not on the translation results), and translation is performed up to the sentence-ending symbol. In addition, the sequential speech recognition results are output up to the sentence-ending symbol (for example, the part ending with 「。」).
 According to the translation device 1, punctuation processing is performed each time the speech recognition result is updated, and the text up to the end of the sentence is translated and output. The last n characters are ignored when determining the end of a sentence.
 According to the translation device 1, for each sequential speech recognition result, if the result of applying punctuation processing to the speech recognition result contains a period, the text up to the period is translated; otherwise, no translation is performed. The translation result is output. The last n characters are taken into account.
 According to the translation device 1, punctuation processing is performed each time the speech recognition result is updated, the text up to the end of the sentence is translated and output (the translation result is also updated, i.e., overwritten, each time), and when determining the end of a sentence, a sentence-ending symbol within the last n characters is not determined to be the end of a sentence. That is, the last n characters are taken into account because unintended erroneous determinations of the ending occur during speech recognition.
 According to the translation device 1, punctuation processing is applied to the "sequential" speech recognition results, and translation is performed up to the sentence end provided it is not within the last n characters.
 According to the translation device 1, the following processing continues to be executed during speech recognition:
・The speech recognition result is updated (approximately every 0.2 seconds).
・Punctuation processing is executed and the end of the sentence is determined.
・Translation is executed and the result is output.
 According to the translation device 1, changes are observed by continuously performing recognition and determination and by continuously updating the translation results, and changes are handled by keeping the output provisional until the speech recognition result is finalized.
 According to the translation device 1, the text up to the sequentially determined sentence end is submitted for translation.
 Note that the block diagrams used in the description of the above embodiment show blocks in functional units. These functional blocks (components) are realized by any combination of at least one of hardware and software. The method of realizing each functional block is not particularly limited. That is, each functional block may be realized using one physically or logically coupled device, or using two or more physically or logically separate devices connected directly or indirectly (for example, by wire or wirelessly). A functional block may be realized by combining software with the one device or the plurality of devices.
 Functions include, but are not limited to, judging, determining, deciding, calculating, computing, processing, deriving, investigating, searching, confirming, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, regarding, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating (mapping), and assigning. For example, a functional block (component) that performs transmission is called a transmitting unit or a transmitter. In any case, as described above, the method of realization is not particularly limited.
 For example, the translation device 1 according to an embodiment of the present disclosure may function as a computer that performs the processing of the translation method of the present disclosure. FIG. 21 is a diagram showing an example of the hardware configuration of the translation device 1 according to an embodiment of the present disclosure. The translation device 1 described above may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.
 In the following description, the term "device" can be read as a circuit, a device, a unit, or the like. The hardware configuration of the translation device 1 may be configured to include one or more of the devices shown in the figure, or may be configured without including some of the devices.
 Each function of the translation device 1 is realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, whereby the processor 1001 performs computation, controls communication by the communication device 1004, and controls at least one of reading and writing of data in the memory 1002 and the storage 1003.
 The processor 1001 controls the entire computer by, for example, running an operating system. The processor 1001 may be configured by a central processing unit (CPU) including an interface with peripheral devices, a control device, an arithmetic device, registers, and the like. For example, the learning unit 11, the acquisition unit 12, the detection unit 13, the translation unit 14, and the like described above may be realized by the processor 1001.
 The processor 1001 also reads programs (program code), software modules, data, and the like from at least one of the storage 1003 and the communication device 1004 into the memory 1002 and executes various kinds of processing according to them. As the program, a program that causes a computer to execute at least part of the operations described in the above embodiment is used. For example, the learning unit 11, the acquisition unit 12, the detection unit 13, and the translation unit 14 may be realized by a control program stored in the memory 1002 and running on the processor 1001, and the other functional blocks may be realized in the same manner. Although the various kinds of processing described above have been explained as being executed by one processor 1001, they may be executed simultaneously or sequentially by two or more processors 1001. The processor 1001 may be implemented by one or more chips. The program may be transmitted from a network via a telecommunication line.
 The memory 1002 is a computer-readable recording medium and may be configured by at least one of, for example, a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), and a RAM (Random Access Memory). The memory 1002 may also be called a register, a cache, a main memory (main storage device), or the like. The memory 1002 can store executable programs (program code), software modules, and the like for implementing a wireless communication method according to an embodiment of the present disclosure.
 The storage 1003 is a computer-readable recording medium and may be configured by at least one of, for example, an optical disc such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (for example, a compact disc, a digital versatile disc, or a Blu-ray (registered trademark) disc), a smart card, a flash memory (for example, a card, a stick, or a key drive), a floppy (registered trademark) disk, and a magnetic strip. The storage 1003 may also be called an auxiliary storage device. The above-mentioned storage medium may be, for example, a database, a server, or another appropriate medium including at least one of the memory 1002 and the storage 1003.
 The communication device 1004 is hardware (a transmission/reception device) for performing communication between computers via at least one of a wired network and a wireless network, and is also referred to as, for example, a network device, a network controller, a network card, or a communication module. The communication device 1004 may include a high-frequency switch, a duplexer, a filter, a frequency synthesizer, and the like in order to realize, for example, at least one of frequency division duplex (FDD) and time division duplex (TDD). For example, the learning unit 11, the acquisition unit 12, the detection unit 13, the translation unit 14, and the like described above may be realized by the communication device 1004.
 The input device 1005 is an input device (for example, a keyboard, a mouse, a microphone, a switch, a button, or a sensor) that receives input from the outside. The output device 1006 is an output device (for example, a display, a speaker, or an LED lamp) that performs output to the outside. The input device 1005 and the output device 1006 may be integrated (for example, as a touch panel).
 The devices such as the processor 1001 and the memory 1002 are connected by a bus 1007 for communicating information. The bus 1007 may be configured using a single bus, or may be configured using different buses between the devices.
 The translation device 1 may also include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), and an FPGA (Field Programmable Gate Array), and part or all of each functional block may be realized by such hardware. For example, the processor 1001 may be implemented using at least one of these pieces of hardware.
 Notification of information is not limited to the aspects/embodiments described in the present disclosure, and may be performed by other methods.
 Each aspect/embodiment described in the present disclosure may be applied to at least one of systems using LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G (4th generation mobile communication system), 5G (5th generation mobile communication system), FRA (Future Radio Access), NR (New Radio), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi (registered trademark)), IEEE 802.16 (WiMAX (registered trademark)), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), or other appropriate systems, and to next-generation systems extended on the basis of these. A plurality of systems may also be applied in combination (for example, a combination of at least one of LTE and LTE-A with 5G).
 The order of the processing procedures, sequences, flowcharts, and the like of each aspect/embodiment described in the present disclosure may be changed as long as no contradiction arises. For example, the methods described in the present disclosure present the elements of the various steps in an example order and are not limited to the specific order presented.
 Input and output information and the like may be stored in a specific location (for example, a memory) or may be managed using a management table. Input and output information and the like can be overwritten, updated, or appended. Output information and the like may be deleted. Input information and the like may be transmitted to another device.
 The determination may be made by a value represented by one bit (0 or 1), by a Boolean value (true or false), or by a numerical comparison (for example, a comparison with a predetermined value).
 Each aspect/embodiment described in the present disclosure may be used alone, may be used in combination, or may be switched in accordance with execution. In addition, notification of predetermined information (for example, notification of "being X") is not limited to being performed explicitly, and may be performed implicitly (for example, by not notifying the predetermined information).
 Although the present disclosure has been described in detail above, it is apparent to those skilled in the art that the present disclosure is not limited to the embodiments described in the present disclosure. The present disclosure can be implemented with modifications and variations without departing from the spirit and scope of the present disclosure as defined by the claims. Accordingly, the description of the present disclosure is for illustrative purposes and has no restrictive meaning with respect to the present disclosure.
 Software, whether referred to as software, firmware, middleware, microcode, hardware description language, or by any other name, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, and the like.
 Software, instructions, information, and the like may also be transmitted and received via a transmission medium. For example, when software is transmitted from a website, a server, or another remote source using at least one of wired technology (coaxial cable, optical fiber cable, twisted pair, digital subscriber line (DSL), etc.) and wireless technology (infrared, microwave, etc.), at least one of these wired and wireless technologies is included within the definition of a transmission medium.
 The information, signals, and the like described in the present disclosure may be represented using any of a variety of different technologies. For example, data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, light fields or photons, or any combination thereof.
 The terms described in the present disclosure and the terms necessary for understanding the present disclosure may be replaced with terms having the same or similar meanings.
 The terms "system" and "network" used in the present disclosure are used interchangeably.
 The information, parameters, and the like described in the present disclosure may be expressed using absolute values, may be expressed using values relative to a predetermined value, or may be expressed using other corresponding information.
 The names used for the parameters described above are not restrictive in any respect. Furthermore, the formulas and the like using these parameters may differ from those expressly disclosed in the present disclosure.
 As used in the present disclosure, the terms "judgment (determining)" and "decision (determining)" may encompass a wide variety of operations. "Judgment" and "decision" can include, for example, regarding judging, calculating, computing, processing, deriving, investigating, looking up, searching, or inquiring (for example, looking up in a table, a database, or another data structure), or ascertaining as having "judged" or "decided". They can also include regarding receiving (for example, receiving information), transmitting (for example, transmitting information), input, output, or accessing (for example, accessing data in memory) as having "judged" or "decided", and regarding resolving, selecting, choosing, establishing, comparing, or the like as having "judged" or "decided". In other words, "judgment" and "decision" can include regarding some operation as having been "judged" or "decided". "Judgment (decision)" may also be read as "assuming", "expecting", "considering", or the like.
 The terms "connected" and "coupled", and any variations thereof, mean any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. The coupling or connection between elements may be physical, logical, or a combination thereof. For example, "connected" may be read as "accessed". As used in the present disclosure, two elements can be considered to be "connected" or "coupled" to each other using at least one of one or more electric wires, cables, and printed electrical connections, and, as some non-limiting and non-exhaustive examples, using electromagnetic energy having wavelengths in the radio frequency, microwave, and optical (both visible and invisible) regions.
 The description "based on" used in the present disclosure does not mean "based only on" unless otherwise specified. In other words, the description "based on" means both "based only on" and "based at least on".
 Any reference to elements using designations such as "first" and "second" used in the present disclosure does not generally limit the quantity or order of those elements. These designations may be used in the present disclosure as a convenient way of distinguishing between two or more elements. Accordingly, a reference to first and second elements does not mean that only two elements can be employed or that the first element must precede the second element in some way.
 "Means" in the configuration of each device described above may be replaced with "unit", "circuit", "device", or the like.
 Where "include", "including", and variations thereof are used in the present disclosure, these terms are intended to be inclusive, in the same manner as the term "comprising". Furthermore, the term "or" as used in the present disclosure is not intended to be an exclusive OR.
 In the present disclosure, when articles such as a, an, and the in English are added by translation, the present disclosure may include the case where a noun following such an article is plural.
 In the present disclosure, the term "A and B are different" may mean "A and B are different from each other". The term may also mean "A and B are each different from C". Terms such as "separated" and "coupled" may also be interpreted in the same manner as "different".
 Reference Signs List: 1…translation device; 2…recognition device; 3…translation system; 10…storage unit; 11…learning unit; 12…acquisition unit; 13…detection unit; 14…translation unit; 1001…processor; 1002…memory; 1003…storage; 1004…communication device; 1005…input device; 1006…output device; 1007…bus.

Claims (10)

  1. A translation device comprising:
     an acquisition unit that acquires a recognized character string that is a character string of a speech recognition result or a character recognition result;
     a detection unit that detects a sentence end of the recognized character string; and
     a translation unit that translates the recognized character string up to the sentence end.
  2. The translation device according to claim 1, wherein
     the acquisition unit acquires the recognized character string each time a function that performs speech recognition or character recognition outputs the recognized character string, and
     the translation unit, each time the acquisition unit acquires the recognized character string, translates the recognized character string up to the sentence end of that recognized character string detected by the detection unit.
  3. The translation device according to claim 1 or 2, wherein the translation unit does not translate the recognized character string when the detection unit does not detect a sentence end of the recognized character string.
  4. The translation device according to any one of claims 1 to 3, wherein the detection unit does not detect, as a sentence end, a sentence end located within a predetermined number of characters from the end of the recognized character string.
  5. The translation device according to any one of claims 1 to 4, wherein the detection unit detects a sentence end in the portion of the recognized character string excluding the last predetermined number of characters.
  6. The translation device according to any one of claims 1 to 5, wherein the translation unit does not translate the recognized character string when the sentence end is within a predetermined number of characters from the end of the recognized character string.
  7. The translation device according to any one of claims 1 to 6, wherein the detection unit, upon detecting a sentence end of the recognized character string, inserts a sentence-end symbol at the detected position in the recognized character string, and the translation unit translates the recognized character string into which the sentence-end symbol has been inserted, up to the sentence-end symbol.
  8. The translation device according to any one of claims 1 to 7, wherein the detection unit detects the sentence end of the recognized character string based on an output character string, the output character string being obtained by inputting the recognized character string, from which sentence-end symbols have been removed, into a sentence-end symbol insertion model that, when given a character string without sentence-end symbols, outputs the character string with sentence-end symbols inserted.
  9. The translation device according to claim 8, wherein the detection unit, upon detecting the sentence end of the recognized character string, replaces the recognized character string with the output character string, and the translation unit translates the replaced recognized character string up to the sentence-end symbol.
  10. The translation device according to claim 8 or 9, wherein the sentence-end symbol insertion model is a trained model trained on learning data consisting of pairs of a character string without sentence-end symbols and a character string with sentence-end symbols.
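To make the sentence-end symbol insertion model of claims 8 to 10 concrete, here is a minimal sketch under stated assumptions: `model` is a hypothetical callable standing in for the insertion model (the claims do not fix an architecture), and the training pairs merely illustrate the shape of the learning data described in claim 10.

```python
SENTENCE_END = "。"

# Claim 10: learning data is pairs of (character string without sentence-end
# symbols, the same string with sentence-end symbols). Illustrative examples:
training_pairs = [
    ("今日は晴れです明日は雨です", "今日は晴れです。明日は雨です。"),
    ("こんにちは元気ですか", "こんにちは。元気ですか。"),
]

def detect_and_replace(recognized: str, model) -> str:
    """Claims 8 and 9: remove any sentence-end symbols from the recognized
    string, run the insertion model on it, and replace the recognized string
    with the model output; translation then proceeds up to the last inserted
    sentence-end symbol."""
    stripped = recognized.replace(SENTENCE_END, "")
    return model(stripped)  # the model returns the string with "。" inserted
```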
PCT/JP2022/043979 2022-01-13 2022-11-29 Translation device WO2023135963A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022003548 2022-01-13
JP2022-003548 2022-01-13

Publications (1)

Publication Number Publication Date
WO2023135963A1 (en)

Family

ID=87278948

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/043979 WO2023135963A1 (en) 2022-01-13 2022-11-29 Translation device

Country Status (1)

Country Link
WO (1) WO2023135963A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020024277A (en) * 2018-08-07 2020-02-13 国立研究開発法人情報通信研究機構 Data segmentation device


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 22920483; Country of ref document: EP; Kind code of ref document: A1
WWE WIPO information: entry into national phase
    Ref document number: 2023573892; Country of ref document: JP