WO2023135963A1 - Translation device - Google Patents

Translation device

Info

Publication number
WO2023135963A1
Authority
WO
WIPO (PCT)
Prior art keywords
character string
sentence
translation
recognized character
unit
Prior art date
Application number
PCT/JP2022/043979
Other languages
French (fr)
Japanese (ja)
Inventor
謙吾 竹谷
憲卓 岡本
心語 郭
Original Assignee
株式会社Nttドコモ
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社Nttドコモ filed Critical 株式会社Nttドコモ
Publication of WO2023135963A1 publication Critical patent/WO2023135963A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Definitions

  • One aspect of the present disclosure relates to a translation device that translates a character string of speech recognition results or character recognition results.
  • Patent Document 1 (JP 2016-206929 A) discloses an interpretation device that generates a speech recognition result by performing speech recognition processing on an input uttered voice, and generates a machine translation result by machine-translating the speech recognition result from a first language into a second language.
  • A translation device according to one aspect of the present disclosure includes an acquisition unit that acquires a recognized character string, which is a character string of a speech recognition result or a character recognition result, a detection unit that detects the sentence end of the recognized character string, and a translation unit that translates the recognized character string up to the sentence end.
  • According to this aspect, recognition results can be translated more accurately.
  • FIG. 9 is a diagram showing an example (part 1) of translation behavior by the translation device according to the embodiment.
  • FIG. 10 is a diagram showing an example (part 2) of translation behavior by the translation device according to the embodiment.
  • FIG. 11 is a diagram showing an example (part 3) of translation behavior by the translation device according to the embodiment.
  • FIG. 12 is a diagram showing an example (part 4) of translation behavior by the translation device according to the embodiment.
  • FIG. 13 is a diagram showing an example (part 1) of conventional translation behavior.
  • FIG. 14 is a diagram showing an example (part 2) of conventional translation behavior.
  • FIG. 15 is a diagram showing an example (part 1) of conventional sequential translation behavior.
  • FIG. 16 is a diagram showing an example (part 2) of conventional sequential translation behavior.
  • FIG. 17 is a diagram showing an example (part 3) of conventional sequential translation behavior.
  • FIG. 18 is a diagram showing another example (part 1) of conventional sequential translation behavior.
  • FIG. 19 is a diagram showing another example (part 2) of conventional sequential translation behavior.
  • FIG. 20 is a diagram showing the target of translation by the translation device according to the embodiment.
  • FIG. 21 is a diagram showing an example of the hardware configuration of a computer used in the translation device according to the embodiment.
  • FIG. 1 is a diagram showing an example of the system configuration of the translation system 3 including the translation device 1 according to the embodiment.
  • the translation system 3 includes a translation device 1 and a recognition device 2 .
  • the translation device 1 and the recognition device 2 are connected for communication with each other via a network such as the Internet, and can exchange information with each other.
  • the translation device 1 is a computer device that translates a recognized character string, which is a character string resulting from speech recognition or character recognition.
  • a voice recognition result is a result of voice recognition.
  • Speech recognition is a technology that allows a computer to recognize human voices and convert them into character strings.
  • a character recognition result is a result of character recognition.
  • Character recognition is a technique for making a computer recognize images of printed characters or handwritten text and convert them into character strings. Existing technologies are used for speech recognition and character recognition in this embodiment.
  • a string is a set of one or more characters.
  • a recognized character string is a character string including at least one of a character string resulting from speech recognition and a character string resulting from character recognition.
  • Translation means replacing a character string expressed in a first language with a second language that is different from the first language.
  • the first language is, for example, Japanese, but may be any other language.
  • the second language is, for example, English, but may be any other language.
  • the first language and the second language may be different local dialects (for example, standard Japanese and Kansai dialect in Japan).
  • the language is not limited to natural language, but may be artificial language or formal language (such as computer programming language).
  • the translation is, for example, machine translation, which is automatic translation using a computer. The details of the translation device 1 will be described later.
  • the recognition device 2 is a computer device equipped with a function to perform voice recognition or character recognition.
  • For example, the recognition device 2 inputs a human voice in real time (as the person speaks), performs speech recognition, and transmits the generated recognized character string to the translation device 1 via the network.
  • the recognition device 2 inputs an image of handwritten text in real time (as a person handwrites the text), performs character recognition, and transmits the generated recognized character string to the translation device 1 via the network.
  • the translation device 1 uses the recognized character string in a function block described later.
  • Note that the functions of the recognition device 2 described above may be incorporated into the translation device 1, and the same processing may be performed in the translation device 1. That is, the translation device 1 may itself have a function of performing speech recognition or character recognition, the speech recognition or character recognition may be performed by the translation device 1, and the generated recognized character string may be used in the function blocks of the translation device 1 described later.
  • FIG. 2 is a diagram showing an example of the functional configuration of the translation device 1 according to the embodiment.
  • the translation device 1 includes a storage unit 10, a learning unit 11, an acquisition unit 12 (acquisition unit), a detection unit 13 (detection unit), and a translation unit 14 (translation unit).
  • Each functional block of the translation device 1 is assumed to function within the translation device 1, but is not limited to this.
  • For example, some of the functional blocks of the translation device 1 may run on a computer device different from the translation device 1 and connected to it via a network, functioning while transmitting and receiving information to and from the translation device 1 as appropriate.
  • Also, some functional blocks of the translation device 1 may be omitted, a plurality of functional blocks may be integrated into one functional block, or one functional block may be decomposed into a plurality of functional blocks.
  • the storage unit 10 stores arbitrary information used in calculations in the translation device 1, calculation results in the translation device 1, and the like.
  • the information stored by the storage unit 10 may be referred to by each function of the translation device 1 as appropriate.
  • the storage unit 10 stores a sentence ending symbol insertion model for outputting a character string in which a sentence delimiting symbol (or a sentence ending symbol) is inserted when a character string without a sentence delimiting symbol (or a sentence ending symbol) is input.
  • Sentence delimiters in Japanese include "、", "。", "!", and "?".
  • When a character string without a sentence delimiter is input to the sentence-end symbol insertion model, a character string with a sentence delimiter inserted is output.
  • the sentence ending insertion model may be generated by existing technology.
  • The sentence-end symbol insertion model may be a trained model that has been (machine) learned based on learning data consisting of pairs of character strings without sentence delimiters (or sentence-end symbols) and character strings with sentence delimiters (or sentence-end symbols).
  • FIG. 3 is a diagram showing an example of learning data.
  • character strings without sentence delimiters and character strings with sentence delimiters are associated as pairs.
  • The learning data shown in FIG. 3 exemplifies all or part of one sentence, but the learning data is not limited to this; for example, it may be all or part of two or more sentences.
  • character strings without sentence delimiters can be regarded as input data
  • character strings with sentence delimiters can be regarded as teacher data.
  • The learning data may be a pair consisting of an extracted character string, which is a part taken from a character string with sentence delimiters (or sentence-end symbols), and the character string obtained by removing the sentence delimiters (or sentence-end symbols) from that extracted character string.
  • The extracted character string may be, for example, a partial character string obtained by segmenting a character string with sentence delimiters (or sentence-end symbols) into word units and splitting it at random positions.
  • FIG. 4 is a diagram showing an example of a method of generating learning data.
  • In FIG. 4, from the original data, which is the character string with sentence delimiters "Well, the meeting will begin.", four extracted character strings are taken out: "Well, the meeting will begin.", "Well,", "meeting", and "begin".
  • Then, for each of the four extracted character strings, the character strings with the sentence delimiters removed are generated: "Well, the meeting will begin", "Well", "meeting", and "begin".
  • Pairs such as ("Well, the meeting will begin", "Well, the meeting will begin.") and ("Well", "Well,") then serve as learning data, as in the sketch below.
  • The character strings with sentence delimiters (or sentence-end symbols) included in the training data may be given labels for sequence labeling that indicate, for each word composing the string, whether a sentence delimiter (or sentence-end symbol) comes next. In that case, the character strings without sentence delimiters (or sentence-end symbols) included in the learning data may be divided into words.
  • In this way, machine learning can be performed as a sequence labeling task that predicts which sentence delimiter follows which word.
  • FIG. 5 is a diagram showing an example of labeled learning data in sequence labeling.
  • In the example of FIG. 5, for the character string with sentence delimiters "Well, the meeting will begin.", words in the middle of the sentence such as "meeting" are given the label "<O>", indicating that no sentence delimiter comes next, and the final word "begin" is given the label "<PERIOD>", indicating that a period comes next, as illustrated in the sketch below.
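  • As a non-limiting illustration of the labeling scheme of FIG. 5: only the labels "<O>" and "<PERIOD>" appear in the figure, so the remaining label names and the pre-segmented token list below are assumptions made for this sketch.

```python
# Mapping from a delimiter token to the label placed on the preceding word.
LABELS = {"。": "<PERIOD>", "、": "<COMMA>", "!": "<EXCL>", "?": "<QUES>"}


def to_labeled_sequence(tokens: list[str]) -> list[tuple[str, str]]:
    """Turn a pre-segmented, delimited token list into (word, label) pairs:
    each word receives the label of the delimiter that follows it, or <O>."""
    labeled = []
    for i, tok in enumerate(tokens):
        if tok in LABELS:
            continue  # delimiter tokens become labels, not words
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        labeled.append((tok, LABELS.get(nxt, "<O>")))
    return labeled


# "それでは、会議を始めます。" pre-segmented into word units plus delimiters.
print(to_labeled_sequence(["それでは", "、", "会議", "を", "始め", "ます", "。"]))
# [('それでは', '<COMMA>'), ('会議', '<O>'), ('を', '<O>'),
#  ('始め', '<O>'), ('ます', '<PERIOD>')]
```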
  • The learning unit 11 generates the sentence-end symbol insertion model. More specifically, the learning unit 11 performs (machine) learning based on learning data consisting of pairs of character strings without sentence delimiters (or sentence-end symbols) and character strings with sentence delimiters (or sentence-end symbols), and generates the sentence-end symbol insertion model as a trained model. The learning unit 11 may also perform (machine) learning based on the various types of learning data described above to generate the sentence-end symbol insertion model. Further, the learning unit 11 may itself generate the learning data based on the generation method described above.
  • the learning unit 11 causes the storage unit 10 to store the generated sentence ending symbol insertion model.
  • the sentence ending symbol insertion model stored in the storage unit 10 may not be generated by the learning unit 11, but may be generated by another device and obtained via a network.
  • the acquisition unit 12 acquires a recognized character string, which is a character string resulting from speech recognition or character recognition.
  • The acquisition unit 12 may receive (acquire) the recognized character string transmitted from the recognition device 2, may acquire a recognized character string generated by a speech recognition or character recognition function provided in the translation device 1 itself, or may acquire a recognized character string stored in advance in the storage unit 10.
  • The acquisition unit 12 may output the acquired recognized character string to the detection unit 13 and the translation unit 14, may store it in the storage unit 10, may display (output) it to the user of the translation device 1 via the communication device 1004 or the output device 1006 described later, or may transmit (output) it to another device.
  • the acquisition unit 12 may acquire the recognized character string each time the function that performs speech recognition or character recognition outputs the recognized character string.
  • For example, while the speech recognition or character recognition function of the recognition device 2 inputs a human voice in real time (as the person speaks) or inputs an image of handwritten text in real time (as the person writes), the acquisition unit 12 may sequentially acquire the recognized character string each time one is sequentially output.
  • Each time it sequentially acquires a recognized character string, the acquisition unit 12 may sequentially output it to the detection unit 13 and the translation unit 14, or may store it in the storage unit 10.
  • The detection unit 13 detects (determines) the sentence end of the recognized character string. More specifically, the detection unit 13 detects the sentence end of the recognized character string acquired (output) by the acquisition unit 12. For example, the detection unit 13 detects whether or not a sentence end is included in the recognized character string and, if one is included, detects its position in the recognized character string (for example, how many characters it is from the beginning); if no sentence end is included, the detection unit 13 detects that no sentence end is included.
  • An existing technique may be used as a method for detecting the end of a sentence in a character string (recognition character string) by the detection unit 13 .
  • the detection unit 13 may detect the end of the recognized character string based on the end-of-sentence symbol included in the recognized character string.
  • When the recognized character string contains a plurality of sentence ends, the detection unit 13 may detect all of them, may detect the sentence end closest to the beginning of the recognized character string, may detect the sentence end closest to the end of the recognized character string, or may detect a sentence end that satisfies a predetermined criterion.
  • the detection unit 13 may output the detection result to the translation unit 14 or store it in the storage unit 10 .
  • the detection result may include, for example, information regarding the position (or plural positions) of the end of the sentence in the recognized character string and information as to whether or not the end of the sentence is included in the recognized character string.
  • The detection unit 13 may detect the sentence end of the recognized character string based on an output character string, which is the character string obtained by inputting the recognized character string with sentence-end symbols removed into the sentence-end symbol insertion model, the model that outputs a character string with sentence-end symbols inserted when a character string without them is input. More specifically, the detection unit 13 detects the sentence end of the recognized character string based on the output character string obtained by inputting the recognized character string, from which sentence-end symbols have been removed, into the sentence-end symbol insertion model stored in advance in the storage unit 10. For example, when inputting the recognized character string "Well, the meeting will begin" yields the output character string "Well, the meeting will begin.", the detection unit 13 detects that the last character of "begin" is the sentence end (or that the sentence end is immediately after it) in the recognized character string.
  • The detection unit 13 need not detect, as a sentence end, a sentence end lying within a predetermined number of characters from the end of the recognized character string. For example, if the recognized character string is "thank you, but" and the predetermined number of characters is five, the detection unit 13 does not detect the last character of "thank you" (or the position immediately after it) as a sentence end, because it lies within five characters of the end of the recognized character string.
  • Alternatively, the detection unit 13 may perform sentence-end detection only on the part of the recognized character string that excludes a predetermined number of characters from the end. For example, if the recognized character string is "thank you, but" and the predetermined number of characters is six, the detection unit 13 attempts to detect a sentence end only in the part excluding the last six characters (with the result that, in this example, no sentence end is detected). A detection sketch follows.
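  • A minimal sketch of this guarded detection, assuming the symbol set "。!?" and a rightmost-match strategy (as noted above, the patent also allows other selection criteria):

```python
SENTENCE_END_SYMBOLS = set("。!?")  # assumed sentence-end symbols


def detect_sentence_end(recognized: str, n: int = 5) -> int | None:
    """Return the index one past the detected sentence end, or None.
    A sentence-end symbol within the last n characters is ignored, since it
    may belong to an utterance that is still in progress."""
    for i in range(len(recognized) - n - 1, -1, -1):
        if recognized[i] in SENTENCE_END_SYMBOLS:
            return i + 1
    return None


print(detect_sentence_end("ありがとうございます。でも"))  # None: "。" is inside the guard
print(detect_sentence_end("それでは、会議を始めます。よろしくお願いします"))  # 13
```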
  • The detection unit 13 may insert a sentence-end symbol at the detected position of the recognized character string. For example, when the detection unit 13 detects that the last character of the recognized character string "Well, the meeting will begin" is the sentence end, it inserts the sentence-end symbol "." there, so that the recognized character string becomes "Well, the meeting will begin.". The detection unit 13 may output the recognized character string after insertion to the translation unit 14.
  • The detection unit 13 may replace the recognized character string with the output character string obtained from the sentence-end symbol insertion model. For example, when inputting the recognized character string "Well, the meeting will begin" into the sentence-end symbol insertion model yields the output character string "Well, the meeting will begin.", the detection unit 13 replaces the former with the latter. The detection unit 13 may output the recognized character string after replacement to the translation unit 14.
  • The translation unit 14 translates the recognized character string (from its beginning) up to the sentence end. More specifically, the translation unit 14 translates, in the recognized character string acquired (output) by the acquisition unit 12, up to the sentence end detected by the detection unit 13, based on the detection result output from the detection unit 13. Existing technology such as machine translation may be used for the translation.
  • The translation unit 14 may display (output) the translation result to the user of the translation device 1 via the communication device 1004 or the output device 1006 described later, may transmit (output) it to another device, or may store it in the storage unit 10.
  • The translation unit 14 may translate, each time the acquisition unit 12 acquires a recognized character string, up to the sentence end detected in it by the detection unit 13. That is, each time the acquisition unit 12 acquires a recognized character string, the processes of the detection unit 13 and the translation unit 14 may be executed to translate the recognized character string up to the sentence end.
  • The translation unit 14 need not translate the recognized character string in some cases. More specifically, when the detection result output by the detection unit 13 includes information indicating that the recognized character string does not include a sentence end, the translation unit 14 does not translate the recognized character string (that is, it skips translation).
  • The translation unit 14 need not translate the recognized character string if the sentence end is within a predetermined number of characters from the end of the recognized character string. More specifically, if the sentence end detected by the detection unit 13 for the recognized character string acquired (output) by the acquisition unit 12 (the sentence end based on the detection result output from the detection unit 13) is within a predetermined number of characters from the end of the recognized character string, the translation unit 14 may leave the recognized character string untranslated. For example, if the recognized character string is "thank you. But", the detected sentence end is ".", and the predetermined number of characters is five, the translation unit 14 does not translate the recognized character string because the sentence end "." is within five characters of its end.
  • The translation unit 14 may translate the recognized character string with the sentence-end symbol inserted up to that sentence-end symbol. For example, the translation unit 14 translates the recognized character string after insertion, "Well, the meeting will begin.", up to the sentence-end symbol.
  • The translation unit 14 may translate the replaced recognized character string up to the sentence end. For example, the translation unit 14 translates the replaced recognized character string "Well, the meeting will begin." up to the sentence-end symbol.
  • FIG. 6 is a flowchart showing an example of translation processing executed by the translation device 1.
  • the acquisition unit 12 acquires a recognized character string (step S1).
  • the detection unit 13 detects the end of the recognized character string acquired in S1 (step S2).
  • the translation unit 14 translates the recognized character string acquired in S1 up to the end of the sentence detected in S2 (step S3).
  • FIG. 7 is a flowchart showing another example of translation processing (sequential processing) executed by the translation apparatus 1.
  • the acquisition unit 12 acquires (attempts to acquire) a recognized character string (step S10).
  • The acquisition unit 12 determines whether or not a recognized character string was acquired in S10 (step S11). If it is determined in S11 that one could not be acquired (S11: NO), the process returns to S10. On the other hand, if it is determined in S11 that one was acquired (S11: YES), the detection unit 13 detects (attempts to detect) the sentence end of the recognized character string acquired in S10 (step S12).
  • The detection unit 13 determines whether or not a sentence end was detected in S12 (step S13). If it is determined in S13 that none was detected (S13: NO), the process returns to S10 (without translation by the translation unit 14). On the other hand, if it is determined in S13 that one was detected (S13: YES), the translation unit 14 translates the recognized character string acquired in S10 up to the sentence end detected in S12 (step S14), and the process returns to S10.
  • FIG. 8 is a flowchart showing still another example of translation processing (sequential processing with end determination) executed by the translation device 1.
  • Since S20 to S23 are the same as S10 to S13 in FIG. 7, their description is omitted. If it is determined in S23 that a sentence end was detected (S23: YES), the detection unit 13 or the translation unit 14 determines whether the sentence end detected in S22 is beyond n characters (n is an integer of 1 or more) from the end of the recognized character string acquired in S20 (step S24). If it is determined in S24 that it is within n characters (S24: NO), the process returns to S20 (without translation by the translation unit 14).
  • On the other hand, if it is determined in S24 that the sentence end is beyond n characters from the end (S24: YES), the translation unit 14 translates the recognized character string acquired in S20 up to the sentence end detected in S22 (step S25), and the process returns to S20. A sketch of this loop follows.
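  • A minimal sketch of the loops of FIGS. 7 and 8, under the same assumptions as the detection sketch above; translate() is a placeholder for any machine-translation backend, not an API named in the disclosure. Setting n = 0 disables the end guard and reduces FIG. 8 to FIG. 7.

```python
from collections.abc import Iterable

SENTENCE_END_SYMBOLS = set("。!?")  # assumed sentence-end symbols


def detect_sentence_end(recognized: str, n: int = 5) -> int | None:
    """Rightmost sentence end outside the last n characters, or None."""
    for i in range(len(recognized) - n - 1, -1, -1):
        if recognized[i] in SENTENCE_END_SYMBOLS:
            return i + 1
    return None


def translate(text: str) -> str:
    return f"<EN: {text}>"  # placeholder for a machine-translation backend


def run_sequential_translation(stream: Iterable[str], n: int = 5) -> None:
    """For each updated recognition result, translate up to a usable sentence
    end; otherwise skip translation and wait for the next update."""
    for recognized in stream:                     # S10/S20: acquire
        end = detect_sentence_end(recognized, n)  # S12/S22 with the S24 guard
        if end is None:                           # S13/S23: NO, or S24: NO
            continue                              # back to acquisition
        print(translate(recognized[:end]))        # S14/S25: translate


run_sequential_translation([
    "それでは、",                                     # no sentence end yet
    "それでは、会議を始めます。よろし",                # "。" still inside the n=5 guard
    "それでは、会議を始めます。よろしくお願いします",  # "。" now outside the guard
])
```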
  • As a premise for FIGS. 9 to 12, the upper part of each balloon in the drawings shows the recognized character string acquired and displayed by the acquisition unit 12, and the lower part shows the translation result (translated character string) translated and displayed by the translation unit 14. An empty translation result indicates that translation is pending (waiting).
  • FIG. 9 is a diagram showing an example (part 1) of translation behavior by the translation device 1.
  • In FIG. 9, the acquisition unit 12 sequentially acquires and displays recognized character strings. Specifically, the acquisition unit 12 first displays the acquired recognized character string "Long ago," (first state), then displays the acquired recognized character string "Long ago, long ago," (second state), and then displays the acquired recognized character string "Long ago, long ago, in a certain place, an old man and an old woman" (third state). In each state, the detection unit 13 performs sentence-end detection, but no sentence end is detected, so the translation unit 14 does not perform translation.
  • FIG. 10 is a diagram showing an example (part 2) of translation behavior by the translation device 1.
  • FIG. 10 shows the behavior following FIG. 9 (continuing from the circled connector A).
  • In FIG. 10, the acquisition unit 12 displays the acquired recognized character string "Long ago, long ago, in a certain place, an old man and an old woman were piled up. The old man" (fourth state).
  • In the fourth state, the detection unit 13 performs sentence-end detection, but the sentence end (the sentence-end symbol "." in FIG. 10) lies within five characters of the end of the recognized character string and is therefore not detected as a sentence end, so translation by the translation unit 14 is still not performed.
  • FIG. 11 is a diagram showing an example (part 3) of translation behavior by the translation apparatus 1.
  • FIG. 11 shows the behavior following FIG. 10 (continuing from the circled connector B).
  • In FIG. 11, the acquisition unit 12 additionally displays the acquired recognized character string, which has grown so that the sentence-end symbol "." no longer lies within five characters of its end (fifth state).
  • In the fifth state, the detection unit 13 detects the sentence end (the sentence-end symbol "." in FIG. 11), so translation by the translation unit 14 is performed.
  • The translation unit 14 translates the current recognized character string "Long ago, long ago, in a certain place, an old man and an old woman were piled up. The old man" up to the detected sentence end, and the translated character string "Once upon a time, there was an old man and an old woman who were piled up." is displayed at the bottom.
  • FIG. 12 is a diagram showing an example (part 4) of translation behavior by the translation apparatus 1.
  • FIG. 12 shows the behavior following FIG. 11 (continuing from the circled connector C).
  • In FIG. 12, the acquisition unit 12 displays the acquired recognized character string "Long ago, long ago, in a certain place, an old man and an old woman lived. The old man went to the mountains" (sixth state); the wording "were piled up" of the earlier states has changed to "lived" as the recognition result was updated.
  • In the sixth state, the detection unit 13 detects the sentence end, so translation is performed by the translation unit 14.
  • The translation unit 14 translates the current recognized character string up to the detected sentence end, and the updated translated character string "Once upon a time, in a certain place, there lived an old man and an old woman." is displayed at the bottom.
  • As described above, the translation device 1 includes the acquisition unit 12 that acquires a recognized character string, which is a character string of a speech recognition result or a character recognition result, the detection unit 13 that detects the sentence end of the recognized character string, and the translation unit 14 that translates the recognized character string up to the sentence end.
  • With this configuration, the translation target does not end in the middle of a sentence but extends to the sentence end, so the recognition result can be translated more accurately.
  • The acquisition unit 12 may acquire the recognized character string each time the function that performs speech recognition or character recognition outputs one, and the translation unit 14 may translate, each time a recognized character string is acquired, up to the sentence end detected in it by the detection unit 13.
  • With this configuration, translation is performed each time a recognized character string is output (if a sentence end is detected), so a translation result can be obtained at an early timing.
  • The translation unit 14 need not translate the recognized character string when the detection unit 13 does not detect a sentence end in the recognized character string.
  • The detection unit 13 need not detect, as a sentence end, a sentence end lying within a predetermined number of characters from the end of the recognized character string. With this configuration, translation based on misrecognized sentence ends can be prevented, which in turn prevents misleading translation results. For example, when speech recognition has only reached "desu" in an utterance containing "desuga" ("is, but"), "desu" can be prevented from being detected as a sentence end even though "desuga" is originally a single expression.
  • the detection unit 13 may detect the end of the sentence of the part of the recognized character string that excludes a predetermined number of characters from the end.
  • the translation unit 14 does not need to translate the recognized character string when the end of the sentence is within a predetermined number of characters from the end of the recognized character string.
  • When the detection unit 13 detects the sentence end of the recognized character string, it may insert a sentence-end symbol at the detected position, and the translation unit 14 may translate the recognized character string with the sentence-end symbol inserted up to that sentence-end symbol.
  • With this configuration, the recognized character string with the sentence-end symbol inserted, which is the translation target, can be used effectively in subsequent processing.
  • The detection unit 13 may detect the sentence end of the recognized character string based on an output character string, which is the character string obtained by inputting the recognized character string with sentence-end symbols removed into the sentence-end symbol insertion model, the model that outputs a character string with sentence-end symbols inserted when a character string without them is input.
  • When the detection unit 13 detects the sentence end of the recognized character string, it may replace the recognized character string with the output character string, and the translation unit 14 may translate the replaced recognized character string up to the sentence-end symbol. With this configuration, translation up to the sentence end can be performed more reliably. In addition, the recognized character string with the sentence-end symbol inserted, which is the translation target, can be used effectively in subsequent processing.
  • The sentence-end symbol insertion model may be a trained model trained on learning data consisting of pairs of character strings without sentence-end symbols and character strings with sentence-end symbols. With this configuration, a sentence-end symbol insertion model that provides more accurate output can be generated more reliably.
  • the translation device 1 is an easy-to-understand speech translation control device that applies the end-of-sentence determination technology.
  • the translation method by the translation device 1 is an easy-to-understand speech translation control method that applies the end-of-sentence determination technology.
  • FIG. 13 is a diagram showing an example (part 1) of conventional translation behavior.
  • FIG. 14 is a diagram showing an example (part 2) of conventional translation behavior. FIGS. 13 and 14 show the behavior when a translation result is output for each silent section. As shown in FIGS. 13 and 14, the translation results are displayed all at once when the speech recognition results are finalized, which makes them difficult for the user to read. In a presentation, for example, the material has advanced to the next page by then, and the user cannot keep up with the talk.
  • FIG. 15 is a diagram showing an example (part 1) of conventional sequential translation behavior.
  • FIG. 16 is a diagram showing a behavior example (part 2) of conventional sequential translation.
  • FIG. 17 is a diagram showing a behavior example (part 3) of conventional sequential translation.
  • FIGS. 15 to 17 show examples of behavior when sequential translation results are output.
  • the translation results fluctuate and are difficult for the user to read.
  • The fluctuating results can also mislead the user. For example, as shown in FIG. 16, an utterance meaning "whatever" has been recognized only up to the part meaning "how", so the partial translation momentarily gives the misleading impression of a surprised question.
  • Moreover, when the speech recognition result changes, the translation result may change accordingly, and translation during speech recognition may mislead the user (see FIGS. 18 and 19).
  • FIG. 18 is a diagram showing another behavior example (Part 1) of conventional consecutive translation.
  • FIG. 19 is a diagram showing another behavior example (part 2) of conventional sequential translation. FIGS. 18 and 19 show an example of behavior when sequential translation results are output. As shown in FIGS. 18 and 19, the speech recognition result changes, and the translation result changes accordingly. That is, the display is difficult for the user to read and may cause misunderstanding.
  • FIG. 20 is a diagram showing the target of translation by the translation device 1. As shown in FIG. 20, the translation device 1 translates up to the end of the sentence.
  • punctuation processing is sequentially performed on the intermediate results of speech recognition, and if a sentence contains a sentence-ending symbol (such as ".”), the sentence up to the sentence-ending symbol is machine-translated and output.
  • the interim speech recognition result is successively subjected to punctuation processing and punctuation is inserted. If the sentence contains a sentence ending symbol (such as ".”), the sentence up to the sentence ending symbol is machine-translated and provisionally output.
  • the translation result can be output quickly in units that make sense.
  • In the translation device 1, when the sentence-end symbol falls within the last n characters (for example, 5 characters) of the interim speech recognition result, it is not determined to be a sentence end. As a result, erroneous sentence-end determination during speech recognition can be eliminated.
  • the sentence ending symbol is added by punctuation processing, and is used to determine the unit of the sentence to be translated.
  • erroneous recognition of the end of the sentence is reduced by performing punctuation processing on successive speech recognition and translating up to the end of the sentence if it is not within the last n characters.
  • In the middle of speech recognition, the sentence end is often erroneously recognized.
  • sequential punctuation processing (only for speech recognition results, not for translation results) is performed, and translation is performed up to the end of a sentence. Also, the sequential speech recognition result is output up to the sentence ending symbol (for example, the part with ".”).
  • In the translation device 1, every time the speech recognition result is updated, punctuation processing is performed and the text up to the sentence end is translated and output. When judging the sentence end, the last n characters are ignored.
  • In the translation device 1, if a period appears in the result of applying punctuation processing to the speech recognition result, the text up to the period is translated and the translation result is output; if not, no translation is performed.
  • In the translation device 1, every time the speech recognition result is updated, punctuation processing is performed and the text up to the sentence end is translated and output (the translation result is also updated, that is, overwritten, each time); when judging the sentence end, a sentence-end symbol within the last n characters is not determined to be a sentence end. That is, because unintended erroneous sentence-end determinations occur during speech recognition, the last n characters are excluded from consideration.
  • punctuation is processed for "sequential" speech recognition, and if the end of the sentence is not within the last n characters, it is translated up to that point.
  • In the translation device 1, the following processing is continuously executed during speech recognition.
  • the voice recognition result is updated (about every 0.2 seconds)
  • In the translation device 1, changes are handled by continuously performing recognition, sentence-end judgment, and updating of the translation result, and by outputting results until the speech recognition result is finalized.
  • Translation is performed up to the position judged to be the sentence end; a streaming sketch of this behavior follows.
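  • A minimal sketch of this continuous processing, assuming a punctuate() placeholder for the sentence-end symbol insertion model and a simulated update interval; only the translate-up-to-the-sentence-end and overwrite-on-update behavior is taken from the description above.

```python
import time

SENTENCE_END_SYMBOLS = set("。!?")  # assumed sentence-end symbols


def punctuate(text: str) -> str:
    """Placeholder for the sentence-end symbol insertion model."""
    return text  # assume symbols were already inserted upstream for this sketch


def translate(text: str) -> str:
    return f"<EN: {text}>"  # placeholder machine-translation backend


def on_recognition_updates(updates: list[str], n: int = 5) -> None:
    """Re-run punctuation and sentence-end judgment on every interim result
    (arriving roughly every 0.2 seconds), overwriting the shown translation."""
    shown = ""
    for interim in updates:
        punctuated = punctuate(interim)
        end = next((i + 1
                    for i in range(len(punctuated) - n - 1, -1, -1)
                    if punctuated[i] in SENTENCE_END_SYMBOLS), None)
        if end is not None:
            shown = translate(punctuated[:end])  # overwrite, never append
        print(shown or "(waiting)")
        time.sleep(0.2)  # simulated recognition update interval
```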
  • Each functional block may be implemented using one physically or logically coupled device, or using two or more physically or logically separate devices that are connected directly or indirectly (for example, by wire or wirelessly), and may be implemented using these multiple devices.
  • a functional block may be implemented by combining software in the one device or the plurality of devices.
  • Functions include, but are not limited to, judging, determining, calculating, computing, processing, deriving, investigating, searching, confirming, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, regarding, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating, mapping, and assigning.
  • For example, a functional block (component) that performs transmission may be called a transmitting unit or a transmitter. In any case, as described above, the implementation method is not particularly limited.
  • the translation device 1 may function as a computer that performs the translation method of the present disclosure.
  • FIG. 21 is a diagram showing an example of a hardware configuration of translation device 1 according to an embodiment of the present disclosure.
  • the translation device 1 described above may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.
  • the term “apparatus” can be read as a circuit, device, unit, or the like.
  • the hardware configuration of the translation device 1 may be configured to include one or more of the devices shown in the figure, or may be configured without some of the devices.
  • Each function of the translation device 1 is implemented by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, causing the processor 1001 to perform computations, controlling communication by the communication device 1004, and controlling at least one of reading and writing of data in the memory 1002 and the storage 1003.
  • the processor 1001 for example, operates an operating system and controls the entire computer.
  • the processor 1001 may be configured by a central processing unit (CPU) including an interface with peripheral devices, a control device, an arithmetic device, registers, and the like.
  • the learning unit 11 , acquisition unit 12 , detection unit 13 , translation unit 14 and the like described above may be implemented by the processor 1001 .
  • the processor 1001 reads programs (program codes), software modules, data, etc. from at least one of the storage 1003 and the communication device 1004 to the memory 1002, and executes various processes according to them.
  • As the program, a program that causes a computer to execute at least part of the operations described in the above embodiment is used.
  • For example, the learning unit 11, the acquisition unit 12, the detection unit 13, and the translation unit 14 may be implemented by a control program that is stored in the memory 1002 and runs on the processor 1001, and other functional blocks may be implemented in the same way.
  • The processor 1001 may be implemented by one or more chips.
  • the program may be transmitted from a network via an electric communication line.
  • The memory 1002 is a computer-readable recording medium and may be composed of at least one of, for example, ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), and RAM (Random Access Memory).
  • the memory 1002 may also be called a register, cache, main memory (main storage device), or the like.
  • The memory 1002 can store executable programs (program code), software modules, and the like for implementing a translation method according to an embodiment of the present disclosure.
  • the storage 1003 is a computer-readable recording medium, for example, an optical disk such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (for example, a compact disk, a digital versatile disk, a Blu-ray disk), smart card, flash memory (eg, card, stick, key drive), floppy disk, magnetic strip, and/or the like.
  • Storage 1003 may also be called an auxiliary storage device.
  • the storage medium described above may be, for example, a database, server, or other suitable medium including at least one of memory 1002 and storage 1003 .
  • the communication device 1004 is hardware (transmitting/receiving device) for communicating between computers via at least one of a wired network and a wireless network, and is also called a network device, a network controller, a network card, a communication module, or the like.
  • The communication device 1004 may include, for example, a high-frequency switch, a duplexer, a filter, a frequency synthesizer, and the like in order to realize at least one of frequency division duplex (FDD) and time division duplex (TDD).
  • the learning unit 11 , acquisition unit 12 , detection unit 13 , translation unit 14 and the like described above may be implemented by the communication device 1004 .
  • the input device 1005 is an input device (for example, keyboard, mouse, microphone, switch, button, sensor, etc.) that receives input from the outside.
  • the output device 1006 is an output device (for example, display, speaker, LED lamp, etc.) that outputs to the outside. Note that the input device 1005 and the output device 1006 may be integrated (for example, a touch panel).
  • Each device such as the processor 1001 and the memory 1002 is connected by a bus 1007 for communicating information.
  • the bus 1007 may be configured using a single bus, or may be configured using different buses between devices.
  • The translation device 1 may be configured to include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), and an FPGA (Field Programmable Gate Array), and part or all of each functional block may be implemented by such hardware.
  • processor 1001 may be implemented using at least one of these pieces of hardware.
  • Each aspect/embodiment described in the present disclosure may be applied to at least one of LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G (4th generation mobile communication system), 5G (5th generation mobile communication system), FRA (Future Radio Access), NR (New Radio), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi (registered trademark)), IEEE 802.16 (WiMAX (registered trademark)), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), other suitable systems, and next-generation systems extended based on these. A plurality of systems may also be applied in combination (for example, a combination of at least one of LTE and LTE-A with 5G).
  • Input/output information may be stored in a specific location (for example, memory) or managed using a management table. Input/output information and the like can be overwritten, updated, or appended. The output information and the like may be deleted. The entered information and the like may be transmitted to another device.
  • The determination may be made by a value represented by one bit (0 or 1), by a true/false value (Boolean: true or false), or by numerical comparison (for example, comparison with a predetermined value).
  • Notification of predetermined information is not limited to being performed explicitly and may be performed implicitly (for example, by not performing notification of the predetermined information).
  • Software, whether referred to as software, firmware, middleware, microcode, hardware description language, or by any other name, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, and the like.
  • software, instructions, information, etc. may be transmitted and received via a transmission medium.
  • For example, when software is sent from a website, a server, or another remote source using at least one of wired technology (coaxial cable, optical fiber cable, twisted pair, digital subscriber line (DSL), etc.) and wireless technology (infrared, microwave, etc.), these wired and wireless technologies are included within the definition of transmission medium.
  • Data, instructions, commands, information, signals, bits, symbols, chips, and the like may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination of these.
  • The terms "system" and "network" used in the present disclosure are used interchangeably.
  • Information, parameters, and the like described in the present disclosure may be expressed using absolute values, using relative values from a predetermined value, or using other corresponding information.
  • The terms "judging" and "determining" used in the present disclosure may encompass a wide variety of actions.
  • "Judging" and "determining" can include, for example, regarding judging, calculating, computing, processing, deriving, investigating, looking up/searching/inquiring (for example, looking up in a table, a database, or another data structure), and ascertaining as having "judged" or "determined".
  • "Judging" and "determining" can also include regarding receiving (for example, receiving information), transmitting (for example, transmitting information), input, output, and accessing (for example, accessing data in a memory) as having "judged" or "determined".
  • "Judging" and "determining" can also include regarding resolving, selecting, choosing, establishing, comparing, and the like as having "judged" or "determined".
  • That is, "judging" and "determining" can include regarding some action as having "judged" or "determined".
  • "Judging (determining)" may also be read as "assuming", "expecting", "considering", and the like.
  • The terms "connected" and "coupled", or any variation thereof, mean any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. Couplings or connections between elements may be physical, logical, or a combination thereof. For example, "connection" may be read as "access".
  • When used in the present disclosure, two elements can be considered "connected" or "coupled" to each other using at least one of one or more wires, cables, and printed electrical connections and, as some non-limiting and non-exhaustive examples, using electromagnetic energy having wavelengths in the radio frequency domain, the microwave domain, and the optical (both visible and invisible) domain.
  • any reference to elements using the "first,” “second,” etc. designations used in this disclosure does not generally limit the quantity or order of those elements. These designations may be used in this disclosure as a convenient method of distinguishing between two or more elements. Thus, reference to a first and second element does not imply that only two elements can be employed or that the first element must precede the second element in any way.
  • The statement "A and B are different" may mean "A and B are different from each other".
  • The statement may also mean "A and B are each different from C".
  • Terms such as "separated" and "coupled" may be interpreted in the same way as "different".

Abstract

This translation device 1 comprises: an acquisition unit 12 for acquiring a recognition character string, which is a character string of a speech recognition result or a text recognition result; a detection unit 13 for detecting the end of a sentence in the recognition character string; and a translation unit 14 for translating to the end of the sentence in the recognition character string. The acquisition unit 12 acquires the recognition character string every time a function for carrying out speech recognition or text recognition outputs a recognition character string, and every time the acquisition unit 12 acquires a recognition character string, the translation unit 14 may translate to the end of the sentence detected in said recognition character string by the detection unit 13. The translation unit 14 is not required to translate the relevant character string when the end of a sentence in the recognition character string has not been detected by the detection unit 13. The detection unit 13 is not required to detect, as the end of a sentence, an end of a sentence which is within a prescribed number of characters from the end of the recognition character string.

Description

翻訳装置translation device
 本開示の一側面は、音声認識結果又は文字認識結果の文字列を翻訳する翻訳装置に関する。 One aspect of the present disclosure relates to a translation device that translates a character string of speech recognition results or character recognition results.
 下記特許文献1には、入力された発話音声に音声認識処理を行うことによって音声認識結果を生成し、音声認識結果を第1の言語から第2の言語に機械翻訳することによって機械翻訳結果を生成する通訳装置が開示されている。 In Patent Document 1 below, a speech recognition result is generated by performing speech recognition processing on an input uttered voice, and the machine translation result is obtained by machine-translating the speech recognition result from a first language into a second language. An interpretation device for generating is disclosed.
特開2016-206929号公報JP 2016-206929 A
 一般的に、音声認識結果が文の途中で終わっている場合、当該音声認識結果を機械翻訳すると、誤解を生むような機械翻訳結果を生成してしまうことがある。そこで、認識結果に対してより正確な翻訳を行うことが望まれている。 In general, if the speech recognition result ends in the middle of a sentence, machine translation of the speech recognition result may produce misleading machine translation results. Therefore, it is desired to translate the recognition result more accurately.
 本開示の一側面に係る翻訳装置は、音声認識結果又は文字認識結果の文字列である認識文字列を取得する取得部と、認識文字列の文末を検出する検出部と、認識文字列のうち文末までを翻訳する翻訳部と、を備える。 A translation device according to one aspect of the present disclosure includes an acquisition unit that acquires a recognized character string that is a character string of a speech recognition result or a character recognition result, a detection unit that detects the end of the recognized character string, and and a translation unit for translating up to the end of a sentence.
 このような側面においては、認識文字列のうち検出された文末までが翻訳される。これにより例えば、翻訳対象が文の途中で終わっておらず、文末までの文であるため、認識結果に対してより正確な翻訳を行うことができる。 In this aspect, up to the end of the detected sentence in the recognized character string is translated. As a result, for example, since the translation target does not end in the middle of the sentence, but is the sentence up to the end of the sentence, more accurate translation can be performed for the recognition result.
 本開示の一側面によれば、認識結果に対してより正確な翻訳を行うことができる。 According to one aspect of the present disclosure, more accurate translation can be performed on recognition results.
実施形態に係る翻訳装置を含む翻訳システムのシステム構成の一例を示す図である。It is a figure showing an example of a system configuration of a translation system containing a translation device concerning an embodiment. 実施形態に係る翻訳装置の機能構成の一例を示す図である。It is a figure showing an example of functional composition of a translation device concerning an embodiment. 学習データの一例を示す図である。It is a figure which shows an example of learning data. 学習データの生成方法の一例を示す図である。It is a figure which shows an example of the production|generation method of learning data. 系列ラベリングにおけるラベルが付与されている学習データの一例を示す図である。FIG. 4 is a diagram showing an example of learning data to which labels have been added in series labeling; 実施形態に係る翻訳装置が実行する翻訳処理の一例を示すフローチャートである。4 is a flow chart showing an example of translation processing executed by the translation device according to the embodiment; 実施形態に係る翻訳装置が実行する翻訳処理の別の一例を示すフローチャートである。9 is a flowchart showing another example of translation processing executed by the translation device according to the embodiment; 実施形態に係る翻訳装置が実行する翻訳処理のさらに別の一例を示すフローチャートである。9 is a flowchart showing yet another example of translation processing executed by the translation device according to the embodiment; 実施形態に係る翻訳装置による翻訳の挙動例(その1)を示す図である。FIG. 4 is a diagram showing an example (part 1) of translation behavior by the translation device according to the embodiment; 実施形態に係る翻訳装置による翻訳の挙動例(その2)を示す図である。FIG. 10 is a diagram showing an example (part 2) of behavior of translation by the translation device according to the embodiment; 実施形態に係る翻訳装置による翻訳の挙動例(その3)を示す図である。FIG. 10 is a diagram showing an example (part 3) of translation behavior by the translation device according to the embodiment; 実施形態に係る翻訳装置による翻訳の挙動例(その4)を示す図である。FIG. 11 is a diagram showing an example (part 4) of translation behavior by the translation device according to the embodiment; 従来の翻訳の挙動例(その1)を示す図である。It is a figure which shows the behavior example (1) of the conventional translation. 従来の翻訳の挙動例(その2)を示す図である。It is a figure which shows the behavior example (2) of the conventional translation. 従来の逐次翻訳の挙動例(その1)を示す図である。FIG. 10 is a diagram showing an example (part 1) of behavior of conventional sequential translation; 従来の逐次翻訳の挙動例(その2)を示す図である。FIG. 10 is a diagram showing a behavior example (part 2) of conventional sequential translation; 従来の逐次翻訳の挙動例(その3)を示す図である。FIG. 10 is a diagram showing a behavior example (part 3) of conventional sequential translation; 従来の逐次翻訳の別の挙動例(その1)を示す図である。FIG. 10 is a diagram showing another behavior example (No. 1) of conventional sequential translation; 従来の逐次翻訳の別の挙動例(その2)を示す図である。FIG. 10 is a diagram showing another behavior example (No. 2) of conventional sequential translation; 実施形態に係る翻訳装置による翻訳の対象を示す図である。FIG. 4 is a diagram showing a target of translation by the translation device according to the embodiment; 実施形態に係る翻訳装置で用いられるコンピュータのハードウェア構成の一例を示す図である。It is a figure which shows an example of the hardware constitutions of the computer used with the translation apparatus which concerns on embodiment.
 Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. In the description of the drawings, the same elements are denoted by the same reference signs, and redundant description is omitted. The embodiments described below are specific examples of the present invention, and the present invention is not limited to these embodiments unless otherwise stated.
 FIG. 1 is a diagram showing an example of the system configuration of a translation system 3 including the translation device 1 according to the embodiment. As shown in FIG. 1, the translation system 3 includes the translation device 1 and a recognition device 2. The translation device 1 and the recognition device 2 are communicably connected to each other via a network such as the Internet and can exchange information with each other.
 The translation device 1 is a computer device that translates a recognized character string, which is a character string resulting from speech recognition or character recognition. A speech recognition result is the result of speech recognition, a technology that causes a computer to recognize human speech and convert it into a character string. A character recognition result is the result of character recognition, a technology that causes a computer to recognize images of printed or handwritten text and convert them into character strings. Existing technologies are used for the speech recognition and character recognition in this embodiment. A character string is a sequence of one or more characters. A recognized character string is a character string that includes at least one of a character string resulting from speech recognition and a character string resulting from character recognition.
 Translation means re-expressing a character string expressed in a first language in a second language different from the first language. The first language is, for example, Japanese, but may be any other language. The second language is, for example, English, but may be any other language. The first language and the second language may be different regional dialects (for example, standard Japanese and the Kansai dialect in Japan). The languages are not limited to natural languages and may be artificial languages or formal languages (such as computer programming languages). The translation is, for example, machine translation, that is, automatic translation using a computer. Details of the translation device 1 will be described later.
 The recognition device 2 is a computer device having a function of performing speech recognition or character recognition. For example, the recognition device 2 receives a human voice in real time (as the person speaks), performs speech recognition, and transmits the generated recognized character string to the translation device 1 via the network. As another example, the recognition device 2 receives an image of handwritten text in real time (as the person writes), performs character recognition, and transmits the generated recognized character string to the translation device 1 via the network. Upon receiving the recognized character string, the translation device 1 uses it in the functional blocks described later.
 Note that the functions of the recognition device 2 described above may be incorporated into the translation device 1, and the same processing may be performed by the translation device 1. That is, the translation device 1 may have a function of performing speech recognition or character recognition, perform the speech recognition or character recognition itself, and use the generated recognized character string in the functional blocks of the translation device 1 described later.
 FIG. 2 is a diagram showing an example of the functional configuration of the translation device 1 according to the embodiment. As shown in FIG. 2, the translation device 1 includes a storage unit 10, a learning unit 11, an acquisition unit 12 (acquisition unit), a detection unit 13 (detection unit), and a translation unit 14 (translation unit).
 Each functional block of the translation device 1 is assumed to function within the translation device 1, but this is not restrictive. For example, some of the functional blocks of the translation device 1 may function in a computer device that is different from the translation device 1 and is connected to the translation device 1 via a network, while exchanging information with the translation device 1 as appropriate. Some functional blocks of the translation device 1 may be omitted, a plurality of functional blocks may be integrated into one functional block, and one functional block may be decomposed into a plurality of functional blocks.
 Each function of the translation device 1 shown in FIG. 2 will be described below.
 The storage unit 10 stores arbitrary information used in computations by the translation device 1, the results of those computations, and the like. The information stored in the storage unit 10 may be referred to as appropriate by each function of the translation device 1.
 The storage unit 10 may store a sentence-ending symbol insertion model that, given a character string without sentence delimiters (or sentence-ending symbols), which are symbols that separate sentences, outputs the character string with sentence delimiters (or sentence-ending symbols) inserted. Examples of sentence delimiters in Japanese include 「、」, 「。」, 「!」 and 「?」. For example, given the character string 「さて会議を始めます」 without sentence delimiters, the sentence-ending symbol insertion model outputs the character string 「さて、会議を始めます。」 with sentence delimiters inserted. The sentence-ending symbol insertion model may be generated by existing technology.
 The sentence-ending symbol insertion model may be a trained model obtained by (machine) learning based on learning data consisting of pairs of a character string without sentence delimiters (or sentence-ending symbols) and the corresponding character string with sentence delimiters (or sentence-ending symbols).
 FIG. 3 is a diagram showing an example of learning data. In the learning data shown in FIG. 3, each character string without sentence delimiters is paired with the corresponding character string with sentence delimiters. Although the learning data shown in FIG. 3 consists of all or part of a single sentence, this is not restrictive; each example may be, for instance, all or part of two or more sentences. In the learning data, the character strings without sentence delimiters can be treated as input data, and the character strings with sentence delimiters as teacher data.
 The learning data may be pairs of an extracted character string, which is a partial character string taken from a character string containing sentence delimiters (or sentence-ending symbols), and the character string obtained by removing the sentence delimiters (or sentence-ending symbols) from that extracted character string. The extracted character string may be, for example, a partial character string obtained by segmenting a character string containing sentence delimiters (or sentence-ending symbols) into words and splitting it at a random position.
 FIG. 4 is a diagram showing an example of a method of generating learning data. As shown in FIG. 4, from the original data 「さて、会議を始めます。」, which is a character string with sentence delimiters, four extracted character strings are taken: 「さて、会議を始めます。」, 「さて、」, 「会議を」 and 「始めます。」. Then, for each of the four extracted character strings, the character strings 「さて会議を始めます」, 「さて」, 「会議を」 and 「始めます」 are generated by removing the sentence delimiters. In FIG. 4, the pairs 「さて会議を始めます」/「さて、会議を始めます。」, 「さて」/「さて、」, 「会議を」/「会議を」 and 「始めます」/「始めます。」 constitute the learning data.
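 As a rough illustration of the generation method in FIG. 4, the following minimal Python sketch builds such pairs. The word segmenter `segment_words` is a hypothetical stand-in (word segmentation is existing technology and is not specified by this disclosure), and the splitting strategy is one possible reading of "splitting at a random position", not a definitive implementation.

```python
import random

SENTENCE_DELIMITERS = set("、。!?")

def make_training_pairs(original: str, segment_words, num_splits: int = 3):
    """Generate (input, teacher) learning pairs from one delimited string.

    original:      a character string WITH delimiters, e.g. 「さて、会議を始めます。」
    segment_words: hypothetical word segmenter, str -> list[str]
    """
    words = segment_words(original)
    candidates = [words]  # the whole string is also used as one extracted string
    for _ in range(num_splits):
        # Split the word sequence at a random position into two partial strings.
        cut = random.randrange(1, len(words)) if len(words) > 1 else 1
        candidates.extend([words[:cut], words[cut:]])
    pairs = []
    for chunk in candidates:
        extracted = "".join(chunk)  # teacher data (delimiters kept)
        stripped = "".join(ch for ch in extracted if ch not in SENTENCE_DELIMITERS)
        pairs.append((stripped, extracted))  # (input data, teacher data)
    return pairs
```

 For example, calling `make_training_pairs` on 「さて、会議を始めます。」 can yield pairs such as (「さて会議を始めます」, 「さて、会議を始めます。」) and (「さて」, 「さて、」), matching FIG. 4.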
 Each character string with sentence delimiters (or sentence-ending symbols) included in the learning data may be annotated, for each word constituting the string, with a sequence-labeling label indicating whether a sentence delimiter (or sentence-ending symbol) follows that word. In that case, the character strings without sentence delimiters (or sentence-ending symbols) in the learning data may be segmented into words. By using learning data annotated with sequence-labeling labels, machine learning can be performed as a sequence-labeling task that predicts which sentence delimiter follows which word.
 FIG. 5 is a diagram showing an example of learning data annotated with sequence-labeling labels. In FIG. 5, for example, in the character string with sentence delimiters 「さて、会議を始めます。」, the word 「さて」 is given the label <COMMA>, indicating that a comma comes next; the words 「会議」 and 「を」 are each given the label <O>, indicating that no sentence delimiter comes next; and the word 「始めます」 is given the label <PERIOD>, indicating that a period comes next.
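 The label assignment in FIG. 5 can be sketched as follows. The token sequence and the label names <COMMA>, <O> and <PERIOD> follow the figure; the helper function itself and its name are illustrative assumptions.

```python
LABEL_FOR_DELIMITER = {"、": "<COMMA>", "。": "<PERIOD>"}

def to_sequence_labels(words: list[str]) -> tuple[list[str], list[str]]:
    """Convert an already word-segmented sequence that still contains
    delimiter tokens into (tokens, labels) for sequence labeling.

    Example:
        input:  ["さて", "、", "会議", "を", "始めます", "。"]
        output: (["さて", "会議", "を", "始めます"],
                 ["<COMMA>", "<O>", "<O>", "<PERIOD>"])
    """
    tokens, labels = [], []
    for word in words:
        if word in LABEL_FOR_DELIMITER:
            if tokens:
                # The delimiter follows the previous word, so relabel it.
                labels[-1] = LABEL_FOR_DELIMITER[word]
        else:
            tokens.append(word)
            labels.append("<O>")  # default: no delimiter follows this word
    return tokens, labels
```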
 The learning unit 11 generates the sentence-ending symbol insertion model. More specifically, the learning unit 11 performs (machine) learning based on learning data consisting of pairs of a character string without sentence delimiters (or sentence-ending symbols) and a character string with sentence delimiters (or sentence-ending symbols), and generates the sentence-ending symbol insertion model as a trained model. Alternatively, the learning unit 11 may perform (machine) learning based on any of the types of learning data described above to generate the sentence-ending symbol insertion model. The learning unit 11 may also generate the learning data itself, for example based on the learning data generation method described above.
 The learning unit 11 causes the storage unit 10 to store the generated sentence-ending symbol insertion model. Note that the sentence-ending symbol insertion model stored in the storage unit 10 need not be generated by the learning unit 11; it may instead be a model generated in the same way by another device and acquired via a network.
 The acquisition unit 12 acquires a recognized character string, which is a character string resulting from speech recognition or character recognition. The acquisition unit 12 may receive (acquire) a recognized character string transmitted from the recognition device 2, may acquire a recognized character string generated by a speech recognition or character recognition function provided in the translation device 1, or may acquire a recognized character string stored in advance in the storage unit 10. The acquisition unit 12 may output the acquired recognized character string to the detection unit 13 and the translation unit 14, may cause the storage unit 10 to store it, may display (output) it to the user of the translation device 1 via the communication device 1004 or the output device 1006 described later, or may transmit (output) it to another device.
 The acquisition unit 12 may acquire a recognized character string each time the speech recognition or character recognition function outputs one. For example, each time the speech recognition or character recognition function of the recognition device 2 (or the translation device 1) sequentially outputs a recognized character string, by receiving a human voice in real time (as the person speaks) or an image of handwritten text in real time (as the person writes), the acquisition unit 12 may sequentially acquire that recognized character string. Each time the acquisition unit 12 sequentially acquires a recognized character string, it may sequentially output the acquired string to the detection unit 13 and the translation unit 14, or may cause the storage unit 10 to sequentially store it.
 The detection unit 13 detects (determines) the end of a sentence in the recognized character string. More specifically, the detection unit 13 detects the end of a sentence in the recognized character string acquired (output) by the acquisition unit 12. For example, the detection unit 13 detects whether the recognized character string contains the end of a sentence and, if so, detects the position of the sentence end within the string (for example, the character offset from the beginning of the string). As another example, the detection unit 13 detects the position of the sentence end within the recognized character string (for example, the character offset from the beginning) and, if no sentence end is contained, detects that fact. An existing technique may be used for detecting the end of a sentence in a (recognized) character string. For example, the detection unit 13 may detect the end of a sentence in the recognized character string based on a sentence-ending symbol contained in the string.
 When the recognized character string contains a plurality of (two or more) sentence ends, the detection unit 13 may detect all of them, may detect the sentence end closest to the beginning of the string, may detect the sentence end closest to the end of the string, or may detect a sentence end that satisfies a predetermined criterion. A sketch of these alternatives is shown after this paragraph.
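 One minimal, non-normative reading of the symbol-based detection and of the handling of multiple sentence ends is the following sketch; the symbol set and the function names are illustrative assumptions.

```python
SENTENCE_END_SYMBOLS = ("。", "!", "?")

def find_sentence_ends(recognized: str) -> list[int]:
    """Return the 0-based positions of all sentence-ending symbols
    contained in the recognized character string (empty list: none found)."""
    return [i for i, ch in enumerate(recognized) if ch in SENTENCE_END_SYMBOLS]

def select_sentence_end(positions: list[int], strategy: str = "first"):
    """Pick one sentence end from several candidates, as described above:
    "first" = closest to the beginning, "last" = closest to the end.
    Returns the chosen 0-based index, or None when nothing was detected."""
    if not positions:
        return None
    return positions[0] if strategy == "first" else positions[-1]
```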
 The detection unit 13 may output the detection result to the translation unit 14 or cause the storage unit 10 to store it. The detection result may include, for example, information on the position(s) of the sentence end(s) in the recognized character string and information on whether the recognized character string contains a sentence end.
 The detection unit 13 may detect the end of a sentence in the recognized character string based on an output character string, which is the character string obtained by inputting the recognized character string, with its sentence-ending symbols removed, into the sentence-ending symbol insertion model, the model that, given a character string without sentence-ending symbols, outputs the string with sentence-ending symbols inserted. More specifically, the detection unit 13 detects the end of a sentence in the recognized character string based on the output character string obtained by inputting the recognized character string, with sentence-ending symbols removed, into the sentence-ending symbol insertion model stored in advance in the storage unit 10. For example, based on the output character string 「さて、会議を始めます。」, obtained by inputting 「さて会議を始めます」, which is the recognized character string 「さて会議を。始めます」 with its sentence-ending symbol removed, the detection unit 13 detects that 「す」 in the recognized character string is the end of a sentence (or that the end of a sentence comes immediately after 「す」).
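 A minimal sketch of this model-based detection follows. Here `insert_punctuation` is a hypothetical stand-in for the trained sentence-ending symbol insertion model, and the sketch assumes the model only inserts delimiters without otherwise changing the text; neither the name nor the mapping logic is prescribed by this disclosure.

```python
SENTENCE_END_SYMBOLS = set("。!?")

def detect_end_with_model(recognized: str, insert_punctuation):
    """Detect the first sentence end in `recognized` via the insertion model.

    insert_punctuation: hypothetical stand-in for the trained model
        (str without sentence-end symbols -> str with delimiters inserted).
    Returns the 0-based index in `recognized` of the last character of the
    first sentence, or None if the model inserts no sentence-end symbol.
    """
    # Strip sentence-end symbols, remembering where each kept character
    # came from in the original recognized string.
    kept = [(i, ch) for i, ch in enumerate(recognized)
            if ch not in SENTENCE_END_SYMBOLS]
    stripped = "".join(ch for _, ch in kept)
    output = insert_punctuation(stripped)

    consumed = 0  # how many characters of `stripped` have been matched so far
    for ch in output:
        if ch in SENTENCE_END_SYMBOLS:
            if consumed == 0:
                return None  # end symbol before any content; ignore it
            return kept[consumed - 1][0]  # original index of the sentence's last char
        if ch == "、" and (consumed >= len(stripped) or ch != stripped[consumed]):
            continue  # comma inserted by the model, absent from `stripped`
        consumed += 1
    return None
```

 With the recognized character string 「さて会議を。始めます」 and the model output 「さて、会議を始めます。」, this sketch returns the index of the final 「す」, matching the example above.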
 The detection unit 13 need not detect, as a sentence end, a sentence end located within a predetermined number of characters from the end of the recognized character string. For example, if the recognized character string is 「ありがたいです。けれども」 and the predetermined number of characters is 5, the detection unit 13 does not detect that 「す」 in the recognized character string is the end of a sentence (or that the end of a sentence comes immediately after 「す」).
 The detection unit 13 may instead detect a sentence end only in the portion of the recognized character string excluding the last predetermined number of characters. For example, if the recognized character string is 「ありがたいです。けれども」 and the predetermined number of characters is 6, the detection unit 13 detects a sentence end in 「ありがたいで」, the portion of the recognized character string excluding the last six characters 「す。けれども」 (and as a result no sentence end is detected).
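 The tail rule can be written compactly; this is a sketch under the same 0-based indexing assumption as the sketches above.

```python
def sentence_end_outside_tail(recognized: str, end_index, n: int) -> bool:
    """Return True only when a sentence end was detected and it is NOT
    within the last n characters of the recognized string.

    end_index: 0-based index of the detected sentence-end character,
               or None when no end was detected.
    """
    if end_index is None:
        return False
    # Positions len(recognized) - n .. len(recognized) - 1 count as
    # "within the last n characters" and are ignored.
    return end_index < len(recognized) - n
```

 With 「ありがたいです。けれども」 (12 characters, 「。」 at index 7) and n = 5, the check returns False because the symbol lies in the last five characters, so the sentence end is ignored, as in the example above.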
 When the detection unit 13 detects the end of a sentence in the recognized character string, it may insert a sentence-ending symbol at the detected position in the string. For example, upon detecting that 「す」 in the recognized character string 「さて会議を始めます」 is the end of a sentence, the detection unit 13 inserts the sentence-ending symbol 「。」 immediately after that 「す」, making the recognized character string after insertion 「さて会議を始めます。」. The detection unit 13 may output the recognized character string after insertion to the translation unit 14.
 When the detection unit 13 detects the end of a sentence in the recognized character string, it may replace the recognized character string with the output character string (obtained from the sentence-ending symbol insertion model). For example, when the output character string 「さて、会議を始めます。」 is obtained by inputting 「さて会議を始めます」, which is the recognized character string 「さて会議を。始めます」 with its sentence-ending symbol removed, into the model, the detection unit 13 replaces the recognized character string 「さて会議を。始めます」 with the output character string 「さて、会議を始めます。」. The detection unit 13 may output the recognized character string after replacement to the translation unit 14.
 The translation unit 14 translates the recognized character string (from its beginning) up to the end of the sentence. More specifically, the translation unit 14 translates the recognized character string acquired (output) by the acquisition unit 12 up to the sentence end detected by the detection unit 13 for that string (the sentence end based on the detection result output from the detection unit 13). Existing technology such as machine translation may be used for the translation. The translation unit 14 may display (output) the translation result to the user of the translation device 1 via the communication device 1004 or the output device 1006 described later, may transmit (output) it to another device, or may cause the storage unit 10 to store it.
 Each time the acquisition unit 12 acquires a recognized character string, the translation unit 14 may translate that string up to the sentence end detected by the detection unit 13. That is, each time the acquisition unit 12 acquires a recognized character string, the processing of the detection unit 13 and the translation unit 14 may be executed to translate the string up to the end of the sentence.
 When the detection unit 13 does not detect the end of a sentence in the recognized character string, the translation unit 14 need not translate the string. More specifically, when the detection result output by the detection unit 13 includes information indicating that the recognized character string does not contain the end of a sentence, the translation unit 14 does not translate the string (skips the translation).
 When the sentence end is within a predetermined number of characters from the end of the recognized character string, the translation unit 14 need not translate the string. More specifically, when the sentence end detected by the detection unit 13 for the recognized character string acquired (output) by the acquisition unit 12 (the sentence end based on the detection result output from the detection unit 13) is within a predetermined number of characters from the end of that string, the translation unit 14 need not translate it. For example, if the recognized character string is 「ありがたいです。けれども」, the detected sentence end is 「。」, and the predetermined number of characters is 5, the translation unit 14 does not translate the string because the sentence end 「。」 is within five characters from the end of 「ありがたいです。けれども」.
 The translation unit 14 may translate the recognized character string into which a sentence-ending symbol has been inserted, up to that sentence-ending symbol. For example, for the post-insertion recognized character string 「さて会議を始めます。」 output by the detection unit 13, the translation unit 14 translates 「さて会議を始めます。」 up to the sentence-ending symbol 「。」.
 The translation unit 14 may translate the replaced recognized character string up to the sentence-ending symbol. For example, for the post-replacement recognized character string 「さて、会議を始めます。」 output by the detection unit 13, the translation unit 14 translates 「さて、会議を始めます。」 up to the sentence-ending symbol 「。」.
 The functions of the translation device 1 have been described above.
 Next, some examples of the processing (translation processing) executed by the translation device 1 will be described with reference to FIGS. 6 to 8.
 FIG. 6 is a flowchart showing an example of the translation processing executed by the translation device 1. First, the acquisition unit 12 acquires a recognized character string (step S1). Next, the detection unit 13 detects the end of a sentence in the recognized character string acquired in S1 (step S2). Next, the translation unit 14 translates the recognized character string acquired in S1 up to the sentence end detected in S2 (step S3).
 FIG. 7 is a flowchart showing another example (sequential processing) of the translation processing executed by the translation device 1. First, the acquisition unit 12 acquires (attempts to acquire) a recognized character string (step S10). Next, the acquisition unit 12 (or the translation device 1) determines whether a recognized character string was acquired in S10 (step S11). If it is determined in S11 that no string was acquired (S11: NO), the processing returns to S10. If it is determined in S11 that a string was acquired (S11: YES), the detection unit 13 detects (attempts to detect) the end of a sentence in the recognized character string acquired in S10 (step S12). Next, the detection unit 13 (or the translation device 1) determines whether a sentence end was detected in S12 (step S13). If it is determined in S13 that no sentence end was detected (S13: NO), the processing returns to S10 (without translation by the translation unit 14). If it is determined in S13 that a sentence end was detected (S13: YES), the translation unit 14 translates the recognized character string acquired in S10 up to the sentence end detected in S12 (step S14), and the processing returns to S10.
 FIG. 8 is a flowchart showing yet another example (sequential processing with tail determination) of the translation processing executed by the translation device 1. Steps S20 to S23 are the same as steps S10 to S13 in FIG. 7, and their description is omitted. If it is determined in S23 that a sentence end was detected (S23: YES), the detection unit 13 or the translation unit 14 determines whether the sentence end detected in S22 is outside the last n characters (where n is an integer of 1 or more) of the recognized character string acquired in S20 (step S24). If it is determined in S24 that the sentence end is within the last n characters (S24: NO), the processing returns to S20 (without translation by the translation unit 14). If it is determined in S24 that the sentence end is not within the last n characters (S24: YES), the translation unit 14 translates the recognized character string acquired in S20 up to the sentence end detected in S22 (step S25), and the processing returns to S20.
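 Putting the pieces together, the loop of FIG. 8 (S20 to S25) might look like the following sketch. Here `try_acquire`, `detect_sentence_end` and `translate` are hypothetical stand-ins for the acquisition unit 12, the detection unit 13 and the translation unit 14 respectively; they are not APIs defined by this disclosure.

```python
def run_sequential_translation(try_acquire, detect_sentence_end, translate, n: int = 5):
    """Sequential translation loop corresponding to FIG. 8 (steps S20 to S25).

    try_acquire:         hypothetical; returns the latest recognized string or None
    detect_sentence_end: hypothetical; returns a 0-based sentence-end index or None
    translate:           hypothetical; machine-translates a character string
    """
    while True:
        recognized = try_acquire()                   # S20
        if recognized is None:                       # S21: NO -> acquire again
            continue
        end_index = detect_sentence_end(recognized)  # S22
        if end_index is None:                        # S23: NO -> skip translation
            continue
        if end_index >= len(recognized) - n:         # S24: within last n characters
            continue                                 # NO -> skip translation
        sentence = recognized[: end_index + 1]       # up to the detected sentence end
        print(translate(sentence))                   # S25: translate and output
```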
 Next, examples of the translation behavior of the translation device 1 will be described with reference to FIGS. 9 to 12. As a premise, in each figure the upper row of the balloon shows the recognized character string acquired and displayed by the acquisition unit 12, and the lower row shows the translation result (translated character string) produced and displayed by the translation unit 14. A translation result of 「…」 indicates that translation is pending (waiting).
 FIG. 9 is a diagram showing an example (part 1) of the translation behavior of the translation device 1. As shown in FIG. 9, the acquisition unit 12 sequentially acquires and displays recognized character strings. Specifically, the acquisition unit 12 first displays the acquired recognized character string 「むかし、」 (first state), then displays 「むかし、むかし、」 (second state), and then displays 「むかし、むかし、あるところに、おじいさんとおばあさん」 (third state). In each state, the detection unit 13 attempts to detect the end of a sentence, but since no sentence end is detected, no translation is performed by the translation unit 14.
 FIG. 10 is a diagram showing an example (part 2) of the translation behavior of the translation device 1, continuing from the point circled A in FIG. 9. As shown in FIG. 10, the acquisition unit 12 displays the acquired recognized character string 「むかし、むかし、あるところに、おじいさんとおばあさんが積んでいました。おじい」 (fourth state). In the fourth state as well, the detection unit 13 attempts to detect the end of a sentence; however, because a sentence end within five characters from the end of the recognized character string (the sentence-ending symbol 「。」 in FIG. 10) is not detected as a sentence end, no sentence end is detected, and the translation unit 14 still performs no translation.
 FIG. 11 is a diagram showing an example (part 3) of the translation behavior of the translation device 1, continuing from the point circled B in FIG. 10. As shown in FIG. 11, the acquisition unit 12 additionally displays the acquired recognized character string 「むかし、むかし、あるところに、おじいさんとおばあさんが積んでいました。おじいさん」 (fifth state). In the fifth state, the detection unit 13 detects the end of a sentence (the sentence-ending symbol 「。」 in FIG. 11) because the sentence end now lies more than five characters from the end of the recognized character string, so the translation unit 14 performs translation. Specifically, of the current recognized character string 「むかし、むかし、あるところに、おじいさんとおばあさんが積んでいました。おじいさん」, the translation unit 14 translates up to the end of the sentence, 「むかし、むかし、あるところに、おじいさんとおばあさんが積んでいました。」, and displays the translated character string "Once upon a time, there was an old man and an old woman who were piling up." in the lower row.
 FIG. 12 is a diagram showing an example (part 4) of the translation behavior of the translation device 1, continuing from the point circled C in FIG. 11. As shown in FIG. 12, the acquisition unit 12 displays the acquired recognized character string 「むかし、むかし、あるところに、おじいさんとおばあさんが住んでいました。おじいさんは山へ」 (sixth state). Note that the portion erroneously recognized (on the recognition side) as 「積んで」 ("piling up") in the fourth and fifth states is correctly recognized as 「住んで」 ("living") in the sixth state. In the sixth state, as in the fifth state, the detection unit 13 detects the end of a sentence, so the translation unit 14 performs translation. Specifically, of the current recognized character string 「むかし、むかし、あるところに、おじいさんとおばあさんが住んでいました。おじいさんは山へ」, the translation unit 14 translates up to the end of the sentence, 「むかし、むかし、あるところに、おじいさんとおばあさんが住んでいました。」, and displays the translated character string "Once upon a time, there lived an old man and an old woman." in the lower row.
 Next, the operation and effects of the translation device 1 according to the embodiment will be described.
 The translation device 1 includes the acquisition unit 12 that acquires a recognized character string, which is a character string resulting from speech recognition or character recognition, the detection unit 13 that detects the end of a sentence in the recognized character string, and the translation unit 14 that translates the recognized character string up to the end of the sentence. With this configuration, for example, since the translation target does not end in the middle of a sentence but runs to the end of a sentence, the recognition result can be translated more accurately.
 According to the translation device 1, the acquisition unit 12 may acquire a recognized character string each time the speech recognition or character recognition function outputs one, and each time the acquisition unit 12 acquires a recognized character string, the translation unit 14 may translate it up to the sentence end detected by the detection unit 13. With this configuration, for example, translation is performed each time a recognized character string is output (provided a sentence end is detected), so translation results can be obtained at an early timing.
 According to the translation device 1, when the detection unit 13 does not detect the end of a sentence in the recognized character string, the translation unit 14 need not translate the string. With this configuration, for example, recognized character strings that end mid-sentence are not translated, which prevents the generation of misleading translation results.
 According to the translation device 1, the detection unit 13 need not detect, as a sentence end, a sentence end within a predetermined number of characters from the end of the recognized character string. This configuration prevents translation based on an erroneously recognized sentence end and thereby prevents the generation of misleading translation results. For example, when speech recognition has only reached the 「です」 part of the utterance 「ですけれども」, this prevents 「です」 from being detected as the end of a sentence even though 「ですけれども」 is originally a single expression.
 According to the translation device 1, the detection unit 13 may detect a sentence end only in the portion of the recognized character string excluding the last predetermined number of characters. As above, this configuration prevents translation based on an erroneously recognized sentence end and thereby prevents the generation of misleading translation results.
 According to the translation device 1, the translation unit 14 need not translate the recognized character string when the sentence end is within a predetermined number of characters from the end of the string. As above, this configuration prevents translation based on an erroneously recognized sentence end and thereby prevents the generation of misleading translation results.
 According to the translation device 1, when the detection unit 13 detects the end of a sentence in the recognized character string, it may insert a sentence-ending symbol at the detected position, and the translation unit 14 may translate the string up to that inserted sentence-ending symbol. This configuration makes translation up to the end of the sentence more reliable. In addition, the recognized character string with sentence-ending symbols inserted exactly as translated can be used effectively in subsequent processing.
 According to the translation device 1, the detection unit 13 may detect the end of a sentence in the recognized character string based on the output character string, which is obtained by inputting the recognized character string, with its sentence-ending symbols removed, into the sentence-ending symbol insertion model that, given a character string without sentence-ending symbols, outputs the string with sentence-ending symbols inserted. With this configuration, processing can be performed based on a more accurate output character string corrected by the sentence-ending symbol insertion model, so more accurate processing can be performed.
 According to the translation device 1, when the detection unit 13 detects the end of a sentence in the recognized character string, it may replace the string with the output character string, and the translation unit 14 may translate the replaced string up to the sentence-ending symbol. This configuration makes translation up to the end of the sentence more reliable, and the recognized character string with sentence-ending symbols inserted exactly as translated can be used effectively in subsequent processing.
 According to the translation device 1, the sentence-ending symbol insertion model may be a trained model trained on learning data consisting of pairs of a character string without sentence-ending symbols and a character string with sentence-ending symbols. This configuration makes it possible to more reliably generate a sentence-ending symbol insertion model that produces more accurate output.
 The translation device 1 is an easy-to-understand speech translation control device that applies end-of-sentence determination technology. The translation method performed by the translation device 1 is an easy-to-understand speech translation control method that applies end-of-sentence determination technology.
 In a system that translates speech (including handwritten characters and the like; the same applies hereinafter) in real time, how the speech (including text and the like; the same applies hereinafter) recognition results and translation results are presented is important for the user's ease of understanding. If the translation result is output only after the speech recognition result has been finalized by a silent interval, the translation result is output late for long utterances, and the user cannot keep up with the speech (see FIGS. 13 and 14).
 FIG. 13 is a diagram showing an example (part 1) of the behavior of conventional translation, and FIG. 14 is a diagram showing an example (part 2). FIGS. 13 and 14 show example behavior when translation results are output per silent interval. As shown in FIGS. 13 and 14, the translation results appear all at once when the speech recognition result is finalized, which makes them hard for the user to read. In a presentation, for example, the material has already advanced to the next page, and the user cannot keep up.
 On the other hand, if the speech recognition results that are continually updated during recognition are translated and output as they change, the translation results are updated frequently and are hard to read, and translating mid-sentence can cause misunderstanding (see FIGS. 15 to 17).
 FIG. 15 is a diagram showing an example (part 1) of the behavior of conventional sequential translation; FIG. 16 shows an example (part 2); and FIG. 17 shows an example (part 3). FIGS. 15 to 17 show example behavior when translation results are output sequentially. As shown in FIGS. 15 to 17, the translation results flicker and change, making them hard for the user to read. Moreover, since translation is performed mid-sentence, it causes misunderstanding. For example, as shown in FIG. 16, translating 「何と言っても」 only up to the 「なんと」 part produces a translation that is misread as expressing surprise.
 Furthermore, since speech recognition results that have already been output may change during recognition, the translation results may change accordingly, and translation during recognition can cause misunderstanding (see FIGS. 18 and 19).
 FIG. 18 is a diagram showing another example (part 1) of the behavior of conventional sequential translation, and FIG. 19 shows another example (part 2). FIGS. 18 and 19 show example behavior when translation results are output sequentially. As shown in FIGS. 18 and 19, the speech recognition result changes, and the translation result changes greatly along with it. This is hard for the user to read and also causes misunderstanding.
 According to the translation device 1, translation results are output in units of meaningful sentences, at the moment the speech recognition result is finalized as a meaningful sentence. By performing sequential end-of-sentence determination on the sequential speech recognition results, independently of silent intervals and while also accounting for changes in already-output recognition results during recognition, translation results are output early, and no user misunderstanding arises from translating mid-sentence. By performing the above processing, easy-to-understand speech translation can be realized. FIG. 20 is a diagram showing targets of translation by the translation device 1. As shown in FIG. 20, the translation device 1 translates up to the end of the sentence.
 According to the translation device 1, punctuation processing is sequentially applied to the intermediate speech recognition results, and if a sentence-ending symbol (such as 「。」) is included in the text, the sentence up to the sentence-ending symbol is machine-translated and output. That is, the translation device 1 sequentially applies punctuation processing to the intermediate speech recognition results and inserts punctuation; if the text contains a sentence-ending symbol (such as 「。」), the sentence up to that symbol is machine-translated and provisionally output. As a result, translation results can be produced quickly in units whose meaning is clear.
 According to the translation device 1, if the position of the sentence-ending symbol is within the last n characters (for example, 5 characters) of the intermediate speech recognition result, it is not determined to be the end of a sentence. This eliminates erroneous end-of-sentence determinations during speech recognition. Note that sentence-ending symbols are added by the punctuation processing and are used to fix the units of sentences submitted for translation.
 In the translation device 1, by applying punctuation processing to the sequential speech recognition results and translating up to the sentence end only when it is not within the last n characters, erroneous recognition of sentence ends is reduced. In the conventional technology, for example, when speech recognition has only reached the 「です」 part of the utterance 「ですけれども」, the end of the sentence is often misrecognized.
 According to the translation device 1, sequential punctuation processing is performed (only on the speech recognition results, not on the translation results), and translation is performed up to the sentence-ending symbol. In addition, the sequential speech recognition results are output up to the sentence-ending symbol (for example, the part ending with 「。」).
 According to the translation device 1, punctuation processing is performed each time the speech recognition result is updated, and the text up to the end of the sentence is translated and output. The last n characters are ignored when determining the end of a sentence.
 According to the translation device 1, for each sequential speech recognition result, if the result of applying punctuation processing to the speech recognition result contains a period, the text up to the period is translated; otherwise, no translation is performed. The translation result is output. The last n characters are taken into account.
 According to the translation device 1, punctuation processing is performed each time the speech recognition result is updated, the text up to the end of the sentence is translated and output (the translation result is also updated, i.e., overwritten, each time), and when determining the end of a sentence, a sentence-ending symbol within the last n characters is not determined to be the end of a sentence. That is, the last n characters are taken into account because unintended erroneous determinations of the ending occur during speech recognition.
 According to the translation device 1, punctuation processing is applied to the "sequential" speech recognition results, and translation is performed up to the sentence end provided it is not within the last n characters.
 According to the translation device 1, the following processing continues to be executed during speech recognition:
・The speech recognition result is updated (approximately every 0.2 seconds).
・Punctuation processing is executed and the end of the sentence is determined.
・Translation is executed and the result is output.
 According to the translation device 1, changes are observed by continuously performing recognition and determination and by continuously updating the translation results, and changes are handled by keeping the output provisional until the speech recognition result is finalized.
 According to the translation device 1, the text up to the sequentially determined sentence end is submitted for translation.
 Note that the block diagrams used in the description of the above embodiment show blocks in functional units. These functional blocks (components) are realized by any combination of at least one of hardware and software. The method of realizing each functional block is not particularly limited. That is, each functional block may be realized using one physically or logically coupled device, or using two or more physically or logically separate devices connected directly or indirectly (for example, by wire or wirelessly). A functional block may be realized by combining software with the one device or the plurality of devices.
 Functions include, but are not limited to, judging, determining, deciding, calculating, computing, processing, deriving, investigating, searching, confirming, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, regarding, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating (mapping), and assigning. For example, a functional block (component) that performs transmission is called a transmitting unit or a transmitter. In any case, as described above, the method of realization is not particularly limited.
 For example, the translation device 1 according to an embodiment of the present disclosure may function as a computer that performs the processing of the translation method of the present disclosure. FIG. 21 is a diagram showing an example of the hardware configuration of the translation device 1 according to an embodiment of the present disclosure. The translation device 1 described above may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.
 In the following description, the term "device" can be read as a circuit, a device, a unit, or the like. The hardware configuration of the translation device 1 may be configured to include one or more of the devices shown in the figure, or may be configured without including some of the devices.
 Each function of the translation device 1 is realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, whereby the processor 1001 performs computation, controls communication by the communication device 1004, and controls at least one of reading and writing of data in the memory 1002 and the storage 1003.
 The processor 1001 controls the entire computer by, for example, running an operating system. The processor 1001 may be configured by a central processing unit (CPU) including an interface with peripheral devices, a control device, an arithmetic device, registers, and the like. For example, the learning unit 11, the acquisition unit 12, the detection unit 13, the translation unit 14, and the like described above may be realized by the processor 1001.
 The processor 1001 also reads programs (program code), software modules, data, and the like from at least one of the storage 1003 and the communication device 1004 into the memory 1002 and executes various kinds of processing according to them. As the program, a program that causes a computer to execute at least part of the operations described in the above embodiment is used. For example, the learning unit 11, the acquisition unit 12, the detection unit 13, and the translation unit 14 may be realized by a control program stored in the memory 1002 and running on the processor 1001, and the other functional blocks may be realized in the same manner. Although the various kinds of processing described above have been explained as being executed by one processor 1001, they may be executed simultaneously or sequentially by two or more processors 1001. The processor 1001 may be implemented by one or more chips. The program may be transmitted from a network via a telecommunication line.
 The memory 1002 is a computer-readable recording medium and may be configured by at least one of, for example, a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), and a RAM (Random Access Memory). The memory 1002 may also be called a register, a cache, a main memory (main storage device), or the like. The memory 1002 can store executable programs (program code), software modules, and the like for implementing a wireless communication method according to an embodiment of the present disclosure.
 The storage 1003 is a computer-readable recording medium and may be configured by at least one of, for example, an optical disc such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (for example, a compact disc, a digital versatile disc, or a Blu-ray (registered trademark) disc), a smart card, a flash memory (for example, a card, a stick, or a key drive), a floppy (registered trademark) disk, and a magnetic strip. The storage 1003 may also be called an auxiliary storage device. The above-mentioned storage medium may be, for example, a database, a server, or another appropriate medium including at least one of the memory 1002 and the storage 1003.
 The communication device 1004 is hardware (a transmission/reception device) for performing communication between computers via at least one of a wired network and a wireless network, and is also referred to as, for example, a network device, a network controller, a network card, or a communication module. The communication device 1004 may include a high-frequency switch, a duplexer, a filter, a frequency synthesizer, and the like in order to realize, for example, at least one of frequency division duplex (FDD) and time division duplex (TDD). For example, the learning unit 11, the acquisition unit 12, the detection unit 13, the translation unit 14, and the like described above may be realized by the communication device 1004.
 The input device 1005 is an input device (for example, a keyboard, a mouse, a microphone, a switch, a button, or a sensor) that receives input from the outside. The output device 1006 is an output device (for example, a display, a speaker, or an LED lamp) that performs output to the outside. The input device 1005 and the output device 1006 may be integrated (for example, as a touch panel).
 The devices such as the processor 1001 and the memory 1002 are connected by a bus 1007 for communicating information. The bus 1007 may be configured using a single bus, or may be configured using different buses between the devices.
 The translation device 1 may also include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), and an FPGA (Field Programmable Gate Array), and part or all of each functional block may be realized by such hardware. For example, the processor 1001 may be implemented using at least one of these pieces of hardware.
 Notification of information is not limited to the aspects/embodiments described in the present disclosure, and may be performed by other methods.
 Each aspect/embodiment described in the present disclosure may be applied to at least one of systems using LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G (4th generation mobile communication system), 5G (5th generation mobile communication system), FRA (Future Radio Access), NR (New Radio), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi (registered trademark)), IEEE 802.16 (WiMAX (registered trademark)), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), or other appropriate systems, and to next-generation systems extended on the basis of these. A plurality of systems may also be applied in combination (for example, a combination of at least one of LTE and LTE-A with 5G).
 The order of the processing procedures, sequences, flowcharts, and the like of each aspect/embodiment described in the present disclosure may be changed as long as no contradiction arises. For example, the methods described in the present disclosure present the elements of the various steps in an example order and are not limited to the specific order presented.
 Input and output information and the like may be stored in a specific location (for example, a memory) or may be managed using a management table. Input and output information and the like can be overwritten, updated, or appended. Output information and the like may be deleted. Input information and the like may be transmitted to another device.
 The determination may be made by a value represented by one bit (0 or 1), by a Boolean value (true or false), or by a numerical comparison (for example, a comparison with a predetermined value).
 Each aspect/embodiment described in the present disclosure may be used alone, may be used in combination, or may be switched in accordance with execution. In addition, notification of predetermined information (for example, notification of "being X") is not limited to being performed explicitly, and may be performed implicitly (for example, by not notifying the predetermined information).
 Although the present disclosure has been described in detail above, it is apparent to those skilled in the art that the present disclosure is not limited to the embodiments described in the present disclosure. The present disclosure can be implemented with modifications and variations without departing from the spirit and scope of the present disclosure as defined by the claims. Accordingly, the description of the present disclosure is for illustrative purposes and has no restrictive meaning with respect to the present disclosure.
 Software, whether referred to as software, firmware, middleware, microcode, hardware description language, or by any other name, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, and the like.
 Software, instructions, information, and the like may also be transmitted and received via a transmission medium. For example, when software is transmitted from a website, a server, or another remote source using at least one of wired technology (coaxial cable, optical fiber cable, twisted pair, digital subscriber line (DSL), etc.) and wireless technology (infrared, microwave, etc.), at least one of these wired and wireless technologies is included within the definition of a transmission medium.
 The information, signals, and the like described in the present disclosure may be represented using any of a variety of different technologies. For example, data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, light fields or photons, or any combination thereof.
 The terms described in the present disclosure and the terms necessary for understanding the present disclosure may be replaced with terms having the same or similar meanings.
 The terms "system" and "network" used in the present disclosure are used interchangeably.
 The information, parameters, and the like described in the present disclosure may be expressed using absolute values, may be expressed using values relative to a predetermined value, or may be expressed using other corresponding information.
 The names used for the parameters described above are not restrictive in any respect. Furthermore, the formulas and the like using these parameters may differ from those expressly disclosed in the present disclosure.
 As used in the present disclosure, the terms "judgment (determining)" and "decision (determining)" may encompass a wide variety of operations. "Judgment" and "decision" can include, for example, regarding judging, calculating, computing, processing, deriving, investigating, looking up, searching, or inquiring (for example, looking up in a table, a database, or another data structure), or ascertaining as having "judged" or "decided". They can also include regarding receiving (for example, receiving information), transmitting (for example, transmitting information), input, output, or accessing (for example, accessing data in memory) as having "judged" or "decided", and regarding resolving, selecting, choosing, establishing, comparing, or the like as having "judged" or "decided". In other words, "judgment" and "decision" can include regarding some operation as having been "judged" or "decided". "Judgment (decision)" may also be read as "assuming", "expecting", "considering", or the like.
 The terms "connected" and "coupled", and any variations thereof, mean any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. The coupling or connection between elements may be physical, logical, or a combination thereof. For example, "connected" may be read as "accessed". As used in the present disclosure, two elements can be considered to be "connected" or "coupled" to each other using at least one of one or more electric wires, cables, and printed electrical connections, and, as some non-limiting and non-exhaustive examples, using electromagnetic energy having wavelengths in the radio frequency, microwave, and optical (both visible and invisible) regions.
 The description "based on" used in the present disclosure does not mean "based only on" unless otherwise specified. In other words, the description "based on" means both "based only on" and "based at least on".
 Any reference to elements using designations such as "first" and "second" used in the present disclosure does not generally limit the quantity or order of those elements. These designations may be used in the present disclosure as a convenient way of distinguishing between two or more elements. Accordingly, a reference to first and second elements does not mean that only two elements can be employed or that the first element must precede the second element in some way.
 "Means" in the configuration of each device described above may be replaced with "unit", "circuit", "device", or the like.
 Where "include", "including", and variations thereof are used in the present disclosure, these terms are intended to be inclusive, in the same manner as the term "comprising". Furthermore, the term "or" as used in the present disclosure is not intended to be an exclusive OR.
 In the present disclosure, when articles such as a, an, and the in English are added by translation, the present disclosure may include the case where a noun following such an article is plural.
 In the present disclosure, the term "A and B are different" may mean "A and B are different from each other". The term may also mean "A and B are each different from C". Terms such as "separated" and "coupled" may also be interpreted in the same manner as "different".
 Reference Signs List: 1…translation device; 2…recognition device; 3…translation system; 10…storage unit; 11…learning unit; 12…acquisition unit; 13…detection unit; 14…translation unit; 1001…processor; 1002…memory; 1003…storage; 1004…communication device; 1005…input device; 1006…output device; 1007…bus.

Claims (10)

  1. A translation device comprising:
     an acquisition unit that acquires a recognized character string that is a character string of a speech recognition result or a character recognition result;
     a detection unit that detects a sentence end of the recognized character string; and
     a translation unit that translates the recognized character string up to the sentence end.
  2. The translation device according to claim 1, wherein
     the acquisition unit acquires the recognized character string each time a function that performs speech recognition or character recognition outputs the recognized character string, and
     the translation unit, each time the acquisition unit acquires the recognized character string, translates the recognized character string up to the sentence end of that recognized character string detected by the detection unit.
  3. The translation device according to claim 1 or 2, wherein the translation unit does not translate the recognized character string when the detection unit does not detect a sentence end of the recognized character string.
  4. The translation device according to any one of claims 1 to 3, wherein the detection unit does not detect, as a sentence end, a sentence end located within a predetermined number of characters from the end of the recognized character string.
  5. The translation device according to any one of claims 1 to 4, wherein the detection unit detects a sentence end in the portion of the recognized character string excluding the last predetermined number of characters.
  6. The translation device according to any one of claims 1 to 5, wherein the translation unit does not translate the recognized character string when the sentence end is within a predetermined number of characters from the end of the recognized character string.
  7. The translation device according to any one of claims 1 to 6, wherein the detection unit, upon detecting a sentence end of the recognized character string, inserts a sentence-end symbol at the detected position in the recognized character string, and the translation unit translates the recognized character string into which the sentence-end symbol has been inserted, up to the sentence-end symbol.
  8. The translation device according to any one of claims 1 to 7, wherein the detection unit detects the sentence end of the recognized character string based on an output character string, the output character string being obtained by inputting the recognized character string, from which sentence-end symbols have been removed, into a sentence-end symbol insertion model that, when given a character string without sentence-end symbols, outputs the character string with sentence-end symbols inserted.
  9. The translation device according to claim 8, wherein the detection unit, upon detecting the sentence end of the recognized character string, replaces the recognized character string with the output character string, and the translation unit translates the replaced recognized character string up to the sentence-end symbol.
  10. The translation device according to claim 8 or 9, wherein the sentence-end symbol insertion model is a trained model trained on learning data consisting of pairs of a character string without sentence-end symbols and a character string with sentence-end symbols.
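To make the sentence-end symbol insertion model of claims 8 to 10 concrete, here is a minimal sketch under stated assumptions: `model` is a hypothetical callable standing in for the insertion model (the claims do not fix an architecture), and the training pairs merely illustrate the shape of the learning data described in claim 10.

```python
SENTENCE_END = "。"

# Claim 10: learning data is pairs of (character string without sentence-end
# symbols, the same string with sentence-end symbols). Illustrative examples:
training_pairs = [
    ("今日は晴れです明日は雨です", "今日は晴れです。明日は雨です。"),
    ("こんにちは元気ですか", "こんにちは。元気ですか。"),
]

def detect_and_replace(recognized: str, model) -> str:
    """Claims 8 and 9: remove any sentence-end symbols from the recognized
    string, run the insertion model on it, and replace the recognized string
    with the model output; translation then proceeds up to the last inserted
    sentence-end symbol."""
    stripped = recognized.replace(SENTENCE_END, "")
    return model(stripped)  # the model returns the string with "。" inserted
```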
PCT/JP2022/043979 2022-01-13 2022-11-29 Translation device WO2023135963A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022003548 2022-01-13
JP2022-003548 2022-01-13

Publications (1)

Publication Number Publication Date
WO2023135963A1 (en)

Family

ID=87278948

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/043979 WO2023135963A1 (en) 2022-01-13 2022-11-29 Translation device

Country Status (1)

Country Link
WO (1) WO2023135963A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020024277A (en) * 2018-08-07 2020-02-13 国立研究開発法人情報通信研究機構 Data segmentation device


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 22920483; Country of ref document: EP; Kind code of ref document: A1
WWE WIPO information: entry into national phase
    Ref document number: 2023573892; Country of ref document: JP