WO2001080020A1

WO2001080020A1 - Apparatus for transmitting and receiving voice message, method of manufacture thereof, relay device, method of transmitting, receiving and relaying, and recording medium

Info

Publication number: WO2001080020A1
Application number: PCT/JP2001/002332
Authority: WO
Inventors: Kenichi Ohmae
Original assignee: Kenichi Ohmae
Priority date: 2000-04-17
Filing date: 2001-03-23
Publication date: 2001-10-25
Also published as: AU2001242768A1; JP2001306461A

Abstract

The invention provides a method and an apparatus for transmitting and receiving voice messages with a reduced amount of data, a method and an apparatus for relaying such messages, and a recording medium. A voice message transmitter comprises a voice input device (101) for receiving voice messages, a speech-recognition unit (107) for recognizing a voice message input through the voice input device (101) and converting it into a text file, and a transmitting unit (104) for transmitting the text file converted by the speech-recognition unit (107) together with a voice file containing the part of the voice message that the speech-recognition unit (107) did not convert into text. Compared with transmitting a voice file alone, the amount of data to be transmitted is reduced because part of the voice has been converted into a text file.

Description

Voice document transmission/reception device, device manufacturing method, relay device, transmission/reception/relay method, and recording medium

TECHNICAL FIELD

The present invention relates to a voice document transmitting apparatus, transmitting method, receiving apparatus, and receiving method, a method of manufacturing the transmitting and receiving apparatuses, a relay apparatus, a relay method, and a recording medium, and particularly to those that can transmit a voice document efficiently in a short time.

BACKGROUND ART

Conventionally, voice has been transmitted to a remote location by means of transmitting the voice as it is, such as by telephone, or by transmitting voice data to a remote location via the Internet.

With the conventional methods described above, the spoken words are transmitted in real time over the telephone, so the transmitting device is occupied for the duration of the conversation and the amount of transmitted data is large, which made telephone charges expensive. Likewise, when voice data is transmitted over the Internet, the large amount of voice data places a heavy burden on the Internet equipment, which was also a problem.

Therefore, an object of the present invention is to provide a transmitting apparatus, a transmitting method, a receiving apparatus, a receiving method, a relay apparatus, a relay method, and a recording medium for voice documents with a reduced amount of data.

DISCLOSURE OF THE INVENTION

A voice document transmitting apparatus 100 according to an embodiment of the present invention is a so-called voice e-mail transmitting device. As shown in FIG. 1, for example, it includes: a voice input device 101 for inputting a document by voice; a voice recognition mechanism 107 for recognizing the voice input through the voice input device 101 and converting it into a text file; and a transmission mechanism 104 for transmitting, mixed together, the text file converted by the voice recognition mechanism 107 and a voice file containing the part of the voice document that the voice recognition mechanism 107 did not convert into a text file. A part that the voice recognition mechanism 107 did not convert into a text file is typically a part it could not recognize, but it may also be a part that, for some other reason, is sent as input through the voice input device 101. The same applies to the voice document transmitting method described below.

With this configuration, the transmission mechanism transmits, mixed together, the text file produced by the voice recognition mechanism and a voice file containing the portion of the voice document that the mechanism could not recognize and therefore could not convert into text. All or part of the voice document can therefore be transmitted as a text file, which carries a small amount of data. When the entire voice document can be converted into text, the transmission mechanism may of course transmit only text files.
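The mixing performed by the transmission mechanism 104 can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the input voice has already been split into chunks and that the recognizer is a callable returning text for a chunk, or `None` when it cannot recognize it. `TextSegment`, `AudioSegment`, `build_mixed_message`, and `payload_size` are illustrative names, not part of the patent.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Union


@dataclass
class TextSegment:
    text: str        # portion the recognizer converted into text


@dataclass
class AudioSegment:
    samples: bytes   # portion left as raw voice data


Segment = Union[TextSegment, AudioSegment]


def build_mixed_message(chunks: Iterable[bytes],
                        recognize: Callable[[bytes], Optional[str]]) -> List[Segment]:
    """Keep recognized chunks as text and unrecognized chunks as voice data."""
    message: List[Segment] = []
    for chunk in chunks:
        text = recognize(chunk)
        message.append(TextSegment(text) if text is not None else AudioSegment(chunk))
    return message


def payload_size(message: List[Segment]) -> int:
    """Bytes to transmit: UTF-8 text for text segments, raw bytes for voice segments."""
    return sum(len(s.text.encode("utf-8")) if isinstance(s, TextSegment) else len(s.samples)
               for s in message)


# Toy recognizer (hypothetical): it only "knows" the first chunk.
chunks = [b"\x01" * 16000, b"\x02" * 16000]
known = {chunks[0]: "Meeting moved to 3 pm."}
msg = build_mixed_message(chunks, known.get)
print(payload_size(msg))  # 16022 bytes instead of 32000 for voice alone
```

Only the unrecognized chunk travels as voice, so the payload shrinks roughly in proportion to how much of the message the recognizer handles. Real voice data would normally be compressed as well, which this sketch ignores.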

Here, a voice file creation mechanism may be provided that converts the voice which the voice recognition mechanism could not convert into text into a voice file.

Further, the voice document transmitting device 100 according to another embodiment of the present invention may further include an example sentence setting mechanism 109 for converting the voice input through the voice input device 101 into the identifier of a pre-registered example sentence, and the transmission mechanism 104 may be configured to mix the example sentence identifiers into the transmitted data. The example sentence identifier may be a symbol, a character, or a numeric code, or it may be image data such as an icon or a picture.
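A minimal Python sketch of what the example sentence setting mechanism 109 might do, assuming (hypothetically) that the registered sentences are held in a dictionary keyed by a numeric identifier and that a recognized utterance is replaced by an identifier only when it matches a registered sentence exactly, ignoring case and extra spaces. The matching rule, the sentences, and the function names are illustrative; the patent does not specify them.

```python
from typing import Dict, Optional, Tuple

# Hypothetical pre-registered example sentence DB: identifier -> sentence.
EXAMPLE_DB: Dict[int, str] = {
    1: "I will be late today.",
    2: "Please call me back when you are free.",
    3: "Thank you for your message.",
}


def normalize(s: str) -> str:
    """Lower-case and collapse runs of whitespace before comparing."""
    return " ".join(s.lower().split())


def to_identifier(recognized: str, db: Dict[int, str] = EXAMPLE_DB) -> Optional[int]:
    """Return the identifier of the registered sentence matching the text, if any."""
    key = normalize(recognized)
    for ident, sentence in db.items():
        if normalize(sentence) == key:
            return ident
    return None


def encode_segment(recognized: str) -> Tuple[str, object]:
    """Send an identifier when possible, otherwise fall back to plain text."""
    ident = to_identifier(recognized)
    return ("example", ident) if ident is not None else ("text", recognized)


print(encode_segment("Thank you for your  message."))  # ('example', 3)
print(encode_segment("See you at noon."))              # ('text', 'See you at noon.')
```

A one- or two-digit identifier stands in for a sentence of twenty or more characters, which is why identifiers can reduce the data amount below that of a text file.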

Further, in the voice document transmitting apparatus according to still another embodiment of the present invention, the voice recognition mechanism 107 may be configured to refer to a voice profile. A voice profile represents characteristics of the voice such as its sound quality and linguistic structure: for example, gender, age, the speaker's place of origin and dialect (Japanese, English, or a regional dialect), whether the speaker is before or after voice change, and whether the speaker has a cold. A parameterized form of these attributes may be used as the profile.

Here, the voice profile may be detected by analyzing the input voice, or it may be entered by the speaker, who inputs his or her gender, age, and so on through an input device such as a keyboard. In addition, the transmission mechanism may be configured to transmit the profile together with the text file.

With such a configuration, referring to the voice profile improves the accuracy of voice recognition. When the profile is sent together with the text file, the receiving side reproduces voice from the text file more faithfully and accurately.
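The voice profile and its transmission together with the text file might look like the following Python sketch. The field set mirrors the attributes listed above; the field names, the boolean encoding, and the use of JSON as the container are assumptions made for illustration, not a format defined by the patent.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class VoiceProfile:
    gender: str               # e.g. "female" or "male"
    age: int
    dialect: str              # place of origin / dialect, e.g. "ja-Kansai" (assumed code)
    after_voice_change: bool  # whether the speaker's voice has already changed
    has_cold: bool


def attach_profile(text: str, profile: VoiceProfile) -> str:
    """Sender side: bundle the recognized text with the profile (JSON is an assumption)."""
    return json.dumps({"text": text, "profile": asdict(profile)}, ensure_ascii=False)


def detach_profile(payload: str) -> Tuple[str, VoiceProfile]:
    """Receiver side: recover the text and the profile to steer speech synthesis."""
    data = json.loads(payload)
    return data["text"], VoiceProfile(**data["profile"])


sender = VoiceProfile(gender="female", age=34, dialect="ja-Kansai",
                      after_voice_change=True, has_cold=False)
payload = attach_profile("The meeting starts at 10.", sender)
text, profile = detach_profile(payload)
```

The profile adds only a few dozen bytes to the text file, yet gives a receiver's synthesizer enough parameters to choose a voice close to the sender's.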

Further, the voice document transmitting method according to the embodiment of the present invention is a so-called voice electronic mail transmitting method, and transmits a voice document using the voice document transmitting apparatus as described above.

A voice document transmitting method according to another embodiment of the present invention is a so-called voice e-mail transmitting method and includes: a voice input step of inputting a voice document; a voice recognition step of recognizing the voice input in the voice input step and converting it into a text file; and a transmitting step of transmitting, mixed together, the text file converted in the voice recognition step and a voice file containing the part of the voice document that was not converted into a text file in the voice recognition step.

With such a configuration, because the transmitting step transmits the text file and the voice file mixed together, all or part of the voice document can be transmitted as a text file with a small amount of data.

The voice document transmitting method according to still another embodiment of the present invention further includes an example sentence identification step of converting the voice input in the voice input step into the identifier of a pre-registered example sentence, and the transmitting step mixes the example sentence identifiers into the transmitted data. With this configuration, because the example sentence identification step is provided and the transmitting step also transmits example sentence identifiers, the amount of transmitted data can be made even smaller than with a text file alone.

In the voice document transmitting method according to still another embodiment of the present invention, the voice recognition step may recognize voice by referring to a voice profile. Here, the voice profile may be detected by analyzing the input voice, or it may be entered by the speaker, who inputs his or her gender, age, and so on through an input device such as a keyboard. The method may further include a step of transmitting the profile together with the text file. With this configuration, referring to the voice profile improves the accuracy of voice recognition, and when the profile is transmitted together with the text file, the receiving side reproduces voice from the text file more faithfully.

A method of manufacturing a voice document transmitting apparatus according to an embodiment of the present invention configures a transmitting device as a voice document transmitting apparatus by providing the transmitting device with a program that controls it to perform: voice input processing for inputting a voice document; voice recognition processing for recognizing the voice input in the voice input processing and converting it into a text file; and transmission processing for transmitting, mixed together, the text file converted by the voice recognition processing and a voice file containing the part of the voice document that was not converted into a text file by the voice recognition processing. Here, the transmitting device is typically a computer such as a personal computer, and the program is typically provided via a network such as the Internet and downloaded by the user.

With this configuration, the program for controlling the transmitting device is provided to a transmitting device such as a computer through communication means such as the Internet, so that, for example, a general-purpose computer can be configured as the voice document transmitting device.

A recording medium readable by the voice document transmitting device (voice e-mail transmitting device) according to an embodiment of the present invention stores a program for controlling the voice document transmitting device so that it performs: voice input processing for inputting a voice document; voice recognition processing for recognizing the voice input in the voice input processing and converting it into a text file; and transmission processing for transmitting, mixed together, the text file converted in the voice recognition processing and a voice file containing the part of the voice document that was not converted into a text file in the voice recognition processing. With this configuration, installing the program stored on the recording medium on, for example, a personal computer used as a voice document transmitting device gives the personal computer the predetermined transmitting functions of a voice document transmitting device.

A voice document receiving apparatus 200 according to still another embodiment of the present invention is a so-called voice e-mail receiving apparatus. As shown in FIG. 2, for example, it includes: a receiving mechanism 203 for receiving a signal including a text file; a decoding mechanism 206 for decoding the signal received by the receiving mechanism 203; and a voice conversion mechanism 207 for converting the text file decoded by the decoding mechanism 206 into voice.

With such a configuration, because the voice conversion mechanism converts the text file decoded by the decoding mechanism into voice, a document transmitted and received as a text file with a small amount of data can be output as voice.

Here, an output device 201 may be provided that outputs the voice converted by the voice conversion mechanism. The output device 201 may also be configured to output the decoded text file as text, as it is, in addition to outputting voice. In this case, the document can be output as voice or as a text document, and the user can select the output format as desired.

The signal received by the receiving device 200 may also include example sentence identifiers, and may further include voice files. In either case, the content is output in one of two formats: voice or text.
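The receiving side's choice between voice and text output can be sketched in Python as follows. It assumes, hypothetically, that the decoded packet is a list of `(kind, value)` segments, that the receiver's example sentence DB is a dictionary, and that `synthesize` is some text-to-speech function; `fake_tts` below is only a stand-in that encodes the text as bytes. Showing a placeholder for voice segments in text mode is also an assumption, since the receiver of FIG. 2 has no recognizer.

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[str, object]  # ("text", str) | ("example", int) | ("audio", bytes)


def render_message(segments: List[Segment], mode: str,
                   example_db: Dict[int, str],
                   synthesize: Callable[[str], bytes]) -> list:
    """Turn a mixed message into audio clips ('voice' mode) or strings ('text' mode)."""
    if mode not in ("voice", "text"):
        raise ValueError(f"unknown output mode: {mode!r}")
    output = []
    for kind, value in segments:
        if kind == "example":
            # Resolve the identifier with the receiver's example sentence DB,
            # assumed to match the sender's DB.
            kind, value = "text", example_db[value]
        if kind == "text":
            output.append(synthesize(value) if mode == "voice" else value)
        elif kind == "audio":
            # Voice passes through unchanged; text mode shows a placeholder.
            output.append(value if mode == "voice" else f"[voice clip: {len(value)} bytes]")
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return output


def fake_tts(text: str) -> bytes:
    """Stand-in for a real text-to-speech engine."""
    return text.encode("utf-8")


segments = [("text", "Hello."), ("example", 3), ("audio", b"\x00\x01")]
example_db = {3: "Thank you for your message."}
print(render_message(segments, "text", example_db, fake_tts))
# ['Hello.', 'Thank you for your message.', '[voice clip: 2 bytes]']
```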

Further, the voice document receiving device 200 may also have the functions of the voice document transmitting device 100, in which case it can be used as a voice document transmitting/receiving device. Because a sender usually also becomes a recipient in turn, it is preferable for a terminal device to have both transmitting and receiving functions.

A voice document receiving method according to an embodiment of the present invention receives a voice document using the voice document receiving apparatus.

A voice document receiving method according to an embodiment of the present invention includes: a receiving step of receiving a signal including a text file; a decoding step of decoding the signal received in the receiving step; and a voice conversion step of converting the text file decoded in the decoding step into voice. It may further include a voice output step of outputting the voice converted in the voice conversion step. The signal received in the receiving step may include example sentence identifiers, and may further include voice files; in either case, the content is output as voice. In addition to outputting voice, however, the decoded text file may also be output as it is. The user can then choose to output voice or a text document, or select the output format as desired.

A method of manufacturing a voice document receiving apparatus 200 according to an embodiment of the present invention configures a receiving device as a voice document receiving apparatus by providing the receiving device with a program that controls it to perform: receiving processing for receiving a signal including a text file; decoding processing for decoding the signal received in the receiving processing; and voice conversion processing for converting the text file included in the signal decoded by the decoding processing into voice. Here, the receiving device is typically a computer such as a personal computer, and the program is typically provided via a network such as the Internet and downloaded by the user.

With this configuration, the program for controlling the receiving device is provided to a receiving device such as a computer via communication means such as the Internet, so that, for example, a general-purpose computer can be configured as the voice document receiving device.

A recording medium readable by the voice document receiving device (voice e-mail receiving device) according to an embodiment of the present invention stores a program for controlling the voice document receiving device so that it performs: receiving processing for receiving a signal including a text file; decoding processing for decoding the signal received in the receiving processing; and voice conversion processing for converting the text file included in the signal decoded by the decoding processing into voice. With this configuration, the voice document receiving device can be given the predetermined receiving functions.

The voice document relay device 300 according to an embodiment of the present invention is a so-called voice e-mail relay device. As shown in FIG. 3, for example, it includes: a receiving mechanism 307 for receiving a signal including a text file and a voice file; a voice recognition mechanism 309 for converting the voice file in the signal received by the receiving mechanism 307 into a text file; and a text file transmitting mechanism 307 for transmitting the text file received by the receiving mechanism 307 together with the text file converted by the voice recognition mechanism 309. Typically, a single mechanism or device having both functions, serving as a communication interface, is used as the receiving mechanism and the transmitting mechanism.

Typically, a decoding mechanism 303 is provided to decode the received signal before the voice recognition mechanism converts it into a text file. The signal received by the receiving mechanism 307 may also include example sentence identifiers; in that case, the example sentence identifiers, or the example sentences themselves as text files, are transmitted as well.
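The relay step can be sketched in Python as below, using a hypothetical `(kind, value)` segment representation for the decoded packet. `recognize` stands for the relay's higher-performance recognizer and is an assumed callable that returns `None` on failure. What the relay does with voice it still cannot recognize is not stated in the text, so this sketch simply forwards it unchanged; example sentence identifiers are also forwarded as identifiers.

```python
from typing import Callable, List, Optional, Tuple

Segment = Tuple[str, object]  # ("text", str) | ("example", int) | ("audio", bytes)


def relay_message(segments: List[Segment],
                  recognize: Callable[[bytes], Optional[str]]) -> List[Segment]:
    """Re-recognize the voice segments of a decoded message and forward the result."""
    forwarded: List[Segment] = []
    for kind, value in segments:
        if kind == "audio":
            text = recognize(value)  # relay-side recognizer (hypothetical)
            if text is not None:
                forwarded.append(("text", text))
                continue
        # Text, example identifiers, and still-unrecognized voice pass through.
        forwarded.append((kind, value))
    return forwarded


incoming = [("text", "Hello."), ("audio", b"\x10\x20"), ("example", 3)]
better = {b"\x10\x20": "The file is attached."}
print(relay_message(incoming, better.get))
# [('text', 'Hello.'), ('text', 'The file is attached.'), ('example', 3)]
```

When every voice segment is recognized, the forwarded message contains no voice data at all, which is the data reduction the relay is meant to provide.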

A voice document relay method according to an embodiment of the present invention relays a voice document using the voice document relay device described above. A voice document relay method according to another embodiment of the present invention is a so-called voice e-mail relay method and includes: a receiving step of receiving a signal including a text file and a voice file; a text conversion step of converting the voice file in the signal received in the receiving step into a text file; and a text file transmitting step of transmitting the text file received in the receiving step together with the text file converted in the text conversion step.

With this configuration, the text conversion step converts the voice file in the signal received in the receiving step into a text file, and the text file transmitting step transmits the text file received in the receiving step together with the converted text file. The receiving side can therefore receive text files that contain no voice file, so only a relatively small amount of data needs to be received.

This application is based on Japanese Patent Application No. 2000-1-1715179 filed in Japan on April 17, 2000, the content of which is incorporated herein by reference and forms a part hereof.

The present invention will be more fully understood from the following detailed description, and further areas of applicability of the present invention will become apparent from it. However, the detailed description and specific examples describe preferred embodiments of the present invention and are given for illustrative purposes only; various changes and modifications within the spirit and scope of the present invention will be apparent to those skilled in the art from this detailed description. The applicant does not intend to dedicate any of the described embodiments to the public, and any disclosed modifications and alternatives that may not literally fall within the scope of the claims are part of the invention under the doctrine of equivalents.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a block diagram illustrating a configuration example of a transmitting device according to an embodiment of the present invention.

FIG. 2 is a block diagram illustrating a configuration example of a receiving device according to an embodiment of the present invention.

FIG. 3 is a block diagram showing a configuration example of the relay device according to an embodiment of the present invention.

FIG. 4 is a conceptual diagram showing a voice e-mail transmitting device, a receiving device, and a relay device connected via a network.

FIG. 5 is a flowchart showing an example of a process when transmitting a voice electronic mail.

FIG. 6 is a flowchart showing a continuation of FIG. 5.

FIG. 7 is a diagram illustrating an example of a screen displayed on the transmitting device while a voice e-mail to be transmitted is being input.

FIG. 8 is a diagram illustrating an example of a format of a packet transmitted by the transmission device.

FIG. 9 is a flowchart showing an example of processing performed by the relay device on a voice e-mail.

FIG. 10 is a flowchart showing an example of processing when the receiving device receives an e-mail packet.

FIG. 11 is a flowchart showing a continuation of FIG. 10.

FIG. 12 is a flowchart showing an example of processing at the time of reception when the contents of the received packet are all text files.

FIG. 13 is a flowchart showing an example of the processing of each device when the sender's voice is profiled and the receiving device converts text into voice based on that profile.

FIG. 14 is a flowchart showing a continuation of FIG. 13.

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the drawings, the same or corresponding members are denoted by the same reference numerals, and redundant description will be omitted.

With reference to the block diagram of FIG. 1, a configuration example of a voice e-mail transmitting apparatus according to an embodiment of the present invention will be described. The transmitting device shown in FIG. 1 is a voice document transmitting device 100 that transmits voice e-mail via a network such as the Internet. The transmitting device 100 is a dedicated device, but it is not limited to this and may be incorporated in a mobile phone, a personal computer, or the like equipped with an IC (integrated circuit, which may be an LSI). In the term "voice document transmitting device", a document generally means a person's thoughts expressed in characters; here, however, it suffices that the content can be expressed in characters, and it need not actually take the form of characters. For example, it may be expressed by spoken sounds without passing through characters.

The transmitting device 100 has an input device 101 for inputting the sender's voice and an output device 102 for reproducing the input voice and displaying operation instructions to the sender. The input device 101 is, for example, a microphone, and the output device 102 is, for example, a speaker as an audio output device, and a display or a printer as a visual output device.

The transmitting device 100 further includes an I/O interface unit 103 that controls the input device 101 and the output device 102. It also includes a communication interface unit 104, serving as the transmission mechanism, which connects to other transmitting/receiving devices via a network to send and receive e-mail.

Furthermore, a control unit 105 is connected to the I/O interface 103 and the communication interface 104; it sends and receives e-mail and controls the entire apparatus. The control unit 105 includes: an operation instruction mechanism 106 that issues operation commands and manages them; a voice recognition mechanism 107 that recognizes text input by voice, dictates it into a document, and creates a text file; a voice file creation mechanism 108 that creates a voice file from an e-mail text input by voice; an example sentence setting mechanism 109 that selects an example sentence from a pre-registered example sentence group based on a keyword input by the sender's voice; and a transmission packet creation mechanism 110 that assembles into a packet one e-mail in which text files, voice files, and example sentences (one to three of these three formats) are mixed.

A storage unit 111 is further connected to the control unit 105. The storage unit 111 has an example sentence DB (database) that stores registered example sentences in association with the example sentence identifier of each sentence, a voice DB that stores the voice data of various types of people (gender, age, hometown, etc.) as a database, and a language DB that holds language data. The voice DB and the language DB are the DB group required for voice recognition. The example sentence identifier may be image data such as an icon or a picture, as well as a symbol or a character code. The example sentence setting mechanism places the selected example sentence on the transmission packet in the form of an identifier; however, the present invention is not limited to this, and the example sentence may instead be placed on the packet as a text file.

In the transmitting device 100, the voice recognition mechanism 107 converts as much of the input voice as possible into text files, and any unrecognizable portion is placed directly on the transmission packet as a voice file.

With reference to the block diagram of FIG. 2, a configuration example of a voice electronic mail receiving device will be described. The device shown in FIG. 2 is a voice document receiving device 200 for receiving an electronic mail via a network. The receiving device 200 is a dedicated device, but is not limited thereto, and may be a device incorporated in a mobile phone, a personal computer, or the like having an IC (which may be an integrated circuit or an LSI).

The receiving device 200 is provided with an output device 201 for reproducing the received mail by voice and displaying an operation instruction to a recipient. The output device 201 is, for example, a speaker as an audio output device, or a display as a visual output device.

The receiving device 200 further includes an I / O interface unit 202 that controls the output device 201. In addition, a communication interface unit 203 is provided as a receiving mechanism for transmitting and receiving e-mails by connecting to another transmission / reception device via a network.

Further, a control unit 204 is connected to the I/O interface 202 and the communication interface 203; it sends and receives mail and controls the entire apparatus. The control unit 204 includes: an operation instruction mechanism 205 that issues operation instructions and manages them; a packet decoding mechanism 206 that decomposes a received packet, in which text files, voice files, and example sentences (one to three of these three formats) are mixed, into its individual formats and decodes them; and a voice conversion mechanism 207 that vocalizes the received mail text.

The control unit 204 is further connected to a storage unit 208. The storage unit 208 has an example sentence DB that stores registered example sentences, and a voice DB that stores the voice data of various types of people (gender, age, hometown, etc.) as a database. The example sentence DB and the voice DB in the storage unit 208 correspond, respectively, to the example sentence DB and the voice DB in the storage unit 111 on the transmitting device side; alternatively, the same DBs are used. Therefore, when a text file or an example sentence identifier is vocalized on the receiving side, the resulting voice has content identical or close to that on the transmitting side.

With reference to the block diagram of FIG. 3, a configuration example of a voice e-mail transmitting/receiving device (relay device), which sits at an intermediate point between the voice e-mail transmitting device 100 and the receiving device 200, will be described. The device shown in FIG. 3 is a voice document relay device 300 for relaying the transmission and reception of voice e-mail.

The relay device 300 is configured by a personal computer, a workstation, or the like. It is provided with a control unit 301 that sends and receives mail and controls the entire device, and includes the following mechanisms. First, an operation instruction mechanism 302 issues operation instructions and manages them. A packet decoding mechanism 303 decomposes a received packet in which one to three of the three formats (text files, voice files, and example sentences) are mixed. A voice recognition mechanism 304 further dictates the voice file part of the received packet (the part that could not be dictated on the transmitting device), turns it into a document, and creates a text file.

The voice recognition mechanism 304 is configured to have higher performance than the voice recognition mechanism 107 in the transmitting device 100, so voice data that the transmitting device 100 could not convert into a text file can also be converted into text data. A transmission packet creation mechanism 305 collects the text files and example sentences present in one mail, whichever of them exist, into a single packet.

The control unit 301 is connected to a storage unit 306, which has a voice DB that stores the voice data of various types of people (gender, age, hometown, etc.) as a database, and a language DB that holds language data. As described for the receiving device 200, the example sentence DB and the voice DB in the storage unit 306 correspond, respectively, to the example sentence DB and the voice DB in the storage unit 111 on the transmitting device side and in the receiving device 200.

A communication interface unit 307 is connected to the control unit 301, through which the relay device 300 connects to the network to send and receive mail.

Referring to the conceptual diagram of FIG. 4, the relationship among an e-mail transmitting device 401, an e-mail receiving device 402, and an e-mail transmitting/receiving device (relay device) 403 will be described. These devices are connected via a network 404. The network 404 is, for example, the Internet or a public line such as a telephone line.

Here, there are two routes connecting the e-mail transmitting device 401 and the e-mail receiving device 402: a route that connects the devices directly via the network 404, and a route that passes through the e-mail transmitting/receiving device (relay device) 403 between them.

Further, the transmitting device 401 and the receiving device 402 are separate devices dedicated to transmission and to reception, like the transmitting device 100 described with FIG. 1 and the receiving device 200 described with FIG. 2, respectively. The present invention is not limited to this, however, and a single device containing both the transmitting device 100 and the receiving device 200, so that it can both transmit and receive, may be used. As the e-mail transmitting/receiving device (relay device) 403, for example, the relay device 300 described with reference to FIG. 3 is used.

In addition, a program that gives a personal computer the functions of a voice e-mail transmitting device is stored on a computer-readable recording medium such as an FD or a CD-ROM 405, and the program is installed on the personal computer from that medium. The personal computer that is to serve as the transmitting device 401 is equipped with a recording-medium drive.

Similarly, a program that gives a personal computer the functions of a voice e-mail receiving device is stored on a computer-readable recording medium such as an FD or a CD-ROM 406, and the program is installed on the personal computer from that medium. The personal computer that is to serve as the receiving device 402 is equipped with a recording-medium drive.

Instead of being provided on the recording media 405 and 406, the program can also be installed from a server on the provider side via the network 404, such as a telephone line or the Internet. In this way, a general-purpose computer may be configured as the transmitting device 401, the receiving device 402, or a transmitting/receiving device having both functions.

Next, with reference to the series of flowcharts shown in FIG. 5 and FIG. 6, an example of processing in which the content (text) of an e-mail is input by voice and transmitted will be described. Until the sender finishes inputting the e-mail (step 500), the processing from step 501 onward is repeated.

First, the sender decides whether to enter the mail text personally or to select from pre-registered example sentences (step 501). To enter it personally, the sender inputs the mail text by voice (step 502). The voice recognition mechanism 107 in the transmitting device then determines whether the input voice can be recognized (step 503). If it can, the input voice is dictated and a text file is created (step 504).

The created text files are sequentially stored in the transmission packet (step 505), and the process returns to step 500. If it is determined that the speech cannot be recognized, the voice file creation mechanism 108 creates an audio file of that portion (step 506). The created audio files are sequentially stored in the transmission packet (step 507), and the process returns to step 500.
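The recognize-or-fall-back loop of steps 502 to 507 can be sketched in Python as follows. This is a minimal illustration, not the device's implementation: the names `TextEntry`, `AudioEntry`, and `build_packet` are hypothetical, speech is assumed to arrive already split into segments, and the `recognize` callable stands in for the voice recognition mechanism 107, returning `None` when a segment cannot be recognized.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class TextEntry:
    text: str  # dictated text for a recognized segment (steps 504-505)


@dataclass
class AudioEntry:
    samples: bytes  # raw audio kept for an unrecognized segment (steps 506-507)


Entry = Union[TextEntry, AudioEntry]


def build_packet(segments: List[bytes],
                 recognize: Callable[[bytes], Optional[str]]) -> List[Entry]:
    """Hypothetical sketch of steps 502-507: text where possible, audio otherwise."""
    packet: List[Entry] = []
    for segment in segments:
        text = recognize(segment)  # step 503: can this segment be recognized?
        if text is not None:
            packet.append(TextEntry(text))
        else:
            packet.append(AudioEntry(segment))
    return packet


# Usage with a toy "recognizer" that only knows two segments.
known = {b"seg-hello": "hello", b"seg-world": "world"}
packet = build_packet([b"seg-hello", b"\x00\x01", b"seg-world"], known.get)
```

Only the unrecognized middle segment is carried as audio; the other two travel as short text entries, which is where the data reduction comes from.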

If, in step 501, the sender decides to create the message using example sentences registered in advance in the example sentence DB, the sender instructs the device to select an example sentence (step 508). The example sentence setting mechanism 109 receives this instruction (step 509) and displays the group of example sentences registered in the example sentence DB in the storage unit 111 on the output device 102, such as a display (step 510).

Referring to the displayed information, the sender verbally inputs an identifier (an example sentence number, a word contained in the example sentence, a keyword, etc.) that identifies the example sentence to be used (step 511). The voice recognition mechanism 107 then determines whether the input voice can be recognized (step 512). If it cannot (step 512), a message prompting re-entry is displayed (step 513), and the sender is asked again to input the example sentence identifier (step 511).

If speech recognition is possible, the recognized example sentence identifier is passed to the example sentence setting mechanism 109 (step 514), which searches for an example sentence using this identifier (step 515). If there is no corresponding example sentence (step 516), an error message is displayed (step 517). If there is one (step 516), the example sentence is displayed or played back (step 518). When the sender confirms the example sentence selected by the device (step 519), the identification number of the confirmed example sentence is stored in the transmission packet (step 520), and the process returns to step 500. If the sender does not confirm it, the example sentence setting mechanism selects another example sentence (return to step 515). When the sender has finished inputting the e-mail message (step 500), the transmission packet created by the processing up to this point, in which one to three forms (text files, audio files, and example sentence identification numbers) are mixed, is completed (step 521).
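The identifier search of step 515 might look roughly like the sketch below, assuming the example sentence DB is a simple mapping from sentence number to text and that an identifier is either that number or a word contained in the sentence. The function name `find_example` and the sample sentences are hypothetical placeholders; a `None` result corresponds to the error display of step 517.

```python
from typing import Dict, Optional


def find_example(identifier: str, example_db: Dict[int, str]) -> Optional[int]:
    """Hypothetical sketch of step 515: map a spoken identifier to a sentence number."""
    key = identifier.strip().lower()
    if not key:
        return None
    if key.isdigit():
        number = int(key)
        # Identifier is the example sentence number itself.
        return number if number in example_db else None
    for number in sorted(example_db):
        if key in example_db[number].lower():
            return number  # identifier is a word or keyword contained in the sentence
    return None  # no match: the device would show an error message (step 517)


# Illustrative DB; these sentences are placeholders, not taken from the source.
example_db = {
    1: "Please change the delivery date.",
    2: "I would like to specify the time slot.",
}
```

Only the returned number, not the sentence text, would then be stored in the transmission packet (step 520).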

The process of inputting an outgoing mail in the transmitting device 100 will be described with reference to the screen display examples in FIG. 7. Screen example 600 is a screen on which the sender chooses between inputting the e-mail message personally and creating it by selecting from pre-registered example sentences. If the sender chooses to input it personally, the input screen shown in screen example 601 is displayed, and the sender's spoken words (e.g., “How are you?”) are displayed on the screen. If the sender chooses to create the e-mail by selecting example sentences, a screen for selecting an example sentence is displayed.

Screen example 602 shows a list of example sentence categories. In this example, the sender who wants to select the example sentences under “1. Delivery date setting” utters a word that identifies this item (“1”, “delivery date”, etc.).

Screen example 603 then displays the group of example sentences under “1. Delivery date setting”. Here, the sender utters the identifiers of the example sentence to be selected and of the details to fill in (e.g., 3, February 20, PM). Screen example 604 shows the mail text created from the selected example sentence, into which the specific date and the morning/afternoon information have been woven: “Please change to February 20 (Tue.). I would like to specify the time slot: afternoon.”

An example of the format of a packet transmitted by this device will be described with reference to FIG. 8. A packet contains one, two, or all three types of item: text files, audio files, and example sentence identification numbers. Each file in the packet is stored after a sequence tag indicating its sequence number in the packet, a file type tag indicating its type (text file, audio file, or example sentence identification number), and a file length tag indicating its length, expressed, for example, as a number of bytes. A sequence tag, a file type tag, a file length tag, and a file (or example sentence identification number) form one set, and one packet consists of one or more such sets.
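The set layout of FIG. 8 (sequence tag, file type tag, file length tag, then the file) is essentially a tag-length-value encoding. The sketch below uses Python's standard `struct` module to show one possible byte layout. The text does not give field widths or type codes, so the 2-byte sequence tag, 1-byte type tag, 4-byte big-endian length tag, and the codes `TEXT`, `AUDIO`, and `EXAMPLE_ID` are assumptions.

```python
import struct
from typing import List, Tuple

# Assumed encodings: the text names the three item types but not their codes.
TEXT, AUDIO, EXAMPLE_ID = 0, 1, 2
# Assumed header: sequence tag (2 bytes), file type tag (1 byte), file length tag (4 bytes).
HEADER = struct.Struct(">HBI")


def encode_packet(items: List[Tuple[int, bytes]]) -> bytes:
    """Serialize (type, payload) items as sequence/type/length/payload sets."""
    out = bytearray()
    for seq, (ftype, payload) in enumerate(items, start=1):
        out += HEADER.pack(seq, ftype, len(payload))
        out += payload
    return bytes(out)


def decode_packet(data: bytes) -> List[Tuple[int, int, bytes]]:
    """Parse the sets back into (sequence, type, payload) tuples."""
    result = []
    offset = 0
    while offset < len(data):
        # Raises struct.error if fewer than HEADER.size bytes remain.
        seq, ftype, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated packet")
        offset += length
        result.append((seq, ftype, payload))
    return result
```

Because every set carries its own length tag, the receiver can walk the packet item by item without knowing the item count in advance, which matches the one-at-a-time loops described for the relay and receiving devices.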

Next, an example of how the e-mail transmitting device recognizes the input voice, or determines that it cannot be recognized, will be described. The e-mail transmitting device 100 holds, in the storage unit 111, a voice DB, which is a database of voices sampled from various types of people (by gender, age group, region, etc.), and a language DB holding language data. The voice recognition mechanism 107 matches the input voice data against the data in the voice DB and performs language analysis using the data in the language DB to determine the sentence represented by the input voice. Because both the voice DB and the language DB are consulted, the speech recognition accuracy is significantly improved.
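As a rough illustration of how a "cannot be recognized" decision could combine the two databases, the sketch below substitutes deliberately simple techniques that the text does not specify: nearest-template matching by Euclidean distance stands in for the voice DB comparison, and a set of permitted word pairs stands in for the language DB. The feature vectors, the `max_distance` threshold, and all function names are hypothetical. It requires Python 3.8 or later for `math.dist`.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

Vector = Sequence[float]


def nearest_word(features: Vector, voice_db: Dict[str, List[Vector]],
                 max_distance: float) -> Optional[str]:
    """Match one segment against templates sampled from many speakers."""
    best_word, best_dist = None, math.inf
    for word, templates in voice_db.items():
        for template in templates:
            dist = math.dist(features, template)
            if dist < best_dist:
                best_word, best_dist = word, dist
    return best_word if best_dist <= max_distance else None


def recognize(segments: List[Vector], voice_db: Dict[str, List[Vector]],
              language_db: Set[Tuple[str, str]],
              max_distance: float = 1.0) -> Optional[str]:
    """Return the recognized sentence, or None if it cannot be recognized."""
    words = []
    for features in segments:
        word = nearest_word(features, voice_db, max_distance)
        if word is None:
            return None  # acoustic match too poor
        words.append(word)
    for pair in zip(words, words[1:]):
        if pair not in language_db:
            return None  # word sequence rejected by the language check
    return " ".join(words)
```

A `None` result is what would route that portion to the audio-file branch (step 506) instead of dictation.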

Referring again to the example of FIG. 4, the e-mail packet created by the transmitting device 401 as described above is transmitted via the network 404 either to the e-mail transmitting/receiving device (relay device) 403 or to the receiving device 402, which is the final destination. If the packet contains an audio file (a portion that the transmitting device could not recognize), it is transmitted to the relay device 403, which has a higher-performance voice recognition mechanism than the transmitting device 401. If the packet contains no audio file, it may be transmitted via the relay device 403 or directly to the receiving device 402.
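The routing rule in the preceding paragraph reduces to a small decision function. In this sketch the item type strings, the address arguments, and the `via_relay_when_text_only` flag are hypothetical; the text only says that text-only packets may take either route.

```python
from typing import List


def choose_next_hop(entry_types: List[str], relay: str, destination: str,
                    via_relay_when_text_only: bool = False) -> str:
    """Pick where the transmitting device sends a packet (hypothetical sketch).

    Any audio item forces the relay device, whose recognizer is stronger;
    a packet without audio may go either way, chosen here by a flag.
    """
    if "audio" in entry_types:
        return relay
    return relay if via_relay_when_text_only else destination
```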

Next, referring to the flowchart of FIG. 9, an example of processing in an e-mail transmitting/receiving device (relay device) that has received a packet containing an audio file will be described, taking the relay device 300 described with reference to FIG. 3 as an example. The packet decoding mechanism 303 in the relay device 300 repeats the processing from step 801 onward for each item of the received packet, one at a time (each file or each example sentence identification number) (step 800).

First, it is determined whether the item (one file or one example sentence identification number) stored in the packet is an audio file (step 801). If it is not, it is either a text file or an example sentence identification number; such items are stored unchanged in the packet for transmission (step 802), and the process returns to step 800.
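The whole per-item relay loop of FIG. 9 (steps 800 to 805), including the audio-file branch described in the next paragraph, might be sketched as below. Items are modelled as hypothetical `(kind, payload)` tuples, and the `recognize` callable stands in for the relay's higher-performance voice recognition mechanism 304, assumed here always to succeed, as the description implies.

```python
from typing import Callable, List, Tuple, Union

Payload = Union[str, bytes, int]
Entry = Tuple[str, Payload]  # ("text", str) | ("audio", bytes) | ("example_id", int)


def relay_convert(packet: List[Entry], recognize: Callable[[bytes], str]) -> List[Entry]:
    """Hypothetical sketch of FIG. 9: forward text/identifiers, transcribe audio."""
    out: List[Entry] = []
    for kind, payload in packet:  # step 800: one item at a time
        if kind == "audio":  # step 801
            out.append(("text", recognize(payload)))  # steps 803-804
        else:
            out.append((kind, payload))  # step 802: kept unchanged
    return out  # step 805: the completed packet for the receiving device
```

Order is preserved, so a converted audio segment lands in the same position in the message as the speech it came from.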

If it is an audio file, the voice recognition mechanism 304 transcribes it into a text file (step 803). The packet creation mechanism 305 then stores the text file in the packet for transmission (step 804), and the process returns to step 800. When all the contents of the received packet have been processed and the packet is empty (step 800), the transmission packet created by the processing up to this point (containing text files, example sentence identification numbers, or a mixture of the two) is completed (step 805). This packet is then transmitted to the e-mail receiving device 200, which is the final destination.

Next, with reference to the series of flowcharts shown in FIG. 10 and FIG. 11, an example of processing when the e-mail receiving device 200 receives an e-mail packet will be described. Here, the packet received by the receiving device 200 as the final destination contains text files, example sentence identification numbers, or a mixture of both, because any audio files that were transmitted have been converted into text files by the relay device 300.

First, the recipient chooses whether to listen to the received e-mail as voice or to read it (for example, on a screen or as a printout) (step 900). The case in which the recipient chooses to listen is described first. The packet decoding mechanism 206 in the receiving device 200 repeats the processing for each item of the received packet, one at a time (each file or each example sentence identification number), until the end (step 901).

First, it is determined whether the item (one file or one example sentence identification number) stored in the packet is a text file or an example sentence identification number (step 902). If it is a text file, the voice conversion mechanism 207 converts the text into speech (step 905), and the process returns to step 901. If it is an example sentence identification number, the example sentence is retrieved from the example sentence DB in the storage unit 208 using the identification number as a key (step 903), and a text file of that example sentence is created (step 904). The voice conversion mechanism 207 then converts the text of this file (here, the example sentence) into speech (step 905), and the process returns to step 901. When all the contents of the received packet have been converted into speech and the packet is empty (step 901), the process ends.

Next, the processing when the recipient chooses in step 900 to view the received e-mail on the screen will be described. The packet decoding mechanism 206 in the receiving device 200 repeats the processing for each item of the received packet, one at a time (each file or each example sentence identification number), until the end (step 906). First, it is determined whether the item (one file or one example sentence identification number) stored in the packet is a text file or an example sentence identification number (step 907).

If it is a text file, the process returns directly to step 906. If it is an example sentence identification number, the example sentence is retrieved from the example sentence DB using the identification number as a key (step 908), and a text file of the example sentence is created (step 909). This text file is inserted at the position where the example sentence identification number was stored (step 910), and the process returns to step 906. When all the contents of the received packet have been processed (step 906), the completed text file is output to the screen (step 911), and the process ends.
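Both receiving branches share one step: replacing each example sentence identification number with its sentence from the local example sentence DB. The sketch below shows that expansion followed by the step 900 choice. The names `expand` and `deliver`, the mode strings, and joining items with a space for screen output are assumptions, and `synthesize` is a hypothetical stand-in for the voice conversion mechanism 207.

```python
from typing import Callable, Dict, List, Tuple, Union

Entry = Tuple[str, Union[str, int]]  # ("text", str) or ("example_id", int)


def expand(packet: List[Entry], example_db: Dict[int, str]) -> List[str]:
    """Steps 903-904 / 908-910: replace each identifier with its example sentence."""
    texts: List[str] = []
    for kind, payload in packet:
        if kind == "example_id":
            texts.append(example_db[payload])  # KeyError if the receiver's DB lacks it
        else:
            texts.append(payload)
    return texts


def deliver(packet: List[Entry], example_db: Dict[int, str], mode: str,
            synthesize: Callable[[str], bytes]) -> Union[List[bytes], str]:
    """Step 900 branch: speak each item (step 905) or show the joined text (step 911)."""
    texts = expand(packet, example_db)
    if mode == "voice":
        return [synthesize(t) for t in texts]
    return " ".join(texts)
```

The scheme only works if sender and receiver hold matching example sentence DBs; a missing number surfaces here as a `KeyError`.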

In the method shown in FIGS. 5 and 6 (transmission) and FIGS. 10 and 11 (reception), an example sentence is transmitted as its identification number, and the receiving side creates a text file of the example sentence to play back or display. Alternatively, the contents of the example sentence may be transmitted and received directly as a text file.

In this method, the text file of the example sentence is stored in the transmission packet in step 520 of FIG. 6. The transmitted packet is then a mixture of text files and audio files. Since all audio files are converted into text files by the relay device 300, the contents of the packet received by the receiving device 200 at the final destination are all in text file format.

The processing at the time of reception in this case will be described with reference to the flowchart of FIG. 12, which omits part of the flowcharts shown in FIGS. 10 and 11. First, the recipient chooses whether to listen to the received e-mail as voice or to view it on the screen (step 1000). If the recipient chooses to listen, the voice conversion mechanism 207 converts the text of the text file into speech (step 1001). If the recipient chooses to view it on the screen, the text file is output to the screen (step 1002). In this way, the output format can be chosen freely according to the recipient's preferences and needs.

Next, referring to the series of flowcharts in FIGS. 13 and 14, the processing of each device will be described for the case in which the voice input by the sender is profiled (male, female, child, elderly, etc.) and the receiving device 200 converts the text into speech using a voice suited to that profile. The flowcharts of FIGS. 13 and 14 show an example of the processing in the transmitting device 100. They are almost the same as the flowcharts shown in FIGS. 5 and 6, except that steps 503 to 505 in FIG. 5 are replaced with steps 1103 to 1107, steps 512 to 514 are replaced with steps 1114 to 1118, and step 520 in FIG. 6 is replaced with step 1124.

The processing from step 1103 to step 1107 is as follows. In step 1103, the voice recognition mechanism 107 determines whether the orally input mail text can be recognized. If it can, and voice profiling has not yet been completed at that point (step 1104), voice profiling is performed here (step 1105).

In voice profiling, the input speech is classified by gender, age group, etc., with reference to the voice DB, a database of voices sampled from various types of people (by gender, age group, region, etc.). After the input e-mail text is transcribed into a text file (step 1106), the created text file and the profiling result are stored in the transmission packet (step 1107).

The processing in steps 1114 to 1118 is similar, and in step 1124 the selected example sentence number and the profiling result are stored in the transmission packet. In the receiving device 200, step 905 of the flowchart shown in FIG. 10 is replaced with “convert the text of the text file into speech based on the voice profile”. Instead of having the input voice classified by gender, age group, etc. with reference to the voice DB, the sender may input the voice profile directly; in that case the profiling processing can be omitted, and the processing speed increases accordingly.

Because the profiling result is stored in the transmission packet and transmitted together with the text file, the receiving device 200 can reproduce the voice using the profiling result, which improves the fidelity of the reproduced voice. Likewise, a relay device can use the profiling result when converting into text an audio file that has not yet been converted, which increases the conversion accuracy.
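A minimal sketch of profiling on the sending side and profile-based voice selection on the receiving side follows. It assumes a single feature (mean pitch) and nearest-centroid classification, which the text does not specify. The centroid values, profile labels, voice names, and function names are illustrative only. A directly declared profile skips classification, as described above.

```python
from typing import Dict, Optional


def profile_voice(mean_pitch_hz: float, centroids: Dict[str, float],
                  declared: Optional[str] = None) -> str:
    """Sender side: classify by nearest centroid unless the sender declared a profile."""
    if declared is not None:
        return declared  # profile entered directly: classification is skipped
    return min(centroids, key=lambda label: abs(centroids[label] - mean_pitch_hz))


def pick_synthesis_voice(profile: str, voices: Dict[str, str], default: str) -> str:
    """Receiver side: choose a synthesis voice matching the profile, else a default."""
    return voices.get(profile, default)


# Illustrative centroids only; a real system would derive them from the voice DB.
centroids = {"adult male": 120.0, "adult female": 210.0, "child": 300.0}
```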

The above embodiment describes the case in which recognized speech is converted into a text file and transmitted mixed with audio files and, in addition, example sentence identifiers; in which this is relayed; and in which a text file, or a text file together with identifiers, is received and output as voice or text. However, a text file may also be combined with an image file. In this case, the text file is transmitted, relayed, and received together with at least one of an audio file, an example sentence identifier, and an image file.
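One way to carry a per-sentence display instruction alongside the text, and to honour it on the receiving side, is sketched below. It uses the brush-script greeting scenario as its example. The `Segment` type, the `"brush"` style value, and rendering to HTML with a hypothetical CSS class are assumptions chosen only to make the idea concrete; `html.escape` is from the Python standard library.

```python
import html
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    text: str
    style: Optional[str] = None  # e.g. "brush"; None means plain, undecorated text


def render_html(segments: List[Segment]) -> str:
    """Receiver side: apply each segment's display instruction, if any."""
    parts = []
    for seg in segments:
        text = html.escape(seg.text)
        if seg.style == "brush":
            parts.append(f'<p class="brush">{text}</p>')  # styled by a hypothetical brush font
        else:
            parts.append(f"<p>{text}</p>")
    return "\n".join(parts)
```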

For example, the sender says “Happy New Year” to the transmitting device and, using an input device such as a keyboard or a dedicated button, also inputs an instruction to display it in brush script. The sender then says “Thank you again this year” and inputs no image display instruction for this sentence. A receiving device that receives a signal containing this information displays “Happy New Year” on the screen in a brush font or as a brush-written image, and displays “Thank you again this year” as plain text without decoration. With this configuration, the receiving side can display a visually varied document that reflects the sender's intention.

Industrial Applicability

As described above, according to the present invention, a transmission mechanism is provided that mixes and transmits a text file converted by the voice recognition mechanism and an audio file containing the part of the voice document that the voice recognition mechanism did not convert into a text file. It is therefore possible to provide a transmitting device that reduces the amount of data by transmitting all or part of a voice document as a text file instead of as an audio file.

Claims


1. A voice input device for inputting a voice document;

A voice recognition mechanism for recognizing voice input by the voice input device and converting it into a text file;

A transmission mechanism for mixing and transmitting a text file converted by the voice recognition mechanism and a voice file including a voice document not converted to a text file by the voice recognition mechanism;

Voice document transmission device.

2. An example sentence setting mechanism for converting a voice input by the voice input device into an identifier of an example sentence registered in advance;

The transmission mechanism is further configured to mix in and transmit the example sentence identifier;

The voice document transmitting device according to claim 1.

3. The voice document transmitting device according to claim 1, wherein the voice recognition mechanism is configured to recognize the voice by referring to a voice profile.

4. A voice document transmitting method for transmitting a voice document using the voice document transmitting device according to any one of claims 1 to 3.

5. A voice input step of inputting a voice document;

A voice recognition step of recognizing the voice input in the voice input step and converting it into a text file;

A transmitting step of mixing and transmitting the text file converted in the voice recognition step and a voice file including a voice document not converted to the text file in the voice recognition step;

Voice document transmission method.

6. An example sentence identification step for converting the speech input in the speech input step into an identifier of an example sentence registered in advance;

The transmitting step further mixes in and transmits the example sentence identifier;

The voice document transmission method according to claim 5.

7. The voice document transmitting method according to claim 5, wherein the voice recognition step is configured to recognize the voice by referring to the voice profile.

8. Voice input processing for inputting a document by voice;

Voice recognition processing for recognizing the voice input in the voice input processing and converting it into a text file;

Transmission processing for mixing and transmitting a text file converted by the voice recognition processing and a voice file including a voice document that has not been converted into a text file by the voice recognition processing;

Providing a program for controlling the transmitting device to the transmitting device, to constitute a voice document transmitting device;

A method for manufacturing a voice document transmitting device.

9. Voice input processing for inputting a document by voice;

A recording medium that stores a program for controlling the voice document transmission device and is readable by the voice document transmission device.

10. A receiving mechanism for receiving a signal containing a text file;

A decoding mechanism for decoding the signal received by the receiving mechanism;

A voice conversion mechanism for converting the text file decoded by the decoding mechanism into voice;

Voice document receiving device.

11. A voice document receiving method for receiving a voice document using the voice document receiving device according to claim 10.

12. A receiving step of receiving a signal containing a text file;

A decoding step of decoding the signal received in the receiving step;

A voice conversion step of converting the text file decoded in the decoding step into voice;

Voice document receiving method.

13. Reception processing for receiving a signal containing a text file;

Decoding processing for decoding the signal received in the reception processing;

Voice conversion processing for converting a text file included in the signal decoded in the decoding processing into voice;

Providing a program for controlling the receiving device to the receiving device, to constitute a voice document receiving device;

A method for manufacturing a voice document receiving device.

14. Reception processing for receiving a signal containing a text file;

Voice conversion processing for converting a text file included in the signal decoded by the decoding processing into voice;

A recording medium that stores a program for controlling the voice document receiving device and is readable by the voice document receiving device.

15. A receiving mechanism for receiving a signal containing a text file and a voice file;

A voice recognition mechanism for converting the voice file in the signal received by the receiving mechanism into a text file;

A text file transmission mechanism for transmitting together the text file received by the receiving mechanism and the text file converted by the voice recognition mechanism;

Voice document relay device.

16. A voice document relay method for relaying a voice document using the voice document relay device according to claim 15.

17. A receiving step of receiving a signal containing a text file and an audio file;

A text conversion step of converting the audio file in the signal received in the receiving step into a text file;

A text file transmitting step of transmitting together the text file received in the receiving step and the text file converted in the text conversion step;

Voice document relay method.