WO2007015319A1 - Voice output apparatus, voice communication apparatus and voice output method - Google Patents


Info

Publication number
WO2007015319A1
WO2007015319A1 (PCT/JP2006/304390)
Authority
WO
WIPO (PCT)
Prior art keywords
output
voice
speech
phoneme
control means
Prior art date
Application number
PCT/JP2006/304390
Other languages
French (fr)
Japanese (ja)
Inventor
Kouji Hatano
Original Assignee
Matsushita Electric Industrial Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co., Ltd. filed Critical Matsushita Electric Industrial Co., Ltd.
Priority to JP2007503136A priority Critical patent/JPWO2007015319A1/en
Publication of WO2007015319A1 publication Critical patent/WO2007015319A1/en


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013 Adapting to target pitch
    • G10L2021/0135 Voice conversion or morphing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/66 Substation equipment with means for preventing unauthorised or fraudulent calling
    • H04M1/663 Preventing unauthorised calls to a telephone set

Definitions

  • Voice output apparatus, voice communication apparatus, and voice output method
  • The present invention relates to an apparatus that processes input voice and outputs the result, and more particularly to a voice output apparatus that can be used as a means of preventing unjust or malicious calls, such as prank telephone calls, in telephone communications.
  • Patent Document 1 Japanese Patent Laid-Open No. 2000-78246 (Page 6, Figure 1)
  • The present invention solves the above-described conventional problems by making the other party believe that communication with the user is impossible, without letting the other party learn the user's nationality.
  • An object of the present invention is to provide a voice communication apparatus that can deter further malicious calls.
  • The voice output apparatus of the present invention includes voice analysis means for extracting phoneme information from input voice, and voice output control means for instructing voice output based on the phoneme information.
  • Voice output means that outputs voice based on instructions from the voice output control means is used to output voice composed of random phonemes based on the phoneme information of the input voice.
  • The phoneme information includes information identifying the phonemes or syllables contained in the input voice, and the voice output control means replaces each phoneme or syllable with another according to a predetermined rule before output.
  • The phoneme information also includes information indicating whether or not the input voice is voiced,
  • and the voice output control means instructs the voice output based on this voiced/silent information.
  • The phoneme information further includes information indicating the fundamental frequency of the input voice,
  • and the voice output control means determines the fundamental frequency of the output voice based on this fundamental frequency information.
  • As a result, the change in the fundamental frequency of the output voice fluctuates with statistical properties similar to those of the change in the fundamental frequency of the input voice, which increases the naturalness of the output voice. It is therefore possible to make the other party believe that communication with the user is impossible, without giving them room to suspect that the output voice is a canned message or synthesized speech.
  • The present invention also constitutes a voice communication apparatus that further comprises communication means for executing communication processing and outputting the output voice to the communication destination.
  • The voice output method of the present invention includes a first step of extracting phoneme information from input voice, a second step of instructing output of output voice composed of random phonemes based on the phoneme information, and a third step of outputting the output voice based on the instruction.
  • FIG. 1 is a schematic configuration diagram of an audio output device according to Embodiment 1 of the present invention.
  • FIG. 2 is a configuration diagram of a mobile phone terminal according to Embodiment 1 of the present invention.
  • FIG. 3 is an external view of a mobile phone terminal according to Embodiment 1 of the present invention.
  • FIG. 4 is a diagram for explaining the contents of the phoneme replacement candidate table held by the voice output control means of the mobile phone terminal according to the first embodiment of the present invention.
  • FIG. 5 is a flowchart showing a processing procedure of voice output control means of the mobile phone terminal in the first embodiment of the present invention.
  • FIG. 6 is a diagram for explaining the operation of the mobile phone terminal according to the first embodiment of the present invention.
  • FIG. 7 is a configuration diagram of a mobile phone terminal according to Embodiment 2 of the present invention.
  • FIG. 8 is a diagram for explaining the contents of the syllable string table held by the fixed syllable holding means of the mobile phone terminal in the second embodiment of the present invention.
  • FIG. 9 is a first flowchart showing the processing procedure of the voice output control means of the mobile phone terminal in the second embodiment of the present invention.
  • FIG. 10 is a second flowchart showing the processing procedure of the voice output control means of the mobile phone terminal in the second embodiment of the present invention.
  • FIG. 11 is a third flowchart showing the processing procedure of the voice output control means of the mobile phone terminal in the second embodiment of the present invention.
  • FIG. 12 is a diagram illustrating a first operation of the mobile phone terminal according to the second embodiment of the present invention.
  • FIG. 13 is a diagram illustrating a second operation of the mobile phone terminal according to the second embodiment of the present invention.
  • FIG. 1 is a schematic configuration diagram of an audio output device 1 according to Embodiment 1 of the present invention.
  • The voice output apparatus 1 includes voice analysis means 2, voice output control means 3, and voice output means 4.
  • The voice analysis means 2 receives the input voice 201, executes voice analysis processing, extracts the phoneme information of the input voice 201, and outputs it as the phoneme information 202.
  • The "phoneme information" described in this specification refers to phonological information of speech obtained as a result of voice analysis. For example, a symbol or symbol string identifying the phonemes or syllables that make up the speech, information on the strength of the speech, information on the fundamental frequency, and information distinguishing voiced from silent sections all correspond to the "phoneme information" described in this specification.
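For concreteness, the kinds of "phoneme information" listed above could be carried in a record such as the following sketch. This is an illustration only; the field names and the example values are assumptions, not part of the patent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PhonemeInfo:
    """One analysis result, in the spirit of the phoneme information 202."""
    voiced: bool            # the "ON"/"OFF" voiced-vs-silent flag
    phoneme: Optional[str]  # phoneme symbol such as "d" or "a"; None if silent
    f0: float               # fundamental frequency in Hz (0 if unavailable)

# Hypothetical example: a voiced /d/ at 120 Hz.
info = PhonemeInfo(voiced=True, phoneme="d", f0=120.0)
```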
  • The voice output control means 3 receives the phoneme information 202 and outputs a voice output instruction 302 instructing output of the output voice 402.
  • The voice output means 4 outputs the output voice 402 based on the voice output instruction 302.
  • The voice output control means 3 outputs the voice output instruction 302 based on the phoneme information 202 so that the output voice 402 is composed of random phonemes.
  • The voice analysis processing in the voice analysis means 2 may be performed using a known voice analysis device or voice analysis method, and its detailed description is therefore omitted in this specification.
  • Likewise, the voice output processing in the voice output means 4 may be performed using a known voice synthesis device or voice synthesis method, and its detailed description is omitted in this specification.
  • FIG. 2 is a configuration diagram when the audio output device 1 according to Embodiment 1 of the present invention is configured as the mobile phone terminal 100.
  • The mobile phone terminal 100 includes, in addition to the components described in FIG. 1, wireless communication means 5, a microphone 61, a speaker 62, signal switching means 7, a signal switching button 91, an on-hook button 92, and an off-hook button 93.
  • The voice analysis means 2 receives the input voice 201, executes voice analysis processing, calculates the short-term average power of the input voice 201, and determines whether the received input voice 201 is in a voiced or silent section. If voiced, the signal "ON" is output as part of the phoneme information 202; if silent, the signal "OFF" is output. Further, the voice analysis means 2 identifies phonemes by performing cepstrum analysis or the like on the input voice 201, and outputs phoneme symbols (for example, /p/, /a/) as part of the phoneme information 202. The voice analysis means 2 also outputs, as part of the phoneme information 202, fundamental frequency information representing the fundamental frequency of the input voice 201 every 30 milliseconds.
  • The voice output control means 3 determines the phonemes to be included in the output voice 402 and the fundamental frequency of the output voice 402 based on the phoneme symbols and the fundamental frequency information included in the phoneme information 202, and outputs them as the voice output instruction 302.
  • The voice output means 4 synthesizes and outputs the output voice 402 based on the phonemes and the fundamental frequency indicated by the voice output instruction 302. The voice output means 4 also performs articulatory coupling at phoneme transitions, interpolation of fundamental-frequency changes, and the like, so that the output voice 402 does not sound unnatural.
  • The wireless communication means 5 is the part that performs communication processing by connecting the mobile phone terminal 100 to a wireless public network (not shown). The wireless communication means 5 outputs a transmission signal 501 to the wireless public network, and outputs the reception signal 502 acquired from the wireless public network to the speaker 62.
  • The wireless communication means 5 starts outputting the transmission signal 501 to, and acquiring the reception signal 502 from, the wireless public network based on operation of the off-hook button 93. It terminates the connection with the wireless public network based on operation of the on-hook button 92.
  • The microphone 61 converts the user's voice into an electrical signal and outputs it as the microphone output signal 612.
  • The speaker 62 converts the reception signal 502 into air vibration and emits sound.
  • The signal switching means 7 is means for switching between two signals. By operating the signal switching button 91, either the output voice 402 or the microphone output signal 612 can be output as the transmission signal 501. That is, the user can select, by operating the signal switching button 91, whether the processed output voice 402 or the unprocessed microphone output signal 612 is output to the wireless public network and heard by the other party.
  • The signal switching means 7 outputs the output voice 402 as the transmission signal 501 in the initial state of the communication processing by the wireless communication means 5. This prevents the inconvenience that the other party, before the user realizes the call is malicious, hears the user's unguarded voice and thereby learns the user's personal information.
  • FIG. 3 is a diagram showing an appearance of the mobile phone terminal 100 in the embodiment of the present invention.
  • The signal switching button 91 is arranged on the side of the lower part of the casing so that the user can operate it without looking at his or her hand during a call.
  • FIG. 4 is a diagram showing the contents of the phoneme replacement candidate table T31 held by the audio output control means 3 of the mobile phone terminal 100 according to Embodiment 1 of the present invention.
  • The phoneme replacement candidate table T31 is a table showing candidates for other phonemes with which a phoneme included in the phoneme information 202 can be replaced.
  • Each record R311 to R318 of the phoneme replacement candidate table T31 consists of a first field F311 representing a phoneme p constituting the phoneme information 202, and a second field F312 representing candidates for another phoneme p' that replaces the phoneme p.
  • The voice output control means 3 generates the voice output instruction 302 by replacing the phoneme p constituting the phoneme information 202 with a phoneme p' according to a predetermined rule (hereinafter referred to as the phoneme replacement rule) based on the phoneme replacement candidate table T31. That is, the voice output control means 3 searches the records R311 to R318 of the phoneme replacement candidate table T31 for the record whose first field (F311) contains the phoneme p, selects a phoneme p' from the candidates indicated in the second field (F312) of that record, replaces the phoneme p with the phoneme p', and generates the voice output instruction 302.
  • FIG. 5 is a flowchart showing a processing procedure of audio output control means 3 of mobile phone terminal 100 according to Embodiment 1 of the present invention.
  • The voice output control means 3 first acquires the phoneme information 202 (step S101) and, based on whether the signal included in the phoneme information 202 is "ON" or "OFF", determines whether the input voice 201 is in a voiced section (step S102). If it is a voiced section (YES), the voice output control means 3 proceeds to step S103; if it is a silent section (NO), it returns to step S101.
  • In step S103, the voice output control means 3 determines whether the phoneme included in the phoneme information 202 acquired this time is the same as the phoneme included in the phoneme information 202 acquired last time. If it is the same (NO), proceed to step S105; if it is different (YES), proceed to step S104.
  • In step S104, the voice output control means 3 replaces the phoneme p constituting the phoneme information 202 with a new phoneme p' in accordance with the phoneme replacement rule, and adds it to the voice output instruction 302.
  • In step S105, the voice output control means 3 replaces the phoneme p constituting the phoneme information 202 with the phoneme p' obtained in the most recent execution of step S104, and adds it to the voice output instruction 302.
  • The determination in step S103 and the processing in step S105 serve to convert a section in which the same phoneme continues in the input voice 201 into a section in which another, likewise constant, phoneme continues.
  • In step S106, the voice output control means 3 calculates the fundamental frequency F0' from the fundamental frequency F0 indicated by the fundamental frequency information included in the phoneme information 202, according to the frequency conversion formula, and adds it to the voice output instruction 302.
  • The frequency conversion formula is as follows:
  • F0' = F0 × r × (random(0.4) + 0.8)
  • The coefficient r may be made changeable by user operation. Here, random(0.4) represents a nonnegative random number less than 0.4.
  • This disturbs the intonation of the output voice 402, preventing the other party from recognizing, from the intonation of the output voice 402, what language the input voice 201 is in.
  • In step S107, the voice output control means 3 outputs the voice output instruction 302 generated by the processing up to step S106 to the voice output means 4, and returns to step S101.
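A minimal sketch of steps S101 to S107 follows. Only the /d/ → {/k/, /g/} replacement entry appears later in this description; the other table entries, the frame representation, and the default value of r are invented for illustration.

```python
import random

# Phoneme replacement candidates, in the spirit of table T31 (FIG. 4).
# Only the "d" -> ["k", "g"] entry is taken from the description;
# the others are illustrative placeholders.
REPLACEMENTS = {"d": ["k", "g"], "a": ["i", "o"], "n": ["m", "r"]}

def convert(frames, r=0.5):
    """frames: list of (voiced, phoneme, f0) tuples from the analysis step.
    Returns (phoneme', f0') pairs following the flowchart of FIG. 5."""
    out, prev, prev_sub = [], None, None
    for voiced, p, f0 in frames:
        if not voiced:                 # S102: skip silent sections
            continue
        if p == prev:                  # S103/S105: same phoneme continues,
            p_sub = prev_sub           # reuse the previous substitute
        else:                          # S104: replace by the phoneme rule
            p_sub = random.choice(REPLACEMENTS.get(p, [p]))
        # S106: frequency conversion formula F0' = F0 * r * (random(0.4) + 0.8)
        f0_out = f0 * r * (random.uniform(0.0, 0.4) + 0.8)
        out.append((p_sub, f0_out))    # S107: pass on as the output instruction
        prev, prev_sub = p, p_sub
    return out
```

Note how a run of identical input phonemes maps to a run of one identical substitute, which is the point of the S103/S105 branch.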
  • FIG. 6 is a diagram showing the contents of the input voice 201 and the output voice 402 when the user of the mobile phone terminal 100 according to Embodiment 1 of the present invention utters the words "Who is this?" during a call.
  • the horizontal axis represents time and the vertical axis represents the fundamental frequency.
  • A line group L101 represents the change in the fundamental frequency of the input voice 201,
  • and a line group L102 represents the change in the fundamental frequency of the output voice 402.
  • The phoneme symbols written immediately above the line groups L101 and L102 indicate, respectively, the phonemes (D101 to D106) included in the phoneme information 202 and the phonemes (D111 to D116) indicated by the voice output instruction 302.
  • The operation of the mobile phone terminal 100 in the first embodiment of the present invention will be described below with reference to FIG. 6.
  • At time t101, the voice analysis means 2 analyzes the input voice 201 and extracts the phoneme information 202.
  • The phoneme information 202 includes information on the phoneme /d/ (D101) and the fundamental frequency F0, and the signal "ON" indicating a voiced section.
  • The voice output control means 3 executes processing according to the procedure shown in the flowchart of FIG. 5.
  • In step S101, the voice output control means 3 acquires the phoneme information 202.
  • In step S102, since the input voice 201 is in a voiced section ("ON") (YES), the voice output control means 3 proceeds to step S103.
  • In step S103, since the phoneme /d/ has been newly received, the determination is "YES", and the flow proceeds to step S104.
  • In step S104, the voice output control means 3 refers to the phoneme replacement candidate table T31 of FIG. 4 and selects a phoneme p' to replace the phoneme /d/ according to the phoneme replacement rule.
  • The voice output control means 3 takes as the phoneme p' a phoneme selected at random from /k/ and /g/. Here, /k/ is selected as the phoneme p'.
  • In step S106 of FIG. 5, the voice output control means 3 calculates the fundamental frequency F0' according to the frequency conversion formula, and in step S107 outputs the phoneme /k/ and the fundamental frequency F0' as the voice output instruction 302.
  • The voice output means 4 synthesizes and outputs the output voice 402 based on the voice output instruction 302.
  • The output voice 402 is voice consisting of the phoneme /k/ (D111), and its fundamental frequency is about half the fundamental frequency of the input voice 201.
  • Similarly, output voice 402 consisting of the phoneme /i/ (D112) is output from the voice output means 4.
  • The output voice 402 is sent to the wireless communication means 5 as the transmission signal 501 through the signal switching means 7, and is further output to the other party's terminal through the wireless public network.
  • In some sections, the voice analysis means 2 cannot extract the fundamental frequency and outputs 0 as the value of the fundamental frequency F0 in the phoneme information 202.
  • In such cases, the voice output control means 3 outputs 0 as the value of the fundamental frequency F0' in step S106, but the voice output means 4 interpolates between the fundamental frequency of the previously accepted phoneme /i/ (D112) and that of the next accepted phoneme /a/ (D114),
  • so that the fundamental frequency of the output voice 402 corresponding to the phoneme /r/ (D113) changes smoothly.
  • The fundamental frequency of the phoneme /i/ (D115) of the output voice 402 corresponding to the phoneme /a/ (D105) of the input voice 201 is, owing to the random-number term in the frequency conversion formula of step S106 in FIG. 5, slightly higher than the fundamental frequency of the phoneme /n/.
  • In this way, the Japanese input voice 201 meaning "Who is this?" is converted into the voice "Kirani johi?", which makes no sense as Japanese, and is output via the wireless public network to the other party's terminal.
  • As the first step, the voice analysis means 2 extracts the phoneme information from the input voice 201 and outputs the phoneme information 202.
  • As the second step, the voice output control means 3 outputs the voice output instruction 302 based on the phoneme information 202 by executing the procedure shown in the flowchart of FIG. 5, and as the third step, the voice output means 4 synthesizes the output voice 402 based on the voice output instruction 302 and outputs it.
  • In this way, random voice in which the phonemes of the input voice are replaced with other phonemes can be output to the other party's terminal.
  • As described above, the mobile phone terminal according to Embodiment 1 of the present invention outputs to the other party's terminal voice in which the phonemes of the input voice are replaced with other phonemes. Since this voice makes no sense to the other party, the other party can be made to believe that communication with the user is impossible, without learning the user's nationality.
  • Furthermore, in the mobile phone terminal according to Embodiment 1 of the present invention, the change in the fundamental frequency of the output voice fluctuates with statistical properties similar to those of the change in the fundamental frequency of the input voice. Since this increases the naturalness of the output voice, the other party can be made to believe that communication with the user is impossible, without being given room to suspect that the output voice is a canned message or synthesized speech.
  • The second embodiment is characterized in that the reception signal, which is the output of the wireless communication means 5, is used as the input voice.
  • FIG. 7 is a configuration diagram of the mobile phone terminal 200 according to Embodiment 2 of the present invention.
  • The mobile phone terminal 200 includes voice analysis means 2, voice output control means 3, voice output means 4, fixed syllable holding means 41, wireless communication means 5, a microphone 61, a speaker 62, signal switching means 7, a signal switching button 91, an on-hook button 92, and an off-hook button 93.
  • The fixed syllable holding means 41 is means for holding fixed syllable strings used to constitute the output voice 402.
  • The voice analysis means 2 accepts the reception signal 502, which is the output of the wireless communication means 5, as the input voice 201, executes voice analysis processing, calculates the short-term average power of the input voice 201, determines whether the received input voice 201 is in a voiced or silent section, and outputs the signal "ON" as the phoneme information 202 when voiced and the signal "OFF" when silent.
  • The voice output control means 3 generates, based on the phoneme information 202, a syllable string ID identifying the syllable string to be output as the output voice 402, and outputs it as the voice output instruction 302.
  • The voice output control means 3 also outputs an on-hook signal 304, which instructs the wireless communication means 5 to terminate the connection with the wireless public network.
  • The voice output means 4 synthesizes and outputs the output voice 402 based on the syllable string data acquired from the fixed syllable holding means 41 according to the syllable string ID included in the voice output instruction 302.
  • The fixed syllable holding means 41 holds a syllable string table, which is a table for obtaining syllable string data from a syllable string ID.
  • FIG. 8 is a diagram showing the contents of the syllable string table T41 held by the fixed syllable holding means 41 of the mobile phone terminal 200 according to Embodiment 2 of the present invention.
  • Each record R411 to R414 of the syllable string table T41 includes a first field F411 representing a syllable string ID and a second field F412 representing syllable string data corresponding to the syllable string ID.
  • The syllable string data is expressed as a sequence of pairs, each consisting of a syllable to be output as the output voice 402 (letters enclosed in square brackets [ ]) and a coefficient for the fundamental frequency of that syllable (a numerical value enclosed in parentheses ( )).
  • Note that the syllable string data is not limited to data that generates speech meaningless to the other party.
  • FIGS. 9 to 11 are flowcharts showing the processing procedure of audio output control means 3 of mobile phone terminal 200 according to Embodiment 2 of the present invention.
  • In step S211, the voice output control means 3 first acquires the phoneme information 202.
  • In step S212, the voice output control means 3 determines, based on whether the signal included in the phoneme information 202 is "ON" or "OFF", whether the input voice 201 is in a voiced section. If it is a voiced section (YES), the voice output control means 3 proceeds to step S221 in FIG. 10; if it is a silent section (NO), it proceeds to step S213.
  • In step S213, the voice output control means 3 determines whether the voice output means 4 is outputting the output voice 402. If it is outputting (YES), the process returns to step S211; if not (NO), it proceeds to step S214.
  • In step S214, the voice output control means 3 determines whether the silent section of the input voice 201 is the first silent section after the start of the call, that is, whether it is immediately after the start of the call. If so (YES), proceed to step S231 in FIG. 11; otherwise (NO), proceed to step S215.
  • In step S215, the voice output control means 3 generates a pseudorandom number d that is greater than or equal to 0 and less than 1, and branches depending on its value. If d is less than 0.2, proceed to step S216; if d is 0.2 or more and less than 0.9, proceed to step S217; if d is 0.9 or more, proceed to step S219.
  • In step S218, the voice output control means 3 outputs the syllable string ID selected in step S216 or S217 as the voice output instruction 302, and returns to step S211.
  • In step S219, the voice output control means 3 outputs the on-hook signal 304, whereby the wireless communication means 5 terminates the connection with the wireless public network.
  • Since the voice output control means 3 thus automatically instructs disconnection of the call after a while, the user need not operate the on-hook button 92 while listening to the voice of a malicious caller, which improves convenience.
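The branch on the pseudorandom number d in steps S215 to S219 can be sketched as follows. Which syllable strings steps S216 and S217 actually select is not specified in the description, so the IDs 2 and 3 below are hypothetical placeholders.

```python
import random

def silent_section_action(d=None):
    """Decide the action for a silent section (steps S215 to S219).
    Returns a syllable-string ID to speak, or "HANG_UP".
    IDs 2 and 3 are hypothetical placeholders."""
    if d is None:
        d = random.random()   # pseudorandom number, 0 <= d < 1
    if d < 0.2:
        return 2              # S216: select one kind of filler utterance
    elif d < 0.9:
        return 3              # S217: select another kind of filler utterance
    else:
        return "HANG_UP"      # S219: output the on-hook signal 304
```

Because the hang-up branch fires with probability 0.1 per silent section, the call is disconnected automatically after a short while, matching the behavior described above.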
  • Steps S221 to S229 in FIG. 10 are the processing procedure of the voice output control means 3 when the other party is speaking.
  • In this procedure, the voice output control means 3 generates a pseudorandom number d that is greater than or equal to 0 and less than 1, and branches depending on its value. If d is less than 0.998, the process proceeds to step S222; if d is 0.998 or more, it proceeds to step S229.
  • Steps S222 to S223 are processes for making the other party think that the user stopped speaking because the other party started speaking while the user was speaking. Through the processing of steps S222 to S223, the other party can be made to believe that the user falls silent and listens while the other party is speaking.
  • In step S229, the voice output control means 3 outputs the on-hook signal 304, whereupon the wireless communication means 5 terminates the connection with the wireless public network.
  • Steps S231 to S239 in FIG. 11 are the processing procedure of the voice output control means 3 when it is determined in step S214 of FIG. 9 that the silent section of the input voice 201 is the first silent section after the start of the call, that is, immediately after the start of the call.
  • In step S231, the voice output control means 3 determines whether the syllable string ID has already been output twice. If not (NO), the process proceeds to step S232; if it has (YES), the process proceeds to step S239 and the on-hook signal 304 is output.
  • In step S232, the voice output control means 3 selects syllable string ID 1, that is, a syllable string, such as a greeting, to be output immediately after the start of a call.
  • Steps S231 to S239 are processes for making the other party feel as though the user greeted them immediately after the start of the call and, receiving no response after greeting twice, hung up.
  • When the voice output control means 3 outputs the voice output instruction 302 in steps S218, S223, and S233, the voice output means 4 receives it and outputs the output voice 402 according to the following procedure.
  • The voice output means 4 obtains the syllable string data corresponding to the syllable string ID included in the voice output instruction 302 by searching the syllable string table T41 (FIG. 8).
  • The voice output means 4 then synthesizes and outputs the output voice 402 based on the syllable string data.
  • The voice output means 4 calculates the fundamental frequency F0' of the output voice 402 for each syllable according to the frequency calculation formula, based on the fundamental frequency coefficient α contained in the syllable string data.
  • The frequency calculation formula is as follows:
  • F0' = F0base × α × (random(0.4) + 0.8)
  • F0base is the initial value of the fundamental frequency of the output voice 402.
  • The value of F0base may be made changeable by user operation.
  • random(0.4) represents a nonnegative random number less than 0.4.
  • The voice output means 4 also interpolates the fundamental frequencies of adjacent syllables at syllable boundaries so that the output voice 402 does not sound unnatural.
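The per-syllable frequency calculation can be sketched as follows. Only the record "[ho](1.0), [la](1.2)" appears in this description; the value of F0base and the table layout are assumptions for illustration.

```python
import random

# Syllable string table in the spirit of T41 (FIG. 8). Only record 1,
# "[ho](1.0), [la](1.2)", appears in the description.
SYLLABLE_TABLE = {1: [("ho", 1.0), ("la", 1.2)]}

F0BASE = 200.0  # initial fundamental frequency of the output voice (assumed value, Hz)

def syllable_f0s(syllable_id, f0base=F0BASE):
    """Per-syllable fundamental frequencies from the frequency calculation
    formula F0' = F0base * alpha * (random(0.4) + 0.8)."""
    return [(syl, f0base * alpha * (random.uniform(0.0, 0.4) + 0.8))
            for syl, alpha in SYLLABLE_TABLE[syllable_id]]
```

Because the random term spans 0.8 to 1.2, a syllable with the smaller coefficient can still come out higher on a given call, which is exactly the fluctuation effect noted for FIG. 12.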
  • The specific operation of Embodiment 2 of the present invention will be described below.
  • In Embodiment 2, synthesized voice based on the result of analyzing the other party's voice as the input voice is output to the wireless public network as the output voice and conveyed to the other party.
  • First, the case where the other party makes a silent call will be described.
  • FIG. 12 is a diagram showing the contents of output sound 402 in mobile phone terminal 200 according to Embodiment 2 of the present invention.
  • the horizontal axis represents time and the vertical axis represents the fundamental frequency.
  • Syllable symbols D211 to D214 represent syllables included in the output speech 402.
  • The voice analysis means 2 analyzes the input voice 201 and extracts the phoneme information 202.
  • the extracted phoneme information 202 includes a signal “OFF” indicating silence.
  • In step S211, the voice output control means 3 acquires the phoneme information 202.
  • In step S212, since the phoneme information indicates a silent section ("OFF") (NO), the process proceeds to step S213.
  • In step S213, since the voice output control means 3 is not outputting the output voice 402 (NO), the process proceeds to step S214.
  • In step S214, since this is the first silent section after the start of the call (YES), the process proceeds to step S231 in FIG. 11.
  • In step S231, since the voice output control means 3 has not yet output the output voice 402 even once (NO), the process proceeds to step S232.
  • In step S232, the voice output control means 3 sets the syllable string ID to 1, and outputs the voice output instruction 302 in step S233.
  • The voice output means 4 receives the voice output instruction 302, refers to the syllable string table T41, obtains the syllable string data “[ho](1.0), [la](1.2)” (the second field F412 of record R411 in FIG. 8), and synthesizes the output voice 402 (D211 and D212 in FIG. 12).
  • The coefficient α in the syllable string data is set to 1.0 for the syllable [ho] and 1.2 for [la], so that the fundamental frequency of the syllable [la] is set higher.
  • In FIG. 12, however, the fundamental frequency of the output voice 402 is higher for the syllable [ho] (D211) than for the syllable [la] (D212). This is because the random-number term in the frequency calculation formula gives fluctuation to the fundamental frequency of the output voice 402.
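The inversion seen in FIG. 12 can be checked numerically: even though the coefficient favors [la] (1.2 vs. 1.0), the random term makes [ho] come out higher in a fraction of utterances. The following sketch assumes F0base = 120 Hz and a standard pseudo-random generator, both assumptions for illustration only.

```python
import random

def realized_f0(alpha, f0_base=120.0):
    # F0' = F0base * alpha * (random(0.4) + 0.8)
    return f0_base * alpha * (random.uniform(0.0, 0.4) + 0.8)

random.seed(0)
inversions = sum(
    realized_f0(1.0) > realized_f0(1.2)  # [ho] realized higher than [la]
    for _ in range(1000)
)
# With uniform fluctuation, about 15% of trials invert the intended order.
```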
  • In step S239, the on-hook signal 304 is output, so that the wireless communication means 5 terminates the connection with the wireless public network at time t202 in FIG. 12.
  • As described above, when the other party makes a silent call, the mobile phone terminal according to Embodiment 2 of the present invention outputs, a specified number of times, a voice that sounds to the other party like a fixed greeting phrase, and then automatically terminates the connection with the public network.
  • FIG. 13 is a diagram showing the contents of the input voice 201 and the output voice 402 in the mobile phone terminal 200 of the present embodiment.
  • the horizontal axis represents time.
  • FIG. 13 (a), FIG. 13 (b), and FIG. 13 (c) show the continuous operation divided into three for convenience.
  • Texts D301 to D305 represent the contents of the input voice 201, that is, the voice of the other party.
  • texts D311 to D315 represent the contents of the output voice 402.
  • information on the fundamental frequencies of the input sound 201 and the output sound 402 is omitted.
  • The operation of the mobile phone terminal 200 according to the second embodiment will be described with reference to FIG. 13.
  • the voice analysis means 2 analyzes the input voice 201 and extracts phonological information 202.
  • the extracted phoneme information 202 includes a signal “ON” indicating sound.
  • In step S211, the voice output control means 3 acquires the phoneme information 202.
  • In step S212, since the phoneme information indicates a voiced section ("ON") (YES), the process proceeds to step S221 in FIG. 10.
  • In step S221, the voice output control means 3 generates a pseudo-random number d.
  • d 0.2
  • d 0.999
  • In step S211, the voice output control means 3 acquires the phoneme information 202.
  • step S213 the phoneme information 202
  • In step S214, since this is not the first silent section (NO), the process proceeds to step S215.
  • In step S217, the voice output control means 3 randomly selects a syllable string ID.
  • Here, the ID 3 is selected.
  • The voice output means 4 receives the voice output instruction 302, refers to the syllable string table T41, obtains the syllable string data “[ki](1.0), [ru](0.9), [mi](0.9), [ji](1.2), [hi](1.1), [go](1.0), [che](1.3), [si](1.5)” (the second field F412 of record R413 in FIG. 8), and outputs the voice D312 (FIG. 13) as the output voice 402.
  • the voice output control means 3 proceeds to the process of FIG. 10 based on the determination in step S212 (YES).
  • the audio output control means 3 executes the processing of steps S222 and S223.
  • the audio output means 4 interrupts the output of the output audio 402 at time t303 in FIG.
  • The voice output control means 3 outputs the on-hook signal 304 in step S229, so that the wireless communication means 5 terminates the connection with the wireless public network at time t304.
  • As described above, the mobile phone terminal according to Embodiment 2 of the present invention outputs a voice that makes no sense to the other party, interjected while the other party is speaking.
  • After thus pretending that the user is talking, it automatically disconnects the call.
  • Since the mobile phone terminal according to Embodiment 2 of the present invention outputs randomly selected syllable strings to the call partner's terminal, the voice the call partner hears makes no sense. As a result, the call partner can be made to think that communication with the user is impossible, without learning the user's nationality.
  • Since the mobile phone terminal according to Embodiment 2 of the present invention can change its voice output depending on whether the other party's voice is present, it can output voice matched to the other party's utterance situation. It therefore gives the other party no room to suspect that the output voice is a standard message or synthesized voice, and reliably makes the other party think that communication with the user is impossible.
  • Since the mobile phone terminal according to Embodiment 2 of the present invention occasionally outputs a fixed-phrase voice, the naturalness of the output voice increases. It therefore gives the other party no room to suspect that the output voice is a standard message or synthesized voice, and can make the other party think that communication with the user is impossible.
  • Since the mobile phone terminal according to Embodiment 2 of the present invention stops its voice output when the other party starts speaking during output, the call partner can be made to think that the user is actually listening to the other party and responding. The call partner can thus be reliably made to think that communication with the user is impossible.
  • Since the mobile phone terminal according to Embodiment 2 of the present invention automatically terminates communication during the call, the caller is made to feel that continuing the call is useless and that an on-hook operation has been performed, and can thus be made to believe that communication with the user is impossible.
  • In the present embodiment, syllable string data prepared in advance is selected, but any means may be used as long as it can output a voice that is meaningless to the other party.
  • a random syllable string or phoneme string may be generated each time a voice is output.
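For instance, a generator of the kind suggested here, producing a fresh meaningless syllable string on every output instead of selecting from a stored table, might look like the following sketch. The syllable inventory and the length range are illustrative assumptions.

```python
import random

# Hypothetical consonant-vowel inventory; any set yielding pronounceable
# but meaningless syllables would serve the same purpose.
CONSONANTS = ["k", "s", "t", "n", "h", "m", "r", "g", "j", "ch"]
VOWELS = ["a", "i", "u", "e", "o"]

def random_syllable_string(n_min=4, n_max=8):
    """Generate a random syllable string each time a voice is output."""
    n = random.randint(n_min, n_max)
    return [random.choice(CONSONANTS) + random.choice(VOWELS) for _ in range(n)]
```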
  • In the present embodiment, the condition for determining whether to output the output voice or to disconnect the communication (the range of the value of d in the branch determinations of steps S215 and S221) is constant, but the condition may be varied, for example by increasing the probability of disconnecting the communication with each incoming call.
  • In the present embodiment, the terminal incorporates the voice output device of the present invention.
  • Alternatively, the voice output device of the present invention may be built into an external device connected to the terminal, such as an exchange, a repeater, or a server, and the voice of the user or the other party may be processed there to output the voice.
  • Although the embodiments have been described for a mobile phone terminal, the present invention is not limited to this; the same effect is obtained with a landline phone, an IP phone, voice chat, an intercom, and the like.
  • the audio output device of the present invention can be used for an apparatus that outputs a sound having randomness with respect to a user's input voice.
  • For example, it can also be used as a voice output device constituting an electronic pet, a pet robot, a toy, a game machine, game software, or the like.
  • As described above, the voice output device and voice output method of the present invention can make the caller of a malicious call think that communication with the user is impossible, without letting the caller learn the user's nationality. They are therefore effective for a voice communication apparatus that can suppress further malicious calls by the caller and that allows calls with unspecified persons.


Abstract

A mobile telephone terminal that, when receiving a telephone solicitation or an ill-disposed telephone call that tries to steal the identity information of its user, causes the caller to think it impossible to communicate with the user, without letting the caller know the nationality of the user. A mobile telephone terminal (100) comprises a voice analyzing means (2) that analyzes a user input voice (201), acquired by a microphone (61), to output phonological information (202); a voice output control means (3) that randomly replaces phonemes or syllables included in the phonological information (202) and outputs them as voice output instructions (302); a voice output means (4) that outputs, based on the voice output instructions (302), an output voice (402); a signal switching means (7) that outputs the output voice (402) as a transmitted telephone signal (501); and a wireless communication means (5) that outputs the transmitted telephone signal (501) to a wireless public network. In this way, the voice of the user is converted to meaningless voice, which is then heard by the caller.

Description

Specification
Audio output device, audio communication device, and audio output method
Technical field
[0001] The present invention relates to an apparatus that processes input speech and outputs speech, and more particularly to a voice output apparatus that can be used as a means of preventing unjust or fraudulent calls, such as prank calls, in telephone communications.
Background art
[0002] In recent years, "transfer fraud", in which people are deceived over the telephone into transferring money to a designated account so that the caller gains illicit profit, has become a social problem. There is also no end to unscrupulous sales practices that use the telephone to push expensive services and products that users do not want. Besides such malicious calls made for fraud or solicitation, it is known that there are also malicious calls that dial random telephone numbers to make the user speak, so as to obtain personal information such as the user's nationality, gender, and age and lead to later fraud or solicitation.
[0003] Conventionally, devices for repelling malicious calls such as prank calls have been proposed. For example, a telephone device has been proposed that converts the pitch period of the voice uttered by the user before the other party hears it. According to this invention, even when a female user speaks, the other party hears a male voice, so the other party can be made to think that the user is male and give up the prank calls (see, for example, Patent Document 1).
[0004] There is also a known method of making a caller give up prank calls by using the automatic answering function of an answering machine to play the caller a standard message such as "I cannot answer the phone right now".
Patent Document 1: Japanese Patent Laid-Open No. 2000-78246 (page 6, FIG. 1)
Disclosure of the invention
Problems to be solved by the invention
[0005] However, although the above conventional apparatus prevents the user's gender from becoming known to the other party, the content of the user's utterances is conveyed to the other party unchanged, so the user's nationality can still be determined. The apparatus is therefore insufficient as a countermeasure against malicious calls that attempt to obtain the user's personal information. Moreover, a person intent on fraud or solicitation learns that communication with the user is possible, and thus continues or repeats the malicious calls.
[0006] In the case of the conventional method of playing a standard message to the other party, the other party can easily tell that the message is not spoken by the user personally, and therefore does not give up until the user answers the phone, repeating the malicious call many times.
[0007] The present invention solves the above conventional problems, and aims to provide a voice communication device that can deter further malicious calls by making the other party think that communication with the user is impossible, without letting the other party learn the user's nationality.
Means for solving the problem
[0008] To solve the above conventional problems, the voice output device of the present invention is configured to output a voice of random phonemes based on the phoneme information of the input voice, using voice analysis means for extracting phoneme information from the input voice, voice output control means for instructing voice output based on the phoneme information, and voice output means for outputting voice based on the instruction of the voice output control means.
[0009] With the above configuration, a voice that makes no sense to the listener can be output based on the input voice, so the other party can be made to think that communication with the user is impossible, without learning the user's nationality.
[0010] In the voice output device of the present invention, the phoneme information includes information identifying the phonemes or syllables contained in the input voice, and the voice output control means determines the phonemes or syllables constituting the output voice by replacing those phonemes or syllables according to a predetermined rule.
[0011] With the above configuration, a voice in which the phonemes or syllables of the input voice are replaced with other phonemes or syllables is output, so a voice that makes no sense to the listener can be produced. The other party can therefore be made to think that communication with the user is impossible, without learning the user's nationality.
[0012] In the voice output device of the present invention, the phoneme information further includes information indicating whether the input voice is voiced, and the voice output control means instructs voice output based on this information.
[0013] With the above configuration, the start and stop of voice output can be controlled according to whether the input voice is voiced, so voice can be output in step with the utterance timing of the user or the other party. This gives the other party no room to suspect that the output voice is a standard message or synthesized voice, and makes the other party think that communication with the user is impossible.
[0014] In the voice output device of the present invention, the phoneme information further includes information indicating the fundamental frequency of the input voice, and the voice output control means determines the fundamental frequency of the output voice based on this fundamental frequency information.
[0015] With the above configuration, the change in the fundamental frequency of the output voice acquires fluctuation with statistically similar properties to the change in the fundamental frequency of the input voice, which increases the naturalness of the output voice. This gives the other party no room to suspect that the output voice is a standard message or synthesized voice, and makes the other party think that communication with the user is impossible.
[0016] The present invention also constitutes a voice communication device further comprising communication means for executing communication processing and outputting the output voice to a communication destination.
[0017] With the above configuration, it is possible to provide a voice communication device that can deter further malicious calls by making the other party think that communication with the user is impossible, without letting the other party learn the user's nationality.
[0018] The voice output method of the present invention comprises a first step of extracting phoneme information from an input voice, a second step of instructing the output of an output voice composed of random phonemes based on the phoneme information, and a third step of outputting the output voice based on the instruction.
[0019] By the above method, a voice that makes no sense to the listener can be output based on the input voice, so the other party can be made to think that communication with the user is impossible, without learning the user's nationality.
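The three steps of the voice output method above can be sketched end to end as a toy pipeline. The analysis and synthesis stubs below are placeholders, and all names are assumptions for illustration, not the patent's implementation.

```python
import random

VOWEL_PHONEMES = ["a", "i", "u", "e", "o"]  # illustrative phoneme set

def extract_phoneme_info(input_speech):
    """Step 1: extract phoneme information (here, just a phoneme list)."""
    return list(input_speech)

def make_output_instruction(phoneme_info):
    """Step 2: instruct output composed of random phonemes."""
    return [random.choice(VOWEL_PHONEMES) for _ in phoneme_info]

def synthesize(instruction):
    """Step 3: output the voice (here, just join the phonemes)."""
    return "".join(instruction)
```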
Effects of the invention
[0020] According to the present invention, by creating random phoneme information based on the phoneme information of the input voice and outputting a voice from it, the other party who makes a malicious call can be made to think that communication with the user is impossible, without learning the user's nationality, so further malicious calls can be deterred.
Brief description of drawings
[0021]
[FIG. 1] Schematic configuration diagram of the voice output device according to Embodiment 1 of the present invention
[FIG. 2] Configuration diagram of the mobile phone terminal according to Embodiment 1 of the present invention
[FIG. 3] External view of the mobile phone terminal according to Embodiment 1 of the present invention
[FIG. 4] Diagram explaining the contents of the phoneme replacement candidate table held by the voice output control means of the mobile phone terminal according to Embodiment 1 of the present invention
[FIG. 5] Flowchart showing the processing procedure of the voice output control means of the mobile phone terminal according to Embodiment 1 of the present invention
[FIG. 6] Diagram explaining the operation of the mobile phone terminal according to Embodiment 1 of the present invention
[FIG. 7] Configuration diagram of the mobile phone terminal according to Embodiment 2 of the present invention
[FIG. 8] Diagram explaining the contents of the syllable string table held by the voice output means of the mobile phone terminal according to Embodiment 2 of the present invention
[FIG. 9] First flowchart showing the processing procedure of the voice output control means of the mobile phone terminal according to Embodiment 2 of the present invention
[FIG. 10] Second flowchart showing the processing procedure of the voice output control means of the mobile phone terminal according to Embodiment 2 of the present invention
[FIG. 11] Third flowchart showing the processing procedure of the voice output control means of the mobile phone terminal according to Embodiment 2 of the present invention
[FIG. 12] Diagram explaining the first operation of the mobile phone terminal according to Embodiment 2 of the present invention
[FIG. 13] Diagram explaining the second operation of the mobile phone terminal according to Embodiment 2 of the present invention
Explanation of reference numerals
[0022]
1 Voice output device
2 Voice analysis means
3 Voice output control means
4 Voice output means
5 Wireless communication means
7 Signal switching means
41 Fixed syllable holding means
61 Microphone
62 Speaker
91 Signal switching button
92 On-hook button
93 Off-hook button
100, 200 Mobile phone terminal
201 Input voice
202 Phoneme information
302 Voice output instruction
304 On-hook signal
402 Output voice
501 Transmission signal
502 Reception signal
612 Microphone output signal
Best mode for carrying out the invention
[0023] The best mode for carrying out the present invention will be described below with reference to the drawings. In all the drawings used to explain the embodiments, the same components are given the same reference numerals, and duplicate explanations are omitted.
[0024] (Embodiment 1)
FIG. 1 is a schematic configuration diagram of the voice output device 1 according to Embodiment 1 of the present invention. In FIG. 1, the voice output device 1 comprises voice analysis means 2, voice output control means 3, and voice output means 4.
[0025] The voice analysis means 2 receives the input voice 201, executes voice analysis processing, extracts the phonological information of the input voice 201, and outputs it as phoneme information 202. "Phoneme information" as used in this specification refers to the phonological information of a voice obtained as a result of voice analysis. For example, symbols or symbol strings identifying the phonemes or syllables constituting the voice, information indicating the strength or pitch of the voice, fundamental frequency information, and information distinguishing voiced from silent segments all correspond to the "phoneme information" described herein.
[0026] The voice output control means 3 receives the phoneme information 202 and outputs a voice output instruction 302 directing the output of the output voice 402. The voice output means 4 outputs the output voice 402 based on the voice output instruction 302. Here, the voice output control means 3 outputs the voice output instruction 302 based on the phoneme information 202 so that the output voice 402 is composed of random phonemes.
[0027] A known voice analysis device or voice analysis method may be used for the voice analysis processing in the voice analysis means 2, so a detailed description is omitted in this specification. Likewise, a known voice synthesis device or voice synthesis method may be used for the voice output processing in the voice output means 4, so a detailed description is omitted in this specification.
[0028] FIG. 2 is a configuration diagram of the voice output device 1 configured as the mobile phone terminal 100 according to Embodiment 1 of the present invention. In FIG. 2, the mobile phone terminal 100 comprises, in addition to the components described in FIG. 1, wireless communication means 5, a microphone 61, a speaker 62, signal switching means 7, a signal switching button 91, an on-hook button 92, and an off-hook button 93.
[0029] The voice analysis means 2 receives the input voice 201, executes voice analysis processing, calculates the short-time average power of the input voice 201, determines whether the received input voice 201 is a voiced section or a silent section, and outputs the signal "ON" as phoneme information 202 when voiced and the signal "OFF" when silent. The voice analysis means 2 also identifies phonemes by applying cepstrum analysis or the like to the input voice 201 and outputs phoneme symbols (for example /p/, /a/) as phoneme information 202. Furthermore, the voice analysis means 2 outputs, as phoneme information 202, fundamental frequency information representing the fundamental frequency of the input voice 201 every 30 milliseconds.
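The voiced/silent decision in paragraph [0029], short-time average power compared against a threshold, can be sketched as follows. The frame handling and the threshold value are assumptions for illustration.

```python
def classify_frame(samples, threshold=1e-3):
    """Return "ON" (voiced) or "OFF" (silent) from the short-time
    average power of one frame, as the voice analysis means 2 does."""
    if not samples:
        return "OFF"
    power = sum(s * s for s in samples) / len(samples)
    return "ON" if power >= threshold else "OFF"
```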
[0030] The voice output control means 3 determines, based on the phoneme symbols and fundamental frequency information contained in the phoneme information 202, the phonemes that the output voice 402 should contain and the fundamental frequency of the output voice 402, and outputs them as the voice output instruction 302.
[0031] The voice output means 4 synthesizes and outputs the output voice 402 based on the phonemes and fundamental frequency indicated by the voice output instruction 302. The voice output means 4 also performs coarticulation processing at phoneme transitions, interpolation of fundamental frequency changes, and the like, so that the output voice 402 does not sound unnatural.
[0032] The wireless communication means 5 is the part that connects the mobile phone terminal 100 to a wireless public network (not shown) and executes communication processing. The wireless communication means 5 outputs the transmission signal 501 to the wireless public network, and outputs the reception signal 502 acquired from the wireless public network to the speaker 62. The wireless communication means 5 starts outputting the transmission signal 501 to the wireless public network and acquiring the reception signal 502 from it in response to operation of the off-hook button 93, and terminates the connection with the wireless public network in response to operation of the on-hook button 92.
[0033] The microphone 61 converts the user's voice into an electrical signal and outputs the microphone output signal 612. The speaker 62 converts the reception signal 502 into air vibration and emits sound. The signal switching means 7 switches between two signals: by operating the signal switching button 91, either the output voice 402 or the microphone output signal 612 can be output as the transmission signal 501. That is, the user can select, by operating the signal switching button 91, which of the two voices — the output voice 402 derived by processing the user's voice, or the unprocessed microphone output signal 612 — is output to the wireless public network for the call partner to hear. In the initial state of communication processing by the wireless communication means 5, the signal switching means 7 outputs the output voice 402 as the transmission signal 501. This prevents the inconvenience of the call partner hearing a voice the user utters inadvertently, not knowing that the call partner is malicious, and thereby learning the user's personal information.
[0034] FIG. 3 is a diagram showing the appearance of the mobile phone terminal 100 according to the embodiment of the present invention.
The signal switching button 91 is arranged on a side face of the lower part of the casing so that the user can operate it during a call without looking at the terminal.
[0035] FIG. 4 shows the contents of the phoneme replacement candidate table T31 held by the voice output control means 3 of the mobile phone terminal 100 according to Embodiment 1 of the present invention. The phoneme replacement candidate table T31 lists, for each phoneme that may be contained in the phoneme information 202, the candidate phonemes with which it may be replaced. Each of the records R311 to R318 of the table T31 consists of a first field F311 representing a phoneme p constituting the phoneme information 202 and a second field F312 representing the candidate phonemes p' that may replace the phoneme p. Based on the table T31, the voice output control means 3 generates the voice output instruction 302 by replacing the phoneme p constituting the phoneme information 202 with a phoneme p' according to a predetermined rule (hereinafter, the phoneme replacement rule): it searches the records R311 to R318 for the record whose first field (F311) contains the phoneme p, selects at random one of the candidate phonemes indicated in the second field (F312) of that record as the phoneme p', and replaces p with p'.
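The table-driven replacement rule can be sketched in Python as follows. Only the /d/ → {/k/, /g/} entry of record R313 is spelled out in the text (paragraph [0046]); the other table entries here are illustrative placeholders, and the function name is ours, not the patent's.

```python
import random

# Hypothetical contents of the phoneme replacement candidate table T31.
# Only the "d" entry (record R313) is given in the text; the rest are
# invented placeholders for illustration.
T31 = {
    "d": ["k", "g"],
    "o": ["i", "e"],
    "a": ["i", "u"],
}

def replace_phoneme(p, rng=random):
    """Phoneme replacement rule: look up phoneme p in table T31 and
    pick one of its candidate replacements p' at random."""
    candidates = T31.get(p)
    if candidates is None:
        return p  # no matching record; leave the phoneme unchanged
    return rng.choice(candidates)
```

For the input phoneme /d/ this always yields /k/ or /g/, matching the worked example of paragraph [0046].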
[0036] FIG. 5 is a flowchart showing the processing procedure of the voice output control means 3 of the mobile phone terminal 100 according to Embodiment 1 of the present invention. The voice output control means 3 first acquires the phoneme information 202 (step S101), and determines whether the input voice 201 is in a voiced section based on whether the signal contained in the phoneme information 202 is "ON" or "OFF" (step S102). In a voiced section (YES) it proceeds to step S103; in a silent section (NO) it returns to step S101.
[0037] In step S103, the voice output control means 3 determines whether the phoneme contained in the phoneme information 202 acquired this time is the same as the phoneme contained in the phoneme information 202 acquired last time; if it is the same (NO) it proceeds to step S105, and if it differs (YES) it proceeds to step S104.
[0038] In step S104, the voice output control means 3 replaces the phoneme p constituting the phoneme information 202 with a new phoneme p' according to the phoneme replacement rule to form the voice output instruction 302. In step S105, by contrast, it replaces the phoneme p with the phoneme p' obtained in the previous execution of step S104. The determination in step S103 and the processing in step S105 serve to convert a run of one identical phoneme in the input voice 201 into a run of another identical phoneme.
[0039] In step S106, the voice output control means 3 calculates the fundamental frequency F0' according to the frequency conversion formula, based on the fundamental frequency F0 indicated by the fundamental-frequency information contained in the phoneme information 202, and includes it in the voice output instruction 302. The frequency conversion formula is as follows.
[0040] F0' = F0 * r * (rand(0.4) + 0.8)
Here r is a predetermined coefficient specifying how high the fundamental frequency of the output voice 402 should be relative to that of the input voice 201. Setting r < 1 makes the fundamental frequency of the output voice 402 lower overall than that of the input voice 201. For example, with r = 0.5 a female input voice 201 is converted into something like a male voice and output as the output voice 402, preventing the call partner from learning the user's sex. The coefficient r may be made changeable by user operation. rand(0.4) denotes a random number in [0, 0.4). Perturbing the fundamental frequency with a random number in this way disrupts the intonation pattern inherited from the input voice 201, preventing the call partner from inferring from the intonation of the output voice 402 what language the input voice 201 is in.
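A minimal Python sketch of this conversion, with function and parameter names of our own choosing and `uniform(0.0, 0.4)` standing in for the rand(0.4) term:

```python
import random

def convert_f0(f0, r=0.5, rng=random):
    """Step S106 frequency conversion: F0' = F0 * r * (rand(0.4) + 0.8).
    The scaled frequency is jittered by a factor in [0.8, 1.2), which
    disrupts the intonation pattern of the input voice."""
    return f0 * r * (rng.uniform(0.0, 0.4) + 0.8)
```

With r = 0.5 an input fundamental frequency of 200 Hz always maps into the range [80, 120) Hz, i.e. roughly an octave lower, consistent with the female-to-male example in the text.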
[0041] In step S107, the voice output control means 3 outputs the voice output instruction 302 generated by the processing up to step S106 to the voice output means 4, and returns to step S101.
[0042] A specific operation of Embodiment 1 of the present invention is described below, namely a specific example in which the user's voice is processed and output to the wireless public network for the call partner to hear. The case where the user is a woman whose native language is Japanese and the coefficient r is 0.5 is taken as an example.
[0043] FIG. 6 shows the contents of the input voice 201 and the output voice 402 when the user utters "dochira-sama?" ("Who is this?") during a call on the mobile phone terminal 100 according to Embodiment 1 of the present invention. In FIG. 6 the horizontal axis represents time and the vertical axis represents fundamental frequency. The line group L101 represents the fundamental-frequency change of the input voice 201, and the line group L102 that of the output voice 402. The phoneme symbols written immediately above the line groups L101 and L102 indicate the phonemes contained in the phoneme information 202 (D101 to D106) and the phonemes indicated by the voice output instruction 302 (D111 to D116), respectively. The operation of the mobile phone terminal 100 according to Embodiment 1 of the present invention is described below with reference mainly to FIG. 6.
[0044] First, when the user starts a call, the voice analysis means 2 analyzes the input voice 201 and extracts the phoneme information 202 at time t101. The phoneme information 202 contains the phoneme /d/ (D101), the fundamental-frequency information F0, and the signal "ON" indicating a voiced section.
[0045] Next, the voice output control means 3 executes processing according to the procedure shown in the flowchart of FIG. 5. In step S101 the voice output control means 3 acquires the phoneme information 202. In step S102, since this is a voiced section ("ON", YES), it proceeds to step S103. In step S103, since the phoneme /d/ has been newly received, the determination is "YES" and it proceeds to step S104.
[0046] In step S104, the voice output control means 3 refers to the phoneme replacement candidate table T31 of FIG. 4 and selects a phoneme p' to replace the phoneme /d/ according to the phoneme replacement rule. Since the record in the table T31 whose first field contains the phoneme /d/ is record R313, the voice output control means 3 randomly selects either /k/ or /g/ as the phoneme p'. Here, assume that /k/ is selected as the phoneme p'.
[0047] In step S106 of FIG. 5, the voice output control means 3 calculates the fundamental frequency F0' according to the frequency conversion formula, and in step S107 it outputs the phoneme /k/ and the fundamental frequency F0' as the voice output instruction 302.
[0048] The voice output means 4 then synthesizes and outputs the output voice 402 based on the voice output instruction 302. As shown in FIG. 6, the output voice 402 is a voice consisting of the phoneme /k/ (D111), and its fundamental frequency is about half that of the input voice 201.
[0049] By performing the same processing on the phoneme /o/ (D102) portion of the input voice 201, an output voice 402 consisting of the phoneme /i/ (D112) is output from the voice output means 4. The output voice 402 is sent through the signal switching means 7 to the wireless communication means 5 as the transmission signal 501, and further output to the call partner's terminal via the wireless public network.
[0050] The next phoneme, /ch/ (D103), is an unvoiced consonant, so the voice analysis means 2 cannot extract a fundamental frequency and outputs 0 as the value of the fundamental frequency F0 in the phoneme information 202. The voice output control means 3 accordingly outputs 0 as the value of the fundamental frequency F0' in step S106, but the voice output means 4 interpolates between the fundamental frequency of the previously received phoneme /i/ (D112) and that of the next received phoneme /a/ (D114), so that the fundamental frequency of the output voice 402 corresponding to the phoneme /r/ (D113) changes smoothly.
[0051] The fundamental frequency of the phoneme /i/ (D115) of the output voice 402, corresponding to the phoneme /a/ (D105) of the input voice 201, is slightly higher than that of the immediately preceding phoneme /n/ owing to the random number in the frequency conversion formula of step S106 in FIG. 5. [0052] By repeating the same processing, the Japanese input voice 201 "dochira-sama?" ("Who is this?") is converted into a male voice "kiranijohi?", which is meaningless as Japanese, and is output from the call partner's terminal via the wireless public network.
[0053] As described above, in the mobile phone terminal 100 according to Embodiment 1 of the present invention, as a first step the voice analysis means 2 extracts phoneme information from the input voice 201 and outputs the phoneme information 202; as a second step the voice output control means 3 executes the procedure shown in the flowchart of FIG. 5 and outputs the voice output instruction 302 based on the phoneme information 202; and as a third step the voice output means 4 synthesizes and outputs the output voice 402 based on the voice output instruction 302. This voice output method makes it possible to output to the call partner's terminal a random voice in which the phonemes of the input voice have been replaced with other phonemes.
[0054] As is clear from the above description, the mobile phone terminal according to Embodiment 1 of the present invention outputs to the call partner's terminal a voice whose phonemes have been replaced with other phonemes, i.e. a voice that makes no sense to the call partner. It can therefore make the call partner believe that communication with the user is impossible, without revealing the user's nationality to the call partner.
[0055] Moreover, in the mobile phone terminal according to Embodiment 1 of the present invention, the fundamental-frequency change of the output voice fluctuates with statistically similar properties to the fundamental-frequency change of the input voice, which increases the naturalness of the output voice. The call partner is thus given no reason to suspect that the output voice is a canned message or synthesized speech, and can reliably be made to believe that communication with the user is impossible.
[0056] (Embodiment 2)
Embodiment 2 of the present invention is described next. Embodiment 2 is characterized in that the reception signal, which is the output of the wireless communication means 5, is used as the input voice.
[0057] FIG. 7 is a configuration diagram of the mobile phone terminal 200 according to Embodiment 2 of the present invention. In FIG. 7, the mobile phone terminal 200 comprises the voice analysis means 2, the voice output control means 3, the voice output means 4, fixed syllable holding means 41, the wireless communication means 5, the microphone 61, the speaker 62, the signal switching means 7, the signal switching button 91, the on-hook button 92, and the off-hook button 93. The fixed syllable holding means 41 is a means that holds fixed syllable strings from which the output voice 402 is constructed.
[0058] The voice analysis means 2 accepts the reception signal 502, which is the output of the wireless communication means 5, as the input voice 201 and executes voice analysis processing: it calculates the short-time average power of the input voice 201, determines whether the accepted input voice 201 is in a voiced or a silent section, and outputs the signal "ON" for a voiced section or "OFF" for a silent section as the phoneme information 202.
[0059] Based on the phoneme information 202, the voice output control means 3 generates a syllable string ID identifying the syllable string to be output as the output voice 402, and outputs it as the voice output instruction 302. The voice output control means 3 also outputs an on-hook signal 304, an instruction for the wireless communication means 5 to terminate the connection with the wireless public network.
[0060] The voice output means 4 synthesizes and outputs the output voice 402 from the syllable string data acquired from the fixed syllable holding means 41 based on the syllable string ID contained in the voice output instruction 302. The fixed syllable holding means 41 holds a syllable string table, a table for obtaining syllable string data from a syllable string ID.
[0061] FIG. 8 shows the contents of the syllable string table T41 held by the fixed syllable holding means 41 of the mobile phone terminal 200 according to Embodiment 2 of the present invention. Each of the records R411 to R414 of the syllable string table T41 consists of a first field F411 representing a syllable string ID and a second field F412 representing the syllable string data corresponding to that ID. The syllable string data is expressed as a sequence of pairs, each consisting of a syllable to be output as the output voice 402 (the letters enclosed in square brackets []) and a coefficient for that syllable's fundamental frequency (the number enclosed in parentheses ()). For example, the syllable string data "[ho](1.0), [la](1.2)" corresponding to syllable string ID = 1, stored in record R411, indicates that the syllables [ho] and [la] are to be output in succession, at a medium and a somewhat higher fundamental frequency respectively.
[0062] In FIG. 8, the syllable string data corresponding to syllable string ID = 1 in record R411 is data intended to make the call partner believe it is a fixed phrase uttered as a greeting immediately after the start of a call (for example, "moshi-moshi", i.e. "hello"). The syllable string data corresponding to syllable string ID = 2 in record R412 is data intended to make the call partner believe it is a fixed phrase uttered when asking the partner to repeat (for example, "hai?", i.e. "yes?"). Although the syllable string data is intended to generate speech that makes no sense to the call partner, preparing in advance syllable strings that sound like some kind of fixed phrase and letting them appear in the output voice 402 from time to time gives the output voice 402 the naturalness of a natural language, so that the call partner can be made to believe that the output voice 402 is uttered by the user personally.
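As a sketch, the syllable string table T41 can be held as a mapping from ID to (syllable, fundamental-frequency coefficient) pairs. Only the ID = 1 entry "[ho](1.0), [la](1.2)" is given verbatim in the text; the ID = 2 entry here is an invented placeholder, and ID = 0 follows the text's convention of meaning "stop output".

```python
# Hypothetical contents of the syllable string table T41. Only the
# ID = 1 entry (record R411) is spelled out in the text; the ID = 2
# entry is a placeholder. ID = 0 means "stop outputting voice".
T41 = {
    1: [("ho", 1.0), ("la", 1.2)],  # greeting-like string (record R411)
    2: [("ha", 1.0), ("i", 1.3)],   # question-like string (record R412)
}

def syllables_for(syllable_id):
    """Return the (syllable, F0 coefficient) pairs to synthesize,
    or an empty list when ID = 0 requests that output stop."""
    if syllable_id == 0:
        return []
    return T41[syllable_id]
```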
[0063] FIGS. 9 to 11 are flowcharts showing the processing procedure of the voice output control means 3 of the mobile phone terminal 200 according to Embodiment 2 of the present invention. The voice output control means 3 first acquires the phoneme information 202 in step S211, and in step S212 determines whether the input voice 201 is in a voiced section based on whether the signal contained in the phoneme information 202 is "ON" or "OFF". In a voiced section (YES) it proceeds to step S221 of FIG. 10; in a silent section (NO) it proceeds to step S213.
[0064] In step S213, the voice output control means 3 determines whether the voice output means 4 is currently outputting the output voice 402; if so (YES) it returns to step S211, and if not (NO) it proceeds to step S214. In step S214, the voice output control means 3 determines whether the silent section of the input voice 201 is the first silent section after the start of the call, that is, whether the call has just started; if it is the first silent section (YES) it proceeds to step S231 of FIG. 11, and otherwise (NO) it proceeds to step S215.
[0065] In step S215, the voice output control means 3 generates a pseudorandom number d in [0, 1) and branches on its value: if d < 0.2 it proceeds to step S216, if 0.2 ≤ d < 0.9 it proceeds to step S217, and if d ≥ 0.9 it proceeds to step S219.
[0066] In step S216, the voice output control means 3 selects syllable string ID = 2, i.e. a syllable string that sounds like asking the call partner to repeat. In step S217, the voice output control means 3 randomly selects a syllable string ID from among the syllable string IDs stored in the syllable string table T41 of FIG. 8 and syllable string ID = 0; syllable string ID = 0 indicates that output of the output voice 402 by the voice output means 4 is to stop. In step S218, the voice output control means 3 outputs the syllable string ID selected in step S216 or S217 as the voice output instruction 302, and returns to step S211. [0067] In step S219, the voice output control means 3 outputs the on-hook signal 304, whereupon the wireless communication means 5 terminates the connection with the wireless public network. By disconnecting the call probabilistically in this way, the terminal can make the call partner believe that the user, unable to understand the partner's speech, judged it pointless to continue the call and performed an on-hook operation. Moreover, since the voice output control means 3 automatically instructs disconnection some time after the start of the call, the user is spared the trouble of operating the on-hook button 92 while listening to the voice of a malicious call partner, which improves convenience.
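The silent-section branching of steps S215 to S219 can be sketched as follows. The thresholds 0.2 and 0.9 are those stated in paragraph [0065]; the set of available table IDs and the returned action tuples are assumptions of this sketch, and the `d` parameter exists only so the branch can be exercised deterministically.

```python
import random

def respond_to_silence(d=None, rng=random):
    """Steps S215-S219 for a non-initial silent section (sketch):
      d < 0.2        -> S216: question-like syllable string (ID = 2)
      0.2 <= d < 0.9 -> S217: random table ID, or 0 meaning "stop output"
      d >= 0.9       -> S219: emit the on-hook signal 304 (disconnect)"""
    available_ids = [0, 1, 2]  # assumed: IDs held in table T41, plus 0
    if d is None:
        d = rng.random()  # pseudorandom number in [0, 1)
    if d < 0.2:
        return ("output", 2)
    elif d < 0.9:
        return ("output", rng.choice(available_ids))
    else:
        return ("on-hook", None)
```

Over many silent sections this yields a question-like utterance about 20% of the time, some other utterance or a pause about 70% of the time, and a disconnection about 10% of the time.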
[0068] Steps S221 to S229 of FIG. 10 are the processing procedure of the voice output control means 3 when the input voice 201 is found to be in a voiced section in step S212 of FIG. 9, that is, when the call partner is speaking. In step S221, the voice output control means 3 generates a pseudorandom number d in [0, 1) and branches on its value: if d < 0.998 it proceeds to step S222, and if d ≥ 0.998 it proceeds to step S229.
[0069] In step S222 the voice output control means 3 sets syllable string ID = 0, and in step S223 it outputs the voice output instruction 302 and returns to step S211. Steps S222 to S223 make the call partner believe that the user broke off mid-utterance because the partner started speaking; through this processing, the call partner can be made to believe that the user is speaking while listening to the partner's voice.
[0070] In step S229, the voice output control means 3 outputs the on-hook signal 304, whereupon the wireless communication means 5 terminates the connection with the wireless public network.
[0071] Steps S231 to S239 of FIG. 11 are the processing procedure of the voice output control means 3 when it is determined in step S214 of FIG. 9 that the silent section of the input voice 201 is the first silent section after the start of the call, that is, immediately after the call has started. In step S231, the voice output control means 3 determines whether it has already output a syllable string ID twice; if not (NO) it proceeds to step S232, and if so (YES) it proceeds to step S239 and outputs the on-hook signal 304.
[0072] In step S232 the voice output control means 3 selects syllable string ID = 1, i.e. a greeting-like syllable string for the period just after the start of a call, and in step S233 it outputs the voice output instruction 302 and returns to step S211 of FIG. 9. Steps S231 to S239 make the call partner believe that the user uttered a greeting immediately after the call started and, receiving no response after two greetings, disconnected the call.
[0073] When the voice output control means 3 outputs the voice output instruction 302 in step S218, S223, or S233, the voice output means 4 accepts it and outputs the output voice 402 in the following procedure. First, the voice output means 4 obtains the syllable string data corresponding to the syllable string ID contained in the voice output instruction 302 by searching the syllable string table T41 (FIG. 8). Next, it synthesizes and outputs the output voice 402 based on the syllable string data. In doing so, the voice output means 4 calculates the fundamental frequency F0' of the output voice 402 for each syllable from the fundamental-frequency coefficient α indicated by the syllable string data, according to the following frequency calculation formula.
[0074] F0' = F0base * α * (rand(0.4) + 0.8)
Here F0base is the initial value of the fundamental frequency of the output voice 402; by adjusting F0base, the output voice 402 can be made a male or a female voice. For example, F0base = 120 Hz yields a male output voice 402. The value of F0base may be made changeable by user operation. rand(0.4) denotes a random number in [0, 0.4). Multiplying the fundamental frequency by a random number gives variation to the intonation of the output voice 402, preventing the call partner from realizing that the output voice 402 is synthesized. The voice output means 4 also interpolates the fundamental frequencies of adjacent syllables at syllable boundaries so that the output voice 402 does not sound unnatural as speech. A specific operation of Embodiment 2 of the present invention is described below. In Embodiment 2, a voice synthesized based on the result of analyzing the call partner's voice as the input voice is output to the wireless public network as the output voice for the call partner to hear. The case where the call partner makes a silent call is described below.
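A sketch of the per-syllable calculation in Python, with F0base = 120 Hz as the male-voice example value from the text (the function and parameter names are ours):

```python
import random

def syllable_f0(alpha, f0_base=120.0, rng=random):
    """Frequency calculation formula of paragraph [0074]:
    F0' = F0base * alpha * (rand(0.4) + 0.8).
    alpha is the per-syllable coefficient from the syllable string
    table; the random factor in [0.8, 1.2) varies the intonation."""
    return f0_base * alpha * (rng.uniform(0.0, 0.4) + 0.8)
```

For the syllable [la] of record R411, with α = 1.2 and F0base = 120 Hz, F0' always falls in the range [115.2, 172.8) Hz.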
[0075] FIG. 12 is a diagram showing the content of the output voice 402 in the mobile phone terminal 200 according to Embodiment 2 of the present invention. In FIG. 12, the horizontal axis represents time and the vertical axis represents fundamental frequency. Syllable symbols D211 to D214 represent the syllables contained in the output voice 402. The operation of the mobile phone terminal 200 according to Embodiment 2 of the present invention is described below with reference mainly to FIG. 12.
[0076] First, when the user operates the off-hook button 93 and starts the call at time t201, the moment of the incoming call, the voice analysis means 2 analyzes the input voice 201 and extracts the phoneme information 202. The extracted phoneme information 202 contains the signal "OFF" indicating silence.

[0077] Next, the voice output control means 3 executes processing according to the procedure shown in the flowcharts of FIGS. 9 to 11. In step S211, the voice output control means 3 acquires the phoneme information 202. In step S212, since this is a silent interval ("OFF") (NO), it proceeds to step S213. In step S213, since the output voice 402 is not being output (NO), it proceeds to step S214. In step S214, since this is the first silent interval after the start of the call (YES), it proceeds to step S231 of FIG. 11.

[0078] In step S231, since the output voice 402 has not yet been output even once (NO), the voice output control means 3 proceeds to step S232. In step S232, the voice output control means 3 sets the syllable-string ID to 1, and in step S233 it outputs the voice output instruction 302.
[0079] Further, the voice output means 4 accepts the voice output instruction 302, refers to the syllable-string table T41 to obtain the syllable-string data "[ho](1.0), [la](1.2)" (the second field F412 of record R411 in FIG. 8), and synthesizes the output voice 402 (the portions D211 and D212 of FIG. 12). Here, the coefficient α of the syllable-string data is set to 1.0 for syllable [ho] and 1.2 for syllable [la], that is, so that [la] has the higher fundamental frequency, yet in the output voice 402 the fundamental frequency of syllable [ho] (D211) is higher than that of syllable [la] (D212). This is because the random-number term of the frequency calculation formula imparted fluctuation to the fundamental frequency of the output voice 402.
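The lookup path of [0079], from the voice output instruction 302 through the syllable-string table T41 to the data handed to the synthesizer, can be sketched as below. Only records R411 (ID 1) and R413 (ID 3) are quoted in the text; the entry for ID 2 is an assumption inferred from the description of voice D313, and the function name is invented for illustration.

```python
# Sketch of syllable-string table T41 (FIG. 8): ID -> [(syllable, alpha), ...]
# IDs 1 and 3 follow records R411/R413 as quoted in the text; ID 2
# ("chegi?") is an assumption based on the description of voice D313.
T41 = {
    1: [("ho", 1.0), ("la", 1.2)],
    2: [("che", 1.3), ("gi", 1.5)],
    3: [("ki", 1.0), ("ru", 0.9), ("mi", 0.9), ("ji", 1.2),
        ("hi", 1.1), ("go", 1.0), ("che", 1.3), ("si", 1.5)],
}

def handle_output_instruction(syllable_string_id):
    """Voice output means 4: accept instruction 302, return syllable data.

    ID 0 is the "no output" instruction, so nothing is synthesized.
    """
    if syllable_string_id == 0:
        return None
    return T41[syllable_string_id]
```

For example, `handle_output_instruction(1)` yields the [ho]/[la] greeting data of record R411, while `handle_output_instruction(0)` returns nothing, matching the behavior described in [0086].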
[0080] Thereafter, by the same processing, the output voice 402 is output a second time (the portions D213 and D214 of FIG. 12). In the third pass through step S231, since the output has already been made twice (YES), the voice output control means 3 proceeds to step S239 and outputs the on-hook signal 304, whereby the wireless communication means 5 terminates the connection with the wireless public network at time t202 of FIG. 12.
[0081] As described above, when the call partner places a silent call, the mobile phone terminal according to Embodiment 2 of the present invention outputs, a predetermined number of times, voice intended to make the call partner take it for a fixed phrase uttered as a greeting, and then automatically terminates the connection with the public network.

[0082] As another specific operation of Embodiment 2 of the present invention, the case of receiving a malicious call aimed at billing fraud is described as an example.
[0083] FIG. 13 is a diagram showing the contents of the input voice 201 and the output voice 402 in the mobile phone terminal 200 according to Embodiment 2. In FIG. 13, the horizontal axis represents time. FIGS. 13(a), 13(b), and 13(c) show one continuous operation divided into three parts for convenience. In FIG. 13, texts D301 to D305 represent the content of the input voice 201, that is, the call partner's voice, and texts D311 to D315 represent the content of the output voice 402. Information on the fundamental frequencies of the input voice 201 and the output voice 402 is omitted from FIG. 13. The operation of the mobile phone terminal 200 according to Embodiment 2 is described below with reference mainly to FIG. 13. The processing from the user operating the off-hook button 93 at time t301 on the incoming call until the first output voice 402 (D311) is output is the same as in the operation example of Embodiment 2 described above, and its description is therefore omitted.
[0084] When the voice D301 is input, the voice analysis means 2 analyzes the input voice 201 and extracts the phoneme information 202. The extracted phoneme information 202 contains the signal "ON" indicating voiced sound.

[0085] Next, the voice output control means 3 executes processing according to the procedure shown in the flowcharts of FIGS. 9 to 11. In step S211, the voice output control means 3 acquires the phoneme information 202. In step S212, since this is a voiced interval ("ON") (YES), it proceeds to step S221 of FIG. 10.
[0086] In step S221, the voice output control means 3 generates a pseudorandom number d. Assuming here that d = 0.2, then d < 0.998, so it proceeds to step S222, sets the syllable-string ID to 0, and outputs this as the voice output instruction 302 in step S223. The voice output means 4 accepts the voice output instruction 302 but, since the syllable-string ID is 0, does not output the output voice 402.

[0087] While the input voice 201 is being input, the above processing repeats, so the output voice 402 is not output.

[0088] When the voice D301 ends, the voice analysis means 2 outputs phoneme information 202 containing the signal "OFF" indicating silence. The voice output control means 3 acquires the phoneme information 202 (step S211), proceeds to step S213 on the determination of step S212 (NO), and, since no voice is being output, proceeds further to step S214. In step S214, since this is not the first silent interval (NO), it proceeds to step S215.

[0089] Assuming that the value of the pseudorandom number d generated by the voice output control means 3 in step S215 is 0.3, then 0.2 ≤ d < 0.9, so it proceeds to step S217. In step S217, the voice output control means 3 selects a syllable-string ID at random; assuming here that ID = 3 is selected, in step S218 the voice output control means 3 outputs the voice output instruction 302 containing syllable-string ID = 3.
[0090] The voice output means 4 accepts the voice output instruction 302, refers to the syllable-string table T41 to obtain the syllable-string data "[ki](1.0), [ru](0.9), [mi](0.9), [ji](1.2), [hi](1.1), [go](1.0), [che](1.3), [si](1.5)" (the second field F412 of record R413 in FIG. 8), and outputs the voice D312 (FIG. 13) as the output voice 402.

[0091] The voice D313 "chegi?" is the output produced either when the value of the pseudorandom number d generated by the voice output control means 3 in step S215 is smaller than 0.2 so that the syllable-string ID becomes 2 in step S216, or when 2 is selected as the syllable-string ID in step S217.
[0092] If the call partner starts speaking at time t302 while the voice D314 is being output, the voice output control means 3 proceeds to the processing of FIG. 10 on the determination of step S212 (YES). Assuming here that the pseudorandom number d generated by the voice output control means 3 in step S221 is 0.7, then d < 0.998, so the voice output control means 3 executes steps S222 and S223, and the voice output means 4 interrupts the output of the output voice 402 at time t303 of FIG. 13.

[0093] If the value of the pseudorandom number d generated by the voice output control means 3 in step S221 while the voice D305 is being input is 0.9984, then d ≥ 0.998, so the voice output control means 3 outputs the on-hook signal 304 in step S229, and the wireless communication means 5 terminates the connection with the wireless public network at time t304.
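The stochastic decisions walked through in [0086] to [0093] reduce to threshold tests on a pseudorandom number d. The sketch below models only that branching; the function names and the ID pool for the random selection of step S217 are assumptions, and the behavior for d ≥ 0.9 at step S215 is inferred from the d-ranges the text gives, not stated explicitly.

```python
import random

def decide_on_silence(d):
    """Step S215: decide what to do at a (non-first) silent interval."""
    if d < 0.2:
        return 2                         # S216: fixed phrase, syllable-string ID 2
    if d < 0.9:
        return random.choice([1, 2, 3])  # S217: pick a syllable-string ID at random
    return "hang_up"                     # assumed: disconnect for d >= 0.9

def decide_on_speech(d):
    """Step S221: decide what to do while the partner is speaking."""
    if d < 0.998:
        return 0                         # S222/S223: ID 0, i.e. stay silent
    return "hang_up"                     # S229: output on-hook signal 304
```

The worked example in the text follows this shape: d = 0.2 during speech keeps the terminal quiet ([0086]), while d = 0.9984 triggers the on-hook signal and ends the call at t304 ([0093]).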
[0094] As described above, the mobile phone terminal according to Embodiment 2 of the present invention outputs voice that is meaningless to the call partner in the gaps between the call partner's utterances, making it appear that the user is holding a conversation, and then automatically disconnects the call.

[0095] As is clear from the above description, because the mobile phone terminal according to Embodiment 2 of the present invention outputs syllable strings at random to the call partner's terminal, it can output voice that is meaningless to the call partner; it can therefore make the call partner believe that communication with the user is impossible, without the user's nationality becoming known to the call partner.

[0096] Furthermore, because the mobile phone terminal according to Embodiment 2 of the present invention can vary its voice output according to whether the call partner's voice is voiced, it can produce voice output matched to the call partner's speaking state; it therefore gives the call partner no room to suspect that the output voice is a canned message or synthesized speech, and can reliably make the call partner believe that communication with the user is impossible.

[0097] Furthermore, because the mobile phone terminal according to Embodiment 2 of the present invention occasionally outputs fixed phrases, the output voice sounds more natural; it therefore gives the call partner no room to suspect that the output voice is a canned message or synthesized speech, and can reliably make the call partner believe that communication with the user is impossible.

[0098] Furthermore, because the mobile phone terminal according to Embodiment 2 of the present invention stops its voice output when the call partner starts speaking during output, it can make the call partner believe that the user is actually speaking while listening to the call partner's voice; it can thus reliably make the call partner believe that communication with the user is impossible.

[0099] Furthermore, because the mobile phone terminal according to Embodiment 2 of the present invention automatically terminates communication during the call, it can make the call partner believe that the user, unable to make out the meaning of the call, judged that continuing it was pointless and went on-hook; it can thus reliably make the call partner believe that communication with the user is impossible.
[0100] In Embodiment 2 of the present invention, the syllable-string data is selected from predetermined data, but any means capable of outputting voice that is meaningless to the call partner may be used; for example, a random syllable string or phoneme string may be generated anew on each voice output.
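The alternative mentioned in [0100], generating a fresh nonsense syllable string on each output instead of selecting from predetermined data, might look like the following sketch. The syllable inventory, length range, and coefficient range are illustrative assumptions, since the patent specifies none of them.

```python
import random

# Illustrative syllable inventory; the patent does not specify one.
# The empty consonant produces bare-vowel syllables.
CONSONANTS = ["k", "s", "t", "n", "h", "m", "r", "g", "ch", ""]
VOWELS = ["a", "i", "u", "e", "o"]

def random_syllable_string(min_len=2, max_len=8):
    """Build a nonsense syllable string with per-syllable pitch coefficients.

    Each entry is (syllable, alpha), the same shape as the rows of
    table T41, so it can feed the same synthesis path.
    """
    n = random.randint(min_len, max_len)
    return [
        (random.choice(CONSONANTS) + random.choice(VOWELS),
         round(random.uniform(0.8, 1.5), 1))  # alpha, as in table T41
        for _ in range(n)
    ]

s = random_syllable_string()
```

Generating strings on the fly removes any risk of the call partner noticing repetition across calls, at the cost of losing the hand-tuned intonation of the predetermined entries.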
[0101] Also, in Embodiment 2 of the present invention, the conditions for deciding whether to output the output voice or to disconnect the communication (the ranges of the value d in the branch decisions of steps S215 and S221) were described as fixed, but this is not a limitation; the conditions may be varied, for example by increasing the probability of disconnecting the communication each time an incoming call is repeated.

[0102] Also, in Embodiments 1 and 2 of the present invention, the terminal incorporates the voice output device of the present invention, but an external device connected to the terminal, such as an exchange, a relay, or a server, may incorporate the voice output device of the present invention and be configured to process the voice of the user or of the call partner and output the voice.

[0103] Also, Embodiments 1 and 2 of the present invention described the example of a mobile phone terminal, but this is not a limitation; the same effects are obtained with fixed-line telephones, IP telephones, voice chat, intercoms, and the like.
[0104] The voice output device of the present invention can also be used in devices that output voice with randomness in response to a user's input voice, and can therefore also serve as a voice output device in, for example, electronic pets, pet robots, toys, game machines, and game software. Although the present invention has been described in detail and with reference to specific embodiments, it will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the invention.

This application is based on Japanese Patent Application No. 2005-223652 filed on August 2, 2005, the contents of which are incorporated herein by reference.
Industrial Applicability

[0105] The voice output device and voice output method of the present invention can make the caller of a malicious call believe that communication with the user is impossible, without the user's nationality becoming known to the caller, and thus have the effect of deterring the caller from placing further malicious calls; they are useful for voice communication devices capable of calls with unspecified parties.

Claims

[1] A voice output device comprising:
voice analysis means for extracting phoneme information from an input voice;
voice output control means for instructing voice output based on the phoneme information; and
voice output means for outputting voice based on the instruction,
wherein the voice output device is configured to output voice of random phonemes based on the phoneme information of the input voice.

[2] The voice output device according to claim 1, wherein the phoneme information includes information identifying phonemes or syllables contained in the input voice, and the voice output control means determines the phonemes or syllables constituting the output voice by substituting those phonemes or syllables according to a predetermined rule.

[3] The voice output device according to claim 1 or 2, wherein the phoneme information includes information indicating whether the input voice is voiced, and the voice output control means instructs voice output based on the information indicating whether the input voice is voiced.

[4] The voice output device according to any one of claims 1 to 3, wherein the phoneme information includes information representing a fundamental frequency of the input voice, and the voice output control means determines a fundamental frequency of the output voice based on that fundamental frequency.

[5] A voice communication device comprising: communication means for executing communication processing and outputting the output voice to a communication destination; and the voice output device according to any one of claims 1 to 4.

[6] A voice output method comprising: a first step of extracting phoneme information from an input voice; a second step of instructing voice output of random phonemes based on the phoneme information; and a third step of outputting the output voice based on the instruction.
PCT/JP2006/304390 2005-08-02 2006-03-07 Voice output apparatus, voice communication apparatus and voice output method WO2007015319A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2007503136A JPWO2007015319A1 (en) 2005-08-02 2006-03-07 Audio output device, audio communication device, and audio output method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2005223652 2005-08-02
JP2005-223652 2005-08-02

Publications (1)

Publication Number Publication Date
WO2007015319A1 true WO2007015319A1 (en) 2007-02-08

Family

ID=37708602


Country Status (2)

Country Link
JP (1) JPWO2007015319A1 (en)
WO (1) WO2007015319A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013195928A (en) * 2012-03-22 2013-09-30 Yamaha Corp Synthesis unit segmentation device
US8706133B2 (en) 2008-06-30 2014-04-22 Motorola Solutions, Inc. Threshold selection for broadcast signal detection
CN107346107A (en) * 2016-05-04 2017-11-14 深圳光启合众科技有限公司 Diversified motion control method and system and the robot with the system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05114880A (en) * 1991-10-22 1993-05-07 Hitachi Ltd Portable mobile radio terminal
JPH0720894A (en) * 1993-06-17 1995-01-24 Sony Corp Voice information processing device
JPH07273864A (en) * 1994-04-04 1995-10-20 Victor Co Of Japan Ltd Cordless telephone system
JP2002101203A (en) * 2000-09-20 2002-04-05 Ricoh Co Ltd Speech processing system, speech processing method and storage medium storing the method
JP2003110712A (en) * 2001-09-30 2003-04-11 Hiroko Ishikawa Voice modulation call
JP2003157100A (en) * 2001-11-22 2003-05-30 Nippon Telegr & Teleph Corp <Ntt> Voice communication method and equipment, and voice communication program



Also Published As

Publication number Publication date
JPWO2007015319A1 (en) 2009-02-19


Legal Events

WWE (WIPO information: entry into national phase): ref document number 2007503136; country of ref document: JP
121: the EPO has been informed by WIPO that EP was designated in this application
NENP (non-entry into the national phase): ref country code DE
122 (PCT application non-entry in European phase): ref document number 06715360; country of ref document: EP; kind code of ref document: A1