WO2017094500A1 - Determination device and voice provision system provided therewith - Google Patents


Info

Publication number
WO2017094500A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
text
determination
voice
permissive
Prior art date
Application number
PCT/JP2016/083894
Other languages
French (fr)
Japanese (ja)
Inventor
政文 坂井
裕美子 安田
村上 大介
陽子 福山
哲成 中
Original Assignee
株式会社電通
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社電通
Priority to JP2017553757A (JP6836033B2)
Publication of WO2017094500A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16: Sound input; Sound output
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00: Speech synthesis; Text to speech systems
    • G10L 13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Definitions

  • the present invention relates to a determination apparatus for determining non-permissive words included in text composed of natural language, and a voice providing system including the same.
  • The speech synthesizer described in Patent Document 1 includes an inappropriate-word dictionary in which inappropriate words and inappropriate discourse patterns are registered, and determines the degree of inappropriate expression contained in the text to be read out. Depending on that degree, it can embed an audio watermark in the synthesized voice and register the result in an external storage server. In this way, abuse by criminals can be prevented without adverse effects on the text, such as the insertion of data that degrades the voice.
  • The present invention has been made in view of such a problem, and aims to provide a determination device capable of accurately detecting non-permissive words contained in text regardless of the context of the text, and a voice providing system including the same.
  • The determination apparatus is a determination apparatus for determining a non-permissive word included in text composed of natural language, and includes a determination target generation unit that generates a first determination target by dividing the text at an arbitrary position, and a determination unit that determines the non-permissive word by comparing a pronunciation corresponding to the first determination target with a pronunciation corresponding to the non-permissive word.
  • the non-permitted word is determined by comparing the pronunciation corresponding to the first determination target generated by dividing the text at an arbitrary position and the pronunciation corresponding to the non-permitted word. Therefore, it is possible to compare the pronunciation of any combination of letters and numerals constituting the text with the pronunciation of the non-permissive word. This makes it possible to accurately detect non-permissive words contained in the text regardless of the context of the text.
  • Preferably, the determination unit determines the non-permissive word by partial coincidence between the pronunciation corresponding to the first determination target and the pronunciation corresponding to the non-permissive word. According to this configuration, words in the text that only partially match a non-permissive word can be detected, not just exact matches of pronunciation. Thus, even words that are merely similar to the non-permissive words can be detected.
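As an illustration of this claim, partial coincidence over all divisions at arbitrary positions reduces to a substring search over the full phonetic string. The following Python sketch is a minimal reduction under that observation; the romanized pronunciations and the NG word list are hypothetical examples, not data from the patent.

```python
def contains_ng_pronunciation(text_pronunciation: str, ng_pronunciations: list[str]) -> bool:
    """Return True if any non-permissive pronunciation partially matches
    the pronunciation of some division of the text.

    Every contiguous substring of the phonetic string corresponds to a
    first determination target produced by dividing the text at an
    arbitrary position, so a plain substring search over the whole
    pronunciation covers all such divisions.
    """
    return any(ng in text_pronunciation for ng in ng_pronunciations)

# Hypothetical NG pronunciation "doopingu" (doping):
print(contains_ng_pronunciation("karewadoopinguwoshita", ["doopingu"]))  # True
print(contains_ng_pronunciation("kyouwayoitenkidesu", ["doopingu"]))     # False
```

This is why the comparison works regardless of word boundaries in the original text: an NG word split across two morphemes still appears as a contiguous run in the phonetic string.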
  • Preferably, the determination target generation unit divides the text into morphemes to generate a second determination target, and the determination unit determines the non-permissive word and the permissive word by comparing the second determination target with the non-permissive word, before comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word. According to this configuration, prior to the comparison between the first determination target and the non-permissive word, the non-permissive words and permissive words included in the text can be determined based on the morphemes constituting the text.
  • non-permissive words included as morphemes in the text can be reliably detected by comparison with the second determination target.
  • Since non-permissive words included in the text can be determined in stages, omissions of non-permissive words can be reduced.
  • Preferably, before comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word, the determination unit determines the non-permissive word by comparing a pronunciation corresponding to the second determination target that was determined to be a permissible word with a pronunciation corresponding to the non-permissive word.
  • the non-permitted word is determined by comparing the pronunciation corresponding to the second determination target determined to be the permitted word with the pronunciation corresponding to the non-permitted word. Therefore, regardless of the meaning of the morpheme determined to be a permitted word, it is possible to detect non-permissive words included in the morpheme.
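A minimal sketch of the two-stage determination described above, assuming a morpheme list and a pronunciation function are already available; all names below are illustrative, not the patent's implementation:

```python
def find_ng_words(morphemes, ng_words, ng_pronunciations, to_pronunciation):
    """Two-stage check: (1) compare each morpheme (second determination
    target) directly with the NG word list; (2) for morphemes that pass
    stage 1 as permitted words, compare their pronunciation with the NG
    pronunciations, catching NG words hidden inside permitted morphemes."""
    flagged = []
    for morpheme in morphemes:
        if morpheme in ng_words:                    # stage 1: surface form
            flagged.append(morpheme)
            continue
        pronunciation = to_pronunciation(morpheme)  # stage 2: sound
        if any(ng in pronunciation for ng in ng_pronunciations):
            flagged.append(morpheme)
    return flagged

# Toy example where the pronunciation equals the surface form:
print(find_ng_words(["kyou", "wa", "yoi", "tenki", "desu"],
                    ng_words={"doping"},
                    ng_pronunciations={"doopingu"},
                    to_pronunciation=lambda m: m))  # []
```

Stage 1 reliably catches NG words present as whole morphemes; stage 2 covers the case the claim targets, where an NG sound is embedded in a morpheme that by itself is a permitted word.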
  • Alternatively, the determination apparatus is a determination apparatus for determining a non-permissive word included in text composed of natural language, and includes a determination target generation unit that generates a first determination target by dividing the text at an arbitrary position, and a determination unit that determines the non-permissive word by comparing the character string forming the first determination target with the character string forming the non-permissive word.
  • The non-permissive word is determined by comparing the character string forming the first determination target, generated by dividing the text at an arbitrary position, with the character string forming the non-permissive word. Therefore, a character string formed from any combination of the characters and numbers constituting the text can be compared with the character string forming the non-permissive word. This makes it possible to accurately detect non-permissive words contained in the text regardless of the context of the text.
  • A voice providing system according to the present invention includes any of the determination devices described above and provides voice corresponding to the text based on specific voiceprint data, wherein the determination device includes the determination unit and a voice generation unit that generates voice data from the text, using the specific voiceprint data, according to the determination result of the determination unit.
  • the voice data using the specific voiceprint data is generated from the text according to the determination result of the determination unit of the determination apparatus. Therefore, the voice data can be switched and generated according to the presence or absence of the non-permissive word included in the text. As a result, it is possible to provide speech data of different modes according to the presence or absence of non-permissive words included in the text.
  • the voice generation unit generates voice data corresponding to the text when the text does not include the non-permitted word.
  • the voice generation unit generates voice data corresponding to the text that does not include the non-permissive word without special correction or the like. For this reason, it is possible to quickly provide audio data corresponding to text.
  • Preferably, when the text includes the non-permissive word, the voice generation unit generates voice data corresponding to the text in which the portion corresponding to the non-permissive word has been corrected. According to this configuration, voice data in which the portion corresponding to the non-permissive word is corrected is generated. For this reason, even for text containing a non-permissive word, audio data corresponding to the text can be provided.
  • the voice generation unit deletes a portion corresponding to the non-permitted word included in the text.
  • According to this configuration, the voice generation unit generates voice data in which the portion corresponding to the non-permitted word is deleted. For this reason, even for text containing a non-permissive word, audio data corresponding to the text can be provided.
  • the voice generation unit replaces a portion corresponding to the non-permitted word included in the text.
  • the voice generation unit generates voice data in which a portion corresponding to the non-permitted word is replaced. For this reason, it is possible to prevent voice data including an unacceptable word from being provided as it is using specific voiceprint data.
  • Since the portion corresponding to the non-permissive word is replaced, a gap in the audio corresponding to that part of the text can be avoided.
  • the voice generation unit uses voiceprint data different from the specific voiceprint data for a portion corresponding to the non-permitted word included in the text.
  • According to this configuration, the voice generation unit generates voice data using different voiceprint data for the portion corresponding to the non-permitted word. For this reason, even for text including an NG word, audio data corresponding to the text can be provided without changing the meaning of the text.
  • Preferably, the voice generation unit replaces a portion corresponding to the non-permissive word included in the text with a word having a different expression. According to this configuration, the voice generation unit generates voice data in which the non-permissive word is replaced with a word having a different expression. For this reason, even for text containing a non-permissive word, audio data corresponding to the text can be provided.
  • the determination device includes a storage unit that stores the specific voiceprint data, and the storage unit stores the non-permitted word associated with the specific voiceprint data. According to this configuration, it is determined by the determination unit whether the non-permissive word associated with the specific voiceprint data is included in the text. Therefore, it is possible to prevent the provision of voice data including non-permissive words associated with specific voiceprint data.
  • The determination method is a determination method for determining a non-permissive word included in text composed of natural language, comprising the steps of: dividing the text at an arbitrary position to generate a first determination target; and determining the non-permissive word by comparing a pronunciation corresponding to the first determination target with a pronunciation corresponding to the non-permissive word.
  • the non-permitted word is determined by comparing the pronunciation corresponding to the first determination target generated by dividing the text at an arbitrary position and the pronunciation corresponding to the non-permitted word. Therefore, it is possible to compare the pronunciation of any combination of letters and numerals constituting the text with the pronunciation of the non-permissive word. This makes it possible to accurately detect non-permissive words contained in the text regardless of the context of the text.
  • FIG. 1 is an explanatory view showing an outline of a voice providing system according to the present embodiment.
  • the voice providing system 1 includes a management server 10 that constitutes an example of a determination device.
  • The management server 10 stores text and voiceprint data (digital voiceprint data) provided from the text registration terminal 20 and the voiceprint registration terminal 30 connected via a network NW such as the Internet, and provides the external terminal group 40, also connected via the network NW, with voice data generated from the text and voiceprint data.
  • In the present embodiment, a case is described in which the voice providing system 1 receives text and voiceprint data via the network NW and provides voice data generated based on them to the external terminal group 40.
  • the environment to which the voice providing system 1 according to the present invention is applied is not limited to the above environment, and can be changed as appropriate.
  • text and voiceprint data may be directly registered in the management server 10.
  • the voice data is not limited to the external terminal group 40, and may be provided to a terminal (voice output terminal or the like) directly connected to the management server 10.
  • the management server 10 is disposed in a company or the like that provides a voice providing service using the voice providing system 1 according to the present embodiment.
  • the management server 10 is configured, for example, by a personal computer (PC) having a general function, and has a function as a web server.
  • The management server 10 generates the text registration screen (see FIG. 6), the voiceprint registration screen (see FIG. 8), and the setting input screen (see FIG. 10) described later, and provides them to the text registration terminal 20, the voiceprint registration terminal 30, and the external terminal group 40, respectively.
  • the text registration terminal 20 is disposed in a company or the like that registers text as a manuscript of voice data provided from the voice providing system 1.
  • the text registration terminal 20 is, for example, a PC having a general function, and has a web browser function.
  • For example, the text registration terminal 20 is disposed at a manufacturer that wants to provide advertisements as audio data, or at a newspaper company or television station that wants to provide news.
  • The text registration terminal 20 may also be disposed at a service provider that desires to provide information (for example, voice guidance) from home appliances in an environment where home appliances such as televisions and refrigerators are connected to the Internet.
  • The text registration terminal 20 includes the components (an input unit, a display unit, a communication unit, etc.) required of a text input terminal.
  • the text registered from the text registration terminal 20 is composed of natural language.
  • the voiceprint registration terminal 30 is installed in a company or the like that registers voiceprint data (digital voiceprint data) serving as a sound source of voice data provided from the voice providing system 1.
  • the voiceprint registration terminal 30 is formed of, for example, a PC having a general function, and has a web browser function.
  • the voiceprint registration terminal 30 is disposed in a talent office to which an actor or a voice actor belongs, a management office that manages athletes, and the like.
  • the voiceprint registration terminal 30 includes components (input unit, display unit, communication unit, etc.) necessary as a voiceprint input terminal.
  • The voiceprint data includes, for example, fragment data of the recorded voice of a specific person, and sound and prosody parameters, such as spectrum and fundamental frequency, obtained by analyzing the voice of the specific person.
  • the voiceprint data is not limited to these, and includes arbitrary data necessary for voice synthesis technology (for example, waveform connection type voice synthesis, formant synthesis, etc.) by the voice generation unit 112 described later.
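Purely for illustration, the voiceprint data described above could be modeled as a container like the following; the field names are assumptions, not the patent's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class VoiceprintData:
    """Illustrative container for voiceprint data: recorded voice
    fragments for waveform concatenation synthesis, plus analysis
    parameters (spectrum, fundamental frequency) usable by parametric
    methods such as formant synthesis."""
    speaker_id: str
    fragments: list[bytes] = field(default_factory=list)  # recorded voice fragments
    spectrum: list[float] = field(default_factory=list)   # spectral envelope parameters
    fundamental_frequency_hz: float = 0.0                 # base pitch of the speaker
```

Grouping both kinds of data in one record reflects the text's point that the voiceprint DB must feed whichever synthesis technique the voice generation unit uses.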
  • the external terminal group 40 is configured of, for example, any terminal (device) having a network connection function.
  • For example, the external terminal group 40 includes a car navigation system (hereinafter referred to as "car navigation system") 41, a portable terminal 42 such as a smartphone, and a home appliance 43 such as a refrigerator.
  • the car navigation system 41, the portable terminal 42 and the home appliance 43 constituting the external terminal group 40 have an audio output function of outputting audio data provided from the management server 10 in addition to the functions specific to each terminal.
  • FIG. 2 is a block diagram of the management server 10 of the voice providing system 1 according to the present embodiment.
  • the management server 10 has a control unit 101 that controls the entire management server 10.
  • To the control unit 101, a generation unit 102, a storage unit 103, a determination unit 104, a communication unit 105, an input unit 106, and a display unit 107 are connected.
  • the configuration of the management server 10 is not limited to this, and can be changed as appropriate.
  • the generation unit 102 includes a determination target generation unit 111 and an audio generation unit 112.
  • The determination target generation unit 111 generates targets (determination targets) used to determine whether the text stored in the storage unit 103 includes a non-permissive word (hereinafter referred to as an "NG word"). For example, the determination target generation unit 111 generates a first determination target by dividing the text stored in the storage unit 103 at an arbitrary position. In addition, the determination target generation unit 111 divides the text stored in the storage unit 103 into morphemes to generate a second determination target. Furthermore, the determination target generation unit 111 converts the text stored in the storage unit 103, or a part of the second determination target, into phonetic words (for example, hiragana or prosody).
  • FIGS. 3A and 3B are diagrams showing examples of the first determination target and the second determination target generated by the determination target generation unit 111, respectively.
  • FIG. 3A shows a case where the first determination target is generated from the text "It is good weather today (KYOU WA YOI TENKI DESU)" in Japanese and the text "It is good weather today" in English.
  • FIG. 3B shows a case where the second determination target is generated from the text "It is good weather today (KYOU WA YOI TENKI DESU)" in Japanese and the text "It is good weather today" in English.
  • the first determination target and the second determination target are indicated by phonetic symbols.
  • the first determination target and the second determination target will be described using text in Japanese.
  • The first determination target is generated by converting the text "It is good weather today (KYOU WA YOI TENKI DESU)" into phonetic words and dividing it at arbitrary positions.
  • For example, the text "It is good weather today (KYOU WA YOI TENKI DESU)" divided as "KYO-U-WA-YO-I-TE-N-KI-DE-SU" or as "KYOU-WAYO-ITE-NKI-DESU" is considered as a first determination target.
  • The first determination target includes all combinations obtained by dividing the text "It is good weather today (KYOU WA YOI TENKI DESU)" into an arbitrary number of phonetic words (hiragana) while maintaining their order.
  • When the text has multiple possible readings, first determination targets including all of the phonetic words are generated. For example, for the word "good (YOI)" shown in FIG. 3A, the readings "YOI" and "II" both exist. Therefore, the first determination targets generated from this text include both "KYOU WA YOI TENKI DESU" divided at arbitrary positions and "KYOU WA II TENKI DESU" divided at arbitrary positions.
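The exhaustive division just described can be sketched as follows: a string of n phonetic units admits 2^(n-1) ways to place cut points between the units while preserving their order. This toy treats each character as one phonetic unit; a real implementation would operate on hiragana units and would also enumerate alternative readings such as YOI/II, which this sketch omits.

```python
from itertools import combinations

def first_determination_targets(phonetic: str) -> list[tuple[str, ...]]:
    """Enumerate every division of a phonetic string at arbitrary
    positions, preserving unit order: all 2**(n-1) placements of cut
    points between the n phonetic units."""
    n = len(phonetic)
    targets = []
    for k in range(n):  # k = number of cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            targets.append(tuple(phonetic[a:b] for a, b in zip(bounds, bounds[1:])))
    return targets

# A 4-unit string has 2**3 = 8 divisions, including ("yo", "it"):
targets = first_determination_targets("yoit")
print(len(targets))             # 8
print(("yo", "it") in targets)  # True
```

In practice the system need not materialize all divisions; as noted earlier, substring matching over the undivided phonetic string gives the same coverage for partial-coincidence checks.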
  • The second determination target is generated by dividing the text "It is good weather today (KYOU WA YOI TENKI DESU)" into morphemes.
  • For example, the text "It is good weather today (KYOU WA YOI TENKI DESU)" divided into "KYOU-WA-YOI-TENKI-DESU" ("today - topic marker - good - weather - is") is considered as the second determination target.
  • The determination targets (first determination target, second determination target) generated by the determination target generation unit 111 are registered in a text database (DB) 113 in the storage unit 103, described later, in association with the source text.
  • the voice generation unit 112 generates voice data from the text stored in the text DB 113 using voiceprint data registered in a voiceprint DB 114 in the storage unit 103 described later.
  • the voice generation unit 112 can also be called a voice synthesis unit that generates a voice waveform based on voiceprint data.
  • the voice generation unit 112 can generate a voice waveform by waveform connection type voice synthesis, formant synthesis, or the like.
  • In waveform concatenation speech synthesis, fragment data of the recorded voice of a specific person or the like is concatenated and synthesized.
  • In formant synthesis, recorded voice of a particular person is not used; instead, parameters such as the fundamental frequency, timbre, and noise level are adjusted to form a waveform, generating artificial voice data.
  • the voice generation unit 112 changes the mode of the voice data to be generated according to the presence or absence of the NG word included in the text. If the text does not contain an NG word, audio data corresponding to the text is generated without correcting the text. On the other hand, when the text includes an NG word, voice data corresponding to the text in which the portion corresponding to the NG word is corrected is generated. For example, when correcting the part corresponding to the NG word, the voice generation unit 112 can generate voice data in which the part is deleted or replaced.
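A hedged sketch of the correction step described above: before synthesis, the portion of the text corresponding to an NG word is deleted or replaced. The function name, the `mode` parameter, and the placeholder string are illustrative choices, not the patent's implementation:

```python
def correct_text(text: str, ng_words: list[str], mode: str = "replace") -> str:
    """Remove or substitute NG-word portions of the text before the
    voice generation unit synthesizes voice data from it."""
    for ng in ng_words:
        if mode == "delete":
            text = text.replace(ng, "")
        elif mode == "replace":
            text = text.replace(ng, "[beep]")  # illustrative placeholder word
    return text

print(correct_text("the athlete denied doping", ["doping"]))
# the athlete denied [beep]
print(correct_text("the athlete denied doping", ["doping"], mode="delete"))
```

Replacement preserves the sentence shape (no missing portion in the audio), which is the advantage the description attributes to substitution over deletion.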
  • the storage unit 103 stores information necessary for the control unit 101 to control the management server 10.
  • the storage unit 103 stores information for generating a text registration screen (see FIG. 6), a voiceprint registration screen (see FIG. 8) and a setting input screen (see FIG. 10) described later.
  • the storage unit 103 stores a database (DB) in which various types of information are registered. Specifically, text DB 113, voiceprint DB 114, first NG word DB 115, and second NG word DB 116 are stored.
  • In the text DB 113, texts registered from the text registration terminal 20 via the network NW are stored.
  • the text is registered in association with the identification information of the text registration terminal 20.
  • the determination target (first determination target, second determination target) generated by the determination target generation unit 111 is registered in association with the text of the generation source.
  • Voiceprint data registered from the voiceprint registration terminal 30 via the network NW is registered in the voiceprint DB 114.
  • voiceprint data is registered in association with identification information of the voiceprint registration terminal 30.
  • In the first NG word DB 115, basic NG words are registered, including words that are socially undesirable to use and words specified from the attribute information of the text described later.
  • For example, the basic NG words include words that insult a third party and words reminiscent of obscene expressions or antisocial statements.
  • the basic NG word includes a word that is associated with a political position when the attribute information of the text is "political”.
  • In the second NG word DB 116, individual NG words are registered, consisting of words registered from the voiceprint registration terminal 30 via the network NW.
  • Each individual NG word is registered in association with the voiceprint data registered in the voiceprint DB 114.
  • The individual NG words include words that would adversely affect the impression of the person providing the voiceprint data. For example, when the person providing the voiceprint data is a sports athlete, words such as "match-fixing" (yaocho) and "doping" are included.
  • The determination unit 104 determines whether the text registered in the text DB 113, or the first determination target or second determination target associated with the text, includes an NG word or a permissible word that is not an NG word (hereinafter referred to as an "OK word"). When determining whether the text registered in the text DB 113 contains an NG word, the determination unit 104 refers to the basic NG words registered in the first NG word DB 115. When determining whether the first determination target or the second determination target registered in the text DB 113 contains an NG word, the determination unit 104 refers to the individual NG words registered in the second NG word DB 116. In this case, the determination unit 104 generates a phonetic word (NG sound) for each individual NG word as necessary, and compares it with the first determination target and the second determination target.
  • the communication unit 105 communicates information with the text registration terminal 20, the voiceprint registration terminal 30, and the external terminal group 40 under the control of the control unit 101. For example, the communication unit 105 transmits information necessary for the text registration screen (see FIG. 6) and the voiceprint registration screen (see FIG. 8) to the text registration terminal 20 and the voiceprint registration terminal 30, respectively. On the other hand, the communication unit 105 receives text, voiceprint data and setting input information from the text registration terminal 20, the voiceprint registration terminal 30, and the external terminal group 40, respectively.
  • the input unit 106 receives an instruction for the management server 10. For example, the input unit 106 receives an instruction such as editing of the basic NG word in the first NG word DB 115.
  • the display unit 107 displays information required to operate the management server 10. For example, the display unit 107 displays the status of the management server 10 and the registration status of text and voiceprint data stored in the storage unit 103.
  • FIG. 4 is a block diagram of a portable terminal 42 that receives voice provision from the voice provision system 1 according to the present embodiment. In FIG. 4, only the components of the portable terminal 42 related to the present invention are shown.
  • the portable terminal 42 includes a control unit 421 that controls the entire terminal.
  • To the control unit 421, an application execution unit 422, an audio output unit 423, a communication unit 424, an input unit 425, and a display unit 426 are connected.
  • the configuration of the portable terminal 42 is not limited to the configuration shown in FIG. 4 and can be changed as appropriate.
  • the application execution unit 422 executes processing necessary to output voice data provided from the management server 10. For example, the application execution unit 422 generates a setting input screen (see FIG. 10) for inputting settings for audio data provided from the management server 10, and displays the setting input screen on the display unit 426. In addition, the application execution unit 422 performs confirmation (for example, confirmation as to whether or not the setting on the setting input screen matches) of the audio data received via the communication unit 424, and outputs the audio data to the audio output unit 423.
  • the audio output unit 423 outputs the audio data received from the application execution unit 422. For example, the audio output unit 423 outputs audio data corresponding to the text provided from the management server 10 from the speaker.
  • the communication unit 424 communicates information with the management server 10 via the network NW under the control of the control unit 421. For example, the communication unit 424 transmits the information input on the setting input screen described above to the management server 10. The communication unit 424 also receives voice data from the management server 10.
  • the input unit 425 receives an instruction for the portable terminal 42.
  • the input unit 425 receives an instruction to input information on the setting input screen.
  • The display unit 426 displays information necessary to operate the portable terminal 42.
  • the display unit 426 displays the status of the portable terminal 42, a setting input screen, and the like.
  • the management server 10 receives a text such as a newspaper article from the text registration terminal 20 and receives voiceprint data of a specific actor or the like from the voiceprint registration terminal 30.
  • the management server 10 receives, from the portable terminal 42, setting information in which desired text and voiceprint data are designated.
  • the management server 10 generates voice data from text using specified voiceprint data based on setting information from the portable terminal 42, and provides the voice data to the portable terminal 42.
  • the portable terminal 42 can receive and output voice data in which text such as a newspaper article is read out with the voice of the operator's favorite actor.
  • FIG. 5 is a flowchart explaining the operation at the time of text registration in the voice providing system 1 according to the present embodiment.
  • When registering text in the management server 10, an application for text registration is first made from the text registration terminal 20 (step ST501).
  • Then, information necessary for generating the text registration screen is read out from the information stored in the storage unit 103 and output to the text registration terminal 20 through the communication unit 105 (step ST502).
  • When the information necessary for the text registration screen is received, the text registration screen is displayed on the text registration terminal 20 (step ST503).
  • FIG. 6 is a view showing an example of a text registration screen 600 used by the voice providing system 1 according to the present embodiment.
  • the text registration screen 600 shown in FIG. 6 is a screen for registering a text that the operator of the text registration terminal 20 wants to provide.
  • the text registration screen 600 is provided with an attribute selection unit 601, a text input unit 602, a reset button 603, and an end button 604.
  • the attribute selection unit 601 is a part for selecting attribute information of text to be registered.
  • the attribute selection unit 601 is provided with a box (category selection box) for selecting a category such as “entertainment”, “sports”, “news”, “economy” or the like as text attribute information. By selecting these category selection boxes, the category to which the text to be registered belongs can be specified.
  • the attribute selecting unit 601 may be configured to directly input text attribute information.
  • the attribute selection unit 601 can adopt an arbitrary configuration on the premise of specifying text attribute information.
  • the text input unit 602 is a portion into which text to be registered is input.
  • the text input unit 602 is provided with a field for inputting text (text input field).
  • by entering characters and numbers in this text input field, it is possible to specify the text to be registered in the management server 10. For example, texts related to newspaper articles, traffic information, voice guidance, and advertisement information are entered in the text input field.
  • the reset button 603 is used to reset the information selected and designated on the text registration screen 600.
  • the end button 604 is used when ending the text registration process using the text registration screen 600. By selecting the end button 604, the attribute information and text selected / input via the text registration screen 600 are transmitted to the management server 10.
  • the text registration screen 600 is not limited to the example shown in FIG. 6 and can be changed as appropriate. It is preferable as an embodiment to provide the text registration screen 600 with a portion for specifying the handling of the registered text. For example, the correction method (deletion and substitution of the NG word) of the voice data when the NG text is included in the registered text in relation to the voiceprint data may be designated.
  • attribute information is selected from the attribute selection unit 601, and text is input to the text input unit 602.
  • the end button 604 of the text registration screen 600 is selected, these attribute information and text are transmitted to the management server 10 (step ST504).
  • the determination unit 104 determines whether the text includes an NG word (step ST505). At this time, the determination unit 104 refers to the basic NG word registered in the first NG word DB. As a result, it is detected whether the text contains a word or the like which is undesirable to use, as it is customary. As described above, by determining the presence or absence of the basic NG word at the text registration stage, it is possible to prevent the text including the basic NG word from being registered in the management server 10.
  • if the text contains a basic NG word (step ST505: Yes), an error message indicating as much is output to the text registration terminal 20 (step ST506). By outputting an error message in this manner, the text registrant can be notified that the text is inappropriate.
  • the text registrant who has received the error message re-inputs the text from the text registration screen 600 and transmits it to the management server 10 (step ST504).
  • when the text does not contain a basic NG word (step ST505: No), the text transmitted in step ST504 is registered in the text DB 113 (step ST507). Then, when the registration process to the text DB 113 is completed, the management server 10 notifies the text registration terminal 20 of the completion of the text registration (step ST508). Through this series of operations, a text such as a newspaper article is registered in the management server 10 (text DB 113).
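The registration-time check described above (steps ST504 to ST508) can be sketched as follows. This is a hypothetical illustration, not the patent's implementation: the word list, function name, and return strings are invented stand-ins for the first NG word DB and the error/completion notifications.

```python
# Illustrative stand-in for the first NG word DB (basic NG words).
BASIC_NG_WORDS = {"forbidden", "banned"}

def register_text(text: str, text_db: list) -> str:
    """Reject a text containing a basic NG word (ST506); otherwise register it (ST507-ST508)."""
    words = text.lower().split()
    if any(word in BASIC_NG_WORDS for word in words):
        return "error: text contains a basic NG word"
    text_db.append(text)
    return "registration completed"

db = []
print(register_text("local team wins the match", db))   # registered
print(register_text("this phrase is banned here", db))  # rejected with an error
```

A real system would tokenize with a morphological analyzer rather than whitespace splitting, but the branching mirrors the flowchart of FIG. 5.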
  • FIG. 7 is a flowchart for explaining the operation at the time of voiceprint registration in voice providing system 1 according to the present embodiment.
  • when registering voiceprint data in the management server 10, a request to register the voiceprint data is first submitted from the voiceprint registration terminal 30 (step ST701).
  • in step ST702, the information necessary for generating the voiceprint registration screen is read out from the information stored in the storage unit 103 and output to the voiceprint registration terminal 30 through the communication unit 105.
  • when the information necessary for the voiceprint registration screen is received, the voiceprint registration screen is displayed on the voiceprint registration terminal 30 (step ST703).
  • FIG. 8 is a view showing an example of a voiceprint registration screen 800 used in the voice providing system 1 according to the present embodiment.
  • the voiceprint registration screen 800 shown in FIG. 8 is a screen for registering voiceprint data that the operator of the voiceprint registration terminal 30 wants to provide.
  • the voiceprint registration screen 800 is provided with an attribute selection unit 801, an NG word category selection unit 802, an NG word selection/input unit 803, a voiceprint input unit 804, a reset button 805, and an end button 806.
  • the attribute selection unit 801 is a portion for selecting attribute information of voiceprint data to be registered (more specifically, a person of voiceprint data).
  • the attribute selection unit 801 is provided with a box (category selection box) for selecting a category such as "actor", "idol", "voice actor", or "artist" as attribute information of voiceprint data. By selecting these category selection boxes, it is possible to specify the category to which the voiceprint data to be registered belongs.
  • the attribute selection unit 801 may directly input the attribute information of the voiceprint data.
  • the attribute selection unit 801 can adopt an arbitrary configuration on the premise of designating attribute information of voiceprint data.
  • the NG word category selection unit 802 is a part that selects a category of the NG word (individual NG word).
  • the NG word category selection unit 802 is provided with, for example, a box (category selection box) for selecting a category such as “divorce”, “disaster”, “anti-society” or “advertisement”.
  • NG word candidates associated with each category are registered in advance for these category selection boxes. By selecting one of these boxes, it is possible to specify the category to which the NG word (individual NG word) associated with the voiceprint data to be registered belongs.
  • the NG word selection / input unit 803 is a portion for selecting or inputting an NG word (individual NG word) associated with voiceprint data to be registered.
  • An NG word candidate is displayed on the NG word selection / input unit 803 by selecting a category from the above-mentioned NG word category selection unit 802.
  • the voiceprint registrant can select an NG word to be associated with voiceprint data to be registered from such an NG word candidate. Also, the voiceprint registrant can directly input an NG word (individual NG word) to the NG word selection / input unit 803.
  • the voiceprint input unit 804 is a part for inputting voiceprint data (digital voiceprint data) to be registered.
  • the voiceprint input unit 804 is provided with a box (voiceprint attachment box) to which voiceprint data is attached. By attaching voiceprint data to the voiceprint attachment box, voiceprint data to be registered in the management server 10 can be designated.
  • the reset button 805 is used to reset the information selected and designated on the voiceprint registration screen 800.
  • the end button 806 is used when ending registration processing of voiceprint data using the voiceprint registration screen 800. By selecting the end button 806, the attribute information and voiceprint data selected / input via the voiceprint registration screen 800 are transmitted to the management server 10.
  • the voiceprint registration screen 800 is not limited to the example shown in FIG. 8 and can be changed as appropriate. It is preferable as an embodiment to provide the voiceprint registration screen 800 with a portion for designating the handling of the registered voiceprint data. For example, when voice data is generated using registered voiceprint data, a correction method (deletion or replacement of NG word) of voice data when an NG word is included in the text may be designated.
  • the voiceprint registration screen 800 may also have a function of inferring and displaying NG words related to the voiceprint data of a specific person.
  • for example, NG words may be inferred from the speech and behavior of the specific person over the past year (for example, utterances through media such as television and radio) and displayed on the NG word selection/input unit 803.
  • These NG words are preferably displayed according to the selection from the voiceprint registrant.
  • the attribute information is selected from the attribute selection unit 801, and the category of the NG word is selected from the NG word category selection unit 802.
  • the attribute information and the NG word category are transmitted to the management server 10 (step ST704).
  • a candidate list of NG words (NG word candidate list) is transmitted from the management server 10 to the voiceprint registration terminal 30 (step ST 705).
  • the NG word candidate list is displayed on the NG word selection / input unit 803 of the voiceprint registration screen 800.
  • the NG word candidate list may be transmitted to the voiceprint registration terminal 30 in step ST702, and may be displayed on the NG word selection / input unit 803 according to the selection of the attribute information and the category.
  • the voiceprint registrant designates an NG word (individual NG word) from the NG word selection/input unit 803, and attaches voiceprint data via the voiceprint input unit 804. Then, when the end button 806 of the voiceprint registration screen 800 is selected, the NG word and voiceprint data are transmitted to the management server 10 (step ST706).
  • the voiceprint data is registered in the voiceprint DB 114, and the NG word is registered in the second NG word DB 116 (step ST707).
  • the NG word is registered in association with the voiceprint data.
  • the management server 10 notifies the voiceprint registration terminal 30 of the completion of voiceprint registration (step ST 708).
  • text and voiceprint data for generating voice data are registered in the management server 10.
  • the management server 10 generates voice data using such text and voiceprint data, and provides the generated voice data to the portable terminal 42 and the like.
  • the management server 10 selects text and voiceprint data based on a desired setting specified by the portable terminal 42 or the like, and generates voice data based on the text and voiceprint data.
  • FIG. 9 is a flow chart for explaining the operation at the time of voice provision in voice provision system 1 according to the present embodiment.
  • an audio output application is activated in the portable terminal 42 (step ST901).
  • with this voice output application, it becomes possible to exchange information related to the voice providing system 1 with the management server 10.
  • when the voice output application is activated, a setting input screen for inputting a desired setting is displayed on the portable terminal 42 (step ST902).
  • FIG. 10 is a diagram showing an example of a setting input screen 1000 used by the voice providing system 1 according to the present embodiment.
  • the setting input screen 1000 shown in FIG. 10 is a screen for designating the voice data that the operator of the portable terminal 42 wants to receive.
  • the setting input screen 1000 is provided with a text designation unit 1001, a voiceprint designation unit 1002, a reset button 1003 and an end button 1004.
  • the text designating unit 1001 is a portion for designating a text corresponding to audio data that the operator of the portable terminal 42 wants to receive.
  • the text designation unit 1001 is provided with a box (text selection box) for selecting a text such as “entertainment”, “sports”, “news”, “economy” or the like indicating the type of text. By selecting these text selection boxes, it is possible to specify the text corresponding to the audio data provided from the management server 10.
  • the text selection box displays content including text of various genres.
  • the voiceprint designating unit 1002 is a portion for designating voiceprint data serving as a sound source of voice data that the operator of the portable terminal 42 wants to receive.
  • Voiceprint designating unit 1002 is provided with a box (category selection box) for selecting a category to which a person corresponding to voiceprint data belongs. By selecting these category selection boxes, it is possible to specify a candidate of a person corresponding to voiceprint data.
  • the voiceprint designation unit 1002 displays a plurality of persons belonging to the category. The operator can specify a person corresponding to voiceprint data by selecting a candidate displayed on the voiceprint specification unit 1002.
  • voiceprint designation section 1002 is provided with an input field where a person corresponding to voiceprint data can be directly input.
  • the reset button 1003 is used to reset information selected and specified on the setting input screen 1000.
  • the end button 1004 is used when ending the input process of the desired setting using the setting input screen 1000. By selecting the end button 1004, the text and voiceprint data selected / input via the setting input screen 1000 are transmitted to the management server 10.
  • the setting input screen 1000 is not limited to the example shown in FIG. 10, and can be changed as appropriate. It is preferable as an embodiment to provide the setting input screen 1000 with a portion for designating the handling of voice data generated from the set text and voiceprint data. For example, a method of correcting voice data (NG word deletion and replacement) may be designated when the set text and voiceprint data include an NG word.
  • setting information is transmitted to the management server 10 (step ST 903).
  • the setting information includes text selected by the operator and voiceprint data selected by the operator (more specifically, information on a person corresponding to the voiceprint data).
  • the text and voiceprint data included in the setting information are selected in the management server 10 (step ST904).
  • the management server 10 selects the text and voiceprint data included in the setting information from the text DB 113 and the voiceprint DB 114. Then, after the text and voiceprint data are selected, a process of determining whether the NG word (individual NG word) associated with the voiceprint data is included in the designated text (hereinafter referred to as the "NG determination process") is performed (step ST905).
  • FIG. 11 is a flowchart for explaining the NG determination process in the voice providing system 1 according to the present embodiment.
  • the NG determination processing is mainly executed by the generation unit 102 (the determination target generation unit 111) and the determination unit 104 in the management server 10.
  • the determination target generation unit 111 performs a second determination target generation processing (morpheme analysis processing) on the text selected in step ST904 described above (step ST1101).
  • the selected text is divided into morphemes. That is, the second determination target (see FIG. 3B) is generated from the text by the second determination target generation process.
  • the second determination target generated from the text is registered in the text DB 113 of the storage unit 103 in association with the text.
  • the determination unit 104 performs a determination process (hereinafter referred to as the "primary determination process") to determine whether the second determination target includes an NG word (individual NG word) (step ST1102).
  • the determination unit 104 reads the individual NG word associated with the voiceprint data selected in step ST 904 from the second NG word DB 116. Then, the determination unit 104 determines the NG word and the OK word in the text by comparing the individual NG word and the second determination target one by one (step ST1103). As a result, the morpheme making up the text and the NG word are compared, and the NG word included in the text is detected.
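The one-by-one comparison of morphemes against individual NG words can be sketched as below. The morphemes are pre-split for illustration (a real system would obtain them from a morphological analyzer), and the word set is an invented stand-in for the second NG word DB.

```python
def primary_determination(morphemes, individual_ng_words):
    """Classify each morpheme (second determination target) as 'NG' or 'OK'
    by exact comparison against the individual NG words (steps ST1102-ST1103)."""
    results = []
    for morpheme in morphemes:
        label = "NG" if morpheme in individual_ng_words else "OK"
        results.append((morpheme, label))
    return results

# "divorce" stands in for an individual NG word tied to some voiceprint.
print(primary_determination(["the", "divorce", "settlement"], {"divorce"}))
# → [('the', 'OK'), ('divorce', 'NG'), ('settlement', 'OK')]
```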
  • when an OK word is detected from the text (step ST1103: OK), the determination target generation unit 111 generates a phonetic word of the OK word (step ST1104).
  • the case where the OK word is detected corresponds to the case where the second determination target that does not correspond to the NG word is detected from the text.
  • the determination unit 104 generates a phonetic word of the individual NG word (hereinafter referred to as “NG sound”) (step ST1105).
  • when the phonetic word of the OK word and the NG sound have been generated, the determination unit 104 performs a determination process (hereinafter referred to as the "secondary determination process") to determine whether the NG word is included in the phonetic word of the OK word (step ST1106).
  • the determination unit 104 determines the NG word and the OK word by comparing the pronunciation word of the OK word and the NG sound one by one (step ST1107).
  • the phonetic word of the second determination target determined as the OK word in the primary determination processing is compared with the phonetic word of the NG word, and the NG word included in the text is detected.
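A minimal sketch of the secondary determination idea: morphemes judged OK are converted to phonetic words and checked against the NG sound. The `to_phonetic` table below is a toy stand-in for a real grapheme-to-phoneme step, and all words are illustrative.

```python
# Toy grapheme-to-phoneme table; a real system would use a pronunciation dictionary.
PHONETIC = {"kyou": "KYOU", "tenki": "TENKI", "desu": "DESU"}

def to_phonetic(word: str) -> str:
    return PHONETIC.get(word, word.upper())

def secondary_determination(ok_words, ng_words):
    """Return OK words whose pronunciation contains an NG sound (steps ST1104-ST1107)."""
    ng_sounds = {to_phonetic(w) for w in ng_words}
    hits = []
    for word in ok_words:
        sound = to_phonetic(word)
        if any(ng_sound in sound for ng_sound in ng_sounds):
            hits.append(word)
    return hits

# "tenki" passed the primary process, but its pronunciation contains the NG sound "TEN".
print(secondary_determination(["tenki", "desu"], ["ten"]))  # → ['tenki']
```

This shows how an NG word can be caught inside a morpheme regardless of that morpheme's meaning.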
  • the determination target generation unit 111 performs the first determination target generation process on the text selected in step ST 904 (step ST 1108).
  • the first determination target generation process a phonetic word of the selected text is generated, and a determination target in which the phonetic word is divided at an arbitrary position is generated. That is, the first determination target (see FIG. 3A) is generated from the text by the first determination target generation process.
  • the first determination target generated from the text is registered in the text DB 113 of the storage unit 103 in association with the text.
  • the determination unit 104 performs determination processing (hereinafter, referred to as “third determination processing”) to determine whether the first determination target includes an NG sound (step ST1109).
  • the determination unit 104 compares, one by one, each arbitrary combination of the phonetic words of the text serving as the first determination target against the NG sound of the individual NG word registered in the second NG word DB 116, and thereby determines the NG word (step ST1110).
  • in this way, the NG sound is compared with arbitrary combinations of the phonetic words of the text, and NG words that were not detected in the primary and secondary determination processes are detected. For example, in the example shown in FIG.
  • if an NG word is not detected in the tertiary determination process (step ST1110: No), the determination unit 104 selects, as the determination result of the NG determination process, a determination indicating that the text does not include the NG word (OK determination) (step ST1111).
  • on the other hand, when an NG word is detected in the tertiary determination process (step ST1110: Yes), or when an NG word is determined in the secondary determination process (step ST1107: NG), the determination unit 104 records the location of the NG word in the text (step ST1112). Then, the determination unit 104 selects the determination indicating that the text includes the NG word (NG determination) as the determination result of the NG determination process (step ST1113).
  • the determination unit 104 then ends the NG determination process. Through this NG determination process, it is determined whether the NG word (individual NG word) associated with the voiceprint data selected in step ST904 is included in the selected text.
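Because the first determination target consists of the phonetic text divided at arbitrary positions, comparing all such divisions against the NG sound amounts to a substring search over the phonetic string. A minimal sketch, with invented phonetic strings:

```python
def tertiary_determination(text_phonetic: str, ng_sound: str):
    """Compare every fixed-length division of the phonetic text against the NG sound
    (steps ST1108-ST1110); return (found, position) so the location can be
    recorded as in step ST1112."""
    for start in range(len(text_phonetic) - len(ng_sound) + 1):
        candidate = text_phonetic[start:start + len(ng_sound)]
        if candidate == ng_sound:
            return True, start
    return False, -1

# "RIKON" (an illustrative NG sound) straddles two morphemes in this phonetic string,
# so it would be missed by morpheme-wise comparison but is caught here.
print(tertiary_determination("KANOJOHARIKONSHITA", "RIKON"))  # → (True, 8)
print(tertiary_determination("KYOUWAYOITENKIDESU", "RIKON"))  # → (False, -1)
```

This is why the determination works regardless of the context of the text: any combination of characters constituting the text is examined.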
  • in this way, the NG word is determined by comparing the pronunciation corresponding to the first determination target, generated by dividing the text at an arbitrary location, with the pronunciation (NG sound) corresponding to the individual NG word. For this reason, the pronunciation of any combination of the characters and numbers constituting the text can be compared with the NG sound. Thereby, regardless of the context of the text, the NG word contained in the text can be accurately detected.
  • in addition, the NG word and the OK word included in the text are determined based on the morphemes (second determination target) constituting the text (primary determination process).
  • the NG word included as a morpheme in the text can be reliably detected by comparison with the second determination target.
  • furthermore, since the NG word included in the text can be determined stepwise, detection omissions of the NG word can be reduced.
  • further, the NG word is determined by comparing the pronunciation corresponding to the second determination target determined as the OK word in the primary determination process with the pronunciation corresponding to the NG word (secondary determination process). Therefore, regardless of the meaning of the morpheme determined to be the OK word, the NG word included in that morpheme can be detected.
  • the determination unit 104 determines whether the determination result of the NG determination process is an OK determination or an NG determination as shown in FIG. 9 (step ST 906).
  • the audio generation unit 112 generates audio data (step ST 907).
  • the voice generation unit 112 generates voice data corresponding to the text using the voiceprint data selected in step ST904 without performing processing such as correction on the text selected in step ST904. Then, the generated voice data is output from the management server 10 to the portable terminal 42 (step ST 908).
  • if the determination result of the NG determination process is an NG determination (step ST906: NG), the voice generation unit 112 generates voice data (corrected voice data) in which a portion of the text is corrected (step ST909). In this case, the voice generation unit 112 generates voice data corresponding to the text obtained by correcting the part corresponding to the NG word in the text selected in step ST904. For the portions of the text other than the part corresponding to the NG word, voice data is generated using the voiceprint data selected in step ST904.
  • the speech generation unit 112 can generate speech data in which the part corresponding to the NG word in the text is deleted. Further, the voice generation unit 112 can generate voice data in which a portion corresponding to the NG word in the text is replaced. As an aspect of replacing the NG word, the voice generation unit 112 can generate voice data using voiceprint data different from the voiceprint data selected in step ST904, for example. For example, voice data can be generated using predetermined voiceprint data only for the part corresponding to the NG word. Further, the voice generation unit 112 can generate voice data in which a portion corresponding to the NG word in the text is replaced with a word of another expression. In addition, it is preferable to select the mode of the correction with respect to the part corresponding to NG word in consideration of the intention of a text registrant or a voiceprint registrant.
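The deletion and replacement modes of correction can be sketched with plain string operations; a real implementation would act on the NG-word locations recorded in step ST1112 and on the generated voice data itself. The function name, texts, and replacement word below are illustrative.

```python
def correct_text(text: str, ng_word: str, mode: str = "delete",
                 replacement: str = "***") -> str:
    """Correct the part of the text corresponding to the NG word (step ST909):
    either delete it or replace it with a word of another expression."""
    if mode == "delete":
        return text.replace(ng_word, "")
    if mode == "replace":
        return text.replace(ng_word, replacement)
    raise ValueError(f"unknown correction mode: {mode}")

print(correct_text("news about the divorce case", "divorce"))  # deletion mode
print(correct_text("news about the divorce case", "divorce",
                   "replace", "separation"))                    # replacement mode
```

Which mode to use would, as the text notes, be chosen in consideration of the intention of the text registrant or voiceprint registrant.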
  • the voice data (corrected voice data) generated by correcting the part corresponding to the NG word is output from the management server 10 to the portable terminal 42 (step ST 910).
  • the portable terminal 42 outputs audio via a speaker or the like (step ST 911). By this voice output, the operation at the time of voice provision in the voice provision system 1 ends.
  • as described above, the voice generation unit 112 of the management server 10 generates voice data using specific voiceprint data from the text registered in the storage unit 103 according to the determination result of the determination unit 104. Therefore, the generated voice data can be switched according to the presence or absence of the NG word included in the text. As a result, it is possible to provide the portable terminal 42 with voice data of different modes according to the presence or absence of the NG word included in the text.
  • when the text registered in the storage unit 103 does not include the NG word, the voice generation unit 112 generates voice data corresponding to the text. As a result, voice data corresponding to a text not including the NG word is generated without any special correction. Therefore, voice data corresponding to the text can be quickly provided to the portable terminal 42.
  • when an NG word is included in the text registered in the storage unit 103, the voice generation unit 112 generates voice data corresponding to the text obtained by correcting the part corresponding to the NG word. As a result, even in the case of a text including an NG word, voice data in which the NG word portion is corrected can be provided to the portable terminal 42.
  • the voice generation unit 112 can delete or replace a portion corresponding to the NG word included in the text.
  • when the part corresponding to the NG word is deleted, even for a text including the NG word, it is possible to provide the portable terminal 42 with voice data from which the NG word contained in the text has been reliably removed.
  • when the part corresponding to the NG word is replaced, it is possible to prevent voice data including the NG word from being provided as-is to the portable terminal 42 using the specific voiceprint data.
  • for example, voiceprint data different from the specific voiceprint data can be used for that part. In this case, voice data according to the text can be provided to the portable terminal 42 without changing the meaning of the text.
  • alternatively, the part can be replaced with a word of a different expression. In this case, voice data according to the text can be provided to the portable terminal 42 without significantly changing the meaning of the text.
  • in the storage unit 103 (more specifically, the second NG word DB 116) of the management server 10, the NG word is registered in association with the voiceprint data registered from the voiceprint registration terminal 30. For this reason, the determination unit 104 determines whether the NG word (individual NG word) associated with the specific voiceprint data is included in the text. This makes it possible to reliably prevent voice data including the NG word associated with the specific voiceprint data from being provided to the portable terminal 42.
  • the determination unit 104 of the management server 10 determines the presence or absence of the NG word (general NG word) included in the text at the time of text registration from the text registration terminal 20. This makes it possible to prevent a text including the NG word from being registered at the registration phase of the text from the text registration terminal 20.
  • the determination unit 104 determines the presence or absence of the NG word (general NG word) associated with the attribute information of the text. Thereby, it is possible to prevent the registration of the text including the NG word specified from the attribute information in the registration phase of the text from the text registration terminal 20.
  • in the above embodiment, the case has been described in which the determination unit 104 compares, one by one, any combination of the phonetic words of the text serving as the first determination target against the NG sound of the NG word (individual NG word) registered in the second NG word DB 116.
  • the comparison method in the tertiary determination process is not limited to this, and can be changed as appropriate.
  • the determination unit 104 may determine the NG word based on a partial match between the pronunciation corresponding to the first determination target and the pronunciation corresponding to the NG word.
  • the ratio regarding partial agreement with the pronunciation corresponding to the NG word may be determined in advance, or may be determined by machine learning based on the results or Bayesian statistics.
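One hypothetical way to realize such partial-match determination is a similarity ratio between the candidate pronunciation and the NG sound. Here Python's `difflib.SequenceMatcher` stands in for the similarity measure, and the 0.8 threshold is an assumed value (as noted above, the ratio may instead be predetermined or learned from results).

```python
from difflib import SequenceMatcher

def partial_match_ng(candidate_sound: str, ng_sound: str,
                     threshold: float = 0.8) -> bool:
    """Judge a pronunciation as NG when its similarity to the NG sound
    reaches the predetermined ratio (an assumed partial-match criterion)."""
    ratio = SequenceMatcher(None, candidate_sound, ng_sound).ratio()
    return ratio >= threshold

print(partial_match_ng("RIKON", "RIKON"))  # exact match
print(partial_match_ng("RIKOM", "RIKON"))  # near match (ratio 0.8)
print(partial_match_ng("TENKI", "RIKON"))  # dissimilar
```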
  • in the above embodiment, the case has been described in which the determination unit 104 determines the NG word by comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the NG word (individual NG word).
  • the determination method by the determination unit 104 is not limited to this, and can be changed as appropriate.
  • the NG word may be determined by comparing the character string forming the first determination target with the character string forming the NG word.
  • the NG word is determined by comparing the character string forming the first determination target generated by dividing the text at an arbitrary position and the character string forming the NG word. For this reason, it is possible to compare an NG word with any combination of characters and numbers constituting the text. Thereby, regardless of the context of the text, it is possible to accurately detect the NG word contained in the text.
  • management server 10 generates voice data using specific voiceprint data from the text registered in storage unit 103 and provides the generated voice data to portable terminal 42 etc.
  • the information provided to the portable terminal 42 or the like is not limited to only voice data, and can be added as appropriate.
  • the text used for the generation may be provided together.
  • image data, moving image data or computer graphics (CG) may be provided. In this case, it is preferable as an embodiment to provide image data and moving image data related to audio data.
  • the voice generation unit 112 can replace the portion corresponding to the NG word.
  • Such partial substitution of text can also be applied to parts of text other than NG words.
  • a specific word included in the text may be replaced with a different prepared word.
  • that is, when a text includes a specific word, the voice generation unit 112 can replace that word with a different word prepared in advance.
  • hereinafter, a word to be replaced is referred to as a "replacement target word", and a word with which the replacement target word is replaced is referred to as a "replacement word".
  • for example, the determination unit 104 determines whether the text "The weather is good today (KYOUWAYOITENKIDESU)" includes the replacement target word "is (DESU)" (replacement determination process).
  • This replacement determination process is replaced with, for example, the NG determination process of step ST 905 shown in FIG.
  • the determination target in this substitution determination process is the first determination target described above (a determination target obtained by converting a text into a phonetic word and divided into arbitrary parts) or a second determination target (a determination target obtained by dividing text into morphemes) be able to.
  • the determination unit 104 determines whether the second determination target and / or the first determination target includes a replacement target word.
  • when the replacement target word is detected, the voice generation unit 112 generates voice data (replacement voice data) in which the replacement target word is replaced with the replacement word "Nyan (NYAN)".
  • in this case, voice data corresponding to the text "KYOUWAYOITENKINYAN" is generated as the replacement voice data.
  • the generated substitute voice data is transmitted from the management server 10 to the portable terminal 42.
  • the substitute audio data is output as audio by a speaker or the like.
  • voice data of "The weather is good today, nyan (KYOUWAYOITENKINYAN)" is output from the portable terminal 42.
  • Audio data can be provided to the portable terminal 42 in accordance with the way of speaking of the character based on the text. Thereby, for example, it is possible to provide an audio providing service for reading newspaper articles and the like by a specific character.
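As an illustrative sketch (not the patent's implementation; the romanized phonetic strings and the helper name are hypothetical), the replacement described above amounts to substituting the replacement word for every occurrence of the replacement target word in the phonetic text before synthesis:

```python
# Hypothetical sketch of the replacement step performed before speech
# synthesis; romanized phonetic strings stand in for the hiragana text.

def replace_target_words(phonetic_text: str, replacements: dict) -> str:
    """Substitute each replacement word for its replacement target word."""
    for target, replacement in replacements.items():
        phonetic_text = phonetic_text.replace(target, replacement)
    return phonetic_text

# "Today is good weather" with the sentence-ending "desu" swapped for the
# character-style ending "nyan", as in the example above.
print(replace_target_words("kyouwayoitenkidesu", {"desu": "nyan"}))
# kyouwayoitenkinyan
```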
• The above description mainly concerns the determination of NG words for text written in Japanese.
• However, the target of NG word determination is not limited to Japanese; the technique can be applied to any language used worldwide.
• The presence or absence of an NG word may also be determined across multiple languages. For example, when the pronunciation corresponding to English text matches or is similar to the pronunciation corresponding to a Japanese NG word, that part of the English text can be determined to be an NG word.
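A cross-language check of this kind could be sketched as follows. This is illustrative only: the patent does not specify a similarity measure, so `difflib.SequenceMatcher` and the 0.8 threshold are assumptions, and the romanizations and the NG word are hypothetical.

```python
from difflib import SequenceMatcher

def contains_similar_ng_word(pronunciation: str, ng_pronunciations: list,
                             threshold: float = 0.8) -> bool:
    """Return True if any contiguous fragment of the romanized pronunciation
    is sufficiently similar to a romanized NG-word pronunciation."""
    n = len(pronunciation)
    for i in range(n):
        for j in range(i + 1, n + 1):
            fragment = pronunciation[i:j]
            for ng in ng_pronunciations:
                if SequenceMatcher(None, fragment, ng).ratio() >= threshold:
                    return True
    return False

# A romanized pronunciation that happens to contain a string close to the
# (hypothetical) Japanese NG word "baka".
print(contains_similar_ng_word("aibakara", ["baka"]))  # True
```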
• According to the determination device of the present invention and the voice provision system using the same, non-permissive words contained in text can be accurately detected regardless of the context of the text. The invention is particularly suited to a voice provision service that reads text aloud using specific voiceprint data.

Abstract

The purpose of the present invention is to accurately detect non-permissive words included in text regardless of the context of the text. A management server (10) that determines non-permissive words (NG words) included in text composed in natural language includes: a determination-target generation unit (111) that generates a first determination target by dividing the text at arbitrarily defined positions; and a determination unit (104) that determines the non-permissive word (NG word) by comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word (NG word).

Description

Determination device and voice provision system provided therewith
 The present invention relates to a determination device that determines non-permissive words included in text composed in natural language, and to a voice provision system including the same.
 BACKGROUND ART: In recent years, speech synthesis technology that reads text aloud has come into wide use for traffic information broadcasts, exhibition guidance audio in art galleries and museums, car navigation systems, and the like. Because such technology reads out text composed in natural language in a synthesized voice, it can conceal the actual user's voice, raising concern about misuse in crimes such as harassment and intimidation. Techniques aimed at preventing such criminal misuse of speech synthesis are known (see, for example, Patent Document 1).
 The speech synthesizer described in Patent Document 1 includes an inappropriate-word dictionary in which inappropriate words and inappropriate predicate patterns are registered, and determines the degree to which the text to be read aloud contains inappropriate portions. Depending on that degree, it can synthesize an audio watermark into the speech or register the text with a server serving as an external storage terminal. This prevents misuse in crime and the like without adverse effects, such as the insertion of voice-degrading data, on appropriate text.
JP 2007-156169 A
 However, the speech synthesizer of Patent Document 1 analyzes the morphemes, dependencies, and meaning of the text to be read aloud and judges the degree of inappropriateness from the analysis results. It therefore cannot detect inappropriate expressions that are present in the text irrespective of the text's context.
 The present invention has been made in view of this problem, and its object is to provide a determination device capable of accurately detecting non-permissive words contained in text regardless of the context of the text, and a voice provision system including the same.
 A determination device according to the present invention determines non-permissive words included in text composed in natural language, and comprises: a determination-target generation unit that generates a first determination target by dividing the text at arbitrary positions; and a determination unit that determines the non-permissive word by comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word.
 With this configuration, the non-permissive word is determined by comparing the pronunciation corresponding to the first determination target, which is generated by dividing the text at arbitrary positions, with the pronunciation corresponding to the non-permissive word. The pronunciation of any combination of the characters and numerals making up the text can therefore be compared with the pronunciation of the non-permissive word, so non-permissive words contained in the text can be detected accurately regardless of the text's context.
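The comparison this configuration describes can be sketched minimally as follows (an illustrative Python sketch, not the claimed implementation, assuming the text has already been converted to a romanized phonetic string; the NG words are hypothetical). Because every first determination target is a contiguous division of the phonetic text, checking every contiguous fragment against the NG-word pronunciations covers all such divisions:

```python
def pronunciation_fragments(phonetic_text: str):
    """Yield every contiguous fragment of the phonetic text, i.e. the
    pronunciation of every possible first-determination-target segment."""
    n = len(phonetic_text)
    for i in range(n):
        for j in range(i + 1, n + 1):
            yield phonetic_text[i:j]

def find_ng_words(phonetic_text: str, ng_words: set) -> set:
    """Return the NG words whose pronunciation occurs anywhere in the text."""
    return {f for f in pronunciation_fragments(phonetic_text) if f in ng_words}

# "wayoi" spans the boundary between the words "wa" and "yoi", so a
# word-by-word check would miss it, but the fragment check finds it.
print(find_ng_words("kyouwayoitenkidesu", {"wayoi", "banana"}))  # {'wayoi'}
```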
 In particular, in the above determination device, the determination unit preferably determines the non-permissive word based on a partial match between the pronunciation corresponding to the first determination target and the pronunciation corresponding to the non-permissive word. With this configuration, words that only partially match a non-permissive word can also be detected, rather than only cases where the pronunciations of the first determination target and the non-permissive word match completely. Words merely similar to the non-permissive words in the text can thus be detected as well.
 Further, in the above determination device, the determination-target generation unit preferably divides the text into morphemes to generate a second determination target, and before comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word, the determination unit determines non-permissive words and permissive words (words that are not non-permissive) by comparing the second determination target with the non-permissive words. With this configuration, prior to the comparison involving the first determination target, non-permissive and permissive words contained in the text can be determined based on the morphemes making up the text, so non-permissive words that appear in the text as morphemes can be reliably detected. Moreover, because non-permissive words in the text are determined in stages, missed detections can be reduced.
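The staged determination could look roughly like the sketch below (not the claimed implementation; morpheme division itself would come from a morphological analyzer, so the morpheme list is given directly here and all strings are romanized):

```python
def staged_ng_check(morphemes: list, ng_words: set) -> set:
    """Stage 1: compare whole morphemes (the second determination target).
    Stage 2: compare every contiguous fragment of the joined phonetic text
    (the first determination target)."""
    found = {m for m in morphemes if m in ng_words}
    joined = "".join(morphemes)
    n = len(joined)
    found |= {joined[i:j] for i in range(n) for j in range(i + 1, n + 1)
              if joined[i:j] in ng_words}
    return found

# "desu" is caught as a whole morpheme in stage 1; "ayo", which spans the
# boundary between "wa" and "yoi", is only caught in stage 2.
found = staged_ng_check(["kyou", "wa", "yoi", "tenki", "desu"], {"ayo", "desu"})
print(sorted(found))  # ['ayo', 'desu']
```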
 Furthermore, in the above determination device, before comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word, the determination unit preferably determines the non-permissive word by comparing the pronunciation corresponding to a second determination target judged to be a permissive word with the pronunciation corresponding to the non-permissive word. With this configuration, non-permissive words contained within a morpheme can be detected regardless of the meaning of the morpheme that was judged permissive.
 A determination device according to another aspect of the present invention determines non-permissive words included in text composed in natural language, and comprises: a determination-target generation unit that generates a first determination target by dividing the text at arbitrary positions; and a determination unit that determines the non-permissive word by comparing the character string forming the first determination target with the character string forming the non-permissive word.
 With this configuration, the non-permissive word is determined by comparing the character string forming the first determination target, generated by dividing the text at arbitrary positions, with the character string forming the non-permissive word. The character string of any combination of the characters and numerals making up the text can therefore be compared with the character string of the non-permissive word, so non-permissive words contained in the text can be detected accurately regardless of the text's context.
 A voice provision system according to the present invention includes any of the determination devices described above and provides voice corresponding to the text based on specific voiceprint data, wherein the determination device includes a voice generation unit that generates voice data from the text using the specific voiceprint data according to the determination result of the determination unit.
 With this configuration, voice data using the specific voiceprint data is generated from the text according to the determination result of the determination unit of the determination device. The generated voice data can therefore be switched according to whether the text contains non-permissive words, making it possible to provide voice data in different forms depending on the presence or absence of non-permissive words in the text.
 For example, in the above voice provision system, when the text contains no non-permissive word, the voice generation unit generates voice data corresponding to the text as is. With this configuration, voice data corresponding to text free of non-permissive words is generated without any special correction by the voice generation unit, so voice data corresponding to the text can be provided quickly.
 On the other hand, in the above voice provision system, when the text contains a non-permissive word, the voice generation unit generates voice data corresponding to the text with the portion corresponding to the non-permissive word corrected. With this configuration, even for text containing a non-permissive word, voice data in which that portion has been corrected can be provided.
 For example, in the above voice provision system, the voice generation unit may delete the portion corresponding to the non-permissive word contained in the text. With this configuration, voice data from which the portion corresponding to the non-permissive word has been deleted is generated, so even for text containing a non-permissive word, voice data from which the non-permissive word has been reliably removed can be provided.
 Alternatively, in the above voice provision system, the voice generation unit may replace the portion corresponding to the non-permissive word contained in the text. With this configuration, voice data in which that portion has been replaced is generated, which prevents voice data containing a non-permissive word from being provided as is in the specific voiceprint. Moreover, because the portion corresponding to the non-permissive word is replaced rather than deleted, no part of the text is lost.
 For example, in the above voice provision system, the voice generation unit may use voiceprint data different from the specific voiceprint data for the portion corresponding to the non-permissive word contained in the text. With this configuration, voice data is generated using a different voiceprint for the portion corresponding to the non-permissive word, so even for text containing an NG word, corresponding voice data can be provided without changing the meaning of the text.
 Alternatively, in the above voice provision system, the voice generation unit may replace the portion corresponding to the non-permissive word contained in the text with a word of different expression. With this configuration, voice data in which the non-permissive word has been replaced with a differently expressed word is generated, so even for text containing a non-permissive word, corresponding voice data can be provided without greatly changing the meaning of the text.
 In the above voice provision system, the determination device includes a storage unit that stores the specific voiceprint data, and the storage unit stores non-permissive words associated with the specific voiceprint data. With this configuration, the determination unit determines whether the text contains a non-permissive word associated with the specific voiceprint data, which prevents the provision of voice data containing non-permissive words tied to that voiceprint.
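Per-voiceprint NG-word storage might be organized as in the following sketch (the identifiers and example words are hypothetical, not from the patent):

```python
# Hypothetical per-voiceprint NG-word storage: each registered voiceprint
# carries its own disallowed words on top of a globally shared set.
GLOBAL_NG_WORDS = {"badword"}
VOICEPRINT_NG_WORDS = {
    "voice_actor_001": {"rivalbrand"},  # e.g. a brand this talent may not voice
    "athlete_007": {"rivalteam"},
}

def ng_words_for(voiceprint_id: str) -> set:
    """NG words to check for text read with the given voiceprint."""
    return GLOBAL_NG_WORDS | VOICEPRINT_NG_WORDS.get(voiceprint_id, set())

print(sorted(ng_words_for("voice_actor_001")))  # ['badword', 'rivalbrand']
```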
 A determination method according to the present invention determines non-permissive words included in text composed in natural language, and comprises: a step of generating a first determination target by dividing the text at arbitrary positions; and a step of determining the non-permissive word by comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word.
 According to this method, the non-permissive word is determined by comparing the pronunciation corresponding to the first determination target, generated by dividing the text at arbitrary positions, with the pronunciation corresponding to the non-permissive word. The pronunciation of any combination of the characters and numerals making up the text can therefore be compared with the pronunciation of the non-permissive word, so non-permissive words contained in the text can be detected accurately regardless of the text's context.
 According to the present invention, non-permissive words contained in text can be accurately detected regardless of the context of the text.
FIG. 1 is an explanatory diagram outlining the voice provision system according to the present embodiment.
FIG. 2 is a block diagram of the management server of the voice provision system according to the present embodiment.
FIG. 3 shows an example of the determination targets generated by the determination-target generation unit of the management server according to the present embodiment.
FIG. 4 is a block diagram of a portable terminal that receives voice from the voice provision system according to the present embodiment.
FIG. 5 is a flowchart for explaining the operation of the voice provision system according to the present embodiment at the time of text registration.
FIG. 6 shows an example of the text registration screen used in the voice provision system according to the present embodiment.
FIG. 7 is a flowchart for explaining the operation of the voice provision system according to the present embodiment at the time of voiceprint registration.
FIG. 8 shows an example of the voiceprint registration screen used in the voice provision system according to the present embodiment.
FIG. 9 is a flowchart for explaining the operation of the voice provision system according to the present embodiment at the time of voice provision.
FIG. 10 shows an example of the setting input screen used in the voice provision system according to the present embodiment.
FIG. 11 is a flowchart for explaining the NG determination processing in the voice provision system according to the present embodiment.
 Hereinafter, a voice provision system according to an embodiment of the present invention will be described in detail with reference to the attached drawings. The voice provision system according to the present invention is not limited to the following embodiment and may be modified as appropriate within the scope of the invention.
 FIG. 1 is an explanatory diagram outlining the voice provision system according to the present embodiment. As shown in FIG. 1, the voice provision system 1 includes a management server 10, which constitutes an example of the determination device. The management server 10 stores text and voiceprint data (digital voiceprint data) provided by a text registration terminal 20 and a voiceprint registration terminal 30 connected via a network NW such as the Internet, and provides voice data generated from the text and voiceprint data to a group of external terminals 40 connected via the network NW.
 FIG. 1 illustrates the case where the voice provision system 1 receives text and voiceprint data via the network NW and provides voice data generated from them to the external terminal group 40. However, the environment in which the voice provision system 1 according to the present invention is applied is not limited to this and may be changed as appropriate. For example, text and voiceprint data may be registered directly with the management server 10, and voice data may be provided not only to the external terminal group 40 but also to terminals (such as voice output terminals) connected directly to the management server 10.
 The management server 10 is installed at, for example, a company that offers a voice provision service using the voice provision system 1 according to the present embodiment. The management server 10 is configured as, for example, a personal computer (PC) with ordinary capabilities and functions as a web server. For example, the management server 10 serves the text registration screen (see FIG. 6), the voiceprint registration screen (see FIG. 8), and the setting input screen (see FIG. 10), all described later, to the text registration terminal 20, the voiceprint registration terminal 30, and the external terminal group 40 over the network NW.
 The text registration terminal 20 is installed at a company or the like that registers text to serve as the script for the voice data provided by the voice provision system 1. The text registration terminal 20 is configured as, for example, a PC with ordinary capabilities and has a web browser function. For example, it may be placed with a manufacturer that wishes to provide advertisements as voice data, or with a newspaper company or television station that wishes to provide news. It may also be placed with a service provider that wishes to provide information (for example, voice guidance) from arbitrary household appliances, such as televisions and refrigerators, in an environment where those appliances are connected to the Internet. The text registration terminal 20 includes the components required of a text input terminal (an input unit, a display unit, a communication unit, and so on). The text registered from the text registration terminal 20 is composed in natural language.
 The voiceprint registration terminal 30 is installed at a company or the like that registers the voiceprint data (digital voiceprint data) serving as the sound source for the voice data provided by the voice provision system 1. The voiceprint registration terminal 30 is configured as, for example, a PC with ordinary capabilities and has a web browser function. For example, it may be placed at a talent agency to which actors and voice actors belong, or at a management office that manages athletes and the like. The voiceprint registration terminal 30 includes the components required of a voiceprint input terminal (an input unit, a display unit, a communication unit, and so on).
 Here, the voiceprint data registered from the voiceprint registration terminal 30 will be described. The voiceprint data includes, for example, recorded fragments of a specific person's speech, and acoustic and prosodic parameters, such as spectrum and fundamental frequency, obtained by analyzing that person's speech. The voiceprint data is not limited to these and includes any data required by the speech synthesis techniques (for example, concatenative speech synthesis or formant synthesis) of the voice generation unit 112 described later.
 The external terminal group 40 consists of, for example, arbitrary terminals (devices) with a network connection function. FIG. 1 illustrates, as the external terminal group 40, a car navigation system (hereinafter "car navigation unit") 41, a portable terminal 42 such as a smartphone, and a home appliance 43 such as a refrigerator. In addition to their terminal-specific functions, the car navigation unit 41, the portable terminal 42, and the home appliance 43 making up the external terminal group 40 each have an audio output function for outputting the voice data provided by the management server 10.
 FIG. 2 is a block diagram of the management server 10 of the voice provision system 1 according to the present embodiment; only the components of the management server 10 relevant to the present invention are shown. As shown in FIG. 2, the management server 10 has a control unit 101 that controls the management server 10 as a whole. A generation unit 102, a storage unit 103, a determination unit 104, a communication unit 105, an input unit 106, and a display unit 107 are connected to the control unit 101. The configuration of the management server 10 is not limited to this and may be changed as appropriate.
 The generation unit 102 has a determination-target generation unit 111 and a voice generation unit 112. The determination-target generation unit 111 generates the targets (determination targets) used to determine whether the text stored in the storage unit 103 contains non-permissive words (hereinafter "NG words"). For example, the determination-target generation unit 111 generates a first determination target by dividing the text stored in the storage unit 103 at arbitrary positions, and generates a second determination target by dividing that text into morphemes. The determination-target generation unit 111 also converts the text stored in the storage unit 103, or part of the second determination target, into phonetic words (for example, hiragana or prosody).
 Here, the first and second determination targets generated by the determination-target generation unit 111 are shown concretely. FIGS. 3A and 3B show examples of the first and second determination targets, respectively. FIG. 3A shows the case where the first determination target is generated from the Japanese text "Today is good weather (KYOUWAYOITENKIDESU)" and the English text "It is good weather today", and FIG. 3B shows the case where the second determination target is generated from the same texts. For the English text, the first and second determination targets are indicated by phonetic symbols. Below, the first and second determination targets are explained using the Japanese text.
 As shown in FIG. 3A, the first determination target is generated by converting the text "Today is good weather (KYOUWAYOITENKIDESU)" into phonetic words and dividing the result at arbitrary positions. For example, the text is divided into "kyo-u-wa-yo-i-te-n-ki-de-su (KYO-U-WA-YO-I-TE-N-KI-DE-SU)" or "kyou-wayo-ite-nki-desu (KYOU-WAYO-ITE-NKI-DESU)" to form first determination targets. The first determination targets include every combination obtained by dividing the text into segments of arbitrary length (in numbers of phonetic characters) while preserving the order of the hiragana making up the text.
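The divisions illustrated above can be enumerated as in the sketch below (illustrative; romanized characters stand in for hiragana). There are 2**(n-1) ways to divide an n-character phonetic string, but every segment of every division is a contiguous fragment of the string, so NG-word matching only ever needs the fragments themselves:

```python
def all_segmentations(phonetic_text: str):
    """Yield every ordered division of the phonetic text into contiguous
    parts (2**(n-1) divisions for an n-character string)."""
    if not phonetic_text:
        yield []
        return
    for i in range(1, len(phonetic_text) + 1):
        for rest in all_segmentations(phonetic_text[i:]):
            yield [phonetic_text[:i]] + rest

# The three characters of "yoi" admit 2**2 = 4 divisions.
for seg in all_segmentations("yoi"):
    print(seg)
```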
 なお、第1判定対象を生成する際、テキストに複数の発音語が含まれる場合には、全ての発音語を含む第1判定対象が生成される。例えば、図3Aに示すテキスト内の「良い(YOI)」には、「よい(YOI)」及び「いい(II)」の発音語が存在する。このため、このテキストから生成される第1判定対象には、「きょうはよいてんきです(KYOUWAYOITENKIDESU)」を任意箇所で区切ったものと、「きょうはいいてんきです(KYOUWAIITENKIDESU)」を任意箇所で区切ったものとが含まれる。 When generating a first determination target, if a plurality of phonetic words are included in the text, a first determination target including all the phonetic words is generated. For example, in the text "good (YOI)" shown in FIG. 3A, the pronunciation words "good (YOI)" and "good (II)" exist. Therefore, for the first judgment target generated from this text, "Kyou wa good day (KYOWAYOITENKIDESU)" is divided at any place, and "Kyouha is good day (KYOWA II TENKIDESU)" at any place Included.
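 The exhaustive division described above can be sketched in a few lines of code. This is only an illustrative sketch, not part of the patented embodiment: the function and variable names (`segmentations`, `first_targets`, the romanized unit lists) are assumptions, and romanized mora stand in for the hiragana units.

```python
def segmentations(units):
    """Enumerate every way to group the ordered list of phonetic
    units into contiguous runs (order preserved). For n units there
    are 2**(n-1) divisions, one per subset of the n-1 boundaries."""
    n = len(units)
    results = []
    for mask in range(2 ** (n - 1)):
        parts, current = [], units[0]
        for i in range(1, n):
            if mask & (1 << (i - 1)):  # boundary before unit i is a cut
                parts.append(current)
                current = units[i]
            else:
                current += units[i]
        parts.append(current)
        results.append(parts)
    return results

def first_targets(reading_variants):
    """Build first determination targets from every pronunciation of
    the text (e.g. both the YOI and II readings of the same word)."""
    targets = []
    for units in reading_variants:
        targets.extend(segmentations(units))
    return targets

# Romanized phonetic units of "KYOUWAYOITENKIDESU" (two readings):
yoi = ["KYO", "U", "WA", "YO", "I", "TE", "N", "KI", "DE", "SU"]
ii = ["KYO", "U", "WA", "I", "I", "TE", "N", "KI", "DE", "SU"]
targets = first_targets([yoi, ii])
print(len(targets))  # 2 * 2**9 = 1024 candidate divisions
```

Both divisions quoted above ("KYO-U-WA-YO-I-TE-N-KI-DE-SU" and "KYOU-WAYO-ITE-NKI-DESU") appear among the generated candidates.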
 On the other hand, as shown in FIG. 3B, the second determination target is generated by dividing the text 「今日は良い天気です」 ("KYOUWAYOITENKIDESU") into morphemes. For example, the text is divided into 「今日-は-良い-天気-です」 ("KYOU-WA-YOI-TENKI-DESU") to form the second determination target. The determination targets (first and second) generated by the determination target generation unit 111 are registered in a text database (DB) 113 in the storage unit 103, described later, in association with the source text.
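 In practice the morpheme division of FIG. 3B would be performed by a full morphological analyzer; as a rough illustration only, a greedy longest-match split against a hypothetical dictionary can reproduce it for this one sentence. The dictionary contents and function name are assumptions, not part of the embodiment.

```python
def morpheme_split(text, dictionary):
    """Greedy longest-match division of `text` into dictionary
    morphemes -- a toy stand-in for a real morphological analyzer."""
    morphemes, pos = [], 0
    while pos < len(text):
        for length in range(len(text) - pos, 0, -1):
            if text[pos:pos + length] in dictionary:
                morphemes.append(text[pos:pos + length])
                pos += length
                break
        else:
            raise ValueError(f"no dictionary entry at position {pos}")
    return morphemes

# Hypothetical dictionary covering only the example sentence:
dictionary = {"KYOU", "WA", "YOI", "TENKI", "DESU"}
second_target = morpheme_split("KYOUWAYOITENKIDESU", dictionary)
print(second_target)  # ['KYOU', 'WA', 'YOI', 'TENKI', 'DESU']
```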
 The voice generation unit 112 generates voice data from the text stored in the text DB 113, using voiceprint data registered in a voiceprint DB 114 in the storage unit 103, described later. The voice generation unit 112, which produces a speech waveform based on voiceprint data, can also be called a speech synthesis unit. For example, the voice generation unit 112 can generate a speech waveform by concatenative synthesis, formant synthesis, or the like. In concatenative synthesis, recorded fragments of the speech of a specific person or the like are joined together. In formant synthesis, by contrast, no recorded speech is used: the waveform is shaped by adjusting parameters such as the fundamental frequency, timbre, and noise level, producing artificial voice data.
 As described later, the voice generation unit 112 also changes the form of the generated voice data according to whether the text contains an NG word. If the text contains no NG word, voice data corresponding to the text is generated without modification. If the text does contain an NG word, voice data is generated for a version of the text in which the portion corresponding to the NG word has been corrected. For example, when correcting the portion corresponding to an NG word, the voice generation unit 112 can generate voice data in which that portion is deleted or replaced.
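 The deletion or replacement of the NG word portion can be sketched as simple text preprocessing ahead of synthesis. This is an illustrative sketch under assumed names (`sanitize_text`, the "BEEP" placeholder); the actual voice generation unit 112 would operate on its own internal representation.

```python
def sanitize_text(text, ng_words, mode="delete", placeholder="BEEP"):
    """Correct the NG word portion of a text before synthesis, either
    by deleting it or by replacing it with a placeholder. Longer NG
    words are handled first so that substrings do not shadow them."""
    for ng in sorted(ng_words, key=len, reverse=True):
        text = text.replace(ng, "" if mode == "delete" else placeholder)
    return text

text = "KYOUWAYOITENKIDESU"
print(sanitize_text(text, {"YOI"}, mode="delete"))   # KYOUWATENKIDESU
print(sanitize_text(text, {"YOI"}, mode="replace"))  # KYOUWABEEPTENKIDESU
```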
 The storage unit 103 stores information that the control unit 101 needs to control the management server 10. For example, the storage unit 103 stores information for generating a text registration screen (see FIG. 6), a voiceprint registration screen (see FIG. 8), and a setting input screen (see FIG. 10), each described later. The storage unit 103 also stores databases (DBs) in which various kinds of information are registered: specifically, a text DB 113, a voiceprint DB 114, a first NG word DB 115, and a second NG word DB 116.
 Texts registered from the text registration terminal 20 via the network NW are stored in the text DB 113. In the text DB 113, each text is registered in association with the identification information of the text registration terminal 20. The determination targets (first and second) generated by the determination target generation unit 111 are also registered in the text DB 113 in association with the source text.
 Voiceprint data registered from the voiceprint registration terminal 30 via the network NW is stored in the voiceprint DB 114. In the voiceprint DB 114, the voiceprint data is registered in association with the identification information of the voiceprint registration terminal 30.
 The first NG word DB 115 holds basic NG words, which include words that are socially unacceptable to use and words identified from the attribute information of a text, described later. For example, the basic NG words include words that insult third parties, obscene words, and words suggestive of antisocial statements. They also include words suggestive of a political position when the attribute information of the text is "politics".
 The second NG word DB 116 holds individual NG words, which include words registered from the voiceprint registration terminal 30 via the network NW. Each individual NG word is registered in association with voiceprint data registered in the voiceprint DB 114. The individual NG words include words that would damage the image of the person providing the voiceprint data; for example, when that person is an athlete, they include words such as "match-fixing" and "doping".
 The determination unit 104 determines whether a text registered in the text DB 113, or the first and second determination targets associated with that text, contains an NG word or a permissible word that is not an NG word (hereinafter an "OK word"). When determining whether a text registered in the text DB 113 contains an NG word, the determination unit 104 refers to the basic NG words registered in the first NG word DB 115. When determining whether the first or second determination target registered in the text DB 113 contains an NG word, the determination unit 104 refers to the individual NG words registered in the second NG word DB 116. In this case, the determination unit 104 generates the phonetic form (NG sound) of each individual NG word as necessary and compares it with the first and second determination targets.
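 The comparison performed by the determination unit 104 can be illustrated as follows: since the first determination target covers every division of the phonetic units, an NG sound matches it exactly when it equals the concatenation of some contiguous run of units, while a match against the second determination target is an exact morpheme match. This is a minimal sketch; the function names and romanized data are illustrative assumptions.

```python
def ng_in_first_target(units, ng_sound):
    """True if the NG sound equals the concatenation of some
    contiguous run of phonetic units, i.e. it would appear as one
    segment in at least one division of the first determination
    target (which covers every division of the units)."""
    for i in range(len(units)):
        joined = ""
        for j in range(i, len(units)):
            joined += units[j]
            if joined == ng_sound:
                return True
            if len(joined) > len(ng_sound):
                break
    return False

def ng_in_second_target(morphemes, ng_word):
    """True if the NG word exactly matches one of the morphemes of
    the second determination target."""
    return ng_word in morphemes

units = ["KYO", "U", "WA", "YO", "I", "TE", "N", "KI", "DE", "SU"]
morphemes = ["KYOU", "WA", "YOI", "TENKI", "DESU"]
print(ng_in_first_target(units, "TENKI"))    # True  (TE+N+KI)
print(ng_in_second_target(morphemes, "TEN"))  # False
```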
 Under the control of the control unit 101, the communication unit 105 exchanges information with the text registration terminal 20, the voiceprint registration terminal 30, and the external terminal group 40. For example, the communication unit 105 transmits the information needed for the text registration screen (see FIG. 6) and the voiceprint registration screen (see FIG. 8) to the text registration terminal 20 and the voiceprint registration terminal 30, respectively. Conversely, the communication unit 105 receives texts, voiceprint data, and setting input information from the text registration terminal 20, the voiceprint registration terminal 30, and the external terminal group 40, respectively.
 The input unit 106 accepts instructions directed at the management server 10; for example, instructions to edit the basic NG words in the first NG word DB 115. The display unit 107 displays the information needed to operate the management server 10; for example, the status of the management server 10 and the registration state of the texts and voiceprint data stored in the storage unit 103.
 A configuration example of the external terminal group 40, which receives voice from the voice provision system 1 according to the present embodiment, is now described, taking the portable terminal 42 as representative. FIG. 4 is a block diagram of the portable terminal 42 that receives voice from the voice provision system 1 according to the present embodiment. FIG. 4 shows only the components of the portable terminal 42 that are relevant to the present invention.
 As shown in FIG. 4, the portable terminal 42 includes a control unit 421 that controls the terminal as a whole. An application execution unit (hereinafter "app execution unit") 422, an audio output unit 423, a communication unit 424, an input unit 425, and a display unit 426 are connected to the control unit 421. The configuration of the portable terminal 42 is not limited to that shown in FIG. 4 and can be modified as appropriate.
 The app execution unit 422 executes the processing needed to output the voice data provided by the management server 10. For example, the app execution unit 422 generates a setting input screen (see FIG. 10) for entering settings for the voice data provided by the management server 10 and displays it on the display unit 426. The app execution unit 422 also checks the voice data received via the communication unit 424 (for example, confirms whether it matches the settings entered on the setting input screen) and passes it to the audio output unit 423.
 The audio output unit 423 outputs the voice data received from the app execution unit 422. For example, the audio output unit 423 outputs, from a speaker, the voice data corresponding to a text provided by the management server 10.
 Under the control of the control unit 421, the communication unit 424 exchanges information with the management server 10 via the network NW. For example, the communication unit 424 transmits the information entered on the setting input screen described above to the management server 10, and receives voice data from the management server 10.
 The input unit 425 accepts instructions directed at the portable terminal 42; for example, instructions to enter information on the setting input screen. The display unit 426 displays the information needed to operate the portable terminal 42; for example, the status of the portable terminal 42 and the setting input screen.
 In the voice provision system 1 according to the present embodiment, the management server 10 receives, for example, texts such as newspaper articles from the text registration terminal 20 and voiceprint data of a specific actor or the like from the voiceprint registration terminal 30. The management server 10 also receives, from the portable terminal 42, setting information designating a desired text and desired voiceprint data. Based on the setting information from the portable terminal 42, the management server 10 generates voice data from the text using the designated voiceprint data and provides it to the portable terminal 42. The portable terminal 42 can thereby receive and output voice data in which a text such as a newspaper article is read aloud in the voice of the operator's favorite actor.
 The operations of the voice provision system 1 configured as above, from text registration through provision of voice data, are described below. First, the operation of registering a text from the text registration terminal 20 with the management server 10 is described. FIG. 5 is a flowchart for explaining the operation at the time of text registration in the voice provision system 1 according to the present embodiment.
 As shown in FIG. 5, when a text is to be registered with the management server 10, an application for text registration is made from the text registration terminal 20 (step ST501). When this text registration application is detected, the information needed to generate the text registration screen (registration screen information) is read from the information stored in the storage unit 103 and output to the text registration terminal 20 via the communication unit 105 (step ST502). On receiving this information, the text registration terminal 20 displays the text registration screen (step ST503).
 FIG. 6 is a diagram showing an example of a text registration screen 600 used in the voice provision system 1 according to the present embodiment. The text registration screen 600 shown in FIG. 6 is a screen on which the operator of the text registration terminal 20 registers a text to be provided. As shown in FIG. 6, the text registration screen 600 includes an attribute selection section 601, a text input section 602, a reset button 603, and an end button 604.
 The attribute selection section 601 is the part in which the attribute information of the text to be registered is selected. For example, the attribute selection section 601 provides boxes (category selection boxes) for selecting categories such as "entertainment", "sports", "news", and "economy" as the attribute information of the text. Selecting one of these category selection boxes specifies the category to which the registered text belongs. The attribute selection section 601 may instead be configured so that the attribute information of the text is entered directly; any configuration may be adopted, provided that the attribute information of the text can be designated.
 The text input section 602 is the part in which the text to be registered is entered. The text input section 602 provides a field for entering text (text input field). By entering the characters and numerals of the text in this field, the text to be registered with the management server 10 can be designated. For example, texts relating to newspaper articles, traffic information, voice guidance, or advertising information are entered in the text input field.
 The reset button 603 is used to reset the information selected and entered on the text registration screen 600. The end button 604 is used to finish the text registration process using the text registration screen 600. Selecting the end button 604 transmits the attribute information and text selected or entered via the text registration screen 600 to the management server 10.
 The text registration screen 600 is not limited to the example shown in FIG. 6 and can be modified as appropriate. It is preferable, as an embodiment, to provide the text registration screen 600 with a part for specifying how the registered text is to be handled. For example, the screen may allow the registrant to specify how the voice data is to be corrected (deletion or replacement of the NG word) when the registered text contains an NG word in relation to particular voiceprint data.
 When the text registration screen 600 is displayed in this way, attribute information is selected in the attribute selection section 601 and a text is entered in the text input section 602. When the end button 604 of the text registration screen 600 is then selected, the attribute information and the text are transmitted to the management server 10 (step ST504).
 On receiving the attribute information and the text, the determination unit 104 determines whether the text contains an NG word (step ST505). In doing so, the determination unit 104 refers to the basic NG words registered in the first NG word DB 115, thereby detecting whether the text contains words that are socially unacceptable to use or the like. Determining the presence or absence of basic NG words at the text registration stage in this way prevents texts containing basic NG words from being registered with the management server 10.
 If the text contains a basic NG word (step ST505: Yes), an error message to that effect is output to the text registration terminal 20 (step ST506). Outputting an error message in this way notifies the text registrant that the text or the like is inappropriate. On receiving the error message, the text registrant enters a text and so on again via the text registration screen 600 and transmits them to the management server 10 (step ST504).
 If, on the other hand, the text contains no basic NG word (step ST505: No), the text transmitted in step ST504 is registered in the text DB 113 (step ST507). When registration in the text DB 113 is complete, the management server 10 notifies the text registration terminal 20 that text registration is complete (step ST508). Through this series of operations, texts such as newspaper articles are registered with the management server 10 (text DB 113).
 Next, the operation of registering voiceprint data from the voiceprint registration terminal 30 with the management server 10 is described. FIG. 7 is a flowchart for explaining the operation at the time of voiceprint registration in the voice provision system 1 according to the present embodiment.
 As shown in FIG. 7, when voiceprint data is to be registered with the management server 10, an application for voiceprint data registration is made from the voiceprint registration terminal 30 (step ST701). When this voiceprint registration application is detected, the information needed to generate the voiceprint registration screen (registration screen information) is read from the information stored in the storage unit 103 and output to the voiceprint registration terminal 30 via the communication unit 105 (step ST702). On receiving this information, the voiceprint registration terminal 30 displays the voiceprint registration screen (step ST703).
 FIG. 8 is a diagram showing an example of a voiceprint registration screen 800 used in the voice provision system 1 according to the present embodiment. The voiceprint registration screen 800 shown in FIG. 8 is a screen on which the operator of the voiceprint registration terminal 30 registers voiceprint data to be provided. As shown in FIG. 8, the voiceprint registration screen 800 includes an attribute selection section 801, an NG word category selection section 802, an NG word selection/input section 803, a voiceprint input section 804, a reset button 805, and an end button 806.
 The attribute selection section 801 is the part in which the attribute information of the voiceprint data to be registered (more specifically, of the person whose voiceprint it is) is selected. For example, the attribute selection section 801 provides boxes (category selection boxes) for selecting categories such as "actor", "idol", "voice actor", and "artist" as the attribute information of the voiceprint data. Selecting one of these category selection boxes specifies the category to which the registered voiceprint data belongs. The attribute selection section 801 may instead accept the attribute information of the voiceprint data directly; any configuration may be adopted, provided that the attribute information of the voiceprint data can be designated.
 The NG word category selection section 802 is the part in which the category of the NG words (individual NG words) is selected. The NG word category selection section 802 provides boxes (category selection boxes) for selecting categories such as "divorce", "disaster", "antisocial", and "advertising". NG word candidates associated with each category are registered in advance for these category selection boxes. Selecting one of these category selection boxes specifies the category to which the NG words (individual NG words) associated with the registered voiceprint data belong.
 The NG word selection/input section 803 is the part in which the NG words (individual NG words) to be associated with the voiceprint data being registered are selected or entered. When a category is selected in the NG word category selection section 802 described above, NG word candidates are displayed in the NG word selection/input section 803. The voiceprint registrant can select, from these candidates, the NG words to be associated with the voiceprint data being registered. The voiceprint registrant can also enter NG words (individual NG words) directly into the NG word selection/input section 803.
 The voiceprint input section 804 is the part in which the voiceprint data (digital voiceprint data) to be registered is entered. The voiceprint input section 804 provides a box to which voiceprint data is attached (voiceprint attachment box). Attaching voiceprint data to this box designates the voiceprint data to be registered with the management server 10.
 The reset button 805 is used to reset the information selected and entered on the voiceprint registration screen 800. The end button 806 is used to finish the voiceprint data registration process using the voiceprint registration screen 800. Selecting the end button 806 transmits the attribute information and voiceprint data selected or entered via the voiceprint registration screen 800 to the management server 10.
 The voiceprint registration screen 800 is not limited to the example shown in FIG. 8 and can be modified as appropriate. It is preferable, as an embodiment, to provide the voiceprint registration screen 800 with a part for specifying how the registered voiceprint data is to be handled. For example, the screen may allow the registrant to specify how the voice data is to be corrected (deletion or replacement of the NG word) when a text contains an NG word at the time voice data is generated using the registered voiceprint data.
 It is also preferable, as an embodiment, for the voiceprint registration screen 800 to have a function of inferring and displaying NG words related to the voiceprint data of a specific person. For example, NG words may be inferred from the words and actions of the specific person over the past year (for example, statements made in media such as television and radio) and displayed in the NG word selection/input section 803. These NG words are preferably displayed in response to a selection by the voiceprint registrant.
 When the voiceprint registration screen 800 is displayed in this way, attribute information is selected in the attribute selection section 801 and an NG word category is selected in the NG word category selection section 802. When these items have been selected, the attribute information and the NG word category are transmitted to the management server 10 (step ST704).
 On receiving the attribute information and the NG word category, the management server 10 transmits a candidate list of NG words (NG word candidate list) to the voiceprint registration terminal 30 (step ST705). This NG word candidate list is displayed in the NG word selection/input section 803 of the voiceprint registration screen 800.
 Although a mode is described here in which the NG word candidate list is transmitted to the voiceprint registration terminal 30 in response to the category selection in the NG word category selection section 802 and so on, the invention is not limited to this. For example, the NG word candidate list may be transmitted to the voiceprint registration terminal 30 in step ST702 and then displayed in the NG word selection/input section 803 in response to the selection of attribute information and category.
 When NG words are displayed in the NG word selection/input section 803, the voiceprint registrant designates NG words (individual NG words) in the NG word selection/input section 803 and attaches voiceprint data in the voiceprint input section 804. When the end button 806 of the voiceprint registration screen 800 is then selected, these NG words and the voiceprint data are transmitted to the management server 10 (step ST706).
 On receiving the NG words and voiceprint data, the management server 10 registers the voiceprint data in the voiceprint DB 114 and the NG words in the second NG word DB 116 (step ST707). In the second NG word DB 116, the NG words are registered in association with the voiceprint data. When registration in the voiceprint DB 114 and the second NG word DB 116 is complete, the management server 10 notifies the voiceprint registration terminal 30 that voiceprint registration is complete (step ST708). Through this series of operations, voiceprint data of actors, actresses, and the like that can be used to generate voice data is registered with the management server 10 (voiceprint DB 114, second NG word DB 116), together with the NG words for that voiceprint data.
 Through the text registration operation and the voiceprint registration operation described above, the text and voiceprint data used to generate voice data are registered in the management server 10. The management server 10 generates voice data from this text and voiceprint data and provides the generated voice data to the portable terminal 42 and other devices. In doing so, the management server 10 selects text and voiceprint data according to the settings specified by the portable terminal 42 or the like, and generates voice data based on the selected text and voiceprint data.
 Next, the operation of specifying desired settings from the portable terminal 42 to the management server 10 and outputting the voice data provided by the management server 10 on the portable terminal 42 will be described. FIG. 9 is a flow chart illustrating the voice provision operation of the voice provision system 1 according to the present embodiment.
 To receive voice data from the management server 10 on the portable terminal 42, a voice output application is started on the portable terminal 42, as shown in FIG. 9 (step ST901). Starting this voice output application makes it possible to exchange information concerning the voice provision system 1 with the management server 10. When the voice output application starts, a setting input screen for entering the desired settings is displayed on the portable terminal 42 (step ST902).
 FIG. 10 shows an example of the setting input screen 1000 used in the voice provision system 1 according to the present embodiment. The setting input screen 1000 shown in FIG. 10 allows the operator of the portable terminal 42 to specify the voice data to be provided. As shown in FIG. 10, the setting input screen 1000 includes a text designation section 1001, a voiceprint designation section 1002, a reset button 1003, and an end button 1004.
 The text designation section 1001 is where the operator of the portable terminal 42 specifies the text corresponding to the desired voice data. The text designation section 1001 provides boxes (text selection boxes) for selecting text by type, such as "Entertainment", "Sports", "News", and "Economy". Selecting one of these text selection boxes specifies the text corresponding to the voice data to be provided by the management server 10.
 Although FIG. 10 is simplified for convenience of explanation, the text selection boxes display content including text of various genres. Configuring each text selection box as an icon that identifies the text registrant is preferable as an embodiment; in that case, the operator of the portable terminal 42 can select the desired text intuitively.
 The voiceprint designation section 1002 is where the operator of the portable terminal 42 specifies the voiceprint data that serves as the voice source of the desired voice data. The voiceprint designation section 1002 provides boxes (category selection boxes) for selecting the category to which the person corresponding to the voiceprint data belongs. Selecting these category selection boxes narrows down the candidate persons corresponding to the voiceprint data. When a particular category selection box is selected, the voiceprint designation section 1002 displays the persons belonging to that category, and the operator specifies the person corresponding to the voiceprint data by selecting one of the displayed candidates. The voiceprint designation section 1002 also provides an input field in which the person corresponding to the voiceprint data can be entered directly.
 The reset button 1003 is used to reset the information selected and specified on the setting input screen 1000. The end button 1004 is used to finish entering the desired settings on the setting input screen 1000; selecting it transmits the text and voiceprint data selected or entered via the setting input screen 1000 to the management server 10.
 The setting input screen 1000 is not limited to the example shown in FIG. 10 and may be modified as appropriate. Providing the setting input screen 1000 with a section for specifying how voice data generated from the selected text and voiceprint data should be handled is preferable as an embodiment. For example, the operator may be allowed to specify how the voice data should be corrected (NG word deletion or replacement) when the selected text and voiceprint data involve an NG word.
 When the operator enters the desired settings on the setting input screen 1000 and selects the end button 1004, the setting information is transmitted to the management server 10 (step ST903). This setting information includes the text selected by the operator and the voiceprint data selected by the operator (more specifically, information on the person corresponding to the voiceprint data).
 Upon receiving the setting information from the portable terminal 42, the management server 10 selects the text and voiceprint data included in the setting information (step ST904). The management server 10 selects the text and voiceprint data specified in the setting information from the text DB 113 and the voiceprint DB 114. After selecting the text and voiceprint data, the management server 10 performs a determination process (hereinafter, "NG determination process") that determines whether any NG word (individual NG word) associated with the selected voiceprint data is included in the specified text (step ST905).
 The NG determination process is described here. FIG. 11 is a flow chart illustrating the NG determination process in the voice provision system 1 according to the present embodiment. The NG determination process is executed mainly by the generation unit 102 (determination target generation unit 111) and the determination unit 104 of the management server 10.
 As shown in FIG. 11, in the NG determination process the determination target generation unit 111 first performs a second determination target generation process (morphological analysis) on the text selected in step ST904 described above (step ST1101). In the second determination target generation process, the selected text is divided into morphemes; that is, the second determination target (see FIG. 3B) is generated from the text. The second determination target generated from the text is registered in the text DB 113 of the storage unit 103 in association with that text.
 Once the second determination target has been registered, the determination unit 104 performs a determination process (hereinafter, "primary determination process") that determines whether the second determination target contains any NG word (individual NG word) (step ST1102). In the primary determination process, the determination unit 104 reads the individual NG words associated with the voiceprint data selected in step ST904 from the second NG word DB 116. The determination unit 104 then compares each individual NG word with each element of the second determination target one by one, thereby classifying the words in the text as NG words or OK words (step ST1103). In this way, the morphemes making up the text are compared against the NG words, and any NG word contained in the text as a morpheme is detected.
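The primary determination process above can be sketched as a direct morpheme-by-morpheme comparison. This is a hedged illustration under stated assumptions: the text is assumed to already be split into morphemes by the second determination target generation process, and the function name and sample morphemes are inventions for the example.

```python
def primary_determination(morphemes, ng_words):
    """Classify each morpheme as an OK word or an NG word by exact comparison
    (cf. steps ST1102-ST1103)."""
    ng_set = set(ng_words)
    ok_words = [m for m in morphemes if m not in ng_set]
    detected = [m for m in morphemes if m in ng_set]
    return ok_words, detected


# Illustrative morphemes of a text, with "tenki" registered as an NG word.
morphemes = ["kyou", "wa", "yoi", "tenki", "desu"]
ok, ng = primary_determination(morphemes, ["tenki"])
print(ng)  # ['tenki']
```

An exact comparison like this only finds NG words that coincide with whole morphemes, which is why the secondary and tertiary processes below are needed.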
 When an OK word is detected in the text (step ST1103: OK), the determination target generation unit 111 generates the pronunciation of that OK word (step ST1104). Here, "an OK word is detected" corresponds to the case where a second determination target that does not match any NG word is found in the text. Meanwhile, the determination unit 104 generates the pronunciation of each individual NG word (hereinafter, "NG sound") (step ST1105). The generated pronunciations of the OK words are registered in the text DB 113, and the generated NG sounds are registered in the second NG word DB 116.
 When the pronunciations of the OK words and the NG sounds have been generated, the determination unit 104 performs a determination process (hereinafter, "secondary determination process") that determines whether any NG sound is contained in the pronunciation of an OK word (step ST1106). In the secondary determination process, the determination unit 104 compares each OK word pronunciation with each NG sound one by one, thereby classifying the words as NG words or OK words (step ST1107). In this way, the pronunciations of the second determination targets judged to be OK words in the primary determination process are compared against the pronunciations of the NG words, and further NG words contained in the text are detected.
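A minimal sketch of the secondary determination process follows, assuming pronunciations are represented as romanized strings and that "contains" means a substring match; both of these representation choices, and all names in the snippet, are assumptions for illustration, not details from the patent.

```python
def secondary_determination(ok_word_pronunciations, ng_sounds):
    """Flag OK words whose pronunciation contains an NG sound (cf. ST1106-ST1107)."""
    detected = []
    for pron in ok_word_pronunciations:
        if any(ng in pron for ng in ng_sounds):
            detected.append(pron)
    return detected


# "tenki" passed the morpheme comparison, but its pronunciation contains the
# (hypothetical) NG sound "ten", so it is caught here regardless of its meaning.
print(secondary_determination(["kimono", "tenki"], ["ten"]))  # ['tenki']
```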
 When OK words are also detected in the secondary determination process, the determination target generation unit 111 performs the first determination target generation process on the text selected in step ST904 (step ST1108). In the first determination target generation process, the pronunciation of the selected text is generated, and determination targets are generated by dividing that pronunciation at arbitrary positions; that is, the first determination target (see FIG. 3A) is generated from the text. The first determination target generated from the text is registered in the text DB 113 of the storage unit 103 in association with that text.
 Once the first determination target has been registered, the determination unit 104 performs a determination process (hereinafter, "tertiary determination process") that determines whether the first determination target contains any NG sound (step ST1109). In the tertiary determination process, the determination unit 104 compares each arbitrary combination of the pronunciation segments of the text (the first determination target) with the NG sounds of the individual NG words registered in the second NG word DB 116, one by one, thereby determining whether an NG word is present (step ST1110). In this way, arbitrary combinations of the pronunciation of the text are compared against the NG sounds, and NG words that were not detected in the primary and secondary determination processes are detected. For example, in the example shown in FIG. 3A, if "きょう-はよ-いて-んき-です (KYOU-WAYO-ITE-NKI-DESU)" has been generated as one of the first determination targets, each of the pronunciation segments "KYOU", "WAYO", "ITE", "NKI", and "DESU" is compared with the NG sounds.
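The effect of dividing the pronunciation at arbitrary positions can be sketched by enumerating every contiguous segment of the full pronunciation string and comparing each against the NG sounds. This brute-force enumeration is an assumption for illustration (the patent only says the pronunciation is divided at arbitrary positions), as are the romanized strings and the hypothetical NG sound "wayo".

```python
def tertiary_determination(pronunciation, ng_sounds):
    """Check every contiguous segment of the text's pronunciation against the
    NG sounds, so matches spanning morpheme boundaries are also caught
    (cf. steps ST1109-ST1110)."""
    n = len(pronunciation)
    hits = set()
    for i in range(n):
        for j in range(i + 1, n + 1):
            if pronunciation[i:j] in ng_sounds:
                hits.add(pronunciation[i:j])
    return hits


# "wayo" spans the boundary between "wa" and "yoi" in KYOUWAYOITENKIDESU, so
# neither a morpheme comparison nor a per-morpheme pronunciation check finds it.
print(tertiary_determination("kyouwayoitenkidesu", {"wayo"}))  # {'wayo'}
```

This is what makes the tertiary process insensitive to the context of the text: any combination of the characters making up the text is compared, not just linguistically meaningful units.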
 If no NG word is detected in the tertiary determination process (step ST1110: No), the determination unit 104 selects, as the result of the NG determination process, a determination indicating that the text contains no NG word (OK determination) (step ST1111).
 On the other hand, if an NG word is detected in the tertiary determination process (step ST1110: Yes), in the primary determination process (step ST1103: NG), or in the secondary determination process (step ST1107: NG), the determination unit 104 records the location of the NG word in the text (step ST1112). The determination unit 104 then selects, as the result of the NG determination process, a determination indicating that the text contains an NG word (NG determination) (step ST1113).
 When the OK determination is selected in step ST1111 or the NG determination is selected in step ST1113, the determination unit 104 ends the NG determination process. Through this NG determination process, it is determined whether any NG word (individual NG word) associated with the voiceprint data selected in step ST904 is included in the selected text.
 In this NG determination process, the tertiary determination process judges NG words by comparing the pronunciation of the first determination target, generated by dividing the text at arbitrary positions, with the pronunciation of each individual NG word (NG sound). The pronunciation of any combination of the characters and numerals making up the text can therefore be compared with the NG sounds, so NG words contained in the text can be detected accurately regardless of the context of the text.
 Also, in the NG determination process, before the first determination target is compared with the NG words (tertiary determination process), the NG words and OK words contained in the text are judged on the basis of the morphemes making up the text (the second determination target) (primary determination process). NG words contained in the text as morphemes can thus be reliably detected by comparison with the second determination target before the first determination target is processed. Furthermore, because NG words contained in the text are judged in stages, missed detections of NG words can be reduced.
 Further, in the NG determination process, before the first determination target is compared with the NG words (tertiary determination process), NG words are judged by comparing the pronunciations of the second determination targets judged to be OK words in the primary determination process with the pronunciations of the NG words (secondary determination process). An NG word contained in a morpheme can therefore be detected regardless of the meaning of the morpheme judged to be an OK word.
 When the NG determination process ends, the determination unit 104 determines whether the result of the NG determination process is an OK determination or an NG determination, as shown in FIG. 9 (step ST906). In the case of an OK determination (step ST906: OK), the voice generation unit 112 generates voice data (step ST907). In this case, the voice generation unit 112 generates voice data corresponding to the text selected in step ST904, using the voiceprint data selected in step ST904, without applying any correction to the text. The generated voice data is then output from the management server 10 to the portable terminal 42 (step ST908).
 On the other hand, if the result of the NG determination process is an NG determination (step ST906: NG), the voice generation unit 112 generates voice data in which part of the text has been corrected (corrected voice data) (step ST909). In this case, the voice generation unit 112 generates voice data corresponding to a version of the text selected in step ST904 in which the part corresponding to the NG word has been corrected. For the parts of the text other than the part corresponding to the NG word, voice data is generated using the voiceprint data selected in step ST904.
 When correcting the part corresponding to an NG word, the voice generation unit 112 can generate voice data in which the part of the text corresponding to the NG word has been deleted, or voice data in which that part has been replaced. As one way of replacing an NG word, the voice generation unit 112 can, for example, generate voice data using voiceprint data different from the voiceprint data selected in step ST904: only the part corresponding to the NG word may be generated using predetermined voiceprint data. The voice generation unit 112 can also generate voice data in which the part of the text corresponding to the NG word has been replaced with a differently worded expression. The manner of correcting the part corresponding to an NG word is preferably chosen in consideration of the intentions of the text registrant and the voiceprint registrant.
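The delete-or-replace correction of the text can be sketched as follows, assuming the NG word locations recorded in step ST1112 are available as character index ranges; that representation, the function name, and the placeholder replacement string are all assumptions for the example.

```python
def correct_text(text, ng_spans, mode="delete", replacement="<alt>"):
    """Delete or replace the recorded NG word locations (cf. step ST909).

    ng_spans: list of (start, end) character index ranges assumed to have been
    recorded in step ST1112.
    """
    out, last = [], 0
    for start, end in sorted(ng_spans):
        out.append(text[last:start])
        if mode == "replace":
            out.append(replacement)
        last = end
    out.append(text[last:])
    return "".join(out)


text = "kyouwayoitenkidesu"
print(correct_text(text, [(4, 8)], mode="delete"))   # 'kyouitenkidesu'
print(correct_text(text, [(4, 8)], mode="replace"))  # 'kyou<alt>itenkidesu'
```

In the replace mode, the substituted span is the part that could instead be rendered with different voiceprint data, as described above.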
 The voice data generated by correcting the part corresponding to the NG word (corrected voice data) is output from the management server 10 to the portable terminal 42 (step ST910). When the portable terminal 42 receives the voice data from the management server 10 in step ST908 or step ST910, it outputs the voice through a speaker or the like (step ST911). With this audio output, the voice provision operation of the voice provision system 1 ends.
 As described above, in the voice provision system 1 according to the present embodiment, the voice generation unit 112 of the management server 10 generates voice data from the text registered in the storage unit 103 using specific voiceprint data, according to the determination result of the determination unit 104. Voice data using specific voiceprint data is thus generated from the text in accordance with the determination result of the determination unit 104, so the generated voice data can be switched according to whether the text contains an NG word. Voice data in different forms, depending on whether the text contains an NG word, can therefore be provided to the portable terminal 42.
 Here, when the text registered in the storage unit 103 contains no NG word, the voice generation unit 112 generates voice data corresponding to the text. Voice data corresponding to a text containing no NG word is thus generated without any special correction, so voice data corresponding to the text can be provided to the portable terminal 42 promptly.
 On the other hand, when the text registered in the storage unit 103 contains an NG word, the voice generation unit 112 generates voice data corresponding to a version of the text in which the part corresponding to the NG word has been corrected. Even for a text containing an NG word, voice data in which the NG word portion has been corrected can thus be provided to the portable terminal 42.
 For example, when the text registered in the storage unit 103 contains an NG word, the voice generation unit 112 can delete or replace the part corresponding to the NG word. When the part corresponding to the NG word is deleted, voice data from which the NG word has reliably been removed can be provided to the portable terminal 42 even for a text containing an NG word. When the part corresponding to the NG word is replaced, voice data containing the NG word can be prevented from being provided to the portable terminal 42 as-is using the specific voiceprint data.
 When replacing the part corresponding to an NG word, voiceprint data different from the specific voiceprint data can be used for that part. In that case, even for a text containing an NG word, corresponding voice data can be provided to the portable terminal 42 without changing the meaning of the text. Alternatively, the part corresponding to the NG word can be replaced with a differently worded expression; in that case, corresponding voice data can be provided to the portable terminal 42 without greatly changing the meaning of the text.
 Also, in the voice provision system 1 according to the present embodiment, the NG words associated with the voiceprint data registered from the voiceprint registration terminal 30 are registered in the storage unit 103 of the management server 10 (more specifically, in the second NG word DB 116). The determination unit 104 therefore determines whether the text contains an NG word (individual NG word) associated with the specific voiceprint data, which reliably prevents voice data containing an NG word associated with the specific voiceprint data from being provided to the portable terminal 42.
 Further, in the voice provision system 1 according to the present embodiment, the determination unit 104 of the management server 10 determines, at the time of text registration from the text registration terminal 20, whether the text contains any NG word (general NG word). Registration of text containing an NG word can thus be prevented at the stage of text registration from the text registration terminal 20.
 In this case, the determination unit 104 determines the presence or absence of NG words (general NG words) associated with the attribute information of the text. Registration of text containing an NG word identified from the attribute information can thus be prevented at the stage of text registration from the text registration terminal 20.
 The present invention is not limited to the embodiment described above and can be implemented with various modifications. The components illustrated in the accompanying drawings are not limiting and may be modified as appropriate within the range in which the effects of the present invention are obtained. The invention may otherwise be modified as appropriate without departing from the scope of its object.
 For example, in the embodiment described above, in the tertiary determination process of the NG determination process shown in FIG. 11, the determination unit 104 compares each arbitrary combination of the pronunciation segments of the text (the first determination target) against the NG sounds of the NG words (individual NG words) registered in the second NG word DB 116 for an exact match, one by one. However, the comparison method in the tertiary determination process is not limited to this and may be modified as appropriate. For example, the determination unit 104 may judge an NG word on the basis of a partial match between the pronunciation corresponding to the first determination target and the pronunciation corresponding to the NG word. In that case, not only pronunciations that completely match an NG word but also words that partially match an NG word contained in the text can be detected, so even words similar to the NG word can be detected. The proportion that counts as a partial match with the pronunciation of an NG word may be fixed in advance, or may be determined through machine learning based on past results or through Bayesian statistics.
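Partial pronunciation matching with a fixed proportion can be sketched with a standard similarity ratio. The use of `difflib.SequenceMatcher` and the 0.75 threshold are assumptions made for the example; the paragraph above notes that the proportion could instead be determined by machine learning or Bayesian statistics.

```python
from difflib import SequenceMatcher


def partial_match(segment, ng_sound, threshold=0.75):
    """Judge an NG word when the two pronunciations only partially agree.

    threshold: illustrative fixed proportion for a "partial match"."""
    return SequenceMatcher(None, segment, ng_sound).ratio() >= threshold


print(partial_match("wayo", "wayo"))   # True (exact match)
print(partial_match("wayo", "wayoo"))  # True (similar pronunciation)
print(partial_match("wayo", "kyou"))   # False
```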
 Also, in the embodiment described above, in the tertiary determination process of the NG determination process shown in FIG. 11, the determination unit 104 judges NG words by comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the NG word (individual NG word). However, the determination method of the determination unit 104 is not limited to this and may be modified as appropriate. For example, NG words may be judged by comparing the character strings making up the first determination target with the character strings making up the NG words. In that case, an NG word is judged by comparing the character strings of the first determination target, generated by dividing the text at arbitrary positions, with the character string of the NG word, so any combination of the characters and numerals making up the text can be compared with the NG words. NG words contained in the text can thus be detected accurately regardless of the context of the text.
 Furthermore, in the above embodiment, the management server 10 generates voice data from the text registered in the storage unit 103 using specific voiceprint data, and provides the generated voice data to the portable terminal 42 or the like. However, the information provided to the portable terminal 42 or the like is not limited to voice data, and other information can be added as appropriate. For example, the text used to generate the voice data may be provided together with it. Image data, moving image data, or computer graphics (CG) may also be provided in addition to the voice data. In this case, providing image data or moving image data related to the voice data is a preferred embodiment.
 Furthermore, in the above embodiment, when the text registered in the storage unit 103 contains an NG word, the voice generation unit 112 can replace the portion corresponding to the NG word. Such partial replacement of text can also be applied to portions other than NG words. For example, a specific word contained in the text may be replaced with a different word prepared in advance. In this case, when the determination unit 104 detects the specific word in the text, the voice generation unit 112 replaces it with the prepared word.
 A specific example of such replacement follows. Suppose that, in the text "It is good weather today (KYOUWAYOITENKIDESU)" shown in FIG. 3A, "is (DESU)" is registered as the word to be replaced (hereinafter, the "replacement target word"), and that "NYAN" is registered in advance as the word that replaces it (hereinafter, the "replacement word"). These replacement target words and replacement words can be registered, for example, from the voiceprint registration terminal 30.
 In an embodiment that replaces a specific word contained in the text in this way, the determination unit 104 determines whether the text "It is good weather today (KYOUWAYOITENKIDESU)" contains the replacement target word "is (DESU)" (replacement determination process). This replacement determination process takes the place of, for example, the NG determination process of step ST905 shown in FIG. 9. The determination target in the replacement determination process can be the first determination target described above (obtained by converting the text into phonetic words and dividing it at arbitrary positions) or the second determination target (obtained by dividing the text into morphemes). In the replacement determination process, the determination unit 104 determines whether the second determination target and/or the first determination target contains the replacement target word.
 When the replacement determination process detects the replacement target word "is (DESU)", the voice generation unit 112 generates voice data in which the replacement target word has been replaced with the replacement word "NYAN" (replacement voice data). As a result, voice data corresponding to the text "KYOUWAYOITENKINYAN" is generated as the replacement voice data. The generated replacement voice data is then transmitted from the management server 10 to the portable terminal 42, which outputs it through a speaker or the like. As a result, the voice "KYOUWAYOITENKINYAN" is output from the portable terminal 42.
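The replacement step itself can be sketched as a simple substitution over the romanized pronunciation, following the DESU-to-NYAN example above; the table and function names are illustrative.

```python
# Replacement table as registered, e.g., from the voiceprint
# registration terminal 30 (romanized forms from the example above).
REPLACEMENTS = {"DESU": "NYAN"}

def apply_replacements(pronunciation: str) -> str:
    """Substitute every registered replacement target word before
    the replacement voice data is synthesized."""
    for target, replacement in REPLACEMENTS.items():
        pronunciation = pronunciation.replace(target, replacement)
    return pronunciation

print(apply_replacements("KYOUWAYOITENKIDESU"))  # KYOUWAYOITENKINYAN
```

The substituted string would then be handed to the synthesizer with the selected voiceprint data to produce the replacement voice data.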
 In an embodiment that replaces a specific word contained in the text in this way, when the voiceprint data of a specific character, such as an animated character, is selected from the setting input screen (see FIG. 10), voice data matched to that character's manner of speaking can be generated from a text such as a newspaper article and provided to the portable terminal 42. This makes it possible, for example, to provide a voice provision service in which newspaper articles and the like are read aloud by a specific character.
 Furthermore, the above embodiment mainly describes NG word determination for texts written in Japanese. However, NG word determination is not limited to Japanese and can be applied to any language used around the world. The presence or absence of an NG word may also be determined across multiple languages. For example, when the pronunciation of a text written in English matches or resembles the pronunciation of a Japanese NG word, that part of the English text can be determined to be an NG word.
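One way to realize this cross-language check is to map both languages into a shared romanized pronunciation space and reuse the partial-match test. The grapheme-to-phoneme step is not shown here, and the tiny lexicon, the word chosen, and the threshold are hypothetical stand-ins.

```python
from difflib import SequenceMatcher

# Assume some grapheme-to-phoneme step (not shown) maps both English
# words and Japanese NG words into a shared romanized space; this
# one-entry lexicon is a hypothetical stand-in for that step.
ENGLISH_PRONUNCIATIONS = {"baccarat": "BAKARA"}
JAPANESE_NG_PRONUNCIATIONS = ["BAKA"]

def cross_language_ng(english_word: str, threshold: float = 0.8) -> bool:
    """True if the English word's pronunciation matches or resembles
    a registered Japanese NG pronunciation."""
    pron = ENGLISH_PRONUNCIATIONS.get(english_word.lower(), "")
    return any(
        SequenceMatcher(None, pron, ng).ratio() >= threshold
        for ng in JAPANESE_NG_PRONUNCIATIONS
    )

print(cross_language_ng("baccarat"))  # resembles a Japanese NG sound
print(cross_language_ng("weather"))   # no resemblance
```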
 According to the determination device of the present invention and the voice provision system using it, non-permissive words contained in a text can be detected accurately regardless of the context of the text. The invention is therefore particularly well suited to a voice provision service that reads out texts using specific voiceprint data.
 This application is based on Japanese Patent Application No. 2015-235703, filed on December 2, 2015, the entire contents of which are incorporated herein.

Claims (14)

  1.  A determination device for determining a non-permissive word contained in a text composed in a natural language, comprising:
     a determination target generation unit that generates a first determination target by dividing the text at an arbitrary position; and
     a determination unit that determines the non-permissive word by comparing a pronunciation corresponding to the first determination target with a pronunciation corresponding to the non-permissive word.
  2.  The determination device according to claim 1, wherein the determination unit determines the non-permissive word based on a partial match between the pronunciation corresponding to the first determination target and the pronunciation corresponding to the non-permissive word.
  3.  The determination device according to claim 1 or 2, wherein the determination target generation unit generates a second determination target by dividing the text into morphemes, and
     the determination unit determines, before comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word, the non-permissive word and permissive words that are not the non-permissive word by comparing the second determination target with the non-permissive word.
  4.  The determination device according to claim 3, wherein the determination unit determines, before comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word, the non-permissive word by comparing a pronunciation corresponding to the second determination target determined to be a permissive word with the pronunciation corresponding to the non-permissive word.
  5.  A determination device for determining a non-permissive word contained in a text composed in a natural language, comprising:
     a determination target generation unit that generates a first determination target by dividing the text at an arbitrary position; and
     a determination unit that determines the non-permissive word by comparing a character string constituting the first determination target with a character string constituting the non-permissive word.
  6.  A voice provision system comprising the determination device according to any one of claims 1 to 5, the system providing voice corresponding to the text based on specific voiceprint data,
     wherein the determination device comprises a voice generation unit that generates voice data from the text using the specific voiceprint data according to a determination result of the determination unit.
  7.  The voice provision system according to claim 6, wherein the voice generation unit generates voice data corresponding to the text when the text does not contain the non-permissive word.
  8.  The voice provision system according to claim 6 or 7, wherein the voice generation unit generates, when the text contains the non-permissive word, voice data corresponding to the text in which the portion corresponding to the non-permissive word has been corrected.
  9.  The voice provision system according to claim 8, wherein the voice generation unit deletes the portion corresponding to the non-permissive word contained in the text.
  10.  The voice provision system according to claim 8, wherein the voice generation unit replaces the portion corresponding to the non-permissive word contained in the text.
  11.  The voice provision system according to claim 10, wherein the voice generation unit uses, for the portion corresponding to the non-permissive word contained in the text, voiceprint data different from the specific voiceprint data.
  12.  The voice provision system according to claim 10, wherein the voice generation unit replaces the portion corresponding to the non-permissive word contained in the text with a word of a different expression.
  13.  The voice provision system according to claim 6, wherein the determination device comprises a storage unit that stores the specific voiceprint data, and
     the storage unit stores the non-permissive word associated with the specific voiceprint data.
  14.  A determination method for determining a non-permissive word contained in a text composed in a natural language, comprising:
     generating a first determination target by dividing the text at an arbitrary position; and
     determining the non-permissive word by comparing a pronunciation corresponding to the first determination target with a pronunciation corresponding to the non-permissive word.
PCT/JP2016/083894 2015-12-02 2016-11-16 Determination device and voice provision system provided therewith WO2017094500A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2017553757A JP6836033B2 (en) 2015-12-02 2016-11-16 Judgment device and voice providing system equipped with this

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2015-235703 2015-12-02
JP2015235703 2015-12-02

Publications (1)

Publication Number Publication Date
WO2017094500A1 (en) 2017-06-08

Family

ID=58797215

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/083894 WO2017094500A1 (en) 2015-12-02 2016-11-16 Determination device and voice provision system provided therewith

Country Status (3)

Country Link
JP (1) JP6836033B2 (en)
TW (1) TWI717426B (en)
WO (1) WO2017094500A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102362533B1 (en) * 2021-03-18 2022-02-15 주식회사 로컬링크 Hospital's counter offer curating server using remote video and Hospital's counter offer curating program using remote video

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05165486A (en) * 1991-12-18 1993-07-02 Oki Electric Ind Co Ltd Text voice transforming device
JP2004271727A (en) * 2003-03-06 2004-09-30 Seiko Epson Corp Voice data providing system and device and program for generating voice data
JP2007316303A (en) * 2006-05-25 2007-12-06 Nippon Telegr & Teleph Corp <Ntt> Voice synthesizing method and device with speaker selection function, and voice synthesizing program with speaker selection function
JP2007334144A (en) * 2006-06-16 2007-12-27 Oki Electric Ind Co Ltd Speech synthesis method, speech synthesizer, and speech synthesis program


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019054321A (en) * 2017-09-12 2019-04-04 株式会社Nttドコモ Communication controller and terminal
CN110164413A (en) * 2019-05-13 2019-08-23 北京百度网讯科技有限公司 Phoneme synthesizing method, device, computer equipment and storage medium
CN110164413B (en) * 2019-05-13 2021-06-04 北京百度网讯科技有限公司 Speech synthesis method, apparatus, computer device and storage medium

Also Published As

Publication number Publication date
TWI717426B (en) 2021-02-01
JPWO2017094500A1 (en) 2018-11-08
JP6836033B2 (en) 2021-02-24
TW201732649A (en) 2017-09-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16870438; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2017553757; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 16870438; Country of ref document: EP; Kind code of ref document: A1)