WO2017094500A1 - Determination device and voice provision system provided therewith - Google Patents


Info

Publication number
WO2017094500A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
text
determination
voice
permissive
Prior art date
Application number
PCT/JP2016/083894
Other languages
French (fr)
Japanese (ja)
Inventor
政文 坂井
裕美子 安田
村上 大介
陽子 福山
哲成 中
Original Assignee
株式会社電通
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社電通
Priority to JP2017553757A (JP6836033B2)
Publication of WO2017094500A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16: Sound input; Sound output
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00: Speech synthesis; Text to speech systems
    • G10L 13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Definitions

  • the present invention relates to a determination apparatus for determining non-permissive words included in text composed of natural language, and a voice providing system including the same.
  • The speech synthesizer described in Patent Document 1 includes an inappropriate-word dictionary in which inappropriate words and inappropriate discourse patterns are registered, and determines the degree of inappropriate expression contained in the text to be read out. Depending on that degree, it can embed an audio watermark in the synthesized voice and register the result in an external storage server. In this way, abuse by criminals can be prevented without adverse effects on the text, such as the insertion of data that degrades the voice.
  • The present invention has been made in view of such a problem, and aims to provide a determination device capable of accurately detecting non-permissive words contained in text regardless of the context of the text, and a voice providing system including the same.
  • The determination apparatus is a determination apparatus for determining a non-permissive word included in text composed of natural language, and includes a determination target generation unit that generates a first determination target by dividing the text at an arbitrary position, and a determination unit that determines the non-permissive word by comparing a pronunciation corresponding to the first determination target with a pronunciation corresponding to the non-permissive word.
  • the non-permitted word is determined by comparing the pronunciation corresponding to the first determination target generated by dividing the text at an arbitrary position and the pronunciation corresponding to the non-permitted word. Therefore, it is possible to compare the pronunciation of any combination of letters and numerals constituting the text with the pronunciation of the non-permissive word. This makes it possible to accurately detect non-permissive words contained in the text regardless of the context of the text.
  • Preferably, the determination unit determines the non-permissive word by partial coincidence between the pronunciation corresponding to the first determination target and the pronunciation corresponding to the non-permissive word. According to this configuration, words in the text that only partially match a non-permissive word can be detected, not just exact matches of pronunciation. Thus, even words that are merely similar to the non-permissive words can be detected.
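As an illustration of this claim, partial coincidence over all divisions at arbitrary positions reduces to a substring search over the full phonetic string. The following Python sketch is a minimal reduction under that observation; the romanized pronunciations and the NG word list are hypothetical examples, not data from the patent.

```python
def contains_ng_pronunciation(text_pronunciation: str, ng_pronunciations: list[str]) -> bool:
    """Return True if any non-permissive pronunciation partially matches
    the pronunciation of some division of the text.

    Every contiguous substring of the phonetic string corresponds to a
    first determination target produced by dividing the text at an
    arbitrary position, so a plain substring search over the whole
    pronunciation covers all such divisions.
    """
    return any(ng in text_pronunciation for ng in ng_pronunciations)

# Hypothetical NG pronunciation "doopingu" (doping):
print(contains_ng_pronunciation("karewadoopinguwoshita", ["doopingu"]))  # True
print(contains_ng_pronunciation("kyouwayoitenkidesu", ["doopingu"]))     # False
```

This is why the comparison works regardless of word boundaries in the original text: an NG word split across two morphemes still appears as a contiguous run in the phonetic string.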
  • Preferably, the determination target generation unit divides the text into morphemes to generate a second determination target, and the determination unit determines the non-permissive word and the permissive word by comparing the second determination target with the non-permissive word, before comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word. According to this configuration, prior to the comparison between the first determination target and the non-permissive word, the non-permissive words and permissive words included in the text can be determined based on the morphemes constituting the text.
  • non-permissive words included as morphemes in the text can be reliably detected by comparison with the second determination target.
  • Since non-permissive words included in the text can be determined in stages, omissions of non-permissive words can be reduced.
  • Preferably, before comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word, the determination unit determines the non-permissive word by comparing a pronunciation corresponding to the second determination target that was determined to be a permissible word with a pronunciation corresponding to the non-permissive word.
  • the non-permitted word is determined by comparing the pronunciation corresponding to the second determination target determined to be the permitted word with the pronunciation corresponding to the non-permitted word. Therefore, regardless of the meaning of the morpheme determined to be a permitted word, it is possible to detect non-permissive words included in the morpheme.
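A minimal sketch of the two-stage determination described above, assuming a morpheme list and a pronunciation function are already available; all names below are illustrative, not the patent's implementation:

```python
def find_ng_words(morphemes, ng_words, ng_pronunciations, to_pronunciation):
    """Two-stage check: (1) compare each morpheme (second determination
    target) directly with the NG word list; (2) for morphemes that pass
    stage 1 as permitted words, compare their pronunciation with the NG
    pronunciations, catching NG words hidden inside permitted morphemes."""
    flagged = []
    for morpheme in morphemes:
        if morpheme in ng_words:                    # stage 1: surface form
            flagged.append(morpheme)
            continue
        pronunciation = to_pronunciation(morpheme)  # stage 2: sound
        if any(ng in pronunciation for ng in ng_pronunciations):
            flagged.append(morpheme)
    return flagged

# Toy example where the pronunciation equals the surface form:
print(find_ng_words(["kyou", "wa", "yoi", "tenki", "desu"],
                    ng_words={"doping"},
                    ng_pronunciations={"doopingu"},
                    to_pronunciation=lambda m: m))  # []
```

Stage 1 reliably catches NG words present as whole morphemes; stage 2 covers the case the claim targets, where an NG sound is embedded in a morpheme that by itself is a permitted word.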
  • Alternatively, the determination apparatus is a determination apparatus for determining a non-permissive word included in text composed of natural language, and includes a determination target generation unit that generates a first determination target by dividing the text at an arbitrary position, and a determination unit that determines the non-permissive word by comparing the character string forming the first determination target with the character string forming the non-permissive word.
  • The non-permissive word is determined by comparing the character string forming the first determination target, generated by dividing the text at an arbitrary position, with the character string forming the non-permissive word. Therefore, a character string formed from any combination of the characters and numbers constituting the text can be compared with the character string forming the non-permissive word. This makes it possible to accurately detect non-permissive words contained in the text regardless of the context of the text.
  • A voice providing system according to the present invention includes any of the determination devices described above and provides voice corresponding to the text based on specific voiceprint data, wherein the determination device includes the determination unit and a voice generation unit that generates voice data from the text, using the specific voiceprint data, according to the determination result of the determination unit.
  • the voice data using the specific voiceprint data is generated from the text according to the determination result of the determination unit of the determination apparatus. Therefore, the voice data can be switched and generated according to the presence or absence of the non-permissive word included in the text. As a result, it is possible to provide speech data of different modes according to the presence or absence of non-permissive words included in the text.
  • the voice generation unit generates voice data corresponding to the text when the text does not include the non-permitted word.
  • the voice generation unit generates voice data corresponding to the text that does not include the non-permissive word without special correction or the like. For this reason, it is possible to quickly provide audio data corresponding to text.
  • Preferably, when the text includes the non-permissive word, the voice generation unit generates voice data corresponding to the text in which the portion corresponding to the non-permissive word has been corrected. According to this configuration, voice data in which the portion corresponding to the non-permissive word is corrected is generated. For this reason, even for text containing a non-permissive word, audio data corresponding to the text can be provided.
  • the voice generation unit deletes a portion corresponding to the non-permitted word included in the text.
  • According to this configuration, the voice generation unit generates voice data in which the portion corresponding to the non-permitted word is deleted. For this reason, even for text containing a non-permissive word, audio data corresponding to the text can be provided.
  • the voice generation unit replaces a portion corresponding to the non-permitted word included in the text.
  • the voice generation unit generates voice data in which a portion corresponding to the non-permitted word is replaced. For this reason, it is possible to prevent voice data including an unacceptable word from being provided as it is using specific voiceprint data.
  • Since the portion corresponding to the non-permissive word is replaced, a gap in the audio corresponding to that part of the text can be avoided.
  • the voice generation unit uses voiceprint data different from the specific voiceprint data for a portion corresponding to the non-permitted word included in the text.
  • According to this configuration, the voice generation unit generates voice data using different voiceprint data for the portion corresponding to the non-permitted word. For this reason, even for text including an NG word, audio data corresponding to the text can be provided without changing the meaning of the text.
  • Preferably, the voice generation unit replaces a portion corresponding to the non-permissive word included in the text with a word having a different expression. According to this configuration, the voice generation unit generates voice data in which the non-permissive word is replaced with a word having a different expression. For this reason, even for text containing a non-permissive word, audio data corresponding to the text can be provided.
  • the determination device includes a storage unit that stores the specific voiceprint data, and the storage unit stores the non-permitted word associated with the specific voiceprint data. According to this configuration, it is determined by the determination unit whether the non-permissive word associated with the specific voiceprint data is included in the text. Therefore, it is possible to prevent the provision of voice data including non-permissive words associated with specific voiceprint data.
  • The determination method is a determination method for determining a non-permissive word included in text composed of natural language, comprising the steps of: dividing the text at an arbitrary position to generate a first determination target; and determining the non-permissive word by comparing a pronunciation corresponding to the first determination target with a pronunciation corresponding to the non-permissive word.
  • the non-permitted word is determined by comparing the pronunciation corresponding to the first determination target generated by dividing the text at an arbitrary position and the pronunciation corresponding to the non-permitted word. Therefore, it is possible to compare the pronunciation of any combination of letters and numerals constituting the text with the pronunciation of the non-permissive word. This makes it possible to accurately detect non-permissive words contained in the text regardless of the context of the text.
  • FIG. 1 is an explanatory view showing an outline of a voice providing system according to the present embodiment.
  • the voice providing system 1 includes a management server 10 that constitutes an example of a determination device.
  • The management server 10 stores text and voiceprint data (digital voiceprint data) provided from the text registration terminal 20 and the voiceprint registration terminal 30 connected via a network NW such as the Internet, and provides the external terminal group 40, also connected via the network NW, with voice data generated from the text and voiceprint data.
  • In the present embodiment, a case is described in which the voice providing system 1 receives text and voiceprint data via the network NW and provides voice data generated based on them to the external terminal group 40.
  • the environment to which the voice providing system 1 according to the present invention is applied is not limited to the above environment, and can be changed as appropriate.
  • text and voiceprint data may be directly registered in the management server 10.
  • the voice data is not limited to the external terminal group 40, and may be provided to a terminal (voice output terminal or the like) directly connected to the management server 10.
  • the management server 10 is disposed in a company or the like that provides a voice providing service using the voice providing system 1 according to the present embodiment.
  • the management server 10 is configured, for example, by a personal computer (PC) having a general function, and has a function as a web server.
  • The management server 10 generates the text registration screen (see FIG. 6), the voiceprint registration screen (see FIG. 8), and the setting input screen (see FIG. 10) described later, and provides them to the text registration terminal 20, the voiceprint registration terminal 30, and the external terminal group 40, respectively.
  • the text registration terminal 20 is disposed in a company or the like that registers text as a manuscript of voice data provided from the voice providing system 1.
  • the text registration terminal 20 is, for example, a PC having a general function, and has a web browser function.
  • For example, the text registration terminal 20 is disposed at a manufacturer that wants to provide advertisements as audio data, or at a newspaper company or television station that wants to provide news.
  • The text registration terminal 20 may also be disposed at a service provider that desires to provide information (for example, voice guidance) from home appliances in an environment where home appliances such as televisions and refrigerators are connected to the Internet.
  • The text registration terminal 20 includes the components (an input unit, a display unit, a communication unit, etc.) required of a text input terminal.
  • the text registered from the text registration terminal 20 is composed of natural language.
  • the voiceprint registration terminal 30 is installed in a company or the like that registers voiceprint data (digital voiceprint data) serving as a sound source of voice data provided from the voice providing system 1.
  • the voiceprint registration terminal 30 is formed of, for example, a PC having a general function, and has a web browser function.
  • the voiceprint registration terminal 30 is disposed in a talent office to which an actor or a voice actor belongs, a management office that manages athletes, and the like.
  • the voiceprint registration terminal 30 includes components (input unit, display unit, communication unit, etc.) necessary as a voiceprint input terminal.
  • The voiceprint data includes, for example, fragment data of the recorded voice of a specific person, and sound and prosody parameters, such as spectrum and fundamental frequency, obtained by analyzing the voice of the specific person.
  • the voiceprint data is not limited to these, and includes arbitrary data necessary for voice synthesis technology (for example, waveform connection type voice synthesis, formant synthesis, etc.) by the voice generation unit 112 described later.
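Purely for illustration, the voiceprint data described above could be modeled as a container like the following; the field names are assumptions, not the patent's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class VoiceprintData:
    """Illustrative container for voiceprint data: recorded voice
    fragments for waveform concatenation synthesis, plus analysis
    parameters (spectrum, fundamental frequency) usable by parametric
    methods such as formant synthesis."""
    speaker_id: str
    fragments: list[bytes] = field(default_factory=list)  # recorded voice fragments
    spectrum: list[float] = field(default_factory=list)   # spectral envelope parameters
    fundamental_frequency_hz: float = 0.0                 # base pitch of the speaker
```

Grouping both kinds of data in one record reflects the text's point that the voiceprint DB must feed whichever synthesis technique the voice generation unit uses.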
  • the external terminal group 40 is configured of, for example, any terminal (device) having a network connection function.
  • For example, the external terminal group 40 includes a car navigation system (hereinafter referred to as "car navigation system") 41, a portable terminal 42 such as a smartphone, and a home appliance 43 such as a refrigerator.
  • the car navigation system 41, the portable terminal 42 and the home appliance 43 constituting the external terminal group 40 have an audio output function of outputting audio data provided from the management server 10 in addition to the functions specific to each terminal.
  • FIG. 2 is a block diagram of the management server 10 of the voice providing system 1 according to the present embodiment.
  • the management server 10 has a control unit 101 that controls the entire management server 10.
  • To the control unit 101, a generation unit 102, a storage unit 103, a determination unit 104, a communication unit 105, an input unit 106, and a display unit 107 are connected.
  • the configuration of the management server 10 is not limited to this, and can be changed as appropriate.
  • the generation unit 102 includes a determination target generation unit 111 and an audio generation unit 112.
  • The determination target generation unit 111 generates targets (determination targets) used to determine whether the text stored in the storage unit 103 includes a non-permissive word (hereinafter referred to as an "NG word"). For example, the determination target generation unit 111 generates a first determination target by dividing the text stored in the storage unit 103 at an arbitrary position. In addition, the determination target generation unit 111 divides the text stored in the storage unit 103 into morphemes to generate a second determination target. Furthermore, the determination target generation unit 111 converts the text stored in the storage unit 103, or a part of the second determination target, into phonetic words (for example, hiragana or prosody).
  • FIGS. 3A and 3B are diagrams showing examples of the first determination target and the second determination target generated by the determination target generation unit 111, respectively.
  • FIG. 3A shows a case where the first determination target is generated from the text "It is good weather today (KYOU WA YOI TENKI DESU)" in Japanese and the text "It is good weather today" in English.
  • FIG. 3B shows a case where the second determination target is generated from the text "It is good weather today (KYOU WA YOI TENKI DESU)" in Japanese and the text "It is good weather today" in English.
  • the first determination target and the second determination target are indicated by phonetic symbols.
  • the first determination target and the second determination target will be described using text in Japanese.
  • The first determination target is generated by converting the text "It is good weather today (KYOU WA YOI TENKI DESU)" into phonetic words and dividing it at arbitrary positions.
  • For example, the text "It is good weather today (KYOU WA YOI TENKI DESU)" divided as "KYO-U-WA-YO-I-TE-N-KI-DE-SU" or as "KYOU-WAYO-ITE-NKI-DESU" is considered as a first determination target.
  • The first determination target includes all combinations obtained by dividing the text "It is good weather today (KYOU WA YOI TENKI DESU)" into an arbitrary number of phonetic words (hiragana) while maintaining their order.
  • When the text has multiple possible readings, first determination targets including all of the phonetic words are generated. For example, for the word "good (YOI)" shown in FIG. 3A, the readings "YOI" and "II" both exist. Therefore, the first determination targets generated from this text include both "KYOU WA YOI TENKI DESU" divided at arbitrary positions and "KYOU WA II TENKI DESU" divided at arbitrary positions.
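The exhaustive division just described can be sketched as follows: a string of n phonetic units admits 2^(n-1) ways to place cut points between the units while preserving their order. This toy treats each character as one phonetic unit; a real implementation would operate on hiragana units and would also enumerate alternative readings such as YOI/II, which this sketch omits.

```python
from itertools import combinations

def first_determination_targets(phonetic: str) -> list[tuple[str, ...]]:
    """Enumerate every division of a phonetic string at arbitrary
    positions, preserving unit order: all 2**(n-1) placements of cut
    points between the n phonetic units."""
    n = len(phonetic)
    targets = []
    for k in range(n):  # k = number of cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            targets.append(tuple(phonetic[a:b] for a, b in zip(bounds, bounds[1:])))
    return targets

# A 4-unit string has 2**3 = 8 divisions, including ("yo", "it"):
targets = first_determination_targets("yoit")
print(len(targets))             # 8
print(("yo", "it") in targets)  # True
```

In practice the system need not materialize all divisions; as noted earlier, substring matching over the undivided phonetic string gives the same coverage for partial-coincidence checks.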
  • The second determination target is generated by dividing the text "It is good weather today (KYOU WA YOI TENKI DESU)" into morphemes.
  • For example, the text "It is good weather today (KYOU WA YOI TENKI DESU)" divided into "KYOU-WA-YOI-TENKI-DESU" ("today - topic marker - good - weather - is") is considered as the second determination target.
  • The determination targets (first determination target, second determination target) generated by the determination target generation unit 111 are registered in a text database (DB) 113 in the storage unit 103, described later, in association with the source text.
  • the voice generation unit 112 generates voice data from the text stored in the text DB 113 using voiceprint data registered in a voiceprint DB 114 in the storage unit 103 described later.
  • the voice generation unit 112 can also be called a voice synthesis unit that generates a voice waveform based on voiceprint data.
  • the voice generation unit 112 can generate a voice waveform by waveform connection type voice synthesis, formant synthesis, or the like.
  • In waveform concatenation speech synthesis, fragment data of the recorded voice of a specific person or the like is concatenated and synthesized.
  • In formant synthesis, recorded voice of a particular person is not used; instead, parameters such as the fundamental frequency, timbre, and noise level are adjusted to form a waveform, generating artificial voice data.
  • the voice generation unit 112 changes the mode of the voice data to be generated according to the presence or absence of the NG word included in the text. If the text does not contain an NG word, audio data corresponding to the text is generated without correcting the text. On the other hand, when the text includes an NG word, voice data corresponding to the text in which the portion corresponding to the NG word is corrected is generated. For example, when correcting the part corresponding to the NG word, the voice generation unit 112 can generate voice data in which the part is deleted or replaced.
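A hedged sketch of the correction step described above: before synthesis, the portion of the text corresponding to an NG word is deleted or replaced. The function name, the `mode` parameter, and the placeholder string are illustrative choices, not the patent's implementation:

```python
def correct_text(text: str, ng_words: list[str], mode: str = "replace") -> str:
    """Remove or substitute NG-word portions of the text before the
    voice generation unit synthesizes voice data from it."""
    for ng in ng_words:
        if mode == "delete":
            text = text.replace(ng, "")
        elif mode == "replace":
            text = text.replace(ng, "[beep]")  # illustrative placeholder word
    return text

print(correct_text("the athlete denied doping", ["doping"]))
# the athlete denied [beep]
print(correct_text("the athlete denied doping", ["doping"], mode="delete"))
```

Replacement preserves the sentence shape (no missing portion in the audio), which is the advantage the description attributes to substitution over deletion.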
  • the storage unit 103 stores information necessary for the control unit 101 to control the management server 10.
  • the storage unit 103 stores information for generating a text registration screen (see FIG. 6), a voiceprint registration screen (see FIG. 8) and a setting input screen (see FIG. 10) described later.
  • the storage unit 103 stores a database (DB) in which various types of information are registered. Specifically, text DB 113, voiceprint DB 114, first NG word DB 115, and second NG word DB 116 are stored.
  • In the text DB 113, texts registered from the text registration terminal 20 via the network NW are stored.
  • the text is registered in association with the identification information of the text registration terminal 20.
  • the determination target (first determination target, second determination target) generated by the determination target generation unit 111 is registered in association with the text of the generation source.
  • Voiceprint data registered from the voiceprint registration terminal 30 via the network NW is registered in the voiceprint DB 114.
  • voiceprint data is registered in association with identification information of the voiceprint registration terminal 30.
  • In the first NG word DB 115, basic NG words are registered, including words that are socially undesirable to use and words specified from the attribute information of the text described later.
  • For example, the basic NG words include words that insult a third party and words reminiscent of obscene expressions or antisocial statements.
  • the basic NG word includes a word that is associated with a political position when the attribute information of the text is "political”.
  • In the second NG word DB 116, individual NG words are registered, consisting of words registered from the voiceprint registration terminal 30 via the network NW.
  • Each individual NG word is registered in association with the voiceprint data registered in the voiceprint DB 114.
  • The individual NG words include words that would adversely affect the impression of the person providing the voiceprint data. For example, when the person providing the voiceprint data is a sports athlete, words such as "match-fixing" (yaocho) and "doping" are included.
  • The determination unit 104 determines whether the text registered in the text DB 113, or the first determination target or second determination target associated with the text, includes an NG word or a permissible word that is not an NG word (hereinafter referred to as an "OK word"). When determining whether the text registered in the text DB 113 contains an NG word, the determination unit 104 refers to the basic NG words registered in the first NG word DB 115. When determining whether the first determination target or the second determination target registered in the text DB 113 contains an NG word, the determination unit 104 refers to the individual NG words registered in the second NG word DB 116. In this case, the determination unit 104 generates a phonetic word (NG sound) for each individual NG word as necessary, and compares it with the first determination target and the second determination target.
  • the communication unit 105 communicates information with the text registration terminal 20, the voiceprint registration terminal 30, and the external terminal group 40 under the control of the control unit 101. For example, the communication unit 105 transmits information necessary for the text registration screen (see FIG. 6) and the voiceprint registration screen (see FIG. 8) to the text registration terminal 20 and the voiceprint registration terminal 30, respectively. On the other hand, the communication unit 105 receives text, voiceprint data and setting input information from the text registration terminal 20, the voiceprint registration terminal 30, and the external terminal group 40, respectively.
  • the input unit 106 receives an instruction for the management server 10. For example, the input unit 106 receives an instruction such as editing of the basic NG word in the first NG word DB 115.
  • the display unit 107 displays information required to operate the management server 10. For example, the display unit 107 displays the status of the management server 10 and the registration status of text and voiceprint data stored in the storage unit 103.
  • FIG. 4 is a block diagram of a portable terminal 42 that receives voice provision from the voice provision system 1 according to the present embodiment. In FIG. 4, only the components of the portable terminal 42 related to the present invention are shown.
  • the portable terminal 42 includes a control unit 421 that controls the entire terminal.
  • To the control unit 421, an application execution unit 422, an audio output unit 423, a communication unit 424, an input unit 425, and a display unit 426 are connected.
  • the configuration of the portable terminal 42 is not limited to the configuration shown in FIG. 4 and can be changed as appropriate.
  • the application execution unit 422 executes processing necessary to output voice data provided from the management server 10. For example, the application execution unit 422 generates a setting input screen (see FIG. 10) for inputting settings for audio data provided from the management server 10, and displays the setting input screen on the display unit 426. In addition, the application execution unit 422 performs confirmation (for example, confirmation as to whether or not the setting on the setting input screen matches) of the audio data received via the communication unit 424, and outputs the audio data to the audio output unit 423.
  • the audio output unit 423 outputs the audio data received from the application execution unit 422. For example, the audio output unit 423 outputs audio data corresponding to the text provided from the management server 10 from the speaker.
  • the communication unit 424 communicates information with the management server 10 via the network NW under the control of the control unit 421. For example, the communication unit 424 transmits the information input on the setting input screen described above to the management server 10. The communication unit 424 also receives voice data from the management server 10.
  • the input unit 425 receives an instruction for the portable terminal 42.
  • the input unit 425 receives an instruction to input information on the setting input screen.
  • The display unit 426 displays information necessary to operate the portable terminal 42.
  • the display unit 426 displays the status of the portable terminal 42, a setting input screen, and the like.
  • the management server 10 receives a text such as a newspaper article from the text registration terminal 20 and receives voiceprint data of a specific actor or the like from the voiceprint registration terminal 30.
  • the management server 10 receives, from the portable terminal 42, setting information in which desired text and voiceprint data are designated.
  • the management server 10 generates voice data from text using specified voiceprint data based on setting information from the portable terminal 42, and provides the voice data to the portable terminal 42.
  • the portable terminal 42 can receive and output voice data in which text such as a newspaper article is read out with the voice of the operator's favorite actor.
  • FIG. 5 is a flowchart explaining the operation at the time of text registration in the voice providing system 1 according to the present embodiment.
  • When registering text in the management server 10, an application for text registration is first made from the text registration terminal 20 (step ST501).
  • Then, information necessary for generating the text registration screen is read out from the information stored in the storage unit 103 and output to the text registration terminal 20 through the communication unit 105 (step ST502).
  • When the information necessary for the text registration screen is received, the text registration screen is displayed on the text registration terminal 20 (step ST503).
  • FIG. 6 is a view showing an example of a text registration screen 600 used by the voice providing system 1 according to the present embodiment.
  • the text registration screen 600 shown in FIG. 6 is a screen for registering a text that the operator of the text registration terminal 20 wants to provide.
  • the text registration screen 600 is provided with an attribute selection unit 601, a text input unit 602, a reset button 603, and an end button 604.
  • the attribute selection unit 601 is a part for selecting attribute information of text to be registered.
  • the attribute selection unit 601 is provided with a box (category selection box) for selecting a category such as “entertainment”, “sports”, “news”, “economy” or the like as text attribute information. By selecting these category selection boxes, the category to which the text to be registered belongs can be specified.
  • the attribute selecting unit 601 may be configured to directly input text attribute information.
  • the attribute selection unit 601 can adopt an arbitrary configuration on the premise of specifying text attribute information.
  • the text input unit 602 is a portion into which text to be registered is input.
  • the text input unit 602 is provided with a field for inputting text (text input field).
  • by entering characters and numbers in this text input field, it is possible to specify the text to be registered in the management server 10. For example, texts related to newspaper articles, traffic information, voice guidance, and advertisement information are entered in the text input field.
  • the reset button 603 is used to reset the information selected and designated on the text registration screen 600.
  • the end button 604 is used when ending the text registration process using the text registration screen 600. By selecting the end button 604, the attribute information and text selected / input via the text registration screen 600 are transmitted to the management server 10.
  • the text registration screen 600 is not limited to the example shown in FIG. 6 and can be changed as appropriate. It is preferable as an embodiment to provide the text registration screen 600 with a portion for specifying the handling of the registered text. For example, the correction method (deletion and substitution of the NG word) of the voice data when the NG text is included in the registered text in relation to the voiceprint data may be designated.
  • attribute information is selected from the attribute selection unit 601, and text is input to the text input unit 602.
  • the end button 604 of the text registration screen 600 is selected, these attribute information and text are transmitted to the management server 10 (step ST504).
  • the determination unit 104 determines whether the text includes an NG word (step ST505). At this time, the determination unit 104 refers to the basic NG word registered in the first NG word DB. As a result, it is detected whether the text contains a word or the like which is undesirable to use, as it is customary. As described above, by determining the presence or absence of the basic NG word at the text registration stage, it is possible to prevent the text including the basic NG word from being registered in the management server 10.
  • if the text contains a basic NG word (step ST505: Yes), an error message indicating as much is output to the text registration terminal 20 (step ST506). By outputting an error message in this manner, the text registrant can be notified that the text is inappropriate.
  • the text registrant who has received the error message re-inputs the text from the text registration screen 600 and transmits it to the management server 10 (step ST504).
  • when the text does not contain a basic NG word (step ST505: No), the text transmitted in step ST504 is registered in the text DB 113 (step ST507). Then, when the registration process to the text DB 113 is completed, the management server 10 notifies the text registration terminal 20 of the completion of the text registration (step ST508). Through this series of operations, a text such as a newspaper article is registered in the management server 10 (text DB 113).
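The registration-time check described above (steps ST504 to ST508) can be sketched as follows. This is a hypothetical illustration, not the patent's implementation: the word list, function name, and return strings are invented stand-ins for the first NG word DB and the error/completion notifications.

```python
# Illustrative stand-in for the first NG word DB (basic NG words).
BASIC_NG_WORDS = {"forbidden", "banned"}

def register_text(text: str, text_db: list) -> str:
    """Reject a text containing a basic NG word (ST506); otherwise register it (ST507-ST508)."""
    words = text.lower().split()
    if any(word in BASIC_NG_WORDS for word in words):
        return "error: text contains a basic NG word"
    text_db.append(text)
    return "registration completed"

db = []
print(register_text("local team wins the match", db))   # registered
print(register_text("this phrase is banned here", db))  # rejected with an error
```

A real system would tokenize with a morphological analyzer rather than whitespace splitting, but the branching mirrors the flowchart of FIG. 5.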
  • FIG. 7 is a flowchart for explaining the operation at the time of voiceprint registration in voice providing system 1 according to the present embodiment.
  • when registering voiceprint data in the management server 10, a request to register the voiceprint data is first submitted from the voiceprint registration terminal 30 (step ST701).
  • in step ST702, the information necessary for generating the voiceprint registration screen is read out from the information stored in the storage unit 103 and output to the voiceprint registration terminal 30 through the communication unit 105.
  • when the information necessary for the voiceprint registration screen is received, the voiceprint registration screen is displayed on the voiceprint registration terminal 30 (step ST703).
  • FIG. 8 is a view showing an example of a voiceprint registration screen 800 used in the voice providing system 1 according to the present embodiment.
  • the voiceprint registration screen 800 shown in FIG. 8 is a screen for registering voiceprint data that the operator of the voiceprint registration terminal 30 wants to provide.
  • the voiceprint registration screen 800 is provided with an attribute selection unit 801, an NG word category selection unit 802, an NG word selection/input unit 803, a voiceprint input unit 804, a reset button 805, and an end button 806.
  • the attribute selection unit 801 is a portion for selecting attribute information of voiceprint data to be registered (more specifically, a person of voiceprint data).
  • the attribute selection unit 801 is provided with a box (category selection box) for selecting a category such as "actor", "idol", "voice actor", or "artist" as attribute information of voiceprint data. By selecting these category selection boxes, it is possible to specify the category to which the voiceprint data to be registered belongs.
  • the attribute selection unit 801 may directly input the attribute information of the voiceprint data.
  • the attribute selection unit 801 can adopt an arbitrary configuration on the premise of designating attribute information of voiceprint data.
  • the NG word category selection unit 802 is a part that selects a category of the NG word (individual NG word).
  • the NG word category selection unit 802 is provided with, for example, a box (category selection box) for selecting a category such as “divorce”, “disaster”, “anti-society” or “advertisement”.
  • NG word candidates associated with each category are registered in advance for these category selection boxes. By selecting one of these boxes, it is possible to specify the category to which the NG word (individual NG word) associated with the voiceprint data to be registered belongs.
  • the NG word selection / input unit 803 is a portion for selecting or inputting an NG word (individual NG word) associated with voiceprint data to be registered.
  • An NG word candidate is displayed on the NG word selection / input unit 803 by selecting a category from the above-mentioned NG word category selection unit 802.
  • the voiceprint registrant can select an NG word to be associated with voiceprint data to be registered from such an NG word candidate. Also, the voiceprint registrant can directly input an NG word (individual NG word) to the NG word selection / input unit 803.
  • the voiceprint input unit 804 is a part for inputting voiceprint data (digital voiceprint data) to be registered.
  • the voiceprint input unit 804 is provided with a box (voiceprint attachment box) to which voiceprint data is attached. By attaching voiceprint data to the voiceprint attachment box, voiceprint data to be registered in the management server 10 can be designated.
  • the reset button 805 is used to reset the information selected and designated on the voiceprint registration screen 800.
  • the end button 806 is used when ending registration processing of voiceprint data using the voiceprint registration screen 800. By selecting the end button 806, the attribute information and voiceprint data selected / input via the voiceprint registration screen 800 are transmitted to the management server 10.
  • the voiceprint registration screen 800 is not limited to the example shown in FIG. 8 and can be changed as appropriate. It is preferable as an embodiment to provide the voiceprint registration screen 800 with a portion for designating the handling of the registered voiceprint data. For example, when voice data is generated using registered voiceprint data, a correction method (deletion or replacement of NG word) of voice data when an NG word is included in the text may be designated.
  • the voiceprint registration screen 800 may also have a function of inferring and displaying NG words related to the voiceprint data of a specific person.
  • for example, NG words may be inferred from the speech and behavior of the specific person over the past year (for example, utterances through media such as television and radio) and displayed on the NG word selection/input unit 803.
  • These NG words are preferably displayed according to the selection from the voiceprint registrant.
  • the attribute information is selected from the attribute selection unit 801, and the category of the NG word is selected from the NG word category selection unit 802.
  • the attribute information and the NG word category are transmitted to the management server 10 (step ST704).
  • a candidate list of NG words (NG word candidate list) is transmitted from the management server 10 to the voiceprint registration terminal 30 (step ST 705).
  • the NG word candidate list is displayed on the NG word selection / input unit 803 of the voiceprint registration screen 800.
  • the NG word candidate list may be transmitted to the voiceprint registration terminal 30 in step ST702, and may be displayed on the NG word selection / input unit 803 according to the selection of the attribute information and the category.
  • the voiceprint registrant designates an NG word (individual NG word) from the NG word selection/input unit 803, and attaches voiceprint data via the voiceprint input unit 804. Then, when the end button 806 of the voiceprint registration screen 800 is selected, the NG word and voiceprint data are transmitted to the management server 10 (step ST706).
  • the voiceprint data is registered in the voiceprint DB 114, and the NG word is registered in the second NG word DB 116 (step ST707).
  • the NG word is registered in association with the voiceprint data.
  • the management server 10 notifies the voiceprint registration terminal 30 of the completion of voiceprint registration (step ST 708).
  • text and voiceprint data for generating voice data are registered in the management server 10.
  • the management server 10 generates voice data using such text and voiceprint data, and provides the generated voice data to the portable terminal 42 and the like.
  • the management server 10 selects text and voiceprint data based on a desired setting specified by the portable terminal 42 or the like, and generates voice data based on the text and voiceprint data.
  • FIG. 9 is a flow chart for explaining the operation at the time of voice provision in voice provision system 1 according to the present embodiment.
  • an audio output application is activated in the portable terminal 42 (step ST901).
  • with this voice output application, it becomes possible to exchange information related to the voice providing system 1 with the management server 10.
  • when the voice output application is activated, a setting input screen for inputting a desired setting is displayed on the portable terminal 42 (step ST902).
  • FIG. 10 is a diagram showing an example of a setting input screen 1000 used by the voice providing system 1 according to the present embodiment.
  • the setting input screen 1000 shown in FIG. 10 is a screen for designating the voice data that the operator of the portable terminal 42 wants to receive.
  • the setting input screen 1000 is provided with a text designation unit 1001, a voiceprint designation unit 1002, a reset button 1003 and an end button 1004.
  • the text designating unit 1001 is a portion for designating a text corresponding to audio data that the operator of the portable terminal 42 wants to receive.
  • the text designation unit 1001 is provided with a box (text selection box) for selecting a text such as “entertainment”, “sports”, “news”, “economy” or the like indicating the type of text. By selecting these text selection boxes, it is possible to specify the text corresponding to the audio data provided from the management server 10.
  • the text selection box displays content including text of various genres.
  • the voiceprint designating unit 1002 is a portion for designating voiceprint data serving as a sound source of voice data that the operator of the portable terminal 42 wants to receive.
  • Voiceprint designating unit 1002 is provided with a box (category selection box) for selecting a category to which a person corresponding to voiceprint data belongs. By selecting these category selection boxes, it is possible to specify a candidate of a person corresponding to voiceprint data.
  • the voiceprint designation unit 1002 displays a plurality of persons belonging to the category. The operator can specify a person corresponding to voiceprint data by selecting a candidate displayed on the voiceprint specification unit 1002.
  • voiceprint designation section 1002 is provided with an input field where a person corresponding to voiceprint data can be directly input.
  • the reset button 1003 is used to reset information selected and specified on the setting input screen 1000.
  • the end button 1004 is used when ending the input process of the desired setting using the setting input screen 1000. By selecting the end button 1004, the text and voiceprint data selected / input via the setting input screen 1000 are transmitted to the management server 10.
  • the setting input screen 1000 is not limited to the example shown in FIG. 10, and can be changed as appropriate. It is preferable as an embodiment to provide the setting input screen 1000 with a portion for designating the handling of voice data generated from the set text and voiceprint data. For example, a method of correcting voice data (NG word deletion and replacement) may be designated when the set text and voiceprint data include an NG word.
  • setting information is transmitted to the management server 10 (step ST 903).
  • the setting information includes text selected by the operator and voiceprint data selected by the operator (more specifically, information on a person corresponding to the voiceprint data).
  • the text and voiceprint data included in the setting information are selected in the management server 10 (step ST904).
  • the management server 10 selects the text and voiceprint data included in the setting information from the text DB 113 and the voiceprint DB 114. Then, after the text and voiceprint data are selected, a process of determining whether the NG word (individual NG word) associated with the voiceprint data is included in the designated text (hereinafter referred to as the "NG determination process") is performed (step ST905).
  • FIG. 11 is a flowchart for explaining the NG determination process in the voice providing system 1 according to the present embodiment.
  • the NG determination processing is mainly executed by the generation unit 102 (the determination target generation unit 111) and the determination unit 104 in the management server 10.
  • the determination target generation unit 111 performs a second determination target generation processing (morpheme analysis processing) on the text selected in step ST904 described above (step ST1101).
  • the selected text is divided into morphemes. That is, the second determination target (see FIG. 3B) is generated from the text by the second determination target generation process.
  • the second determination target generated from the text is registered in the text DB 113 of the storage unit 103 in association with the text.
  • the determination unit 104 performs a determination process (hereinafter referred to as the "primary determination process") to determine whether the second determination target includes an NG word (individual NG word) (step ST1102).
  • the determination unit 104 reads the individual NG word associated with the voiceprint data selected in step ST 904 from the second NG word DB 116. Then, the determination unit 104 determines the NG word and the OK word in the text by comparing the individual NG word and the second determination target one by one (step ST1103). As a result, the morpheme making up the text and the NG word are compared, and the NG word included in the text is detected.
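The one-by-one comparison of morphemes against individual NG words can be sketched as below. The morphemes are pre-split for illustration (a real system would obtain them from a morphological analyzer), and the word set is an invented stand-in for the second NG word DB.

```python
def primary_determination(morphemes, individual_ng_words):
    """Classify each morpheme (second determination target) as 'NG' or 'OK'
    by exact comparison against the individual NG words (steps ST1102-ST1103)."""
    results = []
    for morpheme in morphemes:
        label = "NG" if morpheme in individual_ng_words else "OK"
        results.append((morpheme, label))
    return results

# "divorce" stands in for an individual NG word tied to some voiceprint.
print(primary_determination(["the", "divorce", "settlement"], {"divorce"}))
# → [('the', 'OK'), ('divorce', 'NG'), ('settlement', 'OK')]
```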
  • when an OK word is detected from the text (step ST1103: OK), the determination target generation unit 111 generates a phonetic word of the OK word (step ST1104).
  • the case where the OK word is detected corresponds to the case where the second determination target that does not correspond to the NG word is detected from the text.
  • the determination unit 104 generates a phonetic word of the individual NG word (hereinafter referred to as “NG sound”) (step ST1105).
  • when the phonetic word of the OK word and the NG sound have been generated, the determination unit 104 performs a determination process (hereinafter referred to as the "secondary determination process") to determine whether the NG word is included in the phonetic word of the OK word (step ST1106).
  • the determination unit 104 determines the NG word and the OK word by comparing the pronunciation word of the OK word and the NG sound one by one (step ST1107).
  • the phonetic word of the second determination target determined as the OK word in the primary determination processing is compared with the phonetic word of the NG word, and the NG word included in the text is detected.
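A minimal sketch of the secondary determination idea: morphemes judged OK are converted to phonetic words and checked against the NG sound. The `to_phonetic` table below is a toy stand-in for a real grapheme-to-phoneme step, and all words are illustrative.

```python
# Toy grapheme-to-phoneme table; a real system would use a pronunciation dictionary.
PHONETIC = {"kyou": "KYOU", "tenki": "TENKI", "desu": "DESU"}

def to_phonetic(word: str) -> str:
    return PHONETIC.get(word, word.upper())

def secondary_determination(ok_words, ng_words):
    """Return OK words whose pronunciation contains an NG sound (steps ST1104-ST1107)."""
    ng_sounds = {to_phonetic(w) for w in ng_words}
    hits = []
    for word in ok_words:
        sound = to_phonetic(word)
        if any(ng_sound in sound for ng_sound in ng_sounds):
            hits.append(word)
    return hits

# "tenki" passed the primary process, but its pronunciation contains the NG sound "TEN".
print(secondary_determination(["tenki", "desu"], ["ten"]))  # → ['tenki']
```

This shows how an NG word can be caught inside a morpheme regardless of that morpheme's meaning.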
  • the determination target generation unit 111 performs the first determination target generation process on the text selected in step ST 904 (step ST 1108).
  • the first determination target generation process a phonetic word of the selected text is generated, and a determination target in which the phonetic word is divided at an arbitrary position is generated. That is, the first determination target (see FIG. 3A) is generated from the text by the first determination target generation process.
  • the first determination target generated from the text is registered in the text DB 113 of the storage unit 103 in association with the text.
  • the determination unit 104 performs determination processing (hereinafter, referred to as “third determination processing”) to determine whether the first determination target includes an NG sound (step ST1109).
  • the determination unit 104 compares, one by one, each arbitrary combination of the phonetic words of the text serving as the first determination target against the NG sound of the individual NG word registered in the second NG word DB 116, and thereby determines the NG word (step ST1110).
  • in this way, the NG sound is compared with arbitrary combinations of the phonetic words of the text, and NG words that were not detected in the primary and secondary determination processes are detected. For example, in the example shown in FIG.
  • if an NG word is not detected in the tertiary determination process (step ST1110: No), the determination unit 104 selects, as the determination result of the NG determination process, a determination indicating that the text does not include the NG word (OK determination) (step ST1111).
  • on the other hand, when an NG word is detected in the tertiary determination process (step ST1110: Yes), or when an NG word is determined in the secondary determination process (step ST1107: NG), the determination unit 104 records the location of the NG word in the text (step ST1112). Then, the determination unit 104 selects the determination indicating that the text includes the NG word (NG determination) as the determination result of the NG determination process (step ST1113).
  • the determination unit 104 then ends the NG determination process. Through this NG determination process, it is determined whether the NG word (individual NG word) associated with the voiceprint data selected in step ST904 is included in the selected text.
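Because the first determination target consists of the phonetic text divided at arbitrary positions, comparing all such divisions against the NG sound amounts to a substring search over the phonetic string. A minimal sketch, with invented phonetic strings:

```python
def tertiary_determination(text_phonetic: str, ng_sound: str):
    """Compare every fixed-length division of the phonetic text against the NG sound
    (steps ST1108-ST1110); return (found, position) so the location can be
    recorded as in step ST1112."""
    for start in range(len(text_phonetic) - len(ng_sound) + 1):
        candidate = text_phonetic[start:start + len(ng_sound)]
        if candidate == ng_sound:
            return True, start
    return False, -1

# "RIKON" (an illustrative NG sound) straddles two morphemes in this phonetic string,
# so it would be missed by morpheme-wise comparison but is caught here.
print(tertiary_determination("KANOJOHARIKONSHITA", "RIKON"))  # → (True, 8)
print(tertiary_determination("KYOUWAYOITENKIDESU", "RIKON"))  # → (False, -1)
```

This is why the determination works regardless of the context of the text: any combination of characters constituting the text is examined.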
  • in this way, the NG word is determined by comparing the pronunciation corresponding to the first determination target, generated by dividing the text at an arbitrary location, with the pronunciation (NG sound) corresponding to the individual NG word. For this reason, the pronunciation of any combination of the characters and numbers constituting the text can be compared with the NG sound. Thereby, regardless of the context of the text, the NG word contained in the text can be accurately detected.
  • in addition, the NG word and the OK word included in the text are determined based on the morphemes (second determination target) constituting the text (primary determination process).
  • the NG word included as a morpheme in the text can be reliably detected by comparison with the second determination target.
  • furthermore, since the NG word included in the text can be determined stepwise, detection omissions of the NG word can be reduced.
  • further, the NG word is determined by comparing the pronunciation corresponding to the second determination target determined as the OK word in the primary determination process with the pronunciation corresponding to the NG word (secondary determination process). Therefore, regardless of the meaning of the morpheme determined to be the OK word, the NG word included in that morpheme can be detected.
  • the determination unit 104 determines whether the determination result of the NG determination process is an OK determination or an NG determination as shown in FIG. 9 (step ST 906).
  • the audio generation unit 112 generates audio data (step ST 907).
  • the voice generation unit 112 generates voice data corresponding to the text using the voiceprint data selected in step ST904 without performing processing such as correction on the text selected in step ST904. Then, the generated voice data is output from the management server 10 to the portable terminal 42 (step ST 908).
  • if the determination result of the NG determination process is an NG determination (step ST906: NG), the voice generation unit 112 generates voice data (corrected voice data) in which a portion of the text is corrected (step ST909). In this case, the voice generation unit 112 generates voice data corresponding to the text obtained by correcting the part corresponding to the NG word in the text selected in step ST904. For the portions of the text other than the part corresponding to the NG word, voice data is generated using the voiceprint data selected in step ST904.
  • the speech generation unit 112 can generate speech data in which the part corresponding to the NG word in the text is deleted. Further, the voice generation unit 112 can generate voice data in which a portion corresponding to the NG word in the text is replaced. As an aspect of replacing the NG word, the voice generation unit 112 can generate voice data using voiceprint data different from the voiceprint data selected in step ST904, for example. For example, voice data can be generated using predetermined voiceprint data only for the part corresponding to the NG word. Further, the voice generation unit 112 can generate voice data in which a portion corresponding to the NG word in the text is replaced with a word of another expression. In addition, it is preferable to select the mode of the correction with respect to the part corresponding to NG word in consideration of the intention of a text registrant or a voiceprint registrant.
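The deletion and replacement modes of correction can be sketched with plain string operations; a real implementation would act on the NG-word locations recorded in step ST1112 and on the generated voice data itself. The function name, texts, and replacement word below are illustrative.

```python
def correct_text(text: str, ng_word: str, mode: str = "delete",
                 replacement: str = "***") -> str:
    """Correct the part of the text corresponding to the NG word (step ST909):
    either delete it or replace it with a word of another expression."""
    if mode == "delete":
        return text.replace(ng_word, "")
    if mode == "replace":
        return text.replace(ng_word, replacement)
    raise ValueError(f"unknown correction mode: {mode}")

print(correct_text("news about the divorce case", "divorce"))  # deletion mode
print(correct_text("news about the divorce case", "divorce",
                   "replace", "separation"))                    # replacement mode
```

Which mode to use would, as the text notes, be chosen in consideration of the intention of the text registrant or voiceprint registrant.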
  • the voice data (corrected voice data) generated by correcting the part corresponding to the NG word is output from the management server 10 to the portable terminal 42 (step ST 910).
  • the portable terminal 42 outputs audio via a speaker or the like (step ST 911). By this voice output, the operation at the time of voice provision in the voice provision system 1 ends.
  • as described above, the voice generation unit 112 of the management server 10 generates voice data using specific voiceprint data from the text registered in the storage unit 103 according to the determination result of the determination unit 104. Therefore, the generated voice data can be switched according to the presence or absence of the NG word included in the text. As a result, it is possible to provide the portable terminal 42 with voice data of different modes according to the presence or absence of the NG word included in the text.
  • when the text registered in the storage unit 103 does not include the NG word, the voice generation unit 112 generates voice data corresponding to the text. As a result, voice data corresponding to a text not including the NG word is generated without any special correction. Therefore, voice data corresponding to the text can be quickly provided to the portable terminal 42.
  • when an NG word is included in the text registered in the storage unit 103, the voice generation unit 112 generates voice data corresponding to the text obtained by correcting the part corresponding to the NG word. As a result, even in the case of a text including an NG word, voice data in which the NG word portion is corrected can be provided to the portable terminal 42.
  • the voice generation unit 112 can delete or replace a portion corresponding to the NG word included in the text.
  • when the part corresponding to the NG word is deleted, even for a text including the NG word, it is possible to provide the portable terminal 42 with voice data from which the NG word contained in the text has been reliably removed.
  • when the part corresponding to the NG word is replaced, it is possible to prevent voice data including the NG word from being provided as-is to the portable terminal 42 using the specific voiceprint data.
  • for example, voiceprint data different from the specific voiceprint data can be used for that part. In this case, voice data according to the text can be provided to the portable terminal 42 without changing the meaning of the text.
  • alternatively, the part can be replaced with a word of a different expression. In this case, voice data according to the text can be provided to the portable terminal 42 without significantly changing the meaning of the text.
  • in the storage unit 103 (more specifically, the second NG word DB 116) of the management server 10, the NG word is registered in association with the voiceprint data registered from the voiceprint registration terminal 30. For this reason, the determination unit 104 determines whether the NG word (individual NG word) associated with the specific voiceprint data is included in the text. This makes it possible to reliably prevent voice data including the NG word associated with the specific voiceprint data from being provided to the portable terminal 42.
  • the determination unit 104 of the management server 10 determines the presence or absence of the NG word (general NG word) included in the text at the time of text registration from the text registration terminal 20. This makes it possible to prevent a text including the NG word from being registered at the registration phase of the text from the text registration terminal 20.
  • the determination unit 104 determines the presence or absence of the NG word (general NG word) associated with the attribute information of the text. Thereby, it is possible to prevent the registration of the text including the NG word specified from the attribute information in the registration phase of the text from the text registration terminal 20.
  • in the above embodiment, the case has been described in which the determination unit 104 compares, one by one, any combination of the phonetic words of the text serving as the first determination target against the NG sound of the NG word (individual NG word) registered in the second NG word DB 116.
  • the comparison method in the tertiary determination process is not limited to this, and can be changed as appropriate.
  • the determination unit 104 may determine the NG word based on a partial match between the pronunciation corresponding to the first determination target and the pronunciation corresponding to the NG word.
  • the ratio regarding partial agreement with the pronunciation corresponding to the NG word may be determined in advance, or may be determined by machine learning based on the results or Bayesian statistics.
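One hypothetical way to realize such partial-match determination is a similarity ratio between the candidate pronunciation and the NG sound. Here Python's `difflib.SequenceMatcher` stands in for the similarity measure, and the 0.8 threshold is an assumed value (as noted above, the ratio may instead be predetermined or learned from results).

```python
from difflib import SequenceMatcher

def partial_match_ng(candidate_sound: str, ng_sound: str,
                     threshold: float = 0.8) -> bool:
    """Judge a pronunciation as NG when its similarity to the NG sound
    reaches the predetermined ratio (an assumed partial-match criterion)."""
    ratio = SequenceMatcher(None, candidate_sound, ng_sound).ratio()
    return ratio >= threshold

print(partial_match_ng("RIKON", "RIKON"))  # exact match
print(partial_match_ng("RIKOM", "RIKON"))  # near match (ratio 0.8)
print(partial_match_ng("TENKI", "RIKON"))  # dissimilar
```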
  • in the above embodiment, the case has been described in which the determination unit 104 determines the NG word by comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the NG word (individual NG word).
  • the determination method by the determination unit 104 is not limited to this, and can be changed as appropriate.
  • the NG word may be determined by comparing the character string forming the first determination target with the character string forming the NG word.
  • the NG word is determined by comparing the character string forming the first determination target generated by dividing the text at an arbitrary position and the character string forming the NG word. For this reason, it is possible to compare an NG word with any combination of characters and numbers constituting the text. Thereby, regardless of the context of the text, it is possible to accurately detect the NG word contained in the text.
  • management server 10 generates voice data using specific voiceprint data from the text registered in storage unit 103 and provides the generated voice data to portable terminal 42 etc.
  • the information provided to the portable terminal 42 or the like is not limited to only voice data, and can be added as appropriate.
  • the text used for the generation may be provided together.
  • image data, moving image data or computer graphics (CG) may be provided. In this case, it is preferable as an embodiment to provide image data and moving image data related to audio data.
  • the voice generation unit 112 can replace the portion corresponding to the NG word.
  • Such partial substitution of text can also be applied to parts of text other than NG words.
  • a specific word included in the text may be replaced with a different prepared word.
  • that is, when a text includes a specific word, the voice generation unit 112 can replace that word with a different word prepared in advance.
  • hereinafter, a word to be replaced is referred to as a "replacement target word", and a word with which the replacement target word is replaced is referred to as a "replacement word".
  • for example, the determination unit 104 determines whether the text "The weather is good today (KYOUWAYOITENKIDESU)" includes the replacement target word "is (DESU)" (replacement determination process).
  • This replacement determination process is replaced with, for example, the NG determination process of step ST 905 shown in FIG.
  • the determination target in this substitution determination process is the first determination target described above (a determination target obtained by converting a text into a phonetic word and divided into arbitrary parts) or a second determination target (a determination target obtained by dividing text into morphemes) be able to.
  • the determination unit 104 determines whether the second determination target and / or the first determination target includes a replacement target word.
  • when the replacement target word is detected, the voice generation unit 112 generates voice data (replacement voice data) in which the replacement target word is replaced with the replacement word "Nyan (NYAN)".
  • in this case, voice data corresponding to the text "KYOUWAYOITENKINYAN" is generated as the replacement voice data.
  • the generated substitute voice data is transmitted from the management server 10 to the portable terminal 42.
  • the substitute audio data is output as audio by a speaker or the like.
  • voice data of "The weather is good today, nyan (KYOUWAYOITENKINYAN)" is output from the portable terminal 42.
  • Audio data can be provided to the portable terminal 42 in accordance with the way of speaking of the character based on the text. Thereby, for example, it is possible to provide an audio providing service for reading newspaper articles and the like by a specific character.
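As an illustrative sketch (not the patent's implementation; the romanized phonetic strings and the helper name are hypothetical), the replacement described above amounts to substituting the replacement word for every occurrence of the replacement target word in the phonetic text before synthesis:

```python
# Hypothetical sketch of the replacement step performed before speech
# synthesis; romanized phonetic strings stand in for the hiragana text.

def replace_target_words(phonetic_text: str, replacements: dict) -> str:
    """Substitute each replacement word for its replacement target word."""
    for target, replacement in replacements.items():
        phonetic_text = phonetic_text.replace(target, replacement)
    return phonetic_text

# "Today is good weather" with the sentence-ending "desu" swapped for the
# character-style ending "nyan", as in the example above.
print(replace_target_words("kyouwayoitenkidesu", {"desu": "nyan"}))
# kyouwayoitenkinyan
```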
• The above description mainly concerns the determination of NG words for text written in Japanese.
• However, the target of NG word determination is not limited to Japanese; the technique can be applied to any language used worldwide.
• The presence or absence of an NG word may also be determined across multiple languages. For example, when the pronunciation corresponding to English text matches or is similar to the pronunciation corresponding to a Japanese NG word, that part of the English text can be determined to be an NG word.
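A cross-language check of this kind could be sketched as follows. This is illustrative only: the patent does not specify a similarity measure, so `difflib.SequenceMatcher` and the 0.8 threshold are assumptions, and the romanizations and the NG word are hypothetical.

```python
from difflib import SequenceMatcher

def contains_similar_ng_word(pronunciation: str, ng_pronunciations: list,
                             threshold: float = 0.8) -> bool:
    """Return True if any contiguous fragment of the romanized pronunciation
    is sufficiently similar to a romanized NG-word pronunciation."""
    n = len(pronunciation)
    for i in range(n):
        for j in range(i + 1, n + 1):
            fragment = pronunciation[i:j]
            for ng in ng_pronunciations:
                if SequenceMatcher(None, fragment, ng).ratio() >= threshold:
                    return True
    return False

# A romanized pronunciation that happens to contain a string close to the
# (hypothetical) Japanese NG word "baka".
print(contains_similar_ng_word("aibakara", ["baka"]))  # True
```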
• According to the determination device of the present invention and the voice provision system using the same, non-permissive words contained in text can be accurately detected regardless of the context of the text. The invention is particularly suited to a voice provision service that reads text aloud using specific voiceprint data.

Abstract

The purpose of the present invention is to accurately detect non-permissive words included in text regardless of the context of the text. A management server (10) that determines non-permissive words (NG words) included in text composed in natural language includes: a determination-target generation unit (111) that generates a first determination target by dividing the text at arbitrarily defined positions; and a determination unit (104) that determines the non-permissive word (NG word) by comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word (NG word).

Description

Determination device and voice provision system provided therewith
 The present invention relates to a determination device that determines non-permissive words included in text composed in natural language, and to a voice provision system including the same.
 BACKGROUND ART: In recent years, speech synthesis technology that reads text aloud has come into wide use for traffic information broadcasts, exhibition guidance audio in art galleries and museums, car navigation systems, and the like. Because such technology reads out text composed in natural language in a synthesized voice, it can conceal the actual user's voice, raising concern about misuse in crimes such as harassment and intimidation. Techniques aimed at preventing such criminal misuse of speech synthesis are known (see, for example, Patent Document 1).
 The speech synthesizer described in Patent Document 1 includes an inappropriate-word dictionary in which inappropriate words and inappropriate predicate patterns are registered, and determines the degree to which the text to be read aloud contains inappropriate portions. Depending on that degree, it can synthesize an audio watermark into the speech or register the text with a server serving as an external storage terminal. This prevents misuse in crime and the like without adverse effects, such as the insertion of voice-degrading data, on appropriate text.
JP 2007-156169 A
 However, the speech synthesizer of Patent Document 1 analyzes the morphemes, dependencies, and meaning of the text to be read aloud and judges the degree of inappropriateness from the analysis results. It therefore cannot detect inappropriate expressions that are present in the text irrespective of the text's context.
 The present invention has been made in view of this problem, and its object is to provide a determination device capable of accurately detecting non-permissive words contained in text regardless of the context of the text, and a voice provision system including the same.
 A determination device according to the present invention determines non-permissive words included in text composed in natural language, and comprises: a determination-target generation unit that generates a first determination target by dividing the text at arbitrary positions; and a determination unit that determines the non-permissive word by comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word.
 With this configuration, the non-permissive word is determined by comparing the pronunciation corresponding to the first determination target, which is generated by dividing the text at arbitrary positions, with the pronunciation corresponding to the non-permissive word. The pronunciation of any combination of the characters and numerals making up the text can therefore be compared with the pronunciation of the non-permissive word, so non-permissive words contained in the text can be detected accurately regardless of the text's context.
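The comparison this configuration describes can be sketched minimally as follows (an illustrative Python sketch, not the claimed implementation, assuming the text has already been converted to a romanized phonetic string; the NG words are hypothetical). Because every first determination target is a contiguous division of the phonetic text, checking every contiguous fragment against the NG-word pronunciations covers all such divisions:

```python
def pronunciation_fragments(phonetic_text: str):
    """Yield every contiguous fragment of the phonetic text, i.e. the
    pronunciation of every possible first-determination-target segment."""
    n = len(phonetic_text)
    for i in range(n):
        for j in range(i + 1, n + 1):
            yield phonetic_text[i:j]

def find_ng_words(phonetic_text: str, ng_words: set) -> set:
    """Return the NG words whose pronunciation occurs anywhere in the text."""
    return {f for f in pronunciation_fragments(phonetic_text) if f in ng_words}

# "wayoi" spans the boundary between the words "wa" and "yoi", so a
# word-by-word check would miss it, but the fragment check finds it.
print(find_ng_words("kyouwayoitenkidesu", {"wayoi", "banana"}))  # {'wayoi'}
```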
 In particular, in the above determination device, the determination unit preferably determines the non-permissive word based on a partial match between the pronunciation corresponding to the first determination target and the pronunciation corresponding to the non-permissive word. With this configuration, words that only partially match a non-permissive word can also be detected, rather than only cases where the pronunciations of the first determination target and the non-permissive word match completely. Words merely similar to the non-permissive words in the text can thus be detected as well.
 Further, in the above determination device, the determination-target generation unit preferably divides the text into morphemes to generate a second determination target, and before comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word, the determination unit determines non-permissive words and permissive words (words that are not non-permissive) by comparing the second determination target with the non-permissive words. With this configuration, prior to the comparison involving the first determination target, non-permissive and permissive words contained in the text can be determined based on the morphemes making up the text, so non-permissive words that appear in the text as morphemes can be reliably detected. Moreover, because non-permissive words in the text are determined in stages, missed detections can be reduced.
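The staged determination could look roughly like the sketch below (not the claimed implementation; morpheme division itself would come from a morphological analyzer, so the morpheme list is given directly here and all strings are romanized):

```python
def staged_ng_check(morphemes: list, ng_words: set) -> set:
    """Stage 1: compare whole morphemes (the second determination target).
    Stage 2: compare every contiguous fragment of the joined phonetic text
    (the first determination target)."""
    found = {m for m in morphemes if m in ng_words}
    joined = "".join(morphemes)
    n = len(joined)
    found |= {joined[i:j] for i in range(n) for j in range(i + 1, n + 1)
              if joined[i:j] in ng_words}
    return found

# "desu" is caught as a whole morpheme in stage 1; "ayo", which spans the
# boundary between "wa" and "yoi", is only caught in stage 2.
found = staged_ng_check(["kyou", "wa", "yoi", "tenki", "desu"], {"ayo", "desu"})
print(sorted(found))  # ['ayo', 'desu']
```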
 Furthermore, in the above determination device, before comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word, the determination unit preferably determines the non-permissive word by comparing the pronunciation corresponding to a second determination target judged to be a permissive word with the pronunciation corresponding to the non-permissive word. With this configuration, non-permissive words contained within a morpheme can be detected regardless of the meaning of the morpheme that was judged permissive.
 A determination device according to another aspect of the present invention determines non-permissive words included in text composed in natural language, and comprises: a determination-target generation unit that generates a first determination target by dividing the text at arbitrary positions; and a determination unit that determines the non-permissive word by comparing the character string forming the first determination target with the character string forming the non-permissive word.
 With this configuration, the non-permissive word is determined by comparing the character string forming the first determination target, generated by dividing the text at arbitrary positions, with the character string forming the non-permissive word. The character string of any combination of the characters and numerals making up the text can therefore be compared with the character string of the non-permissive word, so non-permissive words contained in the text can be detected accurately regardless of the text's context.
 A voice provision system according to the present invention includes any of the determination devices described above and provides voice corresponding to the text based on specific voiceprint data, wherein the determination device includes a voice generation unit that generates voice data from the text using the specific voiceprint data according to the determination result of the determination unit.
 With this configuration, voice data using the specific voiceprint data is generated from the text according to the determination result of the determination unit of the determination device. The generated voice data can therefore be switched according to whether the text contains non-permissive words, making it possible to provide voice data in different forms depending on the presence or absence of non-permissive words in the text.
 For example, in the above voice provision system, when the text contains no non-permissive word, the voice generation unit generates voice data corresponding to the text as is. With this configuration, voice data corresponding to text free of non-permissive words is generated without any special correction by the voice generation unit, so voice data corresponding to the text can be provided quickly.
 On the other hand, in the above voice provision system, when the text contains a non-permissive word, the voice generation unit generates voice data corresponding to the text with the portion corresponding to the non-permissive word corrected. With this configuration, even for text containing a non-permissive word, voice data in which that portion has been corrected can be provided.
 For example, in the above voice provision system, the voice generation unit may delete the portion corresponding to the non-permissive word contained in the text. With this configuration, voice data from which the portion corresponding to the non-permissive word has been deleted is generated, so even for text containing a non-permissive word, voice data from which the non-permissive word has been reliably removed can be provided.
 Alternatively, in the above voice provision system, the voice generation unit may replace the portion corresponding to the non-permissive word contained in the text. With this configuration, voice data in which that portion has been replaced is generated, which prevents voice data containing a non-permissive word from being provided as is in the specific voiceprint. Moreover, because the portion corresponding to the non-permissive word is replaced rather than deleted, no part of the text is lost.
 For example, in the above voice provision system, the voice generation unit may use voiceprint data different from the specific voiceprint data for the portion corresponding to the non-permissive word contained in the text. With this configuration, voice data is generated using a different voiceprint for the portion corresponding to the non-permissive word, so even for text containing an NG word, corresponding voice data can be provided without changing the meaning of the text.
 Alternatively, in the above voice provision system, the voice generation unit may replace the portion corresponding to the non-permissive word contained in the text with a word of different expression. With this configuration, voice data in which the non-permissive word has been replaced with a differently expressed word is generated, so even for text containing a non-permissive word, corresponding voice data can be provided without greatly changing the meaning of the text.
 In the above voice provision system, the determination device includes a storage unit that stores the specific voiceprint data, and the storage unit stores non-permissive words associated with the specific voiceprint data. With this configuration, the determination unit determines whether the text contains a non-permissive word associated with the specific voiceprint data, which prevents the provision of voice data containing non-permissive words tied to that voiceprint.
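Per-voiceprint NG-word storage might be organized as in the following sketch (the identifiers and example words are hypothetical, not from the patent):

```python
# Hypothetical per-voiceprint NG-word storage: each registered voiceprint
# carries its own disallowed words on top of a globally shared set.
GLOBAL_NG_WORDS = {"badword"}
VOICEPRINT_NG_WORDS = {
    "voice_actor_001": {"rivalbrand"},  # e.g. a brand this talent may not voice
    "athlete_007": {"rivalteam"},
}

def ng_words_for(voiceprint_id: str) -> set:
    """NG words to check for text read with the given voiceprint."""
    return GLOBAL_NG_WORDS | VOICEPRINT_NG_WORDS.get(voiceprint_id, set())

print(sorted(ng_words_for("voice_actor_001")))  # ['badword', 'rivalbrand']
```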
 A determination method according to the present invention determines non-permissive words included in text composed in natural language, and comprises: a step of generating a first determination target by dividing the text at arbitrary positions; and a step of determining the non-permissive word by comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word.
 According to this method, the non-permissive word is determined by comparing the pronunciation corresponding to the first determination target, generated by dividing the text at arbitrary positions, with the pronunciation corresponding to the non-permissive word. The pronunciation of any combination of the characters and numerals making up the text can therefore be compared with the pronunciation of the non-permissive word, so non-permissive words contained in the text can be detected accurately regardless of the text's context.
 According to the present invention, non-permissive words contained in text can be accurately detected regardless of the context of the text.
FIG. 1 is an explanatory diagram outlining the voice provision system according to the present embodiment.
FIG. 2 is a block diagram of the management server of the voice provision system according to the present embodiment.
FIG. 3 shows an example of the determination targets generated by the determination-target generation unit of the management server according to the present embodiment.
FIG. 4 is a block diagram of a portable terminal that receives voice from the voice provision system according to the present embodiment.
FIG. 5 is a flowchart for explaining the operation of the voice provision system according to the present embodiment at the time of text registration.
FIG. 6 shows an example of the text registration screen used in the voice provision system according to the present embodiment.
FIG. 7 is a flowchart for explaining the operation of the voice provision system according to the present embodiment at the time of voiceprint registration.
FIG. 8 shows an example of the voiceprint registration screen used in the voice provision system according to the present embodiment.
FIG. 9 is a flowchart for explaining the operation of the voice provision system according to the present embodiment at the time of voice provision.
FIG. 10 shows an example of the setting input screen used in the voice provision system according to the present embodiment.
FIG. 11 is a flowchart for explaining the NG determination processing in the voice provision system according to the present embodiment.
 Hereinafter, a voice provision system according to an embodiment of the present invention will be described in detail with reference to the attached drawings. The voice provision system according to the present invention is not limited to the following embodiment and may be modified as appropriate within the scope of the invention.
 FIG. 1 is an explanatory diagram outlining the voice provision system according to the present embodiment. As shown in FIG. 1, the voice provision system 1 includes a management server 10, which constitutes an example of the determination device. The management server 10 stores text and voiceprint data (digital voiceprint data) provided by a text registration terminal 20 and a voiceprint registration terminal 30 connected via a network NW such as the Internet, and provides voice data generated from the text and voiceprint data to a group of external terminals 40 connected via the network NW.
 FIG. 1 illustrates the case where the voice provision system 1 receives text and voiceprint data via the network NW and provides voice data generated from them to the external terminal group 40. However, the environment in which the voice provision system 1 according to the present invention is applied is not limited to this and may be changed as appropriate. For example, text and voiceprint data may be registered directly with the management server 10, and voice data may be provided not only to the external terminal group 40 but also to terminals (such as voice output terminals) connected directly to the management server 10.
 The management server 10 is installed at, for example, a company that offers a voice provision service using the voice provision system 1 according to the present embodiment. The management server 10 is configured as, for example, a personal computer (PC) with ordinary capabilities and functions as a web server. For example, the management server 10 serves the text registration screen (see FIG. 6), the voiceprint registration screen (see FIG. 8), and the setting input screen (see FIG. 10), all described later, to the text registration terminal 20, the voiceprint registration terminal 30, and the external terminal group 40 over the network NW.
 The text registration terminal 20 is installed at a company or the like that registers text to serve as the script for the voice data provided by the voice provision system 1. The text registration terminal 20 is configured as, for example, a PC with ordinary capabilities and has a web browser function. For example, it may be placed with a manufacturer that wishes to provide advertisements as voice data, or with a newspaper company or television station that wishes to provide news. It may also be placed with a service provider that wishes to provide information (for example, voice guidance) from arbitrary household appliances, such as televisions and refrigerators, in an environment where those appliances are connected to the Internet. The text registration terminal 20 includes the components required of a text input terminal (an input unit, a display unit, a communication unit, and so on). The text registered from the text registration terminal 20 is composed in natural language.
 The voiceprint registration terminal 30 is installed at a company or the like that registers the voiceprint data (digital voiceprint data) serving as the sound source for the voice data provided by the voice provision system 1. The voiceprint registration terminal 30 is configured as, for example, a PC with ordinary capabilities and has a web browser function. For example, it may be placed at a talent agency to which actors and voice actors belong, or at a management office that manages athletes and the like. The voiceprint registration terminal 30 includes the components required of a voiceprint input terminal (an input unit, a display unit, a communication unit, and so on).
 Here, the voiceprint data registered from the voiceprint registration terminal 30 will be described. The voiceprint data includes, for example, recorded fragments of a specific person's speech, and acoustic and prosodic parameters, such as spectrum and fundamental frequency, obtained by analyzing that person's speech. The voiceprint data is not limited to these and includes any data required by the speech synthesis techniques (for example, concatenative speech synthesis or formant synthesis) of the voice generation unit 112 described later.
 The external terminal group 40 consists of, for example, arbitrary terminals (devices) with a network connection function. FIG. 1 illustrates, as the external terminal group 40, a car navigation system (hereinafter "car navigation unit") 41, a portable terminal 42 such as a smartphone, and a home appliance 43 such as a refrigerator. In addition to their terminal-specific functions, the car navigation unit 41, the portable terminal 42, and the home appliance 43 making up the external terminal group 40 each have an audio output function for outputting the voice data provided by the management server 10.
 FIG. 2 is a block diagram of the management server 10 of the voice provision system 1 according to the present embodiment; only the components of the management server 10 relevant to the present invention are shown. As shown in FIG. 2, the management server 10 has a control unit 101 that controls the management server 10 as a whole. A generation unit 102, a storage unit 103, a determination unit 104, a communication unit 105, an input unit 106, and a display unit 107 are connected to the control unit 101. The configuration of the management server 10 is not limited to this and may be changed as appropriate.
 The generation unit 102 has a determination-target generation unit 111 and a voice generation unit 112. The determination-target generation unit 111 generates the targets (determination targets) used to determine whether the text stored in the storage unit 103 contains non-permissive words (hereinafter "NG words"). For example, the determination-target generation unit 111 generates a first determination target by dividing the text stored in the storage unit 103 at arbitrary positions, and generates a second determination target by dividing that text into morphemes. The determination-target generation unit 111 also converts the text stored in the storage unit 103, or part of the second determination target, into phonetic words (for example, hiragana or prosody).
 Here, the first and second determination targets generated by the determination-target generation unit 111 are shown concretely. FIGS. 3A and 3B show examples of the first and second determination targets, respectively. FIG. 3A shows the case where the first determination target is generated from the Japanese text "Today is good weather (KYOUWAYOITENKIDESU)" and the English text "It is good weather today", and FIG. 3B shows the case where the second determination target is generated from the same texts. For the English text, the first and second determination targets are indicated by phonetic symbols. Below, the first and second determination targets are explained using the Japanese text.
 As shown in FIG. 3A, the first determination target is generated by converting the text "Today is good weather (KYOUWAYOITENKIDESU)" into phonetic words and dividing the result at arbitrary positions. For example, the text is divided into "kyo-u-wa-yo-i-te-n-ki-de-su (KYO-U-WA-YO-I-TE-N-KI-DE-SU)" or "kyou-wayo-ite-nki-desu (KYOU-WAYO-ITE-NKI-DESU)" to form first determination targets. The first determination targets include every combination obtained by dividing the text into segments of arbitrary length (in numbers of phonetic characters) while preserving the order of the hiragana making up the text.
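The divisions illustrated above can be enumerated as in the sketch below (illustrative; romanized characters stand in for hiragana). There are 2**(n-1) ways to divide an n-character phonetic string, but every segment of every division is a contiguous fragment of the string, so NG-word matching only ever needs the fragments themselves:

```python
def all_segmentations(phonetic_text: str):
    """Yield every ordered division of the phonetic text into contiguous
    parts (2**(n-1) divisions for an n-character string)."""
    if not phonetic_text:
        yield []
        return
    for i in range(1, len(phonetic_text) + 1):
        for rest in all_segmentations(phonetic_text[i:]):
            yield [phonetic_text[:i]] + rest

# The three characters of "yoi" admit 2**2 = 4 divisions.
for seg in all_segmentations("yoi"):
    print(seg)
```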
 なお、第1判定対象を生成する際、テキストに複数の発音語が含まれる場合には、全ての発音語を含む第1判定対象が生成される。例えば、図3Aに示すテキスト内の「良い(YOI)」には、「よい(YOI)」及び「いい(II)」の発音語が存在する。このため、このテキストから生成される第1判定対象には、「きょうはよいてんきです(KYOUWAYOITENKIDESU)」を任意箇所で区切ったものと、「きょうはいいてんきです(KYOUWAIITENKIDESU)」を任意箇所で区切ったものとが含まれる。 When generating a first determination target, if a plurality of phonetic words are included in the text, a first determination target including all the phonetic words is generated. For example, in the text "good (YOI)" shown in FIG. 3A, the pronunciation words "good (YOI)" and "good (II)" exist. Therefore, for the first judgment target generated from this text, "Kyou wa good day (KYOWAYOITENKIDESU)" is divided at any place, and "Kyouha is good day (KYOWA II TENKIDESU)" at any place Included.
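 The exhaustive division described above can be sketched in a few lines of code. This is only an illustrative sketch, not part of the patented embodiment: the function and variable names (`segmentations`, `first_targets`, the romanized unit lists) are assumptions, and romanized mora stand in for the hiragana units.

```python
def segmentations(units):
    """Enumerate every way to group the ordered list of phonetic
    units into contiguous runs (order preserved). For n units there
    are 2**(n-1) divisions, one per subset of the n-1 boundaries."""
    n = len(units)
    results = []
    for mask in range(2 ** (n - 1)):
        parts, current = [], units[0]
        for i in range(1, n):
            if mask & (1 << (i - 1)):  # boundary before unit i is a cut
                parts.append(current)
                current = units[i]
            else:
                current += units[i]
        parts.append(current)
        results.append(parts)
    return results

def first_targets(reading_variants):
    """Build first determination targets from every pronunciation of
    the text (e.g. both the YOI and II readings of the same word)."""
    targets = []
    for units in reading_variants:
        targets.extend(segmentations(units))
    return targets

# Romanized phonetic units of "KYOUWAYOITENKIDESU" (two readings):
yoi = ["KYO", "U", "WA", "YO", "I", "TE", "N", "KI", "DE", "SU"]
ii = ["KYO", "U", "WA", "I", "I", "TE", "N", "KI", "DE", "SU"]
targets = first_targets([yoi, ii])
print(len(targets))  # 2 * 2**9 = 1024 candidate divisions
```

Both divisions quoted above ("KYO-U-WA-YO-I-TE-N-KI-DE-SU" and "KYOU-WAYO-ITE-NKI-DESU") appear among the generated candidates.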
 On the other hand, as shown in FIG. 3B, the second determination target is generated by dividing the text 「今日は良い天気です」 ("KYOUWAYOITENKIDESU") into morphemes. For example, the text is divided into 「今日-は-良い-天気-です」 ("KYOU-WA-YOI-TENKI-DESU") to form the second determination target. The determination targets (first and second) generated by the determination target generation unit 111 are registered in a text database (DB) 113 in the storage unit 103, described later, in association with the source text.
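 In practice the morpheme division of FIG. 3B would be performed by a full morphological analyzer; as a rough illustration only, a greedy longest-match split against a hypothetical dictionary can reproduce it for this one sentence. The dictionary contents and function name are assumptions, not part of the embodiment.

```python
def morpheme_split(text, dictionary):
    """Greedy longest-match division of `text` into dictionary
    morphemes -- a toy stand-in for a real morphological analyzer."""
    morphemes, pos = [], 0
    while pos < len(text):
        for length in range(len(text) - pos, 0, -1):
            if text[pos:pos + length] in dictionary:
                morphemes.append(text[pos:pos + length])
                pos += length
                break
        else:
            raise ValueError(f"no dictionary entry at position {pos}")
    return morphemes

# Hypothetical dictionary covering only the example sentence:
dictionary = {"KYOU", "WA", "YOI", "TENKI", "DESU"}
second_target = morpheme_split("KYOUWAYOITENKIDESU", dictionary)
print(second_target)  # ['KYOU', 'WA', 'YOI', 'TENKI', 'DESU']
```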
 The voice generation unit 112 generates voice data from the text stored in the text DB 113, using voiceprint data registered in a voiceprint DB 114 in the storage unit 103, described later. The voice generation unit 112, which produces a speech waveform based on voiceprint data, can also be called a speech synthesis unit. For example, the voice generation unit 112 can generate a speech waveform by concatenative synthesis, formant synthesis, or the like. In concatenative synthesis, recorded fragments of the speech of a specific person or the like are joined together. In formant synthesis, by contrast, no recorded speech is used: the waveform is shaped by adjusting parameters such as the fundamental frequency, timbre, and noise level, producing artificial voice data.
 As described later, the voice generation unit 112 also changes the form of the generated voice data according to whether the text contains an NG word. If the text contains no NG word, voice data corresponding to the text is generated without modification. If the text does contain an NG word, voice data is generated for a version of the text in which the portion corresponding to the NG word has been corrected. For example, when correcting the portion corresponding to an NG word, the voice generation unit 112 can generate voice data in which that portion is deleted or replaced.
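 The deletion or replacement of the NG word portion can be sketched as simple text preprocessing ahead of synthesis. This is an illustrative sketch under assumed names (`sanitize_text`, the "BEEP" placeholder); the actual voice generation unit 112 would operate on its own internal representation.

```python
def sanitize_text(text, ng_words, mode="delete", placeholder="BEEP"):
    """Correct the NG word portion of a text before synthesis, either
    by deleting it or by replacing it with a placeholder. Longer NG
    words are handled first so that substrings do not shadow them."""
    for ng in sorted(ng_words, key=len, reverse=True):
        text = text.replace(ng, "" if mode == "delete" else placeholder)
    return text

text = "KYOUWAYOITENKIDESU"
print(sanitize_text(text, {"YOI"}, mode="delete"))   # KYOUWATENKIDESU
print(sanitize_text(text, {"YOI"}, mode="replace"))  # KYOUWABEEPTENKIDESU
```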
 The storage unit 103 stores information that the control unit 101 needs to control the management server 10. For example, the storage unit 103 stores information for generating a text registration screen (see FIG. 6), a voiceprint registration screen (see FIG. 8), and a setting input screen (see FIG. 10), each described later. The storage unit 103 also stores databases (DBs) in which various kinds of information are registered: specifically, a text DB 113, a voiceprint DB 114, a first NG word DB 115, and a second NG word DB 116.
 Texts registered from the text registration terminal 20 via the network NW are stored in the text DB 113. In the text DB 113, each text is registered in association with the identification information of the text registration terminal 20. The determination targets (first and second) generated by the determination target generation unit 111 are also registered in the text DB 113 in association with the source text.
 Voiceprint data registered from the voiceprint registration terminal 30 via the network NW is stored in the voiceprint DB 114. In the voiceprint DB 114, the voiceprint data is registered in association with the identification information of the voiceprint registration terminal 30.
 The first NG word DB 115 holds basic NG words, which include words that are socially unacceptable to use and words identified from the attribute information of a text, described later. For example, the basic NG words include words that insult third parties, obscene words, and words suggestive of antisocial statements. They also include words suggestive of a political position when the attribute information of the text is "politics".
 The second NG word DB 116 holds individual NG words, which include words registered from the voiceprint registration terminal 30 via the network NW. Each individual NG word is registered in association with voiceprint data registered in the voiceprint DB 114. The individual NG words include words that would damage the image of the person providing the voiceprint data; for example, when that person is an athlete, they include words such as "match-fixing" and "doping".
 The determination unit 104 determines whether a text registered in the text DB 113, or the first and second determination targets associated with that text, contains an NG word or a permissible word that is not an NG word (hereinafter an "OK word"). When determining whether a text registered in the text DB 113 contains an NG word, the determination unit 104 refers to the basic NG words registered in the first NG word DB 115. When determining whether the first or second determination target registered in the text DB 113 contains an NG word, the determination unit 104 refers to the individual NG words registered in the second NG word DB 116. In this case, the determination unit 104 generates the phonetic form (NG sound) of each individual NG word as necessary and compares it with the first and second determination targets.
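 The comparison performed by the determination unit 104 can be illustrated as follows: since the first determination target covers every division of the phonetic units, an NG sound matches it exactly when it equals the concatenation of some contiguous run of units, while a match against the second determination target is an exact morpheme match. This is a minimal sketch; the function names and romanized data are illustrative assumptions.

```python
def ng_in_first_target(units, ng_sound):
    """True if the NG sound equals the concatenation of some
    contiguous run of phonetic units, i.e. it would appear as one
    segment in at least one division of the first determination
    target (which covers every division of the units)."""
    for i in range(len(units)):
        joined = ""
        for j in range(i, len(units)):
            joined += units[j]
            if joined == ng_sound:
                return True
            if len(joined) > len(ng_sound):
                break
    return False

def ng_in_second_target(morphemes, ng_word):
    """True if the NG word exactly matches one of the morphemes of
    the second determination target."""
    return ng_word in morphemes

units = ["KYO", "U", "WA", "YO", "I", "TE", "N", "KI", "DE", "SU"]
morphemes = ["KYOU", "WA", "YOI", "TENKI", "DESU"]
print(ng_in_first_target(units, "TENKI"))    # True  (TE+N+KI)
print(ng_in_second_target(morphemes, "TEN"))  # False
```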
 Under the control of the control unit 101, the communication unit 105 exchanges information with the text registration terminal 20, the voiceprint registration terminal 30, and the external terminal group 40. For example, the communication unit 105 transmits the information needed for the text registration screen (see FIG. 6) and the voiceprint registration screen (see FIG. 8) to the text registration terminal 20 and the voiceprint registration terminal 30, respectively. Conversely, the communication unit 105 receives texts, voiceprint data, and setting input information from the text registration terminal 20, the voiceprint registration terminal 30, and the external terminal group 40, respectively.
 The input unit 106 accepts instructions directed at the management server 10; for example, instructions to edit the basic NG words in the first NG word DB 115. The display unit 107 displays the information needed to operate the management server 10; for example, the status of the management server 10 and the registration state of the texts and voiceprint data stored in the storage unit 103.
 A configuration example of the external terminal group 40, which receives voice from the voice provision system 1 according to the present embodiment, is now described, taking the portable terminal 42 as representative. FIG. 4 is a block diagram of the portable terminal 42 that receives voice from the voice provision system 1 according to the present embodiment. FIG. 4 shows only the components of the portable terminal 42 that are relevant to the present invention.
 As shown in FIG. 4, the portable terminal 42 includes a control unit 421 that controls the terminal as a whole. An application execution unit (hereinafter "app execution unit") 422, an audio output unit 423, a communication unit 424, an input unit 425, and a display unit 426 are connected to the control unit 421. The configuration of the portable terminal 42 is not limited to that shown in FIG. 4 and can be modified as appropriate.
 The app execution unit 422 executes the processing needed to output the voice data provided by the management server 10. For example, the app execution unit 422 generates a setting input screen (see FIG. 10) for entering settings for the voice data provided by the management server 10 and displays it on the display unit 426. The app execution unit 422 also checks the voice data received via the communication unit 424 (for example, confirms whether it matches the settings entered on the setting input screen) and passes it to the audio output unit 423.
 The audio output unit 423 outputs the voice data received from the app execution unit 422. For example, the audio output unit 423 outputs, from a speaker, the voice data corresponding to a text provided by the management server 10.
 Under the control of the control unit 421, the communication unit 424 exchanges information with the management server 10 via the network NW. For example, the communication unit 424 transmits the information entered on the setting input screen described above to the management server 10, and receives voice data from the management server 10.
 The input unit 425 accepts instructions directed at the portable terminal 42; for example, instructions to enter information on the setting input screen. The display unit 426 displays the information needed to operate the portable terminal 42; for example, the status of the portable terminal 42 and the setting input screen.
 In the voice provision system 1 according to the present embodiment, the management server 10 receives, for example, texts such as newspaper articles from the text registration terminal 20 and voiceprint data of a specific actor or the like from the voiceprint registration terminal 30. The management server 10 also receives, from the portable terminal 42, setting information designating a desired text and desired voiceprint data. Based on the setting information from the portable terminal 42, the management server 10 generates voice data from the text using the designated voiceprint data and provides it to the portable terminal 42. The portable terminal 42 can thereby receive and output voice data in which a text such as a newspaper article is read aloud in the voice of the operator's favorite actor.
 The operations of the voice provision system 1 configured as above, from text registration through provision of voice data, are described below. First, the operation of registering a text from the text registration terminal 20 with the management server 10 is described. FIG. 5 is a flowchart for explaining the operation at the time of text registration in the voice provision system 1 according to the present embodiment.
 As shown in FIG. 5, when a text is to be registered with the management server 10, an application for text registration is made from the text registration terminal 20 (step ST501). When this text registration application is detected, the information needed to generate the text registration screen (registration screen information) is read from the information stored in the storage unit 103 and output to the text registration terminal 20 via the communication unit 105 (step ST502). On receiving this information, the text registration terminal 20 displays the text registration screen (step ST503).
 FIG. 6 is a diagram showing an example of a text registration screen 600 used in the voice provision system 1 according to the present embodiment. The text registration screen 600 shown in FIG. 6 is a screen on which the operator of the text registration terminal 20 registers a text to be provided. As shown in FIG. 6, the text registration screen 600 includes an attribute selection section 601, a text input section 602, a reset button 603, and an end button 604.
 The attribute selection section 601 is the part in which the attribute information of the text to be registered is selected. For example, the attribute selection section 601 provides boxes (category selection boxes) for selecting categories such as "entertainment", "sports", "news", and "economy" as the attribute information of the text. Selecting one of these category selection boxes specifies the category to which the registered text belongs. The attribute selection section 601 may instead be configured so that the attribute information of the text is entered directly; any configuration may be adopted, provided that the attribute information of the text can be designated.
 The text input section 602 is the part in which the text to be registered is entered. The text input section 602 provides a field for entering text (text input field). By entering the characters and numerals of the text in this field, the text to be registered with the management server 10 can be designated. For example, texts relating to newspaper articles, traffic information, voice guidance, or advertising information are entered in the text input field.
 The reset button 603 is used to reset the information selected and entered on the text registration screen 600. The end button 604 is used to finish the text registration process using the text registration screen 600. Selecting the end button 604 transmits the attribute information and text selected or entered via the text registration screen 600 to the management server 10.
 The text registration screen 600 is not limited to the example shown in FIG. 6 and can be modified as appropriate. It is preferable, as an embodiment, to provide the text registration screen 600 with a part for specifying how the registered text is to be handled. For example, the screen may allow the registrant to specify how the voice data is to be corrected (deletion or replacement of the NG word) when the registered text contains an NG word in relation to particular voiceprint data.
 When the text registration screen 600 is displayed in this way, attribute information is selected in the attribute selection section 601 and a text is entered in the text input section 602. When the end button 604 of the text registration screen 600 is then selected, the attribute information and the text are transmitted to the management server 10 (step ST504).
 On receiving the attribute information and the text, the determination unit 104 determines whether the text contains an NG word (step ST505). In doing so, the determination unit 104 refers to the basic NG words registered in the first NG word DB 115, thereby detecting whether the text contains words that are socially unacceptable to use or the like. Determining the presence or absence of basic NG words at the text registration stage in this way prevents texts containing basic NG words from being registered with the management server 10.
 If the text contains a basic NG word (step ST505: Yes), an error message to that effect is output to the text registration terminal 20 (step ST506). Outputting an error message in this way notifies the text registrant that the text or the like is inappropriate. On receiving the error message, the text registrant enters a text and so on again via the text registration screen 600 and transmits them to the management server 10 (step ST504).
 If, on the other hand, the text contains no basic NG word (step ST505: No), the text transmitted in step ST504 is registered in the text DB 113 (step ST507). When registration in the text DB 113 is complete, the management server 10 notifies the text registration terminal 20 that text registration is complete (step ST508). Through this series of operations, texts such as newspaper articles are registered with the management server 10 (text DB 113).
 Next, the operation of registering voiceprint data from the voiceprint registration terminal 30 with the management server 10 is described. FIG. 7 is a flowchart for explaining the operation at the time of voiceprint registration in the voice provision system 1 according to the present embodiment.
 As shown in FIG. 7, when voiceprint data is to be registered with the management server 10, an application for voiceprint data registration is made from the voiceprint registration terminal 30 (step ST701). When this voiceprint registration application is detected, the information needed to generate the voiceprint registration screen (registration screen information) is read from the information stored in the storage unit 103 and output to the voiceprint registration terminal 30 via the communication unit 105 (step ST702). On receiving this information, the voiceprint registration terminal 30 displays the voiceprint registration screen (step ST703).
 FIG. 8 is a diagram showing an example of a voiceprint registration screen 800 used in the voice provision system 1 according to the present embodiment. The voiceprint registration screen 800 shown in FIG. 8 is a screen on which the operator of the voiceprint registration terminal 30 registers voiceprint data to be provided. As shown in FIG. 8, the voiceprint registration screen 800 includes an attribute selection section 801, an NG word category selection section 802, an NG word selection/input section 803, a voiceprint input section 804, a reset button 805, and an end button 806.
 The attribute selection section 801 is the part in which the attribute information of the voiceprint data to be registered (more specifically, of the person whose voiceprint it is) is selected. For example, the attribute selection section 801 provides boxes (category selection boxes) for selecting categories such as "actor", "idol", "voice actor", and "artist" as the attribute information of the voiceprint data. Selecting one of these category selection boxes specifies the category to which the registered voiceprint data belongs. The attribute selection section 801 may instead accept the attribute information of the voiceprint data directly; any configuration may be adopted, provided that the attribute information of the voiceprint data can be designated.
 The NG word category selection section 802 is the part in which the category of the NG words (individual NG words) is selected. The NG word category selection section 802 provides boxes (category selection boxes) for selecting categories such as "divorce", "disaster", "antisocial", and "advertising". NG word candidates associated with each category are registered in advance for these category selection boxes. Selecting one of these category selection boxes specifies the category to which the NG words (individual NG words) associated with the registered voiceprint data belong.
 The NG word selection/input section 803 is the part in which the NG words (individual NG words) to be associated with the voiceprint data being registered are selected or entered. When a category is selected in the NG word category selection section 802 described above, NG word candidates are displayed in the NG word selection/input section 803. The voiceprint registrant can select, from these candidates, the NG words to be associated with the voiceprint data being registered. The voiceprint registrant can also enter NG words (individual NG words) directly into the NG word selection/input section 803.
 The voiceprint input section 804 is the part in which the voiceprint data (digital voiceprint data) to be registered is entered. The voiceprint input section 804 provides a box to which voiceprint data is attached (voiceprint attachment box). Attaching voiceprint data to this box designates the voiceprint data to be registered with the management server 10.
 The reset button 805 is used to reset the information selected and entered on the voiceprint registration screen 800. The end button 806 is used to finish the voiceprint data registration process using the voiceprint registration screen 800. Selecting the end button 806 transmits the attribute information and voiceprint data selected or entered via the voiceprint registration screen 800 to the management server 10.
 The voiceprint registration screen 800 is not limited to the example shown in FIG. 8 and can be modified as appropriate. It is preferable, as an embodiment, to provide the voiceprint registration screen 800 with a part for specifying how the registered voiceprint data is to be handled. For example, the screen may allow the registrant to specify how the voice data is to be corrected (deletion or replacement of the NG word) when a text contains an NG word at the time voice data is generated using the registered voiceprint data.
 It is also preferable, as an embodiment, for the voiceprint registration screen 800 to have a function of inferring and displaying NG words related to the voiceprint data of a specific person. For example, NG words may be inferred from the words and actions of the specific person over the past year (for example, statements made in media such as television and radio) and displayed in the NG word selection/input section 803. These NG words are preferably displayed in response to a selection by the voiceprint registrant.
 When the voiceprint registration screen 800 is displayed in this way, attribute information is selected in the attribute selection section 801 and an NG word category is selected in the NG word category selection section 802. When these items have been selected, the attribute information and the NG word category are transmitted to the management server 10 (step ST704).
 On receiving the attribute information and the NG word category, the management server 10 transmits a candidate list of NG words (NG word candidate list) to the voiceprint registration terminal 30 (step ST705). This NG word candidate list is displayed in the NG word selection/input section 803 of the voiceprint registration screen 800.
 Although a mode is described here in which the NG word candidate list is transmitted to the voiceprint registration terminal 30 in response to the category selection in the NG word category selection section 802 and so on, the invention is not limited to this. For example, the NG word candidate list may be transmitted to the voiceprint registration terminal 30 in step ST702 and then displayed in the NG word selection/input section 803 in response to the selection of attribute information and category.
 When NG words are displayed in the NG word selection/input section 803, the voiceprint registrant designates NG words (individual NG words) in the NG word selection/input section 803 and attaches voiceprint data in the voiceprint input section 804. When the end button 806 of the voiceprint registration screen 800 is then selected, these NG words and the voiceprint data are transmitted to the management server 10 (step ST706).
 On receiving the NG words and voiceprint data, the management server 10 registers the voiceprint data in the voiceprint DB 114 and the NG words in the second NG word DB 116 (step ST707). In the second NG word DB 116, the NG words are registered in association with the voiceprint data. When registration in the voiceprint DB 114 and the second NG word DB 116 is complete, the management server 10 notifies the voiceprint registration terminal 30 that voiceprint registration is complete (step ST708). Through this series of operations, voiceprint data of actors, actresses, and the like that can be used to generate voice data is registered with the management server 10 (voiceprint DB 114, second NG word DB 116), together with the NG words for that voiceprint data.
 Through the text registration operation and the voiceprint registration operation described above, the text and voiceprint data used to generate voice data are registered in the management server 10. The management server 10 generates voice data from this text and voiceprint data and provides the generated voice data to the portable terminal 42 and other devices. In doing so, the management server 10 selects text and voiceprint data according to the settings specified by the portable terminal 42 or the like, and generates voice data based on the selected text and voiceprint data.
 Next, the operation of specifying desired settings from the portable terminal 42 to the management server 10 and outputting the voice data provided by the management server 10 on the portable terminal 42 will be described. FIG. 9 is a flow chart illustrating the voice provision operation of the voice provision system 1 according to the present embodiment.
 To receive voice data from the management server 10 on the portable terminal 42, a voice output application is started on the portable terminal 42, as shown in FIG. 9 (step ST901). Starting this voice output application makes it possible to exchange information concerning the voice provision system 1 with the management server 10. When the voice output application starts, a setting input screen for entering the desired settings is displayed on the portable terminal 42 (step ST902).
 FIG. 10 shows an example of the setting input screen 1000 used in the voice provision system 1 according to the present embodiment. The setting input screen 1000 shown in FIG. 10 allows the operator of the portable terminal 42 to specify the voice data to be provided. As shown in FIG. 10, the setting input screen 1000 includes a text designation section 1001, a voiceprint designation section 1002, a reset button 1003, and an end button 1004.
 The text designation section 1001 is where the operator of the portable terminal 42 specifies the text corresponding to the desired voice data. The text designation section 1001 provides boxes (text selection boxes) for selecting text by type, such as "Entertainment", "Sports", "News", and "Economy". Selecting one of these text selection boxes specifies the text corresponding to the voice data to be provided by the management server 10.
 Although FIG. 10 is simplified for convenience of explanation, the text selection boxes display content including text of various genres. Configuring each text selection box as an icon that identifies the text registrant is preferable as an embodiment; in that case, the operator of the portable terminal 42 can select the desired text intuitively.
 The voiceprint designation section 1002 is where the operator of the portable terminal 42 specifies the voiceprint data that serves as the voice source of the desired voice data. The voiceprint designation section 1002 provides boxes (category selection boxes) for selecting the category to which the person corresponding to the voiceprint data belongs. Selecting these category selection boxes narrows down the candidate persons corresponding to the voiceprint data. When a particular category selection box is selected, the voiceprint designation section 1002 displays the persons belonging to that category, and the operator specifies the person corresponding to the voiceprint data by selecting one of the displayed candidates. The voiceprint designation section 1002 also provides an input field in which the person corresponding to the voiceprint data can be entered directly.
 The reset button 1003 is used to reset the information selected and specified on the setting input screen 1000. The end button 1004 is used to finish entering the desired settings on the setting input screen 1000; selecting it transmits the text and voiceprint data selected or entered via the setting input screen 1000 to the management server 10.
 The setting input screen 1000 is not limited to the example shown in FIG. 10 and may be modified as appropriate. Providing the setting input screen 1000 with a section for specifying how voice data generated from the selected text and voiceprint data should be handled is preferable as an embodiment. For example, the operator may be allowed to specify how the voice data should be corrected (NG word deletion or replacement) when the selected text and voiceprint data involve an NG word.
 When the operator enters the desired settings on the setting input screen 1000 and selects the end button 1004, the setting information is transmitted to the management server 10 (step ST903). This setting information includes the text selected by the operator and the voiceprint data selected by the operator (more specifically, information on the person corresponding to the voiceprint data).
 Upon receiving the setting information from the portable terminal 42, the management server 10 selects the text and voiceprint data included in the setting information (step ST904). The management server 10 selects the text and voiceprint data specified in the setting information from the text DB 113 and the voiceprint DB 114. After selecting the text and voiceprint data, the management server 10 performs a determination process (hereinafter, "NG determination process") that determines whether any NG word (individual NG word) associated with the selected voiceprint data is included in the specified text (step ST905).
 The NG determination process is described here. FIG. 11 is a flow chart illustrating the NG determination process in the voice provision system 1 according to the present embodiment. The NG determination process is executed mainly by the generation unit 102 (determination target generation unit 111) and the determination unit 104 of the management server 10.
 As shown in FIG. 11, in the NG determination process the determination target generation unit 111 first performs a second determination target generation process (morphological analysis) on the text selected in step ST904 described above (step ST1101). In the second determination target generation process, the selected text is divided into morphemes; that is, the second determination target (see FIG. 3B) is generated from the text. The second determination target generated from the text is registered in the text DB 113 of the storage unit 103 in association with that text.
 Once the second determination target has been registered, the determination unit 104 performs a determination process (hereinafter, "primary determination process") that determines whether the second determination target contains any NG word (individual NG word) (step ST1102). In the primary determination process, the determination unit 104 reads the individual NG words associated with the voiceprint data selected in step ST904 from the second NG word DB 116. The determination unit 104 then compares each individual NG word with each element of the second determination target one by one, thereby classifying the words in the text as NG words or OK words (step ST1103). In this way, the morphemes making up the text are compared against the NG words, and any NG word contained in the text as a morpheme is detected.
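The primary determination process above can be sketched as a direct morpheme-by-morpheme comparison. This is a hedged illustration under stated assumptions: the text is assumed to already be split into morphemes by the second determination target generation process, and the function name and sample morphemes are inventions for the example.

```python
def primary_determination(morphemes, ng_words):
    """Classify each morpheme as an OK word or an NG word by exact comparison
    (cf. steps ST1102-ST1103)."""
    ng_set = set(ng_words)
    ok_words = [m for m in morphemes if m not in ng_set]
    detected = [m for m in morphemes if m in ng_set]
    return ok_words, detected


# Illustrative morphemes of a text, with "tenki" registered as an NG word.
morphemes = ["kyou", "wa", "yoi", "tenki", "desu"]
ok, ng = primary_determination(morphemes, ["tenki"])
print(ng)  # ['tenki']
```

An exact comparison like this only finds NG words that coincide with whole morphemes, which is why the secondary and tertiary processes below are needed.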
 When an OK word is detected in the text (step ST1103: OK), the determination target generation unit 111 generates the pronunciation of that OK word (step ST1104). Here, "an OK word is detected" corresponds to the case where a second determination target that does not match any NG word is found in the text. Meanwhile, the determination unit 104 generates the pronunciation of each individual NG word (hereinafter, "NG sound") (step ST1105). The generated pronunciations of the OK words are registered in the text DB 113, and the generated NG sounds are registered in the second NG word DB 116.
 When the pronunciations of the OK words and the NG sounds have been generated, the determination unit 104 performs a determination process (hereinafter, "secondary determination process") that determines whether any NG sound is contained in the pronunciation of an OK word (step ST1106). In the secondary determination process, the determination unit 104 compares each OK word pronunciation with each NG sound one by one, thereby classifying the words as NG words or OK words (step ST1107). In this way, the pronunciations of the second determination targets judged to be OK words in the primary determination process are compared against the pronunciations of the NG words, and further NG words contained in the text are detected.
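A minimal sketch of the secondary determination process follows, assuming pronunciations are represented as romanized strings and that "contains" means a substring match; both of these representation choices, and all names in the snippet, are assumptions for illustration, not details from the patent.

```python
def secondary_determination(ok_word_pronunciations, ng_sounds):
    """Flag OK words whose pronunciation contains an NG sound (cf. ST1106-ST1107)."""
    detected = []
    for pron in ok_word_pronunciations:
        if any(ng in pron for ng in ng_sounds):
            detected.append(pron)
    return detected


# "tenki" passed the morpheme comparison, but its pronunciation contains the
# (hypothetical) NG sound "ten", so it is caught here regardless of its meaning.
print(secondary_determination(["kimono", "tenki"], ["ten"]))  # ['tenki']
```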
 When OK words are also detected in the secondary determination process, the determination target generation unit 111 performs the first determination target generation process on the text selected in step ST904 (step ST1108). In the first determination target generation process, the pronunciation of the selected text is generated, and determination targets are generated by dividing that pronunciation at arbitrary positions; that is, the first determination target (see FIG. 3A) is generated from the text. The first determination target generated from the text is registered in the text DB 113 of the storage unit 103 in association with that text.
 Once the first determination target has been registered, the determination unit 104 performs a determination process (hereinafter, "tertiary determination process") that determines whether the first determination target contains any NG sound (step ST1109). In the tertiary determination process, the determination unit 104 compares each arbitrary combination of the pronunciation segments of the text (the first determination target) with the NG sounds of the individual NG words registered in the second NG word DB 116, one by one, thereby determining whether an NG word is present (step ST1110). In this way, arbitrary combinations of the pronunciation of the text are compared against the NG sounds, and NG words that were not detected in the primary and secondary determination processes are detected. For example, in the example shown in FIG. 3A, if "きょう-はよ-いて-んき-です (KYOU-WAYO-ITE-NKI-DESU)" has been generated as one of the first determination targets, each of the pronunciation segments "KYOU", "WAYO", "ITE", "NKI", and "DESU" is compared with the NG sounds.
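The effect of dividing the pronunciation at arbitrary positions can be sketched by enumerating every contiguous segment of the full pronunciation string and comparing each against the NG sounds. This brute-force enumeration is an assumption for illustration (the patent only says the pronunciation is divided at arbitrary positions), as are the romanized strings and the hypothetical NG sound "wayo".

```python
def tertiary_determination(pronunciation, ng_sounds):
    """Check every contiguous segment of the text's pronunciation against the
    NG sounds, so matches spanning morpheme boundaries are also caught
    (cf. steps ST1109-ST1110)."""
    n = len(pronunciation)
    hits = set()
    for i in range(n):
        for j in range(i + 1, n + 1):
            if pronunciation[i:j] in ng_sounds:
                hits.add(pronunciation[i:j])
    return hits


# "wayo" spans the boundary between "wa" and "yoi" in KYOUWAYOITENKIDESU, so
# neither a morpheme comparison nor a per-morpheme pronunciation check finds it.
print(tertiary_determination("kyouwayoitenkidesu", {"wayo"}))  # {'wayo'}
```

This is what makes the tertiary process insensitive to the context of the text: any combination of the characters making up the text is compared, not just linguistically meaningful units.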
 If no NG word is detected in the tertiary determination process (step ST1110: No), the determination unit 104 selects, as the result of the NG determination process, a determination indicating that the text contains no NG word (OK determination) (step ST1111).
 On the other hand, if an NG word is detected in the tertiary determination process (step ST1110: Yes), in the primary determination process (step ST1103: NG), or in the secondary determination process (step ST1107: NG), the determination unit 104 records the location of the NG word in the text (step ST1112). The determination unit 104 then selects, as the result of the NG determination process, a determination indicating that the text contains an NG word (NG determination) (step ST1113).
 When the OK determination is selected in step ST1111 or the NG determination is selected in step ST1113, the determination unit 104 ends the NG determination process. Through this NG determination process, it is determined whether any NG word (individual NG word) associated with the voiceprint data selected in step ST904 is included in the selected text.
 In this NG determination process, the tertiary determination process judges NG words by comparing the pronunciation of the first determination target, generated by dividing the text at arbitrary positions, with the pronunciation of each individual NG word (NG sound). The pronunciation of any combination of the characters and numerals making up the text can therefore be compared with the NG sounds, so NG words contained in the text can be detected accurately regardless of the context of the text.
 Also, in the NG determination process, before the first determination target is compared with the NG words (tertiary determination process), the NG words and OK words contained in the text are judged on the basis of the morphemes making up the text (the second determination target) (primary determination process). NG words contained in the text as morphemes can thus be reliably detected by comparison with the second determination target before the first determination target is processed. Furthermore, because NG words contained in the text are judged in stages, missed detections of NG words can be reduced.
 Further, in the NG determination process, before the first determination target is compared with the NG words (tertiary determination process), NG words are judged by comparing the pronunciations of the second determination targets judged to be OK words in the primary determination process with the pronunciations of the NG words (secondary determination process). An NG word contained in a morpheme can therefore be detected regardless of the meaning of the morpheme judged to be an OK word.
 When the NG determination process ends, the determination unit 104 determines whether the result of the NG determination process is an OK determination or an NG determination, as shown in FIG. 9 (step ST906). In the case of an OK determination (step ST906: OK), the voice generation unit 112 generates voice data (step ST907). In this case, the voice generation unit 112 generates voice data corresponding to the text selected in step ST904, using the voiceprint data selected in step ST904, without applying any correction to the text. The generated voice data is then output from the management server 10 to the portable terminal 42 (step ST908).
 On the other hand, if the result of the NG determination process is an NG determination (step ST906: NG), the voice generation unit 112 generates voice data in which part of the text has been corrected (corrected voice data) (step ST909). In this case, the voice generation unit 112 generates voice data corresponding to a version of the text selected in step ST904 in which the part corresponding to the NG word has been corrected. For the parts of the text other than the part corresponding to the NG word, voice data is generated using the voiceprint data selected in step ST904.
 When correcting the part corresponding to an NG word, the voice generation unit 112 can generate voice data in which the part of the text corresponding to the NG word has been deleted, or voice data in which that part has been replaced. As one way of replacing an NG word, the voice generation unit 112 can, for example, generate voice data using voiceprint data different from the voiceprint data selected in step ST904: only the part corresponding to the NG word may be generated using predetermined voiceprint data. The voice generation unit 112 can also generate voice data in which the part of the text corresponding to the NG word has been replaced with a differently worded expression. The manner of correcting the part corresponding to an NG word is preferably chosen in consideration of the intentions of the text registrant and the voiceprint registrant.
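The delete-or-replace correction of the text can be sketched as follows, assuming the NG word locations recorded in step ST1112 are available as character index ranges; that representation, the function name, and the placeholder replacement string are all assumptions for the example.

```python
def correct_text(text, ng_spans, mode="delete", replacement="<alt>"):
    """Delete or replace the recorded NG word locations (cf. step ST909).

    ng_spans: list of (start, end) character index ranges assumed to have been
    recorded in step ST1112.
    """
    out, last = [], 0
    for start, end in sorted(ng_spans):
        out.append(text[last:start])
        if mode == "replace":
            out.append(replacement)
        last = end
    out.append(text[last:])
    return "".join(out)


text = "kyouwayoitenkidesu"
print(correct_text(text, [(4, 8)], mode="delete"))   # 'kyouitenkidesu'
print(correct_text(text, [(4, 8)], mode="replace"))  # 'kyou<alt>itenkidesu'
```

In the replace mode, the substituted span is the part that could instead be rendered with different voiceprint data, as described above.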
 The voice data generated by correcting the part corresponding to the NG word (corrected voice data) is output from the management server 10 to the portable terminal 42 (step ST910). When the portable terminal 42 receives the voice data from the management server 10 in step ST908 or step ST910, it outputs the voice through a speaker or the like (step ST911). With this audio output, the voice provision operation of the voice provision system 1 ends.
 As described above, in the voice provision system 1 according to the present embodiment, the voice generation unit 112 of the management server 10 generates voice data from the text registered in the storage unit 103 using specific voiceprint data, according to the determination result of the determination unit 104. Voice data using specific voiceprint data is thus generated from the text in accordance with the determination result of the determination unit 104, so the generated voice data can be switched according to whether the text contains an NG word. Voice data in different forms, depending on whether the text contains an NG word, can therefore be provided to the portable terminal 42.
 Here, when the text registered in the storage unit 103 contains no NG word, the voice generation unit 112 generates voice data corresponding to the text. Voice data corresponding to a text containing no NG word is thus generated without any special correction, so voice data corresponding to the text can be provided to the portable terminal 42 promptly.
 On the other hand, when the text registered in the storage unit 103 contains an NG word, the voice generation unit 112 generates voice data corresponding to a version of the text in which the part corresponding to the NG word has been corrected. Even for a text containing an NG word, voice data in which the NG word portion has been corrected can thus be provided to the portable terminal 42.
 For example, when the text registered in the storage unit 103 contains an NG word, the voice generation unit 112 can delete or replace the part corresponding to the NG word. When the part corresponding to the NG word is deleted, voice data from which the NG word has reliably been removed can be provided to the portable terminal 42 even for a text containing an NG word. When the part corresponding to the NG word is replaced, voice data containing the NG word can be prevented from being provided to the portable terminal 42 as-is using the specific voiceprint data.
 When replacing the part corresponding to an NG word, voiceprint data different from the specific voiceprint data can be used for that part. In that case, even for a text containing an NG word, corresponding voice data can be provided to the portable terminal 42 without changing the meaning of the text. Alternatively, the part corresponding to the NG word can be replaced with a differently worded expression; in that case, corresponding voice data can be provided to the portable terminal 42 without greatly changing the meaning of the text.
 Also, in the voice provision system 1 according to the present embodiment, the NG words associated with the voiceprint data registered from the voiceprint registration terminal 30 are registered in the storage unit 103 of the management server 10 (more specifically, in the second NG word DB 116). The determination unit 104 therefore determines whether the text contains an NG word (individual NG word) associated with the specific voiceprint data, which reliably prevents voice data containing an NG word associated with the specific voiceprint data from being provided to the portable terminal 42.
 Further, in the voice provision system 1 according to the present embodiment, the determination unit 104 of the management server 10 determines, at the time of text registration from the text registration terminal 20, whether the text contains any NG word (general NG word). Registration of text containing an NG word can thus be prevented at the stage of text registration from the text registration terminal 20.
 In this case, the determination unit 104 determines the presence or absence of NG words (general NG words) associated with the attribute information of the text. Registration of text containing an NG word identified from the attribute information can thus be prevented at the stage of text registration from the text registration terminal 20.
 The present invention is not limited to the embodiment described above and can be implemented with various modifications. The components illustrated in the accompanying drawings are not limiting and may be modified as appropriate within the range in which the effects of the present invention are obtained. The invention may otherwise be modified as appropriate without departing from the scope of its object.
 For example, in the embodiment described above, in the tertiary determination process of the NG determination process shown in FIG. 11, the determination unit 104 compares each arbitrary combination of the pronunciation segments of the text (the first determination target) against the NG sounds of the NG words (individual NG words) registered in the second NG word DB 116 for an exact match, one by one. However, the comparison method in the tertiary determination process is not limited to this and may be modified as appropriate. For example, the determination unit 104 may judge an NG word on the basis of a partial match between the pronunciation corresponding to the first determination target and the pronunciation corresponding to the NG word. In that case, not only pronunciations that completely match an NG word but also words that partially match an NG word contained in the text can be detected, so even words similar to the NG word can be detected. The proportion that counts as a partial match with the pronunciation of an NG word may be fixed in advance, or may be determined through machine learning based on past results or through Bayesian statistics.
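Partial pronunciation matching with a fixed proportion can be sketched with a standard similarity ratio. The use of `difflib.SequenceMatcher` and the 0.75 threshold are assumptions made for the example; the paragraph above notes that the proportion could instead be determined by machine learning or Bayesian statistics.

```python
from difflib import SequenceMatcher


def partial_match(segment, ng_sound, threshold=0.75):
    """Judge an NG word when the two pronunciations only partially agree.

    threshold: illustrative fixed proportion for a "partial match"."""
    return SequenceMatcher(None, segment, ng_sound).ratio() >= threshold


print(partial_match("wayo", "wayo"))   # True (exact match)
print(partial_match("wayo", "wayoo"))  # True (similar pronunciation)
print(partial_match("wayo", "kyou"))   # False
```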
 Also, in the embodiment described above, in the tertiary determination process of the NG determination process shown in FIG. 11, the determination unit 104 judges NG words by comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the NG word (individual NG word). However, the determination method of the determination unit 104 is not limited to this and may be modified as appropriate. For example, NG words may be judged by comparing the character strings making up the first determination target with the character strings making up the NG words. In that case, an NG word is judged by comparing the character strings of the first determination target, generated by dividing the text at arbitrary positions, with the character string of the NG word, so any combination of the characters and numerals making up the text can be compared with the NG words. NG words contained in the text can thus be detected accurately regardless of the context of the text.
 Furthermore, in the above embodiment, the management server 10 generates voice data from the text registered in the storage unit 103 using specific voiceprint data, and provides the generated voice data to the portable terminal 42 or the like. However, the information provided to the portable terminal 42 or the like is not limited to voice data, and other information can be added as appropriate. For example, the text used to generate the voice data may be provided together with it. Image data, moving image data, or computer graphics (CG) may also be provided in addition to the voice data. In this case, providing image data or moving image data related to the voice data is a preferred embodiment.
 Furthermore, in the above embodiment, when the text registered in the storage unit 103 contains an NG word, the voice generation unit 112 can replace the portion corresponding to the NG word. Such partial replacement of text can also be applied to portions other than NG words. For example, a specific word contained in the text may be replaced with a different word prepared in advance. In this case, when the determination unit 104 detects the specific word in the text, the voice generation unit 112 replaces it with the prepared word.
 A specific example of such replacement follows. Suppose that, in the text "It is good weather today (KYOUWAYOITENKIDESU)" shown in FIG. 3A, "is (DESU)" is registered as the word to be replaced (hereinafter, the "replacement target word"), and that "NYAN" is registered in advance as the word that replaces it (hereinafter, the "replacement word"). These replacement target words and replacement words can be registered, for example, from the voiceprint registration terminal 30.
 In an embodiment that replaces a specific word contained in the text in this way, the determination unit 104 determines whether the text "It is good weather today (KYOUWAYOITENKIDESU)" contains the replacement target word "is (DESU)" (replacement determination process). This replacement determination process takes the place of, for example, the NG determination process of step ST905 shown in FIG. 9. The determination target in the replacement determination process can be the first determination target described above (obtained by converting the text into phonetic words and dividing it at arbitrary positions) or the second determination target (obtained by dividing the text into morphemes). In the replacement determination process, the determination unit 104 determines whether the second determination target and/or the first determination target contains the replacement target word.
 When the replacement determination process detects the replacement target word "is (DESU)", the voice generation unit 112 generates voice data in which the replacement target word has been replaced with the replacement word "NYAN" (replacement voice data). As a result, voice data corresponding to the text "KYOUWAYOITENKINYAN" is generated as the replacement voice data. The generated replacement voice data is then transmitted from the management server 10 to the portable terminal 42, which outputs it through a speaker or the like. As a result, the voice "KYOUWAYOITENKINYAN" is output from the portable terminal 42.
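The replacement step itself can be sketched as a simple substitution over the romanized pronunciation, following the DESU-to-NYAN example above; the table and function names are illustrative.

```python
# Replacement table as registered, e.g., from the voiceprint
# registration terminal 30 (romanized forms from the example above).
REPLACEMENTS = {"DESU": "NYAN"}

def apply_replacements(pronunciation: str) -> str:
    """Substitute every registered replacement target word before
    the replacement voice data is synthesized."""
    for target, replacement in REPLACEMENTS.items():
        pronunciation = pronunciation.replace(target, replacement)
    return pronunciation

print(apply_replacements("KYOUWAYOITENKIDESU"))  # KYOUWAYOITENKINYAN
```

The substituted string would then be handed to the synthesizer with the selected voiceprint data to produce the replacement voice data.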
 In an embodiment that replaces a specific word contained in the text in this way, when the voiceprint data of a specific character, such as an animated character, is selected from the setting input screen (see FIG. 10), voice data matched to that character's manner of speaking can be generated from a text such as a newspaper article and provided to the portable terminal 42. This makes it possible, for example, to provide a voice provision service in which newspaper articles and the like are read aloud by a specific character.
 Furthermore, the above embodiment mainly describes NG word determination for texts written in Japanese. However, NG word determination is not limited to Japanese and can be applied to any language used around the world. The presence or absence of an NG word may also be determined across multiple languages. For example, when the pronunciation of a text written in English matches or resembles the pronunciation of a Japanese NG word, that part of the English text can be determined to be an NG word.
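One way to realize this cross-language check is to map both languages into a shared romanized pronunciation space and reuse the partial-match test. The grapheme-to-phoneme step is not shown here, and the tiny lexicon, the word chosen, and the threshold are hypothetical stand-ins.

```python
from difflib import SequenceMatcher

# Assume some grapheme-to-phoneme step (not shown) maps both English
# words and Japanese NG words into a shared romanized space; this
# one-entry lexicon is a hypothetical stand-in for that step.
ENGLISH_PRONUNCIATIONS = {"baccarat": "BAKARA"}
JAPANESE_NG_PRONUNCIATIONS = ["BAKA"]

def cross_language_ng(english_word: str, threshold: float = 0.8) -> bool:
    """True if the English word's pronunciation matches or resembles
    a registered Japanese NG pronunciation."""
    pron = ENGLISH_PRONUNCIATIONS.get(english_word.lower(), "")
    return any(
        SequenceMatcher(None, pron, ng).ratio() >= threshold
        for ng in JAPANESE_NG_PRONUNCIATIONS
    )

print(cross_language_ng("baccarat"))  # resembles a Japanese NG sound
print(cross_language_ng("weather"))   # no resemblance
```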
 According to the determination device of the present invention and the voice provision system using it, non-permissive words contained in a text can be detected accurately regardless of the context of the text. The invention is therefore particularly well suited to a voice provision service that reads out texts using specific voiceprint data.
 This application is based on Japanese Patent Application No. 2015-235703, filed on December 2, 2015, the entire contents of which are incorporated herein.

Claims (14)

  1.  A determination device for determining a non-permissive word contained in a text composed in a natural language, comprising:
     a determination target generation unit that generates a first determination target by dividing the text at an arbitrary position; and
     a determination unit that determines the non-permissive word by comparing a pronunciation corresponding to the first determination target with a pronunciation corresponding to the non-permissive word.
  2.  The determination device according to claim 1, wherein the determination unit determines the non-permissive word based on a partial match between the pronunciation corresponding to the first determination target and the pronunciation corresponding to the non-permissive word.
  3.  The determination device according to claim 1 or 2, wherein the determination target generation unit generates a second determination target by dividing the text into morphemes, and
     the determination unit determines, before comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word, the non-permissive word and permissive words that are not the non-permissive word by comparing the second determination target with the non-permissive word.
  4.  The determination device according to claim 3, wherein the determination unit determines, before comparing the pronunciation corresponding to the first determination target with the pronunciation corresponding to the non-permissive word, the non-permissive word by comparing a pronunciation corresponding to the second determination target determined to be a permissive word with the pronunciation corresponding to the non-permissive word.
  5.  A determination device for determining a non-permissive word contained in a text composed in a natural language, comprising:
     a determination target generation unit that generates a first determination target by dividing the text at an arbitrary position; and
     a determination unit that determines the non-permissive word by comparing a character string constituting the first determination target with a character string constituting the non-permissive word.
  6.  A voice provision system comprising the determination device according to any one of claims 1 to 5, the system providing voice corresponding to the text based on specific voiceprint data,
     wherein the determination device comprises a voice generation unit that generates voice data from the text using the specific voiceprint data according to a determination result of the determination unit.
  7.  The voice provision system according to claim 6, wherein the voice generation unit generates voice data corresponding to the text when the text does not contain the non-permissive word.
  8.  The voice provision system according to claim 6 or 7, wherein the voice generation unit generates, when the text contains the non-permissive word, voice data corresponding to the text in which the portion corresponding to the non-permissive word has been corrected.
  9.  The voice provision system according to claim 8, wherein the voice generation unit deletes the portion corresponding to the non-permissive word contained in the text.
  10.  The voice provision system according to claim 8, wherein the voice generation unit replaces the portion corresponding to the non-permissive word contained in the text.
  11.  The voice provision system according to claim 10, wherein the voice generation unit uses, for the portion corresponding to the non-permissive word contained in the text, voiceprint data different from the specific voiceprint data.
  12.  The voice provision system according to claim 10, wherein the voice generation unit replaces the portion corresponding to the non-permissive word contained in the text with a word of a different expression.
  13.  The voice provision system according to claim 6, wherein the determination device comprises a storage unit that stores the specific voiceprint data, and
     the storage unit stores the non-permissive word associated with the specific voiceprint data.
  14.  A determination method for determining a non-permissive word contained in a text composed in a natural language, comprising:
     generating a first determination target by dividing the text at an arbitrary position; and
     determining the non-permissive word by comparing a pronunciation corresponding to the first determination target with a pronunciation corresponding to the non-permissive word.
PCT/JP2016/083894 2015-12-02 2016-11-16 Determination device and voice provision system provided therewith WO2017094500A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2017553757A JP6836033B2 (en) 2015-12-02 2016-11-16 Judgment device and voice providing system equipped with this

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2015-235703 2015-12-02
JP2015235703 2015-12-02

Publications (1)

Publication Number Publication Date
WO2017094500A1 (en) 2017-06-08

Family

ID=58797215

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/083894 WO2017094500A1 (en) 2015-12-02 2016-11-16 Determination device and voice provision system provided therewith

Country Status (3)

Country Link
JP (1) JP6836033B2 (en)
TW (1) TWI717426B (en)
WO (1) WO2017094500A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102362533B1 (en) * 2021-03-18 2022-02-15 주식회사 로컬링크 Hospital's counter offer curating server using remote video and Hospital's counter offer curating program using remote video

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05165486A (en) * 1991-12-18 1993-07-02 Oki Electric Ind Co Ltd Text voice transforming device
JP2004271727A (en) * 2003-03-06 2004-09-30 Seiko Epson Corp Voice data providing system and device and program for generating voice data
JP2007316303A (en) * 2006-05-25 2007-12-06 Nippon Telegr & Teleph Corp <Ntt> Voice synthesizing method and device with speaker selection function, and voice synthesizing program with speaker selection function
JP2007334144A (en) * 2006-06-16 2007-12-27 Oki Electric Ind Co Ltd Speech synthesis method, speech synthesizer, and speech synthesis program


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019054321A (en) * 2017-09-12 2019-04-04 株式会社Nttドコモ Communication controller and terminal
CN110164413A (en) * 2019-05-13 2019-08-23 北京百度网讯科技有限公司 Phoneme synthesizing method, device, computer equipment and storage medium
CN110164413B (en) * 2019-05-13 2021-06-04 北京百度网讯科技有限公司 Speech synthesis method, apparatus, computer device and storage medium

Also Published As

Publication number Publication date
TWI717426B (en) 2021-02-01
JPWO2017094500A1 (en) 2018-11-08
JP6836033B2 (en) 2021-02-24
TW201732649A (en) 2017-09-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16870438; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2017553757; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 16870438; Country of ref document: EP; Kind code of ref document: A1)