WO2009104613A1

WO2009104613A1 - Text conversion device, method, and program

Info

Publication number: WO2009104613A1
Application number: PCT/JP2009/052716
Authority: WO
Inventors: 玲史近藤; 康行三井
Original assignee: 日本電気株式会社
Priority date: 2008-02-19
Filing date: 2009-02-17
Publication date: 2009-08-27
Also published as: JP5521554B2; JPWO2009104613A1

Abstract

The sentences of an input text can be converted according to a parameter expressing the psychological state of a listener without changing their content. That is, a parameter reflecting the listener's psychological state (tension, urgency, concentration) is used to transform the text to be conveyed so that the listener can easily understand its content in his or her state at that moment.

Description

[Name of invention determined by ISA based on Rule 37.2] Text conversion device, method, program

(Description of related applications)
This application claims priority from the earlier Japanese Patent Application No. 2008-037603 (filed on Feb. 19, 2008), the entire description of which is deemed to be incorporated herein by reference.
The present invention relates to a text conversion device, a text conversion method, a text conversion program, a speech synthesizer, and a robot, and in particular to a text conversion device, a text conversion method, a text conversion program, a speech synthesizer, and a robot that perform conversion to make the content of a text easier to convey.

Patent Document 1 discloses a sentence conversion technique for converting a character string written in a certain natural language into a character string in another expression of the same natural language, using conversion rules prepared for each conversion purpose.

Patent Document 2 discloses a technique that includes a driving load determination unit for determining the driving load of a driver who is driving a vehicle, controls the voice output unit for the driver according to that driving load, and changes the pause period and speech rate of the reading.

Patent Document 3 discloses a technique for controlling the sound quality of speech synthesis according to user attribute information in a voice response service device.

Patent Document 4 discloses a technique for estimating a speaker's emotion based on features such as sound pressure, pitch frequency, and duration of input speech.

Patent Document 1: Japanese Patent No. 3932350
Patent Document 2: JP-A-2005-070703
Patent Document 3: Japanese Patent No. 3936351
Patent Document 4: JP 2003-228391 A

The entire disclosures of Patent Documents 1 to 4 above are incorporated herein by reference. The following analysis of the related art is given from the viewpoint of the present invention.
A listener does not always listen to content with full attention, and depending on his or her psychological situation from moment to moment may miss or misunderstand what is said. An object of the present invention is to provide a text conversion device, a text conversion method, a text conversion program, a speech synthesizer, and a robot that convert text so that its semantic content is easily conveyed to the listener, taking the listener's psychological situation into consideration.

In this regard, Patent Document 1 gives, as specific application examples, (A) a question answering system, (B) a sentence compression system, (C) a recommendation system, and (D) a difficult-sentence conversion system, and presents examples of the conversions into different expressions performed in each. However, there is no guarantee that the text after these conversions will be easy to convey to a listener. Patent Document 1 also suggests application to conversion and inverse conversion between written and spoken language (paragraph 0064), but it does not disclose generating text that is easy to convey to the listener as a purpose of conversion, nor any specific examples or conversion rules for that purpose.

In addition, the “driving load of the driver” referred to in Patent Document 2 specifically means the vehicle speed, and the document only discloses that, during driving, audio is output at a reading speed higher than normal (the reading speed for other passengers).

In addition, the technique described in Patent Document 3 does not perform text conversion as described above in the first place, and cannot cope with the psychological situation of the listener, which changes from moment to moment.

Further, Patent Document 4 describes a technique for estimating a speaker's emotion from input speech. Specifically, it only discloses application to a nursing robot that conducts an interview and measures changes in emotion from the feature values of the speech.

According to a first aspect of the present invention, there is provided a text conversion device that receives as input a parameter representing a user's psychological state and a text, and converts the input text in a manner suited to the input parameter within a range that does not change the meaning of the input text.

According to a second aspect of the present invention, there is provided a text conversion method using a text conversion device, the method comprising a step of inputting a parameter representing a user's psychological state and a text, and a step of converting the input text, based on the input parameter, within a range that does not change the meaning of the text.

According to a third aspect of the present invention, there is provided a text conversion program for causing a computer to execute a process of inputting a parameter representing a user's psychological state and a text, and a process of converting the input text, based on the input parameter, within a range that does not change the meaning of the input text. This text conversion program can be recorded on a computer-readable storage medium.

According to the present invention, an arbitrary text can be converted into a text whose meaning is easily conveyed to the listener. The reason is that text conversion is performed with a parameter representing the psychological situation of the listener as input. As a further consequence of this effect, the burden of considering the listener's situation when creating a text to be read aloud is reduced, which makes such texts easier to create.

FIG. 1 is a diagram for explaining the outline of the present invention.
FIG. 2 is a block diagram showing the configuration of the text conversion device according to the first embodiment of the present invention.
FIG. 3 is a diagram for explaining the relationship between the pitch frequency of speech and the urgency level of a user.
FIG. 4 is a block diagram showing the detailed configuration of the text conversion unit of the text conversion device according to the first embodiment of the present invention.
FIG. 5 is a diagram for explaining the structure of the word conversion database of the text conversion device according to the first embodiment of the present invention.
FIG. 6 is a diagram for explaining the operation of the text conversion device according to the first embodiment of the present invention.
FIG. 7 is a block diagram showing the configuration of the text conversion device according to the second embodiment of the present invention.
FIG. 8 is a block diagram showing the detailed configuration of the text conversion unit of the text conversion device according to the second embodiment of the present invention.
FIG. 9 is a diagram for explaining the operation of the text conversion device according to the second embodiment of the present invention.
FIG. 10 is a block diagram showing the configuration of the text conversion device according to the third embodiment of the present invention.
FIG. 11 is a block diagram showing the detailed configuration of the text conversion unit of the text conversion device according to the third embodiment of the present invention.

Explanation of symbols

11 Microphone (voice input unit)
12 Voice recognition unit
13 Response word generation unit
14 Text speech synthesis unit
15 Speaker
20 Pitch frequency analysis unit
21 Psychological situation estimation unit
22 Text conversion unit
23 Speech rate measurement unit
31 Word conversion unit
32 Text division unit
33 Candidate selection unit
34 Word conversion database (word conversion DB)
36 Text summarization unit
37 Text enhancement unit

Next, first to third embodiments will be described as preferred embodiments of the present invention. As outlined in FIG. 1, each of these embodiments comprises text conversion means (the text conversion unit in FIG. 1) that transforms the target text itself, according to a parameter representing the psychological situation of the listener (psychological situation parameter), so that it becomes text that is easy to understand when spoken as speech.

[First Embodiment]
First, a first embodiment of the present invention in which text conversion is performed using a user's urgency (degree of urgency) as a parameter representing a user's psychological situation will be described. FIG. 2 is a block diagram showing the configuration of the text conversion apparatus according to the first embodiment of the present invention.

Referring to FIG. 2, the text conversion device according to the first embodiment of the present invention includes a voice input unit 11 such as a microphone, a pitch frequency analysis unit 20, a psychological situation estimation unit 21, and a text conversion unit 22. Each of these processing means can be realized by a program that causes a computer constituting the text conversion device to execute the processes described later.

The pitch frequency analysis unit 20 is a means for analyzing the voice uttered by the user input from the voice input unit 11 and obtaining the pitch frequency.

The psychological situation estimation unit 21 is a means for obtaining a parameter x1 representing the degree of urgency of the user from the average value of the pitch frequency of the user voice and outputting it to the text conversion unit 22.

FIG. 3 is a diagram showing a map for obtaining the user's urgency level x1 from the average value of the pitch frequency. In the example of FIG. 3, the map is a monotonically increasing curve in which a higher average pitch frequency gives a higher urgency level x1. Strictly speaking, it is an S-shaped curve: x1 rises sharply once the average pitch frequency exceeds a first threshold th1, and rises only gradually again once the average pitch frequency exceeds a second threshold th2. This is because a high pitch frequency suggests that the user is speaking in a raised voice, from which it can be estimated that the user's urgency level is high.
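
As an illustration only, the S-shaped map of FIG. 3 could be approximated with a logistic function. The sketch below is not the map of FIG. 3 itself: the function name, the threshold values th1 and th2, and the upper bound x_max are all assumptions chosen for the example.

```python
import math


def urgency_from_pitch(mean_pitch_hz: float,
                       th1: float = 180.0,    # assumed first threshold (Hz)
                       th2: float = 260.0,    # assumed second threshold (Hz)
                       x_max: float = 1000.0  # assumed upper bound of x1
                       ) -> float:
    """Map an average pitch frequency to an urgency level x1.

    A logistic curve is used as a stand-in for the S-shaped map:
    it rises steeply between th1 and th2 and flattens outside them.
    """
    midpoint = (th1 + th2) / 2.0
    # Steepness chosen so that x1 is 5% of x_max at th1 and 95% at th2.
    k = 2.0 * math.log(19.0) / (th2 - th1)
    return x_max / (1.0 + math.exp(-k * (mean_pitch_hz - midpoint)))


if __name__ == "__main__":
    for hz in (120.0, 180.0, 220.0, 260.0, 320.0):
        print(hz, round(urgency_from_pitch(hz), 1))
```

Any monotonically increasing curve with the same two-threshold shape would serve equally well; the logistic form is used here only because it is compact.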

The text conversion unit 22 is a means for converting the input text based on the score according to the user's urgency level x1 obtained as described above to generate the output text. The score or the total score in the present embodiment is an index representing the ease of understanding as a voice.

FIG. 4 is a diagram showing a detailed configuration of the text conversion unit 22. The text conversion unit 22 includes a word conversion unit 31, a text division unit 32, a candidate selection unit 33, and a word conversion database (word conversion DB) 34.

The word conversion unit 31 uses the word pairs registered in the word conversion DB 34 to obtain a word conversion candidate group, that is, the set of all text candidates that can be produced by replacing words contained in the input text of length L, and outputs these candidates together with their respective conversion scores.

The word conversion DB 34 records one or more pairs of words that have substantially the same meaning (the meaning of the sentence does not change even if one is replaced with the other), together with a conversion score S1(i) for the conversion by each word pair i. FIG. 5 shows an example of the word pairs and conversion scores registered in the word conversion DB 34. The conversion score can be set low for an input word that has a homonym and high for an output word that has no homonym. This is because an input word that has a homonym (in FIG. 5, “personal computer” and “patker” each have the homonym “PC”) is harder to understand when heard than an output word that has none (in FIG. 5, “PC” and “Pokécon” have no homonyms).
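
The following is a minimal sketch of how a word conversion DB and the enumeration of word conversion candidates might look. The dictionary layout, the function name, the English word pairs, and the score values are hypothetical stand-ins and are not the contents of FIG. 5.

```python
from itertools import product

# Hypothetical word conversion DB: input word -> [(output word, conversion score S1(i))].
# The pairs and scores below are illustrative assumptions only.
WORD_CONVERSION_DB = {
    "personal computer": [("computer", 50), ("PC", 20)],
}


def word_conversion_candidates(text, db):
    """Return [(converted_text, S1)] for every combination of replacements.

    The unconverted text (S1 = 0) is always included. S1 is the sum of
    the conversion scores of the word pairs that were applied.
    """
    present = [word for word in db if word in text]
    # For each word found in the text: either keep it or use one of its pairs.
    options = [[(None, 0)] + list(db[word]) for word in present]
    candidates = []
    for choice in product(*options):
        converted, s1 = text, 0
        for word, (replacement, score) in zip(present, choice):
            if replacement is not None:
                converted = converted.replace(word, replacement)
                s1 += score
        candidates.append((converted, s1))
    return candidates


if __name__ == "__main__":
    for candidate in word_conversion_candidates("I bought a personal computer", WORD_CONVERSION_DB):
        print(candidate)
```

This simple version assumes the registered words do not overlap one another inside the text; a fuller implementation would work on a tokenized sentence.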

The text division unit 32 divides each text of the input word conversion candidate group into N division units of lengths L(1), L(2), ..., L(N), and outputs a text conversion candidate group consisting of all such combinations.

For example, when the input text is divided using reading marks (commas) and reading marks can be inserted at M positions, 2^M different divisions are possible.
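
The 2^M count follows from choosing independently, at each of the M positions, whether or not to insert a reading mark. A short sketch, with a hypothetical function name and an input already cut at the M insertable positions, could enumerate the divisions as follows.

```python
from itertools import product


def enumerate_divisions(units):
    """Enumerate every way of dividing a text at its insertable positions.

    `units` is the text cut at all M positions where a reading mark may be
    inserted (so it has M + 1 elements). Each result is a list of phrases;
    2**M results are returned in total.
    """
    if not units:
        return []
    divisions = []
    for marks in product([False, True], repeat=len(units) - 1):
        phrases = [units[0]]
        for unit, split_here in zip(units[1:], marks):
            if split_here:
                phrases.append(unit)   # start a new division unit
            else:
                phrases[-1] += unit    # extend the current division unit
        divisions.append(phrases)
    return divisions


if __name__ == "__main__":
    for division in enumerate_divisions(["I bought", " a personal", " computer"]):
        print(division)
```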

The text lengths L and L(1) to L(N) are described here as numbers of characters, which are easy to obtain and correlate with the duration of pronunciation. If available, the number of morae, which correlates more strongly with the duration of pronunciation, can be used instead.

The candidate selection unit 33 selects, from among all the text conversion candidates output by the text division unit 32, the candidate for which y1 = x1 − (α1 * S1 + α2 * S2) is positive and smallest, and outputs the corresponding converted text. Here, α1 and α2 are predetermined constants.

Here, S1 calculated for each word conversion candidate is the sum of the conversion scores S1 (i) of the words used for conversion.

S2, calculated for each text conversion candidate, is obtained as S2 = L^2 − Σ (L(i) * L(i)). The S2 obtained in this way becomes smaller as the number of divisions decreases, and larger as the phrase lengths after conversion and division become more uniform.
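
The formula for S2 can be checked with a few lines of code. The function name below is hypothetical; the first two calls reproduce values that appear in the worked example for FIG. 6 (L = 20 with one undivided phrase of 20 or 13 characters), and the last two use made-up phrase lengths to compare an even and an uneven split.

```python
def division_score(original_length, phrase_lengths):
    """S2 = L^2 - sum of L(i)^2, where L is the length of the input text
    and L(i) are the lengths of the division units after conversion."""
    return original_length ** 2 - sum(n * n for n in phrase_lengths)


if __name__ == "__main__":
    print(division_score(20, [20]))      # no conversion, no division: 0
    print(division_score(20, [13]))      # shortened to 13 characters: 231
    print(division_score(20, [10, 10]))  # even split (illustrative): 200
    print(division_score(20, [15, 5]))   # uneven split (illustrative): 150
```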

FIG. 6 shows the result of calculating the score (total score) for the input text “I bought a personal computer” using S1 and S2. In the example of FIG. 6, α1 = 10 and α2 = 1 are set as the constants α1 and α2.

The score calculation method will be described with reference to FIG. 6. For example, for candidate number 1, with word conversion “none” and text division “none”, S1 = 0 (no conversion) and S2 = 20^2 − (20^2) = 0, and the total score is calculated as α1 × 0 + α2 × 0 = 0.

Similarly, for candidate number 2, with word conversion “a” (“personal computer” in FIG. 5 converted to “personal computer”) and text division “none”, S1 = 50 and S2 = 20^2 − (13^2) = 231, and the total score is calculated as α1 × 50 + α2 × 231 = 731.

Similarly, for candidate number 3, with word conversion “b” (“personal computer” in FIG. 5 converted to “PC”) and text division “none”, S1 = 20 and S2 = 20^2 − (11^2) = 279, and the total score is calculated as α1 × 20 + α2 × 279 = 479.

When the candidate selection unit 33 applies, using these total scores, the criterion of selecting the candidate for which y1 = x1 − (α1 * S1 + α2 * S2) is positive and smallest, the text conversion candidate of candidate number 2 is selected when the user's urgency level x1 is extremely high. When x1 is not extremely high but is at or above a certain value, the text conversion candidate of candidate number 3 is selected. When x1 is low, the text conversion candidate of candidate number 1 is selected. That is, when the user's urgency level x1 is high, candidate number 2, in which “I bought a personal computer” is paraphrased briefly using a short synonym, is selected.

The total score increases each time the text is divided. For example, the text conversion candidates of candidate numbers 4 to 12 in FIG. 6 have higher total scores than the text conversion candidates of candidate numbers 1 to 3 that underwent the same word conversion. With all text conversion candidates available, when the user's urgency level x1 is 200, candidate number 7 (total score = 159, y1 = 41), in which the input text is divided once, is selected. Similarly, when x1 rises to 500, candidate number 3 (total score = 479, y1 = 21), in which word conversion shortens the sentence to be read out, is selected. When x1 rises further to 900, candidate number 11 (total score = 855, y1 = 45), which is divided more finely, is selected.
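
The selection rule can be sketched as below. The function name is hypothetical, and the table holds only the total scores that are quoted in the text (candidates 1, 2, 3, 7, and 11); the remaining rows of FIG. 6 are not reproduced here, so this is a partial illustration rather than the full candidate set.

```python
def select_candidate(x1, total_scores):
    """Return (candidate number, y1) for the candidate whose
    y1 = x1 - total score is positive and smallest, or None if no
    candidate has a positive y1."""
    best = None
    for number, score in total_scores.items():
        y1 = x1 - score
        if y1 > 0 and (best is None or y1 < best[1]):
            best = (number, y1)
    return best


# Total scores quoted in the text (alpha1 = 10, alpha2 = 1).
TOTAL_SCORES = {1: 0, 2: 731, 3: 479, 7: 159, 11: 855}

if __name__ == "__main__":
    for x1 in (200, 500, 900):
        print(x1, select_candidate(x1, TOTAL_SCORES))
```

With these scores the three urgency levels 200, 500, and 900 pick candidates 7, 3, and 11 respectively, with y1 values of 41, 21, and 45.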

As described above, for a user with a high urgency level, an output text is generated that contains as few homonyms as possible (easy to understand) and is finely divided (easy to hear).

In the present embodiment, the description has assumed that all selectable candidates are enumerated in advance and the selection is made afterwards. However, the user's urgency level x1 may also be input to the word conversion unit 31 and the text division unit 32 so that unnecessary candidates are deleted, or an optimal candidate is selected, at each stage. For example, when the user's urgency level x1 is high, outputting only the text conversion candidates that have undergone a conversion with a high score S1 makes it possible to reduce the load and processing time of the text division unit 32 and the candidate selection unit 33.

[Second Embodiment]
Next, a second embodiment of the present invention, in which text conversion is performed using the user's degree of urgency (imminence) as the parameter representing the user's psychological situation, will be described. FIG. 7 is a block diagram showing the configuration of the text conversion device according to the second embodiment of the present invention.

Referring to FIG. 7, the text conversion device according to the second embodiment of the present invention includes a voice input unit 11 such as a microphone, a voice recognition unit 12, a response word generation unit 13, a speech rate measurement unit 23, a psychological situation estimation unit 21, a text conversion unit 22, a text speech synthesis unit 14, and a speaker 15. The processing means of the voice recognition unit 12, the response word generation unit 13, the speech rate measurement unit 23, the psychological situation estimation unit 21, the text conversion unit 22, and the text speech synthesis unit 14 can be realized by a program that causes a computer constituting the text conversion device to execute the processes described later.

The voice recognition unit 12 is means for recognizing the voice input from the microphone 11 and outputting it to the response word generation unit 13.

The response word generation unit 13 is a unit that generates a word that responds to the content of the user's utterance recognized by the voice recognition unit 12 and outputs it to the text conversion unit 22 as input text.

The speech rate measurement unit 23 is a means for measuring the speech rate of the voice uttered by the user. The sound input from the microphone 11 is also input to the speech rate measurement unit 23, which measures the speech rate of the user's utterance.

The psychological situation estimation unit 21 outputs a numerical value x2 representing the degree of urgency based on the value of the speech rate measured by the speech rate measurement unit 23.

Here, the numerical value x2 representing the degree of urgency can be obtained from a relationship given in advance so as to have a monotonically increasing relationship with the speech rate value (unit: mora per second).

The text conversion unit 22 is a means for converting the input text based on the score according to the user's degree of urgency x2 obtained as described above, and generating an output text. The score or the total score in the present embodiment is an index representing the ease of understanding as a voice.

FIG. 8 is a diagram showing a detailed configuration of the text conversion unit 22. The text conversion unit 22 includes a word conversion unit 31, a text summarization unit 36, a candidate selection unit 33, and a word conversion database (word conversion DB) 34.

The word conversion unit 31 uses the word pairs registered in the word conversion DB 34 to output word conversion candidate groups which are all sets of changeable text candidates included in the input text of length L.

In the word conversion DB 34, one or more pairs of words having substantially the same meaning are recorded as in the first embodiment.

In the present embodiment, the score S1 calculated for each word conversion candidate is obtained as S1 = L11 − L12, where L11 is the number of characters of the text before conversion and L12 is the number of characters of the text after conversion by the word conversion unit 31.

The text summarization unit 36 summarizes each text of the input word conversion candidate group, which has a length L21, and outputs a summary text having a length L22. When the text summarization unit 36 can generate a plurality of summary candidates, all of them are included in the text conversion candidate group.

The score S2 calculated for each text conversion candidate is obtained as S2 = L21 − L22.

The candidate selection unit 33 selects and outputs, from among all combinations of the text conversion candidates output by the word conversion unit 31 and the text summarization unit 36, the candidate for which y2 = x2 − (α1 * S1 + α2 * S2) is positive and smallest. However, when y2 is negative for all text conversion candidates, the candidate selection unit 33 selects and outputs the candidate that maximizes S1 + S2.

FIG. 9 shows the result of calculating the score (total score) for the input text “My personal computer has been broken” using S1 and S2. In the example of FIG. 9, α1 = 1 and α2 = 1 are set as the constants α1 and α2 in the calculation formula of y2.

The score calculation method will be described with reference to FIG. 9. For example, for candidate number 1, with word conversion “none” and text summary “none”, S1 = 0 (no shortening due to conversion) and S2 = 0 (no shortening due to summarization), and the total score is calculated as 0.

Similarly, for candidate number 2, with word conversion “a” (“personal computer” in FIG. 5 converted to “personal computer”) and text summary “none”, S1 = 7 and S2 = 0, and the total score is calculated as 7.

Similarly, for candidate number 3, with word conversion “b” (“personal computer” in FIG. 5 converted to “PC”) and text summary “none”, S1 = 9 and S2 = 0, and the total score is calculated as 9.

When the candidate selection unit 33 applies, using these total scores, the criterion of selecting the candidate for which y2 = x2 − (α1 * S1 + α2 * S2) is positive and smallest, the shortest of candidate numbers 1 to 3, namely candidate number 3, is selected when the user's urgency level x2 is extremely high (a value of 9 or more). When x2 is not extremely high but is at or above a certain value (7 or more and less than 9), the text conversion candidate of candidate number 2 is selected. When x2 is low (less than 7), the text conversion candidate of candidate number 1 is selected. That is, when the user's urgency level x2 is high, the candidate shortened by the word conversion of candidate number 3, “My PC has been broken.”, is selected.

The total score becomes higher as the effect of text summarization becomes larger. For example, the text conversion candidates of candidate numbers 4 to 9 in FIG. 9 have higher total scores than the text conversion candidates of candidate numbers 1 to 3 that underwent the same word conversion. With all text conversion candidates available, when the user's urgency level x2 is 20, candidate number 9 (total score = 18, y2 = 2) is selected.
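
A sketch of the selection rule of this embodiment, including the fallback used when every y2 is non-positive, is given below. The function name is hypothetical. The (S1, S2) pairs for candidates 1 to 3 are the values stated in the text; for candidate 9 only the total score of 18 is stated, so the split into S1 = 9 and S2 = 9 is an assumption made for the example.

```python
def select_candidate(x2, candidates, alpha1=1.0, alpha2=1.0):
    """Pick the candidate whose y2 = x2 - (alpha1*S1 + alpha2*S2) is positive
    and smallest; if no y2 is positive, pick the candidate maximizing S1 + S2.

    `candidates` maps a candidate number to its (S1, S2) pair.
    """
    positive = {}
    for number, (s1, s2) in candidates.items():
        y2 = x2 - (alpha1 * s1 + alpha2 * s2)
        if y2 > 0:
            positive[number] = y2
    if positive:
        return min(positive, key=positive.get)
    return max(candidates, key=lambda n: sum(candidates[n]))


CANDIDATES = {
    1: (0, 0),  # no conversion, no summary
    2: (7, 0),  # word conversion "a"
    3: (9, 0),  # word conversion "b"
    9: (9, 9),  # total 18 is from the text; the 9 + 9 split is assumed
}

if __name__ == "__main__":
    for x2 in (0, 8, 10, 20):
        print(x2, select_candidate(x2, CANDIDATES))
```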

As described above, in this embodiment, when the degree of urgency is determined to be high, a shorter sentence that is likely to be conveyed in a short time can be generated.

In the present embodiment, the description has assumed that the selection is made after all possible candidates are enumerated. However, the user's urgency level x2 may also be input to the word conversion unit 31 and the text summarization unit 36 so that unnecessary candidates are deleted, or an optimal candidate is selected, at each stage. For example, when the user's urgency level x2 is high, outputting only the text conversion candidates that have undergone a conversion with a high score S1 makes it possible to reduce the load and processing time of the text summarization unit 36 and the candidate selection unit 33.

Further, the numerical value x2 representing the degree of urgency of the user is not limited to the above, and can also be obtained by the following methods.

For example, the numerical value x2 representing the degree of urgency can be obtained by taking as input the moving speed (vehicle speed or rotational speed of the driving wheels) of a vehicle, such as an automobile, that the user is riding in (driving), as a value that increases monotonically with that speed.

Also, for example, the numerical value x2 representing the degree of urgency can be obtained by taking the brake operation of the automobile as input: the value of x2 is increased when the user driving the automobile depresses the brake, and is increased further when the acceleration of the brake pedal movement is large.

[Third Embodiment]
Next, a third embodiment of the present invention, in which text conversion is performed using the user's degree of concentration as the parameter representing the user's psychological situation, will be described. FIG. 10 is a block diagram showing the configuration of the text conversion device according to the third embodiment of the present invention.

Referring to FIG. 10, the text conversion device according to the third embodiment of the present invention includes a voice input unit 11 such as a microphone, a voice recognition unit 12, a response word generation unit 13, a speech rate measurement unit 23, a psychological situation estimation unit 21, a text conversion unit 22, a text speech synthesis unit 14, and a speaker 15. As in the second embodiment, the processing means of the text conversion device of this embodiment can be realized by a program that causes a computer constituting the text conversion device to execute the processes described later.

In the present embodiment, the psychological situation estimation unit 21 outputs a value x3 indicating the user's concentration level from the user's utterance speed, and the text conversion unit 22 performs text conversion using the value x3 indicating the user's concentration level. The other elements are the same as those in the second embodiment described above, and the differences will be mainly described.

The psychological situation estimation unit 21 obtains the user concentration x3 from the temporal variation component of the speech rate value. Specifically, the fluctuation component Vdiff = Vmax−Vmin is calculated from the maximum value Vmax and the minimum value Vmin of the speech rate.

When the value of the temporal variation component Vdiff of the speech rate is large, the psychological situation estimation unit 21 determines that the user is likely to be distracted by something other than the dialogue, and outputs a correspondingly low numerical value x3 indicating the degree of concentration.

Therefore, the numerical value x3 representing the degree of concentration can be obtained from a relationship given in advance so as to have a monotonously decreasing relationship with the value Vdiff of the temporal variation component of the speech speed value.
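
One way to picture this estimation is the sketch below. The text only requires that x3 decrease monotonically with Vdiff = Vmax − Vmin; the reciprocal form, the `scale` parameter, and the function name are assumptions made for the example.

```python
def concentration_from_speech_rates(rates, scale=1.0):
    """Estimate a concentration value x3 from measured speech rates
    (for example, in morae per second).

    Vdiff = Vmax - Vmin is the temporal variation component. The mapping
    x3 = 1 / (1 + scale * Vdiff) is one assumed monotonically decreasing
    relationship; it keeps x3 in the range (0, 1].
    """
    v_diff = max(rates) - min(rates)
    return 1.0 / (1.0 + scale * v_diff)


if __name__ == "__main__":
    print(concentration_from_speech_rates([6.0, 6.0, 6.0]))  # steady speech
    print(concentration_from_speech_rates([5.0, 7.0, 9.0]))  # fluctuating speech
```

Keeping x3 strictly positive is convenient because the reciprocal of the concentration value is used later when candidates are selected.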

The text conversion unit 22 is a means for converting the input text based on the score according to the user concentration x3 obtained as described above, and generating the output text.

FIG. 11 is a diagram showing a detailed configuration of the text conversion unit 22. The text conversion unit 22 includes a word conversion unit 31, a text enhancement unit 37, a candidate selection unit 33, and a word conversion database (word conversion DB) 34.

As in the first embodiment, the word conversion DB 34 records one or more pairs of words having substantially the same meaning, together with a conversion score S1(i) for the conversion by each word pair i.

The text enhancement unit 37 extracts an important word from each text of the input word conversion candidate group, which has a length L21, and generates an output text having a length L22 in which the important word is repeated an arbitrary number of times (phrase repetition processing). For example, for the input text “Please press the B button next”, the text enhancement unit 37 extracts “B button” as the important word and outputs “B button, please press B button”, in which the important word appears twice, separated by a punctuation mark. When the input text includes a plurality of important words, the text enhancement unit 37 outputs, as the text conversion candidate group, all combinations of the patterns obtained by repeating each important word.

Important word candidates may be defined in the text emphasizing unit 37 in advance, or an object or the like may be extracted as an important word according to a certain rule.

The score S2 calculated for each text conversion candidate is obtained as S2 = L22 − L21.

The candidate selection unit 33 selects and outputs, from among all combinations of the text conversion candidates output by the word conversion unit 31 and the text enhancement unit 37, the candidate for which y3 = (1 / x3) − (β1 * S1 + β2 * S2) is positive and smallest. However, when y3 is negative for all text conversion candidates, the candidate selection unit 33 selects and outputs the candidate that maximizes S1 + S2.
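
The phrase repetition processing and the length-difference score S2 = L22 − L21 can be sketched as follows. The function name is hypothetical, and prefixing the important word followed by a comma is an assumed way of repeating it in English; it does not reproduce the exact output wording of the example in the text.

```python
from itertools import product


def emphasis_candidates(text, important_words):
    """Return [(output_text, S2)] for every combination of emphasized words.

    Each important word found in the text is either left alone or repeated
    once more by prefixing it, followed by a comma, to the sentence.
    S2 is the resulting increase in the number of characters.
    """
    present = [word for word in important_words if word in text]
    candidates = []
    for flags in product([False, True], repeat=len(present)):
        prefix = "".join(f"{word}, " for word, on in zip(present, flags) if on)
        output = prefix + text
        candidates.append((output, len(output) - len(text)))
    return candidates


if __name__ == "__main__":
    for candidate in emphasis_candidates("Please press the B button next", ["B button"]):
        print(candidate)
```

With K important words present, 2^K candidates are produced, including the unemphasized text with S2 = 0.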

The score calculation method in this embodiment will be described. For example, in the case of word conversion “none” and text enhancement “none”, S1 = 0 (no conversion) and S2 = 0 (no enhancement) are calculated, and the total score is calculated as zero.

In the case where a word conversion such as converting “personal computer” in FIG. 5 to “personal computer” is performed with text emphasis “none”, S1 = 50 and S2 = 0, and with β1 and β2 each set to 1 the total score is calculated as 50.

In the case where a word conversion such as converting “personal computer” in FIG. 5 to “personal computer” is performed and text emphasis repeats “B button” twice, S1 = 50 and S2 = 4, and with β1 and β2 each set to 1 the total score is calculated as 54.

When the candidate selection unit 33 applies, using these total scores, the criterion of selecting the candidate for which y3 = (1 / x3) − (β1 * S1 + β2 * S2) is positive and smallest, a text conversion candidate in which both the word conversion and the text emphasis are performed is selected when the user's concentration level x3 is low. On the other hand, when the user's concentration level x3 is determined to be high, a text conversion candidate in which neither the word conversion nor the text emphasis is performed is selected.
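
The effect of using 1 / x3 in the criterion can be seen in the following sketch. The function name and the candidate labels are hypothetical; the (S1, S2) pairs are the three cases worked through in the text, and the x3 values are made-up inputs chosen to show each outcome.

```python
def select_candidate(x3, candidates, beta1=1.0, beta2=1.0):
    """Pick the candidate whose y3 = 1/x3 - (beta1*S1 + beta2*S2) is positive
    and smallest; if no y3 is positive, pick the candidate maximizing S1 + S2.

    `candidates` maps a label to its (S1, S2) pair; x3 must be positive.
    """
    threshold = 1.0 / x3
    positive = {}
    for label, (s1, s2) in candidates.items():
        y3 = threshold - (beta1 * s1 + beta2 * s2)
        if y3 > 0:
            positive[label] = y3
    if positive:
        return min(positive, key=positive.get)
    return max(candidates, key=lambda label: sum(candidates[label]))


CANDIDATES = {
    "none": (0, 0),                # no word conversion, no emphasis
    "word": (50, 0),               # word conversion only
    "word+emphasis": (50, 4),      # word conversion and emphasis
}

if __name__ == "__main__":
    for x3 in (2.0, 0.0195, 0.018):  # illustrative concentration values
        print(x3, select_candidate(x3, CANDIDATES))
```

A low concentration value makes 1 / x3 large, so candidates with larger total scores, that is, more conversion and more emphasis, become selectable.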

As described above, in this embodiment, when the user's concentration is determined to be low, a sentence that is more verbose but easier to understand can be generated.

In this embodiment, the description has assumed that the selection is made after all possible candidates are enumerated in advance. However, the user's concentration level x3 may also be input to the word conversion unit 31 and the text enhancement unit 37 so that unnecessary candidates are deleted, or an optimal candidate is selected, at each stage. For example, when the user's concentration level x3 is low, outputting only the text conversion candidates that have undergone a conversion with a high score S1 makes it possible to reduce the load and processing time of the text enhancement unit 37 and the candidate selection unit 33.

Further, the way of obtaining the numerical value x3 representing the user's concentration level is not limited to the above; x3 can also be obtained by the following methods.

For example, the electrical resistance of the user's skin may be measured and input, and the user's amount of sweating estimated as a value that decreases monotonically with the electrical resistance. The numerical value x3 representing the concentration level can then be obtained from the relationship between the amount of sweating and the degree of concentration.

Also, for example, the user's respiration may be measured and input, and the numerical value x3 representing the degree of concentration obtained from the relationship that the degree of concentration is high when the number of breaths per hour is small.

Also, for example, the user's pulse may be measured and input, and the numerical value x3 representing the degree of concentration obtained from the relationship that the degree of concentration is high when the number of beats per hour is large.

The preferred embodiments of the present invention have been described above. However, the present invention is not limited to these embodiments, and further modifications, replacements, and adjustments can be made without departing from the basic technical idea of the present invention. For example, in the second embodiment described above, the score based on whether or not the converted word has a synonym is not used; by appropriately correcting this score and adding it to the total score, it is possible to output speech that is easily conveyed to a user in an urgent state.

Further, the first to third embodiments above describe text conversion based on the user's tension level, urgency level, and concentration level, respectively. It is also possible to perform each of these text conversions, select from the resulting texts the one suited to the user's psychological situation, and utter it. Of course, text conversion can be performed not only with the user's tension level, urgency level, and concentration level but also with parameters representing other psychological situations.
The parameters representing various psychological situations, such as the user's tension level, urgency level, and concentration level, the input and output text, and the text conversion program may be anything that a computer can handle as a physical or electrical signal. The text conversion program causes a computer, into which the parameters representing the psychological situation and the text are input, to function as physical means for outputting the converted text.

The present invention can be used, in combination with a text-to-speech synthesizer, in various applications that change the utterance text, such as a speech synthesizer, a speech dialogue system, an automatic voice response device, and an intelligent robot.
It should be noted that the embodiments and examples can be changed and adjusted within the scope of the entire disclosure (including claims) of the present invention and based on the basic technical concept. Various combinations and selections of various disclosed elements are possible within the scope of the claims of the present invention. That is, the present invention of course includes various variations and modifications that could be made by those skilled in the art according to the entire disclosure including the claims and the technical idea.

Claims

1. A text conversion apparatus that takes as input a parameter representing the user's psychological state and a text, and performs on the input text a conversion operation suited to the input parameter within a range that does not change the meaning of the input text.
2. The text conversion apparatus according to claim 1, wherein a plurality of text conversion candidates are created, a score indicating the clarity when the output text is received as speech is obtained for each text conversion candidate, and a text conversion candidate having a score that matches the input parameter is selected.
3. The text conversion apparatus according to claim 1, wherein the user's tension level is used as the parameter representing the user's psychological state.
4. The text conversion apparatus according to claim 1, wherein the user's urgency level is used as the parameter representing the user's psychological state.
5. The text conversion device according to claim 1 or 2, wherein the user's concentration level is used as the parameter representing the user's psychological state.
6. The text conversion device according to claim 3, wherein the tension level of the user is replaced with an average value of the pitch frequencies of voice uttered by the user.
7. The text conversion apparatus according to claim 4, wherein the urgency level of the user is replaced with the speed of voice uttered by the user.
8. The text conversion apparatus according to claim 5, wherein the concentration level of the user is replaced with a temporal variation component of the speed of voice uttered by the user.
9. The text conversion apparatus according to claim 2, wherein a plurality of text conversion candidates are created by replacing a word in the input text with another word, and a score is given to each of the text conversion candidates according to the number of homonyms that the replaced word has.
10. The text conversion device according to claim 1, wherein a plurality of text conversion candidates are created by replacing a word in the input text with another word, and a score is given to each of the text conversion candidates according to the time length when the text conversion candidate is read out or the sentence length of the text conversion candidate.
11. The text conversion device according to any one of claims 2 to 8, wherein a plurality of text conversion candidates are created by dividing the input text into a plurality of units, and a score is given to each text conversion candidate according to the number of division units included in the text conversion candidate or the length of each division unit.
12. The text conversion apparatus according to claim 2, wherein a plurality of text conversion candidates are created by extracting one or more important words from the input text and performing a conversion operation of repeating the important words two or more times, and a score is given to each of the text conversion candidates according to the number of times of the conversion operation.
13. A text conversion device comprising:
a psychological situation estimation unit that outputs, based on the user's voice or the user's operation content, a parameter for estimating the user's psychological situation; and
a text conversion unit that performs, based on the parameter for estimating the user's psychological situation, one or more text conversion processes among replacement of a word in the input text with a synonym, insertion of a pause point in the input text, and repetition of a phrase in the input text.
14. The text conversion device according to claim 13, wherein the text conversion unit generates a text conversion candidate group according to a predetermined text conversion rule, assigns a score according to a predetermined score calculation formula to each text conversion candidate included in the text conversion candidate group, and selects a text conversion candidate to which a score matching the parameter for estimating the user's psychological situation has been assigned.
15. A speech output device comprising: the text conversion device according to claim 1; and a text-to-speech synthesis unit that reads out the text output from the text conversion device.
16. A robot comprising the speech output device according to claim 15 and performing voice output according to a listener's psychological situation.
17. A text conversion method performed by a text conversion device, the method comprising:
inputting a parameter representing the user's psychological state and a text; and
converting the input text, based on the input parameter, within a range that does not change the meaning of the text.
18. The text conversion method according to claim 17, wherein a plurality of text conversion candidates are generated from the text using a predetermined text conversion candidate creation rule, a score indicating ease of understanding when the output text is received as speech is obtained for each of the text conversion candidates, and a text conversion candidate having a score commensurate with the input parameter is selected.
19. The text conversion method according to claim 18, wherein a plurality of text conversion candidates are created by replacing a word in the input text with another word, and a score is given to each of the text conversion candidates according to the number of homonyms that the replaced word has.
20. The text conversion method according to claim 18, wherein a plurality of text conversion candidates are created by replacing a word in the input text with another word, and a score is given to each text conversion candidate according to the time length when the text conversion candidate is read out or the sentence length of the text conversion candidate.
21. The text conversion method according to claim 18, wherein a plurality of text conversion candidates are created by dividing the input text into a plurality of units, and a score is given to each text conversion candidate according to the number of division units included in the text conversion candidate or the length of each division unit.
22. The text conversion method according to claim 18, wherein a plurality of text conversion candidates are created by extracting one or more important words from the input text and performing a conversion operation of repeating the important words two or more times, and a score is given to each text conversion candidate according to the number of times of the conversion operation.
23. A text conversion program that causes a computer to execute:
a process of inputting a parameter representing the user's psychological state and a text; and
a process of converting the input text within a range that does not change the meaning of the input text.
24. The text conversion program according to claim 23, causing the computer to execute:
a process of generating a plurality of text conversion candidates from the text using a predetermined text conversion candidate creation rule; and
a process of obtaining, for each of the text conversion candidates, a score indicating ease of understanding when the output text is received as speech, and selecting the text conversion candidate having a score that matches the input parameter.
25. The text conversion program according to claim 24, wherein a plurality of text conversion candidates are created by replacing a word in the input text with another word, and a score is given to each of the text conversion candidates according to the number of homonyms that the replaced word has.
26. The text conversion program according to claim 24, wherein a plurality of text conversion candidates are created by replacing a word in the input text with another word, and a score is given to each of the text conversion candidates according to the time length when the text conversion candidate is read out or the sentence length of the text conversion candidate.
27. The text conversion program according to claim 24, wherein a plurality of text conversion candidates are created by dividing the input text into a plurality of units, and a score is given to each text conversion candidate according to the number of division units included in the text conversion candidate or the length of each division unit.
28. The text conversion program according to claim 24, wherein a plurality of text conversion candidates are created by extracting one or more important words from the input text and performing a conversion operation of repeating the important words two or more times, and a score is given to each text conversion candidate according to the number of times of the conversion operation.
29. A voice output program for causing the computer to further execute a process of converting a text to be output into voice by a text-to-speech synthesis technique, using the text conversion program according to any one of claims 23 to 28.