WO2019142419A1

WO2019142419A1 - Information processing device and information processing method

Info

Publication number: WO2019142419A1
Application number: PCT/JP2018/038725
Authority: WO
Inventors: 亜由美中川; 賢次杉原
Original assignee: ソニー株式会社
Priority date: 2018-01-22
Filing date: 2018-10-17
Publication date: 2019-07-25

Abstract

[Problem] To easily correct erroneous selection of a form to be input. [Solution] Provided is an information processing device comprising a control unit that selects, from a plurality of forms, a first target form for entry on the basis of a user input operation, and enters characters in the first target form. The control unit selects a second target form that is different from the first target form on the basis of feedback from the user on the content entered in the first target form, and enters characters in the second target form.

Description

INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING METHOD

The present disclosure relates to an information processing apparatus and an information processing method.

In recent years, many techniques have been developed to reduce the burden on the user using a character input device such as a keyboard. The above-mentioned techniques include, for example, a speech recognition technique for recognizing a user's speech and converting it into a character string. Also, many techniques for improving the accuracy of speech recognition have been proposed. For example, Patent Document 1 discloses a technique for correcting recognition errors associated with proper nouns.

Patent Document 1: JP 2004-258531 A

In recent years, input formats in which there are a plurality of forms targeted for character input have become widespread. With such an input format, the form into which a speech recognition result is entered may be selected erroneously, but with the technique described in Patent Document 1 it is difficult to correct such a form error.

Therefore, the present disclosure proposes a new and improved information processing apparatus and information processing method capable of easily correcting a selection error of a form to be input.

According to the present disclosure, there is provided an information processing apparatus including a control unit that selects a first target form to be input from a plurality of forms based on a user's input operation and performs character input on the first target form, wherein the control unit selects a second target form different from the first target form based on the user's feedback on the input content entered in the first target form, and performs character input on the second target form.

Further, according to the present disclosure, there is provided an information processing method including: selecting, by a processor, a first target form to be input from a plurality of forms based on a user's input operation, and performing character input on the first target form; and selecting a second target form different from the first target form based on the user's feedback on the input content entered in the first target form, and performing character input on the second target form.

As described above, according to the present disclosure, it is possible to easily correct a selection error of a form to be input.

Note that the above effects are not necessarily limiting, and, along with or in place of the above effects, any of the effects shown in the present specification, or other effects that can be grasped from the present specification, may be achieved.

A diagram for explaining an overview of a first embodiment of the present disclosure.
A diagram for explaining an overview of the embodiment.
A block diagram showing a configuration example of an information processing system according to the embodiment.
A block diagram showing a functional configuration example of an information processing terminal according to the embodiment.
A block diagram showing a functional configuration example of an information processing server according to the embodiment.
An example of an Nbest result of the speech recognition process according to the embodiment.
A diagram for explaining a correction process in which a unit block is designated according to the embodiment.
A diagram for explaining a correction process based on the connection probability between unit blocks according to the embodiment.
A diagram for explaining a correction process when a form not intended by the user is selected due to the semantic analysis settings according to the embodiment.
A diagram for explaining a correction process when a form not intended by the user is selected because a domain is not set according to the embodiment.
A diagram for explaining the addition of a domain to a specific expression according to the embodiment.
A diagram for explaining the addition of a domain to a specific expression according to the embodiment.
A diagram for explaining input / output control when the reliability of speech recognition is low according to the embodiment.
A diagram for explaining input / output control when the reliabilities of character string candidates are close to each other according to the embodiment.
A diagram for explaining input / output control when the reliabilities of character string candidates are close to each other according to the embodiment.
A diagram showing an example of correction involving separation of a unit block according to the embodiment.
A diagram showing an example of correction involving separation of a unit block according to the embodiment.
A diagram showing an example of correction involving separation of a unit block according to the embodiment.
A diagram for explaining a correction process when there are a plurality of unit blocks associated with the same domain according to the embodiment.
A diagram for explaining processing when a character string has already been entered in the form designated by the user's feedback according to the embodiment.
A diagram for explaining a correction process when English is used as the type of character string according to the embodiment.
A flowchart showing the flow of operation of the information processing server according to the embodiment.
A diagram for explaining automatic input to forms arranged on a web page.
A diagram showing an example of an input error made by an automatic input tool.
A diagram showing an example of an input error made by an automatic input tool.
A diagram for explaining automatic input control by the input / output control unit 220 according to a second embodiment of the present disclosure.
A diagram for explaining correction of input information according to the embodiment.
A diagram for explaining correction of input information according to the embodiment.
A diagram for explaining correction of input information according to the embodiment.
A diagram showing a hardware configuration example according to an embodiment of the present disclosure.

Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In the present specification and the drawings, components having substantially the same functional configuration will be assigned the same reference numerals and redundant description will be omitted.

The description will be made in the following order.
1. First Embodiment
1.1. Overview
1.2. System configuration example
1.3. Functional configuration example of information processing terminal 10
1.4. Functional configuration example of information processing server 20
1.5. Details of Function
1.6. Flow of operation
2. Second Embodiment
2.1. Overview
2.2. Function Details
3. Hardware configuration example
4. Summary

<1. First embodiment>
<< 1.1. Overview >>
First, an outline of the first embodiment of the present disclosure will be described. In recent years, many techniques have been developed to reduce the burden on the user associated with character input. Such techniques include, for example, speech recognition techniques that recognize the user's speech and convert it into a character string. According to this technique, it is possible to release the user from the load of character input using a keyboard or the like.

On the other hand, in character input by speech recognition, the accuracy with which the user's speech is converted into a character string is very important. If the accuracy of speech recognition is low, the burden on the user of correcting erroneously entered character strings increases, and the significance of character input by speech recognition may be lost.

In addition, as described above, input formats in which there are a plurality of forms targeted for character input have become widespread in recent years. Such an input format is adopted, for example, in input interfaces such as to-do lists and schedulers, and has a plurality of forms corresponding to a title, a date, a time, and the like.

Here, when character input by speech recognition is realized in the above input format, the character string corresponding to the user's utterance must be entered correctly in the form intended by the user. In practice, however, the speech recognition result may be entered in a form not intended by the user due to factors such as the speech recognition accuracy and the semantic analysis settings. In this case, it is difficult with a general voice input interface to easily correct such a form error by voice alone.

To avoid the above situation, the user could, for example, explicitly specify the target form in advance before speaking. However, in the case of a device that mainly relies on plain-text input (free speech input), it is difficult to specify a form in advance, and the significance of plain-text input, which allows free speech, would be lost.

The technical idea according to the present disclosure was conceived with a focus on the above points, and makes it possible to eliminate a form error by having the user designate the correct form after the fact when characters have been entered in an unintended form. To this end, the information processing apparatus according to an embodiment of the present disclosure includes a control unit that selects a first target form to be input from a plurality of forms based on the user's input operation and performs character input on the first target form. In addition, one of the features of the control unit is that it selects a second target form different from the first target form based on the user's feedback on the input content entered in the first target form, and performs character input on the second target form.

FIG. 1 and FIG. 2 are diagrams for explaining the outline of the present embodiment. In a speech input interface including a plurality of forms, as a factor by which a speech recognition result is input to a form not intended by the user, for example, the accuracy of speech recognition itself may be mentioned.

FIG. 1 is a diagram showing an example when a form not intended by the user is selected due to an error in speech recognition result. FIG. 1 shows a situation in which the user U registers a schedule by speech using a voice input interface IF having a plurality of forms F1 to F4. Here, the forms F1 to F4 may be forms for inputting the title, date, time, and day of the week regarding the schedule, respectively.

In FIG. 1, first, as shown in the upper part, the user U makes an utterance UO1a instructing registration of a schedule for “Tuesday”. However, because a speech recognition error causes “Tuesday” to be recognized as “Kayayou”, the speech recognition result is entered not in the form F4 corresponding to the day of the week, which the user U intended, but in the form F1 corresponding to the title, which allows free input.

Next, as shown on the lower left, the user U makes an utterance UO1b designating a form name in order to move the speech recognition result erroneously entered in the form F1 to the intended form, that is, the form F4 corresponding to the day of the week.

At this time, the information processing server 20 according to the present embodiment may move the speech recognition result from the form F1 to the form F4 based on the detected speech UO1b of the user U. At this time, the information processing server 20 according to the present embodiment can correct the voice input result in accordance with the input format of the form F4 corresponding to the day of the week. Focusing on the lower right of FIG. 1, it can be understood that “Kayayou” input to the form F1 is corrected to “Tuesday” by the above-described processing, and is correctly input to the form F4 intended by the user U.

As described above, according to the information processing server 20 according to the present embodiment, it is possible to easily correct the selection error of the form caused by the error of the speech recognition and further correct the error itself of the speech recognition.

FIG. 2 is a diagram showing an example in which a form not intended by the user is selected due to an error in character conversion in the speech recognition process. FIG. 2 shows a situation in which the user U registers a schedule by speech using a voice input interface IF having a plurality of forms F1 to F4, as in FIG. 1. In the following description of the drawings, the description of the form configuration and the like common to the drawings is omitted.

In FIG. 2, first, as shown in the upper part, the user U makes an utterance UO2a instructing registration of a schedule for “5 days”. However, because an error in character conversion in the speech recognition process causes “5 days” to be recognized as “when”, the speech recognition result is entered not in the form F2 corresponding to the date, which the user U intended, but in the form F1 corresponding to the title, which accepts free input.

Next, as shown on the lower left, the user U makes an utterance UO2b designating a form number in order to enter the speech recognition result erroneously entered in the form F1 into the intended form, that is, the form F2 corresponding to the date. Thus, in addition to a form name, the user U may designate a form number displayed on the voice input interface IF to issue a correction instruction.

At this time, the information processing server 20 according to the present embodiment may move the speech recognition result from the form F1 to the form F2 based on the detected speech UO2b of the user U. At this time, the information processing server 20 according to the present embodiment may correct the voice input result in accordance with the input format of the form F2 corresponding to the date. Focusing on the lower right of FIG. 2, it can be seen that “when” that is input to the form F1 is corrected to a date format representing 5 days, and is correctly input to the form F2 intended by the user U. Further, since the date is correctly input to the form F2, the information processing server 20 may automatically input the day of the week corresponding to the form F4 based on the date.

As described above, according to the information processing server 20 according to the present embodiment, it is possible to easily correct a form selection error caused by a character conversion error in the speech recognition process, and further to correct the character conversion error itself.
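The correction flow outlined for FIGS. 1 and 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `Form` class and the helpers `resolve_form`, `move_entry`, and `fill_weekday` are hypothetical names, the corrected date is assumed to be already available as a `datetime.date`, and the year and month used below are made up.

```python
from dataclasses import dataclass
from datetime import date

# English weekday names indexed by date.weekday() (Monday is 0).
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


@dataclass
class Form:
    """One form of the input interface (a hypothetical model)."""
    number: int
    name: str
    value: str = ""


def resolve_form(forms: list[Form], designation: str) -> Form:
    """Return the form the user designated, by form name or by form number."""
    key = designation.strip().lower()
    for form in forms:
        if key in (form.name.lower(), str(form.number)):
            return form
    raise LookupError(f"no form matches {designation!r}")


def move_entry(source: Form, target: Form, corrected_text: str) -> None:
    """Clear the wrongly selected form and enter the corrected text in the target."""
    source.value = ""
    target.value = corrected_text


def fill_weekday(weekday_form: Form, day: date) -> None:
    """Derive the day of the week from a date and enter it in the weekday form."""
    weekday_form.value = WEEKDAYS[day.weekday()]


# Usage: "when" landed in form 1; the user designates form number 2 (the date form).
forms = [Form(1, "title"), Form(2, "date"), Form(3, "time"), Form(4, "day of the week")]
forms[0].value = "when"
day = date(2018, 10, 5)  # assumed corrected date; the year and month are invented
move_entry(forms[0], resolve_form(forms, "2"), day.isoformat())
fill_weekday(forms[3], day)
```

Accepting either a name or a number in a single lookup keeps the feedback handling independent of how the user chooses to refer to the form.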

The outline of the present embodiment has been described above. Hereinafter, various correction functions realized by the information processing method according to the present embodiment will be described in detail with specific examples.

<< 1.2. System configuration example >>
First, a configuration example of an information processing system according to an embodiment of the present disclosure will be described. FIG. 3 is a block diagram showing an exemplary configuration of the information processing system according to the present embodiment. Referring to FIG. 3, the information processing system according to the present embodiment includes an information processing terminal 10 and an information processing server 20. Further, the information processing terminal 10 and the information processing server 20 are connected via the network 30 so as to be able to communicate with each other.

(Information processing terminal 10)
The information processing terminal 10 according to the present embodiment is an information processing apparatus that provides the user with a character input interface having a plurality of forms based on control by the information processing server 20. The information processing terminal 10 according to the present embodiment is realized by, for example, a smartphone, a tablet, a head mounted display, a general-purpose computer, or a dedicated device of a stationary type or an autonomous moving type.

(Information processing server 20)
The information processing server 20 according to the present embodiment is an information processing apparatus that controls input / output related to a character input interface including a plurality of forms. The information processing server 20 according to the present embodiment may control the display of the character input interface and the character input to the form.

In addition, one of the features of the information processing server 20 according to the present embodiment is that it realizes a character input interface that allows the user to easily correct a form error, as described with reference to FIGS. 1 and 2.

(Network 30)
The network 30 has a function of connecting the information processing terminal 10 and the information processing server 20. The network 30 may include the Internet, a public network such as a telephone network, a satellite communication network, various LANs (Local Area Networks) including Ethernet (registered trademark), a WAN (Wide Area Network), and the like. Also, the network 30 may include a leased line network such as an Internet Protocol-Virtual Private Network (IP-VPN). The network 30 may also include a wireless communication network such as Wi-Fi (registered trademark) or Bluetooth (registered trademark).

The configuration example of the information processing system according to the present embodiment has been described above. The configuration described above with reference to FIG. 3 is merely an example, and the configuration of the information processing system according to the present embodiment is not limited to such an example. For example, the functions of the information processing terminal 10 and the information processing server 20 according to the present embodiment may be realized by a single device. The configuration of the information processing system according to the present embodiment can be flexibly modified according to specifications and operation.

<< 1.3. Functional configuration example of the information processing terminal 10 >>
Next, a functional configuration example of the information processing terminal 10 according to the present embodiment will be described. FIG. 4 is a block diagram showing an example of a functional configuration of the information processing terminal 10 according to the present embodiment. Referring to FIG. 4, the information processing terminal 10 according to the present embodiment includes a display unit 110, an audio output unit 120, a voice input unit 130, an imaging unit 140, a sensor unit 150, a control unit 160, and a server communication unit 170.

(Display unit 110)
The display unit 110 according to the present embodiment has a function of outputting visual information such as an image or text. The display unit 110 according to the present embodiment displays a character input interface based on control by the information processing server 20, for example.

To this end, the display unit 110 according to the present embodiment includes a display device or the like that presents visual information. Examples of the display device include a liquid crystal display (LCD) device, an organic light emitting diode (OLED) device, and a touch panel. In addition, the display unit 110 according to the present embodiment may output visual information by a projection function.

(Audio output unit 120)
The audio output unit 120 according to the present embodiment has a function of outputting various sounds including voice. For this purpose, the audio output unit 120 according to the present embodiment includes an audio output device such as a speaker or an amplifier.

(Voice input unit 130)
The voice input unit 130 according to the present embodiment has a function of collecting sound information such as an utterance of a user and an ambient sound generated around the information processing terminal 10. The voice input unit 130 according to the present embodiment includes a plurality of microphones for collecting sound information.

(Imaging unit 140)
The imaging unit 140 according to the present embodiment has a function of capturing an image of the user or the surrounding environment. The image information captured by the imaging unit 140 may be used for detection of the line of sight of the user by the information processing server 20 or the like. The imaging unit 140 according to the present embodiment includes an imaging device capable of capturing an image. Note that the above image includes moving images as well as still images.

(Sensor unit 150)
The sensor unit 150 according to the present embodiment has a function of collecting various sensor information related to the surrounding environment and the user. The sensor information collected by the sensor unit 150 may be used, for example, for gesture recognition by the information processing server 20. The sensor unit 150 includes, for example, an infrared sensor, an acceleration sensor, a gyro sensor, and the like.

(Control unit 160)
The control unit 160 according to the present embodiment has a function of controlling each component of the information processing terminal 10. The control unit 160 controls, for example, the starting and stopping of each component. Further, the control unit 160 inputs a control signal generated by the information processing server 20 to the display unit 110 or the audio output unit 120. The control unit 160 according to the present embodiment may have the same function as the input / output control unit 220 of the information processing server 20 described later.

(Server communication unit 170)
The server communication unit 170 according to the present embodiment has a function of performing information communication with the information processing server 20 via the network 30. Specifically, the server communication unit 170 transmits, to the information processing server 20, the sound information collected by the voice input unit 130, the image information captured by the imaging unit 140, and the sensor information collected by the sensor unit 150. The server communication unit 170 also receives, from the information processing server 20, a control signal and the like relating to the output of the character input interface.

The example of the functional configuration of the information processing terminal 10 according to the present embodiment has been described above. The above configuration described using FIG. 4 is merely an example, and the functional configuration of the information processing terminal 10 according to the present embodiment is not limited to such an example. For example, the information processing terminal 10 according to the present embodiment does not necessarily have to include all of the components shown in FIG. 4. For example, the information processing terminal 10 can be configured without the imaging unit 140, the sensor unit 150, and the like. In addition, as described above, the control unit 160 according to the present embodiment may have the same function as the input / output control unit 220 of the information processing server 20. The functional configuration of the information processing terminal 10 according to the present embodiment can be flexibly modified according to specifications and operation.

<< 1.4. Functional configuration example of the information processing server 20 >>
Next, a functional configuration example of the information processing server 20 according to an embodiment of the present disclosure will be described. FIG. 5 is a block diagram showing an example of a functional configuration of the information processing server 20 according to the present embodiment. Referring to FIG. 5, the information processing server 20 according to the present embodiment includes a recognition unit 210, an input / output control unit 220, and a terminal communication unit 230.

(Recognition unit 210)
The recognition unit 210 according to the present embodiment executes voice recognition processing based on the user's uttered voice collected by the information processing terminal 10. Further, the recognition unit 210 may execute gaze detection based on an image captured by the information processing terminal 10, gesture recognition based on an image or sensor information, and the like.

(Input / output control unit 220)
The input / output control unit 220 according to the present embodiment comprehensively controls input / output processing related to the character input interface. The input / output control unit 220, for example, performs character input on a form of the character input interface based on the user's input operation.

At this time, the input / output control unit 220 according to the present embodiment may select a first target form to be an input target from a plurality of forms based on an input operation such as the user's utterance, and perform character input on the first target form. That is, the input / output control unit 220 according to the present embodiment can automatically select a form for character input based on the result of speech recognition for the user's speech.

Further, the input / output control unit 220 according to the present embodiment has a function of selecting a second target form different from the first target form based on the user's feedback on the input content entered in the first target form, and performing character input on the second target form. More specifically, the input / output control unit 220 may select the form specified by the feedback as the second target form, and enter, in the second target form, characters corresponding to at least a part of the input content entered in the first target form.

Here, the above user's feedback may be an instruction to correct a form error. That is, when the automatically selected form is incorrect, the input / output control unit 220 according to the present embodiment can perform a correction process so that the voice recognition result is input to the form designated by the user. According to the above-described function of the input / output control unit 220 according to the present embodiment, it is possible to easily correct a form error due to various factors without requiring a complicated operation.

(Terminal communication unit 230)
The terminal communication unit 230 according to the present embodiment performs information communication with the information processing terminal 10 via the network 30. Specifically, the terminal communication unit 230 receives sound information, image information, sensor information, and the like from the information processing terminal 10. The terminal communication unit 230 also transmits the control signal generated by the input / output control unit 220 to the information processing terminal 10.

The functional configuration example of the information processing server 20 according to an embodiment of the present disclosure has been described above. The above configuration described using FIG. 5 is merely an example, and the functional configuration of the information processing server 20 according to the present embodiment is not limited to such an example. For example, the configuration shown above may be realized by being distributed across a plurality of devices. Further, as described above, the functions of the information processing terminal 10 and the information processing server 20 may be realized by a single device. The functional configuration of the information processing server 20 according to the present embodiment can be flexibly modified according to specifications and operation.

<< 1.5. Function Details >>
Next, details of the functions of the input / output control unit 220 according to the present embodiment will be described. As described above, the input / output control unit 220 according to the present embodiment has a function of selecting the first target form to be input from the plurality of forms based on the user's input operation, and performing character input on the first target form. The input / output control unit 220 may select the first target form based on, for example, the speech recognition result for an input operation performed by speech, and may enter the speech recognition result in the first target form.

At this time, the input / output control unit 220 can select the first target form based on the speech recognition result for the utterance and the domain set in each form. Also, as described above, when moving the speech recognition result to the second target form designated by the user's feedback, the input / output control unit 220 may enter, in the second target form, the speech recognition result corrected based on the domain set in the second target form.

FIG. 6 is an example of the Nbest result of the speech recognition process according to the present embodiment. The recognition unit 210 according to the present embodiment may, for example, generate a plurality of character string candidates based on the user's utterance, and output the character string candidate with the highest reliability among them as the final speech recognition result. Here, the Nbest result is the set of character string candidates ranked first to Nth in reliability.

The left side of FIG. 6 shows an example of the Nbest result when the user utters "Tuesday". In this example, since the character string candidate "Kyanobi" has the highest reliability, the recognition unit 210 outputs "Kyanobi" as the speech recognition result.

Further, as illustrated, each character string candidate is associated with a domain indicating an attribute of the character string. For example, the character string candidate "Tuesday" is associated with the domain "day of the week" because the character string is one of the days of the week, and the character string candidate "Kyanobi" is associated with the domain "title", for which free input is permitted.

At this time, based on the domain “title” associated with “Kyanobi”, which was output as the speech recognition result, the input / output control unit 220 selects the form in which the domain “title” is set as the first target form and enters the speech recognition result in it.
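The domain-based selection of the first target form can be sketched as below. This is only an illustration under assumptions: the `Candidate` class, the `select_first_target` function, the form identifiers, and the reliability scores are hypothetical and do not come from FIG. 6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One entry of an Nbest result."""
    text: str
    domain: str
    reliability: float


def select_first_target(nbest: list[Candidate],
                        form_domains: dict[str, str]) -> tuple[str, str]:
    """Return (form id, text) for the most reliable candidate.

    The chosen form is the first one whose configured domain equals the
    domain associated with that candidate.
    """
    best = max(nbest, key=lambda c: c.reliability)
    for form_id, domain in form_domains.items():
        if domain == best.domain:
            return form_id, best.text
    raise LookupError(f"no form is set to domain {best.domain!r}")


# Reliability scores are invented for illustration.
nbest = [
    Candidate("Kyanobi", "title", 0.62),
    Candidate("Tuesday", "day of the week", 0.55),
]
form_domains = {"F1": "title", "F2": "date", "F3": "time", "F4": "day of the week"}
first_target = select_first_target(nbest, form_domains)
```

With these assumed scores, the misrecognized candidate wins and is routed to the title form, which is the situation the user's feedback is meant to repair.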

On the other hand, here, when there is feedback from the user regarding the specification of the form, the input / output control unit 220 relates to voice recognition based on the form specified by the user, that is, the domain set in the second target form. It is possible to control the recalculation of the reliability and input the corrected speech recognition result into the second target form.

In the example shown in FIG. 6, the user specifies, by feedback, a form in which the domain "day of the week" is set, and the input / output control unit 220 therefore causes the recognition unit 210 to recalculate the reliability based on the domain "day of the week". The right side of FIG. 6 shows the Nbest result obtained by this recalculation. In this Nbest result, the recognition unit 210 has raised the reliability of the character string candidates associated with the domain "day of the week", so the character string candidate "Tuesday" now has the highest reliability. At this time, the recognition unit 210 outputs "Tuesday", which has the highest reliability, as the speech recognition result.

Thus, by causing the recognition unit 210 to recalculate the reliability based on the domain set in the second target form specified by the user, the input / output control unit 220 according to the present embodiment can obtain a corrected speech recognition result and realize input in line with the user's intention. With this function of the input / output control unit 220, form selection errors and speech recognition errors can be corrected easily, without complicated operations.
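The domain-based recalculation described for FIG. 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the method the recognition unit 210 actually uses: the `Candidate` type, the `rerank_for_domain` function, the additive `boost`, and all reliability values are hypothetical and are not taken from FIG. 6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str           # character string candidate
    domain: str         # attribute of the string, e.g. "title"
    reliability: float  # illustrative score, not a value from FIG. 6


def rerank_for_domain(nbest, target_domain, boost=0.5):
    """Return the Nbest list re-sorted after raising the reliability of
    candidates whose domain matches the domain of the specified form.
    The additive boost is a hypothetical stand-in for the recalculation."""
    rescored = [
        Candidate(c.text, c.domain,
                  c.reliability + (boost if c.domain == target_domain else 0.0))
        for c in nbest
    ]
    return sorted(rescored, key=lambda c: c.reliability, reverse=True)


nbest = [
    Candidate("Kyanobi", "title", 0.62),
    Candidate("Tuesday", "day of the week", 0.55),
]

# Initial result: the candidate with the highest reliability.
first_result = max(nbest, key=lambda c: c.reliability)

# After feedback specifying a form whose domain is "day of the week".
corrected = rerank_for_domain(nbest, "day of the week")[0]
```

With these made-up scores, the initial result is "Kyanobi" and the corrected result is "Tuesday", which mirrors the change between the left and right sides of FIG. 6.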

Next, control in the case where the speech recognition result includes a plurality of unit blocks will be described. FIG. 7 is a diagram for describing a correction process in which a unit block is designated according to the present embodiment. In FIG. 7, first, as shown in the upper part, the user U makes an utterance UO7a instructing registration of a schedule, "English from 18:00 on Tuesday". However, a speech recognition error occurs here: "Tuesday" is recognized as "Kyanobi" and is input, together with the correctly recognized character string "English", into the form F1, which corresponds to the title and permits free input.

Thus, the speech recognition result according to the present embodiment may include a plurality of unit blocks. Here, a unit block is, for example, a character string delimited by a unit such as a word, a phrase, or a clause; in the above example, "Kyanobi" and "English" are unit blocks.

At this time, the input / output control unit 220 according to the present embodiment may display information on unit blocks included in the input content together with the input content input to the form. In the example shown in FIG. 7, in the form F1, the input / output control unit 220 displays the unit block relating to "Kyanobi" as "A" and the unit block relating to "English" as "B".

In addition, as shown in the lower left, in order to input the character string "Kyanobi", which corresponds to the unit block A and was erroneously input into the form F1, into the form F4 corresponding to the day of the week, the user U makes an utterance UO7b designating the unit block A and form number 2.

At this time, based on the detected utterance UO7b of the user U, the input / output control unit 220 can delete the character string "Kyanobi" corresponding to the unit block A from the form F1 and input the character string "Tuesday", corrected by recalculation of the reliability, into the form F4. As described above, the input / output control unit 220 according to the present embodiment can realize a correction that specifies the target unit block even when the content input to a form includes a plurality of unit blocks.

The input / output control unit 220 according to the present embodiment can also correct the input content based on the connection probability between unit blocks. FIG. 8 is a diagram for describing a correction process based on the connection probability between unit blocks according to the present embodiment.

In FIG. 8, first, as shown in the upper part, the user U makes an utterance UO8a instructing registration of a schedule, "meeting at 3 pm on the 5th". However, a speech recognition error occurs here: "the 5th" is recognized as "when" and "3 pm" is recognized as "afternoon shinji", and both are input, together with the correctly recognized character string "meeting", into the form F1, which corresponds to the title and permits free input.

In addition, as shown in the lower left, in order to input the character string "when", which corresponds to the unit block A and was erroneously input into the form F1, into the form F2 corresponding to the date, the user U makes an utterance UO8b designating form number 2 for the unit block A.

At this time, based on the domain "date" of the form F2 specified for the unit block A, the input / output control unit 220 causes recalculation of not only the reliability of the unit block A but also the connection probability with the unit block B, which is located before or after the unit block A.

The speech recognition process by the recognition unit 210 outputs its result together with the probabilities of the connection relationships between unit blocks. Therefore, when the character string corresponding to a certain unit block (a first unit block) is corrected based on a domain, the connection relationship with the second unit blocks located before and after the first unit block may also be recalculated.

In the example shown in FIG. 8, because "when", corresponding to the unit block A, is corrected to "the 5th", the connection probability with the unit block B is recalculated accordingly, and the unit block B is corrected to "3 pm", which has a high probability of connecting with "the 5th". At this time, based on the domain associated with the corrected character string "3 pm", the input / output control unit 220 converts "3 pm" to the format of the form and inputs it into the form F3.

As described above, the input / output control unit 220 and the recognition unit 210 according to the present embodiment can realize a more effective correction by taking the connection probability between unit blocks into account. Even if the errors in the preceding and following unit blocks are not all corrected in one pass, the errors in all unit blocks can be corrected by repeating the above process.
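The effect of the connection probability in the FIG. 8 example can be sketched as follows. The scoring rule (reliability multiplied by connection probability), the function name `correct_neighbor`, and every number are assumptions made for illustration; the source does not state how the recognition unit 210 combines these quantities. The corrected date string is written "the 5th" here.

```python
def correct_neighbor(corrected_text, neighbor_candidates, connection_prob):
    """Pick the candidate for the adjacent unit block that maximizes
    reliability * P(connection with corrected_text).
    neighbor_candidates: list of (text, reliability).
    connection_prob: dict keyed by (left_text, right_text)."""
    def score(item):
        text, reliability = item
        return reliability * connection_prob.get((corrected_text, text), 0.0)
    return max(neighbor_candidates, key=score)[0]


# Candidates for unit block B with hypothetical reliabilities.
block_b_candidates = [("afternoon shinji", 0.6), ("3 pm", 0.5)]

# Hypothetical connection probabilities between unit blocks A and B.
connection_prob = {
    ("when", "afternoon shinji"): 0.7,
    ("when", "3 pm"): 0.2,
    ("the 5th", "afternoon shinji"): 0.1,
    ("the 5th", "3 pm"): 0.8,
}

before = correct_neighbor("when", block_b_candidates, connection_prob)
after = correct_neighbor("the 5th", block_b_candidates, connection_prob)
```

In this sketch, unit block B stays "afternoon shinji" while unit block A is "when", and becomes "3 pm" once unit block A has been corrected to "the 5th".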

Next, correction processing of errors derived from semantic analysis according to the present embodiment will be described. Although the above describes the case where the input / output control unit 220 corrects an error caused by speech recognition, a form selection error can also occur depending on the setting of semantic analysis or the user's intention.

FIG. 9 is a diagram for describing a correction process when a form not intended by the user is selected due to the semantic analysis settings according to the present embodiment. In FIG. 9, first, as shown in the upper part, the user U makes an utterance UO9a instructing registration of a schedule, "pick up from 15:00". Here, it is assumed that the user U wants the entire character string "pick up from 15:00" to be input into the form F1.

However, because the domain "time" is associated with the recognized character string "15:00", the input / output control unit 220 has input "15:00" into the form F3 and only "pick up" into the form F1. As described above, even if the speech recognition result contains no error, a character string may be input into a form that does not match the user's intention when a domain not intended by the user is set for the recognized character string.

In this case, as shown in the lower left, the user U may make an utterance UO9b for moving "15:00", which was input into the unintended form F3, to the form F1. At this time, the user U can designate an arbitrary form by its form number or form name.

Next, based on the recognized utterance UO9b of the user U, the input / output control unit 220 deletes "15:00" from the form F3 and adds it to the form F1. When correcting the input destination form in this way, the input / output control unit 220 may also correct the character string to match the input format of the second target form, which is the corrected input destination, before inputting it.

FIG. 10 is a diagram for describing a correction process when a form not intended by the user is selected because a domain has not been set, according to the present embodiment. In FIG. 10, first, as shown in the upper part, the user U makes an utterance UO10a instructing registration of a schedule, "20th (hatsuka)". Here, it is assumed that the user U wants "20th" to be input into the form F2 corresponding to the date.

However, in the example shown in FIG. 10, because the domain "date" is not associated with "20th (hatsuka)", the input / output control unit 220 has input "20th" into the form F1, which permits free input. As described above, even if the speech recognition result contains no error, a character string may be input into a form that does not match the user's intention when the domain intended by the user is not set for the recognized character string.

In this case, as shown in the lower left, the user U may make an utterance UO10b for newly associating the domain set in the form F2 with "20th", which was input into the unintended form F1.

At this time, based on the feedback of the user U by the utterance UO10b, the input / output control unit 220 according to the present embodiment can newly associate the designated character string "20th" with the domain "date" set in the designated form F2. Further, based on the utterance UO10b, the input / output control unit 220 may delete "20th" from the form F1 and input it in a format that matches the form F2.

As described above, the input / output control unit 220 according to the present embodiment can newly associate the domain intended by the user with a character string based on the user's instruction, and can thereafter realize input that reflects the user's intention.

FIGS. 11A and 11B are diagrams for describing the addition of a domain to a specific expression according to the present embodiment. In FIG. 11A, first, as shown in the upper part, the user U makes an utterance UO11a instructing registration of a schedule, "the day of π". Here, it is assumed that the user U refers to "March 14" as "the day of π", following the custom based on the value of pi.

However, because the domain "date" is not associated with the recognized character string "the day of π", the input / output control unit 220 has input "the day of π" into the form F1. Thus, when no domain is associated with a specific expression, the character string may be input into a form that does not match the user's intention. The specific expressions referred to here broadly include aliases, abbreviations, and the like. A specific expression according to the present embodiment may be an expression in general use or an expression used only within a specific group, for example, within a household.

At this time, as shown in the lower left, the user U may make an utterance UO11b for moving "the day of π", which was input into the unintended form F1, to the form F2. Here, when the general knowledge that "the day of π" corresponds to "March 14" can be acquired from the Internet or the like, the input / output control unit 220 may convert "the day of π" to "March 14" and input it into the form F2. Further, at this time, the input / output control unit 220 may perform control to newly associate "the day of π" with "March 14" and the domain "date".

On the other hand, when the general knowledge that "the day of π" corresponds to "March 14" cannot be obtained, the input / output control unit 220 may, as shown in the upper part of FIG. 11B, cause the information processing terminal 10 to output a voice SO11 asking the user U for the date expression corresponding to "the day of π".

Here, as shown in the lower part of FIG. 11B, when the user U makes an utterance UO11c indicating the date expression, the input / output control unit 220 can newly associate "the day of π" with "March 14" and the domain "date". To better reflect the intention of the user U, the input / output control unit 220 may also display the character string in the form F2 with the expression "the day of π" kept as is. Even in this case, "the day of π" and "March 14" are associated internally, so functions such as the scheduler can be executed without any problem.
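The association of a specific expression with a domain and a concrete value, as in FIGS. 11A and 11B, can be sketched as a small lookup table. The class `ExpressionDictionary`, the function `fill_date_form`, and the returned dictionary layout are hypothetical names introduced only for this illustration; the source does not describe how the association is stored.

```python
class ExpressionDictionary:
    """Maps a specific expression to a (domain, value) pair."""

    def __init__(self):
        self._entries = {}

    def register(self, expression, domain, value):
        self._entries[expression] = (domain, value)

    def lookup(self, expression):
        return self._entries.get(expression)


def fill_date_form(expression, dictionary, keep_expression=True):
    """Return what to show in the date form and the value used internally,
    or None when no date is known and the user should be asked."""
    entry = dictionary.lookup(expression)
    if entry is None or entry[0] != "date":
        return None
    _, value = entry
    return {"display": expression if keep_expression else value,
            "value": value}


dictionary = ExpressionDictionary()

# No association yet: the caller would ask the user for the date.
unknown = fill_date_form("the day of π", dictionary)

# The user answers "March 14"; the association is stored for later use.
dictionary.register("the day of π", "date", "March 14")
known = fill_date_form("the day of π", dictionary)
```

Keeping `display` and `value` separate reflects the point made above: the form can keep showing the user's own expression while the scheduler works with the associated date.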

Next, control based on the reliability related to speech recognition will be described. The input / output control unit 220 according to the present embodiment can flexibly control the input / output related to the input interface IF based on the reliability related to speech recognition.

FIG. 12 is a diagram for describing input / output control when the reliability of speech recognition is low. In the upper part of FIG. 12, because the reliability of the speech recognition for the utterance UO12a made by the user U falls below a threshold, the input / output control unit 220 does not input the speech recognition result into a form and instead causes the information processing terminal 10 to output a voice SO12 asking the user U to specify the form to be used for input.

As described above, when the reliability of speech recognition is low, the input / output control unit 220 can ask the user to explicitly designate the form into which the speech recognition result is to be input. Here, as shown in the lower part of the figure, when the utterance UO12b specifying the form is obtained, the input / output control unit 220 controls recalculation of the reliability based on the utterance UO12b and can input the corrected speech recognition result into the form F4.

FIGS. 13 and 14 are diagrams for describing input / output control when the reliabilities of character string candidates are close to one another. As described above, the recognition unit 210 according to the present embodiment can generate a plurality of character string candidates based on the user's utterance and output the candidate with the highest reliability as the final recognition result. However, the reliabilities of a plurality of character string candidates may be close to one another.

For example, in the example illustrated in FIG. 13, in the Nbest result for the utterance UO13 "Tuesday" made by the user U, the reliabilities of the character string "Kyanobi" and the character string "Tuesday" are both low and close to each other. If "Kyanobi" is adopted here based on reliability alone, an erroneous speech recognition result will be output.

For this reason, when the difference in reliability between the first and nth places in the Nbest result falls below a threshold Td, the input / output control unit 220 according to the present embodiment may input each of the competing character strings into a form. If the reliability values are not normalized, the input / output control unit 220 may normalize them before obtaining the difference. In the example shown in FIG. 13, based on the domains corresponding to the character strings "Kyanobi" and "Tuesday", whose reliabilities compete, the input / output control unit 220 selects a plurality of second target forms, namely the forms F1 and F4, and inputs the character string "Kyanobi" and the character string "Tuesday" into them, respectively.

Further, at this time, the input / output control unit 220 can cause the information processing terminal 10 to output a voice SO13 asking which form holds the correct input and obtain feedback from the user U, thereby realizing the input intended by the user U.

Further, in the example shown in FIG. 14, in the Nbest result for the utterance UO14 "today" made by the user U, the reliabilities of the character string "Today's" and the character string "Today" are both high and close to each other. If the wrong candidate is adopted here based on reliability alone, an erroneous speech recognition result will be output.

For this reason, as in the case of FIG. 13, the input / output control unit 220 according to the present embodiment may select a plurality of second target forms, namely the forms F1 and F2, based on the domains corresponding to the character strings "Today's" and "Today", whose reliabilities compete, and input the character string "Today's" and the character string "Today" into them, respectively. With this function of the input / output control unit 220 according to the present embodiment, the correction burden when reliabilities compete can be reduced effectively.
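The threshold check on competing reliabilities can be sketched as follows. Normalizing by the sum of the scores, the function names, the domain-to-form mapping, and the numbers are all assumptions for illustration; the source only states that the difference may be obtained after normalization and compared with the threshold Td.

```python
def competing_candidates(nbest, td):
    """nbest: list of (text, domain, reliability) tuples.
    Normalize reliabilities so they sum to 1 (an assumed scheme) and return
    the candidates whose distance from the top score is below td."""
    total = sum(r for _, _, r in nbest)
    if total <= 0:
        return []
    normalized = sorted(
        ((text, domain, r / total) for text, domain, r in nbest),
        key=lambda c: c[2], reverse=True)
    top = normalized[0][2]
    return [c for c in normalized if top - c[2] < td]


def assign_to_forms(candidates, form_for_domain):
    """Input each competing string into the form matching its domain.
    If two candidates share a domain, the higher-ranked one is kept."""
    forms = {}
    for text, domain, _ in candidates:
        forms.setdefault(form_for_domain[domain], text)
    return forms


# Unnormalized, made-up scores; "Thursday" is a hypothetical low-ranked entry.
nbest = [
    ("Kyanobi", "title", 3.1),
    ("Tuesday", "day of the week", 2.9),
    ("Thursday", "day of the week", 0.5),
]
form_for_domain = {"title": "F1", "day of the week": "F4"}

competing = competing_candidates(nbest, td=0.1)
forms = assign_to_forms(competing, form_for_domain)
```

Here "Kyanobi" and "Tuesday" fall within Td of each other and are input into the forms F1 and F4, after which the user can be asked which input is correct.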

Next, a correction process in the case where the division of the speech recognition result into unit blocks differs from what the user expects will be described. FIG. 15 is a diagram showing an example of correction involving separation of a unit block according to the present embodiment. In FIG. 15, first, as shown in the upper part, the user U makes an utterance UO15a instructing registration of a schedule, "meeting at 3 pm on the 5th".

However, a speech recognition error occurs here: "the 5th" is recognized as "when" and "3 pm" is recognized as "afternoon shinji", and both are input, together with the correctly recognized character string "meeting", into the form F1, which corresponds to the title and permits free input. In addition, an error occurs in the setting of unit blocks: "when" and "afternoon shinji", which should be set as separate unit blocks, are set together as the single unit block "when afternoon shinji".

At this time, as shown in the lower left, the user U may make an utterance UO15b specifying a plurality of forms as input destinations for the unit block A corresponding to "when afternoon shinji". In this case, based on the utterance UO15b, the input / output control unit 220 can cause the recognition unit 210 to recalculate the Nbest result for the unit block A. The input / output control unit 220 can also separate the character strings "the 5th" and "3 pm" contained in the unit block A based on the recalculated Nbest result and input them into the forms F2 and F3, respectively.

Further, when the unit blocks are not divided correctly as described above, the input / output control unit 220 may display a message T1 prompting correction of the error on the input interface IF. In this case, the user U can be expected to make utterances divided form by form, for example, stating that "when" is the date and that "shinji" is the time, which enables more efficient correction.

Note that the user U can also give a correction other than a form specification, such as "not shinji, but sanji (3 o'clock)". When "3 o'clock" is recognized and input as a result, the unit block setting is reviewed again, and the input intended by the user U can be presented.

In addition, when the unit blocks are not divided correctly, the input / output control unit 220 may, as shown in the upper part of FIG. 16, output on the input interface IF a message T2 asking the user U to specify the forms to be used for input, without inputting the speech recognition result into a form.

Here, as shown in the lower left, when the user U makes an utterance UO16b specifying the forms F1 to F3, the input / output control unit 220 first fits and corrects character strings for the forms F2 and F3, that is, the forms other than the form F1, which permits free input, and then inputs the remaining character string into the form F1. In this way, the input intended by the user U can be presented.

Further, when the unit blocks are not divided correctly, the input / output control unit 220 may, as shown in the upper part of FIG. 17, output on the input interface IF a message T3 asking the user U to input the title again, without inputting the speech recognition result into a form.

Here, as shown in the lower left, when the user U makes an utterance UO17b containing only the character string "pick up" that the user wants to input into the form F1, the input / output control unit 220 inputs the character string "pick up" into the form F1, removes "pick up" from the initial speech recognition result, and fits the remaining character string to the other forms. In this way, the input intended by the user U can be presented.

Next, correction processing in the case where there are a plurality of unit blocks associated with the same domain will be described. In FIG. 18, first, as shown in the upper part, the user U makes an utterance UO18a instructing registration of a schedule for a hotel reservation. However, the utterance UO18a includes two character strings, "Tomorrow" and "the 10th", that are both associated with the domain "date". In this case, it is difficult for the input / output control unit 220 to determine which of the character strings "Tomorrow" and "the 10th" should be input into the form F2.

For this reason, when the speech recognition result contains a plurality of character strings associated with the same domain, the input / output control unit 220 may display on the input interface IF a message T4 that designates the form F1, which has no domain set and permits free input, and asks the user to utter again the content to be input as the title.

At this time, it is also conceivable to display a message such as "Please say the date again." In that case, however, it is difficult for the user U to judge which of "Tomorrow" and "the 10th" should be uttered. For this reason, by designating the form F1, which permits free input, and prompting a re-utterance, the input / output control unit 220 can guide the user U toward an utterance that does not vary. In addition, even when the speech recognition result contains two character strings corresponding to the domain "date" and two character strings corresponding to the domain "time", having the title uttered again makes it possible to correct both domains properly.

The lower part of FIG. 18 shows the result of the correction made by the input / output control unit 220 based on the utterance UO18b, in which the user U specifies the title. In this example, the input / output control unit 220 inputs "Reserve a hotel for the 10th", uttered in the utterance UO18b, into the form F1, inputs the date of the remaining "Tomorrow" into the form F2, and inputs "Friday", the day of the week of "Tomorrow", into the form F4. If the expression in the user's re-utterance differs from the expression in the original utterance, the input / output control unit 220 may input the speech recognition result for the re-utterance into the form F1. In this case as well, a character string that corresponds to the domain "date" and is not contained in the re-utterance can be input into the form F2.
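The routing in the FIG. 18 example can be sketched as follows: whatever the user repeats as the title goes into the free-input form, and the remaining unit blocks are routed by their domain. The function name, the block segmentation, and the domain-to-form mapping are assumptions for illustration, and deriving the day of the week for the form F4 is omitted.

```python
def route_after_title_reutterance(blocks, title_texts, form_for_domain,
                                  title_form="F1"):
    """blocks: list of (text, domain) from the first utterance.
    title_texts: set of unit-block texts the user repeated as the title.
    Repeated blocks go to the title form; the rest are routed by domain,
    falling back to the title form when the domain has no form."""
    title_parts = [text for text, _ in blocks if text in title_texts]
    forms = {}
    for text, domain in blocks:
        if text in title_texts:
            continue
        target = form_for_domain.get(domain)
        if target is None:
            title_parts.append(text)
        else:
            forms[target] = text
    forms[title_form] = " ".join(title_parts)
    return forms


# Hypothetical segmentation of the first utterance into unit blocks.
blocks = [
    ("Tomorrow", "date"),
    ("the 10th", "date"),
    ("hotel", "title"),
    ("reserve", "title"),
]
# Unit blocks contained in the user's re-utterance of the title.
title_texts = {"the 10th", "hotel", "reserve"}

forms = route_after_title_reutterance(blocks, title_texts, {"date": "F2"})
```

Because "the 10th" was claimed by the title, only one date string, "Tomorrow", remains for the form F2, which removes the ambiguity without asking the user to repeat a date.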

Next, processing in the case where a character string has already been input into the form specified by user feedback will be described. In FIG. 19, first, as shown in the upper part, the user U makes an utterance UO19a instructing registration of a schedule, "Tuesday". However, a speech recognition error occurs and "Tuesday" is recognized as "Kyanobi", so the speech recognition result is input not into the form F4 corresponding to the day of the week, as the user U intended, but into the form F1, which corresponds to the title and permits free input.

Next, as shown in the lower left, the user U makes an utterance UO19b that specifies, by form name, the intended form, that is, the form F4 corresponding to the day of the week, as the destination for the speech recognition result erroneously input into the form F1. However, at this time, "Wednesday" has already been input into the form F4 designated by the user.

In this case, the input / output control unit 220 determines whether the character string already input and the character string newly instructed to be input can coexist, and performs control based on that determination. For example, in the example shown in FIG. 19, the form F4 by its nature cannot accept both "Wednesday" and "Tuesday", so the input / output control unit 220 may overwrite the already input "Wednesday" with the newly instructed "Tuesday". On the other hand, for a form that permits free input, such as the form F1, the input / output control unit 220 may append the newly instructed character string while keeping the character string already input. As described above, the input / output control unit 220 according to the present embodiment can realize appropriate correction control based on the nature of the form.
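The overwrite-or-append decision can be sketched as follows. Using a single `free_input` flag to decide whether two strings can coexist is a simplifying assumption, and the `Form` type and `enter` function are hypothetical names for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Form:
    name: str
    free_input: bool  # True for forms such as the title form F1
    value: str = ""


def enter(form, text):
    """Append to a non-empty free-input form; otherwise set or overwrite."""
    if form.value and form.free_input:
        form.value = f"{form.value} {text}"
    else:
        form.value = text
    return form.value


# Form F4 (day of the week) already holds "Wednesday": overwrite it.
f4 = Form("F4", free_input=False, value="Wednesday")
f4_result = enter(f4, "Tuesday")

# Form F1 (title) permits free input: keep the old string and append.
f1 = Form("F1", free_input=True, value="English")
f1_result = enter(f1, "meeting")
```

A fuller implementation might check compatibility per domain, for example allowing several entries in a multi-select form, but the source does not specify such rules.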

The functions of the input / output control unit 220 according to the present embodiment have been described above in detail with specific examples. Although the case where Japanese is used as the type of the character string to be input has been described above, the function possessed by the input / output control unit 220 according to the present embodiment is applicable regardless of the type of language.

FIG. 20 is a diagram for describing a correction process when English is used as the type of character string. In FIG. 20, first, as shown in the upper part, the user U makes an utterance UO20a instructing registration of the schedule "Tuesday". However, a speech recognition error occurs and "Tuesday" is recognized as "Choose way", so the input / output control unit 220 has input "Choose way" into the form F1, which corresponds to the title and permits free input. At this time, the input / output control unit 220 displays the two unit blocks "Choose" and "way" included in "Choose way" as unit blocks A and B, respectively.

Next, as shown in the lower left, the user U makes an utterance UO20b for moving the unit blocks A and B to the form F2 in order to correct the form error and the speech recognition error. At this time, based on the utterance UO20b, the input / output control unit 220 according to the present embodiment can delete the unit blocks A and B from the form F1 and input "Tuesday", corrected based on the domain of the specified form F2, into the form F2.

<< 1.6. Flow of operation >>
Next, the flow of the operation of the information processing server 20 according to the present embodiment will be described in detail. FIG. 21 is a flowchart showing the flow of the operation of the information processing server 20 according to the present embodiment.

Referring to FIG. 21, first, the terminal communication unit 230 receives the speech information of the user collected by the information processing terminal 10 (S1101).

Next, the recognition unit 210 executes speech recognition processing based on the speech information received in step S1101 (S1102). At this time, the recognition unit 210 may perform the calculation of the reliability, the acquisition of the Nbest result, the calculation of the connection probability between unit blocks, and the like.

Next, the input / output control unit 220 determines whether or not the difference in reliability between the 1st and nth places in the Nbest result is smaller than a threshold (S1103).

Here, when the difference in reliability between the first and nth places in the Nbest result falls below the threshold (S1103: Yes), the input / output control unit 220 determines whether to perform the input / output control for competing reliabilities (S1104).

When the input / output control unit 220 determines not to perform the input / output control for competing reliabilities (S1104: No), or when the difference in reliability between the first and nth places in the Nbest result is equal to or greater than the threshold (S1103: No), the recognition unit 210 outputs the character string candidate with the highest reliability as the speech recognition result, and the input / output control unit 220 executes input / output control of the forms (S1105).

On the other hand, when the input / output control unit 220 determines to perform the input / output control for competing reliabilities (S1104: Yes), it executes the form input / output control for competing reliabilities, as shown in FIGS. 13 and 14 (S1106).

After that, when feedback instructing correction is detected (S1107: Yes), the input / output control unit 220 causes the recognition unit 210 to recalculate the reliability based on the feedback (S1108) and executes form input / output control based on the recalculated reliability (S1109). The feedback may be given not only by voice but also by line of sight, gesture, operation of an input device, or the like.

On the other hand, when the feedback instructing correction is not detected (S1107: No), the information processing terminal 10 ends the series of processing.
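The branching of steps S1101 to S1109 can be traced with the following sketch. It records only which steps are taken and which strings end up entered; the function `server_flow`, its parameters, the choice of n = 2, and the simplified recalculation in S1108 (preferring candidates whose domain matches the feedback) are assumptions for illustration.

```python
def server_flow(nbest, threshold, n=2, competing_control=True,
                feedback_domain=None):
    """nbest: non-empty list of (text, domain, reliability).
    Returns (steps, entered): the step labels visited and the strings
    entered into forms at the end."""
    steps = ["S1101", "S1102"]  # receive speech information; recognize
    ranked = sorted(nbest, key=lambda c: c[2], reverse=True)

    steps.append("S1103")  # is the 1st-to-nth reliability gap below threshold?
    top_n = ranked[:n]
    competing = len(top_n) > 1 and top_n[0][2] - top_n[-1][2] < threshold

    if competing:
        steps.append("S1104")  # use the control for competing reliabilities?
    if competing and competing_control:
        steps.append("S1106")  # enter every competing candidate
        entered = [c[0] for c in top_n]
    else:
        steps.append("S1105")  # enter only the top candidate
        entered = [ranked[0][0]]

    steps.append("S1107")  # was feedback instructing correction detected?
    if feedback_domain is not None:
        steps.append("S1108")  # simplified recalculation by domain match
        matching = [c for c in ranked if c[1] == feedback_domain]
        steps.append("S1109")  # enter the corrected result
        entered = [(matching or ranked)[0][0]]
    return steps, entered


nbest = [("Kyanobi", "title", 0.52), ("Tuesday", "day of the week", 0.48)]

steps_a, entered_a = server_flow(nbest, threshold=0.1)
steps_b, entered_b = server_flow(nbest, threshold=0.1,
                                 competing_control=False,
                                 feedback_domain="day of the week")
```

The first call follows the S1106 branch and enters both competing strings; the second follows S1105 and then corrects the input to "Tuesday" after feedback.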

<2. Second embodiment>
<< 2.1. Overview >>
Next, a second embodiment of the present disclosure will be described. In the first embodiment described above, the case has been described where the input / output control unit 220 controls the voice input interface provided with a plurality of forms. On the other hand, the application scope of the technical idea according to the present disclosure is not limited to the voice input interface. Therefore, in the second embodiment, a case will be described where the input / output control unit 220 controls character string input to a form placed on a Web page.

In recent years, with the development of information processing technology, various services using Web pages have become widespread. In such services, it is not uncommon to place multiple forms on a Web page for entering information about the user. In addition, there is also a technology for automatically inputting a character string to a form as described above.

FIG. 22 is a diagram for describing automatic input for forms placed on a Web page. FIG. 22 shows a Web page WP having a plurality of forms corresponding to a name, a birthday, a telephone number, a zip code, and the like. The user can enter information in each of the placed forms using, for example, a keyboard, but as the number of forms increases, the burden of the input operation increases and input errors are also expected to occur.

On the other hand, in recent years, tools have also been provided for automatically inputting character strings in forms based on preset information and past input results. According to such a tool, as shown in FIG. 22, it is possible to realize automatic input of information to a plurality of forms, and to greatly reduce the burden on the user regarding the input operation.

However, with tools such as those described above, errors such as information being input into the wrong form frequently occur. FIG. 23A and FIG. 23B are diagrams showing examples of input errors by the automatic input tool.

For example, in the example shown in FIG. 23A, in the forms corresponding to the name (Kanji) and the name (Kana), the last name and the first name ("sei" and "mei") have been input in reverse order. Such an input error may occur, for example, when the first name and the last name are associated in reverse and managed that way in the automatic input tool.

Further, in the example shown in FIG. 23A, an input error occurs in which information that should be input separately into the two forms corresponding to the zip code has been forced into one of the forms. Such an input error may occur, for example, when the zip code is managed in the automatic input tool as information corresponding to a single form.

Also, FIG. 23B shows an example in which information is input in a language different from the language assumed by the form. In FIG. 23B, a first name and a last name written in Japanese are input into the forms corresponding to First name and Last name, which should be filled in in English. Such an input error may occur, for example, when only information written in Japanese is stored in the automatic input tool.

The technical idea according to the present embodiment was conceived focusing on the above points, and makes it possible to perform easy correction without complicated operations even if information is input to an incorrect form by automatic input. Further, according to the information processing server 20 according to the present embodiment, it is possible to realize information input with fewer errors.

Hereinafter, the functions and features of the information processing server 20 according to the present embodiment and the effects of the features will be described in detail. The following description focuses on differences from the first embodiment, and a detailed description of configurations, functions, and effects common to the first embodiment is omitted.

<< 2.2. Function Details >>
As described above, the information processing server 20 according to the present embodiment can realize more convenient automatic input to a plurality of forms arranged on a Web page or the like. FIG. 24 is a diagram for describing automatic input control by the input / output control unit 220 according to the present embodiment.

FIG. 24 shows a Web page WP having a plurality of forms corresponding to full name (Kanji), full name (Kana), birthday, phone number, zip code, and the like. At this time, the input / output control unit 220 according to the present embodiment can select, from the plurality of forms, a plurality of first target forms for information input based on the user's input operation, and can automatically input the set character strings into the selected first target forms.

At this time, the input / output control unit 220 according to the present embodiment may use, for example, an utterance of the user, an operation using an input device such as a mouse, a touch, or the like as a trigger of the automatic input. Further, the input / output control unit 220 may execute automatic input for a plurality of forms using information input by the user and information set in the form set FS designated in advance.

FIG. 24 shows an example of the form set FS according to the present embodiment. The form set FS according to the present embodiment is an information set in which information to be automatically input to a plurality of forms is summarized for each user and application. In the example shown in FIG. 24, the form set FS defines last name (Kanji), first name (Kanji), last name (Kana), first name (Kana), date of birth, telephone number, and zip code, grouped by user. The form set FS may be automatically generated by the input / output control unit 220 based on past input results, or may be generated and edited by the user.

Here, the input / output control unit 220 may present the form set FS as visual information to the user. At this time, the input / output control unit 220 may assign an ID to the name of the form set FS or to each character string included in the form set FS. For example, by designating the name “Toshi” or the ID “1” corresponding to the name “Toshi”, the user can have the form set FS to be used for automatic input acquired, and automatic input to a plurality of forms can be performed using that form set.

In the example shown in FIG. 24, the input / output control unit 220 executes automatic input for a plurality of forms using the form set FS corresponding to the form set name “Toshishi” designated by the user. According to the above-described function of the input / output control unit 220 according to the present embodiment, the user can specify the information used for automatic input after visually checking it, and automatic input with fewer input errors that matches the user's intention can be realized. In addition, when there is no designation by the user, the input / output control unit 220 may obtain the form set FS set by default and perform automatic input.
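As a purely illustrative sketch, and not as part of the disclosed embodiment, the designation of a form set FS by name or ID with a fallback to a default set could be written as follows in Python. The class `FormSet`, the functions `pick_form_set` and `auto_fill`, the field keys, the set names "Home" and "Work", and the value "333-4444" are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FormSet:
    """A named bundle of strings for automatic input (hypothetical structure)."""
    set_id: str
    name: str
    values: Dict[str, str] = field(default_factory=dict)  # field key -> string


def pick_form_set(form_sets: List[FormSet],
                  designation: Optional[str] = None,
                  default_id: str = "1") -> FormSet:
    """Return the set whose name or ID matches the designation, else the default set."""
    if designation is not None:
        for fs in form_sets:
            if designation in (fs.name, fs.set_id):
                return fs
    # No (matching) designation: fall back to the set marked as default.
    return next(fs for fs in form_sets if fs.set_id == default_id)


def auto_fill(page_fields: List[str], form_set: FormSet) -> Dict[str, str]:
    """Fill each page field that the form set knows; leave the others empty."""
    return {key: form_set.values.get(key, "") for key in page_fields}


# Placeholder data for illustration only.
form_sets = [
    FormSet("1", "Home", {"last_name": "Ueda", "zip": "111-2222"}),
    FormSet("2", "Work", {"last_name": "Ueda", "zip": "333-4444"}),
]
page = ["last_name", "first_name", "zip"]

filled = auto_fill(page, pick_form_set(form_sets, "Work"))  # designated by name
fallback = auto_fill(page, pick_form_set(form_sets))        # no designation
```

In this sketch, `filled` holds the values of the set designated as "Work", while `fallback` holds those of the default set because no designation was given.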

In addition, when the above-described automatic input is performed, the input / output control unit 220 according to the present embodiment is characterized by assigning an ID to each form arranged in the web page WP. In the example shown in FIG. 24, the input / output control unit 220 assigns IDs “1” to “12” to the respective forms and displays them on the web page WP. The IDs given to each form and to each piece of information included in the form set FS allow the user to more easily correct an input mistake when one occurs.

FIGS. 25 to 27 are diagrams for explaining the correction of input information according to the present embodiment. For example, the upper part of FIG. 25 shows the situation after the input / output control unit 220 has performed automatic input to the forms placed on the web page WP. Note that FIG. 25 shows, as in the case shown in FIG. 23A, an example in which “last name” and “first name”, and “sei” and “mei”, are input in reverse.

In this case, the user U can issue an instruction to correct the automatic input result using the ID assigned to each form or the identifier assigned to each character string included in the form set FS. In the example shown in FIG. 25, the user U gives feedback relating to a correction instruction by performing an utterance UO 25a with the content “1 and 2 are reversed” and an utterance UO 25b with the content “A to 1”.

At this time, the input / output control unit 220 can, for example, swap the information input to the form “last name” corresponding to the ID “1” and the form “first name” corresponding to the ID “2” based on the utterance UO 25a. Also, based on the utterance UO 25b, for example, the input / output control unit 220 can overwrite the form “surname” corresponding to the ID “1” with the character string “Ueda” corresponding to the ID “A” included in the form set FS, and can move the character string "Koshishi" that had been entered in the form “surname” to the form “first name”.

In addition, as described above, when an instruction is given to swap the character strings input in the form "last name" and the form "first name", the input / output control unit 220 can automatically swap the input character strings. Also, for example, when the user U utters "1 to 3" or the like, the input / output control unit 220 can convert the character string "Toshishi" entered in the form "surname" into a kana representation, which is the input format of the form "Mei", and then enter it in the form "Mei".
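The two correction utterances of FIG. 25 can be sketched as follows. This is a minimal illustration under the assumption that speech recognition has already produced the utterance text in exactly the English phrasing quoted above. The function name `apply_correction`, the regular expressions, and the placeholder form contents "FIRST" and "LAST" are hypothetical. Moving the overwritten string to another form and the kana conversion are not modeled.

```python
import re
from typing import Dict


def apply_correction(utterance: str,
                     forms: Dict[str, str],
                     strings: Dict[str, str]) -> Dict[str, str]:
    """Apply one correction utterance and return the updated form contents.

    forms:   form ID (e.g. "1") -> string currently entered in that form
    strings: string ID (e.g. "A") -> string held in the form set
    """
    forms = dict(forms)  # work on a copy so the caller's state is untouched

    # "<n> and <m> are reversed": swap the contents of two forms.
    swap = re.fullmatch(r"(\d+) and (\d+) are reversed", utterance)
    if swap:
        a, b = swap.groups()
        forms[a], forms[b] = forms[b], forms[a]
        return forms

    # "<X> to <n>": overwrite form n with form-set string X.
    put = re.fullmatch(r"([A-Z]) to (\d+)", utterance)
    if put:
        string_id, form_id = put.groups()
        forms[form_id] = strings[string_id]
        return forms

    raise ValueError(f"unsupported correction utterance: {utterance!r}")


# Placeholder state: the two name forms were filled in reverse.
forms = {"1": "FIRST", "2": "LAST"}
strings = {"A": "Ueda"}

swapped = apply_correction("1 and 2 are reversed", forms, strings)
overwritten = apply_correction("A to 1", forms, strings)
```

Because form IDs are digits and string IDs are letters in the examples of FIG. 25, the two utterance shapes can be told apart by pattern alone in this sketch.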

Also, FIG. 26 shows an example in which the zip code, which should originally be split and input into the forms to which ID "11" and ID "12" are assigned, is input only to the form to which ID "11" is assigned.

Also in this case, the user U can give an instruction to correct the automatic input result using the ID given to each form or the identifier given to each character string included in the form set FS.

In the example shown in FIG. 26, the user U gives feedback relating to the correction instruction by performing the utterance UO 26a with the content “11 to 11 and 12” and the utterance UO 26b with the content “G to 11 and 12”.

At this time, based on the utterance UO 26a or the utterance UO 26b, for example, the input / output control unit 220 can refer to the character string “111-2222” input to the form given the ID “11”, divide the character string based on a delimiter included in the character string, the attributes of the forms, general knowledge, and the like, and input the resulting character strings to the forms to which the ID "11" and the ID "12" are given.

In addition, when the break position of a character string cannot be acquired, the input / output control unit 220 may cause the information processing terminal 10 to perform an output requesting the user to specify the break position. In this case, the input / output control unit 220 can acquire the break position based on, for example, the user speaking "3 digits and 4 digits", and can also correct the contents of the character string held by the form set FS.
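The division of a character string across two forms can be sketched as follows. This is a minimal illustration that assumes a hyphen is the only delimiter and that the spoken break positions arrive as English text. The function names `split_for_forms` and `parse_digit_counts` are hypothetical, and form attributes and general knowledge are not modeled.

```python
import re
from typing import List, Optional


def parse_digit_counts(utterance: str) -> List[int]:
    """Turn an utterance such as '3 digits and 4 digits' into [3, 4]."""
    return [int(n) for n in re.findall(r"(\d+)\s*digits?", utterance)]


def split_for_forms(text: str,
                    digit_counts: Optional[List[int]] = None) -> Optional[List[str]]:
    """Split a string into parts for several forms.

    Without digit_counts, split at hyphens and return None when no break
    position is found, which is the case where the user would be asked.
    With digit_counts, cut the digits of the string at the given lengths.
    """
    if digit_counts is None:
        parts = [p for p in text.split("-") if p]
        return parts if len(parts) > 1 else None

    digits = re.sub(r"\D", "", text)
    if sum(digit_counts) != len(digits):
        return None  # the spoken lengths do not match the stored string
    parts, pos = [], 0
    for n in digit_counts:
        parts.append(digits[pos:pos + n])
        pos += n
    return parts


with_delimiter = split_for_forms("111-2222")
no_delimiter = split_for_forms("1112222")
counts = parse_digit_counts("3 digits and 4 digits")
by_user = split_for_forms("1112222", counts)
```

Returning `None` when no break position can be determined keeps the decision to ask the user outside the splitting function itself.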

Also, FIG. 27 shows an example in which a character string written in Japanese is input to a form that should originally be filled in in English.

In this case, the user U can issue an instruction to correct the automatic input result using the ID assigned to each form or the identifier assigned to each character string included in the form set FS. The user U may also issue a correction instruction using the name or ID of the form set FS.

In the example shown in FIG. 27, the user U gives feedback relating to the correction instruction by performing the utterance UO 27a with the content "form set in English" and the utterance UO 27b with the content "A and B in English".

At this time, the input / output control unit 220 may execute automatic input again after switching the form set FS based on, for example, the utterance UO 27a or the utterance UO 27b. If, for example, a correction instruction relating to swapping between forms has been given before the instruction relating to the switching of the form set FS, the input / output control unit 220 may keep the content of that correction instruction reflected even after the form set FS is switched.

As described above, a plurality of form sets FS according to the present embodiment can be set according to the user, the language, the location, the application, and the like, and can be switched according to the situation.
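One way to keep an earlier correction between forms in effect after the form set FS is switched is to store the correction in the association between form IDs and form set fields rather than in the entered strings. The following Python sketch illustrates this under that assumption. The functions `render` and `swap_forms`, the field keys, and the placeholder strings are hypothetical and do not appear in the embodiment.

```python
from typing import Dict


def render(mapping: Dict[str, str], form_set: Dict[str, str]) -> Dict[str, str]:
    """Resolve each form ID to the string stored under its associated field key."""
    return {form_id: form_set[key] for form_id, key in mapping.items()}


def swap_forms(mapping: Dict[str, str], a: str, b: str) -> Dict[str, str]:
    """Record a 'forms a and b are reversed' correction in the association."""
    corrected = dict(mapping)
    corrected[a], corrected[b] = mapping[b], mapping[a]
    return corrected


# Placeholder strings; real form sets would hold the user's actual data.
form_sets = {
    "japanese": {"last_name": "LAST_JA", "first_name": "FIRST_JA"},
    "english": {"last_name": "LAST_EN", "first_name": "FIRST_EN"},
}

# Faulty initial association: form "1" (last name) receives the first name.
mapping = {"1": "first_name", "2": "last_name"}
before = render(mapping, form_sets["japanese"])

mapping = swap_forms(mapping, "1", "2")                # correction between forms
after_switch = render(mapping, form_sets["english"])   # form set switched later
```

Because the swap is stored in `mapping`, re-running the automatic input with the English form set yields the corrected assignment without a second correction utterance.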

The functions of the input / output control unit 220 according to the present embodiment have been described above in detail. According to the above-described function of the input / output control unit 220 according to the present embodiment, it is possible to realize automatic input to forms with fewer input errors and to easily correct the input contents even when an input error occurs. Although the case where the input / output control unit 220 performs automatic input to forms arranged on a Web page has been described above as an example, the input / output control unit 220 is not limited to such an example and can be widely applied to automatic input to forms.

<3. Hardware configuration example>
Next, a hardware configuration example common to the information processing terminal 10 and the information processing server 20 according to an embodiment of the present disclosure will be described. FIG. 28 is a block diagram illustrating an exemplary hardware configuration of the information processing terminal 10 and the information processing server 20 according to an embodiment of the present disclosure. Referring to FIG. 28, the information processing terminal 10 and the information processing server 20 include, for example, a processor 871, a ROM 872, a RAM 873, a host bus 874, a bridge 875, an external bus 876, an interface 877, an input device 878, an output device 879, a storage 880, a drive 881, a connection port 882, and a communication device 883. Note that the hardware configuration shown here is an example, and some of the components may be omitted. In addition, components other than the components shown here may be further included.

(Processor 871)
The processor 871 functions as, for example, an arithmetic processing unit or a control unit, and controls all or part of the operation of each component based on various programs recorded in the ROM 872, the RAM 873, the storage 880, or the removable recording medium 901.

(ROM 872, RAM 873)
The ROM 872 is a means for storing a program read by the processor 871, data used for an operation, and the like. The RAM 873 temporarily or permanently stores, for example, a program read by the processor 871 and various parameters and the like that appropriately change when the program is executed.

(Host bus 874, bridge 875, external bus 876, interface 877)
The processor 871, the ROM 872, and the RAM 873 are connected to one another via, for example, a host bus 874 capable of high-speed data transmission. On the other hand, host bus 874 is connected to external bus 876, which has a relatively low data transmission speed, via bridge 875, for example. The external bus 876 is also connected to various components via an interface 877.

(Input device 878)
For the input device 878, for example, a mouse, a keyboard, a touch panel, a button, a switch, a lever, and the like are used. Furthermore, as the input device 878, a remote controller (hereinafter, remote control) capable of transmitting a control signal using infrared rays or other radio waves may be used. The input device 878 also includes a voice input device such as a microphone.

(Output device 879)
The output device 879 is a device that can notify the user visually or aurally, for example, a display device such as a CRT (Cathode Ray Tube), an LCD, or an organic EL, an audio output device such as a speaker or headphones, a printer, a mobile phone, or a facsimile. Also, the output device 879 according to the present disclosure includes various vibration devices capable of outputting haptic stimulation.

(Storage 880)
The storage 880 is a device for storing various data. As the storage 880, for example, a magnetic storage device such as a hard disk drive (HDD), a semiconductor storage device, an optical storage device, a magneto-optical storage device, or the like is used.

(Drive 881)
The drive 881 is a device that reads information recorded on a removable recording medium 901 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, or writes information on the removable recording medium 901, for example.

(Removable recording medium 901)
The removable recording medium 901 is, for example, DVD media, Blu-ray (registered trademark) media, HD DVD media, various semiconductor storage media, and the like. Of course, the removable recording medium 901 may be, for example, an IC card equipped with a non-contact IC chip, an electronic device, or the like.

(Connection port 882)
The connection port 882 is a port for connecting an externally connected device 902, for example, a USB (Universal Serial Bus) port, an IEEE 1394 port, a SCSI (Small Computer System Interface) port, an RS-232C port, or an optical audio terminal.

(Externally connected device 902)
The externally connected device 902 is, for example, a printer, a portable music player, a digital camera, a digital video camera, an IC recorder, or the like.

(Communication device 883)
The communication device 883 is a communication device for connecting to a network, and is, for example, a communication card for wired or wireless LAN, Bluetooth (registered trademark), or WUSB (Wireless USB), a router for optical communication, a router for ADSL (Asymmetric Digital Subscriber Line), or a modem for various types of communication.

<4. Summary>
As described above, the information processing server 20 according to an embodiment of the present disclosure includes an input / output control unit 220 that selects a first target form to be input from a plurality of forms based on the input operation of the user, and inputs characters in the first target form. In addition, one of the features of the input / output control unit 220 according to an embodiment of the present disclosure is that it selects a second target form different from the first target form based on user feedback on the input content input to the first target form, and performs the character input on the second target form. According to this configuration, it is possible to easily correct a selection error of the form to be input.

The preferred embodiments of the present disclosure have been described in detail with reference to the accompanying drawings, but the technical scope of the present disclosure is not limited to such examples. It will be apparent to those skilled in the art of the present disclosure that various modifications and alterations can be conceived within the scope of the technical idea described in the claims, and it is understood that these naturally also belong to the technical scope of the present disclosure.

In addition, the effects described in the present specification are merely illustrative or exemplary, and not limiting. That is, the technology according to the present disclosure can exhibit other effects apparent to those skilled in the art from the description of the present specification, in addition to or instead of the effects described above.

Moreover, the steps related to the processing of the information processing server 20 in this specification do not necessarily need to be processed in chronological order according to the order described in the flowcharts. For example, the steps related to the processing of the information processing server 20 may be processed in an order different from the order described in the flowcharts or may be processed in parallel.

The following configurations are also within the technical scope of the present disclosure.
(1)
An information processing apparatus including:
a control unit that selects a first target form to be input from a plurality of forms based on a user's input operation, and performs character input on the first target form,
wherein the control unit selects a second target form different from the first target form based on the user's feedback on the input content input to the first target form, and performs the character input on the second target form.
(2)
The control unit selects the form specified by the feedback as the second target form, and inputs characters corresponding to at least a part of the input content input to the first target form into the second target form,
The information processing apparatus according to (1).
(3)
The control unit causes a unit block included in the input content input to the first target form to be displayed together with the input content, deletes the characters corresponding to the unit block specified by the feedback from the first target form, and inputs the characters corresponding to the unit block into the second target form,
The information processing apparatus according to (1) or (2).
(4)
The control unit separates a character string included in the unit block based on the feedback, and inputs the separated character string to the second target form.
The information processing apparatus according to (3).
(5)
At least one of the input operation and the feedback is performed by speech.
The information processing apparatus according to any one of the above (1) to (4).
(6)
The control unit selects the first target form based on the result of speech recognition for the input operation performed by speech, and inputs the result of the speech recognition to the first target form.
The information processing apparatus according to any one of the above (1) to (5).
(7)
The control unit selects the first target form based on the speech recognition result and a domain set in the form.
The information processing apparatus according to (6).
(8)
The control unit inputs, to the second target form, the speech recognition result corrected based on a domain set in the selected second target form.
The information processing apparatus according to (6) or (7).
(9)
The control unit controls recalculation of the reliability related to the speech recognition result based on the domain set in the selected second target form, and inputs the corrected speech recognition result into the second target form,
The information processing apparatus according to any one of the above (6) to (8).
(10)
The control unit causes unit blocks included in the speech recognition result input to the first target form to be displayed together with the speech recognition result, and causes the connection probability between a first unit block designated by the feedback and a second unit block located before or after the first unit block to be recalculated based on the domain set in the form designated by the feedback,
The information processing apparatus according to any one of the above (6) to (9).
(11)
The control unit inputs a character string corresponding to a second unit block corrected by recalculation of the connection probability into the form in which a domain associated with the character string is set.
The information processing apparatus according to (10).
(12)
The control unit newly associates a domain with at least a part of the speech recognition result based on the feedback.
The information processing apparatus according to any one of the above (6) to (11).
(13)
The control unit newly associates a character string designated by the feedback with a domain set in the form designated by the feedback.
The information processing apparatus according to (12).
(14)
The control unit requests the user to provide feedback for specifying the form for inputting the speech recognition result without selecting the first target form when the reliability of the speech recognition result is lower than a threshold.
The information processing apparatus according to any one of the above (6) to (13).
(15)
When the reliabilities of character string candidates related to the speech recognition result are close to one another, the control unit selects a plurality of second target forms based on the domains corresponding to the competing character string candidates, and inputs the competing character string candidates into the respective second target forms,
The information processing apparatus according to any one of the above (6) to (14).
(16)
When the speech recognition result includes a plurality of character strings associated with the same domain, the control unit designates a form in which the domain is not set, and asks the user to utter the input content for the designated form,
The information processing apparatus according to any one of the above (6) to (15).
(17)
The control unit selects a plurality of the first target forms based on the input operation, and performs automatic input of a set character string.
The information processing apparatus according to any one of the above (1) to (16).
(18)
The control unit presents to the user a form set that defines a string of characters to be automatically input to the plurality of first target forms, and executes the automatic input based on the designated form set.
The information processing apparatus according to (17).
(19)
The control unit adds an identifier to at least one of the character string included in the form set and the form, and corrects the result of the automatic input based on the identifier included in the feedback.
The information processing apparatus according to (18).
(20)
An information processing method including:
selecting, by a processor, a first target form to be input from a plurality of forms based on a user's input operation, and performing character input on the first target form; and
selecting a second target form different from the first target form based on the user's feedback on the input content input to the first target form, and performing the character input on the second target form.
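As a hedged illustration of configurations (7) and (14) above, the following Python sketch selects a first target form from the domain associated with a speech recognition result and declines to select any form when the reliability is below a threshold. It assumes that the recognizer already supplies a domain label and a reliability value. The function name `select_first_target`, the domain labels, and the threshold value 0.5 are hypothetical.

```python
from typing import Dict, Optional


def select_first_target(domain: str,
                        reliability: float,
                        form_domains: Dict[str, str],
                        threshold: float = 0.5) -> Optional[str]:
    """Return the ID of the form whose domain matches the recognition result.

    Returns None when the reliability is below the threshold or when no form
    carries the domain; in both cases the caller would ask the user to
    specify the form instead of selecting one automatically.
    """
    if reliability < threshold:
        return None
    for form_id, form_domain in form_domains.items():
        if form_domain == domain:
            return form_id
    return None


# Placeholder page: form ID -> domain set in that form.
form_domains = {"1": "name", "9": "phone", "11": "zip"}

confident = select_first_target("zip", 0.9, form_domains)
uncertain = select_first_target("zip", 0.2, form_domains)
```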

10 information processing terminal
110 display unit
120 voice output unit
130 voice input unit
140 imaging unit
150 sensor unit
160 control unit
170 server communication unit
20 information processing server
210 recognition unit
220 input / output control unit
230 terminal communication unit

Claims

An information processing apparatus comprising:
a control unit that selects a first target form to be input from a plurality of forms based on a user's input operation, and performs character input on the first target form,
wherein the control unit selects a second target form different from the first target form based on the user's feedback on the input content input to the first target form, and performs the character input on the second target form.
The control unit selects the form specified by the feedback as the second target form, and inputs characters corresponding to at least a part of the input content input to the first target form into the second target form,
An information processing apparatus according to claim 1.
The control unit causes a unit block included in the input content input to the first target form to be displayed together with the input content, deletes the characters corresponding to the unit block specified by the feedback from the first target form, and inputs the characters corresponding to the unit block into the second target form,
An information processing apparatus according to claim 1.
The control unit separates a character string included in the unit block based on the feedback, and inputs the separated character string to the second target form.
The information processing apparatus according to claim 3.
At least one of the input operation and the feedback is performed by speech.
An information processing apparatus according to claim 1.
The control unit selects the first target form based on the result of speech recognition for the input operation performed by speech, and inputs the result of the speech recognition to the first target form.
An information processing apparatus according to claim 1.
The control unit selects the first target form based on the speech recognition result and a domain set in the form.
The information processing apparatus according to claim 6.
The control unit inputs, to the second target form, the speech recognition result corrected based on a domain set in the selected second target form.
The information processing apparatus according to claim 6.
The control unit controls recalculation of the reliability related to the speech recognition result based on the domain set in the selected second target form, and inputs the corrected speech recognition result into the second target form,
The information processing apparatus according to claim 6.
The control unit causes unit blocks included in the speech recognition result input to the first target form to be displayed together with the speech recognition result, and causes the connection probability between a first unit block designated by the feedback and a second unit block located before or after the first unit block to be recalculated based on the domain set in the form designated by the feedback,
The information processing apparatus according to claim 6.
The control unit inputs a character string corresponding to a second unit block corrected by recalculation of the connection probability into the form in which a domain associated with the character string is set.
The information processing apparatus according to claim 10.
The control unit newly associates a domain with at least a part of the speech recognition result based on the feedback.
The information processing apparatus according to claim 6.
The control unit newly associates a character string designated by the feedback with a domain set in the form designated by the feedback.
The information processing apparatus according to claim 12.
The control unit requests the user to provide feedback for specifying the form for inputting the speech recognition result without selecting the first target form when the reliability of the speech recognition result is lower than a threshold.
The information processing apparatus according to claim 6.
When the reliabilities of character string candidates related to the speech recognition result are close to one another, the control unit selects a plurality of second target forms based on the domains corresponding to the competing character string candidates, and inputs the competing character string candidates into the respective second target forms,
The information processing apparatus according to claim 6.
When the speech recognition result includes a plurality of character strings associated with the same domain, the control unit designates a form in which the domain is not set, and asks the user to utter the input content for the designated form,
The information processing apparatus according to claim 6.
The control unit selects a plurality of the first target forms based on the input operation, and performs automatic input of a set character string.
An information processing apparatus according to claim 1.
The control unit presents to the user a form set that defines a string of characters to be automatically input to the plurality of first target forms, and executes the automatic input based on the designated form set.
The information processing apparatus according to claim 17.
The control unit adds an identifier to at least one of the character string included in the form set and the form, and corrects the result of the automatic input based on the identifier included in the feedback.
An information processing apparatus according to claim 18.
An information processing method comprising:
selecting, by a processor, a first target form to be input from a plurality of forms based on a user's input operation, and performing character input on the first target form; and
selecting a second target form different from the first target form based on the user's feedback on the input content input to the first target form, and performing the character input on the second target form.