WO2019142419A1 - Information processing device and information processing method - Google Patents

Information processing device and information processing method

Publication number
WO2019142419A1
WO2019142419A1 PCT/JP2018/038725 JP2018038725W WO2019142419A1 WO 2019142419 A1 WO2019142419 A1 WO 2019142419A1 JP 2018038725 W JP2018038725 W JP 2018038725W WO 2019142419 A1 WO2019142419 A1 WO 2019142419A1
Authority
WO
WIPO (PCT)
Prior art keywords
input
information processing
control unit
user
target
Application number
PCT/JP2018/038725
Other languages
French (fr)
Japanese (ja)
Inventor
亜由美 中川
賢次 杉原
Original Assignee
ソニー株式会社
Application filed by ソニー株式会社
Publication of WO2019142419A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present disclosure relates to an information processing apparatus and an information processing method.
  • Patent Document 1 discloses a technique for correcting recognition errors associated with proper nouns.
  • the present disclosure proposes a new and improved information processing apparatus and information processing method capable of easily correcting a selection error of a form to be input.
  • According to the present disclosure, there is provided an information processing apparatus including a control unit that selects a first target form to be input from a plurality of forms based on a user's input operation and performs character input on the first target form, wherein the control unit selects a second target form different from the first target form based on the user's feedback on the input content input to the first target form, and performs character input on the second target form.
  • the processor selects a first target form to be input from a plurality of forms based on the user's input operation, and performs character input on the first target form.
  • a second target form different from the first target form is selected based on the user's feedback on the input content input to the first target form, and the character input is performed on the second target form.
  • Input formats in which a plurality of forms serve as targets of character input are in widespread use.
  • Such an input format is adopted, for example, in an input interface such as a to-do list or a scheduler, and has a plurality of forms corresponding to a title, a date, a time, and the like.
  • the information processing apparatus includes a control unit that selects a first target form to be input from a plurality of forms based on the user's input operation and performs character input on the first target form.
  • One of its features is that the control unit selects a second target form different from the first target form based on the user's feedback on the input content input to the first target form, and performs character input on the second target form.
  • FIG. 1 and FIG. 2 are diagrams for explaining the outline of the present embodiment.
  • In a speech input interface including a plurality of forms, one factor that causes a speech recognition result to be input to a form not intended by the user is, for example, the accuracy of speech recognition itself.
  • FIG. 1 is a diagram showing an example when a form not intended by the user is selected due to an error in speech recognition result.
  • FIG. 1 shows a situation in which the user U registers a schedule by speech using a voice input interface IF having a plurality of forms F1 to F4.
  • the forms F1 to F4 may be forms for inputting the title, date, time, and day of the week regarding the schedule, respectively.
  • the user U makes an utterance UO1b that specifies a form name in order to move the speech recognition result erroneously input to the form F1 to the intended form, i.e., the form F4 corresponding to the day of the week.
  • the information processing server 20 may move the speech recognition result from the form F1 to the form F4 based on the detected speech UO1b of the user U.
  • the information processing server 20 according to the present embodiment can correct the voice input result in accordance with the input format of the form F4 corresponding to the day of the week. Focusing on the lower right of FIG. 1, it can be understood that “Kayayou” input to the form F1 is corrected to “Tuesday” by the above-described processing, and is correctly input to the form F4 intended by the user U.
  • According to the information processing server 20 of the present embodiment, it is possible to easily correct a form selection error caused by a speech recognition error and, further, to correct the speech recognition error itself.
  • FIG. 2 is a diagram showing an example in which a form not intended by the user is selected due to an error in character conversion in the speech recognition process.
  • FIG. 2 shows a situation in which the user U registers a schedule by speech using a voice input interface IF having a plurality of forms F1 to F4, as in FIG. 1.
  • the description of the configuration and the like of the common form is omitted.
  • the user U makes an utterance UO2a instructing registration of a schedule for "5 days". However, because a character conversion error occurs in the speech recognition process and "5 days" is recognized as "when", the speech recognition result is input not to the form F2 corresponding to the intended date but to the form F1 corresponding to the title, which accepts free input.
  • the user U makes an utterance UO2b that specifies a form number in order to move the speech recognition result erroneously input to the form F1 to the intended form, i.e., the form F2 corresponding to the date.
  • the user U may designate a form number displayed on the voice input interface IF in addition to the form name to issue a correction instruction.
  • the information processing server 20 may move the speech recognition result from the form F1 to the form F2 based on the detected speech UO2b of the user U.
  • the information processing server 20 may correct the voice input result in accordance with the input format of the form F2 corresponding to the date. Focusing on the lower right of FIG. 2, it can be seen that “when” that is input to the form F1 is corrected to a date format representing 5 days, and is correctly input to the form F2 intended by the user U. Further, since the date is correctly input to the form F2, the information processing server 20 may automatically input the day of the week corresponding to the form F4 based on the date.
  • According to the information processing server 20 of the present embodiment, it is possible to easily correct a form selection error caused by a character conversion error in the speech recognition process and, further, to correct the character conversion error itself.
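  • The move-and-correct behaviour described for FIG. 1 and FIG. 2 can be sketched roughly as follows. This is a minimal illustration, not the disclosed implementation: the `Form` class, `find_form`, `move_input`, and the `corrections` table (a stand-in for the server's re-recognition step) are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Form:
    number: int
    name: str
    value: str = ""


def find_form(forms, designator):
    """Return the form whose number or name matches the designator."""
    for form in forms:
        if designator == form.number or designator == form.name:
            return form
    raise KeyError(f"no form matches {designator!r}")


def move_input(forms, source, target, convert=lambda text: text):
    """Move the text held by `source` into `target`, converting it on the way."""
    src = find_form(forms, source)
    dst = find_form(forms, target)
    dst.value = convert(src.value)
    src.value = ""
    return dst


forms = [Form(1, "title"), Form(2, "date"), Form(3, "time"), Form(4, "day of the week")]
forms[0].value = "Kayayou"  # misrecognized text that landed in the title form

# Hypothetical stand-in for re-recognition against the target form's format.
corrections = {"Kayayou": "Tuesday"}
move_input(forms, 1, "day of the week", convert=lambda t: corrections.get(t, t))
```

  • Because `find_form` accepts either a number or a name, the same call covers both the form-name feedback of FIG. 1 and the form-number feedback of FIG. 2.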
  • FIG. 3 is a block diagram showing an exemplary configuration of the information processing system according to the present embodiment.
  • the information processing system according to the present embodiment includes an information processing terminal 10 and an information processing server 20. Further, the information processing terminal 10 and the information processing server 20 are connected via the network 30 so as to be able to communicate with each other.
  • the information processing terminal 10 is an information processing apparatus that provides the user with a character input interface having a plurality of forms based on control by the information processing server 20.
  • the information processing terminal 10 according to the present embodiment is realized by, for example, a smartphone, a tablet, a head mounted display, a general-purpose computer, or a dedicated device of a stationary type or an autonomous moving type.
  • the information processing server 20 is an information processing apparatus that controls input / output related to a character input interface including a plurality of forms.
  • the information processing server 20 according to the present embodiment may control the display of the character input interface and the character input to the form.
  • the information processing server 20 is characterized in that it realizes a character input interface which allows the user to easily correct the error of the form as described with reference to FIGS. 1 and 2.
  • the network 30 has a function of connecting the information processing terminal 10 and the information processing server 20.
  • the network 30 may include the Internet, a public network such as a telephone network, a satellite communication network, various LANs (Local Area Networks) including Ethernet (registered trademark), a WAN (Wide Area Network), and the like.
  • the network 30 may include a leased line network such as an Internet Protocol-Virtual Private Network (IP-VPN).
  • the network 30 may also include a wireless communication network such as Wi-Fi (registered trademark) or Bluetooth (registered trademark).
  • the configuration example of the information processing system according to the present embodiment has been described above.
  • the configuration described above with reference to FIG. 3 is merely an example, and the configuration of the information processing system according to the present embodiment is not limited to such an example.
  • the functions of the information processing terminal 10 and the information processing server 20 according to the present embodiment may be realized by a single device.
  • the configuration of the information processing system according to the present embodiment can be flexibly modified according to specifications and operation.
  • FIG. 4 is a block diagram showing an example of a functional configuration of the information processing terminal 10 according to the present embodiment.
  • the information processing terminal 10 according to the present embodiment includes a display unit 110, an audio output unit 120, a voice input unit 130, an imaging unit 140, a sensor unit 150, a control unit 160, and a server communication unit 170.
  • the display unit 110 has a function of outputting visual information such as an image or text.
  • the display unit 110 according to the present embodiment displays a character input interface based on control by the information processing server 20, for example.
  • the display unit 110 includes a display device or the like that presents visual information.
  • Examples of the display device include a liquid crystal display (LCD) device, an organic light emitting diode (OLED) device, and a touch panel.
  • the display unit 110 according to the present embodiment may output visual information by a projection function.
  • the audio output unit 120 has a function of outputting various sounds including voice.
  • the audio output unit 120 according to the present embodiment includes an audio output device such as a speaker or an amplifier.
  • the voice input unit 130 has a function of collecting sound information such as an utterance of a user and an ambient sound generated around the information processing terminal 10.
  • the voice input unit 130 according to the present embodiment includes a plurality of microphones for collecting sound information.
  • the imaging unit 140 has a function of capturing an image of the user or the surrounding environment.
  • the image information captured by the imaging unit 140 may be used for detection of the line of sight of the user by the information processing server 20 or the like.
  • the imaging unit 140 according to the present embodiment includes an imaging device capable of capturing an image. Note that the above image includes moving images as well as still images.
  • the sensor unit 150 has a function of collecting various sensor information related to the surrounding environment and the user.
  • the sensor information collected by the sensor unit 150 may be used, for example, for gesture recognition by the information processing server 20.
  • the sensor unit 150 includes, for example, an infrared sensor, an acceleration sensor, a gyro sensor, and the like.
  • the control unit 160 according to the present embodiment has a function of controlling each component included in the information processing terminal 10.
  • the control unit 160 controls, for example, start and stop of each component. Further, the control unit 160 inputs a control signal generated by the information processing server 20 to the display unit 110 or the audio output unit 120.
  • the control unit 160 according to the present embodiment may have the same function as the input / output control unit 220 of the information processing server 20 described later.
  • the server communication unit 170 has a function of performing information communication with the information processing server 20 via the network 30. Specifically, the server communication unit 170 transmits, to the information processing server 20, the sound information collected by the voice input unit 130, the image information captured by the imaging unit 140, and the sensor information collected by the sensor unit 150. The server communication unit 170 also receives, from the information processing server 20, a control signal and the like relating to the output of the character input interface.
  • the example of the functional configuration of the information processing terminal 10 according to the present embodiment has been described above.
  • the above configuration described using FIG. 4 is merely an example, and the functional configuration of the information processing terminal 10 according to the present embodiment is not limited to such an example.
  • the information processing terminal 10 according to the present embodiment may not necessarily include all of the configurations shown in FIG. 4.
  • the information processing terminal 10 can be configured not to include the imaging unit 140, the sensor unit 150, and the like.
  • the control unit 160 according to the present embodiment may have the same function as the input / output control unit 220 of the information processing server 20.
  • the functional configuration of the information processing terminal 10 according to the present embodiment can be flexibly modified according to specifications and operation.
  • FIG. 5 is a block diagram showing an example of a functional configuration of the information processing server 20 according to the present embodiment.
  • the information processing server 20 according to the present embodiment includes a recognition unit 210, an input / output control unit 220, and a terminal communication unit 230.
  • the recognition unit 210 executes voice recognition processing based on the user's uttered voice collected by the information processing terminal 10. Further, the recognition unit 210 may execute gaze detection based on an image captured by the information processing terminal 10, gesture recognition based on an image or sensor information, and the like.
  • the input / output control unit 220 comprehensively controls input / output processing related to the character input interface.
  • the input / output control unit 220, for example, performs character input on a form of the character input interface based on the user's input operation.
  • the input / output control unit 220 may select a first target form to be an input target from a plurality of forms based on an input operation using the user's utterance or the like, and perform character input on the first target form. That is, the input / output control unit 220 according to the present embodiment can automatically select a form for character input based on the result of speech recognition for the user's speech.
  • the input / output control unit 220 has a function of selecting a second target form different from the first target form based on user feedback on the input content input to the first target form, and performing character input on the second target form. More specifically, the input / output control unit 220 may select the form specified by the feedback as the second target form, and input, to the second target form, characters corresponding to at least a part of the input content input to the first target form.
  • the above user's feedback may be an instruction to correct a form error. That is, when the automatically selected form is incorrect, the input / output control unit 220 according to the present embodiment can perform a correction process so that the voice recognition result is input to the form designated by the user. According to the above-described function of the input / output control unit 220 according to the present embodiment, it is possible to easily correct a form error due to various factors without requiring a complicated operation.
  • the terminal communication unit 230 performs information communication with the information processing terminal 10 via the network 30. Specifically, the terminal communication unit 230 receives sound information, image information, sensor information, and the like from the information processing terminal 10. The terminal communication unit 230 also transmits the control signal generated by the input / output control unit 220 to the information processing terminal 10.
  • the functional configuration example of the information processing server 20 according to an embodiment of the present disclosure has been described.
  • the above configuration described using FIG. 5 is merely an example, and the functional configuration of the information processing server 20 according to the present embodiment is not limited to such an example.
  • the configuration shown above may be realized by being distributed by a plurality of devices.
  • the functions of the information processing terminal 10 and the information processing server 20 may be realized by a single device.
  • the functional configuration of the information processing server 20 according to the present embodiment can be flexibly modified according to specifications and operation.
  • the input / output control unit 220 has a function of selecting the first target form to be input from the plurality of forms based on the input operation of the user and performing character input on the first target form.
  • the input / output control unit 220 may select the first target form based on, for example, the speech recognition result for the input operation performed by speech, and may input the speech recognition result to the first target form.
  • the input / output control unit 220 can select the first target form based on the speech recognition result for the speech and the domain set in each form. Also, as described above, when moving the speech recognition result to the second target form designated by the user's feedback, the input / output control unit 220 may input, to the second target form, the speech recognition result corrected based on the domain set in the second target form.
  • FIG. 6 is an example of the Nbest result of the speech recognition process according to the present embodiment.
  • the recognition unit 210 according to the present embodiment may, for example, generate a plurality of character string candidates based on the user's utterance and output the character string candidate having the highest reliability among them as the final speech recognition result. Here, the Nbest result is the set of character string candidates ranked first to nth by reliability.
  • each character string candidate is associated with a domain indicating an attribute of the character string.
  • the character string candidate "Tuesday" is associated with the domain "day of the week" because the character string is one of the days of the week, and the character string candidate "Kyanobi" is associated with the domain "title", for which free input is permitted.
  • the input / output control unit 220 selects, as the first target form, the form in which the domain "title" is set, based on the domain "title" associated with "Kyanobi" output as the speech recognition result, and inputs the speech recognition result to that form.
  • the input / output control unit 220 can cause the reliability related to speech recognition to be recalculated based on the domain set in the form specified by the user, that is, the second target form, and can input the corrected speech recognition result to the second target form.
  • When the user specifies by feedback a form in which the domain "day of the week" is set, the input / output control unit 220 causes the recognition unit 210 to recalculate the reliability based on the domain "day of the week".
  • the right side of FIG. 6 shows the Nbest result re-obtained by the recalculation of the reliability.
  • the recognition unit 210 recalculates the reliability so that character string candidates associated with the domain "day of the week" rank at the top, and it can be seen that the reliability of the character string candidate "Tuesday" becomes the highest. At this time, the recognition unit 210 outputs "Tuesday", which has the highest reliability, as the speech recognition result.
  • By causing the recognition unit 210 to recalculate the reliability based on the domain set in the second target form specified by the user, the input / output control unit 220 can obtain a corrected speech recognition result and realize input in line with the user's intention. According to the above-described function of the input / output control unit 220, it is possible to easily correct form errors and speech recognition errors without requiring complicated operations.
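  • A rough sketch of the Nbest handling described for FIG. 6 follows. The reliability values and the additive `boost` are assumptions made for illustration (the text only says that candidates in the designated domain are recalculated to the top), and `Candidate`, `best`, and `rerank_for_domain` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    domain: str
    reliability: float


def best(candidates):
    """Return the candidate with the highest reliability."""
    return max(candidates, key=lambda c: c.reliability)


def rerank_for_domain(candidates, domain, boost=1.0):
    """Raise candidates in `domain` above all others and re-sort.

    With reliabilities in [0, 1], a boost of 1.0 guarantees that every
    candidate in the designated domain outranks the rest (sketch assumption).
    """
    rescored = [
        Candidate(c.text, c.domain, c.reliability + (boost if c.domain == domain else 0.0))
        for c in candidates
    ]
    return sorted(rescored, key=lambda c: c.reliability, reverse=True)


# Illustrative Nbest list; the scores are invented for the example.
nbest = [
    Candidate("Kyanobi", "title", 0.62),
    Candidate("Tuesday", "day of the week", 0.55),
    Candidate("Kayo", "title", 0.31),
]

first = best(nbest)  # its domain decides the first target form
corrected = rerank_for_domain(nbest, "day of the week")[0]
```

  • Here `first` lands in the form whose domain is "title", while `corrected` is what would be input to the second target form after the user designates the day-of-the-week form.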
  • FIG. 7 is a diagram for describing a correction process in which a unit block is designated according to the present embodiment.
  • the user U makes an utterance UO7a instructing registration of a schedule of "English from 18 o'clock on Tuesday".
  • "Tuesday" is recognized as "Kyanobi" and is input, together with the correctly recognized character string "English", to the form F1 corresponding to the title, which allows free input.
  • the speech recognition result according to the present embodiment may include a plurality of unit blocks.
  • the above-mentioned unit block indicates, for example, a character string divided by a unit such as a word, a phrase, or a clause, and in the above-mentioned example, corresponds to "Kyanobi" and "English".
  • the input / output control unit 220 may display information on unit blocks included in the input content together with the input content input to the form.
  • the input / output control unit 220 displays the unit block relating to "Kyanobi” as "A” and the unit block relating to "English” as "B".
  • In order to input the character string "Kyanobi", which corresponds to the unit block A and was incorrectly input to the form F1, into the form F4 corresponding to the day of the week, the user U makes an utterance UO7b that designates the unit block A and form number 2.
  • the input / output control unit 220 can delete the character string "Kyanobi" corresponding to the unit block A from the form F1 based on the detected utterance UO7b of the user U, and input the character string "Tuesday", corrected by recalculation of the reliability, into the form F4.
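  • The unit-block correction of FIG. 7 might be modelled as below. This is only a sketch: representing a form's content as a list of unit blocks, the `label_blocks` and `move_block` helpers, and passing the already corrected text in as an argument are all assumptions made for illustration.

```python
from string import ascii_uppercase


def label_blocks(blocks):
    """Assign display labels A, B, C, ... to unit blocks in order."""
    return dict(zip(ascii_uppercase, blocks))


def move_block(forms, source, label, target, corrected_text):
    """Delete the labelled block from `source` and input `corrected_text` to `target`."""
    blocks = label_blocks(forms[source])
    if label not in blocks:
        raise KeyError(f"form {source!r} has no unit block {label!r}")
    forms[source] = [text for key, text in blocks.items() if key != label]
    forms[target] = [corrected_text]
    return forms


# Form contents are held as lists of unit blocks, keyed by form name.
forms = {"title": ["Kyanobi", "English"], "day of the week": []}
labels = label_blocks(forms["title"])

# The user designates unit block A and the day-of-the-week form.
move_block(forms, "title", "A", "day of the week", "Tuesday")
```

  • Only the designated block leaves the title form; the correctly recognized block "English" stays where it was.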
  • the input / output control unit 220 can also correct the input content based on the connection probability between unit blocks.
  • FIG. 8 is a diagram for describing a correction process based on the connection probability between unit blocks according to the present embodiment.
  • In order to input the character string "when", which corresponds to the unit block A and was incorrectly input to the form F1, into the form F2 corresponding to the date, the user U makes an utterance UO8b specifying the form number 2.
  • the input / output control unit 220 causes recalculation of not only the reliability of the unit block A but also the connection probability concerning the unit block B located before or after the unit block A.
  • Since the recognition unit 210 outputs results that include the probability of the connection relationship between unit blocks, when the character string corresponding to a certain unit block (first unit block) is corrected based on the domain, the connection relationship with the second unit blocks located before and after the first unit block may also be recalculated.
  • By recalculating the connection probability, the unit block B is corrected to "3 pm", which has a high probability of connecting with "5 days".
  • the input / output control unit 220 corrects "3 pm" according to the form format based on the domain associated with the corrected character string "3 pm", and inputs it to the form F3.
  • a more effective correction can be realized by considering the connection probability of unit blocks. Even if errors in the previous and subsequent unit blocks are not corrected by one process, it is possible to correct the errors in all the unit blocks by repeating the above process.
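  • The effect of the connection probability in FIG. 8 can be illustrated with a toy rescoring step. All numbers below are invented, the alternative candidate "sunny" is a made-up placeholder for whatever unit block B originally held, and multiplying reliability by connection probability is an assumption of this sketch rather than the method stated in the text.

```python
def rescore_neighbor(neighbor_candidates, corrected_text, connection_prob):
    """Choose the neighbouring block's candidate given the corrected block A.

    Each candidate's reliability is weighted by the probability that it
    connects to `corrected_text`; unknown pairs are treated as probability 0.
    """
    def score(item):
        text, reliability = item
        return reliability * connection_prob.get((corrected_text, text), 0.0)

    return max(neighbor_candidates.items(), key=score)[0]


# Invented values for illustration only.
block_b_candidates = {"sunny": 0.6, "3 pm": 0.4}
connection_prob = {
    ("when", "sunny"): 0.5,
    ("when", "3 pm"): 0.1,
    ("5 days", "sunny"): 0.05,
    ("5 days", "3 pm"): 0.7,
}

before = rescore_neighbor(block_b_candidates, "when", connection_prob)
after = rescore_neighbor(block_b_candidates, "5 days", connection_prob)
```

  • Once block A changes from "when" to "5 days", the best candidate for block B flips as well, which is the cascading correction the passage describes; repeating the step over the remaining neighbours corresponds to the repeated processing mentioned above.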
  • FIG. 9 is a diagram for describing correction processing when a form not intended by the user is selected by setting of semantic analysis according to the present embodiment.
  • the user U makes an utterance UO9a instructing registration of a schedule of "pick up from 15 o'clock".
  • the user U wants to input all the character strings related to “pick up from 15:00” in the form F1.
  • the input / output control unit 220 has input "15 o'clock" into the form F3 and only "pick up" into the form F1.
  • In this way, a character string may be input to a form that does not conform to the user's intention.
  • the user U may make an utterance UO 9 b for moving “15:00” input to the unintended form F 3 to the form F 1.
  • the user U can designate an arbitrary form by the form number or the form name.
  • the input / output control unit 220 deletes “15 o'clock” from the form F3 based on the recognized speech UO 9b of the user U, and adds it to the form F1.
  • the input / output control unit 220 may also correct the character string in accordance with the input format of the second target form, which is the corrected input destination, before inputting it.
  • FIG. 10 is a diagram for describing a correction process when a form not intended by the user is selected due to the unset domain, according to the present embodiment.
  • the user U makes an utterance UO10a instructing registration of a schedule of "20th (Hatuka)".
  • the user U wants to input "20th” in the form F2 corresponding to the date.
  • the input / output control unit 220 has input "20 days" into the form F1, which allows free input. As described above, even if there is no error in the speech recognition result, when the domain intended by the user is not set for the recognized character string, the character string may be input to a form that does not conform to the user's intention.
  • the user U may make an utterance UO10b for newly associating the domain set in the form F2 with the "20 days" input to the unintended form F1.
  • Based on the feedback of the user U given by the utterance UO10b, the input / output control unit 220 can newly associate the designated character string with the domain "date" set in the designated form F2. Further, the input / output control unit 220 may delete "20 days" input to the form F1 based on the utterance UO10b, and may perform input in accordance with the format of the form F2.
  • According to the input / output control unit 220, it is possible to newly associate the domain intended by the user with the character string based on the user's instruction, and thereafter to realize input reflecting the user's intention.
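  • Learning a new string-to-domain association from feedback, as in FIG. 10, could look roughly like this. `DomainLexicon`, `select_form`, and the fallback to the free-input "title" domain are hypothetical constructs introduced for the sketch.

```python
class DomainLexicon:
    """Maps recognized strings to domains; new pairs are learned from feedback."""

    def __init__(self, default_domain="title"):
        self._domains = {}
        self._default = default_domain

    def domain_of(self, text):
        # Strings without a known domain fall back to the free-input domain.
        return self._domains.get(text, self._default)

    def associate(self, text, domain):
        self._domains[text] = domain


form_domains = {"F1": "title", "F2": "date", "F3": "time", "F4": "day of the week"}


def select_form(lexicon, text):
    """Return the first form whose domain matches the domain of `text`."""
    wanted = lexicon.domain_of(text)
    return next(name for name, domain in form_domains.items() if domain == wanted)


lexicon = DomainLexicon()
first_choice = select_form(lexicon, "20 days")     # no domain set yet
lexicon.associate("20 days", form_domains["F2"])   # feedback designating form F2
later_choice = select_form(lexicon, "20 days")
```

  • After the association is stored, the same utterance is routed to the date form without further feedback, which matches the "thereafter" behaviour described above.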
  • FIGS. 11A and 11B are diagrams for explaining addition of a domain to a specific expression according to the present embodiment.
  • the user U makes an utterance UO11a instructing registration of a schedule of "the day of π".
  • the user U expresses "March 14" as "the day of π", following the convention related to pi.
  • the input / output control unit 220 inputs "the day of π" to the form F1.
  • a string may be input to a form that does not conform to the user's intention.
  • Aliases, abbreviations, and the like are broadly included in the above specific expression.
  • the specific expression according to the present embodiment may be an expression used only in a specific group, for example, in a home, in addition to an expression used in the world.
  • the user U may make an utterance UO11b for moving "the day of π" input to the unintended form F1 to the form F2.
  • the input / output control unit 220 may convert "the day of π" to "March 14" and input it to the form F2. Further, at this time, the input / output control unit 220 may perform control to newly associate "the day of π" with "March 14" and the domain "date".
  • As shown in the upper part of the figure, the input / output control unit 220 may cause the information processing terminal 10 to output a voice SO11 inquiring of the user U about the date expression related to "the day of π".
  • the input / output control unit 220 can newly associate "the day of π" with "March 14" and the domain "date".
  • the input / output control unit 220 may also display the character string while maintaining the expression "the day of π" in the form F2 in order to better reflect the intention of the user U. In this case, since "the day of π" and "March 14" are associated internally, the scheduler function and the like can be executed without any problem.
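  • Keeping the user's expression on screen while storing a normalized date internally, as described for FIGS. 11A and 11B, can be sketched as follows. The `DateEntry` structure, the `aliases` table, and the year 2018 are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DateEntry:
    display: str  # expression shown in the form
    value: date   # normalized value used by the scheduler


aliases = {}


def register_alias(expression, month, day):
    """Associate a specific expression with a month and day (domain: date)."""
    aliases[expression] = (month, day)


def fill_date_form(expression, year):
    """Build the date-form entry, keeping the user's wording for display."""
    month, day = aliases[expression]
    return DateEntry(display=expression, value=date(year, month, day))


register_alias("the day of π", 3, 14)       # learned from the user's answer
entry = fill_date_form("the day of π", 2018)  # the year is an arbitrary example
```

  • The form shows `entry.display`, while scheduling logic reads `entry.value`, so the alias never has to be re-parsed.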
  • the input / output control unit 220 can flexibly control the input / output related to the input interface IF based on the reliability related to speech recognition.
  • FIG. 12 is a diagram for describing input / output control in the case where the degree of reliability related to speech recognition is low.
  • because the reliability of the speech recognition of the utterance UO 12a made by the user U falls below the threshold, the input / output control unit 220 does not input the speech recognition result to a form, but instead causes the information processing terminal 10 to output voice SO12 requesting the user U to specify the form to input to.
  • the input / output control unit 220 can request the user to explicitly designate a form for inputting the speech recognition result.
  • the input / output control unit 220 controls recalculation of the reliability based on the utterance UO 12b, and can input the corrected speech recognition result to form F4.
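The low-reliability handling can be sketched as a simple threshold check. This is an illustrative sketch only: the function names, the dictionary layout, and the default threshold of 0.5 are assumptions, and the recalculation of reliability that the text mentions is deliberately left out.

```python
def route_recognition(text, reliability, threshold=0.5):
    """Decide what to do with a recognition result (hypothetical helper)."""
    if reliability < threshold:
        # Do not enter anything yet; ask the user which form is meant.
        return {"action": "ask_form", "text": text}
    return {"action": "input", "text": text}


def on_form_specified(pending, form_id):
    """Enter the pending text once the user has named a form.

    A real system would first re-score the text against the form's domain;
    that step is omitted here.
    """
    return {"action": "input", "form": form_id, "text": pending["text"]}


pending = route_recognition("Friday", reliability=0.3)
result = on_form_specified(pending, "F4")
```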
  • FIGS. 13 and 14 are diagrams for describing input / output control in the case where the reliabilities of character string candidates are close to one another.
  • the recognition unit 210 according to the present embodiment can generate a plurality of character string candidates based on the user's utterance, and output the character string candidate having the highest reliability among them as the final recognition result.
  • the reliabilities of a plurality of character string candidates may be close to one another.
  • the input / output control unit 220 may input each of the competing character strings to a form when the difference in reliability between the 1st and nth places in the Nbest result falls below the threshold Td.
  • the input / output control unit 220 may obtain the difference after normalizing the reliability.
  • the input / output control unit 220 selects a plurality of second target forms, that is, the forms F1 and F4, based on the domains corresponding to the character string “Kayobi” and the character string “Tuesday”, which have competing reliabilities, and inputs the character string "Kyanobi" and the character string “Tuesday” to them, respectively.
  • the input / output control unit 220 causes the information processing terminal 10 to output the voice SO13 asking which form holds the correct input result, and obtains feedback from the user U, thereby realizing the input intended by the user U.
  • the input / output control unit 220 may select a plurality of second target forms, that is, the forms F1 and F2, based on the domains corresponding to the character string “todaya” and the character string “today”, which have competing reliabilities, and input the character string "Today's" and the character string “Today” to them, respectively.
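The handling of competing candidates can be sketched as follows. This is a minimal illustration, assuming that reliabilities are normalized by their sum and that each candidate already carries a domain. The names `competing_candidates`, `DOMAIN_OF`, and `FORM_FOR_DOMAIN`, the sample scores, and the domain labels are hypothetical.

```python
def competing_candidates(nbest, td):
    """Return candidates whose normalized reliability is within td of the best.

    nbest is a list of (text, reliability) pairs in any order.
    """
    total = sum(score for _, score in nbest)
    ranked = sorted(((text, score / total) for text, score in nbest),
                    key=lambda item: item[1], reverse=True)
    best = ranked[0][1]
    return [text for text, score in ranked if best - score < td]


# Hypothetical domain lookup: free text goes to the title form F1,
# a day-of-week expression goes to form F4.
DOMAIN_OF = {"Kayobi": "free", "Tuesday": "day_of_week"}
FORM_FOR_DOMAIN = {"free": "F1", "day_of_week": "F4"}

nbest = [("Kayobi", 0.42), ("Tuesday", 0.40), ("Kayo", 0.18)]
close = competing_candidates(nbest, td=0.05)
entries = {FORM_FOR_DOMAIN[DOMAIN_OF[text]]: text for text in close}
```

With both candidates entered, the system can then ask the user which form holds the correct result instead of silently committing to the top candidate.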
  • FIG. 15 is a diagram showing a correction example involving separation of unit blocks according to the present embodiment.
  • the user U makes an utterance UO 15a instructing registration of a schedule of “5:00 pm on the 5th”.
  • the user U may make an utterance UO 15b specifying a plurality of forms for inputting the unit block A corresponding to "someday afternoon sharing".
  • the input / output control unit 220 can cause the recognition unit 210 to execute the calculation of Nbest relating to the unit block A again based on the utterance UO 15 b.
  • the input / output control unit 220 can separate the character strings “5 days” and “3 pm” included in the unit block A based on the recalculated Nbest result, and input them to the forms F2 and F3, respectively.
  • the input / output control unit 220 may display the statement T1 prompting the correction of the error on the input interface IF.
  • the user U may make an utterance divided into units of the form, such as "A date someday A”, “A date someday 2", “someday date”, or "Shinjiha Time”. Such utterances are expected to enable more efficient correction.
  • the user U can also specify a correction other than the specification of the form, such as "not being a shanghai, sanji”.
  • wording T2 asking the user U to specify the form to input to may be output on the input interface IF, without inputting the speech recognition result to a form.
  • when the user U makes an utterance UO 16b specifying the forms F1 to F3, the input / output control unit 220 first fits character strings to, and makes corrections for, the forms F2 and F3, which are the forms other than the form F1 that allows free input, and then inputs the remaining character string to the form F1, thereby presenting the input intended by the user U.
  • the input / output control unit 220 may output, on the input interface IF, wording T3 requesting the user U to input the title again, without inputting the speech recognition result to a form.
  • the input / output control unit 220 converts the character string "pick up” to the form F1.
  • the input intended by the user U can be presented.
  • the user U makes an utterance UO 18a instructing registration of a hotel schedule.
  • the utterance UO 18 a includes two character strings “Tomorrow” and “10 days” associated with the domain “Date”.
  • it is difficult for the input / output control unit 220 to determine which of the character strings “Tomorrow” and “10 days” is to be input to the form F2.
  • the input / output control unit 220 may designate the form F1, which allows free input and has no domain set, and display on the input interface IF wording T4 requesting the user to utter again the content to be input to the title.
  • the input / output control unit 220 can reduce variation in the user U's utterance by designating the form F1, which permits free input, and urging the user to speak again.
  • once the title is re-uttered, it is possible to make correct corrections for both domains.
  • the lower part of FIG. 18 shows the result of the correction made by the input / output control unit 220 based on the utterance UO 18 b for designating the title, which the user U has made.
  • the input / output control unit 220 inputs “Reserve 10 days hotel”, uttered in the utterance UO 18b, to the form F1, inputs the remaining date “Tomorrow” to the form F2, and inputs “Friday”, the day of the week for “Tomorrow”, to the form F4.
  • the input / output control unit 220 may input the speech recognition result for the re-speech to the form F1. Also in this case, it is possible to enter in the form F2 a character string corresponding to the domain "date” not included in the re-speech.
  • the user U makes an utterance UO 19b that names the intended form, that is, the form F4 corresponding to the day of the week, in order to move the speech recognition result erroneously input to the form F1.
  • "Wednesday" has already been input to the form F4 designated by the user by the utterance UO 19a.
  • the input / output control unit 220 determines whether the character string already input and the character string newly instructed to be input are compatible, and performs control based on the determination. For example, in the example shown in FIG. 19, the form F4 by its nature cannot accept both “Wednesday” and “Tuesday”, so the input / output control unit 220 may overwrite “Wednesday” with the newly instructed “Tuesday”. On the other hand, in the case of a form that allows free input, such as the form F1, the input / output control unit 220 may append the newly instructed character string while keeping the character string already input. As described above, the input / output control unit 220 according to the present embodiment can realize appropriate correction control based on the nature of the form.
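The overwrite-or-append decision can be sketched as below. This is an illustrative sketch only: the dictionary layout with a `free_input` flag and the sample title text are assumptions, not part of the source.

```python
def apply_input(form, new_text):
    """Enter new_text into a form, respecting the nature of the form."""
    if form["free_input"]:
        # Free-input forms keep the existing text and append the new one.
        form["value"] = (form["value"] + " " + new_text).strip()
    else:
        # Single-valued forms (for example, day of the week) are overwritten.
        form["value"] = new_text
    return form


f4 = apply_input({"free_input": False, "value": "Wednesday"}, "Tuesday")
f1 = apply_input({"free_input": True, "value": "Meeting"}, "with client")
```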
  • the functions of the input / output control unit 220 according to the present embodiment have been described above in detail with specific examples. Although the case where Japanese is used as the type of the character string to be input has been described above, the function possessed by the input / output control unit 220 according to the present embodiment is applicable regardless of the type of language.
  • FIG. 20 is a diagram for describing a correction process when English is used as the type of character string.
  • the user U makes an utterance UO 20a instructing registration of a schedule for “Tuesday”.
  • the input / output control unit 220 has input the character string “Choose way” to the form F1, which corresponds to the title and allows free input.
  • the input / output control unit 220 displays the two unit blocks “Choose” and “way” included in “Choose way” as unit blocks A and B, respectively.
  • the input / output control unit 220 deletes the unit blocks A and B from the form F1 based on the utterance UO 20b, and can input “Tuesday”, corrected based on the domain of the designated form F2, to the form F2.
  • FIG. 21 is a flowchart showing the flow of the operation of the information processing server 20 according to the present embodiment.
  • the terminal communication unit 230 receives the speech information of the user collected by the information processing terminal 10 (S1101).
  • the recognition unit 210 executes speech recognition processing based on the speech information received in step S1101 (S1102). At this time, the recognition unit 210 may perform the calculation of the reliability, the acquisition of the Nbest result, the calculation of the connection probability between unit blocks, and the like.
  • the input / output control unit 220 determines whether or not the difference in reliability between the 1st and nth places in the Nbest result is smaller than a threshold (S1103).
  • the input / output control unit 220 determines whether to perform input / output control for competing reliabilities (S1104).
  • the recognition unit 210 outputs the character string candidate with the highest reliability as the speech recognition result, and input / output control of the form is executed by the input / output control unit 220 (S1105).
  • when the input / output control unit 220 performs input / output control for competing reliabilities (S1104: Yes), it executes the form input / output control for competing reliabilities as shown in FIG. 13 and FIG. 14 (S1106).
  • the input / output control unit 220 causes the recognition unit 210 to calculate the reliability again based on the above feedback (S1108), and performs form input / output control based on the recalculated reliability (S1109).
  • the above-mentioned feedback may be performed not only by voice but also by sight line, gesture, operation of an input device, or the like.
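The flow of steps S1101 to S1109 can be traced with a short sketch. Several points here are assumptions rather than statements from the text: that S1105 is reached when either S1103 or S1104 is answered "No", that the feedback is obtained between S1106 and S1108, and that "recalculating the reliability" can be reduced to keeping the candidate the user confirmed. The function name and its parameters are hypothetical.

```python
def handle_utterance(nbest, threshold, competing_control, feedback=None):
    """Trace the server flow for one utterance.

    nbest: list of (text, reliability) sorted best first, standing in for
    the result of receiving speech (S1101) and recognizing it (S1102).
    Returns (steps visited, texts entered into forms).
    """
    trace = ["S1101", "S1102", "S1103"]
    gap = nbest[0][1] - nbest[-1][1]
    if gap >= threshold:
        trace.append("S1105")          # clear winner: enter the top candidate
        return trace, [nbest[0][0]]
    trace.append("S1104")
    if not competing_control:
        trace.append("S1105")
        return trace, [nbest[0][0]]
    trace.append("S1106")              # enter every competing candidate
    entered = [text for text, _ in nbest]
    if feedback is not None:
        trace += ["S1108", "S1109"]    # simplified re-scoring from feedback
        entered = [text for text in entered if text == feedback]
    return trace, entered


clear = handle_utterance([("a", 0.9), ("b", 0.3)], 0.1, True)
close = handle_utterance([("a", 0.50), ("b", 0.48)], 0.1, True, feedback="b")
```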
  • the input / output control unit 220 controls the voice input interface provided with a plurality of forms.
  • the application scope of the technical idea according to the present disclosure is not limited to the voice input interface. Therefore, in the second embodiment, a case will be described where the input / output control unit 220 controls character string input to a form placed on a Web page.
  • FIG. 22 is a diagram for describing an automatic input for a form placed on a web page.
  • FIG. 22 shows a web page WP having a plurality of forms corresponding to a name, a birthday, a telephone number, a zip code and the like.
  • the user can enter information in each of the placed forms using, for example, a keyboard, but as the number of forms increases, the load associated with the input operation increases, and input errors and the like are also expected to occur.
  • FIG. 23A and FIG. 23B are diagrams showing examples of input errors by the automatic input tool.
  • an input error occurs in which the information that should be input separately to the two forms corresponding to the zip code is input to only one of the forms.
  • Such an input error may occur, for example, when the postal code is managed as information corresponding to one form in the automatic input tool.
  • FIG. 23B shows an example in which information is input in a language different from the language assumed by the form.
  • Japanese first name and last name are input in the form corresponding to First name and Last name, which should be originally input in English.
  • Such an input error may occur, for example, when only information written in Japanese is stored in the automatic input tool.
  • the technical idea according to the present embodiment was conceived focusing on the above points, and makes it possible to perform easy correction without complicated operations even if information is input to an incorrect form by automatic input. Further, according to the information processing server 20 according to the present embodiment, it is possible to realize information input with fewer errors.
  • FIG. 24 is a diagram for describing automatic input control by the input / output control unit 220 according to the present embodiment.
  • FIG. 24 shows a Web page WP having a plurality of forms corresponding to full name (Kanji), full name (Kana), birthday, phone number, zip code and the like.
  • the input / output control unit 220 can select, from the plurality of forms, a plurality of first target forms for information input based on the user's input operation, and automatically input the specified character strings to the selected first target forms.
  • the input / output control unit 220 may use, for example, an utterance of the user, an operation using an input device such as a mouse, a touch, or the like as a trigger of the automatic input. Further, the input / output control unit 220 may execute automatic input for a plurality of forms using information input by the user and information set in the form set FS designated in advance.
  • FIG. 24 shows an example of the form set FS according to the present embodiment.
  • the form set FS according to the present embodiment is an information set in which information to be automatically input to a plurality of forms is summarized for each user and application.
  • in the form set FS, last name (Kanji), first name (Kanji), last name (Kana), first name (Kana), date of birth, telephone number, and zip code are defined for each user.
  • the form set FS may be automatically generated by the input / output control unit 220 based on past input results, or may be generated and edited by the user.
  • the input / output control unit 220 may present the form set FS as visual information to the user.
  • the input / output control unit 220 may assign an ID to the name of the form set FS or each character string included in the form set FS.
  • the user can obtain the form set FS used for automatic input by designating the name “Toshi” or the ID “1” corresponding to the name “Toshi”, and can perform automatic input to a plurality of forms using that form set.
  • the input / output control unit 220 executes automatic input for a plurality of forms using the form set FS corresponding to the form set name “Toshishi” designated by the user.
  • the input / output control unit 220 may obtain the form set FS set by default and perform automatic input.
  • one feature of the input / output control unit 220 is that, when the above-described automatic input is performed, it assigns an ID to each form arranged in the web page WP.
  • the input / output control unit 220 assigns IDs “1” to “12” to each form and displays the forms on the web page WP.
  • the IDs given to each form and to each piece of information included in the form set FS are intended to let the user more easily correct an input mistake when one occurs.
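The form set and the two kinds of identifiers can be sketched as plain dictionaries. This is a minimal sketch under assumptions: the field names, the sample values other than “Ueda” and “111-2222”, and the helper names `assign_ids` and `autofill` are hypothetical; numeric IDs for forms and letter IDs for form-set strings follow the examples in the text.

```python
import string


def assign_ids(form_names, form_set):
    """Give numeric IDs to forms and letter IDs to form-set entries."""
    form_ids = {str(i): name for i, name in enumerate(form_names, start=1)}
    string_ids = {string.ascii_uppercase[i]: key
                  for i, key in enumerate(form_set)}
    return form_ids, string_ids


def autofill(form_names, form_set):
    """Fill each form with the matching form-set entry, or leave it empty."""
    return {name: form_set.get(name, "") for name in form_names}


# Hypothetical form set; "Taro" and the birth date are made-up sample values.
form_set = {
    "last_name": "Ueda", "first_name": "Taro",
    "last_name_kana": "ueda", "first_name_kana": "taro",
    "birth_date": "1990-01-01", "phone": "000-0000-0000",
    "zip_code": "111-2222",
}
form_names = list(form_set) + ["zip_code_2"]
form_ids, string_ids = assign_ids(form_names, form_set)
filled = autofill(form_names, form_set)
```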
  • FIGS. 25 to 27 are diagrams for explaining the correction of input information according to the present embodiment.
  • a situation after the input / output control unit 220 has automatically input the form placed on the web page WP is shown.
  • in FIG. 25, as in the case shown in FIG. 23A, an example is shown in which “last name” and “first name”, and “sei” and “mei”, are input in reverse.
  • the user U can issue an instruction to correct the automatic input result using the ID assigned to each form or the identifier assigned to each character string included in the form set FS.
  • the user U performs feedback relating to a correction instruction by making an utterance UO 25 a with the content “1 and 2 are reversed” and an utterance UO 25 b with the content “A to 1”.
  • the input / output control unit 220 can, for example, swap the information input to the form “last name” corresponding to the ID “1” and the form “first name” corresponding to the ID “2” based on the utterance UO 25a. Also, based on the utterance UO 25 b, the input / output control unit 220 can, for example, overwrite the form “surname” corresponding to the ID “1” with the character string “Ueda” corresponding to the ID “A” included in the form set FS, and can move the character string "Koshishi" entered in the form "surname” to the form "first name".
  • when instructed to swap the character strings input in the form "last name” and the form "first name", the input / output control unit 220 can automatically swap the input character strings. Also, for example, when the user U utters "1 to 3" or the like, the input / output control unit 220 can convert the character string "Toshishi” entered in the form "surname” into a kana expression, which is the input format of the form "Mei”, and then input it to the form "Mei”.
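Corrections phrased with IDs, such as “1 and 2 are reversed” or “A to 1”, can be sketched with two small patterns. This is only a sketch: it assumes the utterance has already been recognized as exactly these English phrasings, and the function name, the sample first name "Taro", and the dictionary layout are hypothetical.

```python
import re


def apply_correction(utterance, forms, strings_by_id):
    """Apply an ID-based correction to forms (a dict of form ID to text)."""
    match = re.fullmatch(r"(\d+) and (\d+) are reversed", utterance)
    if match:
        a, b = match.groups()
        forms[a], forms[b] = forms[b], forms[a]   # swap the two forms
        return forms
    match = re.fullmatch(r"([A-Z]) to (\d+)", utterance)
    if match:
        string_id, form_id = match.groups()
        forms[form_id] = strings_by_id[string_id]  # overwrite from form set
        return forms
    raise ValueError("unsupported correction: " + utterance)


forms = {"1": "Taro", "2": "Ueda"}
swapped = apply_correction("1 and 2 are reversed", dict(forms), {})
overwritten = apply_correction("A to 1", dict(forms), {"A": "Ueda"})
```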
  • the user U can give an instruction to correct the automatic input result using the ID given to each form or the identifier given to each character string included in the form set FS.
  • the user U performs feedback relating to the correction instruction by making the utterance UO 26 a with the content “11 to 11 and 12” and the utterance UO 26 b with the content “G to 11 and 12”.
  • for example, based on the utterance UO 26 a and the utterance UO 26 b, the input / output control unit 220 refers to the character string “111-2222” input to the form given the ID “11”, divides the character string based on a delimiter included in the character string, the attributes of the forms, general knowledge, and the like, and can input the resulting character strings to the forms given the ID “11” and the ID “12”.
  • the input / output control unit 220 may cause the information processing terminal 10 to perform an output requesting the user to specify the break position.
  • the input / output control unit 220 acquires the break position based on, for example, the user uttering "3 digits and 4 digits", and can also correct the contents of the character string held by the form set FS.
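Splitting one character string across two forms can be sketched in two ways: by a delimiter already present in the string, or by digit counts supplied by the user. The helper names below are hypothetical, and the sketch ignores the form attributes and general knowledge that the text also mentions.

```python
import re


def split_by_delimiter(text):
    """Split on hyphens or whitespace, dropping empty pieces."""
    return [part for part in re.split(r"[-\s]+", text) if part]


def split_by_lengths(text, lengths):
    """Split the digits of text into groups of the given lengths."""
    digits = re.sub(r"\D", "", text)
    if sum(lengths) != len(digits):
        raise ValueError("lengths do not match the number of digits")
    parts, pos = [], 0
    for n in lengths:
        parts.append(digits[pos:pos + n])
        pos += n
    return parts


by_delimiter = split_by_delimiter("111-2222")
by_lengths = split_by_lengths("1112222", [3, 4])   # "3 digits and 4 digits"
forms = dict(zip(["11", "12"], by_delimiter))
```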
  • FIG. 27 shows an example of the case where a Japanese-written character string is input to a form that should normally be input in English.
  • the user U can issue an instruction to correct the automatic input result using the ID assigned to each form or the identifier assigned to each character string included in the form set FS.
  • the user U may also issue a correction instruction using the name or ID of the form set FS.
  • the user U performs feedback relating to the correction instruction by making the utterance UO 27a with the content "form set in English” and the utterance UO 27b with the content "A and B in English".
  • the input / output control unit 220 may execute automatic input again after switching the form set FS based on, for example, the utterance UO 27a or the utterance UO 27b. If, for example, a correction instruction relating to swapping between forms has been given before the instruction to switch the form set FS, the input / output control unit 220 may keep the content of that correction instruction in effect even after the form set FS is switched.
  • a plurality of form sets FS according to the present embodiment can be set according to the user, the language, the location, the application, and the like, and can be switched according to the situation.
  • the functions of the input / output control unit 220 according to the present embodiment have been described above in detail. According to the above-described function of the input / output control unit 220 according to the present embodiment, it is possible to realize automatic input of a form with fewer input errors and to easily correct input contents even when an input error occurs. It becomes possible.
  • although the case where the input / output control unit 220 performs automatic input to forms arranged in a Web page has been described as an example, the input / output control unit 220 is not limited to such an example and can be widely applied to automatic input to forms.
  • FIG. 28 is a block diagram illustrating an exemplary hardware configuration of the information processing terminal 10 and the information processing server 20 according to an embodiment of the present disclosure.
  • the information processing terminal 10 and the information processing server 20 include, for example, a processor 871, a ROM 872, a RAM 873, a host bus 874, a bridge 875, an external bus 876, an interface 877, an input device 878, an output device 879, a storage 880, a drive 881, a connection port 882, and a communication device 883.
  • the hardware configuration shown here is an example, and some of the components may be omitted. In addition, components other than the components shown here may be further included.
  • the processor 871 functions as, for example, an arithmetic processing unit or a control unit, and controls the overall operation or a part of each component based on various programs recorded in the ROM 872, RAM 873, storage 880, or removable recording medium 901.
  • the ROM 872 is a means for storing a program read by the processor 871, data used for an operation, and the like.
  • the RAM 873 temporarily or permanently stores, for example, a program read by the processor 871 and various parameters and the like that appropriately change when the program is executed.
  • the processor 871, the ROM 872, and the RAM 873 are connected to one another via, for example, a host bus 874 capable of high-speed data transmission.
  • host bus 874 is connected to external bus 876, which has a relatively low data transmission speed, via bridge 875, for example.
  • the external bus 876 is also connected to various components via an interface 877.
  • for the input device 878, for example, a mouse, a keyboard, a touch panel, a button, a switch, a lever, and the like are used. Furthermore, as the input device 878, a remote controller (hereinafter, remote control) capable of transmitting a control signal using infrared rays or other radio waves may be used.
  • the input device 878 also includes a voice input device such as a microphone.
  • the output device 879 is a device that can notify the user of information visually or aurally, for example, a display device such as a CRT (Cathode Ray Tube), an LCD, or an organic EL display, an audio output device such as a speaker or headphones, a printer, a mobile phone, or a facsimile. The output device 879 according to the present disclosure also includes various vibration devices capable of outputting haptic stimulation.
  • the storage 880 is a device for storing various data.
  • as the storage 880, for example, a magnetic storage device such as a hard disk drive (HDD), a semiconductor storage device, an optical storage device, a magneto-optical storage device, or the like is used.
  • the drive 881 is a device that reads information recorded on a removable recording medium 901 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, or writes information on the removable recording medium 901, for example.
  • the removable recording medium 901 is, for example, DVD media, Blu-ray (registered trademark) media, HD DVD media, various semiconductor storage media, and the like.
  • the removable recording medium 901 may be, for example, an IC card equipped with a non-contact IC chip, an electronic device, or the like.
  • the connection port 882 is, for example, a port for connecting an externally connected device 902, such as a USB (Universal Serial Bus) port, an IEEE 1394 port, a SCSI (Small Computer System Interface) port, an RS-232C port, or an optical audio terminal.
  • the external connection device 902 is, for example, a printer, a portable music player, a digital camera, a digital video camera, an IC recorder, or the like.
  • the communication device 883 is a communication device for connecting to a network.
  • for example, a communication card for wired or wireless LAN, Bluetooth (registered trademark), or WUSB (Wireless USB), a router for optical communication, an ADSL (Asymmetric Digital Subscriber Line) router, or a modem for various types of communication.
  • the information processing server 20 has an input / output control unit 220 that selects a first target form to be input from a plurality of forms based on the input operation of the user, and performs character input on the first target form.
  • one of the features of the input / output control unit 220 according to an embodiment of the present disclosure is that it selects a second target form different from the first target form based on user feedback on the input content input to the first target form, and performs the character input on the second target form. According to this configuration, it is possible to easily correct a selection error of the form to be input.
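The core behavior summarized above, selecting a first target form and then moving the input to a second target form on feedback, can be sketched as follows. This is a minimal sketch, not the disclosed implementation: the domain labels, the mapping `FORM_DOMAINS`, the sample text, and both helper names are assumptions, and the domain-based re-correction of the moved text is omitted.

```python
# Hypothetical domains per form; F1 has no domain and allows free input.
FORM_DOMAINS = {"F1": None, "F2": "date", "F3": "time", "F4": "day_of_week"}


def select_first_target(domain):
    """Pick the form whose domain matches; fall back to the free-input form."""
    for form_id, form_domain in FORM_DOMAINS.items():
        if form_domain is not None and form_domain == domain:
            return form_id
    return "F1"


def move_to_second_target(values, text, first, second):
    """Delete text from the first target form and enter it in the second."""
    values[first] = " ".join(values[first].replace(text, " ").split())
    values[second] = text
    return values


first = select_first_target("date")
values = move_to_second_target(
    {"F1": "dentist Tuesday", "F4": ""}, "Tuesday", "F1", "F4")
```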
  • each step concerning processing of information processing server 20 of this specification does not necessarily need to be processed in chronological order according to the order described in the flowchart.
  • the steps related to the processing of the information processing server 20 may be processed in an order different from the order described in the flowchart or may be processed in parallel.
  • an information processing device comprising a control unit that selects a first target form to be input from a plurality of forms based on a user's input operation, and performs character input on the first target form, wherein the control unit selects a second target form different from the first target form based on the feedback of the user on the input content input to the first target form, and performs the character input on the second target form.
  • the control unit selects the form specified by the feedback as the second target form, and inputs, to the second target form, characters corresponding to at least a part of the input content input to the first target form; the information processing apparatus according to (1).
  • the control unit causes a unit block included in the input content input to the first target form to be displayed together with the input content, deletes the characters corresponding to the unit block specified by the feedback from the first target form, and inputs the characters corresponding to the unit block to the second target form.
  • the control unit separates a character string included in the unit block based on the feedback, and inputs the separated character string to the second target form.
  • At least one of the input operation and the feedback is performed by speech.
  • the information processing apparatus according to any one of the above (1) to (4).
  • the control unit selects the first target form based on the result of speech recognition for the input operation performed by speech, and inputs the result of the speech recognition to the first target form.
  • the information processing apparatus according to any one of the above (1) to (5).
  • the control unit selects the first target form based on the speech recognition result and a domain set in the form.
  • the control unit inputs, to the second target form, the speech recognition result corrected based on a domain set in the selected second target form.
  • the information processing apparatus according to (6) or (7). The control unit controls recalculation of the reliability related to the voice recognition result based on the domain set in the selected second target form, and inputs the corrected voice recognition result to the second target form.
  • the control unit causes the unit block included in the voice recognition result input to the first target form to be displayed together with the voice recognition result, and causes the connection probability between the first unit block designated by the feedback and the second unit blocks located before and after the first unit block to be recalculated based on the domain set in the form designated by the feedback; the information processing apparatus according to any one of the above (6) to (9). (11) The control unit inputs a character string corresponding to a second unit block corrected by recalculation of the connection probability into the form in which a domain associated with the character string is set; the information processing apparatus according to (10).
  • the control unit newly associates a domain with at least a part of the speech recognition result based on the feedback.
  • the control unit newly associates a character string designated by the feedback with a domain set in the form designated by the feedback.
  • the control unit requests the user to provide feedback for specifying the form for inputting the speech recognition result without selecting the first target form when the reliability of the speech recognition result is lower than a threshold.
  • the information processing apparatus according to any one of the above (6) to (13).
  • when the reliabilities of character string candidates related to the speech recognition result are close to one another, the control unit selects a plurality of second target forms based on the domains corresponding to the competing character string candidates, and inputs the competing character string candidates to the plurality of second target forms, respectively.
  • the information processing apparatus according to any one of the above (6) to (14).
  • the control unit designates the form in which no domain is set, and requests the user to utter the input content for the designated form; the information processing apparatus according to any one of the above (6) to (15).
  • the control unit selects a plurality of the first target forms based on the input operation, and performs automatic input of a set character string.
  • the information processing apparatus according to any one of the above (1) to (16).
  • the control unit presents to the user a form set that defines a string of characters to be automatically input to the plurality of first target forms, and executes the automatic input based on the designated form set.
  • the control unit adds an identifier to at least one of the character string included in the form set and the form, and corrects the result of the automatic input based on the identifier included in the feedback.
  • the information processing apparatus according to (18).
  • an information processing method including: selecting, by a processor, a first target form to be input from a plurality of forms based on a user's input operation, and performing character input on the first target form; and selecting a second target form different from the first target form based on the user's feedback on the input content input to the first target form, and performing the character input on the second target form.
  • 10 information processing terminal, 110 display unit, 120 voice output unit, 130 voice input unit, 140 imaging unit, 150 sensor unit, 160 control unit, 170 server communication unit, 20 information processing server, 210 recognition unit, 220 input / output control unit, 230 terminal communication unit

Abstract

[Problem] To easily correct erroneous selection of a form to be input. [Solution] Provided is an information processing device comprising a control unit that selects, from a plurality of forms, a first target form for entry on the basis of a user input operation and enters characters in the first target form. The control unit selects a second target form that is different from the first target form on the basis of feedback from the user on the content entered in the first target form, and enters the characters in the second target form.

Description

INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING METHOD
The present disclosure relates to an information processing apparatus and an information processing method.
In recent years, many techniques have been developed to reduce the burden on users of character input devices such as keyboards. These include, for example, speech recognition techniques that recognize a user's speech and convert it into a character string. Many techniques for improving the accuracy of speech recognition have also been proposed. For example, Patent Document 1 discloses a technique for correcting recognition errors associated with proper nouns.
Patent Document 1: JP 2004-258531 A
Meanwhile, input formats that present a plurality of forms for character input have become widespread in recent years. With such an input format, the form that receives the speech recognition result may be selected incorrectly, and the technique described in Patent Document 1 has difficulty correcting such a form selection error.
Therefore, the present disclosure proposes a new and improved information processing apparatus and information processing method capable of easily correcting a selection error of the form to be input.
According to the present disclosure, there is provided an information processing apparatus including a control unit that selects a first target form to be input from a plurality of forms based on a user's input operation and performs character input on the first target form, wherein the control unit selects a second target form different from the first target form based on the user's feedback on the input content input to the first target form, and performs the character input on the second target form.
Further, according to the present disclosure, there is provided an information processing method including: selecting, by a processor, a first target form to be input from a plurality of forms based on a user's input operation, and performing character input on the first target form; and selecting a second target form different from the first target form based on the user's feedback on the input content input to the first target form, and performing the character input on the second target form.
As described above, according to the present disclosure, a selection error of the form to be input can be easily corrected.
Note that the above effect is not necessarily limiting; together with or in place of it, any of the effects described in this specification, or other effects that can be understood from this specification, may be achieved.
A diagram for describing an overview of a first embodiment of the present disclosure.
A diagram for describing the overview of the embodiment.
A block diagram showing a configuration example of an information processing system according to the embodiment.
A block diagram showing a functional configuration example of an information processing terminal according to the embodiment.
A block diagram showing a functional configuration example of an information processing server according to the embodiment.
An example of an N-best result of speech recognition processing according to the embodiment.
A diagram for describing correction processing that designates a unit block according to the embodiment.
A diagram for describing correction processing based on the connection probability between unit blocks according to the embodiment.
A diagram for describing correction processing when a form not intended by the user is selected because of the semantic analysis settings according to the embodiment.
A diagram for describing correction processing when a form not intended by the user is selected because no domain is set according to the embodiment.
A diagram for describing the addition of a domain to a specific expression according to the embodiment.
A diagram for describing the addition of a domain to a specific expression according to the embodiment.
A diagram for describing input/output control when the reliability of speech recognition is low according to the embodiment.
A diagram for describing input/output control when the reliabilities of character string candidates are closely matched according to the embodiment.
A diagram for describing input/output control when the reliabilities of character string candidates are closely matched according to the embodiment.
A diagram showing a correction example involving separation of unit blocks according to the embodiment.
A diagram showing a correction example involving separation of unit blocks according to the embodiment.
A diagram showing a correction example involving separation of unit blocks according to the embodiment.
A diagram for describing correction processing when a plurality of unit blocks associated with the same domain exist according to the embodiment.
A diagram for describing processing when a character string has already been input to the form designated by the user's feedback according to the embodiment.
A diagram for describing correction processing when English is used as the type of character string according to the embodiment.
A flowchart showing the flow of operation of the information processing server according to the embodiment.
A diagram for describing automatic input to forms arranged on a web page.
A diagram showing an example of an input error made by an automatic input tool.
A diagram showing an example of an input error made by an automatic input tool.
A diagram for describing automatic input control by the input/output control unit 220 according to a second embodiment of the present disclosure.
A diagram for describing correction of input information according to the embodiment.
A diagram for describing correction of input information according to the embodiment.
A diagram for describing correction of input information according to the embodiment.
A diagram showing a hardware configuration example according to an embodiment of the present disclosure.
Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and redundant description is omitted.
The description will be given in the following order.
 1. First embodiment
  1.1. Overview
  1.2. System configuration example
  1.3. Functional configuration example of the information processing terminal 10
  1.4. Functional configuration example of the information processing server 20
  1.5. Details of functions
  1.6. Flow of operation
 2. Second embodiment
  2.1. Overview
  2.2. Details of functions
 3. Hardware configuration example
 4. Summary
<1. First embodiment>
<<1.1. Overview>>
First, an overview of the first embodiment of the present disclosure will be described. In recent years, many techniques have been developed to reduce the burden of character input on the user. Such techniques include, for example, speech recognition techniques that recognize the user's speech and convert it into a character string. Such techniques can free the user from the burden of entering characters with a keyboard or the like.
On the other hand, in character input by speech recognition, how accurately the user's utterance is converted into a character string is very important. If speech recognition accuracy is low, the burden on the user of correcting erroneously input character strings increases, and the benefit of character input by speech recognition may be lost.
In addition, as described above, input formats that present a plurality of forms for character input have become widespread in recent years. Such an input format is adopted, for example, in input interfaces such as to-do lists and schedulers, and has a plurality of forms corresponding to a title, a date, a time, and the like.
When character input by speech recognition is realized in such an input format, the character string corresponding to the user's utterance must be input to the form the user intends. In practice, however, the speech recognition result may be input to a form the user did not intend, owing to factors such as speech recognition accuracy and semantic analysis settings. Moreover, with a typical voice input interface, it is difficult to correct such a form error easily by voice alone.
To avoid this situation, the user could, for example, explicitly designate the form to be input in advance and then speak. However, for a device built around plain-text input (free speech input), designating a form in advance is difficult and also undermines the benefit of plain-text input, which is meant to allow free speech.
The technical idea according to the present disclosure was conceived with attention to the above points: when characters are input to an unintended form, the user can resolve the form error by designating the correct form after the fact. To this end, an information processing apparatus according to an embodiment of the present disclosure includes a control unit that selects a first target form to be input from a plurality of forms based on the user's input operation and performs character input on the first target form. One of the features of the control unit is that it selects a second target form different from the first target form based on the user's feedback on the input content input to the first target form, and performs character input on the second target form.
FIGS. 1 and 2 are diagrams for describing the overview of the present embodiment. In a voice input interface having a plurality of forms, one factor that causes a speech recognition result to be input to a form the user did not intend is the accuracy of speech recognition itself.
FIG. 1 is a diagram showing an example in which a form not intended by the user is selected because of an error in the speech recognition result. FIG. 1 shows a situation in which a user U registers a schedule by utterance using a voice input interface IF having a plurality of forms F1 to F4. Here, the forms F1 to F4 may be forms for inputting the title, date, time, and day of the week of the schedule, respectively.
In FIG. 1, as shown in the upper part, the user U first gives an instruction with an utterance UO1a to register a schedule for "Tuesday" (Japanese: "kayoubi"). However, speech recognition fails and "kayoubi" is recognized as the meaningless string "kyayoubi". As a result, the speech recognition result is input not to the form F4 corresponding to the day of the week, which the user U intended, but to the form F1 corresponding to the title, which accepts free input.
Next, as shown in the lower left, the user U makes an utterance UO1b that designates a form name, in order to have the speech recognition result erroneously input to the form F1 input to the intended form, that is, the form F4 corresponding to the day of the week.
At this time, the information processing server 20 according to the present embodiment may move the speech recognition result from the form F1 to the form F4 based on the detected utterance UO1b of the user U. The information processing server 20 can also correct the voice input result to fit the input format of the form F4 corresponding to the day of the week. Looking at the lower right of FIG. 1, this processing corrects the "kyayoubi" input to the form F1 to "Tuesday" and inputs it correctly to the form F4 intended by the user U.
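The move-and-correct behavior described for FIG. 1 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the method of the disclosure: the form identifiers F1 to F4 follow FIG. 1, while the function `move_to_form`, the romanized weekday list, and the use of string similarity (`difflib.get_close_matches` from the Python standard library) to fit the text to the destination form are hypothetical choices made for this example. The strings "kayoubi" (Tuesday) and "kyayoubi" stand for the intended word and its misrecognition.

```python
from difflib import get_close_matches

# Hypothetical set of values accepted by the day-of-week form (F4), romanized.
WEEKDAYS = ["getsuyoubi", "kayoubi", "suiyoubi", "mokuyoubi",
            "kinyoubi", "doyoubi", "nichiyoubi"]


def move_to_form(forms, source, target, allowed_values):
    """Move the text in `source` to `target`, fitting it to the target's format.

    String similarity is only a stand-in for whatever correction the server applies.
    """
    text = forms[source]
    matches = get_close_matches(text, allowed_values, n=1, cutoff=0.6)
    if not matches:
        return forms  # no plausible value for the target form; leave the input as is
    updated = dict(forms)
    updated[source] = ""           # clear the wrongly selected form
    updated[target] = matches[0]   # write the corrected value to the intended form
    return updated


forms = {"F1": "kyayoubi", "F2": "", "F3": "", "F4": ""}
print(move_to_form(forms, "F1", "F4", WEEKDAYS))
```

Restricting the candidates to the values the destination form accepts is what lets a single feedback utterance fix both the form selection and the recognition error in this sketch.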
In this way, the information processing server 20 according to the present embodiment makes it possible to easily correct a form selection error caused by a speech recognition error, and further to correct the speech recognition error itself.
FIG. 2 is a diagram showing an example in which a form not intended by the user is selected because of a character conversion error in the speech recognition processing. As in FIG. 1, FIG. 2 shows a situation in which the user U registers a schedule by utterance using the voice input interface IF having the plurality of forms F1 to F4. In the descriptions of the subsequent drawings, explanation of the common form configuration and the like is omitted.
In FIG. 2, as shown in the upper part, the user U first gives an instruction with an utterance UO2a to register a schedule for the "5th" (Japanese: "itsuka"). However, the character conversion in the speech recognition processing fails and the utterance is converted to the homophone "itsuka" meaning "someday". As a result, the speech recognition result is input not to the form F2 corresponding to the date, which the user U intended, but to the form F1 corresponding to the title, which accepts free input.
Next, as shown in the lower left, the user U makes an utterance UO2b that designates a form number, in order to have the speech recognition result erroneously input to the form F1 input to the intended form, that is, the form F2 corresponding to the date. In this way, the user U may issue a correction instruction by designating a form number displayed on the voice input interface IF instead of the form name.
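As a rough illustration of how a correction utterance that names a form or gives its number might be resolved to a destination form, consider the sketch below. The form table, the English form names, and the function `resolve_target_form` are assumptions introduced for this example; the disclosure does not specify how the feedback utterance is parsed.

```python
import re

# Hypothetical form table modeled on FIG. 2: display number -> (form id, form name).
FORMS = {
    1: ("F1", "title"),
    2: ("F2", "date"),
    3: ("F3", "time"),
    4: ("F4", "day of the week"),
}


def resolve_target_form(utterance):
    """Return the form id designated in a correction utterance, or None."""
    text = utterance.lower()
    number = re.search(r"\d+", text)
    if number and int(number.group()) in FORMS:
        return FORMS[int(number.group())][0]
    # Check longer names first so that a short name contained in a longer one
    # cannot win by accident.
    for form_id, name in sorted(FORMS.values(), key=lambda f: -len(f[1])):
        if name in text:
            return form_id
    return None


print(resolve_target_form("That goes in number 2"))
print(resolve_target_form("Put it in the day of the week"))
```

Returning `None` when nothing matches leaves it to the caller to decide whether to ask the user again.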
At this time, the information processing server 20 according to the present embodiment may move the speech recognition result from the form F1 to the form F2 based on the detected utterance UO2b of the user U. The information processing server 20 may also correct the voice input result to fit the input format of the form F2 corresponding to the date. Looking at the lower right of FIG. 2, the "itsuka" ("someday") input to the form F1 is corrected to a date format representing the 5th and input correctly to the form F2 intended by the user U. In addition, because the date has been correctly input to the form F2, the information processing server 20 may automatically input the corresponding day of the week to the form F4 based on that date.
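The automatic day-of-week input described for FIG. 2 amounts to deriving a weekday from a date. A minimal sketch is given below. The function `fill_date_and_weekday`, the ISO date format, and the assumption that a bare day of the month refers to the current month are all hypothetical; the disclosure does not state how the month or year is resolved.

```python
from datetime import date

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def fill_date_and_weekday(forms, day_of_month, today):
    """Write the date into F2 and derive the F4 weekday from it.

    Assumes the spoken day refers to the month of `today` (an assumption of
    this sketch, not something the source specifies).
    """
    target = date(today.year, today.month, day_of_month)
    updated = dict(forms)
    updated["F1"] = ""                               # clear the misplaced text
    updated["F2"] = target.isoformat()               # date form
    updated["F4"] = WEEKDAY_NAMES[target.weekday()]  # date.weekday(): Monday == 0
    return updated


forms = {"F1": "itsuka", "F2": "", "F3": "", "F4": ""}
print(fill_date_and_weekday(forms, 5, today=date(2018, 10, 19)))
```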
In this way, the information processing server 20 according to the present embodiment makes it possible to easily correct a form selection error caused by a character conversion error in the speech recognition processing, and further to correct the character conversion error itself.
The overview of the present embodiment has been described above. In the following, the various correction functions realized by the information processing method according to the present embodiment will be described in detail with specific examples.
<<1.2. System configuration example>>
First, a configuration example of an information processing system according to an embodiment of the present disclosure will be described. FIG. 3 is a block diagram showing a configuration example of the information processing system according to the present embodiment. Referring to FIG. 3, the information processing system according to the present embodiment includes an information processing terminal 10 and an information processing server 20. The information processing terminal 10 and the information processing server 20 are connected via a network 30 so that they can communicate with each other.
(Information processing terminal 10)
The information processing terminal 10 according to the present embodiment is an information processing apparatus that provides the user with a character input interface having a plurality of forms, based on control by the information processing server 20. The information processing terminal 10 is realized by, for example, a smartphone, a tablet, a head-mounted display, a general-purpose computer, or a dedicated stationary or autonomously mobile device.
(Information processing server 20)
The information processing server 20 according to the present embodiment is an information processing apparatus that controls input and output for a character input interface having a plurality of forms. The information processing server 20 may control the display of the character input interface and the character input to the forms.
One of the features of the information processing server 20 according to the present embodiment is that it realizes a character input interface with which the user can easily correct form errors such as those described with reference to FIGS. 1 and 2.
(Network 30)
The network 30 has a function of connecting the information processing terminal 10 and the information processing server 20. The network 30 may include public networks such as the Internet, telephone networks, and satellite communication networks, as well as various LANs (Local Area Networks) including Ethernet (registered trademark), WANs (Wide Area Networks), and the like. The network 30 may also include a dedicated network such as an IP-VPN (Internet Protocol-Virtual Private Network), and a wireless communication network such as Wi-Fi (registered trademark) or Bluetooth (registered trademark).
The configuration example of the information processing system according to the present embodiment has been described above. The configuration described with reference to FIG. 3 is merely an example, and the configuration of the information processing system according to the present embodiment is not limited to it. For example, the functions of the information processing terminal 10 and the information processing server 20 may be realized by a single device. The configuration of the information processing system according to the present embodiment can be flexibly modified according to specifications and operation.
<<1.3. Functional configuration example of the information processing terminal 10>>
Next, a functional configuration example of the information processing terminal 10 according to the present embodiment will be described. FIG. 4 is a block diagram showing a functional configuration example of the information processing terminal 10 according to the present embodiment. Referring to FIG. 4, the information processing terminal 10 includes a display unit 110, a voice output unit 120, a voice input unit 130, an imaging unit 140, a sensor unit 150, a control unit 160, and a server communication unit 170.
(Display unit 110)
The display unit 110 according to the present embodiment has a function of outputting visual information such as images and text. For example, the display unit 110 displays the character input interface based on control by the information processing server 20.
To this end, the display unit 110 includes a display device or the like that presents visual information. Examples of the display device include a liquid crystal display (LCD) device, an organic light emitting diode (OLED) device, and a touch panel. The display unit 110 may also output visual information using a projection function.
(Voice output unit 120)
The voice output unit 120 according to the present embodiment has a function of outputting various sounds including voice. To this end, the voice output unit 120 includes audio output devices such as a speaker and an amplifier.
(Voice input unit 130)
The voice input unit 130 according to the present embodiment has a function of collecting sound information such as the user's utterances and ambient sounds generated around the information processing terminal 10. The voice input unit 130 includes a plurality of microphones for collecting the sound information.
(Imaging unit 140)
The imaging unit 140 according to the present embodiment has a function of capturing images of the user and the surrounding environment. The image information captured by the imaging unit 140 may be used, for example, for detection of the user's line of sight by the information processing server 20. The imaging unit 140 includes an imaging device capable of capturing images. Note that the images include moving images as well as still images.
(Sensor unit 150)
The sensor unit 150 according to the present embodiment has a function of collecting various kinds of sensor information about the surrounding environment and the user. The sensor information collected by the sensor unit 150 may be used, for example, for gesture recognition by the information processing server 20. The sensor unit 150 includes, for example, an infrared sensor, an acceleration sensor, and a gyro sensor.
(Control unit 160)
The control unit 160 according to the present embodiment has a function of controlling each component of the information processing terminal 10. For example, the control unit 160 controls the starting and stopping of each component. The control unit 160 also inputs control signals generated by the information processing server 20 to the display unit 110 and the voice output unit 120. In addition, the control unit 160 may have functions equivalent to those of the input/output control unit 220 of the information processing server 20 described later.
(Server communication unit 170)
The server communication unit 170 according to the present embodiment has a function of communicating with the information processing server 20 via the network 30. Specifically, the server communication unit 170 transmits to the information processing server 20 the sound information collected by the voice input unit 130, the image information captured by the imaging unit 140, and the sensor information collected by the sensor unit 150. The server communication unit 170 also receives from the information processing server 20 control signals and the like relating to the output of the character input interface.
The functional configuration example of the information processing terminal 10 according to the present embodiment has been described above. The configuration described with reference to FIG. 4 is merely an example, and the functional configuration of the information processing terminal 10 is not limited to it. For example, the information processing terminal 10 does not necessarily have to include all of the components shown in FIG. 4, and may be configured without the imaging unit 140, the sensor unit 150, and the like. As described above, the control unit 160 may also have functions equivalent to those of the input/output control unit 220 of the information processing server 20. The functional configuration of the information processing terminal 10 according to the present embodiment can be flexibly modified according to specifications and operation.
 <<1.4.情報処理サーバ20の機能構成例>>
 次に、本開示の一実施形態に係る情報処理サーバ20の機能構成例について説明する。図5は、本実施形態に係る情報処理サーバ20の機能構成例を示すブロック図である。図5を参照すると、本実施形態に係る情報処理サーバ20は、認識部210、入出力制御部220、および端末通信部230を備える。
<< 1.4. Functional configuration example of the information processing server 20 >>
Next, a functional configuration example of the information processing server 20 according to an embodiment of the present disclosure will be described. FIG. 5 is a block diagram showing an example of a functional configuration of the information processing server 20 according to the present embodiment. Referring to FIG. 5, the information processing server 20 according to the present embodiment includes a recognition unit 210, an input / output control unit 220, and a terminal communication unit 230.
(Recognition unit 210)
The recognition unit 210 according to the present embodiment executes speech recognition processing based on the user's uttered speech collected by the information processing terminal 10. The recognition unit 210 may also execute gaze detection based on images captured by the information processing terminal 10, gesture recognition based on images or sensor information, and the like.
(Input/output control unit 220)
The input/output control unit 220 according to the present embodiment performs overall control of the input/output processing for the character input interface. For example, the input/output control unit 220 enters characters into a form of the character input interface based on the user's input operation.
At this time, the input/output control unit 220 may select, from a plurality of forms, a first target form to receive input, based on an input operation such as the user's utterance, and may enter characters into that first target form. In other words, the input/output control unit 220 can automatically select the form into which characters are entered, based on, for example, the speech recognition result for the user's utterance.
The input/output control unit 220 also has a function of selecting a second target form different from the first target form, based on the user's feedback on the content entered into the first target form, and entering characters into the second target form. More specifically, the input/output control unit 220 may select the form designated by the feedback as the second target form and enter into it characters corresponding to at least part of the content that was entered into the first target form.
Here, the user's feedback may be an instruction to correct an erroneous form selection. That is, when the automatically selected form is wrong, the input/output control unit 220 can perform correction processing so that the speech recognition result is entered into the form designated by the user. With this function, an erroneous form selection arising from various causes can be corrected easily, without complicated operations.
(Terminal communication unit 230)
The terminal communication unit 230 according to the present embodiment communicates with the information processing terminal 10 via the network 30. Specifically, the terminal communication unit 230 receives sound information, image information, sensor information, and the like from the information processing terminal 10. The terminal communication unit 230 also transmits the control signal generated by the input/output control unit 220 to the information processing terminal 10.
The functional configuration example of the information processing server 20 according to an embodiment of the present disclosure has been described above. The configuration described with reference to FIG. 5 is merely an example, and the functional configuration of the information processing server 20 is not limited to it. For example, the configuration shown above may be distributed across a plurality of devices. In addition, as described above, the functions of the information processing terminal 10 and the information processing server 20 may be implemented by a single device. The functional configuration of the information processing server 20 can be flexibly modified according to specifications and operation.
<< 1.5. Function details >>
Next, the functions of the input/output control unit 220 according to the present embodiment will be described in detail. As described above, the input/output control unit 220 has a function of selecting, from a plurality of forms, a first target form to receive input based on the user's input operation, and entering characters into that first target form. For example, the input/output control unit 220 may select the first target form based on the speech recognition result for an input operation performed by utterance, and enter that speech recognition result into the first target form.
At this time, the input/output control unit 220 can select the first target form based on the speech recognition result for the utterance and the domain set for each form. In addition, as described above, when moving the speech recognition result to the second target form designated by the user's feedback, the input/output control unit 220 may enter into the second target form a speech recognition result corrected based on the domain set for that form.
FIG. 6 shows an example of the N-best result of the speech recognition processing according to the present embodiment. The recognition unit 210 may, for example, generate a plurality of character string candidates based on the user's utterance and output the candidate with the highest reliability as the final speech recognition result. The N-best result is the set of character string candidates ranked first through n-th in reliability.
The left side of FIG. 6 shows an example of the N-best result when the user utters "Tuesday" (火曜日, kayoubi). In this example, the character string candidate "kyayoubi" (きゃようび), a misrecognition, has the highest reliability, so the recognition unit 210 outputs "kyayoubi" as the speech recognition result.
As illustrated, each character string candidate is associated with a domain indicating an attribute of the character string. For example, the candidate "Tuesday" is associated with the domain "day of the week" because the string names a day of the week, whereas the candidate "kyayoubi" is associated with the domain "title", which permits free-form input.
At this time, based on the domain "title" associated with "kyayoubi", which was output as the speech recognition result, the input/output control unit 220 selects the form for which the domain "title" is set as the first target form and enters the speech recognition result into it.
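The selection of the first target form described above can be sketched as follows. This is a minimal illustration, not the implementation in the disclosure: the `Candidate` class, the `FORM_DOMAINS` table, the form identifiers, the fallback to F1, and the reliability values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str           # recognized character string
    domain: str         # attribute of the string, e.g. "title"
    reliability: float  # score assigned by the recognizer (values here are made up)


# Hypothetical layout: form identifier -> domain set for that form.
FORM_DOMAINS = {"F1": "title", "F2": "date", "F3": "time", "F4": "day_of_week"}


def select_first_target_form(nbest, form_domains=FORM_DOMAINS, fallback="F1"):
    """Take the most reliable candidate and find the form whose domain matches it."""
    best = max(nbest, key=lambda c: c.reliability)
    for form_id, domain in form_domains.items():
        if domain == best.domain:
            return form_id, best.text
    # No form carries this domain: fall back to the free-input form (assumption).
    return fallback, best.text


nbest = [
    Candidate("kyayoubi", "title", 0.52),
    Candidate("Tuesday", "day_of_week", 0.41),
]
print(select_first_target_form(nbest))
```

With these illustrative scores the misrecognized string wins and lands in the free-input form, which is the situation the subsequent correction is meant to repair.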
If, on the other hand, the user gives feedback designating a form, the input/output control unit 220 can control recalculation of the speech recognition reliability based on the domain set for the form designated by the user, that is, the second target form, and enter the corrected speech recognition result into the second target form.
In the example shown in FIG. 6, the user's feedback designates the form for which the domain "day of the week" is set, so the input/output control unit 220 causes the recognition unit 210 to recalculate the reliability based on the domain "day of the week". The right side of FIG. 6 shows the N-best result obtained by this recalculation. In that result, the recognition unit 210 ranks the candidates associated with the domain "day of the week" higher, so the candidate "Tuesday" now has the highest reliability. The recognition unit 210 therefore outputs "Tuesday" as the speech recognition result.
In this way, the input/output control unit 220 obtains a corrected speech recognition result by causing the recognition unit 210 to recalculate the reliability based on the domain set for the second target form designated by the user, and thereby achieves input that matches the user's intention. With this function, both an erroneous form selection and a speech recognition error can be corrected easily, without complicated operations.
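A rough sketch of such domain-based recalculation is given here. The disclosure does not specify how the reliability is recalculated, so the multiplicative `boost` and the renormalization are assumptions chosen only to illustrate the re-ranking; the scores are invented.

```python
def rescore_for_domain(nbest, target_domain, boost=2.0):
    """Re-rank (text, domain, reliability) tuples in favor of target_domain.

    The boost factor is a hypothetical stand-in for whatever the recognizer
    does when it is told which domain the user expects.
    """
    rescored = [
        (text, domain, score * boost if domain == target_domain else score)
        for text, domain, score in nbest
    ]
    total = sum(score for _, _, score in rescored) or 1.0  # avoid division by zero
    normalized = [(text, domain, score / total) for text, domain, score in rescored]
    return sorted(normalized, key=lambda c: c[2], reverse=True)


nbest = [
    ("kyayoubi", "title", 0.5),
    ("Tuesday", "day_of_week", 0.4),
    ("today", "date", 0.1),
]
reranked = rescore_for_domain(nbest, "day_of_week")
print(reranked[0][0])
```

After the user designates the day-of-the-week form, the candidate in that domain moves to the top of the list and becomes the corrected recognition result.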
Next, control in the case where the speech recognition result contains a plurality of unit blocks will be described. FIG. 7 is a diagram for describing correction processing in which a unit block is designated. In FIG. 7, as shown in the upper part, the user U gives an instruction by utterance UO7a to register the schedule "English from 18:00 on Tuesday". However, a speech recognition error occurs: "Tuesday" is recognized as "kyayoubi" (きゃようび) and is entered, together with the correctly recognized character string "English", into form F1, which corresponds to the title and permits free-form input.
As this shows, a speech recognition result according to the present embodiment may contain a plurality of unit blocks. A unit block is a character string delimited in units such as a word, a phrase, or a clause; in the above example, "kyayoubi" and "English" are unit blocks.
In such a case, the input/output control unit 220 may display, together with the content entered into a form, information on the unit blocks that the content contains. In the example shown in FIG. 7, the input/output control unit 220 displays the unit block "kyayoubi" as "A" and the unit block "English" as "B" in form F1.
As shown in the lower left, in order to enter the character string "kyayoubi", corresponding to unit block A and erroneously entered into form F1, into form F4, which corresponds to the day of the week, the user U makes utterance UO7b designating unit block A and form number 2.
Based on the detected utterance UO7b of the user U, the input/output control unit 220 can delete the character string "kyayoubi" corresponding to unit block A from form F1 and enter the character string "Tuesday", corrected by recalculating the reliability, into form F4. Thus, even when the content entered into a form contains a plurality of unit blocks, the input/output control unit 220 can perform a correction that targets the designated unit block.
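The unit-block correction just described amounts to removing one labeled block from its current form and entering a corrected string in the designated form. The sketch below shows that bookkeeping only; the dictionary layout and the function name `move_unit_block` are assumptions for illustration, and the corrected text is passed in rather than produced by a recognizer.

```python
def move_unit_block(forms, blocks, label, target_form, corrected_text=None):
    """Move the unit block `label` into `target_form`.

    forms:  form id -> list of strings currently entered in that form
    blocks: block label -> (form id, text) for each displayed unit block
    """
    source_form, text = blocks[label]
    forms[source_form].remove(text)               # delete from the wrong form
    new_text = text if corrected_text is None else corrected_text
    forms[target_form].append(new_text)           # enter into the designated form
    blocks[label] = (target_form, new_text)       # keep the label mapping current
    return forms


forms = {"F1": ["kyayoubi", "English"], "F4": []}
blocks = {"A": ("F1", "kyayoubi"), "B": ("F1", "English")}

# The user designates block A and the day-of-the-week form; the recognizer's
# re-ranked result ("Tuesday") is supplied as the corrected text.
move_unit_block(forms, blocks, "A", "F4", corrected_text="Tuesday")
print(forms)
```

Block B is untouched, which mirrors the point that only the designated unit block is corrected.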
The input/output control unit 220 according to the present embodiment can also correct the entered content based on the connection probability between unit blocks. FIG. 8 is a diagram for describing correction processing based on the connection probability between unit blocks.
In FIG. 8, as shown in the upper part, the user U gives an instruction by utterance UO8a to register the schedule "pick-up at 3 pm on the 5th" (5日午後3時お迎え). However, speech recognition errors occur: "the 5th" (5日, itsuka) is recognized as its homophone "someday" (いつか), and "3 pm" (午後3時, gogo sanji) is recognized as "gogo shanji" (午後しゃんじ). Both are entered, together with the correctly recognized character string "pick-up", into form F1, which corresponds to the title and permits free-form input.
As shown in the lower left, in order to enter the character string "someday", corresponding to unit block A and erroneously entered into form F1, into form F2, which corresponds to the date, the user U makes utterance UO8b designating unit block A and form number 2.
At this time, based on the domain "date" of form F2 designated for unit block A, the input/output control unit 220 causes recalculation of not only the reliability of unit block A but also the connection probability of unit block B, which is adjacent to unit block A.
Because the speech recognition processing by the recognition unit 210 outputs its result with the probabilities of the connections between unit blocks taken into account, when the character string corresponding to a certain unit block (a first unit block) is corrected based on a domain, the connection to a second unit block located immediately before or after the first unit block may also be recalculated.
In the example shown in FIG. 8, because "someday" corresponding to unit block A has been corrected to "the 5th", the connection probability with unit block B is recalculated, and "gogo shanji" corresponding to unit block B is corrected to "3 pm", which has a high probability of following "the 5th". At this time, based on the domain associated with the corrected character string "3 pm", the input/output control unit 220 adjusts "3 pm" to the format of the form and enters it into form F3.
In this way, by taking the connection probability between unit blocks into account, the input/output control unit 220 and the recognition unit 210 can achieve more effective correction. Even if the errors in the neighboring unit blocks are not all corrected in a single pass, repeating the above processing makes it possible to correct the errors in all unit blocks.
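One way to picture the effect of connection probabilities is to score each candidate for the neighboring block by its own reliability multiplied by the probability that it follows the corrected block. The following sketch assumes exactly that product form and uses invented probabilities; the disclosure does not state the actual scoring, and `rescore_neighbor` and the `floor` value are hypothetical.

```python
def rescore_neighbor(corrected_prev, neighbor_candidates, connection_prob, floor=1e-6):
    """Choose the neighbor candidate that best follows the corrected block.

    neighbor_candidates: list of (text, reliability)
    connection_prob:     (previous text, next text) -> probability (illustrative values)
    """
    def score(candidate):
        text, reliability = candidate
        return reliability * connection_prob.get((corrected_prev, text), floor)

    return max(neighbor_candidates, key=score)[0]


connection_prob = {
    ("someday", "gogo shanji"): 0.02,
    ("someday", "3 pm"): 0.01,
    ("the 5th", "gogo shanji"): 0.001,
    ("the 5th", "3 pm"): 0.30,
}
candidates_b = [("gogo shanji", 0.55), ("3 pm", 0.45)]

before = rescore_neighbor("someday", candidates_b, connection_prob)
after = rescore_neighbor("the 5th", candidates_b, connection_prob)
print(before, "->", after)
```

Once block A is fixed to a date expression, the time expression becomes the far more likely continuation, so block B flips even though its own reliability did not change.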
Next, correction of errors that originate in semantic analysis will be described. The above describes the case where the input/output control unit 220 corrects errors caused by speech recognition, but an erroneous form selection can also result from the semantic analysis settings or from the user's intention.
FIG. 9 is a diagram for describing correction processing when a form not intended by the user is selected because of the semantic analysis settings. In FIG. 9, as shown in the upper part, the user U gives an instruction by utterance UO9a to register the schedule "pick-up from 15:00". Suppose that the user U wants the entire character string "pick-up from 15:00" to be entered into form F1.
However, because the domain "time" is associated with the recognized character string "15:00", the input/output control unit 220 enters "15:00" into form F3 and enters only "pick-up" into form F1. Thus, even when the speech recognition result contains no error, a character string may be entered into a form the user did not intend if a domain the user did not intend is set for the recognized character string.
In this case, as shown in the lower left, the user U may make utterance UO9b to move "15:00", which was entered into the unintended form F3, to form F1. The user U can designate any form by its form number or form name.
Next, based on the recognized utterance UO9b of the user U, the input/output control unit 220 deletes "15:00" from form F3 and adds it to form F1. When correcting the destination form in this way, the input/output control unit 220 may adjust the character string to the input format of the second target form, which is the corrected destination, before entering it.
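Adjusting a string to the destination form's input format can be illustrated with a small per-domain formatter. The sketch below is an assumption-laden example: the English "3 pm" style input, the 24-hour `HH:MM` output for a time form, and the name `format_for_form` are not taken from the disclosure.

```python
import re


def format_for_form(text, domain):
    """Return `text` adjusted to the (hypothetical) input format of a form's domain."""
    if domain == "time":
        match = re.fullmatch(r"(\d{1,2})\s*(am|pm)", text.strip().lower())
        if match:
            hour = int(match.group(1)) % 12      # treat 12 am as 0 and 12 pm as 12
            if match.group(2) == "pm":
                hour += 12
            return f"{hour:02d}:00"
    # Free-input forms such as the title keep the string unchanged.
    return text


print(format_for_form("3 pm", "time"))
print(format_for_form("3 pm", "title"))
```

A free-input destination such as the title form leaves the string as spoken, while a structured destination normalizes it.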
FIG. 10 is a diagram for describing correction processing when a form not intended by the user is selected because no domain has been set. In FIG. 10, as shown in the upper part, the user U gives an instruction by utterance UO10a to register a schedule for "the 20th" (20日, read "hatsuka"). Suppose that the user U wants "the 20th" to be entered into form F2, which corresponds to the date.
In the example shown in FIG. 10, however, the domain "date" is not associated with "the 20th (hatsuka)", so the input/output control unit 220 enters "the 20th" into form F1, which permits free-form input. Thus, even when the speech recognition result contains no error, a character string may be entered into a form the user did not intend if the domain the user intends is not set for the recognized character string.
In this case, as shown in the lower left, the user U may make utterance UO10b to have "the 20th", which was entered into the unintended form F1, newly associated with the domain set for form F2.
At this time, based on the feedback given by the user U through utterance UO10b, the input/output control unit 220 can newly associate the designated character string with the domain "date" set for the designated form F2. Based on utterance UO10b, the input/output control unit 220 may also delete "the 20th" from form F1 and enter it into form F2 in the format of that form.
In this way, the input/output control unit 220 can newly associate a character string with the domain the user intends, based on the user's instruction, so that subsequent input reflects the user's intention.
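The new association between a character string and a domain behaves like a small, user-updatable lexicon. A minimal sketch follows, with the class name `DomainLexicon` and the default domain "title" as assumptions for illustration.

```python
class DomainLexicon:
    """Maps recognized strings to domains and learns new pairs from user feedback."""

    def __init__(self, default_domain="title"):
        self._domains = {}
        self._default = default_domain  # strings without a domain go to free input

    def associate(self, text, domain):
        """Record that `text` belongs to `domain` (called after the user's correction)."""
        self._domains[text] = domain

    def domain_of(self, text):
        return self._domains.get(text, self._default)


lexicon = DomainLexicon()
first = lexicon.domain_of("hatsuka")     # no domain yet: treated as a title
lexicon.associate("hatsuka", "date")     # user moved it to the date form
second = lexicon.domain_of("hatsuka")    # later utterances go to the date form
print(first, "->", second)
```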
FIGS. 11A and 11B are diagrams for describing the addition of a domain to a specific expression. In FIG. 11A, as shown in the upper part, the user U gives an instruction by utterance UO11a to register a schedule for "Pi Day" (πの日). Here, the user U refers to "March 14" as "Pi Day", following the custom derived from the value of pi.
However, because the domain "date" is not associated with the recognized character string "Pi Day", the input/output control unit 220 enters "Pi Day" into form F1. Thus, when no domain is associated with a specific expression, the character string may be entered into a form the user did not intend. Specific expressions broadly include aliases, abbreviations, and the like. A specific expression according to the present embodiment may be an expression in general use or an expression used only within a particular group, such as a household.
In this case, as shown in the lower left, the user U may make utterance UO11b to move "Pi Day", which was entered into the unintended form F1, to form F2. If the general knowledge that "Pi Day" corresponds to "March 14" can be obtained from the Internet or the like, the input/output control unit 220 may convert "Pi Day" into "March 14" and enter it into form F2. At this time, the input/output control unit 220 may also perform control to newly associate "Pi Day" with "March 14" and the domain "date".
If, on the other hand, the general knowledge that "Pi Day" corresponds to "March 14" cannot be obtained, the input/output control unit 220 may, in order to enter a value into the designated form F2, cause the information processing terminal 10 to output voice SO11 asking the user U for the date that "Pi Day" denotes, as shown in the upper part of FIG. 11B.
When the user U then makes utterance UO11c giving the date, as shown in the lower part of FIG. 11B, the input/output control unit 220 can newly associate "Pi Day" with "March 14" and the domain "date". To better reflect the intention of the user U, the input/output control unit 220 may display the character string in form F2 with the expression "Pi Day" preserved. Because "Pi Day" is associated with "March 14" internally, functions such as the scheduler can still run without problems.
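Keeping the user's wording for display while resolving it to a concrete date internally can be sketched with a record that stores both. The names `DateEntry` and `register_expression` are hypothetical, and the example assumes the expression recurs on the same month and day each year.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DateEntry:
    display_text: str  # what the form shows, e.g. the user's own expression
    month: int
    day: int

    def resolve(self, year):
        """Concrete date used internally, for example by a scheduler."""
        return date(year, self.month, self.day)


expressions = {}


def register_expression(expression, month, day):
    """Associate a specific expression with a date (domain "date")."""
    expressions[expression] = DateEntry(expression, month, day)
    return expressions[expression]


entry = register_expression("Pi Day", 3, 14)
print(entry.display_text, entry.resolve(2019).isoformat())
```

The form can show `display_text` unchanged while scheduling logic works from `resolve()`.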
Next, control based on the speech recognition reliability will be described. The input/output control unit 220 according to the present embodiment can flexibly control input and output for the input interface IF based on the speech recognition reliability.
FIG. 12 is a diagram for describing input/output control when the speech recognition reliability is low. In the upper part of FIG. 12, because the reliability of the speech recognition for utterance UO12a made by the user U falls below a threshold, the input/output control unit 220 does not enter the speech recognition result into any form and instead causes the information processing terminal 10 to output voice SO12 asking the user U to designate the form to be used.
Thus, when the speech recognition reliability is low, the input/output control unit 220 can ask the user to explicitly designate the form into which the speech recognition result should be entered. When utterance UO12b designating a form is obtained, as shown in the lower part of the figure, the input/output control unit 220 can control recalculation of the reliability based on utterance UO12b and enter the corrected speech recognition result into form F4.
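The threshold check reduces to a simple branch, sketched below. The threshold value, the prompt wording, and the returned dictionary shape are assumptions for illustration only.

```python
def route_recognition(text, reliability, threshold=0.6):
    """Decide whether to enter a result or ask the user to designate a form."""
    if reliability < threshold:
        # Too uncertain: enter nothing and request an explicit form designation.
        return {"action": "ask_form", "prompt": "Which form should this be entered into?"}
    return {"action": "input", "text": text}


low = route_recognition("kyayoubi", 0.3)
high = route_recognition("Tuesday", 0.9)
print(low["action"], high["action"])
```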
FIGS. 13 and 14 are diagrams for describing input/output control when the reliabilities of character string candidates are close to one another. As described above, the recognition unit 210 can generate a plurality of character string candidates based on the user's utterance and output the candidate with the highest reliability as the final speech recognition result. However, the reliabilities of several candidates may be nearly equal.
For example, in the case shown in FIG. 13, in the N-best result for utterance UO13, in which the user U says "Tuesday", the reliabilities of the character strings "kyayoubi" (きゃようび) and "Tuesday" are both low and nearly equal. If "kyayoubi" were adopted simply because its reliability is the highest, an erroneous speech recognition result would be output.
For this reason, when the difference in reliability between the first-ranked and n-th-ranked candidates in the N-best result falls below a threshold Td, the input/output control unit 220 may enter each of the competing character strings into a form. If the reliability values are not normalized, the input/output control unit 220 may normalize them before computing the difference. In the example shown in FIG. 13, based on the domains corresponding to the character strings "kyayoubi" and "Tuesday", whose reliabilities are nearly equal, the input/output control unit 220 selects a plurality of second target forms, namely forms F1 and F4, and enters "kyayoubi" and "Tuesday" into them respectively.
At this time, the input/output control unit 220 can also cause the information processing terminal 10 to output voice SO13 asking which form's entry is correct, and obtain feedback from the user U, thereby achieving the input that the user U intends.
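The check for competing candidates can be sketched as follows: normalize the scores, then keep every candidate whose score lies within Td of the top one. Normalizing by the sum of the scores and the value of `td` are assumptions; the disclosure only states that normalization may precede the comparison.

```python
def competing_candidates(nbest, td=0.1):
    """Return candidates whose normalized reliability is within td of the best.

    nbest: list of (text, domain, raw_score); raw scores need not sum to 1.
    """
    total = sum(score for _, _, score in nbest)
    if total <= 0:
        return []
    normalized = sorted(
        ((text, domain, score / total) for text, domain, score in nbest),
        key=lambda c: c[2],
        reverse=True,
    )
    top = normalized[0][2]
    return [c for c in normalized if top - c[2] < td]


nbest = [
    ("kyayoubi", "title", 32.0),
    ("Tuesday", "day_of_week", 30.0),
    ("today", "date", 8.0),
]
rivals = competing_candidates(nbest)
print([(text, domain) for text, domain, _ in rivals])
```

Each returned candidate would then be entered into the form matching its domain, and the user asked which entry is correct.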
 また、図13に示す一例の場合、ユーザUが行った「今日」に係る発話UO14に対するNbest結果において、文字列「今日は」および文字列「今日」の信頼度が高く拮抗している。この際、信頼度の高さに基づいて「今日は」を採択した場合、誤った音声認識結果を出力することとなる。 Further, in the example shown in FIG. 13, in the Nbest result for the utterance UO 14 related to “today” performed by the user U, the reliability of the character string “today” and the character string “today” is highly antagonistic. At this time, if "Today" is adopted based on the degree of reliability, an erroneous speech recognition result will be output.
For this reason, as in the case of FIG. 13, the input/output control unit 220 according to the present embodiment may select a plurality of second target forms, namely forms F1 and F2, based on the domains corresponding to the closely matched character strings "今日は" and "今日", and input "今日は" and "今日" into them, respectively. This function of the input/output control unit 220 effectively reduces the burden of correction when confidence values are closely matched.
Next, correction processing for the case where the segmentation of unit blocks in the speech recognition result differs from what the user expects will be described. FIG. 15 is a diagram showing an example of correction involving separation of a unit block according to the present embodiment. In FIG. 15, as shown in the upper part, the user U first gives, by the utterance UO15a, an instruction to register the schedule "5日午後3時お迎え" (pick-up at 3 p.m. on the 5th).
However, speech recognition errors occur here: "5日" (the 5th) is recognized as "いつか" (someday) and "午後3時" (3 p.m.) is recognized as "午後しゃんじ". These are input, together with the correctly recognized character string "お迎え" (pick-up), into the form F1 corresponding to the title, which allows free input. The unit blocks are also set incorrectly: "いつか" and "午後しゃんじ", which should be separate unit blocks, are set together as the single unit block "いつか午後しゃんじ".
In this case, as shown at the lower left, the user U may make an utterance UO15b that designates a plurality of forms into which the unit block A corresponding to "いつか午後しゃんじ" is to be input. Based on the utterance UO15b, the input/output control unit 220 can cause the recognition unit 210 to recompute the N-best result for the unit block A. Based on the recomputed N-best result, the input/output control unit 220 can then separate the character strings "5日" and "午後3時" contained in the unit block A and input them into forms F2 and F3, respectively.
When unit block segmentation fails as described above, the input/output control unit 220 may also display, on the input interface IF, a message T1 prompting the user to correct the error. The user U can then be expected to make utterances segmented by form, such as "Aのいつかは日付" ("the いつか in A is the date"), "Aのいつかは2" ("the いつか in A goes in 2"), "いつかは日付" ("いつか is the date"), or "しゃんじは時間" ("しゃんじ is the time"), which should make correction more efficient.
The user U can also instruct a correction other than a form designation, for example "しゃんじじゃなくて、さんじ" ("not しゃんじ, but さんじ"). As a result, "3時" (3 o'clock) is recognized and input, the unit block setting is reviewed again, and the input the user U intends can be presented.
When unit block segmentation fails, the input/output control unit 220 may also, as shown in the upper part of FIG. 16, refrain from inputting the speech recognition result into any form and instead output, on the input interface IF, a message T2 asking the user U to designate the forms to be filled.
Here, as shown at the lower left, when the user U makes an utterance UO16b designating forms F1 to F3, the input/output control unit 220 first fits character strings to forms F2 and F3, that is, the forms other than the free-input form F1, correcting them as it does so, and then inputs the remaining character string into form F1. In this way it can present the input the user U intends.
When unit block segmentation fails, the input/output control unit 220 may also, as shown in the upper part of FIG. 17, refrain from inputting the speech recognition result into any form and instead output, on the input interface IF, a message T3 asking the user U to input the title again.
Here, as shown at the lower left, when the user U makes an utterance UO17b containing only the character string "お迎え" to be input into form F1, the input/output control unit 220 inputs "お迎え" into form F1 and deletes "お迎え" from the initially obtained speech recognition result "いつか午後しゃんじお迎え". It then fits the character strings contained in the remaining "いつか午後しゃんじ" to the respective forms, correcting them as it does so, and can thereby present the input the user U intends.
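The step of removing the re-uttered title from the initial recognition result can be illustrated with a small sketch. The function name and the plain substring match are assumptions made for illustration; the text does not say how the title is located within the initial result.

```python
def split_off_title(initial_result, title):
    """Return (title, remainder), or None when the title does not
    occur in the initial recognition result.

    The remainder is what would subsequently be fitted to the
    domain-specific forms (date, time, and so on).
    """
    if title not in initial_result:
        return None
    # Remove only the first occurrence of the title.
    remainder = initial_result.replace(title, "", 1).strip()
    return title, remainder
```

For the example in FIG. 17, calling `split_off_title("いつか午後しゃんじお迎え", "お迎え")` leaves "いつか午後しゃんじ" as the remainder to be corrected and distributed.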
Next, correction processing for the case where there are multiple unit blocks associated with the same domain will be described. In FIG. 18, as shown in the upper part, the user U first gives, by the utterance UO18a, an instruction to register a hotel schedule. However, the utterance UO18a contains two character strings, "明日" (tomorrow) and "10日" (the 10th), both associated with the domain "date". In this case, it is difficult for the input/output control unit 220 to determine which of "明日" and "10日" should be input into form F2.
For this reason, when the speech recognition result contains multiple character strings associated with the same domain, the input/output control unit 220 may display, on the input interface IF, a message T4 that designates the form F1, which has no domain set and allows free input, and asks the user to utter again the content to be entered as the title.
It would also be conceivable to display a message such as "Please say the date again", but in that case the user U would have difficulty judging whether to say "明日" or "10日". By designating the free-input form F1 and prompting a re-utterance, the input/output control unit 220 can guide the user U so that the utterance does not vary. Moreover, even when the speech recognition result contains two character strings corresponding to the domain "date" and two corresponding to the domain "time", having the user re-utter the title makes it possible to correct both domains properly.
The lower part of FIG. 18 shows the result of the correction performed by the input/output control unit 220 based on the utterance UO18b, made by the user U, that designates the title. In this example, the input/output control unit 220 inputs "10日のホテルを予約" ("book a hotel for the 10th"), designated by the utterance UO18b, into form F1, inputs the date of the remaining "明日" into form F2, and inputs "水曜日" (Wednesday), the day of the week corresponding to "明日", into form F4. If the wording of the user's re-utterance differs from that of the original utterance, the input/output control unit 220 may input the speech recognition result for the re-utterance into form F1. In that case as well, the character string corresponding to the domain "date" that is not included in the re-utterance can be input into form F2.
Next, processing for the case where a character string has already been input into the form designated by user feedback will be described. In FIG. 19, as shown in the upper part, the user U first gives, by the utterance UO19a, an instruction to register a schedule for "火曜日" (Tuesday). However, a speech recognition error occurs and "火曜日" is recognized as "きゃようび", so the speech recognition result is input into the form F1 corresponding to the title, which allows free input, rather than into the form F4 corresponding to the day of the week that the user U intended.
Next, as shown at the lower left, the user U makes an utterance UO19b designating a form name so that the speech recognition result erroneously input into form F1 is entered into the intended form, namely the form F4 corresponding to the day of the week. However, "水曜日" (Wednesday) has already been input into the form F4 designated by the user via the utterance UO19a.
In this case, the input/output control unit 220 determines whether the character string already input and the character string newly instructed to be input can coexist, and performs control based on that determination. For example, in the case shown in FIG. 19, because the nature of form F4 does not allow "水曜日" and "火曜日" to coexist, the input/output control unit 220 may overwrite the existing "水曜日" with the newly instructed "火曜日". On the other hand, for a form that allows free input, such as form F1, the input/output control unit 220 may keep the already input character string and append the newly instructed one. In this way, the input/output control unit 220 according to the present embodiment can realize appropriate correction control based on the nature of the form.
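The overwrite-or-append decision described above can be sketched as follows. The `Form` class, its `free_input` flag, and the space used as a separator when appending are hypothetical; the text only states that the decision depends on whether the two strings can coexist given the nature of the form.

```python
from dataclasses import dataclass


@dataclass
class Form:
    name: str
    free_input: bool  # assumed flag: True for a free-input form such as the title
    value: str = ""


def apply_input(form, text):
    """Append to a free-input form that already holds text;
    otherwise replace whatever the form currently holds."""
    if form.value and form.free_input:
        form.value = f"{form.value} {text}"
    else:
        form.value = text
    return form.value
```

With this sketch, a day-of-week form holding "水曜日" ends up holding only "火曜日", while a title form keeps its existing text and gains the new string.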
The functions of the input/output control unit 220 according to the present embodiment have been described above in detail with specific examples. Although the above description deals with the case where the input character strings are in Japanese, the functions of the input/output control unit 220 according to the present embodiment are applicable regardless of language.
FIG. 20 is a diagram for describing correction processing when the character strings are in English. In FIG. 20, as shown in the upper part, the user U first gives, by the utterance UO20a, an instruction to register a schedule for "Tuesday". However, a speech recognition error occurs and "Tuesday" is recognized as "Choose way", so the input/output control unit 220 inputs the character string "Choose way" into the form F1 corresponding to the title, which allows free input. At this time, the input/output control unit 220 also displays the two unit blocks "Choose" and "way" contained in "Choose way" as unit blocks A and B, respectively.
Next, as shown at the lower left, the user U makes an utterance UO20b for moving the unit blocks A and B to form F2 in order to correct both the wrong form and the speech recognition error. Based on the utterance UO20b, the input/output control unit 220 according to the present embodiment can delete the unit blocks A and B from form F1 and input "Tuesday", corrected based on the domain of the designated form F2, into form F2.
<<1.6. Flow of Operation>>
Next, the flow of operation of the information processing server 20 according to the present embodiment will be described in detail. FIG. 21 is a flowchart showing the flow of operation of the information processing server 20 according to the present embodiment.
Referring to FIG. 21, the terminal communication unit 230 first receives the user's utterance information collected by the information processing terminal 10 (S1101).
Next, the recognition unit 210 executes speech recognition processing based on the utterance information received in step S1101 (S1102). At this time, the recognition unit 210 may compute the confidence values described above, obtain the N-best result, compute the connection probabilities between unit blocks, and so on.
Next, the input/output control unit 220 determines whether the differences in confidence among the first through n-th candidates in the N-best result fall below a threshold (S1103).
If the differences in confidence among the first through n-th candidates in the N-best result fall below the threshold (S1103: Yes), the input/output control unit 220 decides whether to perform the input/output control for closely matched confidence values (S1104).
If the input/output control unit 220 decides not to perform the input/output control for closely matched confidence values (S1104: No), or if the differences in confidence among the first through n-th candidates in the N-best result are equal to or greater than the threshold (S1103: No), the recognition unit 210 outputs the character string candidate with the highest confidence as the speech recognition result, and the input/output control unit 220 executes input/output control of the forms (S1105).
On the other hand, if the input/output control unit 220 decides to perform the input/output control for closely matched confidence values (S1104: Yes), it executes the form input/output control for closely matched confidence values as shown in FIG. 13 and FIG. 14 (S1106).
After this, if feedback instructing a correction is detected (S1107: Yes), the input/output control unit 220 causes the recognition unit 210 to recompute the confidence values based on that feedback (S1108) and performs input/output control of the forms based on the recomputed confidence values (S1109). The feedback may be given not only by voice but also by gaze, gesture, operation of an input device, and the like.
On the other hand, if no feedback instructing a correction is detected (S1107: No), the information processing terminal 10 ends the series of processing.
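The branching of steps S1101 to S1109 can be traced with the following sketch. It is an assumed simplification rather than the flowchart of FIG. 21 itself: only the top two confidence values are compared, the confidences are taken to be already normalized, and the decision in S1104 is modeled as a boolean flag because the text does not state how that decision is made.

```python
def server_flow(confidences, threshold, tie_control_enabled, feedback_detected):
    """Return the list of step labels visited for one utterance."""
    steps = ["S1101", "S1102", "S1103"]
    ranked = sorted(confidences, reverse=True)
    # S1103: are the leading confidences closer than the threshold?
    close = len(ranked) >= 2 and ranked[0] - ranked[1] < threshold
    if close:
        steps.append("S1104")  # decide whether to use the tie handling
    if close and tie_control_enabled:
        steps.append("S1106")  # fill several forms with the tied candidates
    else:
        steps.append("S1105")  # adopt the single most confident candidate
    steps.append("S1107")      # check for correction feedback
    if feedback_detected:
        steps += ["S1108", "S1109"]  # recompute confidences, then refill forms
    return steps
```

Calling `server_flow([0.5, 0.45], 0.1, True, False)` follows the tie-handling branch and ends after S1107, since no feedback is detected.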
<2. Second Embodiment>
<<2.1. Overview>>
Next, a second embodiment of the present disclosure will be described. The first embodiment above described the case where the input/output control unit 220 controls a voice input interface provided with a plurality of forms. However, the scope of the technical idea according to the present disclosure is not limited to voice input interfaces. The second embodiment therefore describes the case where the input/output control unit 220 controls character string input into forms placed on a Web page.
In recent years, with the development of information processing technology, various services using Web pages have become widespread. In such services, it is not uncommon for a Web page to contain a plurality of forms for entering information about the user and the like. Technologies for automatically inputting character strings into such forms also exist.
FIG. 22 is a diagram for describing automatic input into forms placed on a Web page. FIG. 22 shows a Web page WP having a plurality of forms corresponding to a name, a birthday, a telephone number, a postal code, and so on. The user can enter information into each of the forms using, for example, a keyboard, but the more forms there are, the heavier the input workload becomes and the more likely input errors are to occur.
Meanwhile, tools that automatically input character strings into forms based on preset information or past input history have also become available in recent years. As shown in FIG. 22, such tools make it possible to input information into a plurality of forms automatically and greatly reduce the user's input workload.
In practice, however, such tools frequently make mistakes, such as entering information into the wrong form. FIG. 23A and FIG. 23B are diagrams showing examples of input errors made by an automatic input tool.
For example, in the case shown in FIG. 23A, in the forms corresponding to the name (kanji) and the name (kana), "姓" (last name) and "名" (first name), and likewise "せい" and "めい" (their kana counterparts), have been filled in the wrong way around. Such an input error can occur, for example, when the automatic input tool manages the first name and the last name with their associations reversed.
In the example shown in FIG. 23A, there is also an input error in which information that should be split across the two forms for the postal code has been forced into just one of them. Such an input error can occur, for example, when the automatic input tool manages the postal code as information corresponding to a single form.
FIG. 23B shows an example in which information is entered in a language different from the one the form expects. In FIG. 23B, a first name and a last name written in Japanese have been entered into the forms for First name and Last name, which should be filled in English. Such an input error can occur, for example, when the automatic input tool stores only information written in Japanese.
The technical idea according to the present embodiment was conceived with the above points in mind, and enables easy correction without cumbersome operations even when automatic input has placed information in the wrong form. The information processing server 20 according to the present embodiment also makes it possible to realize information input with fewer mistakes.
The functions and features of the information processing server 20 according to the present embodiment, and the effects of those features, are described in detail below. The following description focuses on the differences from the first embodiment, and detailed descriptions of configurations, functions, and effects shared with the first embodiment are omitted.
<<2.2. Details of Functions>>
As described above, the information processing server 20 according to the present embodiment can realize more convenient automatic input into a plurality of forms placed on a Web page or the like. FIG. 24 is a diagram for describing automatic input control by the input/output control unit 220 according to the present embodiment.
FIG. 24 shows a Web page WP having a plurality of forms corresponding to the name (kanji), the name (kana), a birthday, a telephone number, a postal code, and so on. Based on the user's input operation, the input/output control unit 220 according to the present embodiment may select, from among these forms, a plurality of first target forms into which information is to be input, and automatically input the designated character strings into the selected first target forms.
At this time, the input/output control unit 220 according to the present embodiment may use, for example, a user utterance, an operation with an input device such as a mouse, or a touch as the trigger for automatic input. The input/output control unit 220 may also execute automatic input into a plurality of forms using the information set in a form set FS designated by the user's input operation or in advance.
FIG. 24 shows an example of the form set FS according to the present embodiment. The form set FS is a set of information to be automatically input into a plurality of forms, grouped by user or by purpose. In the example shown in FIG. 24, the form set FS defines, for each user, the last name (kanji), first name (kanji), last name (kana), first name (kana), birthday, telephone number, and postal code. The form set FS may be generated automatically by the input/output control unit 220 based on past input history, or may be created and edited by the user.
Here, the input/output control unit 220 may present the form set FS to the user as visual information. In doing so, it may assign IDs to the name of the form set FS and to each character string the form set FS contains. By specifying, for example, the name "高志" or the ID "1" corresponding to the name "高志", the user can retrieve the form set FS to be used for automatic input and have a plurality of forms filled in automatically with the information that form set contains.
In the example shown in FIG. 24, the input/output control unit 220 executes automatic input into a plurality of forms using the form set FS corresponding to the form set name "高志" designated by the user. With this function, the user can see the information to be used for automatic input before designating it, which makes it possible to realize automatic input that better matches the user's intention and contains fewer input errors. If the user makes no designation, the input/output control unit 220 may retrieve the form set FS set as the default and perform automatic input with it.
One feature of the input/output control unit 220 according to the present embodiment is that, when it performs automatic input as described above, it assigns an ID to each form placed on the Web page WP. In the example shown in FIG. 24, the input/output control unit 220 assigns the IDs "1" to "12" to the forms and displays them on the Web page WP. The IDs assigned to each form and to each piece of information in the form set FS may serve to let the user correct an input error more easily when one occurs.
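The form set FS and the two kinds of IDs described above can be pictured with a small sketch. The data layout, the field names, and the use of letters for form-set entries and 1-based numbers for page forms are assumptions chosen to match the "A" and "1" style of IDs that appear in the correction utterances; the text does not prescribe a representation.

```python
import string


def label_entries(form_set):
    """Assign letter IDs (A, B, ...) to the entries of a form set,
    in insertion order. Returns {letter: (field_name, value)}."""
    return {
        letter: (field, value)
        for letter, (field, value) in zip(string.ascii_uppercase, form_set.items())
    }


def autofill(page_fields, form_set):
    """Fill the forms on a page from a form set.

    page_fields: field names in the order the forms appear on the page.
    Returns {form_id: value} with 1-based form IDs; forms with no
    matching entry in the form set are left empty.
    """
    return {
        form_id: form_set.get(field, "")
        for form_id, field in enumerate(page_fields, start=1)
    }
```

Keeping the two ID spaces separate is what lets a later instruction refer unambiguously either to a value held in the form set or to a form on the page.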
FIG. 25 to FIG. 27 are diagrams for describing the correction of input information according to the present embodiment. For example, the upper part of FIG. 25 shows the state after the input/output control unit 220 has automatically filled in the forms placed on the Web page WP. As in the case shown in FIG. 23A, FIG. 25 shows an example in which "姓" (last name) and "名" (first name), and likewise "せい" and "めい", have been filled in the wrong way around.
In this case, the user U can instruct a correction of the automatic input result using the IDs assigned to the forms or the identifiers assigned to the character strings in the form set FS. In the example shown in FIG. 25, the user U gives feedback instructing a correction by making the utterance UO25a, "1と2が逆" ("1 and 2 are reversed"), or the utterance UO25b, "Aを1に" ("put A in 1").
At this time, based on the utterance UO25a, for example, the input/output control unit 220 can swap the information input into the form "姓" corresponding to the ID "1" and the form "名" corresponding to the ID "2". Based on the utterance UO25b, for example, the input/output control unit 220 can overwrite the form "姓" corresponding to the ID "1" with the character string "上田" corresponding to the ID "A" in the form set FS, and move the character string "高志" that had been input into the form "姓" to the form "名".
When instructed, as described above, to swap the character strings input into the form "姓" and the form "名", the input/output control unit 220 may also automatically swap the character strings input into the related forms "せい" and "めい". Further, when the user U makes an utterance such as "1を3に" ("put 1 in 3"), the input/output control unit 220 can convert the character string "高志" input into the form "姓" into kana to match the input format of the form "めい" and then input it into the form "めい".
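The two correction styles above, swapping by form ID and overwriting with displacement, can be sketched as follows. The `linked_pairs` table, which makes a swap of forms 1 and 2 also swap the related kana forms, and the explicit `move_displaced_to` argument are hypothetical; the text does not say how related forms or the destination of the displaced string are determined.

```python
def swap_forms(forms, a, b, linked_pairs=None):
    """Swap the contents of forms a and b, and of any form pairs
    registered as linked to (a, b)."""
    forms[a], forms[b] = forms[b], forms[a]
    for x, y in (linked_pairs or {}).get((a, b), []):
        forms[x], forms[y] = forms[y], forms[x]
    return forms


def overwrite_and_move(forms, target, new_value, move_displaced_to):
    """Write new_value into the target form and move the string it
    displaces into another form."""
    displaced = forms[target]
    forms[target] = new_value
    forms[move_displaced_to] = displaced
    return forms
```

An utterance like UO25a would map to `swap_forms(forms, 1, 2, ...)`, and one like UO25b to `overwrite_and_move(forms, 1, "上田", 2)`.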
FIG. 26 shows an example in which a postal code that should have been split across the forms with the IDs "11" and "12" has been input only into the form with the ID "11".
In this case as well, the user U can instruct a correction of the automatic input result using the IDs assigned to the forms or the identifiers assigned to the character strings in the form set FS.
In the example shown in FIG. 26, the user U gives feedback instructing a correction by making the utterance UO26a, "11を11と12に" ("put 11 in 11 and 12"), or the utterance UO26b, "Gを11と12に" ("put G in 11 and 12").
At this time, based on the utterance UO26a or the utterance UO26b, for example, the input/output control unit 220 can refer to the character string "111-2222" entered in the form assigned ID "11", divide the character string based on a delimiter contained in it, the attributes of the forms, general knowledge, and the like, and enter the resulting parts in the forms assigned ID "11" and ID "12", respectively.
When the break position of the character string cannot be obtained, the input/output control unit 220 may cause the information processing terminal 10 to produce an output requesting the user to specify the break position. In this case, the input/output control unit 220 can also obtain the break position from a user utterance such as "3 digits and 4 digits" and correct the content of the character string held by the form set FS.
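The splitting behavior described above can be illustrated with a short sketch. It first looks for a delimiter inside the string and, if that fails, falls back to digit counts supplied by the user (for example, parsed from "3 digits and 4 digits"). The function name `split_value`, its parameters, and the choice of hyphen and whitespace as delimiters are assumptions for illustration; the disclosure also mentions form attributes and general knowledge, which are not modeled here.

```python
import re


def split_value(value, n_fields, digit_counts=None):
    """Split one string across n_fields forms.

    Delimiters inside the string are tried first. If they do not yield
    the right number of parts, user-specified digit counts are used.
    Returns None when no break position can be determined, in which case
    the caller would ask the user to specify one.
    """
    parts = [p for p in re.split(r"[-\s]+", value) if p]
    if len(parts) == n_fields:
        return parts

    if digit_counts and len(digit_counts) == n_fields:
        digits = re.sub(r"\D", "", value)
        if sum(digit_counts) == len(digits):
            result, pos = [], 0
            for count in digit_counts:
                result.append(digits[pos:pos + count])
                pos += count
            return result

    return None


by_delimiter = split_value("111-2222", 2)
undetermined = split_value("1112222", 2)
by_user_hint = split_value("1112222", 2, digit_counts=[3, 4])

print(by_delimiter, undetermined, by_user_hint)
```

Returning `None` rather than guessing keeps the fallback explicit: the system only asks the user for a break position when neither the string itself nor a previous hint determines one.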
FIG. 27 shows an example in which character strings written in Japanese were entered in forms that should have been filled in English.
In this case, the user U can issue an instruction to correct the automatic input result by using the ID assigned to each form or the identifier assigned to each character string in the form set FS. The user U may also issue a correction instruction by using the name or ID of a form set FS.
In the example shown in FIG. 27, the user U gives feedback in the form of a correction instruction by making the utterance UO27a, "English form set", or the utterance UO27b, "A and B in English".
At this time, based on the utterance UO27a or the utterance UO27b, for example, the input/output control unit 220 may switch the form set FS and then execute the automatic input again. If a correction instruction, such as swapping the contents of forms, was given before the instruction to switch the form set FS, the input/output control unit 220 may carry the content of that correction instruction over after the form set FS is switched.
In this way, a plurality of form sets FS according to the present embodiment can be set up according to the user, language, location, purpose, and the like, and can be switched according to the situation.
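Switching form sets while keeping earlier corrections can be pictured as re-running the automatic input with a different set and then replaying the recorded corrections. The sketch below assumes a very simple model: each form set maps string identifiers to values, a layout maps string identifiers to form IDs, and corrections are recorded as pairs of form IDs to swap. All names and sample values (including the romanized strings) are placeholders, not details from the disclosure.

```python
# Hypothetical form sets keyed by language.
form_sets = {
    "ja": {"A": "上田", "B": "高志"},
    "en": {"A": "Ueda", "B": "Takashi"},
}

# Assumed (incorrect) automatic mapping from string identifier to form ID:
# the last name "A" landed in form "2" and the first name "B" in form "1".
layout = {"A": "2", "B": "1"}


def auto_fill(form_set, layout, corrections=()):
    """Fill forms from a form set, then replay recorded swap corrections."""
    forms = {form_id: form_set[string_id] for string_id, form_id in layout.items()}
    for id_a, id_b in corrections:
        forms[id_a], forms[id_b] = forms[id_b], forms[id_a]
    return forms


# The user first corrects the Japanese result by swapping forms "1" and "2".
corrections = [("1", "2")]
japanese_result = auto_fill(form_sets["ja"], layout, corrections)

# The user then asks for the English form set; the swap is applied again.
english_result = auto_fill(form_sets["en"], layout, corrections)

print(japanese_result)
print(english_result)
```

Recording corrections as replayable operations, rather than as final values, is one way to let them survive a form set switch; other designs, such as correcting the layout itself, would work equally well.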
The functions of the input/output control unit 220 according to the present embodiment have been described in detail above. These functions make it possible to achieve automatic form input with fewer input errors and to easily correct the entered content even when an input error occurs. Although the above description used the example of the input/output control unit 220 performing automatic input on forms placed on a Web page, the input/output control unit 220 is not limited to this example and can broadly support automatic input to a plurality of forms.
<3. Hardware configuration example>
Next, a hardware configuration example common to the information processing terminal 10 and the information processing server 20 according to an embodiment of the present disclosure will be described. FIG. 28 is a block diagram illustrating a hardware configuration example of the information processing terminal 10 and the information processing server 20 according to an embodiment of the present disclosure. Referring to FIG. 28, the information processing terminal 10 and the information processing server 20 include, for example, a processor 871, a ROM 872, a RAM 873, a host bus 874, a bridge 875, an external bus 876, an interface 877, an input device 878, an output device 879, a storage 880, a drive 881, a connection port 882, and a communication device 883. The hardware configuration shown here is an example, and some of the components may be omitted. Components other than those shown here may also be included.
(Processor 871)
The processor 871 functions, for example, as an arithmetic processing unit or a control unit, and controls all or part of the operation of each component based on various programs recorded in the ROM 872, the RAM 873, the storage 880, or the removable recording medium 901.
(ROM 872, RAM 873)
The ROM 872 is a means for storing programs read by the processor 871, data used for computation, and the like. The RAM 873 temporarily or permanently stores, for example, programs read by the processor 871 and various parameters that change as appropriate when those programs are executed.
(Host bus 874, bridge 875, external bus 876, interface 877)
The processor 871, the ROM 872, and the RAM 873 are connected to one another via, for example, the host bus 874, which is capable of high-speed data transmission. The host bus 874 is in turn connected, for example via the bridge 875, to the external bus 876, which has a comparatively low data transmission speed. The external bus 876 is connected to various components via the interface 877.
(Input device 878)
The input device 878 is, for example, a mouse, a keyboard, a touch panel, a button, a switch, or a lever. A remote controller capable of transmitting control signals using infrared rays or other radio waves may also be used as the input device 878. The input device 878 also includes a voice input device such as a microphone.
(Output device 879)
The output device 879 is a device that can visually or audibly notify the user of acquired information, for example, a display device such as a CRT (Cathode Ray Tube), LCD, or organic EL display, an audio output device such as a speaker or headphones, a printer, a mobile phone, or a facsimile. The output device 879 according to the present disclosure also includes various vibration devices capable of outputting tactile stimuli.
(Storage 880)
The storage 880 is a device for storing various data. For example, a magnetic storage device such as a hard disk drive (HDD), a semiconductor storage device, an optical storage device, or a magneto-optical storage device is used as the storage 880.
(Drive 881)
The drive 881 is a device that reads information recorded on the removable recording medium 901, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, or writes information to the removable recording medium 901.
(Removable recording medium 901)
The removable recording medium 901 is, for example, DVD media, Blu-ray (registered trademark) media, HD DVD media, or any of various semiconductor storage media. The removable recording medium 901 may of course also be, for example, an IC card equipped with a contactless IC chip, or an electronic device.
(Connection port 882)
The connection port 882 is a port for connecting an externally connected device 902, for example, a USB (Universal Serial Bus) port, an IEEE 1394 port, a SCSI (Small Computer System Interface) port, an RS-232C port, or an optical audio terminal.
(Externally connected device 902)
The externally connected device 902 is, for example, a printer, a portable music player, a digital camera, a digital video camera, or an IC recorder.
(Communication device 883)
The communication device 883 is a communication device for connecting to a network, for example, a communication card for wired or wireless LAN, Bluetooth (registered trademark), or WUSB (Wireless USB), a router for optical communication, a router for ADSL (Asymmetric Digital Subscriber Line), or a modem for various types of communication.
<4. Summary>
As described above, the information processing server 20 according to an embodiment of the present disclosure includes the input/output control unit 220, which selects, based on an input operation by a user, a first target form to be an input target from among a plurality of forms and performs character input on the first target form. One feature of the input/output control unit 220 according to an embodiment of the present disclosure is that it selects, based on the user's feedback on the input content entered in the first target form, a second target form different from the first target form and performs the character input on the second target form. This configuration makes it possible to easily correct an error in the selection of the form to be filled in.
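The two-stage behavior summarized above, selecting a first target form and then moving the input to a second target form in response to feedback, can be sketched as follows. This is an illustrative toy model under stated assumptions: the `Form` class, the keyword-overlap scoring used as a stand-in for a form's domain, and the function names are all hypothetical and do not represent the recognition or selection logic of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Form:
    name: str
    vocabulary: frozenset  # hypothetical stand-in for the domain set for a form
    text: str = ""


def select_first_target(forms, text):
    """Pick the form whose vocabulary overlaps most with the input text."""
    words = set(text.lower().split())
    return max(forms, key=lambda form: len(words & form.vocabulary))


def retarget(forms, first, second_name):
    """Move the input from the first target form to the form named in the feedback."""
    second = next(form for form in forms if form.name == second_name)
    second.text, first.text = first.text, ""
    return second


forms = [
    Form("search", frozenset({"weather", "news", "tomorrow"})),
    Form("memo", frozenset({"buy", "remember"})),
]

# Step 1: the input is routed to the form that appears to match best.
recognized = "check tomorrow weather"
first = select_first_target(forms, recognized)
first.text = recognized
first_name = first.name

# Step 2: the user's feedback names a different form, so the text is moved there.
second = retarget(forms, first, "memo")

print(first_name, "->", second.name)
```

The sketch moves the whole input; the embodiments described in the document also cover moving only part of it (a unit block) and re-evaluating the recognition result against the second form's domain, which this example does not attempt.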
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings, but the technical scope of the present disclosure is not limited to these examples. It is clear that a person with ordinary knowledge in the technical field of the present disclosure could conceive of various changes or modifications within the scope of the technical idea described in the claims, and these are naturally understood to belong to the technical scope of the present disclosure as well.
The effects described in this specification are merely explanatory or illustrative and are not limiting. That is, the technology according to the present disclosure can produce other effects that are apparent to those skilled in the art from the description in this specification, in addition to or in place of the effects described above.
The steps of the processing performed by the information processing server 20 in this specification do not necessarily have to be processed in time series in the order described in the flowcharts. For example, the steps of the processing performed by the information processing server 20 may be processed in an order different from that described in the flowcharts, or may be processed in parallel.
The following configurations also belong to the technical scope of the present disclosure.
(1)
An information processing device including a control unit that selects, based on an input operation by a user, a first target form to be an input target from among a plurality of forms and performs character input on the first target form,
wherein the control unit selects, based on feedback from the user on the input content entered in the first target form, a second target form different from the first target form and performs the character input on the second target form.
(2)
The information processing device according to (1), wherein the control unit selects the form specified by the feedback as the second target form and enters, in the second target form, characters corresponding to at least part of the input content entered in the first target form.
(3)
The information processing device according to (1) or (2), wherein the control unit causes unit blocks contained in the input content entered in the first target form to be displayed together with the input content, deletes the characters corresponding to the unit block specified by the feedback from the first target form, and enters the characters corresponding to that unit block in the second target form.
(4)
The information processing device according to (3), wherein the control unit separates a character string contained in the unit block based on the feedback and enters the separated character string in the second target form.
(5)
The information processing device according to any one of (1) to (4), wherein at least one of the input operation and the feedback is performed by utterance.
(6)
The information processing device according to any one of (1) to (5), wherein the control unit selects the first target form based on a speech recognition result for the input operation performed by utterance and enters the speech recognition result in the first target form.
(7)
The information processing device according to (6), wherein the control unit selects the first target form based on the speech recognition result and the domains set for the forms.
(8)
The information processing device according to (6) or (7), wherein the control unit enters, in the second target form, the speech recognition result corrected based on the domain set for the selected second target form.
(9)
The information processing device according to any one of (6) to (8), wherein the control unit controls recalculation of the reliability of the speech recognition result based on the domain set for the selected second target form and enters the corrected speech recognition result in the second target form.
(10)
The information processing device according to any one of (6) to (9), wherein the control unit causes unit blocks contained in the speech recognition result entered in the first target form to be displayed together with the speech recognition result, and causes a connection probability for a second unit block located before or after a first unit block specified by the feedback to be recalculated based on the first unit block and the domain set for the form specified by the feedback.
(11)
The information processing device according to (10), wherein the control unit enters the character string corresponding to the second unit block corrected by the recalculation of the connection probability in the form for which the domain associated with that character string is set.
(12)
The information processing device according to any one of (6) to (11), wherein the control unit newly associates a domain with at least part of the speech recognition result based on the feedback.
(13)
The information processing device according to (12), wherein the control unit newly associates the character string specified by the feedback with the domain set for the form specified by the feedback.
(14)
The information processing device according to any one of (6) to (13), wherein, when the reliability of the speech recognition result is below a threshold, the control unit does not select the first target form and instead requests the user to give feedback specifying the form in which the speech recognition result is to be entered.
(15)
The information processing device according to any one of (6) to (14), wherein, when the reliabilities of character string candidates for the speech recognition result are closely matched, the control unit selects a plurality of the second target forms based on the domains corresponding to the closely matched character string candidates and enters those candidates in the respective second target forms.
(16)
The information processing device according to any one of (6) to (15), wherein, when the speech recognition result contains a plurality of character strings associated with the same domain, the control unit designates a form for which no domain is set and requests the user to utter the input content for the designated form.
(17)
The information processing device according to any one of (1) to (16), wherein the control unit selects a plurality of the first target forms based on the input operation and automatically enters preset character strings.
(18)
The information processing device according to (17), wherein the control unit presents to the user a form set that defines the character strings to be automatically entered in the plurality of first target forms and executes the automatic input based on the specified form set.
(19)
The information processing device according to (18), wherein the control unit assigns an identifier to at least one of the character strings contained in the form set and the forms, and corrects the result of the automatic input based on the identifier contained in the feedback.
(20)
An information processing method including:
selecting, by a processor, based on an input operation by a user, a first target form to be an input target from among a plurality of forms and performing character input on the first target form; and
selecting, based on feedback from the user on the input content entered in the first target form, a second target form different from the first target form and performing the character input on the second target form.
10 information processing terminal
110 display unit
120 voice output unit
130 voice input unit
140 imaging unit
150 sensor unit
160 control unit
170 server communication unit
20 information processing server
210 recognition unit
220 input/output control unit
230 terminal communication unit

Claims (20)

  1.  An information processing device comprising a control unit that selects, based on an input operation by a user, a first target form to be an input target from among a plurality of forms and performs character input on the first target form,
    wherein the control unit selects, based on feedback from the user on the input content entered in the first target form, a second target form different from the first target form and performs the character input on the second target form.
  2.  The information processing device according to claim 1, wherein the control unit selects the form specified by the feedback as the second target form and enters, in the second target form, characters corresponding to at least part of the input content entered in the first target form.
  3.  The information processing device according to claim 1, wherein the control unit causes unit blocks contained in the input content entered in the first target form to be displayed together with the input content, deletes the characters corresponding to the unit block specified by the feedback from the first target form, and enters the characters corresponding to that unit block in the second target form.
  4.  The information processing device according to claim 3, wherein the control unit separates a character string contained in the unit block based on the feedback and enters the separated character string in the second target form.
  5.  The information processing device according to claim 1, wherein at least one of the input operation and the feedback is performed by utterance.
  6.  The information processing device according to claim 1, wherein the control unit selects the first target form based on a speech recognition result for the input operation performed by utterance and enters the speech recognition result in the first target form.
  7.  The information processing device according to claim 6, wherein the control unit selects the first target form based on the speech recognition result and the domains set for the forms.
  8.  The information processing device according to claim 6, wherein the control unit enters, in the second target form, the speech recognition result corrected based on the domain set for the selected second target form.
  9.  The information processing device according to claim 6, wherein the control unit controls recalculation of the reliability of the speech recognition result based on the domain set for the selected second target form and enters the corrected speech recognition result in the second target form.
  10.  The information processing device according to claim 6, wherein the control unit causes unit blocks contained in the speech recognition result entered in the first target form to be displayed together with the speech recognition result, and causes a connection probability for a second unit block located before or after a first unit block specified by the feedback to be recalculated based on the first unit block and the domain set for the form specified by the feedback.
  11.  The information processing device according to claim 10, wherein the control unit enters the character string corresponding to the second unit block corrected by the recalculation of the connection probability in the form for which the domain associated with that character string is set.
  12.  The information processing device according to claim 6, wherein the control unit newly associates a domain with at least part of the speech recognition result based on the feedback.
  13.  The information processing device according to claim 12, wherein the control unit newly associates the character string specified by the feedback with the domain set for the form specified by the feedback.
  14.  The information processing device according to claim 6, wherein, when the reliability of the speech recognition result is below a threshold, the control unit does not select the first target form and instead requests the user to give feedback specifying the form in which the speech recognition result is to be entered.
  15.  The information processing device according to claim 6, wherein, when the reliabilities of character string candidates for the speech recognition result are closely matched, the control unit selects a plurality of the second target forms based on the domains corresponding to the closely matched character string candidates and enters those candidates in the respective second target forms.
  16.  The information processing device according to claim 6, wherein, when the speech recognition result contains a plurality of character strings associated with the same domain, the control unit designates a form for which no domain is set and requests the user to utter the input content for the designated form.
  17.  The information processing device according to claim 1, wherein the control unit selects a plurality of the first target forms based on the input operation and automatically enters preset character strings.
  18.  The information processing device according to claim 17, wherein the control unit presents to the user a form set that defines the character strings to be automatically entered in the plurality of first target forms and executes the automatic input based on the specified form set.
  19.  The information processing device according to claim 18, wherein the control unit assigns an identifier to at least one of the character strings contained in the form set and the forms, and corrects the result of the automatic input based on the identifier contained in the feedback.
  20.  An information processing method comprising:
    selecting, by a processor, based on an input operation by a user, a first target form to be an input target from among a plurality of forms and performing character input on the first target form; and
    selecting, based on feedback from the user on the input content entered in the first target form, a second target form different from the first target form and performing the character input on the second target form.
PCT/JP2018/038725 2018-01-22 2018-10-17 Information processing device and information processing method WO2019142419A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-008156 2018-01-22
JP2018008156 2018-01-22

Publications (1)

Publication Number Publication Date
WO2019142419A1 true WO2019142419A1 (en) 2019-07-25

Family

ID=67302083

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/038725 WO2019142419A1 (en) 2018-01-22 2018-10-17 Information processing device and information processing method

Country Status (1)

Country Link
WO (1) WO2019142419A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2023007960A (en) * 2021-07-02 2023-01-19 株式会社アドバンスト・メディア Information processing device, information processing system, information processing method, and program

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02126300A (en) * 1988-11-04 1990-05-15 Nippon Telegr & Teleph Corp <Ntt> Speech correction system
JP2000207166A (en) * 1999-01-19 2000-07-28 Nec Corp Device and method for voice input
JP2001306293A (en) * 2000-04-20 2001-11-02 Canon Inc Method and device for inputting information, and storage medium
WO2002031643A1 (en) * 2000-10-11 2002-04-18 Canon Kabushiki Kaisha Information processing device, information processing method, and storage medium
JP2004222169A (en) * 2003-01-17 2004-08-05 Daikin Ind Ltd Information processor and information processing method, and program
JP2015516587A (en) * 2012-03-08 2015-06-11 フェイスブック,インク. Devices that extract information from dialogue

Similar Documents

Publication Publication Date Title
US11594211B2 (en) Methods and systems for correcting transcribed audio files
EP3251115B1 (en) Updating language understanding classifier models for a digital personal assistant based on crowd-sourcing
US20160203002A1 (en) Headless task completion within digital personal assistants
CN110085222B (en) Interactive apparatus and method for supporting voice conversation service
JP2008203559A (en) Interaction device and method
JP2016061954A (en) Interactive device, method and program
JP2003263188A (en) Voice command interpreter with dialog focus tracking function, its method and computer readable recording medium with the method recorded
JP2005321730A (en) Dialog system, dialog system implementation method, and computer program
JP6596373B2 (en) Display processing apparatus and display processing program
WO2019142419A1 (en) Information processing device and information processing method
US11615788B2 (en) Method for executing function based on voice and electronic device supporting the same
JP2008145769A (en) Interaction scenario creation system, its method, and program
JP6828741B2 (en) Information processing device
JP3878147B2 (en) Terminal device
JP5892598B2 (en) Spoken character conversion work support device, phonetic character conversion system, phonetic character conversion work support method, and program
JP6756211B2 (en) Communication terminals, voice conversion methods, and programs
WO2019017027A1 (en) Information processing device and information processing method
WO2019142447A1 (en) Information processing device and information processing method
JP5184071B2 (en) Transcription text creation support device, transcription text creation support program, and transcription text creation support method
JP2019138989A (en) Information processor, method for processing information, and program
JP7333761B2 (en) system and image forming system
JP2008243048A (en) Interaction device, interaction method and program
KR102503586B1 (en) Method, system, and computer readable record medium to search for words with similar pronunciation in speech-to-text records
JP2022121643A (en) Voice recognition program, voice recognition method, voice recognition device and voice recognition system
JP2021182091A (en) Information processing system, information processing method, and information processing program

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18901170

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 18901170

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP