CN112334923A - Description support device and description support method - Google Patents


Info

Publication number: CN112334923A
Authority: CN (China)
Prior art keywords: speech, explanation, display, sentence, information
Legal status: Pending
Application number: CN201980039801.XA
Other languages: Chinese (zh)
Inventors: 佐伯夏树, 荒木昭一, 星见昌克, 釜井孝浩
Current Assignee: Panasonic Intellectual Property Management Co Ltd
Original Assignee: Panasonic Intellectual Property Management Co Ltd
Application filed by: Panasonic Intellectual Property Management Co Ltd

Classifications

    • G10L15/22: Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/06: Speech recognition; creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Speech recognition; training
    • G10L15/1822: Speech recognition; parsing for meaning understanding (natural language modelling)
    • G06F40/30: Handling natural language data; semantic analysis
    • G06Q10/00: Administration; management
    • G06Q50/10: Services (systems or methods specially adapted for specific business sectors)
    • G06Q30/015: Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk
    • G06Q30/016: After-sales

Abstract

The explanation support device (2) displays information relating to explanation items (C1 to C10) to be checked during the speech of a user (4). The explanation support device includes an acquisition unit (26), a control unit (20), and a display unit (23). The acquisition unit acquires input information indicating a speech sentence based on the speech. The control unit generates information indicating the result of checking the explanation items against the speech sentence. The display unit displays the information generated by the control unit. The display unit displays a check list (50) indicating whether or not the explanation items have been explained in the speech sentences indicated by the input information sequentially acquired by the acquisition unit. The display unit displays display information (55, 70) including each speech sentence in accordance with the likelihood, obtained for each speech sentence, on which the check result of the explanation items in the check list is based.

Description

Description support device and description support method
Technical Field
The present disclosure relates to an explanation support device and an explanation support method.
Background
Patent document 1 discloses an explanation support system for assisting an explanation using a computer terminal. In the explanation support system of patent document 1, when a keyword included in a check list is detected in the voice recognition result of collected speech, the control unit of the computer terminal displays a message including the detected keyword on a display. The control unit also extracts the voice recognition result around the time the keyword was picked up, and transmits the explanation status to a background terminal.
Prior art documents
Patent document
Patent document 1: JP 2013-25609A
Disclosure of Invention
Problems to be solved by the invention
An object of the present disclosure is to provide an explanation support device and an explanation support method that make it easy, through information processing, to assist in checking whether a user has explained the explanation items.
Means for solving the problem
An explanation support device according to an aspect of the present disclosure is a device that displays information related to explanation items to be checked during the speech of a user. The explanation support device includes an acquisition unit, a control unit, and a display unit. The acquisition unit acquires input information indicating a speech sentence based on the speech. The control unit generates information indicating the result of checking the explanation items against the speech sentence. The display unit displays the information generated by the control unit. The display unit displays a check list indicating whether or not the explanation items have been explained in the speech sentences indicated by the input information sequentially acquired by the acquisition unit. The display unit displays display information including each speech sentence in accordance with the likelihood, obtained for each speech sentence, on which the check result of the explanation items in the check list is based.
An explanation support method according to an aspect of the present disclosure is a method of displaying information relating to explanation items to be checked during the speech of a user. The method includes: a step in which an acquisition unit acquires input information indicating a speech sentence based on the speech; a step in which a control unit generates information indicating the result of checking the explanation items against the speech sentence; and a step in which a display unit displays the information generated by the control unit. The display unit displays a check list indicating whether or not the explanation items have been explained in the speech sentences indicated by the input information sequentially acquired by the acquisition unit. The display unit displays display information including each speech sentence in accordance with the likelihood, obtained for each speech sentence, on which the check result of the explanation items in the check list is based.
Effect of invention
With the explanation support device and the explanation support method according to the present disclosure, it becomes easy to assist, through information processing, in checking that a user has explained the explanation items.
Drawings
Fig. 1 is a diagram showing an outline of an explanation support system according to embodiment 1 of the present disclosure.
Fig. 2 is a block diagram illustrating an example of the configuration of the explanation support device in the explanation support system.
Fig. 3 is a block diagram illustrating an example of the configuration of a language processing server in the explanation support system.
Fig. 4 is a diagram showing a display example in the explanation support device of embodiment 1.
Fig. 5 is a diagram showing a display example in the explanation support device following that of fig. 4.
Fig. 6 is a flowchart for explaining the detection operation of the explanation support system according to embodiment 1.
Fig. 7 is a flowchart for explaining the examination display processing by the explanation support device.
Fig. 8 is a diagram for explaining history data in the explanation support device.
Fig. 9 is a diagram illustrating a display example of a business session list in the explanation support device.
Fig. 10 is a diagram illustrating a display example of a speech history screen in the explanation support device.
Fig. 11 is a diagram illustrating a display example of an examination history screen in the explanation support device.
Fig. 12 is a flowchart for explaining processing relating to the detection history in the explanation support device.
Fig. 13 is a flowchart for explaining the detection operation of the explanation support system according to embodiment 2.
Fig. 14 is a diagram showing a display example in the explanation support device of embodiment 2.
Detailed Description
Hereinafter, the embodiments will be described in detail with reference to the accompanying drawings as appropriate. However, unnecessarily detailed explanation may be omitted. For example, detailed descriptions of well-known matters and repeated descriptions of substantially the same configurations may be omitted. This is to avoid the following description becoming unnecessarily redundant and to facilitate understanding by those skilled in the art.
In addition, the applicant provides the drawings and the following description for those skilled in the art to fully understand the present disclosure, and does not intend to limit the subject matter described in the claims by these drawings.
(embodiment mode 1)
Hereinafter, embodiment 1 of the present disclosure will be described with reference to the drawings.
1. Configuration
1-1. System overview
The description support system according to embodiment 1 will be described with reference to fig. 1. Fig. 1 is a diagram illustrating an outline of the support system 1 according to the present embodiment.
As shown in fig. 1, the present system 1 includes an explanation support device 2, a language processing server 3, and a voice recognition server 11. The present system 1 automatically detects whether or not the user 4, who is serving the customer 40, has appropriately spoken important matters (i.e., explanation items) when, for example, explaining a product or concluding a deal, and visualizes the check result of the business session.
As shown in fig. 1, the explanation support device 2 according to the present embodiment communicates with a customer terminal 41 held by the customer 40 of the user 4 and with the various servers 3 and 11 via a communication network 10 such as a public telephone network or the internet. The present system 1 is applicable to information assistance when a user 4 such as an operator gives various explanations to a customer 40, for example in a call center or a remote customer service system.
The following describes the configuration of the support device 2 and the various servers 3 and 11 in the present system 1.
1-2. Configuration of the explanation support device
The configuration of the description assisting apparatus 2 in the present system 1 will be described with reference to fig. 2. Fig. 2 is a block diagram illustrating an example of the configuration of the assisting apparatus 2.
The explanation support device 2 includes, for example, an information terminal such as a personal computer, a tablet terminal, or a smartphone. The explanation support device 2 illustrated in fig. 2 includes a control unit 20, a storage unit 21, an operation unit 22, a display unit 23, a device interface 24, and a network interface 25. Hereinafter, the interface is abbreviated as "I/F". In this example, the explanation support device 2 further includes a microphone 26 and a speaker 27.
The control unit 20 includes, for example, a CPU or MPU that implements a predetermined function in cooperation with software, and controls the overall operation of the support apparatus 2. The control unit 20 reads the data and the program stored in the storage unit 21, performs various arithmetic processes, and realizes various functions. For example, the control unit 20 executes a program including a command group for realizing various processes of the support apparatus 2 in the present system 1. The program is, for example, an application program, and may be provided from the communication network 10 or the like, or may be stored in a portable recording medium.
The control unit 20 may be a hardware circuit such as a dedicated electronic circuit or a reconfigurable electronic circuit designed to realize a predetermined function. The control unit 20 may include various semiconductor integrated circuits such as a CPU, MPU, GPU, GPGPU, TPU, microcomputer, DSP, FPGA, and ASIC.
The storage unit 21 is a storage medium that stores programs and data necessary for realizing the functions of the support apparatus 2. As shown in fig. 2, the storage unit 21 includes a storage unit 21a and a temporary storage unit 21b.
The storage unit 21a stores parameters, data, control programs, and the like for realizing predetermined functions. The storage unit 21a includes, for example, an HDD or an SSD. For example, the storage unit 21a stores the program, data indicating the explanatory matters to be checked in the present system 1, and the like.
The temporary storage unit 21b includes, for example, a RAM such as a DRAM or an SRAM, and temporarily stores (i.e., holds) data. For example, the temporary storage unit 21b may function as an operating area of the control unit 20, or may be configured as a storage area in an internal memory of the control unit 20.
The operation unit 22 is a user interface device operated by a user. The operation unit 22 includes, for example, a keyboard, a mouse, a touch panel, buttons, switches, and combinations thereof. The operation unit 22 is an example of an acquisition unit that acquires various pieces of information input by a user operation.
The display section 23 includes, for example, a liquid crystal display or an organic EL display. The display unit 23 displays information indicating the result of the inspection by the present system 1, for example. The display unit 23 may display various information such as various icons for operating the operation unit 22 and information input from the operation unit 22.
The device I/F24 is a circuit for connecting an external device to the explanation support device 2. The device I/F24 is an example of a communication unit that performs communication in accordance with a predetermined communication standard. The predetermined standards include USB, HDMI (registered trademark), IEEE1394, WiFi, Bluetooth (registered trademark), and the like. The device I/F24 may constitute an acquisition unit that receives various pieces of information from the external device in the explanation support device 2.
The network I/F25 is a circuit for connecting the explanation support device 2 to the communication network 10 via a wireless or wired communication line. The network I/F25 is an example of a communication unit that performs communication in accordance with a predetermined communication standard. The predetermined communication standards include communication standards such as IEEE802.3, IEEE802.11a/11b/11g/11ac, and 3G or 4G mobile communication. The network I/F25 may constitute an acquisition unit that receives each piece of information via the communication network 10 in the explanation support device 2.
The microphone 26 is an input device for picking up sound and acquiring sound data of the picked-up sound. The microphone 26 is an example of the acquisition unit in the present embodiment. The microphone 26 and the speaker 27 constitute an earphone used by the user 4, for example, as illustrated in fig. 1.
The speaker 27 is an output device for outputting audio data, and is an example of an output unit in the present embodiment. The microphone 26 and the speaker 27 may be provided externally to the information terminal constituting the explanation support device 2 or may be provided internally to the information terminal.
The above description of the configuration of the assisting apparatus 2 is an example, and the configuration of the assisting apparatus 2 is not limited to this. The support device 2 may include various computers not limited to information terminals. Note that the acquisition unit in the support device 2 may be realized by cooperation with various software in the control unit 20 and the like. The acquisition unit in the auxiliary device 2 may acquire each piece of information by reading each piece of information stored in each storage medium (for example, the storage unit 21a) into a work area (for example, the temporary storage unit 21b) of the control unit 20.
1-3. Server architecture
As an example of the hardware configuration of the various servers 3 and 11 in the present system 1, the configuration of the language processing server 3 will be described with reference to fig. 3. Fig. 3 is a block diagram illustrating the structure of the language processing server 3 in the present system 1.
The language processing server 3 illustrated in fig. 3 includes an arithmetic processing unit 30, a storage unit 31, and a communication unit 32. The language processing server 3 includes one or more computers.
The arithmetic processing unit 30 includes, for example, a CPU and a GPU that implement predetermined functions in cooperation with software, and controls the operation of the language processing server 3. The arithmetic processing unit 30 reads the data and the programs stored in the storage unit 31 and performs various arithmetic processes to realize various functions.
For example, the arithmetic processing unit 30 executes the learning model 35 as a program for performing the natural language processing that detects the explanation items described later. The learning model 35 includes a neural network such as a feed-forward neural language model, and includes an input layer, one or more intermediate layers, and an output layer. For example, the output layer of the learning model 35 includes a plurality of nodes corresponding to the plurality of explanation items, and outputs the likelihood of each explanation item.
Further, the arithmetic processing unit 30 may execute word embedding for generating an input vector to be input to the learning model 35, for example by word2vec or the like. Alternatively, the learning model 35 may itself include word embedding. The arithmetic processing unit 30 may also execute a program for performing machine learning of the learning model 35. The various programs described above may be provided from the communication network 10 or the like, or may be stored in a portable recording medium.
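As a rough illustration only, the learning model 35 could be realized as a small feed-forward network that maps a sentence vector to one likelihood per explanation item. The patent does not disclose the actual architecture, layer sizes, or training procedure, so everything in the sketch below (dimensions, class and function names, the use of NumPy, the random placeholder weights) is an assumption.

```python
import numpy as np

class ExplanationItemModel:
    """Sketch of a feed-forward model that outputs one likelihood (0..1)
    per explanation item for an input sentence vector. In practice the
    weights would come from machine learning; random values are used here
    only as placeholders."""

    def __init__(self, embed_dim: int, hidden_dim: int, num_items: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(embed_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(scale=0.1, size=(hidden_dim, num_items))
        self.b2 = np.zeros(num_items)

    def forward(self, sentence_vector: np.ndarray) -> np.ndarray:
        h = np.tanh(sentence_vector @ self.w1 + self.b1)   # intermediate layer
        logits = h @ self.w2 + self.b2                     # one output node per item C1..C10
        return 1.0 / (1.0 + np.exp(-logits))               # per-item likelihood in [0, 1]

# Example: 10 explanation items and a 300-dimensional word2vec-style sentence vector.
model = ExplanationItemModel(embed_dim=300, hidden_dim=64, num_items=10)
print(model.forward(np.zeros(300)).round(2))
```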
The arithmetic processing unit 30 may be a hardware circuit such as a dedicated electronic circuit or a reconfigurable electronic circuit designed to realize a predetermined function. The arithmetic processing unit 30 may include various semiconductor integrated circuits such as a CPU, GPU, TPU, MPU, microcomputer, DSP, FPGA, and ASIC.
The storage unit 31 is a storage medium that stores programs and data necessary for realizing the functions of the language processing server 3, and includes, for example, an HDD or an SSD. The storage unit 31 includes, for example, a DRAM or an SRAM, and may function as an operation area of the arithmetic processing unit 30. The storage unit 31 stores, for example, various dictionaries relating to terms and expressions in natural language processing by the learning model 35, and various parameter groups and programs that define the learning model 35. The parameter group includes, for example, various weighting parameters of the neural network. The storage unit 31 may store training data and a program for performing machine learning of the learning model 35.
The communication unit 32 is an I/F circuit for performing communication according to a predetermined communication standard, and communicatively connects the language processing server 3 to the communication network 10, an external device, or the like. The predetermined communication standards include IEEE802.3, IEEE802.11a/11b/11g/11ac, USB, HDMI, IEEE1394, WiFi, Bluetooth, and the like.
The speech recognition server 11 has a similar configuration to that of the language processing server 3 described above, and includes, for example, a speech recognition model that realizes a function of speech recognition in place of the learning model 35. The voice recognition model can be configured in various ways, and may include various neural networks that are machine-learned, for example.
The various servers 3 and 11 in the present system 1 are not limited to the above configuration, and may have various configurations. The present system 1 may also be implemented in cloud computing. Further, hardware resources for realizing the functions of the various servers 3 and 11 may be shared. Further, functions of one or both of the various servers 3 and 11 may be installed in the explanation assistance device 2.
2. Operation
Next, the operation of the support system 1 and the support apparatus 2 configured as described above will be described.
2-1. Outline of operation
An outline of the operation of the support system 1 and the operation of the support device 2 according to the present embodiment will be described with reference to fig. 1 to 5.
In the present system 1, as shown in fig. 1 for example, the explanation support device 2 performs a detection operation of checking the content of the speech of the user 4 while voice communication for a conversation between the user 4 and the customer 40 is being performed. The explanation support device 2 displays information that visualizes the detection result of the present system 1 to the user 4. A display example of the explanation support device 2 is illustrated in fig. 4.
Fig. 4 shows an example of the real-time display during the detection operation for a conversation of the user 4. In this display example, the display unit 23 of the explanation support device 2 displays various operation buttons 5, a check list 50, and a speech list 55. The operation buttons 5 include, for example, a voice recognition button, a reset button, a setting button, and an application exit button, which are operated by clicking via the operation unit 22.
The check list 50 includes a plurality of explanation items C1 to C10 and a check box 51 associated with each of the explanation items C1 to C10. The explanation items C1 to C10 are items to be checked during the speech of the user 4, and are set in advance. The number of explanation items C1 to C10 is not particularly limited and can be set as appropriate. Hereinafter, the explanation items C1 to C10 may be collectively referred to as "explanation item C". Each check box 51 has a selected (ON) state with a check symbol 52 and a cleared (OFF) state without the check symbol 52. The check list 50 indicates, by the selected/cleared state of each check box 51, whether or not the corresponding explanation item C has been explained.
The speech list 55 sequentially displays information on speech sentences, for example from the latest speech recognition result back over a predetermined number of past utterances. The speech list 55 includes a number column 56, a speech sentence column 57, and a remarks column 58. The number column 56 indicates the order in which utterances were recognized by the present system 1. The speech sentence column 57 shows the speech sentence of the voice recognition result. The remarks column 58 shows notes and the like corresponding to the speech sentence. The speech list 55 is an example of display information in the present embodiment.
Fig. 4 shows a display example for the case where the user 4 has said "you can get into ABC card". At this time, the present system 1 performs voice recognition on the speech sentence 53 having the above content, and detects that the speech sentence 53 explains the explanation item C1 of "enrollment guidance". The explanation support device 2 changes the check box 51 of the explanation item C1 from the cleared state to the selected state, and displays the result on the display unit 23 as shown in fig. 4.
The present system 1 repeats the above detection operation every time the user 4 speaks, for example, and updates the display of the display unit 23 in real time. Fig. 5 shows a display example of the display unit 23 after the user 4 has spoken repeatedly. With the present system 1, the user 4 can confirm, for example, that the explanation items C1 to C7, C9, and C10 have been explained by his or her speech in the conversation with the customer 40 and that the explanation item C8 has not been explained, which can assist the business activities of the user 4 and the like.
When performing the above detection operation, the present system 1 applies, for example, natural language processing by the learning model 35 to each speech sentence, and calculates a likelihood for each explanation item C. The likelihood indicates the degree to which the corresponding speech sentence is detected as explaining the explanation item C, and has a value in the range of 0 to 1, for example. Visualizing such information about the detection process for the user 4 is considered useful for employing the present system 1 to assist the user 4 more appropriately.
Therefore, in the present embodiment, the explanation support device 2 displays information corresponding to the likelihood together with the corresponding speech sentence in the speech list 55, for example. For example, "enrollment guidance [99%]" in the remarks column 58 of fig. 4 indicates that the likelihood of the explanation item C1 of "enrollment guidance" for the corresponding speech sentence 53 is "0.99". Thus, while the present system 1 obtains the check results of the check list 50, the user 4 can confirm to what degree each utterance was detected as explaining the explanation items C1 to C10.
The explanation support device 2 according to the present embodiment can also display a history for the user 4 to confirm the detection results of the present system 1 after the real-time detection operation. The result of this confirmation by the user 4 is useful, for example, for improving the detection accuracy of the present system 1. The operations of the present system 1 and the explanation support device 2 will be described in detail below.
2-2. Detection operation of the explanation support system
The detection operation of the support system 1 according to the present embodiment will be described with reference to fig. 6. Fig. 6 is a flowchart for explaining the detection operation of the present system 1.
Each process shown in the flowchart of fig. 6 is executed by the control unit 20 of the explanation support device 2 in the present system 1. The flowchart starts when, for example, an operation for starting voice recognition is performed via the operation unit 22 on the operation buttons 5 displayed on the display unit 23 (see fig. 4). Further, at the start of the present flowchart, all the check boxes 51 in the check list 50 are, for example, set to the cleared state.
First, the control unit 20 of the support apparatus 2 acquires sound data indicating a sound based on the speech of the user 4 from the microphone 26 (S1). The microphone 26 collects sound during a conversation of the user 4, for example, and generates sound data. The sound data of the sound pickup result is an example of the input information in the present embodiment.
Next, the control unit 20 acquires a speech sentence indicating a result of speech recognition of the speech by communication with the speech recognition server 11 (S2). At this time, the control unit 20 transmits the input voice data to the voice recognition server 11 via the network I/F25.
The speech recognition server 11 executes processing based on the speech recognition model based on the speech data from the explanation assisting apparatus 2, generates text data of a spoken sentence, and transmits the text data to the explanation assisting apparatus 2. The processing based on the voice recognition model includes speech segmentation for voice data and various voice recognition processes. When the control unit 20 of the explanation assistance device 2 receives the speech sentence from the voice recognition server 11 via the network I/F25 (S2), the received speech sentence and the corresponding voice data are recorded in the storage unit 21.
Next, the control unit 20 acquires likelihood information including the likelihood of each explanatory item C with respect to the acquired speech sentence through communication with the language processing server 3 (S3). At this time, the control unit 20 transmits the acquired speech sentence to the language processing server 3 via the network I/F25.
When receiving the speech sentence from the explanation support device 2, the language processing server 3 executes natural language processing based on the learning model 35, generates the likelihood information, and transmits it to the explanation support device 2. In this natural language processing, for example, the received speech sentence is converted into an input vector by word embedding and input to the learning model 35. The learning model 35 has been machine-trained so that the likelihood output for each explanation item C on the basis of the input vector indicates the degree to which the corresponding speech sentence is predicted to explain that explanation item C.
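Building on the model sketch above, the likelihood information returned in step S3 could be produced roughly as follows. Averaging word vectors is only one simple choice of word embedding, and the WORD_VECTORS table, the item names, and the function names are hypothetical.

```python
import numpy as np

EMBED_DIM = 300
WORD_VECTORS: dict[str, np.ndarray] = {}        # hypothetical word2vec-style lookup table
ITEM_NAMES = [f"C{i}" for i in range(1, 11)]    # explanation items C1..C10

def sentence_to_vector(sentence: str) -> np.ndarray:
    """One simple word-embedding choice: average the word vectors of the sentence."""
    vectors = [WORD_VECTORS[w] for w in sentence.split() if w in WORD_VECTORS]
    return np.mean(vectors, axis=0) if vectors else np.zeros(EMBED_DIM)

def likelihood_info(sentence: str, model) -> dict:
    """Server-side step S3: return the likelihood for each explanation item."""
    probs = model.forward(sentence_to_vector(sentence))
    return {name: float(p) for name, p in zip(ITEM_NAMES, probs)}
```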
Next, the control unit 20 of the explanation support device 2 executes the examination display processing based on the acquired speech sentence and the likelihood information (S4). The examination display processing checks, on the basis of the likelihood information, whether or not each of the explanation items C1 to C10 has been explained by the speech sentence of the voice recognition result, and displays the check result as shown in fig. 4, for example. The details of the examination display processing will be described later.
The control unit 20 determines whether or not the detection operation of the present system 1 is completed based on, for example, the operation of the operation button 5 (S5). If the control unit 20 determines that the detection operation has not been completed (no in S5), it returns to step S1 to execute the processing from step S1 on the new speech. The user 4 performs an operation of ending the detection operation when, for example, the session with the customer 40 is ended.
When determining that the detection operation is completed (yes in S5), the control unit 20 generates history data indicating the history of the detection operation and stores the history data in the storage unit 21 (S6). The history data will be described later (see fig. 8).
When the history data is stored in the storage unit 21 (S6), the control unit 20 ends the processing according to the present flowchart.
According to the above processing, every time the user 4 speaks (S1), the likelihood is calculated for the speech sentence of the voice recognition result (S2, S3), and the check results for the explanation items C1 to C10 are displayed in real time (S4).
If the length of the speech sentence acquired in step S2 is shorter than a predetermined value, the control unit 20 may omit the process of step S3. The predetermined value may be set to a number of characters or words below which the utterance can be regarded as containing no explanation of the explanation items C1 to C10. This can reduce the processing load for speech that is not related to the explanation of the explanation items C1 to C10, such as short interjections in a conversation.
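A minimal sketch of the per-utterance loop of fig. 6 (S1 to S6) follows, under the assumption that the calls to the voice recognition server 11 and the language processing server 3 are wrapped in the callables recognize_speech and get_likelihoods. These names, the MIN_SENTENCE_LENGTH value, and the dictionary layout of the history are placeholders, not the actual implementation.

```python
MIN_SENTENCE_LENGTH = 5   # assumed number of characters below which step S3 is skipped

def detection_loop(microphone, recognize_speech, get_likelihoods, review_and_display, is_finished):
    """Sketch of the per-utterance loop of fig. 6 (S1-S6)."""
    history = []
    speech_number = 0
    while not is_finished():                          # S5: end of the detection operation?
        audio = microphone.read()                     # S1: sound data of one utterance
        sentence = recognize_speech(audio)            # S2: speech recognition result (text)
        if len(sentence) < MIN_SENTENCE_LENGTH:
            continue                                  # skip filler unrelated to items C1-C10
        likelihoods = get_likelihoods(sentence)       # S3: likelihood per explanation item
        review_and_display(sentence, likelihoods)     # S4: examination display processing (fig. 7)
        speech_number += 1
        history.append({"speech_number": speech_number, "audio": audio,
                        "sentence": sentence, "likelihoods": likelihoods,
                        "user_evaluation": None})
    return history                                    # S6: history data for this business session
```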
2-2-1. examination display processing
The details of the examination display processing (S4 in fig. 6) will be described with reference to fig. 7.
Fig. 7 is a flowchart for explaining the examination display processing by the explanation support device 2. The flowchart of fig. 7 starts in a state where one speech sentence is acquired in step S2 of fig. 6, and likelihood information for the speech sentence is acquired in step S3.
First, the control unit 20 of the explanation support device 2 selects one explanation item C to be checked from the plurality of preset explanation items C1 to C10 (S11). In the flowchart of fig. 7, in order to check all of the explanation items C1 to C10 against one speech sentence, the explanation items C are sequentially selected one at a time in step S11.
Next, the control unit 20 determines whether or not the likelihood in the acquired likelihood information exceeds the detection threshold V1 for the explanatory item C being selected (S12). The detection threshold V1 is a threshold indicating a reference for detecting that the corresponding explanatory item C is explained, and is set, for example, in consideration of the likelihood that the speech sentence explaining the explanatory item C has.
If it is determined that the likelihood of the selected explanatory item C exceeds the detection threshold V1 (yes in S12), the control unit 20 determines whether or not the check box 51 associated with the explanatory item C in the check list 50 is in the check state (S13). For example, if the user 4 has not described the selected item in the session and the corresponding check box 51 is in the clear state, the control unit 20 proceeds to no in step S13.
If it is determined that the check box 51 of the selected explanatory item C is not in the selected state (no in S13), the control unit 20 changes the check box 51 from the cleared state to the selected state, and updates the display of the check list 50 on the display unit 23 (S14). The update of the display in step S14 may be performed simultaneously with step S18.
Further, the control unit 20 holds the likelihood of the selected explanatory item C as a candidate to be displayed in the memo column 58 of the speech list 55 (S15). Specifically, the control unit 20 associates the descriptive item C being selected with the likelihood and holds the descriptive item C in the storage unit 21 as a display candidate.
On the other hand, if the check box 51 of the selected explanatory item C is in the selected state (yes in S13), the control unit 20 proceeds to step S15 without performing the process of step S14.
Further, if it is determined that the likelihood of the selected explanation item C does not exceed the detection threshold V1 (no in S12), the control unit 20 determines, for example, whether or not the likelihood exceeds the display threshold V2 (S16). The display threshold V2 is set to be smaller than the detection threshold V1, for example to a value within a predetermined width below the detection threshold V1. The display threshold V2 is a threshold indicating a criterion for displaying a speech sentence that may be related to the explanation item C even though its likelihood does not reach the detection threshold V1.
If it is determined that the likelihood of the selected explanatory item C exceeds the display threshold V2 (yes in S16), the control unit 20 holds the likelihood as a display candidate (S15). On the other hand, if it is determined that the likelihood does not exceed the display threshold V2 (no in S16), the control unit 20 proceeds to step S17 without performing the process of step S15.
The controller 20 determines whether or not all of the explanatory items C1 to C10 are selected as the inspection targets (S17). If all the explanatory items C1 to C10 are not selected (no in S17), the control unit 20 performs the processing of step S11 and subsequent steps on the unselected explanatory item C.
After selecting all the explanatory matters C1 to C10 and checking them (yes in S17), the controller 20 controls the display 23 so as to update and display the speech list 55 (S18). Specifically, the control unit 20 additionally displays a speech sentence in the speech sentence column 57 of the speech list 55 (see fig. 4). When the display candidate of the memo column 58 is held (S15), the control unit 20 additionally displays the held information in the memo column 58.
When the control unit 20 controls the display unit 23 so as to update the speech list 55 and the like (S18), the process of step S4 in fig. 6 is ended, and the process proceeds to step S5.
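The examination display processing of fig. 7 could be sketched as the following loop over the explanation items. The threshold values and function names are assumptions; only the decision logic of steps S11 to S18 described above is reproduced.

```python
DETECTION_THRESHOLD_V1 = 0.8   # assumed value of the detection threshold V1
DISPLAY_THRESHOLD_V2 = 0.5     # assumed value of the display threshold V2 (smaller than V1)

def review_one_utterance(sentence, likelihoods, checkboxes):
    """Sketch of steps S11-S18 for one speech sentence.
    `likelihoods` maps item -> likelihood; `checkboxes` maps item -> bool (check list 50)."""
    remark_candidates = []
    for item, likelihood in likelihoods.items():             # S11/S17: every explanation item
        if likelihood > DETECTION_THRESHOLD_V1:              # S12
            if not checkboxes[item]:                         # S13
                checkboxes[item] = True                      # S14: cleared -> selected
            remark_candidates.append((item, likelihood))     # S15: candidate for remarks column
        elif likelihood > DISPLAY_THRESHOLD_V2:              # S16: shown even though not detected
            remark_candidates.append((item, likelihood))     # S15
    remark = ", ".join(f"{item} [{p:.0%}]" for item, p in remark_candidates)
    return sentence, remark                                  # S18: new row of the speech list 55
```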
Through the above processing, each explanation item C can be checked on the basis of the likelihood information for the speech sentence of the voice recognition result corresponding to one utterance of the user 4. At this time, the explanation support device 2 changes the display of the speech list 55 in accordance with the likelihood. The user 4 can confirm the check result of his or her own speech in real time by means of the check list 50 and the speech list 55.
For example, for a speech sentence whose likelihood exceeds the detection threshold V1, the likelihood is displayed in the remarks column 58 of the speech list 55, and the corresponding check box 51 in the check list 50 is set to the selected state. Thus, the user 4 can confirm with which utterance, and to what degree, an explanation item C was detected as explained, and whether further speech is needed in the conversation after the check.
Further, even when the likelihood does not reach the detection threshold V1, the likelihood is displayed in the remarks column 58 for a speech sentence whose likelihood exceeds the display threshold V2. When the check box 51 is in the cleared state, the user 4 can see from the remarks column 58 that the explanation of the explanation item C is still insufficient.
Further, for a speech sentence whose likelihood is smaller than the display threshold V2, the likelihood is not displayed in the remarks column 58 of the speech list 55. Thus, for speech unrelated to any of the explanation items C1 to C10, such as chatting, the display in the remarks column 58 can be omitted.
In addition, through the above processing, when the likelihoods of a plurality of explanation items C exceed the detection threshold V1 for one speech sentence (yes in S12), a plurality of check boxes 51 can be updated to the selected state by that single speech sentence (S14). When the likelihoods of a plurality of explanation items C exceed the display threshold V2 for one speech sentence (yes in S16), the plurality of likelihoods are, for example, indicated together in the remarks column 58 (S16, S18).
2-3. History data
The description support device 2 according to the present embodiment accumulates history data in the storage unit 21 every time the above detection operation is performed (S6 in fig. 6). The history data will be described with reference to fig. 8.
Fig. 8 is a diagram for explaining history data D1 in the support apparatus 2. The history data D1 is managed for each "business session ID", for example. The "business session ID" is an ID for identifying a session in which the detection operation of the present system 1 is performed. The history data D1 is recorded by associating "speech number", "voice data", "speech sentence", "likelihood information", and "user evaluation information" as shown in fig. 8, for example.
The history data D1 and the detection threshold V1 used in the detection operation may be associated and managed in the storage unit 21. The detection threshold V1 may be managed for each of the explanatory items C1 to C10.
In the history data D1, the "speech number" indicates the order of speech to be subjected to speech recognition in a conversation identified by the business conversation ID. The "voice data" is voice data of a speech to be subjected to voice recognition, and is divided into files for each speech. The "speech sentence" indicates text data of a speech recognition result of speech corresponding to the sound data of the file of each speech number. The "likelihood information" contains the likelihood for each explanatory item C of the spoken sentence. The "user evaluation information" indicates the evaluation of the user 4 with respect to the detection result of the present system 1, as will be described later.
In the flowchart of fig. 6, the control unit 20 of the explanation support device 2 associates the speech sentence, the voice data, and the likelihood information acquired in each repetition of steps S1 to S5, sequentially assigns a speech number, and records them in the history data D1 (S6). At the time of step S6, the user evaluation information is not yet recorded and remains the null value "-".
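One possible in-memory layout for the history data D1 of fig. 8 is sketched below. The class and field names are assumptions; only the associations described above (speech number, voice data, speech sentence, likelihood information, user evaluation information, plus the business session ID and the detection threshold V1) are reproduced.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class UtteranceRecord:
    """One row of the history data D1 (fig. 8); names are assumptions."""
    speech_number: int                       # order of the recognized utterance
    audio_file: str                          # sound data file for this utterance
    sentence: str                            # speech sentence (voice recognition result)
    likelihoods: dict                        # explanation item -> likelihood
    user_evaluation: Optional[dict] = None   # item -> "Y"/"N"; None ("-") until reviewed

@dataclass
class BusinessSessionHistory:
    session_id: str                          # business session ID
    detection_threshold_v1: float            # threshold V1 used during the detection operation
    utterances: list = field(default_factory=list)
```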
2-4. Confirmation display of history
The support device 2 according to the present embodiment can perform various displays for allowing the user 4 to confirm the detection result based on the history data D1. The confirmation display of the history in the support apparatus 2 will be described with reference to fig. 9 to 12.
Fig. 9 shows a display example of the business session list 6 on the display unit 23 of the explanation support device 2. The business session list 6 is displayed, for example, in response to a confirmation operation performed via the operation buttons 5.
The business session list 6 manages information on the history of the detection operation performed by the present system 1, for example, for each business session ID of the history data D1. In the example of fig. 9, the business session list 6 includes an employee column 61, a time column 62, a guest column 63, an examination history icon 64, and a speech history icon 65.
In the business session list 6, the employee column 61 indicates the user 4 who was in the business session at the time of the detection operation by the present system 1. The time column 62 indicates the time at which the business session was conducted. The guest column 63 indicates the customer 40 in the business session at the time of the detection operation. The examination history icon 64 receives an operation for displaying the examination history screen. The examination history screen displays the final check list 50 from the detection operation by the present system 1. The speech history icon 65 receives an operation for displaying the speech history screen.
Fig. 10 shows a display example of the speech history screen on the display unit 23. The speech history screen displays, as the speech history, the speech sentences in the history data D1 of the business session corresponding to the operated speech history icon 65, each in association with a playback column 66 for reproducing the audio data. In the display example of fig. 10, a search field 67 is displayed on the speech history screen. The explanation support device 2 performs a keyword search over the speech sentences, for example, in accordance with an operation of the search field 67. The search range of the search field 67 may be specified in units of speech sentence rows.
Fig. 11 shows a display example of the examination history screen on the display unit 23. In the explanation support device 2 of the present embodiment, when an operation such as a double-click is performed on the text portion of any of the explanation items C1 to C10 in the check list 50 on the examination history screen, the detection history list 70 for the operated explanation item C is displayed as a pop-up. Fig. 11 shows an example of the state in which the detection history list 70 for the explanation item C2 of "confirmation contact" is displayed.
The detection history list 70 is a list of speech sentences that were detected, at the time of the business session, as explaining or possibly explaining the specified explanation item C. In the detection history list 70, the user can confirm not only the speech sentence whose check box 51 was set to the selected state during the detection operation of the present system 1, but also speech sentences detected as explaining the explanation item C in subsequent speech. The detection history list 70 is an example of display information in the present embodiment.
In the display example of fig. 11, the detection history list 70 displays a playback column 71, a speech sentence, a system detection box 72, and a user evaluation box 73 in association with one another. The system detection box 72 has a selected/cleared state indicating whether or not the corresponding speech sentence was detected as explaining the explanation item in the detection operation of the present system 1.
The user evaluation box 73 has, for example, a selected/cleared state indicating a correct/incorrect evaluation of the detection result shown in the system detection box 72. The selected/cleared state of the user evaluation box 73 can be changed by an operation of the user 4 such as a click.
The explanation support device 2 according to the present embodiment stores the information based on the user evaluation boxes 73 in the user evaluation information of the history data D1. The user evaluation information in the history data D1 can be used to improve the detection accuracy of the present system 1. For example, it can be used to adjust the detection threshold V1 of the present system 1, or as teaching data in machine learning such as active learning of the learning model 35.
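As one example of such use, the user evaluation information could be converted into labeled samples for retraining or active learning of the learning model 35. The sketch below assumes the history layout sketched after fig. 8 and is not part of the disclosed method.

```python
def collect_teaching_data(history):
    """Derive (sentence, item, label) triples from the user evaluation information
    in the history data D1, e.g. as teaching data for active learning of model 35."""
    samples = []
    for record in history.utterances:
        if not record.user_evaluation:
            continue                                 # "-": this utterance was not evaluated
        for item, verdict in record.user_evaluation.items():
            if verdict in ("Y", "N"):
                samples.append((record.sentence, item, 1 if verdict == "Y" else 0))
    return samples
```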
The processing of the support apparatus 2 in the above-described detection history list 70 will be described with reference to fig. 12. Fig. 12 is a flowchart for explaining processing based on the detection history of the support apparatus 2.
Each process shown in the flowchart of fig. 12 is executed by the control unit 20 of the explanation support device 2. The flowchart of fig. 12 starts when an operation specifying an explanation item C in the check list 50 is input via the operation unit 22 on the examination history screen.
First, the control unit 20 searches the history data D1 for speech sentences whose likelihood for the explanation item C specified by the operation of the user 4 exceeds the search threshold V3 (S21). The search threshold V3 is a threshold serving as a reference for the search for the specified explanation item, and is set to V3 = V1, for example. The search threshold V3 is not limited to this, and may be set as appropriate in a range of, for example, V2 or more and V1 or less.
Next, based on the search result in the history data D1, the control unit 20 generates the detection history list 70 so as to include the retrieved speech sentences, and causes the display unit 23 to display the detection history list 70, for example as a pop-up (S22). At this time, each system detection box 72 in the detection history list 70 is set to the selected or cleared state depending on whether or not the likelihood exceeds the detection threshold V1. In the initial state, the user evaluation boxes 73 are all set, for example, to the cleared state or to the selected state.
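Steps S21 and S22 could be sketched as follows; the concrete threshold values and the row layout of the detection history list 70 are assumptions.

```python
DETECTION_THRESHOLD_V1 = 0.8   # assumed value
SEARCH_THRESHOLD_V3 = 0.6      # assumed value; may be set anywhere from V2 up to V1

def build_detection_history(history, item):
    """Sketch of S21-S22: gather past speech sentences whose likelihood for the
    specified explanation item exceeds V3, and preset the system detection box 72."""
    rows = []
    for record in history.utterances:
        likelihood = record.likelihoods.get(item, 0.0)
        if likelihood > SEARCH_THRESHOLD_V3:                             # S21
            rows.append({
                "audio_file": record.audio_file,                         # playback column 71
                "sentence": record.sentence,
                "system_detected": likelihood > DETECTION_THRESHOLD_V1,  # system detection box 72
                "user_evaluation": False,                                # user evaluation box 73, initially cleared
            })
    return rows                                                          # S22: shown as a pop-up list
```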
Next, the control unit 20 receives operations on the detection history list 70 and executes control corresponding to the various operations (S23). For example, when a user evaluation box 73 is operated, the control unit 20 controls the display unit 23 so that the selected or cleared state of the operated user evaluation box 73 is switched and displayed. Further, when a playback column 71 is operated, the control unit 20 controls the speaker 27 so that the sound corresponding to the operated playback column 71 is reproduced.
The control unit 20 determines whether or not the operation of the detection history list 70 is finished, for example based on an operation of the close button 75 attached to the pop-up of the detection history list 70 (S24). The control unit 20 repeats step S23 until the operation of the detection history list 70 is finished (no in S24).
When the operation of the detection history list 70 is finished (yes in S24), the control unit 20 updates the history data D1 in accordance with, for example, the state of the user evaluation boxes 73 at the end of the operation (S25). In the history data D1 stored in the storage unit 21, the control unit 20 records "Y" or "N" in the user evaluation information according to the selected or cleared state of the user evaluation box 73 of each speech sentence at the end of the operation of the detection history list 70 (see fig. 8). At this time, columns that were not subject to evaluation in the user evaluation information remain "-".
The control unit 20 then closes the pop-up display of the detection history list 70 (S26), and ends the processing according to the present flowchart. The processing order of steps S25 and S26 is not particularly limited.
Through the above processing, the control unit 20 causes the display unit 23 to display the detection history list 70 including speech sentences in the history data D1, on the basis of their likelihood for the explanation item C designated by the user 4 (S22). In the detection history list 70, the user 4 can evaluate whether or not the detection result for each speech sentence related to the specified explanation item C is appropriate.
3. Summary of the invention
As described above, in the present embodiment, the explanation support device 2 displays information on the explanation items C1 to C10 to be checked during the speech of the user 4. The explanation support device 2 includes the microphone 26 as an example of an acquisition unit, the control unit 20, and the display unit 23. The microphone 26 acquires sound data as input information indicating a speech sentence based on the speech (S1). The control unit 20 generates information indicating the result of checking the explanation items against the speech sentence (S4). The display unit 23 displays the information generated by the control unit 20 (S14, S18, S22). The display unit 23 displays the check list 50 indicating whether or not the explanation items C1 to C10 have been explained in the speech sentences indicated by the input information sequentially acquired by the microphone 26. The display unit 23 displays the speech list 55 or the detection history list 70 as display information including each speech sentence, in accordance with the likelihood, obtained for each speech sentence, on which the check result of the explanation items C in the check list 50 is based (S18, S22).
The explanation support device 2 described above displays the check list 50 for the explanation items C to be checked, and display information including the speech sentences according to their likelihood. This makes it easy to assist, through information processing, in checking that the user 4 has explained the explanation items C.
In the present embodiment, the check list 50 indicates whether or not each explanation item C has been explained, based on the likelihood of each speech sentence. The user 4 can confirm, from the likelihood in the display information, how the check result of the check list 50 was obtained, which facilitates information assistance for the user 4.
In the present embodiment, the display unit 23 updates the speech list 55 (S18) every time input information is acquired from the microphone 26 (S1). This enables the user 4 to confirm the detection results of the present system 1 in real time while speaking.
In the present embodiment, the speech list 55 as display information includes the speech sentence column 57 indicating each speech sentence and the remarks column 58 indicating its likelihood. The user 4 can confirm the speech sentences in the speech list 55 while comparing them against the magnitude of the likelihood.
In the present embodiment, the explanation support device 2 further includes the storage unit 21 that records the history data D1 in which the speech sentences and the likelihoods are associated with each other. The control unit 20 generates the detection history list 70 as display information based on the history data D1 recorded in the storage unit 21 (S22). This enables the user 4 to check the detection results of the present system 1 afterwards. The detection history list 70 may include the playback column 71 for reproducing the audio data and a display of the speech sentences for the selected explanation item C.
In the present embodiment, the detection history list 70 includes speech sentences in the history data D1 and, for each of them, a system detection box 72 indicating whether or not the associated likelihood exceeds the predetermined detection threshold V1. The system detection box 72 allows the user 4 to easily check the detection results of the present system 1 in the detection history list 70.
In the present embodiment, the explanation support device 2 further includes the operation unit 22, which receives a user operation for evaluating, in the user evaluation box 73, the check result of the explanation item C for each speech sentence in the detection history list 70. This makes it possible to obtain information indicating the user 4's evaluation of the detection results of the present system 1, which facilitates the operation of the present system 1.
In the present embodiment, the acquisition unit of the explanation support device 2 includes the microphone 26, which acquires audio data as the input information. The speech sentence represents the result of voice recognition of the audio data. The detection operation of the present system 1 can thus be performed based on the voice uttered by the user 4.
The explanation support method in the present embodiment is a method of displaying information on the explanation items C1 to C10 to be checked during the speech of the user 4. The method includes: step S1, in which the acquisition unit acquires input information indicating a speech sentence based on the speech; step S4, in which the control unit 20 generates information indicating the result of checking the explanation items against the speech sentence; and steps S14, S18, and S22, in which the display unit 23 displays the information generated by the control unit 20. The display unit 23 displays the check list 50 indicating whether or not the explanation items C1 to C10 have been explained in the speech sentences indicated by the input information sequentially acquired by the acquisition unit. The display unit 23 displays display information including each speech sentence in accordance with the likelihood, obtained for each speech sentence, on which the check result of the explanation items C in the check list 50 is based.
In the present embodiment, a program for causing the control unit 20 of a computer to execute the above explanation support method is also provided. With the explanation support method according to the present embodiment, it is easy to assist, through information processing, in checking that the user 4 has explained the explanation items C.
(embodiment mode 2)
Hereinafter, embodiment 2 will be described with reference to the drawings. Embodiment 1 describes the explanation support system 1 that detects whether or not an explanation item is explained in the speech of the user 4. Embodiment 2 further describes the explanation support system 1 that also detects the presence or absence of an NG phrase in the speech of the user 4.
Hereinafter, the explanation support system 1 and the explanation support device 2 according to the present embodiment will be described, while descriptions of configurations and operations similar to those of the explanation support system 1 according to embodiment 1 are omitted as appropriate.
Fig. 13 is a flowchart for explaining the detection operation of the explanation support system 1 according to embodiment 2. As shown in Fig. 13, the explanation support system 1 according to the present embodiment executes processing for detecting NG phrases (S31 to S33) in addition to processing similar to that of Fig. 6.
The control unit 20 of the explanation support device 2 in this embodiment detects whether or not the speech sentence is a preset NG phrase (i.e., a prohibited phrase) based on the speech sentence or the likelihood information acquired in steps S2 and S3 (S31). The determination in step S31 may be performed by detecting a keyword of a predetermined NG phrase in the speech sentence. Alternatively, the learning model 35 may be trained by machine learning so as to output, together with the likelihoods of the explanation items C1 to C10, a likelihood indicating the degree to which the speech sentence is predicted to be an NG phrase.
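Both routes for the judgment of step S31 can be sketched as follows; the keyword list, the threshold value, and the function names are illustrative assumptions only and are not taken from the patent.

```python
from typing import Iterable

# Illustrative sketch of the NG-phrase judgment of step S31, showing the
# two routes mentioned above. The keyword list, the threshold and the
# function names are assumptions, not taken from the patent.

NG_KEYWORDS = ["promise a larger profit", "guaranteed profit"]  # assumed examples
NG_LIKELIHOOD_THRESHOLD = 0.5  # assumed threshold


def is_ng_by_keyword(sentence: str, keywords: Iterable[str] = NG_KEYWORDS) -> bool:
    """Keyword route: detect a predetermined NG phrase in the speech sentence."""
    lowered = sentence.lower()
    return any(keyword in lowered for keyword in keywords)


def is_ng_by_likelihood(ng_likelihood: float,
                        threshold: float = NG_LIKELIHOOD_THRESHOLD) -> bool:
    """Model route: the learning model outputs an NG likelihood alongside
    the likelihoods of the explanation items; compare it with a threshold."""
    return ng_likelihood > threshold
```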
When detecting that the speech sentence is not an NG phrase (NO in S31), the control unit 20 transmits the audio data corresponding to the speech sentence from the explanation support device 2 to the customer terminal 41 (S32). For example, the control unit 20 buffers the audio data acquired in step S1 until the judgment of step S31 is completed.
On the other hand, when detecting that the speech sentence is an NG phrase (YES in S31), the control unit 20 controls, for example, the network I/F 25 so that transmission of the audio data from the explanation support device 2 to the customer terminal 41 is cut off (S33). Thus, when the user 4 is detected to have uttered an NG phrase, the audio containing the NG phrase can be prevented from being heard by the customer 40.
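A minimal sketch of this buffer-and-gate behavior (steps S32 and S33) is shown below; the class, the transport callback, and the buffering strategy are assumptions made for illustration and are not the patent's implementation.

```python
from typing import Callable, List

# Illustrative sketch of steps S32/S33: audio acquired in step S1 is
# buffered until the judgment of step S31 completes, and is forwarded to
# the customer terminal 41 only when no NG phrase was detected. The class,
# the transport callback and the buffering strategy are assumptions.


class AudioGate:
    def __init__(self, send_to_customer: Callable[[bytes], None]):
        self._send = send_to_customer
        self._buffer: List[bytes] = []

    def buffer_chunk(self, chunk: bytes) -> None:
        """Hold audio data until the judgment of step S31 is available."""
        self._buffer.append(chunk)

    def on_judgment(self, is_ng: bool) -> None:
        """S32: forward buffered audio when no NG phrase was detected.
        S33: otherwise discard it so the NG phrase never reaches the customer 40."""
        if not is_ng:
            for chunk in self._buffer:
                self._send(chunk)
        self._buffer.clear()
```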
Fig. 14 shows a display example of the explanation support device 2 according to the present embodiment. When an NG phrase is detected (YES in S31), the explanation support device 2 may display a warning about the NG phrase to the user 4 in the examination display processing (S4). Fig. 14 shows a display example in which the speech sentence 54 "can promise a larger profit" is detected as an NG phrase. In this display example, the display unit 23 displays a warning text in the remark column 58 corresponding to the speech sentence 54. This can prompt the user 4 to pay attention when an NG phrase is detected.
As described above, in the present embodiment, the explanation support device 2 further includes a communication unit, such as the network I/F 25 or the machine I/F 24, that transmits information indicating the speech sentence to the outside. When an NG phrase, which is a predetermined prohibited phrase, is detected in a speech sentence, the control unit 20 controls the communication unit so that transmission of the information indicating the speech sentence in which the NG phrase was detected is cut off. Information indicating the NG phrase can thus be selectively prevented from being transmitted to the outside, while information assistance is provided to the user 4.
(other embodiments)
As described above, embodiments 1 and 2 have been described as examples of the technique disclosed in the present application. However, the technique of the present disclosure is not limited to these, and is also applicable to embodiments in which changes, substitutions, additions, omissions, and the like are made as appropriate. Further, the components described in the above embodiments may be combined to form a new embodiment. Accordingly, other embodiments are exemplified below.
In each of the above embodiments, the explanation support device 2 of the explanation support system 1 performs voice communication with the customer terminal 41. The explanation support device 2 of the present embodiment is not limited to voice communication in particular, and may perform various kinds of data communication.
In the above embodiments, the explanation support device 2 of the explanation support system 1 has been described as communicating with the customer terminal 41, but the explanation support device 2 of the present embodiment does not necessarily communicate with the customer terminal 41. The present system 1 can also be applied to various forms of face-to-face customer service, such as the counter of a financial institution. In this case, the explanation support device 2 can be configured to appropriately recognize the speech of the user 4 and the speech of the customer 40.
In the above embodiments, the input information of the explanation support device 2 is audio data. In the present embodiment, the input information of the explanation support device 2 may be text data rather than audio data. In this case, the present system 1 can be applied, for example, to various electronic conferences.
As described above, the embodiments have been described as examples of the technique in the present disclosure, and the accompanying drawings and detailed description are provided for that purpose.
Accordingly, the components described in the drawings and the detailed description may include, in order to exemplify the above technique, not only components essential for solving the problem but also components that are not essential. The fact that such non-essential components appear in the drawings and the detailed description should therefore not be taken to mean that they are essential.
Further, the above-described embodiments are intended to exemplify the technology in the present disclosure, and various modifications, substitutions, additions, omissions, and the like can be made within the scope of the claims and their equivalents.
Industrial applicability
The present disclosure can be applied to information assistance when a user gives various explanations, and can be applied to, for example, a call center, a remote customer service system, or various forms of face-to-face customer service.

Claims (12)

1. An explanation assistance device for displaying information relating to an explanation item of an examination subject during speech of a user, the explanation assistance device comprising:
an acquisition unit that acquires input information indicating a speech sentence based on the speech;
a control unit that generates information indicating a result of checking the explanatory item related to the speech sentence; and
a display unit for displaying the information generated by the control unit,
the display unit displays a check list indicating whether or not the explanatory item is explained in the speech sentence indicated by the input information sequentially acquired by the acquisition unit,
the display unit displays display information including the speech sentence on the basis of the likelihood of each speech sentence that defines the examination result of the explanatory item in the examination list.
2. The explanation assistance device according to claim 1, wherein,
the check list indicates whether the explanatory item is explained based on the likelihood of each of the speech sentences.
3. The explanation assistance device according to claim 1 or 2, wherein,
the display unit updates the display information each time the input information is acquired from the acquisition unit.
4. The explanation assistance device according to any one of claims 1 to 3, wherein,
the display information includes information indicating the speech sentence and the magnitude of the likelihood of the speech sentence.
5. The explanation assistance device according to any one of claims 1 to 4, wherein,
the explanation support device further includes: a storage unit that records history data in which the speech sentence and the likelihood are associated with each other,
the control unit generates the display information based on history data recorded in the storage unit.
6. The explanation assistance device according to claim 5, wherein,
the display information includes: and information indicating whether or not the associated likelihood exceeds a predetermined threshold for each speech sentence in the history data.
7. The explanation assistance device according to claim 5, wherein,
the display information includes: a reproduction section for reproducing the audio data related to the selected explanatory item, and a display of the speech sentence related to the selected explanatory item.
8. The explanation assistance device according to any one of claims 1 to 7, wherein,
the explanation support device further includes: and an operation unit that inputs an operation by a user for evaluating an examination result of the explanatory item for each speech sentence in the display information.
9. The explanation assistance device according to any one of claims 1 to 8, wherein,
the explanation support device further includes: a communication unit for transmitting information indicating the speech sentence to the outside,
the control unit controls the communication unit such that: when a predetermined prohibited phrase is detected in the speech sentence, transmission of information indicating the speech sentence in which the prohibited phrase was detected is cut off.
10. The explanation assistance device according to any one of claims 1 to 9, wherein,
the acquisition unit includes a microphone for acquiring audio data as the input information,
the speech sentence represents a voice recognition result of the audio data.
11. An explanation assistance method for displaying information relating to an explanation item of an examination subject in speech of a user, the explanation assistance method comprising:
a step in which an acquisition unit acquires input information indicating a speech sentence based on the speech;
a step in which the control unit generates information indicating an examination result of the explanatory item related to the speech sentence; and
a step of displaying information generated by the control unit on a display unit,
the display unit displays a check list indicating whether or not the explanatory item is explained in the speech sentence indicated by the input information sequentially acquired by the acquisition unit,
the display unit displays display information including the speech sentence on the basis of the likelihood of each speech sentence that defines the examination result of the explanatory item in the examination list.
12. A program for causing a computer to execute the method of claim 11.
CN201980039801.XA 2018-09-27 2019-09-18 Description support device and description support method Pending CN112334923A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2018182534A JP7142315B2 (en) 2018-09-27 2018-09-27 Explanation support device and explanation support method
JP2018-182534 2018-09-27
PCT/JP2019/036504 WO2020066778A1 (en) 2018-09-27 2019-09-18 Description support device and description support method

Publications (1)

Publication Number Publication Date
CN112334923A (en) 2021-02-05

Family

ID=69953451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980039801.XA Pending CN112334923A (en) 2018-09-27 2019-09-18 Description support device and description support method

Country Status (4)

Country Link
US (1) US11942086B2 (en)
JP (1) JP7142315B2 (en)
CN (1) CN112334923A (en)
WO (1) WO2020066778A1 (en)

Also Published As

Publication number Publication date
JP7142315B2 (en) 2022-09-27
JP2020052809A (en) 2020-04-02
WO2020066778A1 (en) 2020-04-02
US11942086B2 (en) 2024-03-26
US20210104240A1 (en) 2021-04-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination