WO2020188622A1 - Editing Support Program, Editing Support Method, and Editing Support Device - Google Patents
Editing support program, editing support method, and editing support device
- Publication number: WO2020188622A1 (application PCT/JP2019/010793)
- Authority: WIPO (PCT)
Classifications
- G10L 15/26 — Speech recognition; speech-to-text systems
- G10L 17/22 — Speaker identification or verification; interactive procedures, man-machine interfaces
- G06F 3/0481 — Interaction techniques based on graphical user interfaces (GUI), based on specific properties of the displayed interaction object
- G06F 40/166 — Handling natural language data; text processing; editing, e.g. inserting or deleting
- G10L 17/04 — Speaker identification or verification; training, enrolment or model building
- G10L 17/00 — Speaker identification or verification techniques
Description
- The present disclosure relates to an editing support program, an editing support method, and an editing support device.
- In a known technique, voice data including the speech of a plurality of speakers is reproduced, the user transcribes each speaker's speech into text, and sets a speaker name indicating the speaker for each piece of speech data. It is also known to classify voice data based on voice characteristics and to attach arbitrary speaker identification information to each classified piece of voice data (see, for example, Patent Document 1).
- However, the speaker identification information obtained from voice characteristics may change depending on the physical condition of the speaker, so the identification information may indicate the wrong speaker. In that case, editing the speaker identification information costs the user time and effort.
- In one aspect, an object is to improve the convenience of the editing process applied to speaker identification results.
- The editing support program causes a computer to execute the following process. Information indicating the identified speaker of a sentence generated based on voice recognition is displayed on a display unit in association with the section of the sentence corresponding to that speaker. When a first editing process that edits the speaker identification result occurs and, as a result, the speakers of two or more adjacent sections become common, the two or more adjacent sections are combined and displayed on the display unit as a single section. When a start point for a second editing process that edits the speaker identification result is designated within a specific section of the combined two or more sections, and a portion corresponding to the start point of any of the two or more pre-combination sections exists between the designated start point and the end point of the combined sections, the second editing process is applied to the section from the designated start point to that portion.
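The scope rule above can be sketched in code. This is an illustrative reconstruction only, not the patented implementation; the function name, the offset-based representation of sections, and the parameters are all assumptions introduced for illustration.

```python
# Hypothetical sketch: when a second editing process starts inside a combined
# section, it applies only up to the next pre-combination section boundary
# (or to the end of the combined section when no such boundary remains).

def second_edit_range(combined_end, pre_combination_starts, designated_start):
    """Return the (start, end) range the second editing process applies to.

    pre_combination_starts: start offsets of the sections that were merged.
    """
    # Boundaries of the original sections that lie strictly between the
    # designated start point and the end of the combined section.
    later_boundaries = [s for s in pre_combination_starts
                        if designated_start < s < combined_end]
    if later_boundaries:
        # Stop at the first such boundary, as described above.
        return (designated_start, min(later_boundaries))
    # Otherwise the edit runs to the end of the combined section.
    return (designated_start, combined_end)
```

For example, if sections starting at offsets 0, 40, and 70 were combined into one section ending at 100 and the start point is designated at 50, the second edit would cover only (50, 70).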
- FIG. 1 is an example of a terminal device.
- FIG. 2 is an example of the hardware configuration of the terminal device.
- FIG. 3 is an example of a block diagram of the terminal device.
- FIG. 4 is a flowchart (No. 1) showing an example of the operation of the terminal device.
- FIG. 5 is a flowchart (No. 2) showing an example of the operation of the terminal device.
- FIG. 6 is an example of the portal screen.
- FIG. 7 is an example of speaker data.
- FIG. 8 is an example of text data before update according to the first embodiment.
- FIG. 9 is an example of the editing support screen.
- FIGS. 10(a) to 10(c) are diagrams (No. 1) for explaining an example of the editing work according to the embodiment.
- FIG. 11 is a diagram for explaining an example of updating text data.
- FIGS. 12(a) to 12(c) are diagrams (No. 2) for explaining an example of the editing work according to the embodiment.
- FIG. 13 is an example of division start point data.
- FIGS. 14(a) and 14(b) are diagrams (No. 3) for explaining an example of the editing work according to the embodiment.
- FIG. 15 is a diagram for explaining another update example of the text data.
- FIGS. 16(a) and 16(b) are diagrams for explaining an example of editing work according to a comparative example.
- FIG. 17A is an example of text data before update according to the second embodiment.
- FIG. 17B is an example of the updated text data according to the second embodiment.
- FIG. 18 is an example of an editing support system.
- FIG. 1 is an example of the terminal device 100.
- the terminal device 100 is an example of an editing support device.
- a personal computer (PC) is shown as an example of the terminal device 100, but it may be a smart device such as a tablet terminal.
- the terminal device 100 includes a keyboard and a pointing device (hereinafter, simply referred to as a keyboard) 100F.
- the terminal device 100 includes a display 100G.
- the display 100G may be a liquid crystal display or an organic electro-luminescence (EL) display.
- Display 100G displays various screens. Details will be described later, but for example, the display 100G displays the editing support screen 10.
- the editing support screen 10 is a screen that supports editing of a speaker identified for a sentence generated based on voice recognition.
- the speaker identification may be one using Artificial Intelligence (AI), or one using a predetermined voice model defined in advance without using AI.
- The user of the terminal device 100 confirms the speaker candidates displayed on the editing support screen 10 and operates the keyboard 100F to select one of the candidates.
- The terminal device 100 then edits the unedited speaker identified by AI or the like into the selected candidate speaker. In this way, the user can easily edit the speaker by using the editing support screen 10.
- In the following, a creator of meeting minutes is described as an example of the user, but the user is not particularly limited to such a creator.
- the user may be a producer of broadcast subtitles or a person in charge of audio recording in a call center.
- FIG. 2 is an example of the hardware configuration of the terminal device 100.
- The terminal device 100 includes at least a Central Processing Unit (CPU) 100A as a hardware processor, a Random Access Memory (RAM) 100B, a Read Only Memory (ROM) 100C, and a network I/F (interface) 100D. Further, as described above, the terminal device 100 also includes the keyboard 100F and the display 100G.
- the terminal device 100 may include at least one of a Hard Disk Drive (HDD) 100E, an input / output I / F 100H, a drive device 100I, and a short-range wireless communication circuit 100J, if necessary.
- The components from the CPU 100A through the short-range wireless communication circuit 100J are connected to one another by the internal bus 100K. That is, the terminal device 100 can be realized by a computer.
- a Micro Processing Unit (MPU) may be used as a hardware processor instead of the CPU 100A.
- a semiconductor memory 730 is connected to the input / output I / F 100H.
- Examples of the semiconductor memory 730 include a Universal Serial Bus (USB) memory and a flash memory.
- the input / output I / F 100H reads programs and data stored in the semiconductor memory 730.
- the input / output I / F 100H includes, for example, a USB port.
- A portable recording medium 740 is inserted into the drive device 100I. Examples of the portable recording medium 740 include removable discs such as a Compact Disc (CD)-ROM and a Digital Versatile Disc (DVD).
- the drive device 100I reads programs and data recorded on the portable recording medium 740.
- the short-range wireless communication circuit 100J is an electric circuit or an electronic circuit that realizes short-range wireless communication such as Wi-Fi (registered trademark) and Bluetooth (registered trademark).
- An antenna 100J' is connected to the short-range wireless communication circuit 100J.
- a CPU that realizes a communication function may be used instead of the short-range wireless communication circuit 100J.
- The network I/F 100D includes, for example, a Local Area Network (LAN) port.
- The programs stored in the ROM 100C and the HDD 100E are temporarily stored in the RAM 100B by the CPU 100A.
- The program recorded on the portable recording medium 740 is likewise temporarily stored in the RAM 100B by the CPU 100A.
- By executing the stored program, the CPU 100A realizes the various functions described later and executes the various processes described later.
- The program may correspond to the flowcharts described later.
- FIG. 3 is an example of a block diagram of the terminal device 100.
- FIG. 3 shows a main part of the function of the terminal device 100.
- the terminal device 100 includes a storage unit 110, a processing unit 120, an input unit 130, and a display unit 140.
- the storage unit 110 can be realized by the above-mentioned RAM 100B or HDD 100E.
- the processing unit 120 can be realized by the CPU 100A described above.
- the input unit 130 can be realized by the keyboard 100F described above.
- The display unit 140 can be realized by the display 100G described above. The storage unit 110, the processing unit 120, the input unit 130, and the display unit 140 are connected to one another.
- the storage unit 110 includes a voice storage unit 111, a dictionary storage unit 112, a sentence storage unit 113, a model storage unit 114, and a point storage unit 115 as components.
- the processing unit 120 includes a first display control unit 121, a voice recognition unit 122, a sentence generation unit 123, and a speaker identification unit 124 as components. Further, the processing unit 120 includes a voice reproduction unit 125, a speaker editing unit 126, a point management unit 127, and a second display control unit 128 as components.
- Each component of the processing unit 120 accesses at least one of the components of the storage unit 110 to execute various processes. For example, when the voice reproduction unit 125 detects a voice data reproduction instruction, it accesses the voice storage unit 111, acquires the stored voice data, and reproduces it.
- the other components will be described in detail when the operation of the terminal device 100 is described.
- the first display control unit 121 displays the portal screen (step S101). More specifically, when the first display control unit 121 detects the activation instruction of the portal screen output from the input unit 130, the first display control unit 121 displays the portal screen on the display unit 140. As a result, as shown in FIG. 6, the display unit 140 displays the portal screen 20.
- the portal screen 20 includes a first registration button 21, a second registration button 22, a third registration button 23, and a plurality of fourth registration buttons 24.
- the first registration button 21 is a button for registering the voice data of the conference.
- When registering the voice data of the conference, the user prepares the voice data of the conference recorded in advance on the terminal device 100.
- the first display control unit 121 detects the pressing of the first registration button 21.
- When the first display control unit 121 detects that the first registration button 21 has been pressed, it stores the voice data of the conference prepared on the terminal device 100 in the voice storage unit 111.
- the second registration button 22 is a button for registering material data related to meeting materials.
- the user prepares the material data of the meeting in the terminal device 100 in advance.
- the first display control unit 121 detects the pressing of the second registration button 22.
- the first display control unit 121 displays the material data prepared in the terminal device 100 in the first display area 20A in the portal screen 20.
- the third registration button 23 is a button for registering the participants of the conference.
- When registering the participants of the conference, the user performs an operation of pressing the third registration button 23 with the pointer Pt.
- the first display control unit 121 detects the pressing of the third registration button 23.
- the display unit 140 displays a registration screen (not shown) for registering the participants of the conference as speakers.
- When a speaker is entered on the registration screen, the first display control unit 121 displays the participant data, including the entered speaker, in the second display area 20B of the portal screen 20.
- the first display control unit 121 generates a speaker ID, associates it with the input speaker, and stores it in the model storage unit 114.
- the speaker ID is information that identifies the speaker.
- the model storage unit 114 stores the speaker ID and the speaker in association with each other.
- the fourth registration button 24 is a button for registering the voice data of the speaker.
- When registering the voice data of a speaker, the user prepares various voice data of the speaker recorded in advance on the terminal device 100.
- a microphone may be connected to the terminal device 100 and the voice data acquired from the microphone may be used.
- the first display control unit 121 detects the pressing of the fourth registration button 24.
- When the first display control unit 121 detects that the fourth registration button 24 has been pressed, it outputs the voice data prepared on the terminal device 100 to the speaker identification unit 124.
- the speaker identification unit 124 generates a trained model in which the characteristics of the speaker's voice are machine-learned based on the speaker's voice data output from the first display control unit 121.
- the speaker identification unit 124 stores the generated learned model in the model storage unit 114 in association with the speaker ID of the speaker corresponding to the voice data to be learned.
- the model storage unit 114 stores speaker data in which the speaker ID and the speaker are associated with the trained model.
- the first display control unit 121 displays the registration mark RM in the participant data regarding the speaker to be registered.
- the registration mark RM is a mark indicating that the voice data of the speaker has been registered because the model storage unit 114 has stored the trained model.
- the voice recognition unit 122 executes voice recognition (step S102).
- the voice recognition unit 122 refers to the voice storage unit 111, and determines whether or not the voice storage unit 111 stores the voice data of the conference.
- the voice recognition unit 122 executes voice recognition for the voice data of the conference stored by the voice storage unit 111 and generates character string data.
- The voice recognition unit 122 identifies a plurality of characters based on the speakers' voices included in the voice data of the conference, arranges the identified characters in chronological order, and assigns a character ID and a time code to each character.
- When the voice recognition unit 122 generates the character string data, it outputs the generated character string data to the sentence generation unit 123.
- the voice recognition unit 122 includes a plurality of voice recognition engines, and generates corresponding character string data.
- the voice recognition engine includes, for example, AmiVoice (registered trademark).
- When the process of step S102 is completed, the sentence generation unit 123 then generates sentence data (step S103). More specifically, when the sentence generation unit 123 receives the character string data output by the voice recognition unit 122, it refers to the dictionary storage unit 112 and executes morphological analysis on the character string data.
- The dictionary storage unit 112 stores a morpheme dictionary containing various words and phrases, for example words such as "yes", "certainly", "material", and "question". By performing morphological analysis on the character string data with reference to the dictionary storage unit 112, the sentence generation unit 123 generates sentence data in which the character string data is divided into a plurality of word blocks. The sentence generation unit 123 then stores the generated sentence data in the sentence storage unit 113 in association with the identifier of each word block. As a result, the sentence storage unit 113 stores the sentence data.
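The division into word blocks can be illustrated with a toy segmenter. This is not the patent's morphological analyzer; a greedy longest-match against a small dictionary stands in for it, and the names and the tiny dictionary are assumptions.

```python
# Illustrative sketch only: split a character string into word blocks by
# greedy longest match against a small morpheme dictionary.

MORPHEME_DICT = {"yes", "certainly", "material", "question"}  # toy dictionary

def split_into_word_blocks(text, dictionary=MORPHEME_DICT):
    blocks, i = [], 0
    while i < len(text):
        # Try the longest dictionary entry starting at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in dictionary:
                blocks.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: emit it as its own one-character block.
            blocks.append(text[i])
            i += 1
    return blocks
```

For example, `split_into_word_blocks("yesquestion")` yields the two word blocks `["yes", "question"]`.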
- When the process of step S103 is completed, the speaker identification unit 124 identifies the speaker (step S104). More specifically, the speaker identification unit 124 refers to the model storage unit 114 and compares the trained models stored there with the voice data of the conference stored in the voice storage unit 111. When it detects, within the voice data of the conference, a voice portion corresponding to a trained model (for example, common or similar to it), the speaker identification unit 124 identifies the speaker ID associated with that trained model and the time code of the portion. In this way, the speaker identification unit 124 identifies the speaker of each of the various voice portions included in the voice data of the conference.
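The matching step can be sketched as follows. The patent says only that a voice portion "corresponding" (common or similar) to a trained model is detected; reducing each model and portion to a feature vector and using cosine similarity with a threshold is an assumption made for illustration, as are all names.

```python
# Illustrative sketch: attribute a voice portion to the speaker whose trained
# model is most similar, or to no one when nothing is similar enough.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify_speaker(portion_features, trained_models, threshold=0.8):
    """trained_models: dict mapping speaker ID -> feature vector."""
    best_id, best_sim = None, threshold
    for speaker_id, model in trained_models.items():
        sim = cosine(portion_features, model)
        if sim > best_sim:
            best_id, best_sim = speaker_id, sim
    return best_id  # None when no model is similar enough
```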
- When the speaker identification unit 124 identifies the speaker ID and the time code, it associates the identified speaker ID with the sentence data stored in the sentence storage unit 113 based on the time code. As a result, as shown in FIG. 8, the sentence storage unit 113 stores the sentence data associated with the speaker ID.
- the text data includes a character ID, a character, a word block, a time code, a speaker ID (initial), and a speaker ID (current) as components.
- a word block identifier is registered in the word block.
- In the speaker ID (initial) field, the speaker ID of the speaker first identified by the speaker identification unit 124 is registered.
- the speaker ID after editing the speaker is registered in the speaker ID (current).
- Before editing, the same speaker ID is registered in both the speaker ID (initial) and the speaker ID (current) fields.
- the sentence storage unit 113 stores such sentence data. If the time code assigned to each character is common to the immediately preceding time code, the time code after the immediately preceding time code may be omitted.
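The per-character layout described above can be sketched as a record type. The field names are illustrative (the patent only lists the components shown in FIG. 8), and the edit helper stands in for the update of the speaker ID (current) field while the initial ID is preserved.

```python
# Illustrative sketch of one row of the sentence data, with the components
# named in the text: character ID, character, word block, time code,
# speaker ID (initial), and speaker ID (current).
from dataclasses import dataclass

@dataclass
class SentenceEntry:
    char_id: int
    char: str
    word_block: int          # identifier of the word block the character belongs to
    time_code: str
    speaker_id_initial: str  # speaker first identified by the speaker identification unit
    speaker_id_current: str  # updated when the speaker is edited

def edit_speaker(entries, word_block, new_speaker_id):
    """Re-attribute every character of a word block to the selected speaker."""
    for e in entries:
        if e.word_block == word_block:
            e.speaker_id_current = new_speaker_id  # the initial ID is preserved
```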
- When the process of step S104 is completed, the first display control unit 121 then displays the speakers and the utterance sections (step S105). More specifically, the first display control unit 121 stops displaying the portal screen 20 on the display unit 140 and displays the editing support screen 10 instead. The first display control unit 121 then displays each speaker and the corresponding utterance section in association with each other within the editing support screen 10.
- the display unit 140 displays the editing support screen 10.
- the editing support screen 10 includes a script area 11, a setting area 12, an editing area 13, a play button 14, and the like.
- the first display control unit 121 displays each speaker and the utterance section corresponding to each speaker in the sentence in association with each other in the editing area 13 of the editing support screen 10 based on the sentence data and the speaker data.
- In the script area 11, the time codes and characters of the sentence data stored in the sentence storage unit 113 are displayed in association with each other.
- Characters from the first time code at which the speaker ID switches to the last time code at which the continuity of that speaker ID is interrupted are combined and displayed in chronological order.
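The grouping rule above can be sketched as follows: characters are merged into one utterance section for as long as the speaker ID stays the same, and a new section begins where the ID switches. The function name and the tuple representation are illustrative assumptions.

```python
# Illustrative sketch: group a time-ordered character sequence into utterance
# sections by runs of a common speaker ID.
from itertools import groupby

def utterance_sections(chars_with_speaker):
    """chars_with_speaker: list of (character, speaker_id) in time order."""
    sections = []
    for speaker_id, run in groupby(chars_with_speaker, key=lambda cs: cs[1]):
        text = "".join(c for c, _ in run)
        sections.append((speaker_id, text))
    return sections
```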
- In the setting area 12, setting items related to the playback format of the voice data, setting items related to the output format of the text data after speaker editing, and the like are displayed.
- the speaker and the utterance section are displayed in association with each other in the editing area 13.
- The speaker "Oda" and the utterance section "... isn't it?" are displayed in association with each other.
- the speaker “Kimura” and the utterance section “Yes, I have a question about the material” are displayed in association with each other.
- the speaker “Yamada” and the utterance section "Please ask a question” are displayed in association with each other.
- the progress mark 16 is a mark indicating the current reproduction position of the voice data.
- The switching point 17 is a point indicating a switch between word blocks (see FIG. 8). That is, the switching point 17 is displayed at the position between two word blocks where one word block switches to another. In the present embodiment, one switching point 17 is displayed; alternatively, a plurality of switching points may be displayed, with one of them set as the current switching point 17 and given a color different from the remaining switching points. This allows the user to check at which position the word block is switched.
- the switching point 17 can be moved left and right according to the operation on the input unit 130.
- the first display control unit 121 moves the switching point 17 to the right.
- the first display control unit 121 moves the switching point 17 to the left.
- The key for moving the switching point 17 may be, for example, the space key, and may be determined as appropriate according to design, experiments, and the like.
- When the process of step S105 is completed, the audio reproduction unit 125 waits until a reproduction instruction is detected (step S106: NO).
- When the audio reproduction unit 125 detects the reproduction instruction (step S106: YES), it reproduces the audio data (step S107). More specifically, when the play button 14 (see FIG. 9) is pressed with the pointer Pt, the voice reproduction unit 125 detects a voice data reproduction instruction and starts playing the voice data. When reproduction starts, the progress mark 16 (see FIG. 9) moves to the right according to the reproduction speed of the audio data. The user plays back the voice data of the conference, listens to the voice, moves the switching point 17, and performs an operation of designating the position at which to edit the speaker.
- the first display control unit 121 waits until the start point is specified (step S108: NO).
- When the start point is designated, the first display control unit 121 displays the first edit screen (step S109). More specifically, as shown in FIG. 10(a), the user first moves the switching point 17 and stops it at a desired position at which to edit the speaker.
- the first display control unit 121 determines that the predetermined position has been designated as the start point.
- the first display control unit 121 superimposes the first edit screen 30 on the edit area 13 and displays it as shown in FIG. 10 (b).
- the first editing screen 30 is a screen that requests the user to perform editing processing.
- The first display control unit 121 also identifies, within the utterance section corresponding to the start point, the partial utterance section corresponding to the one or more word blocks located before the start point.
- In the present embodiment, the first display control unit 121 identifies the partial utterance section corresponding to the single word block "certainly". The display of the first edit screen 30 and the identification of the partial utterance section may be performed in the reverse order.
- After step S109, the speaker editing unit 126 waits until a selection instruction is detected (step S110: NO).
- When the speaker editing unit 126 detects the selection instruction (step S110: YES), it edits the speaker (step S111). More specifically, as shown in FIG. 10(b), when the user operates the input unit 130 to select one of the plurality of speakers included in the first editing screen 30 with the pointer Pt, the speaker editing unit 126 detects the selection instruction. The user may instead perform an operation of selecting one of the plural numerical values included in the first editing screen 30 using the numeric keypad.
- the speakers included in the first editing screen 30 are arranged side by side in order of priority according to at least one of the utterance order and the utterance amount.
- At a conference, the moderator often speaks earlier than the other speakers, and the moderator's utterance volume is also assumed to be large. Therefore, on the first editing screen 30, the speakers are arranged in order starting from the speaker most likely to be selected in editing. As a result, the time and effort of the speaker editing process can be reduced.
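The candidate ordering described above can be sketched as follows. The text says priority follows at least one of the utterance order and the utterance amount; combining them as "earlier first utterance wins, larger utterance volume breaks ties" is an assumption, as are the function and parameter names.

```python
# Illustrative sketch: rank speaker candidates so that speakers who spoke
# earlier and spoke more appear first on the first editing screen.

def order_candidates(speakers):
    """speakers: dict speaker -> (first_utterance_time, total_utterance_chars)."""
    # Earlier first utterance wins; larger utterance volume breaks ties.
    return sorted(speakers,
                  key=lambda s: (speakers[s][0], -speakers[s][1]))
```

For example, a moderator who opened the meeting and spoke the most would be listed before a participant who spoke only briefly later on.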
- When the speaker editing unit 126 detects the selection instruction, it determines that an editing process has occurred, applies the editing process to the partial utterance section identified by the first display control unit 121, and edits and displays the speaker of that partial utterance section as the selected speaker.
- In the present embodiment, the speaker editing unit 126 applies the editing process to the partial utterance section corresponding to the word block "certainly" and edits and displays its speaker "Kimura" as the selected speaker "Kimura". Since there is no substantive change in this example, a detailed description is given later.
- When the process of step S111 is completed, the speaker editing unit 126 determines whether or not the speakers are common (step S112). More specifically, the speaker editing unit 126 determines whether the edited speaker and the speaker of the utterance section located immediately before the partial utterance section corresponding to the edited word block are common. In the present embodiment, the speaker editing unit 126 determines whether the edited speaker "Kimura" of the partial utterance section corresponding to the word block "certainly" and the speaker "Oda" of the immediately preceding utterance section "... isn't it?" are common. Here, since the speaker "Kimura" and the speaker "Oda" are not common, the speaker editing unit 126 determines that the speakers are not common (step S112: NO).
- In this case, the speaker editing unit 126 skips the processing of steps S113 and S114 and determines whether or not the processing after the start point has been completed (step S115).
- Here, the speaker editing unit 126 determines that the processing after the start point has not been completed (step S115: NO).
- Then, the first display control unit 121 executes the processing of step S109 again. That is, in the first execution of step S109, as shown in FIG. 10(b), the partial utterance section corresponding to the single word block "certainly", located before the start point designated by the switching point 17 within the utterance section corresponding to that start point, was the target of the speaker editing process.
- In the second execution, the first display control unit 121 again superimposes the first editing screen 30 on the editing area 13 and displays it, as shown in FIG. 10(c). In addition to displaying the first edit screen 30, the first display control unit 121 identifies, within the utterance section corresponding to the start point, the remaining utterance section corresponding to the one or more word blocks located after the start point. In the present embodiment, the first display control unit 121 identifies the remaining utterance section corresponding to the plurality of word blocks "Yes, I have a question about the material".
- The speaker editing unit 126 edits the speaker in the processing of step S111 (see FIG. 5). More specifically, as shown in FIG. 10C, when the user operates the input unit 130 again to select one of the plurality of speakers included in the first editing screen 30 with the pointer Pt, the speaker editing unit 126 detects the selection instruction. When the speaker editing unit 126 detects the selection instruction, it accesses the sentence storage unit 113 and, as shown in FIG. 11, updates the speaker ID (current) of the speaker corresponding to the specified word block to the speaker ID of the edited speaker.
- When the speaker editing unit 126 detects the selection instruction, it determines that an editing process has occurred, applies the editing process to the specified remaining utterance section, and edits and displays the speaker of the remaining utterance section as the selected speaker.
- In the present embodiment, the speaker editing unit 126 applies the editing process to the remaining utterance section corresponding to the plurality of word blocks "Yes, I have a question about the material", and the speaker "Kimura" of the remaining utterance section is edited and displayed as the selected speaker "Yamada".
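The speaker-ID update described above can be sketched roughly as follows. This is a minimal illustration under assumptions: the record layout (`id`, `word`, `speaker_id` fields) and the ID values are hypothetical stand-ins, not the patent's actual storage format.

```python
# Hypothetical sketch of updating the speaker ID (current) of the word
# blocks in the specified remaining utterance section. Field names and
# ID values are illustrative assumptions.

def apply_speaker_edit(word_blocks, target_ids, edited_speaker_id):
    """Set the current speaker ID of every word block whose identifier
    is in `target_ids` to `edited_speaker_id`."""
    for block in word_blocks:
        if block["id"] in target_ids:
            block["speaker_id"] = edited_speaker_id
    return word_blocks

blocks = [
    {"id": "04", "word": "yes", "speaker_id": "02"},
    {"id": "05", "word": "question", "speaker_id": "02"},
]
# Edit the remaining utterance section (blocks "04" and "05") to the
# selected speaker whose ID is "01".
apply_speaker_edit(blocks, {"04", "05"}, "01")
```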
- The speaker editing unit 126 then determines again whether or not the speakers are common.
- In the present embodiment, the speaker editing unit 126 determines whether the edited speaker "Yamada" of the remaining utterance section corresponding to the plurality of word blocks "Yes, I have a question about the material" is the same as the speaker "Yamada" of the later utterance section "Please ask a question" located immediately after it.
- Here, the speaker editing unit 126 determines that the speakers are common (step S112: YES).
- The speaker editing unit 126 displays the utterance sections in a combined state (step S113). More specifically, the speaker editing unit 126 displays the two utterance sections whose speakers are common after the editing in a combined state. At the same time, the speaker editing unit 126 displays one of the two speakers corresponding to the two utterance sections before combination in association with the combined utterance section. As a result, the speaker editing unit 126 combines the remaining utterance section corresponding to the plurality of word blocks "Yes, I have a question about the material" with the later utterance section "Please ask a question", and, as shown in FIG. 12(a),
- the new utterance section "Yes, please ask a question about the material" is displayed as a combination of the two utterance sections.
- One speaker is associated with the combined utterance section and displayed. In this way, the speaker is edited and the utterance sections are combined.
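The combining behavior of step S113 can be sketched as follows: adjacent utterance sections whose speakers become common after an edit are merged into one section associated with that single speaker. The `(speaker, text)` tuple representation is an assumption for illustration only; speaker names mirror the example in the description.

```python
# Illustrative sketch of combining adjacent utterance sections that
# share a common speaker after editing. Not the patent's implementation.

def combine_common_sections(sections):
    """Merge runs of adjacent (speaker, text) sections, in chronological
    order, whose speakers are common, joining their texts."""
    combined = []
    for speaker, text in sections:
        if combined and combined[-1][0] == speaker:
            # Same speaker as the previous section: join into one section.
            prev_speaker, prev_text = combined[-1]
            combined[-1] = (prev_speaker, prev_text + " " + text)
        else:
            combined.append((speaker, text))
    return combined

sections = [
    ("Oda", "... isn't it?"),
    ("Yamada", "Yes, I have a question about the material"),
    ("Yamada", "Please ask a question"),
]
# The two adjacent "Yamada" sections are merged into a single section.
print(combine_common_sections(sections))
```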
- The editing work proceeds in chronological order, and the labor of the editing work is reduced.
- When the processing of step S113 is completed, the point management unit 127 then saves the break start point (step S114). More specifically, the point management unit 127 stores, in the point storage unit 115, the start point specifying the break between the two utterance sections before combination as break start point data, together with the start point corresponding to that location and the end point of the combined utterance section. As a result, the point storage unit 115 stores the break start point data.
- The start point of the break between the two utterance sections before combination is the boundary between the utterance section "Yes, I have a question about the material" and the utterance section "Please ask a question".
- The point storage unit 115 stores, as break start point data, the identifier "08" of the word block "question" at the end of the preceding utterance section in association with the identifier "09" of the word block "question" at the beginning of the following utterance section.
- In addition to storing the break start point data, the point storage unit 115 stores the identifiers of the word blocks that can specify the start point corresponding to the break start point and the end point of the combined utterance section. For example, the point storage unit 115 stores the identifier "03" of the word block "certainly" and the identifier "04" of the word block "yes" as word blocks that can specify the start point. Further, the point storage unit 115 stores the identifier "11" of the word block "suru" and the predetermined identifier "-" as identifiers of word blocks that can specify the end point. A character ID may be used in the same manner instead of the word block identifier.
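The stored break start point data described above can be sketched as follows. The identifier values mirror the example in the description ("08"/"09" for the break, "03"/"04" for the start point, "11"/"-" for the end point), but the dictionary layout itself is an assumption, not the patent's actual storage format.

```python
# Illustrative sketch of saving break start point data when two
# utterance sections are combined (step S114). Dict layout is assumed.

def save_break_start_point(store, break_pair, start_blocks, end_blocks):
    """Record the break between the two pre-combination sections,
    together with the word blocks that can specify the start point and
    the end point of the combined section."""
    store["break_start_point"] = break_pair      # (end of 1st, start of 2nd)
    store["start_point_blocks"] = start_blocks   # blocks specifying the start point
    store["end_point_blocks"] = end_blocks       # blocks specifying the end point
    return store

point_store = save_break_start_point(
    {},
    break_pair=("08", "09"),
    start_blocks=("03", "04"),
    end_blocks=("11", "-"),
)
```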
- In step S115, the speaker editing unit 126 again determines whether or not processing after the start point has been completed.
- Here, the speaker editing unit 126 determines that processing after the start point has been completed (step S115: YES).
- The second display control unit 128 then waits until another start point is designated (step S116: NO).
- When another start point is designated (step S116: YES), the second display control unit 128 displays the second editing screen (step S117). More specifically, as shown in FIG. 12B, when the user moves the switching point 17, stops it at a position different from the predetermined position described above, and presses the enter key, the second display control unit 128 determines that this other position has been designated as a start point.
- The second display control unit 128 displays the second editing screen 40 superimposed on the editing area 13, as shown in FIG. 12(c).
- The second editing screen 40 is a screen that requests the user to perform the editing process.
- The speakers included in the second editing screen 40 are arranged in the same manner as in the first editing screen 30.
- Together with this, the second display control unit 128 specifies the part of the utterance section corresponding to the one word block "yes". The order of displaying the second editing screen 40 and specifying the part of the utterance section may be reversed.
- After step S117, the speaker editing unit 126 waits until a selection instruction is detected (step S118: NO).
- When a selection instruction is detected (step S118: YES), the speaker editing unit 126 edits the speaker (step S119). More specifically, as shown in FIG. 12(c), when the user operates the input unit 130 to select one of the plurality of speakers included in the second editing screen 40 with the pointer Pt, the speaker editing unit 126 detects the selection instruction. The user may instead select one of the plurality of numerical values included in the second editing screen 40 using the numeric keypad.
- When the speaker editing unit 126 detects the selection instruction, it determines that an editing process has occurred, applies the editing process to the specified part of the utterance section, and edits and displays the speaker of that part as the selected speaker. In the present embodiment, the speaker editing unit 126 applies the editing process to the part of the utterance section corresponding to the word block "yes", and the speaker "Yamada" of that part is edited and displayed as the selected speaker "Yamada". Since there is no substantial change in this example, a detailed description is given later.
- The second display control unit 128 redisplays the second editing screen (step S120). More specifically, as shown in FIG. 14A, the second display control unit 128 redisplays the second editing screen 40 superimposed on the editing area 13. Together with redisplaying the second editing screen 40, the second display control unit 128 specifies, as a specific utterance section, the remaining utterance section corresponding to the one or more word blocks located after the other start point in the utterance section corresponding to the other start point. In the present embodiment, the second display control unit 128 specifies the remaining utterance section corresponding to the plurality of word blocks "Please ask a question about the material" as the specific utterance section. The order of redisplaying the second editing screen 40 and specifying the remaining utterance section may be reversed.
- After step S120, the speaker editing unit 126 waits until a selection instruction is detected (step S121: NO).
- When a selection instruction is detected (step S121: YES), the point management unit 127 determines whether or not a break start point exists (step S122). More specifically, the point management unit 127 refers to the point storage unit 115 and determines whether or not break start point data is stored in the point storage unit 115.
- When a break start point exists, the speaker editing unit 126 edits the speaker up to the break start point (step S123) and ends the process. More specifically, as shown in FIG. 14A, when the user operates the input unit 130 to select one of the plurality of speakers included in the second editing screen 40 with the pointer Pt, the speaker editing unit 126 detects the selection instruction. When the speaker editing unit 126 detects the selection instruction, it accesses the sentence storage unit 113. Then, as shown in FIG. 15, the speaker editing unit 126 applies the editing process to the speaker IDs (current) of the word blocks from the word block immediately after the other start point to the word block immediately before the break start point among the specified word blocks, and updates those speaker IDs to the speaker ID of the edited speaker.
- When the speaker editing unit 126 detects the selection instruction, it determines that an editing process has occurred, applies the editing process to the specific utterance section, and edits and displays the speaker of the specific utterance section as the selected speaker.
- In the present embodiment, the speaker editing unit 126 applies the editing process to the specific utterance section corresponding to the plurality of word blocks "question about the material", and the speaker "Yamada" of the specific utterance section is edited and displayed as the selected speaker "Kimura".
- When the point management unit 127 determines in step S122 that there is no break start point (step S122: NO),
- the speaker editing unit 126 skips the processing of step S123 and ends the process. If there is no break start point, the speaker editing unit 126 may execute error processing and then end the processing.
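The idea behind step S123 can be sketched as follows: because the break start point between the originally separate sections was saved, the new speaker edit stops at the break, leaving the correctly identified word blocks after it untouched. The block records and identifier values below are illustrative assumptions modeled on the example in the description.

```python
# Rough sketch of editing the speaker only up to the saved break start
# point (step S123). Record layout and ID values are assumed.

def edit_up_to_break(blocks, after_id, break_id, new_speaker):
    """Edit the speaker of every block located strictly after `after_id`
    and strictly before `break_id` (identifiers compared numerically)."""
    for block in blocks:
        if int(after_id) < int(block["id"]) < int(break_id):
            block["speaker"] = new_speaker
    return blocks

# Combined section: "yes" (04), "question about the material" (05-08),
# "please ask a question" (09-11), all currently attributed to Yamada.
blocks = [{"id": f"{i:02d}", "speaker": "Yamada"} for i in range(4, 12)]
# Other start point after block "04", saved break start point at "09":
edit_up_to_break(blocks, "04", "09", "Kimura")
# Blocks 05-08 become Kimura; blocks 09-11 keep their correct speaker.
```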
- FIGS. 16 (a) and 16 (b) are diagrams for explaining a comparative example.
- FIGS. 16 (a) and 16 (b) are views corresponding to the above-mentioned FIGS. 14 (a) and 14 (b).
- In the first embodiment, the point management unit 127 stores and manages the break start point data in the point storage unit 115.
- In the comparative example, the break start point data is not managed. As shown in FIG. 16A, when the user operates the input unit 130 to select one of the plurality of speakers included in the second editing screen 40,
- the speaker editing unit 126 detects the selection instruction.
- When the speaker editing unit 126 detects the selection instruction, it edits and displays the speakers of the remaining utterance section corresponding to all of the plurality of word blocks specified by the second display control unit 128 as the selected speaker.
- As a result, the speaker editing unit 126 edits and displays the speaker "Yamada" of the remaining utterance section corresponding to all of the plurality of word blocks "Please ask a question about the material" as the selected speaker "Kimura". For this reason, the plurality of word blocks "please ask a question", whose speaker contains no error, are also edited, and the user has to edit this part again.
- According to the first embodiment, such wasted editing work does not occur. That is, according to the first embodiment, the convenience of the editing process for the speaker identification result is improved compared with the comparative example.
- The terminal device 100 includes the processing unit 120, and the processing unit 120 includes the first display control unit 121, the speaker editing unit 126, and the second display control unit 128.
- The first display control unit 121 displays, on the display unit 140, information indicating the speaker identified for the sentence data generated based on voice recognition in association with the utterance section in the sentence data corresponding to the identified speaker.
- When an editing process for editing the identification result of the speaker occurs and the speakers of two or more adjacent utterance sections become common as a result of the editing process, the speaker editing unit 126 displays the two or more adjacent utterance sections on the display unit 140 in a combined state.
- When, for a specific utterance section within the combined two or more utterance sections, a start point of an utterance section for editing the identification result of the speaker is designated, and a location corresponding to a start point of any of the two or more sections before combination exists between the designated start point and the end point of the combined two or more utterance sections, the second display control unit 128 applies the editing process to the utterance section from the designated start point to that location. This makes it possible to improve the convenience of the editing process for the speaker identification result.
- When a word block is short, the characteristics of the speaker's voice may not be sufficiently discriminated, and the speaker may not be identified accurately.
- A word block of about several characters, such as "yes", corresponds to such a short word block. If the speaker cannot be identified accurately, the terminal device 100 may display an erroneous identification result. Even in such a case, according to the present embodiment, the convenience of the editing process for the speaker identification result can be improved.
- FIG. 17A is an example of text data before update according to the second embodiment.
- FIG. 17B is an example of the updated text data according to the second embodiment.
- In the first embodiment, the speaker editing unit 126 edits the speaker in units of one or more word blocks, but the speaker may be edited in units of characters included in a word block. In this case, the switching point 17 described above may be moved in units of characters.
- For example, the speaker editing unit 126 updates the speaker ID (current) of the character "quality" from the speaker ID "03" to the speaker ID "04" that identifies the speaker "Kagawa" (not shown).
- In this case, the speaker editing unit 126 divides the identifier of the word block and reassigns the identifiers of the word blocks after the division.
- For example, the speaker editing unit 126 reassigns the identifier "09" of the word block for the character "question" to the identifier "10".
- The speaker editing unit 126 can estimate the utterance time of a new word block based on the utterance time of the original word block. For example, the speaker editing unit 126 can estimate the utterance time of the original word block plus the number of characters multiplied by several milliseconds as the utterance time of the new word block.
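The estimate above is simple arithmetic and can be sketched as follows. The 5 ms per-character constant is an arbitrary stand-in for the "several milliseconds" mentioned in the description, not a value from the patent.

```python
# Rough numeric sketch of the utterance-time estimate for a word block
# created by a character-level split. MS_PER_CHAR is an assumed value.

MS_PER_CHAR = 5  # stand-in for "several milliseconds" per character

def estimate_new_block_time(original_time_ms, num_chars):
    """Original block's utterance time plus number of characters
    multiplied by the per-character offset."""
    return original_time_ms + num_chars * MS_PER_CHAR

# A block whose utterance time is 12000 ms, split after 2 characters:
print(estimate_new_block_time(12000, 2))  # → 12010
```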
- FIG. 18 is an example of the editing support system ST.
- the same components as those of the terminal device 100 shown in FIG. 3 are designated by the same reference numerals, and the description thereof will be omitted.
- the editing support system ST includes a terminal device 100 and a server device 200.
- the terminal device 100 and the server device 200 are connected via a communication network NW.
- Examples of the communication network NW include Local Area Network (LAN) and the Internet.
- the terminal device 100 includes an input unit 130, a display unit 140, and a communication unit 150.
- the server device 200 includes a storage unit 110, a processing unit 120, and a communication unit 160. Both of the two communication units 150 and 160 can be realized by the network I / F 100D or the short-range wireless communication circuit 100J.
- the storage unit 110 and the processing unit 120 described in the first embodiment may be provided in the server device 200 instead of the terminal device 100. That is, the server device 200 may be used as an editing support device.
- the input unit 130 of the terminal device 100 is operated, and the voice data of the conference described above is stored in the storage unit 110 (more specifically, the voice storage unit 111) via the two communication units 150 and 160. Further, the input unit 130 is operated, and the voice data of the speaker described above is input to the processing unit 120 (more specifically, the speaker identification unit 124) via the two communication units 150 and 160.
- The processing unit 120 accesses the storage unit 110, acquires the voice data of the conference, executes the various processes described in the first embodiment on the voice data of the conference, and generates sentence data. In addition, the processing unit 120 generates a trained model in which the characteristics of the speaker's voice are machine-learned based on the input voice data of the speaker. Then, the processing unit 120 identifies the speaker based on the voice data of the conference and the trained model. The processing unit 120 outputs, as a processing result, the screen information of the editing support screen 10 that displays the identified speaker and the utterance section corresponding to the speaker in association with each other to the communication unit 160. The communication unit 160 transmits the processing result to the communication unit 150, and when the communication unit 150 receives the processing result, it outputs the screen information to the display unit 140. As a result, the display unit 140 displays the editing support screen 10.
- The terminal device 100 need not include the storage unit 110 and the processing unit 120; the server device 200 may include them instead. Further, the server device 200 may include the storage unit 110, and another server device (not shown) connected to the communication network NW may include the processing unit 120. Such a configuration may be used as an editing support system. Even in such an embodiment, the convenience of the editing process for the speaker identification result can be improved.
- The present invention is not limited to the specific embodiments described above, and various modifications and changes can be made within the scope of the gist of the present invention described in the claims.
- In the above embodiment, the first editing screen 30 is continuously and dynamically displayed.
- However, the switching point 17 may be moved with the cursor keys, and the first editing screen 30 may be displayed each time the enter key is pressed. Similar control may be applied to the second editing screen 40.
- An identification character or an identification code may be adopted as the identification result instead of the speaker.
- 100 Terminal device
- 110 Storage unit
- 115 Point storage unit
- 120 Processing unit
- 121 First display control unit
- 122 Voice recognition unit
- 123 Sentence generation unit
- 124 Speaker identification unit
- 125 Voice playback unit
- 126 Speaker editing unit
- 127 Point management unit
- 128 Second display control unit
- 130 Input unit
- 140 Display unit
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Theoretical Computer Science (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Description
FIG. 1 is an example of the terminal device 100. The terminal device 100 is an example of an editing support device. Although a Personal Computer (PC) is shown in FIG. 1 as an example of the terminal device 100, the terminal device 100 may be a smart device such as a tablet terminal. The terminal device 100 includes a keyboard and pointing device (hereinafter simply referred to as a keyboard) 100F. The terminal device 100 includes a display 100G. The display 100G may be a liquid crystal display or an organic electro-luminescence (EL) display.
Next, a second embodiment of the present case will be described with reference to FIG. 17. FIG. 17(a) is an example of sentence data before update according to the second embodiment. FIG. 17(b) is an example of sentence data after update according to the second embodiment. In the first embodiment, the speaker editing unit 126 edits the speaker in units of one or more word blocks, but the speaker may be edited in units of characters included in a word block. In this case, the switching point 17 described above may be moved in units of characters.
Next, a third embodiment of the present case will be described with reference to FIG. 18. FIG. 18 is an example of the editing support system ST. Components similar to those of the terminal device 100 shown in FIG. 3 are designated by the same reference numerals, and their description is omitted.
110 Storage unit
115 Point storage unit
120 Processing unit
121 First display control unit
122 Voice recognition unit
123 Sentence generation unit
124 Speaker identification unit
125 Voice playback unit
126 Speaker editing unit
127 Point management unit
128 Second display control unit
130 Input unit
140 Display unit
Claims (15)
- Displaying, on a display unit, information indicating a speaker identified for a sentence generated based on voice recognition in association with a section in the sentence corresponding to the identified speaker,
when a first editing process for editing the identification result of the speaker occurs and the speakers of two or more adjacent sections become common as a result of the first editing process, displaying the two or more adjacent sections on the display unit in a combined state, and
when, for a specific section within the combined two or more sections, a start point of a section on which a second editing process for editing the identification result of the speaker is to be performed is designated, and a location corresponding to a start point of any of the two or more sections before combination exists between the designated start point and an end point of the combined two or more sections, applying the second editing process to the section from the designated start point to the location corresponding to the start point of any of the two or more sections,
an editing support program for causing a computer to execute the above processing. - When the first editing process occurs and the speakers of the two or more adjacent sections become common as a result of the first editing process, applying the first editing process to the two or more adjacent sections and displaying the two or more adjacent sections on the display unit in a combined state,
the editing support program according to claim 1. - Displaying, on the display unit, a first editing screen requesting the first editing process and a second editing screen requesting the second editing process,
and applying the first editing process to the two or more adjacent sections based on an instruction on the first editing screen, and applying the second editing process, based on an instruction on the second editing screen, to the section from the designated start point to the location corresponding to the start point of any of the two or more sections,
the editing support program according to claim 1 or 2. - Each of the first editing screen and the second editing screen includes information indicating the speakers as editing targets, and the information indicating the speakers is arranged according to a priority order based on at least one of the utterance order and the utterance amount of the speakers,
the editing support program according to claim 3. - When the first editing process occurs partway through a section corresponding to the speaker and, as a result of the first editing process, the speakers of the two or more adjacent sections before that point are common and the speakers of the two or more adjacent sections after that point are common, displaying the two or more adjacent sections before that point on the display unit in a combined state and then displaying the two or more adjacent sections after that point on the display unit in a combined state,
the editing support program according to any one of claims 1 to 4. - Generating the sentence based on the voice of the speaker and the voice recognition,
and identifying the speaker for the generated sentence based on the voice of the speaker and a trained model that has learned the characteristics of the voice of the speaker,
the editing support program according to any one of claims 1 to 5, including the above processing. - Including processing of storing, in a storage unit, the designated start point and the location corresponding to the start point of any of the two or more sections,
and applying, with reference to the storage unit, the second editing process to the section from the designated start point to the location corresponding to the start point of any of the two or more sections,
the editing support program according to any one of claims 1 to 6. - Displaying, on a display unit, information indicating a speaker identified for a sentence generated based on voice recognition in association with a section in the sentence corresponding to the identified speaker,
when a first editing process for editing the identification result of the speaker occurs and the speakers of two or more adjacent sections become common as a result of the first editing process, displaying the two or more adjacent sections on the display unit in a combined state, and
when, for a specific section within the combined two or more sections, a start point of a section on which a second editing process for editing the identification result of the speaker is to be performed is designated, and a location corresponding to a start point of any of the two or more sections before combination exists between the designated start point and an end point of the combined two or more sections, applying the second editing process to the section from the designated start point to the location corresponding to the start point of any of the two or more sections,
an editing support method in which a computer executes the above processing. - Displaying, on a display unit, information indicating a speaker identified for a sentence generated based on voice recognition in association with a section in the sentence corresponding to the identified speaker,
when a first editing process for editing the identification result of the speaker occurs and the speakers of two or more adjacent sections become common as a result of the first editing process, displaying the two or more adjacent sections on the display unit in a combined state, and
when, for a specific section within the combined two or more sections, a start point of a section on which a second editing process for editing the identification result of the speaker is to be performed is designated, and a location corresponding to a start point of any of the two or more sections before combination exists between the designated start point and an end point of the combined two or more sections, applying the second editing process to the section from the designated start point to the location corresponding to the start point of any of the two or more sections,
an editing support device including a processing unit that executes the above processing. - The processing unit, when the first editing process occurs and the speakers of the two or more adjacent sections become common as a result of the first editing process, applies the first editing process to the two or more adjacent sections and displays the two or more adjacent sections on the display unit in a combined state,
the editing support device according to claim 9. - The processing unit displays, on the display unit, a first editing screen requesting the first editing process and a second editing screen requesting the second editing process, applies the first editing process to the two or more adjacent sections based on an instruction on the first editing screen, and applies the second editing process, based on an instruction on the second editing screen, to the section from the designated start point to the location corresponding to the start point of any of the two or more sections,
the editing support device according to claim 9 or 10. - In the processing unit, each of the first editing screen and the second editing screen includes information indicating the speakers as editing targets, and the information indicating the speakers is arranged according to a priority order based on at least one of the utterance order and the utterance amount of the speakers,
the editing support device according to claim 11. - The processing unit, when the first editing process occurs partway through a section corresponding to the speaker and, as a result of the first editing process, the speakers of the two or more adjacent sections before that point are common and the speakers of the two or more adjacent sections after that point are common, displays the two or more adjacent sections before that point on the display unit in a combined state and then displays the two or more adjacent sections after that point on the display unit in a combined state,
the editing support device according to any one of claims 9 to 12. - The processing unit generates the sentence based on the voice of the speaker and the voice recognition, and identifies the speaker for the generated sentence based on the voice of the speaker and a trained model that has learned the characteristics of the voice of the speaker,
the editing support device according to any one of claims 9 to 13. - The processing unit stores, in a storage unit, the designated start point and the location corresponding to the start point of any of the two or more sections, and applies, with reference to the storage unit, the second editing process to the section from the designated start point to the location corresponding to the start point of any of the two or more sections,
the editing support device according to any one of claims 9 to 14.
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020217030247A KR20210132115A (ko) | 2019-03-15 | 2019-03-15 | 편집 지원 프로그램, 편집 지원 방법 및 편집 지원 장치 |
PCT/JP2019/010793 WO2020188622A1 (ja) | 2019-03-15 | 2019-03-15 | 編集支援プログラム、編集支援方法、及び編集支援装置 |
CN201980093895.9A CN113544772A (zh) | 2019-03-15 | 2019-03-15 | 编辑支持程序、编辑支持方法和编辑支持装置 |
JP2021506790A JP7180747B2 (ja) | 2019-03-15 | 2019-03-15 | 編集支援プログラム、編集支援方法、及び編集支援装置 |
EP19920221.9A EP3940695A4 (en) | 2019-03-15 | 2019-03-15 | EDITING SUPPORT PROGRAM, EDITING SUPPORT METHOD AND EDITING SUPPORT DEVICE |
US17/412,472 US20210383813A1 (en) | 2019-03-15 | 2021-08-26 | Storage medium, editing support method, and editing support device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2019/010793 WO2020188622A1 (ja) | 2019-03-15 | 2019-03-15 | 編集支援プログラム、編集支援方法、及び編集支援装置 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/412,472 Continuation US20210383813A1 (en) | 2019-03-15 | 2021-08-26 | Storage medium, editing support method, and editing support device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020188622A1 true WO2020188622A1 (ja) | 2020-09-24 |
Family
ID=72520594
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2019/010793 WO2020188622A1 (ja) | 2019-03-15 | 2019-03-15 | 編集支援プログラム、編集支援方法、及び編集支援装置 |
Country Status (6)
Country | Link |
---|---|
US (1) | US20210383813A1 (ja) |
EP (1) | EP3940695A4 (ja) |
JP (1) | JP7180747B2 (ja) |
KR (1) | KR20210132115A (ja) |
CN (1) | CN113544772A (ja) |
WO (1) | WO2020188622A1 (ja) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2022126454A (ja) * | 2021-02-18 | 2022-08-30 | 富士通株式会社 | 表示制御プログラム、表示制御装置および表示制御方法 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2014038132A (ja) | 2012-08-10 | 2014-02-27 | Toshiba Corp | 情報処理装置、プログラム、及び情報処理方法 |
JP2016062071A (ja) * | 2014-09-22 | 2016-04-25 | 株式会社東芝 | 電子機器、方法およびプログラム |
US20170075652A1 (en) * | 2015-09-14 | 2017-03-16 | Kabushiki Kaisha Toshiba | Electronic device and method |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5173854A (en) * | 1984-06-11 | 1992-12-22 | Tandem Computers Incorporated | Distributed text editing system with remote terminal transmits successive audit messages each identifying individual editing operation |
US5347295A (en) * | 1990-10-31 | 1994-09-13 | Go Corporation | Control of a computer through a position-sensed stylus |
JP3039204B2 (ja) * | 1993-06-02 | 2000-05-08 | キヤノン株式会社 | 文書処理方法及び装置 |
US6535848B1 (en) * | 1999-06-08 | 2003-03-18 | International Business Machines Corporation | Method and apparatus for transcribing multiple files into a single document |
JP5353835B2 (ja) * | 2010-06-28 | 2013-11-27 | ブラザー工業株式会社 | 情報処理プログラムおよび情報処理装置 |
WO2012127592A1 (ja) * | 2011-03-18 | 2012-09-27 | 富士通株式会社 | 通話評価装置、通話評価方法 |
JP5779032B2 (ja) | 2011-07-28 | 2015-09-16 | 株式会社東芝 | 話者分類装置、話者分類方法および話者分類プログラム |
CN102915728B (zh) * | 2011-08-01 | 2014-08-27 | 佳能株式会社 | 声音分段设备和方法以及说话者识别系统 |
US9460722B2 (en) * | 2013-07-17 | 2016-10-04 | Verint Systems Ltd. | Blind diarization of recorded calls with arbitrary number of speakers |
JP6716300B2 (ja) | 2016-03-16 | 2020-07-01 | 株式会社アドバンスト・メディア | 議事録生成装置、及び議事録生成プログラム |
KR101818980B1 (ko) * | 2016-12-12 | 2018-01-16 | 주식회사 소리자바 | 다중 화자 음성 인식 수정 시스템 |
JP6548045B2 (ja) * | 2017-03-31 | 2019-07-24 | 本田技研工業株式会社 | 会議システム、会議システム制御方法、およびプログラム |
US10468031B2 (en) * | 2017-11-21 | 2019-11-05 | International Business Machines Corporation | Diarization driven by meta-information identified in discussion content |
-
2019
- 2019-03-15 CN CN201980093895.9A patent/CN113544772A/zh active Pending
- 2019-03-15 JP JP2021506790A patent/JP7180747B2/ja active Active
- 2019-03-15 WO PCT/JP2019/010793 patent/WO2020188622A1/ja active Application Filing
- 2019-03-15 EP EP19920221.9A patent/EP3940695A4/en active Pending
- 2019-03-15 KR KR1020217030247A patent/KR20210132115A/ko active IP Right Grant
-
2021
- 2021-08-26 US US17/412,472 patent/US20210383813A1/en not_active Abandoned
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2014038132A (ja) | 2012-08-10 | 2014-02-27 | Toshiba Corp | 情報処理装置、プログラム、及び情報処理方法 |
JP2016062071A (ja) * | 2014-09-22 | 2016-04-25 | 株式会社東芝 | 電子機器、方法およびプログラム |
US20170075652A1 (en) * | 2015-09-14 | 2017-03-16 | Kabushiki Kaisha Toshiba | Electronic device and method |
Non-Patent Citations (1)
Title |
---|
See also references of EP3940695A4 |
Also Published As
Publication number | Publication date |
---|---|
KR20210132115A (ko) | 2021-11-03 |
US20210383813A1 (en) | 2021-12-09 |
EP3940695A4 (en) | 2022-03-30 |
EP3940695A1 (en) | 2022-01-19 |
JPWO2020188622A1 (ja) | 2021-10-14 |
CN113544772A (zh) | 2021-10-22 |
JP7180747B2 (ja) | 2022-11-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6074050B2 (ja) | 音声検索システム、音声検索方法、及びコンピュータ読み取り可能な記憶媒体 | |
KR20140022824A (ko) | 오디오 상호작용 메시지 교환 | |
CN101467142A (zh) | 在车辆中从数字媒体存储设备提取元数据以用于媒体选择的系统和方法 | |
JP2011102862A (ja) | 音声認識結果管理装置および音声認識結果表示方法 | |
JP2016157225A (ja) | 音声検索装置、音声検索方法及びプログラム | |
US20220093103A1 (en) | Method, system, and computer-readable recording medium for managing text transcript and memo for audio file | |
JP2016102920A (ja) | 文書記録システム及び文書記録プログラム | |
JP3940723B2 (ja) | 対話情報分析装置 | |
KR102036721B1 (ko) | 녹음 음성에 대한 빠른 검색을 지원하는 단말 장치 및 그 동작 방법 | |
WO2020188622A1 (ja) | 編集支援プログラム、編集支援方法、及び編集支援装置 | |
JP3896760B2 (ja) | 対話記録編集装置、方法及び記憶媒体 | |
WO2010146869A1 (ja) | 編集支援システム、編集支援方法および編集支援プログラム | |
JP2013092912A (ja) | 情報処理装置、情報処理方法、並びにプログラム | |
JP2020052262A (ja) | 修正候補提示方法、修正候補提示プログラムおよび情報処理装置 | |
JP5929879B2 (ja) | 音声出力装置、プログラム、及び音声出力方法 | |
JP6836330B2 (ja) | 情報処理プログラム、情報処理装置及び情報処理方法 | |
JP2022061932A (ja) | アプリとウェブサイトの連動によって音声ファイルに対するメモを作成する方法、システム、およびコンピュータ読み取り可能な記録媒体 | |
JP7344612B1 (ja) | プログラム、会話要約装置、および会話要約方法 | |
KR102377038B1 (ko) | 화자가 표지된 텍스트 생성 방법 | |
KR101576683B1 (ko) | 히스토리 저장모듈을 포함하는 오디오 재생장치 및 재생방법 | |
CN114385109B (zh) | 音频播放的处理方法、装置、电子设备及存储介质 | |
US20240163374A1 (en) | Speech reproduction control system, speech reproduction control method and non-transitory computer-readable recording medium encoded with speech reproduction control program | |
US20220391438A1 (en) | Information processing apparatus, information processing method, and program | |
JP2022052695A (ja) | 音声ファイルに対するテキスト変換記録とメモをともに管理する方法、システム、およびコンピュータ読み取り可能な記録媒体 | |
JP2023164835A (ja) | 情報処理システム、プログラム及び情報処理方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19920221 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2021506790 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 20217030247 Country of ref document: KR Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2019920221 Country of ref document: EP |