WO2023003271A1 - Apparatus and method for processing speakers' voices - Google Patents
Apparatus and method for processing speakers' voices
- Publication number: WO2023003271A1 (application PCT/KR2022/010276)
- Authority: WO — WIPO (PCT)
- Prior art keywords: voice, language, speakers, speaker, location
- Prior art date: 2021-07-19
Classifications
- G10L17/22 — Speaker identification or verification techniques: interactive procedures; man-machine interfaces
- G06F40/40 — Handling natural language data: processing or translation of natural language
- G10L15/00 — Speech recognition
- G10L15/005 — Speech recognition: language recognition
- G10L15/26 — Speech recognition: speech-to-text systems
- H04R3/00 — Circuits for transducers, loudspeakers or microphones
- H04R3/005 — Circuits for combining the signals of two or more microphones
Definitions
- Embodiments of the present invention relate to an apparatus and method for processing a speaker's voice.
- In general, a microphone is a device that receives a voice and converts it into a voice signal, which is an electrical signal.
- When a microphone is placed in a space where a plurality of speakers are located, such as a conference room or a classroom, it receives the voices of all of the speakers and generates voice signals associated with all of those voices.
- An object of the present invention is to provide a voice processing apparatus and method capable of determining each speaker's position using the speakers' voice signals and of separating and recognizing the voice signal of each speaker.
- Another object of the present invention is to provide a voice processing apparatus and method capable of determining the position of each speaker from the speakers' voices, determining each speaker's current language according to the determined position, and generating a translation result in which each speaker's voice is translated from its current language into another language.
- A further object of the present invention is to provide a voice processing apparatus and method capable of using the translation results, in which the current language of each speaker's voice is translated into another language, to generate translated meeting minutes containing the utterances of each speaker expressed in that other language.
- A voice processing apparatus according to embodiments of the present invention is configured to generate translation results for the speakers' voices and comprises: a microphone configured to generate, in response to the speakers' voices, a voice signal associated with those voices; a memory configured to store location-language information indicating the language corresponding to each sound-source location of the speakers' voices; and a processor configured to generate, using the voice signal and the location-language information, a translation result in which the language of each speaker's voice is translated, and to generate, using the translation results, translated meeting minutes containing the utterances of each speaker expressed in another language.
- According to the voice processing apparatus and method of embodiments of the present invention, the position of each speaker can be determined using the speakers' voice signals, and the voice signals can be separated and recognized for each speaker.
- In addition, the position of each speaker is determined from the speakers' voices, each speaker's current language is determined according to the determined position, and a translation result is generated in which the current language of each speaker's voice is translated into another language.
- Furthermore, translated meeting minutes containing the utterances of each speaker expressed in another language can be generated using the translation results.
- FIG. 1 shows a voice processing system according to embodiments of the present invention.
- FIG. 2 shows a voice processing device according to embodiments of the present invention.
- FIG. 3 is a diagram for explaining the operation of a voice processing apparatus according to embodiments of the present invention.
- FIG. 4 is a flow chart illustrating a voice separation method by a voice processing apparatus according to embodiments of the present invention.
- FIG. 5 is a diagram for explaining a translation function of a voice processing apparatus according to embodiments of the present invention.
- FIG. 6 is a diagram for explaining a translation function of a voice processing device according to embodiments of the present invention.
- FIG. 7 is a flowchart illustrating a method of generating a translation result by a voice processing apparatus according to embodiments of the present invention.
- FIG. 8 is a diagram for explaining the operation of a voice processing apparatus according to embodiments of the present invention.
- Referring to FIG. 1, a voice processing system 10 may include a voice processing device 100 and a translation server 200.
- The voice processing system 10 may separate the voices of the speakers SPK1 to SPK4 and provide a translation for each of the separated voices.
- The speakers SPK1 to SPK4 may be located in a space (e.g., a conference room, a vehicle, or a lecture room) and utter voices there.
- For example, the first speaker SPK1 located at the first position P1 may utter a voice in a first language (e.g., Korean (KR)), the second speaker SPK2 located at the second position P2 in a second language (e.g., English (EN)), and the third speaker SPK3 located at the third position P3 in a third language (e.g., Japanese (JP)).
- Likewise, the fourth speaker SPK4 located at the fourth position P4 may utter a voice in a fourth language (e.g., Chinese (CN)).
- The voice processing apparatus 100 may generate voice signals associated with the voices of the speakers SPK1 to SPK4 in response to those voices.
- The voice signal is a signal associated with the voices uttered during a specific time period and may represent the voices of a plurality of speakers.
- The voice processing apparatus 100 may separate and recognize the voices of the speakers SPK1 to SPK4 for each speaker, since the generated voice signal contains the voices of all of the speakers SPK1 to SPK4.
- According to embodiments, the voice processing apparatus 100 determines the sound-source location of each of the speakers' voices from the voice signals associated with those voices and, by performing sound-source separation based on the determined locations, can extract (or generate) from the voice signal a separated voice signal associated with the voice of each of the speakers SPK1 to SPK4.
- In other words, the voice processing apparatus 100 generates, based on the sound-source location of each voice (i.e., the speaker's position), a separated voice signal associated with the voice of the speaker located at each of the positions P1 to P4.
- To this end, the voice processing apparatus 100 may classify the components of the voice signal according to the positions P1 to P4 and, using the classified components corresponding to each position, generate a separated voice signal associated with the voice uttered at that position.
- For example, the voice processing apparatus 100 may generate, based on the voice signal, a first separated voice signal associated with the voice of the first speaker SPK1 uttered at the first position P1.
- The first separated voice signal is, among the voices of the speakers SPK1 to SPK4, the voice signal with the highest correlation to the voice of the first speaker SPK1; in it, the voice component of the first speaker SPK1 has the highest proportion.
- Meanwhile, the voice processing apparatus 100 may determine the positions of the speakers SPK1 to SPK4 from the voice signal, determine the current language of each speaker's voice (i.e., the source language) based on the determined positions, and generate a translation result in which the language of each speaker's voice is translated into another language.
- Since the voice processing apparatus 100 can determine the language of the speakers' speech (i.e., the source language) from the speakers' positions, there is no need to determine the language by interpreting the voices themselves, which has the effect of reducing the time and resources required for translation.
- Here, the translation by the voice processing apparatus 100 includes transmitting a translation request to an external translation server and receiving from that server a translation result generated by a translation program executed on the server.
- According to embodiments, the voice processing apparatus 100 may generate a translation result for each voice.
- The translation result may be text data or a voice signal associated with each of the speakers' voices expressed in the target language.
- The translation server 200 may provide language translation. According to embodiments, the translation server 200 receives the voice signals associated with the voices of the speakers SPK1 to SPK4 from the voice processing device 100, translates the voices into other languages, and provides the translation results to the voice processing device 100.
- The translation server 200 may perform the translation through its own computation and provide the result, but is not limited thereto; it may instead receive a translation result from the outside and pass the received result on to the voice processing device 100.
- Although shown separately, the voice processing device 100 may itself include the translation server 200 according to embodiments; this may mean that the voice processing device 100 stores a translation program executed using the processor of the voice processing device 100.
- Referring to FIG. 2, the voice processing device 100 may include a microphone 110, a communication circuit 120, a processor 130, and a memory 140. According to embodiments, the voice processing device 100 may further include a speaker 150.
- The microphone 110 may generate a voice signal in response to a voice. According to embodiments, the microphone 110 may detect the air vibration caused by a voice and generate, according to the detection result, a voice signal that is an electrical signal corresponding to the vibration.
- The microphone 110 may receive the voices of the speakers SPK1 to SPK4 located at the respective positions P1 to P4 and convert them into electrical voice signals.
- The microphone 110 may be configured as a plurality of microphones (e.g., arranged in an array), each of which generates a voice signal in response to a voice.
- Since the distance from each speaker to each microphone differs, the voice signals generated by the individual microphones 110 may have phase differences (or time delays) relative to one another.
- Although the voice processing apparatus 100 is described herein as including the microphone 110 and directly generating the voice signals associated with the speakers' voices using it, the microphone may in practice be configured externally and separately from the voice processing device 100, in which case the voice processing device 100 receives the voice signal from the separately configured microphone and processes or uses it. For example, the voice processing apparatus 100 may generate the separated voice signals from a voice signal received from such a separate microphone. Unless otherwise noted, however, it is assumed in the description that the voice processing device 100 includes the microphone 110.
- The communication circuit 120 may exchange data with an external device according to a wireless communication method. According to embodiments, the communication circuit 120 may exchange data with an external device using radio waves of various frequencies, for example according to at least one of a short-range, mid-range, or long-range wireless communication method.
- The processor 130 may control the overall operation of the voice processing device 100. According to embodiments, the processor 130 may include a processor having an arithmetic processing function.
- For example, the processor 130 may include a central processing unit (CPU), a micro controller unit (MCU), a graphics processing unit (GPU), a digital signal processor (DSP), an analog-to-digital converter (ADC), or a digital-to-analog converter (DAC), but is not limited thereto.
- The operation of the voice processing device 100 described herein may be understood as the operation of the processor 130.
- The processor 130 may process the voice signals generated by the microphone 110. For example, the processor 130 may convert the analog voice signal generated by the microphone 110 into a digital voice signal and process the converted digital signal. Since only the type of the signal (analog or digital) changes, the digital voice signal and the analog voice signal are used interchangeably in the following description.
- The processor 130 may extract (or generate) a separated voice signal associated with the voice of each of the speakers SPK1 to SPK4 using the voice signal generated by the microphone 110.
- In other words, the processor 130 may generate separated voice signals associated with the voices of the speakers SPK1 to SPK4 located at the respective positions P1 to P4.
- The separated voice signal may take the form of voice data or text data.
- According to embodiments, the processor 130 may determine the sound-source locations of the voices (i.e., the positions of the speakers SPK1 to SPK4) using the time delay (or phase delay) between the voice signals. For example, the processor 130 may determine the relative position of each sound source (i.e., each of the speakers SPK1 to SPK4) with respect to the voice processing device 100.
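- This time-delay principle can be illustrated with a brief sketch (a hypothetical illustration, not taken from the disclosure): the delay between two microphone channels is found at the peak of their cross-correlation, and a direction of arrival follows from the microphone spacing.

```python
import numpy as np

def estimate_delay(sig_a: np.ndarray, sig_b: np.ndarray, fs: int) -> float:
    """Delay (seconds) of sig_b relative to sig_a from the
    cross-correlation peak; positive means sig_b lags sig_a."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    return ((len(sig_b) - 1) - np.argmax(corr)) / fs

fs = 16_000
rng = np.random.default_rng(0)
src = rng.standard_normal(fs // 10)          # 100 ms broadband source
lag = 3                                      # inter-microphone delay (samples)
sig_a = src
sig_b = np.concatenate([np.zeros(lag), src[:-lag]])

tau = estimate_delay(sig_a, sig_b, fs)
c, d = 343.0, 0.10                           # speed of sound, 10 cm spacing
angle = np.degrees(np.arcsin(np.clip(tau * c / d, -1.0, 1.0)))
print(f"delay {tau * 1e3:.3f} ms -> arrival angle ~{angle:.0f} deg")
```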
- The processor 130 may generate a separated voice signal associated with the voice of each of the speakers SPK1 to SPK4 based on the determined sound-source locations. According to embodiments, the processor 130 classifies the components of the voice signal by sound-source position P1 to P4 and, using the classified components corresponding to each position, generates a separated voice signal associated with the voice uttered at that position. For example, the processor 130 may generate the first separated voice signal associated with the voice of the first speaker SPK1 based on the sound-source locations of the voices.
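- The disclosure does not name a specific separation algorithm; one common choice consistent with the description above, shown here purely as an assumption, is delay-and-sum beamforming, in which each channel is time-aligned for a given source position so that the targeted source adds coherently while the others are attenuated.

```python
import numpy as np

def delay_and_sum(channels: np.ndarray, delays_samples: list[int]) -> np.ndarray:
    """Align each microphone channel by the per-channel delay expected for
    one source position, then average; the targeted source is reinforced."""
    aligned = [np.roll(ch, -d) for ch, d in zip(channels, delays_samples)]
    return np.mean(aligned, axis=0)

# Separated signal for the source at P1, whose expected per-channel delays
# (in samples) were obtained as in the previous sketch (values illustrative).
channels = np.zeros((4, 1600))               # placeholder 4-channel capture
separated_p1 = delay_and_sum(channels, [0, 3, 6, 9])
```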
- Also, the processor 130 may match the sound-source location information indicating the determined sound-source location with the corresponding separated voice signal and store them. For example, the processor 130 may match the first separated voice signal, associated with the voice of the first speaker SPK1, with first sound-source location information indicating the location of the sound source of that voice, and store them in the memory 140. Since the position of each sound source corresponds to the position of one of the speakers SPK1 to SPK4, the sound-source location information can also function as speaker location information identifying the position of each speaker.
- The processor 130 may determine the language of the voices of the speakers SPK1 to SPK4 (i.e., the source language) using the sound-source location information. According to embodiments, the processor 130 may determine the language of each voice by determining the sound-source location information from the speakers' voices and then determining the location-language information corresponding to it. Here, the location-language information indicates the language of the speaker at each position; it may be matched with each position in advance and stored in the memory 140, as described further below.
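- A minimal in-memory stand-in for this location-language lookup is sketched below; the position labels and language codes are illustrative values following the FIG. 1 example.

```python
from dataclasses import dataclass, field

# Location-language information matched in advance: position -> language.
LOCATION_LANGUAGE = {"P1": "KR", "P2": "EN", "P3": "JP", "P4": "CN"}

@dataclass
class SeparatedUtterance:
    position: str              # sound-source location information
    audio: bytes               # separated voice signal (placeholder type)
    language: str = field(init=False)

    def __post_init__(self):
        # The source language follows from the position alone; the audio
        # itself never has to be analyzed to identify the language.
        self.language = LOCATION_LANGUAGE[self.position]

print(SeparatedUtterance("P1", b"...").language)   # -> KR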
- The processor 130 may transmit, using the communication circuit 120, the separated voice signal associated with each speaker's voice together with information indicating the language of that voice to the translation server 200. According to embodiments, the processor 130 may generate a control command for transmitting the separated voice signal and the language information to the translation server 200.
- The translation server 200 may then use the separated voice signal to generate a translation result in which the language of the speaker's voice is translated.
- Alternatively, according to embodiments, the processor 130 may itself translate the voices of the speakers SPK1 to SPK4 using the separated voice signal and the location-language information associated with each voice. For example, the processor 130 may generate a translation result in which a speaker's voice is translated into the target language by executing a translation program and providing the separated voice signal and the location-language information associated with that speaker's voice as inputs to the program.
- In this specification, the translation result may mean any text data or voice signal associated with the speakers' voices expressed in the target language.
- Meanwhile, the processor 130 may generate meeting minutes written in the speakers' languages using the translation results. For example, the processor 130 may generate text data for each of the speakers' voices using the separated voice signals and arrange or list the text data of each speaker according to the time at which the voice was recognized, thereby creating the meeting minutes, as sketched below.
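- A minimal sketch of this arrangement step (timestamps and texts are illustrative values):

```python
from operator import itemgetter

# (recognition time in seconds, speaker identifier, transcribed text)
utterances = [
    (12.4, "SPK2", "BBB"),
    (3.1, "SPK1", "AAA"),
    (20.8, "SPK3", "CCC"),
]

# Minutes are assembled by listing each speaker's text in the order the
# voices were recognized, as described above.
minutes = "\n".join(
    f"[{t:6.1f}s] {speaker}: {text}"
    for t, speaker, text in sorted(utterances, key=itemgetter(0))
)
print(minutes)
```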
- The operations of the processor 130 or the voice processing device 100 described herein may be implemented in the form of a program executable by a computing device. For example, the processor 130 may execute an application stored in the memory 140 and perform the operations corresponding to the instructions of that application.
- The memory 140 may store the data necessary for the operation of the voice processing device 100. According to embodiments, the memory 140 may include at least one of a non-volatile memory and a volatile memory.
- According to embodiments, the memory 140 may store identifiers corresponding to the positions P1 to P4 in the space. An identifier is data for distinguishing the positions P1 to P4. Since each of the speakers SPK1 to SPK4 is located at one of the positions P1 to P4, the speakers can be distinguished from one another using the identifiers corresponding to the positions.
- For example, a first identifier indicating the first position P1 also indicates the first speaker SPK1. In this sense, the identifiers corresponding to the positions P1 to P4 in the space can function as speaker identifiers identifying each of the speakers SPK1 to SPK4. The identifiers may be input through an input device (e.g., a touch pad) of the voice processing device 100.
- The memory 140 may store the sound-source location information related to the position of each of the speakers SPK1 to SPK4 and the separated voice signal associated with each speaker's voice.
- The memory 140 may also store the location-language information indicating the languages of the speakers' voices. According to embodiments, the location-language information may be matched to each position in advance and stored in the memory 140, as described further below.
- The speaker 150 may vibrate under the control of the processor 130, and sound is generated by that vibration. According to embodiments, the speaker 150 may reproduce the voice associated with a voice signal by vibrating in correspondence with that signal.
- FIG. 3 is a diagram for explaining the operation of a voice processing apparatus according to embodiments of the present invention.
- The operations of the voice processing device 100 described below may be understood as operations performed under the control of the processor 130 included in the voice processing device 100.
- Referring to FIG. 3, the speakers SPK1 to SPK4 located at the respective positions P1 to P4 may each speak.
- The voice processing apparatus 100 may generate separated voice signals associated with the voices of the speakers SPK1 to SPK4 from those voices and may store sound-source location information indicating the position of each sound source, that is, of each of the speakers SPK1 to SPK4.
- According to embodiments, the voice processing apparatus 100 may determine the sound-source locations of the voices (i.e., the positions of the speakers SPK1 to SPK4) using the time delay (or phase delay) between the voice signals. For example, the voice processing apparatus 100 may determine the relative position of each sound source (i.e., each of the speakers SPK1 to SPK4) with respect to the voice processing apparatus 100.
- The voice processing apparatus 100 may generate a separated voice signal associated with the voice of each of the speakers SPK1 to SPK4 based on the determined sound-source locations.
- As shown in FIG. 3, the first speaker SPK1 utters the voice 'AAA', the second speaker SPK2 utters the voice 'BBB', the third speaker SPK3 utters the voice 'CCC', and the fourth speaker SPK4 utters the voice 'DDD'.
- The voice processing apparatus 100 may generate voice signals associated with the voices of the speakers SPK1 to SPK4 in response to those voices. The generated voice signal includes components associated with all of the voices 'AAA', 'BBB', 'CCC', and 'DDD'.
- Using the generated voice signal, the voice processing apparatus 100 may generate a first separated voice signal associated with the voice 'AAA' of the first speaker SPK1, a second separated voice signal associated with the voice 'BBB' of the second speaker SPK2, a third separated voice signal associated with the voice 'CCC' of the third speaker SPK3, and a fourth separated voice signal associated with the voice 'DDD' of the fourth speaker SPK4.
- In addition, the voice processing apparatus 100 may store in the memory 140 the separated voice signals associated with the speakers' voices and the sound-source location information indicating the positions of the speakers SPK1 to SPK4 (i.e., the sound-source locations).
- For example, the voice processing apparatus 100 may store in the memory 140 the first separated voice signal, associated with the voice 'AAA' of the first speaker SPK1, together with first location information indicating the first position P1, which is the location of the sound source of the first speaker SPK1. As shown in FIG. 3, each separated voice signal may be matched with its sound-source location information and stored.
- That is, the voice processing apparatus 100 may generate a separated voice signal associated with the voice of each of the speakers SPK1 to SPK4 from the speakers' voices and store each separated voice signal together with location information indicating the position of the corresponding speaker.
- FIG. 4 is a flow chart illustrating a voice separation method by a voice processing apparatus according to embodiments of the present invention.
- The operating method of the voice processing device described with reference to FIG. 4 may be stored in a non-transitory storage medium and implemented as an application (e.g., a voice separation application) executable by a computing device. For example, the processor 130 may execute the application stored in the memory 140 and perform the operations corresponding to the instructions of that application.
- Referring to FIG. 4, the voice processing apparatus 100 may receive voice signals associated with the voices of the speakers SPK1 to SPK4 (S110). According to embodiments, the voice processing device 100 may convert the voice detected in the space into a voice signal, which is an electrical signal.
- The voice processing apparatus 100 may determine the positions of the speakers SPK1 to SPK4 using the voice signals associated with their voices (S120). According to embodiments, the voice processing apparatus 100 may generate sound-source location information indicating the locations of the sound sources, which correspond to the positions of the speakers SPK1 to SPK4.
- The voice processing apparatus 100 may generate a separated voice signal associated with each of the speakers' voices based on the sound-source location of each voice (S130). According to embodiments, the voice processing apparatus 100 may do so by separating the generated voice signal based on the sound-source locations, for example by separating the components included in the voice signal according to the location of each sound source.
- The voice processing apparatus 100 may store the sound-source location information indicating the location of each sound source together with the separated voice signals (S140). According to embodiments, the voice processing apparatus 100 may match the sound-source location information with the separated voice signal associated with each speaker's voice and store them; for example, it may match and store data corresponding to each separated voice signal and the corresponding sound-source location information.
- That is, the voice processing device 100 may execute an application (e.g., a voice separation application) stored in the memory 140 and thereby generate (or separate), from the voice signals associated with the voices of the speakers SPK1 to SPK4, the separated voice signals associated with each of those voices.
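- The four steps S110 to S140 can be tied together in a compact sketch; localize() and separate() below are placeholder stubs standing in for S120 and S130 (any concrete implementations, such as the time-delay and beamforming sketches above, could be substituted).

```python
import numpy as np

def localize(frame: np.ndarray) -> list[str]:
    """Stub for S120: return one position label per detected sound source."""
    return ["P1", "P2"]                      # placeholder result

def separate(frame: np.ndarray, position: str) -> np.ndarray:
    """Stub for S130: extract the signal component for one position."""
    return frame.mean(axis=0)                # placeholder separation

def process(frame: np.ndarray) -> dict[str, np.ndarray]:
    """S110-S140 in order: take the multi-channel voice signal, determine
    the sound-source positions, separate per position, and store each
    separated signal matched with its sound-source location information."""
    memory: dict[str, np.ndarray] = {}       # stand-in for the memory 140
    for position in localize(frame):         # S120
        memory[position] = separate(frame, position)   # S130, S140
    return memory

frame = np.zeros((4, 1600))                  # 4 channels, 100 ms at 16 kHz
print(list(process(frame)))                  # -> ['P1', 'P2']
```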
- In the example of FIG. 5, the first speaker SPK1 utters the voice 'AAA' in Korean (KR), the second speaker SPK2 utters the voice 'BBB' in English (EN), the third speaker SPK3 utters the voice 'CCC' in Chinese (CN), and the fourth speaker SPK4 utters the voice 'DDD' in Japanese (JP).
- The voice processing apparatus 100 determines the position of each of the speakers SPK1 to SPK4 from their voices and generates a separated voice signal associated with the voice of each speaker.
- In addition, the voice processing apparatus 100 determines the language of the speakers' voices using the location-language information stored in correspondence with each speaker's position, and can thereby provide a translation of each speaker's voice.
- For example, the voice processing apparatus 100 may store in the memory 140 first location-language information indicating that the language corresponding to the first position P1 is Korean (KR).
- The voice processing apparatus 100 may store in the memory 140 the first separated voice signal associated with the voice 'AAA' of the first speaker SPK1, the first sound-source location information indicating the first position P1, which is the location of the first speaker SPK1, and the first location-language information indicating that the language of the voice 'AAA' of the first speaker SPK1 is Korean (KR).
- FIG. 6 is a diagram for explaining a translation function of a voice processing device according to embodiments of the present invention.
- Referring to FIG. 6, the voice processing apparatus 100 generates a separated voice signal associated with each of the speakers' voices and may use the separated voice signals to generate a translation result for each of the voices of the speakers SPK1 to SPK4.
- The translation result represents the result of converting the language of the speakers' voices into another language (i.e., the target language).
- According to embodiments, the voice processing device 100 may convert the separated voice signal into text data (e.g., speech-to-text (STT) conversion), generate a translation result for the converted text data, and convert the translation result back into a voice signal (e.g., text-to-speech (TTS) conversion). That is, the translation result referred to in this specification may mean any text data or voice signal associated with the speakers' voices expressed in the target language.
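- A sketch of this STT-translate-TTS chain follows; the three helper functions are placeholder stubs standing in for real engines, not an actual library API.

```python
def stt(audio: bytes, language: str) -> str:
    return "AAA"                      # a real engine would transcribe here

def translate(text: str, src: str, dst: str) -> str:
    return f"<{text}:{src}->{dst}>"   # placeholder translation

def tts(text: str, language: str) -> bytes:
    return text.encode()              # placeholder synthesized voice

def translation_result(audio: bytes, src: str, dst: str) -> tuple[str, bytes]:
    """Both forms of the translation result named above: target-language
    text data and a target-language voice signal."""
    text = translate(stt(audio, language=src), src=src, dst=dst)
    return text, tts(text, language=dst)

text_out, audio_out = translation_result(b"...", src="KR", dst="EN")
```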
- The voice processing device 100 may output the generated translation result, for example through the speaker 150, or transmit it to another external device.
- As shown in FIG. 6, the first speaker SPK1 utters the voice 'AAA' in Korean (KR); that is, the source language of the voice 'AAA' of the first speaker SPK1 is Korean (KR).
- The voice processing apparatus 100 determines the sound-source location (e.g., P1) of the voice 'AAA' of the first speaker SPK1 in response to that voice and may generate, based on the sound-source location, the first separated voice signal associated with the voice 'AAA'.
- The voice processing apparatus 100 may provide translations of the voices of the speakers SPK1 to SPK4 using the generated separated voice signals.
- According to embodiments, the voice processing apparatus 100 uses the location-language information stored in the memory 140 to determine the language of the voice uttered by the speaker located at each of the positions P1 to P4, and may generate a translation result for the language of each speaker according to the determined language.
- For example, the voice processing apparatus 100 may use the first sound-source location information, which indicates the first position P1 as the location of the sound source of the voice 'AAA' of the first speaker SPK1, to read from the memory 140 the first location-language information indicating that the language of a voice uttered at the first position P1 is Korean (KR).
- Based on this, the voice processing device 100 may generate a translation result in which Korean (KR), the language of the voice 'AAA' of the first speaker SPK1, is translated into another language. For example, the voice processing apparatus 100 may use the separated voice signal for the voice 'AAA' together with the information indicating that the language of the voice 'AAA' is Korean (KR) to generate a translation result in which the language of the voice 'AAA' is translated into another language.
- The language into which the speakers' voices are to be translated (i.e., the target language) may be predetermined, designated by an external user input, or set by the voice processing device 100 itself.
- According to embodiments, the voice processing apparatus 100 may generate, based on the location-language information indicating the language corresponding to each speaker's position, a translation result in which the language of the voice of one of the speakers SPK1 to SPK4 is translated into the languages of the other speakers.
- For example, based on the pre-stored location-language information, the voice processing apparatus 100 may determine that the languages into which the voice 'AAA' of the first speaker SPK1 located at the first position P1 is to be translated (i.e., the target languages) are the languages corresponding to the positions of the remaining speakers SPK2 to SPK4 other than the first speaker, namely English, Chinese, and Japanese. According to this determination, the voice processing device 100 may generate translation results in which the language of the voice 'AAA' is translated into English, Chinese, and Japanese.
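- This target-language rule reduces to excluding the speaking position from the stored location-language mapping; a sketch follows, with illustrative values matching the FIG. 6 example.

```python
LOCATION_LANGUAGE = {"P1": "KR", "P2": "EN", "P3": "CN", "P4": "JP"}

def target_languages(speaking_position: str) -> list[str]:
    """Languages mapped to every position other than the speaker's own."""
    return [lang for pos, lang in LOCATION_LANGUAGE.items()
            if pos != speaking_position]

print(target_languages("P1"))   # -> ['EN', 'CN', 'JP']
```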
- In other words, the voice processing apparatus 100 can determine both languages (the source language and the target language) from the positions of the speakers' voices alone (i.e., from the sound-source locations), which has the effect that the voices of the speakers SPK1 to SPK4 can be translated based on the languages so determined.
- The voice processing apparatus 100 may provide the translation results to the remaining speakers SPK2 to SPK4. Also, according to embodiments, the voice processing device 100 may transmit the translation results to another device (e.g., a loudspeaker, a display, or an external device).
- FIG. 7 is a flowchart illustrating a method of generating a translation result by a voice processing apparatus according to embodiments of the present invention.
- The operating method of the voice processing device described with reference to FIG. 7 may be stored in a non-transitory storage medium and implemented as an application (e.g., a translation application) executable by a computing device. For example, the processor 130 may execute the application stored in the memory 140 and perform the operations corresponding to the instructions of that application.
- Referring to FIG. 7, the voice processing apparatus 100 may receive voice signals associated with the voices of the speakers SPK1 to SPK4 (S210).
- The voice processing apparatus 100 may determine the positions of the speakers SPK1 to SPK4 using the voice signals associated with their voices (S220). According to embodiments, the voice processing apparatus 100 may generate sound-source location information indicating the locations of the sound sources, which correspond to the positions of the speakers SPK1 to SPK4.
- The voice processing apparatus 100 may generate a separated voice signal associated with each of the speakers' voices based on the sound-source location of each voice (S230).
- The voice processing apparatus 100 may determine the language of the speakers' voices (i.e., the current language) based on the positions of the speakers SPK1 to SPK4 (S240). According to embodiments, the voice processing apparatus 100 may determine the language (i.e., the current language) of each of the speakers SPK1 to SPK4 using the determined sound-source location information and the stored location-language information.
- The voice processing apparatus 100 may generate a translation result for each speaker's voice according to the determined language (S250). According to embodiments, the voice processing apparatus 100 may generate the translation result for each voice using the separated voice signal of each speaker and the information about that speaker's language.
- For example, the voice processing apparatus 100 may generate, based on the location-language information indicating the language corresponding to each speaker's position, translation results in which the language of the voice of one of the speakers SPK1 to SPK4 is translated into the languages of the other speakers.
- Meanwhile, the voice processing apparatus 100 may generate meeting minutes (MOM) using the separated voice signals associated with the voices of the speakers SPK1 to SPK4.
- The meeting minutes may be data recording the content of each speaker's utterances, arranged in chronological order.
- The voice processing apparatus 100 generates the meeting minutes (MOM) and stores (or records) in them the utterances of the speakers SPK1 to SPK4, using the separated voice signals associated with the speakers' voices. In doing so, the voice processing apparatus 100 may match and record the content of each speaker's utterances together with an identifier (e.g., a name) identifying that speaker, so that the minutes make it possible to check which speaker uttered what content.
- The meeting minutes (MOM) may be composed of at least one of text data, voice data, or image data, but are not limited thereto.
- According to embodiments, the voice processing apparatus 100 may generate the meeting minutes (MOM) by processing the separated voice signals: in response to the speakers' voices, it generates the separated voice signals associated with those voices and converts the generated separated voice signals into text, which is saved as the minutes.
- In addition, the speech processing apparatus 100 may generate not only original meeting minutes containing the utterances of each of the speakers SPK1 to SPK4 expressed in the original language (i.e., the source language), but also translated meeting minutes containing the utterances of each speaker expressed in other languages (i.e., the target languages).
- That is, the voice processing apparatus 100 generates the original minutes using the separated voice signals of the speakers SPK1 to SPK4 and, using the translation results for those separated voice signals, can generate translated meeting minutes (MOM) in the language of each speaker's voice.
- For example, the voice processing apparatus 100 may create Korean meeting minutes (KR MOM) in which the utterances of the speakers SPK1 to SPK4 are expressed in Korean (KR), the language of the first speaker SPK1.
- That is, the voice processing apparatus 100 may generate the Korean meeting minutes (KR MOM) using the first separated voice signal associated with the voice of the first speaker SPK1 (which is already expressed in Korean (KR)) and the translation results in which the voice languages of the remaining speakers SPK2 to SPK4 are translated into Korean (KR).
- Likewise, the voice processing apparatus 100 may generate meeting minutes (EN MOM, CN MOM, and JP MOM) in which the utterances of the speakers SPK1 to SPK4 are expressed in the languages of the other speakers SPK2 to SPK4.
- As illustrated, the first speaker SPK1 at the first position P1 utters the voice 'AAA' in Korean, the second speaker SPK2 at the second position P2 utters the voice 'BBB' in English, and the third speaker SPK3 at the third position P3 utters the voice 'CCC' in Chinese.
- The voice processing apparatus 100 determines the first position P1 as the location of the sound source of the voice 'AAA' and generates the first separated voice signal associated with the voice 'AAA'. Also, the voice processing device 100 may determine, based on the location-language information, that the language of the voice 'AAA' (i.e., the source language) is Korean (KR).
- The voice processing apparatus 100 may generate the Korean meeting minutes (KR MOM) using the first separated voice signal for the voice 'AAA': it records (or stores) in the KR MOM the text data corresponding to the first separated voice signal, so that the KR MOM includes the content of the voice 'AAA' uttered in Korean (KR).
- In addition, the voice processing device 100 may generate English meeting minutes (EN MOM), Chinese meeting minutes (CN MOM), and Japanese meeting minutes (JP MOM) using the translation results for the voice 'AAA'. For example, to generate the English meeting minutes (EN MOM), the voice processing device 100 converts into text the translation result in which the language of the voice 'AAA' is translated into English (EN) and records (or stores) that text data in the EN MOM; the EN MOM thus includes the content of the voice 'AAA' written in English (EN).
- Similarly, the voice processing device 100 may record the content of the voice 'CCC', uttered in Chinese (CN), in the Chinese meeting minutes (CN MOM) using the third separated voice signal for the voice 'CCC' and, using the translation results for the voice 'CCC', record its content in the minutes of the other languages (KR MOM, EN MOM, JP MOM).
- Likewise, the voice processing device 100 may record the content of the voice 'BBB', uttered in English (EN), in the English meeting minutes (EN MOM) using the second separated voice signal for the voice 'BBB' and, using the translation results for the voice 'BBB', record its content in the minutes of the other languages (KR MOM, CN MOM, JP MOM).
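- The whole example can be condensed into a loop that fills one set of minutes per language; translate() below is again a placeholder stub, and the utterance data are illustrative.

```python
def translate(text: str, src: str, dst: str) -> str:
    return f"<{text}:{src}->{dst}>"   # placeholder translation stub

languages = ["KR", "EN", "CN", "JP"]
utterances = [("SPK1", "KR", "AAA"),  # speaker, source language, text
              ("SPK2", "EN", "BBB"),
              ("SPK3", "CN", "CCC")]

# One set of minutes per language: each utterance is kept as-is in its own
# language's minutes and translated into every other language's minutes.
minutes = {lang: [] for lang in languages}
for speaker, src, text in utterances:
    for lang in languages:
        entry = text if lang == src else translate(text, src=src, dst=lang)
        minutes[lang].append(f"{speaker}: {entry}")

print("\n".join(minutes["EN"]))   # English minutes (EN MOM)
```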
Abstract
- Embodiments of the present invention relate to an apparatus and method for processing speakers' voices.
Claims (14)
- 1. A voice processing apparatus configured to generate translation results for speakers' voices, the apparatus comprising: a microphone configured to generate, in response to the speakers' voices, a voice signal associated with the speakers' voices; a memory configured to store location-language information indicating a language corresponding to a sound-source location of the speakers' voices; and a processor configured to generate, using the voice signal and the location-language information, a translation result in which the language of each speaker's voice is translated, and to generate, using the translation result, translated meeting minutes containing the utterances of each speaker expressed in another language.
- 2. The voice processing apparatus of claim 1, wherein the processor: determines, using the voice signal generated by the microphone, the sound-source locations of the speakers' voices and generates sound-source location information indicating the determined locations; generates, from the voice signal, a separated voice signal associated with the voice uttered at each sound-source location; determines the current language of the speakers' voices using the location-language information stored in the memory; and generates, using the separated voice signals and the determined current languages, a translation result in which the current language of the speakers' voices is translated into another language.
- 3. The voice processing apparatus of claim 1, wherein the processor: determines, using the voice signal generated by the microphone, the sound-source locations of the speakers' voices and generates sound-source location information indicating the determined locations; generates, from the voice signal, a separated voice signal associated with the voice uttered at each sound-source location; determines the current language of the speakers' voices using the location-language information stored in the memory; and generates, using the separated voice signals and the determined current languages, a translation result in which the current language of the speakers' voices is translated into another language.
- 4. The voice processing apparatus of claim 2, wherein the processor: determines, using the location-language information stored in the memory, the other language into which the current language of each speaker's voice is to be translated; and generates, according to the determined current language and other language, a translation result in which the current language of the speakers' voices is translated into the other language.
- 5. The voice processing apparatus of claim 4, wherein the processor: generates, using the voice signal associated with the speakers' voices, first sound-source location information indicating the sound-source location of the voice of a first speaker among the speakers; generates, using the voice signal and the first sound-source location information, a first separated voice signal associated with the first speaker's voice; determines, with reference to the location-language information stored in the memory, the language of the first speaker's voice corresponding to the first sound-source location information; determines, with reference to the location-language information stored in the memory, the languages of the voices of the speakers other than the first speaker; and generates, using the first separated voice signal, a translation result in which the language of the first speaker's voice is translated into the languages of the remaining speakers' voices.
- 6. The voice processing apparatus of claim 2, wherein the processor generates, using the separated voice signals, original meeting minutes containing the utterances of each speaker expressed in the current language of the speakers' voices.
- 7. The voice processing apparatus of claim 1, wherein the processor generates the translated meeting minutes, converts the translation result into text, and records the text data in the translated meeting minutes.
- 8. A voice processing method using a voice processing apparatus configured to generate translation results for speakers' voices, the method comprising: storing location-language information indicating a language corresponding to a sound-source location of the speakers' voices; generating, using a microphone, a voice signal associated with the speakers' voices; generating, using the voice signal and the location-language information, a translation result in which the language of each speaker's voice is translated; and generating, using the translation result, translated meeting minutes containing the utterances of each speaker expressed in another language.
- 9. The method of claim 8, wherein generating the translation result comprises: determining the sound-source locations of the speakers' voices using the generated voice signal; generating sound-source location information indicating the determined locations; generating, from the voice signal, a separated voice signal associated with the voice uttered at each sound-source location; determining the current language of the speakers' voices using the stored location-language information; and generating, using the separated voice signals and the determined current languages, a translation result in which the current language of the speakers' voices is translated into another language.
- 10. The method of claim 9, wherein the microphone comprises a plurality of microphones arranged in an array, and determining the sound-source locations of the speakers' voices comprises determining the sound-source locations based on a time delay between a plurality of voice signals generated by the plurality of microphones.
- 11. The method of claim 9, wherein generating the translation result further comprises: determining, using the stored location-language information, the other language into which the current language of each speaker's voice is to be translated; and generating, according to the determined current language and other language, a translation result in which the current language of the speakers' voices is translated into the other language.
- 12. The method of claim 11, wherein generating the translation result further comprises: generating, using the voice signal associated with the speakers' voices, first sound-source location information indicating the sound-source location of the voice of a first speaker among the speakers; generating, using the voice signal and the first sound-source location information, a first separated voice signal associated with the first speaker's voice; determining, with reference to the stored location-language information, the language of the first speaker's voice corresponding to the first sound-source location information; determining, with reference to the stored location-language information, the languages of the voices of the speakers other than the first speaker; and generating, using the first separated voice signal, a translation result in which the language of the first speaker's voice is translated into the languages of the remaining speakers' voices.
- 13. The method of claim 9, further comprising generating, using the separated voice signals, original meeting minutes containing the utterances of each speaker expressed in the current language of the speakers' voices.
- 14. The method of claim 8, further comprising converting the translation result into text and recording the text data in the translated meeting minutes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202280062878.0A | 2021-07-19 | 2022-07-14 | Apparatus and method for processing speakers' voices (用于处理说话者的语音的设备和方法)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
KR10-2021-0094265 | 2021-07-19 | |
KR1020210094265A | 2021-07-19 | | Apparatus and method for processing speakers' voices
Publications (1)
Publication Number | Publication Date
---|---
WO2023003271A1 | 2023-01-26
Family
ID=84979437
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
PCT/KR2022/010276 (WO2023003271A1) | | 2021-07-19 | 2022-07-14
Country Status (3)
Country | Link
---|---
KR | KR20230013473A (ko)
CN | CN117980989A (zh)
WO | WO2023003271A1 (ko)
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2015060095A (ja) * | 2013-09-19 | 2015-03-30 | 株式会社東芝 | 音声翻訳装置、音声翻訳方法およびプログラム |
JP2017084354A (ja) * | 2015-10-23 | 2017-05-18 | パナソニックIpマネジメント株式会社 | 翻訳装置及び翻訳システム |
JP2018081239A (ja) * | 2016-11-17 | 2018-05-24 | 富士通株式会社 | 音声処理方法、音声処理装置、及び音声処理プログラム |
KR20180131155A (ko) * | 2017-05-31 | 2018-12-10 | 네이버랩스 주식회사 | 번역 장치, 번역 방법 및 번역 컴퓨터 프로그램 |
KR20190026521A (ko) * | 2017-09-05 | 2019-03-13 | 엘지전자 주식회사 | 인공지능 홈 어플라이언스 및 음성 인식 서버 시스템의 동작 방법 |
- 2021-07-19: KR application KR1020210094265A filed, published as KR20230013473A (status unknown)
- 2022-07-14: CN application CN202280062878.0A filed, published as CN117980989A (active, pending)
- 2022-07-14: PCT application PCT/KR2022/010276 filed, published as WO2023003271A1 (active, application filing)
Also Published As
Publication Number | Publication Date
---|---
KR20230013473A (ko) | 2023-01-26
CN117980989A (zh) | 2024-05-03
Similar Documents
Publication | Publication Date | Title
---|---|---
WO2014107076A1 | | Display apparatus and method of controlling a display apparatus in a voice recognition system
WO2018008885A1 | | Image processing device, driving method of image processing device, and computer-readable recording medium
WO2020189955A1 | | Method for location inference of IoT device, server, and electronic device supporting the same
WO2011074771A2 | | Foreign language learning device and method of providing the same
WO2014010982A1 | | Method for correcting voice recognition error and broadcast receiving apparatus applying the same
WO2019117362A1 | | System for synchronizing accompaniment and singing voice in an online karaoke service, and apparatus for performing the same
WO2016080713A1 | | Voice-controlled image display device and voice control method for an image display device
EP3304548A1 | | Electronic device and method of audio processing thereof
WO2021251539A1 | | Method for implementing interactive messages using an artificial neural network, and device therefor
WO2022203152A1 | | Speech synthesis method and apparatus based on a multi-speaker training dataset
WO2023003271A1 | | Apparatus and method for processing speakers' voices
WO2019004762A1 | | Method and apparatus for providing an interpretation function using an earset
WO2021054671A1 | | Electronic apparatus and method for controlling voice recognition thereof
WO2018074658A1 | | Terminal and method for implementing hybrid subtitle effects
WO2022092790A1 | | Mobile terminal capable of processing voice and operating method thereof
WO2022065934A1 | | Voice processing device and operating method thereof
WO2022250387A1 | | Voice processing apparatus for processing voices, voice processing system, and voice processing method
WO2022065891A1 | | Voice processing device and operating method thereof
WO2022065537A1 | | Video playback device providing subtitle synchronization and operating method thereof
WO2021080362A1 | | Language processing system using an earset
WO2021091063A1 | | Electronic device and control method thereof
WO2019103200A1 | | Method and apparatus for providing an integrated voice assistant service
WO2020138943A1 | | Apparatus and method for recognizing voice
KR20220042009A | | Voice processing device capable of communicating with a vehicle and operating method thereof
WO2020055027A1 | | Language learning device
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22846141; Country of ref document: EP; Kind code of ref document: A1
| WWE | Wipo information: entry into national phase | Ref document number: 18580554; Country of ref document: US
| ENP | Entry into the national phase | Ref document number: 2024503740; Country of ref document: JP; Kind code of ref document: A
| NENP | Non-entry into the national phase | Ref country code: DE
| WWE | Wipo information: entry into national phase | Ref document number: 202280062878.0; Country of ref document: CN