WO2021102647A1 - Data processing method, device and storage medium
- Publication number: WO2021102647A1
- Application number: PCT/CN2019/120706
- Authority: WO (WIPO (PCT))
- Prior art keywords: target, user, tone color, data, timbre
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
Definitions
- This application relates to terminal technology, in particular to a data processing method, device and storage medium.
- the embodiments of the present application provide a data processing method, device, and storage medium.
- the embodiment of the present application provides a data processing method, including: obtaining data to be processed; translating the first voice data in the data to be processed to obtain a translated text, and performing image recognition on the first image data in the data to be processed to obtain a recognition result; determining a target tone color template by using the translated text and/or the recognition result; selecting the target tone color template from a tone color template database, and using the selected target tone color template to convert the translated text into audio data matching the target timbre; and outputting the audio data.
- the embodiment of the present application also provides a data processing device, including:
- the obtaining unit is configured to obtain the data to be processed
- the first processing unit is configured to translate the first voice data in the to-be-processed data to obtain a translated text; perform image recognition on the first image data in the to-be-processed data to obtain a recognition result;
- the second processing unit is configured to determine the target tone color template by using the translated text and/or the recognition result; select the target tone color template from the tone color template database; and use the selected target tone color template to convert the translated text into audio data matching the target timbre;
- the output unit is configured to output the audio data.
- the embodiment of the present application further provides a data processing device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any one of the above-mentioned methods when executing the program.
- the embodiment of the present application also provides a storage medium on which computer instructions are stored, and when the instructions are executed by a processor, the steps of any one of the foregoing methods are implemented.
- the data processing method, device, and storage medium provided by the embodiments of the application obtain the data to be processed; translate the first voice data in the data to be processed to obtain a translated text; perform image recognition on the first image data in the data to be processed to obtain a recognition result; determine a target timbre template by using the translated text and/or the recognition result; select the target timbre template from the timbre template database; use the selected target timbre template to convert the translated text into audio data matching the target timbre; and output the audio data. In this way, the target timbre is used to play the conversation content of the second user, who uses the second terminal, to the first user, who uses the first terminal, prompting the first user to take a strong interest in the conversation content of the second user, realizing voice-changing communication between the second user and the first user, and bringing the first user an interactive experience with a stronger sense of immersion.
- FIG. 1 is a schematic diagram of the implementation process of a data processing method according to an embodiment of the application;
- FIG. 2 is a schematic diagram of the implementation process of the first terminal determining a target tone color template based on the translated text according to an embodiment of the application;
- FIG. 3 is a schematic diagram of the implementation process of the first terminal determining a target tone color template based on the recognition result of the first image data according to an embodiment of the application;
- FIG. 4 is a schematic diagram of the implementation process of the first terminal determining a target tone color template based on the translated text and the recognition result of the first image data according to an embodiment of the application;
- FIG. 5 is a first schematic diagram of the implementation process of the first terminal generating audio data matching the target timbre according to an embodiment of the application;
- FIG. 6 is a second schematic diagram of the implementation process of the first terminal generating audio data matching the target timbre according to an embodiment of the application;
- FIG. 7 is a third schematic diagram of the implementation process of the first terminal generating audio data matching the target timbre according to an embodiment of the application;
- FIG. 8 is a fourth schematic diagram of the implementation process of the first terminal generating audio data matching the target timbre according to an embodiment of the application;
- FIG. 9 is a fifth schematic diagram of the implementation process of the first terminal generating audio data matching the target timbre according to an embodiment of the application;
- FIG. 10a is a schematic diagram of an implementation process of the first terminal playing the conversation content of the second user through the target timbre according to an embodiment of the application;
- FIG. 10b is a schematic diagram of another implementation process of the first terminal playing the conversation content of the second user through the target timbre according to an embodiment of the application;
- FIG. 11 is a schematic diagram of interactive communication between the first user and the second user according to an embodiment of the application;
- FIG. 12 is a first schematic diagram of the structure of the data processing device according to an embodiment of the application;
- FIG. 13 is a second schematic diagram of the structure of the data processing device according to an embodiment of the application.
- performance culture has become more and more popular, showing a trend of globalization and gradually entering public life in specific forms, such as performances by two-dimensional characters.
- in the related art, the audience can only see the body shape of the performer, but cannot hear the performer communicate with the audience in a personalized timbre, which fails to bring the audience an immersive interactive experience and leads to a poor performance effect.
- for example, performers can wear doll-shaped costumes, such as Mickey Mouse and Donald Duck, for role performances and interact with the audience watching the performance.
- in the related art, a client can collect the performer's audio and send it to a server; the server recognizes the audio data to obtain recognized text, translates the recognized text to obtain a translation result, and sends the translation result back to the client, which broadcasts the voice through a headset device to achieve interaction between the performer and the audience watching the performance. However, the performer cannot use personalized timbres, such as those of Mickey Mouse, Donald Duck, or a celebrity such as Andy Lau, to achieve voice-changing communication with the audience, and thus fails to bring the audience an interactive experience with a stronger sense of immersion, resulting in a poor performance effect.
- moreover, performers are unable to achieve cross-language communication in a personalized timbre with audiences whose mother tongues differ.
- in the embodiments of the application, the first terminal obtains the data to be processed; translates the first voice data in the data to be processed to obtain a translated text; performs image recognition on the first image data in the data to be processed to obtain a recognition result; determines a target tone color template by using the translated text and/or the recognition result; selects the target tone color template from the tone color template database; converts the translated text into audio data matching the target timbre by using the selected target tone color template (which may include online conversion and offline conversion); and outputs the audio data. In this way, the target timbre is used to play the conversation content of the second user, who uses the second terminal, to the first user, who uses the first terminal, prompting the first user to take a strong interest in the conversation content of the second user and realizing voice-changing communication between the second user and the first user.
- FIG. 1 is a schematic diagram of the implementation flow of the data processing method according to the embodiment of the application; as shown in FIG. 1, the method includes:
- Step 101 The first terminal obtains data to be processed.
- the data to be processed includes: first voice data and first image data.
- the first voice data includes voice data generated by the second user when the first user using the first terminal interacts with the second user using the second terminal.
- the first image data includes image data of the clothing worn by the second user when the second user interacts with the first user.
- the specific types of the first terminal and the second terminal are not limited in this application; for example, they may be smart phones, personal computers, notebook computers, tablet computers, and portable wearable devices.
- the following describes how the first terminal obtains the to-be-processed data.
- the second terminal may be provided with or connected to a voice collection module, such as a microphone, through which the voice of the second user is collected to obtain the first voice data;
- the second terminal establishes communication with the first terminal, and transmits the collected first voice data to the first terminal through a wireless transmission module.
- the wireless transmission module may be a Bluetooth module, a wireless fidelity (WiFi, Wireless Fidelity) module, or the like.
- for example, when the second user interacts with the first user through role-playing and initiates a dialogue about currently popular music and songs, the second terminal uses the voice collection module to collect the voice of the second user to obtain the first voice data; the second terminal establishes communication with the first terminal and sends the first voice data to the first terminal through the wireless transmission module.
- in another example, the second terminal uses the voice collection module to collect the voice of the second user to obtain the first voice data; the second terminal establishes communication with the first terminal and sends the first voice data to the first terminal.
- the second terminal may be provided with or connected to an image acquisition module, such as a camera, through which image acquisition is performed on the clothing worn by the second user to obtain the first image data; the second terminal establishes communication with the first terminal and transmits the collected first image data to the first terminal through the wireless transmission module.
- for example, the second terminal uses the image acquisition module to collect images of the clothing worn by the second user to obtain the first image data; the second terminal establishes communication with the first terminal and sends the first image data to the first terminal through the wireless transmission module.
- in this way, the second terminal sends the first voice data of the second user to the first terminal, and the first terminal can subsequently translate the first voice data, helping the first user understand the conversation content of the second user in a language familiar to him or her, thereby promoting smoother communication between the first user and the second user.
- in addition, the first terminal can subsequently determine a target tone color template according to the conversation content of the second user and play the conversation content of the second user to the first user through the target timbre, helping the first user deeply understand the conversation content of the second user.
- the second terminal also sends the first image data of the clothing worn by the second user to the first terminal, and the first terminal may subsequently determine the target tone color template according to the clothing worn by the second user and play the conversation content of the second user to the first user through the target timbre, so as to arouse the first user's interest in the conversation content of the second user.
- Step 102 The first terminal translates the first voice data in the to-be-processed data to obtain a translated text; performs image recognition on the first image data in the to-be-processed data to obtain a recognition result; and determines the target tone color template by using the translated text and/or the recognition result.
- the translating the first voice data in the to-be-processed data to obtain the translated text includes: performing voice recognition on the first voice data by using a voice recognition technology to obtain recognized text; and translating the recognized text by using a translation model. The translation model is used to translate text in a first language into at least one text in a second language; the first language is different from the second language.
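To make the recognize-then-translate step concrete, here is a minimal sketch. It is illustrative only: the ASR stage is a hypothetical stub to be replaced by a real engine, and the MarianMT checkpoint is one publicly available zh-to-en model chosen just for the example, not something the application specifies.

```python
# Sketch of the recognize-then-translate pipeline (not the patented implementation).
from transformers import pipeline

def recognize_speech(first_voice_data: bytes) -> str:
    """Hypothetical ASR hook: any engine mapping raw audio to text fits here."""
    raise NotImplementedError("plug in an ASR engine of your choice")

# Translation model: maps text in the first language (Chinese) to the second (English).
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-zh-en")

def translate_first_voice_data(first_voice_data: bytes) -> str:
    recognized_text = recognize_speech(first_voice_data)
    return translator(recognized_text)[0]["translation_text"]
```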
- the performing image recognition on the first image data in the to-be-processed data to obtain the recognition result includes: performing image preprocessing on the first image data to obtain preprocessed first image data; performing feature extraction on the preprocessed first image data; and performing image recognition on the extracted feature data to obtain the recognition result.
- the image preprocessing includes performing data enhancement and normalization on the first image data.
- the following describes how the first terminal determines the target tone color template.
- the determination of the target tone color template by the first terminal may specifically include the following situations:
- the first terminal determines a target tone color template based on the translated text corresponding to the first voice data
- the first terminal determines the target tone color template based on the recognition result corresponding to the first image data
- the first terminal determines a target tone color template based on the translated text and the recognition result in combination with the selection of the first user.
- the second terminal sends the first voice data of the second user to the first terminal.
- the first terminal can determine the target tone color template according to the conversation content of the second user.
- the using the translated text to determine the target tone color template includes: searching the translated text for a first text corresponding to a preset character string; and when the first text corresponding to the preset character string is found in the translated text, determining the target tone color template based on the first text.
- for example, suppose the recognized text corresponding to a dialogue initiated by the second user about the Huawei incident is "Ren Zhengfei is a great entrepreneur", and the first text corresponding to the preset character string is "Ren Zhengfei". Since the first text "Ren Zhengfei" can be found in the recognized text "Ren Zhengfei is a great entrepreneur", Ren Zhengfei's timbre template is determined as the target tone color template.
- referring to FIG. 2, the implementation process of the first terminal determining the target tone color template based on the translated text includes:
- Step 1 The first terminal searches for the first text corresponding to the preset character string from the translated text;
- Step 2 When the first text corresponding to the preset character string is searched from the translated text, the target tone color template is determined based on the first text.
- in this way, a character mentioned in the conversation content of the second user is used to determine the target tone color template, which can arouse the first user's strong interest in the conversation content initiated by the second user; subsequently, the character's timbre can be used to play the conversation content of the second user to the first user, achieving voice-changing communication. A minimal lookup sketch follows.
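The keyword table and template identifiers below are hypothetical examples, not values taken from the patent; they only illustrate the preset-string lookup.

```python
# Sketch of determining the target timbre template from the translated text.
DEFAULT_TEMPLATE = "default"

# Preset character strings (the "first text") mapped to timbre template ids.
PRESET_STRINGS = {
    "Ren Zhengfei": "ren_zhengfei_timbre",
    "Mickey Mouse": "mickey_mouse_timbre",
}

def determine_template_from_text(translated_text: str) -> str:
    """Search the translated text for any preset string; fall back to the default."""
    for first_text, template_id in PRESET_STRINGS.items():
        if first_text in translated_text:
            return template_id
    return DEFAULT_TEMPLATE

# Example: determine_template_from_text("Ren Zhengfei is a great entrepreneur")
# returns "ren_zhengfei_timbre".
```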
- the second terminal sends the first image data of the clothing worn by the second user to the first terminal, so that the first terminal can determine the target according to the clothing worn by the second user Tone color template.
- the using the recognition result to determine the target tone color template includes: judging whether the recognition result indicates that the first image corresponding to the first image data matches a preset image; and when the recognition result indicates that the first image corresponding to the first image data matches the preset image, determining the target tone color template based on the first image.
- for example, when the second user interacts with the first user through role-playing while wearing a costume in the shape of Mickey Mouse, and the clothing corresponding to the preset image is the Mickey Mouse costume, the first image corresponding to the first image data matches the preset image, so the Mickey Mouse timbre is determined as the target tone color template.
- referring to FIG. 3, the implementation process of the first terminal determining the target tone color template based on the recognition result of the first image data includes:
- Step 1 The first terminal judges whether the recognition result of the first image data indicates that the first image corresponding to the first image data matches a preset image.
- for example, the clothing corresponding to the preset images includes a suit, Mickey Mouse clothing, Donald Duck clothing, and so on.
- suppose the clothing worn by the second user is a suit, that is, the clothing corresponding to the first image is a suit.
- Step 2 When the recognition result indicates that the first image corresponding to the first image data matches the preset image, determine the target tone color template based on the first image.
- Step 3 When the recognition result indicates that the first image corresponding to the first image data does not match the preset image, the timbre template set as the default is used as the target timbre template.
- in this way, the clothing worn by the second user can be used to determine the target tone color template, which can arouse the first user's strong interest in the conversation content initiated by the second user; subsequently, the timbre corresponding to the clothing worn by the second user can be used to play the conversation content of the second user to the first user, achieving the effect of unity between appearance and voice. A minimal sketch of this branch follows.
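The sketch below assumes a clothing classifier has already reduced the first image to a label; the labels and template identifiers are hypothetical illustrations of the preset-image match.

```python
DEFAULT_TEMPLATE = "default"

# Preset images (as classifier labels) mapped to timbre template ids.
PRESET_IMAGES = {
    "mickey_mouse_costume": "mickey_mouse_timbre",
    "donald_duck_costume": "donald_duck_timbre",
    "suit": "suit_timbre",
}

def determine_template_from_image(recognition_result: str) -> str:
    """If the first image matches a preset image, use its template; else default."""
    return PRESET_IMAGES.get(recognition_result, DEFAULT_TEMPLATE)

# Example: a second user wearing a Mickey Mouse costume yields
# determine_template_from_image("mickey_mouse_costume") == "mickey_mouse_timbre".
```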
- the second terminal sends the first voice data and first image data of the second user to the first terminal.
- the first terminal can determine a target tone color template based on the clothing worn by the second user and the conversation content of the second user, in combination with the selection of the first user.
- referring to FIG. 4, the implementation process of the first terminal determining the target tone color template based on the translated text and the recognition result of the first image data includes:
- Step 1 The first terminal judges whether the translated text contains the first text corresponding to the preset character string, and judges whether the recognition result of the first image data indicates that the first image corresponding to the first image data matches the preset image.
- Step 2 When the first text corresponding to the preset character string is found in the translated text and the recognition result indicates that the first image corresponding to the first image data matches the preset image, prompt information is displayed.
- the prompt information is used to prompt the user to select a desired tone color template from the tone color template corresponding to the first text and the tone color template corresponding to the first image.
- a list of tone color templates may also be displayed on the display interface, where the list of tone color templates is used by the user to select the desired tone color template.
- Step 3 Receive a first operation for the prompt information; in response to the first operation, use the timbre template selected by the user as the target timbre template.
- the target timbre template may be determined according to the timbre template selected by the user.
- in this way, the first user's strong interest in the conversation content initiated by the second user can be aroused, and the timbre selected by the first user can subsequently be used to play the conversation content of the second user to the first user, improving the first user's satisfaction.
- otherwise, the tone color template set as the default is used as the target tone color template.
- Step 103 The first terminal selects the target tone color template from the tone color template database, and uses the selected target tone color template to convert the translated text into audio data matching the target tone color.
- the first terminal before the first terminal selects the target tone color template from the tone color template database, it also needs to establish a tone color template database.
- specifically, the method further includes: collecting at least two pieces of voice data and using them as training data; inputting the training data at the input layer of a convolutional neural network, and performing input-to-output mapping on the training data in at least one feature extraction layer of the convolutional neural network to obtain at least two pieces of feature data; and generating a tone color template database based on the obtained feature data.
- a piece of voice data may refer to a user's voice data collected after authorization by the user.
- in this way, the convolutional neural network can be used to quickly collect the timbres of different users, and by cloning a user's timbre, personalized timbre templates of multiple classic characters, such as the Mickey Mouse character, and of celebrities can be obtained. A sketch of such a timbre-embedding database follows.
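The patent only specifies an input layer plus at least one feature-extraction layer; the PyTorch sketch below fills in one plausible shape for that idea, producing a fixed-size timbre embedding per voice sample and storing it in a template database. The architecture and dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TimbreEncoder(nn.Module):
    """Toy CNN: maps a mel spectrogram to a fixed-size timbre embedding."""
    def __init__(self, n_mels: int = 80, embed_dim: int = 128):
        super().__init__()
        # Feature-extraction layers performing the input-to-output mapping.
        self.features = nn.Sequential(
            nn.Conv1d(n_mels, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.proj = nn.Linear(256, embed_dim)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, n_mels, frames) -> (batch, embed_dim)
        h = self.features(mel).mean(dim=2)  # average pooling over time
        return self.proj(h)

encoder = TimbreEncoder()
timbre_template_db: dict[str, torch.Tensor] = {}  # template id -> embedding

def register_template(template_id: str, mel: torch.Tensor) -> None:
    """Add one user's cloned timbre embedding to the template database."""
    with torch.no_grad():
        timbre_template_db[template_id] = encoder(mel.unsqueeze(0)).squeeze(0)
```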
- the using the selected target timbre template to convert the translated text into audio data matching the target timbre includes: obtaining the target language of the receiver of the audio data; judging whether the language corresponding to the translated text and the target language belong to the same language; and when it is determined that the language corresponding to the translated text and the target language belong to the same language, using the selected target timbre template to convert the translated text into audio data matching the target timbre.
- the target language of the receiver can be determined according to the audio of the receiver of the audio data, or according to text information input by the receiver of the audio data.
- the method further includes: when the language corresponding to the translated text and the target language belong to different languages, converting the translated text into text matching the target language; and using the selected target timbre template to convert the converted text into audio data matching the target timbre. A small sketch of this language gate follows.
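A minimal sketch of the language check, assuming the langdetect package as the detector; the conversion hook for the mismatched-language branch is hypothetical.

```python
from langdetect import detect  # returns ISO 639-1 codes such as "en"

def route_text_for_tts(translated_text: str, target_language: str) -> str:
    """Return text in the receiver's target language before TTS conversion."""
    if detect(translated_text) == target_language:
        return translated_text  # same language: convert the translated text directly
    return convert_to_language(translated_text, target_language)  # re-translate first

def convert_to_language(text: str, target_language: str) -> str:
    """Hypothetical hook for converting text into the target language."""
    raise NotImplementedError("plug in a text translation engine")
```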
- the following describes how the first terminal uses the target tone color template to generate audio data.
- the first terminal uses the selected target tone color template to generate audio data in combination with the translated text, which may specifically include the following situations:
- the target timbre template is used to perform text-to-speech (TTS, Text To Speech) conversion on the translated text to obtain audio data.
- the target timbre template is used in combination with the intonation of the second user to perform TTS conversion on the translated text to obtain audio data.
- the target timbre template is used in combination with the intonation and emotion of the second user to perform TTS conversion on the translated text to obtain audio data.
- the target timbre template is used to perform TTS conversion on the translated text in combination with the intonation, emotion, and speech rate of the second user to obtain audio data.
- multiple target tone color templates are used to perform TTS conversion on the translated text to obtain audio data.
- the using the selected target timbre template to convert the translated text into audio data matching the target timbre includes: the first terminal uses the selected target timbre template to perform text-to-speech (TTS) conversion on the translated text to generate audio data matching the target timbre.
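As a stand-in for the timbre-template-driven synthesizer, the sketch below uses the offline pyttsx3 engine: selecting one of the system-installed voices loosely plays the role of selecting a target timbre template. This is an illustrative approximation, not the conversion engine described in the application.

```python
import pyttsx3

def tts_to_file(translated_text: str, out_path: str, voice_hint: str = "") -> None:
    """Render the translated text to an audio file with an optionally chosen voice."""
    engine = pyttsx3.init()
    if voice_hint:
        for voice in engine.getProperty("voices"):
            if voice_hint.lower() in voice.name.lower():
                engine.setProperty("voice", voice.id)  # crude "timbre" selection
                break
    engine.save_to_file(translated_text, out_path)
    engine.runAndWait()  # blocks until the file has been written

tts_to_file("Ren Zhengfei is a great entrepreneur", "reply.wav")
```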
- taking one target timbre as an example, FIG. 5 describes the implementation process of the first terminal generating audio data matching the target timbre, including:
- Step 1 The first terminal determines the target tone color template.
- for example, the recognized text corresponding to the dialogue initiated by the second user to the first user about the Huawei incident is "Ren Zhengfei is a great entrepreneur", and the first terminal uses Ren Zhengfei's timbre template as the target timbre template based on the translated text corresponding to the recognized text.
- Step 2 The first terminal uses the target tone color template to perform TTS conversion on the translated text to generate audio data matching the target tone color.
- in this way, the first terminal broadcasts the conversation content of the second user, "Ren Zhengfei is a great entrepreneur", to the first user through Ren Zhengfei's timbre, thereby arousing the first user's strong interest in the second user's conversation content.
- in some embodiments, when generating audio data matching the target timbre, the method further includes: the first terminal performs feature extraction on the first voice data to obtain an intonation feature; and uses the selected target timbre template, in combination with the intonation feature, to perform TTS conversion on the translated text to generate audio data matching the target timbre.
- the intonation feature may characterize the pitch variation of the second user's voice.
- the performing intonation feature extraction on the first speech data to obtain the intonation feature includes: extracting the fundamental frequency values of the voiced segments from the first speech data using an autocorrelation method; interpolating over the silent and unvoiced segments in the speech data to obtain a complete fundamental frequency curve; fitting the fundamental frequency curve to obtain a continuous smooth pitch curve; and taking the logarithm of the obtained curve and filtering it to obtain the intonation feature.
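A rough numerical sketch of that pipeline is given below: frame-wise autocorrelation pitch on voiced frames, interpolation across silent/unvoiced gaps, smoothing of the curve, and a final log transform. The energy threshold, frame sizes, and pitch search range are illustrative assumptions.

```python
import numpy as np
from scipy.signal import savgol_filter

def intonation_feature(x: np.ndarray, sr: int = 16000,
                       frame: int = 400, hop: int = 160) -> np.ndarray:
    """Log fundamental-frequency curve as a simple intonation feature."""
    f0 = []
    for start in range(0, len(x) - frame, hop):
        w = x[start:start + frame]
        if np.sqrt(np.mean(w ** 2)) < 0.01:        # crude silence/unvoiced gate
            f0.append(np.nan)
            continue
        ac = np.correlate(w, w, mode="full")[frame - 1:]  # autocorrelation, lag >= 0
        lag_min, lag_max = sr // 400, sr // 60      # search roughly 60-400 Hz
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
        f0.append(sr / lag)
    f0 = np.asarray(f0)
    idx = np.arange(len(f0))
    voiced = ~np.isnan(f0)
    f0 = np.interp(idx, idx[voiced], f0[voiced])    # fill unvoiced/silent gaps
    f0 = savgol_filter(f0, window_length=9, polyorder=2)  # smooth the curve
    return np.log(f0)                               # log-domain pitch curve
```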
- in this way, the first terminal may play the conversation content of the second user to the first user through the target timbre determined based on the clothing worn by the second user, combined with the second user's intonation; or through the target timbre determined based on the conversation content of the second user, combined with the second user's intonation. In this way, the first user will not only take great interest in the conversation content of the second user but also feel a sense of intimacy with the second user.
- referring to FIG. 6, another implementation process of the first terminal generating audio data matching the target timbre includes:
- Step 1 The first terminal determines the target tone color template.
- the first terminal uses Ren Zhengfei's tone color template as a target tone color template based on the translated text corresponding to the recognized text.
- Step 2 The first terminal performs feature extraction on the first voice data of the second user to obtain intonation features.
- Step 3 The first terminal uses the target tone color template to perform TTS conversion on the translated text in combination with the intonation feature to generate audio data matching the target tone color.
- in this way, the first terminal broadcasts the conversation content of the second user, "Ren Zhengfei is a great entrepreneur", to the first user through Ren Zhengfei's timbre, combined with the intonation of the second user.
- in some embodiments, when generating audio data matching the target timbre, the method further includes: the first terminal performs feature extraction on the first voice data to obtain an emotional feature; and uses the selected target timbre template, in combination with the emotional feature and the intonation feature, to generate audio data matching the target timbre.
- the emotional feature may represent the emotions generated by the second user during the conversation, such as anger, fear, sadness, and the like.
- performing emotional feature extraction on the first voice data to obtain emotional features may include: extracting formant features from the voice data; and identifying the user's emotional features based on the extracted formant features.
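One conventional way to realize the formant step is linear predictive coding; the sketch below estimates rough formant frequencies with librosa's LPC and returns the first two as a crude feature vector. The LPC order is an assumption, and the mapping from formants to emotion labels would live in a separate classifier that is not shown.

```python
import numpy as np
import librosa

def formant_features(x: np.ndarray, sr: int = 16000, order: int = 12) -> np.ndarray:
    """Rough F1/F2 estimates from the LPC polynomial roots."""
    a = librosa.lpc(x, order=order)        # LPC coefficients of the voice frame
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]      # keep one root per conjugate pair
    freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
    return freqs[:2]                       # first two formants as the feature

# A downstream classifier (not shown) would map such formant features to
# emotion categories like anger, fear, or sadness.
```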
- in this way, the first terminal may play the conversation content of the second user to the first user through the target timbre determined based on the clothing worn by the second user, combined with the intonation and emotion of the second user; or through the target timbre determined based on the conversation content of the second user, combined with the intonation and emotion of the second user. In this way, the first user will not only take great interest in the conversation content of the second user but also become curious about the second user himself.
- referring to FIG. 7, another implementation process of the first terminal generating audio data matching the target timbre includes:
- Step 1 The first terminal determines the target tone color template.
- the first terminal uses Ren Zhengfei's tone color template as a target tone color template based on the translated text corresponding to the recognized text.
- Step 2 The first terminal performs feature extraction on the first voice data of the second user to obtain intonation and emotional features.
- Step 3 The first terminal uses the target tone color template to perform TTS conversion on the translated text in combination with the intonation and emotional characteristics to generate audio data matching the target tone color.
- in this way, the first terminal broadcasts the conversation content of the second user, "Ren Zhengfei is a great entrepreneur", to the first user through Ren Zhengfei's timbre, combined with the intonation and emotion of the second user.
- in some embodiments, when generating audio data matching the target timbre, the method further includes: the first terminal performs feature extraction on the first voice data to obtain a speech rate feature; and uses the selected target timbre template, in combination with the speech rate feature, the emotional feature, and the intonation feature, to generate audio data matching the target timbre.
- the speaking rate feature represents the amount of vocabulary spoken by the second user in a unit time.
- the process of the first terminal performing feature extraction on the first voice data of the second user to obtain the speech rate feature includes: counting the vocabulary amount per unit time based on the first voice data; and obtaining the speech rate feature based on the counted vocabulary amount. A minimal sketch follows.
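The speech-rate feature reduces to words per unit time; a minimal sketch, assuming the recognized text and the audio duration are already available:

```python
def speech_rate_feature(recognized_text: str, duration_seconds: float) -> float:
    """Vocabulary amount per unit time (words per second)."""
    word_count = len(recognized_text.split())
    return word_count / max(duration_seconds, 1e-6)  # guard against zero duration

# Example: 9 recognized words over 3.0 seconds -> 3.0 words/second.
```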
- in this way, the first terminal may play the conversation content of the second user to the first user through the target timbre determined based on the clothing worn by the second user, combined with the intonation, emotion, and speech rate of the second user; or through the target timbre determined based on the conversation content of the second user, combined with the intonation, emotion, and speech rate of the second user. In this way, the first user will not only take great interest in the conversation content of the second user but also become curious about the second user himself.
- referring to FIG. 8, another implementation process of the first terminal generating audio data matching the target timbre includes:
- Step 1 The first terminal determines the target tone color template.
- the first terminal uses Ren Zhengfei's tone color template as a target tone color template based on the translated text corresponding to the recognized text.
- Step 2 The first terminal performs feature extraction on the first voice data of the second user to obtain features of intonation, emotion, and speech rate.
- Step 3 The first terminal uses the target tone color template to perform TTS conversion on the translated text in combination with the characteristics of intonation, emotion, and speech rate to generate audio data matching the target tone color.
- in this way, the first terminal broadcasts the conversation content of the second user, "Ren Zhengfei is a great entrepreneur", to the first user through Ren Zhengfei's timbre, combined with the intonation, emotion, and speech rate of the second user.
- in some embodiments, the number of target timbre templates determined by the first terminal according to the recognized text of the second user is at least two; when generating audio data matching the target timbre, the method further includes: the first terminal segments the translated text according to the number of target timbre templates to obtain at least two paragraphs; performs TTS conversion on the at least two paragraphs using the at least two target timbre templates to obtain at least two audio segments; and splices the at least two audio segments to obtain the audio data (a sketch of this flow follows the steps below).
- in this way, the first terminal may determine a plurality of target timbres based on the conversation content of the second user and use the plurality of target timbres to play the conversation content of the second user to the first user; the first user will thus be extremely curious about the conversation content of the second user.
- referring to FIG. 9, the implementation process of the first terminal generating audio data matching multiple target timbres includes:
- Step 1 The first terminal determines multiple target tone color templates.
- the first terminal uses the tone color templates of Ren Zhengfei and Andy Lau as the target tone color templates based on the translated text corresponding to the recognized text.
- Step 2 The first terminal translates the first voice data of the second user to obtain a translated text, and segments the translated text to obtain two paragraphs.
- Step 3 The first terminal uses two target tone color templates to perform TTS conversion on the two paragraphs to generate two audio clips;
- Step 4 Splicing the two audio clips to obtain audio data.
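The segmentation-and-splicing flow of FIG. 9 can be sketched as follows; the synthesize() hook is hypothetical, standing in for whatever TTS engine produces PCM samples for one paragraph in one timbre.

```python
import numpy as np

def synthesize(paragraph: str, template_id: str) -> np.ndarray:
    """Hypothetical TTS hook: returns mono PCM samples for one paragraph."""
    raise NotImplementedError("plug in the TTS engine of your choice")

def multi_timbre_audio(translated_text: str, template_ids: list[str]) -> np.ndarray:
    """Segment the text by template count, synthesize each part, splice the audio."""
    words = translated_text.split()
    n = len(template_ids)
    size = max(1, len(words) // n)
    paragraphs = [" ".join(words[i * size:(i + 1) * size if i < n - 1 else None])
                  for i in range(n)]
    clips = [synthesize(p, t) for p, t in zip(paragraphs, template_ids)]
    return np.concatenate(clips)  # splice the audio segments into one stream
```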
- Step 104 Output the audio data.
- the first terminal may play audio matching the target tone color through an audio output module.
- the audio output module may be implemented by a speaker of the first terminal.
- in this way, when the second user interacts with the first user, the first terminal does not use the timbre of the second user to play the conversation content of the second user to the first user; instead, it plays the conversation content of the second user to the first user through a target timbre determined based on the clothing worn by the second user.
- alternatively, when the second user interacts with the first user, the first terminal does not use the timbre of the second user to play the conversation content of the second user to the first user, but plays the conversation content of the second user to the first user through a target timbre determined based on the conversation content of the second user.
- alternatively, when the second user interacts with the first user, the first terminal does not use the timbre of the second user to play the conversation content of the second user to the first user, but plays the conversation content of the second user to the first user through a target timbre selected by the first user.
- referring to FIG. 10a, an implementation process of the first terminal playing the conversation content of the second user through the target timbre includes:
- Step 1 The second terminal sends the first voice data and the first image data of the second user to the first terminal.
- for example, the second terminal uses a microphone to collect the audio of the second user, such as "hello, where are you from?", to obtain the first voice data, and uses a camera to collect the Mickey Mouse costume worn by the second user to obtain the first image data; the first voice data and the first image data are sent to the first terminal through the wireless transmission module.
- Step 2 The first terminal translates the first voice data to obtain a translated text; performs image recognition on the first image data to obtain a recognition result.
- Step 3 The first terminal uses the translated text and/or the recognition result to determine a target tone color template.
- Step 4 The first terminal selects the target tone color template from the tone color template database, and uses the selected target tone color template to convert the translated text into audio data matching the target tone color.
- Step 5 The first terminal outputs the audio data.
- in this way, the first terminal broadcasts the translated text "children, where are you from?" corresponding to the conversation content of the second user to the first user through the Mickey Mouse timbre.
- referring to FIG. 10b, another implementation process of the first terminal playing the conversation content of the second user through the target timbre includes:
- Step 1 The second terminal sends the first voice data and the first image data of the second user to the first terminal.
- for example, the second terminal uses a microphone to collect the audio of the second user, such as "hello, where are you from?", to obtain the first voice data, and uses a camera to collect the Mickey Mouse costume worn by the second user to obtain the first image data; the first voice data and the first image data are sent to the first terminal through the wireless transmission module.
- Step 2 The first terminal translates the first voice data to obtain a translated text; performs image recognition on the first image data to obtain a recognition result.
- Step 3 The first terminal uses the translated text and/or the recognition result to determine a target tone color template.
- Step 4 The first terminal selects the target tone color template from the tone color template database, and sends the translated text and the target tone color template to the server.
- the server uses the target tone color template to convert the translated text into audio data matching the target tone color; and returns the audio data to the first terminal.
- Step 5 The first terminal receives and outputs audio data sent by the server.
- in this way, the first terminal plays the translated text "children, where are you from?" corresponding to the conversation content of the second user to the first user through the Mickey Mouse timbre.
- in this way, the first terminal may convert the translated text into audio data matching the target timbre, or the server may convert the translated text into audio data matching the target timbre; both online conversion and offline conversion are supported, making the implementation more flexible.
- with the data processing method provided by the embodiments of the application, the first terminal receives the data to be processed sent by the second terminal; translates the first voice data in the data to be processed to obtain the translated text; performs image recognition on the first image data in the data to be processed to obtain the recognition result; determines the target timbre template by using the translated text and/or the recognition result; selects the target timbre template from the timbre template database; converts the translated text into audio data matching the target timbre by using the selected target timbre template (which can include online conversion and offline conversion); and outputs the audio data. In this way, the target timbre can be used to play the conversation content of the second user, who uses the second terminal, to the first user, prompting the first user to take a strong interest in the conversation content of the second user and realizing voice-changing communication between the second user and the first user.
- FIG. 12 is a schematic diagram of the composition structure of a data processing device according to an embodiment of the application; as shown in FIG. 12, the data processing device includes:
- the obtaining unit 121 is configured to obtain data to be processed
- the first processing unit 122 is configured to translate the first voice data in the to-be-processed data to obtain a translated text; perform image recognition on the first image data in the to-be-processed data to obtain a recognition result;
- the second processing unit 123 is configured to determine the target tone color template by using the translated text and/or the recognition result; select the target tone color template from the tone color template database; and use the selected target tone color template to convert the translated text into audio data matching the target timbre;
- the output unit 124 is configured to output the audio data.
- the first processing unit 122 is configured to perform voice recognition on the first voice data by using a voice recognition technology to obtain recognized text;
- the translation model is used to translate a text in a first language into at least one text in a second language; the first language is different from the second language.
- the first processing unit 122 is configured to perform image preprocessing on the first image data in the to-be-processed data to obtain preprocessed first image data; perform feature extraction on the preprocessed first image data; and perform image recognition on the extracted feature data to obtain the recognition result.
- the image preprocessing includes: performing data enhancement and normalization on the first image data.
- the second processing unit 123 is configured to search for a first text corresponding to a preset character string from the translated text
- the target tone color template is determined based on the first text.
- the second processing unit 123 is configured to determine whether the recognition result indicates that the first image corresponding to the first image data matches a preset image
- the target tone color template is determined based on the first image.
- the second processing unit 123 is configured to obtain the target language of the receiver of the audio data; judge whether the language corresponding to the translated text and the target language belong to the same language; and when they belong to the same language, use the selected target timbre template to convert the translated text into audio data matching the target timbre.
- the second processing unit 123 is configured to convert the translated text into text matching the target language when the language corresponding to the translated text and the target language belong to different languages; Using the selected target tone color template, the converted text is converted into audio data matching the target tone color.
- the device further includes:
- a generating unit, configured to collect at least two pieces of voice data; use the at least two pieces of voice data as training data; input the training data at the input layer of the convolutional neural network, and perform input-to-output mapping on the training data in at least one feature extraction layer of the convolutional neural network to obtain at least two pieces of feature data; and generate a tone color template database based on the feature data.
- the second processing unit 123 is configured to use the selected target timbre template to perform TTS conversion on the translated text to generate audio data matching the target timbre.
- the second processing unit 123 is further configured to perform feature extraction on the first voice data to obtain an intonation feature, and use the selected target timbre template, in combination with the intonation feature, to perform TTS conversion on the translated text to generate audio data matching the target timbre.
- the second processing unit 123 is further configured to perform feature extraction on the first voice data to obtain an emotional feature, and use the selected target timbre template, in combination with the emotional feature and the intonation feature, to generate audio data matching the target timbre.
- the second processing unit 123 is further configured to perform feature extraction on the first voice data to obtain a speech rate feature, and use the selected target timbre template, in combination with the speech rate feature, the emotional feature, and the intonation feature, to generate audio data matching the target timbre.
- the second processing unit 123 is further configured to segment the translated text according to the number of target timbre templates to obtain at least two paragraphs; perform TTS conversion on the at least two paragraphs using the at least two target timbre templates to obtain at least two audio segments; and splice the at least two audio segments to obtain the audio data.
- in practical applications, the acquisition unit 121 and the output unit 124 can be implemented by a communication interface in the data processing device, and the first processing unit 122 and the second processing unit 123 can be implemented by a processor in the data processing device.
- it should be noted that when the device provided in the above embodiment performs data processing, the division into the above program modules is only used as an example; in practical applications, the above processing can be allocated to different program modules as needed, that is, the internal structure of the terminal can be divided into different program modules to complete all or part of the processing described above.
- the device provided in the foregoing embodiment and the data processing method embodiment belong to the same concept, and the specific implementation process is detailed in the method embodiment, which will not be repeated here.
- FIG. 13 is a schematic diagram of the hardware composition structure of the data processing device of the embodiment of the application. As shown in FIG. 13, the data processing device 130 includes a memory 133, a processor 132, and a computer program stored on the memory 133 and executable on the processor 132; when executing the program, the processor 132 located in the data processing device implements the method provided by one or more of the technical solutions on the data processing device side.
- specifically, when the processor 132 located in the data processing device 130 executes the program, it implements: receiving the to-be-processed data sent by the second terminal, the to-be-processed data being collected by the second terminal; translating the first voice data in the to-be-processed data to obtain the translated text; selecting a target tone color template from the tone color template database; using the selected target tone color template to convert the translated text into audio data matching the target timbre; and outputting the audio data.
- when the processor 132 located in the data processing device 130 executes the program, it also implements: acquiring a target language matching the language of the recipient of the audio data; and when the language of the translated text matches the target language, using the selected target timbre template to convert the translated text into audio data matching the target timbre.
- when the processor 132 located in the data processing device 130 executes the program, it also implements: acquiring a target language matching the language of the recipient of the audio data; when the language of the translated text does not match the target language, converting the translated text into text matching the target language; and using the selected target timbre template to convert the converted text into audio data matching the target timbre.
- when the processor 132 located in the data processing device 130 executes the program, it also implements: collecting at least one piece of voice data, and generating a tone color template database based on the collected voice data.
- when the processor 132 located in the data processing device 130 executes the program, it also implements: using the selected target timbre template to perform text-to-speech (TTS) conversion on the translated text to generate audio data matching the target timbre.
- when the processor 132 located in the data processing device 130 executes the program, it also implements: sending the translated text and the target timbre template to the data processing device, where the translated text and the target timbre template are used by the data processing device to perform TTS conversion on the translated text and generate audio data matching the target timbre.
- when the processor 132 located in the data processing device 130 executes the program, it also implements: synchronously outputting the audio data as the data to be processed is acquired.
- the data processing device further includes a communication interface 131; various components in the data processing device are coupled together through the bus system 134. It can be understood that the bus system 134 is configured to implement connection and communication between these components. In addition to the data bus, the bus system 134 also includes a power bus, a control bus, and a status signal bus.
- the memory 133 in this embodiment may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory.
- the non-volatile memory can be a read-only memory (ROM, Read Only Memory), a programmable read-only memory (PROM, Programmable Read-Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), an electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), a ferromagnetic random access memory (FRAM, Ferromagnetic Random Access Memory), a flash memory, a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory can be a magnetic disk memory or a magnetic tape memory.
- the volatile memory may be a random access memory (RAM, Random Access Memory), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM).
- the memories described in the embodiments of the present application are intended to include, but are not limited to, these and any other suitable types of memories.
- the method disclosed in the foregoing embodiments of the present application may be applied to the processor 132 or implemented by the processor 132.
- the processor 132 may be an integrated circuit chip with signal processing capabilities. In the implementation process, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 132 or instructions in the form of software.
- the aforementioned processor 132 may be a general-purpose processor, a DSP, or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, and the like.
- the processor 132 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
- the general-purpose processor may be a microprocessor or any conventional processor or the like.
- the steps of the method disclosed in the embodiments of the present application may be directly embodied as execution and completion by a hardware decoding processor, or execution and completion by a combination of hardware and software modules in the decoding processor.
- the software module may be located in a storage medium, and the storage medium is located in a memory.
- the processor 132 reads information in the memory and completes the steps of the foregoing method in combination with its hardware.
- the embodiment of the present application also provides a storage medium, which is specifically a computer storage medium, and more specifically, a computer-readable storage medium. Computer instructions, that is, a computer program, are stored thereon; when the computer instructions are executed by a processor, the method provided by one or more of the technical solutions on the data processing device side is implemented.
- the disclosed method and smart device can be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
- the coupling, direct coupling, or communication connections between the components shown or discussed may be indirect coupling or communication connections through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
- the units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units; Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the embodiments of the present application may all be integrated into a second processing unit, or each unit may be individually used as a unit, or two or more units may be integrated into one unit;
- the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
- a person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments can be implemented by a program instructing relevant hardware; the foregoing program can be stored in a computer-readable storage medium, and when the program is executed, the steps of the foregoing method embodiments are performed; the foregoing storage medium includes various media that can store program code, such as a mobile storage device, a ROM, a RAM, a magnetic disk, or an optical disc.
- alternatively, if the aforementioned integrated unit of the present application is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a data processing device, a network device, or the like) to execute all or part of the methods described in the various embodiments of the present application.
- the aforementioned storage media include: removable storage devices, ROM, RAM, magnetic disks, or optical disks and other media that can store program codes.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
Description
Claims (12)
- A data processing method, comprising: obtaining data to be processed; translating first voice data in the data to be processed to obtain a translated text, and performing image recognition on first image data in the data to be processed to obtain a recognition result; determining a target timbre template by using the translated text and/or the recognition result; selecting the target timbre template from a timbre template database, and converting the translated text into audio data matching the target timbre by using the selected target timbre template; and outputting the audio data.
- The method according to claim 1, wherein the converting the translated text into audio data matching the target timbre by using the selected target timbre template comprises: obtaining a target language of a receiver of the audio data; judging whether the language corresponding to the translated text and the target language belong to the same language; and when it is determined that the language corresponding to the translated text and the target language belong to the same language, converting the translated text into audio data matching the target timbre by using the selected target timbre template.
- The method according to claim 2, wherein the method further comprises: when the language corresponding to the translated text and the target language belong to different languages, converting the translated text into text matching the target language; and converting the converted text into audio data matching the target timbre by using the selected target timbre template.
- The method according to any one of claims 1 to 3, wherein the determining a target timbre template by using the translated text comprises: searching the translated text for a first text corresponding to a preset character string; and when the first text corresponding to the preset character string is found in the translated text, determining the target timbre template based on the first text.
- The method according to any one of claims 1 to 3, wherein the determining a target timbre template by using the recognition result comprises: judging whether the recognition result indicates that a first image corresponding to the first image data matches a preset image; and when the recognition result indicates that the first image corresponding to the first image data matches the preset image, determining the target timbre template based on the first image.
- The method according to claim 1, wherein the converting the translated text into audio data matching the target timbre by using the selected target timbre template comprises: performing text-to-speech (TTS) conversion on the translated text by using the selected target timbre template to generate audio data matching the target timbre.
- The method according to claim 6, wherein, when generating the audio data matching the target timbre, the method further comprises: performing feature extraction on the first voice data to obtain an intonation feature; and performing TTS conversion on the translated text by using the selected target timbre template in combination with the intonation feature, to generate audio data matching the target timbre.
- The method according to claim 7, wherein, when generating the audio data matching the target timbre, the method further comprises: performing feature extraction on the first voice data to obtain an emotional feature; and generating audio data matching the target timbre by using the selected target timbre template in combination with the emotional feature and the intonation feature.
- The method according to claim 8, wherein, when generating the audio data matching the target timbre, the method further comprises: performing feature extraction on the first voice data to obtain a speech rate feature; and generating audio data matching the target timbre by using the selected target timbre template in combination with the speech rate feature, the emotional feature, and the intonation feature.
- A data processing device, comprising: an obtaining unit, configured to obtain data to be processed; a first processing unit, configured to translate first voice data in the data to be processed to obtain a translated text, and perform image recognition on first image data in the data to be processed to obtain a recognition result; a second processing unit, configured to determine a target timbre template by using the translated text and/or the recognition result, select the target timbre template from a timbre template database, and convert the translated text into audio data matching the target timbre by using the selected target timbre template; and an output unit, configured to output the audio data.
- A data processing device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 9 when executing the program.
- A storage medium having computer instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 9.
Priority Applications (2)
- PCT/CN2019/120706 (WO2021102647A1), priority date 2019-11-25, filed 2019-11-25: Data processing method, device and storage medium
- CN201980100970.XA (CN114514576A), priority date 2019-11-25, filed 2019-11-25: Data processing method, device and storage medium

Applications Claiming Priority (1)
- PCT/CN2019/120706 (WO2021102647A1), priority date 2019-11-25, filed 2019-11-25: Data processing method, device and storage medium

Publications (1)
- WO2021102647A1, published 2021-06-03

Family ID: 76129013

Family Applications (1)
- PCT/CN2019/120706 (WO2021102647A1), priority date 2019-11-25, filed 2019-11-25: Data processing method, device and storage medium

Country Status (2)
- CN: CN114514576A
- WO: WO2021102647A1
Cited By (1)
- CN114783403A (priority 2022-02-18, published 2022-07-22, 腾讯科技(深圳)有限公司): Audiobook generation method, apparatus, device, storage medium, and program product

Citations (7)
- US20130338997A1 (priority 2007-03-29, published 2013-12-19, Microsoft Corporation): Language translation of visual and audio input
- CN107992485A (priority 2017-11-27, published 2018-05-04, 北京搜狗科技发展有限公司): Simultaneous interpretation method and device
- CN108231062A (priority 2018-01-12, published 2018-06-29, 科大讯飞股份有限公司): Speech translation method and device
- CN109543021A (priority 2018-11-29, published 2019-03-29, 北京光年无限科技有限公司): Story data processing method and system for intelligent robots
- CN109658916A (priority 2018-12-19, published 2019-04-19, 腾讯科技(深圳)有限公司): Speech synthesis method and apparatus, storage medium, and computer device
- CN110401671A (priority 2019-08-06, published 2019-11-01, 董玉霞): Simultaneous interpretation system and simultaneous interpretation terminal
- CN110415680A (priority 2018-09-05, published 2019-11-05, 满金坝(深圳)科技有限公司): Simultaneous interpretation method, simultaneous interpretation apparatus, and electronic device

Application Events
- 2019-11-25: WO application PCT/CN2019/120706 (WO2021102647A1) filed (active, application filing)
- 2019-11-25: CN application 201980100970.XA (CN114514576A) filed (pending)
Also Published As
- CN114514576A, published 2022-05-17
Legal Events
- 121: EP — the EPO has been informed by WIPO that EP was designated in this application (ref document number 19954459; country of ref document: EP; kind code of ref document: A1)
- NENP: Non-entry into the national phase (ref country code: DE)
- 122: EP — PCT application non-entry in European phase (ref document number 19954459; country of ref document: EP; kind code of ref document: A1)
- 32PN: EP — public notification in the EP bulletin as address of the addressee cannot be established (free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21.11.2022))