WO2021119908A1 - Data processing method and apparatus, electronic device, and storage medium - Google Patents

Data processing method and apparatus, electronic device, and storage medium

Info

Publication number
WO2021119908A1
WO2021119908A1 (PCT/CN2019/125606)
Authority
WO
WIPO (PCT)
Prior art keywords
voice data
command
speech
command word
word
Prior art date
Application number
PCT/CN2019/125606
Other languages
English (en)
Chinese (zh)
Inventor
杨林举
Original Assignee
深圳市欢太科技有限公司
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市欢太科技有限公司 and Oppo广东移动通信有限公司
Priority to PCT/CN2019/125606 priority Critical patent/WO2021119908A1/fr
Priority to CN201980100994.5A priority patent/CN114556353A/zh
Publication of WO2021119908A1 publication Critical patent/WO2021119908A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data

Definitions

  • This application relates to simultaneous interpretation technology, and in particular to a data processing method and apparatus, an electronic device, and a storage medium.
  • A simultaneous interpretation system is a voice translation product for conference scenarios that has emerged in recent years; it uses artificial intelligence (AI) technology to provide multilingual text translation and display of a conference speaker's speech content.
  • However, the presentation document used by the conference lecturer must be operated by the lecturer or by auxiliary personnel, which is inconvenient for the lecturer.
  • In view of this, embodiments of the present application provide a data processing method and apparatus, an electronic device, and a storage medium.
  • An embodiment of the application provides a data processing method, including: acquiring voice data, and performing content detection on the voice data;
  • when a command word contained in the voice data is detected, performing a corresponding operation on a presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is used for presentation when the voice data is played;
  • when speech content contained in the voice data is detected, presenting the speech content when the voice data is played.
  • In some embodiments, detecting the command word contained in the voice data includes:
  • querying a command word library according to the voice data, and determining a command word in the voice data that meets a first preset condition.
  • Determining the command word in the voice data that meets the first preset condition includes at least one of the following:
  • determining a command word in the voice data whose pronunciation similarity with a word in the command word library exceeds a preset threshold;
  • determining, in the recognized text corresponding to the voice data, a command word that matches a word in the command word library; the recognized text is obtained by performing text recognition on the voice data.
  • In some embodiments, performing the corresponding operation on the presentation document according to the instruction corresponding to the command word includes:
  • querying an instruction library according to the command word, and determining a target instruction corresponding to the command word; the target instruction represents an operation instruction for the presentation document; the instruction library includes at least one instruction and a command word corresponding to each of the at least one instruction;
  • performing the corresponding operation on the presentation document according to the target instruction.
  • In some embodiments, detecting the speech content contained in the voice data includes:
  • querying the command word library according to the voice data, and determining speech content in the voice data that does not meet a second preset condition.
  • Determining the speech content in the voice data that does not meet the second preset condition includes at least one of the following:
  • determining, as speech content in the voice data, content in which the similarity between the pronunciation of any word and the pronunciation of each word in the command word library is lower than a preset threshold;
  • determining, as speech content in the recognized text corresponding to the voice data, content in which the matching degree between any word and each word in the command word library is lower than a preset matching degree threshold.
  • In some embodiments, presenting the speech content when the voice data is played includes: determining a recognition result corresponding to the speech content, and presenting the recognition result when the voice data is played;
  • the recognition result includes at least one of the following: speech text in at least one language, and translated speech data in at least one language.
  • An embodiment of the present application also provides a data processing apparatus, including:
  • the acquiring unit is configured to acquire voice data and perform content detection on the voice data
  • the first processing unit is configured to, when a command word contained in the voice data is detected, perform a corresponding operation on the presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is used for presentation when the voice data is played;
  • the second processing unit is configured to, when detecting speech content included in the voice data, present the speech content when the voice data is played.
  • An embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor implements the steps of any of the foregoing data processing methods when executing the program.
  • the embodiment of the present application also provides a storage medium on which computer instructions are stored, and when the instructions are executed by a processor, the steps of any of the foregoing data processing methods are implemented.
  • The data processing method and apparatus, electronic device, and storage medium provided in the embodiments of the present application acquire voice data and perform content detection on the voice data; when a command word contained in the voice data is detected, a corresponding operation is performed on the presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is used for presentation when the voice data is played; when speech content contained in the voice data is detected, the speech content is presented when the voice data is played. In this way, the speaker can trigger the corresponding operation on the presentation document simply by speaking, without manual operation or an assistant performing the operation, which improves speech efficiency, saves speech time, and thus improves the user experience.
  • Figure 1 is a schematic flow diagram of the simultaneous interpretation method in related technologies
  • Figure 2 is a schematic flow diagram of a command execution method in the simultaneous interpretation process in related technologies
  • FIG. 3 is a schematic flowchart of a data processing method according to an embodiment of the application.
  • FIG. 4 is a schematic diagram of another flow chart of a data processing method according to an embodiment of the application.
  • FIG. 5 is a schematic flowchart of still another data processing method according to an embodiment of the application.
  • FIG. 6 is a schematic flowchart of a command execution method in a simultaneous interpretation process according to an embodiment of the application
  • FIG. 7 is a schematic diagram of the composition structure of a data processing device according to an embodiment of the application.
  • FIG. 8 is a schematic diagram of the composition structure of an electronic device according to an embodiment of the application.
  • Figure 1 is a schematic flow diagram of a simultaneous interpretation method in the related art. As shown in Figure 1, after the simultaneous interpretation server is started, the speaker gives a speech using the simultaneous interpretation server. During the speech, the simultaneous interpretation server obtains the speaker's voice data and performs voice recognition on it to obtain recognized text in the same language as the voice data; it then performs machine translation on the recognized text to obtain translated text. The simultaneous interpretation server determines whether the translated text needs to be synthesized into speech: if so, speech is synthesized from the translated text, and the synthesized speech, the recognized text, and the translated text are sent out as the target result; if not, the recognized text and the translated text are sent out as the target result.
  • The simultaneous interpretation server may collect the voice data through an operating terminal; the operating terminal may be a personal computer (PC).
  • The simultaneous interpretation server may send the target result to the operating terminal, which projects it onto the display screen to show the recognized text and the translated text; the simultaneous interpretation server may also send the target result to the terminal, which displays the recognized text and the translated text through its human-computer interaction interface and plays the synthesized voice through its voice playback module. In this way, the target result is presented to the user, and the content of the lecturer's speech is translated into the language required by the user and displayed.
  • the terminal may be a mobile phone, a tablet computer, etc.; the terminal is held by the user.
  • The speaker can also display the presentation document through the operating terminal, specifically by projecting the presentation document onto the display screen and showing it to the user through the display screen; the presentation document may be a presentation software (PPT, PowerPoint) document, a Word document, or the like.
  • To operate the presentation document, the lecturer or an assistant needs to use a mouse, a page-turning pen, or another device to click the corresponding button. In order to perform operations such as playing, turning pages, and going back, the lecturer needs to stop the lecture, perform the corresponding operation on the presentation document, and then continue; or the lecturer needs to issue a command to an assistant, wait for the assistant to perform the corresponding operation, and then continue to speak.
  • As shown in Figure 2, the command execution method in the simultaneous interpretation process includes: when the speaker gives a speech using the simultaneous interpretation system and needs to perform a certain operation on the presentation document, if there is no assistant, the speaker performs the corresponding operation (such as turning the page or going back) on the corresponding equipment (such as the above operating terminal), so that the equipment determines and executes the corresponding command; if there is an assistant, the speaker issues a command to the assistant, who performs the corresponding operation on the equipment, so that the equipment determines and executes the corresponding command. After the equipment has executed the command, the speaker continues the speech using the simultaneous interpretation system.
  • the simultaneous interpretation system may include the foregoing simultaneous interpretation server, operating terminal, display screen, and terminal.
  • When the speaker operates the corresponding equipment (as in the above operations on the presentation document), the speaker needs to stop the speech, perform the corresponding operation on the presentation document, and then continue, which increases the length of the speech and harms the audience's experience;
  • when an assistant is used, the lecturer needs to issue an order to the assistant, who then performs the corresponding operation on the equipment, which also increases the labor cost of the lecture.
  • In the embodiments of the present application, voice data is acquired and content detection is performed on it; when a command word contained in the voice data is detected, a corresponding operation is performed on the presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is used for presentation when the voice data is played; when speech content contained in the voice data is detected, the speech content is presented when the voice data is played. Thus, the speaker can perform the corresponding operation on the presentation document by speaking, without manual operation or an assistant performing it, which improves speech efficiency, saves speech time, and improves the user experience.
  • FIG. 3 is a schematic flowchart of the data processing method of the embodiment of the application; as shown in FIG. 3, the method includes:
  • Step 301 Acquire voice data, and perform content detection on the voice data
  • Step 302 When the command words contained in the voice data are detected, perform corresponding operations on the presentation document according to the instructions corresponding to the command words;
  • the content of the presentation document is associated with the content of the voice data; the presentation document is used for presentation when the voice data is played;
  • Step 303 When the speech content contained in the voice data is detected, present the speech content when the voice data is played.
  • The content of the presentation document being associated with the content of the voice data means that the presentation document is a document, associated with the content of the speech, that is displayed on the display screen while the lecturer is speaking, such as a presentation software (PPT, PowerPoint) document or a Word document shown during the lecture.
  • The presentation document being used for presentation when the voice data is played means that the presentation document is presented while the voice data is being played. The data processing method can therefore be applied to any conference scenario where a presentation document needs to be displayed, such as a seminar.
  • the data processing method can be applied to electronic equipment; the electronic equipment can be a server, a terminal, or the like.
  • In practical applications, the electronic device may be the server, and the server projects the presentation document onto the display screen for display. During the speech, the server receives the voice data and executes the data processing method of the embodiment of the present application, realizing the corresponding operation on the presentation document.
  • The electronic device may also be a terminal, with the presentation document projected from the server onto the display screen for display. During the speech, the terminal receives the voice data, executes the data processing method of the embodiment of the application, determines the operation instruction for the presentation document, and sends the instruction to the server, which performs the corresponding operation on the presentation document.
  • The presentation document can also be displayed through the terminal's human-computer interaction interface; during the speech, the terminal receives the voice data, executes the data processing method of the embodiment of the application, and performs the corresponding operation on the presentation document displayed on the human-computer interaction interface.
  • In practical applications, the voice data can be collected by the operating terminal (such as a PC), which may be equipped with or connected to a voice collection module, such as a microphone. Voice collection is performed to obtain the voice data, which is then sent to the electronic device (specifically, the server or terminal described above), and the electronic device executes the data processing method of the embodiment of the present application.
  • the voice data needs to be detected; when it is detected that the voice data contains command words, it can be determined that the user needs to operate the presentation document.
  • In some embodiments, detecting the command word contained in the voice data includes:
  • querying the command word library according to the voice data, and determining a command word in the voice data that meets the first preset condition.
  • Determining the command word that meets the first preset condition in the voice data includes at least one of the following:
  • determining a command word in the voice data whose pronunciation similarity with a word in the command word library exceeds the preset threshold;
  • determining, in the recognized text corresponding to the voice data, a command word that matches a word in the command word library; the recognized text is obtained by performing text recognition on the voice data.
  • Here, the command word library includes at least one command word; each command word has a voice-format expression (that is, the pronunciation of the command word) and a text-format expression (that is, the text of the command word).
  • When the command word determined based on pronunciation (specifically, a command word in the voice data whose pronunciation similarity with a word in the command word library exceeds the preset threshold) and the command word determined based on text (specifically, a command word in the recognized text that matches a word in the command word library) are the same, the command word can be directly determined as the target command word;
  • the command word determined based on pronunciation may, however, differ from the command word determined based on text; in this case, the target command word needs to be determined from the command word determined based on pronunciation and the command word determined based on text.
  • the command word determined based on pronunciation is recorded as the first command word;
  • the command word determined based on the text is recorded as the second command word;
  • Specifically, a first weight and a second weight are determined; the first weight characterizes the credibility of determining the command word based on pronunciation, and the second weight characterizes the credibility of determining the command word based on text. The first command word is weighted according to the first weight, the second command word is weighted according to the second weight, and based on the weighted results one of the first command word and the second command word is selected as the target command word.
  • For example, if the command word determined based on pronunciation and the command word determined based on text are both command word A, command word A is directly used as the target command word.
  • If the command word determined based on pronunciation is command word A and the command word determined based on text is command word B, the first weight and the second weight need to be determined; command word A is weighted according to the first weight, and command word B is weighted according to the second weight; the weighted result of command word A and the weighted result of command word B are obtained and compared: if the weighted result of command word A is greater than that of command word B, command word A is used as the target command word, and vice versa, command word B is used as the target command word.
  • In practical applications, at least two first command words may be determined based on pronunciation, and at least two second command words may be determined based on text; that is, the number of first command words is at least two, and the number of second command words is at least two.
  • If the at least two first command words and the at least two second command words have no command word in common (that is, each of the at least two first command words differs from each of the at least two second command words), then, according to the weighted processing result of each first command word and of each second command word, the command word with the largest weighted processing result is selected from the at least two first command words and the at least two second command words as the target command word;
  • if the same command word appears in both sets, its weighted processing results are added together to obtain the weighted processing result of each command word; then, according to the weighted processing result of each command word, the command word with the largest weighted processing result is selected from the at least two first command words and the at least two second command words as the target command word.
  • For example, based on pronunciation, command word A and command word B are obtained, where the probability of command word A being the target command word is a1% and the probability of command word B being the target command word is b%; here, a1% + b% can be equal to 1;
  • based on text, command word A and command word C are obtained, where the probability of command word A being the target command word is a2% and the probability of command word C being the target command word is c%; here, a2% + c% can be equal to 1;
  • assume the first weight is x and the second weight is y, with x + y = 1;
  • the weighted processing results for each command word are then as follows: the weighted processing result of command word A is a1%·x + a2%·y, the weighted processing result of command word B is b%·x, and the weighted processing result of command word C is c%·y;
  • the command word with the largest weighted processing result is selected from command word A, command word B, and command word C as the target command word.
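  • Purely as an illustration of the weighting scheme above, the following Python sketch fuses pronunciation-based and text-based candidates; the function name, the candidate format, and the example weights are assumptions of this sketch, not part of the embodiment:

```python
# Minimal sketch of the weighted command-word fusion described above.
# Candidates are (command_word, probability) pairs; weights x + y == 1.
def fuse_command_words(pronunciation_candidates, text_candidates, x=0.6, y=0.4):
    scores = {}
    for word, prob in pronunciation_candidates:   # first command words
        scores[word] = scores.get(word, 0.0) + prob * x
    for word, prob in text_candidates:            # second command words
        scores[word] = scores.get(word, 0.0) + prob * y
    # The command word with the largest weighted result becomes the target.
    return max(scores, key=scores.get)

# Example: A appears in both sets, so its weighted results are added.
target = fuse_command_words([("A", 0.7), ("B", 0.3)], [("A", 0.8), ("C", 0.2)])
print(target)  # -> "A" (0.7*0.6 + 0.8*0.4 = 0.74 beats B's 0.18 and C's 0.08)
```

  • With the assumed weights x = 0.6 and y = 0.4, command word A scores 0.74, command word B scores 0.18, and command word C scores 0.08, so A is selected, matching the rule above.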
  • After detecting the command word, the electronic device can determine that the user needs to perform a corresponding operation on the presentation document, and it can determine the corresponding operation instruction according to the detected command word, thereby realizing control of the presentation document.
  • In some embodiments, performing the corresponding operation on the presentation document according to the instruction corresponding to the command word includes:
  • querying the instruction library according to the command word, and determining the target instruction corresponding to the command word; the target instruction represents an operation instruction for the presentation document; the instruction library includes at least one instruction and a command word corresponding to each of the at least one instruction;
  • according to the target instruction, the corresponding operation is performed on the presentation document; that is, control of the presentation document is realized.
  • For example, a first control instruction for the presentation document is determined according to the command word "back"; that is, the first control instruction is used to control the presentation document to go back (turn to the previous page);
  • a second control instruction for controlling the page turning of the presentation document is determined according to the command word "page turning"; that is, the second control instruction is used to control the presentation document to turn the page (turn to the next page).
  • The first control instruction may be executed by the program that displays the presentation document.
  • For example, when the presentation document is a PPT document, the corresponding program may be the Microsoft Office PowerPoint application; the Microsoft Office PowerPoint application determines the first control instruction and executes the corresponding go-back operation.
  • The second control instruction is handled similarly to the first control instruction: it may be executed by the program that displays the presentation document; for example, when the presentation document is a PPT document, the corresponding program may be the Microsoft Office PowerPoint application, which determines the second control instruction and executes the corresponding page-turning operation.
  • the instruction library may be preset by the developer and stored in the electronic device.
  • the instructions stored in the instruction library and the command words corresponding to the corresponding instructions can be provided to the speaker for viewing in advance.
  • When the data processing method is applied to a server, the instruction library may be preset by the developer and stored in the server; when it is applied to a terminal, the instruction library may be preset by the developer, saved in the server, and then sent to the terminal. Accordingly, after the instruction library is updated, the server may send the updated instruction library to the terminal again, and the terminal receives and saves it.
  • The command words may be in at least one language, so that for speakers of different languages the target instruction corresponding to a command word can still be obtained by querying the instruction library.
  • For a command word in any language, there can be at least one expression; that is, there can be several command words with similar semantics corresponding to the same instruction.
  • The instructions include: turning the page of the presentation document, going back in the presentation document, and so on.
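  • A minimal sketch of such an instruction library, assuming a simple dictionary from command words to operation callbacks (the command words, the two operations, and the page-based document model are illustrative assumptions, not the embodiment's actual implementation):

```python
# Illustrative instruction library: command words map to operations on a
# presentation document, here modeled as a dict holding the current page.
def go_back(document):
    document["page"] = max(0, document["page"] - 1)  # turn to the previous page

def turn_page(document):
    document["page"] += 1                            # turn to the next page

INSTRUCTION_LIBRARY = {
    "back": go_back,            # first control instruction
    "page turning": turn_page,  # second control instruction
}

def execute(command_word, document):
    # Query the instruction library for the target instruction and run it.
    instruction = INSTRUCTION_LIBRARY.get(command_word)
    if instruction is not None:
        instruction(document)

doc = {"page": 3}
execute("page turning", doc)
print(doc["page"])  # -> 4
```

  • In a real system the library could hold command words in several languages and several synonymous command words per instruction, as described above.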
  • the data processing method of the embodiment of the present application can also be applied to a simultaneous interpretation scene.
  • The electronic device may specifically be a device that realizes simultaneous interpretation, such as the simultaneous interpretation server used in the method shown in Figure 1, and the voice data may be collected by the operating terminal and sent to the simultaneous interpretation server.
  • the operating terminal may be the aforementioned PC or the like.
  • the simultaneous interpretation scene may adopt the architecture of a simultaneous interpretation system (the simultaneous interpretation system may include the foregoing simultaneous interpretation server, operating terminal, and terminal).
  • the data processing method of the embodiment of the present application can be applied to a device that implements simultaneous interpretation.
  • The device that implements simultaneous interpretation may be a newly added device in the architecture of the simultaneous interpretation system, or an existing device in that architecture (such as the simultaneous interpretation server or the terminal) may be improved so that it can implement the method of the embodiment of the present application.
  • The improvement can consist of adding a command word detection module to the device; the command word detection module is used to detect the voice data, determine the instruction corresponding to a detected command word, and perform the corresponding operation according to the instruction.
  • In practical applications, when the speaker is giving a speech, the operating terminal (such as a PC) may be equipped with or connected to a voice collection module, such as a microphone, through which voice collection is performed to obtain the voice data, which is then sent to the device that implements simultaneous interpretation; the operating terminal can also project the presentation document onto the display screen and show it to the user through the display screen.
  • the device for implementing simultaneous interpretation receives voice data, executes the data processing method of the embodiment of the present application, implements corresponding operations on the presentation document, and performs simultaneous interpretation on the speech content in the voice data.
  • When the device that implements simultaneous interpretation is a server, the server receives the voice data sent by the operating terminal, executes the data processing method of the embodiment of this application, performs the corresponding operation on the presentation document, and simultaneously interprets the speech content in the voice data.
  • The device that realizes simultaneous interpretation may also be a terminal held by the user; the operating terminal or the server that receives the voice data may send the voice data to the terminal held by the user, and that terminal receives the voice data, executes the data processing method of the embodiment of the present application, performs the corresponding operation on the presentation document, and simultaneously interprets the speech content in the voice data.
  • detecting the speech content contained in the voice data includes:
  • the command word library is queried according to the voice data, and the speech content in the voice data that does not meet the second preset condition is determined.
  • Determining the speech content in the voice data that does not meet the second preset condition includes at least one of the following:
  • determining, as speech content in the voice data, content in which the similarity between the pronunciation of any word and the pronunciation of each word in the command word library is lower than the preset threshold;
  • determining, as speech content in the recognized text corresponding to the voice data, content in which the matching degree between any word and each word in the command word library is lower than the preset matching degree threshold.
  • the above-mentioned preset threshold and preset matching degree threshold may be preset by the developer and stored in the corresponding device.
  • When detecting the speech content, the voice data can be detected directly to determine the speech content (in speech format), or the recognized text can be determined after speech recognition of the voice data and the speech content then determined in the recognized text (in text format), that is, the speech text in the recognized text is determined.
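  • Purely as an illustration of the first and second preset conditions, the sketch below classifies a word as a command word or as speech content by comparing its best similarity score against the preset threshold; SequenceMatcher is a stand-in for whatever pronunciation or text similarity measure the system actually uses:

```python
from difflib import SequenceMatcher

COMMAND_WORD_LIBRARY = ["back", "page turning"]   # illustrative entries
PRESET_THRESHOLD = 0.8                            # preset similarity threshold

def similarity(a, b):
    # Stand-in for a real pronunciation or text similarity measure.
    return SequenceMatcher(None, a, b).ratio()

def classify(word):
    # First preset condition met -> command word; otherwise speech content.
    best = max(similarity(word, cw) for cw in COMMAND_WORD_LIBRARY)
    return "command word" if best >= PRESET_THRESHOLD else "speech content"

print(classify("page turning"))   # -> command word
print(classify("good morning"))   # -> speech content
```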
  • In some embodiments, presenting the speech content when the voice data is played includes: determining the recognition result corresponding to the speech content, and presenting the recognition result when the voice data is played;
  • the recognition result includes at least one of the following: recognized text in at least one language, and translated speech data in at least one language.
  • Determining the recognition result corresponding to the speech content includes at least one of the following:
  • performing speech recognition on the speech content to obtain a first recognized text; the language corresponding to the first recognized text is the same as the language corresponding to the speech content;
  • translating the first recognized text into another language to obtain a second recognized text, and performing speech synthesis on the second recognized text to obtain the speech corresponding to the second recognized text, which is the translated speech data corresponding to the speech content.
  • the content specifically included in the recognition result can be selected in advance by the user holding the terminal, and the selected result is sent to the server, so that the server provides the corresponding recognition result according to the user's selection.
  • In some embodiments, when the data processing method is applied to a server, the method further includes:
  • receiving a first acquisition request sent by the terminal; the first acquisition request is used to acquire a corresponding recognition result;
  • determining the target recognition result according to the first acquisition request, and sending the acquired target recognition result to the terminal.
  • the first acquisition request may be a request for acquiring recognized text; it may also be a request for acquiring translated voice data; it may also be a request for acquiring recognized text and translated voice data.
  • the recognition result of the corresponding language can be obtained according to the acquisition request sent by the user through the terminal.
  • Correspondingly, the method may further include:
  • receiving a second acquisition request sent by the terminal; the second acquisition request includes at least the target language;
  • sending the obtained recognition result corresponding to the target language to the terminal.
  • the terminal may be provided with a human-computer interaction interface.
  • the user holding the terminal can select a language through the human-computer interaction interface.
  • The terminal generates a second acquisition request containing the target language according to the user's selection and sends the second acquisition request to the server, so that the server receives it.
  • The terminal may be a mobile phone; this takes into account that most users currently carry a mobile phone with them, so sending the recognition result to the mobile phone avoids adding other devices to receive and display the recognition result, which saves cost and is convenient to operate.
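  • A minimal sketch of how a server might serve such acquisition requests, with the request shape and the per-language result store assumed purely for illustration:

```python
# Illustrative store of recognition results keyed by language code.
RECOGNITION_RESULTS = {
    "en": {"recognized_text": "hello everyone", "translated_voice": b"..."},
    "zh": {"recognized_text": "大家好", "translated_voice": b"..."},
}

def handle_second_acquisition_request(request):
    # The second acquisition request is assumed to carry at least the
    # target language selected on the terminal's human-computer interface.
    target_language = request["target_language"]
    result = RECOGNITION_RESULTS.get(target_language)
    if result is None:
        return {"error": "no recognition result for the requested language"}
    return result  # sent back to the requesting terminal

print(handle_second_acquisition_request({"target_language": "en"}))
```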
  • the data processing method can also be applied to a terminal held by a user.
  • the user holding the terminal can select the language and the content specifically included in the recognition result through the human-computer interaction interface of the terminal.
  • the terminal determines the language selected by the user and the specific content included in the recognition result, and uses the data processing method provided in the embodiment of the present application to obtain and present the recognition result that meets the needs of the user.
  • the data processing method is applied in a simultaneous interpretation scenario. As the speech proceeds, the voice data will continue to change, and the recognition result will also continue to change as the voice data changes.
  • The data processing method provided in the embodiments of this application can be applied to a variety of voice application scenarios, such as the above-mentioned simultaneous interpretation scenario and a video conference scenario; in the video conference scenario, the command word detection module can also be used to determine the command words spoken by the user and, based on the determined command words, perform corresponding operations on the documents displayed in the video conference.
  • The data processing method provided by the embodiments of the application obtains voice data and performs content detection on it; when a command word contained in the voice data is detected, a corresponding operation is performed on the presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is used for presentation when the voice data is played; when speech content contained in the voice data is detected, the speech content is presented when the voice data is played. In this way, the speaker can trigger the corresponding operation by speaking, without manual operation or an assistant performing it, which improves speech efficiency, saves speech time, and can improve the user experience.
  • FIG. 4 is a schematic flowchart of a command word detection method according to an embodiment of the application; as shown in FIG. 4, the command word detection method can be applied to an electronic device (such as the above-mentioned server or a terminal held by a user) and includes:
  • Step 401 Receive input voice data
  • Step 402 Perform command word detection for the input voice data
  • Step 403 Determine whether the command word is detected; when it is determined that the command word is detected, go to step 404; when it is determined that the command word is not detected, go to step 405;
  • Judging whether the command word is detected includes at least one of the following:
  • detecting, in the voice data, a command word whose pronunciation similarity with a word in the command word library exceeds the preset threshold;
  • detecting, in the recognized text, a command word that matches a word in the command word library; the recognized text is obtained by performing text recognition on the voice data.
  • Step 404 Determine the instruction corresponding to the command word, and execute the corresponding operation according to the instruction;
  • the instruction may be an operation instruction for the presentation document, and a corresponding operation is performed on the presentation document according to the determined operation instruction.
  • In practical applications, the command words can also be extended based on user requirements. The command words may also correspond to operations on other programs, such as adjusting the volume of the electronic device (for example, increasing the volume); after the corresponding command word for the volume is determined, the corresponding instruction is determined and executed to realize the adjustment of the volume.
  • Step 405 Continue to detect the command word.
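  • The loop of FIG. 4 (steps 401 to 405) can be sketched as follows; exact string lookup stands in for the pronunciation/text matching described earlier, and all names are assumptions of this sketch:

```python
# Self-contained sketch of the FIG. 4 loop: receive voice data, detect a
# command word, execute its instruction, otherwise continue detecting.
def page_forward(doc):
    doc["page"] += 1

COMMANDS = {"page turning": page_forward}  # illustrative command word library

def detection_loop(voice_stream, document):
    for utterance in voice_stream:             # step 401: receive input voice data
        instruction = COMMANDS.get(utterance)  # steps 402/403: command word detection
        if instruction is not None:
            instruction(document)              # step 404: execute the instruction
        # step 405: otherwise continue detecting on the next utterance

doc = {"page": 0}
detection_loop(["hello everyone", "page turning"], doc)
print(doc["page"])  # -> 1
```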
  • Fig. 5 is a schematic flow diagram of another data processing method according to an embodiment of the application; as shown in Fig. 5, the data processing method is applied to an electronic device (such as the above-mentioned server or a terminal held by a user), and the method includes:
  • Step 501 Perform voice recognition on voice data
  • the voice data is the voice spoken by the speaker during the speech in the simultaneous interpretation scenario.
  • Step 502 Perform command word detection on the voice data; determine whether the command word is detected, and if it is determined that the command word is detected, then go to step 503; if it is determined that the command word is not detected, then go to step 504;
  • Performing command word detection on the voice data includes at least one of the following:
  • detecting, in the voice data, a command word whose pronunciation similarity with a word in the command word library exceeds the preset threshold;
  • detecting, in the recognized text, a command word that matches a word in the command word library; the recognized text is obtained by performing text recognition on the voice data.
  • That is, the recognized text corresponding to the voice data can be detected to determine whether the command word is present, or the voice data can be directly detected during the voice recognition process to determine the command word.
  • Step 503 Determine the instruction corresponding to the command word, and execute the corresponding operation according to the instruction;
  • Step 504 Perform machine translation on the recognized text corresponding to the voice data to obtain the translated text
  • the recognized text is text obtained by performing voice recognition on voice data.
  • Step 505 Determine whether speech synthesis is required; if it is determined that speech synthesis is required, go to step 506, and if it is determined not to perform speech synthesis, go to step 507;
  • Whether speech synthesis is needed can be preset by the developer and stored in the corresponding device.
  • Step 506 Perform speech synthesis on the translated text
  • Step 507 Output the result of simultaneous interpretation.
  • the simultaneous interpretation result may include: recognized text and translated text; when it is determined that speech synthesis is required, the simultaneous interpretation result may also include: translated voice data (that is, voice data obtained by performing voice synthesis on the translated text).
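  • The FIG. 5 flow (steps 501 to 507) can be sketched end to end as below; recognize, translate, and synthesize are placeholders standing in for whatever speech recognition, machine translation, and speech synthesis components the system actually uses:

```python
# Sketch of the FIG. 5 pipeline. The three stage functions are placeholders
# for real ASR, machine translation, and speech synthesis components.
def recognize(voice_data):        # step 501: speech recognition
    return "recognized text"

def translate(recognized_text):   # step 504: machine translation
    return "translated text"

def synthesize(translated_text):  # step 506: speech synthesis
    return b"translated voice data"

def simultaneous_interpretation(voice_data, detect_command, need_synthesis):
    recognized = recognize(voice_data)
    command = detect_command(voice_data, recognized)      # steps 502/503
    if command is not None:
        return {"executed_command": command}              # command word branch
    result = {"recognized_text": recognized,
              "translated_text": translate(recognized)}   # step 504
    if need_synthesis:                                    # step 505
        result["translated_voice"] = synthesize(result["translated_text"])  # step 506
    return result                                         # step 507: output result

# Example: no command word detected, speech synthesis enabled.
print(simultaneous_interpretation(b"...", lambda v, t: None, True))
```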
  • FIG. 6 is a schematic flowchart of a method for executing commands in a simultaneous interpretation process according to an embodiment of the application; as shown in FIG. 6, the method for executing commands in a simultaneous interpretation process includes:
  • Step 601 The speaker uses the simultaneous interpretation system to give a speech
  • Step 602 When the speaker needs to perform a corresponding operation, the speaker speaks the corresponding command word;
  • Step 603 The simultaneous interpretation system uses the above data processing method to determine the corresponding command word, and executes the instruction corresponding to the command word;
  • Step 604 The lecturer continues to use the simultaneous interpretation system to give a lecture.
  • FIG. 7 is a schematic diagram of the composition structure of a data processing device according to an embodiment of the application; as shown in FIG. 7, the data processing device includes:
  • the obtaining unit 71 is configured to obtain voice data and perform content detection on the voice data
  • the first processing unit 72 is configured to perform corresponding operations on the presentation document according to instructions corresponding to the command words when the command words contained in the voice data are detected;
  • the content of the presentation document is associated with the content of the voice data; the presentation document is used for presentation when the voice data is played;
  • the second processing unit 73 is configured to, when detecting speech content included in the voice data, present the speech content when the voice data is played.
  • the first processing unit 72 is configured to query the command word library according to the voice data, and determine a command word in the voice data that meets the first preset condition.
  • the first processing unit 72 is configured to determine the command word that meets the first preset condition in the voice data by at least one of the following:
  • determining a command word in the voice data whose pronunciation similarity with a word in the command word library exceeds the preset threshold;
  • determining, in the recognized text, a command word that matches a word in the command word library; the recognized text is obtained by performing text recognition on the voice data.
  • the first processing unit 72 is configured to query the instruction library according to the command word to determine the target instruction corresponding to the command word, and to perform the corresponding operation on the presentation document according to the target instruction; the target instruction represents an operation instruction for the presentation document; the instruction library includes at least one instruction and a command word corresponding to each of the at least one instruction;
  • the second processing unit 73 is configured to query the command word library according to the voice data, and determine the speech content in the voice data that does not meet the second preset condition.
  • the second processing unit 73 is configured to determine the speech content that does not meet the second preset condition in the voice data, including at least one of the following:
  • determining, as speech content in the voice data, content in which the similarity between the pronunciation of any word and the pronunciation of each word in the command word library is lower than the preset threshold;
  • determining, as speech content in the recognized text, content in which the matching degree between any word and each word in the command word library is lower than the preset matching degree threshold.
  • the second processing unit 73 is configured to determine the recognition result corresponding to the speech content, and to present the recognition result when the voice data is played;
  • the recognition result includes at least one of the following: speech text in at least one language, and translated speech data in at least one language.
  • In practical applications, the acquisition unit 71, the first processing unit 72, and the second processing unit 73 can all be implemented by a processor in the electronic device (such as the aforementioned server or a terminal held by a user), for example a central processing unit (CPU), a digital signal processor (DSP), a microcontroller unit (MCU), or a field-programmable gate array (FPGA).
  • It should be noted that, when the device provided in the above embodiment performs data processing, the division into the above program modules is only used as an example; in practical applications, the above processing can be allocated to different program modules as needed, that is, the internal structure of the device is divided into different program modules to complete all or part of the processing described above.
  • the device provided in the foregoing embodiment and the data processing method embodiment belong to the same concept, and the specific implementation process is detailed in the method embodiment, which will not be repeated here.
  • FIG. 8 is a schematic diagram of the hardware composition of the electronic device according to the embodiment of the application.
  • As shown in FIG. 8, the electronic device 80 includes a memory 83, a processor 82, and a computer program stored in the memory 83 and executable on the processor 82; when executing the program, the processor 82 in the electronic device implements the method provided by one or more of the technical solutions on the electronic device side.
  • Specifically, when the processor 82 in the electronic device 80 executes the program, it realizes: acquiring voice data, and performing content detection on the voice data;
  • when a command word contained in the voice data is detected, performing a corresponding operation on the presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is used for presentation when the voice data is played;
  • when speech content contained in the voice data is detected, presenting the speech content when the voice data is played.
  • the electronic device further includes a communication interface 81; various components in the electronic device are coupled together through the bus system 84. It can be understood that the bus system 84 is configured to implement connection and communication between these components. In addition to the data bus, the bus system 84 also includes a power bus, a control bus, and a status signal bus.
  • the memory 83 in the embodiment of the present application may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory.
  • The non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a ferromagnetic random access memory (FRAM), a flash memory, a magnetic surface memory, an optical disk, or a compact disc read-only memory (CD-ROM); the magnetic surface memory can be magnetic disk storage or tape storage.
  • the volatile memory may be a random access memory (RAM, Random Access Memory), which is used as an external cache.
  • By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM).
  • the memories described in the embodiments of the present application are intended to include, but are not limited to, these and any other suitable types of memories.
  • the method disclosed in the foregoing embodiments of the present application may be applied to the processor 82 or implemented by the processor 82.
  • the processor 82 may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 82 or instructions in the form of software.
  • the aforementioned processor 82 may be a general-purpose processor, a DSP, or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, and the like.
  • the processor 82 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a storage medium, and the storage medium is located in a memory.
  • the processor 82 reads the information in the memory and completes the steps of the foregoing method in combination with its hardware.
  • An embodiment of the present application also provides a storage medium, specifically a computer storage medium and, more specifically, a computer-readable storage medium, on which computer instructions (that is, a computer program) are stored; when the computer instructions are executed by a processor, the method provided by one or more of the technical solutions on the electronic device side is implemented.
  • the disclosed method and smart device can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • The coupling, or direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • the units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units; Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
  • the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may serve as a unit individually, or two or more units may be integrated into one unit;
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
  • A person of ordinary skill in the art can understand that all or part of the steps of the foregoing method embodiments can be implemented by a program instructing the relevant hardware; the foregoing program can be stored in a computer-readable storage medium, and when the program is executed, the steps of the foregoing method embodiments are performed; the foregoing storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disk.
  • Alternatively, if the aforementioned integrated unit of the present application is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • Based on this understanding, the computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: removable storage devices, ROM, RAM, magnetic disks, or optical disks and other media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Data processing method and apparatus, electronic device, and storage medium. The method comprises: acquiring voice data, and performing content detection on the voice data (301); when a command word contained in the voice data is detected, performing a corresponding operation on a presentation document according to an instruction corresponding to the command word (302), the content of the presentation document being associated with the content of the voice data, and the presentation document being used for presentation while the voice data is played; and, when speech content contained in the voice data is detected, presenting the speech content while the voice data is played (303).
PCT/CN2019/125606 2019-12-16 2019-12-16 Data processing method and apparatus, electronic device and storage medium WO2021119908A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/125606 WO2021119908A1 (fr) 2019-12-16 2019-12-16 Data processing method and apparatus, electronic device and storage medium
CN201980100994.5A CN114556353A (zh) 2019-12-16 2019-12-16 Data processing method and apparatus, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/125606 WO2021119908A1 (fr) 2019-12-16 2019-12-16 Data processing method and apparatus, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2021119908A1 true WO2021119908A1 (fr) 2021-06-24

Family

ID=76476927

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/125606 WO2021119908A1 (fr) 2019-12-16 2019-12-16 Data processing method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN114556353A (fr)
WO (1) WO2021119908A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102339193A (zh) * 2010-07-21 2012-02-01 Tcl集团股份有限公司 Voice-controlled conference presentation method and system
CN103257841A (zh) * 2013-05-16 2013-08-21 问航 Method for controlling PPT presentation software by voice input through a loudspeaker
US20160328206A1 (en) * 2014-03-28 2016-11-10 Panasonic Intellectual Property Management Co., Ltd. Speech retrieval device, speech retrieval method, and display device
CN107046523A (zh) * 2016-11-22 2017-08-15 深圳大学 Simultaneous interpretation method and client based on a personal mobile terminal
CN206594846U (zh) * 2016-11-06 2017-10-27 刘守全 Voice-controlled history teaching device
CN109584880A (zh) * 2018-11-26 2019-04-05 山东弘享机器人有限公司 Method for implementing PPT file presentation by a lecturing robot


Also Published As

Publication number Publication date
CN114556353A (zh) 2022-05-27

Similar Documents

Publication Publication Date Title
  • JP7029613B2 Interface smart interactive control method, apparatus, system, and program
  • US11609631B2 Natural human-computer interaction for virtual personal assistant systems
  • US9953648B2 Electronic device and method for controlling the same
  • CN108683937B Voice interaction feedback method and system for a smart television, and computer-readable medium
  • JP6862632B2 Voice interaction method, apparatus, device, computer storage medium, and computer program
  • US11328133B2 Translation processing method, translation processing device, and device
  • US9298287B2 Combined activation for natural user interface systems
  • US20180277097A1 Method and device for extracting acoustic feature based on convolution neural network and terminal device
  • US10811005B2 Adapting voice input processing based on voice input characteristics
  • CN111050201B Data processing method and apparatus, electronic device, and storage medium
  • WO2020078300A1 Method for controlling screen projection of a terminal, and terminal
  • CN110494841B Contextual language translation
  • JP2015516587A Device for extracting information from dialogue
  • WO2020024620A1 Voice information processing method and device, apparatus, and storage medium
  • CN109543021B Story data processing method and system for intelligent robots
  • CN114556328A Data processing method and apparatus, electronic device, and storage medium
  • KR20200027331A Speech synthesis apparatus
  • WO2021120190A1 Data processing method and apparatus, electronic device, and storage medium
  • CN111126084B Data processing method and apparatus, electronic device, and storage medium
  • KR20170126667A Method and apparatus for automatically generating conference records
  • CN111009240A Voice keyword screening method and apparatus, travel terminal, device, and medium
  • WO2021087665A1 Data processing method and apparatus, server, and storage medium
  • CN110992960A Control method and apparatus, electronic device, and storage medium
  • US10403275B1 Speech control for complex commands
  • WO2021097629A1 Data processing method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19956203

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19956203

Country of ref document: EP

Kind code of ref document: A1