WO2021119908A1 - Data processing method and apparatus, electronic device, and storage medium - Google Patents

Data processing method and apparatus, electronic device, and storage medium

Info

Publication number
WO2021119908A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice data
command
speech
command word
word
Prior art date
Application number
PCT/CN2019/125606
Other languages
French (fr)
Chinese (zh)
Inventor
杨林举
Original Assignee
深圳市欢太科技有限公司
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市欢太科技有限公司, Oppo广东移动通信有限公司
Priority to PCT/CN2019/125606 (WO2021119908A1)
Priority to CN201980100994.5A (CN114556353A)
Publication of WO2021119908A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data

Definitions

  • This application relates to simultaneous interpretation technology, in particular to a data processing method, device, electronic equipment and storage medium.
  • The simultaneous interpretation system is a speech translation product for conference scenarios that has appeared in recent years. It uses artificial intelligence (AI) technology to provide multilingual text translation and text display of conference speakers' speech content.
  • The presentation document used by a conference lecturer has to be operated by the lecturer or by auxiliary personnel, which is inconvenient for the lecturer.
  • embodiments of the present application provide a data processing method, device, electronic equipment, and storage medium.
  • the embodiment of the application provides a data processing method, including:
  • When a command word contained in the voice data is detected, a corresponding operation is performed on the presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is presented while the voice data is played;
  • When speech content contained in the voice data is detected, the speech content is presented while the voice data is played.
  • detecting the command words contained in the voice data includes:
  • The command word library is queried according to the voice data, and the command words in the voice data that meet the first preset condition are determined.
  • Determining the command words in the voice data that meet the first preset condition includes at least one of the following:
  • the recognized text is obtained by text recognition of the voice data.
  • Performing the corresponding operation on the presentation document according to the instruction corresponding to the command word includes:
  • the target instruction corresponding to the command word is determined; the target instruction represents an operation instruction for the presentation document; the instruction library includes at least one instruction and the command word corresponding to each of the at least one instruction.
  • detecting the speech content contained in the voice data includes:
  • the speech content in the voice data that does not meet the second preset condition is determined.
  • Determining the speech content in the voice data that does not meet the second preset condition includes at least one of the following:
  • determining speech content in the voice data for which the similarity between the pronunciation of any word in the speech content and the pronunciation of each word in the command word library is lower than a preset threshold;
  • Presenting the speech content when the voice data is played includes:
  • the recognition result includes at least one of the following: speech text in at least one language, and translated speech data in at least one language.
  • the embodiment of the present application also provides a simultaneous interpretation device, including:
  • the acquiring unit is configured to acquire voice data and perform content detection on the voice data;
  • the first processing unit is configured to, when a command word contained in the voice data is detected, perform a corresponding operation on the presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is presented while the voice data is played;
  • the second processing unit is configured to, when detecting speech content included in the voice data, present the speech content when the voice data is played.
  • the embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor.
  • When executing the program, the processor implements the steps of any of the foregoing data processing methods.
  • the embodiment of the present application also provides a storage medium on which computer instructions are stored, and when the instructions are executed by a processor, the steps of any of the foregoing data processing methods are implemented.
  • The data processing method, device, electronic equipment, and storage medium provided in the embodiments of the present application acquire voice data and perform content detection on the voice data; when a command word contained in the voice data is detected, a corresponding operation is performed on the presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is presented while the voice data is played; when speech content contained in the voice data is detected, the speech content is presented while the voice data is played. In this way, the speaker can carry out the corresponding operation by speaking, without operating the presentation document manually or relying on an assistant to do so, which improves the efficiency of the speech, saves speech time, and improves the user experience.
  • FIG. 1 is a schematic flowchart of the simultaneous interpretation method in the related art.
  • FIG. 2 is a schematic flowchart of a command execution method in the simultaneous interpretation process in the related art.
  • FIG. 3 is a schematic flowchart of a data processing method according to an embodiment of the application.
  • FIG. 4 is another schematic flowchart of a data processing method according to an embodiment of the application.
  • FIG. 5 is a schematic flowchart of still another data processing method according to an embodiment of the application.
  • FIG. 6 is a schematic flowchart of a command execution method in a simultaneous interpretation process according to an embodiment of the application.
  • FIG. 7 is a schematic diagram of the composition structure of a data processing device according to an embodiment of the application.
  • FIG. 8 is a schematic diagram of the composition structure of an electronic device according to an embodiment of the application.
  • FIG. 1 is a schematic flowchart of the simultaneous interpretation method in the related art. As shown in FIG. 1, after the simultaneous interpretation server is started, the speaker uses it to give a speech. During the speech, the simultaneous interpretation server obtains the speaker's voice data and performs speech recognition on it to obtain recognized text in the same language as the voice data; it then performs machine translation on the recognized text to obtain translated text. The server then determines whether the translated text needs to be synthesized into speech. If so, speech is synthesized from the translated text, and the synthesized speech, the recognized text, and the translated text are sent out as the target result; if not, the recognized text and the translated text are sent out as the target result.
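The related-art flow described above (recognize, translate, optionally synthesize, then send the target result) can be sketched in Python as follows. This is only an illustrative outline: the names `TargetResult` and `interpret` and the three stub engines are assumptions introduced here, not components named in the application, and real recognition, translation, and synthesis engines would replace the stubs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TargetResult:
    """Hypothetical container for what the server sends out."""
    recognized_text: str
    translated_text: str
    synthesized_speech: Optional[bytes] = None


def interpret(
    voice_data: bytes,
    recognize: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    need_speech: bool,
) -> TargetResult:
    """Recognize, translate, and synthesize speech only when required."""
    recognized = recognize(voice_data)
    translated = translate(recognized)
    speech = synthesize(translated) if need_speech else None
    return TargetResult(recognized, translated, speech)


# Stub engines (assumptions for illustration only).
def fake_recognize(voice: bytes) -> str:
    return voice.decode("utf-8")


def fake_translate(text: str) -> str:
    return f"[translated] {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    result = interpret(b"hello", fake_recognize, fake_translate,
                       fake_synthesize, need_speech=True)
    print(result)
```

Passing the engines in as callables keeps the branch on `need_speech` visible, which is the only decision point in the related-art flow.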
  • the simultaneous interpretation server may collect the voice data through an operating terminal; the operating terminal may be a personal computer (PC, Personal Computer).
  • The simultaneous interpretation server may send the target result to the operating terminal, which projects it onto the display screen to display the recognized text and the translated text; the simultaneous interpretation server may also send the target result to the terminal, where the recognized text and the translated text are displayed through the terminal's human-computer interaction interface and the synthesized speech is played through the terminal's voice playback module. In this way, the target result is shown to the user, and the content of the lecturer's speech is translated into the language required by the user and displayed.
  • the terminal may be a mobile phone, a tablet computer, etc.; the terminal is held by the user.
  • the speaker can also display the presentation document through the operation terminal, specifically project the presentation document to the display screen, and show it to the user through the display screen;
  • the presentation document may be presentation software (PPT, PowerPoint) documents, Word documents, etc.
  • To operate the presentation document, the lecturer or an assistant needs to use a mouse, a page-turning pen, or a similar device to click the corresponding button.
  • That is, in order to perform operations such as playing, turning pages, and going back, the lecturer needs to stop the lecture and perform the corresponding operation on the presentation document before continuing, or the lecturer needs to issue a command to the assistant and continues to speak only after the assistant has performed the corresponding operation.
  • The command execution method in the simultaneous interpretation process includes the following. While giving a speech using the simultaneous interpretation system, the speaker may need to perform a certain operation on the presentation document. When there is no assistant, the lecturer performs the corresponding operation, such as turning a page or going back, on the corresponding device (such as the operating terminal above), so that the device determines and executes the corresponding command. When there is an assistant, the lecturer issues a command to the assistant, who performs the corresponding operation on the corresponding device, so that the device determines and executes the corresponding command. After the corresponding device has executed the command, the lecturer continues the speech using the simultaneous interpretation system.
  • the simultaneous interpretation system may include the foregoing simultaneous interpretation server, operating terminal, display screen, and terminal.
  • When the speaker operates the corresponding device personally (for example, to perform the above operations on the presentation document), the speaker needs to stop the speech and then perform the corresponding operation on the presentation document, which lengthens the speech and harms the audience's experience;
  • when there is an assistant, the lecturer needs to issue a command to the assistant, who performs the corresponding operation on the corresponding device, which also increases the labor cost of the lecture.
  • In the embodiments of the present application, voice data is acquired and content detection is performed on it; when a command word contained in the voice data is detected, a corresponding operation is performed on the presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is presented while the voice data is played; when speech content contained in the voice data is detected, the speech content is presented while the voice data is played. In this way, the speaker can operate the presentation document by speaking, without operating it manually or relying on an assistant, which improves the efficiency of the speech, saves speech time, and improves the user experience.
  • FIG. 3 is a schematic flowchart of the data processing method of the embodiment of the application; as shown in FIG. 3, the method includes:
  • Step 301: Acquire voice data, and perform content detection on the voice data;
  • Step 302: When a command word contained in the voice data is detected, perform the corresponding operation on the presentation document according to the instruction corresponding to the command word;
  • the content of the presentation document is associated with the content of the voice data; the presentation document is presented while the voice data is played;
  • Step 303: When speech content contained in the voice data is detected, present the speech content while the voice data is played.
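Steps 301 to 303 amount to routing each detected piece of content either to the presentation document or to the speech display. A minimal Python sketch of that routing is given below, assuming the voice data has already been split into recognized text segments. The `Presentation` class and the `process` function are hypothetical stand-ins, and only the command words "page turning" and "back" come from the description.

```python
from typing import List


class Presentation:
    """Hypothetical stand-in for the program that displays the document."""

    def __init__(self, pages: int) -> None:
        self.pages = pages
        self.current = 0  # zero-based index of the page being shown

    def next_page(self) -> None:
        self.current = min(self.current + 1, self.pages - 1)

    def previous_page(self) -> None:
        self.current = max(self.current - 1, 0)


def process(segments: List[str], presentation: Presentation) -> List[str]:
    """Apply command words to the presentation; return the speech content."""
    instructions = {
        "page turning": presentation.next_page,   # turn to the next page
        "back": presentation.previous_page,       # turn to the previous page
    }
    speech_content = []
    for segment in segments:                      # Step 301: content detection
        action = instructions.get(segment.strip().lower())
        if action is not None:                    # Step 302: command word found
            action()
        else:                                     # Step 303: speech content
            speech_content.append(segment)
    return speech_content


if __name__ == "__main__":
    deck = Presentation(pages=3)
    shown = process(["Welcome everyone", "page turning", "This is slide two"], deck)
    print(shown, deck.current)
```

Exact string matching is used here only to keep the routing visible; the description determines command words by pronunciation similarity and text matching instead.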
  • That the content of the presentation document is associated with the content of the voice data means that the presentation document is a document related to the speech content that is shown on the display screen while the lecturer is speaking, such as a presentation software (PPT, PowerPoint) document or a Word document displayed during the lecture.
  • the presentation document is used for presentation when the voice data is played, which means that the presentation document is presented while the voice data is being played. That is, the data processing method can be applied to any meeting scene where a presentation document needs to be displayed, such as a seminar.
  • the data processing method can be applied to electronic equipment; the electronic equipment can be a server, a terminal, or the like.
  • the electronic device may be the server, and the server projects the presentation document to the display screen for display.
  • The server receives the voice data and executes the data processing method of the embodiment of the present application to perform the corresponding operation on the presentation document.
  • the electronic device may also be a terminal, and the presentation document may be projected from the server to the display screen for display.
  • The terminal receives the voice data, executes the data processing method of the embodiment of the application, determines the instruction for the presentation document, and sends the instruction to the server, which performs the corresponding operation on the presentation document.
  • The presentation document can also be displayed through the terminal's human-computer interaction interface; during the speech, the terminal receives the voice data, executes the data processing method in the embodiment of the application, and performs the corresponding operation on the presentation document displayed on the human-computer interaction interface.
  • the voice data can be collected by the operating terminal.
  • The operating terminal (such as a PC) may be equipped with or connected to a voice collection module, such as a microphone, through which voice collection is performed to obtain voice data; the voice data is sent to an electronic device (specifically, the server or terminal described above), and the electronic device executes the data processing method of the embodiment of the present application.
  • the voice data needs to be detected; when it is detected that the voice data contains command words, it can be determined that the user needs to operate the presentation document.
  • detecting the command words contained in the voice data includes:
  • The command word library is queried according to the voice data, and the command words in the voice data that meet the first preset condition are determined.
  • Determining the command words in the voice data that meet the first preset condition includes at least one of the following:
  • the recognized text is obtained by text recognition of the voice data.
  • The command word library includes at least one command word, and each command word corresponds to a voice format expression (that is, the pronunciation of the command word) and a text format expression (that is, the text of the command word).
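A hedged Python sketch of the two detection routes, one by pronunciation similarity and one by text matching, is shown below. The phoneme strings, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration; the application does not say how similarity is computed.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional

# Hypothetical library: text form -> pronunciation form (illustrative phonemes).
COMMAND_LIBRARY = {
    "page turning": "P EY JH T ER N IH NG",
    "back": "B AE K",
}
PRONUNCIATION_THRESHOLD = 0.8  # assumed value of the preset threshold


def match_by_pronunciation(phonemes: str) -> Optional[str]:
    """Return the command word whose pronunciation similarity exceeds the threshold."""
    best_word, best_score = None, 0.0
    for word, reference in COMMAND_LIBRARY.items():
        score = SequenceMatcher(None, phonemes.split(), reference.split()).ratio()
        if score > best_score:
            best_word, best_score = word, score
    return best_word if best_score > PRONUNCIATION_THRESHOLD else None


def match_by_text(recognized_text: str) -> List[str]:
    """Return command words that appear as whole words in the recognized text."""
    text = recognized_text.lower()
    return [
        word for word in COMMAND_LIBRARY
        if re.search(r"\b" + re.escape(word) + r"\b", text)
    ]


if __name__ == "__main__":
    print(match_by_pronunciation("B AE K"))
    print(match_by_text("Please go back now"))
```

The whole-word check in `match_by_text` avoids treating "background" as the command word "back", which a plain substring test would do.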
  • When the command word determined based on pronunciation (that is, a word in the voice data whose pronunciation similarity to a word in the command word library exceeds a preset threshold) is the same as the command word determined based on text (that is, a word in the recognized text that matches a word in the command word library), the command word can be determined directly;
  • the command words determined based on pronunciation may also differ from the command words determined based on text.
  • In that case, the target command word needs to be determined from the command words determined based on pronunciation and the command words determined based on text.
  • the command word determined based on pronunciation is recorded as the first command word;
  • the command word determined based on the text is recorded as the second command word;
  • The first weight and the second weight are determined; the first weight characterizes the credibility of a command word determined based on pronunciation, and the second weight characterizes the credibility of a command word determined based on text; the first command word is weighted according to the first weight, and the second command word is weighted according to the second weight; according to the weighted results, one of the first command word and the second command word is selected as the target command word.
  • For example, if the command word determined based on pronunciation and the command word determined based on text are both command word A, command word A is directly used as the target command word.
  • If the command word determined based on pronunciation is command word A and the command word determined based on text is command word B, the first weight and the second weight need to be determined; command word A is weighted according to the first weight, and command word B is weighted according to the second weight; the weighted result of command word A is compared with the weighted result of command word B. If the weighted result of command word A is greater than that of command word B, command word A is used as the target command word; otherwise, command word B is used as the target command word.
  • At least two first command words may be determined based on pronunciation, and at least two second command words may be determined based on text; that is, the number of first command words is at least two, and the number of second command words is at least two.
  • If the at least two first command words and the at least two second command words have no command word in common (that is, every first command word differs from every second command word), then, according to the weighted processing result of each first command word and of each second command word, the command word with the largest weighted processing result is selected from the at least two first command words and the at least two second command words as the target command word;
  • if they do have a command word in common, the weighted processing results for the same command word are added together to obtain the weighted processing result for each command word; according to the weighted processing result of each command word, the command word with the largest weighted processing result is selected from the at least two first command words and the at least two second command words as the target command word.
  • For example, command word A and command word B are obtained based on pronunciation; the probability that command word A is the target command word is a1%, and the probability that command word B is the target command word is b%; here, a1% + b% can be equal to 1;
  • command word A and command word C are obtained based on text; the probability that command word A is the target command word is a2%, and the probability that command word C is the target command word is c%; here, a2% + c% can be equal to 1;
  • the first weight is x, the second weight is y, and x + y = 1;
  • the weighted processing results for each command word are as follows:
  • the weighted processing result of command word A = a1% * x + a2% * y;
  • the weighted processing result of command word B = b% * x;
  • the weighted processing result of command word C = c% * y;
  • the command word with the largest weighted processing result is selected from the command word A, the command word B, and the command word C as the target command word.
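The weighted selection in the example above can be worked through in a few lines of Python. The function name `select_target_command` and the numeric values (a1 = 0.6, b = 0.4, a2 = 0.7, c = 0.3, x = y = 0.5) are illustrative assumptions; the description gives only the symbols.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


def select_target_command(
    first: Dict[str, float],   # candidates from pronunciation: word -> probability
    second: Dict[str, float],  # candidates from text: word -> probability
    x: float,                  # first weight (credibility of pronunciation)
    y: float,                  # second weight (credibility of text), x + y = 1
) -> Tuple[Optional[str], Dict[str, float]]:
    """Sum the weighted results per command word and pick the largest."""
    scores: Dict[str, float] = defaultdict(float)
    for word, probability in first.items():
        scores[word] += probability * x
    for word, probability in second.items():
        scores[word] += probability * y   # same word: results are added together
    if not scores:
        return None, {}
    target = max(scores, key=lambda word: scores[word])
    return target, dict(scores)


if __name__ == "__main__":
    target, scores = select_target_command(
        first={"A": 0.6, "B": 0.4},    # a1 and b, assumed values
        second={"A": 0.7, "C": 0.3},   # a2 and c, assumed values
        x=0.5,
        y=0.5,
    )
    print(target, scores)
```

With these assumed values, command word A accumulates both of its contributions, while B and C each receive one, so A is selected.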
  • the electronic device After detecting the command word, the electronic device can determine that the user needs to perform a corresponding operation on the presentation document, and the electronic device can determine the corresponding operation instruction according to the detected command word to realize the control of the presentation document.
  • Performing the corresponding operation on the presentation document according to the instruction corresponding to the command word includes:
  • the target instruction corresponding to the command word is determined; the target instruction represents an operation instruction for the presentation document; the instruction library includes at least one instruction and the command word corresponding to each of the at least one instruction.
  • the corresponding operation is performed on the presentation document, that is, the control of the presentation document is realized.
  • A first control instruction for controlling the presentation document to go back is determined according to the command word "back"; that is, the first control instruction is used to go back in the presentation document (turn to the previous page);
  • a second control instruction for controlling page turning of the presentation document is determined according to the command word "page turning"; that is, the second control instruction is used to turn the page of the presentation document (turn to the next page).
  • The first control instruction may be executed by the program that displays the presentation document. For example, when the presentation document is a PPT document, the corresponding program may be the Microsoft Office PowerPoint application; the Microsoft Office PowerPoint application determines the first control instruction and executes the corresponding back operation.
  • The second control instruction is similar to the first control instruction.
  • The second control instruction may be executed by the program that displays the presentation document. For example, when the presentation document is a PPT document, the corresponding program may be the Microsoft Office PowerPoint application; the Microsoft Office PowerPoint application determines the second control instruction and executes the corresponding page turning operation.
  • the instruction library may be preset by the developer and stored in the electronic device.
  • the instructions stored in the instruction library and the command words corresponding to the corresponding instructions can be provided to the speaker for viewing in advance.
  • the instruction library may be preset by the developer and stored in the server.
  • The instruction library may be preset by the developer and saved in the server, and the instruction library may then be sent to the terminal; accordingly, after the instruction library is updated, the server may send the updated instruction library to the terminal again, and the terminal receives and saves it.
  • The command words can be in at least one language, so that for speakers of different languages the target instruction corresponding to a command word can be obtained by searching the instruction library.
  • For command words in any language, there can be at least one expression; that is, there can be at least one command word with similar semantics.
  • the instructions include: page turning of the presentation document, rewinding of the presentation document, etc.;
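An instruction library of this kind can be pictured as a mapping from each instruction to the set of command words that trigger it, across languages and synonyms. The Python sketch below assumes such a layout; apart from "page turning" and "back", the command words listed (including the Chinese ones) and the names `Instruction` and `find_target_instruction` are illustrative assumptions.

```python
from enum import Enum
from typing import Optional


class Instruction(Enum):
    NEXT_PAGE = "next_page"          # page turning of the presentation document
    PREVIOUS_PAGE = "previous_page"  # going back in the presentation document


# Each instruction maps to its command words; several languages and
# several expressions with similar semantics can point at one instruction.
INSTRUCTION_LIBRARY = {
    Instruction.NEXT_PAGE: {"page turning", "next page", "翻页", "下一页"},
    Instruction.PREVIOUS_PAGE: {"back", "previous page", "后退", "上一页"},
}


def find_target_instruction(command_word: str) -> Optional[Instruction]:
    """Look up the target instruction for a detected command word."""
    normalized = command_word.strip().lower()
    for instruction, command_words in INSTRUCTION_LIBRARY.items():
        if normalized in command_words:
            return instruction
    return None


if __name__ == "__main__":
    print(find_target_instruction("Back"))
    print(find_target_instruction("翻页"))
```

Keeping the library as data rather than code fits the description's point that the server can update the library and resend it to the terminal.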
  • the data processing method of the embodiment of the present application can also be applied to a simultaneous interpretation scene.
  • The electronic device may specifically be a device that realizes simultaneous interpretation, such as the simultaneous interpretation server used in the method shown in FIG. 1, and the voice data may be collected by the operating terminal and sent to the simultaneous interpretation server.
  • the operating terminal may be the aforementioned PC or the like.
  • the simultaneous interpretation scene may adopt the architecture of a simultaneous interpretation system (the simultaneous interpretation system may include the foregoing simultaneous interpretation server, operating terminal, and terminal).
  • the data processing method of the embodiment of the present application can be applied to a device that implements simultaneous interpretation.
  • The device that implements simultaneous interpretation may be a device newly added to the architecture of the simultaneous interpretation system, or an existing device in that architecture (such as the simultaneous interpretation server or the terminal) may be improved so that it can implement the method of the embodiment of the present application.
  • The improvement can include adding a command word detection module to the device; the command word detection module is used to detect the voice data and, when a command word is detected, to determine the instruction corresponding to the command word and perform the corresponding operation according to the instruction.
  • When the speaker is giving a speech, the operating terminal (such as a PC) may be equipped with or connected to a voice collection module, such as a microphone, through which voice collection is performed to obtain the voice data; the operating terminal sends the voice data to the device for realizing simultaneous interpretation. The operating terminal can also project the presentation document to the display screen, where it is shown to the user.
  • the device for implementing simultaneous interpretation receives voice data, executes the data processing method of the embodiment of the present application, implements corresponding operations on the presentation document, and performs simultaneous interpretation on the speech content in the voice data.
  • The server receives the voice data sent by the operating terminal, executes the data processing method in the embodiment of this application, performs the corresponding operation on the presentation document, and performs simultaneous interpretation on the speech content in the voice data.
  • The device for realizing simultaneous interpretation may also be a terminal held by the user; the operating terminal or the server that receives the voice data may send the voice data to the terminal held by the user, and that terminal receives the voice data, executes the data processing method of the embodiment of the present application, performs the corresponding operation on the presentation document, and performs simultaneous interpretation of the speech content in the voice data.
  • detecting the speech content contained in the voice data includes:
  • the speech content in the voice data that does not meet the second preset condition is determined.
  • Determining the speech content in the voice data that does not meet the second preset condition includes at least one of the following:
  • determining speech content in the voice data for which the similarity between the pronunciation of any word in the speech content and the pronunciation of each word in the command word library is lower than a preset threshold;
  • the above-mentioned preset threshold and preset matching degree threshold may be preset by the developer and stored in the corresponding device.
  • When detecting the speech content, the voice data can be examined directly to determine the speech content (in speech format), or the voice data can first be recognized to obtain recognized text and the speech content (in text format) can then be determined in the recognized text; that is, the speech text in the recognized text is determined.
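For the text-format route, speech content can be pictured as whatever stays below the matching threshold against every command word. The Python sketch below assumes recognized text already split into segments, a matching degree threshold of 0.8, and `difflib.SequenceMatcher` as the matching measure; none of these choices is specified by the application.

```python
from difflib import SequenceMatcher
from typing import List

COMMAND_WORDS = ["page turning", "back"]  # command words from the description
MATCH_THRESHOLD = 0.8                     # assumed preset matching degree threshold


def is_speech_content(segment: str) -> bool:
    """True when the segment stays below the threshold for every command word."""
    text = segment.strip().lower()
    return all(
        SequenceMatcher(None, text, word).ratio() < MATCH_THRESHOLD
        for word in COMMAND_WORDS
    )


def extract_speech_text(segments: List[str]) -> List[str]:
    """Keep only the segments that are speech content, in their original order."""
    return [segment for segment in segments if is_speech_content(segment)]


if __name__ == "__main__":
    print(extract_speech_text(["Good morning everyone", "Back", "page turning"]))
```

Only the segments that survive this filter would go on to translation and display, so command words are not shown to the audience as part of the speech.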
  • Presenting the speech content when the voice data is played includes:
  • the recognition result includes at least one of the following: recognized text in at least one language, and translated speech data in at least one language.
  • the recognition result corresponding to the determined speech content includes at least one of the following:
  • the language corresponding to the first recognized text is the same as the language corresponding to the speech content;
  • the speech synthesis is performed on the second recognized text to obtain the speech corresponding to the second recognized text, which is the translated speech data corresponding to the speech content.
  • the content specifically included in the recognition result can be selected in advance by the user holding the terminal, and the selected result is sent to the server, so that the server provides the corresponding recognition result according to the user's selection.
  • When the data processing method is applied to a server, the method further includes:
  • the first acquisition request is used to acquire a corresponding recognition result
  • the target recognition result is determined, and the acquired target recognition result is sent to the terminal.
  • the first acquisition request may be a request for acquiring recognized text; it may also be a request for acquiring translated voice data; it may also be a request for acquiring recognized text and translated voice data.
  • the recognition result of the corresponding language can be obtained according to the acquisition request sent by the user through the terminal.
  • the method may further include:
  • the second acquisition request includes at least: the target language
  • the obtained recognition result corresponding to the target language is sent to the terminal.
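How a server might answer these acquisition requests can be sketched in Python as below. For brevity the sketch folds the first acquisition request (which content to return) and the second acquisition request (the target language) into one hypothetical `AcquisitionRequest` object; that merging, and every class and field name, is an assumption made here rather than something the application specifies.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class RecognitionResult:
    """Hypothetical per-language result held by the server."""
    language: str
    text: str      # recognized or translated text in this language
    speech: bytes  # translated speech data in this language


@dataclass
class AcquisitionRequest:
    """Hypothetical request sent by the terminal."""
    target_language: str
    want_text: bool = True
    want_speech: bool = False


def build_response(
    results: Dict[str, RecognitionResult],
    request: AcquisitionRequest,
) -> Dict[str, Any]:
    """Select the recognition result that matches the terminal's request."""
    result = results.get(request.target_language)
    if result is None:
        return {}  # no result available for the requested language
    response: Dict[str, Any] = {"language": result.language}
    if request.want_text:
        response["text"] = result.text
    if request.want_speech:
        response["speech"] = result.speech
    return response


if __name__ == "__main__":
    available = {
        "en": RecognitionResult("en", "Good morning", b"en-audio"),
        "fr": RecognitionResult("fr", "Bonjour", b"fr-audio"),
    }
    print(build_response(available, AcquisitionRequest("fr", want_speech=True)))
```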
  • the terminal may be provided with a human-computer interaction interface.
  • the user holding the terminal can select a language through the human-computer interaction interface.
  • The terminal generates a second acquisition request containing the target language according to the user's selection and sends it to the server, so that the server receives the second acquisition request.
  • The terminal may be a mobile phone; since most users already carry a mobile phone, sending the recognition result to the mobile phone requires no additional device to receive and display it, which saves cost and is convenient to operate.
  • the data processing method can also be applied to a terminal held by a user.
  • the user holding the terminal can select the language and the content specifically included in the recognition result through the human-computer interaction interface of the terminal.
  • the terminal determines the language selected by the user and the specific content included in the recognition result, and uses the data processing method provided in the embodiment of the present application to obtain and present the recognition result that meets the needs of the user.
  • the data processing method is applied in a simultaneous interpretation scenario. As the speech proceeds, the voice data will continue to change, and the recognition result will also continue to change as the voice data changes.
  • the data processing method provided in the embodiments of this application can be applied to a variety of voice application scenarios, such as the above-mentioned simultaneous interpretation scenario and a video conference scenario; in the video conference scenario, the command word detection module can also be used to determine the command words spoken by the user and, based on the determined command words, perform corresponding operations on the documents displayed in the video conference.
  • the data processing method provided by the embodiment of the application obtains voice data and performs content detection on the voice data; when a command word contained in the voice data is detected, a corresponding operation is performed on the presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is used for presentation when the voice data is played; when the speech content contained in the voice data is detected, the speech content is presented when the voice data is played; in this way, the speaker can carry out the corresponding operation by speaking, without operating manually or relying on an assistant to perform the operation, which improves the efficiency of the speech, saves speech time, and can improve the user experience.
  • FIG. 4 is a schematic flowchart of a keyword detection method according to an embodiment of the application; as shown in FIG. 4, the keyword detection method can be applied to electronic devices (such as the above-mentioned server and a terminal held by a user).
  • Step 401 Receive input voice data
  • Step 402 Perform command word detection for the input voice data
  • Step 403 Determine whether the command word is detected; when it is determined that the command word is detected, go to step 404; when it is determined that the command word is not detected, go to step 405;
  • the judgment whether the command word is detected includes one of the following:
  • the recognized text is obtained by text recognition of the voice data.
  • Step 404 Determine the instruction corresponding to the command word, and execute the corresponding operation according to the instruction;
  • the instruction may be an operation instruction for the presentation document, and a corresponding operation is performed on the presentation document according to the determined operation instruction.
  • the command word can also be extended based on user requirements.
  • the command word may also cover operations on other programs, such as adjusting the volume of the electronic device (for example, increasing the volume); after the command word for the volume is determined, the corresponding instruction is determined and executed to adjust the volume.
  • Step 405 Continue to detect the command word.
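Steps 401 to 405 can be sketched as a simple loop. This is an illustrative sketch under stated assumptions: the input is already recognized text, detection is plain substring matching, and the names `detect_command_word`, `run_detection`, and the `handlers` mapping are hypothetical and do not come from the application.

```python
from typing import Callable, Dict, Iterable, List, Optional


def detect_command_word(recognized_text: str,
                        command_words: Iterable[str]) -> Optional[str]:
    """Steps 402-403: return the first command word found in the text, else None."""
    text = recognized_text.lower()
    for word in command_words:
        if word in text:
            return word
    return None


def run_detection(utterances: Iterable[str],
                  handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Run the FIG. 4 flow over a stream of recognized utterances."""
    executed: List[str] = []
    for text in utterances:                         # Step 401: receive input
        word = detect_command_word(text, handlers)  # Step 402: detect
        if word is None:                            # Step 403: nothing detected
            continue                                # Step 405: keep detecting
        handlers[word]()                            # Step 404: execute instruction
        executed.append(word)
    return executed
```

Substring matching is only a stand-in; a real system would likely need to avoid triggering on command words that occur inside ordinary speech.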
  • Fig. 5 is a schematic flow diagram of another data processing method according to an embodiment of the application; as shown in Fig. 5, the data processing method is applied to an electronic device (such as the above-mentioned server and a terminal held by a user), and the method includes:
  • Step 501 Perform voice recognition on voice data
  • the voice data is the voice spoken by the speaker during the speech in the simultaneous interpretation scenario.
  • Step 502 Perform command word detection on the voice data; determine whether the command word is detected, and if it is determined that the command word is detected, then go to step 503; if it is determined that the command word is not detected, then go to step 504;
  • the performing command word detection on the voice data includes:
  • detecting, in the recognized text, a command word that matches a word in the command word library; the recognized text is obtained by text recognition of the voice data.
  • the recognized text corresponding to the voice data can be detected to determine whether the command word is detected, or the voice data can be directly detected during the voice recognition process to determine the command word.
  • Step 503 Determine the instruction corresponding to the command word, and execute the corresponding operation according to the instruction;
  • Step 504 Perform machine translation on the recognized text corresponding to the voice data to obtain the translated text
  • the recognized text is text obtained by performing voice recognition on voice data.
  • Step 505 Determine whether speech synthesis is required; if it is determined that speech synthesis is required, go to step 506, and if it is determined not to perform speech synthesis, go to step 507;
  • whether speech synthesis is required can be preset by the developer and stored in the corresponding device.
  • Step 506 Perform speech synthesis on the translated text
  • Step 507 Output the result of simultaneous interpretation.
  • the simultaneous interpretation result may include: recognized text and translated text; when it is determined that speech synthesis is required, the simultaneous interpretation result may also include: translated voice data (that is, voice data obtained by performing voice synthesis on the translated text).
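The flow of steps 502 to 507 can be sketched as a single function. This is a hedged sketch, not the application's implementation: `interpret` is a hypothetical name, the `translate` and `synthesize` callables are placeholders for real machine translation and speech synthesis engines, and command word detection is reduced to an exact match on the recognized text.

```python
from typing import Callable, Dict, Optional, Set


def interpret(recognized_text: str,
              command_words: Set[str],
              translate: Callable[[str], str],
              synthesize: Optional[Callable[[str], bytes]] = None
              ) -> Optional[Dict[str, object]]:
    """Return the simultaneous interpretation result, or None for a command word."""
    # Step 502: command word detection (exact match is an assumption).
    if recognized_text.strip().lower() in command_words:
        return None  # Step 503 (executing the instruction) is handled elsewhere
    translated = translate(recognized_text)           # Step 504: machine translation
    result: Dict[str, object] = {
        "recognized_text": recognized_text,
        "translated_text": translated,
    }
    if synthesize is not None:                        # Step 505: synthesis required?
        result["translated_voice"] = synthesize(translated)  # Step 506
    return result                                     # Step 507: output the result
```

Passing `synthesize=None` models the case where the preset configuration says that speech synthesis is not required.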
  • FIG. 6 is a schematic flowchart of a method for executing commands in a simultaneous interpretation process according to an embodiment of the application; as shown in FIG. 6, the method for executing commands in a simultaneous interpretation process includes:
  • Step 601 The speaker uses the simultaneous interpretation system to give a speech
  • Step 602 When the speaker needs to perform a corresponding operation, the speaker speaks the corresponding command word;
  • Step 603 The simultaneous interpretation system uses the above data processing method to determine the corresponding command word, and executes the instruction corresponding to the command word;
  • Step 604 The lecturer continues to use the simultaneous interpretation system to give a lecture.
  • FIG. 7 is a schematic diagram of the composition structure of a data processing device according to an embodiment of the application; as shown in FIG. 7, the data processing device includes:
  • the obtaining unit 71 is configured to obtain voice data and perform content detection on the voice data
  • the first processing unit 72 is configured to perform corresponding operations on the presentation document according to instructions corresponding to the command words when the command words contained in the voice data are detected;
  • the content of the presentation document is associated with the content of the voice data; the presentation document is used for presentation when the voice data is played;
  • the second processing unit 73 is configured to, when detecting speech content included in the voice data, present the speech content when the voice data is played.
  • the first processing unit 72 is configured to query a command word library according to the voice data, and determine a command word in the voice data that meets a first preset condition.
  • the first processing unit 72 is configured to determine a command word that meets a first preset condition in the voice data, including at least one of the following:
  • the recognized text is obtained by text recognition of the voice data.
  • the first processing unit 72 is configured to query an instruction library according to the command word to determine a target instruction corresponding to the command word; the target instruction represents an operation instruction for a presentation document; the instruction library includes at least one instruction and a command word corresponding to each instruction in the at least one instruction;
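One possible shape for such an instruction library is sketched below. The instruction names, the command words, and the helper functions `build_index` and `find_target_instruction` are hypothetical and chosen only for illustration; the application does not specify a data structure.

```python
from typing import Dict, List, Optional

# Hypothetical instruction library: each instruction with its command words.
INSTRUCTION_LIBRARY: Dict[str, List[str]] = {
    "NEXT_PAGE": ["next page", "page down"],
    "PREVIOUS_PAGE": ["previous page", "go back"],
    "START_SHOW": ["start playing"],
}


def build_index(library: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert the library so that a command word maps to its instruction."""
    return {word: instruction
            for instruction, words in library.items()
            for word in words}


def find_target_instruction(command_word: str,
                            index: Dict[str, str]) -> Optional[str]:
    """Look up the target instruction for a detected command word."""
    return index.get(command_word.strip().lower())
```

Storing the library per instruction keeps several spoken variants next to the operation they trigger, while the inverted index keeps each lookup a single dictionary access.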
  • the second processing unit 73 is configured to query a command word library according to the voice data, and determine the speech content in the voice data that does not meet the second preset condition.
  • the second processing unit 73 is configured to determine the speech content that does not meet the second preset condition in the voice data, including at least one of the following:
  • determining the speech content in the voice data, where the similarity between the pronunciation of any word in the speech content and the pronunciation of each word in the command word library is lower than a preset threshold;
  • the second processing unit 73 is configured to determine a recognition result corresponding to the speech content
  • the recognition result includes at least one of the following: speech text in at least one language, and translated speech data in at least one language.
  • the acquisition unit 71, the first processing unit 72, and the second processing unit 73 can all be implemented by a processor in an electronic device (such as the aforementioned server or a terminal held by a user), such as a central processing unit (CPU), a digital signal processor (DSP), a microcontroller unit (MCU), or a field-programmable gate array (FPGA).
  • when the device provided in the above embodiment performs data processing, only the division of the above-mentioned program modules is used as an example; the above-mentioned processing can be allocated to different program modules as needed, that is, the internal structure of the terminal is divided into different program modules to complete all or part of the processing described above.
  • the device provided in the foregoing embodiment and the data processing method embodiment belong to the same concept, and the specific implementation process is detailed in the method embodiment, which will not be repeated here.
  • FIG. 8 is a schematic diagram of the hardware composition of the electronic device according to the embodiment of the application.
  • the electronic device 80 includes a memory 83, a processor 82, and a computer program stored on the memory 83 and capable of running on the processor 82; when the processor 82 located in the electronic device executes the program, it implements the method provided by one or more technical solutions on the electronic device side.
  • when the processor 82 located in the electronic device 80 executes the program, it implements: acquiring voice data, and performing content detection on the voice data;
  • when the command word contained in the voice data is detected, corresponding operations are performed on the presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is used for presentation when the voice data is played;
  • when the speech content contained in the voice data is detected, the speech content is presented when the voice data is played.
  • the electronic device further includes a communication interface 81; various components in the electronic device are coupled together through the bus system 84. It can be understood that the bus system 84 is configured to implement connection and communication between these components. In addition to the data bus, the bus system 84 also includes a power bus, a control bus, and a status signal bus.
  • the memory 83 in the embodiment of the present application may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), ferromagnetic random access memory (FRAM), flash memory, magnetic surface memory, optical disc, or compact disc read-only memory (CD-ROM); magnetic surface memory can be magnetic disk storage or tape storage.
  • the volatile memory may be a random access memory (RAM), which is used as an external cache; forms of RAM include static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM).
  • the memories described in the embodiments of the present application are intended to include, but are not limited to, these and any other suitable types of memories.
  • the method disclosed in the foregoing embodiments of the present application may be applied to the processor 82 or implemented by the processor 82.
  • the processor 82 may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 82 or instructions in the form of software.
  • the aforementioned processor 82 may be a general-purpose processor, a DSP, or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, and the like.
  • the processor 82 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a storage medium, and the storage medium is located in a memory.
  • the processor 82 reads the information in the memory and completes the steps of the foregoing method in combination with its hardware.
  • the embodiment of the present application also provides a storage medium, which is specifically a computer storage medium, and more specifically, a computer-readable storage medium.
  • computer instructions, that is, a computer program, are stored thereon; when the computer instructions are executed by a processor, the method provided by one or more technical solutions on the electronic device side is implemented.
  • the disclosed method and smart device can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • the units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units; Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
  • the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may be individually used as a unit, or two or more units may be integrated into one unit;
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
  • the foregoing program can be stored in a computer-readable storage medium; when the program is executed, the steps of the foregoing method embodiment are performed; the foregoing storage medium includes various media that can store program code, such as a removable storage device, ROM, RAM, magnetic disk, or optical disk.
  • if the aforementioned integrated unit of the present application is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: removable storage devices, ROM, RAM, magnetic disks, or optical disks and other media that can store program codes.

Abstract

A data processing method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring voice data, and performing content detection on the voice data (301); upon detecting a command word included in the voice data, performing a corresponding operation with respect to a presentation document according to an instruction corresponding to the command word (302), wherein content of the presentation document is associated with content of the voice data, and the presentation document is used for a presentation during playback of the voice data; and upon detecting a speech content included in the voice data, presenting the speech content when the voice data is being played (303).

Description

Data processing method, device, electronic equipment and storage medium
Technical field
This application relates to simultaneous interpretation technology, and in particular to a data processing method, device, electronic equipment, and storage medium.
Background
With the rapid development of artificial intelligence technology, the concept of artificial intelligence (AI) has gradually moved from cutting-edge laboratory technology into practice and is now applied to many aspects of real life.
The simultaneous interpretation system is a speech translation product for conference scenarios that has appeared in recent years. It uses AI technology to provide multilingual text translation and text display for the speech content of conference speakers.
In related simultaneous interpretation systems, operations on the presentation document used by a conference speaker have to be performed by the speaker or by an assistant, which is inconvenient for the speaker.
Summary of the invention
To solve the related technical problems, embodiments of the present application provide a data processing method, device, electronic equipment, and storage medium.
An embodiment of the present application provides a data processing method, including:
acquiring voice data, and performing content detection on the voice data;
when a command word contained in the voice data is detected, performing a corresponding operation on a presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is used for presentation when the voice data is played;
when speech content contained in the voice data is detected, presenting the speech content when the voice data is played.
In the above solution, detecting the command word contained in the voice data includes:
querying a command word library according to the voice data, and determining a command word in the voice data that meets a first preset condition.
In the above solution, determining the command word in the voice data that meets the first preset condition includes at least one of the following:
determining a command word in the voice data whose pronunciation similarity with a word in the command word library exceeds a preset threshold;
determining a command word in the recognized text that matches a word in the command word library; the recognized text is obtained by performing text recognition on the voice data.
In the above solution, performing the corresponding operation on the presentation document according to the instruction corresponding to the command word includes:
querying an instruction library according to the command word, and determining a target instruction corresponding to the command word; the target instruction represents an operation instruction for the presentation document; the instruction library includes at least one instruction and the command word corresponding to each of the at least one instruction;
performing the corresponding operation on the presentation document according to the target instruction.
In the above solution, detecting the speech content contained in the voice data includes:
querying the command word library according to the voice data, and determining speech content in the voice data that does not meet a second preset condition.
In the above solution, determining the speech content in the voice data that does not meet the second preset condition includes at least one of the following:
determining the speech content in the voice data; the similarity between the pronunciation of any word in the speech content and the pronunciation of each word in the command word library is lower than a preset threshold;
determining the speech text in the recognized text; the matching degree between any word in the speech text and each word in the command word library is lower than a preset matching degree threshold; the recognized text is obtained by performing text recognition on the voice data.
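The threshold-based separation of command words from speech content can be sketched as follows. This is an assumption-laden illustration: character-level similarity from the standard library's `difflib.SequenceMatcher` stands in for the "matching degree" (a real system might compare pronunciations instead), the 0.8 threshold is arbitrary, the input is assumed to be pre-segmented recognized text, and `best_match` and `split_content` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


def best_match(segment: str, command_words: Sequence[str]) -> float:
    """Highest similarity between a segment and any word in the library."""
    return max(
        (SequenceMatcher(None, segment.lower(), word.lower()).ratio()
         for word in command_words),
        default=0.0,
    )


def split_content(segments: Sequence[str],
                  command_words: Sequence[str],
                  threshold: float = 0.8) -> Tuple[List[str], List[str]]:
    """Split recognized segments into (command words, speech content)."""
    commands: List[str] = []
    speech: List[str] = []
    for segment in segments:
        # Above the threshold: treat as a command word; otherwise speech content.
        if best_match(segment, command_words) > threshold:
            commands.append(segment)
        else:
            speech.append(segment)
    return commands, speech
```

Segments whose similarity equals the threshold are treated as speech content here; the text does not specify how the boundary case should be handled.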
In the above solution, presenting the speech content when the voice data is played includes:
determining a recognition result corresponding to the speech content;
presenting the recognition result when the voice data is played;
wherein the recognition result includes at least one of the following: speech text in at least one language, and translated voice data in at least one language.
An embodiment of the present application also provides a simultaneous interpretation device, including:
an acquiring unit, configured to acquire voice data and perform content detection on the voice data;
a first processing unit, configured to, when a command word contained in the voice data is detected, perform a corresponding operation on a presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is used for presentation when the voice data is played;
a second processing unit, configured to, when speech content contained in the voice data is detected, present the speech content when the voice data is played.
An embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor; the processor implements the steps of any of the foregoing data processing methods when executing the program.
An embodiment of the present application also provides a storage medium on which computer instructions are stored; when the instructions are executed by a processor, the steps of any of the foregoing data processing methods are implemented.
The data processing method, device, electronic equipment, and storage medium provided in the embodiments of the present application acquire voice data and perform content detection on the voice data; when a command word contained in the voice data is detected, a corresponding operation is performed on the presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is used for presentation when the voice data is played; when speech content contained in the voice data is detected, the speech content is presented when the voice data is played. In this way, the speaker can carry out the corresponding operation by speaking, without operating the presentation document manually or relying on an assistant to do so, which improves the efficiency of the speech, saves speech time, and can thereby improve the user experience.
Description of the drawings
FIG. 1 is a schematic flowchart of a simultaneous interpretation method in the related art;
FIG. 2 is a schematic flowchart of a command execution method in a simultaneous interpretation process in the related art;
FIG. 3 is a schematic flowchart of a data processing method according to an embodiment of the application;
FIG. 4 is another schematic flowchart of the data processing method according to an embodiment of the application;
FIG. 5 is yet another schematic flowchart of the data processing method according to an embodiment of the application;
FIG. 6 is a schematic flowchart of a command execution method in a simultaneous interpretation process according to an embodiment of the application;
FIG. 7 is a schematic diagram of the composition structure of a data processing device according to an embodiment of the application;
FIG. 8 is a schematic diagram of the composition structure of an electronic device according to an embodiment of the application.
具体实施方式Detailed ways
下面结合附图及具体实施例对本申请作进一步详细的说明。The application will be further described in detail below in conjunction with the drawings and specific embodiments.
图1为相关技术中同声传译方法的流程示意图;如图1所示,同传服务器启动后,演讲者运用所述同传服务器进行演讲;演讲过程中,同传服务器获取演讲者的语音数据,对语音数据进行语音识别,得到与语音数据语种相同的识别文本;再对所述识别文本进行机器翻译,得到翻译文本;同传服务器判断是否需要将所述翻译文本合成语音,确定需要合成语音,则根据所述翻译文本合成语音,将合成的语音、识别文本、翻译文本作为目标结果发出;确定不需要合成语音,则将所述识别文本、所述翻译文本作为目标结果发出。Figure 1 is a schematic flow diagram of the simultaneous interpretation method in related technologies; as shown in Figure 1, after the simultaneous interpretation server is started, the speaker uses the simultaneous interpretation server to give a speech; during the speech, the simultaneous interpretation server obtains the speaker’s voice data , Perform voice recognition on the voice data to obtain the recognized text in the same language as the voice data; then perform machine translation on the recognized text to obtain the translated text; the simultaneous interpretation server determines whether the translated text needs to be synthesized into speech, and determines the need to synthesize speech , The speech is synthesized according to the translated text, and the synthesized speech, the recognized text, and the translated text are sent out as the target result; if it is determined that no synthesized speech is needed, the recognized text and the translated text are sent out as the target result.
具体来说,所述同传服务器可以通过操作端采集所述语音数据;所述操作端可以为个人电脑(PC,Personal Computer)。所述同传服务器可以将目标结果发送给操作端,由所述操作端投屏到显示屏幕以显示识别文本、翻译文本;所述同传服务器还可以将目标结果发送给终端,通过所述终端的人机交互界面显示识别文本、翻译文本,通过所述终端的语音播放模块播放合成的语音;从而为用户展示目标结果,实现将演讲者的演讲内容翻译成用户需要的语种并进行展示。这里,所述终端可以为手机、平板电脑等;所述终端由用户持有。Specifically, the simultaneous interpretation server may collect the voice data through an operating terminal; the operating terminal may be a personal computer (PC, Personal Computer). The simultaneous interpretation server may send the target result to the operation terminal, and the operation terminal screens to the display screen to display the recognized text and the translated text; the simultaneous interpretation server may also send the target result to the terminal through the terminal The human-computer interaction interface displays recognized text and translated text, and the synthesized voice is played through the voice playback module of the terminal; thus, the target result is displayed to the user, and the content of the lecturer’s speech is translated into the language required by the user and displayed. Here, the terminal may be a mobile phone, a tablet computer, etc.; the terminal is held by the user.
演讲过程中,演讲者还可以通过所述操作端展示演示文档,具体将演示文档投屏到显示屏幕,通过所述显示屏幕展示给用户;这里,所述演示文档可以为演示文稿软件(PPT,PowerPoint)文档、Word文档等。During the speech, the speaker can also display the presentation document through the operation terminal, specifically project the presentation document to the display screen, and show it to the user through the display screen; here, the presentation document may be presentation software (PPT, PowerPoint) documents, Word documents, etc.
When the speaker needs to play, page forward, go back, or stop playing the presentation document, the speaker or an assistant must click the corresponding button using a mouse, a page-turning pen, or a similar device. To perform such an operation, the speaker must either pause the speech, operate the presentation document, and then resume, or issue a command to the assistant and resume only after the assistant has performed the operation.
For details, refer to the flowchart shown in Figure 2. As shown in Figure 2, the method for executing commands during simultaneous interpretation is as follows. While giving a speech using the simultaneous interpretation system, the speaker may need to perform an operation on the presentation document. If there is no assistant, the speaker performs the operation (such as paging forward or going back) on the corresponding device (such as the operating terminal described above), so that the device determines and executes the corresponding command. If there is an assistant, the speaker issues a command to the assistant, who performs the operation on the device, so that the device determines and executes the corresponding command. The speaker waits until the device has executed the command and then continues the speech using the simultaneous interpretation system. Here, the simultaneous interpretation system may include the simultaneous interpretation server, operating terminal, display screen, and terminal described above.
In the related-art scheme, when the speaker operates the device personally (for example, operating the presentation document as described above), the speaker must pause the speech to do so, which lengthens the speech and degrades the audience's experience. When there is an assistant, the speaker must issue commands to the assistant, who then operates the device, which additionally increases the labor cost of the speech.
Based on this, in various embodiments of the present application, voice data is acquired and content detection is performed on it. When a command word contained in the voice data is detected, a corresponding operation is performed on the presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data, and the presentation document is presented while the voice data is played. When speech content contained in the voice data is detected, the speech content is presented while the voice data is played. The speaker can thus operate the presentation document simply by speaking, with no manual operation by the speaker or an assistant, which improves the efficiency of the speech, shortens its duration, and improves the user experience.
An embodiment of the present application provides a data processing method. Figure 3 is a schematic flowchart of the data processing method according to an embodiment of the present application. As shown in Figure 3, the method includes:
Step 301: Acquire voice data, and perform content detection on the voice data;
Step 302: When a command word contained in the voice data is detected, perform a corresponding operation on the presentation document according to the instruction corresponding to the command word;
Here, the content of the presentation document is associated with the content of the voice data; the presentation document is presented while the voice data is played;
Step 303: When speech content contained in the voice data is detected, present the speech content while the voice data is played.
Here, the statement that the content of the presentation document is associated with the content of the voice data means that the presentation document is a document, related to the speech content, that is shown on the display screen while the speaker speaks, such as a PowerPoint (PPT) document or a Word document shown during the speech.
The statement that the presentation document is presented when the voice data is played means that the presentation document is presented at the same time as the voice data is played. That is, the data processing method can be applied to any conference scenario in which a presentation document needs to be shown, such as a seminar.
The data processing method may be applied to an electronic device, which may be a server, a terminal, or the like.
In practice, the electronic device may be the server. The server projects the presentation document onto the display screen. During the speech, the server receives the voice data and executes the data processing method of the embodiments of the present application to perform the corresponding operations on the presentation document.
The electronic device may also be a terminal, with the presentation document projected onto the display screen by a server. During the speech, the terminal receives the voice data, executes the data processing method of the embodiments of the present application, determines the instruction for the presentation document, and sends that instruction to the server, which performs the corresponding operation on the presentation document.
When the electronic device is a terminal, the presentation document may also be shown through the terminal's own human-computer interaction interface. During the speech, the terminal receives the voice data and executes the data processing method of the embodiments of the present application to perform the corresponding operations on the presentation document shown through its own interface.
Here, while the speaker is speaking, the voice data may be collected by the operating terminal. The operating terminal (for example, a PC or similar terminal) may be equipped with or connected to a voice collection module such as a microphone. The operating terminal captures sound through the voice collection module to obtain the voice data and sends the voice data to the electronic device (specifically, the server or terminal described above), which executes the data processing method of the embodiments of the present application.
In practice, to determine whether the user needs to operate the presentation document, the voice data must be examined; when the voice data is found to contain a command word, it can be determined that the user needs to operate the presentation document.
Based on this, in an embodiment, detecting a command word contained in the voice data includes:
Querying a command word library according to the voice data, and determining a command word in the voice data that meets a first preset condition.
Specifically, determining a command word in the voice data that meets the first preset condition includes at least one of the following:
Determining a command word in the voice data whose pronunciation similarity to a word in the command word library exceeds a preset threshold;
Determining a command word in the recognized text that matches a word in the command word library, where the recognized text is obtained by performing speech recognition on the voice data.
Here, in practice, the voice data may be examined directly to determine the command words in it (in voice form). Alternatively, speech recognition may first be performed on the voice data to obtain recognized text in the same language as the voice data, and the command words in the recognized text (in text form) are then determined.
Accordingly, the command word library includes at least one command word, and each command word has a voice-form representation (the pronunciation of the command word) and a text-form representation (the written form of the command word).
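The two detection routes above can be sketched as follows. This is a minimal illustration, not the application's implementation: the library contents, the ARPAbet-like phoneme strings, the function names, and the 0.8 threshold are all hypothetical, and `difflib.SequenceMatcher` merely stands in for whatever pronunciation-similarity measure a real system would use.

```python
from difflib import SequenceMatcher

# Hypothetical command word library: text form -> pronunciation form.
# The phoneme strings are illustrative placeholders, not real lexicon data.
COMMAND_WORD_LIBRARY = {
    "next page": "n eh k s t p ey jh",
    "go back": "g ow b ae k",
}


def pronunciation_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1] between two space-separated phoneme strings."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def match_by_pronunciation(phonemes: str, threshold: float = 0.8) -> list[str]:
    """Return command words whose pronunciation similarity exceeds the threshold."""
    return [
        word
        for word, pronunciation in COMMAND_WORD_LIBRARY.items()
        if pronunciation_similarity(phonemes, pronunciation) > threshold
    ]


def match_by_text(recognized_text: str) -> list[str]:
    """Return command words that appear verbatim in the recognized text."""
    text = recognized_text.lower()
    return [word for word in COMMAND_WORD_LIBRARY if word in text]
```

A slightly mispronounced command (for example, one dropped phoneme) can still pass the pronunciation route, while the text route only fires on an exact written match; this is one reason the two routes may disagree.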
In practice, when the command word determined from pronunciation (that is, the command word in the voice data whose pronunciation similarity to a word in the command word library exceeds the preset threshold) is the same as the command word determined from text (that is, the command word in the recognized text that matches a word in the command word library), that command word can be used directly.
The command word determined from pronunciation may also differ from the one determined from text. In that case, a target command word must be chosen from between them.
Specifically, the command word determined from pronunciation is denoted the first command word, and the command word determined from text is denoted the second command word.
When the first command word differs from the second command word, a first weight and a second weight are determined. The first weight represents the credibility of determining the command word from pronunciation, and the second weight represents the credibility of determining the command word from text. The first command word is weighted by the first weight and the second command word is weighted by the second weight, and one of the two command words is selected as the target command word according to the weighted results.
For example, when the first command word is determined to be command word A and the second command word is also determined to be command word A, command word A is used directly as the target command word.
When the first command word is determined to be command word A and the second command word is determined to be command word B, the first weight and the second weight must be determined. Command word A is weighted by the first weight and command word B is weighted by the second weight, and the two weighted results are compared. If the weighted result of command word A is greater than that of command word B, command word A is used as the target command word; otherwise, command word B is used as the target command word.
Here, pronunciation-based detection may yield multiple first command words, and text-based detection may yield multiple second command words; that is, there may be at least two first command words and at least two second command words.
In that case, each of the at least two first command words is weighted by the first weight to obtain a weighted result for each first command word;
Each of the at least two second command words is weighted by the second weight to obtain a weighted result for each second command word;
If the at least two first command words and the at least two second command words have no command word in common (that is, every first command word differs from every second command word), the command word with the largest weighted result is selected from the first command words and the second command words as the target command word;
If the at least two first command words and the at least two second command words share a command word (that is, some first command word is the same as some second command word), the weighted results for the same command word are added together to obtain a weighted result for each command word, and the command word with the largest weighted result is selected from the first command words and the second command words as the target command word.
For example, pronunciation-based detection yields command word A and command word B, with a likelihood of a1% that command word A is the target command word and a likelihood of b% that command word B is the target command word; here, a1% + b% may equal 1;
Text-based detection yields command word A and command word C, with a likelihood of a2% that command word A is the target command word and a likelihood of c% that command word C is the target command word; here, a2% + c% may equal 1;
Suppose the first weight is x and the second weight is y, with x + y = 1;
The weighted result for each command word is as follows:
Weighted result of command word A: a1% * x + a2% * y;
Weighted result of command word B: b% * x;
Weighted result of command word C: c% * y;
The command word with the largest weighted result among command word A, command word B, and command word C is selected as the target command word.
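The weighting in this example can be written out as a short sketch. The function name, the dictionary-based inputs, and the numeric values below are assumptions chosen for illustration; only the arithmetic (likelihood times weight, summed for a command word found by both routes, then taking the maximum) follows the text.

```python
def select_target_command(pron_scores, text_scores, x, y):
    """Fuse pronunciation-based and text-based candidates.

    pron_scores / text_scores map a command word to its likelihood (0..1).
    x is the first weight (pronunciation), y is the second weight (text).
    Returns the target command word and all weighted results.
    """
    weighted = {}
    for word, likelihood in pron_scores.items():
        weighted[word] = weighted.get(word, 0.0) + likelihood * x
    for word, likelihood in text_scores.items():
        # A word found by both routes accumulates both weighted results.
        weighted[word] = weighted.get(word, 0.0) + likelihood * y
    target = max(weighted, key=weighted.get)
    return target, weighted


# Illustrative numbers only: a1 = 60%, b = 40%, a2 = 30%, c = 70%.
target, weighted = select_target_command(
    {"A": 0.6, "B": 0.4}, {"A": 0.3, "C": 0.7}, x=0.5, y=0.5
)
```

With equal weights, command word A wins because it is supported by both routes (0.6 * 0.5 + 0.3 * 0.5 = 0.45, against 0.20 for B and 0.35 for C). Shifting the weights toward text, for example x = 0.2 and y = 0.8, makes command word C the target instead.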
After a command word is detected, the electronic device can determine that the user needs to perform a corresponding operation on the presentation document, and can determine the corresponding operation instruction from the detected command word in order to control the presentation document.
Specifically, performing a corresponding operation on the presentation document according to the instruction corresponding to the command word includes:
Querying an instruction library according to the command word to determine the target instruction corresponding to the command word, where the target instruction represents an operation instruction for the presentation document, and the instruction library includes at least one instruction and the command words corresponding to each of the at least one instruction;
Performing the corresponding operation on the presentation document according to the target instruction.
Performing the corresponding operation on the presentation document according to the determined target instruction achieves control of the presentation document.
For example, when the command word "go back" is detected, a first control instruction for the presentation document is determined from that command word; the first control instruction is used to make the presentation document go back (that is, turn to the previous page);
When the command word "turn page" is detected, a second control instruction for the presentation document is determined from that command word; the second control instruction is used to make the presentation document turn the page (that is, turn to the next page).
The first control instruction may be executed by the program that displays the presentation document. For example, if the presentation document is a PPT document, the corresponding program may be the Microsoft Office PowerPoint application;
That is, the Microsoft Office PowerPoint application determines the first control instruction and performs the corresponding go-back operation.
The second control instruction is handled in the same way: it may be executed by the program that displays the presentation document. For example, if the presentation document is a PPT document, the corresponding program may be the Microsoft Office PowerPoint application;
That is, the Microsoft Office PowerPoint application determines the second control instruction and performs the corresponding page-turning operation.
Here, the instruction library may be preset by developers and stored in the electronic device. The instructions stored in the instruction library, together with their corresponding command words, may be provided to the speaker in advance for reference.
Specifically, when the data processing method is applied to a server (that is, the electronic device is a server), the instruction library may be preset by developers and stored in the server.
When the data processing method is applied to a terminal (that is, the electronic device is a terminal), the instruction library may be preset by developers, stored in the server, and then sent to the terminal. Accordingly, when the instruction library is updated, the server may send the updated instruction library to the terminal again, and the terminal receives and stores it.
The command words may be provided in one or more languages, so that speakers using different languages can all obtain the target instruction corresponding to a command word by querying the instruction library.
In any given language, an instruction may correspond to one or more expressions, that is, to one or more semantically similar command words.
For example, the instructions include: turn the presentation document page, make the presentation document go back, and so on;
The "turn the presentation document page" instruction may correspond to the following semantically similar command words: turn page, turn to the next page, jump to the next page, and so on;
The "make the presentation document go back" instruction may correspond to the following semantically similar command words: return to the previous page, go back, and so on.
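An instruction library of this kind can be sketched as a mapping from each instruction to its command words, inverted for lookup. Everything named here (the instruction identifiers, the `Presentation` class, and `handle_command`) is hypothetical; the Chinese entries are the original-language command words from the examples above, included to illustrate a library that covers more than one language. A real system would forward the instruction to the program displaying the document rather than to this toy page counter.

```python
from dataclasses import dataclass

# Hypothetical instruction identifiers.
NEXT_PAGE = "presentation.next_page"
PREVIOUS_PAGE = "presentation.previous_page"

# Instruction library: each instruction with its semantically similar command words.
INSTRUCTION_LIBRARY = {
    NEXT_PAGE: ["turn page", "turn to the next page", "jump to the next page", "翻页"],
    PREVIOUS_PAGE: ["return to the previous page", "go back", "回退"],
}

# Inverted index used to look up the target instruction for a command word.
COMMAND_TO_INSTRUCTION = {
    command: instruction
    for instruction, commands in INSTRUCTION_LIBRARY.items()
    for command in commands
}


@dataclass
class Presentation:
    """Toy stand-in for the program that displays the presentation document."""
    page_count: int
    current_page: int = 1

    def apply(self, instruction: str) -> None:
        if instruction == NEXT_PAGE:
            self.current_page = min(self.current_page + 1, self.page_count)
        elif instruction == PREVIOUS_PAGE:
            self.current_page = max(self.current_page - 1, 1)
        else:
            raise ValueError(f"unknown instruction: {instruction}")


def handle_command(deck: Presentation, command_word: str) -> bool:
    """Look up the target instruction and apply it; return False if none matches."""
    instruction = COMMAND_TO_INSTRUCTION.get(command_word)
    if instruction is None:
        return False
    deck.apply(instruction)
    return True
```

Keeping the library keyed by instruction makes it easy to show speakers which phrases trigger which operation, while the inverted index keeps the lookup for a detected command word to a single dictionary access.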
The data processing method of the embodiments of the present application may also be applied to simultaneous interpretation scenarios. In such a scenario, the electronic device may specifically be a device that implements simultaneous interpretation, such as the simultaneous interpretation server to which the method of Figure 1 is applied, and the voice data may be collected by the operating terminal and sent to the simultaneous interpretation server. The operating terminal may be the PC described above or the like.
Specifically, the simultaneous interpretation scenario may adopt the architecture of a simultaneous interpretation system (which may include the simultaneous interpretation server, operating terminal, and terminal described above). The data processing method of the embodiments of the present application may be applied to a device that implements simultaneous interpretation. This device may be newly added to the architecture of the simultaneous interpretation system, or it may be an existing device in that architecture (such as the simultaneous interpretation server or the terminal) improved so that it can implement the method of the embodiments of the present application.
Here, the improvement may consist of adding a command word detection module to a device. The command word detection module examines the voice data; when it detects a command word, it determines the instruction corresponding to that command word and performs the corresponding operation according to the instruction.
Specifically, in a conference simultaneous interpretation scenario, while the speaker is speaking, the operating terminal (such as a PC) may be equipped with or connected to a voice collection module such as a microphone. It captures sound through the voice collection module to obtain the voice data and sends the voice data to the device that implements simultaneous interpretation. The operating terminal may also project the presentation document onto the display screen to show it to the user. The device that implements simultaneous interpretation receives the voice data, executes the data processing method of the embodiments of the present application, performs the corresponding operations on the presentation document, and performs simultaneous interpretation of the speech content in the voice data.
When the device that implements simultaneous interpretation is a server, the server receives the voice data sent by the operating terminal, executes the data processing method of the embodiments of the present application, performs the corresponding operations on the presentation document, and performs simultaneous interpretation of the speech content in the voice data.
The device that implements simultaneous interpretation may also be a terminal held by the user. The operating terminal, or the server that receives the voice data, may send the voice data to the terminal held by the user. That terminal receives the voice data, executes the data processing method of the embodiments of the present application, performs the corresponding operations on the presentation document, and performs simultaneous interpretation of the speech content in the voice data.
In practice, to achieve simultaneous interpretation, the speech content in the voice data must be determined, text recognition must be performed on the speech content, and the recognition result must be shown to the user.
Based on this, in an embodiment, detecting speech content contained in the voice data includes:
Querying the command word library according to the voice data, and determining speech content in the voice data that does not meet a second preset condition.
Specifically, determining speech content in the voice data that does not meet the second preset condition includes at least one of the following:
Determining the speech content in the voice data, where the pronunciation similarity between any word in the speech content and each word in the command word library is below a preset threshold;
Determining the speech text in the recognized text, where the matching degree between any word in the speech text and each word in the command word library is below a preset matching degree threshold, and the recognized text is obtained by performing speech recognition on the voice data.
Here, the preset threshold and the preset matching degree threshold described above may be set in advance by developers and stored in the corresponding device.
When detecting speech content, the voice data may be examined directly to determine the speech content in it (in voice form). Alternatively, speech recognition may first be performed on the voice data to obtain recognized text, and the speech content in the recognized text (in text form), that is, the speech text, is then determined.
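The text-form route can be illustrated with a small sketch that separates speech text from command words. It simplifies the description in one respect that should be noted: it scores whole recognized segments rather than individual words. The command words, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as the matching degree are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical text forms taken from a command word library.
COMMAND_WORDS = ["next page", "go back"]


def matching_degree(a: str, b: str) -> float:
    """Stand-in matching degree in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def split_segments(segments, threshold=0.8):
    """Split recognized segments into (command segments, speech text segments).

    A segment counts as speech text only if its matching degree with every
    command word is below the threshold.
    """
    commands, speech_text = [], []
    for segment in segments:
        best = max(matching_degree(segment, word) for word in COMMAND_WORDS)
        if best >= threshold:
            commands.append(segment)
        else:
            speech_text.append(segment)
    return commands, speech_text
```

Only the segments returned as speech text would then be passed on for translation and display, so that a spoken "next page" turns the page without also appearing in the subtitles.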
After the speech content is determined, it must be presented in the form the user requires. Specifically, presenting the speech content while the voice data is played includes:
Determining the recognition result corresponding to the speech content;
Presenting the recognition result while the voice data is played;
Here, the recognition result includes at least one of the following: recognized text in at least one language, and translated speech data in at least one language.
Here, determining the recognition result corresponding to the speech content includes at least one of the following:
Performing speech recognition on the speech content to obtain a first recognized text, whose language is the same as that of the speech content;
Translating the first recognized text to obtain a second recognized text, whose language differs from that of the speech content;
Performing speech synthesis on the second recognized text to obtain the corresponding speech, which is the translated speech data corresponding to the speech content.
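The three operations above chain naturally: recognition feeds translation, which feeds synthesis. The sketch below shows only that chaining. The `RecognitionResult` type and the function name are hypothetical, and the recognition, translation, and synthesis engines are passed in as callables because the application does not name specific engines.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RecognitionResult:
    first_recognized_text: str                   # same language as the speech
    second_recognized_text: Optional[str] = None  # translated text
    translated_speech: Optional[bytes] = None     # synthesized from the translation


def build_recognition_result(
    speech_audio: bytes,
    recognize: Callable[[bytes], str],
    translate: Optional[Callable[[str], str]] = None,
    synthesize: Optional[Callable[[str], bytes]] = None,
) -> RecognitionResult:
    """Run recognition, then optional translation, then optional synthesis."""
    first = recognize(speech_audio)
    second = translate(first) if translate is not None else None
    # Synthesis needs translated text, so it is skipped when there is none.
    speech = synthesize(second) if synthesize is not None and second is not None else None
    return RecognitionResult(first, second, speech)
```

Making translation and synthesis optional mirrors the "at least one of the following" wording: a caller that only needs same-language subtitles supplies `recognize` alone.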
In practice, the user holding the terminal may select in advance what the recognition result should include and send that selection to the server, so that the server provides the corresponding recognition result according to the user's selection.
Based on this, in an embodiment, when the data processing method is applied to a server, the method further includes:
Receiving a first acquisition request sent by a terminal, where the first acquisition request is used to acquire a corresponding recognition result;
Determining a target recognition result according to the first acquisition request, and sending the target recognition result to the terminal.
For example, the first acquisition request may request the recognized text, the translated speech data, or both the recognized text and the translated speech data.
In actual application, in order to provide a recognition result in the language the user needs, the recognition result of the corresponding language may be obtained according to an acquisition request sent by the user through the terminal.
Based on this, in an embodiment, the method may further include:
Receiving a second acquisition request sent by the terminal; the second acquisition request includes at least a target language;
Acquiring the recognition result corresponding to the target language from the recognition results of at least one language;
Sending the acquired recognition result corresponding to the target language to the terminal.
Here, the terminal may be provided with a human-computer interaction interface through which the user holding the terminal can select a language; the terminal generates a second acquisition request containing the target language according to the user's selection and sends it to the server, so that the server receives the second acquisition request.
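The request handling described above can be sketched as follows. This is only an illustration under stated assumptions: the application does not define a request format, so the dictionary keys `target_language` and `content`, the `RecognitionResult` type, and the function name are all hypothetical. The sketch combines the content selection of the first acquisition request with the target language of the second acquisition request in a single handler.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognitionResult:
    """Recognition result for one language (field names are assumptions)."""
    language: str
    text: str
    translated_audio: Optional[bytes] = None


def handle_acquisition_request(results, request):
    """Return the parts of the target-language result the user asked for.

    `results` maps a language code to a RecognitionResult; `request` is a
    dict with a hypothetical "target_language" key and an optional
    "content" tuple naming the wanted kinds ("text", "audio").
    Returns None when no result exists for the target language.
    """
    target = request["target_language"]
    result = results.get(target)
    if result is None:
        return None
    wanted = request.get("content", ("text",))
    response = {"language": target}
    if "text" in wanted:
        response["text"] = result.text
    if "audio" in wanted and result.translated_audio is not None:
        response["audio"] = result.translated_audio
    return response
```

In this sketch the server keeps one result per language and filters it per request, so the same stored results can serve terminals that selected different languages or different content.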
The terminal may be a mobile phone. Most users currently carry a mobile phone with them, so sending the recognition result to the mobile phone requires no additional device to receive and display the result, which saves cost and is convenient to operate.
In an embodiment, the data processing method may also be applied to a terminal held by a user. The user holding the terminal can select, through the human-computer interaction interface of the terminal, the language and the content specifically included in the recognition result; the terminal determines the language and the content selected by the user, and uses the data processing method provided in the embodiments of the present application to obtain and present a recognition result that meets the user's needs.
In the embodiments of the present application, the data processing method is applied in a simultaneous interpretation scenario. As the speech proceeds, the voice data changes continuously, and the recognition result changes along with the voice data.
The data processing method provided in the embodiments of the present application is applicable to a variety of voice application scenarios, such as the above simultaneous interpretation scenario and a video conference scenario. In a video conference scenario, the command word spoken by the user can likewise be determined by a command word detection module, and a corresponding operation is performed on the document displayed in the video conference based on the determined command word.
It should be understood that the order in which the steps are described in the above embodiments does not imply an order of execution. The order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
The data processing method provided by the embodiments of the present application acquires voice data and performs content detection on the voice data; when a command word contained in the voice data is detected, a corresponding operation is performed on the presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is presented when the voice data is played; when speech content contained in the voice data is detected, the speech content is presented when the voice data is played. In this way, the speaker can trigger the corresponding operation by speaking, without operating manually or relying on auxiliary personnel, which improves the efficiency of the speech, saves speech time, and thereby improves the user experience.
FIG. 4 is a schematic flowchart of a keyword detection method according to an embodiment of the present application. As shown in FIG. 4, the keyword detection method can be applied to an electronic device (such as the above server or a terminal held by a user), and the method includes:
Step 401: Receive input voice data;
Step 402: Perform command word detection on the input voice data;
Step 403: Determine whether a command word is detected; if a command word is detected, go to step 404; if no command word is detected, go to step 405;
Here, determining whether a command word is detected includes one of the following:
Determining whether the voice data contains a command word whose pronunciation similarity to a word in the command word library exceeds a preset threshold;
Determining whether the recognized text contains a command word that matches a word in the command word library; the recognized text is obtained by performing text recognition on the voice data.
Step 404: Determine the instruction corresponding to the command word, and perform the corresponding operation according to the instruction;
Here, the instruction may be an operation instruction for the presentation document, and the corresponding operation is performed on the presentation document according to the determined operation instruction; for details, refer to the method shown in FIG. 3.
It should be noted that, in other scenarios, the command words can be extended based on user requirements. The command words may also cover operations on other programs, such as adjusting the volume of the electronic device (for example, turning the volume up); after the command word for the volume is determined, the corresponding instruction is determined and executed to adjust the volume.
Step 405: Continue to detect command words.
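The flow of FIG. 4 can be sketched as follows. This is a minimal illustration rather than the implementation of this application: the lexicon entries, the threshold value 0.8 and all function names are assumptions, and text similarity from Python's `difflib.SequenceMatcher` stands in for the pronunciation similarity, whose computation the application does not specify.

```python
from difflib import SequenceMatcher

# Hypothetical command word library and threshold, chosen for illustration.
COMMAND_LEXICON = ("next page", "previous page", "volume up")
SIMILARITY_THRESHOLD = 0.8


def similarity(a, b):
    """Text similarity in [0, 1]; a stand-in for pronunciation similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def detect_command(phrase, lexicon=COMMAND_LEXICON, threshold=SIMILARITY_THRESHOLD):
    """Steps 402-403: return the best-matching command word, or None."""
    best_word, best_score = None, 0.0
    for word in lexicon:
        score = similarity(phrase, word)
        if score > best_score:
            best_word, best_score = word, score
    return best_word if best_score > threshold else None


def run_detection(phrases, execute):
    """Walk the FIG. 4 loop over already-segmented phrases."""
    for phrase in phrases:                 # step 401: receive input
        command = detect_command(phrase)   # steps 402-403: detect and decide
        if command is not None:
            execute(command)               # step 404: run the instruction
        # step 405: otherwise keep detecting on the next phrase
```

A threshold below 1.0 lets a slightly misrecognized phrase such as "nex page" still resolve to "next page", at the cost of possible false triggers; the right value would have to be tuned on real speech.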
FIG. 5 is another schematic flowchart of the data processing method according to an embodiment of the present application. As shown in FIG. 5, the data processing method is applied to an electronic device (such as the above server or a terminal held by a user), and the method includes:
Step 501: Perform speech recognition on voice data;
Here, the voice data is the speech uttered by the speaker during a speech in a simultaneous interpretation scenario.
Step 502: Perform command word detection on the voice data and determine whether a command word is detected; if a command word is detected, go to step 503; if no command word is detected, go to step 504;
Here, performing command word detection on the voice data includes:
Detecting a command word in the voice data whose pronunciation similarity to a word in the command word library exceeds a preset threshold;
Detecting a command word in the recognized text that matches a word in the command word library; the recognized text is obtained by performing text recognition on the voice data.
Here, the recognized text corresponding to the voice data may be checked to determine whether a command word is detected, or the voice data may be checked directly during speech recognition to determine the command word. For details, refer to the determination of the command word in the voice data that meets the first preset condition in the method shown in FIG. 3, which is not repeated here.
Step 503: Determine the instruction corresponding to the command word, and perform the corresponding operation according to the instruction;
Step 504: Perform machine translation on the recognized text corresponding to the voice data to obtain translated text;
Here, the recognized text is the text obtained by performing speech recognition on the voice data.
Step 505: Determine whether speech synthesis is required; if speech synthesis is required, go to step 506; if speech synthesis is not required, go to step 507;
Here, whether speech synthesis is required may be preset by a developer and stored in the corresponding device.
Step 506: Perform speech synthesis on the translated text;
Step 507: Output the simultaneous interpretation result.
Here, the simultaneous interpretation result may include the recognized text and the translated text; when speech synthesis is determined to be required, the simultaneous interpretation result may also include translated voice data (that is, the voice data obtained by performing speech synthesis on the translated text).
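Steps 501 to 507 can be sketched as a single function. This is an illustrative outline only: the application names no particular recognition, translation or synthesis engine, so each stage is passed in as a hypothetical callable, and `InterpretationResult` and `process_utterance` are names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InterpretationResult:
    """Simultaneous interpretation result output in step 507."""
    recognized_text: str
    translated_text: str
    translated_audio: Optional[bytes] = None  # present only if synthesized


def process_utterance(audio, recognize, detect_command, execute,
                      translate, synthesize=None):
    """Run one piece of voice data through the FIG. 5 flow.

    All stage arguments are caller-supplied callables (assumptions):
    recognize(audio) -> text, detect_command(text) -> command or None,
    execute(command), translate(text) -> text, synthesize(text) -> bytes.
    Passing synthesize=None models "speech synthesis is not required".
    """
    text = recognize(audio)                      # step 501
    command = detect_command(text)               # step 502
    if command is not None:
        execute(command)                         # step 503
        return None                              # commands are not interpreted
    translated = translate(text)                 # step 504
    audio_out = None
    if synthesize is not None:                   # step 505
        audio_out = synthesize(translated)       # step 506
    return InterpretationResult(text, translated, audio_out)  # step 507
```

Injecting the stages as callables keeps the control flow of the figure visible and lets each stage be replaced by a stub during testing.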
FIG. 6 is a schematic flowchart of a method for executing commands during simultaneous interpretation according to an embodiment of the present application. As shown in FIG. 6, the method includes:
Step 601: The speaker gives a speech using the simultaneous interpretation system;
Step 602: When the speaker needs a corresponding operation to be performed, the speaker says the corresponding command word;
Step 603: The simultaneous interpretation system uses the above data processing method to determine the command word, and executes the instruction corresponding to the command word;
Step 604: The speaker continues the speech using the simultaneous interpretation system.
To implement the data processing method of the embodiments of the present application, an embodiment of the present application further provides a data processing apparatus. FIG. 7 is a schematic structural diagram of the data processing apparatus according to an embodiment of the present application. As shown in FIG. 7, the data processing apparatus includes:
An acquisition unit 71, configured to acquire voice data and perform content detection on the voice data;
A first processing unit 72, configured to, when a command word contained in the voice data is detected, perform a corresponding operation on the presentation document according to the instruction corresponding to the command word;
Here, the content of the presentation document is associated with the content of the voice data; the presentation document is presented when the voice data is played;
A second processing unit 73, configured to, when speech content contained in the voice data is detected, present the speech content when the voice data is played.
In an embodiment, the first processing unit 72 is configured to query a command word library according to the voice data, and determine a command word in the voice data that meets a first preset condition.
In an embodiment, the first processing unit 72 is configured to determine the command word in the voice data that meets the first preset condition in at least one of the following ways:
Determining a command word in the voice data whose pronunciation similarity to a word in the command word library exceeds a preset threshold;
Determining a command word in the recognized text that matches a word in the command word library; the recognized text is obtained by performing text recognition on the voice data.
In an embodiment, the first processing unit 72 is configured to query an instruction library according to the command word, and determine a target instruction corresponding to the command word; the target instruction represents an operation instruction for the presentation document; the instruction library includes at least one instruction and the command word corresponding to each of the at least one instruction;
And to perform the corresponding operation on the presentation document according to the target instruction.
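The instruction library lookup can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the instruction names, the command words, and the toy `Presentation` class, which tracks only a page index and does not drive any real presentation software.

```python
# Hypothetical instruction library: each instruction with its command words.
INSTRUCTION_LIBRARY = {
    "NEXT_PAGE": ("next page", "next slide"),
    "PREVIOUS_PAGE": ("previous page", "go back"),
}


def build_command_index(library):
    """Invert the library into a command word -> instruction lookup table."""
    index = {}
    for instruction, command_words in library.items():
        for word in command_words:
            index[word.lower()] = instruction
    return index


class Presentation:
    """Toy presentation document that only tracks the current page."""

    def __init__(self, page_count):
        self.page_count = page_count
        self.page = 0

    def apply(self, instruction):
        """Perform the operation named by the target instruction."""
        if instruction == "NEXT_PAGE":
            self.page = min(self.page + 1, self.page_count - 1)
        elif instruction == "PREVIOUS_PAGE":
            self.page = max(self.page - 1, 0)
        else:
            raise ValueError(f"unknown instruction: {instruction}")


def operate_on_document(command_word, index, document):
    """Query the index for the target instruction and apply it, if any."""
    instruction = index.get(command_word.lower())
    if instruction is None:
        return False
    document.apply(instruction)
    return True
```

Storing the library as instruction to command words matches the description above, while the inverted index makes the per-utterance lookup a single dictionary access.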
In an embodiment, the second processing unit 73 is configured to query the command word library according to the voice data, and determine the speech content in the voice data that does not meet a second preset condition.
In an embodiment, the second processing unit 73 is configured to determine the speech content in the voice data that does not meet the second preset condition in at least one of the following ways:
Determining the speech content in the voice data; the similarity between the pronunciation of any word in the speech content and the pronunciation of each word in the command word library is lower than a preset threshold;
Determining the speech text in the recognized text; the matching degree between any word in the speech text and each word in the command word library is lower than a preset matching degree threshold; the recognized text is obtained by performing text recognition on the voice data.
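The text-based variant above can be sketched as a filter over recognized segments. This is a simplified illustration under stated assumptions: the recognized text is taken to be already split into segments, and an exact, case-insensitive match replaces the matching degree threshold, which the application does not define; `split_segments` is a hypothetical name.

```python
def split_segments(segments, command_lexicon):
    """Separate recognized segments into speech content and command words.

    A segment counts as a command word only if it equals a lexicon entry
    after trimming and lower-casing (an assumption standing in for the
    matching degree threshold); every other segment is speech content.
    """
    lexicon = {word.lower() for word in command_lexicon}
    speech_content, command_words = [], []
    for segment in segments:
        if segment.strip().lower() in lexicon:
            command_words.append(segment)
        else:
            speech_content.append(segment)
    return speech_content, command_words
```

Only the first returned list would be passed on for translation and presentation, so command words do not appear in the simultaneous interpretation output.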
In an embodiment, the second processing unit 73 is configured to determine the recognition result corresponding to the speech content;
And to present the recognition result when the voice data is played;
Wherein the recognition result includes at least one of the following: speech text in at least one language, and translated voice data in at least one language.
In actual application, the acquisition unit 71, the first processing unit 72 and the second processing unit 73 may each be implemented by a processor in the electronic device (such as the above server or a terminal held by a user), for example a central processing unit (CPU), a digital signal processor (DSP), a microcontroller unit (MCU) or a field-programmable gate array (FPGA).
It should be noted that, when the apparatus provided in the above embodiment performs data processing, the division into the above program modules is only an example. In actual application, the processing may be assigned to different program modules as needed, that is, the internal structure of the terminal may be divided into different program modules to complete all or part of the processing described above. In addition, the apparatus provided in the above embodiment and the data processing method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
Based on the hardware implementation of the above device, an embodiment of the present application further provides an electronic device. FIG. 8 is a schematic diagram of the hardware structure of the electronic device according to an embodiment of the present application. As shown in FIG. 8, the electronic device 80 includes a memory 83, a processor 82, and a computer program stored in the memory 83 and executable on the processor 82; when the processor 82 of the electronic device executes the program, the method provided by one or more of the above technical solutions on the electronic device side is implemented.
Specifically, when the processor 82 of the electronic device 80 executes the program, the following is implemented: acquiring voice data, and performing content detection on the voice data;
When a command word contained in the voice data is detected, performing a corresponding operation on the presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is presented when the voice data is played;
When speech content contained in the voice data is detected, presenting the speech content when the voice data is played.
It should be noted that the specific steps implemented when the processor 82 of the electronic device 80 executes the program have been described in detail above and are not repeated here.
It can be understood that the electronic device further includes a communication interface 81, and the components of the electronic device are coupled together through a bus system 84. The bus system 84 is configured to implement connection and communication between these components. In addition to a data bus, the bus system 84 includes a power bus, a control bus and a status signal bus.
It can be understood that the memory 83 in the embodiments of the present application may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a ferromagnetic random access memory (FRAM), a flash memory, a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM); the magnetic surface memory may be a magnetic disk memory or a magnetic tape memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM). The memory described in the embodiments of the present application is intended to include, but is not limited to, these and any other suitable types of memory.
The method disclosed in the above embodiments of the present application may be applied to the processor 82 or implemented by the processor 82. The processor 82 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method may be completed by an integrated logic circuit of hardware in the processor 82 or by instructions in the form of software. The processor 82 may be a general-purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 82 may implement or execute the methods, steps and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present application may be directly executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium, and the storage medium is located in the memory; the processor 82 reads the information in the memory and completes the steps of the foregoing method in combination with its hardware.
An embodiment of the present application further provides a storage medium, specifically a computer storage medium, and more specifically a computer-readable storage medium. Computer instructions, that is, a computer program, are stored on it; when executed by a processor, the computer instructions implement the method provided by one or more of the above technical solutions on the electronic device side.
In the several embodiments provided in this application, it should be understood that the disclosed method and smart device may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present application.
In addition, the functional units in the embodiments of the present application may all be integrated into one second processing unit, each unit may serve separately as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
A person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments may be completed by hardware under the instruction of a program. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk or an optical disc.
Alternatively, if the above integrated unit of the present application is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the methods described in the embodiments of the present application. The storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk or an optical disc.
It should be noted that "first", "second" and the like are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence.
In addition, the technical solutions described in the embodiments of the present application may be combined arbitrarily, provided that they do not conflict.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited to them. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application.

Claims (10)

  1. A data processing method, comprising:
    acquiring voice data, and performing content detection on the voice data;
    when a command word contained in the voice data is detected, performing a corresponding operation on a presentation document according to an instruction corresponding to the command word; wherein the content of the presentation document is associated with the content of the voice data, and the presentation document is presented when the voice data is played;
    when speech content contained in the voice data is detected, presenting the speech content when the voice data is played.
  2. The method according to claim 1, wherein detecting the command word contained in the voice data comprises:
    querying a command word library according to the voice data, and determining a command word in the voice data that meets a first preset condition.
  3. The method according to claim 2, wherein determining the command word in the voice data that meets the first preset condition comprises at least one of the following:
    determining a command word in the voice data whose pronunciation similarity to a word in the command word library exceeds a preset threshold;
    determining a command word in recognized text that matches a word in the command word library; wherein the recognized text is obtained by performing text recognition on the voice data.
  4. The method according to any one of claims 1 to 3, wherein performing the corresponding operation on the presentation document according to the instruction corresponding to the command word comprises:
    querying an instruction library according to the command word, and determining a target instruction corresponding to the command word; wherein the target instruction represents an operation instruction for the presentation document, and the instruction library includes at least one instruction and the command word corresponding to each of the at least one instruction;
    performing the corresponding operation on the presentation document according to the target instruction.
  5. The method according to claim 1, wherein detecting the speech content contained in the voice data comprises:
    querying a command word library according to the voice data, and determining the speech content in the voice data that does not meet a second preset condition.
  6. The method according to claim 5, wherein determining the speech content in the voice data that does not meet the second preset condition comprises at least one of the following:
    determining the speech content in the voice data, wherein the similarity between the pronunciation of any word in the speech content and the pronunciation of each word in the command word library is lower than a preset threshold;
    determining speech text in recognized text, wherein the matching degree between any word in the speech text and each word in the command word library is lower than a preset matching-degree threshold, the recognized text being obtained by performing text recognition on the voice data.
  7. The method according to claim 1, 5 or 6, wherein presenting the speech content while the voice data is played comprises:
    determining a recognition result corresponding to the speech content;
    presenting the recognition result while the voice data is played;
    wherein the recognition result comprises at least one of the following: speech text in at least one language, and translated voice data in at least one language.
  8. A simultaneous interpretation apparatus, comprising:
    an acquiring unit, configured to acquire voice data and perform content detection on the voice data;
    a first processing unit, configured to, when a command word contained in the voice data is detected, perform a corresponding operation on a presentation document according to the instruction corresponding to the command word; the content of the presentation document is associated with the content of the voice data; the presentation document is to be presented while the voice data is played;
    a second processing unit, configured to, when speech content contained in the voice data is detected, present the speech content while the voice data is played.
  9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 7.
  10. A storage medium storing computer instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 7.
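
The routing described in claims 1 to 7 can be illustrated with a minimal sketch. It is not the applicant's implementation. It assumes the voice data has already been recognized into text segments, and it uses text similarity from Python's difflib as a stand-in for the pronunciation similarity of claim 3. The names INSTRUCTION_LIBRARY, COMMAND_LEXICON, Presentation, find_command and process, the two example instructions, and the 0.85 threshold are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical instruction library (claim 4): each instruction maps to
# the command words that trigger it.
INSTRUCTION_LIBRARY = {
    "next_page": ["next page", "next slide"],
    "previous_page": ["previous page", "previous slide"],
}

# Command word library derived from the instruction library:
# command word -> target instruction.
COMMAND_LEXICON = {
    word: instruction
    for instruction, words in INSTRUCTION_LIBRARY.items()
    for word in words
}


def similarity(a: str, b: str) -> float:
    """Text similarity in [0, 1]; an assumed proxy for pronunciation similarity."""
    return SequenceMatcher(None, a, b).ratio()


def find_command(segment: str, threshold: float = 0.85):
    """Return the closest command word if its similarity exceeds the threshold."""
    normalized = segment.lower().strip()
    best_word, best_score = None, 0.0
    for word in COMMAND_LEXICON:
        score = similarity(normalized, word)
        if score > best_score:
            best_word, best_score = word, score
    return best_word if best_score > threshold else None


@dataclass
class Presentation:
    """Toy stand-in for the presentation document: only tracks the page."""
    page: int = 1
    page_count: int = 10

    def apply(self, instruction: str) -> None:
        if instruction == "next_page":
            self.page = min(self.page + 1, self.page_count)
        elif instruction == "previous_page":
            self.page = max(self.page - 1, 1)


def process(segments, deck: Presentation):
    """Apply command segments to the deck; return the segments to present."""
    speech_content = []
    for segment in segments:
        word = find_command(segment)
        if word is not None:
            # Command word detected: operate on the presentation document.
            deck.apply(COMMAND_LEXICON[word])
        else:
            # No command word matched: treat the segment as speech content.
            speech_content.append(segment)
    return speech_content
```

In this sketch a segment is either a command or speech content, never both. The claims leave the granularity of detection open, so a real system might instead match command words inside longer utterances.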
PCT/CN2019/125606 2019-12-16 2019-12-16 Data processing method and apparatus, electronic device, and storage medium WO2021119908A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/125606 WO2021119908A1 (en) 2019-12-16 2019-12-16 Data processing method and apparatus, electronic device, and storage medium
CN201980100994.5A CN114556353A (en) 2019-12-16 2019-12-16 Data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/125606 WO2021119908A1 (en) 2019-12-16 2019-12-16 Data processing method and apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021119908A1 (en) 2021-06-24

Family

ID=76476927

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/125606 WO2021119908A1 (en) 2019-12-16 2019-12-16 Data processing method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN114556353A (en)
WO (1) WO2021119908A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102339193A (en) * 2010-07-21 2012-02-01 Tcl集团股份有限公司 Voice control conference speed method and system
CN103257841A (en) * 2013-05-16 2013-08-21 问航 Method for controlling PPT presentation software by voice input of microphone
US20160328206A1 (en) * 2014-03-28 2016-11-10 Panasonic Intellectual Property Management Co., Ltd. Speech retrieval device, speech retrieval method, and display device
CN107046523A (en) * 2016-11-22 2017-08-15 深圳大学 A kind of simultaneous interpretation method and client based on individual mobile terminal
CN206594846U (en) * 2016-11-06 2017-10-27 刘守全 Voice command formula teaching apparatus of history lesson
CN109584880A (en) * 2018-11-26 2019-04-05 山东弘享机器人有限公司 A kind of implementation method of robot speech PPT file

Also Published As

Publication number Publication date
CN114556353A (en) 2022-05-27

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19956203; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 19956203; Country of ref document: EP; Kind code of ref document: A1)