WO2021097629A1 - Data processing method, apparatus, electronic device and storage medium - Google Patents


Info

Publication number
WO2021097629A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2019/119268
Inventor
薛征山
Original Assignee
深圳市欢太科技有限公司
Oppo广东移动通信有限公司
Application filed by 深圳市欢太科技有限公司 and Oppo广东移动通信有限公司
Priority to PCT/CN2019/119268
Priority to CN201980100711.7A (CN114430832A)
Publication of WO2021097629A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63: Querying

Definitions

  • This application relates to simultaneous interpretation technology, and in particular to a data processing method, apparatus, electronic device, and storage medium.
  • AI: artificial intelligence
  • The simultaneous interpretation system is a speech translation product for conference scenarios that has emerged in recent years. It uses AI technology to provide multilingual text translation and text presentation of conference speakers' speech content.
  • Embodiments of the present application provide a data processing method, apparatus, electronic device, and storage medium.
  • An embodiment of the application provides a data processing method, including:
  • determining a text to be processed, where the text to be processed is a piece of text in a recognition result;
  • the recognition result is determined based on voice data;
  • the recognition result is presented while the voice data is played;
  • querying a fragment library according to the text to be processed, to determine a target fragment whose semantic relevance to the text to be processed meets a preset condition; and returning from the text to be processed to the target fragment for presentation according to the position information of the target fragment in the recognition result;
  • where the fragment library includes at least one fragment and the position information of each fragment in the recognition result;
  • and the fragments in the fragment library change as the voice data changes.
  • Querying the fragment library according to the text to be processed to determine the target fragment whose semantic relevance to the text to be processed meets a preset condition includes:
  • determining a first target fragment from the at least one fragment.
  • Determining that the text to be processed meets the first preset condition includes at least one of the following:
  • receiving a first selection instruction for at least two fragments, and determining the first target fragment from the at least two fragments according to the first selection instruction.
  • Determining the target fragment whose relevance to the text to be processed meets the preset condition includes:
  • determining a second target fragment from the at least one fragment.
  • Determining that the text to be processed meets the second preset condition includes at least one of the following:
  • Determining the second target fragment from the at least two fragments includes:
  • receiving a second selection instruction for the at least two fragments, and determining the second target fragment from the at least two fragments according to the second selection instruction.
  • The method further includes:
  • segmenting the recognition result to obtain a segmentation result, where the segmentation result includes at least one fragment;
  • updating the fragment library according to the at least one fragment, where each of the at least one fragment and its position information in the recognition result are stored in the fragment library in association with each other.
  • The method further includes:
  • performing term extraction on the bilingual data of a machine translation model, and generating the term dictionary based on the extracted terms.
  • An embodiment of the application further provides a data processing apparatus, including:
  • a determining unit, configured to determine a text to be processed, where the text to be processed is a piece of text in a recognition result, the recognition result is determined based on voice data, and the recognition result is presented while the voice data is played;
  • a first processing unit, configured to query a fragment library according to the text to be processed, to determine a target fragment whose semantic relevance to the text to be processed meets a preset condition;
  • a second processing unit, configured to return from the text to be processed to the target fragment for presentation according to the position information of the target fragment in the recognition result;
  • where the fragment library includes at least one fragment and the position information of each fragment in the recognition result;
  • and the fragments in the fragment library change as the voice data changes.
  • An embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When executing the program, the processor implements the steps of any of the above data processing methods.
  • An embodiment of the present application further provides a storage medium storing computer instructions which, when executed by a processor, implement the steps of any of the foregoing data processing methods.
  • With the data processing method, apparatus, electronic device, and storage medium provided by the embodiments of the application, a text to be processed is determined, where the text to be processed is a piece of text in a recognition result, the recognition result is determined based on voice data, and the recognition result is presented while the voice data is played; a fragment library is queried according to the text to be processed to determine a target fragment whose semantic relevance to the text to be processed meets a preset condition; and, according to the position information of the target fragment in the recognition result, the display returns from the text to be processed to the target fragment for presentation. The fragment library includes at least one fragment and the position information of each fragment in the recognition result, and the fragments in the fragment library change as the voice data changes.
  • In this way, the target fragment related to the text selected by the user can be displayed to the user, helping the user understand the speech content and improving the user experience.
  • FIG. 1 is a schematic diagram of a system architecture to which a simultaneous interpretation method in the related art is applied;
  • FIG. 2 is a schematic flowchart of a data processing method according to an embodiment of the application;
  • FIG. 3 is another schematic flowchart of a data processing method according to an embodiment of the application;
  • FIG. 4 is a schematic flowchart of a method for determining a target fragment according to an embodiment of the application;
  • FIG. 5 is a schematic diagram of the structure of a data processing apparatus according to an embodiment of the application;
  • FIG. 6 is a schematic diagram of the structure of an electronic device according to an embodiment of the application.
  • FIG. 1 is a schematic diagram of a system architecture to which a simultaneous interpretation method in the related art is applied. As shown in FIG. 1, the system may include a machine simultaneous interpretation server, a voice processing server, a terminal held by a user, an operating terminal, and a display screen.
  • The terminal held by the user may be a mobile phone, a tablet computer, etc.;
  • the operating terminal may be a personal computer (PC), a mobile phone, etc., where the PC may be a desktop computer, a notebook computer, a tablet computer, etc.
  • The speaker can give a conference speech through the operating terminal.
  • The operating terminal collects the speaker's voice data and sends the collected voice data to the machine simultaneous interpretation server.
  • The machine simultaneous interpretation server recognizes the voice data through the voice processing server and obtains a recognition result (the recognition result may be recognized text in the same language as the voice data, or translated text in another language obtained by translating the recognized text).
  • The machine simultaneous interpretation server can send the recognition result to the operating terminal, which projects it onto the display screen; it can also send the recognition result to the terminal held by the user (specifically, the recognition result in the language the user requires) to present it to the user. In this way, the speaker's speech content is translated into the language the user requires and displayed.
  • The voice processing server may include a speech recognition module, a text smoothing module, and a machine translation module.
  • The speech recognition module performs text recognition on the voice data to obtain recognized text;
  • the text smoothing module formats the recognized text, for example by removing spoken disfluencies, restoring punctuation, and applying inverse text normalization;
  • the machine translation module translates the formatted recognized text into text in another language, that is, obtains the translated text.
  • The functions of the machine simultaneous interpretation server and the voice processing server can also be implemented on the terminal held by the user: the operating terminal collects the speaker's voice data and sends it to the terminal held by the user;
  • the terminal held by the user recognizes the voice data, obtains the recognition result, and displays it.
  • In this case, the terminal held by the user may include the above speech recognition module, text smoothing module, and machine translation module, and implement their functions.
  • The voice processing server or the terminal held by the user can determine the speech content in different languages corresponding to the voice data (including recognized text, translated text, etc.) and provide it to the user, but the speech content is only displayed synchronously for the user to view.
  • In the embodiments of the present application, the text to be processed in the recognition result (such as content that the user finds difficult to understand) is determined; a fragment library is queried according to the text to be processed to determine a target fragment whose semantic relevance to the text to be processed meets a preset condition; and, according to the position information of the target fragment in the recognition result, the display returns from the text to be processed to the target fragment for presentation. The fragment library includes at least one fragment and the position information of each fragment in the recognition result, and the fragments in the fragment library change as the voice data changes. In this way, the target fragment related to the text selected by the user can be displayed to the user, which helps the user understand the speech content and improves the user experience.
  • FIG. 2 is a schematic flowchart of the data processing method according to an embodiment of the application. As shown in FIG. 2, the method includes:
  • Step 201: Determine the text to be processed, where the text to be processed is a piece of text in the recognition result;
  • the recognition result is determined based on voice data and is presented while the voice data is played.
  • Step 202: Query a fragment library according to the text to be processed, and determine a target fragment whose semantic relevance to the text to be processed meets a preset condition;
  • the target fragment is semantically associated with the text to be processed.
  • Step 203: According to the position information of the target fragment in the recognition result, return from the text to be processed to the target fragment for presentation.
  • Here, the fragment library includes at least one fragment and the position information of each fragment in the recognition result; the fragments in the fragment library change as the voice data changes.
  • That the recognition result is presented when the voice data is played means that the recognition result is presented while the voice data is being played; that is, the data processing method can be applied to simultaneous interpretation scenarios.
  • For example, while the speaker is giving a speech, the first terminal uses a voice collection module to collect the speech content in real time, that is, to obtain the voice data to be processed.
  • A communication connection can be established between the first terminal and the simultaneous interpretation server. The first terminal sends the acquired voice data to the server; the server obtains the voice data to be processed in real time, performs text recognition on it, and obtains the recognition result for presentation, that is, the recognition result is displayed while the voice data is played.
  • The simultaneous interpretation scenario may adopt the system architecture shown in FIG. 1, and the data processing method of the embodiment of the present application may be applied to an electronic device.
  • The electronic device may be a device newly added to the system architecture of FIG. 1, or an existing device in the architecture of FIG. 1 that is modified to implement the method of the embodiment of the present application.
  • The electronic device may be a server, a terminal held by a user, or the like.
  • When the electronic device is a server, the user can select the text to be processed through the human-computer interaction interface of the terminal he or she holds (here, the recognition result is presented through that terminal) and send the selection result to the server, and the server determines the text to be processed based on the selection result sent by the terminal.
  • The electronic device may also be a server that has, or is connected to, a human-computer interaction interface, through which the user selects the text to be processed from the recognition result.
  • The server may be a server newly added to the system architecture of FIG. 1 to implement the method of this application (that is, the method shown in FIG. 2), or the voice processing server in the architecture of FIG. 1 may be modified to implement the method.
  • The electronic device may also be a terminal held by a user: that terminal receives the recognition result sent by the server, and the user selects the text to be processed from the recognition result through the terminal's human-computer interaction interface.
  • The terminal held by the user may be a terminal newly added to the system architecture of FIG. 1 that implements the method of the present application, or the existing user terminal in the architecture of FIG. 1 may be modified to implement the method.
  • The terminal held by the user may be a PC, a tablet computer, a mobile phone, or the like.
  • The data processing method is applied in a simultaneous interpretation scenario. As the speech progresses, the voice data keeps changing, and the recognition result keeps changing with it.
  • The text to be processed may be a piece of text in the recognition result;
  • here, the recognition result refers to the recognized text obtained by performing text recognition on the voice data, and the recognized text may be in any language.
  • The recognized text is obtained based on the voice data.
  • The recognized text may contain multiple characters; the text to be processed may be one character in the recognized text, or at least two consecutive characters in it.
  • For example, the recognized text includes "Machine translation refers to the process of using a computer to convert one natural language into another natural language."
  • The user can select a piece of text; assuming "natural language" is selected, the text to be processed is "natural language".
  • The text to be processed selected by the user may be a technical term or a descriptive text. If it is a technical term, it may be mentioned directly in other fragments of the recognition result (that is, other fragments may contain the text to be processed), so fragments containing the text to be processed can be searched for to help the user understand it.
  • In step 202, querying the fragment library according to the text to be processed to determine the target fragment whose semantic relevance to the text to be processed meets a preset condition includes:
  • determining a first target fragment from the at least one fragment.
  • Determining that the text to be processed meets the first preset condition includes at least one of the following:
  • The preset number threshold can be set in advance and stored by the developer. For example, if the recognized text is in Chinese, the preset number threshold may be 6, since a Chinese term generally does not exceed 6 characters.
  • That the text to be processed matches a term in the preset term dictionary means that the term dictionary contains a term identical to the text to be processed.
  • For example, the text to be processed is "machine translation", which matches the term "machine translation" in the term dictionary.
  • The text to be processed selected by the user may be a technical term or a descriptive text.
  • If it is a technical term, the target fragment is determined based on the fragments containing the text to be processed;
  • alternatively, the text to be processed is a descriptive text.
  • The method further includes:
  • performing term extraction on the bilingual data of the machine translation model, and generating the term dictionary based on the extracted terms.
  • Term extraction may be performed with any method, such as text reranking, bootstrapping, or deep learning, or with a combination of such methods.
  • The embodiments of the present application do not limit the term extraction method.
  • When there is only one fragment (specifically, one fragment containing the text to be processed), that fragment can be used directly as the first target fragment; when there are at least two such fragments, one of them can be selected as the first target fragment. Specifically, the at least two fragments may be displayed to the user, who selects the first target fragment from them.
  • Determining the first target fragment from the at least two fragments includes:
  • receiving a first selection instruction for the at least two fragments, and determining the first target fragment from the at least two fragments according to the first selection instruction.
  • The fragment corresponding to the text to be processed in the recognition result is the fragment of the recognition result that contains the text to be processed selected by the user,
  • for example, "Machine translation refers to the process of using a computer to convert one natural language into another natural language."
  • Determining the similarity between the fragment corresponding to the text to be processed in the recognition result and each of the at least two fragments includes:
  • performing a semantic similarity calculation between the fragment corresponding to the text to be processed in the recognition result and each of the at least two fragments.
  • Any method for calculating semantic similarity can be used; this is not limited here.
  • The electronic device may have a processing module that performs the semantic similarity calculation with a preset neural network model for semantic recognition.
  • The text to be processed selected by the user may be a technical term or a descriptive text. If it is a descriptive text, the above processing for technical terms cannot be used; in this case, the target fragment needs to be determined according to the semantics of the descriptive text.
  • Determining the target fragment whose relevance to the text to be processed meets the preset condition includes:
  • determining a second target fragment from the at least one fragment.
  • If the text to be processed meets the second preset condition, it is considered a descriptive text rather than a technical term.
  • That the similarity between a fragment and the fragment corresponding to the text to be processed in the recognition result meets a preset similarity condition means that this similarity exceeds a preset similarity threshold.
  • The preset similarity threshold may be preset by the developer and stored in the electronic device.
  • Determining that the text to be processed meets the second preset condition includes at least one of the following:
  • When there is only one such fragment, it is used directly as the second target fragment; when there are at least two, one of them can be selected as the second target fragment. Specifically, the at least two fragments can be displayed to the user, who selects the second target fragment from them.
  • Determining the second target fragment from the at least two fragments includes:
  • receiving a second selection instruction for the at least two fragments, and determining the second target fragment from the at least two fragments according to the second selection instruction.
  • Presenting according to the result of sorting by similarity may mean presenting the at least two fragments one after another through the human-computer interaction interface in sorted order; after each fragment is displayed, a feedback instruction is determined according to the user's operation on the interface.
  • For example, the human-computer interaction interface displays a Confirm button and a Next button: clicking Confirm selects the displayed fragment as the target fragment, and clicking Next continues by displaying the fragment that follows it.
  • The data processing method is applied to an electronic device. The following describes how the electronic device receives the corresponding selection instructions (the first selection instruction and the second selection instruction).
  • The electronic device may be a server.
  • In this case, the user holding the terminal performs the selection operation through the terminal's human-computer interaction interface; the terminal determines the corresponding selection instruction and sends it to the server, so that the server receives the corresponding selection instruction.
  • The electronic device may also be a server that has, or is connected to, a human-computer interaction interface.
  • In this case, the user performs the selection operation through the server's human-computer interaction interface, and the server determines the operation performed by the user through that interface, that is, receives the corresponding selection instruction.
  • The electronic device may also be the terminal held by the user.
  • In this case, receiving the corresponding selection instruction means that the terminal held by the user determines the operation the user performed through its human-computer interaction interface, that is, the terminal held by the user receives the corresponding selection instruction.
  • In the simultaneous interpretation scenario, the voice data keeps changing, and the recognition result changes accordingly; since the fragment library stores fragments determined from the recognition result, the fragments in the fragment library also keep changing with the voice data.
  • To this end, a fragment library that is updated based on the recognition result is provided.
  • Based on this library, the target fragment related to the text to be processed can be determined, which makes it easy for the user to look back at earlier speech content.
  • The method further includes:
  • segmenting the recognition result to obtain a segmentation result, where the segmentation result includes at least one fragment;
  • updating the fragment library according to the at least one fragment, where each of the at least one fragment and its position information in the recognition result are stored in the fragment library in association with each other.
  • In this way, at least one fragment of the speech content is obtained, so that the fragments found by querying the fragment library according to the text to be processed are speech content related to the text to be processed.
  • Segmenting the recognition result may include: performing semantic analysis on the recognition result, segmenting the recognition result according to the semantic analysis result to obtain at least one fragment, and using the at least one fragment as the segmentation result.
  • Because the fragment library is updated according to the at least one fragment, it may include every fragment of the speech content; therefore, querying the fragment library according to the text to be processed yields the target fragment, that is, the speech content related to the text to be processed.
  • The semantic analysis may be implemented with a preset semantic analysis model.
  • The electronic device may have a processing module that performs semantic analysis with the preset semantic analysis model; the model may be a Latent Semantic Analysis (LSA) model, a probabilistic Latent Semantic Analysis (pLSA) model, etc.
  • LSA: Latent Semantic Analysis
  • pLSA: probabilistic Latent Semantic Analysis
  • Other semantic analysis models can also be used; this is not limited here.
  • When the segmentation result is stored in the fragment library, each fragment can be stored according to its position information in the recognition result (the position information here can be understood as the order of the sentences).
  • For example, the recognition result is: "Machine translation refers to the process of using a computer to convert one natural language into another natural language. It is a branch of computational linguistics and one of the ultimate goals of artificial intelligence. At the same time, machine translation has important practical value." Segmenting it gives:
  • Fragment A: Machine translation refers to the process of using a computer to convert one natural language into another natural language.
  • Fragment B: It is a branch of computational linguistics and one of the ultimate goals of artificial intelligence.
  • Fragment C: Machine translation has important practical value.
  • Fragment A, fragment B, and fragment C are stored in the fragment library in the order of their sentences in the recognition result.
  • The recognition result may correspond to at least one language, that is, the recognition result may be recognized text in a first language, recognized text in a second language, ..., or recognized text in an Nth language, where N is greater than or equal to 1. Recognized texts in different languages are presented to users who speak different languages.
  • Each term in the preset term dictionary corresponds to at least one language, and the language of a term is the same as the language of the recognized text; thus, recognized text in any of these languages can be looked back over with the above data processing method.
  • A method for performing text recognition on the voice data is also provided.
  • The method further includes:
  • the language of the recognized text in the first language is the same as the language of the voice data.
  • The method further includes:
  • the translation model is used to translate text in one language into text in another language.
  • The data processing method is applied to an electronic device.
  • The electronic device may be a server.
  • The server may obtain the voice data, perform text recognition to obtain the recognition result, and send the recognition result to the terminal held by the user, so that the user can browse the recognition result through that terminal.
  • The user can select a language through the terminal he or she holds, and the server provides the recognized text in the language selected on that terminal.
  • The recognition result in the corresponding language may be obtained according to an acquisition request sent by the user through the terminal he or she holds.
  • When the electronic device is a server, the method may further include: receiving an acquisition request sent by a terminal, where the acquisition request is used to acquire the recognition result and includes at least a target language;
  • here, the terminal refers to the terminal held by the user.
  • The terminal held by the user receives the recognition result and presents it.
  • While browsing the recognition result, the user can select the text to be processed; the terminal held by the user determines the text to be processed and sends it to the server; the server applies the above data processing method and presents the determined target fragment to the user for browsing through the terminal held by the user.
  • the electronic device may also be a server connected to itself or provided with a human-computer interaction interface.
  • the user sets the language through the human-computer interaction interface in advance.
  • the server obtains voice data and performs text recognition to obtain the preset language corresponding
  • the recognition result is presented through the human-computer interaction interface.
  • the server may also be connected to a display screen, and the server uses a projection technology to project the recognition result to the display screen for presentation.
  • the server determines the text to be processed, it applies the aforementioned data processing method for corresponding processing, so that the target segment (ie, the first target segment or the second target segment) most relevant to the text to be processed selected by the user can be directly returned to the user Browse to help users understand the current content.
  • the electronic device may also be a terminal held by the user.
  • the user holding the terminal can set the language in advance through the human-computer interaction interface of the terminal.
  • the terminal held by the user performs text recognition on the voice data to obtain the recognition result corresponding to the set language, and presents it through the human-computer interaction interface.
  • the above data processing method is applied so that the target segment most relevant to the text selected by the user (i.e., the first target segment or the second target segment) is presented directly for the user to browse, which helps the user understand the current content.
  • the method provided in the embodiments of this application can be applied to a simultaneous interpretation scenario, such as simultaneous interpretation in a meeting.
  • the above data processing method embeds an automatic history-backtracking function into the simultaneous interpretation process.
  • a target segment, such as the first target segment or the second target segment, is determined. The display then jumps back from the currently browsed text to be processed to the target segment and shows it to the user;
  • in this way, the target segment related to the text selected by the user is shown to the user to help the user understand the content.
  • the data processing method provided by the embodiments of the present invention performs the following steps:
    determine the text to be processed. This is any piece of text in the recognition result; the recognition result is determined based on the voice data and is presented while the voice data is played.
    query the segment library according to the text to be processed, and determine a target segment whose semantic relevance to the text to be processed satisfies a preset condition. The target segment is semantically associated with the text to be processed.
    according to the position information of the target segment in the recognition result, return from the text to be processed to the target segment for presentation.
    The segment library includes at least one segment and the position information of each segment in the recognition result. The segments in the segment library change as the voice data changes.
  • the target segment related to the text selected by the user can thus be shown to the user, which helps the user understand the speech and improves the user experience. This addresses a problem in simultaneous interpretation:
  • when the user encounters a term or passage that the user does not understand, the user wants to look back at earlier content and understand the current term or passage from related content mentioned earlier (i.e., the target segment).
  • Fig. 3 is a schematic diagram of another flow chart of the data processing method according to an embodiment of the application.
  • the data processing method is applied to a simultaneous interpretation scene. As shown in Fig. 3, the method includes:
  • Step 301: Acquire voice data, perform text recognition on the voice data to obtain a recognition result, and update the segment library according to the recognition result.
  • in step 301, acquiring voice data and performing text recognition on it to obtain a recognition result includes:
  • acquiring voice data; performing text recognition on the voice data to obtain recognized text in a first language; and translating the recognized text in the first language to obtain recognized text in other languages.
  • each recognized or translated sentence can be segmented according to punctuation (e.g., a period, colon, or question mark) and saved to the segment library.
  • the sentences may be stored in the segment library as a list (List), which holds a variable-length vector.
  • the voice data keeps changing, and all previous historical text is saved in the List. The sentences are saved in their original order.
  • Step 302: Determine the text to be processed T selected by the user, and determine whether T is a term or descriptive text.
  • the specific judgment method may include:
  • if the length of T is greater than or equal to a preset threshold (set to 7 here), T is considered descriptive text;
  • the term dictionary is searched with T. If T exists in the term dictionary (that is, T matches a term in the dictionary), T is determined to be a term. Otherwise, T is determined to be descriptive text.
  • terms may be extracted from the machine translation bilingual corpus, and a term dictionary is generated based on the extracted terms; any method for extracting specific terms may be used, which is not limited here.
  • a term dictionary may also be preset by the developer.
  • Step 303: If T is a term, perform a first operation to determine the target segment from the recognition result.
  • the first operation includes:
  • Step 3031: Use T as a query and traverse the List to find whether T occurs in any segment (here, a sentence) of the List;
  • Step 3032: If no sentence containing T is found in the List, send a prompt message telling the user that the preceding text contains no information about the term;
  • Step 3033: If exactly one sentence containing T is found in the List, use the sentence at that position as the target segment and return it directly to the user. Specifically, the display jumps back from the current text to be processed to the sentence containing T for presentation.
    If multiple sentences containing T are found, denote them as RList. Compute the similarity between the sentence in which T is located and each sentence in RList. Sort the sentences by similarity in descending order and return them to the user in that order. The sentence the user selects as the one to locate is used as the target segment.
  • a neural network model for semantic similarity calculation can be used for similarity calculation.
  • for example, an RNN-LSTM-Encoder can be used. This is an encoder built from a recurrent neural network (RNN) with long short-term memory (LSTM) units.
  • the RNN-LSTM-Encoder produces a sentence representation of the sentence containing T and a sentence representation of each sentence in RList. The Cos-Similarity (cosine similarity) algorithm is then used to calculate the similarity.
  • Step 304: If T is descriptive text, perform a second operation to determine the target segment from the recognition result.
  • the second operation does not locate the position of T in the List, because non-term descriptive text is generally relatively long and hard to locate accurately.
  • the second operation includes: calculating the similarity between T and each sentence in the List; sorting the sentences by similarity in descending order; and returning them to the user in that order.
  • the sentence that the user selects as the one to locate is used as the target segment.
  • the sentence representation of the sentence where the text T to be processed is located and the sentence representation of each sentence in the List can be obtained through the RNN-LSTM-Encoder, and the Cos-Similarity algorithm is used to calculate the similarity.
  • the method shown in the embodiment of FIG. 3 can be applied to the electronic equipment to which the method shown in the embodiment of FIG. 2 is applied.
  • the electronic equipment can be a server, a terminal held by a user, etc. How the server and the user's terminal determine the corresponding information (such as the text to be processed and the user's selection) has been described in detail for the method shown in FIG. 2 and is not repeated here.
  • FIG. 4 is a schematic flowchart of a method for determining a target segment according to an embodiment of the application; as shown in FIG. 4, the method for determining a target segment includes:
  • Step 401: Obtain the text to be processed selected by the user, and determine whether it satisfies the first preset condition or the second preset condition. If it satisfies the first preset condition, execute step 402; if it satisfies the second preset condition, execute step 403;
  • the first preset condition includes at least one of the following:
  • the word count of the text to be processed is lower than or equal to a preset word count threshold
  • the to-be-processed text matches a term in the preset term dictionary.
  • the second preset condition includes at least one of the following:
  • the word count of the text to be processed is higher than a preset word count threshold
  • the text to be processed does not match each term in the preset term dictionary.
  • the method for generating the term dictionary can refer to the method shown in FIG. 2, which will not be repeated here.
  • Step 402: Determine the segments in the segment library. When there is one segment, use it directly as the first target segment. When there are two or more segments, determine the similarity between each of them and the segment corresponding to the text to be processed in the recognition result, and select the first target segment from them based on the similarity.
  • the segments determined in step 402 are the segments in the segment library that contain the text to be processed selected by the user.
  • the selection of the target segment based on the similarity includes:
  • the at least two segments are sorted by similarity and presented. A first selection instruction for the at least two segments is received, and the first target segment is determined from them according to the first selection instruction.
  • the segment corresponding to the text to be processed in the recognition result refers to the segment in the recognition result that contains the text to be processed selected by the user.
  • Step 403: Determine the segments in the segment library. When there is one segment, use it directly as the second target segment. When there are two or more segments, select one of them as the second target segment.
  • the segments determined in step 403 are the segments in the segment library whose similarity to the segment corresponding to the text to be processed in the recognition result meets the preset similarity condition. Specifically, that similarity exceeds a preset similarity threshold.
  • the preset similarity threshold may be preset by the developer and stored in the electronic device.
  • selecting one of the at least two segments as the second target segment includes:
  • the at least two segments are sorted by similarity and presented. A second selection instruction for the at least two segments is received, and the second target segment is determined from them according to the second selection instruction.
  • the sorted segments described in steps 402 and 403 may be presented through the server's own human-computer interaction interface. In that case, receiving the selection instruction (the first or second selection instruction) means receiving the operation the user performs through that interface and determining the selection instruction from it.
    Alternatively, the server may send the at least two segments to the user's terminal, which presents them through its own human-computer interaction interface. The terminal determines the selection instruction from the operation the user performs there and sends it to the server, so that the server receives the selection instruction.
  • the sorted segments described in steps 402 and 403 may also be presented through the human-computer interaction interface of the user's terminal. In that case, receiving the selection instruction (the first or second selection instruction) means that the user's terminal determines the selection instruction from the operation the user performs through that interface.
  • FIG. 5 is a schematic diagram of the composition structure of a data processing device according to an embodiment of the application; as shown in FIG. 5, the data processing device includes:
  • the determining unit 51 is configured to determine a text to be processed; the text to be processed is a piece of text in a recognition result; the recognition result is determined based on voice data; the recognition result is presented when the voice data is played;
  • the first processing unit 52 is configured to query the segment library according to the text to be processed and determine a target segment whose semantic relevance to the text to be processed satisfies a preset condition. The target segment is semantically associated with the text to be processed;
  • the second processing unit 53 is configured to return from the to-be-processed text to the target segment for presentation according to the position information of the target segment in the recognition result;
  • the segment library includes at least one segment and the position information of each segment in the recognition result; the segments in the segment library change with the change of the voice data.
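  • the first processing unit 52 is configured to perform the following:
    determine that the text to be processed meets a first preset condition;
    determine at least one segment in the segment library that contains the text to be processed;
    determine the first target segment from the at least one segment.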
  • the first processing unit 52 determines that the text to be processed meets the first preset condition if at least one of the following holds: the word count of the text is lower than or equal to a preset word-count threshold; the text matches a term in a preset term dictionary.
  • the number of segments may be at least two. In that case, the first processing unit 52 is configured to obtain the segment corresponding to the text to be processed in the recognition result, determine its similarity to each of the at least two segments, sort the at least two segments by similarity, and present them;
  • a first selection instruction for the at least two segments is received, and the first target segment is determined from them according to the first selection instruction.
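  • the first processing unit 52 is configured to perform the following:
    determine that the text to be processed meets a second preset condition;
    determine at least one segment in the segment library whose similarity to the segment corresponding to the text to be processed in the recognition result meets a preset similarity condition;
    determine a second target segment from the at least one segment.
  • the first processing unit 52 determines that the text to be processed meets the second preset condition if at least one of the following holds: the word count of the text is higher than the preset word-count threshold; the text matches none of the terms in the preset term dictionary.
  • the number of segments may be at least two. In that case, the first processing unit 52 is configured to obtain the segment corresponding to the text to be processed in the recognition result, determine its similarity to each of the at least two segments, sort the at least two segments by similarity, and present them;
  • a second selection instruction for the at least two segments is received, and the second target segment is determined from them according to the second selection instruction.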
  • a segment meets the preset similarity condition when:
  • the similarity between the segment and the segment corresponding to the text to be processed in the recognition result exceeds a preset similarity threshold.
  • the device further includes: a third processing unit configured to obtain the voice data and perform text recognition on the voice data to obtain the recognition result;
  • segment the recognition result to obtain a segmentation result that includes at least one segment;
  • the segment library is updated according to the at least one segment; each segment in the at least one segment is stored in the segment library corresponding to the position information of the segment in the recognition result.
  • the device further includes: a fourth processing unit configured to perform term extraction on the bilingual data of the machine translation model, and generate the term dictionary based on the extracted terms.
  • the determining unit 51, the first processing unit 52, the second processing unit 53, the third processing unit, and the fourth processing unit can all be implemented by a processor in the electronic device (such as a server or a terminal held by the user). The processor may be, for example, a central processing unit (CPU), a digital signal processor (DSP), a microcontroller unit (MCU), or a field-programmable gate array (FPGA).
  • when the device provided in the above embodiment performs data processing, the division into the program modules above is only an example. In practice, the processing may be allocated to different program modules as needed; that is, the internal structure of the device may be divided into different program modules to complete all or part of the processing described above.
  • the device provided in the above-mentioned embodiment and the data processing method embodiment belong to the same concept, and the specific implementation process is detailed in the method embodiment, which is not repeated here.
  • FIG. 6 is a schematic diagram of the hardware composition structure of the electronic device of the embodiment of the application.
  • the electronic device 60 includes a memory 63, a processor 62, and a computer program stored in the memory 63 and executable on the processor 62. When the processor 62 in the electronic device executes the program, it implements the method provided by one or more of the technical solutions on the electronic device side.
  • specifically, when the processor 62 in the electronic device 60 executes the program, it implements the following. It determines the text to be processed, which is a piece of text in the recognition result; the recognition result is determined based on the voice data and is presented while the voice data is played;
  • it queries the segment library according to the text to be processed and determines a target segment whose semantic relevance to the text to be processed satisfies a preset condition. The target segment is semantically associated with the text to be processed. It then returns from the text to be processed to the target segment for presentation, according to the position information of the target segment in the recognition result;
  • the segment library includes at least one segment and the position information of each segment in the recognition result; the segments in the segment library change with the change of the voice data.
  • the electronic device further includes a communication interface 61; various components in the electronic device are coupled together through the bus system 64.
  • the bus system 64 is configured to implement connection and communication between these components.
  • the bus system 64 also includes a power bus, a control bus, and a status signal bus.
  • the memory 63 in this embodiment may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory.
  • the non-volatile memory may be any of the following:
    a read-only memory (ROM);
    a programmable read-only memory (PROM);
    an erasable programmable read-only memory (EPROM);
    an electrically erasable programmable read-only memory (EEPROM);
    a ferromagnetic random access memory (FRAM);
    a flash memory;
    a magnetic surface memory, which may be disk storage or tape storage;
    an optical disc, or a compact disc read-only memory (CD-ROM).
  • the volatile memory may be a random access memory (RAM, Random Access Memory), which is used as an external cache.
  • many forms of RAM may be used, for example:
    static random access memory (SRAM);
    synchronous static random access memory (SSRAM);
    dynamic random access memory (DRAM);
    synchronous dynamic random access memory (SDRAM);
    double data rate synchronous dynamic random access memory (DDRSDRAM);
    enhanced synchronous dynamic random access memory (ESDRAM);
    SyncLink dynamic random access memory (SLDRAM);
    direct Rambus random access memory (DRRAM).
  • the memories described in the embodiments of the present application are intended to include, but are not limited to, these and any other suitable types of memories.
  • the method disclosed in the foregoing embodiments of the present application may be applied to the processor 62 or implemented by the processor 62.
  • the processor 62 may be an integrated circuit chip with signal processing capabilities. In the implementation process, the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 62 or instructions in the form of software.
  • the aforementioned processor 62 may be a general-purpose processor, a DSP, or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, and so on.
  • the processor 62 may implement or execute various methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application can be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a storage medium, and the storage medium is located in a memory.
  • the processor 62 reads the information in the memory and completes the steps of the foregoing method in combination with its hardware.
  • the embodiments of the present application also provide a storage medium, which is specifically a computer storage medium, and more specifically, a computer-readable storage medium.
  • computer instructions, that is, a computer program, are stored on it. When the computer instructions are executed by a processor, the method provided by one or more of the technical solutions on the electronic device side is implemented.
  • the disclosed method and smart device can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a logical functional division, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • the units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units; Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may serve separately as one unit, or two or more units may be integrated into one unit;
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
  • the foregoing program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the foregoing method embodiments. The foregoing storage medium includes various media that can store program code, such as a removable storage device, ROM, RAM, a magnetic disk, or an optical disc.
  • alternatively, if the aforementioned integrated unit of the present application is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the methods described in the embodiments of the present application.
  • the aforementioned storage media include: removable storage devices, ROM, RAM, magnetic disks or optical disks and other media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

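A data processing method, device, electronic equipment and storage medium. The method includes the following steps.
(201) Determine a text to be processed. The text to be processed is a piece of text in a recognition result; the recognition result is determined based on voice data and is presented while the voice data is played.
(202) Query a segment library according to the text to be processed, and determine a target segment whose semantic relevance to the text to be processed satisfies a preset condition. The segment library includes at least one segment and the position information of each segment in the recognition result, and the segments in the segment library change as the voice data changes.
(203) According to the position information of the target segment in the recognition result, return from the text to be processed to the target segment for presentation.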

Description

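Data Processing Method, Device, Electronic Equipment and Storage Medium
Technical Field
This application relates to simultaneous interpretation technology, and in particular to a data processing method, device, electronic equipment and storage medium.
Background
With the rapid development of artificial intelligence technology, the concept of artificial intelligence (AI) has moved out of the laboratory and into practice, and it is now applied in every aspect of daily life.
The simultaneous interpretation system is a speech translation product for conference scenarios that has appeared in recent years. It uses AI technology to provide multilingual text translation and text display of a conference speaker's speech.
In related simultaneous interpretation systems, the speaker's content is only displayed synchronously on a device for the user to view. While viewing it, the user may find some part of the content hard to understand, which affects the user's understanding of the speech.
Summary
To solve the related technical problems, embodiments of this application provide a data processing method, device, electronic equipment and storage medium.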
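An embodiment of this application provides a data processing method, including:
determining a text to be processed, where the text to be processed is a piece of text in a recognition result, the recognition result is determined based on voice data, and the recognition result is presented while the voice data is played;
querying a segment library according to the text to be processed, and determining a target segment whose semantic relevance to the text to be processed satisfies a preset condition;
according to the position information of the target segment in the recognition result, returning from the text to be processed to the target segment for presentation; wherein
the segment library includes at least one segment and the position information of each segment in the recognition result;
the segments in the segment library change as the voice data changes.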
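In the above solution, querying the segment library according to the text to be processed and determining the target segment whose semantic relevance to the text to be processed satisfies the preset condition includes:
determining that the text to be processed meets a first preset condition;
determining at least one segment in the segment library, the segment containing the text to be processed;
determining a first target segment from the at least one segment.
In the above solution, determining that the text to be processed meets the first preset condition includes at least one of the following:
determining that the word count of the text to be processed is lower than or equal to a preset word-count threshold;
determining that the text to be processed matches a term in a preset term dictionary.
In the above solution, there are at least two segments, and determining the first target segment from the at least two segments includes:
obtaining the segment corresponding to the text to be processed in the recognition result;
determining the similarity between the segment corresponding to the text to be processed in the recognition result and each of the at least two segments;
sorting the at least two segments based on the similarity, and presenting the sorted at least two segments;
receiving a first selection instruction for the at least two segments, and determining the first target segment from the at least two segments according to the first selection instruction.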
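In the above solution, determining the target segment whose relevance to the text to be processed satisfies the preset condition includes:
determining that the text to be processed meets a second preset condition;
determining at least one segment in the segment library, where the similarity between the segment and the segment corresponding to the text to be processed in the recognition result meets a preset similarity condition;
determining a second target segment from the at least one segment.
In the above solution, determining that the text to be processed meets the second preset condition includes at least one of the following:
determining that the word count of the text to be processed is higher than the preset word-count threshold;
determining that the text to be processed matches none of the terms in the preset term dictionary.
In the above solution, when there are at least two segments, determining the second target segment from the at least two segments includes:
obtaining the segment corresponding to the text to be processed in the recognition result;
determining the similarity between the segment corresponding to the text to be processed in the recognition result and each of the at least two segments;
sorting the at least two segments based on the similarity, and presenting the sorted at least two segments;
receiving a second selection instruction for the at least two segments, and determining the second target segment from the at least two segments according to the second selection instruction.
In the above solution, the method further includes:
obtaining the voice data, and performing text recognition on the voice data to obtain the recognition result;
segmenting the recognition result to obtain a segmentation result, the segmentation result including at least one segment;
updating the segment library according to the at least one segment, where each of the at least one segment is stored in the segment library together with its position information in the recognition result.
In the above solution, the method further includes:
performing term extraction on the bilingual data of a machine translation model, and generating the term dictionary based on the extracted terms.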
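An embodiment of this application further provides a data processing device, including:
a determining unit, configured to determine a text to be processed, where the text to be processed is a piece of text in a recognition result, the recognition result is determined based on voice data, and the recognition result is presented while the voice data is played;
a first processing unit, configured to query a segment library according to the text to be processed and determine a target segment whose semantic relevance to the text to be processed satisfies a preset condition;
a second processing unit, configured to return from the text to be processed to the target segment for presentation according to the position information of the target segment in the recognition result; wherein
the segment library includes at least one segment and the position information of each segment in the recognition result;
the segments in the segment library change as the voice data changes.
An embodiment of this application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the program, the processor implements the steps of any of the above data processing methods.
An embodiment of this application further provides a storage medium storing computer instructions. When executed by a processor, the instructions implement the steps of any of the above data processing methods.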
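The data processing method, device, electronic equipment and storage medium provided by the embodiments of this application work as follows.
A text to be processed is determined. It is a piece of text in the recognition result; the recognition result is determined based on voice data and is presented while the voice data is played.
The segment library is queried according to the text to be processed to determine a target segment whose semantic relevance to the text to be processed satisfies a preset condition.
According to the position information of the target segment in the recognition result, the display returns from the text to be processed to the target segment for presentation.
The segment library includes at least one segment and the position information of each segment in the recognition result, and the segments in the segment library change as the voice data changes.
In this way, the target segment related to the text selected by the user can be shown to the user, which helps the user understand the speech and improves the user experience.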
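Brief Description of the Drawings
FIG. 1 is a schematic diagram of a system architecture to which a simultaneous interpretation method in the related art is applied;
FIG. 2 is a schematic flowchart of a data processing method according to an embodiment of this application;
FIG. 3 is another schematic flowchart of the data processing method according to an embodiment of this application;
FIG. 4 is a schematic flowchart of a method for determining a target segment according to an embodiment of this application;
FIG. 5 is a schematic diagram of the composition structure of a data processing device according to an embodiment of this application;
FIG. 6 is a schematic diagram of the composition structure of an electronic device according to an embodiment of this application.
Detailed Description
This application is described in further detail below with reference to the drawings and specific embodiments.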
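FIG. 1 is a schematic diagram of a system architecture to which a simultaneous interpretation method in the related art is applied. As shown in FIG. 1, the system may include a machine simultaneous interpretation server, a speech processing server, terminals held by users, an operation terminal, and a display screen. The terminal held by a user may be a mobile phone, a tablet computer, etc. The operation terminal may be a personal computer (PC), a mobile phone, etc., where the PC may be a desktop computer, a laptop, a tablet computer, etc.
In practice, the speaker gives a conference speech through the operation terminal, which collects the speaker's voice data and sends it to the machine simultaneous interpretation server.
The machine simultaneous interpretation server recognizes the voice data through the speech processing server to obtain a recognition result. The recognition result may be recognized text in the same language as the voice data, or translated text in another language obtained by translating the recognized text.
The machine simultaneous interpretation server can send the recognition result to the operation terminal, which projects it onto the display screen. It can also send the recognition result to the terminals held by users, in the language each user needs, so that the speaker's content is translated into that language and displayed to the user.
The speech processing server may include a speech recognition module, a text smoothing module, and a machine translation module:
the speech recognition module performs text recognition on the user's voice data to obtain recognized text;
the text smoothing module formats the recognized text, for example through spoken-language smoothing (disfluency removal), punctuation restoration, and inverse text normalization;
the machine translation module translates the formatted recognized text into text in another language, i.e., the translated text.
In practice, the functions of the machine simultaneous interpretation server and the speech processing server can also be implemented on the terminal held by the user. In that case, the operation terminal collects the speaker's voice data and sends it to the user's terminal, which recognizes the voice data, obtains the recognition result, and displays it. Correspondingly, the user's terminal may include the above speech recognition module, text smoothing module, and machine translation module and implement their functions.
The speech processing server or the user's terminal can determine the speech content in different languages corresponding to the voice data (including recognized text, translated text, etc.) and provide it to the user. However, the content is merely displayed synchronously. When the user encounters content that is hard to understand, nothing explains it to help the user, which affects the user's understanding of the whole speech.
Based on this, the various embodiments of this application work as follows.
A text to be processed in the recognition result is determined (for example, the content the user finds hard to understand).
The segment library is queried according to the text to be processed to determine a target segment whose semantic relevance to the text to be processed satisfies a preset condition.
According to the position information of the target segment in the recognition result, the display returns from the text to be processed to the target segment for presentation.
The segment library includes at least one segment and the position information of each segment in the recognition result, and the segments in the segment library change as the voice data changes.
In this way, the target segment related to the text selected by the user can be shown to the user, which helps the user understand the speech and improves the user experience.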
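An embodiment of this application provides a data processing method. FIG. 2 is a schematic flowchart of the data processing method according to an embodiment of this application; as shown in FIG. 2, the method includes:
Step 201: Determine a text to be processed; the text to be processed is a piece of text in a recognition result;
where the recognition result is determined based on voice data and is presented while the voice data is played.
Step 202: Query a segment library according to the text to be processed, and determine a target segment whose semantic relevance to the text to be processed satisfies a preset condition;
here, the target segment is semantically associated with the text to be processed.
Step 203: According to the position information of the target segment in the recognition result, return from the text to be processed to the target segment for presentation.
The segment library includes at least one segment and the position information of each segment in the recognition result; the segments in the segment library change as the voice data changes.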
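The segment library can be pictured as an ordered list of sentences, each paired with its position in the growing recognition result. The Python sketch below is only an illustration under stated assumptions; it is not the application's implementation:
- it splits on sentence-final punctuation, following the punctuation-based segmentation in step 301;
- the punctuation set, the `SegmentLibrary` and `Segment` names, and the `update` method are hypothetical;
- each update is assumed to receive finalized recognized text;
- character offsets serve as the position information.

```python
import re
from dataclasses import dataclass

# Hypothetical split rule: a sentence starts at a non-space, non-punctuation
# character and runs up to and including any trailing sentence-final
# punctuation (Chinese full-width or ASCII).
SENTENCE = re.compile(r"[^\s。！？；：.!?;:][^。！？；：.!?;:]*[。！？；：.!?;:]*")


@dataclass
class Segment:
    text: str
    start: int  # character offset of the segment in the full recognition result
    end: int    # exclusive end offset


class SegmentLibrary:
    """Ordered segments plus their positions; grows as new recognized text arrives."""

    def __init__(self) -> None:
        self.text = ""                     # full recognition result seen so far
        self.segments: list[Segment] = []  # kept in original sentence order

    def update(self, new_text: str) -> None:
        """Append finalized recognized text and record its sentences with offsets."""
        base = len(self.text)
        self.text += new_text
        for match in SENTENCE.finditer(new_text):
            sentence = match.group().rstrip()  # drop trailing whitespace only
            start = base + match.start()
            self.segments.append(Segment(sentence, start, start + len(sentence)))
```

Calling `update` once per finalized sentence batch keeps the list in speaking order, so the library changes as the voice data changes. Each `Segment` carries the offsets a display would need to scroll back to it.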
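Here, in step 201, "the recognition result is presented while the voice data is played" means that the recognition result is presented at the same time as the voice data is played. That is, the data processing method can be applied to simultaneous interpretation scenarios.
Specifically, in a simultaneous interpretation scenario, when the speaker gives a speech, a first terminal (such as the operation terminal shown in FIG. 1) uses a voice collection module to collect the speech in real time, obtaining the voice data to be processed. The first terminal can establish a communication connection with the server that implements simultaneous interpretation and sends the acquired voice data to it. The server thus obtains the voice data to be processed in real time, performs text recognition on it, and presents the recognition result, so that the recognition result is presented while the voice data is played.
The simultaneous interpretation scenario may use the system architecture shown in FIG. 1. The data processing method of the embodiments of this application can be applied to an electronic device. This may be a device newly added to the architecture of FIG. 1, or an existing device in that architecture improved to implement the method. The electronic device may be a server, a terminal held by a user, etc.
Specifically, in practice the electronic device may be a server. The user selects the text to be processed through the human-computer interaction interface of the user's terminal (here, the recognition result is presented on that terminal), and the terminal sends the selection result to the server. The server determines the text to be processed based on that selection result.
The electronic device may also be a server that has, or is connected to, a human-computer interaction interface, through which the user selects the text to be processed from the recognition result.
Here, the server may be a server newly added to the architecture of FIG. 1 to implement the method of this application (i.e., the method shown in FIG. 2), or the speech processing server in the architecture of FIG. 1 improved to implement the method.
The electronic device may also be the terminal held by the user. That terminal can receive the recognition result sent by the server, and the user selects the text to be processed from it through the terminal's human-computer interaction interface. Here, the user's terminal may be a terminal newly added to the architecture of FIG. 1 that can implement the method of this application, or the user's terminal in the architecture of FIG. 1 improved to implement the method. The user's terminal may be a PC, a tablet computer, a mobile phone, etc.
In this embodiment, the data processing method is applied to simultaneous interpretation. As the speech proceeds, the voice data keeps changing, and the recognition result changes with it.
In step 201, the text to be processed may be a piece of text in the recognition result. The recognition result refers to the recognized text obtained by performing text recognition on the voice data; this recognized text may be in any language.
Specifically, the recognized text obtained from the voice data may contain multiple characters. The text to be processed may be a single character of the recognized text, or at least two consecutive characters of it.
For example, suppose the recognized text includes “机器翻译是指利用计算机将一种自然语言转换为另一种自然语言的过程……” ("Machine translation refers to the process of using a computer to convert one natural language into another natural language…"). The user can select a piece of text; if the user selects “自然语言” ("natural language"), the text to be processed is “自然语言”.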
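The segment library records each segment's position in the recognition result. Therefore the segment that a selection falls in, called below the segment "corresponding" to the text to be processed, can be found by binary search over the segment start offsets.

The Python sketch below rests on two hypothetical assumptions:
- the interface reports the selection as a character span `[sel_start, sel_end)` in the recognition result;
- the segments are contiguous, non-overlapping spans sorted by start offset.

The function name is illustrative only.

```python
import bisect
from typing import Optional, Sequence


def containing_segment(
    spans: Sequence[tuple[int, int]], sel_start: int, sel_end: int
) -> Optional[int]:
    """Index of the segment whose [start, end) span contains the whole selection.

    Returns None if the selection starts before the first segment or crosses
    a segment boundary.
    """
    starts = [start for start, _ in spans]
    # Last segment that starts at or before the selection start.
    i = bisect.bisect_right(starts, sel_start) - 1
    if i >= 0 and sel_end <= spans[i][1]:
        return i
    return None
```

With the example above, selecting “自然语言” inside the first sentence returns index 0, the segment that the later similarity comparisons use.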
实际应用中,用户选择的待处理文本可能是一个专业术语或者一段描述性文本,若是一段专业术语,识别结果中的其它片段可能直接提到(即其它片段可能包含所述待处理文本),因此,可以查找包含待处理文本的片段,用以帮助用户理解所述待处理文本。
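On this basis, in step 202, querying the segment library according to the text to be processed and determining a target segment whose semantic relevance to the text to be processed satisfies a preset condition includes:
determining that the text to be processed meets a first preset condition;
determining at least one segment in the segment library, the segment containing the text to be processed;
determining a first target segment from the at least one segment.
Determining that the text to be processed meets the first preset condition includes at least one of the following:
determining that the number of characters in the text to be processed is less than or equal to a preset character-count threshold;
determining that the text to be processed matches a term in a preset term dictionary.
The preset character-count threshold may be set and stored in advance by developers. For example, if the recognized text is in Chinese, the threshold may be 6, since a Chinese term generally does not exceed six characters.
Here, the text to be processed matching a term in the preset term dictionary means that the dictionary contains a term identical to the text to be processed. Continuing the example above, if the text to be processed is "machine translation", it matches the term "machine translation" in the term dictionary.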
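In practice, the text to be processed selected by the user may be a technical term or a piece of descriptive text. If it is a technical term, other segments of the recognition result may mention it directly (i.e., other segments may contain it), so the method above can be used to find the segments containing the text to be processed and to determine the target segment from them. If the text to be processed is descriptive, determining the target segment from the segments that contain it is not recommended. Therefore, a term dictionary must be provided for judging whether the text to be processed is a technical term.
On this basis, in an embodiment, the method further includes:
performing term extraction on the bilingual data of a machine translation model, and generating the term dictionary based on the extracted terms.
Here, term extraction may use any method, such as text re-ranking (text-reranking), bootstrapping, or deep learning; the embodiments of the present application do not limit the term extraction method.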
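In practice, when there is only one segment (specifically, one segment containing the text to be processed), that segment can be used directly as the first target segment; when there are at least two such segments, one of them can be selected as the first target segment. Specifically, the at least two segments can be displayed to the user, who selects the first target segment from them.
Specifically, when there are at least two segments, determining the first target segment from the at least two segments includes:
obtaining the segment corresponding to the text to be processed in the recognition result;
determining the similarity between the segment corresponding to the text to be processed in the recognition result and each of the at least two segments;
sorting the at least two segments based on the similarity, and presenting the sorted at least two segments;
receiving a first selection instruction for the at least two segments, and determining the first target segment from the at least two segments according to the first selection instruction.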
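As a minimal sketch of the ranking-and-selection steps above, the following Python code sorts candidate segments by their similarity to the segment corresponding to the text to be processed and then resolves a selection. It rests on assumptions not stated in the text: the similarity is a placeholder character-bigram Jaccard score rather than a semantic model, and the first selection instruction is modelled as an index into the presented list. The names `char_bigram_jaccard`, `rank_candidates` and `select_target` are hypothetical.

```python
from typing import Callable, List, Sequence


def char_bigram_jaccard(a: str, b: str) -> float:
    """Placeholder similarity: Jaccard overlap of character bigrams.

    This stands in for the semantic-similarity model described in the text.
    """
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def rank_candidates(anchor: str, candidates: Sequence[str],
                    similarity: Callable[[str, str], float] = char_bigram_jaccard) -> List[str]:
    """Sort candidate segments by similarity to the anchor segment, most similar first."""
    return sorted(candidates, key=lambda seg: similarity(anchor, seg), reverse=True)


def select_target(ranked: Sequence[str], selection_index: int) -> str:
    """Resolve a selection instruction, modelled here as an index into the ranked list."""
    if not 0 <= selection_index < len(ranked):
        raise IndexError("selection does not refer to a presented segment")
    return ranked[selection_index]
```

Because Python's sort is stable, candidates with equal scores keep the order in which they were passed in, i.e., their original sentence order when taken from the segment library.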
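Here, the segment corresponding to the text to be processed in the recognition result refers to the segment of the recognition result that contains the text to be processed selected by the user.
Continuing the example above, where the text to be processed is "natural language", the corresponding segment in the recognition result is "Machine translation refers to the process of using a computer to convert one natural language into another natural language".
Here, determining the similarity between the segment corresponding to the text to be processed in the recognition result and each of the at least two segments includes:
computing the semantic similarity between the segment corresponding to the text to be processed in the recognition result and each of the at least two segments.
Any method of computing semantic similarity may be used in this embodiment without limitation. For example, the electronic device may have a processing module that uses a preset neural network model for semantic recognition to compute the semantic similarity.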
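In practice, the text to be processed selected by the user may be a technical term or a piece of descriptive text. If it is descriptive text, the above processing for technical terms cannot be used; instead, the target segment must be determined according to the semantics of the descriptive text.
On this basis, in an embodiment, determining a target segment whose relevance to the text to be processed satisfies a preset condition includes:
determining that the text to be processed meets a second preset condition;
determining at least one segment in the segment library, where the similarity between the segment and the segment corresponding to the text to be processed in the recognition result meets a preset similarity condition;
determining a second target segment from the at least one segment.
Here, when the text to be processed meets the second preset condition, it is regarded as a piece of descriptive text rather than a technical term.
A segment meets the preset similarity condition when its similarity to the segment corresponding to the text to be processed in the recognition result exceeds a preset similarity threshold. The preset similarity threshold may be set in advance by developers and stored in the electronic device.
Specifically, determining that the text to be processed meets the second preset condition includes at least one of the following:
determining that the number of characters in the text to be processed is greater than the preset character-count threshold;
determining that the text to be processed matches none of the terms in the preset term dictionary.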
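The threshold test that defines the candidates for the second target segment can be sketched as follows. This is a hypothetical Python illustration: segments are modelled as (position, text) pairs, the word-level Jaccard score stands in for whatever semantic similarity an implementation uses, and "exceeds" is read as strictly greater than.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, str]  # (position in the recognition result, segment text)


def word_jaccard(a: str, b: str) -> float:
    """Placeholder similarity: Jaccard overlap of lower-cased words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def segments_above_threshold(anchor: str, library: Sequence[Segment], threshold: float,
                             similarity: Callable[[str, str], float] = word_jaccard) -> List[Segment]:
    """Keep library segments whose similarity to the anchor strictly exceeds the threshold."""
    return [(pos, text) for pos, text in library if similarity(anchor, text) > threshold]
```

If exactly one segment survives the filter it becomes the second target segment; otherwise the survivors go through the same ranking and selection as in the term case.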
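In practice, when there is one such segment (specifically, a segment whose similarity to the segment corresponding to the text to be processed in the recognition result meets the preset similarity condition), that segment is used directly as the second target segment; when there are at least two such segments, one of them can be selected as the second target segment. Specifically, the at least two segments can be displayed to the user, who selects the second target segment from them.
Specifically, when there are at least two segments, determining the second target segment from the at least two segments includes:
obtaining the segment corresponding to the text to be processed in the recognition result;
determining the similarity between the segment corresponding to the text to be processed in the recognition result and each of the at least two segments;
sorting the at least two segments based on the similarity, and presenting the sorted at least two segments;
receiving a second selection instruction for the at least two segments, and determining the second target segment from the at least two segments according to the second selection instruction.
In the above examples, the at least two segments are presented in the order given by the similarity ranking. Presenting them ranked by similarity helps the user pick the most suitable target segment from the earlier content, thereby improving the accuracy of the selection.
Here, presenting the segments in ranked order may mean presenting the at least two segments one at a time through the human-machine interaction interface: after each segment is displayed, a feedback instruction is determined from the user's operation on the interface, and based on that instruction it is decided whether to select the current segment as the target segment or to view the next one (for example, the interface displays a Confirm button and a Next button; a click on Confirm selects the current segment as the target segment, and a click on Next displays the following segment). Alternatively, the at least two segments may be sorted and then presented all at once according to the ranking (e.g., as a list), and the user makes a selection from the presented ranking.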
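The one-at-a-time presentation with Confirm and Next buttons can be modelled as a loop over the ranked segments that consumes one feedback action per displayed segment. In this sketch the feedback values "confirm" and "next" are hypothetical identifiers for the two buttons, and stopping without a selection when feedback runs out or the last segment is passed is an assumption.

```python
from typing import Iterable, Optional, Sequence


def browse_ranked(ranked: Sequence[str], feedback: Iterable[str]) -> Optional[str]:
    """Present ranked segments one at a time and act on the user's feedback.

    Returns the confirmed segment, or None if the user pages past the last
    segment or no further feedback is available.
    """
    actions = iter(feedback)
    for segment in ranked:
        # A real interface would display `segment` here and wait for a button click.
        action = next(actions, None)
        if action == "confirm":
            return segment
        if action != "next":
            return None  # no (or unrecognized) feedback: stop without selecting
    return None
```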
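In the above examples, the data processing method is applied to an electronic device; the following describes how the electronic device receives the corresponding selection instruction (the first selection instruction or the second selection instruction).
The electronic device may be a server. Accordingly, receiving the corresponding selection instruction may mean that the user performs a selection operation through the human-machine interaction interface of the user's terminal, the terminal determines the corresponding selection instruction and sends it to the server, and the server thus receives the selection instruction.
The electronic device may also be a server that has, or is connected to, a human-machine interaction interface. Accordingly, receiving the corresponding selection instruction may mean that the user operates through the server's human-machine interaction interface, and the server determines the operation the user performed through the interface, i.e., receives the corresponding selection instruction.
The electronic device may also be the user's terminal. Accordingly, receiving the corresponding selection instruction may mean that the user's terminal determines the operation the user performed through the human-machine interaction interface, i.e., the user's terminal receives the corresponding selection instruction.
In the embodiments of the present application, in a simultaneous interpretation scenario the voice data keeps changing as the speech proceeds, so the recognition result also keeps changing. Since the segment library stores segments determined from the recognition result, the segments in the library also keep changing with the voice data.
In practice, to make it easy for the user to find speech content related to the text to be processed, a segment library updated from the recognition result is provided; the target segment related to the text to be processed can be determined from this library, making it easy for the user to look back at earlier speech content.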
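On this basis, in an embodiment, the method further includes:
obtaining the voice data, and performing text recognition on it to obtain the recognition result;
segmenting the recognition result to obtain a segmentation result, the segmentation result including at least one segment;
updating the segment library according to the at least one segment, where each of the at least one segment is stored in the segment library together with its position information in the recognition result.
That is, segmenting the recognition result yields at least one segment of the speech content, so the segments obtained by querying the segment library with the text to be processed are the speech content related to that text.
In practice, segmenting the recognition result may include performing semantic analysis on the recognition result, segmenting it according to the semantic analysis result to obtain at least one segment, and taking the at least one segment as the segmentation result.
Because the segment library is updated according to the at least one segment, it can include every segment of the speech content. Thus, querying the segment library with the text to be processed yields the target segment, i.e., the speech content related to the text to be processed.
Here, the semantic analysis may be implemented with a preset semantic analysis model; for example, the electronic device may have a processing module that uses such a model. The model may be a Latent Semantic Analysis (LSA) model, a probabilistic Latent Semantic Analysis (pLSA) model, or the like; other semantic analysis models may of course also be used, without limitation here.
When the segmentation result is stored in the segment library, each segment may be stored together with its position information in the recognition result (here, the position information can be understood as the order of the sentences).
For example, the recognition result includes: "Machine translation refers to the process of using a computer to convert one natural language into another natural language. It is a branch of computational linguistics and one of the ultimate goals of artificial intelligence. Meanwhile, machine translation also has important practical value." Segmenting this recognition result may yield a segmentation result containing the following segments:
Segment A: Machine translation refers to the process of using a computer to convert one natural language into another natural language;
Segment B: It is a branch of computational linguistics and one of the ultimate goals of artificial intelligence;
Segment C: Machine translation also has important practical value.
Segments A, B and C are stored in the segment library in the order in which their sentences appear in the recognition result.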
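A minimal in-memory segment library along these lines is sketched below. It is an illustration under stated assumptions, not the prescribed implementation: segmentation here simply splits after sentence-final punctuation instead of using a semantic analysis model such as LSA or pLSA, so, unlike segment C above, the third segment keeps its leading "Meanwhile,". `SegmentLibrary`, `add_text` and `containing` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Split point after sentence-final punctuation (ASCII and full-width), consuming following spaces.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?。！？])\s*")


@dataclass
class SegmentLibrary:
    """Hypothetical segment library.

    Each entry is (position, segment), where position is the sentence order
    in the recognition result, used later to jump back to the segment.
    """
    segments: List[Tuple[int, str]] = field(default_factory=list)

    def add_text(self, recognized: str) -> None:
        """Segment newly recognized text and append it after the existing segments."""
        for piece in (p.strip() for p in _SENTENCE_BREAK.split(recognized)):
            if piece:
                self.segments.append((len(self.segments), piece))

    def containing(self, text: str) -> List[Tuple[int, str]]:
        """Return the (position, segment) entries whose text contains `text`."""
        return [(pos, seg) for pos, seg in self.segments if text in seg]
```

The stored position is what the method uses to return from the text to be processed to the target segment; `containing` corresponds to the lookup used when the selection is a term.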
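Note that in practice the recognition result may correspond to at least one language; that is, the recognition result may be recognized text in a first language, a second language, ..., an N-th language, where N is greater than or equal to 1. Recognized text in different languages is presented to users of different languages.
Correspondingly, each term in the preset term dictionary corresponds to at least one language, and the language of a term is the same as the language of the recognized text, so that, for recognized text in any language, the above data processing method can be used to look back through earlier text.
In practice, a text recognition method for voice data is provided so that recognized text in at least one language can be obtained and provided to users of different languages.
On this basis, in an embodiment, the method further includes:
performing speech recognition on the voice data to obtain recognized text in a first language, the first language being the same as the language of the voice data.
In another embodiment, to obtain recognized text in other languages, the method further includes:
translating the recognized text in the first language with a preset translation model to obtain recognized text in at least one other language.
The translation model is used to translate text in one language into text in another language.
In practice, when the data processing method is applied to an electronic device that is a server, the server can obtain the voice data, perform text recognition to obtain the recognition result, and send the recognition result to the user's terminal, so that the user can browse the recognition result on the terminal. Here, the user can select a language through the terminal, and the server provides the recognized text in the selected language. To provide a recognition result in the language the user needs, the recognition result in the corresponding language can be obtained according to an acquisition request sent by the user through the terminal.
On this basis, in an embodiment where the electronic device is a server, the method may further include: receiving an acquisition request sent by a terminal, the acquisition request being used to obtain the recognition result and including at least a target language;
obtaining the recognized text corresponding to the target language from the recognized text in at least one language, and sending it to the terminal as the recognition result.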
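The request handling in this server embodiment amounts to a lookup keyed by the target language. The sketch below assumes the recognized texts are already held in a dictionary keyed by language code; the `AcquisitionRequest` type, the language codes, and raising `LookupError` for an unavailable language are illustrative assumptions, since the text does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class AcquisitionRequest:
    """Hypothetical acquisition request; only the target language is modelled."""
    target_language: str  # e.g. "en" or "zh" (language codes are an assumption)


def handle_request(request: AcquisitionRequest, texts_by_language: Dict[str, str]) -> str:
    """Return the recognized text in the requested target language."""
    try:
        return texts_by_language[request.target_language]
    except KeyError:
        raise LookupError(f"no recognized text in {request.target_language!r}") from None
```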
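Here, the terminal refers to the user's terminal. After receiving the recognition result, the user's terminal presents it; while browsing, the user may select the text to be processed. The user's terminal determines the text to be processed and sends it to the server, which processes it with the above data processing method, and the determined target segment is presented to the user through the user's terminal.
In practice, the electronic device may also be a server that is connected to, or provided with, a human-machine interaction interface. The user sets the language in advance through that interface; the server obtains the voice data, performs text recognition to obtain the recognition result in the preset language, and presents the recognition result through the interface. The server may also be connected to a display screen, in which case it uses screen-casting technology to project the recognition result onto the display screen. After determining the text to be processed, the server processes it with the above data processing method, so that the target segment most relevant to the text selected by the user (i.e., the first or second target segment) is returned directly to the user, helping the user understand the current content.
In practice, the electronic device may also be the user's terminal. The user may set the language in advance through the terminal's human-machine interaction interface; the terminal performs text recognition on the voice data to obtain the recognition result in the preset language and presents it through the interface. After determining the text to be processed selected by the user, the terminal processes it with the above data processing method, so that the target segment most relevant to that text (i.e., the first or second target segment) is presented directly to the user, helping the user understand the current content.
The method provided by the embodiments of the present application can be applied to simultaneous interpretation scenarios, such as simultaneous interpretation at a conference. In such scenarios, the above data processing method embeds an automatic history-text backtracking function in the simultaneous interpretation process (specifically, determining the target segment, such as the first or second target segment, and returning from the currently viewed text to be processed to the target segment to show it to the user), so that the target segment related to the text selected by the user is shown to the user to help the user understand the content.
It should be understood that the order in which the steps are described in the above embodiments does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
In the data processing method provided by the embodiments of the present application, a text to be processed is determined, the text to be processed being any piece of text in a recognition result, where the recognition result is determined based on voice data and is presented while the voice data is being played; a segment library is queried according to the text to be processed to determine a target segment whose semantic relevance to the text satisfies a preset condition, the target segment being semantically related to the text to be processed; and, according to the position information of the target segment in the recognition result, the display returns from the text to be processed to the target segment. The segment library includes at least one segment and the position information of each segment in the recognition result, and its segments change as the voice data changes. In this way, the target segment related to the text selected by the user can be shown to the user, helping the user understand the speech and improving the user experience. This addresses the following problem: during simultaneous interpretation, a user who encounters a term or passage they do not understand wants to scroll back through earlier content and understand the current term or text from related content mentioned earlier (i.e., the target segment).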
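FIG. 3 is another schematic flowchart of the data processing method according to an embodiment of the present application; the method is applied to a simultaneous interpretation scenario. As shown in FIG. 3, the method includes:
Step 301: obtain voice data, perform text recognition on it to obtain a recognition result, and update the segment library according to the recognition result.
Here, obtaining the voice data and performing text recognition on it to obtain the recognition result in step 301 includes:
during simultaneous interpretation, obtaining the voice data; performing text recognition on it to obtain recognized text in a first language; and translating the recognized text in the first language to obtain recognized text in other languages.
The procedure is the same for recognized text in any language: each recognized or translated sentence is saved to the segment library (sentences can be split at punctuation such as periods, colons and question marks).
Here, the segment library may store the sentences in the form of a list (List), a container that holds a variable-length sequence.
As simultaneous interpretation proceeds and the voice data keeps changing, the List holds all previous history text. Sentences are stored one after another, in the same order as in the original speech.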
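Because recognized text arrives incrementally during simultaneous interpretation, a sentence may be split across two recognition updates. One possible way to keep the List in original order while appending only complete sentences is sketched below. The chunked input, the buffering of the trailing fragment and the class name `SentenceList` are assumptions; the punctuation set follows the examples given (period, colon, question mark) plus the exclamation mark.

```python
import re
from typing import List


class SentenceList:
    """Hypothetical append-only List of history sentences for one language.

    Only complete sentences are appended; a trailing fragment is buffered
    until a later chunk completes it. The list index is the sentence position.
    """

    _END = re.compile(r"[.:?!。：？！]")  # sentence-ending punctuation, ASCII and full-width

    def __init__(self) -> None:
        self.sentences: List[str] = []
        self._pending = ""

    def feed(self, chunk: str) -> None:
        """Add a newly recognized chunk and move any completed sentences into the List."""
        self._pending += chunk
        start = 0
        for match in self._END.finditer(self._pending):
            sentence = self._pending[start:match.end()].strip()
            if sentence:
                self.sentences.append(sentence)
            start = match.end()
        self._pending = self._pending[start:]
```

This simple version would also break at the period of a decimal number such as "3.5"; a production splitter would need extra rules for such cases.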
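Step 302: determine the text to be processed T selected by the user, and judge whether T is a term or descriptive text.
Here, the specific method of judging whether the text to be processed T is a term or a piece of descriptive text may include:
if the length of T is greater than or equal to a preset threshold (set to 7 here), T is regarded as descriptive text;
if the length of T is less than 7, the term dictionary is looked up with T; if T is present in the term dictionary (i.e., T matches, or is identical to, a term in the dictionary), T is determined to be a term; otherwise, T is determined to be descriptive text.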
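The Step 302 decision can be written as a short function. In this sketch the term dictionary is assumed to be an in-memory set, and length is counted with `len()`, i.e., in characters, which matches the character-based threshold of 7 for Chinese text; `classify_selection` is a hypothetical name.

```python
from typing import AbstractSet


def classify_selection(text: str, term_dictionary: AbstractSet[str],
                       length_threshold: int = 7) -> str:
    """Return "term" or "descriptive" following the Step 302 rule."""
    if len(text) >= length_threshold:
        return "descriptive"  # long selections are treated as descriptive text
    # Short selections count as terms only when they appear in the dictionary.
    return "term" if text in term_dictionary else "descriptive"
```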
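Here, the method of obtaining the term dictionary is not limited. In one embodiment, terms may be extracted from the bilingual corpus of machine translation and the term dictionary generated from the extracted terms; any term extraction method may be used, without limitation here. In another embodiment, the term dictionary may be preset by developers.
Step 303: if T is determined to be a term, perform a first operation to determine the target segment from the recognition result.
Here, the first operation includes:
Step 3031: using T as a query, traverse the List to find whether T occurs in some segment (assumed here to be a sentence) of the List;
Step 3032: if no sentence containing T is found in the List, send a prompt message to inform the user that the earlier content contains no information related to this term;
Step 3033: if exactly one sentence containing T is found in the List, use the sentence at that position as the target segment and return it directly to the user (specifically, return from the current position of the text to be processed to the sentence containing T for presentation). Alternatively, multiple sentences in the List may contain T. In that case, denoting the set of these sentences as RList, compute the similarity between the sentence in which T appears and each sentence in RList, sort by similarity in descending order, and return the sentences to the user in ranked order; the user then selects the sentence to be located as the target segment.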
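Steps 3031 to 3033 can be sketched in Python as follows. The similarity function is passed in (for example, the RNN-LSTM-Encoder with cosine similarity described in the text, or any placeholder), `history` is assumed to contain only earlier sentences so that the sentence being read does not match itself, and `LookupResult` and `first_operation` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class LookupResult:
    prompt: str = ""                                    # message shown when nothing is found
    positions: List[int] = field(default_factory=list)  # List indices, best candidate first


def first_operation(term: str, anchor: str, history: Sequence[str],
                    similarity: Callable[[str, str], float]) -> LookupResult:
    """Handle a selection classified as a term; `anchor` is the sentence containing it."""
    # Step 3031: use the term as the query and scan the List for sentences containing it.
    hits = [i for i, sentence in enumerate(history) if term in sentence]
    # Step 3032: nothing found, so return a prompt instead of a position.
    if not hits:
        return LookupResult(prompt=f"Earlier content has no information about '{term}'.")
    # Step 3033: a single hit is the target segment; several hits (the RList)
    # are ranked by similarity to the anchor sentence for the user to choose from.
    if len(hits) > 1:
        hits.sort(key=lambda i: similarity(anchor, history[i]), reverse=True)
    return LookupResult(positions=hits)
```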
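Here, a neural network model for semantic similarity computation may be used, such as a recurrent neural network (RNN) / long short-term memory (LSTM) encoder (RNN-LSTM-Encoder). The RNN-LSTM-Encoder obtains a sentence representation of the sentence containing T and a sentence representation of each sentence in RList, and the similarity is then computed with the cosine similarity (Cos-Similarity) algorithm.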
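The cosine step, cos(u, v) = (u · v) / (|u| |v|), can be illustrated without a neural network. In the sketch below, `encode` is a stand-in that turns a sentence into a sparse bag-of-words vector; it is not an RNN-LSTM encoder, and only the interface (sentence in, vector out) matches the text. `cosine_similarity` returns 0 when either vector is empty, which is an assumption for the degenerate case.

```python
import math
from collections import Counter
from typing import Dict


def encode(sentence: str) -> Dict[str, float]:
    """Stand-in sentence representation: counts of lower-cased words."""
    return {word: float(count) for word, count in Counter(sentence.lower().split()).items()}


def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Swapping `encode` for a trained encoder that outputs dense vectors leaves the ranking logic unchanged, which is why the two steps are kept separate here.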
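Step 304: if T is determined to be descriptive text, perform a second operation to determine the target segment from the recognition result.
Here, the second operation does not locate T in the List (because non-term descriptive text is generally long and hard to locate precisely).
The second operation includes: computing the similarity between T and each sentence in the List, sorting by similarity in descending order, and returning the sentences to the user in ranked order; the user then selects the sentence to be located as the target segment.
Here, the RNN-LSTM-Encoder can be used to obtain the sentence representation of the sentence in which T appears and the sentence representation of each sentence in the List, and the similarity is computed with the Cos-Similarity algorithm.
The method of the FIG. 3 embodiment can be applied to the electronic device to which the method of the FIG. 2 embodiment is applied; the electronic device may be a server, a user's terminal, or the like. How the server and the user's terminal determine the relevant information (such as the text to be processed and the user's selection) has been described in detail for the method of FIG. 2 and is not repeated here.
It should be understood that the order in which the steps are described in the above embodiments does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.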
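FIG. 4 is a schematic flowchart of a method for determining a target segment according to an embodiment of the present application. As shown in FIG. 4, the method for determining a target segment includes:
Step 401: obtain the text to be processed selected by the user, and judge whether it meets the first preset condition or the second preset condition; if it meets the first preset condition, perform step 402; if it meets the second preset condition, perform step 403.
Here, the first preset condition includes at least one of the following:
the number of characters in the text to be processed is less than or equal to a preset character-count threshold;
the text to be processed matches a term in a preset term dictionary.
Here, the second preset condition includes at least one of the following:
the number of characters in the text to be processed is greater than the preset character-count threshold;
the text to be processed matches none of the terms in the preset term dictionary.
For the method of generating the term dictionary, refer to the method described for FIG. 2; it is not repeated here.
Step 402: determine the segments in the segment library; if there is one segment, use it directly as the first target segment; if there is more than one, determine the similarity between the segment corresponding to the text to be processed in the recognition result and each of the at least two segments, and select the first target segment from the at least two segments based on the similarity.
Here, the segments determined in step 402 are the segments in the segment library that contain the text to be processed selected by the user.
Here, selecting the target segment based on the similarity includes:
sorting the at least two segments based on the similarity, and presenting the sorted at least two segments;
receiving a first selection instruction for the at least two segments, and determining the first target segment from the at least two segments according to the first selection instruction.
Here, the segment corresponding to the text to be processed in the recognition result refers to the segment of the recognition result that contains the text to be processed selected by the user.
Step 403: determine the segments in the segment library; if there is one segment, use it directly as the second target segment; if there is more than one, select one of the at least two segments as the second target segment.
Here, the segments determined in step 403 are the segments in the segment library whose similarity to the segment corresponding to the text to be processed in the recognition result meets the preset similarity condition; specifically, that similarity exceeds a preset similarity threshold.
The preset similarity threshold may be set in advance by developers and stored in the electronic device.
Specifically, selecting one of the at least two segments as the second target segment includes:
obtaining the segment corresponding to the text to be processed in the recognition result;
determining the similarity between the segment corresponding to the text to be processed in the recognition result and each of the at least two segments;
sorting the at least two segments based on the similarity, and presenting the sorted at least two segments;
receiving a second selection instruction for the at least two segments, and determining the second target segment from the at least two segments according to the second selection instruction.
When the method for determining the target segment is applied to a server, presenting the sorted at least two segments in steps 402 and 403 may be done through the server's own human-machine interaction interface; correspondingly, receiving the corresponding selection instruction (the first or second selection instruction) means receiving the user's operation on that interface to determine the selection instruction. Alternatively, the server may send the at least two segments to the user's terminal for presentation through the terminal's human-machine interaction interface; the terminal determines the user's operation on the interface to determine the selection instruction and sends it to the server, which thereby receives the selection instruction.
When the method for determining the target segment is applied to the user's terminal, presenting the sorted at least two segments in steps 402 and 403 may be done through the terminal's human-machine interaction interface; correspondingly, receiving the corresponding selection instruction (the first or second selection instruction) means determining the user's operation on the interface, from which the user's terminal determines the selection instruction.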
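To implement the data processing method of the embodiments of the present application, an embodiment of the present application also provides a data processing apparatus. FIG. 5 is a schematic structural diagram of the data processing apparatus according to an embodiment of the present application. As shown in FIG. 5, the data processing apparatus includes:
a determining unit 51, configured to determine a text to be processed, the text to be processed being a piece of text in a recognition result, where the recognition result is determined based on voice data and is presented while the voice data is being played;
a first processing unit 52, configured to query a segment library according to the text to be processed and determine a target segment whose semantic relevance to the text to be processed satisfies a preset condition, the target segment being semantically related to the text to be processed;
a second processing unit 53, configured to return, according to the position information of the target segment in the recognition result, from the text to be processed to the target segment for presentation;
where the segment library includes at least one segment and the position information of each segment in the recognition result, and the segments in the segment library change as the voice data changes.
In an embodiment, the first processing unit 52 is configured to determine that the text to be processed meets a first preset condition;
determine at least one segment in the segment library, the segment containing the text to be processed;
and determine a first target segment from the at least one segment.
In an embodiment, the first processing unit 52 is configured to determine that the text to be processed meets the first preset condition by at least one of the following:
determining that the number of characters in the text to be processed is less than or equal to a preset character-count threshold;
determining that the text to be processed matches a term in a preset term dictionary.
Here, there may be at least two segments;
when there are at least two segments, the first processing unit 52 is configured to obtain the segment corresponding to the text to be processed in the recognition result;
determine the similarity between the segment corresponding to the text to be processed in the recognition result and each of the at least two segments;
sort the at least two segments based on the similarity, and present the sorted at least two segments;
and receive a first selection instruction for the at least two segments, and determine the first target segment from the at least two segments according to the first selection instruction.
In an embodiment, the first processing unit 52 is configured to determine that the text to be processed meets a second preset condition;
determine at least one segment in the segment library, where the similarity between the segment and the segment corresponding to the text to be processed in the recognition result meets a preset similarity condition;
and determine a second target segment from the at least one segment.
In an embodiment, the first processing unit 52 is configured to determine that the text to be processed meets the second preset condition by at least one of the following:
determining that the number of characters in the text to be processed is greater than the preset character-count threshold;
determining that the text to be processed matches none of the terms in the preset term dictionary.
Here, there may be at least two segments;
when there are at least two segments, the first processing unit 52 is configured to obtain the segment corresponding to the text to be processed in the recognition result;
determine the similarity between the segment corresponding to the text to be processed in the recognition result and each of the at least two segments;
sort the at least two segments based on the similarity, and present the sorted at least two segments;
and receive a second selection instruction for the at least two segments, and determine the second target segment from the at least two segments according to the second selection instruction.
Here, the similarity between the segment and the segment corresponding to the text to be processed in the recognition result meeting the preset similarity condition includes:
the similarity between the segment and the segment corresponding to the text to be processed in the recognition result exceeding a preset similarity threshold.
In an embodiment, the apparatus further includes a third processing unit, configured to obtain the voice data and perform text recognition on it to obtain the recognition result;
segment the recognition result to obtain a segmentation result, the segmentation result including at least one segment;
and update the segment library according to the at least one segment, where each of the at least one segment is stored in the segment library together with its position information in the recognition result.
In an embodiment, the apparatus further includes a fourth processing unit, configured to perform term extraction on the bilingual data of a machine translation model and generate the term dictionary based on the extracted terms.
In practical applications, the determining unit 51, the first processing unit 52, the second processing unit 53, the third processing unit and the fourth processing unit may all be implemented by a processor in the electronic device (such as a server or a user's terminal), for example a central processing unit (CPU), a digital signal processor (DSP), a microcontroller unit (MCU), or a field-programmable gate array (FPGA).
It should be noted that when the apparatus provided in the above embodiment performs data processing, the division into the above program modules is used only as an example. In practical applications, the above processing may be assigned to different program modules as needed; that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the apparatus provided in the above embodiment and the data processing method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.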
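Based on the hardware implementation of the above device, an embodiment of the present application also provides an electronic device. FIG. 6 is a schematic diagram of the hardware structure of the electronic device according to an embodiment of the present application. As shown in FIG. 6, the electronic device 60 includes a memory 63, a processor 62, and a computer program stored in the memory 63 and executable on the processor 62; when the processor 62 of the electronic device executes the program, it implements the method provided by one or more of the above technical solutions on the electronic device side.
Specifically, when the processor 62 of the electronic device 60 executes the program, it implements: determining a text to be processed, the text to be processed being a piece of text in a recognition result, where the recognition result is determined based on voice data and is presented while the voice data is being played;
querying a segment library according to the text to be processed, and determining a target segment whose semantic relevance to the text to be processed satisfies a preset condition, the target segment being semantically related to the text to be processed;
according to the position information of the target segment in the recognition result, returning from the text to be processed to the target segment for presentation; where
the segment library includes at least one segment and the position information of each segment in the recognition result, and the segments in the segment library change as the voice data changes.
It should be noted that the specific steps implemented when the processor 62 of the electronic device 60 executes the program have been described in detail above and are not repeated here.
It can be understood that the electronic device further includes a communication interface 61, and the components of the electronic device are coupled together through a bus system 64. The bus system 64 is configured to implement connection and communication between these components. In addition to a data bus, the bus system 64 includes a power bus, a control bus and a status signal bus.
It can be understood that the memory 63 in this embodiment may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), ferromagnetic random access memory (FRAM), flash memory, magnetic surface memory, an optical disc, or compact disc read-only memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory may be random access memory (RAM), used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and Direct Rambus random access memory (DRRAM). The memory described in the embodiments of the present application is intended to include, but is not limited to, these and any other suitable types of memory.
The method disclosed in the above embodiments of the present application may be applied to, or implemented by, the processor 62. The processor 62 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 62 or by instructions in the form of software. The processor 62 may be a general-purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like, and may implement or execute the methods, steps and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium located in the memory; the processor 62 reads the information in the memory and completes the steps of the foregoing method in combination with its hardware.
An embodiment of the present application also provides a storage medium, specifically a computer storage medium, more specifically a computer-readable storage medium, storing computer instructions, i.e., a computer program, which, when executed by a processor, implement the method provided by one or more of the above technical solutions on the electronic device side.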
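In the several embodiments provided in the present application, it should be understood that the disclosed method and smart device may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and there may be other ways of division in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
A person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments can be completed by hardware related to program instructions. The foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The foregoing storage medium includes various media that can store program code, such as a removable storage device, ROM, RAM, a magnetic disk or an optical disc.
Alternatively, if the above integrated unit of the present application is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present application. The foregoing storage medium includes various media that can store program code, such as a removable storage device, ROM, RAM, a magnetic disk or an optical disc.
It should be noted that "first", "second" and the like are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
In addition, the technical solutions described in the embodiments of the present application may be combined arbitrarily as long as they do not conflict.
The above are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application.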

Claims (12)

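  1. A data processing method, comprising:
    determining a text to be processed, the text to be processed being a piece of text in a recognition result, wherein the recognition result is determined based on voice data and is presented while the voice data is being played;
    querying a segment library according to the text to be processed, and determining a target segment whose semantic relevance to the text to be processed satisfies a preset condition;
    according to position information of the target segment in the recognition result, returning from the text to be processed to the target segment for presentation; wherein
    the segment library comprises at least one segment and position information of each segment in the recognition result, and the segments in the segment library change as the voice data changes.
  2. The method according to claim 1, wherein querying the segment library according to the text to be processed and determining the target segment whose semantic relevance to the text to be processed satisfies the preset condition comprises:
    determining that the text to be processed meets a first preset condition;
    determining at least one segment in the segment library, the segment containing the text to be processed;
    determining a first target segment from the at least one segment.
  3. The method according to claim 2, wherein determining that the text to be processed meets the first preset condition comprises at least one of the following:
    determining that the number of characters in the text to be processed is less than or equal to a preset character-count threshold;
    determining that the text to be processed matches a term in a preset term dictionary.
  4. The method according to claim 2, wherein there are at least two segments, and determining the first target segment from the at least two segments comprises:
    obtaining the segment corresponding to the text to be processed in the recognition result;
    determining the similarity between the segment corresponding to the text to be processed in the recognition result and each of the at least two segments;
    sorting the at least two segments based on the similarity, and presenting the sorted at least two segments;
    receiving a first selection instruction for the at least two segments, and determining the first target segment from the at least two segments according to the first selection instruction.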
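  5. The method according to claim 1, wherein determining the target segment whose relevance to the text to be processed satisfies the preset condition comprises:
    determining that the text to be processed meets a second preset condition;
    determining at least one segment in the segment library, wherein the similarity between the segment and the segment corresponding to the text to be processed in the recognition result meets a preset similarity condition;
    determining a second target segment from the at least one segment.
  6. The method according to claim 5, wherein determining that the text to be processed meets the second preset condition comprises at least one of the following:
    determining that the number of characters in the text to be processed is greater than a preset character-count threshold;
    determining that the text to be processed matches none of the terms in a preset term dictionary.
  7. The method according to claim 5, wherein, when there are at least two segments, determining the second target segment from the at least two segments comprises:
    obtaining the segment corresponding to the text to be processed in the recognition result;
    determining the similarity between the segment corresponding to the text to be processed in the recognition result and each of the at least two segments;
    sorting the at least two segments based on the similarity, and presenting the sorted at least two segments;
    receiving a second selection instruction for the at least two segments, and determining the second target segment from the at least two segments according to the second selection instruction.
  8. The method according to claim 1, wherein the method further comprises:
    obtaining the voice data, and performing text recognition on the voice data to obtain the recognition result;
    segmenting the recognition result to obtain a segmentation result, the segmentation result comprising at least one segment;
    updating the segment library according to the at least one segment, wherein each of the at least one segment is stored in the segment library together with its position information in the recognition result.
  9. The method according to claim 3 or 6, wherein the method further comprises:
    performing term extraction on bilingual data of a machine translation model, and generating the term dictionary based on the extracted terms.
  10. A data processing apparatus, comprising:
    a determining unit, configured to determine a text to be processed, the text to be processed being a piece of text in a recognition result, wherein the recognition result is determined based on voice data and is presented while the voice data is being played;
    a first processing unit, configured to query a segment library according to the text to be processed and determine a target segment whose semantic relevance to the text to be processed satisfies a preset condition;
    a second processing unit, configured to return, according to position information of the target segment in the recognition result, from the text to be processed to the target segment for presentation; wherein
    the segment library comprises at least one segment and position information of each segment in the recognition result, and the segments in the segment library change as the voice data changes.
  11. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 9.
  12. A storage medium storing computer instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 9.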
PCT/CN2019/119268 2019-11-18 2019-11-18 数据处理方法、装置、电子设备和存储介质 WO2021097629A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
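PCT/CN2019/119268 WO2021097629A1 (zh) 2019-11-18 2019-11-18 Data processing method and apparatus, electronic device and storage medium
CN201980100711.7A CN114430832A (zh) 2019-11-18 2019-11-18 Data processing method and apparatus, electronic device and storage medium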

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/119268 WO2021097629A1 (zh) 2019-11-18 2019-11-18 Data processing method and apparatus, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2021097629A1 true WO2021097629A1 (zh) 2021-05-27

Family

ID=75980072

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/119268 WO2021097629A1 (zh) 2019-11-18 2019-11-18 Data processing method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN114430832A (zh)
WO (1) WO2021097629A1 (zh)



Also Published As

Publication number Publication date
CN114430832A (zh) 2022-05-03


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19953025

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19953025

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 281022)

122 Ep: pct application non-entry in european phase

Ref document number: 19953025

Country of ref document: EP

Kind code of ref document: A1