CN109887492B - Data processing method and device and electronic equipment


Info

Publication number
CN109887492B
Authority
CN
China
Prior art keywords
text
punctuation
word segmentation
symbol
spliced
Prior art date
Legal status
Active
Application number
CN201811497640.2A
Other languages
Chinese (zh)
Other versions
CN109887492A (en)
Inventor
郑宏
Current Assignee
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201811497640.2A
Publication of CN109887492A
Application granted
Publication of CN109887492B

Abstract

The embodiment of the invention provides a data processing method and apparatus, and an electronic device, where the method includes: acquiring a current speech recognition text; splicing the current speech recognition text with the previous N output texts to obtain a spliced text, where N is a positive integer; adding punctuation to the spliced text; and extracting the data other than the previous N output texts from the punctuated spliced text as the current output text and outputting it. Because the texts before and after a pause are combined, the punctuation at the end of the text before the pause can be determined correctly, which solves the problem of punctuation errors caused by pauses and improves the accuracy of punctuation addition.

Description

Data processing method and device and electronic equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, and an electronic device.
Background
Artificial intelligence is a very broad discipline composed of different fields such as machine learning and computer vision. In general, one of the main goals of artificial intelligence research is to enable machines to perform complex tasks that would normally require human intelligence. Since the birth of artificial intelligence, its theories and technologies have matured steadily and its fields of application have kept expanding, for example machine translation, which translates Chinese into English, English into Chinese, and so on.
As machine translation technology continues to mature, machine-based simultaneous interpretation has emerged, which may include speech recognition and machine translation, as shown in FIG. 1. Speech recognition comprises several stages: acquiring voice data, VAD (Voice Activity Detection) sentence segmentation, speech recognition, and text punctuation. VAD sentence segmentation cuts the speech into multiple speech segments according to silent intervals, and text punctuation adds punctuation marks to the speech recognition text corresponding to each speech segment; for example, adding punctuation to the speech recognition text 'hello everyone my name is Li Lei' yields 'Hello everyone, my name is Li Lei'.
A user may pause while speaking a sentence at a position where no punctuation belongs. For example, the user first says 'we are eagerly expecting', pauses for a moment, and then says 'this new technology'. When the pause exceeds a threshold, the sentence is divided into multiple sections; for example, a 20 ms pause between the two parts segments the speech of the complete sentence into two speech segments, one corresponding to 'we are eagerly expecting' and one corresponding to 'this new technology'. When punctuation is then added separately to the speech recognition text of each speech segment, punctuation may be added at the end of each segment; for example, a period may be added after 'we are eagerly expecting', yielding 'we are eagerly expecting.'. Obviously, no punctuation should follow 'we are eagerly expecting' here, so a punctuation addition error results.
Disclosure of Invention
The embodiment of the invention provides a data processing method to improve the accuracy of punctuation addition.
Correspondingly, the embodiment of the invention also provides a data processing apparatus and an electronic device to ensure the implementation and application of the above method.
In order to solve the above problem, an embodiment of the present invention discloses a data processing method, which specifically includes: acquiring a current speech recognition text; splicing the current speech recognition text with the previous N output texts to obtain a spliced text, where N is a positive integer; adding punctuation to the spliced text; and extracting the data other than the previous N output texts from the punctuated spliced text as the current output text and outputting it.
Optionally, adding punctuation to the spliced text includes: performing word segmentation on the spliced text to obtain a plurality of word segmentation segments; determining the symbol identifier corresponding to each word segmentation segment according to a symbol matching model; and, if the symbol identifier of a word segmentation segment is a set identifier, adding the symbol identifier after the characters corresponding to that segment in the spliced text.
Optionally, the symbol matching model includes a first symbol matching model and a second symbol matching model, and determining the symbol identifier corresponding to each word segmentation segment according to the symbol matching model includes: inputting the word segmentation segments into the first symbol matching model in sequence to obtain first probability information of each segment for each symbol identifier; inputting the word segmentation segments into the second symbol matching model in sequence to obtain second probability information of each segment for each symbol identifier; and, for a word segmentation segment, determining the symbol identifier corresponding to the segment according to the first probability information and the second probability information of the symbol identifiers for that segment.
Optionally, determining the symbol identifier corresponding to the word segmentation segment according to the first probability information and the second probability information of the symbol identifiers for the segment includes: calculating first variance information from the first probability information of the symbol identifiers for the segment; calculating second variance information from the second probability information of the symbol identifiers for the segment; if the first variance information is larger than the second variance information, selecting the symbol identifier with the largest first probability information as the symbol identifier of the segment; and, if the second variance information is larger than the first variance information, selecting the symbol identifier with the largest second probability information as the symbol identifier of the segment.
Optionally, the method further comprises: if no punctuation exists at the end of the punctuated spliced text and the current speech recognition text is the last speech recognition text of the voice data, adding a set punctuation at the end of the punctuated spliced text; and, if punctuation exists at the end of the punctuated spliced text and the current speech recognition text is not the last speech recognition text of the voice data, deleting the punctuation at the end of the punctuated spliced text.
Optionally, splicing the current speech recognition text with the previous N output texts to obtain a spliced text includes: if N is 1, acquiring the last text segment of the previous output text, where the last text segment is the text after the last punctuation in the previous output text; and splicing the current speech recognition text after that last text segment to obtain the spliced text.
Optionally, the method is applied to the field of simultaneous interpretation.
The embodiment of the invention also discloses a data processing apparatus, which specifically includes: a text acquisition module, configured to acquire a current speech recognition text; a text splicing module, configured to splice the current speech recognition text with the previous N output texts to obtain a spliced text, where N is a positive integer; and a punctuation adding module, configured to add punctuation to the spliced text, extract the data other than the previous N output texts from the punctuated spliced text as the current output text, and output it.
Optionally, the punctuation adding module comprises: a word segmentation submodule, configured to perform word segmentation on the spliced text to obtain a plurality of word segmentation segments; a punctuation determination submodule, configured to determine the symbol identifier corresponding to each word segmentation segment according to the symbol matching model; and a symbol adding submodule, configured to add the symbol identifier after the characters corresponding to a word segmentation segment in the spliced text if the segment's symbol identifier is a set identifier.
Optionally, the symbol matching model includes a first symbol matching model and a second symbol matching model, and the punctuation determination submodule includes: a first information determination unit, configured to input the word segmentation segments into the first symbol matching model in sequence to obtain first probability information of each segment for each symbol identifier; a second information determination unit, configured to input the word segmentation segments into the second symbol matching model in sequence to obtain second probability information of each segment for each symbol identifier; and a symbol determination unit, configured to determine, for a word segmentation segment, the symbol identifier corresponding to the segment according to the first probability information and the second probability information of the symbol identifiers for that segment.
Optionally, the symbol determination unit is configured to calculate first variance information from the first probability information of the symbol identifiers for the segment, and second variance information from the second probability information of the symbol identifiers for the segment; if the first variance information is larger than the second variance information, the symbol identifier with the largest first probability information is selected as the symbol identifier of the segment, and, if the second variance information is larger than the first variance information, the symbol identifier with the largest second probability information is selected.
Optionally, the apparatus further comprises: an end punctuation adding module, configured to add a set punctuation at the end of the punctuated spliced text if no punctuation exists there and the current speech recognition text is the last speech recognition text of the voice data; and an end punctuation deleting module, configured to delete the punctuation at the end of the punctuated spliced text if punctuation exists there and the current speech recognition text is not the last speech recognition text of the voice data.
Optionally, the text splicing module is configured to, if N is 1, acquire the last text segment of the previous output text, where the last text segment is the text after the last punctuation in the previous output text, and splice the current speech recognition text after that last text segment to obtain the spliced text.
Optionally, the apparatus is applied to the field of simultaneous interpretation.
The embodiment of the invention also discloses a readable storage medium; when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the data processing method according to any one of the embodiments of the invention.
An embodiment of the present invention also discloses an electronic device, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for: acquiring a current speech recognition text; splicing the current speech recognition text with the previous N output texts to obtain a spliced text, where N is a positive integer; adding punctuation to the spliced text; and extracting the data other than the previous N output texts from the punctuated spliced text as the current output text and outputting it.
Optionally, adding punctuation to the spliced text includes: performing word segmentation on the spliced text to obtain a plurality of word segmentation segments; determining the symbol identifier corresponding to each word segmentation segment according to a symbol matching model; and, if the symbol identifier of a word segmentation segment is a set identifier, adding the symbol identifier after the characters corresponding to that segment in the spliced text.
Optionally, the symbol matching model includes a first symbol matching model and a second symbol matching model, and determining the symbol identifier corresponding to each word segmentation segment according to the symbol matching model includes: inputting the word segmentation segments into the first symbol matching model in sequence to obtain first probability information of each segment for each symbol identifier; inputting the word segmentation segments into the second symbol matching model in sequence to obtain second probability information of each segment for each symbol identifier; and, for a word segmentation segment, determining the symbol identifier corresponding to the segment according to the first probability information and the second probability information of the symbol identifiers for that segment.
Optionally, determining the symbol identifier corresponding to the word segmentation segment according to the first probability information and the second probability information of the symbol identifiers for the segment includes: calculating first variance information from the first probability information of the symbol identifiers for the segment; calculating second variance information from the second probability information of the symbol identifiers for the segment; if the first variance information is larger than the second variance information, selecting the symbol identifier with the largest first probability information as the symbol identifier of the segment; and, if the second variance information is larger than the first variance information, selecting the symbol identifier with the largest second probability information as the symbol identifier of the segment.
Optionally, the one or more programs further include instructions for: if no punctuation exists at the end of the punctuated spliced text and the current speech recognition text is the last speech recognition text of the voice data, adding a set punctuation at the end of the punctuated spliced text; and, if punctuation exists at the end of the punctuated spliced text and the current speech recognition text is not the last speech recognition text of the voice data, deleting the punctuation at the end of the punctuated spliced text.
Optionally, splicing the current speech recognition text with the previous N output texts to obtain a spliced text includes: if N is 1, acquiring the last text segment of the previous output text, where the last text segment is the text after the last punctuation in the previous output text; and splicing the current speech recognition text after that last text segment to obtain the spliced text.
Optionally, the electronic device is applied to the field of simultaneous interpretation.
The embodiment of the invention has the following advantages:
In the embodiment of the invention, after the current speech recognition text is acquired, it can be spliced with the previous N output texts to obtain a spliced text, and punctuation is then added to the spliced text, so that the punctuation at the end of the previous output text is determined in combination with the context that follows it. The data other than the previous N output texts is subsequently extracted from the punctuated spliced text as the current output text and output, and the punctuation at the end of the previous output text is thus supplied when the next speech recognition text is output. Because the texts before and after a pause are combined, the punctuation at the end of the text before the pause can be determined correctly, which solves the problem of punctuation errors caused by pauses and improves the accuracy of punctuation addition.
Drawings
FIG. 1 is a flow chart of the steps of one data processing method embodiment of the present invention;
FIG. 2 is a flow chart of the steps of an alternative embodiment of a data processing method of the present invention;
FIG. 3 is a block diagram of an embodiment of a data processing apparatus according to the present invention;
FIG. 4 is a block diagram of an alternate embodiment of a data processing apparatus of the present invention;
FIG. 5 illustrates a block diagram of an electronic device for data processing in accordance with an exemplary embodiment;
FIG. 6 is a schematic structural diagram of an electronic device for data processing according to another exemplary embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, the embodiments of the invention are described in further detail below with reference to the accompanying drawings.
One of the core ideas of the embodiment of the invention is that, after a speech recognition text is received, the punctuation at the end of the previously output text is determined by combining that speech recognition text with the previous several output texts, and this punctuation is supplied when the current speech recognition text is output. By combining the texts before and after a pause, the punctuation at the end of the text before the pause is determined, which solves the problem of punctuation being added by mistake because of a pause and improves the accuracy of punctuation addition.
Referring to fig. 1, a flowchart illustrating steps of an embodiment of a data processing method according to the present invention is shown, which may specifically include the following steps:
Step 102: acquire a current speech recognition text.
Step 104: splice the current speech recognition text with the previous N output texts to obtain a spliced text.
Step 106: add punctuation to the spliced text, extract the data other than the previous N output texts from the punctuated spliced text as the current output text, and output it.
In the embodiment of the invention, during speech recognition the voice can be segmented into multiple speech segments according to silent intervals; speech recognition is then performed on each speech segment in turn to obtain a speech recognition text, punctuation is added to the text, and the text is output. The output text may be processed further, for example displayed directly, or translated into a translation text in another language that is then displayed.
In the embodiment of the invention, after a speech recognition text is obtained, it can be judged whether the text is the first speech recognition text of the voice data. If it is, punctuation may be added to it, where punctuation may be added between the characters of the speech recognition text but not at its end; the punctuated speech recognition text (which may be referred to as an output text) is then output.
If the speech recognition text is not the first speech recognition text of the voice data, it may be referred to as the current speech recognition text; in that case it can be spliced with the previous output text, and punctuation is added to the spliced text. While speaking, a user may deliver one sentence in two or even more bursts; therefore, to ensure the accuracy of punctuation addition, the embodiment of the invention may acquire the previous N output texts of the current speech recognition text, where N is a positive integer, and splice the current speech recognition text with them to obtain a spliced text. The data other than the previous N output texts is then extracted from the punctuated spliced text as the current output text and output. If punctuation exists between the previous output text and the current speech recognition text in the punctuated spliced text, the current output text comprises that punctuation together with the punctuated current speech recognition text; if no such punctuation exists, the current output text comprises only the punctuated current speech recognition text. The punctuation at the end of the previously output text can thus be supplied when the next speech recognition text is output.
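For illustration only, the following is a minimal sketch of this splice-punctuate-extract loop with N fixed at 1; `recognize` and `add_punctuation` are hypothetical stand-ins for the speech recognition step and the punctuation model, neither of which the embodiment specifies in code:

```python
def punctuate_stream(speech_segments, recognize, add_punctuation):
    """Yield one punctuated output text per speech segment, splicing each
    new speech recognition text after the previous output text."""
    previous_output = None
    for segment in speech_segments:
        recognized = recognize(segment)
        if previous_output is None:
            # First speech recognition text of the voice data: punctuate
            # it directly, leaving its end unpunctuated.
            current_output = add_punctuation(recognized)
        else:
            # Splice the current text after the previous output text and
            # punctuate the spliced text as a whole.
            spliced = previous_output + recognized
            punctuated = add_punctuation(spliced)
            # Extract the data other than the previous output text; this
            # carries any punctuation decided for the previous text's end.
            # (Assumes the model leaves the already-punctuated prefix intact.)
            current_output = punctuated[len(previous_output):]
        yield current_output
        previous_output = current_output
```

With the example given in the following paragraphs, this loop would yield 'Hello everyone, my name is Li Lei', then ', very', then 'pleased', and so on.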
It should be noted that punctuation may or may not be added between the characters of a speech recognition text (or spliced text). When punctuation is added between its characters, the punctuated text differs from the original text; when no punctuation is added between its characters, the punctuated text and the original text are the same.
In one embodiment of the invention, consider the sentence 'Hello everyone, my name is Li Lei, very pleased to meet everyone': the user says 'hello everyone my name is Li Lei', then after 20 ms says 'very', after another 20 ms says 'pleased', after another 20 ms says 'to meet', and after another 20 ms says 'everyone'. The voice data corresponding to 'hello everyone my name is Li Lei', 'very', 'pleased', 'to meet' and 'everyone' is therefore treated as five speech segments. First, speech recognition is performed on the 'hello everyone my name is Li Lei' segment to obtain the speech recognition text 'hello everyone my name is Li Lei'; since this is the first speech recognition text of the voice data, punctuation is added within it, and the punctuated speech recognition text 'Hello everyone, my name is Li Lei' is output. Speech recognition is then performed on the 'very' segment to obtain the corresponding speech recognition text 'very'; since this is not the first speech recognition text of the voice data, it is taken as the current speech recognition text and spliced with the previous output text to obtain the spliced text 'Hello everyone, my name is Li Lei very'. Punctuation is added to the spliced text, yielding 'Hello everyone, my name is Li Lei, very'; the data other than the previous output text, namely ', very', is then extracted as the current output text and output. Next, speech recognition is performed on the 'pleased' segment to obtain the corresponding speech recognition text 'pleased', which is taken as the current speech recognition text and spliced with the previous output text to obtain the spliced text 'very pleased'. Punctuation is added to the spliced text, which remains 'very pleased'; the data other than the previous output text, namely 'pleased', is then extracted as the current output text and output. Speech recognition is then performed on the 'to meet' segment to obtain the corresponding speech recognition text 'to meet', which is taken as the current speech recognition text and spliced with the previous two output texts (', very' and 'pleased') to obtain the spliced text 'very pleased to meet'. Punctuation is added to the spliced text, which remains 'very pleased to meet'; the data other than the previous two output texts, namely 'to meet', is then extracted as the current output text and output.
Speech recognition is then performed on the 'everyone' segment to obtain the corresponding speech recognition text 'everyone', which is taken as the current speech recognition text and spliced with the previous three output texts ('very', 'pleased' and 'to meet') to obtain the spliced text 'very pleased to meet everyone'. Punctuation is added to the spliced text, which remains 'very pleased to meet everyone'; the data other than the previous three output texts, namely 'everyone', is then extracted as the current output text and output.
In another embodiment of the invention, consider the sentence 'I arrived in June in London': the user first says 'I arrived in June' and then, 30 ms later, says 'in London'. The voice data corresponding to 'I arrived in June' is therefore treated as one speech segment and 'in London' as another. First, speech recognition is performed on the 'I arrived in June' segment to obtain the speech recognition text 'I arrived in June'; since this is the first speech recognition text of the voice data, punctuation is added to it and the punctuated speech recognition text 'I arrived in June' is output. Speech recognition is then performed on the 'in London' segment to obtain the corresponding speech recognition text 'in London'; since this is not the first speech recognition text of the voice data, it is taken as the current speech recognition text and spliced with the previous output text to obtain the spliced text 'I arrived in June in London'. Punctuation is added to the spliced text, which remains 'I arrived in June in London'; the data other than the previous output text, namely 'in London', is then extracted as the current output text and output, and no punctuation is mistakenly inserted at the pause.
In the embodiment of the invention, after the current speech recognition text is acquired, it can be spliced with the previous N output texts to obtain a spliced text, and punctuation is then added to the spliced text, so that the punctuation at the end of the previous output text is determined in combination with the context that follows it. The data other than the previous N output texts is subsequently extracted from the punctuated spliced text as the current output text and output, and the punctuation at the end of the previous output text is thus supplied when the next speech recognition text is output. Because the texts before and after a pause are combined, the punctuation at the end of the text before the pause can be determined correctly, which solves the problem of punctuation errors caused by pauses and improves the accuracy of punctuation addition.
In another embodiment of the invention, the punctuation between the characters of the spliced text can be determined using a symbol matching model, so that punctuation is added to the spliced text; the symbol matching model is used to determine the punctuation that follows each word segmentation segment and may be, for example, a language model (such as an N-gram model) or a neural network model.
Referring to fig. 2, a flowchart illustrating steps of an alternative embodiment of the data processing method of the present invention is shown, which may specifically include the following steps:
Step 202: acquire a current speech recognition text.
Step 204: splice the current speech recognition text with the previous N output texts to obtain a spliced text.
In the embodiment of the invention, a speech recognition text is obtained by recognizing a speech segment. If it is the first speech recognition text of the voice data, punctuation can be added within the text and the text is output, with no punctuation added at its end. If it is not the first speech recognition text of the voice data, the text may be referred to as the current speech recognition text; it is then spliced with the previous N output texts, and punctuation is added to the spliced text.
Some of the output texts before the current speech recognition text may be long and some short. If the several output texts immediately before the current speech recognition text are short, those several previous output texts can be acquired and spliced with the current speech recognition text (in this case N is greater than 1); if the previous output text is long, only that one previous output text can be acquired and spliced with the current speech recognition text (in this case N is 1).
When the current speech recognition text is spliced with the previous output text, punctuation may or may not have been added between the characters of the previous output text. If punctuation was added, the last text segment of the previous output text, that is, the text after the last punctuation in the previous output text, can be saved while the previous output text is output, which saves storage space; during splicing, this last text segment is retrieved and the current speech recognition text is spliced after it to obtain the spliced text. Of course, if no punctuation was added between the characters of the previous output text, the whole text can be stored while the previous output text is output; during splicing, the current speech recognition text is then spliced after the previous output text to obtain the spliced text.
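A minimal sketch of this N = 1 splicing strategy follows, assuming an illustrative punctuation set that the embodiment does not fix:

```python
# Only the last text segment of the previous output text (the text after
# its last punctuation) needs to be stored for splicing.
PUNCTUATION = set("，。！？；：,.!?;:")  # assumed punctuation marks

def last_segment(previous_output: str) -> str:
    """Return the text after the last punctuation in the previous output."""
    for i in range(len(previous_output) - 1, -1, -1):
        if previous_output[i] in PUNCTUATION:
            return previous_output[i + 1:]
    return previous_output  # no punctuation: keep the whole text

def splice(previous_output: str, current_text: str) -> str:
    return last_segment(previous_output) + current_text
```

Storing only the value of last_segment(previous_output) is what saves the storage space noted above.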
Step 206: perform word segmentation on the spliced text to obtain a plurality of corresponding word segmentation segments.
Step 208: determine the symbol identifier corresponding to each word segmentation segment according to the symbol matching model.
In the embodiment of the present invention, punctuation may be added to the spliced text using a symbol matching model: the spliced text is divided into a plurality of word segmentation segments, and the segments are input into the symbol matching model in sequence to obtain the symbol identifier corresponding to each segment. The symbol identifiers may be of several kinds, including identifiers of punctuation marks, such as '。', '；', '！' and '？', and identifiers of non-punctuation marks, such as '@', where a non-punctuation identifier represents the case in which no punctuation mark follows. Of course, the punctuation identifiers may also include other symbols such as '……', and the non-punctuation identifier may be some other symbol; the embodiment of the invention is not limited in this respect.
In this embodiment of the present invention, the symbol matching model may include a first symbol matching model and a second symbol matching model, where the first symbol matching model may be one of a language model and a neural network model and the second symbol matching model may be the other of the two. Of course, the first symbol matching model and the second symbol matching model may also be other models, and the embodiment of the present invention is not limited in this respect.
In an example of the present invention, determining the symbol identifier corresponding to each word segmentation segment according to the symbol matching model may include the following substeps:
Substep S2: input the word segmentation segments into the first symbol matching model in sequence to obtain first probability information of each segment for each symbol identifier.
Substep S4: input the word segmentation segments into the second symbol matching model in sequence to obtain second probability information of each segment for each symbol identifier.
Substep S6: for a word segmentation segment, determine the symbol identifier corresponding to the segment according to the first probability information and the second probability information of the symbol identifiers for that segment.
In the embodiment of the present invention, the word segmentation segments may be input into the first symbol matching model in the order in which they appear in the spliced text, and the first symbol matching model processes each segment in turn, calculating for each word segmentation segment the first probability information of each symbol identifier following the segment. Correspondingly, the word segmentation segments may be input into the second symbol matching model in sequence, and the second symbol matching model calculates the second probability information of each symbol identifier following each segment.
Then, for each word segmentation segment, the symbol identifier corresponding to the segment is determined from the first probability information and the second probability information of the segment, specifically as follows: calculate first variance information from the first probability information of the symbol identifiers; calculate second variance information from the second probability information of the symbol identifiers; if the first variance information is larger than the second variance information, select the symbol identifier with the largest first probability information as the symbol identifier of the segment; and, if the second variance information is larger than the first variance information, select the symbol identifier with the largest second probability information as the symbol identifier of the segment. The larger the variance information, the larger the differences between the probabilities of the symbol identifiers, which indicates that the corresponding symbol matching model is more reliable; determining the symbol identifier of the segment from the group of probabilities with the larger variance therefore improves the reliability of the added punctuation and, in turn, the accuracy of punctuation addition.
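The following sketch illustrates this variance-based choice between the two models; representing each model's output as a dictionary from symbol identifier to probability is an assumption, as is the '@' label for 'no punctuation' from the example above:

```python
from statistics import pvariance

def pick_symbol(first_probs: dict[str, float],
                second_probs: dict[str, float]) -> str:
    """Pick the symbol identifier from the more decisive model, i.e.
    the one whose probability distribution has the larger variance."""
    first_var = pvariance(first_probs.values())
    second_var = pvariance(second_probs.values())
    # A tie is not specified by the embodiment; fall back to the first model.
    chosen = first_probs if first_var >= second_var else second_probs
    return max(chosen, key=chosen.get)
```

For example, pick_symbol({'@': 0.2, '，': 0.5, '。': 0.3}, {'@': 0.05, '，': 0.9, '。': 0.05}) returns '，' from the second model, whose distribution is more peaked and therefore has the larger variance.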
In another example of the present invention, the symbol identifier corresponding to each word segmentation segment may be determined by performing substep S2 without performing substeps S4 and S6: for each word segmentation segment, the symbol identifier is determined directly from the first probability information of the segment, and the symbol identifier with the largest first probability information can be selected as the symbol identifier of the segment.
In yet another example of the present invention, the symbol identifier corresponding to each word segmentation segment may be determined by performing substep S4 directly, without performing substeps S2 and S6: for each word segmentation segment, the symbol identifier is determined directly from the second probability information of the segment, and the symbol identifier with the largest second probability information can be selected as the symbol identifier of the segment.
Correspondingly, because only the text after the last punctuation of the previous output text is spliced, the number of word segmentation segments the symbol matching model has to process is reduced, which increases the efficiency of punctuation addition in the spliced text.
Step 210: if the symbol identifier of a word segmentation segment is a set identifier, add the symbol identifier after the characters corresponding to the segment in the spliced text.
In the embodiment of the invention, the symbol identifier of a word segmentation segment may be the identifier of a punctuation mark or the identifier of a non-punctuation mark; therefore, for each word segmentation segment, it can be judged whether the symbol identifier of the segment is a set identifier. If it is, the symbol identifier is added after the characters corresponding to the segment in the spliced text; if it is not, no punctuation needs to be added after the characters corresponding to the segment, and the judgment proceeds directly to the symbol identifier of the next segment. The set identifiers may include the identifiers of punctuation marks and may be configured as required.
When the symbol identifier of a word segmentation segment is determined to be a set identifier, if the segment is not the last segment of the spliced text, the symbol identifier corresponding to the segment can be added between the characters corresponding to the segment and the characters corresponding to the next segment in the spliced text.
Of course, when the symbol identifier of a word segmentation segment is determined to be a set identifier and the segment is the last segment of the spliced text, the symbol identifier corresponding to the segment may either be added after the characters corresponding to the segment (that is, at the end of the spliced text) or not added there; this may be configured as required, and the embodiment of the invention is not limited in this respect.
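A sketch of this insertion step follows, with an assumed set-identifier set and with the treatment of the final segment made configurable, matching the two alternatives above:

```python
SET_IDENTIFIERS = {"，", "。", "！", "？", "；"}  # assumed punctuation label set

def insert_punctuation(segments: list[str], symbols: list[str],
                       keep_tail: bool = False) -> str:
    """Rebuild the spliced text, appending each segment's symbol identifier
    after its characters when the identifier is a set identifier."""
    pieces = []
    last = len(segments) - 1
    for i, (seg, sym) in enumerate(zip(segments, symbols)):
        pieces.append(seg)
        # The final segment's identifier is kept or dropped per keep_tail.
        if sym in SET_IDENTIFIERS and (i != last or keep_tail):
            pieces.append(sym)
    return "".join(pieces)
```

With keep_tail=False, a punctuation identifier predicted after the final segment is simply dropped, matching the second alternative above; steps 212 and 214 below can still normalize the end of the text afterwards.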
Step 212: if no punctuation exists at the end of the punctuated spliced text and the current speech recognition text is the last speech recognition text of the voice data, add a set punctuation at the end of the punctuated spliced text.
Step 214: if punctuation exists at the end of the punctuated spliced text and the current speech recognition text is not the last speech recognition text of the voice data, delete the punctuation at the end of the punctuated spliced text.
In the embodiment of the invention, except for the output text corresponding to the last speech recognition text of the voice data, the end punctuation of the output text corresponding to each speech recognition text is supplied at the beginning of the next output text; accordingly, those output texts carry no punctuation at their ends, while the output text corresponding to the last speech recognition text does. Therefore, after punctuation has been added to the spliced text, if the current speech recognition text is not the last text of the voice data, it is judged whether punctuation exists at the end of the punctuated spliced text: if it does, the punctuation at the end is deleted; if it does not, step 216 may be performed. If the current speech recognition text is the last text of the voice data, it is likewise judged whether punctuation exists at the end of the punctuated spliced text: if it does not, a set punctuation is added at the end of the punctuated spliced text; if it does, step 216 may be performed. The set punctuation can be configured as required and may include a punctuation mark that terminates a sentence.
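The end-of-text handling of steps 212 and 214 can be sketched as follows, where PUNCTUATION and SENTENCE_END (the 'set punctuation') are illustrative assumptions:

```python
PUNCTUATION = set("，。！？；：,.!?;:")  # assumed punctuation marks
SENTENCE_END = "."                      # assumed "set punctuation"

def fix_tail(punctuated: str, is_last_text: bool) -> str:
    """Normalize the end of the punctuated spliced text."""
    has_tail = bool(punctuated) and punctuated[-1] in PUNCTUATION
    if is_last_text and not has_tail:
        return punctuated + SENTENCE_END  # step 212: terminate the sentence
    if not is_last_text and has_tail:
        return punctuated[:-1]            # step 214: defer to the next output
    return punctuated
```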
Step 216: extract the data other than the previous N output texts from the punctuated spliced text as the current output text and output it.
The data other than the previous N output texts is thus extracted from the punctuated spliced text as the current output text and output; if the current speech recognition text is not the last speech recognition text of the voice data, the current output text ends without punctuation, and if it is the last speech recognition text of the voice data, the current output text may end with punctuation.
The embodiment of the invention can be applied to the field of simultaneous interpretation. For example, after voice data is collected by a voice acquisition device, it can be sent to a speech recognition service; the speech recognition service can divide the voice data into multiple speech segments according to the time intervals at which the data is received, recognize each segment, and then perform steps 202 to 216. The speech recognition service outputs the current output text to a machine translation service, which translates it into a translation text in another language; the translation text can then, on the one hand, be sent to a display device for display and, on the other hand, be sent to a speech conversion service that converts it into speech of the corresponding language, which is output to a speech playback device for playing, thereby realizing simultaneous interpretation. The speech recognition service, the machine translation service and the speech conversion service may be deployed on the same device or on different devices, as required.
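As a rough sketch of that service chain, with all three services stubbed out (their real interfaces are not specified by the embodiment, and the stubs below are assumptions):

```python
def machine_translate(text: str) -> str:
    return text  # stub for the machine translation service

def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")  # stub for the speech conversion service

def relay(current_output: str, display, play) -> None:
    """Forward one current output text through translation, display,
    and speech playback, as in the pipeline described above."""
    translated = machine_translate(current_output)
    display(translated)               # send the translation text to the display device
    play(text_to_speech(translated))  # play synthesized speech of the translation
```

For example, relay('Hello everyone, my name is Li Lei', print, lambda audio: None) would print the (stub-)translated text and discard the audio.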
In the embodiment of the invention, after the current speech recognition text is acquired, it can be spliced with the previous N output texts to obtain a spliced text, and punctuation is then added to the spliced text; the data other than the previous N output texts is extracted from the punctuated spliced text as the current output text and output, and the punctuation at the end of the previous output text is then supplied when the next speech recognition text is output, which improves the accuracy of the punctuation at the end of the previous output text. In addition, punctuation is added to the current speech recognition text in combination with the previous output text, which also improves the accuracy of the punctuation between the characters of the current output text.
Secondly, after punctuation is added to the spliced text, if no punctuation exists at the end of the punctuated spliced text and the current speech recognition text is the last speech recognition text of the voice data, a set punctuation is added at the end of the punctuated spliced text; this ensures that the output text corresponding to the last speech recognition text of the voice data ends with punctuation, guarantees the completeness of the output text, and improves the user experience.
Further, in the embodiment of the invention, word segmentation may be performed on the spliced text, the symbol identifier corresponding to each word segmentation segment is determined according to the symbol matching model, and, if the symbol identifier of a segment is a set identifier, the symbol identifier is added after the characters corresponding to the segment in the spliced text. The word segmentation segments can be input into the first symbol matching model in sequence to obtain first probability information of each segment for each symbol identifier, and into the second symbol matching model in sequence to obtain second probability information of each segment for each symbol identifier; for each word segmentation segment, its symbol identifier is then determined from the first probability information and the second probability information of the symbol identifiers for the segment. Determining the identifier from the outputs of two symbol matching models improves the accuracy with which the symbol identifier of each segment is determined and, in turn, the accuracy of punctuation addition.
Further, in the embodiment of the invention, first variance information is calculated from the first probability information of the symbol identifiers, and second variance information from the second probability information of the symbol identifiers; if the first variance information is larger than the second variance information, the symbol identifier with the largest first probability information is selected as the symbol identifier of the segment, and, if the second variance information is larger than the first variance information, the symbol identifier with the largest second probability information is selected. The larger the variance information, the larger the differences between the probabilities of the symbol identifiers and the higher the reliability of the corresponding symbol matching model, so the reliability of the added punctuation, and thus the accuracy of punctuation addition, is improved.
In the embodiment of the invention, during splicing, if N is 1, the last text segment of the previous output text can be acquired, where the last text segment is the text after the last punctuation in the previous output text; the current speech recognition text is then spliced after that last text segment to obtain the spliced text. Only the text after the last punctuation of the previous output text therefore needs to be saved, which saves storage space and improves the efficiency of the subsequent punctuation addition in the spliced text.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the embodiments of the invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and the acts involved are not necessarily required by the present invention.
Referring to fig. 3, a block diagram of a data processing apparatus according to an embodiment of the present invention is shown, which may specifically include the following modules:
a text acquisition module 302, configured to acquire a current speech recognition text;
a text splicing module 304, configured to splice the current speech recognition text with the previous N output texts to obtain a spliced text, where N is a positive integer;
and a punctuation adding module 306, configured to add punctuation to the spliced text, extract the data other than the previous N output texts from the punctuated spliced text as the current output text, and output it.
Referring to fig. 4, a block diagram of an alternative embodiment of a data processing apparatus of the present invention is shown.
In an optional embodiment of the present invention, the apparatus further comprises:
an end punctuation adding module 308, configured to add a set punctuation at the end of the punctuated spliced text if no punctuation exists there and the current speech recognition text is the last speech recognition text of the voice data.
In an optional embodiment of the present invention, the apparatus further comprises:
an end punctuation deleting module 310, configured to delete the punctuation at the end of the punctuated spliced text if punctuation exists there and the current speech recognition text is not the last speech recognition text of the voice data.
In an optional embodiment of the present invention, the punctuation adding module 306 includes:
a word segmentation submodule 3062, configured to perform word segmentation on the spliced text to obtain a plurality of corresponding word segmentation segments;
a punctuation determination submodule 3064, configured to determine the symbol identifier corresponding to each word segmentation segment according to the symbol matching model;
and a symbol adding submodule 3066, configured to add the symbol identifier after the characters corresponding to a word segmentation segment in the spliced text if the segment's symbol identifier is a set identifier.
In an alternative embodiment of the present invention, the symbol matching model includes a first symbol matching model and a second symbol matching model, and the punctuation determination submodule 3064 includes:
a first information determination unit 30642, configured to input the word segmentation segments into the first symbol matching model in sequence to obtain first probability information of each segment for each symbol identifier;
a second information determination unit 30644, configured to input the word segmentation segments into the second symbol matching model in sequence to obtain second probability information of each segment for each symbol identifier;
and a symbol determination unit 30646, configured to determine, for a word segmentation segment, the symbol identifier corresponding to the segment according to the first probability information and the second probability information of the symbol identifiers for that segment.
In an optional embodiment of the present invention, the symbol determination unit 30646 is configured to calculate first variance information from the first probability information of the symbol identifiers for the word segmentation segment, and second variance information from the second probability information of the symbol identifiers for the segment; if the first variance information is larger than the second variance information, the symbol identifier with the largest first probability information is selected as the symbol identifier of the segment, and, if the second variance information is larger than the first variance information, the symbol identifier with the largest second probability information is selected.
In an optional embodiment of the present invention, the text splicing module 304 is configured to, if N is 1, acquire the last text segment of the previous output text, where the last text segment is the text after the last punctuation in the previous output text, and splice the current speech recognition text after that last text segment to obtain the spliced text.
In an optional embodiment of the present invention, the apparatus is applied to the field of simultaneous interpretation.
In the embodiment of the invention, after the current speech recognition text is acquired, it can be spliced with the previous N output texts to obtain a spliced text, and punctuation is then added to the spliced text, so that the punctuation at the end of the previous output text is determined in combination with the context that follows it. The data other than the previous N output texts is subsequently extracted from the punctuated spliced text as the current output text and output, and the punctuation at the end of the previous output text is thus supplied when the next speech recognition text is output. Because the texts before and after a pause are combined, the punctuation at the end of the text before the pause can be determined correctly, which solves the problem of punctuation errors caused by pauses and improves the accuracy of punctuation addition.
Since the device embodiment is substantially similar to the method embodiment, its description is brief; for the relevant points, refer to the corresponding parts of the method embodiment description.
Fig. 5 is a block diagram illustrating a structure of an electronic device 500 for data processing according to an example embodiment. For example, the electronic device 500 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 5, electronic device 500 may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, and a communication component 516.
The processing component 502 generally controls the overall operation of the electronic device 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 502 may include one or more processors 520 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 502 can include one or more modules that facilitate interaction between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operation of the electronic device 500. Examples of such data include instructions for any application or method operating on the electronic device 500, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 504 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disk.
The power component 506 provides power to the various components of the electronic device 500. Power components 506 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for electronic device 500.
The multimedia component 508 includes a screen that provides an output interface between the electronic device 500 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 500 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 510 is configured to output and/or input audio signals. For example, the audio component 510 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 500 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 504 or transmitted via the communication component 516. In some embodiments, audio component 510 further includes a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 514 includes one or more sensors for providing status assessments of various aspects of the electronic device 500. For example, the sensor assembly 514 may detect an open/closed state of the electronic device 500 and the relative positioning of components, such as the display and keypad of the electronic device 500; the sensor assembly 514 may also detect a change in the position of the electronic device 500 or of a component of the electronic device 500, the presence or absence of user contact with the electronic device 500, the orientation or acceleration/deceleration of the electronic device 500, and a change in the temperature of the electronic device 500. The sensor assembly 514 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate wired or wireless communication between the electronic device 500 and other devices. The electronic device 500 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast-associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 504 comprising instructions, executable by the processor 520 of the electronic device 500 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform a data processing method, the method comprising: acquiring a current voice recognition text; splicing the current voice recognition text and the last N output texts to obtain a spliced text, wherein N is a positive integer; adding punctuation in the spliced text, extracting data except the last N output texts from the spliced text added with punctuation as a current output text and outputting the current output text.
Optionally, the adding punctuation in the spliced text includes: performing word segmentation processing on the spliced text to obtain a plurality of corresponding word segmentation segments; determining the symbol identifier corresponding to each word segmentation segment according to the symbol matching model; and if the symbol identifier of a word segmentation segment is a set identifier, adding the symbol identifier after the characters corresponding to that word segmentation segment in the spliced text.
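These three steps can be sketched as below; segment() and classify() are placeholders for the word segmentation tool and the symbol matching model (neither is named in this disclosure), and the identifier-to-mark table is illustrative:

```python
PUNCT_FOR = {"COMMA": "，", "PERIOD": "。", "QUESTION": "？"}  # set identifiers

def add_punctuation(text: str, segment, classify) -> str:
    """Segment the spliced text, classify each word segmentation segment,
    and append the corresponding mark after the segment's characters."""
    out = []
    for seg in segment(text):          # e.g. ["大家", "好", "我", "叫", "李雷"]
        out.append(seg)
        symbol_id = classify(seg)      # a symbol identifier, e.g. "NONE"
        out.append(PUNCT_FOR.get(symbol_id, ""))  # adds nothing for "NONE"
    return "".join(out)
```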
Optionally, the symbol matching model includes a first symbol matching model and a second symbol matching model, and determining the symbol identifier corresponding to each word segmentation segment according to the symbol matching model includes: inputting each word segmentation segment into the first symbol matching model in sequence to obtain first probability information of each word segmentation segment corresponding to each symbol identifier; inputting each word segmentation segment into the second symbol matching model in sequence to obtain second probability information of each word segmentation segment corresponding to each symbol identifier; and, for a word segmentation segment, determining the symbol identifier corresponding to the segment according to the first probability information and the second probability information of each symbol identifier corresponding to the segment.
Optionally, the determining the symbol identifier corresponding to the word segmentation segment according to the first probability information and the second probability information of each symbol identifier corresponding to the word segmentation segment includes: calculating first variance information according to the first probability information of each symbol identifier corresponding to the word segmentation segment; calculating second variance information according to the second probability information of each symbol identifier corresponding to the word segmentation segment; if the first variance information is larger than the second variance information, selecting the symbol identifier with the largest first probability information as the symbol identifier corresponding to the word segmentation segment; and if the second variance information is larger than the first variance information, selecting the symbol identifier with the largest second probability information as the symbol identifier corresponding to the word segmentation segment.
Optionally, the method further comprises: if no punctuation exists at the tail of the spliced text added with punctuation and the current voice recognition text is the last voice recognition text of the voice data, adding a set punctuation at the tail of the spliced text added with punctuation; and if the punctuation exists at the tail of the spliced text added with the punctuation and the current voice recognition text is not the last voice recognition text of the voice data, deleting the punctuation at the tail of the spliced text added with the punctuation.
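The two tail rules above can be sketched as follows; the punctuation set is an illustrative assumption, and is_last is assumed to be known from the VAD segmentation of the voice data:

```python
TAIL_PUNCTUATION = "，。？！,.?!"  # illustrative punctuation set

def fix_tail(punctuated: str, is_last: bool, end_mark: str = "。") -> str:
    """Append a set punctuation when the final recognition text of the voice
    data ends bare; drop a tail mark when more text is still coming, since
    that tail punctuation will be re-decided with the next spliced text."""
    has_tail = bool(punctuated) and punctuated[-1] in TAIL_PUNCTUATION
    if is_last and not has_tail:
        return punctuated + end_mark
    if not is_last and has_tail:
        return punctuated[:-1]
    return punctuated
```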
Optionally, the splicing the current voice recognition text and the last N output texts to obtain a spliced text includes: if N is 1, acquiring a last text segment in the last output text, wherein the last text segment is the text after the last punctuation in the last output text; and splicing the current voice recognition text after the last text segment to obtain the spliced text.
Optionally, the method is applied to the field of simultaneous interpretation.
Fig. 6 is a schematic structural diagram of an electronic device 600 for data processing according to another exemplary embodiment of the present invention. The electronic device 600 may be a server, which may vary greatly due to different configurations or capabilities, and may include one or more Central Processing Units (CPUs) 622 (e.g., one or more processors) and memory 632, one or more storage media 630 (e.g., one or more mass storage devices) storing applications 642 or data 644. Memory 632 and storage medium 630 may be, among other things, transient or persistent storage. The program stored in the storage medium 630 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processor 622 may be configured to communicate with the storage medium 630 to execute a series of instruction operations in the storage medium 630 on the server.
The server may also include one or more power supplies 626, one or more wired or wireless network interfaces 650, one or more input/output interfaces 658, one or more keyboards 656, and/or one or more operating systems 641, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
An electronic device comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for: acquiring a current voice recognition text; splicing the current voice recognition text and the last N output texts to obtain a spliced text, wherein N is a positive integer; adding punctuation in the spliced text, extracting data except the last N output texts from the spliced text added with punctuation as a current output text, and outputting the current output text.
Optionally, the adding punctuation in the spliced text includes: performing word segmentation processing on the spliced text to obtain a plurality of corresponding word segmentation segments; determining the symbol identifier corresponding to each word segmentation segment according to the symbol matching model; and if the symbol identifier of a word segmentation segment is a set identifier, adding the symbol identifier after the characters corresponding to that word segmentation segment in the spliced text.
Optionally, the symbol matching model includes a first symbol matching model and a second symbol matching model, and determining the symbol identifier corresponding to each word segmentation segment according to the symbol matching model includes: inputting each word segmentation segment into the first symbol matching model in sequence to obtain first probability information of each word segmentation segment corresponding to each symbol identifier; inputting each word segmentation segment into the second symbol matching model in sequence to obtain second probability information of each word segmentation segment corresponding to each symbol identifier; and, for a word segmentation segment, determining the symbol identifier corresponding to the segment according to the first probability information and the second probability information of each symbol identifier corresponding to the segment.
Optionally, the determining the symbol identifier corresponding to the word segmentation segment according to the first probability information and the second probability information of each symbol identifier corresponding to the word segmentation segment includes: calculating first variance information according to the first probability information of each symbol identifier corresponding to the word segmentation segment; calculating second variance information according to the second probability information of each symbol identifier corresponding to the word segmentation segment; if the first variance information is larger than the second variance information, selecting the symbol identifier with the largest first probability information as the symbol identifier corresponding to the word segmentation segment; and if the second variance information is larger than the first variance information, selecting the symbol identifier with the largest second probability information as the symbol identifier corresponding to the word segmentation segment.
Optionally, the electronic device further includes instructions for: if no punctuation exists at the tail of the spliced text added with punctuation and the current voice recognition text is the last voice recognition text of the voice data, adding a set punctuation at the tail of the spliced text added with punctuation; and if punctuation exists at the tail of the spliced text added with punctuation and the current voice recognition text is not the last voice recognition text of the voice data, deleting the punctuation at the tail of the spliced text added with punctuation.
Optionally, the splicing the current voice recognition text and the last N output texts to obtain a spliced text includes: if N is 1, acquiring a last text segment in the last output text, wherein the last text segment is the text after the last punctuation in the last output text; and splicing the current voice recognition text after the last text segment to obtain the spliced text.
Optionally, the method is applied to the field of simultaneous interpretation.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or terminal that comprises the element.
The data processing method, the data processing apparatus, and the electronic device provided by the present invention are described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the descriptions of the above embodiments are only intended to help understand the method and its core idea. Meanwhile, a person skilled in the art may, following the idea of the present invention, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (22)

1. A data processing method, comprising:
acquiring a current voice recognition text;
splicing the current voice recognition text and the last N output texts to obtain a spliced text, wherein N is a positive integer;
adding punctuation in the spliced text, extracting data except the last N output texts from the spliced text added with punctuation as a current output text and outputting the current output text;
and if the punctuation exists between the last output text and the current voice recognition text in the spliced text after the punctuation is added, the current output text comprises the punctuation and the current voice recognition text after the punctuation is added, and the punctuation is not added at the tail of the spliced text.
2. The method of claim 1, wherein the adding punctuation in the spliced text comprises:
performing word segmentation processing on the spliced text to obtain a plurality of corresponding word segmentation segments;
determining the symbol identifier corresponding to each word segmentation segment according to the symbol matching model;
and if the symbol identifier of a word segmentation segment is a set identifier, adding the symbol identifier after the characters corresponding to that word segmentation segment in the spliced text.
3. The method according to claim 2, wherein the symbol matching model includes a first symbol matching model and a second symbol matching model, and the determining the symbol identifier corresponding to each word segmentation segment according to the symbol matching model includes:
inputting each word segmentation segment into the first symbol matching model in sequence to obtain first probability information of each word segmentation segment corresponding to each symbol identifier;
inputting each word segmentation segment into the second symbol matching model in sequence to obtain second probability information of each word segmentation segment corresponding to each symbol identifier;
and, for a word segmentation segment, determining the symbol identifier corresponding to the segment according to the first probability information and the second probability information of each symbol identifier corresponding to the segment.
4. The method according to claim 3, wherein the determining the symbol identifier corresponding to the word segmentation segment according to the first probability information and the second probability information of each symbol identifier corresponding to the word segmentation segment comprises:
calculating first variance information according to the first probability information of each symbol identifier corresponding to the word segmentation segment;
calculating second variance information according to the second probability information of each symbol identifier corresponding to the word segmentation segment;
if the first variance information is larger than the second variance information, selecting the symbol identifier with the largest first probability information as the symbol identifier corresponding to the word segmentation segment;
and if the second variance information is larger than the first variance information, selecting the symbol identifier with the largest second probability information as the symbol identifier corresponding to the word segmentation segment.
5. The method of claim 1, further comprising:
if no punctuation exists at the tail of the spliced text added with punctuation and the current voice recognition text is the last voice recognition text of the voice data, adding a set punctuation at the tail of the spliced text added with punctuation;
and if the punctuation exists at the tail of the spliced text added with the punctuation and the current voice recognition text is not the last voice recognition text of the voice data, deleting the punctuation at the tail of the spliced text added with the punctuation.
6. The method of claim 1, wherein the splicing the current voice recognition text and the last N output texts to obtain a spliced text comprises:
if N is 1, acquiring a last text segment in the last output text, wherein the last text segment is the text after the last punctuation in the last output text;
and splicing the current voice recognition text after the last text segment to obtain the spliced text.
7. The method according to any one of claims 1 to 6, wherein the method is applied in the field of simultaneous interpretation.
8. A data processing apparatus, comprising:
the text acquisition module is used for acquiring a current voice recognition text;
the text splicing module is used for splicing the current voice recognition text and the last N output texts to obtain a spliced text, wherein N is a positive integer;
the punctuation adding module is used for adding punctuation in the spliced text, extracting data except the last N output texts from the spliced text added with punctuation as a current output text, and outputting the current output text;
and if the punctuation exists between the last output text and the current voice recognition text in the spliced text after the punctuation is added, the current output text comprises the punctuation and the current voice recognition text after the punctuation is added, and the punctuation is not added at the tail of the spliced text.
9. The apparatus of claim 8, wherein the punctuation addition module comprises:
the word segmentation sub-module is used for carrying out word segmentation processing on the spliced text to obtain a plurality of corresponding word segmentation segments;
the punctuation determination submodule is used for determining the symbol identifier corresponding to each word segmentation segment according to the symbol matching model;
and the symbol adding submodule is used for adding the symbol identifier after the characters corresponding to a word segmentation segment in the spliced text if the symbol identifier of that word segmentation segment is a set identifier.
10. The apparatus of claim 9, wherein the symbol matching model comprises a first symbol matching model and a second symbol matching model, and wherein the punctuation determination sub-module comprises:
the first information determining unit is used for inputting each word segmentation segment into the first symbol matching model in sequence to obtain first probability information of each word segmentation segment corresponding to each symbol identifier;
the second information determining unit is used for inputting each word segmentation segment into the second symbol matching model in sequence to obtain second probability information of each word segmentation segment corresponding to each symbol identifier;
and the symbol determining unit is used for determining, for a word segmentation segment, the symbol identifier corresponding to the segment according to the first probability information and the second probability information of each symbol identifier corresponding to the segment.
11. The apparatus of claim 10,
the symbol determining unit is used for calculating first variance information according to the first probability information of each symbol identifier corresponding to the word segmentation segment; calculating second variance information according to the second probability information of each symbol identifier corresponding to the word segmentation segment; if the first variance information is larger than the second variance information, selecting the symbol identifier with the largest first probability information as the symbol identifier corresponding to the word segmentation segment; and if the second variance information is larger than the first variance information, selecting the symbol identifier with the largest second probability information as the symbol identifier corresponding to the word segmentation segment.
12. The apparatus of claim 8, further comprising:
a tail punctuation adding module, configured to add a set punctuation at the tail of the punctuation-added spliced text if no punctuation exists at the tail of the punctuation-added spliced text and the current speech recognition text is the last speech recognition text of the speech data;
and the tail punctuation deleting module is used for deleting the punctuation at the tail of the spliced text added with the punctuation if the punctuation exists at the tail of the spliced text added with the punctuation and the current voice recognition text is not the last voice recognition text of the voice data.
13. The apparatus of claim 8,
the text splicing module is used for acquiring a last text segment in the last output text if N is 1, wherein the last text segment is the text after the last punctuation in the last output text; and splicing the current voice recognition text after the last text segment to obtain the spliced text.
14. The apparatus according to any one of claims 8-13, wherein the apparatus is applied in the field of simultaneous interpretation.
15. A readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the data processing method according to any one of claims 1-7.
16. An electronic device comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
acquiring a current voice recognition text;
splicing the current voice recognition text and the last N output texts to obtain a spliced text, wherein N is a positive integer;
adding punctuation in the spliced text, extracting data except the last N output texts from the spliced text added with punctuation as a current output text and outputting the current output text;
and if the punctuation exists between the last output text and the current voice recognition text in the spliced text after the punctuation is added, the current output text comprises the punctuation and the current voice recognition text after the punctuation is added, and the punctuation is not added at the tail of the spliced text.
17. The electronic device of claim 16, wherein the adding punctuation in the spliced text comprises:
performing word segmentation processing on the spliced text to obtain a plurality of corresponding word segmentation segments;
determining the symbol identifier corresponding to each word segmentation segment according to the symbol matching model;
and if the symbol identifier of a word segmentation segment is a set identifier, adding the symbol identifier after the characters corresponding to that word segmentation segment in the spliced text.
18. The electronic device of claim 17, wherein the symbol matching model comprises a first symbol matching model and a second symbol matching model, and wherein determining the symbol identifier corresponding to each word segmentation segment according to the symbol matching model comprises:
inputting each word segmentation segment into the first symbol matching model in sequence to obtain first probability information of each word segmentation segment corresponding to each symbol identifier;
inputting each word segmentation segment into the second symbol matching model in sequence to obtain second probability information of each word segmentation segment corresponding to each symbol identifier;
and, for a word segmentation segment, determining the symbol identifier corresponding to the segment according to the first probability information and the second probability information of each symbol identifier corresponding to the segment.
19. The electronic device of claim 18, wherein the determining the symbol identifier corresponding to the word segmentation segment according to the first probability information and the second probability information of each symbol identifier corresponding to the word segmentation segment comprises:
calculating first variance information according to the first probability information of each symbol identifier corresponding to the word segmentation segment;
calculating second variance information according to the second probability information of each symbol identifier corresponding to the word segmentation segment;
if the first variance information is larger than the second variance information, selecting the symbol identifier with the largest first probability information as the symbol identifier corresponding to the word segmentation segment;
and if the second variance information is larger than the first variance information, selecting the symbol identifier with the largest second probability information as the symbol identifier corresponding to the word segmentation segment.
20. The electronic device of claim 16, further comprising instructions for:
if no punctuation exists at the tail of the spliced text added with punctuation and the current voice recognition text is the last voice recognition text of the voice data, adding a set punctuation at the tail of the spliced text added with punctuation;
and if the punctuation exists at the tail of the spliced text added with the punctuation and the current voice recognition text is not the last voice recognition text of the voice data, deleting the punctuation at the tail of the spliced text added with the punctuation.
21. The electronic device of claim 16, wherein the splicing the current voice recognition text and the last N output texts to obtain a spliced text comprises:
if N is 1, acquiring a last text segment in the last output text, wherein the last text segment is the text after the last punctuation in the last output text;
and splicing the current voice recognition text after the last text segment to obtain the spliced text.
22. The electronic device according to any one of claims 16-21, characterized in that it is applied in the field of simultaneous interpretation.
CN201811497640.2A 2018-12-07 2018-12-07 Data processing method and device and electronic equipment Active CN109887492B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811497640.2A CN109887492B (en) 2018-12-07 2018-12-07 Data processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811497640.2A CN109887492B (en) 2018-12-07 2018-12-07 Data processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN109887492A CN109887492A (en) 2019-06-14
CN109887492B true CN109887492B (en) 2021-02-12

Family

ID=66925008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811497640.2A Active CN109887492B (en) 2018-12-07 2018-12-07 Data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN109887492B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112466286A (en) * 2019-08-19 2021-03-09 阿里巴巴集团控股有限公司 Data processing method and device and terminal equipment
CN111261162B (en) * 2020-03-09 2023-04-18 北京达佳互联信息技术有限公司 Speech recognition method, speech recognition apparatus, and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2798678B2 (en) * 1988-08-24 1998-09-17 株式会社リコー Japanese sentence generator

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102231278A (en) * 2011-06-10 2011-11-02 安徽科大讯飞信息科技股份有限公司 Method and system for realizing automatic addition of punctuation marks in speech recognition
CN107291704A (en) * 2017-05-26 2017-10-24 北京搜狗科技发展有限公司 Treating method and apparatus, the device for processing
CN108597517A (en) * 2018-03-08 2018-09-28 深圳市声扬科技有限公司 Punctuation mark adding method, device, computer equipment and storage medium
CN108564953A (en) * 2018-04-20 2018-09-21 科大讯飞股份有限公司 A kind of punctuate processing method and processing device of speech recognition text

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Yakun et al.; "Chinese Word Segmentation and Punctuation Prediction Based on an Improved Multilayer BLSTM"; Journal of Computer Applications; vol. 38, no. 5, pp. 1278-1282, 1314; May 10, 2018 *

Also Published As

Publication number Publication date
CN109887492A (en) 2019-06-14

Similar Documents

Publication Publication Date Title
CN107291690B (en) Punctuation adding method and device and punctuation adding device
CN107221330B (en) Punctuation adding method and device and punctuation adding device
CN107291704B (en) Processing method and device for processing
CN110781813B (en) Image recognition method and device, electronic equipment and storage medium
CN108628813B (en) Processing method and device for processing
CN111831806B (en) Semantic integrity determination method, device, electronic equipment and storage medium
US11335348B2 (en) Input method, device, apparatus, and storage medium
CN109558599B (en) Conversion method and device and electronic equipment
EP3734472A1 (en) Method and device for text processing
CN112001364A (en) Image recognition method and device, electronic equipment and storage medium
CN111160047A (en) Data processing method and device and data processing device
CN111369978A (en) Data processing method and device and data processing device
CN107564526B (en) Processing method, apparatus and machine-readable medium
CN112735396A (en) Speech recognition error correction method, device and storage medium
CN109887492B (en) Data processing method and device and electronic equipment
CN113539233A (en) Voice processing method and device and electronic equipment
CN111739535A (en) Voice recognition method and device and electronic equipment
CN113343675A (en) Subtitle generating method and device for generating subtitles
CN109979435B (en) Data processing method and device for data processing
CN109977424B (en) Training method and device for machine translation model
CN114154395A (en) Model processing method and device for model processing
CN108345590B (en) Translation method, translation device, electronic equipment and storage medium
CN110780749B (en) Character string error correction method and device
CN113589954A (en) Data processing method and device and electronic equipment
CN110019928B (en) Video title optimization method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant