CN110705254A - Text sentence-breaking method and device, electronic equipment and storage medium - Google Patents

Text sentence-breaking method and device, electronic equipment and storage medium

Info

Publication number
CN110705254A
CN110705254A
Authority
CN
China
Prior art keywords: sentence, break, word, breaking, text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910927354.3A
Other languages
Chinese (zh)
Other versions
CN110705254B (en)
Inventor
孔常青
高建清
刘聪
胡国平
胡郁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN201910927354.3A priority Critical patent/CN110705254B/en
Publication of CN110705254A publication Critical patent/CN110705254A/en
Application granted granted Critical
Publication of CN110705254B publication Critical patent/CN110705254B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The embodiment of the invention provides a text sentence-breaking method and device, electronic equipment and a storage medium, wherein the method comprises the following steps: determining a character feature vector of each character in the text; inputting the character feature vector of each character into a sentence-breaking model to obtain the sentence-breaking probability of each character output by the sentence-breaking model, the sentence-breaking model being obtained by training on sample word feature vectors and sentence-break identifiers of sample words in sample texts; determining a plurality of candidate sentence-break results based on the sentence-breaking probability of each word; and determining a sentence-break result based on a preset word number threshold and the plurality of candidate sentence-break results. The method, the device, the electronic equipment and the storage medium provided by the embodiment of the invention obtain a sentence-break result in which the length of each clause is less than or equal to the preset word number threshold while ensuring that local semantics are not cut apart, thereby achieving efficient and accurate text sentence-breaking and avoiding the expenditure of labor and time.

Description

Text sentence-breaking method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular, to a text sentence-breaking method and apparatus, an electronic device, and a storage medium.
Background
Subtitles are the commentary text displayed in the playback interface while audio or video is played, and they help the audience understand the audio and video content.
At present, sentence-breaking of subtitle text is mostly done manually, which consumes human resources and time. Although the rapid development of natural language processing technology has made text sentence-breaking technology increasingly mature, subtitle text usually carries specific sentence-breaking requirements, and general text sentence-breaking technology cannot meet them.
Disclosure of Invention
The embodiment of the invention provides a text sentence-breaking method and device, electronic equipment and a storage medium, which are used for solving the problem that existing subtitle text sentence-breaking is done manually and is time-consuming and labor-intensive.
In a first aspect, an embodiment of the present invention provides a text sentence-breaking method, including:
determining a character feature vector of each character in the text;
inputting the character feature vector of each character into a sentence-breaking model to obtain the sentence-breaking probability of each character output by the sentence-breaking model; the sentence break model is obtained by training based on sample word feature vectors of sample words in the sample text and sentence break marks;
determining a plurality of candidate sentence-break results based on the sentence-break probability of each word;
and determining a sentence-break result based on a preset word number threshold value and the plurality of candidate sentence-break results.
Preferably, the determining a plurality of candidate sentence-break results based on the sentence-break probability of each word specifically includes:
constructing a search tree based on the sentence break probability of each word;
based on the search tree, a plurality of candidate sentence-break results are determined.
Preferably, the determining a sentence-break result based on a preset word count threshold and the plurality of candidate sentence-break results specifically includes:
ranking the plurality of candidate sentence-break results in descending order of sentence-break score; the sentence-break score is determined based on the sentence-break probability of the character corresponding to each sentence-break position in the candidate sentence-break result and the no-break probability of the character corresponding to each non-break position;
starting from the first candidate sentence-break result, if the word number of each sentence in the current candidate sentence-break result is less than or equal to the preset word number threshold, taking the current candidate sentence-break result as the sentence-break result; otherwise, updating the next candidate sentence-break result as the current candidate sentence-break result.
Preferably, the determining the sentence break result based on a preset word count threshold and the plurality of candidate sentence break results further comprises:
if clauses with the word number larger than the preset word number threshold exist in each candidate sentence-break result, determining clauses with the word number larger than the preset word number threshold in the first candidate sentence-break result;
sentence interruption is carried out based on the sentence interruption probability of each word in the clauses with the word number larger than the preset word number threshold value until the word number of each clause in the first candidate sentence interruption result is smaller than or equal to the preset word number threshold value;
and taking the first candidate sentence-breaking result as the sentence-breaking result.
Preferably, the sentence break based on the sentence break probability of each word in the sentence with the word number greater than the preset word number threshold further includes:
determining the distance between any word and the position of the last sentence break in the clauses with the word number larger than the preset word number threshold;
determining a distance excitation probability of any word based on the distance;
and updating the sentence break probability of any character based on the sentence break probability and the distance excitation probability of any character.
Preferably, the determining the word feature vector of each word in the text specifically includes:
determining the word feature vector of any word based on the word vector of any word or based on the word vector and the assistant feature vector of any word;
wherein the assistant feature vector comprises at least one of a position feature vector, a word co-occurrence feature vector and an acoustic feature vector; the position feature vector represents the position of any character within the word segment to which it belongs, and the word co-occurrence feature vector represents the co-occurrence of any character with sentence breaks.
Preferably, the word co-occurrence feature vector of any word includes mutual information of the any word and a sentence break; the mutual information is determined based on the sentence break occurrence probability, the word occurrence probability of any word and the word break co-occurrence probability.
Preferably, the determining a word feature vector of each word in the text further comprises:
extracting audio data from the audio and video file;
and performing voice recognition on the audio data to obtain the text.
Preferably, the determining a sentence-break result based on a preset word count threshold and the plurality of candidate sentence-break results further comprises:
determining the boundary of the front and back moments of each clause in the sentence-break result based on the audio data corresponding to the text;
and converting the text into a text in a subtitle format based on the sentence break result and the boundary of the front moment and the rear moment of each clause.
In a second aspect, an embodiment of the present invention provides a text sentence-breaking device, including:
the character feature vector determining unit is used for determining a character feature vector of each character in the text;
the sentence break probability determining unit is used for inputting the character feature vector of each character into a sentence break model to obtain the sentence break probability of each character output by the sentence break model; the sentence break model is obtained by training based on sample word feature vectors of sample words in the sample text and sentence break marks;
a candidate sentence-break result determining unit, configured to determine a plurality of candidate sentence-break results based on the sentence-break probability of each word;
and the sentence break result determining unit is used for determining a sentence break result based on a preset word number threshold value and the candidate sentence break results.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory and a bus, wherein the processor, the communication interface and the memory communicate with one another through the bus, and the processor can call logic instructions in the memory to perform the steps of the method provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method as provided in the first aspect.
According to the text sentence-breaking method and device, the electronic equipment and the storage medium, text sentence-breaking is carried out based on a preset word number threshold and the sentence-break probability of each word, and a sentence-break result in which the length of each clause is less than or equal to the preset word number threshold is obtained while local semantics are not cut apart, so that efficient and accurate text sentence-breaking is achieved and the expenditure of labor and time is avoided.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a text sentence-breaking method according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a method for determining a word feature vector according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a sentence break model according to an embodiment of the present invention;
fig. 4 is a schematic flow chart of a sentence break result determination method according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a search tree according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a text sentence-breaking device according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the traditional subtitle generation method, the speech in the audio/video must first be transcribed into subtitle text by manual listening, and the subtitle text must then be broken into clauses so that it meets the subtitle requirements of the audio/video. Assuming the audio/video requires each subtitle line to contain at most N characters, the staff must segment the subtitle text so that no clause exceeds N characters while local semantics are not cut apart. This process consumes human resources and time, and cannot meet the subtitle synchronization requirements of live programs.
In recent years, with the rapid development of natural language processing technology, text sentence-breaking technology has become increasingly mature. However, general text sentence-breaking technology usually breaks sentences in units of word segments and imposes no requirement on the number of characters in a clause, which differs markedly from the subtitle sentence-breaking requirements of television program scenarios and cannot meet the requirements of an actual subtitle system.
In order to solve the above problem, an embodiment of the present invention provides a text sentence-breaking method. The text sentence segmentation method provided by the embodiment of the invention can be applied to a non-real-time subtitle offline generation scene, can also be applied to a real-time subtitle online generation scene, and can also be applied to other scenes with word number requirements on the sentence segmentation. In the following embodiments, a subtitle generating scene is taken as an example for the scheme description, and details of the embodiments of the present invention are not repeated. Fig. 1 is a schematic flow diagram of a text sentence-breaking method according to an embodiment of the present invention, and as shown in fig. 1, the method includes:
step 110, determining a word feature vector for each word in the text.
The text here is the text that needs sentence break processing. In a subtitle generating scene, the text may be obtained by performing speech recognition on audio data that needs subtitle production, or may be obtained by performing manual audiometry and recording on the audio data that needs subtitle production, which is not specifically limited in this embodiment of the present invention. In the scene of subtitle off-line generation, audio data is extracted from audio and video files needing subtitle production, and in the scene of subtitle on-line generation, the audio data is obtained by performing real-time voice endpoint detection on audio and video streams.
In existing technical schemes, sentence-breaking is usually performed in units of the word segments in a text to obtain a sentence-break result. In the embodiment of the present invention, however, breaking the text is subject to a word number requirement on each clause, which sentence-breaking in units of word segments cannot directly satisfy, so the text is split into individual characters. Here, the splitting operation refers to splitting the text into individual characters. Each character has a corresponding character feature vector used to represent the features of that single character; the character feature vector of any character may include the character vector of the character, the word vector of the word segment to which it belongs, statistical probabilities of sentence breaks before and after the character, and the like.
Step 120, inputting the character feature vector of each character into a sentence-breaking model to obtain the sentence-breaking probability of each character output by the sentence-breaking model; the sentence break model is obtained based on sample word feature vectors of sample words in the sample text and training of sentence break marks.
Specifically, the sentence break model is a pre-trained model, and is used for analyzing whether a sentence break is performed at the position of each word based on the input word feature vector of each word, and outputting the sentence break probability of each word. Here, the sentence break probability of any word is used to indicate the probability of performing a sentence break at the position of the word, and the sentence break at the position of the word may refer to a sentence break before the word or a sentence break after the word.
In addition, before step 120 is executed, the sentence break model may be obtained through training in advance, and specifically, the sentence break model may be obtained through training in the following manner: firstly, a large number of sample texts are collected, and sentence breaking is carried out on the sample texts based on a preset word number threshold value, so that a sentence breaking mark of each sample word in the sample texts is obtained, and the sentence breaking mark is used for representing whether the position of the sample word is a sentence breaking position or not. In addition, a sample word feature vector for each sample word in the sample text is determined. And then training the initial model based on the sample word feature vector and the sentence break identification of the sample word in the sample text, thereby obtaining a sentence break model. The initial model may be a single neural network model or a combination of a plurality of neural network models, and the embodiment of the present invention does not specifically limit the type and structure of the initial model.
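As an illustration of how the sentence-break identifiers described above can be derived from a manually broken sample text, the following minimal Python sketch labels each sample character with 1 if a break follows it and 0 otherwise; the helper name and data layout are assumptions for illustration, not part of the patent.

```python
def build_break_labels(clauses):
    """Derive per-character sentence-break labels from a sample text
    that has already been split into clauses at the marked break positions.

    A label of 1 means a sentence break occurs after that character,
    0 means no break. Returns the flattened character list and labels.
    (Illustrative sketch only; the patent does not prescribe this helper.)
    """
    chars, labels = [], []
    for clause in clauses:
        for i, ch in enumerate(clause):
            chars.append(ch)
            labels.append(1 if i == len(clause) - 1 else 0)
    return chars, labels

# Example: a sample subtitle text already broken into two clauses.
chars, labels = build_break_labels(["这是断句模型", "效果非常好"])
# labels -> [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
```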
Step 130, determining a plurality of candidate sentence-break results based on the sentence-break probability of each word.
Specifically, a candidate sentence-break result is a sentence-break result obtained by breaking the text based on the sentence-break probability of each word. Here, the candidate sentence-break results may be obtained by algorithms such as greedy search or beam search; alternatively, a sentence-break threshold may be preset so that a break is made at any word whose sentence-break probability exceeds the threshold, and different candidate sentence-break results are produced by different sentence-break thresholds.
Step 140, determining a sentence-break result based on a preset word count threshold and a plurality of candidate sentence-break results.
Specifically, the preset word number threshold is the preset maximum number of words in a clause; for example, when the preset word number threshold is 10, every clause in the sentence-break result obtained by breaking the text has a length less than or equal to 10. The sentence-break result of the text indicates the positions of the sentence breaks in the text, and the length of each clause in the sentence-break result is less than or equal to the preset word number threshold.
After a plurality of candidate sentence-break results are obtained, whether each sentence length in any candidate sentence-break result is smaller than or equal to a preset word number threshold value or not can be judged, and then the candidate sentence-break result with each sentence length smaller than or equal to the preset word number threshold value is selected from the plurality of candidate sentence-break results and taken as a final sentence-break result.
According to the method provided by the embodiment of the invention, text sentence breaking is carried out based on the preset word number threshold and the sentence breaking probability of each word, the local semantics are not cut off, and the sentence breaking result of which the length of each sentence is less than or equal to the preset word number threshold is obtained, so that efficient and accurate text sentence breaking is realized, and the loss of labor cost and time cost is avoided.
Based on the above embodiment, in the method, step 110 specifically includes: determining the word feature vector of any word based on the word vector of that word, or based on the word vector and the auxiliary feature vector of that word.
Specifically, the word feature vector of any word may be the word vector of that word, or the combination of the word vector and the auxiliary feature vector of that word.
Further, fig. 2 is a schematic flow chart of a method for determining a word feature vector according to an embodiment of the present invention, as shown in fig. 2, in the method, step 110 specifically includes:
step 111, determine the word vector for any word.
Specifically, for any word, the word vector of the word may be initialized from a pre-trained word2vec or GloVe model, or may be randomly initialized, which is not specifically limited in this embodiment of the present invention.
Step 112, determining the auxiliary feature vector of the word; the auxiliary feature vector comprises at least one of a position feature vector, a word co-occurrence feature vector and an acoustic feature vector; the position feature vector represents the position of the word within the word segment to which it belongs, and the word co-occurrence feature vector represents the co-occurrence of the word with sentence breaks.
Specifically, the position feature vector of any character is used to represent the position of the character within the word segment to which it belongs; it may indicate whether the character is at the beginning, in the middle, or at the end of the word segment, or only whether the character is at the end of the word segment, or only whether it is at the beginning. For example, the word segment 字幕文本 ("subtitle text") consists of the four characters 字, 幕, 文 and 本. If the position feature vector uses a one-bit discrete value to indicate whether the character is at the end of its word segment, with "0" meaning not word-final and "1" meaning word-final, then the position feature vectors of 字, 幕 and 文 are all "0" and that of 本 is "1". If instead the position feature vector uses a two-bit discrete value, with "00" indicating word-initial, "01" word-internal and "11" word-final, then the position feature vector of 字 is "00", those of 幕 and 文 are both "01", and that of 本 is "11".
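The two-bit encoding described above can be computed mechanically from a word segmentation. The sketch below is one illustrative way to do it in Python; the treatment of single-character word segments as word-final is an assumption not specified in the text.

```python
def position_features(segmented_words):
    """Two-bit position feature per character: '00' = word-initial,
    '01' = word-internal, '11' = word-final, following the encoding
    described above (single-character words are treated here as
    word-final, an assumption not spelled out in the text)."""
    feats = []
    for word in segmented_words:
        for i, ch in enumerate(word):
            if len(word) == 1 or i == len(word) - 1:
                feats.append((ch, "11"))
            elif i == 0:
                feats.append((ch, "00"))
            else:
                feats.append((ch, "01"))
    return feats

# "字幕文本" -> 字:00, 幕:01, 文:01, 本:11, matching the example above.
print(position_features(["字幕文本"]))
```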
The word co-occurrence feature vector of any word is used for representing the co-occurrence condition of the word and the sentence break, the word co-occurrence feature vector can be obtained by statistics in advance, the word co-occurrence feature vector can be used for representing the correlation between the word and the sentence break, and the higher the correlation is, the higher the probability of the sentence break at the position of the word is.
The acoustic feature vector of any word is used to represent the features of the speech data corresponding to the word, such as the intensity, loudness, pitch, pause time, and speech speed of the speech data corresponding to the word.
It should be noted that, in the embodiment of the present invention, the execution order of step 111 and step 112 is not specifically limited, and step 111 may be executed before step 112, may be executed after step 112, and may also be executed synchronously with step 112.
Step 113, determining a word feature vector of the word based on the word vector and the assistant feature vector of the word.
Specifically, after the word vector and the auxiliary feature vector of any word are obtained, the word vector and the auxiliary feature vector may be spliced to obtain the word feature vector of the word.
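A minimal sketch of the splicing (concatenation) step; the dimensions and feature values below are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

char_vector = np.random.rand(128)          # character embedding (assumed size)
position_feat = np.array([1.0, 1.0])       # e.g. two-bit position feature "11"
cooccur_feat = np.array([0.35])            # e.g. PMI-based co-occurrence feature
acoustic_feat = np.array([0.42, 3.8])      # e.g. pause duration, speech rate

# Splice the word vector and the auxiliary features into one feature vector.
char_feature_vector = np.concatenate(
    [char_vector, position_feat, cooccur_feat, acoustic_feat])
print(char_feature_vector.shape)           # (133,)
```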
The method provided by the embodiment of the invention enriches the character feature vector of any character by determining the auxiliary feature vector of the character, and is beneficial to improving the accuracy of sentence break probability.
Based on any of the above embodiments, in the method, in the auxiliary feature vector, the word co-occurrence feature vector of any word includes mutual information of the word and the sentence break; mutual information is determined based on the sentence break occurrence probability, the word occurrence probability for the word, and the word break co-occurrence probability.
Specifically, Pointwise Mutual Information (PMI) is used to measure the correlation between two variables. In the embodiment of the invention, the mutual information contained in the word co-occurrence feature vector of any word is used to measure the correlation between the word and sentence breaks.
The sentence-break occurrence probability is the probability of a sentence break occurring, obtained from statistics over the sentence-break results of a large number of sample texts. For any word, the word occurrence probability is the probability of that word occurring, obtained from statistics over a large number of sample texts, and the word break co-occurrence probability is the probability of a sentence break at the position of that word, obtained from statistics over the sentence-break results of a large number of sample texts. For example, in a 1000-character text in which sentence breaks occur 100 times and the word occurs 50 times, with a sentence break at the word's position 20 times, the sentence-break occurrence probability is 100/1000 = 0.1, the word occurrence probability is 50/1000 = 0.05, and the word break co-occurrence probability is 20/1000 = 0.02, so according to the PMI formula the mutual information of the word is computed from the ratio 0.02/(0.1 × 0.05).
Further, when the word co-occurrence feature vector is calculated, the mutual information PMI(ΦW_i) between a sentence break before the word and the word, and the mutual information PMI(W_iΦ) between the word and a sentence break after the word, can each be calculated by the PMI formula:

PMI(ΦW_i) = log( P(ΦW_i) / ( P(Φ) · P(W_i) ) )

PMI(W_iΦ) = log( P(W_iΦ) / ( P(W_i) · P(Φ) ) )

where Φ is the sentence-break symbol, W_i is any word, P(ΦW_i) is the co-occurrence probability of a sentence break before the word, P(W_iΦ) is the co-occurrence probability of a sentence break after the word, P(Φ) is the sentence-break occurrence probability, and P(W_i) is the word occurrence probability.
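The worked example above (a 1000-character corpus with 100 sentence breaks, 50 occurrences of the word and 20 co-occurrences) can be reproduced in a few lines of Python; the logarithm is the standard PMI form, while the example in the text only shows the probability ratio, so treat the log as an assumption.

```python
import math

def pmi_feature(total_chars, break_count, char_count, co_count):
    """Pointwise mutual information between a character and a sentence
    break, computed from corpus counts as in the worked example above."""
    p_break = break_count / total_chars       # P(Φ)
    p_char = char_count / total_chars         # P(W_i)
    p_co = co_count / total_chars             # P(ΦW_i) or P(W_iΦ)
    return math.log(p_co / (p_break * p_char))

print(pmi_feature(1000, 100, 50, 20))         # log(0.02 / (0.1 * 0.05)) = log 4
```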
According to any of the embodiments, in the method, the acoustic feature vector in the auxiliary feature vector includes a pause duration feature vector, or the pause duration feature vector and the speech rate feature vector.
Specifically, for any word, the pause duration of the word is the time interval between the speech data corresponding to the word and the speech data corresponding to the next word. The pause duration feature vector of the word is used to characterize this pause duration. There is usually a correlation between the pause duration and semantic sentence breaks: the larger the pause duration, the higher the probability of a sentence break at the position of the word.
When determining the sentence break probability based on the pause duration feature vector, it is generally considered that the longer the pause duration, the greater the probability of the post-word sentence break. However, if the speech speed of the speaker corresponding to the speech data is slow, applying the pause duration feature vector for determining the sentence break probability results in that the sentence break probability of each word is higher than the actual situation. In the embodiment of the invention, the sentence breaking probability is determined by using the speech rate feature vector. Here, the speech rate feature vector is used to represent the speech rate of the speaker corresponding to the voice data, and the speech rate feature vector of any word may be obtained based on the number of words ending at the word and the duration of the voice data ending at the word. The complementary relation exists between the speech speed characteristic vector and the pause duration characteristic vector, so that the problem that the semantic segmentation is too fragmented due to too slow speech speed of a speaker can be avoided.
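Assuming start and end times are available for the speech segment aligned to each character, pause-duration and speech-rate features of the kind described above could be computed as in this sketch; the exact feature definitions are assumptions consistent with the description, not formulas fixed by the patent.

```python
def acoustic_features(char_times):
    """Compute per-character (pause duration, speech rate) features from
    (start, end) times of the speech aligned to each character."""
    feats = []
    for i, (start, end) in enumerate(char_times):
        if i + 1 < len(char_times):
            pause = max(0.0, char_times[i + 1][0] - end)   # gap before next character
        else:
            pause = 0.0
        elapsed = end - char_times[0][0]                   # duration of speech up to this character
        speech_rate = (i + 1) / elapsed if elapsed > 0 else 0.0   # characters per second so far
        feats.append((pause, speech_rate))
    return feats

# Three characters with a clear pause after the second one.
print(acoustic_features([(0.0, 0.3), (0.3, 0.6), (1.4, 1.7)]))
```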
Based on any of the above embodiments, fig. 3 is a schematic structural diagram of a sentence-break model provided by an embodiment of the present invention. As shown in fig. 3, the sentence-break model includes an input layer, a hidden layer and an output layer. W1, W2, …, Wn in the input layer represent the n words in the text, and Vec1, Vec2, …, Vecn represent the word feature vectors corresponding to the n words; the hidden layer analyzes each word feature vector, and the output layer outputs Punc1, Punc2, …, Puncn, which represent the sentence-break identifiers of the n words, each sentence-break identifier corresponding to a sentence-break probability that represents the confidence of that identifier.
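The patent does not fix a concrete network for the hidden layer; the sketch below uses a bidirectional LSTM (one of the options later named for the initial model) purely as an illustrative PyTorch stand-in for the input–hidden–output structure of fig. 3.

```python
import torch
import torch.nn as nn

class SentenceBreakModel(nn.Module):
    """Illustrative sentence-break model: character feature vectors in,
    per-character sentence-break probability out. The BiLSTM hidden layer
    and the dimensions are assumptions, not requirements of the patent."""
    def __init__(self, feature_dim=133, hidden_dim=256):
        super().__init__()
        self.hidden = nn.LSTM(feature_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.output = nn.Linear(2 * hidden_dim, 1)

    def forward(self, char_features):        # (batch, seq_len, feature_dim)
        hidden_states, _ = self.hidden(char_features)
        logits = self.output(hidden_states).squeeze(-1)
        return torch.sigmoid(logits)         # sentence-break probability per character

model = SentenceBreakModel()
probs = model(torch.randn(1, 10, 133))       # break probabilities for 10 characters
```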
Based on any of the above embodiments, fig. 4 is a schematic flow chart of a sentence break result determination method provided in an embodiment of the present invention, as shown in fig. 4, in the method, step 130 specifically includes:
step 131, constructing a search tree based on the sentence break probability of each word;
step 132, determining a plurality of candidate sentence-break results based on the search tree.
Specifically, assume any word is W_i, and that W_i has a sentence-break probability P(1|W_i); then the probability of no sentence break at W_i is 1 - P(1|W_i). A search tree is constructed based on the sentence-break probability and the no-break probability of each word in the text. Fig. 5 is a schematic structural diagram of a search tree according to an embodiment of the present invention. As shown in fig. 5, each word corresponds to two nodes, each node indicating either a sentence break or no sentence break at the position of that word, and each node in turn has two child nodes indicating a break or no break at the position of the next word, so that the resulting search tree covers every combination of break and no-break decisions for the words in the text. The sum of the probabilities corresponding to the nodes on each path in the search tree is calculated as the score of that path, and the several paths with the highest scores, or the paths whose scores are greater than or equal to a preset score threshold, are selected as candidate sentence-break results.
Referring to fig. 5, the leftmost and rightmost paths correspond to the two extreme cases: a sentence break at every word position and no sentence break at any position. Assuming n = 10, the sentence-break probability and no-break probability of each word are as shown in the following table:

i                         1     2     3     4     5     6     7     8     9     10
P(1|W_i)     (break)      0.1   0.2   0.7   0.2   0.4   0.9   0.1   0.2   0.3   0.7
1 - P(1|W_i) (no break)   0.9   0.8   0.3   0.8   0.6   0.1   0.9   0.8   0.7   0.3
For path A, the values corresponding to the 3rd and 6th nodes are sentence-break probabilities and the values corresponding to the other nodes are no-break probabilities, so the score of path A is the sum of the sentence-break probabilities of the 3rd and 6th words and the no-break probabilities of the 1st, 2nd, 4th, 5th, 7th, 8th, 9th and 10th words. For path B, the values corresponding to the 3rd, 6th and 10th nodes are sentence-break probabilities and the values corresponding to the other nodes are no-break probabilities, so the score of path B is the sum of the sentence-break probabilities of the 3rd, 6th and 10th words and the no-break probabilities of the 1st, 2nd, 4th, 5th, 7th, 8th and 9th words.
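Using the probabilities in the table above, the path scoring over the break / no-break tree can be sketched as a small beam search; the beam width and the explicit two-way expansion are illustrative choices rather than details fixed by the patent.

```python
import heapq

def candidate_break_results(break_probs, beam_width=3):
    """Beam-style search over the break / no-break tree described above.
    Each path keeps one decision per word; its score is the sum of the
    chosen probabilities (break probability where a break is taken,
    1 - break probability where it is not), matching the path scores
    used in the example. Returns the top-scoring break-position sets."""
    beams = [(0.0, [])]                       # (score, break positions so far)
    for i, p in enumerate(break_probs, start=1):
        expanded = []
        for score, breaks in beams:
            expanded.append((score + (1 - p), breaks))          # no break at position i
            expanded.append((score + p, breaks + [i]))          # break at position i
        beams = heapq.nlargest(beam_width, expanded, key=lambda b: b[0])
    return beams

probs = [0.1, 0.2, 0.7, 0.2, 0.4, 0.9, 0.1, 0.2, 0.3, 0.7]
for score, breaks in candidate_break_results(probs):
    print(round(score, 2), breaks)            # top path breaks after positions 3, 6 and 10
```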
Based on any of the above embodiments, in the method, step 140 specifically includes: step 141, ranking the plurality of candidate sentence-break results in descending order of sentence-break score; the sentence-break score is determined based on the sentence-break probability of the word corresponding to each sentence-break position in the candidate sentence-break result and the no-break probability of the word corresponding to each non-break position; starting from the first candidate sentence-break result, if the number of words in every clause of the current candidate sentence-break result is less than or equal to the preset word number threshold, taking the current candidate sentence-break result as the sentence-break result; otherwise, updating the next candidate sentence-break result as the current candidate sentence-break result.
Specifically, the sentence-break score may be the score of the corresponding path on the search tree constructed in step 131, i.e., the sum of the sentence-break probability of the word corresponding to each sentence-break position in the candidate sentence-break result and the no-break probability of the word corresponding to each non-break position; alternatively, it may be the score obtained by inputting these probabilities into a scoring model trained in advance on sample sentence-break results and their corresponding scores, which is not specifically limited in the embodiment of the present invention.
Suppose that 3 candidate sentence-break results are obtained and are arranged in descending order of sentence-break score. Among the 3 candidates, the clause lengths of the first candidate sentence-break result are 7, 9 and 12, those of the second candidate are 10, 9 and 9, and those of the third candidate are 12, 7 and 9. Assuming that the preset word number threshold is 10, it is first judged whether the first candidate satisfies the threshold; since it contains a clause longer than 10 words, the second candidate is judged next, and since the length of every clause in the second candidate is less than or equal to 10, the second candidate sentence-break result is taken as the final sentence-break result.
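The selection rule illustrated above amounts to a single pass over the ranked candidates; in the sketch below the scores are made-up placeholders and the clause-length representation is an assumption for illustration.

```python
def select_break_result(candidates, max_len):
    """Pick the highest-scoring candidate whose clauses all respect the
    preset word number threshold, as in the three-candidate example above.
    `candidates` is a list of (score, clause_lengths) pairs."""
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    for score, clause_lengths in ranked:
        if all(length <= max_len for length in clause_lengths):
            return score, clause_lengths
    return None                                # no candidate fits; handled separately below

candidates = [(8.1, [7, 9, 12]), (7.9, [10, 9, 9]), (7.5, [12, 7, 9])]
print(select_break_result(candidates, max_len=10))   # -> (7.9, [10, 9, 9])
```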
According to any of the above embodiments, in the method, step 140 further includes: step 142, if each candidate sentence-break result has a clause with the word number larger than the preset word number threshold, determining a clause with the word number larger than the preset word number threshold in the first candidate sentence-break result; step 143, sentence breaking is performed based on the sentence breaking probability of each word in the clauses with the word number larger than the preset word number threshold value until the word number of each clause in the first candidate sentence breaking result is less than or equal to the preset word number threshold value; and taking the first candidate sentence-breaking result as a sentence-breaking result.
Specifically, when a clause whose word count exceeds the preset word number threshold is broken based on the sentence-break probability of each word in the clause, the break may be made at the position of the word with the largest sentence-break probability, or at the positions of the words whose sentence-break probability exceeds a preset probability threshold.
After sentence break is carried out on the clauses with the word number larger than the preset word number threshold, whether the clauses with the word number larger than the preset word number threshold still exist in the first candidate sentence break result or not is judged again, and if the clauses do not exist, the first candidate sentence break result is used as a sentence break result; if yes, the clauses with the word number larger than the preset word number threshold value are determined again, and sentence breaking is carried out.
Assume that the default word count threshold is 10. In the 3 candidate sentence-break results, the sentence numbers of the first candidate sentence-break result are 7, 9 and 12, the sentence numbers of the second candidate sentence-break result are 11, 9 and 8, the sentence numbers of the third candidate sentence-break result are 12, 7 and 9, and sentences with word numbers larger than a preset word number threshold exist in all the 3 candidate sentence-break results. At this time, of the clauses of the first candidate sentence-break result, the clause with the word number greater than 10 is determined as the third clause, and the word number of the third clause is determined as 12. The sentence break probability of each word in the third clause is obtained as follows:
i            17    18    19    20    21    22    23    24    25    26    27    28
P(1|W_i)     0.1   0.2   0.7   0.2   0.4   0.9   0.1   0.2   0.3   0.7   0.1   0.2
In the third clause, the sentence-break probability of the 22nd word is the largest, so a break is made at the position of the 22nd word, producing two clauses of length 6. The clause lengths of the first candidate sentence-break result then become 7, 9, 6 and 6, all no greater than 10, and the first candidate sentence-break result is taken as the sentence-break result.
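The re-breaking of an over-long clause described above can be sketched as a recursive split at the highest-probability position; repeating the split until every piece respects the threshold is an assumption consistent with the repeated check described above.

```python
def split_long_clause(clause_probs, max_len):
    """Recursively break a clause whose length exceeds the threshold at
    the position with the highest break probability, as in the example
    above (a 12-character clause split after its 6th character, whose
    break probability 0.9 is the largest)."""
    if len(clause_probs) <= max_len:
        return [len(clause_probs)]
    cut = max(range(len(clause_probs) - 1),          # never cut after the last character
              key=lambda i: clause_probs[i]) + 1
    return (split_long_clause(clause_probs[:cut], max_len)
            + split_long_clause(clause_probs[cut:], max_len))

third_clause = [0.1, 0.2, 0.7, 0.2, 0.4, 0.9, 0.1, 0.2, 0.3, 0.7, 0.1, 0.2]
print(split_long_clause(third_clause, max_len=10))   # -> [6, 6]
```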
According to any of the above embodiments, the method further includes, between step 142 and step 143: determining the distance between any word and the position of the last sentence break in the clauses with the word number larger than the preset word number threshold; determining a distance excitation probability for the word based on the distance; and updating the sentence break probability of the character based on the sentence break probability and the distance excitation probability of the character.
Specifically, when none of the candidate sentence-break results satisfies the preset word number threshold, the clauses in the first candidate sentence-break result whose word count exceeds the threshold need to be broken further. In order to satisfy the preset word number threshold as far as possible, the embodiment of the invention determines, for any word in such a clause, the distance between the word and the position of the previous sentence break, and uses this distance as an excitation condition on the sentence-break probability of the word: the larger the distance from the previous break position, the stronger the excitation. The excitation is embodied by a distance excitation probability, i.e., a probability derived from the distance that encourages a sentence break at the position of the word; the sentence-break probability can then be updated based on the distance excitation probability and the original sentence-break probability to obtain the sentence-break probability under distance excitation. Here, the distance excitation probability and the sentence-break probability may be added directly to give the updated sentence-break probability, or they may be weighted and combined to obtain the updated sentence-break probability.
Further, the excitation formula is as follows:
[The excitation formula is given as an image in the original publication; it computes the excited probability P(1|W_i)' from P(1|W_i), the distance l, the threshold N and the tuning parameters α and β.]

where P(1|W_i) is the sentence-break probability of the i-th word W_i, P(1|W_i)' is the sentence-break probability of W_i under distance excitation, α and β are tuning parameters, N is the preset word number threshold, and l is the distance between W_i and the position of the previous sentence break.
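Since the excitation formula itself is not reproduced here, the sketch below substitutes an assumed additive form that matches the stated behaviour (a larger distance l yields a larger boost, scaled by the tuning parameters α and β and the threshold N); it is not the patent's actual formula.

```python
def excited_break_prob(p, l, N, alpha=0.5, beta=2.0):
    """Distance-excited sentence-break probability. The additive form
        P'(1|W_i) = P(1|W_i) + alpha * (l / N) ** beta
    is an assumed stand-in for the patent's (unreproduced) formula:
    the farther a character is from the previous break position, the
    stronger the excitation."""
    return p + alpha * (l / N) ** beta

# A character far from the previous break gets a visible boost.
print(excited_break_prob(p=0.3, l=9, N=10))   # 0.3 + 0.5 * 0.81 = 0.705
```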
According to any of the above embodiments, the method further includes, before step 110: extracting audio data from the audio and video file; and carrying out voice recognition on the audio data to obtain a text.
Here, the audio/video file is an audio file or a video file that needs to be subjected to caption production. The audio and video files can be pre-recorded files or can be generated in real time in the live broadcasting process. After the audio and video file is determined, audio data are extracted according to the audio and video file, and voice recognition is carried out on the audio data, so that a text needing sentence breaking can be obtained.
According to any of the above embodiments, the method further includes, after the step 140: determining the front and rear time boundary of each clause in the sentence-break result based on the audio data corresponding to the text; and converting the text into a text in a subtitle format based on the sentence break result and the front and rear time boundaries of each clause.
Specifically, after the sentence-break result is obtained, each clause in the sentence-break result is aligned with the audio data through a forced-alignment algorithm to obtain the corresponding front and rear time boundaries (start and end times) of each clause in the audio data. More specifically, the position of each word of the text in the audio data can be obtained through the forward-backward algorithm, from which the position of each clause in the audio data, i.e., its front and rear time boundaries, is obtained.
After the front and rear time boundaries of each clause are obtained, the text is converted into subtitle-format text based on the sentence-break result and the time boundaries of each clause. For example, for the text "this is a sentence-breaking model the effect is very good", the sentence-break result is "this is a sentence-breaking model / the effect is very good", where the front and rear time boundaries of the clause "this is a sentence-breaking model" are 1 s and 4 s, and those of the clause "the effect is very good" are 4.5 s and 7 s. The subtitle-format text thus obtained is:

1.000 4.000 this is a sentence-breaking model
4.500 7.000 the effect is very good
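A minimal sketch of this final format-conversion step, reproducing the example output above; the "start end text" line layout follows the example rather than any particular subtitle standard.

```python
def to_subtitle_text(clauses, boundaries):
    """Format each clause with its front and rear time boundary.
    `boundaries` holds (start, end) times in seconds."""
    lines = []
    for clause, (start, end) in zip(clauses, boundaries):
        lines.append(f"{start:.3f} {end:.3f} {clause}")
    return "\n".join(lines)

print(to_subtitle_text(
    ["this is a sentence-breaking model", "the effect is very good"],
    [(1.0, 4.0), (4.5, 7.0)]))
# 1.000 4.000 this is a sentence-breaking model
# 4.500 7.000 the effect is very good
```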
Based on any one of the above embodiments, an embodiment of the present invention provides a sentence break model training method, which specifically includes the following steps:
first, a large amount of sample texts meeting the requirements of the subtitle clauses of the media scene are collected. Here, the sample text is a subtitle text where a sentence break position is manually marked in an actual service, for example, a subtitle text manually added in a broadcasting process of various television programs.
After the sample text is obtained, the sample text is split into independent sample words, and a sentence break identifier of each sample word in the sample text is determined, wherein the sentence break identifier is used for representing whether a sentence is broken at the position of the sample word, for example, if the sentence break identifier is 0, the sentence is not broken at the position of the sample word, and if the sentence break identifier is 1, the sentence is broken at the position of the sample word.
Further, a word vector and an assistant feature vector for each sample word in the sample text are determined, and a word feature vector for any sample word is determined by splicing the word vector and the assistant feature vector for any sample word.
And then training the initial model based on the sample character feature vector of the sample character in the sample text and the sentence break identifier, thereby obtaining the sentence break model. Here, the initial model may be a long-short term memory network LSTM, a bidirectional long-short term memory network BLSTM, a Self-Attention mechanism Self-Attention, and the like, which is not specifically limited in this embodiment of the present invention.
Based on any one of the above embodiments, an embodiment of the present invention provides a subtitle generating method, which specifically includes the following steps:
under a non-real-time subtitle offline generation scene, extracting audio data from an audio/video file needing subtitle production; and under the scene of real-time subtitle on-line generation, audio data is obtained by carrying out real-time voice endpoint detection on the audio and video stream. And then inputting the audio data into a voice recognition system for voice recognition, wherein in the process, the voice recognition system performs slicing processing on the audio data according to the pause information in the audio data and outputs a sliced text, namely a subtitle text.
And after the subtitle text is obtained, preprocessing the subtitle text. Here the preprocessing steps include word segmentation, word vector conversion and assist feature vector extraction. The auxiliary feature vectors include position feature vectors, word co-occurrence feature vectors and acoustic feature vectors, the word co-occurrence feature vectors can be obtained by querying from a word co-occurrence feature table counted in advance, and pause duration feature vectors and speech speed feature vectors in the acoustic feature vectors need to be determined by combining audio data.
After a word vector and an auxiliary characteristic vector of each word in the subtitle text are obtained, the word characteristic vector of the word is determined by splicing the word vector and the auxiliary characteristic vector of any word. And then inputting the character feature vector of each character in the caption text into the sentence break model, and calculating and outputting the sentence break probability of each character through the sentence break model by a forward algorithm.
After the sentence-break probability of each word is obtained, a plurality of candidate sentence-break results and the sentence-break score of each candidate are determined through beam search based on the sentence-break probability and the no-break probability of each word; the sentence-break score is the sum of the sentence-break probability of the word corresponding to each sentence-break position in the candidate sentence-break result and the no-break probability of the word corresponding to each non-break position.
Then, arranging a plurality of candidate sentence-break results based on the sequence of sentence-break scores from large to small, starting from the first candidate sentence-break result, judging whether the word number of each sentence in the current candidate sentence-break result is less than or equal to a preset word number threshold one by one, if so, taking the current candidate sentence-break result as the sentence-break result, otherwise, updating the next candidate sentence-break result as the current candidate sentence-break result for judgment.
If clauses with the word number larger than the preset word number threshold exist in each candidate sentence-break result, analyzing the first candidate sentence-break result, determining the clauses with the word number larger than the preset word number threshold in the first candidate sentence-break result, performing sentence-break on the positions of the words with the maximum sentence-break probability in the clauses, detecting whether the clauses with the word number larger than the preset word number threshold exist in the first candidate sentence-break result again after the sentence-break is completed, and if the clauses do not exist, taking the first candidate sentence-break result as the sentence-break result; if the candidate sentence-breaking result exists, sentence-breaking is carried out on the clauses with the word number larger than the preset word number threshold value until the clauses with the word number larger than the preset word number threshold value do not exist in the first candidate sentence-breaking result.
After the sentence-breaking result is obtained, aligning each clause in the sentence-breaking result with the audio and video through a forced alignment algorithm to obtain a corresponding front-rear time boundary of each clause in the audio and video. And after the front and rear time boundaries of each clause are obtained, carrying out format conversion on the subtitle text based on the sentence break result and the front and rear time boundaries of each clause. And obtaining the text in the subtitle format.
According to the method provided by the embodiment of the invention, the subtitle text is segmented based on the preset word number threshold and the segmentation probability of each word, the segmentation result that the length of each clause is less than or equal to the preset word number threshold is obtained while the local semantics are not cut off, the efficient and accurate subtitle text segmentation is realized, the loss of labor cost and time cost is avoided, the instantaneity of the subtitle text segmentation is improved, and the subtitle generation speed is favorably accelerated. In addition, the auxiliary feature vector of any word is determined, so that the word feature vector of the word is enriched, and the accuracy of sentence break probability is improved.
Based on any of the above embodiments, fig. 6 is a schematic structural diagram of a text sentence-breaking device provided in an embodiment of the present invention, as shown in fig. 6, the device includes a word feature vector determining unit 610, a sentence-breaking probability determining unit 620, a candidate sentence-breaking result determining unit 630, and a sentence-breaking result determining unit 640;
the word feature vector determining unit 610 is configured to determine a word feature vector of each word in the text;
the sentence break probability determining unit 620 is configured to input the word feature vector of each word into a sentence break model, so as to obtain a sentence break probability of each word output by the sentence break model; the sentence break model is obtained by training based on sample word feature vectors of sample words in the sample text and sentence break marks;
the candidate sentence-break result determining unit 630 is configured to determine a plurality of candidate sentence-break results based on the sentence-break probability of each word;
the sentence break result determining unit 640 is configured to determine a sentence break result based on a preset word count threshold and the plurality of candidate sentence break results.
The device provided by the embodiment of the invention can be used for text sentence breaking based on the preset word number threshold and the sentence breaking probability of each word, so that the local semantics are not cut off, and the sentence breaking result of which the length of each sentence is less than or equal to the preset word number threshold is obtained, thereby realizing efficient and accurate text sentence breaking and avoiding the loss of labor cost and time cost.
Based on any of the above embodiments, in the apparatus, the candidate sentence-break result determining unit 630 is specifically configured to:
constructing a search tree based on the sentence break probability of each word;
based on the search tree, a plurality of candidate sentence-break results are determined.
Based on any of the above embodiments, in the apparatus, the sentence break result determining unit 640 includes:
a sequential ranking subunit, configured to rank the plurality of candidate sentence-break results in descending order of sentence-break score; the sentence-break score is determined based on the sentence-break probability of the character corresponding to each sentence-break position in the candidate sentence-break result and the no-break probability of the character corresponding to each non-break position;
a word number judging subunit, configured to start from a first candidate sentence-break result, and if the number of words in each sentence in the current candidate sentence-break result is less than or equal to the preset word number threshold, take the current candidate sentence-break result as the sentence-break result; otherwise, updating the next candidate sentence-break result as the current candidate sentence-break result.
Based on any of the above embodiments, in the apparatus, the sentence break result determining unit 640 further includes:
an over-length clause breaking subunit, configured to determine, if every candidate sentence-break result contains a clause whose word count exceeds the preset word number threshold, the clauses in the first candidate sentence-break result whose word count exceeds the preset word number threshold;
sentence interruption is carried out based on the sentence interruption probability of each word in the clauses with the word number larger than the preset word number threshold value until the word number of each clause in the first candidate sentence interruption result is smaller than or equal to the preset word number threshold value;
and taking the first candidate sentence-breaking result as the sentence-breaking result.
According to any of the above embodiments, in the apparatus, the over-length clause breaking subunit is further configured to:
determining the distance between any word and the position of the last sentence break in the clauses with the word number larger than the preset word number threshold;
determining a distance excitation probability of any word based on the distance;
and updating the sentence break probability of any character based on the sentence break probability and the distance excitation probability of any character.
Based on any of the above embodiments, in the apparatus, the word feature vector determination unit 610 is specifically configured to:
determining the word feature vector of any word based on the word vector of any word or based on the word vector and the assistant feature vector of any word;
wherein the assistant feature vector comprises at least one of a position feature vector, a word co-occurrence feature vector and an acoustic feature vector; the position feature vector represents the position of any character within the word segment to which it belongs, and the word co-occurrence feature vector represents the co-occurrence of any character with sentence breaks.
Based on any of the above embodiments, in the apparatus, the word co-occurrence feature vector of any word includes mutual information between the word and sentence breaks; the mutual information is determined based on the sentence-break occurrence probability, the occurrence probability of the word, and the co-occurrence probability of the word and the sentence break.
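One standard way to combine these three probabilities is pointwise mutual information estimated from corpus counts. The patent only names the probabilities involved, so the log-ratio form below, and the reading of "co-occurrence" as the character appearing immediately before a break, are assumptions made for illustration.

```python
import math

def break_mutual_information(count_char, count_char_before_break,
                             count_breaks, total_chars):
    """Pointwise mutual information between a character and a sentence break:
    log( P(char, break) / (P(char) * P(break)) ), estimated from counts."""
    p_char = count_char / total_chars
    p_break = count_breaks / total_chars
    p_joint = count_char_before_break / total_chars
    if p_joint == 0:
        return float("-inf")  # the character never appears right before a break
    return math.log(p_joint / (p_char * p_break))

# Example: a character seen 1,000 times in a 1,000,000-character corpus,
# sitting immediately before a break 300 of those times, with 50,000 breaks.
print(break_mutual_information(1000, 300, 50000, 1_000_000))
```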
Based on any embodiment above, the apparatus further comprises:
the text acquisition unit is used for extracting audio data from the audio and video file; and performing voice recognition on the audio data to obtain the text.
Based on any embodiment above, the apparatus further comprises:
the caption generating unit is used for determining the start and end time boundaries of each clause in the sentence-break result based on the audio data corresponding to the text;
and converting the text into a text in a subtitle format based on the sentence-break result and the start and end time boundaries of each clause.
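As an illustration of turning clauses and their start/end time boundaries into a subtitle-format text, the sketch below emits SRT-style entries. SRT is used here only as a common example of a subtitle format; the patent does not specify which subtitle format is produced.

```python
def to_timestamp(seconds):
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(clauses, boundaries):
    """clauses: list of clause strings from the sentence-break result.
    boundaries: list of (start_seconds, end_seconds), one pair per clause."""
    blocks = []
    for i, (clause, (start, end)) in enumerate(zip(clauses, boundaries), 1):
        blocks.append(f"{i}\n{to_timestamp(start)} --> {to_timestamp(end)}\n{clause}\n")
    return "\n".join(blocks)

print(to_srt(["first clause", "second clause"], [(0.0, 1.8), (1.9, 3.2)]))
```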
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in Fig. 7, the electronic device may include: a processor (processor) 710, a communication interface (Communications Interface) 720, a memory (memory) 730, and a communication bus 740, wherein the processor 710, the communication interface 720, and the memory 730 communicate with each other via the communication bus 740. The processor 710 may call logic instructions in the memory 730 to perform the following method: determining a character feature vector of each character in the text; inputting the character feature vector of each character into a sentence break model to obtain the sentence break probability of each character output by the sentence break model; the sentence break model is obtained by training based on sample word feature vectors of sample words in the sample text and sentence break marks; determining a plurality of candidate sentence-break results based on the sentence-break probability of each word; and determining a sentence-break result based on a preset word number threshold value and the plurality of candidate sentence-break results.
In addition, when implemented in the form of software functional units and sold or used as independent products, the logic instructions in the memory 730 may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Embodiments of the present invention further provide a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the method provided in the foregoing embodiments, the method including: determining a character feature vector of each character in the text; inputting the character feature vector of each character into a sentence break model to obtain the sentence break probability of each character output by the sentence break model; the sentence break model is obtained by training based on sample word feature vectors of sample words in the sample text and sentence break marks; determining a plurality of candidate sentence-break results based on the sentence-break probability of each word; and determining a sentence-break result based on a preset word number threshold value and the plurality of candidate sentence-break results.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (12)

1. A text sentence-breaking method, comprising:
determining a character feature vector of each character in the text;
inputting the character feature vector of each character into a sentence-breaking model to obtain the sentence-breaking probability of each character output by the sentence-breaking model; the sentence break model is obtained by training based on sample word feature vectors of sample words in the sample text and sentence break marks;
determining a plurality of candidate sentence-break results based on the sentence-break probability of each word;
and determining a sentence-break result based on a preset word number threshold value and the plurality of candidate sentence-break results.
2. The text sentence-breaking method according to claim 1, wherein the determining a plurality of candidate sentence-breaking results based on the sentence-breaking probability of each word specifically comprises:
constructing a search tree based on the sentence break probability of each word;
and determining a plurality of candidate sentence-break results based on the search tree.
3. The text sentence-breaking method according to claim 1 or 2, wherein the determining a sentence-breaking result based on a preset word count threshold and the plurality of candidate sentence-breaking results specifically comprises:
ranking the plurality of candidate sentence-break results in descending order of sentence-break score; the sentence-break score is determined based on the sentence-break probability of the character corresponding to each sentence-break position in the candidate sentence-break result;
starting from the first candidate sentence-break result, if the word number of each sentence in the current candidate sentence-break result is less than or equal to the preset word number threshold, taking the current candidate sentence-break result as the sentence-break result; otherwise, updating the next candidate sentence-break result as the current candidate sentence-break result.
4. The text sentence-breaking method of claim 3 wherein the determining a sentence-breaking result based on a preset word count threshold and the plurality of candidate sentence-breaking results further comprises:
if clauses with the word number larger than the preset word number threshold exist in each candidate sentence-break result, determining clauses with the word number larger than the preset word number threshold in the first candidate sentence-break result;
performing sentence breaking based on the sentence-break probability of each word in the clauses whose word count is greater than the preset word number threshold, until the word count of each clause in the first candidate sentence-break result is less than or equal to the preset word number threshold;
and taking the first candidate sentence-breaking result as the sentence-breaking result.
5. The text sentence-breaking method of claim 4, wherein the performing sentence breaking based on the sentence-break probability of each word in the clauses whose word count is greater than the preset word count threshold further comprises:
determining the distance between any word and the position of the last sentence break in the clauses with the word number larger than the preset word number threshold;
determining a distance excitation probability of any word based on the distance;
and updating the sentence break probability of any character based on the sentence break probability and the distance excitation probability of any character.
6. The method of claim 1, wherein the determining the word feature vector of each word in the text specifically comprises:
determining the word feature vector of any word based on the word vector of the word, or based on the word vector and the auxiliary feature vector of the word;
wherein the auxiliary feature vector comprises at least one of a position feature vector, a word co-occurrence feature vector, and an acoustic feature vector; the position feature vector represents the position of the character within the word segment to which it belongs, and the word co-occurrence feature vector represents the co-occurrence of the character with sentence breaks.
7. The text sentence-breaking method of claim 6, wherein the word co-occurrence feature vector of any word includes mutual information between the word and sentence breaks; the mutual information is determined based on the sentence-break occurrence probability, the occurrence probability of the word, and the co-occurrence probability of the word and the sentence break.
8. The text sentence-breaking method according to claim 1, wherein before the determining a word feature vector for each word in the text, the method further comprises:
extracting audio data from the audio and video file;
and performing voice recognition on the audio data to obtain the text.
9. The text sentence-breaking method of claim 1, wherein after the determining a sentence-break result based on a preset word count threshold and the plurality of candidate sentence-break results, the method further comprises:
determining the start and end time boundaries of each clause in the sentence-break result based on the audio data corresponding to the text;
and converting the text into a text in a subtitle format based on the sentence-break result and the start and end time boundaries of each clause.
10. A text sentence-breaking apparatus, comprising:
the character feature vector determining unit is used for determining a character feature vector of each character in the text;
the sentence break probability determining unit is used for inputting the character feature vector of each character into a sentence break model to obtain the sentence break probability of each character output by the sentence break model; the sentence break model is obtained by training based on sample word feature vectors of sample words in the sample text and sentence break marks;
a candidate sentence-break result determining unit, configured to determine a plurality of candidate sentence-break results based on the sentence-break probability of each word;
and the sentence break result determining unit is used for determining a sentence break result based on a preset word number threshold value and the candidate sentence break results.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the text sentence breaking method according to any of claims 1 to 9 are implemented by the processor when executing the program.
12. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the text sentence-breaking method according to any one of claims 1 to 9.
CN201910927354.3A 2019-09-27 2019-09-27 Text sentence-breaking method and device, electronic equipment and storage medium Active CN110705254B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910927354.3A CN110705254B (en) 2019-09-27 2019-09-27 Text sentence-breaking method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910927354.3A CN110705254B (en) 2019-09-27 2019-09-27 Text sentence-breaking method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110705254A true CN110705254A (en) 2020-01-17
CN110705254B CN110705254B (en) 2023-04-07

Family

ID=69197110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910927354.3A Active CN110705254B (en) 2019-09-27 2019-09-27 Text sentence-breaking method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110705254B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111652002A (en) * 2020-06-16 2020-09-11 北京字节跳动网络技术有限公司 Text division method, device, equipment and computer readable medium
CN112002328A (en) * 2020-08-10 2020-11-27 中央广播电视总台 Subtitle generating method and device, computer storage medium and electronic equipment
CN113392639A (en) * 2020-09-30 2021-09-14 腾讯科技(深圳)有限公司 Title generation method and device based on artificial intelligence and server
CN113436617A (en) * 2021-06-29 2021-09-24 平安科技(深圳)有限公司 Method and device for speech sentence-breaking, computer equipment and storage medium
CN114125571A (en) * 2020-08-31 2022-03-01 伊普西龙信息科技(北京)有限公司 Subtitle generating method, subtitle testing method and subtitle processing equipment
WO2023130951A1 (en) * 2022-01-04 2023-07-13 广州小鹏汽车科技有限公司 Speech sentence segmentation method and apparatus, electronic device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017140221A1 (en) * 2016-02-18 2017-08-24 腾讯科技(深圳)有限公司 Text information processing method and device
CN107247706A (en) * 2017-06-16 2017-10-13 中国电子技术标准化研究院 Text punctuate method for establishing model, punctuate method, device and computer equipment
WO2018036555A1 (en) * 2016-08-25 2018-03-01 腾讯科技(深圳)有限公司 Session processing method and apparatus
CN109145282A (en) * 2017-06-16 2019-01-04 贵州小爱机器人科技有限公司 Punctuate model training method, punctuate method, apparatus and computer equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017140221A1 (en) * 2016-02-18 2017-08-24 腾讯科技(深圳)有限公司 Text information processing method and device
WO2018036555A1 (en) * 2016-08-25 2018-03-01 腾讯科技(深圳)有限公司 Session processing method and apparatus
CN107247706A (en) * 2017-06-16 2017-10-13 中国电子技术标准化研究院 Text punctuate method for establishing model, punctuate method, device and computer equipment
CN109145282A (en) * 2017-06-16 2019-01-04 贵州小爱机器人科技有限公司 Punctuate model training method, punctuate method, apparatus and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Chuan et al.: "Research on Sentence Segmentation and Punctuation Marking Methods for Ancient Chinese", Journal of Henan University (Natural Science Edition) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111652002A (en) * 2020-06-16 2020-09-11 北京字节跳动网络技术有限公司 Text division method, device, equipment and computer readable medium
CN111652002B (en) * 2020-06-16 2023-04-18 抖音视界有限公司 Text division method, device, equipment and computer readable medium
CN112002328A (en) * 2020-08-10 2020-11-27 中央广播电视总台 Subtitle generating method and device, computer storage medium and electronic equipment
CN112002328B (en) * 2020-08-10 2024-04-16 中央广播电视总台 Subtitle generation method and device, computer storage medium and electronic equipment
CN114125571A (en) * 2020-08-31 2022-03-01 伊普西龙信息科技(北京)有限公司 Subtitle generating method, subtitle testing method and subtitle processing equipment
CN113392639A (en) * 2020-09-30 2021-09-14 腾讯科技(深圳)有限公司 Title generation method and device based on artificial intelligence and server
CN113392639B (en) * 2020-09-30 2023-09-26 腾讯科技(深圳)有限公司 Title generation method, device and server based on artificial intelligence
CN113436617A (en) * 2021-06-29 2021-09-24 平安科技(深圳)有限公司 Method and device for speech sentence-breaking, computer equipment and storage medium
CN113436617B (en) * 2021-06-29 2023-08-18 平安科技(深圳)有限公司 Voice sentence breaking method, device, computer equipment and storage medium
WO2023130951A1 (en) * 2022-01-04 2023-07-13 广州小鹏汽车科技有限公司 Speech sentence segmentation method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN110705254B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN110705254B (en) Text sentence-breaking method and device, electronic equipment and storage medium
CN106331893B (en) Real-time caption presentation method and system
CN108305643B (en) Method and device for determining emotion information
US20190278846A1 (en) Semantic extraction method and apparatus for natural language, and computer storage medium
CN106297776B (en) A kind of voice keyword retrieval method based on audio template
CN111090727B (en) Language conversion processing method and device and dialect voice interaction system
CN110175246B (en) Method for extracting concept words from video subtitles
CN109858038B (en) Text punctuation determination method and device
CN110853649A (en) Label extraction method, system, device and medium based on intelligent voice technology
CN107305541A (en) Speech recognition text segmentation method and device
CN110781668B (en) Text information type identification method and device
CN105551485B (en) Voice file retrieval method and system
CN111341305A (en) Audio data labeling method, device and system
CN111986656B (en) Teaching video automatic caption processing method and system
CN112399269A (en) Video segmentation method, device, equipment and storage medium
CN110740275A (en) nonlinear editing systems
CN110750996A (en) Multimedia information generation method and device and readable storage medium
CN111883137A (en) Text processing method and device based on voice recognition
CN112002328B (en) Subtitle generation method and device, computer storage medium and electronic equipment
CN113660432A (en) Translation subtitle production method and device, electronic equipment and storage medium
CN116582726B (en) Video generation method, device, electronic equipment and storage medium
CN110708619B (en) Word vector training method and device for intelligent equipment
Tündik et al. Assessing the Semantic Space Bias Caused by ASR Error Propagation and its Effect on Spoken Document Summarization.
CN110750980A (en) Phrase corpus acquisition method and phrase corpus acquisition device
CN116229943B (en) Conversational data set generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant